This repository is a curated collection of Jupyter notebooks demonstrating foundational and intermediate Natural Language Processing (NLP) concepts using widely used libraries such as SpaCy, NLTK, Scikit-learn, Gensim, and FastText.
The notebooks are structured to support learning, experimentation, and practical application of key NLP building blocks including tokenization, POS tagging, NER, embeddings, and text classification.

| Notebook | Description |
|---|---|
| tokenizer.ipynb | Introduction to tokenization using SpaCy |
| stop_words.ipynb | Removing stop words from text |
| Bag_Of_Words.ipynb | Building Bag-of-Words representations |
| bag_of_n_grams.ipynb | N-gram generation and vectorization |
| tf-IDF.ipynb | Computing TF-IDF vectors using Scikit-learn |
| stemming&lemmatization.ipynb | Stemming and Lemmatization comparison (NLTK vs SpaCy) |
| POS_tagging.ipynb | Part-of-Speech tagging using SpaCy |
| NER.ipynb | Named Entity Recognition with SpaCy |
| spacy.ipynb | Overview of core SpaCy functionalities |
| spacy_lang_processing_pipeline.ipynb | Constructing custom NLP pipelines in SpaCy |
| fasttext_classification.ipynb | Text classification using FastText word embeddings |
| gensim_word_vectors_text_classification.ipynb | Word2Vec embedding-based classification using Gensim |
| news_clasification_spacy_word_vectors.ipynb | News text classification using SpaCy word vectors |

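
To give a feel for what the vectorization notebooks cover, here is a minimal sketch of bag-of-n-grams counting and textbook TF-IDF using only the Python standard library. It is not taken from the notebooks: the function names (`ngrams`, `bag_of_ngrams`, `tf_idf`) and the whitespace tokenizer are assumptions made for illustration. Scikit-learn's `TfidfVectorizer` applies IDF smoothing and L2 normalization by default, so its numbers will differ from this unsmoothed version.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_min=1, n_max=2):
    """Count every n-gram of length n_min..n_max in the text."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    counts = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(ngrams(tokens, n))
    return counts


def tf_idf(docs):
    """Unsmoothed TF-IDF on unigrams: tf = count / doc length, idf = ln(N / df)."""
    bags = [bag_of_ngrams(doc, 1, 1) for doc in docs]
    n_docs = len(bags)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in bag.items()}
        )
    return weights


docs = ["the cat sat", "the dog barked"]
print(bag_of_ngrams(docs[0]))
print(tf_idf(docs))
```

A term that occurs in every document (here `the`) gets a weight of zero under this unsmoothed formula, which is the intuition behind using TF-IDF to down-weight common words.
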
- Python 3.8+
- Jupyter Notebook
- SpaCy
- NLTK
- Scikit-learn
- Gensim
- FastText (via Gensim or Facebook)

1. Clone the repository

       git clone https://github.com/anjaliy11/NLP_.git
       cd NLP_

2. Create a virtual environment (optional but recommended)

       python -m venv venv
       source venv/bin/activate  # Windows: venv\Scripts\activate

3. Install dependencies

       pip install -r requirements.txt

4. Run Jupyter

       jupyter notebook

---
## Dependencies

- spacy
- nltk
- gensim
- scikit-learn
- matplotlib
- jupyter

Then run:

    python -m spacy download en_core_web_sm

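
Once the packages and the SpaCy model are installed, a quick check like the following can confirm that they are importable before opening the notebooks. This is a sketch, not part of the repository: the `missing_modules` helper and the `REQUIRED` list are hypothetical names, and the list uses import names rather than pip names (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the dependencies listed above (assumption: default installs).
# "en_core_web_sm" is the model package installed by the spacy download command.
REQUIRED = ["spacy", "nltk", "gensim", "sklearn", "matplotlib", "en_core_web_sm"]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages.
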
---
## Use Cases

- Education and training on NLP fundamentals
- Rapid prototyping of NLP preprocessing pipelines
- Experimentation with vectorization and classification techniques
- Baseline development for text classification tasks