# NLP_

## 🧠 Natural Language Processing Concepts – Notebook Implementations

This repository is a curated collection of Jupyter notebooks demonstrating foundational and intermediate Natural Language Processing (NLP) concepts using real-world libraries such as SpaCy, NLTK, Scikit-learn, Gensim, and FastText.

The notebooks are structured to support learning, experimentation, and practical application of key NLP building blocks, including tokenization, POS tagging, NER, embeddings, and text classification.


## 📌 Contents

| Notebook | Description |
|----------|-------------|
| tokenizer.ipynb | Introduction to tokenization using SpaCy |
| stop_words.ipynb | Removing stop words from text |
| Bag_Of_Words.ipynb | Building Bag-of-Words representations |
| bag_of_n_grams.ipynb | N-gram generation and vectorization |
| tf-IDF.ipynb | Computing TF-IDF vectors using Scikit-learn |
| stemming&lemmatization.ipynb | Stemming and lemmatization comparison (NLTK vs. SpaCy) |
| POS_tagging.ipynb | Part-of-Speech tagging using SpaCy |
| NER.ipynb | Named Entity Recognition with SpaCy |
| spacy.ipynb | Overview of core SpaCy functionalities |
| spacy_lang_processing_pipeline.ipynb | Constructing custom NLP pipelines in SpaCy |
| fasttext_classification.ipynb | Text classification using FastText word embeddings |
| gensim_word_vectors_text_classification.ipynb | Word2Vec embedding-based classification using Gensim |
| news_clasification_spacy_word_vectors.ipynb | News text classification using SpaCy word vectors |
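
As a quick taste of what the SpaCy-focused notebooks (tokenizer.ipynb, POS_tagging.ipynb, NER.ipynb) walk through, here is a minimal sketch of tokenization, POS tagging, and named entity recognition. The example sentence and printed attributes are illustrative choices, not taken from the notebooks themselves.

```python
import spacy

# Load the small English pipeline (installed via the spacy download step in Dependencies below)
nlp = spacy.load("en_core_web_sm")

# Illustrative sentence, not from the notebooks
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Tokenization, lemmas, POS tags, and stop-word flags all come from the same Doc object
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.is_stop)

# Named entities detected by the pipeline's NER component
for ent in doc.ents:
    print(ent.text, ent.label_)
```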

## 🛠️ Technologies

- Python 3.8+
- Jupyter Notebook
- SpaCy
- NLTK
- Scikit-learn
- Gensim
- FastText (via Gensim or Facebook's fastText library)
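
To show where Scikit-learn fits in, below is a minimal sketch of the Bag-of-Words, bag-of-n-grams, and TF-IDF vectorization explored in Bag_Of_Words.ipynb, bag_of_n_grams.ipynb, and tf-IDF.ipynb; the tiny corpus is made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up corpus for illustration only
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# Bag of Words: raw term counts per document
bow = CountVectorizer()
X_bow = bow.fit_transform(corpus)
print(bow.get_feature_names_out())

# Bag of n-grams: unigrams and bigrams in one vocabulary
ngram = CountVectorizer(ngram_range=(1, 2))
X_ngram = ngram.fit_transform(corpus)

# TF-IDF: term counts reweighted by inverse document frequency
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)
print(X_tfidf.shape)
```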

## 🧪 Getting Started

1. Clone the repository

   ```bash
   git clone https://github.com/anjaliy11/NLP_.git
   cd NLP_
   ```

2. Create a virtual environment (optional but recommended)

   ```bash
   python -m venv venv
   source venv/bin/activate        # Windows: venv\Scripts\activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Run Jupyter

   ```bash
   jupyter notebook
   ```
---

## 📦 Dependencies

The notebooks rely on the following packages:

- spacy
- nltk
- gensim
- scikit-learn
- matplotlib
- jupyter

After installing them, download the SpaCy English model:

```bash
python -m spacy download en_core_web_sm
```
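
To confirm the model downloaded correctly, a quick check (assuming the command above completed without errors) is:

```python
import spacy

# Raises OSError if en_core_web_sm is not installed
nlp = spacy.load("en_core_web_sm")
print(nlp.pipe_names)  # e.g. ['tok2vec', 'tagger', 'parser', ...]
```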

---

## 🎯 Use Cases

- 💡 Education and training on NLP fundamentals
- 🔍 Rapid prototyping of NLP preprocessing pipelines
- 📊 Experimentation with vectorization and classification techniques
- 🤖 Baseline development for text classification tasks (see the sketch below)
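
As one example of such a baseline, here is a minimal sketch in the spirit of gensim_word_vectors_text_classification.ipynb: documents are represented by the average of pretrained GloVe vectors from Gensim's downloader and fed to a logistic-regression classifier. The model name, toy texts, and labels are illustrative assumptions, not the notebooks' actual data or setup.

```python
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression

# Pretrained 50-dimensional GloVe vectors (downloads on first use)
kv = api.load("glove-wiki-gigaword-50")

def doc_vector(text):
    """Average the vectors of in-vocabulary tokens; zeros if none match."""
    tokens = [t for t in text.lower().split() if t in kv]
    if not tokens:
        return np.zeros(kv.vector_size)
    return np.mean([kv[t] for t in tokens], axis=0)

# Toy training data (illustrative only)
texts = ["stocks fell sharply today", "the team won the championship",
         "markets rallied after the report", "the striker scored twice"]
labels = ["business", "sports", "business", "sports"]

X = np.vstack([doc_vector(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([doc_vector("shares dropped at the open")]))
```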













