In this repo I provide simple examples that demonstrate the fundamentals of NLP with the NLTK library in Python: Tokenization, Stopword Removal, Parts of Speech Tagging, Named Entity Recognition, and Sentiment Analysis using VADER. For better understanding check this NLTK documentation:

mansourshebli/nltkPython

NLP with NLTK in Python 🚀

Welcome to my NLTK NLP project on GitHub! 📚 This repository documents my hands-on exploration of Natural Language Processing (NLP) concepts and techniques using the NLTK library in Python. Below, I'll walk you through some very simple examples covering the following:

  • Tokenization
  • Stopword Removal
  • Parts of Speech Tagging
  • Named Entity Recognition
  • Sentiment Analysis using VADER

💡 Note: It took me a while to put together this documentation. I hope you find it helpful! 👀

Tokenization 📝

In this phase, I explored the fascinating world of tokenization, where text is sliced into meaningful units called tokens. Here's what I accomplished:

  • Learned the Concept: Understood the essence of tokenization and its importance.
  • Applied Techniques: Utilized NLTK's nltk.tokenize module to segment text into words and sentences.
  • Practical Implementation: Delved into Python code to practice tokenization.
  • Exercises and Examples: Worked through a hands-on exercise and an example that uses the matplotlib library, both showcased in Tokenization.py.

Stopword Removal 🛑

This phase helped me understand the significance of stopwords and how they impact NLP tasks. My achievements include:

  • Identifying Stopwords: Recognized commonly used stopwords and their role in text analysis.
  • Removal Techniques: Explored effective strategies to eliminate irrelevant words from text data.
  • Python Implementation: Applied NLTK's nltk.corpus.stopwords and text preprocessing techniques.
  • Practical Application: Worked through the exercise showcased in Stopword_Removal.py to practice stopword removal.

Parts of Speech Tagging 📊

Diving into grammatical analysis, I focused on understanding parts of speech and their roles. Here's what I achieved:

  • Understanding POS: Explored the concept of parts of speech and their grammatical categories.
  • POS Tagging: Leveraged NLTK's nltk.pos_tag to assign appropriate tags to words.
  • Real-world Application: Implemented parts of speech tagging through a practical exercise and an example in Parts_of_Speech_Tagging.py.

Named Entity Recognition (NER) 🏙️

Named entities gained my attention as I delved into identifying and extracting various types. Here's a summary of my achievements:

  • Significance of NER: Understood the importance of named entities in NLP.
  • Types of Entities: Identified different categories like persons, locations, organizations, and dates.
  • NER Techniques: Applied NLTK's nltk.ne_chunk to extract named entities from text.
  • Hands-on Practice: Engaged in interactive activities and exercises in Named_Entity_Recognition.py to reinforce NER skills.

Sentiment Analysis using VADER 😃😔

Emotions in text fascinated me as I ventured into sentiment analysis using the VADER tool. Here's what I accomplished:

  • Understanding Sentiment Analysis: Grasped the role of sentiment analysis in determining emotional polarity (it was cool :)).
  • Introduction to VADER: Explored VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon- and rule-based sentiment analysis tool that works without any training on your own data.
  • Analyzing Sentiment: Applied VADER to analyze text sentiment and interpreted results.
  • Practical Exercises: Engaged in hands-on activities in Sentiment_Analysis_using_VADER.py to perform sentiment analysis using VADER.

And most importantly, enjoy the process of learning and discovery! 🌟🐍

Tags: python, nltk, natural language processing, text analysis, tokenization, stopword removal, parts of speech tagging, named entity recognition, sentiment analysis
