Basic Text-Preprocessing with Python

In Natural Language Processing (NLP), the information to be mined comes from data with an arbitrary structure, that is, unstructured data. It therefore has to be converted into a structured form before it can be used for downstream tasks (sentiment analysis, topic modelling, etc.).

Text data needs to be cleaned and encoded as numerical values before it is passed to machine learning models. This process of cleaning and encoding is called text preprocessing.
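As a rough illustration of the two steps named above, cleaning and then encoding, here is a minimal sketch that uses only the Python standard library. The helper names (`clean_text`, `build_vocabulary`, `encode`) and the sample sentences are hypothetical and are not taken from the notebook; the bag-of-words count encoding is just one simple choice among many.

```python
import re
from collections import Counter


def clean_text(text):
    """Lowercase the text and keep only ASCII letters and single spaces."""
    text = text.lower()
    # Assumption: digits, punctuation and non-ASCII characters are noise here.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


def build_vocabulary(documents):
    """Map every distinct token to an index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(document, vocabulary):
    """Encode a cleaned document as a vector of token counts."""
    counts = Counter(document.split())
    # Dicts keep insertion order, so positions match the vocabulary indices.
    return [counts.get(tok, 0) for tok in vocabulary]


raw_docs = ["Saya suka NLP!", "NLP itu seru, saya belajar NLP 2 jam."]
cleaned = [clean_text(doc) for doc in raw_docs]
vocabulary = build_vocabulary(cleaned)
vectors = [encode(doc, vocabulary) for doc in cleaned]
```

After this runs, `cleaned` holds the normalized sentences and `vectors` holds one count vector per sentence, which is the kind of numerical input a machine learning model can consume.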

The code is available as a Jupyter Notebook in both executable and viewable form.

Library

The code in this repository uses several Python libraries for text preprocessing:

Natural Language Toolkit (NLTK) - Text modelling
PySastrawi - Indonesian stemming
Matplotlib - Data visualization

Article

I have written a simple explanation of each text-preprocessing stage used in this repository in an article here.

Author

Kuncahyo Setyo Nugroho
✉️ [email protected]
