Collection of Bahasa Indonesia (Indonesian) natural language processing (NLP) software libraries, dictionaries, and corpora. Pull requests are always welcome.
Library | Description | Programming Languages | License | Author & Link |
---|---|---|---|---|
bahasa | NLP toolkit for Bahasa Indonesia (pre-alpha development stage) | Python | MIT License (MIT) | Sutrisno Efendi |
Library | Description | Programming Languages | License | Author & Link |
---|---|---|---|---|
python-sentianalysis-id | Sentiment Analysis for Bahasa Indonesia | Python | | yasirutomo |
Library | Description | Programming Languages | License | Author & Link |
---|---|---|---|---|
Open NLP | POS tagging with predefined training and test data | Java | | yohanesgultom |
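
Tagging with a fixed train/test split can be illustrated with the simplest possible baseline. The sketch below is not the OpenNLP tagger (a statistical model written in Java); it is a hypothetical most-frequent-tag baseline in Python, trained on a tiny hand-written toy corpus whose tags loosely follow an NN/VB/PRP-style tagset. Known words get their most frequent training tag, unseen words fall back to the overall most frequent tag, and accuracy is measured on held-out data.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Learn each word's most frequent tag plus a fallback tag for unseen words."""
    word_tags = defaultdict(Counter)
    tag_totals = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tags[word.lower()][tag] += 1
            tag_totals[tag] += 1
    lexicon = {word: tags.most_common(1)[0][0] for word, tags in word_tags.items()}
    fallback = tag_totals.most_common(1)[0][0]
    return lexicon, fallback

def tag_words(words, lexicon, fallback):
    """Tag known words from the lexicon; unseen words get the fallback tag."""
    return [(w, lexicon.get(w.lower(), fallback)) for w in words]

def accuracy(test_sentences, lexicon, fallback):
    """Share of test tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sentence in test_sentences:
        predicted = tag_words([w for w, _ in sentence], lexicon, fallback)
        for (_, gold), (_, pred) in zip(sentence, predicted):
            correct += gold == pred
            total += 1
    return correct / total if total else 0.0

# Hypothetical toy data; tags loosely follow an NN/VB/PRP-style tagset.
TRAIN = [
    [("Saya", "PRP"), ("makan", "VB"), ("nasi", "NN")],
    [("Dia", "PRP"), ("minum", "VB"), ("air", "NN")],
    [("Ibu", "NN"), ("memasak", "VB"), ("sayur", "NN")],
]
TEST = [[("Saya", "PRP"), ("minum", "VB"), ("kopi", "NN")]]

lexicon, fallback = train_baseline(TRAIN)
print(tag_words(["Dia", "makan", "roti"], lexicon, fallback))  # [('Dia', 'PRP'), ('makan', 'VB'), ('roti', 'NN')]
print(accuracy(TEST, lexicon, fallback))  # 1.0
```

A baseline like this is mainly useful as a floor: a real tagger trained on the same split should clearly beat it.
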
Library | Description | Programming Languages | License | Author & Link |
---|---|---|---|---|
indonesia-ner | Named Entity Recognition for Bahasa Indonesia | Java | MIT License (MIT) | yusufsyaifudin |
Library | Description | Programming Languages | License | Author & Link |
---|---|---|---|---|
sastrawi | High-quality stemmer library for the Indonesian language (Bahasa) | PHP | MIT License (MIT) | sastrawi |
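
Sastrawi builds on the dictionary-backed affix-stripping approach of Nazief and Adriani and its confix-stripping extensions. The Python sketch below illustrates only the core idea and is not Sastrawi's implementation or API. It strips inflectional particles (-lah, -kah, -tah, -pun), then possessive pronouns (-nya, -ku, -mu), then derivational suffixes (-kan, -an, -i), and accepts a candidate only when it appears in a root-word dictionary. The five-word `ROOT_WORDS` set and the restriction to the prefixes di-, ke-, and se- are simplifying assumptions; real stemmers also handle prefixes such as meN- and peN-, whose initial consonant changes.

```python
ROOT_WORDS = {"makan", "buku", "ajar", "main", "baca", "rumah"}  # hypothetical mini-dictionary

PARTICLES = ("lah", "kah", "tah", "pun")
POSSESSIVES = ("nya", "ku", "mu")
DERIVATIONAL_SUFFIXES = ("kan", "an", "i")  # "kan" must be tried before "an"
SIMPLE_PREFIXES = ("di", "ke", "se")       # simplified: no meN-/peN- handling

def _strip_suffix(word, suffixes):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem(word, dictionary=ROOT_WORDS):
    """Return a dictionary root for `word`, or the lower-cased word if none is found."""
    word = word.lower()
    if word in dictionary:
        return word
    candidate = _strip_suffix(word, PARTICLES)
    candidate = _strip_suffix(candidate, POSSESSIVES)
    if candidate in dictionary:
        return candidate
    without_suffix = _strip_suffix(candidate, DERIVATIONAL_SUFFIXES)
    # Try prefix removal both with and without the derivational suffix removed.
    for base in (without_suffix, candidate):
        if base in dictionary:
            return base
        for prefix in SIMPLE_PREFIXES:
            if base.startswith(prefix) and base[len(prefix):] in dictionary:
                return base[len(prefix):]
    return word

for w in ["bukunyalah", "dimakan", "rumahku", "mainan", "komputer"]:
    print(w, "->", stem(w))
```

Because both the suffixed and unsuffixed candidates are tried, `dimakan` yields `makan` rather than the over-stripped `dima`; words with no dictionary match, such as `komputer`, are returned unchanged.
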
Library | Description | Programming Languages | License | Author & Link |
---|---|---|---|---|
indonesian-word-embedding | A web application that demonstrates Indonesian word embedding | Python | | galuhsahid |
Service | Description | Language | Author & Link |
---|---|---|---|
QA | Question Answering System for Bahasa Indonesia | Java | takin |
Library | Description | Size | Features | License | Link |
---|---|---|---|---|---|
MALINDO_Morph | Morphological dictionary for Malay / Indonesian | | English-Malay, English-Indonesian | CC BY-NC-SA 4.0 TH | english |
TALPCo | The TUFS Asian Language Parallel Corpus | | Japanese -> Indonesian | Creative Commons Attribution 4.0 International (CC BY 4.0) license | matbahasa |
Library | Description | Size | Features | License | Link |
---|---|---|---|---|---|
Indonesian-annotated-conll17 | CoNLL Universal Dependency Parsing | 29.64 GB | Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts, provided for the CoNLL 2017 Shared Task in UD Parsing. | CC BY-NC-SA 4.0 TH | LINDAT / CLARIN |
ID-OpinionWords | List of Opinion Words (positive/negative) in Bahasa Indonesia for Sentiment Analysis | | | | masdevid |
freq-dist-id | Most Common Bahasa Words on Twitter, Wikipedia and other sources | | | | ardwort |
idn-tagged-corpus | Indonesian Manually Tagged Corpus | | | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License | famrashel |
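
Word lists such as ID-OpinionWords are typically used for lexicon-based sentiment scoring. The sketch below counts positive and negative hits in a text and labels the difference. It assumes (not confirmed here) that each list is a plain-text file with one word per line, and it uses tiny in-memory stand-in lists instead of the real files. It ignores negation (e.g. "tidak bagus") and morphology, so an inflected form such as "bagusnya" only matches if that exact form is listed.

```python
import io
import re

TOKEN_RE = re.compile(r"\w+")

def load_word_list(lines):
    """Assumed format: one word per line; blank lines are ignored."""
    return {line.strip().lower() for line in lines if line.strip()}

def sentiment_score(text, positive, negative):
    """Number of positive-word hits minus number of negative-word hits."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)

def label(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Tiny stand-ins for the real positive/negative files (e.g. opened with open(path, encoding="utf-8")).
positive = load_word_list(io.StringIO("bagus\nsenang\nmantap\n"))
negative = load_word_list(io.StringIO("buruk\nkecewa\n\n"))

for text in ["Filmnya bagus, saya senang!", "Pelayanannya buruk dan saya kecewa.", "Saya pergi ke pasar."]:
    score = sentiment_score(text, positive, negative)
    print(score, label(score), text)
```
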
Library | Description | Features | License | Link |
---|---|---|---|---|
WordNet Bahasa | Wordnet Bahasa, inspired by the Princeton WordNet and the Global WordNet Grid | Large scale, freely available, semantic dictionary | MIT License (MIT) | |
Library | Description | Features | License | Link |
---|---|---|---|---|
UD_Indonesian-GSD | The Indonesian UD is converted from the content head version of the universal dependency treebank v2.0 | Query text by genre, domain | CC BY-NC-SA 3.0 US | |
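
UD treebanks such as UD_Indonesian-GSD are distributed in the CoNLL-U format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line after each sentence. The reader below is a minimal standard-library sketch. It skips multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `8.1`), and the sample sentence and its annotation are hand-written for illustration rather than taken from the treebank.

```python
import io

FIELDS = ("id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc")

def read_conllu(lines):
    """Yield each sentence as a list of token dicts keyed by the CoNLL-U column names."""
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():          # blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):      # sentence-level comments (sent_id, text, ...)
            continue
        columns = line.split("\t")
        if len(columns) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(columns)}: {line!r}")
        token = dict(zip(FIELDS, columns))
        if "-" in token["id"] or "." in token["id"]:
            continue                  # skip multiword-token ranges and empty nodes
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        sentence.append(token)
    if sentence:                      # file may not end with a blank line
        yield sentence

# Hand-written sample; "\t".join keeps the columns tab-separated.
SAMPLE = "\n".join([
    "# text = Saya membaca buku.",
    "\t".join(["1", "Saya", "saya", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "membaca", "baca", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "buku", "buku", "NOUN", "_", "_", "2", "obj", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])

for sent in read_conllu(io.StringIO(SAMPLE)):
    for tok in sent:
        # Word IDs are 1..n, so the head's list index is head - 1 (0 means root).
        head = "ROOT" if tok["head"] == 0 else sent[tok["head"] - 1]["form"]
        print(tok["form"], tok["upos"], tok["deprel"], "->", head)
```
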
Pre-trained Model | Description | Size | Dimensions | License | Link |
---|---|---|---|---|---|
fastText | Skip-Gram model trained on Wikipedia using fastText | | 300 | CC BY-SA 3.0 | Facebook + Bin & Text + Text Only |
word2vec Indonesian | | 402MB | 300 | | Indonesian |
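
fastText `.vec` files (and word2vec files saved in text format) start with a header line giving the vocabulary size and the vector dimension, followed by one word per line with its space-separated components. The sketch below assumes that text format (binary `.bin` files need a different loader), parses it with the standard library only, and ranks neighbours by cosine similarity. The 3-dimensional `SAMPLE_VEC` vectors are made up; the real 300-dimensional files are large, hence the optional `limit` argument for loading only the first entries.

```python
import io
import math

def load_vec(lines, limit=None):
    """Parse the text .vec format: header 'count dim', then 'word v1 ... vdim' per line."""
    it = iter(lines)
    dim = int(next(it).split()[1])
    vectors = {}
    for i, line in enumerate(it):
        if limit is not None and i >= limit:
            break
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or blank lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(word, vectors, topn=3):
    """Rank all other words by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Made-up 3-dimensional vectors in the same text layout as a .vec file.
SAMPLE_VEC = "4 3\nraja 0.9 0.1 0.2\nratu 0.85 0.15 0.3\nmakan 0.1 0.9 0.0\nminum 0.15 0.85 0.1\n"

vectors = load_vec(io.StringIO(SAMPLE_VEC))
for other, score in most_similar("raja", vectors):
    print(f"{other}\t{score:.3f}")
```

For full-size files, gensim's `KeyedVectors.load_word2vec_format` reads the same text layout into a compact array and is usually the more practical choice.
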
Model | Description | License | Link |
---|---|---|---|
INDRA | Indonesian Resource Grammar (INDRA) - an implemented HPSG grammar for Indonesian | MIT license | INDRA |