ModelZoo

Trained models (with training scripts) for use across different projects.

pip install JarbasModelZoo

Models

This package includes utility methods to download and load models.

Training scripts can be found in the train folder.

NER

model_id	language	dataset	accuracy
nltk_clftagger_conll2003_NER	en	CONLL2003	0.874
nltk_clftagger_gmb_NER	en	GMB 2.2.0	0
nltk_clftagger_slsmovies_NER	en	MIT Movie Corpus	0
nltk_clftagger_slstrivia10k13_NER	en	MIT Movie Corpus - Trivia	0.806
nltk_clftagger_slsrestaurants_NER	en	MIT Restaurant Corpus	0
nltk_clftagger_onto5_NER	en	OntoNotes-5.0-NER-BIO	0.910
nltk_clftagger_paramopama_NER	pt	Paramopama	0
nltk_clftagger_paramopama+harem_NER	pt	Paramopama + HAREM (v2)	0
nltk_clftagger_WNUT17_NER	en	WNUT17	0
nltk_clftagger_leNERbr_NER	pt-br	leNER-Br	0

POSTAG

model_id	language	dataset	tagset	accuracy
nltk_floresta_macmorpho_brill_tagger	pt	floresta + macmorpho	universal	0
nltk_brown_brill_tagger	en	brown	brown	0.941
nltk_brown_maxent_tagger	en	brown	brown	0
nltk_brown_ngram_tagger	en	brown	brown	0.930
nltk_floresta_brill_tagger	pt	floresta	VISL (Portuguese)	0.938
nltk_floresta_ngram_tagger	pt	floresta	VISL (Portuguese)	0.925
nltk_cess_cat_udep_brill_tagger	ca	cess_cat_udep	Universal Dependencies	0.974
nltk_cess_esp_udep_brill_tagger	es	cess_esp_udep	Universal Dependencies	0.975
nltk_macmorpho_unvtagset_brill_tagger	pt	macmorpho	Universal Dependencies	0.966
nltk_onto5_brill_tagger	en	OntoNotes-5.0-NER-BIO	Penn Treebank	0
nltk_treebank_clftagger	en	treebank	Penn Treebank	0
nltk_treebank_brill_tagger	en	treebank	Penn Treebank	0
nltk_treebank_ngram_tagger	en	treebank	Penn Treebank	0
nltk_treebank_maxent_tagger	en	treebank	Penn Treebank	0
nltk_treebank_tnt_tagger	en	treebank	Penn Treebank	0
nltk_nilc_brill_tagger	pt-br	NILC_taggers	NILC	0.881
nltk_nilc_ngram_tagger	pt-br	NILC_taggers	NILC	0.869
nltk_cess_cat_brill_tagger	ca	cess_cat	EAGLES	0.939
nltk_cess_esp_brill_tagger	es	cess_esp	EAGLES	0.926
nltk_macmorpho_brill_tagger	pt	macmorpho		0
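
How the accuracy column was computed is not stated here. Assuming it is token-level accuracy on held-out tagged sentences, a minimal sketch of that measurement for any object exposing a `.tag(tokens)` method could look like the following. The names `token_accuracy`, `AlwaysNounTagger`, and the toy sentence are hypothetical and are not part of this package.

```python
from typing import Iterable, List, Protocol, Sequence, Tuple

TaggedSentence = Sequence[Tuple[str, str]]


class Tagger(Protocol):
    def tag(self, tokens: List[str]) -> List[Tuple[str, str]]: ...


def token_accuracy(tagger: Tagger, gold: Iterable[TaggedSentence]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in gold:
        tokens = [word for word, _ in sentence]
        predicted = tagger.tag(tokens)
        for (_, gold_tag), (_, predicted_tag) in zip(sentence, predicted):
            correct += gold_tag == predicted_tag
            total += 1
    # An empty evaluation set yields 0.0 instead of dividing by zero.
    return correct / total if total else 0.0


class AlwaysNounTagger:
    """Hypothetical baseline, used only to exercise the function."""

    def tag(self, tokens):
        return [(token, "NOUN") for token in tokens]


# Toy gold data for illustration only; real evaluation needs a held-out corpus.
gold_sentences = [
    [("Olá", "NOUN"), (",", "."), ("o", "DET"), ("meu", "PRON")],
]
score = token_accuracy(AlwaysNounTagger(), gold_sentences)
```

A model returned by `load_model` could be passed in place of the baseline, since the Usage section shows it exposes the same `.tag(tokens)` call.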

Security Concerns With the Python pickle Module

Serialization is convenient when you need to save an object’s state to disk or transmit it over a network.

However, there is one more thing you need to know about the Python pickle module: it is not secure. The __setstate__ method is useful for extra initialization while unpickling, but it can also be used to execute arbitrary code during the unpickling process. The models in this package are stored as .pkl files, so loading one means unpickling it; only load pickles from sources you trust.

So, what can you do to reduce this risk? Train the models yourself with the provided scripts!
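
To make the risk concrete, here is a small, harmless sketch. It uses `__reduce__` rather than `__setstate__`, because that is the shortest way to show that unpickling calls a callable chosen by whoever created the pickle. It also shows the allow-list pattern described in the Python pickle documentation (overriding `Unpickler.find_class`). The class names `Payload` and `AllowListUnpickler` are illustrative and not part of this package, and a real allow-list for these models would have to include the NLTK classes they use.

```python
import io
import pickle


class Payload:
    """Stand-in for a malicious object: pickle calls whatever __reduce__ names."""

    def __reduce__(self):
        # Harmless arithmetic here, but this could be any callable and arguments.
        return (eval, ("6 * 7",))


class AllowListUnpickler(pickle.Unpickler):
    """Refuse every global that is not explicitly allowed."""

    ALLOWED = set()  # hypothetical allow-list; empty means no globals at all

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)


blob = pickle.dumps(Payload())

# Plain pickle.loads runs eval("6 * 7") and returns its result, not a Payload.
result = pickle.loads(blob)

# The restricted unpickler rejects the same bytes before anything is called.
try:
    AllowListUnpickler(io.BytesIO(blob)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

An allow-list narrows what a pickle can reach but does not make untrusted files safe, so training the models yourself remains the more conservative option.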

Usage

Postag

from nltk import word_tokenize
from JarbasModelZoo import load_model

# The model is downloaded automatically if missing and cached at
# ~/.local/share/JarbasModelZoo/brill_tagger_floresta_mcmorpho_pt.pkl
tagger = load_model("brill_tagger_floresta_mcmorpho_pt")
tokens = word_tokenize("Olá, o meu nome é Joaquim")
postagged = tagger.tag(tokens)
# [('Olá', 'NOUN'), (',', '.'), ('o', 'DET'), ('meu', 'PRON'), ('nome', 'NOUN'), ('é', 'VERB'), ('Joaquim', 'NOUN')]

# ~/.local/share/JarbasModelZoo/brill_tagger_cess_es.pkl
tagger = load_model("brill_tagger_cess_es")
tokens = word_tokenize("Hola, mi nombre es Daniel")
postagged = tagger.tag(tokens)
# [('Hola', 'NOUN'), (',', 'fc'), ('mi', 'DET'), ('nombre', 'NOUN'), ('es', 'VERB'), ('Daniel', 'NOUN')]

# ~/.local/share/JarbasModelZoo/brill_tagger_cess_ca.pkl
tagger = load_model("brill_tagger_cess_ca")
tokens = word_tokenize("Quién es el presidente de Cataluña?")
postagged = tagger.tag(tokens)
# [('Quién', 'NOUN'), ('es', 'PRON'), ('el', 'DET'), ('presidente', 'NOUN'), ('de', 'ADP'), ('Cataluña', 'NOUN'), ('?', 'fit')]
