SentenceClassifier_ipavlov

A CNN sentence classifier based on the iPavlov library.

Use the 'run' method to classify a message:

    from IntentsClassifier import IntentsClassifier

    ic = IntentsClassifier(root_config_path='root/cf_config.json')
    mes = input()  # message to classify
    print(ic.run(mes))

The result is a nested dictionary containing the decisions and confidence levels for both root and sub-categories.

Use the 'train' method to train a new model. Its parameters:

- model_level - model level, 'root' or 'subs'.
- model_name - subcategory name; set to '' for the root model.
- path_to_data - path to the training data, stored as a CSV file with 'text' and 'labels' columns.
- path_to_config - path to the config JSON file.
- test_size - fraction of the data held out as the test set (default value: 0.15).
- aug_method - method used to augment the training data; it is not applied to the test set (default value: 'word_dropout'). Set samples_per_class to None to disable data augmentation.
- samples_per_class - number of samples per class in the equalized dataset; None leaves the class distribution intact (default value: None).
- path_to_global_embeddings - path to an embeddings file in fastText '.bin' format.
- path_to_save_file - path to the folder where the weights obtained during training are stored.
- path_to_resulting_file - path to the folder where the best weights are stored after training (the last saved weights file is copied to this folder).
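The two preprocessing options above can be illustrated with a short, self-contained sketch. It is not the repository's implementation: word dropout is assumed here to mean deleting each word independently with probability `p`, and equalization is assumed to subsample classes larger than `samples_per_class` and top up smaller ones with word-dropout copies. The names `word_dropout` and `equalize` and the default `p=0.1` are hypothetical.

```python
import random
from collections import defaultdict


def word_dropout(text, p=0.1, rng=random):
    """Drop each word independently with probability p (hypothetical helper)."""
    words = text.split()
    if not words:
        return text
    kept = [w for w in words if rng.random() >= p]
    # Never return an empty sample: fall back to one random word.
    return " ".join(kept) if kept else rng.choice(words)


def equalize(rows, samples_per_class, p=0.1, seed=0):
    """Return samples_per_class (text, label) pairs for every label.

    Assumption: large classes are randomly subsampled, small classes are
    topped up with word-dropout copies of their own examples.
    """
    if samples_per_class is None:
        return list(rows)  # leave the class distribution intact
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append(text)
    out = []
    for label, texts in by_label.items():
        if len(texts) >= samples_per_class:
            chosen = rng.sample(texts, samples_per_class)
        else:
            extra = [word_dropout(rng.choice(texts), p, rng)
                     for _ in range(samples_per_class - len(texts))]
            chosen = texts + extra
        out.extend((t, label) for t in chosen)
    return out
```

Augmentation is applied only to the training split, so in this sketch `equalize` would be called after the hold-out test set has been separated.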

For example:

    ic.train(model_level='root',
             model_name='',
             path_to_data='../ai_models_train/42/df_raw.csv',
             path_to_config='../ai_models_train/42/cf_config_dual_bilstm_cnn_model.json',
             path_to_global_embeddings='../ai_models/shared/ft_native_300_ru_wiki_lenta_lemmatize.bin',
             samples_per_class=1500,
             class_names=['доставка', 'оплата', 'другое', 'намерение сделать заказ'],
             path_to_save_file='../ai_models_train/42/',
             path_to_resulting_file='../ai_models_train/42/')
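The training call above reads df_raw.csv, which the parameter list describes as a CSV file with 'text' and 'labels' columns. A minimal sketch for producing and re-reading such a file with the standard csv module, assuming one label per row spelled exactly as in class_names (the helper names are hypothetical):

```python
import csv


def write_training_csv(path, samples):
    """Write (text, label) pairs to a CSV file with 'text' and 'labels' columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "labels"])
        writer.writeheader()
        for text, label in samples:
            writer.writerow({"text": text, "labels": label})


def read_training_csv(path):
    """Read the file back as a list of (text, label) pairs."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["text"], row["labels"]) for row in csv.DictReader(f)]
```

Writing with `encoding="utf-8"` keeps Cyrillic labels such as 'доставка' intact; `newline=""` is what the csv module expects for both reading and writing.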

Use the 'get_performance' method to evaluate a model on the test set with the macro-averaged F1 metric. It is called automatically at the end of 'train':

    # inside 'train': model_path is the model folder, which holds df_test.csv
    perf = self.get_performance(path_to_config, model_path + 'df_test.csv')
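Macro averaging computes F1 separately for each class and then takes the unweighted mean, so rare classes weigh as much as frequent ones. A dependency-free sketch of the metric (not the repository's code; the function name is hypothetical), which should agree with scikit-learn's `f1_score(..., average='macro')` when the label set is the union of true and predicted labels:

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true label t
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For `y_true = ['a', 'a', 'b', 'b']` and `y_pred = ['a', 'b', 'b', 'b']` the per-class scores are 2/3 and 4/5, giving a macro F1 of about 0.733.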

Use the 'get_status' method to check whether a particular model (specified by its directory name) is currently being trained.

Use the 'check_config' function from utils.check_config to validate a model's config file:

    from utils.check_config import check_config

    check_results = check_config(path_to_config)
    if check_results:  # a non-empty result lists the problems found
        # InvalidConfig is the project's own exception class
        raise InvalidConfig(check_results, 'Config file is invalid')
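What check_config actually verifies is not documented here. The sketch below only illustrates the calling contract shown above: a function that returns a list of problems (empty when the config is valid), plus a stand-in InvalidConfig exception. Both definitions are hypothetical, and the keys checked ('model_path' and config['train']['metrics']) are simply the ones this README mentions.

```python
import json


class InvalidConfig(Exception):
    """Hypothetical stand-in for the project's InvalidConfig exception."""

    def __init__(self, problems, message):
        super().__init__(f"{message}: {problems}")
        self.problems = problems


def check_config_sketch(config):
    """Return a list of problems in a parsed config dict (empty if none)."""
    problems = []
    if not isinstance(config.get("model_path"), str):
        problems.append("'model_path' must be a folder path")
    train = config.get("train")
    if not isinstance(train, dict):
        problems.append("'train' section is missing")
    elif not train.get("metrics"):
        problems.append("config['train']['metrics'] must list at least one metric")
    return problems


def load_and_validate(path_to_config):
    """Load a JSON config file and raise InvalidConfig if it has problems."""
    with open(path_to_config, encoding="utf-8") as f:
        config = json.load(f)
    problems = check_config_sketch(config)
    if problems:
        raise InvalidConfig(problems, "Config file is invalid")
    return config
```

Collecting every problem into a list, rather than raising on the first one, lets a single run report all config errors at once.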

All of a model's files are stored in the config['model_path'] folder; the other paths in the config contain only file names.
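Because the other config paths hold only file names, a full path is obtained by joining each one with config['model_path']. A minimal sketch, assuming model_path is stored as a plain folder string (the helper names are hypothetical):

```python
import json
import os


def load_config(path_to_config):
    """Parse the model's JSON config file."""
    with open(path_to_config, encoding="utf-8") as f:
        return json.load(f)


def model_file(config, filename):
    """Return the full path of a file stored in the model folder."""
    return os.path.join(config["model_path"], filename)
```

Using `os.path.join` instead of string concatenation gives the same result whether or not model_path ends with a slash.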

The model also logs its performance to TensorBoard. To retrieve the latest value of the performance metric listed in config['train']['metrics'], call the 'get_latest_accuracy' method with the path to the config file:

    IntentsClassifier.get_latest_accuracy('../ai_models_train/42/cf_config_dual_bilstm_cnn_model.json')

GraphGrail/SentenceClassifier_ipavlov

Folders and files

Latest commit

History

Repository files navigation

SentenceClassifier_ipavlov

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages