How can Ludwig do a random split for training while preserving the ratio of documents in each class? #1280

andreipruteanu · 2021-09-02T14:53:01Z

Hi,

I am training a text classifier with the following distribution of documents across 3 classes: 5000 in "class1", 2000 in "class2" and 1000 in "class3". Can Ludwig do a random split for training that preserves the ratio between the number of documents in each of the 3 classes? If yes, is "stratify" the right option to use? Unfortunately, the documentation is quite limited: https://ludwig-ai.github.io/ludwig-docs/user_guide/configuration/#preprocessing. It does not describe the effect of the "stratify" option, nor whether it matches the behavior of scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html
In my experiments, using the "stratify" option makes the performance metrics (accuracy and F1 score) worse.

Any help would be appreciated.
Andrei

Config file:

preprocessing:
    force_split: false
    stratify: 'class1'
    
input_features:
    -
        name: text
        type: text
        level: word
        encoder: parallel_cnn

output_features:
    -
        name: class
        type: category
