How can Ludwig do a random split for training while preserving the ratio between the number of documents in each class? #1280
Unanswered
andreipruteanu asked this question in Q&A
Replies: 0 comments
Hi,
I am training a text classifier and have the following distribution of documents across 3 classes: 5000 in "class1", 2000 in "class2" and 1000 in "class3". Can Ludwig do a random split for training that preserves the ratio between the number of documents in each of the 3 classes? If so, is "stratify" the right option to use? Unfortunately, the documentation is quite limited: https://ludwig-ai.github.io/ludwig-docs/user_guide/configuration/#preprocessing. It does not describe the effect of the "stratify" option or say whether it matches the behavior of scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html
In my experiments, using the "stratify" option makes the performance metrics (accuracy and F1 score) worse.
Any help would be appreciated.
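To make clear what I mean by preserving the class ratio, here is a minimal pure-Python sketch of the split I have in mind. It is not Ludwig code and makes no claim about how Ludwig implements "stratify"; the function name `stratified_split`, the 20% test fraction and the seed are my own assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) so that each class
    keeps roughly the same share in both parts as in the full dataset."""
    rng = random.Random(seed)

    # Group document indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then cut off the same fraction per class.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Same class sizes as in my dataset: 5000 / 2000 / 1000.
labels = ["class1"] * 5000 + ["class2"] * 2000 + ["class3"] * 1000
train_idx, test_idx = stratified_split(labels)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print(train_counts)
print(test_counts)
```

With this kind of split, both parts keep the 5:2:1 ratio between the classes, which is the behavior I would expect from a stratified split.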
Andrei
Config file: