data partition #3

Open
shengoy opened this issue Apr 27, 2022 · 1 comment

Comments


shengoy commented Apr 27, 2022

Hi!
How should I split the data into train/dev/test sets?

@Hellisotherpeople
Owner

Wow, I am sorry for not seeing this earlier!

On the off chance that you are still interested in this problem, I'd say that the most "fair" way to do it might be to randomly sample 10-20% from each of the year splits and hold that sample out for dev/test.

Possibly better, if you are someone from the competitive debate community, would be to randomly sample that same 10-20% within each individual file rather than within each year. This would keep the random sampling from favoring certain files/annotators over others and would hopefully maximize the diversity of the samples.
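
For anyone who wants to try either variant, here is a minimal sketch of per-group sampling in plain Python. It is not code from this repository: the function name `split_by_group`, the record layout, and the `"year"` and `"file"` field names are all hypothetical, so adapt them to however you have loaded the data. The 10% dev and 10% test fractions are just one choice inside the 10-20% range suggested above.

```python
import random
from collections import defaultdict


def split_by_group(records, group_key, dev_frac=0.1, test_frac=0.1, seed=0):
    """Hold out dev/test fractions independently inside each group.

    `records` is a list of dicts; `group_key` names the field to group on
    (hypothetically "year" or "file"). Returns (train, dev, test) lists.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible

    # Bucket the records by their group value.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    train, dev, test = [], [], []
    # Visit groups in a fixed order so the same seed gives the same split.
    for key in sorted(groups, key=str):
        rows = list(groups[key])
        rng.shuffle(rows)
        n_dev = round(len(rows) * dev_frac)
        n_test = round(len(rows) * test_frac)
        dev.extend(rows[:n_dev])
        test.extend(rows[n_dev:n_dev + n_test])
        train.extend(rows[n_dev + n_test:])
    return train, dev, test


# Toy data standing in for the real dataset (field names are assumptions).
records = [
    {"year": 2013 + i % 3, "file": f"file_{i % 5}.docx", "text": f"card {i}"}
    for i in range(300)
]

by_year = split_by_group(records, "year")  # sample within each year split
by_file = split_by_group(records, "file")  # sample within each file
```

Passing `"file"` instead of `"year"` is the only change needed to switch between the two suggestions. Note that very small groups can round down to zero held-out examples, so it may be worth checking the per-group counts before relying on the split.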
