New Data Questions #23
Hi Michael, sorry for the delayed response. The answers to your questions:
Thank you for your interest!
Thanks for the response! We have been using your original data to replicate some of our own experimental data, so you can imagine our interest in the new data as well.
We've merged all the data into one file and added human/bot markup. The dataset is located in the same folder: https://github.com/DeepPavlov/convai/blob/master/data/summer_wild_evaluation_dialogs.json
Please feel free to contact us again.
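For anyone who wants to check the markup themselves, here is a minimal sketch that downloads the merged file and tallies the human/bot labels. The field names (`users`, `user_type`) are assumptions inferred from the markup described in this thread, not a documented schema, so verify them against the file:

```python
import json
from urllib.request import urlopen

# Raw-file URL derived from the repo link above. The "users" and
# "user_type" field names are assumptions -- check the actual schema.
URL = ("https://raw.githubusercontent.com/DeepPavlov/convai/"
       "master/data/summer_wild_evaluation_dialogs.json")

with urlopen(URL) as resp:
    dialogs = json.load(resp)

# Tally participants by their human/bot label.
counts = {}
for dialog in dialogs:
    for user in dialog.get("users", []):
        kind = user.get("user_type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
print(counts)
```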
Thank you, I appreciate it! I'll let you know what we find.
Hi, I want to follow up on this thread: our paper has received a revise-and-resubmit, and one of our reviewers asked a specific question related to the thread above. We are wondering whether it would be possible to re-open this issue with you, now that the contest is over. Specifically, we would like to know which bots were participating in each conversation. We don't need identifiable names; rather, we simply want a hashed identifier for each bot, so that we can cluster our standard errors at the bot level and adjust for bot-level fixed effects. Are we right to assume the humans are all unique as well? Please let me know if you think this data would be shareable here. Thank you,
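For concreteness, the kind of hashed identifier we have in mind could be produced like this (a minimal sketch; the salt and bot name below are hypothetical):

```python
import hashlib

def hashed_id(name: str, salt: str = "convai") -> str:
    """Map a bot name to a short, non-identifiable token."""
    return hashlib.sha256((salt + name).encode()).hexdigest()[:8]

# Hypothetical bot name; any stable string works as a cluster key.
print(hashed_id("bot_team_7"))  # prints a deterministic 8-char token
```

Any deterministic mapping would do for our purposes; it only needs to be stable across dialogs so we can use it as a cluster key.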
Hi Mike, at least for one part of the dataset this additional information is already available: http://convai.io/data/data_tolokers.json
We'll discuss whether to make it available for the other parts as well. Best Regards,
Thank you! The anonymized user IDs in this file are exactly what we were looking for. It looks like you've done a bit of data cleaning, too, dropping broken bots (which mostly overlaps with our own cleaning).
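For reference, here is roughly how the anonymized IDs can be pulled out to define clusters; the `users`/`id` field names are assumptions based on the IDs mentioned above, so adjust to the file's actual schema:

```python
import json
from collections import Counter
from urllib.request import urlopen

# The "users"/"id" field names are assumptions -- verify against
# the actual structure of data_tolokers.json.
with urlopen("http://convai.io/data/data_tolokers.json") as resp:
    dialogs = json.load(resp)

# Count dialogs per anonymized participant, e.g. to define the
# clusters for bot-level standard errors.
appearances = Counter(
    user.get("id", "unknown")
    for dialog in dialogs
    for user in dialog.get("users", [])
)
print(appearances.most_common(5))
```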
Hello, all! I find your dataset fascinating, and I am glad you posted new chats from this summer. But I am having some trouble understanding the formatting: it has changed significantly since the first data dump, and the documentation does not address these changes. I have listed the major issues below; can you clarify?
- There is no longer any context text in the new files; was this dropped?
- In some (but not all) of the files there are no longer any user profiles. Was this dropped in the middle of the data collection?
- There is also only one evaluation metric ("eval_score"), rather than three ("breadth", "engagement", and "quality"). Was the paradigm changed from the first rounds? And what is "profile_match" all about?
- How are we supposed to know which participant is the bot and which is the human? Are they consistently labeled (e.g. participant1 is always human), or is there a separate key we need?
In summary, this is a fantastic resource, but I am not sure how useful it is without understanding how the data was assembled. Is there an updated data dictionary available anywhere?