Clarifying Questions on Summer Dataset #28
Hi, nice to hear from you!
Hi Michael, the data collection setting was the following. We set neither the duration of the dialogue nor the min/max number of turns. Users could finish a dialogue at any time; we didn't give any additional instructions about that. The task was simply "chat with a peer, learn something about her, rate her performance". However, we paid only for meaningful dialogues with >=3 utterances from every participant (dialogues were checked manually by other crowd workers). If a user stopped responding, we automatically finished the dialogue after 10 minutes.
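For reference, that validity criterion can be written down as a small filter. This is only a minimal sketch: it assumes each dialogue is stored as a list of (speaker_id, text) turns, which is an illustrative representation rather than the dataset's actual schema, and the real checks were done manually by crowd workers as described above.

```python
from collections import Counter

def is_meaningful(dialogue, min_utterances=3):
    """Return True if every participant contributed at least `min_utterances` turns.

    `dialogue` is assumed to be a list of (speaker_id, text) tuples; this is an
    illustrative format, not the dataset's actual schema.
    """
    counts = Counter(speaker for speaker, _ in dialogue)
    return len(counts) >= 2 and all(n >= min_utterances for n in counts.values())
```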
Hello, and thank you for the quick response! Let me respond to each item below.
Hi folks! I wanted to reach out again to see if you have any update on the proper citation for your NIPS 2019 Competition Chapter. Right now we have it as: "Zhang et al., (2019) The Second Annual Conversational Intelligence Challenge at NIPS. Citation Forthcoming." Even a placeholder with the right authors would be useful...
Thanks, this was answered in another thread! -Mike
Hello again! We've made some great progress with the new summer dataset and we think it might be useful as an out-of-sample test case for some of our other work. In fact, I'm writing up the results and I want to make sure I describe your experiments accurately.
Do you mind if I ask a few clarifying questions? We don't have any concerns about the data, but we want to include some basic descriptive statistics about the participant population in our write-up. People will be curious and I want to get it right! Your documentation has been useful so far, but I am still wondering about a few items:
What would be the right way to cite you all? Should we point to the repository? The 2018 ACL paper (Zhang et al.) doesn't cover the newer data; is there an update in the works?
The crowd workers on Yandex.Toloka: is this platform similar to Mechanical Turk? Were the workers paid? Did you use any attention checks, worker qualifications, etc. (e.g., English proficiency) to select people?
Similarly, what instructions were the human participants given? Were they asked to chat for a set amount of time or a set number of turns? Were they incentivized for their responses at all, or for finishing the dialogue?
The new single-question evaluation (on a five-point scale) is great, but do you have the exact wording of the question participants were asked?
The last two items are "wants" rather than "needs"... You didn't collect timestamps for each turn in the transcripts, did you? Likewise, do you know which bot comes from which team? It's clear there is a finite number of bots, each having several conversations. For example, some bots seem to always start with a very specific line (e.g. " i am a little tired from work"), and others break down in a consistent way (e.g. search for "Traceback (most recent call last):"), which we scrubbed. We'd be curious to know which bot belongs to which team.
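For context, the scrubbing we did was essentially a string match on that traceback header. A minimal sketch (not our exact code), assuming transcripts are lists of (speaker, text) turns, which is an illustrative format rather than your dataset's actual schema:

```python
# Drop any turn whose text leaked a Python traceback header.
TRACEBACK_MARKER = "Traceback (most recent call last):"

def scrub_turns(transcript):
    """Remove turns containing a traceback; `transcript` is a list of
    (speaker, text) tuples (assumed format)."""
    return [(speaker, text) for speaker, text in transcript
            if TRACEBACK_MARKER not in text]
```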