Datasets #11

Open
YanhuiS opened this issue Nov 29, 2024 · 2 comments

YanhuiS commented Nov 29, 2024

Could you share the dataset of 198 propagation events described in E.1? If not, could you explain in detail how the dataset was built?

@yiyiyi0817 yiyiyi0817 added the question Further information is requested label Nov 30, 2024
Redtides0 (Contributor) commented Nov 30, 2024

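Hello, and thank you for your interest in OASIS.
We cannot release the dataset yet: it may raise privacy issues, and we are still discussing how to anonymize it further.
For the data construction process, see Appendix E.1, REAL-WORLD PROPAGATION DATA, in the paper.
Specifically, we collected data around the open-source Twitter 15/16 datasets released with two papers, "Real-time rumor debunking on twitter" and "Detecting rumors from microblogs with recurrent neural networks". These datasets provide each source post's propagation path (including the user id and repost id of every user who reposted it) and the content of the source post.
For each propagation event, we first collect the profiles of these users, the follow relationships among them, and the tweet and reply activity they posted before the source post was published. This history is used to build a richer user bio and to compute the activity prob (details in Appendix E.1). Everything is then merged into a single CSV file whose key columns are: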
,user_id,description,followers_count,following_count,following_list,following_agentid_list,previous_tweets,tweets_id,activity_level,user_char
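For anyone preparing their own data in this layout, here is a minimal sketch of reading such a file. It is not taken from the OASIS code. The function names `load_event_users` and `follow_edges` are hypothetical, the sample values are invented, and it assumes that the list-like columns are stored as Python list literals and that an agent id is simply the row position. The leading comma in the header is treated as an unnamed index column.

```python
import ast
import csv
import io

COLUMNS = [
    "user_id", "description", "followers_count", "following_count",
    "following_list", "following_agentid_list", "previous_tweets",
    "tweets_id", "activity_level", "user_char",
]
# Assumption: these columns hold Python list literals such as "['u200']".
LIST_COLUMNS = ("following_list", "following_agentid_list", "previous_tweets")


def load_event_users(handle):
    """Read one propagation-event CSV into a list of dicts (hypothetical helper)."""
    users = []
    for row in csv.DictReader(handle):
        row.pop("", None)  # drop the unnamed leading index column
        for name in LIST_COLUMNS:
            row[name] = ast.literal_eval(row[name]) if row[name] else []
        row["followers_count"] = int(row["followers_count"])
        row["following_count"] = int(row["following_count"])
        users.append(row)
    return users


def follow_edges(users):
    """Return (follower, followee) agent-id pairs, assuming agent id == row position."""
    return [(i, j) for i, user in enumerate(users)
            for j in user["following_agentid_list"]]


# Invented two-user sample, written with csv.writer so quoting is handled for us.
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow([""] + COLUMNS)
writer.writerow([0, "u100", "bio a", 10, 1, "['u200']", "[1]",
                 "['hello']", "t1", "active", "profile text a"])
writer.writerow([1, "u200", "bio b", 5, 0, "[]", "[]",
                 "[]", "t2", "inactive", "profile text b"])
sample.seek(0)

users = load_event_users(sample)
edges = follow_edges(users)
```

Keeping both `following_list` (raw user ids) and `following_agentid_list` (positions within the event) would let a simulator build the follow graph without a separate id lookup, which is presumably why both columns appear.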

Ji-Cather commented

Roughly when will the dataset be open-sourced?
