Problem Statement
The twitter_samples package violates Twitter's (now X's) Developer Agreement and Terms of Service, creating significant legal liability for the NLTK project.
The Violation
Twitter's Developer Agreement explicitly prohibits the offline redistribution of Twitter content. The twitter_samples package systematically collects and redistributes user-generated, copyrighted content, which is a direct breach of that agreement.
Unlike licensing ambiguities with other corpora (see related issue #250), this is a straightforward, unequivocal violation of platform rules. There is no plausible "fair use" defense for this type of wholesale redistribution.
Recommended Action
- Immediately remove the twitter_samples package from the main nltk_data index.
- Move it to a separate, clearly labeled "archive-legacy" or "high-risk" collection that is not included in any default download or bundle.
- Add a prominent warning that it violates the Twitter ToS and should not be used.
This action is necessary to protect the NLTK project from legal risk and to maintain its standing as a responsible open-source project.
Related Issues
This issue is part of a broader licensing review. For discussion on the Brown Corpus and semcor, see #250.