
Remove or archive twitter_samples due to Twitter/X ToS violation #251

@ekaf

Problem Statement

The twitter_samples package violates Twitter's (now X's) Developer Agreement and Terms of Service, creating significant legal liability for the NLTK project.

The Violation

Twitter's Developer Agreement explicitly prohibits the offline redistribution of Twitter content. The twitter_samples package systematically redistributes scraped, user-generated, copyrighted content, which is a direct breach of that agreement.

Unlike licensing ambiguities with other corpora (see related issue #250), this is a straightforward, unequivocal violation of platform rules. There is no plausible "fair use" defense for this type of wholesale redistribution.

Recommended Action

  1. Immediately remove the twitter_samples package from the main nltk_data index.
  2. Move it to a separate, clearly labeled "archive-legacy" or "high-risk" collection that is not included in any default download or bundle.
  3. Add a prominent warning that it violates the Twitter ToS and should not be used.

This action is necessary to protect the NLTK project from legal risk and to maintain its standing as a responsible open-source project.

Related Issues

This issue is part of a broader licensing review. For discussion on the Brown Corpus and semcor, see #250.
