Dataset Card for ToxiGen

annotations_creators: expert-generated
language_creators: machine-generated
languages: en-US
licenses:
multilinguality: monolingual
pretty_name: ToxiGen
size_categories: 100K<n<1M
source_datasets: original
task_categories: text-classification
task_ids: hate-speech-detection

Dataset Description

Repository: https://github.com/microsoft/toxigen
Paper: https://arxiv.org/abs/2203.09509

Dataset Structure

Data Fields

We release TOXIGEN as a dataframe with the following fields:

prompt is the prompt used for generation.
generation is the TOXIGEN generated text.
generation_method denotes whether ALICE was used to generate the corresponding generation. If this value is ALICE, then ALICE was used; if it is TopK, then ALICE was not used.
prompt_label is a binary value indicating whether the prompt is toxic (1 is toxic, 0 is benign).
group indicates the target group of the prompt.
roberta_prediction is the probability predicted by our corresponding RoBERTa model for each instance.
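
As a rough illustration of how these fields might be used together, the sketch below parses a tiny in-memory sample with the same column names and runs a few simple queries. The sample rows, the group names, the placeholder text values, and the 0.5 threshold are all assumptions made for this example and are not taken from the dataset; the helper name load_rows is likewise hypothetical. With the released data, the same function could presumably be pointed at an open file handle for toxigen.csv, assuming that file uses this column layout.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the documented column layout.
# Text values are placeholders, not real dataset content.
SAMPLE_CSV = """prompt,generation,generation_method,prompt_label,group,roberta_prediction
<prompt text>,<generated text>,ALICE,1,group_a,0.91
<prompt text>,<generated text>,TopK,0,group_b,0.07
<prompt text>,<generated text>,TopK,1,group_a,0.35
"""


def load_rows(handle):
    """Read rows from a CSV handle and convert the numeric fields."""
    rows = []
    for row in csv.DictReader(handle):
        row["prompt_label"] = int(row["prompt_label"])
        row["roberta_prediction"] = float(row["roberta_prediction"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Generations produced with ALICE (generation_method is ALICE or TopK).
alice_rows = [r for r in rows if r["generation_method"] == "ALICE"]

# Number of toxic prompts (prompt_label == 1) per target group.
toxic_prompts_per_group = Counter(
    r["group"] for r in rows if r["prompt_label"] == 1
)

# Rows whose RoBERTa score is at or above an arbitrary cutoff.
THRESHOLD = 0.5  # assumption: chosen only for illustration
high_score_rows = [r for r in rows if r["roberta_prediction"] >= THRESHOLD]

print(len(alice_rows), dict(toxic_prompts_per_group), len(high_score_rows))
```

Note that prompt_label describes the prompt, while roberta_prediction is a score for each instance, so the two are not directly interchangeable as labels.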
