
bramdelisse/toxigen-data

annotations_creators: expert-generated
language_creators: machine-generated
languages: en-US
licenses:
multilinguality: monolingual
pretty_name: ToxiGen
size_categories: 100K<n<1M
source_datasets: original
task_categories: text-classification
task_ids: hate-speech-detection

Dataset Card for ToxiGen

Dataset Description

Dataset Structure

Data Fields

We release TOXIGEN as a dataframe with the following fields:

  • prompt is the prompt used for generation.
  • generation is the TOXIGEN generated text.
  • generation_method denotes whether ALICE was used to produce the corresponding generation. If this value is ALICE, ALICE was used; if it is TopK, ALICE was not used.
  • prompt_label is a binary value indicating whether the prompt is toxic (1 is toxic, 0 is benign).
  • group indicates the target group of the prompt.
  • roberta_prediction is the probability predicted by our corresponding RoBERTa model for each instance.
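The field list above maps naturally onto a typed record. The sketch below is a minimal illustration, assuming the dataframe has been exported to CSV with a header row using exactly these six column names; the ToxiGenRow class and the parse_rows, used_alice and count_by_group helpers are hypothetical names, not part of this repository. It uses only the Python standard library and rejects any generation_method other than ALICE or TopK, since those are the only two values described here.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record type mirroring the six documented fields.
@dataclass
class ToxiGenRow:
    prompt: str
    generation: str
    generation_method: str    # "ALICE" or "TopK"
    prompt_label: int         # 1 = toxic prompt, 0 = benign prompt
    group: str                # target group of the prompt
    roberta_prediction: float # probability from the RoBERTa model

ALLOWED_METHODS = {"ALICE", "TopK"}

def parse_rows(text: str) -> List[ToxiGenRow]:
    """Parse CSV text (assumed header: the six field names) into records."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        method = rec["generation_method"]
        if method not in ALLOWED_METHODS:
            raise ValueError(f"unexpected generation_method: {method!r}")
        rows.append(ToxiGenRow(
            prompt=rec["prompt"],
            generation=rec["generation"],
            generation_method=method,
            prompt_label=int(rec["prompt_label"]),
            group=rec["group"],
            roberta_prediction=float(rec["roberta_prediction"]),
        ))
    return rows

def used_alice(row: ToxiGenRow) -> bool:
    """True if the generation was produced with ALICE, False for TopK."""
    return row.generation_method == "ALICE"

def count_by_group(rows: List[ToxiGenRow]) -> Counter:
    """Count rows per (target group, prompt_label) pair."""
    return Counter((r.group, r.prompt_label) for r in rows)
```

To use it on a real export, read the file with open(path, newline="", encoding="utf-8") and pass its contents to parse_rows; the path and CSV format are assumptions, since this card only states that the data is released as a dataframe.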

About

Data for the USE Basis course 2023
