This repository contains code and public data for the paper Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims.
A conda environment is provided in environment.yml.
Runs language detection on the ClaimReview data, then subsets to English claims and the relevant columns.
Runs topic modelling on the ClaimReview data, then saves the claims related to the war in Ukraine.
Fine-tunes and evaluates the RoBERTa model following Uchendu et al. (2020).
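The language detection and subsetting step described above can be sketched as follows. This is a minimal illustration, not the repository's own script: the function name `subset_english` and the `detect` parameter are hypothetical, and the only column names taken from this README are claimReviewed and url. Any callable that maps a text to a language code can be passed in as `detect`.

```python
from typing import Callable, Iterable

# Columns kept in the subset. These two field names come from the README;
# everything else in this sketch is an illustrative assumption.
KEEP_COLUMNS = ("claimReviewed", "url")


def subset_english(
    rows: Iterable[dict],
    detect: Callable[[str], str],
    keep: tuple = KEEP_COLUMNS,
) -> list:
    """Keep rows whose claim text is detected as English, with only `keep` columns."""
    subset = []
    for row in rows:
        text = (row.get("claimReviewed") or "").strip()
        if not text:
            continue  # nothing to run detection on
        try:
            lang = detect(text)
        except Exception:
            continue  # detectors may fail on very short or symbol-only text
        if lang == "en":
            subset.append({col: row.get(col) for col in keep})
    return subset
```

A detector such as `detect` from the langdetect package could be passed as the second argument. Whether the repository uses that library is not stated here.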
The ChatGPT-generated data is available upon request through this Zenodo repository. To use it with the experiment code, place it in the data/ directory.
The 282 human-authored claims related to the war in Ukraine that were extracted from ClaimReview. For each claim, the claim text (from the claimReviewed field) and the URL of the fact check (from the url field) are included.
This subset is derived from the ClaimReview markups feed. The compilation of this dataset is licensed under CC-BY 4.0. Each individual claim is under the license terms of the publisher of the fact check. Please cite the ClaimReview dataset as the original source of this data and the associated paper as the source for the subsetting.
Instructions provided to human annotators for the human-AI claim classification task, and the classification form.
Complete list of topics extracted from the ClaimReview data.
The mean and standard deviation of LIWC-22 statistics for each of the three datasets. This can be used to compare this data with the LIWC-22 statistics of other datasets, for example those provided in the LIWC-22 Descriptive Statistics and Norms spreadsheet.
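One way to make such a comparison is a standardized mean difference computed from the two summaries. The sketch below is an assumption about how a reader might use these statistics, not part of the repository: the function name and the example numbers are hypothetical, and the pooled standard deviation uses the simple equal-weight form because only means and standard deviations are described here.

```python
import math


def standardized_difference(mean_a, sd_a, mean_b, sd_b):
    """Difference of means in units of the equal-weight pooled standard deviation."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("both standard deviations are zero")
    return (mean_a - mean_b) / pooled_sd


# Hypothetical values for one LIWC-22 category in two datasets.
d = standardized_difference(mean_a=5.0, sd_a=3.0, mean_b=2.0, sd_b=4.0)
```

If sample sizes are known for both datasets, a size-weighted pooled standard deviation would be the more usual choice.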
Frequency of claims assigned to each of the 9 ZeroGPT labels, and the assignment of those labels to a boolean label.
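Collapsing per-label frequencies into a boolean label can be sketched as below. The label strings and counts are placeholders rather than the actual ZeroGPT labels or the reported frequencies, and the function name `collapse_to_boolean` is hypothetical. The real mapping is the one given in the file described above.

```python
from collections import Counter


def collapse_to_boolean(label_counts, is_ai_label):
    """Aggregate per-label claim counts into counts for a boolean AI / not-AI label."""
    totals = Counter()
    for label, count in label_counts.items():
        if label not in is_ai_label:
            raise KeyError(f"no boolean assignment for label: {label!r}")
        totals[is_ai_label[label]] += count
    return {True: totals[True], False: totals[False]}


# Placeholder labels and counts, for illustration only.
example_counts = {"label_a": 4, "label_b": 1, "label_c": 7}
example_mapping = {"label_a": True, "label_b": True, "label_c": False}
example_totals = collapse_to_boolean(example_counts, example_mapping)
```

Raising on an unmapped label, rather than silently dropping it, keeps the boolean totals consistent with the per-label totals.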