Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims

This repository contains code and public data for the paper Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims.

Experiment Code

Environment

A conda environment is provided in environment.yml.

`code/claimreview_subset.ipynb`

Run language detection on the ClaimReview data and subset to English claims and relevant columns.
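The filtering step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes each record carries the ClaimReview fields `claimReviewed` and `url`, and `toy_detect_language` is a hypothetical stand-in for whichever language-identification library the notebook uses.

```python
from typing import Callable, Iterable

# Columns kept in the subset. These names follow the ClaimReview fields
# mentioned in this README; treating them as the kept columns is an assumption.
KEEP_COLUMNS = ("claimReviewed", "url")


def toy_detect_language(text: str) -> str:
    """Hypothetical stand-in detector: returns "en" if the text contains a
    common English function word, otherwise "other"."""
    english_markers = {"the", "is", "of", "and", "in", "to", "a"}
    words = set(text.lower().split())
    return "en" if words & english_markers else "other"


def subset_english(
    records: Iterable[dict],
    detect_language: Callable[[str], str] = toy_detect_language,
) -> list[dict]:
    """Keep only English claims, restricted to KEEP_COLUMNS."""
    subset = []
    for record in records:
        text = record.get("claimReviewed") or ""
        if text and detect_language(text) == "en":
            subset.append({col: record.get(col) for col in KEEP_COLUMNS})
    return subset


# Invented example records, not real ClaimReview data.
records = [
    {"claimReviewed": "The bridge is closed to traffic", "url": "https://example.org/1", "extra": 1},
    {"claimReviewed": "Le pont est fermé", "url": "https://example.org/2", "extra": 2},
]
english_claims = subset_english(records)
```

Passing the detector as a parameter keeps the subsetting logic independent of the language-identification library, so a real detector can be swapped in for the toy one.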

`code/claimreview_topics.ipynb`

Runs topic modelling on the ClaimReview data, then saves the claims related to the war in Ukraine.

`code/fine_tune_roberta.ipynb`

Fine-tune and evaluate the RoBERTa model following Uchendu et al. (2020).

Data

ChatGPT-Generated Data

The ChatGPT-generated data is available upon request through this Zenodo repository. To use it with the experiment code, place it in the `data/` directory.

`data/claimreview.csv`

The 282 human-authored claims related to the war in Ukraine that were extracted from ClaimReview. For each claim, the claim text (from the claimReviewed field) and URL of the fact check (from the url field) are included.
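One way to read this file with the standard library is sketched below. It assumes the CSV header uses the column names `claimReviewed` and `url`, as the description above suggests; check the file's header before relying on this. The in-memory sample uses invented rows so the example runs without the real file.

```python
import csv
import io


def load_claims(csv_file) -> list[dict]:
    """Read claim text and fact-check URL from an open CSV file object.

    Assumes the header row contains `claimReviewed` and `url` columns.
    """
    reader = csv.DictReader(csv_file)
    return [
        {"claim": row["claimReviewed"], "fact_check_url": row["url"]}
        for row in reader
    ]


# In-memory stand-in for data/claimreview.csv (invented rows, not real data).
sample = io.StringIO(
    "claimReviewed,url\n"
    '"An example claim, with a comma",https://example.org/fact-check-1\n'
    "Another example claim,https://example.org/fact-check-2\n"
)
claims = load_claims(sample)

# With the real file (UTF-8 encoding is an assumption):
# with open("data/claimreview.csv", newline="", encoding="utf-8") as f:
#     claims = load_claims(f)
```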

This subset is derived from the ClaimReview markups feed. The compilation of this dataset is licensed under CC-BY 4.0. Each individual claim is under the license terms of the publisher of the fact check. Please cite the ClaimReview dataset as the original source of this data and the associated paper as the source for the subsetting.

Additional Data and Statistics

`annotation_instructions.md`

Instructions provided to human annotators for the human-AI claim classification task, and the classification form.

`additional_data/topic_list.csv`

Complete list of topics extracted from the ClaimReview data.

`additional_data/liwc-22_descriptive_stats.csv`

The mean and standard deviation of LIWC-22 statistics for each of the three datasets. This can be used to compare this data with the LIWC-22 statistics of other datasets, for example those provided in the LIWC-22 Descriptive Statistics and Norms spreadsheet.
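As an illustration of such a comparison, the sketch below computes a standardised mean difference (Cohen's d) from two means and standard deviations. This is a conventional effect-size measure chosen here for illustration, not a method taken from the paper, and the numbers are invented rather than read from the CSV.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Standardised mean difference, using the root mean square of the two
    standard deviations as the pooled SD (reasonable when the two samples
    are of similar size)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("both standard deviations are zero")
    return (mean_a - mean_b) / pooled_sd


# Invented numbers for illustration only; not values from the CSV.
d = cohens_d(mean_a=5.0, sd_a=2.0, mean_b=4.0, sd_b=2.0)
```

A positive value means the first dataset scores higher on that LIWC-22 dimension; if the two samples differ a lot in size, a sample-size-weighted pooled SD would be the more appropriate choice.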

`additional_data/zerogpt-full-labels.csv`

Frequency of claims assigned to each of the 9 ZeroGPT labels, and the assignment of each label to a boolean label.

GateNLP/chatgpt-ukraine-disinfo
