FlawCheck🔍

FlawCheck is a flaw-oriented fact-checking dataset introduced in How We Refute Claims: Automatic Fact-Checking through Flaw Identification and Explanation. Each claim is annotated with up to four aspects and with explanations of the presence or absence of seven flaw types: Contradicting facts, Exaggeration, Understatement, Occasional faltering, Insufficient support, Problematic assumptions, and Existence of alternative explanations. The dataset encapsulates the expertise of human fact-checking professionals, establishing a new benchmark for flaw-oriented automatic fact-checking.

Information (June 2024 Update)

The paper has been officially accepted and published in the WWW 2024 Companion proceedings 🎉.

Introduction

This dataset builds on a previous work, WatClaimCheck. Given the demand for premise articles and complete review articles written by human experts, we chose WatClaimCheck as our data source due to its ample and varied collection of premise and review articles from eight fact-checking websites. We extend WatClaimCheck to study a novel approach to flaw-oriented fact-checking, collecting all 33,721 claims in WatClaimCheck to construct FlawCheck. Because the original content in WatClaimCheck included a significant amount of irrelevant web-crawl data, we re-crawled the web data and performed data cleaning to obtain relatively clean review articles for justification-generation evaluation. We then used GPT-3.5-turbo to extract both the aspects and the identified flaws from the review articles. For more details, please refer to the paper.

This repo provides direct access to the FlawCheck dataset, including the generated aspects, the flaw explanations, and the re-collected review articles. Each claim is referenced by its WatClaimCheck claim index, so metadata such as premise articles can be retrieved from the WatClaimCheck dataset using the same index. As a result, you must request access to the WatClaimCheck dataset in order to use this dataset.

Usage

Dataset

Dataset structure

All data is under the dataset folder, and the file structure looks like this:

```
├── dataset
│   ├── train
│   │   ├── aspect
│   │   │   ├── 1.json
│   │   │   ├── 2.json
│   │   │   └── 3.json
│   │   ├── flaw
│   │   └── review
│   ├── dev
│   └── test
```
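
For example, the annotations for a single claim can be loaded as follows. This sketch assumes each file is named by its WatClaimCheck claim index, and the contents are plain JSON; the structure inside each file is not documented here, so inspect the actual files for the exact schema.

```python
import json
from pathlib import Path

# Each file is assumed to be named by its WatClaimCheck claim index (e.g., 1.json).
claim_index = 1
split = "train"

root = Path("dataset") / split

# Load the three annotation types for the same claim.
with open(root / "aspect" / f"{claim_index}.json") as f:
    aspects = json.load(f)
with open(root / "flaw" / f"{claim_index}.json") as f:
    flaws = json.load(f)
with open(root / "review" / f"{claim_index}.json") as f:
    review = json.load(f)

print(aspects, flaws, review)
```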

Dataset collection

We also provide the source code used for data collection in FlawCheck, accessible at code/get_gpt_result.py. To replicate the process, make sure your own OpenAI API key is stored in the environment variables.
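
As a rough illustration of the collection step (the actual prompts and parameters live in code/get_gpt_result.py; the prompt below is a placeholder, not the one used in the paper), the script boils down to reading the key from the environment and querying GPT-3.5-turbo:

```python
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default.
client = OpenAI()

review_article = "..."  # a cleaned review article from the dataset

# Placeholder prompt: the real prompts are defined in code/get_gpt_result.py.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": f"List the aspects discussed and the flaws identified "
                   f"in this review article:\n{review_article}",
    }],
)
print(response.choices[0].message.content)
```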

Content retrieval

We employed the Haystack framework to build the retriever, which fetches pertinent evidence for assessing claims. To prepare the data for training the retriever model on WatClaimCheck data, refer to utils/Retriever/prepare_data.py. For inference, use utils/Retriever/retrieve.py to extract content from the raw evidence.
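
For orientation, here is a minimal retrieval sketch assuming Haystack 1.x. It uses a plain BM25 retriever for simplicity rather than the trained retriever from utils/Retriever/, and the document contents are placeholders:

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever

# Index raw evidence passages (placeholder contents).
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([
    {"content": "Premise article text ...", "meta": {"claim_index": 1}},
    {"content": "Another evidence passage ...", "meta": {"claim_index": 2}},
])

# Fetch the passages most pertinent to a claim.
retriever = BM25Retriever(document_store=document_store)
results = retriever.retrieve(query="The claim text to verify", top_k=3)
for doc in results:
    print(doc.meta["claim_index"], doc.content[:80])
```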

LLM Agents

Direct Usage

In this paper, we employed Vicuna-7b-v1.5 as the foundational LLM; refer to the original repository for usage details. In the direct-usage scenario, the roles of the various agents are determined solely by the provided prompts. The example in code/predict.py illustrates justification generation with the LLM in a baseline setting.
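
A bare-bones sketch of prompting Vicuna-7b-v1.5 with Hugging Face transformers follows; the prompt wording is a placeholder for illustration, not the agent prompt from code/predict.py:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Vicuna v1.5 uses a simple USER/ASSISTANT chat format.
claim = "The claim to be fact-checked ..."
evidence = "Retrieved evidence passages ..."
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "USER: Given the evidence below, explain whether the claim is flawed.\n"
    f"Claim: {claim}\nEvidence: {evidence} ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```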

Finetuning

For finetuning the LLM with LoRA, we utilized the LMFlow framework. Follow the instructions in the original repository to set up the framework correctly for your needs. We modified only the run_finetune_with_lora.sh file, adapting it with custom settings and data for the different components within the proposed RefuteClaim framework.
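
For orientation, the following sketches an equivalent LoRA setup using Hugging Face peft rather than LMFlow itself; the hyperparameters are placeholders, not the paper's settings, which live in run_finetune_with_lora.sh:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")

# Placeholder hyperparameters; the actual values are configured in LMFlow.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections of the LLaMA backbone
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```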

How to Cite this resource

Please cite the following paper when referring to FlawCheck in academic publications.

```bibtex
@misc{kao2024refute,
      title={How We Refute Claims: Automatic Fact-Checking through Flaw Identification and Explanation},
      author={Wei-Yu Kao and An-Zi Yen},
      year={2024},
      eprint={2401.15312},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
