ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage

📃 Paper | 🤗 Dataset

📋 Introduction

ETHIC is a long-context benchmark designed to assess whether LLMs can fully utilize the information they are given. It comprises tasks with high Information Coverage (IC), defined as the proportion of the input context necessary for answering queries; the IC scores of ETHIC tasks are around 91%.
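
To make the IC definition concrete, here is a minimal sketch that computes it as a simple ratio. It assumes the context is split into indexed units (tokens, sentences, or chunks) and that the set of units needed for a query is already known; the function name `information_coverage` and this unit-counting scheme are illustrative assumptions, not the benchmark's official implementation.

```python
def information_coverage(needed_units, total_units):
    """Return the fraction of context units needed to answer a query.

    needed_units: iterable of unit indices (hypothetical annotation format).
    total_units:  total number of units in the input context.
    """
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    needed = set(needed_units)  # duplicates count once
    if any(i < 0 or i >= total_units for i in needed):
        raise ValueError("needed_units contains an index outside the context")
    return len(needed) / total_units


# Toy example: a 10-chunk context in which 9 chunks are required.
ic = information_coverage(range(9), 10)
print(f"IC = {ic:.0%}")  # prints "IC = 90%"
```

Under this reading, a retrieval-style task that needs only one chunk out of ten would score 10%, whereas ETHIC tasks sit near the top of the range.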

⚒️ Setup

We recommend using the following versions for compatibility.

  • PyTorch 2.4.0
  • CUDA 12.1
# create a new environment
conda create -n ethic python==3.9.19
conda activate ethic

# install required packages
pip install -r requirements.txt

⏩ Quickstart

To use our dataset directly, simply download it using 🤗 Datasets:

from datasets import load_dataset

task = "Recalling" # Choose from "Recalling", "Summarizing", "Organizing", "Attributing"
dataset = load_dataset("dmis-lab/ETHIC", task)["test"]
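
If you want all four tasks at once, the call above can be wrapped in a small helper. This is a sketch under the assumption that each task is a dataset configuration with a "test" split, as the snippet above suggests; the helper names `load_ethic_split` and `load_all_tasks` are hypothetical and not part of the repository.

```python
TASKS = ("Recalling", "Summarizing", "Organizing", "Attributing")


def load_ethic_split(task, split="test"):
    """Load one ETHIC task, rejecting unknown task names before any download."""
    if task not in TASKS:
        raise ValueError(f"Unknown task {task!r}; choose from {TASKS}")
    # Imported lazily so the name check works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("dmis-lab/ETHIC", task)[split]


def load_all_tasks():
    """Return a dict mapping each task name to its test split."""
    return {task: load_ethic_split(task) for task in TASKS}
```

Calling `load_all_tasks()` downloads every configuration, so expect it to take noticeably longer than loading a single task. Task names are case-sensitive here, matching the spelling used above.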

For model inference and evaluation, add your OpenAI API key (or any other authorization keys you need) to api_config.py; the key is required because the Summarizing task uses gpt-4o.

# run.sh

CUDA_VISIBLE_DEVICES=1

# arguments
task=Attributing # Recalling, Summarizing, Organizing, Attributing
model_name_or_path=meta-llama/Meta-Llama-3.1-8B-Instruct
cache_dir=""

cmd="CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES python inference.py \
    --task $task \
    --model_name_or_path $model_name_or_path"

if [ -n "$cache_dir" ]; then
    cmd="$cmd --cache_dir $cache_dir"
fi

eval "$cmd"
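
To run several tasks in a row, the same command can be assembled from Python. The sketch below mirrors run.sh and uses only the flags shown there (`--task`, `--model_name_or_path`, `--cache_dir`); the function `build_inference_command` is a hypothetical helper, not part of the repository.

```python
import shlex


def build_inference_command(task, model_name_or_path, cache_dir=""):
    """Build the argument list for inference.py, mirroring run.sh."""
    cmd = [
        "python", "inference.py",
        "--task", task,
        "--model_name_or_path", model_name_or_path,
    ]
    if cache_dir:  # same check as `[ -n "$cache_dir" ]` in run.sh
        cmd += ["--cache_dir", cache_dir]
    return cmd


# Print the command for every task without executing anything.
for task in ("Recalling", "Summarizing", "Organizing", "Attributing"):
    cmd = build_inference_command(task, "meta-llama/Meta-Llama-3.1-8B-Instruct")
    print(shlex.join(cmd))
```

To actually launch a run, pass the list to `subprocess.run(cmd, check=True, env={**os.environ, "CUDA_VISIBLE_DEVICES": "1"})`. Using an argument list rather than a shell string avoids the quoting pitfalls of `eval`.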

Citation

@article{lee2024ethic,
  title={ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage},
  author={Lee, Taewhoo and Yoon, Chanwoong and Jang, Kyochul and Lee, Donghyeon and Song, Minju and Kim, Hyunjae and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2410.16848},
  year={2024}
}
