ZJU-DAILY/Operation-R1

Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA

Environment

python=3.12.4 torch=2.4.0 peft=0.15.2 vllm=0.6.2 tqdm=4.66.5 rapidfuzz=3.13.0 scikit-learn openai=1.78.1 datetime jsonlines

Script

./experiments/final_text.sh

python src/pipeline_op_qa.py \
    --file_name 'tabfact-test-clean' \
    --task_type 'TabFact' \
    --model_path '{"base_url": "http://10.10.10.8:8880/v1", "model": "~/pretrain/Qwen2.5-7B-Instruct", "tokenizer": "~/pretrain/Qwen2.5-7B-Instruct"}' \
    --save_dir 'experiments/baseline-7B/' \
    --num_workers 8
  • file_name: name of the JSON file in './data'; its schema is ['id', 'table_path', 'table_caption' (TabFact only), 'query']
  • task_type: 'WikiTQ' or 'TabFact'
  • model_path: JSON string with 'base_url', 'model', and 'tokenizer', pointing at a local vLLM server or an online API service
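
Because --model_path is a JSON string passed on the command line, quoting mistakes are easy to make. The following is a minimal sketch, not part of the repository, that builds the command from Python and checks a record against the schema listed above. The helper names `build_command` and `missing_keys` and the example record values are hypothetical; the flags and endpoint settings are copied from the command above.

```python
import json
import shlex

# Fields listed in the schema above; 'table_caption' applies to TabFact only.
BASE_KEYS = ["id", "table_path", "query"]


def missing_keys(record, task_type):
    """Return the schema keys absent from a record (hypothetical helper)."""
    required = list(BASE_KEYS)
    if task_type == "TabFact":
        required.append("table_caption")
    return [key for key in required if key not in record]


def build_command(file_name, task_type, model_path, save_dir, num_workers):
    """Assemble the inference command with safely quoted arguments."""
    args = [
        "python", "src/pipeline_op_qa.py",
        "--file_name", file_name,
        "--task_type", task_type,
        "--model_path", json.dumps(model_path),  # serialized as one JSON string
        "--save_dir", save_dir,
        "--num_workers", str(num_workers),
    ]
    return shlex.join(args)  # shell-quotes every argument, including the JSON


# Endpoint settings copied from the example command; adjust to your server.
model_path = {
    "base_url": "http://10.10.10.8:8880/v1",
    "model": "~/pretrain/Qwen2.5-7B-Instruct",
    "tokenizer": "~/pretrain/Qwen2.5-7B-Instruct",
}

# Made-up record used only to illustrate the schema.
record = {
    "id": "example-0",
    "table_path": "tables/example.csv",
    "table_caption": "example caption",
    "query": "the table has more than two rows",
}

command = build_command(
    "tabfact-test-clean", "TabFact", model_path, "experiments/baseline-7B/", 8
)
```

Printing `command` gives a single line that can be pasted into a shell; `missing_keys(record, "TabFact")` returns an empty list when the record carries every expected field.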

Code Structure

src/
  pipeline_op_qa.py            Main inference code.
  operator_call.py             Operator prompt.
  operators_merge.py           Operation merge.
  utils/
    model_infer.py             OpenAI-compatible/vLLM client.
    table_serialize.py         CSV reader with header dedup; markdown table serializer; dicts_to_list.
    postprocess.py             Answer extraction from LLM outputs.
    tableqa.py                 Prompt templates.
    add_column.py              Operator add_column.
    select_columns.py          Operator select_column.
    sort_table.py              Operator sort_by.
    group_by.py                Operator group_by.
    column_filter.py           Operator filter.
    clean_column.py            Operator clean_column.
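
To illustrate what "CSV reader with header dedup; markdown table serializer" could look like, here is a minimal sketch. It is not the code in table_serialize.py: the function names `dedup_headers`, `read_csv_text`, and `to_markdown`, and the `_1`, `_2` suffix scheme for repeated headers, are assumptions made for this example.

```python
import csv
import io


def dedup_headers(headers):
    """Make column names unique by appending _1, _2, ... (assumed scheme)."""
    used = set()
    result = []
    for name in headers:
        candidate, n = name, 0
        while candidate in used:  # keep bumping the suffix until unique
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result


def read_csv_text(text):
    """Parse CSV text into (unique headers, list of rows)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    return dedup_headers(rows[0]), rows[1:]


def to_markdown(headers, rows):
    """Serialize a table as a pipe-delimited markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Small made-up table with a repeated column name.
headers, rows = read_csv_text("year,score,score\n2001,3,4\n")
markdown = to_markdown(headers, rows)
```

Unique column names matter here because operators such as select_column or sort_by refer to columns by name, so a repeated header would make those references ambiguous.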
