# Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA
Environment:

```
python=3.12.4 torch=2.4.0 peft=0.15.2 vllm=0.6.2 tqdm=4.66.5 rapidfuzz=3.13.0 scikit-learn openai=1.78.1 datetime jsonlines
```
Run the full experiment script:

```shell
./experiments/final_text.sh
```

or invoke the inference entry point directly:

```shell
python src/pipeline_op_qa.py \
    --file_name 'tabfact-test-clean' \
    --task_type 'TabFact' \
    --model_path '{"base_url": "http://10.10.10.8:8880/v1", "model": "~/pretrain/Qwen2.5-7B-Instruct", "tokenizer": "~/pretrain/Qwen2.5-7B-Instruct"}' \
    --save_dir 'experiments/baseline-7B/' \
    --num_workers 8
```
- `file_name`: name of the JSON file under `./data`; each record follows the schema `['id', 'table_path', 'table_caption' (TabFact only), 'query']`
- `task_type`: `WikiTQ` or `TabFact`
- `model_path`: JSON config pointing at a local vLLM server or an online OpenAI-compatible API service
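The `--model_path` argument is a JSON string rather than a plain path. A minimal sketch of how it can be parsed into a client configuration, assuming the three fields shown in the command above (`base_url`, `model`, `tokenizer`) and that `utils/model_infer.py` wraps an OpenAI-compatible client (the actual implementation may differ):

```python
import json

# Hypothetical example of the JSON string passed via --model_path.
model_path = (
    '{"base_url": "http://10.10.10.8:8880/v1", '
    '"model": "~/pretrain/Qwen2.5-7B-Instruct", '
    '"tokenizer": "~/pretrain/Qwen2.5-7B-Instruct"}'
)

cfg = json.loads(model_path)

# An OpenAI-compatible client (as in utils/model_infer.py) would then be
# constructed roughly as:
#   client = OpenAI(base_url=cfg["base_url"], api_key="EMPTY")
#   client.chat.completions.create(model=cfg["model"], messages=[...])
print(cfg["base_url"])  # → http://10.10.10.8:8880/v1
```

Because vLLM exposes an OpenAI-compatible `/v1` endpoint, the same config works for both a local server and a hosted API by changing only `base_url`.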
```
src/
  pipeline_op_qa.py      # Main inference code.
  operator_call.py       # Operator prompts.
  operators_merge.py     # Operation merging.
  utils/
    model_infer.py         # OpenAI-compatible / vLLM client.
    table_serialize.py     # CSV reader with header dedup; markdown table serializer; dicts_to_list.
    postprocess.py         # Answer extraction from LLM outputs.
    tableqa.py             # Prompt templates.
  add_column.py          # Operator: add_column.
  select_columns.py      # Operator: select_column.
  sort_table.py          # Operator: sort_by.
  group_by.py            # Operator: group_by.
  column_filter.py       # Operator: filter.
  clean_column.py        # Operator: clean_column.
```
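To illustrate what `utils/table_serialize.py` is responsible for, here is a minimal, hypothetical sketch of header deduplication followed by markdown serialization; the function names and suffix scheme are assumptions for illustration, not the repository's actual API:

```python
from collections import Counter

def dedup_headers(headers):
    """Make duplicate column names unique by appending a numeric suffix,
    e.g. ['year', 'score', 'score'] -> ['year', 'score', 'score_2']."""
    seen = Counter()
    out = []
    for h in headers:
        seen[h] += 1
        out.append(h if seen[h] == 1 else f"{h}_{seen[h]}")
    return out

def to_markdown(headers, rows):
    """Serialize a header list and a list of row lists as a markdown table."""
    headers = dedup_headers(headers)
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(c) for c in row) + " |")
    return "\n".join(lines)

print(to_markdown(["year", "score", "score"], [[2021, 3, 5]]))
# → | year | score | score_2 |
#   | --- | --- | --- |
#   | 2021 | 3 | 5 |
```

Deduplicating headers before serialization matters because downstream operators (e.g. `select_columns`, `filter`) address columns by name, and duplicate names would make those references ambiguous.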