If you find our work useful, please ⭐ star this repository for the latest updates.

- [January 8, 2026] 📄 Paper available on arXiv
- [January 9, 2026] 🤗 Model available on Hugging Face
Large language model (LLM)-based search agents have shown promise on knowledge-intensive problems by incorporating information retrieval capabilities. Existing work largely focuses on optimizing the reasoning paradigms of search agents, while the quality of the intermediate search queries generated during reasoning remains overlooked. As a result, these queries are often inaccurate, leading to unexpected retrieval results and ultimately limiting the agents' overall effectiveness.
SmartSearch addresses this challenge through a novel framework built upon two key mechanisms:
- Process Rewards: Provide fine-grained supervision for the quality of each intermediate search query through Dual-Level Credit Assessment
- Query Refinement: Promote query generation optimization by selectively refining low-quality search queries and regenerating subsequent search rounds based on these refinements
To enable the search agent to progressively internalize the ability to improve query quality under process reward guidance, we design a three-stage curriculum learning framework that guides the agent through a progression from:
- Imitation → Alignment → Generalization
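The two mechanisms above can be sketched in miniature. The following is an illustrative Python sketch only, not the repo's implementation: the usefulness score would come from the process reward model (here it is an input), the diversity check mimics a rule-based redundancy test via token-level Jaccard overlap, and the thresholds (0.5 usefulness, 0.8 overlap) are assumptions.

```python
# Illustrative sketch of query-quality checks, NOT the actual SmartSearch code.
# Usefulness would come from the process reward model; here it is an input.
# The diversity check mimics a rule-based redundancy test (token Jaccard overlap).

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two queries."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def is_redundant(query: str, history: list[str], max_overlap: float = 0.8) -> bool:
    """Diversity check: flag a query that overlaps heavily with any earlier one."""
    return any(jaccard(query, prev) >= max_overlap for prev in history)

def needs_refinement(usefulness: float, query: str, history: list[str],
                     min_useful: float = 0.5) -> bool:
    """A query is selected for refinement if it is low-utility or redundant."""
    return usefulness < min_useful or is_redundant(query, history)

history = ["capital of France", "population of Paris"]
print(needs_refinement(0.9, "capital of France", history))   # redundant -> True
print(needs_refinement(0.9, "France founding year", history))  # -> False
```

Queries flagged this way would then be refined, and the subsequent search rounds regenerated from the refined prefix.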
```
SmartSearch/
├── src/                   # Source code for reproducing results
├── scripts/               # Experiment scripts
│   ├── serving/           # Service deployment scripts
│   ├── evaluation/        # Evaluation scripts
│   ├── data_construction/ # Dataset construction scripts
│   └── train/             # Training scripts
├── data/                  # Dataset preprocessing and storage
└── LLaMA-Factory/         # Training framework integration
```
- Python 3.10+
- CUDA-compatible GPU (recommended)
- Sufficient disk space for datasets and models
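A quick stdlib-only sanity check of these prerequisites can look like the following sketch; the GPU check just looks for `nvidia-smi` on PATH, which is only a rough proxy for a working CUDA setup.

```python
import shutil
import sys

# Python 3.10+ is required
ok_python = sys.version_info >= (3, 10)

# nvidia-smi on PATH is a rough proxy for a usable NVIDIA/CUDA setup
has_gpu_tooling = shutil.which("nvidia-smi") is not None

# Free disk space (GiB) on the current filesystem, for datasets and models
free_gib = shutil.disk_usage(".").free / 2**30

print(f"Python >= 3.10 : {ok_python}")
print(f"nvidia-smi     : {has_gpu_tooling}")
print(f"Free disk      : {free_gib:.1f} GiB")
```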
```bash
# Clone the repository
git clone https://github.com/M/SmartSearch.git
cd SmartSearch

# Install dependencies
pip install -r requirements.txt
```

SmartSearch is trained on the ASearcher dataset. The training data can be downloaded from Hugging Face.
To download other test datasets:

```bash
cd data
sh download_dataset.sh
```

To construct the RL dataset:

```bash
cd data
python prepare_dataset.py
```

To launch the sandbox service:

```bash
cd scripts/serving
python sandbox.py --port {port}
```

Prerequisites:
- Download pre-indexed Wikipedia
- Download Wikipedia corpus and retriever models
Configuration:
- Update `scripts/serving/retriever_config.yaml` with the correct paths:
  - Retriever model path
  - Index path
  - Corpus path
  - Available GPU IDs
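For orientation, a hypothetical `retriever_config.yaml` might look like the following. All field names and paths here are illustrative assumptions, not the real schema; consult the actual file shipped in `scripts/serving/`.

```yaml
# Hypothetical example — field names and paths are illustrative, not the real schema.
retriever_model_path: /models/your-retriever   # Retriever model path
index_path: /indexes/wiki/wiki.index           # Pre-built index path
corpus_path: /corpora/wiki/corpus.jsonl        # Wikipedia corpus path
gpu_ids: [0, 1]                                # Available GPU IDs
```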
Launch:

```bash
cd scripts/serving
python retriever_serving.py \
    --config retriever_config.yaml \
    --num_retriever {num_retriever} \
    --port {port}
```

To serve the policy model with SGLang:

```bash
python3 -m sglang.launch_server \
    --served-model-name {model-name} \
    --model-path {model-path} \
    --tp {tp_num} \
    --dp {dp_num} \
    --context-length 16384 \
    --enable-metrics \
    --dtype bfloat16 \
    --host 0.0.0.0 \
    --port {port} \
    --trust-remote-code \
    --disable-overlap \
    --disable-radix-cache \
    --mem-fraction-static 0.7
```

Ensure all services (sandbox, retriever, and model) are running.
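As a quick connectivity check, this stdlib-only snippet verifies that each service's TCP port accepts connections. Hosts and ports below are placeholders; substitute the values you launched each service with.

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder ports -- substitute the ones you passed to each service.
services = {"sandbox": 8001, "retriever": 8002, "model": 8003}
for name, port in services.items():
    status = "up" if port_open("127.0.0.1", port) else "DOWN"
    print(f"{name}: {status}")
```

This only tests reachability, not that each service is healthy or serving the right model.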
Then run the evaluation:

```bash
cd scripts/evaluation
python run_eval.py \
    --config_path eval_config.yaml \
    --data_dir {data-dir} \
    --dataset_name {dataset-name} \
    --split {split-name} \
    --save_dir {save-dir} \
    --save_note {model-name} \
    --sgl_remote_url {model-url} \
    --remote_retriever_url {retriever-url} \
    --sandbox_url {sandbox-url} \
    --generator_model {model-path}
```

### Stage 1: Imitation (SFT)

**Step 1: Trajectory Sampling**
```bash
cd scripts/evaluation
python run_eval.py \
    --config_path eval_config.yaml \
    --data_dir ../../data \
    --dataset_name asearcher \
    --split train \
    --save_dir {save-dir} \
    --save_note {policy-model-name} \
    --sgl_remote_url {policy-model-url} \
    --remote_retriever_url {retriever-url} \
    --sandbox_url {sandbox-url} \
    --generator_model {policy-model-path}
```

**Step 2: Apply Process Rewards**
```bash
cd scripts/data_construction

# Usefulness check by model
python process_reward.py \
    --model_url {process-reward-model-url} \
    --input_file {step1-output-path} \
    --output_file process_reward.json

# Diversity check by rule
python detect_redundancy.py \
    --input_file process_reward.json \
    --output_file process_reward.json
```

**Step 3: Construct SFT Dataset**
```bash
cd scripts/data_construction
python transfer_sft.py \
    --input_file process_reward.json \
    --output_file sft.json
```

**Step 4: SFT Training**
- Register the dataset in `dataset_info.json`
- Specify dataset paths in `qwen_full_sft.yaml`

```bash
cd LLaMA-Factory
llamafactory-cli train examples/train_full/qwen_full_sft.yaml
```

### Stage 2: Alignment (DPO)

**Step 1: Trajectory Sampling**
```bash
cd scripts/evaluation
python run_eval.py \
    --config_path eval_config.yaml \
    --data_dir ../../data \
    --dataset_name asearcher \
    --split train \
    --save_dir {save-dir} \
    --save_note {sft-model-name} \
    --sgl_remote_url {sft-model-url} \
    --remote_retriever_url {retriever-url} \
    --sandbox_url {sandbox-url} \
    --generator_model {sft-model-path}
```

**Step 2: Query Refinement**
```bash
cd scripts/data_construction

# Select low-quality queries
python process_reward.py \
    --model_url {process-reward-model-url} \
    --input_file {step1-output-path} \
    --output_file process_reward_1.json

python detect_redundancy.py \
    --input_file process_reward_1.json \
    --output_file process_reward_1.json

# Refine low-quality queries
python query_refinement.py \
    --model_url {process-reward-model-url} \
    --input_file process_reward_1.json \
    --output_file query_refinement.json

# Regenerate subsequent steps
python transfer_generate.py \
    --input_file query_refinement.json \
    --output_file prefix.json

cd ../evaluation
python run_eval.py \
    --config_path eval_config.yaml \
    --data_dir ../../data \
    --dataset_name asearcher \
    --split prefix \
    --save_dir {save-dir} \
    --save_note {sft-model-name} \
    --sgl_remote_url {sft-model-url} \
    --remote_retriever_url {retriever-url} \
    --sandbox_url {sandbox-url} \
    --generator_model {sft-model-path}
```

**Step 3: Construct DPO Dataset**
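Conceptually, this step pairs each original trajectory with its refined counterpart and keeps the higher-reward one as the preferred response. The following is a minimal sketch, assuming records keyed by question with hypothetical `trajectory` and `reward` fields; the real schema used by `transfer_dpo.py` may differ.

```python
# Sketch of DPO pair construction; the record fields ("question", "trajectory",
# "reward") are assumptions, not the actual schema of transfer_dpo.py.

def build_dpo_pairs(original: list[dict], refined: list[dict]) -> list[dict]:
    """Pair each original trajectory with its refined version by question,
    taking the higher-reward trajectory as 'chosen' and the other as 'rejected'."""
    refined_by_q = {r["question"]: r for r in refined}
    pairs = []
    for o in original:
        r = refined_by_q.get(o["question"])
        if r is None or r["reward"] == o["reward"]:
            continue  # no counterpart, or no preference signal
        chosen, rejected = (r, o) if r["reward"] > o["reward"] else (o, r)
        pairs.append({"prompt": o["question"],
                      "chosen": chosen["trajectory"],
                      "rejected": rejected["trajectory"]})
    return pairs

orig = [{"question": "q1", "trajectory": "t_orig", "reward": 0.3}]
refi = [{"question": "q1", "trajectory": "t_refined", "reward": 0.8}]
print(build_dpo_pairs(orig, refi))
# [{'prompt': 'q1', 'chosen': 't_refined', 'rejected': 't_orig'}]
```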
```bash
python process_reward.py \
    --model_url {process-reward-model-url} \
    --input_file {step2-output-path} \
    --output_file process_reward_2.json

python detect_redundancy.py \
    --input_file process_reward_2.json \
    --output_file process_reward_2.json

python transfer_dpo.py \
    --input_file1 process_reward_1.json \
    --input_file2 process_reward_2.json \
    --output_file dpo.json
```

**Step 4: DPO Training**
- Register the dataset in `dataset_info.json`
- Specify dataset paths in `qwen_lora_dpo.yaml`

```bash
cd LLaMA-Factory
llamafactory-cli train examples/train_lora/qwen_lora_dpo.yaml
llamafactory-cli export examples/merge_lora/qwen_lora_dpo.yaml
```

### Stage 3: Generalization (RL)

```bash
cd scripts/train
bash train.sh \
    --train_batch_size 8 \
    --ppo_mini_batch_size 16 \
    --actor_model_path {dpo-model-path} \
    --search_url {retriever-url} \
    --sandbox_url {sandbox-url} \
    --project_name smart_search \
    --experiment_name smart_search \
    --nnodes 1 \
    --n_gpus_per_node 4 \
    --save_freq 5 \
    --test_freq 5 \
    --total_epochs 2 \
    --wandb_api_key {wandb-api-key} \
    --save_path {save-path} \
    --train_files {train-file-path} \
    --test_files {test-file-path}
```

If you find SmartSearch useful in your research, please cite our paper:
```bibtex
@article{smartsearch2026,
  title={SmartSearch: Process Reward-Guided Query Refinement for Search Agents},
  author={Tongyu Wen and Guanting Dong and Zhicheng Dou},
  journal={arXiv preprint arXiv:2601.04888},
  year={2026}
}
```

This project is licensed under the MIT License; see the LICENSE file for details.
We thank the authors of ReCall, VERL, and FlashRAG for their excellent frameworks that inspired this work.