Hi,
Thanks for the work. I have a question about the performance of MCTS compared to best-of-N on the MATH500 dataset using the Qwen2.5-Math-7B-Instruct model. In my experiments, MCTS did not achieve a higher majority_vote score than best-of-N (0.872 vs. 0.876). My configuration and the comparative results are below. Given MCTS's more complex search structure, I expected it to outperform best-of-N, which is a much more direct approach. Do you have any suggestions for improving the MCTS results?
Thanks.
Table 1. Comparative results of different reasoning techniques.
| method | majority_vote |
|---|---|
| CoT | 0.836 |
| best-of-N | 0.876 |
| MCTS | 0.872 |
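For scale, the scores in Table 1 can be converted into counts of correctly solved problems, assuming MATH500 contains 500 problems. The sketch below is illustrative only: the helper names are hypothetical, and the binomial standard error is a rough approximation that ignores the fact that all methods are evaluated on the same problems.

```python
import math

N_PROBLEMS = 500  # assumption: MATH500 contains 500 problems

# majority_vote scores copied from Table 1
scores = {"CoT": 0.836, "best-of-N": 0.876, "MCTS": 0.872}


def correct_count(acc, n=N_PROBLEMS):
    """Number of problems solved, rounded to absorb float error."""
    return round(acc * n)


def standard_error(acc, n=N_PROBLEMS):
    """Binomial standard error of an accuracy measured on n problems."""
    return math.sqrt(acc * (1 - acc) / n)


for method, acc in scores.items():
    print(f"{method}: {correct_count(acc)} correct, "
          f"SE ~ {standard_error(acc):.4f}")
```

Under these assumptions, best-of-N and MCTS differ by two problems, which is well below the approximate standard error of either score (about 0.015, or roughly seven problems).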
Table 2. Parameter settings used in the experiments.
| parameter | value |
|---|---|
| temperature | 0.7 |
| num_sequence | 8 |
| max_new_tokens | 2048 |
| num_worker | 32 |
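With num_sequence = 8, each problem is assumed to yield eight candidate solutions, and majority_vote is assumed to select the most frequent final answer among them. A minimal sketch of that aggregation follows; the function name and the tie-breaking rule are hypothetical and may differ from the repository's actual implementation.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("answers must be non-empty")
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical final answers extracted from 8 sampled solutions
sampled = ["12", "12", "7", "12", "15", "7", "12", "12"]
print(majority_vote(sampled))
```

Note that this aggregation ignores any reward-model scores, so it may not reflect the benefit of a guided search such as MCTS.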
System Info
Operating System = Linux
Python version = 3.10
Hardware = A40