MCTS does not perform well enough (a comparison with CoT and best-of-N) #95

@rtarikt

Hi,

Thanks for the work. I have a question about the performance of MCTS compared to best-of-N on the MATH500 dataset with the Qwen2.5-Math-7B-Instruct model. In my experiments, MCTS did not achieve a higher majority_vote score than best-of-N. My configuration and the comparative results are shared below. Given MCTS's more complex search structure, I expected it to outperform best-of-N, which reasons in a much more direct way. Do you have any suggestions for improving the MCTS results?

Thanks.

Table 1. Comparative results of different reasoning techniques.

| Method | majority_vote |
| --- | --- |
| CoT | 0.836 |
| best-of-N | 0.876 |
| MCTS | 0.872 |
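
To put the gap in Table 1 in perspective, here is a rough sketch that converts each score into a number of solved problems and attaches a simple binomial standard error. It assumes the scores are fractions of the 500 MATH500 problems and treats problems as independent, which is only an approximation; the helper names `solved` and `std_error` are made up for this illustration.

```python
import math

N_PROBLEMS = 500  # MATH500 contains 500 problems

# majority_vote scores copied from Table 1
scores = {"CoT": 0.836, "best-of-N": 0.876, "MCTS": 0.872}


def solved(acc, n=N_PROBLEMS):
    """Number of problems answered correctly at accuracy `acc`."""
    return round(acc * n)


def std_error(acc, n=N_PROBLEMS):
    """Binomial standard error of an accuracy measured on n problems."""
    return math.sqrt(acc * (1 - acc) / n)


for name, acc in scores.items():
    print(f"{name}: {solved(acc)} solved, standard error ~{std_error(acc):.3f}")

# Difference between best-of-N and MCTS, in problems
gap = solved(scores["best-of-N"]) - solved(scores["MCTS"])
print(f"best-of-N vs MCTS: {gap} problems")
```

Under these assumptions, the best-of-N and MCTS scores differ by only a couple of problems, which is smaller than the approximate standard error of either score, so a single run may not be enough to separate the two methods.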

Table 2. The parameter setting used in the experiments.

| Parameter | Value |
| --- | --- |
| temperature | 0.7 |
| num_sequence | 8 |
| max_new_tokens | 2048 |
| num_worker | 32 |
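
For reference, this is how I understand the majority_vote metric, assuming num_sequence is the number of sampled solutions per problem (8 here). The function below is my own minimal sketch of that aggregation, not the repository's implementation, and the sample answers are made up.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among the sampled solutions.

    Ties are broken in favor of the answer that appears first.
    """
    counts = Counter(answers)
    return counts.most_common(1)[0][0]


# Hypothetical final answers extracted from 8 sampled solutions to one problem
samples = ["42", "41", "42", "7", "42", "41", "42", "42"]
print(majority_vote(samples))
```

The problem counts as solved if the voted answer matches the ground truth, so the score depends on how the extracted answers are distributed rather than on any single best path.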

System Info

Operating System = Linux
Python version = 3.10
Hardware = A40
