Update vllm start, stop and use random port for vllm #96
Conversation
Let us know when this is not wip anymore and open for review :), thanks!
Force-pushed fbd9f5f to 459a22d (compare)
Force-pushed 459a22d to f3c3329 (compare)
Force-pushed f3c3329 to c8a4c45 (compare)
eval/mt_bench/components.py (Outdated)
# Rename the best model directory to "candidate_model" for the next step
# so we know which model to use for the final evaluation
os.rename(
    os.path.join(models_path_prefix, best_model),
    os.path.join(models_path_prefix, "candidate_model"),
)
best_model = f"{models_path_prefix}/candidate_model"
please use os.path.join instead of string formatting.
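A minimal sketch of the suggested change (the prefix value is illustrative, taken from the job log below, not from the PR itself):

    import os

    models_path_prefix = "/data/model/output/phase_2/hf_format"  # example value

    # String formatting hard-codes the "/" separator:
    best_model = f"{models_path_prefix}/candidate_model"

    # os.path.join builds the path portably and reads as intent:
    best_model = os.path.join(models_path_prefix, "candidate_model")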
Also, why don't we print the "sample_" instead? If we print "candidate_model", the user has no idea which model was the best I guess...
I thought about this - but we rename it to candidate_model, so samples_ no longer exists, right? What's saved to S3 is "candidate_model", right?
I updated the output a bit, ptal.
Test run here.
LGTM
eval/mt_bench/components.py (Outdated)
@@ -193,13 +193,16 @@ def shutdown_vllm(process: subprocess.Popen, timeout: int = 20):
     os.path.join(models_path_prefix, best_model),
     os.path.join(models_path_prefix, "candidate_model"),
 )
-best_model = f"{models_path_prefix}/candidate_model"
+best_model_renamed = os.path.join(models_path_prefix, "candidate_model")
+best_model_output = f"Candidate model: {best_model} located at {best_model_renamed}"
Why does it have to be a message? Later on we use this to print the best model.
Job logs:
INFO 2024-10-18 11:29:26,271 __main__:2433: Job completed successfully.
INFO 2024-10-18 11:29:26,792 __main__:956: Best model: /data/model/output/phase_2/hf_format/samples_320
INFO 2024-10-18 11:29:26,792 __main__:3140: Running final evaluation.
Pod log:
{
"best_model": "/data/model/output/phase_2/hf_format/samples_320",
"best_score": 8.5
}
Later on, for the final eval, we know the model has been renamed to candidate_model, so everything is fine.
Because the script uses the pod logs to print the best model, the content of best_model should not be a sentence.
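A minimal sketch of why the value must stay a bare path (how the pod log is emitted is an assumption based on the JSON shown above, not the PR's actual code):

    import json

    # Downstream tooling parses this JSON from the pod log to locate the
    # best model, so best_model must be a plain path, not a sentence.
    print(json.dumps({"best_model": best_model, "best_score": best_score}, indent=4))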
true, but the pipeline uses the output to find the model for final eval - for now, I've hard-coded that value as ../hf_format/candidate_model in the pipeline and removed the updated output here.
Force-pushed 43f83c0 to c02bf63 (compare)
3rd commit should not be included, or should be reworked
Signed-off-by: sallyom <[email protected]>
Force-pushed c02bf63 to 4697fb1 (compare)
See Issue #95
This PR:
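The PR description is truncated here. Based on the title, a minimal sketch of the random-port and shutdown pieces might look like the following. get_random_free_port is a hypothetical helper; shutdown_vllm reuses the signature from the diff above, but its body here is an assumption, not the PR's actual implementation:

    import socket
    import subprocess

    def get_random_free_port() -> int:
        # Bind to port 0 so the OS hands back an unused ephemeral port.
        # (There is a small race between closing this probe socket and
        # the vLLM server binding the port; callers accept that risk.)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind(("127.0.0.1", 0))
            return s.getsockname()[1]

    def shutdown_vllm(process: subprocess.Popen, timeout: int = 20):
        # Ask the vLLM server to exit cleanly, then force-kill if it hangs.
        process.terminate()
        try:
            process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            process.kill()
            process.wait()

The launcher would then pass the chosen port to the vLLM server process it starts, and call shutdown_vllm on that process once evaluation finishes.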