Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update metrics documentation #3264

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

binarycrayon
Copy link
Contributor

Motivation

Update metrics documentation with an updated grafana example dashboard, and add some debugging tips, for issue #2677

Modifications

Changed sample production_metrics, added note to run server with --enable-metrics flag, also added debugging tips

Checklist

@zhaochenyang20
Copy link
Collaborator

This looks nice. I will ask ziliang for checking it. And will fix the style later. @shuaills Could you also review this? Thanks!

@zhaochenyang20
Copy link
Collaborator

@ziliangpeng @shuaills Thanks for reviewing this.

Copy link

@ziliangpeng ziliangpeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the change. generally looks good! i haven't spun up a server and containers to test, but nothing looks obviously wrong to me.

also, i recommend further automate metrics set up for users to improve experience. i bet most sglang users are ml engineers or ai researchers that might not have time to figure out prometheus / grafana like an infra engineer.
maybe we can have a mini project for the end to end automation for this.

@@ -142,4 +146,28 @@ Then you can access the Grafana dashboard at http://localhost:3000.

### Grafana Dashboard

In a new Grafana setup, ensure that you have the `Prometheus` data source enabled. To check that, go to `http://localhost:3000/connections/datasources` and ensure that `Prometheus` is enabled.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to prepare a config file (or list of commands in Dockerfile) that gets the datasource set up, then supply it to the grafana container, and get the entire thing ready to use end to end?

I feel that requiring a manual step would be error prone and difficult for the team to troubleshoot for users. we're targeting users with no metrics knowledge (or no time to to spend on metrics set up) so "ready-to-use"ness is important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me see what I can do with a grafana.ini

- `model_name` (name: `model_name`, label: `model name`, Data source: `Prometheus`, Type: `Label values`)
- `instance` (name: `instance`, label: `instance`, Data source: `Prometheus`, Type: `Label values`)

If you don't have these variables, you can create them manually.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, i think there's way to automate this by either grafana iac or making api calls, which we can get them done in a dockerfile for the grafana image, and make it ready-to-use.

creating variables could be even more non-trivial than adding datasource, for users who have no grafana experience. would be really nice to have this automated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I feel it's better to be more explicit and less automated. Setting up variables and seeing previewed values is a also good indication that prometheus is actively receiving metrics data from sglang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants