When the service has many pods (mine has 150), the get ehpa command shows the following error. With fewer pods there is no problem:
the HPA was unable to compute the replica count: unable to get metric
tensorflow_serving_latency_999: unable to fetch metrics from custom metrics
API: Internal error occurred: unable to fetch metrics
The prometheus query configured in the ehpa is:
annotations:
metric-query.autoscaling.crane.io/services.tensorflow_serving_latency_999: avg(tensorflow_serving_latency_999{namespace="namespace",pod~="abcd."})
However, prometheus-adapter reports the following API error, saying the URI is too long:
GET http://..../api/v1/query?query=sum%28tensorflow_serving_latency_999%7Bnamespace%3D%22qke-generic-jarvis-cupid-algo%22%2Cpod%3D~%22jarvis-ads-algo-cpx-e2-episode-pcvr-26035-qpaas-hslf-6d56d225cd.......
This URI lists every pod of the workload, which makes it too long. But the query in the ehpa is avg(tensorflow_serving_latency_999{namespace="namespace",pod~="abcd."}), so where does the failing URI request come from?
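To illustrate why the request grows with the pod count, here is a minimal Python sketch that mimics the shape of the failing URI: a sum(...) query whose pod=~ matcher joins every pod name with |. The function build_query_url, the host prometheus.example, the namespace example-ns, and the generated pod names are all hypothetical; this only approximates the observed request and is not prometheus-adapter's actual code.

```python
from urllib.parse import urlencode


def build_query_url(base_url, metric, namespace, pod_names):
    """Build a Prometheus instant-query GET URL that enumerates pods by name.

    Hypothetical helper: it only reproduces the shape of the URI seen in the
    report (sum(...) with a pod=~"a|b|c" regex matcher).
    """
    matcher = "|".join(pod_names)
    query = f'sum({metric}{{namespace="{namespace}",pod=~"{matcher}"}})'
    # urlencode percent-encodes the query, so each "|" becomes "%7C".
    return f"{base_url}/api/v1/query?" + urlencode({"query": query})


# Hypothetical pod names, shorter than the real ones in the report.
pods = [f"example-deployment-6d56d225cd-{i:05d}" for i in range(150)]

url = build_query_url(
    "http://prometheus.example:9090",
    "tensorflow_serving_latency_999",
    "example-ns",
    pods,
)
print(len(url))  # grows roughly linearly with the number of pods
```

With 150 names of this length the URL already runs to several kilobytes, and the real pod names in the report are roughly twice as long. A server or proxy in front of Prometheus that caps the request line could plausibly answer such a request with 414. The Prometheus HTTP API also accepts this query as a POST with a form-encoded body, which keeps the URL short; whether the adapter in use can be made to do that is not established here.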
The prometheus-adapter logs show the following:
E0717 06:07:04.633224 1 provider.go:150] unable to fetch metrics from prometheus: bad_response: unknown response code 414
I0717 06:07:04.633771 1 httplog.go:132] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1/namespaces/qke-generic-jarvis-cupid-algo/pods/%2A/tensorflow_serving_latency_999?labelSelector=name%3Djarvis-ads-algo-cpx-e2-episode-pcvr-26035-qpaas-hslf" latency="339.610784ms" userAgent="kube-controller-manager/v1.24.15 (linux/amd64) kubernetes/887f5c3/system:serviceaccount:kube-system:horizontal-pod-autoscaler" audit-ID="08f823f8-afd8-43e0-b44c-fdc09a64b612" srcIP="10.188.121.103:58614" resp=500 statusStack=<