
PoC of Observability Platform API #3820

Open
4 tasks done
Tracked by #3782
Rotfuks opened this issue Jan 8, 2025 · 3 comments
Labels
team/atlas Team Atlas

Comments


Rotfuks commented Jan 8, 2025

Motivation

With #3783 we found a concept for the Observability API, but now we also have to test that this concept is feasible. For this we can set up the API in one cluster and use it through a Grafana living in another cluster, while using our dex as SSO. This mirrors the customer setup and we'll learn how to work with the API.

Todo

Keep in mind that this is a proof-of-concept - we are not yet interested in the perfect solution, but in learning, so fake it until we make it and cut corners where possible. We will have future implementation stories coming up once we've proven that the concept works.

  • Set up the Observability Platform API in a Giant Swarm installation
  • Set up a plain Grafana instance in a Giant Swarm workload cluster
  • Connect the plain Grafana instance in the workload cluster to the Observability Platform through the API
  • Use our own SSO to authenticate the connection, as we would with a customer's SSO

Outcome

  • We have tested the setup and know what's still missing and what we have to do for the implementation.
@QuantumEnigmaa

After testing various iterations of the ingresses, I was able to make the following setup work (without authentication for now):

  • Ingresses on the MC:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-giantswarm
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  name: observability-platform-endpoint
  namespace: loki
spec:
  ingressClassName: nginx
  rules:
  - host: observability.golem.gaws.gigantic.io
    http:
      paths:
      - backend:
          service:
            name: loki-read
            port:
              number: 3100
        path: /logs(/|$)(.*)
        pathType: ImplementationSpecific

  tls:
  - hosts:
    - observability.golem.gaws.gigantic.io
    secretName: observability-platform-endpoint-ingress-cert
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-giantswarm
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/use-regex: "true"
  name: observability-platform-endpoint
  namespace: mimir
spec:
  ingressClassName: nginx
  rules:
  - host: observability.golem.gaws.gigantic.io
    http:
      paths:
      - backend:
          service:
            name: mimir-query-frontend
            port:
              number: 8080
        path: /prometheus  # placeholder: the original query-frontend path was lost in the paste
        pathType: ImplementationSpecific
      - backend:
          service:
            name: mimir-ruler
            port:
              number: 8080
        path: /prometheus/api/v1/rules
        pathType: ImplementationSpecific

  tls:
  - hosts:
    - observability.golem.gaws.gigantic.io
    secretName: observability-platform-endpoint-ingress-cert
  • Setting up the following header in the datasources:

(screenshot: datasource HTTP header configuration)
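The rewrite-target annotation on the Loki ingress above strips the /logs prefix before the request reaches loki-read: with use-regex enabled, the path is treated as a regex and the target /$2 is built from the second capture group. A minimal Python sketch of that behaviour (the pattern is the one from the ingress; `rewrite` is a hypothetical helper, not an ingress-nginx API):

```python
import re

def rewrite(path: str, pattern: str = r"/logs(/|$)(.*)") -> str:
    """Mimic ingress-nginx rewrite-target "/$2" for the /logs(/|$)(.*) path."""
    m = re.match(pattern, path)
    if not m:
        return path  # path doesn't match this ingress rule; passed through unchanged
    # group 1 is the "/" (or end of string), group 2 is everything after the prefix
    return "/" + m.group(2)

print(rewrite("/logs/loki/api/v1/query"))  # -> /loki/api/v1/query
print(rewrite("/logs"))                    # -> /
```

So a request for /logs/loki/api/v1/query on the observability host is forwarded to loki-read as /loki/api/v1/query, which matches Loki's native API paths.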

Concerning Loki, everything is going smoothly and I can query MC logs from my Grafana deployed on a WC, so only authentication is missing there.

As for Mimir, I'm facing an issue that prevents the datasource from working as intended. I had to add a second backend to allow access to the ruler: the ingress-nginx-controller logs showed that whenever I execute queries from the "explore" panel, Grafana also sends requests to the ruler. But even with this configured and all requests returning 200 codes, I still can't see any data 😞

I looked at the requests going through nginx after executing some queries from the MC's grafana, and they look quite different from the WC's ones:

  • sample requests from the WC's grafana:
18.170.157.137 - - [20/Jan/2025:17:08:59 +0000] "GET /prometheus/api/v1/rules HTTP/1.1" 308 164 "-" "Grafana/11.3.1" 1030 0.000 [mimir-mimir-ruler-8080] [] - - - - cf207055f47cadf57357faf620e9e4ca
18.170.157.137 - - [20/Jan/2025:17:08:59 +0000] "POST /prometheus/api/v1/query_exemplars HTTP/1.1" 308 164 "-" "Grafana/11.3.1" 1110 0.000 [mimir-mimir-query-frontend-8080] [] - - - - ff9996ea5ba0ff4ac87a4c7c6efb838a
18.170.157.137 - - [20/Jan/2025:17:08:59 +0000] "GET /prometheus/api/v1/label/__name__/values?start=1737389340&end=1737392940 HTTP/1.1" 308 164 "-" "Grafana/11.3.1" 1116 0.000 [mimir-mimir-query-frontend-8080] [] - - - - 96dadbaadcb8de6c59a91cee6f0beca8
18.170.157.137 - - [20/Jan/2025:17:08:59 +0000] "GET /prometheus/api/v1/rules HTTP/1.1" 200 67 "http://observability.golem.gaws.gigantic.io/prometheus/api/v1/rules" "Grafana/11.3.1" 1108 0.005 [mimir-mimir-ruler-8080] [] <ip>:8080 67 0.005 200 3ef1251337747ff4e550ed810fef9b06
18.170.157.137 - - [20/Jan/2025:17:08:59 +0000] "POST /prometheus/api/v1/query_exemplars HTTP/1.1" 200 60 "http://observability.golem.gaws.gigantic.io/prometheus/api/v1/query_exemplars" "Grafana/11.3.1" 1246 0.006 [mimir-mimir-query-frontend-8080] [] 100.64.140.249:8080 60 0.006 200 b9d367926de03eaff920843ac734a850
18.170.157.137 - - [20/Jan/2025:17:08:59 +0000] "GET /prometheus/api/v1/label/__name__/values?start=1737389340&end=1737392940 HTTP/1.1" 200 60 "http://observability.golem.gaws.gigantic.io/prometheus/api/v1/label/__name__/values?start=1737389340&end=1737392940" "Grafana/11.3.1" 1242 0.006 [mimir-mimir-query-frontend-8080] [] <ip>:8080 60 0.007 200 e9b3b44eb2ea73eb3f2e7ca99b92ef11
18.170.157.137 - - [20/Jan/2025:17:08:59 +0000] "POST /prometheus/api/v1/labels HTTP/1.1" 308 164 "-" "Grafana/11.3.1" 1514 0.000 [mimir-mimir-query-frontend-8080] [] - - - - 0937823ab0e004c1b3adc92cd0c3a3fd
18.170.157.137 - - [20/Jan/2025:17:08:59 +0000] "POST /prometheus/api/v1/labels HTTP/1.1" 200 60 "http://observability.golem.gaws.gigantic.io/prometheus/api/v1/labels" "Grafana/11.3.1" 1624 0.003 [mimir-mimir-query-frontend-8080] [] 100.64.140.249:8080 60 0.003 200 73447a25690970024b8a07f9a1a17588
  • sample requests from the MC's grafana:
90.2.115.78 - - [20/Jan/2025:16:54:08 +0000] "GET /api/datasources/uid/eeakmzvj012pse/resources/api/v1/metadata HTTP/2.0" 200 30 "https://grafana.golem.gaws.gigantic.io/explore?schemaVersion=1&panes=%7B%22ocb%22%3A%7B%22datasource%22%3A%22eeakmzvj012pse%22%2C%22queries%22%3A%5B%7B%
22refId%22%3A%22A%22%2C%22expr%22%3A%22%22%2C%22range%22%3Atrue%2C%22instant%22%3Atrue%2C%22datasource%22%3A%7B%22type%22%3A%22prometheus%22%2C%22uid%22%3A%22eeakmzvj012pse%22%7D%7D%5D%2C%22range%22%3A%7B%22from%22%3A%22now-1h%22%2C%22to%22%3A%22now%22%7D%7D%7D&orgId=1" "Mozilla/5.
0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36" 397 0.017 [monitoring-grafana-80] [] <ip>:3000 30 0.018 200 2f181cd2b4b9f4afcd35bfafa6a344d8
90.2.115.78 - - [20/Jan/2025:16:54:08 +0000] "POST /api/datasources/uid/eeakmzvj012pse/resources/api/v1/labels HTTP/2.0" 200 3265 "https://grafana.golem.gaws.gigantic.io/explore?schemaVersion=1&panes=%7B%22ocb%22%3A%7B%22datasource%22%3A%22eeakmzvj012pse%22%2C%22queries%22%3A%5B%7B
%22refId%22%3A%22A%22%2C%22expr%22%3A%22%22%2C%22range%22%3Atrue%2C%22instant%22%3Atrue%2C%22datasource%22%3A%7B%22type%22%3A%22prometheus%22%2C%22uid%22%3A%22eeakmzvj012pse%22%7D%7D%5D%2C%22range%22%3A%7B%22from%22%3A%22now-1h%22%2C%22to%22%3A%22now%22%7D%7D%7D&orgId=1" "Mozilla/5
.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36" 100 0.024 [monitoring-grafana-80] [] <ip>:3000 3277 0.024 200 677b541bcd1ccd05cbecca26f4fad625
90.2.115.78 - - [20/Jan/2025:16:54:16 +0000] "POST /api/frontend-metrics HTTP/2.0" 200 0 "https://grafana.golem.gaws.gigantic.io/explore?schemaVersion=1&panes=%7B%22ocb%22%3A%7B%22datasource%22%3A%22eeakmzvj012pse%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%
22%2C%22range%22%3Atrue%2C%22instant%22%3Atrue%2C%22datasource%22%3A%7B%22type%22%3A%22prometheus%22%2C%22uid%22%3A%22eeakmzvj012pse%22%7D%7D%5D%2C%22range%22%3A%7B%22from%22%3A%22now-1h%22%2C%22to%22%3A%22now%22%7D%7D%7D&orgId=1" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36
 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36" 653 0.005 [monitoring-grafana-80] [] <ip>:3000 0 0.005 200 a13412e21085b9f7efd4667f40d7a4c2
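One difference visible above: each WC request first gets a 308 (a permanent redirect, most likely ingress-nginx's HTTP-to-HTTPS redirect, given the http:// referers on the retried 200 lines) before succeeding. To compare such lines mechanically, here is a rough parser; the field layout is inferred from the samples above, not taken from the canonical nginx log_format:

```python
import re

# Assumed layout: ip - - [time] "METHOD path PROTO" status bytes "referer" "ua"
#                 reqlen reqtime [upstream-name] ...
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) \d+ "[^"]*" "[^"]*" \d+ [\d.]+ \[(?P<upstream>[^\]]*)\]'
)

def parse(line: str) -> dict:
    """Return the interesting fields of one access-log line, or {} if it doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else {}

line = ('18.170.157.137 - - [20/Jan/2025:17:08:59 +0000] '
        '"GET /prometheus/api/v1/rules HTTP/1.1" 308 164 "-" "Grafana/11.3.1" '
        '1030 0.000 [mimir-mimir-ruler-8080] [] - - - - cf207055f47cadf57357faf620e9e4ca')
rec = parse(line)
print(rec["status"], rec["upstream"])  # -> 308 mimir-mimir-ruler-8080
```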

@QuantumEnigmaa

Update on the current setup: the ingresses are now defined as follows:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-giantswarm
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/auth-url: https://oidc.golem.gaws.gigantic.io/oauth2/dex/auth
    nginx.ingress.kubernetes.io/auth-signin: https://oidc.golem.gaws.gigantic.io/oauth2/dex/start?rd=$escaped_request_uri
  name: observability-platform-endpoint
  namespace: loki
spec:
  ingressClassName: nginx
  rules:
  - host: observability.golem.gaws.gigantic.io
    http:
      paths:
      - backend:
          service:
            name: loki-gateway
            port:
              number: 80
        path: /loki/api/v1/query
        pathType: ImplementationSpecific
      - backend:
          service:
            name: loki-gateway
            port:
              number: 80
        path: /loki/api/v1/labels
        pathType: ImplementationSpecific
      - backend:
          service:
            name: loki-gateway
            port:
              number: 80
        path: /loki/api/v1/label
        pathType: ImplementationSpecific
      - backend:
          service:
            name: loki-gateway
            port:
              number: 80
        path: /loki/api/v1/query_range
        pathType: ImplementationSpecific
      - backend:
          service:
            name: loki-gateway
            port:
              number: 80
        path: /loki/api/v1/index
        pathType: ImplementationSpecific
      - backend:
          service:
            name: loki-gateway
            port:
              number: 80
        path: /loki/api/v1/series
        pathType: ImplementationSpecific

  tls:
  - hosts:
    - observability.golem.gaws.gigantic.io
    secretName: observability-platform-endpoint-ingress-cert
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-giantswarm
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/auth-url: https://oidc.golem.gaws.gigantic.io/oauth2/dex/auth
    nginx.ingress.kubernetes.io/auth-signin: https://oidc.golem.gaws.gigantic.io/oauth2/dex/start?rd=$escaped_request_uri
  name: observability-platform-endpoint
  namespace: mimir
spec:
  ingressClassName: nginx
  rules:
  - host: observability.golem.gaws.gigantic.io
    http:
      paths:
      - backend:
          service:
            name: mimir-gateway
            port:
              number: 80
        path: /prometheus/api/v1/labels
        pathType: ImplementationSpecific
      - backend:
          service:
            name: mimir-gateway
            port:
              number: 80
        path: /prometheus/api/v1/label
        pathType: ImplementationSpecific
      - backend:
          service:
            name: mimir-gateway
            port:
              number: 80
        path: /prometheus/api/v1/rules
        pathType: ImplementationSpecific
      - backend:
          service:
            name: mimir-gateway
            port:
              number: 80
        path: /prometheus/api/v1/query
        pathType: ImplementationSpecific
      - backend:
          service:
            name: mimir-gateway
            port:
              number: 80
        path: /prometheus/api/v1/query_exemplars
        pathType: ImplementationSpecific
      - backend:
          service:
            name: mimir-gateway
            port:
              number: 80
        path: /prometheus/api/v1/status
        pathType: ImplementationSpecific
      - backend:
          service:
            name: mimir-gateway
            port:
              number: 80
        path: /prometheus/api/v1/metadata
        pathType: ImplementationSpecific

  tls:
  - hosts:
    - observability.golem.gaws.gigantic.io
    secretName: observability-platform-endpoint-ingress-cert

So each path is hard-coded as discussed in the investigation issue.
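Since every entry only varies in its path, the repetitive blocks above could be generated rather than copy-pasted. A minimal Python sketch (the `path_entries` helper is hypothetical; service names and API paths are the ones from the Mimir ingress above):

```python
def path_entries(service: str, port: int, api_paths: list[str]) -> list[dict]:
    """Build the `spec.rules[].http.paths` entries for one backend service."""
    return [
        {
            "backend": {"service": {"name": service, "port": {"number": port}}},
            "path": p,
            "pathType": "ImplementationSpecific",
        }
        for p in api_paths
    ]

mimir_paths = path_entries(
    "mimir-gateway", 80,
    [f"/prometheus/api/v1/{suffix}" for suffix in
     ["labels", "label", "rules", "query", "query_exemplars", "status", "metadata"]],
)
print(len(mimir_paths))  # -> 7
```

Dumping that structure with a YAML serializer would reproduce the hard-coded list, which keeps the allow-listed endpoints in one place if they change later.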

With this setup, I'm still having exactly the same issues described in my previous comment though.

@QuantumEnigmaa

After pairing with @QuentinBisson, we discovered that the metrics requests were failing because of the X-Scope-OrgID header the datasource adds to the requests. I initially set giantswarm as the value, but switching it to anonymous solved the issue.

Moreover, we finally managed to make the PoC work with authentication using dex on golem. Here are the steps we needed to go through:

  • Deploy ingress-nginx app in the WC
  • Deploy a user-values in the MC for the WC ingress-nginx app to set its baseDomain to the WC one
  • Deploy a service + ingress with tls for grafana in the WC
  • Set the WC grafana config so that users accessing this grafana are redirected to golem's dex for authentication.
  • Log into the WC grafana and create Loki and Mimir datasources using the X-Scope-OrgID header (with the giantswarm value for Loki and the anonymous one for Mimir), and enable the Forward OAuth Identity option
  • Set the 2 observability-platform-endpoint ingresses in the MC with the following annotation: nginx.ingress.kubernetes.io/auth-url: https://dex.golem.gaws.gigantic.io/auth (so using dex directly and not going through oauth2-proxy)
  • Deploy a dex-user-values in the MC for dex so that the grafana static client and service redirect to the WC grafana instead of the MC one.
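The datasource step above can also be expressed as a Grafana provisioning file instead of clicking through the UI. A sketch under this PoC's assumptions (URLs derived from the ingress host above; `httpHeaderName1`/`httpHeaderValue1` and `oauthPassThru` are Grafana's standard provisioning fields for a custom header and the Forward OAuth Identity toggle):

```yaml
apiVersion: 1
datasources:
  - name: Mimir (MC)
    type: prometheus
    url: https://observability.golem.gaws.gigantic.io/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID
      oauthPassThru: true   # "Forward OAuth Identity"
    secureJsonData:
      httpHeaderValue1: anonymous
  - name: Loki (MC)
    type: loki
    url: https://observability.golem.gaws.gigantic.io
    jsonData:
      httpHeaderName1: X-Scope-OrgID
      oauthPassThru: true
    secureJsonData:
      httpHeaderValue1: giantswarm
```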

In the end, everything is working fine: one can log into the WC grafana and query observability data (whether logs or metrics) from the MC.
