MLOps - MLflow SharingHub does not work #39

Open
blaziq opened this issue Nov 25, 2024 · 0 comments


blaziq commented Nov 25, 2024

This might be a consequence of #38 or an entirely different issue, but the MLflow SharingHub does not work after deployment.
Requests to it return HTTP error 502 Bad Gateway:
[image: 502 Bad Gateway error]

Logs:

$ kubectl logs -n sharinghub mlflow-sharinghub-7b56cc8f74-twf44
2024/11/25 14:10:20 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 0.1 seconds
2024/11/25 14:10:20 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 0.3 seconds
2024/11/25 14:10:20 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 0.7 seconds
2024/11/25 14:10:21 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 1.5 seconds
2024/11/25 14:10:22 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 3.1 seconds
2024/11/25 14:10:25 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 6.3 seconds
2024/11/25 14:10:32 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 12.7 seconds
2024/11/25 14:10:44 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 25.5 seconds
2024/11/25 14:11:10 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 51.1 seconds
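
To narrow down the `unable to open database file` error, here is a minimal sketch of a check that tries to open a SQLite file and reports the most likely reason when it fails. It assumes the backend store is a SQLite file somewhere under the mounted volume at `/home/mlflow/data`; the file name `mlflow.db` and the helper `can_open_sqlite` are hypothetical and only used for illustration. It would need to run as the same user and with the same volume as the container to be meaningful.

```python
import os
import sqlite3


def can_open_sqlite(db_path: str) -> tuple[bool, str]:
    """Try to open (creating if needed) a SQLite file and explain a failure."""
    parent = os.path.dirname(os.path.abspath(db_path))
    # SQLite reports "unable to open database file" when the parent
    # directory is missing or the process may not create files in it.
    if not os.path.isdir(parent):
        return False, f"parent directory does not exist: {parent}"
    if not os.access(parent, os.W_OK | os.X_OK):
        return False, f"parent directory is not writable: {parent}"
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("PRAGMA user_version")  # forces SQLite to touch the file
        conn.close()
    except sqlite3.OperationalError as exc:
        return False, f"sqlite3 error: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Hypothetical path: the directory comes from the pod's mount,
    # the file name is an assumption.
    print(can_open_sqlite("/home/mlflow/data/mlflow.db"))
```

If the check fails on the directory rather than on the file, the ownership or permissions of the volume behind `mlflow-sharinghub-store-pvc` would be the first thing to look at.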

Describe:

$ kubectl describe -n sharinghub  pod mlflow-sharinghub-7b56cc8f74-twf44
Name:             mlflow-sharinghub-7b56cc8f74-twf44
Namespace:        sharinghub
Priority:         0
Service Account:  mlflow-sharinghub
Node:             eoepcak2/10.28.9.16
Start Time:       Mon, 25 Nov 2024 14:04:24 +0000
Labels:           app.kubernetes.io/instance=mlflow-sharinghub
                  app.kubernetes.io/name=mlflow-sharinghub
                  pod-template-hash=7b56cc8f74
Annotations:      cni.projectcalico.org/containerID: 1b9bd0346d9007d4db6665646799ec908ea22d5636286d0ff51cef8174f66c77
                  cni.projectcalico.org/podIP: 10.42.127.141/32
                  cni.projectcalico.org/podIPs: 10.42.127.141/32
Status:           Running
IP:               10.42.127.141
IPs:
  IP:           10.42.127.141
Controlled By:  ReplicaSet/mlflow-sharinghub-7b56cc8f74
Containers:
  mlflow-sharinghub:
    Container ID:  containerd://c404a8caf95850903f8a318c44b0c3f35a1f7a374704116985a12dda6b25f05a
    Image:         eoepca/mlflow-sharinghub:0.2.0
    Image ID:      docker.io/eoepca/mlflow-sharinghub@sha256:19a96343f933203312308ae5c1b0b5632555c2bf55fff730fe3e96c2f7ab52db
    Port:          5000/TCP
    Host Port:     0/TCP
    Args:
      mlflow
      server
      --app-name
      sharinghub
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 25 Nov 2024 14:10:18 +0000
      Finished:     Mon, 25 Nov 2024 14:12:01 +0000
    Ready:          False
    Restart Count:  3
    Environment:
      SECRET_KEY:                     <set to the key 'secret-key' in secret 'mlflow-sharinghub'>  Optional: false
      MLFLOW_HOST:                    0.0.0.0
      MLFLOW_PORT:                    5000
      MLFLOW_WORKERS:                 4
      MLFLOW_ARTIFACTS_DESTINATION:   s3://mlops-bucket
      MLFLOW_S3_ENDPOINT_URL:         https://minio.eoepca-sciencehub.esa.int
      AWS_ACCESS_KEY_ID:              <set to the key 'access-key-id' in secret 'mlflow-sharinghub-s3'>      Optional: false
      AWS_SECRET_ACCESS_KEY:          <set to the key 'secret-access-key' in secret 'mlflow-sharinghub-s3'>  Optional: false
      LOGIN_AUTO_REDIRECT:            false
      SHARINGHUB_URL:                 https://sharinghub.eoepca-sciencehub.esa.int
      SHARINGHUB_STAC_COLLECTION:     ai-model
      SHARINGHUB_AUTH_DEFAULT_TOKEN:  false
    Mounts:
      /home/mlflow/data from store (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wjj9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  store:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mlflow-sharinghub-store-pvc
    ReadOnly:   false
  kube-api-access-8wjj9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  8m1s                default-scheduler  Successfully assigned sharinghub/mlflow-sharinghub-7b56cc8f74-twf44 to eoepcak2
  Normal   Pulled     2m7s (x4 over 8m)   kubelet            Container image "eoepca/mlflow-sharinghub:0.2.0" already present on machine
  Normal   Created    2m7s (x4 over 8m)   kubelet            Created container mlflow-sharinghub
  Normal   Started    2m7s (x4 over 8m)   kubelet            Started container mlflow-sharinghub
  Warning  BackOff    9s (x5 over 4m32s)  kubelet            Back-off restarting failed container mlflow-sharinghub in pod mlflow-sharinghub-7b56cc8f74-twf44_sharinghub(66386697-b95f-4d96-b97b-e4f83d243a32)
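
A side note on timing: the retry delays in the logs look like they follow `0.1 * (2**n - 1)` seconds for n = 1..9. That formula is inferred from the logged values only, not checked against the MLflow source. The sketch below sums those delays and compares the total with the container lifetime taken from the `Started` and `Finished` timestamps in the describe output; `retry_delays` is a hypothetical helper.

```python
from datetime import datetime


def retry_delays(retries: int, base: float = 0.1) -> list[float]:
    """Delays in seconds matching the pattern seen in the log: base * (2**n - 1)."""
    return [round(base * (2**n - 1), 1) for n in range(1, retries + 1)]


# Nine "Operation will be retried in ..." lines appear in the log.
delays = retry_delays(9)
total_wait = round(sum(delays), 1)

# Timestamps copied from "Last State" in the describe output.
started = datetime(2024, 11, 25, 14, 10, 18)
finished = datetime(2024, 11, 25, 14, 12, 1)
lifetime = (finished - started).total_seconds()

print(delays)
print(f"total retry wait: {total_wait} s, container lifetime: {lifetime} s")
```

The summed wait is close to the container lifetime, which would be consistent with the server exiting with code 1 once the retries run out, and the kubelet then restarting it (hence `CrashLoopBackOff` and the 502 from the gateway).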
