This might be a consequence of #38 or an entirely different issue, but the MLflow Sharing Hub does not work after deployment.
Requests to it return HTTP error 502 Bad Gateway.
Logs:
$ kubectl logs -n sharinghub mlflow-sharinghub-7b56cc8f74-twf44
2024/11/25 14:10:20 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 0.1 seconds
2024/11/25 14:10:20 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 0.3 seconds
2024/11/25 14:10:20 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 0.7 seconds
2024/11/25 14:10:21 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 1.5 seconds
2024/11/25 14:10:22 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 3.1 seconds
2024/11/25 14:10:25 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 6.3 seconds
2024/11/25 14:10:32 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 12.7 seconds
2024/11/25 14:10:44 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 25.5 seconds
2024/11/25 14:11:10 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Operation will be retried in 51.1 seconds
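One way to narrow down the `unable to open database file` error is to check whether the process can create a SQLite file in the data directory at all. The following is only a minimal sketch: the helper `check_sqlite_dir` and the probe file name are hypothetical, and it assumes the database is meant to live under the `/home/mlflow/data` mount listed in the pod description (the logs do not show the actual database path).

```python
import os
import sqlite3


def check_sqlite_dir(directory: str, filename: str = ".sqlite_probe.db") -> str:
    """Return "ok" if a SQLite file can be created in `directory`, else a short reason.

    Hypothetical diagnostic helper, not part of MLflow or SharingHub.
    """
    if not os.path.isdir(directory):
        return "missing directory"
    if not os.access(directory, os.W_OK | os.X_OK):
        return "directory not writable"

    path = os.path.join(directory, filename)
    try:
        # sqlite3 creates the file lazily, so force a write to be sure.
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS probe (id INTEGER)")
            conn.commit()
        finally:
            conn.close()
    except sqlite3.OperationalError as exc:
        return f"sqlite error: {exc}"
    finally:
        # Clean up the probe file so the check leaves no trace.
        if os.path.exists(path):
            os.remove(path)
    return "ok"
```

Run inside the container as the same user as the server, for example `print(check_sqlite_dir("/home/mlflow/data"))`. Any result other than `ok` would point at the volume mount or its permissions rather than at MLflow itself.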
Describe:
$ kubectl describe -n sharinghub pod mlflow-sharinghub-7b56cc8f74-twf44
Name: mlflow-sharinghub-7b56cc8f74-twf44
Namespace: sharinghub
Priority: 0
Service Account: mlflow-sharinghub
Node: eoepcak2/10.28.9.16
Start Time: Mon, 25 Nov 2024 14:04:24 +0000
Labels: app.kubernetes.io/instance=mlflow-sharinghub
app.kubernetes.io/name=mlflow-sharinghub
pod-template-hash=7b56cc8f74
Annotations: cni.projectcalico.org/containerID: 1b9bd0346d9007d4db6665646799ec908ea22d5636286d0ff51cef8174f66c77
cni.projectcalico.org/podIP: 10.42.127.141/32
cni.projectcalico.org/podIPs: 10.42.127.141/32
Status: Running
IP: 10.42.127.141
IPs:
IP: 10.42.127.141
Controlled By: ReplicaSet/mlflow-sharinghub-7b56cc8f74
Containers:
mlflow-sharinghub:
Container ID: containerd://c404a8caf95850903f8a318c44b0c3f35a1f7a374704116985a12dda6b25f05a
Image: eoepca/mlflow-sharinghub:0.2.0
Image ID: docker.io/eoepca/mlflow-sharinghub@sha256:19a96343f933203312308ae5c1b0b5632555c2bf55fff730fe3e96c2f7ab52db
Port: 5000/TCP
Host Port: 0/TCP
Args:
mlflow
server
--app-name
sharinghub
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 25 Nov 2024 14:10:18 +0000
Finished: Mon, 25 Nov 2024 14:12:01 +0000
Ready: False
Restart Count: 3
Environment:
SECRET_KEY: <set to the key 'secret-key' in secret 'mlflow-sharinghub'> Optional: false
MLFLOW_HOST: 0.0.0.0
MLFLOW_PORT: 5000
MLFLOW_WORKERS: 4
MLFLOW_ARTIFACTS_DESTINATION: s3://mlops-bucket
MLFLOW_S3_ENDPOINT_URL: https://minio.eoepca-sciencehub.esa.int
AWS_ACCESS_KEY_ID: <set to the key 'access-key-id' in secret 'mlflow-sharinghub-s3'> Optional: false
AWS_SECRET_ACCESS_KEY: <set to the key 'secret-access-key' in secret 'mlflow-sharinghub-s3'> Optional: false
LOGIN_AUTO_REDIRECT: false
SHARINGHUB_URL: https://sharinghub.eoepca-sciencehub.esa.int
SHARINGHUB_STAC_COLLECTION: ai-model
SHARINGHUB_AUTH_DEFAULT_TOKEN: false
Mounts:
/home/mlflow/data from store (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wjj9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
store:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mlflow-sharinghub-store-pvc
ReadOnly: false
kube-api-access-8wjj9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m1s default-scheduler Successfully assigned sharinghub/mlflow-sharinghub-7b56cc8f74-twf44 to eoepcak2
Normal Pulled 2m7s (x4 over 8m) kubelet Container image "eoepca/mlflow-sharinghub:0.2.0" already present on machine
Normal Created 2m7s (x4 over 8m) kubelet Created container mlflow-sharinghub
Normal Started 2m7s (x4 over 8m) kubelet Started container mlflow-sharinghub
Warning BackOff 9s (x5 over 4m32s) kubelet Back-off restarting failed container mlflow-sharinghub in pod mlflow-sharinghub-7b56cc8f74-twf44_sharinghub(66386697-b95f-4d96-b97b-e4f83d243a32)