ObjProt (/resourcedefinitions/PVC-2ABC1180-FAEB-4B82-B35F-7B7F1FBF6B09) not found! #433
Comments
Unfortunately, there is no way to determine how this database ended up with missing entries. I am attaching a list of JSON objects; after I created them in my local setup, the controller booted again. This is where I usually remind people to create a backup before applying changes to the database, but since your 1.tgz "backup" is still attached to the first post, that should be fine. Created objects:
{
"apiVersion": "internal.linstor.linbit.com/v1-15-0",
"kind": "SecObjectProtection",
"metadata": {
"creationTimestamp": "2025-01-27T11:43:22Z",
"generation": 1,
"name": "6cfea00753c6da3770751bc23d624fd58909ee98a9bba647e2eb08eab1a9d6bc",
"resourceVersion": "24548",
"uid": "5e1ddac4-fab4-4c3e-ba01-3e4ab9af374b"
},
"spec": {
"creator_identity_name": "PUBLIC",
"object_path": "/resourcedefinitions/PVC-2ABC1180-FAEB-4B82-B35F-7B7F1FBF6B09",
"owner_role_name": "PUBLIC",
"security_type_name": "PUBLIC"
}
}
{
"apiVersion": "internal.linstor.linbit.com/v1-15-0",
"kind": "ResourceDefinitions",
"metadata": {
"creationTimestamp": "2025-01-27T11:43:14Z",
"generation": 1,
"name": "24edeb9e11a204afbf7fb5fc4eda4f1fc556a654644602021f6b79060e471117",
"resourceVersion": "23845",
"uid": "0946ade6-460d-48ea-848a-f4d3369d79f2"
},
"spec": {
"layer_stack": "[\"DRBD\",\"STORAGE\"]",
"resource_dsp_name": "pvc-c7b1550a-c903-45cf-9d4b-662febef2334",
"resource_flags": 1,
"resource_group_name": "SC-414F8532-9472-5CF5-9521-5012BA2ABF74",
"resource_name": "PVC-C7B1550A-C903-45CF-9D4B-662FEBEF2334",
"snapshot_dsp_name": "snapshot-07b6f5d9-bbd8-4db4-a7e0-eb6d2926edc2",
"snapshot_name": "SNAPSHOT-07B6F5D9-BBD8-4DB4-A7E0-EB6D2926EDC2",
"uuid": "d085cc23-5b80-4ee8-93d0-667f5d6afbf3"
}
}
{
"apiVersion": "internal.linstor.linbit.com/v1-15-0",
"kind": "ResourceDefinitions",
"metadata": {
"creationTimestamp": "2025-01-27T11:43:14Z",
"generation": 1,
"name": "29014b7486c01f50e615b6ab438efa1f19a9cbada5077d2a53720284f2c7f429",
"resourceVersion": "23842",
"uid": "922b83c1-6e5d-4b8b-832b-32459fa851ef"
},
"spec": {
"layer_stack": "[\"DRBD\",\"STORAGE\"]",
"resource_dsp_name": "pvc-c7b1550a-c903-45cf-9d4b-662febef2334",
"resource_flags": 1,
"resource_group_name": "SC-414F8532-9472-5CF5-9521-5012BA2ABF74",
"resource_name": "PVC-C7B1550A-C903-45CF-9D4B-662FEBEF2334",
"snapshot_dsp_name": "snapshot-f61cda0f-8754-4e92-98e9-15075a605c4d",
"snapshot_name": "SNAPSHOT-F61CDA0F-8754-4E92-98E9-15075A605C4D",
"uuid": "6fae6ae1-2efe-4898-821d-c85b061b0496"
}
}
{
"apiVersion": "internal.linstor.linbit.com/v1-15-0",
"kind": "ResourceDefinitions",
"metadata": {
"creationTimestamp": "2025-01-27T11:43:14Z",
"generation": 1,
"name": "7f921867f1c7692ddde77bd4399a2a7871fdc5506e076baa3ef109484d73b4ef",
"resourceVersion": "23843",
"uid": "b6cc4b76-925a-4808-b32a-9eb0a91fc8af"
},
"spec": {
"layer_stack": "[\"DRBD\",\"STORAGE\"]",
"resource_dsp_name": "pvc-c7b1550a-c903-45cf-9d4b-662febef2334",
"resource_flags": 1,
"resource_group_name": "SC-414F8532-9472-5CF5-9521-5012BA2ABF74",
"resource_name": "PVC-C7B1550A-C903-45CF-9D4B-662FEBEF2334",
"snapshot_dsp_name": "snapshot-971a9943-056c-46f4-ae84-448184492752",
"snapshot_name": "SNAPSHOT-971A9943-056C-46F4-AE84-448184492752",
"uuid": "11cf8abb-84f8-4e1a-8401-e3bb2fdef70c"
}
}
{
"apiVersion": "internal.linstor.linbit.com/v1-15-0",
"kind": "ResourceDefinitions",
"metadata": {
"creationTimestamp": "2025-01-27T11:43:14Z",
"generation": 1,
"name": "98191392caf4fdb33751dd09e0bd2357c6128f4ed1be213e01db3982e2797908",
"resourceVersion": "23846",
"uid": "43b65333-82c1-4c14-b990-1571079240b9"
},
"spec": {
"layer_stack": "[\"DRBD\",\"STORAGE\"]",
"resource_dsp_name": "pvc-c7b1550a-c903-45cf-9d4b-662febef2334",
"resource_flags": 1,
"resource_group_name": "SC-414F8532-9472-5CF5-9521-5012BA2ABF74",
"resource_name": "PVC-C7B1550A-C903-45CF-9D4B-662FEBEF2334",
"snapshot_dsp_name": "snapshot-58d7257c-689a-4a5f-a06e-406ec682154b",
"snapshot_name": "SNAPSHOT-58D7257C-689A-4A5F-A06E-406EC682154B",
"uuid": "2e8a6e7a-81ee-4613-9a37-b9cab34689c7"
}
}
{
"apiVersion": "internal.linstor.linbit.com/v1-15-0",
"kind": "ResourceDefinitions",
"metadata": {
"creationTimestamp": "2025-01-27T11:43:14Z",
"generation": 1,
"name": "b0b4cf509d6b1befeb682c0f755b06625c30fc634ea2201e897a829dc540fed2",
"resourceVersion": "23844",
"uid": "e46ff53d-4902-4ce6-a130-dae4ae0a65fc"
},
"spec": {
"layer_stack": "[\"DRBD\",\"STORAGE\"]",
"resource_dsp_name": "pvc-c7b1550a-c903-45cf-9d4b-662febef2334",
"resource_flags": 1,
"resource_group_name": "SC-414F8532-9472-5CF5-9521-5012BA2ABF74",
"resource_name": "PVC-C7B1550A-C903-45CF-9D4B-662FEBEF2334",
"snapshot_dsp_name": "snapshot-feb02af6-7b64-46cd-9bce-f0f806bece09",
"snapshot_name": "SNAPSHOT-FEB02AF6-7B64-46CD-9BCE-F0F806BECE09",
"uuid": "3627c859-0430-4bdc-b38b-120570d83f87"
}
}
{
"apiVersion": "internal.linstor.linbit.com/v1-15-0",
"kind": "ResourceDefinitions",
"metadata": {
"creationTimestamp": "2025-01-27T11:43:14Z",
"generation": 1,
"name": "c7263fcf19f10fbc5a9ffa37864110dd72b11c6b2e757a714eae2f1626a2eada",
"resourceVersion": "23841",
"uid": "b499006e-d285-42cb-863d-a2d884856959"
},
"spec": {
"layer_stack": "[\"DRBD\",\"STORAGE\"]",
"resource_dsp_name": "pvc-c7b1550a-c903-45cf-9d4b-662febef2334",
"resource_flags": 1,
"resource_group_name": "SC-414F8532-9472-5CF5-9521-5012BA2ABF74",
"resource_name": "PVC-C7B1550A-C903-45CF-9D4B-662FEBEF2334",
"snapshot_dsp_name": "snapshot-51663312-2043-4410-8e5c-8c787e9bafc0",
"snapshot_name": "SNAPSHOT-51663312-2043-4410-8E5C-8C787E9BAFC0",
"uuid": "d249f6c5-32c6-49ba-babf-3098c85417b2"
}
}
Let me know if this helped. |
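For anyone wanting to replay a list like the one above, here is a minimal Python sketch. It assumes the objects are saved as one text file of concatenated JSON objects, and it assumes (this is not stated in the thread) that server-populated metadata such as `resourceVersion` and `uid` has to be removed before the objects can be created elsewhere. The helper names `split_objects` and `strip_server_fields` are hypothetical.

```python
import json

# Fields the API server fills in; assumed (not confirmed in this thread) to
# need removal before the objects can be created in another cluster.
SERVER_FIELDS = ("creationTimestamp", "generation", "resourceVersion", "uid")


def split_objects(text):
    """Yield each top-level JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it manually.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def strip_server_fields(obj):
    """Return a copy of obj without the server-populated metadata fields."""
    metadata = {
        key: value
        for key, value in obj.get("metadata", {}).items()
        if key not in SERVER_FIELDS
    }
    return {**obj, "metadata": metadata}
```

Each cleaned object could then be written to its own file with `json.dump` and created with `kubectl create -f <file>`, after taking a backup as recommended above.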
Thank you, I have already solved this by removing the defective resources from the DB. |
@ghernadi, have you considered some kind of fsck tool for the database? |
I am not sure such a tool would really help. I do see the point that it might be able to repair a database like the one in your case, but if other entries had been missing, no tool could reconstruct the missing data. You are correct that this is not the first case of such problems but, although I am not 100% sure about this, I believe such issues only occur in K8s setups. Just to be clear, I do not want to blame K8s and call it a day. Rather, the issue might be somewhere in LINSTOR's K8s driver, in LINSTOR's usage of multiple versions of CRD schemas, in LINSTOR's rollback mechanism, or in something else within LINSTOR that is related to K8s. The tool you are describing would only fix the symptoms, and I would rather fix the root problem. |
Maybe @kvaps meant just a checker that verifies the current data integrity, so you could more easily spot when things go wrong. |
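A very rough Python sketch of what such a checker could look like, limited to the one rule visible in this issue: a resource definition named `X` appears to need a `SecObjectProtection` entry whose `object_path` is `/resourcedefinitions/X`. That rule is inferred from the error message in the title, not from LINSTOR's source. Skipping entries that carry a `snapshot_name` is a further assumption, and `missing_object_protections` is a hypothetical name.

```python
def missing_object_protections(objects):
    """Return object paths that resource definitions need but no ObjProt provides."""
    present = {
        obj["spec"]["object_path"]
        for obj in objects
        if obj.get("kind") == "SecObjectProtection"
    }
    expected = {
        "/resourcedefinitions/" + obj["spec"]["resource_name"]
        for obj in objects
        if obj.get("kind") == "ResourceDefinitions"
        # Assumption: entries with a snapshot_name describe snapshots and are
        # not expected to have a /resourcedefinitions/ path of their own.
        and not obj["spec"].get("snapshot_name")
    }
    return sorted(expected - present)
```

The input would be the parsed objects from a database dump. A non-empty result only flags candidates for a closer look and says nothing about how the entries went missing.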
I meant a tool that would help fix the database in some way, just to make it functional. If no repair is possible, the tool should remove all defective resources to get the controller running again.
A checker might be useful for setting up alerting, but I still have no idea what to do with such alerts 🙂
LINSTOR does not use Kubernetes in a very standard way. In fact, it creates and modifies many resources at once. |
Unfortunately, I don’t share the view that these issues are too rare to diagnose. We’ve encountered at least five separate incidents in the past 12 months where the LINSTOR database ended up in a corrupted state. Some of these are documented in #415 and this forum post. In fact, one of our clusters is currently down because the controller fails to start due to a corrupted state that we don't know how to resolve. On previous occasions, the error logs gave me some pointers on which records to delete from the database, and I repeated that process until the controller eventually booted again. I agree that addressing the root cause is ultimately the best approach. However, even a single corrupted resource can bring down the entire system, causing major downtime. A tool to restore the database to a functional state would at least keep us operational while the root cause is investigated in the background. Following the Piraeus operator’s default, we use Kubernetes as the backing datastore. Having faced multiple corruptions, I’ve often wondered whether using an external PostgreSQL database would have yielded a different outcome, especially based on #338 (comment). I was told that LINSTOR builds its own basic transaction support, but how does that compare with a backing PostgreSQL datastore? |
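As an illustration of the log-driven cleanup described above, the following Python sketch pulls the missing object paths out of controller log text. The message format is taken from this issue's title only, and other LINSTOR errors may look different. `missing_paths` is a hypothetical helper, and deciding which records to delete remains a manual step.

```python
import re

# Pattern copied from this issue's title; other messages may differ.
OBJPROT_NOT_FOUND = re.compile(r"ObjProt \((?P<path>[^)]+)\) not found!")


def missing_paths(log_text):
    """Return the distinct object paths reported as missing, in first-seen order."""
    seen = []
    for match in OBJPROT_NOT_FOUND.finditer(log_text):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen
```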
LINSTOR controller 1.29.2.
I created a few PVCs using the snapshot mechanism in Kubernetes; after that, linstor-controller stopped booting.
I'm attaching a DB dump:
1.tgz