Database entry of table LAYER_DRBD_VOLUMES could not be restored #470

@hkraal

I forgot to disable a LINSTOR schedule, which resulted in a large number of snapshots. After I removed ~1048 snapshot backups, the database seems to have become corrupted.

Although the schedule was configured to keep only a few copies locally, I ended up with more than 1000 snapshots and S3 objects.

LINSTOR ==> schedule list
╭───────────────────────────────────────────────────────────────────────╮
┊ Name  ┊ Full      ┊ Incremental  ┊ KeepLocal ┊ KeepRemote ┊ OnFailure ┊
╞═══════════════════════════════════════════════════════════════════════╡
┊ Daily ┊ 2 * * * * ┊ 2/15 * * * * ┊ 2         ┊ 4          ┊ RETRY(2)  ┊
╰───────────────────────────────────────────────────────────────────────╯

A partial list of the snapshots: partial_snapshot_list.txt

Removing the backups from S3 (linstor backup delete all s3) worked in the sense that the files are gone, but the command errored out with the following:

root@linstor-controller-6c455fb579-bbk62:/# linstor backup delete all s3
ERROR:
    attempt to replace an active transMgr
Show reports:
    linstor error-reports show 692965E4-00000-000909

Report: report_692965E4-00000-000909.txt

At this point the snapshots were stuck in a deleting state. Removing them through the Web UI or the linstor CLI yielded similar "attempt to replace an active transMgr" errors. I then restarted the controller, which failed to start with the error in the title:

linstor-controller time="2025-12-08T15:32:25Z" level=info msg="running k8s-await-election" version=refs/tags/v0.4.1
linstor-controller time="2025-12-08T15:32:25Z" level=info msg="no status endpoint specified, will not be created"
linstor-controller I1208 15:32:25.887402       1 leaderelection.go:250] attempting to acquire leader lease piraeus-datastore/linstor-controller...
linstor-controller I1208 15:32:25.903707       1 leaderelection.go:260] successfully acquired lease piraeus-datastore/linstor-controller
linstor-controller time="2025-12-08T15:32:25Z" level=info msg="long live our new leader: 'linstor-controller-5fff475694-9ln29'!"
linstor-controller time="2025-12-08T15:32:25Z" level=info msg="starting command '/usr/bin/piraeus-entry.sh' with arguments: '[startController]'"
linstor-controller LINSTOR, Module Controller
linstor-controller Version:            1.32.1 (e04f98efc3aeb643cf109ffd322a4f2506000da1)
linstor-controller Build time:         2025-09-16T09:03:12+00:00 Log v2
linstor-controller Java Version:       17
linstor-controller Java VM:            Debian, Version 17.0.16+8-Debian-1deb12u1
linstor-controller Operating system:   Linux, Version 6.12.57-talos
linstor-controller Environment:        amd64, 4 processors, 8192 MiB memory reserved for allocations
linstor-controller 
linstor-controller 
linstor-controller System components initialization in progress
linstor-controller 
linstor-controller Loading configuration file "/etc/linstor/linstor.toml"
linstor-controller 2025-12-08 15:32:26.988 [main] INFO  LINSTOR/Controller/ffffff SYSTEM - ErrorReporter DB version 1 found.
linstor-controller 2025-12-08 15:32:26.991 [main] INFO  LINSTOR/Controller/ffffff SYSTEM - Log directory set to: '/var/log/linstor-controller'
linstor-controller 2025-12-08 15:32:27.032 [main] INFO  LINSTOR/Controller/ffffff SYSTEM - Database type is Kubernetes-CRD
linstor-controller 2025-12-08 15:32:27.033 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Loading API classes started.
linstor-controller 2025-12-08 15:32:27.482 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - API classes loading finished: 449ms
linstor-controller 2025-12-08 15:32:27.482 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Dependency injection started.
linstor-controller 2025-12-08 15:32:27.499 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule"
linstor-controller 2025-12-08 15:32:27.500 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule" is not installed
linstor-controller 2025-12-08 15:32:27.500 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule"
linstor-controller 2025-12-08 15:32:27.515 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule" was successful
linstor-controller 2025-12-08 15:32:27.515 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule"
linstor-controller 2025-12-08 15:32:27.516 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule" was successful
linstor-controller 2025-12-08 15:32:28.573 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Dependency injection finished: 1090ms
linstor-controller 2025-12-08 15:32:28.574 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Cryptography provider: Using default cryptography module
linstor-controller 2025-12-08 15:32:28.928 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Initializing authentication subsystem
linstor-controller 2025-12-08 15:32:29.265 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - SpaceTrackingService: Instance added as a system service
linstor-controller 2025-12-08 15:32:29.266 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
linstor-controller 2025-12-08 15:32:29.267 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Initializing the k8s crd database connector
linstor-controller 2025-12-08 15:32:29.267 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Kubernetes-CRD connection URL is "k8s"
linstor-controller 2025-12-08 15:32:31.462 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Starting service instance 'K8sCrdDatabaseService' of type K8sCrdDatabaseService
linstor-controller 2025-12-08 15:32:31.473 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Security objects load from database is in progress
linstor-controller 2025-12-08 15:32:31.928 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Security objects load from database completed
linstor-controller 2025-12-08 15:32:31.928 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Core objects load from database is in progress
linstor-controller 2025-12-08 15:32:34.918 [Main] ERROR LINSTOR/Controller/ffffff SYSTEM - Database entry of table LAYER_DRBD_VOLUMES could not be restored. [Report number 6936EF8A-00000-000000]
linstor-controller 
linstor-controller 2025-12-08 15:32:34.922 [Main] ERROR LINSTOR/Controller/ffffff SYSTEM - Unhandled exception [Report number 6936EF8A-00000-000001]
linstor-controller 
linstor-controller 2025-12-08 15:32:34.950 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Shutdown in progress
linstor-controller 2025-12-08 15:32:34.950 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Shutting down service instance 'EbsStatusPoll' of type EbsStatusPoll
linstor-controller 2025-12-08 15:32:34.950 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Shutting down service instance 'ScheduleBackupService' of type ScheduleBackupService
linstor-controller 2025-12-08 15:32:34.950 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Shutting down service instance 'SpaceTrackingService' of type SpaceTrackingService
linstor-controller 2025-12-08 15:32:34.951 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Shutting down service instance 'TaskScheduleService' of type TaskScheduleService
linstor-controller 2025-12-08 15:32:34.951 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Shutting down service instance 'K8sCrdDatabaseService' of type K8sCrdDatabaseService
linstor-controller 2025-12-08 15:32:34.968 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Shutting down service instance 'TimerEventService' of type TimerEventService
linstor-controller 2025-12-08 15:32:34.969 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Waiting for service instance 'EbsStatusPoll' to complete shutdown
linstor-controller 2025-12-08 15:32:34.969 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Waiting for service instance 'ScheduleBackupService' to complete shutdown
linstor-controller 2025-12-08 15:32:34.969 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Waiting for service instance 'SpaceTrackingService' to complete shutdown
linstor-controller 2025-12-08 15:32:34.969 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Waiting for service instance 'TaskScheduleService' to complete shutdown
linstor-controller 2025-12-08 15:32:34.969 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Waiting for service instance 'K8sCrdDatabaseService' to complete shutdown
linstor-controller 2025-12-08 15:32:34.970 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Waiting for service instance 'TimerEventService' to complete shutdown
linstor-controller 2025-12-08 15:32:34.970 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Shutdown complete
linstor-controller time="2025-12-08T15:32:35Z" level=fatal msg="failed to run" err="exit status 20"
stream closed: EOF for piraeus-datastore/linstor-controller-5fff475694-9ln29 (linstor-controller)
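
For anyone triaging from the log above, here is a minimal Python sketch that pulls out the ERROR lines and their report numbers. It is not part of LINSTOR; the names `LINE_RE`, `extract_errors` and `SAMPLE` are hypothetical, and the pattern is only inferred from the lines in this paste. It also tolerates a stray `[` before the closing bracket of the thread name, which can appear in pasted output.

```python
import re

# Pattern inferred from the controller log lines in this issue, e.g.
#   2025-12-08 15:32:34.918 [Main] ERROR LINSTOR/Controller/ffffff SYSTEM - <msg> [Report number <id>]
# This is an assumption about the format, not a documented LINSTOR contract.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>[^\[\]\s]+)\[?\] "          # thread name, optional stray '['
    r"(?P<level>[A-Z]+)\s+\S+ \S+ - "          # level, module path, context
    r"(?P<msg>.*?)"                            # message text
    r"(?: \[Report number (?P<report>[0-9A-F]+-\d+-\d+)\])?$"
)


def extract_errors(log_text):
    """Return one dict per ERROR line: timestamp, thread, message, report id."""
    errors = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line.rstrip())
        if match and match.group("level") == "ERROR":
            errors.append(
                {
                    "timestamp": match.group("ts"),
                    "thread": match.group("thread"),
                    "message": match.group("msg"),
                    "report": match.group("report"),  # None if no report number
                }
            )
    return errors


# Abridged sample taken from the log in this issue.
SAMPLE = """\
linstor-controller 2025-12-08 15:32:31.928 [Main] INFO  LINSTOR/Controller/ffffff SYSTEM - Core objects load from database is in progress
linstor-controller 2025-12-08 15:32:34.918 [Main] ERROR LINSTOR/Controller/ffffff SYSTEM - Database entry of table LAYER_DRBD_VOLUMES could not be restored. [Report number 6936EF8A-00000-000000]
linstor-controller 2025-12-08 15:32:34.922 [Main] ERROR LINSTOR/Controller/ffffff SYSTEM - Unhandled exception [Report number 6936EF8A-00000-000001]
linstor-controller 2025-12-08 15:32:34.950 [Thread-2] INFO  LINSTOR/Controller/a85651 SYSTEM - Shutdown in progress
"""

if __name__ == "__main__":
    for err in extract_errors(SAMPLE):
        print(err["timestamp"], err["report"], err["message"])
```

The report numbers collected this way are the identifiers that `linstor error-reports show` takes, as in the earlier output.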

I have complete logs stored in Loki (for roughly 14 more days), so let me know if more information is required. I've left the (non-production) cluster in the same state, so debugging it is an option.
