In a separate terminal, load etcd with data (2 MB) and run defrag.
for num in {1..20}; do
./bin/etcdctl put a `tr -dc A-Za-z0-9 </dev/urandom | head -c 100000`
done
./bin/etcdctl defrag
./bin/etcdctl put a 1
Expect defrag to fail because no disk space is left, and etcd to crash the next time it touches the backend. Sometimes an additional put call is needed to make etcd access the db and crash.
The cause is that defragmentation closes the backend db (bbolt) but does not reopen it when defragmentation fails, so etcdserver panics as soon as another job accesses the backend db.
The immediate solution that I can think of is to restore the environment (i.e. reopen the backend db) whenever defrag fails, and then either:
panic if restoring the environment fails, or
add protection when accessing the backend, i.e. return an error if the backend is closed.
It should be very easy to reproduce this issue by adding a failpoint similar to db.go#L490-L491. Note: do not panic in the failpoint; return an error instead.
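The proposal above can be illustrated with a minimal, standard-library-only sketch. It is not etcd's or bbolt's code: `store`, `defrag`, `put`, `failingCompact`, `errClosed`, and `errNoSpace` are hypothetical names chosen for illustration, and reopening is modelled as flipping a flag, so the "panic if the restore fails" branch is not shown. The idea is only that `defrag` reopens the store on every return path, and that accessors return an error instead of panicking when the store is closed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed is returned, instead of panicking, when the store is closed.
var errClosed = errors.New("backend is closed")

// errNoSpace simulates the disk-full failure from the reproduction steps.
var errNoSpace = errors.New("no space left on device")

// store is a hypothetical stand-in for the backend db; it is not etcd's type.
type store struct {
	mu   sync.Mutex
	open bool
	data map[string]string
}

func newStore() *store {
	return &store{open: true, data: map[string]string{}}
}

// close marks the store as closed; later accesses return errClosed.
func (s *store) close() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.open = false
}

// put guards the access: a closed store yields an error, not a panic.
func (s *store) put(k, v string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.open {
		return errClosed
	}
	s.data[k] = v
	return nil
}

// defrag closes the store, runs compact, and reopens the store on every
// return path, so a failed compaction does not leave the store closed.
func (s *store) defrag(compact func() error) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.open = false
	defer func() { s.open = true }() // restore on success and on failure
	if err := compact(); err != nil {
		return fmt.Errorf("defrag failed, store reopened: %w", err)
	}
	return nil
}

// failingCompact plays the role of a failpoint that returns an error
// rather than panicking.
func failingCompact() error { return errNoSpace }

func main() {
	s := newStore()
	fmt.Println("defrag:", s.defrag(failingCompact))
	fmt.Println("put after failed defrag:", s.put("a", "1"))
	s.close()
	fmt.Println("put on closed store:", s.put("a", "2"))
}
```

In a real backend the reopen step can itself fail, which is where the choice between panicking and returning an error to callers would have to be made.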
Bug report criteria
What happened?
Etcd crashes with stacktrace
What did you expect to happen?
Etcd should not crash, either:
How can we reproduce it (as minimally and precisely as possible)?
Run etcd with 64 MB of disk space (62 MB is used by the WAL).
Clean up the mount.
Anything else we need to know?
No response
Etcd version (please run commands below)
Reproduced on all latest branches.
Etcd configuration (command line flags or environment variables)
Just ensure that --data-dir points to a directory with limited disk space.
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output
No response