roachtest: kv/splits/nodes=3/quiesce=false/lease=leader failed #138118

Open
cockroach-teamcity opened this issue Dec 31, 2024 · 2 comments
Labels
A-leader-leases Related to the introduction of leader leases branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). GA-blocker O-roachtest O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-kv KV Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Dec 31, 2024

roachtest.kv/splits/nodes=3/quiesce=false/lease=leader failed with artifacts on master @ 47699f3887ad5d1b8c7c5905eb5c49628aa59bbe:

(cluster.go:2481).Run: full command output in run_075326.211381454_n4_cockroach-workload-r.log: COMMAND_PROBLEM: exit status 1
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/kv/splits/nodes=3/quiesce=false/lease=leader/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-45903

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team labels Dec 31, 2024
@cockroach-teamcity
Member Author

roachtest.kv/splits/nodes=3/quiesce=false/lease=leader failed with artifacts on master @ 47699f3887ad5d1b8c7c5905eb5c49628aa59bbe:

(cluster.go:2481).Run: full command output in run_081944.177141830_n4_cockroach-workload-r.log: COMMAND_PROBLEM: exit status 1
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/kv/splits/nodes=3/quiesce=false/lease=leader/cpu_arch=arm64/run_1

Parameters:

  • arch=arm64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • runtimeAssertionsBuild=false
  • ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

@miraradeva
Contributor

Nodes definitely became overloaded in both failures above.

In the first failure, overload started around 08:03 when the CPU hit 96-98% utilization. The number of ranges reached 79k (80k is the stopping point for the test).

In the second failure, the CPU peaked at 93-95%, but the number of ranges reached only 62k.

We could lower the expected number of splits for this test, but I'd rather track the point at which it fails and watch for improvements from the LeadSupportUntil optimization (which Ibrahim is working on) and from store-liveness-informed Raft quiescence of followers (#133885).

@miraradeva miraradeva self-assigned this Jan 2, 2025
@miraradeva miraradeva added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker P-2 Issues/test failures with a fix SLA of 3 months A-leader-leases Related to the introduction of leader leases and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jan 2, 2025