
Debug logging around Ingress PP node routing and partition refreshes #2224

Open: wants to merge 1 commit into base `main`


@pcholakov (Contributor) commented Nov 6, 2024

Adds trace-level logging in several places that may help when troubleshooting partition routing, and raises the log level to `warn` for error cases to highlight that nodes might be operating with stale information.
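To make the level choice concrete, here is a minimal stdlib-only Rust sketch, assuming a simple threshold filter of the kind most loggers apply. `Level`, `Logger`, and `route` are hypothetical stand-ins, not the `tracing` macros or the project's routing code: trace-level routing decisions are dropped under an `Info` threshold, while a `Warn` event passes it.

```rust
use std::collections::HashMap;

/// Severity levels ordered from most to least verbose (hypothetical stand-in,
/// not the `tracing` crate's `Level` type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
}

/// Minimal logger that keeps only events at or above `min_level`,
/// the way a level filter would.
pub struct Logger {
    pub min_level: Level,
    pub lines: Vec<String>,
}

impl Logger {
    pub fn new(min_level: Level) -> Self {
        Self { min_level, lines: Vec::new() }
    }

    /// Record `msg` as "<Level>: <msg>" if it passes the threshold.
    pub fn log(&mut self, level: Level, msg: &str) {
        if level >= self.min_level {
            self.lines.push(format!("{level:?}: {msg}"));
        }
    }
}

/// Hypothetical routing step: look up the node for a partition and record
/// the decision (or the miss) at trace level.
pub fn route(logger: &mut Logger, table: &HashMap<u16, u64>, partition: u16) -> Option<u64> {
    let node = table.get(&partition).copied();
    match node {
        Some(n) => logger.log(Level::Trace, &format!("routing partition {partition} to node {n}")),
        None => logger.log(Level::Trace, &format!("no known node for partition {partition}")),
    }
    node
}
```

Under an `Info` threshold, `route` records nothing, so trace output only helps once an operator raises verbosity; a `Warn` line, by contrast, is visible at the default level.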


github-actions bot commented Nov 6, 2024

Test Results

  7 files  ±0    7 suites  ±0   4m 22s ⏱️ -2s
 47 tests ±0   46 ✅ ±0  1 💤 ±0  0 ❌ ±0 
182 runs  ±0  179 ✅ ±0  3 💤 ±0  0 ❌ ±0 

Results for commit 9a1b4cb. ± Comparison against base commit 674501a.

♻️ This comment has been updated with latest results.

@pcholakov force-pushed the test/partition-routing-refresh-logging branch from a17f33a to f8a45d3 on November 15, 2024 17:50
@pcholakov marked this pull request as ready for review on November 15, 2024 17:52
@pcholakov force-pushed the test/partition-routing-refresh-logging branch from f8a45d3 to 9a1b4cb on November 15, 2024 17:56
```diff
 let result: Result<Option<SchedulingPlan>, _> =
     metadata_store_client.get(SCHEDULING_PLAN_KEY.clone()).await;

 let Ok(scheduling_plan) = result else {
-    debug!(
+    warn!(
```
Contributor

Not sure this should be `warn`, TBH, since it gets retried, right?

Contributor Author

Retried?

I think it's a bad situation that the operator should know about: if we can't fetch or refresh the plan, routing decisions will fail and things will look very odd.
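The trade-off in this thread can be sketched as follows, assuming (hypothetically) that a failed fetch leaves the previously cached plan in place and that a later attempt retries. `SchedulingPlan` here is a minimal stand-in with only a version field, and `apply_fetch` and `Refresh` are illustrative names, not the project's API.

```rust
/// Minimal stand-in for the real `SchedulingPlan`; only a version is modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct SchedulingPlan {
    pub version: u32,
}

/// Outcome of one refresh attempt (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Refresh {
    /// A plan was fetched and replaced the cached one.
    Updated,
    /// The store returned no plan; the cached plan is left untouched.
    Missing,
    /// The fetch failed; routing keeps using the (possibly stale) cached plan.
    KeptStale,
}

/// Apply one fetch result to the cached plan. On error the previous plan is
/// kept and a warning is recorded, because routing will continue on possibly
/// stale data until a later retry succeeds.
pub fn apply_fetch(
    cached: &mut Option<SchedulingPlan>,
    result: Result<Option<SchedulingPlan>, String>,
    warnings: &mut Vec<String>,
) -> Refresh {
    match result {
        Ok(Some(plan)) => {
            *cached = Some(plan);
            Refresh::Updated
        }
        Ok(None) => Refresh::Missing,
        Err(err) => {
            warnings.push(format!(
                "failed to fetch scheduling plan, keeping previous plan: {err}"
            ));
            Refresh::KeptStale
        }
    }
}
```

Because each failed attempt records a warning while routing continues on the old plan, the operator sees the problem even if a retry later succeeds; the cost is one warning per failed attempt for as long as the metadata store stays unreachable.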
