Add arch/version to cpu profiler #24658

Open
travisdowns wants to merge 6 commits into dev from td-add-arch-version-utils


travisdowns
Member

  • debug api: wrong ID for cpu_profile_shard_samples
  • remove unused headers
  • utils: add arch library
  • cpu_profiler: expose the sample period
  • admin: include more info in cpu profile sample

@travisdowns travisdowns requested a review from a team as a code owner December 24, 2024 19:31
@travisdowns travisdowns changed the title from "td add arch version utils" to "Add arch/version to cpu profiler" Dec 24, 2024
The id was set to cpu_profile_sample, but it should be cpu_profile_shard_samples.
A small header which exposes the architecture of the current process
as a constexpr value.

We will use this in the cpu_profiler in order to record the arch in
the output.

Includes a (very trivial) test.
Expose the profiler sample period so we can include it in the admin API
result. This is useful for estimating how busy the reactor was, as we can
calculate the utilization from the expected number of samples (at 100%
utilization) versus the observed number.
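
The utilization estimate described above can be sketched as follows. This is a minimal illustration, not the PR's code: the function name and parameters are hypothetical, and it assumes samples are only taken while the reactor is actually running, so a fully busy reactor would produce one sample per period.

```python
def estimate_utilization(
    observed_samples: int, window_ms: float, sample_period_ms: float
) -> float:
    """Estimate reactor utilization as observed / expected sample count.

    All names here are hypothetical; they are not taken from the PR.
    """
    if window_ms <= 0 or sample_period_ms <= 0:
        raise ValueError("window_ms and sample_period_ms must be positive")
    # Number of samples we would expect at 100% utilization.
    expected_samples = window_ms / sample_period_ms
    # Clamp to 1.0, since timer jitter may yield a few extra samples.
    return min(observed_samples / expected_samples, 1.0)


# 25 samples over a 1000 ms window with a 10 ms period: 25 / 100 = 0.25
utilization = estimate_utilization(25, window_ms=1000, sample_period_ms=10)
```

Without the sample period in the response, a consumer would have to know the profiler configuration out of band to compute the expected count.
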
Change the output format of the cpu profiler API to include:

 - The CPU architecture
 - The version string
 - The wait_ms, if specified
 - The profiler sample period
 - The schema version of the API response

The first two of the above enable us to symbolize profiles directly from
the result without needing to know the version/arch and download
symbols separately.
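
As a rough sketch of how a consumer (or an updated test) might handle the extended response: the key names and example values below are assumptions made for illustration, since the commit message lists the new pieces of information but not the actual JSON field names.

```python
from typing import Any, List, Mapping, Tuple

# Hypothetical key names; the real response schema may use different ones.
REQUIRED_FIELDS = ("schema_version", "arch", "version", "sample_period_ms")
OPTIONAL_FIELDS = ("wait_ms",)  # only present if a wait was specified


def missing_fields(response: Mapping[str, Any]) -> List[str]:
    """Return the required metadata keys absent from a profile response."""
    return [key for key in REQUIRED_FIELDS if key not in response]


def symbol_lookup_key(response: Mapping[str, Any]) -> Tuple[str, str]:
    """Return the (arch, version) pair needed to locate matching symbols."""
    return (response["arch"], response["version"])


# Placeholder values, not real output from the admin API.
example_response = {
    "schema_version": 2,
    "arch": "x86_64",
    "version": "v0.0.0-example",
    "wait_ms": 1000,
    "sample_period_ms": 100,
}
```

With the arch and version carried in the response itself, a symbolization tool can derive the lookup key from the profile alone rather than asking the user for it.
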
@travisdowns travisdowns force-pushed the td-add-arch-version-utils branch from 1b69f68 to d339977 Compare December 24, 2024 19:34
@vbotbuildovich
Collaborator

vbotbuildovich commented Dec 24, 2024

Retry command for Build#60132

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/cpu_profiler_admin_api_test.py::CPUProfilerAdminAPITest.test_get_cpu_profile
tests/rptest/tests/cpu_profiler_admin_api_test.py::CPUProfilerAdminAPITest.test_get_cpu_profile_with_override

Member

@StephanDollberg StephanDollberg left a comment

Looks good. Some tests will need updating.

Is there a reason you didn't update the memory profiler as well?

@vbotbuildovich
Collaborator

vbotbuildovich commented Dec 24, 2024

CI test results

test results on build#60132
| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| rptest.tests.cpu_profiler_admin_api_test.CPUProfilerAdminAPITest.test_get_cpu_profile | ducktape | https://buildkite.com/redpanda/redpanda/builds/60132#0193fa6c-8e82-428e-8e6c-4178b3a23dd5 | FAIL | 0/1 |
| rptest.tests.cpu_profiler_admin_api_test.CPUProfilerAdminAPITest.test_get_cpu_profile | ducktape | https://buildkite.com/redpanda/redpanda/builds/60132#0193fa87-d0ac-497b-8cd9-de602b6e2b07 | FAIL | 0/1 |
| rptest.tests.cpu_profiler_admin_api_test.CPUProfilerAdminAPITest.test_get_cpu_profile_with_override | ducktape | https://buildkite.com/redpanda/redpanda/builds/60132#0193fa6c-8e82-49cd-b21d-00a779b61f71 | FAIL | 0/1 |
| rptest.tests.cpu_profiler_admin_api_test.CPUProfilerAdminAPITest.test_get_cpu_profile_with_override | ducktape | https://buildkite.com/redpanda/redpanda/builds/60132#0193fa87-d0a9-4cf9-aaf8-f066942dddb5 | FAIL | 0/1 |
| rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60132#0193fa87-d0ac-497b-8cd9-de602b6e2b07 | FLAKY | 4/6 |
| rptest.tests.maintenance_test.MaintenanceTest.test_maintenance_sticky.use_rpk=True | ducktape | https://buildkite.com/redpanda/redpanda/builds/60132#0193fa87-d0a9-4cf9-aaf8-f066942dddb5 | FLAKY | 5/6 |

test results on build#60226
| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| rptest.tests.cpu_profiler_admin_api_test.CPUProfilerAdminAPITest.test_get_cpu_profile | ducktape | https://buildkite.com/redpanda/redpanda/builds/60226#019427b5-c7dd-4983-8fb6-f7b6c1254488 | FAIL | 0/1 |
| rptest.tests.cpu_profiler_admin_api_test.CPUProfilerAdminAPITest.test_get_cpu_profile | ducktape | https://buildkite.com/redpanda/redpanda/builds/60226#019427cf-ee10-49cd-b1ad-96f34ab06566 | FAIL | 0/1 |
| rptest.tests.cpu_profiler_admin_api_test.CPUProfilerAdminAPITest.test_get_cpu_profile_with_override | ducktape | https://buildkite.com/redpanda/redpanda/builds/60226#019427b5-c7de-4e14-ab05-e5ca1c377319 | FAIL | 0/1 |
| rptest.tests.cpu_profiler_admin_api_test.CPUProfilerAdminAPITest.test_get_cpu_profile_with_override | ducktape | https://buildkite.com/redpanda/redpanda/builds/60226#019427cf-ee0e-4928-bc72-1425979ebc5b | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=True.with_tiered_storage=False.with_iceberg=False.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/60226#019427cf-ee0e-4928-bc72-1425979ebc5b | FAIL | 0/1 |

@travisdowns
Member Author

Is there a reason you didn't update the memory profiler as well?

No, it's coming; I mostly just wanted to put this v1 up to see if there were concerns with the approach, etc.

Add an option to set the stack depth of the spin loop in the
stress fiber, i.e., the spin loop will run at the end of a recursive
call chain (not inlined) of depth N.

Good for stressing the CPU profiler.
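
The idea of spinning at the bottom of a recursive call chain can be illustrated with a small Python analogue. This is only a sketch of the concept: the actual stress fiber is not Python, and the function name and parameters here are hypothetical. Python does not eliminate tail calls, so each recursive call adds a real stack frame, which plays the role of the "not inlined" requirement.

```python
import inspect
import time


def spin_at_depth(depth: int, spin_seconds: float) -> int:
    """Recurse `depth` times, then busy-spin in the deepest frame.

    Returns the number of stack frames seen from the spinning frame, so
    callers can confirm that the spin really ran `depth` frames deeper.
    """
    if depth > 0:
        # Each level is a real frame; a sampling profiler would see all of them.
        return spin_at_depth(depth - 1, spin_seconds)
    frames_on_stack = len(inspect.stack(0))
    deadline = time.monotonic() + spin_seconds
    while time.monotonic() < deadline:
        pass  # burn CPU at the bottom of the call chain
    return frames_on_stack


shallow = spin_at_depth(0, spin_seconds=0.0)
deep = spin_at_depth(8, spin_seconds=0.0)
extra_frames = deep - shallow  # frames added by the recursion
```

Deeper stacks make every profiler sample more expensive to capture, which is what makes the depth a useful knob for stress testing.
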
@vbotbuildovich
Collaborator

vbotbuildovich commented Jan 2, 2025

Retry command for Build#60226

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/cpu_profiler_admin_api_test.py::CPUProfilerAdminAPITest.test_get_cpu_profile
tests/rptest/tests/cpu_profiler_admin_api_test.py::CPUProfilerAdminAPITest.test_get_cpu_profile_with_override
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":true,"with_iceberg":false,"with_tiered_storage":false}

4 participants