sd-agent: bazel: use Cargo for crates versions #45318

Yumasi · 2026-01-21T13:33:21Z

What does this PR do?

This PR uses the from_cargo feature of Bazel's rules_rust to get the list of crate dependencies from Cargo.toml and Cargo.lock. This saves developers from having to manually keep dependencies and their versions in sync between Cargo and Bazel.

With this PR, Bazel also has its own lockfile for sd-agent's crates. This lockfile is updated when a Bazel command related to sd-agent is run with the CARGO_BAZEL_REPIN=1 environment variable set. One can typically update the lockfile by running:

CARGO_BAZEL_REPIN=1 bazel fetch //pkg/discovery/module/rust:sd-agent

Motivation

Remove manual syncing of dependencies between Cargo and Bazel.

Describe how you validated your changes

Additional Notes

agent-platform-auto-pr · 2026-01-21T14:16:29Z

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor 5ef7a94
📊 Static Quality Gates Dashboard

31 successful checks with minimal change (< 2 KiB)

	Quality gate	Current Size
✅	agent_deb_amd64	704.857 MiB
✅	agent_deb_amd64_fips	700.150 MiB
✅	agent_heroku_amd64	327.140 MiB
✅	agent_msi	572.011 MiB
✅	agent_rpm_amd64	704.843 MiB
✅	agent_rpm_amd64_fips	700.136 MiB
✅	agent_rpm_arm64	686.178 MiB
✅	agent_rpm_arm64_fips	682.331 MiB
✅	agent_suse_amd64	704.843 MiB
✅	agent_suse_amd64_fips	700.136 MiB
✅	agent_suse_arm64	686.178 MiB
✅	agent_suse_arm64_fips	682.331 MiB
✅	docker_agent_amd64	766.984 MiB
✅	docker_agent_arm64	772.949 MiB
✅	docker_agent_jmx_amd64	957.863 MiB
✅	docker_agent_jmx_arm64	952.547 MiB
✅	docker_cluster_agent_amd64	180.944 MiB
✅	docker_cluster_agent_arm64	196.790 MiB
✅	docker_cws_instrumentation_amd64	7.135 MiB
✅	docker_cws_instrumentation_arm64	6.689 MiB
✅	docker_dogstatsd_amd64	38.858 MiB
✅	docker_dogstatsd_arm64	37.127 MiB
✅	dogstatsd_deb_amd64	30.077 MiB
✅	dogstatsd_deb_arm64	28.222 MiB
✅	dogstatsd_rpm_amd64	30.077 MiB
✅	dogstatsd_suse_amd64	30.077 MiB
✅	iot_agent_deb_amd64	43.162 MiB
✅	iot_agent_deb_arm64	40.268 MiB
✅	iot_agent_deb_armhf	40.857 MiB
✅	iot_agent_rpm_amd64	43.163 MiB
✅	iot_agent_suse_amd64	43.163 MiB

On-wire sizes (compressed)

	Quality gate	Change	Size in MiB (prev → curr → max)
✅	agent_deb_amd64	+12.04 KiB (0.01% increase)	173.365 → 173.376 → 174.490
✅	agent_deb_amd64_fips	+10.27 KiB (0.01% increase)	172.335 → 172.345 → 173.750
✅	agent_heroku_amd64	neutral	87.159 MiB
✅	agent_msi	-8.0 KiB (0.01% reduction)	143.047 → 143.039 → 143.270
✅	agent_rpm_amd64	+23.43 KiB (0.01% increase)	176.387 → 176.410 → 177.660
✅	agent_rpm_amd64_fips	+13.88 KiB (0.01% increase)	174.868 → 174.881 → 176.600
✅	agent_rpm_arm64	+51.49 KiB (0.03% increase)	159.411 → 159.461 → 161.260
✅	agent_rpm_arm64_fips	+46.06 KiB (0.03% increase)	158.902 → 158.947 → 160.550
✅	agent_suse_amd64	+23.43 KiB (0.01% increase)	176.387 → 176.410 → 177.660
✅	agent_suse_amd64_fips	+13.88 KiB (0.01% increase)	174.868 → 174.881 → 176.600
✅	agent_suse_arm64	+51.49 KiB (0.03% increase)	159.411 → 159.461 → 161.260
✅	agent_suse_arm64_fips	+46.06 KiB (0.03% increase)	158.902 → 158.947 → 160.550
✅	docker_agent_amd64	neutral	260.951 MiB
✅	docker_agent_arm64	-11.67 KiB (0.00% reduction)	249.962 → 249.950 → 252.630
✅	docker_agent_jmx_amd64	+12.67 KiB (0.00% increase)	329.576 → 329.589 → 331.080
✅	docker_agent_jmx_arm64	-10.6 KiB (0.00% reduction)	314.587 → 314.577 → 317.270
✅	docker_cluster_agent_amd64	neutral	63.942 MiB
✅	docker_cluster_agent_arm64	neutral	60.203 MiB
✅	docker_cws_instrumentation_amd64	neutral	2.994 MiB
✅	docker_cws_instrumentation_arm64	neutral	2.726 MiB
✅	docker_dogstatsd_amd64	neutral	15.042 MiB
✅	docker_dogstatsd_arm64	neutral	14.364 MiB
✅	dogstatsd_deb_amd64	neutral	7.957 MiB
✅	dogstatsd_deb_arm64	neutral	6.830 MiB
✅	dogstatsd_rpm_amd64	neutral	7.967 MiB
✅	dogstatsd_suse_amd64	neutral	7.967 MiB
✅	iot_agent_deb_amd64	-2.15 KiB (0.02% reduction)	11.310 → 11.308 → 12.040
✅	iot_agent_deb_arm64	neutral	9.667 MiB
✅	iot_agent_deb_armhf	neutral	9.864 MiB
✅	iot_agent_rpm_amd64	neutral	11.326 MiB
✅	iot_agent_suse_amd64	neutral	11.326 MiB
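As a reading aid for the table above: each row gives the previous, current and maximum on-wire size in MiB. The sketch below evaluates one row under an assumption the report does not spell out, namely that a gate passes when the current size does not exceed the maximum. `GateRow` and its methods are hypothetical.

```rust
/// Hypothetical model of one row of the on-wire size table (sizes in MiB).
struct GateRow {
    name: &'static str,
    prev_mib: f64,
    curr_mib: f64,
    max_mib: f64,
}

impl GateRow {
    /// Assumed pass condition: the current size stays within the budget.
    fn passes(&self) -> bool {
        self.curr_mib <= self.max_mib
    }

    /// Relative change against the ancestor commit, in percent.
    fn change_pct(&self) -> f64 {
        (self.curr_mib - self.prev_mib) / self.prev_mib * 100.0
    }

    /// Room left before the budget is reached, in KiB (1 MiB = 1024 KiB).
    fn headroom_kib(&self) -> f64 {
        (self.max_mib - self.curr_mib) * 1024.0
    }
}

fn main() {
    // Values copied from the agent_rpm_arm64 row above.
    let row = GateRow {
        name: "agent_rpm_arm64",
        prev_mib: 159.411,
        curr_mib: 159.461,
        max_mib: 161.260,
    };
    println!(
        "{}: passes={} change={:+.2}% headroom={:.0} KiB",
        row.name,
        row.passes(),
        row.change_pct(),
        row.headroom_kib()
    );
}
```

With the rounded sizes from the table, the relative change comes out at about +0.03%, consistent with the +51.49 KiB reported for that row.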

cit-pr-commenter-54b7da · 2026-01-21T14:35:05Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 7a3d737e-2cdb-4ed2-9e84-79bd34881c5e

Baseline: 5ef7a94
Comparison: 41e7ee3
Diff

❌ Experiments with retried target crashes

This is a critical error. One or more replicates failed with a non-zero exit code. These replicates may have been retried. See Replicate Execution Details for more information.

quality_gate_idle_all_features

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	docker_containers_cpu	% cpu utilization	+0.44	[-2.46, +3.35]	1	Logs

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	quality_gate_logs	% cpu utilization	+1.71	[+0.24, +3.18]	1	Logs bounds checks dashboard
➖	quality_gate_metrics_logs	memory utilization	+0.70	[+0.48, +0.91]	1	Logs bounds checks dashboard
➖	otlp_ingest_logs	memory utilization	+0.59	[+0.49, +0.69]	1	Logs
➖	ddot_logs	memory utilization	+0.54	[+0.47, +0.61]	1	Logs
➖	docker_containers_cpu	% cpu utilization	+0.44	[-2.46, +3.35]	1	Logs
➖	file_tree	memory utilization	+0.30	[+0.25, +0.36]	1	Logs
➖	quality_gate_idle_all_features	memory utilization	+0.10	[+0.06, +0.14]	1	Logs bounds checks dashboard
➖	file_to_blackhole_0ms_latency	egress throughput	+0.05	[-0.47, +0.57]	1	Logs
➖	uds_dogstatsd_to_api_v3	ingress throughput	+0.01	[-0.11, +0.14]	1	Logs
➖	uds_dogstatsd_20mb_12k_contexts_20_senders	memory utilization	+0.01	[-0.05, +0.06]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.00	[-0.09, +0.10]	1	Logs
➖	file_to_blackhole_500ms_latency	egress throughput	-0.01	[-0.40, +0.38]	1	Logs
➖	file_to_blackhole_100ms_latency	egress throughput	-0.01	[-0.06, +0.04]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	-0.02	[-0.15, +0.12]	1	Logs
➖	ddot_metrics_sum_cumulative	memory utilization	-0.04	[-0.21, +0.12]	1	Logs
➖	quality_gate_idle	memory utilization	-0.05	[-0.09, -0.00]	1	Logs bounds checks dashboard
➖	file_to_blackhole_1000ms_latency	egress throughput	-0.06	[-0.47, +0.36]	1	Logs
➖	ddot_metrics_sum_delta	memory utilization	-0.20	[-0.40, -0.01]	1	Logs
➖	ddot_metrics_sum_cumulativetodelta_exporter	memory utilization	-0.21	[-0.44, +0.02]	1	Logs
➖	docker_containers_memory	memory utilization	-0.22	[-0.30, -0.14]	1	Logs
➖	otlp_ingest_metrics	memory utilization	-0.54	[-0.69, -0.39]	1	Logs
➖	ddot_metrics	memory utilization	-0.79	[-1.00, -0.57]	1	Logs
➖	tcp_syslog_to_blackhole	ingress throughput	-1.76	[-1.83, -1.68]	1	Logs

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	links
✅	docker_containers_cpu	simple_check_run	10/10
✅	docker_containers_memory	memory_usage	10/10
✅	docker_containers_memory	simple_check_run	10/10
✅	file_to_blackhole_0ms_latency	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency	memory_usage	10/10
✅	file_to_blackhole_1000ms_latency	lost_bytes	10/10
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10
✅	file_to_blackhole_100ms_latency	lost_bytes	10/10
✅	file_to_blackhole_100ms_latency	memory_usage	10/10
✅	file_to_blackhole_500ms_latency	lost_bytes	10/10
✅	file_to_blackhole_500ms_latency	memory_usage	10/10
✅	quality_gate_idle	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_idle	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_idle_all_features	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_logs	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_logs	lost_bytes	10/10	bounds checks dashboard
✅	quality_gate_logs	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_metrics_logs	cpu_usage	10/10	bounds checks dashboard
✅	quality_gate_metrics_logs	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_metrics_logs	lost_bytes	10/10	bounds checks dashboard
✅	quality_gate_metrics_logs	memory_usage	10/10	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
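Taken together, the three criteria form a single predicate. A minimal sketch, assuming a hypothetical `Experiment` record (the detector's own implementation is not shown in this report), applied to two rows from the tables above:

```rust
/// Hypothetical per-experiment summary; all values are percentages.
struct Experiment {
    name: &'static str,
    delta_mean_pct: f64,
    ci_low_pct: f64,
    ci_high_pct: f64,
    erratic: bool,
}

/// Effect size tolerance stated in this report: |Δ mean %| ≥ 5.00%.
const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// True only when all three criteria listed above hold.
fn is_regression(e: &Experiment) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    // The 90% CI excludes zero when it lies entirely above or entirely below it.
    let ci_excludes_zero = e.ci_low_pct > 0.0 || e.ci_high_pct < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}

fn main() {
    let experiments = [
        // Marked erratic, and its CI contains zero.
        Experiment { name: "docker_containers_cpu", delta_mean_pct: 0.44, ci_low_pct: -2.46, ci_high_pct: 3.35, erratic: true },
        // CI excludes zero, but the effect is far below the 5% tolerance.
        Experiment { name: "tcp_syslog_to_blackhole", delta_mean_pct: -1.76, ci_low_pct: -1.83, ci_high_pct: -1.68, erratic: false },
    ];
    for e in &experiments {
        println!("{}: regression={}", e.name, is_regression(e));
    }
}
```

Neither row is flagged: docker_containers_cpu is erratic and its interval contains zero, while tcp_syslog_to_blackhole has an interval that excludes zero but an effect well below the 5.00% tolerance.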

Replicate Execution Details

We run multiple replicates for each experiment/variant. However, we allow replicates to be automatically retried if there are any failures, up to 8 times, at which point the replicate is marked dead and we are unable to run analysis for the entire experiment. We call each of these attempts at running replicates a replicate execution. This section lists all replicate executions that failed due to the target crashing or being oom killed.
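A minimal sketch of that retry policy, under one reading of "retried ... up to 8 times": it assumes a replicate gets at most 8 executions in total and is marked dead once all of them fail, which is consistent with the "0 (x8)" annotation in the example table of this section. `ReplicateStatus`, `MAX_EXECUTIONS` and `run_replicate` are hypothetical names.

```rust
/// Outcome of one replicate (hypothetical model).
#[derive(Debug, PartialEq)]
enum ReplicateStatus {
    /// Succeeded on this execution (1-based).
    Completed { executions: u32 },
    /// Every allowed execution failed: the replicate is dead and the
    /// experiment cannot be analysed.
    Dead { failed_executions: u32 },
}

/// Assumed limit on executions per replicate.
const MAX_EXECUTIONS: u32 = 8;

/// Runs `execute` until it reports success or the limit is reached.
/// `execute` receives the 1-based execution number and returns true on success.
fn run_replicate<F: FnMut(u32) -> bool>(mut execute: F) -> ReplicateStatus {
    for execution in 1..=MAX_EXECUTIONS {
        if execute(execution) {
            return ReplicateStatus::Completed { executions: execution };
        }
    }
    ReplicateStatus::Dead { failed_executions: MAX_EXECUTIONS }
}

fn main() {
    // Crashes (e.g. OOM killed) twice, then succeeds on the third execution.
    println!("{:?}", run_replicate(|n| n >= 3));
    // Never succeeds: marked dead after 8 failed executions.
    println!("{:?}", run_replicate(|_| false));
}
```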

Note: In the tables below, we bucket failures by experiment, variant, and failure type. For each of these buckets, we list the replicate indexes that failed, with an annotation signifying how many times that replicate failed with the given failure mode. In the example below, the baseline variant of the experiment named experiment_with_failures had two replicates that failed by OOM kills: replicate 0, which failed 8 executions, and replicate 1, which failed 6 executions, all with the same failure mode.

Experiment	Variant	Replicates	Failure	Logs	Debug Dashboard
experiment_with_failures	baseline	0 (x8) 1 (x6)	Oom killed		Debug Dashboard
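The bucketing described in the note above can be sketched as a group-by over failed executions. The record shape and function names are hypothetical, and printing a replicate that failed only once without an "(x1)" suffix is an assumption based on the plain "1" entry in the non-profiling failures table.

```rust
use std::collections::BTreeMap;

/// (experiment, variant, failure type): one row of the tables in this section.
type Bucket = (&'static str, &'static str, &'static str);

/// One failed replicate execution (hypothetical record shape).
struct Failure {
    bucket: Bucket,
    replicate: u32,
}

/// Groups failures per bucket and renders the "Replicates" column,
/// annotating a replicate with "(xN)" when it failed more than once.
fn replicates_column(failures: &[Failure]) -> BTreeMap<Bucket, String> {
    // bucket -> replicate index -> number of failed executions
    let mut counts: BTreeMap<Bucket, BTreeMap<u32, u32>> = BTreeMap::new();
    for f in failures {
        *counts.entry(f.bucket).or_default().entry(f.replicate).or_default() += 1;
    }
    counts
        .into_iter()
        .map(|(bucket, per_replicate)| {
            let cell: Vec<String> = per_replicate
                .iter()
                .map(|(idx, n)| if *n > 1 { format!("{idx} (x{n})") } else { idx.to_string() })
                .collect();
            (bucket, cell.join(" "))
        })
        .collect()
}

fn main() {
    let bucket = ("experiment_with_failures", "baseline", "Oom killed");
    // Replicate 0 fails 8 executions and replicate 1 fails 6, as in the example.
    let failures: Vec<Failure> = std::iter::repeat(0)
        .take(8)
        .chain(std::iter::repeat(1).take(6))
        .map(|replicate| Failure { bucket, replicate })
        .collect();
    for ((experiment, variant, failure), replicates) in replicates_column(&failures) {
        println!("{experiment}\t{variant}\t{replicates}\t{failure}");
    }
}
```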

The debug dashboard links will take you to a debugging dashboard specifically designed to investigate replicate execution failures.

❌ Retried Normal Replicate Execution Failures (non-profiling)

Experiment	Variant	Replicates	Failure	Debug Dashboard
quality_gate_idle_all_features	comparison	1	Oom killed	Debug Dashboard

❌ Retried Profiling Replicate Execution Failures (target internal profiling)

Note: Profiling replicas may still be executing. See the debug dashboard for up-to-date status.

Experiment	Variant	Replicates	Failure	Debug Dashboard
quality_gate_idle_all_features	baseline	11 (x4)	Oom killed	Debug Dashboard
quality_gate_idle_all_features	comparison	11 (x4)	Oom killed	Debug Dashboard

CI Pass/Fail Decision

✅ Passed. All Quality Gates passed.

quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.

aiuto

I love where this is going w.r.t. reducing noise in the module file. But I have a meta-thought.
It would be somewhat desirable to pin the crates for the entire repo rather than just sd-agent. That would allow Rust code shared between you, QAgent and ADP to interoperate.

Do you have any strong objection to trying it that way now and onboarding new Rust code into the global crate lock?

Yumasi · 2026-01-22T14:27:19Z

It would be somewhat desirable to pin the crates for the entire repo rather than just sd-agent. That would allow Rust code shared between you, QAgent and ADP to interoperate.

Do you have any strong objection to trying it that way now and onboarding new Rust code into the global crate lock?

I am all for unifying and simplifying things when possible, but I have a few concerns:

For now, we are the only Rust component whose source code lives in the agent repo. I've not heard of ADP or other components wanting to move their code into it in the short/mid term. We also don't use or depend on any of their code. We do have some common dependencies, though, and I can see why sharing them would be desirable if they decide to move their code to this repo.

One issue I can see is the tooling we use for license compliance:

We use dd-rust-license-tool to generate the LICENSE-3rdparty.csv file, which requires Cargo.toml and Cargo.lock.
We also use cargo deny to make sure we don't pull in dependencies with licenses we don't want, and to check for known vulnerabilities.
Bazel's from_cargo accepts multiple Cargo.toml files but only a single Cargo.lock to create a crate repository, so it can't take the lockfiles of multiple components into account.

We would need to rely solely on Bazel's Cargo.Bazel.lock, as added in this PR, but first we would need tooling equivalent to dd-rust-license-tool and cargo deny that works with it. I do think we should work towards removing Cargo in the future and keeping only Bazel.
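The tooling gap is mainly about having Bazel-side equivalents of these checks. As an illustration of the kind of license check cargo deny performs, here is a minimal sketch that validates SPDX license expressions against an allow-list. The allow-list, crate names and helpers are hypothetical, and collecting (crate, license) pairs from Cargo metadata or from the Bazel lockfile is not shown.

```rust
/// Hypothetical allow-list; the real policy lives in the repo's cargo deny
/// configuration and is not reproduced here.
const ALLOWED: &[&str] = &["MIT", "Apache-2.0", "BSD-3-Clause"];

/// Checks an SPDX license expression against the allow-list.
/// Only flat expressions are handled: without parentheses, AND binds tighter
/// than OR, so the expression is accepted if any OR-alternative has all of
/// its AND-terms allowed. Anything else is rejected for manual review.
fn is_allowed(expr: &str) -> bool {
    if expr.contains('(') || expr.contains('/') {
        return false;
    }
    expr.split(" OR ").any(|alt| {
        alt.split(" AND ")
            .all(|id| ALLOWED.iter().any(|a| *a == id.trim()))
    })
}

/// Returns the crates whose license expression is not accepted.
fn violations<'a>(deps: &[(&'a str, &str)]) -> Vec<&'a str> {
    deps.iter()
        .filter(|(_, license)| !is_allowed(license))
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    // Hypothetical (crate, license) pairs.
    let deps = [
        ("crate-a", "MIT OR Apache-2.0"),
        ("crate-b", "GPL-3.0-only"),
        ("crate-c", "(MIT OR Apache-2.0) AND Unicode-3.0"),
    ];
    println!("needs review: {:?}", violations(&deps));
}
```

Rejecting parenthesised expressions keeps the sketch conservative: such crates are reported for a human to look at rather than silently accepted.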

There is, however, a bigger issue: sd-agent aims to be as lightweight as possible. That is its whole reason for existing, and why we chose Rust over Go. To achieve this, we only enable the specific crate features we require from each dependency. In Bazel's crate repository model, features are specified at the repository level and shared by all consumers. This means that if other components need features we don't use, those features get compiled into our binary regardless, even with fat LTO.

If we want to avoid this bloat, we will need a separate Bazel crate repo for those crates (8 of them right now) so that only the features we require are enabled. We could, however, put all the other crates, for which we use default features, in a global crate repo shared with future Rust components. But any crate with specific feature requirements will need to be kept separate.
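To make the feature-sharing problem concrete, here is a minimal model of a shared crate repository in which each crate is built once with the union of all consumers' features, as described above. The consumer, crate and feature names are purely illustrative and say nothing about sd-agent's actual dependencies; `unify` and `unwanted` are hypothetical helpers.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// (consumer, crate, features that consumer enables), all hypothetical.
type Request = (&'static str, &'static str, &'static [&'static str]);

/// In one shared crate repository, each crate is built once with the union of
/// the features requested by every consumer.
fn unify(requests: &[Request]) -> BTreeMap<&'static str, BTreeSet<&'static str>> {
    let mut unified: BTreeMap<&'static str, BTreeSet<&'static str>> = BTreeMap::new();
    for (_, krate, features) in requests {
        unified.entry(*krate).or_default().extend(features.iter().copied());
    }
    unified
}

/// Features compiled into `consumer`'s build that it never asked for.
fn unwanted(requests: &[Request], consumer: &str) -> BTreeMap<&'static str, BTreeSet<&'static str>> {
    let unified = unify(requests);
    let mut extra = BTreeMap::new();
    for (who, krate, features) in requests {
        if *who != consumer {
            continue;
        }
        let wanted: BTreeSet<&'static str> = features.iter().copied().collect();
        let diff: BTreeSet<&'static str> = unified[krate].difference(&wanted).copied().collect();
        if !diff.is_empty() {
            extra.insert(*krate, diff);
        }
    }
    extra
}

fn main() {
    let requests: [Request; 2] = [
        ("sd-agent", "tokio", &["rt", "net"]),
        ("other-component", "tokio", &["rt", "net", "fs", "process"]),
    ];
    // Prints the extra features sd-agent would carry: {"tokio": {"fs", "process"}}
    println!("{:?}", unwanted(&requests, "sd-agent"));
}
```

In this model, any crate for which `unwanted` returns a non-empty set is a candidate for the separate sd-agent crate repo, while crates that every consumer uses with default features can live in a shared one.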

lovasoa

Nice to see the famous crate.from_cargo! Should we update the CI to ensure the Cargo and Bazel lockfiles don't get out of sync?

Yumasi · 2026-01-23T10:35:24Z

Nice to see the famous crate.from_cargo! Should we update the CI to ensure the Cargo and Bazel lockfiles don't get out of sync?

Yep, I am working on a follow-up PR to add lint jobs for this and for the third-party license file generation. :)
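One possible shape for the Cargo/Bazel part of such a lint, purely as a sketch: it reads the crate pins from Cargo.lock with a minimal line-based parser (assuming the standard layout cargo writes) and compares them with a set of pins assumed to have been extracted from the Bazel lockfile by other means; the Bazel lockfile format is not parsed here. `cargo_lock_packages` and `drift` are hypothetical helpers, not part of the follow-up PR.

```rust
use std::collections::BTreeSet;

/// Extracts (name, version) pairs from the `[[package]]` tables of a Cargo.lock.
fn cargo_lock_packages(lock: &str) -> BTreeSet<(String, String)> {
    let mut out = BTreeSet::new();
    let mut name: Option<String> = None;
    for line in lock.lines().map(str::trim) {
        if line == "[[package]]" {
            name = None;
        } else if let Some(v) = line.strip_prefix("name = ") {
            name = Some(v.trim_matches('"').to_string());
        } else if let Some(v) = line.strip_prefix("version = ") {
            // The top-level lockfile `version = N` has no preceding name and is skipped.
            if let Some(n) = name.take() {
                out.insert((n, v.trim_matches('"').to_string()));
            }
        }
    }
    out
}

/// Pins present on only one side: (only in Cargo.lock, only on the Bazel side).
fn drift(
    cargo: &BTreeSet<(String, String)>,
    bazel: &BTreeSet<(String, String)>,
) -> (Vec<(String, String)>, Vec<(String, String)>) {
    let only_cargo: Vec<_> = cargo.difference(bazel).cloned().collect();
    let only_bazel: Vec<_> = bazel.difference(cargo).cloned().collect();
    (only_cargo, only_bazel)
}

fn main() {
    let cargo_lock = r#"version = 3

[[package]]
name = "libc"
version = "0.2.170"

[[package]]
name = "serde"
version = "1.0.219"
"#;
    let cargo = cargo_lock_packages(cargo_lock);
    // Hypothetical pins from the Bazel lockfile: libc was not re-pinned.
    let bazel: BTreeSet<(String, String)> = [("serde", "1.0.219")]
        .iter()
        .map(|(n, v)| (n.to_string(), v.to_string()))
        .collect();
    let (only_cargo, only_bazel) = drift(&cargo, &bazel);
    println!("only in Cargo.lock: {only_cargo:?}");
    println!("only on the Bazel side: {only_bazel:?}");
}
```

Workspace members such as sd-agent itself also appear in Cargo.lock, so a real check would need to filter them out. A simpler alternative is to run the repin command from the PR description in CI and fail if the Bazel lockfile changes.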

aiuto · 2026-01-28T03:13:47Z

Thanks for the explanations. I understand your needs better now.
Let's do this and fix it incrementally as we learn more.

github-actions bot added the component/system-probe, long review (PR is complex, plan time to review it), team/agent-build and team/agent-discovery labels Jan 21, 2026

sd-agent: use Cargo for crates versions

41e7ee3

Yumasi force-pushed the guillaume.pagnoux/sd-agent-crates-from-cargo branch from fc261f5 to 41e7ee3 January 21, 2026 15:07

Yumasi added the changelog/no-changelog, qa/no-code-change (No code change in Agent code requiring validation) and ask-review (Ask required teams to review this PR) labels Jan 21, 2026

Yumasi marked this pull request as ready for review January 21, 2026 15:41

Yumasi requested review from a team as code owners January 21, 2026 15:41

vitkyrka approved these changes Jan 21, 2026


aiuto reviewed Jan 21, 2026


lovasoa reviewed Jan 23, 2026


Yumasi requested a review from aiuto January 26, 2026 08:57

aiuto approved these changes Jan 28, 2026

5 participants
