Prometheus telemetry #1526

Closed
wants to merge 34 commits into from

Conversation

rakanalh
Contributor

@rakanalh rakanalh commented Nov 26, 2024

Description

From the referenced issue, the following was the list of metrics to collect:

All

  • Storage size (I think this can be fed into Prometheus outside the scope of Citrea)
  • RPCs served in last 5 min (same as the one above; nginx?)

Sequencer

  • Current number txs in mempool
  • Mempool inbound tx/s (Can be calculated from the current number)
  • dry_run_transactions execution time
  • produce_block transactions execution time
  • (this one will be hard to track, so if we can't, no problem) avg. wait time for tx in mempool (can be calculated, I believe)
  • Send sequencer commitment execution time
  • Sequencer commitment number of blocks
  • Current L1 block number used in block production

Full node

  • Sync progress (Not applicable since we have to know sequencer latest L2 vs fullnode or prover current L2)
  • Current L2 block number
  • Current L1 block number
  • Soft confirmation pull rate (important while syncing)
  • Soft confirmation process durations
  • L1 scan duration per L1 block

Prover(s)

  • Current L2 block number
  • Current L1 block number
  • Cycle count for each proof
  • DA tx mining time

TODO

  • Implement / export Grafana dashboards to display these metrics.

Linked Issues

@rakanalh rakanalh force-pushed the rakanalh/prometheus branch from cf570fb to 3d6a341 Compare December 3, 2024 13:38
@rakanalh rakanalh marked this pull request as ready for review December 9, 2024 13:07
@rakanalh rakanalh force-pushed the rakanalh/prometheus branch from 13a37ac to 800e22e Compare December 9, 2024 16:12

codecov bot commented Dec 9, 2024

Codecov Report

Attention: Patch coverage is 81.85053% with 153 lines in your changes missing coverage. Please review.

Project coverage is 76.2%. Comparing base (8535094) to head (9d72d33).
Report is 3 commits behind head on nightly.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/common/src/telemetry/mod.rs | 0.0% | 51 Missing ⚠️ |
| crates/light-client-prover/src/runner.rs | 0.0% | 23 Missing ⚠️ |
| bin/citrea/src/rollup/mod.rs | 66.6% | 18 Missing ⚠️ |
| crates/batch-prover/src/runner.rs | 51.4% | 17 Missing ⚠️ |
| crates/fullnode/src/runner.rs | 52.7% | 17 Missing ⚠️ |
| crates/sequencer/src/runner.rs | 57.5% | 17 Missing ⚠️ |
| crates/light-client-prover/src/da_block_handler.rs | 0.0% | 6 Missing ⚠️ |
| crates/bitcoin-da/src/service.rs | 62.5% | 3 Missing ⚠️ |
| bin/citrea/src/rollup/bitcoin.rs | 0.0% | 1 Missing ⚠️ |
Additional details and impacted files
| Files with missing lines | Coverage Δ |
|---|---|
| bin/citrea/src/main.rs | 0.0% <ø> (ø) |
| bin/citrea/src/rollup/mock.rs | 80.0% <100.0%> (ø) |
| crates/batch-prover/src/da_block_handler.rs | 78.3% <100.0%> (+0.4%) ⬆️ |
| crates/common/src/config.rs | 95.1% <100.0%> (+0.3%) ⬆️ |
| crates/common/src/telemetry/provers.rs | 100.0% <100.0%> (ø) |
| crates/fullnode/src/da_block_handler.rs | 77.1% <100.0%> (+0.9%) ⬆️ |
| crates/fullnode/src/telemetry.rs | 100.0% <100.0%> (ø) |
| crates/primitives/src/utils.rs | 100.0% <100.0%> (ø) |
| crates/risc0/src/host.rs | 72.2% <100.0%> (+1.6%) ⬆️ |
| crates/sequencer/src/commitment/mod.rs | 86.8% <100.0%> (+1.1%) ⬆️ |
... and 20 more

... and 8 files with indirect coverage changes

@eyusufatik
Member

(Cycle count is NOT part of the output, cannot be reported)

in risc0 adapter, here you can get cycle counts from stats. change return types if you wish, but we need this information. probably the most important telemetry both provers can have. also why don't you have any telemetry implemented for light client prover 😅?

        let ProveInfo { receipt, stats } =
            prover.prove_with_opts(env, &elf, &ProverOpts::groth16())?;

Member

@ercecan ercecan left a comment

LGTM small fixes and couple of suggestions

@rakanalh
Contributor Author

(Cycle count is NOT part of the output, cannot be reported)

in risc0 adapter, here you can get cycle counts from stats. change return types if you wish, but we need this information. probably the most important telemetry both provers can have. also why don't you have any telemetry implemented for light client prover 😅?

        let ProveInfo { receipt, stats } =
            prover.prove_with_opts(env, &elf, &ProverOpts::groth16())?;

Added... thanks for the hint.

@rakanalh rakanalh marked this pull request as draft December 10, 2024 11:56
Contributor

@yaziciahmet yaziciahmet left a comment

Good PR. Left some comments on it in its current state. I will re-review once the PR is no longer a draft.

Contributor

can you make telemetry ports of sequencer, fullnode, and prover for mock and regtest different? it causes problems when running different nodes locally.

@@ -716,6 +716,7 @@ dependencies = [
"bincode",
"borsh",
"hex",
"prometheus-client",

Contributor

it would be great if you can guard this dep behind native feature.

Contributor

@yaziciahmet yaziciahmet Dec 10, 2024

With this, do we have to spawn up a prometheus container for working locally?

@@ -127,6 +129,9 @@ pub trait DaService: Send + Sync + 'static {
        &self,
        sequencer_da_pub_key: &[u8],
    ) -> Vec<SequencerCommitment>;

    /// Returns the telemetry targets type for the DA service.
    fn telemetry_targets(&self) -> &DaTelemetryTargets;

Contributor

This should be native only as well
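
One way to read this request is to gate the trait method with `#[cfg(feature = "native")]`, so that non-native builds never see the telemetry type. The sketch below is only an illustration under that assumption: the `label` field, the `name` method, and `MockDa` are hypothetical stand-ins, not the PR's actual types. On the Cargo side this would typically pair with marking `prometheus-client` as `optional = true` and listing it under the `native` feature.

```rust
/// Stand-in for the telemetry targets type named in the diff.
/// The `label` field is hypothetical.
pub struct DaTelemetryTargets {
    pub label: &'static str,
}

pub trait DaService {
    /// Only part of the trait when the crate is built with the `native` feature.
    #[cfg(feature = "native")]
    fn telemetry_targets(&self) -> &DaTelemetryTargets;

    /// Hypothetical method that exists in every build.
    fn name(&self) -> &'static str;
}

/// Hypothetical implementor; the telemetry field is compiled out without `native`.
pub struct MockDa {
    #[cfg(feature = "native")]
    targets: DaTelemetryTargets,
}

impl MockDa {
    pub fn new() -> Self {
        MockDa {
            #[cfg(feature = "native")]
            targets: DaTelemetryTargets { label: "mock" },
        }
    }
}

impl DaService for MockDa {
    #[cfg(feature = "native")]
    fn telemetry_targets(&self) -> &DaTelemetryTargets {
        &self.targets
    }

    fn name(&self) -> &'static str {
        "mock"
    }
}
```

Built without the feature, `MockDa` carries no telemetry state and the trait has a single method; callers that need `telemetry_targets` must themselves be compiled under `native`.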

Comment on lines +445 to +449
#[inline]
pub fn duration_to_seconds(d: Duration) -> f64 {
    let nanos = f64::from(d.subsec_nanos()) / 1e9;
    d.as_secs() as f64 + nanos
}

Contributor

IMO nanos is a bit too much precision. We can use micros and just do d.as_micros() as u64 with no overflow no floating point whatsoever.
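
To make the trade-off concrete, here is a minimal sketch that puts the helper from the diff next to the integer variant suggested in this comment. `duration_to_micros` is a hypothetical name, not something from the PR. Note that `Duration::as_micros` returns `u128`, so the `as u64` cast would truncate only for durations of several hundred thousand years.

```rust
use std::time::Duration;

/// Floating-point seconds, as in the diff under review.
pub fn duration_to_seconds(d: Duration) -> f64 {
    let nanos = f64::from(d.subsec_nanos()) / 1e9;
    d.as_secs() as f64 + nanos
}

/// Hypothetical integer alternative: whole microseconds.
/// `as_micros` yields a u128; the cast keeps the low 64 bits.
pub fn duration_to_micros(d: Duration) -> u64 {
    d.as_micros() as u64
}
```

For a duration of 2.5 s the first function yields `2.5` and the second `2_500_000`. Whichever unit is chosen, the metric name and any histogram buckets would need to use the same one.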

Comment on lines +129 to +132
.get_or_create(&vec![(
    "cf_name".to_owned(),
    S::COLUMN_FAMILY_NAME.to_owned(),
)])

Contributor

get_or_create is used everywhere in sov-schema-db, but we already know the dbs, and column families. can we somehow create these at the startup as well?
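
The idea of creating the per-column-family metrics once at startup can be sketched with the standard library alone. This is deliberately not the `prometheus-client` API: `CfCounters` and the column-family names used with it are hypothetical, and the point is only that the hot path becomes a lookup instead of building a fresh label vector of owned strings on every call.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical registry: one counter per column family, all created up front.
pub struct CfCounters {
    counters: HashMap<&'static str, AtomicU64>,
}

impl CfCounters {
    /// Build every counter once at startup from the known column-family names.
    pub fn new(cf_names: &[&'static str]) -> Self {
        let counters = cf_names
            .iter()
            .map(|&name| (name, AtomicU64::new(0)))
            .collect();
        CfCounters { counters }
    }

    /// Hot path: a plain map lookup with no string or vector allocation.
    /// Returns false if the column family was not registered at startup.
    pub fn inc(&self, cf_name: &str) -> bool {
        match self.counters.get(cf_name) {
            Some(counter) => {
                counter.fetch_add(1, Ordering::Relaxed);
                true
            }
            None => false,
        }
    }

    /// Current value for a column family, or None if it was never registered.
    pub fn get(&self, cf_name: &str) -> Option<u64> {
        self.counters
            .get(cf_name)
            .map(|counter| counter.load(Ordering::Relaxed))
    }
}
```

A caller would construct it once, for example `CfCounters::new(&["ledger", "state"])` with whatever column families the database actually defines, and share it by reference afterwards.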

@rakanalh
Contributor Author

Closing in favor of #1589

@rakanalh rakanalh closed this Dec 11, 2024
@rakanalh rakanalh deleted the rakanalh/prometheus branch December 11, 2024 13:17
Development

Successfully merging this pull request may close these issues.

Profiling / metrics for full nodes, provers, sequencer
4 participants