[test] failures #15631

Closed
wants to merge 9 commits into from

Conversation

danielxiangzl
Contributor

Description

How Has This Been Tested?

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

danielxiangzl added the CICD:build-failpoints-images label (Build failpoints docker image) on Dec 18, 2024

trunk-io bot commented Dec 18, 2024

⏱️ 3h 36m total CI duration on this PR
| Slowest 15 Jobs | Cumulative Duration | Recent Runs |
|---|---|---|
| test-target-determinator | 35m | 🟩🟩🟩🟩🟩 (+5 more) |
| adhoc-forge-test / forge | 29m | 🟥 |
| adhoc-forge-test / forge | 24m | |
| rust-cargo-deny | 17m | 🟩🟩🟩🟩🟩 (+5 more) |
| check-dynamic-deps | 16m | 🟩🟩🟩🟩🟩 (+6 more) |
| rust-move-tests | 9m | 🟥 |
| rust-move-tests | 7m | 🟥 |
| rust-move-tests | 7m | 🟥 |
| rust-move-tests | 7m | 🟥 |
| rust-move-tests | 7m | 🟥 |
| rust-move-tests | 7m | 🟥 |
| rust-move-tests | 7m | 🟥 |
| rust-move-tests | 7m | 🟥 |
| rust-move-tests | 7m | 🟥 |
| rust-move-tests | 7m | 🟥 |

🚨 1 job on the last run was significantly faster/slower than expected

| Job | Duration | vs 7d avg | Delta |
|---|---|---|---|
| check-dynamic-deps | 3m | 1m | +146% |


)
.await
.unwrap();
panic!("test_fault_tolerance_of_leader_equivocation");

This panic!() statement causes the test to fail before it can validate the leader equivocation behavior. Consider removing it to allow the test to complete and verify the fault tolerance mechanisms are working as expected.

Spotted by Graphite Reviewer


Comment on lines +22 to +47
async fn test(
    &self,
    swarm: Arc<tokio::sync::RwLock<Box<dyn Swarm>>>,
    _report: &mut TestReport,
    duration: Duration,
) -> Result<()> {
    let validators = { swarm.read().await.get_validator_clients_with_names() };
    // 10 vals, test 1,2,3 failures
    let num_bad_leaders = 3;
    for (name, validator) in validators[..num_bad_leaders].iter() {
        validator
            .set_failpoint(
                "consensus::leader_equivocation".to_string(),
                "return".to_string(),
            )
            .await
            .map_err(|e| {
                anyhow!(
                    "set_failpoint to set consensus leader equivocation on {} failed, {:?}",
                    name,
                    e
                )
            })?;
    };
    Ok(())
}

The test currently sets up the failpoints but returns immediately without running for the specified test duration. Consider adding tokio::time::sleep(duration).await before returning to ensure the test runs for the full duration after the failpoints are configured. This will provide more realistic test coverage of how the system behaves under sustained equivocation conditions.
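The suggestion above can be sketched in a std-only form. This is a minimal illustration, not the PR's code: `FakeValidator` and `run_equivocation_test` are hypothetical stand-ins for the forge validator client and the `test` method, and `std::thread::sleep` stands in for `tokio::time::sleep(duration).await`, which is what the async test would use.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a validator client (not the real forge API).
struct FakeValidator {
    name: String,
    failpoints: Vec<(String, String)>,
}

impl FakeValidator {
    /// Records the failpoint instead of sending it to a node.
    fn set_failpoint(&mut self, name: &str, action: &str) -> Result<(), String> {
        self.failpoints.push((name.to_string(), action.to_string()));
        Ok(())
    }
}

/// Sets the equivocation failpoint on the first `num_bad_leaders` validators,
/// then holds the test open for `duration` before returning.
/// Returns how long the whole call took.
fn run_equivocation_test(
    validators: &mut [FakeValidator],
    num_bad_leaders: usize,
    duration: Duration,
) -> Result<Duration, String> {
    let started = Instant::now();
    // `take` avoids the out-of-bounds panic that slicing with
    // `[..num_bad_leaders]` would cause on a smaller validator set.
    for v in validators.iter_mut().take(num_bad_leaders) {
        v.set_failpoint("consensus::leader_equivocation", "return")
            .map_err(|e| format!("set_failpoint on {} failed: {:?}", v.name, e))?;
    }
    // The step the review asks for: keep running while the failpoints are active.
    // In the async test this would be `tokio::time::sleep(duration).await`.
    thread::sleep(duration);
    Ok(started.elapsed())
}
```

Calling `run_equivocation_test(&mut validators, 1, Duration::from_millis(20))` returns only after at least 20 ms have passed, with the failpoint recorded on the first validator only.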

Spotted by Graphite Reviewer


Comment on lines 22 to 47
async fn test(
    &self,
    swarm: Arc<tokio::sync::RwLock<Box<dyn Swarm>>>,
    _report: &mut TestReport,
    duration: Duration,
) -> Result<()> {
    let validators = { swarm.read().await.get_validator_clients_with_names() };
    // 10 vals, test 1,2,3 failures
    let num_bad_leaders = 1;
    for (name, validator) in validators[..num_bad_leaders].iter() {
        validator
            .set_failpoint(
                "consensus::leader_equivocation".to_string(),
                "return".to_string(),
            )
            .await
            .map_err(|e| {
                anyhow!(
                    "set_failpoint to set consensus leader equivocation on {} failed, {:?}",
                    name,
                    e
                )
            })?;
    };
    Ok(())
}

The test currently returns immediately after setting the failpoint, without waiting for the specified duration. This means the test may complete before the system has had time to exhibit the failure behavior being tested. Consider adding tokio::time::sleep(duration).await before returning to ensure the test runs for the full duration and properly exercises the equivocation scenario.

Spotted by Graphite Reviewer


Labels
CICD:build-failpoints-images Build failpoints docker image