IGNITE-24221 Implement new benchmarks that cover creating a distribution zone and table #5081

Open: wants to merge 2 commits into base: main


@sk0x50 sk0x50 commented Jan 20, 2025

https://issues.apache.org/jira/browse/IGNITE-24221

Creating a new distribution zone:

| Benchmark | (clusterSize) | (fsync) | (partitionCount) | (replicaCount) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|---|---|
| createEmptyDistributionZone | 3 | false | 1 | 1 | avgt | 5 | 218.719 | ± 0.916 | ms/op |
| createEmptyDistributionZone | 3 | false | 1 | 3 | avgt | 5 | 218.821 | ± 0.660 | ms/op |
| createEmptyDistributionZone | 3 | false | 8 | 1 | avgt | 5 | 218.955 | ± 1.294 | ms/op |
| createEmptyDistributionZone | 3 | false | 8 | 3 | avgt | 5 | 218.657 | ± 1.082 | ms/op |

Creating a new table in the default distribution zone:

| Benchmark | (clusterSize) | (fsync) | (partitionCount) | (replicaCount) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|---|---|
| createTableInDefaultZone | 3 | false | 1 | 1 | avgt | 5 | 1003.333 | ± 15.817 | ms/op |
| createTableInDefaultZone | 3 | false | 1 | 3 | avgt | 5 | 2102.834 | ± 716.554 | ms/op |
| createTableInDefaultZone | 3 | false | 8 | 1 | avgt | 5 | 1004.364 | ± 2.833 | ms/op |
| createTableInDefaultZone | 3 | false | 8 | 3 | avgt | 5 | 2138.579 | ± 882.148 | ms/op |

Thank you for submitting the pull request.

To streamline the review process of the patch and ensure better code quality
we ask both an author and a reviewer to verify the following:

The Review Checklist

  • Formal criteria: TC status, codestyle, mandatory documentation. Also make sure to complete the following:
    - There is a single JIRA ticket related to the pull request.
    - The web-link to the pull request is attached to the JIRA ticket.
    - The JIRA ticket has the Patch Available state.
    - The description of the JIRA ticket explains WHAT was made, WHY and HOW.
    - The pull request title is treated as the final commit message. The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
  • Design: new code conforms with the design principles of the components it is added to.
  • Patch quality: patch cannot be split into smaller pieces, its size must be reasonable.
  • Code quality: code is clean and readable, necessary developer documentation is added if needed.
  • Tests code quality: test set covers positive/negative scenarios, happy/edge cases. Tests are effective in terms of execution time and resources.

Notes

@sk0x50 sk0x50 requested review from sanpwc and rpuch January 21, 2025 07:31
@@ -90,25 +90,35 @@ public void nodeSetUp() throws Exception {
startCluster();

try {
var queryEngine = igniteImpl.queryEngine();
// Create a default zone on the cluster's start-up.
createDefaultZoneOnStartup();

It seems there is a clash in terminology. There is a default zone that is created by the cluster on initialization automatically (implicitly). Here, another zone is created explicitly, so it probably should not be named a 'default zone'. How about just createZoneOnStartup()?

var createZoneStatement = "CREATE ZONE IF NOT EXISTS " + ZONE_NAME + " WITH partitions=" + partitionCount()
+ ", replicas=" + replicaCount() + ", storage_profiles ='" + DEFAULT_STORAGE_PROFILE + "'";

getAllFromCursor(

I wonder why we don't create the zone via the public SQL API. Does it make sense to ask the guys who wrote this initially and, if there is no good reason for it, to switch to the public API? That would make it harder to accidentally break something later.

private int replicaCount;

/** Distribution zones counter. */
private AtomicInteger cnt = new AtomicInteger();

Suggested change
- private AtomicInteger cnt = new AtomicInteger();
+ private final AtomicInteger cnt = new AtomicInteger();

@OutputTimeUnit(MILLISECONDS)
public void createEmptyDistributionZone() {
ZoneDefinition zone = ZoneDefinition.builder("zone_test_" + cnt.incrementAndGet())
.ifNotExists()

Why do we need this? Could it mask a programming error if we try to create a zone that already exists?
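
To illustrate the concern, here is a minimal sketch using a toy in-memory catalog. `ZoneCatalogSketch` and its `createZone` method are hypothetical stand-ins and do not reflect Ignite's catalog implementation. It shows how an "if not exists" flag turns a name collision into a silent no-op, which in a benchmark would be timed as if a zone had been created.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy in-memory stand-in for a zone catalog (hypothetical, not Ignite's code). */
class ZoneCatalogSketch {
    private final Set<String> zones = new HashSet<>();

    /**
     * Returns true if the zone was created, false if it already existed and
     * ifNotExists suppressed the error.
     */
    boolean createZone(String name, boolean ifNotExists) {
        if (zones.add(name)) {
            return true;
        }
        if (ifNotExists) {
            // Silent no-op: nothing was created, yet the caller sees no error.
            return false;
        }
        throw new IllegalStateException("Zone already exists: " + name);
    }

    public static void main(String[] args) {
        ZoneCatalogSketch catalog = new ZoneCatalogSketch();
        AtomicInteger cnt = new AtomicInteger();

        // Unique names from the counter: the call really creates a zone.
        System.out.println(catalog.createZone("zone_test_" + cnt.incrementAndGet(), false));

        // Reusing a name with ifNotExists: no exception, but also no new zone.
        System.out.println(catalog.createZone("zone_test_1", true));
    }
}
```

Since the benchmark already generates unique names from the counter, dropping the flag would make an accidental collision fail loudly instead of being measured as a cheap no-op.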

*/
@Fork(1)
@State(Scope.Benchmark)
public class CreatingDistributionZoneBenchmark extends AbstractMultiNodeBenchmark {

AbstractMultiNodeBenchmark initializes the cluster with TestIgnitionManager.init(), which uses test defaults for things like delayDuration, idleSafeTimePropagationInterval, maxClockSkew. Test defaults for these values are ridiculously low; benchmarking with them has some value, as it allows us to (almost) exclude waits imposed by the schema sync protocol.

But maybe we also need to benchmark with real defaults? There is a workaround: you can pass TestIgnitionManager#PRODUCTION_CLUSTER_CONFIG_STRING as the cluster config to instruct TestIgnitionManager#init() NOT to apply test defaults.

Maybe we could have a boolean parameter in the benchmark, like tinySchemaSyncWaits? If it is true, we could keep current behavior; otherwise, we could use production defaults to make schema sync look real.
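
A rough sketch of how such a switch could select the cluster config. Everything here is an assumption made for illustration: the class name, the placeholder constant (the real value of `TestIgnitionManager#PRODUCTION_CLUSTER_CONFIG_STRING` is not shown in this PR), and the convention that a `null` config means "apply test defaults". In the actual benchmark the flag would presumably be a JMH `@Param({"true", "false"})` field.

```java
/** Hypothetical helper that maps the proposed flag to a cluster config. */
class SchemaSyncConfigSketch {
    // Placeholder standing in for TestIgnitionManager#PRODUCTION_CLUSTER_CONFIG_STRING;
    // this is NOT the real value.
    static final String PRODUCTION_CLUSTER_CONFIG_STRING = "<production-config-placeholder>";

    /**
     * Assumption: returning null lets the test harness apply its tiny test
     * defaults, while the production marker tells it to leave defaults alone.
     */
    static String clusterConfiguration(boolean tinySchemaSyncWaits) {
        return tinySchemaSyncWaits ? null : PRODUCTION_CLUSTER_CONFIG_STRING;
    }

    public static void main(String[] args) {
        System.out.println("tiny waits -> " + clusterConfiguration(true));
        System.out.println("production -> " + clusterConfiguration(false));
    }
}
```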

public void createTableInDefaultZone() {
String tableName = "table_test_" + cnt.incrementAndGet();

createTable(tableName);

This method should be switched to the public API; otherwise, it's not clear what we measure here.

@Measurement(iterations = 5, time = 5)
@BenchmarkMode(AverageTime)
@OutputTimeUnit(MILLISECONDS)
public void createTableInDefaultZone() {

How about having a boolean parameter that would tell whether a put should be made or not? We would be able to see the gap between creating an empty table and it becoming ready for puts.
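
One possible shape for this, sketched against a hypothetical `TableOps` interface rather than the real Ignite API. The interface, the `putAfterCreate` flag name, and the recording fake are all assumptions for illustration; the benchmark body would do the same two steps against a real cluster.

```java
import java.util.ArrayList;
import java.util.List;

class CreateTableWithPutSketch {
    /** Hypothetical abstraction over the operations being measured. */
    interface TableOps {
        void createTable(String name);

        void put(String tableName, int key, String value);
    }

    /** Measured body: create the table and, optionally, perform the first put. */
    static void createTableAndMaybePut(TableOps ops, String tableName, boolean putAfterCreate) {
        ops.createTable(tableName);

        if (putAfterCreate) {
            // Including the first put shows when the table is actually usable.
            ops.put(tableName, 1, "value");
        }
    }

    /** Fake implementation that only records which operations were invoked. */
    static class RecordingOps implements TableOps {
        final List<String> calls = new ArrayList<>();

        @Override
        public void createTable(String name) {
            calls.add("create " + name);
        }

        @Override
        public void put(String tableName, int key, String value) {
            calls.add("put " + tableName);
        }
    }

    public static void main(String[] args) {
        RecordingOps ops = new RecordingOps();
        createTableAndMaybePut(ops, "table_test_1", false);
        createTableAndMaybePut(ops, "table_test_2", true);
        System.out.println(ops.calls);
    }
}
```

Comparing the two parameter values would give the gap between "table created" and "table ready for puts".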

*/
@Fork(1)
@State(Scope.Benchmark)
public class CreatingTableBenchmark extends AbstractMultiNodeBenchmark {

Same thing about test defaults
