Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster ID generation #3818

Merged
merged 36 commits into from
Nov 11, 2024
Merged

Faster ID generation #3818

merged 36 commits into from
Nov 11, 2024

Conversation

lbloder
Copy link
Collaborator

@lbloder lbloder commented Oct 23, 2024

📜 Description

As proposed in #3809, this PR implements a faster way to generate IDs used in Sentry.
In addition to the proposal two methods were added to directly generate an ID String without the need to call two methods:

  • SentryUUID.generateSentryId()
  • SentryUUID.generateSpanId()

SentryUUID.generateSpanId() only generates the first part of a UUID as before we would create a full uuid and then only use the first 16 out of 32 characters. Therefore we only need to create 8 random bytes instead of 16.

💡 Motivation and Context

Implements #3809

💚 How did you test it?

📝 Checklist

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

lbloder and others added 19 commits October 7, 2024 10:52
…e difference of .equals if objects are not the same
Copy link
Contributor

github-actions bot commented Oct 23, 2024

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against 5c2c580


/**
* Utility class for faster id generation for SentryId, SpanId, and unique filenames uses throughout
* the SDK It uses our vendored Random class instead of SecureRandom for improved performance It
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it doesn't 😅

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet :)
Sorry, wanted to add a comment here in the PR, that we need to first merge the main branch to 8.x.x and then to this one

package io.sentry;

import java.util.Arrays;
import java.util.Random;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import java.util.Random;
import io.sentry.util.Random;

* @throws IllegalArgumentException if the given character sequence does not conform to the string
* representation as described in {@link UUID#toString()}
*/
public static UUID parseUUID(final CharSequence uuidSequence) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we even need this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume no, as long as we use String values in the SDK.

Comment on lines 279 to 282
randomBytes[6] &= 0x0f; /* clear version */
randomBytes[6] |= 0x40; /* set to version 4 */
randomBytes[8] &= 0x3f; /* clear variant */
randomBytes[8] |= (byte) 0x80; /* set to IETF variant */
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is taken from java.uitl.UUID.
Do we need this part? Do we want/need to add those flags to our random ids?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to clarify internally. Will update here when I get an answer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep it in. Also see https://develop.sentry.dev/sdk/data-model/event-payloads/#required-attributes

There's some validations for IDs, e.g.:

Let's verify those are compatible.

It's expecting a 16/32 char hexstring that isn't all zeroes. To avoid this even in case random generates an all 0 ID, we should keep the version / variant bits set.

* @param uuid the UUID to represent as a string
* @return a string representation of the given UUID
*/
public static String toString(final UUID uuid) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns an acutal UUID String including -. Do we need this for our purposes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to replace all UUID usages that have - with simply using the String value without - as it should mostly be for our own IDs and some unique paths.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, will remove this method then.

@lbloder
Copy link
Collaborator Author

lbloder commented Oct 24, 2024

In addition to the proposal in #3809, which focused on the generation of a 32 char SentryId, this PR also aims to further improve the generation of SpanId which is only 16 char in length.
Previously a SpanId was generated like this UUID.randomUUID().toString().replace("-", "").substring(0, 16).

Therefore there is even more room for optimization here, as we do not actually need to generate a full UUID, e.g. 16 random bytes, but only 8 random bytes as only half of the resulting String is being used. Also, when generating only the needed 8 bytes the substring part can also be eliminated.

I did some benchmarks on this with the same parameters as the benchmarks in #3809 comparing the original way of generating a SpanId with the changes of this PR:

Java 17

Benchmark Score   Error    
generateSpanIdSecureRandomToStringReplaceAndSubstring16 2357463,435 ± 11044,652 ops/s 100%
generateSpanIdRandomFastUUIDString 28488104,485 ± 236823,660 ops/s 1208%

Java 8

Benchmark Score   Error    
generateSpanIdSecureRandomToStringReplaceAndSubstring16 1017070,653 ± 5762,135 ops/s 100%
generateSpanIdRandomFastUUIDString 28461132,111 ± 122920,961 ops/s 2798%

* @param uuid the UUID to represent as a string
* @return a string representation of the given UUID
*/
public static String toString(final UUID uuid) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to replace all UUID usages that have - with simply using the String value without - as it should mostly be for our own IDs and some unique paths.

Comment on lines 279 to 282
randomBytes[6] &= 0x0f; /* clear version */
randomBytes[6] |= 0x40; /* set to version 4 */
randomBytes[8] &= 0x3f; /* clear variant */
randomBytes[8] |= (byte) 0x80; /* set to IETF variant */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to clarify internally. Will update here when I get an answer.

Comment on lines 279 to 282
randomBytes[6] &= 0x0f; /* clear version */
randomBytes[6] |= 0x40; /* set to version 4 */
randomBytes[8] &= 0x3f; /* clear variant */
randomBytes[8] |= (byte) 0x80; /* set to IETF variant */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep it in. Also see https://develop.sentry.dev/sdk/data-model/event-payloads/#required-attributes

There's some validations for IDs, e.g.:

Let's verify those are compatible.

It's expecting a 16/32 char hexstring that isn't all zeroes. To avoid this even in case random generates an all 0 ID, we should keep the version / variant bits set.

}

private static class Holder {
static final Random numberGenerator = new Random();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://docs.oracle.com/javase/7/docs/api/java/util/Random.html there might be poor performance when using Random from multiple threads.

Instances of java.util.Random are threadsafe. However, the concurrent use of the same java.util.Random instance across threads may encounter contention and consequent poor performance. Consider instead using ThreadLocalRandom in multithreaded designs.

Should we vendor ThreadLocalRandom as well and use that?

Did you also do (performance) testing in a multi threaded environment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. No, didn't do multithreaded perf testing. could try to spin up something to compare Random to ThreadLocalRandom

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this does make quite a difference. Below is a comparison for UUID generation:

Benchmark Score   Error Units Threads Comparison
uuidGenerationRandom 23789834,814 ± 117697,18 ops/s 1 100%
uuidGenerationRandomMultiThreaded 2280766,845 ± 263522,36 ops/s 10 10%
uuidGenerationThreadLocalRandomMultiThreaded 185639075,638 ± 5090316,9 ops/s 10 780%

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whew, what's the baseline we're comparing to here? are all of those using the FastUUID code just with a different Random implementation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code from FastUUID is only responsible for converting the UUID to string. This was not done in this test.

I tested the SentryUUID.randomUUID() function, which is a clone of the java.util.UUID.randomUUID() function with java.util.Random as the random generator.

So, Results 1 and 2 test the above mentioned function in single-threaded vs multi-threaded scenarios.

Result 3 tests the same function with java.util.Random replaced with java.util.concurrent.ThreadLocalRandom. I replaced Random random = SentryUUID.Holder.numberGenerator; with Random random = ThreadLocalRandom.current();

Copy link
Collaborator Author

@lbloder lbloder Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the benchmark code can now be found here: https://github.com/lbloder/sentry-uuid-bench

* @throws IllegalArgumentException if the given character sequence does not conform to the string
* representation as described in {@link UUID#toString()}
*/
public static UUID parseUUID(final CharSequence uuidSequence) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume no, as long as we use String values in the SDK.

@lbloder lbloder marked this pull request as ready for review October 29, 2024 14:21
Copy link
Contributor

github-actions bot commented Oct 29, 2024

Performance metrics 🚀

  Plain With Sentry Diff
Startup time 449.27 ms 481.96 ms 32.69 ms
Size 1.65 MiB 2.31 MiB 672.61 KiB

Previous results on branch: feat/fast-id-generation

Startup times

Revision Plain With Sentry Diff
d909f73 455.98 ms 472.71 ms 16.74 ms
6449d66 435.18 ms 477.48 ms 42.30 ms
c46ade5 406.26 ms 446.62 ms 40.36 ms
470431c 431.02 ms 469.31 ms 38.29 ms
70c2f67 426.29 ms 460.26 ms 33.97 ms

App size

Revision Plain With Sentry Diff
d909f73 1.65 MiB 2.30 MiB 672.27 KiB
6449d66 1.70 MiB 2.36 MiB 671.75 KiB
c46ade5 1.65 MiB 2.30 MiB 672.27 KiB
470431c 1.70 MiB 2.36 MiB 671.77 KiB
70c2f67 1.70 MiB 2.36 MiB 671.63 KiB

@lbloder lbloder mentioned this pull request Oct 29, 2024
7 tasks
Base automatically changed from feat/lazy-span-id-v8 to 8.x.x October 29, 2024 14:58
An error occurred while trying to automatically change base from feat/lazy-span-id-v8 to 8.x.x October 29, 2024 14:58
@lbloder
Copy link
Collaborator Author

lbloder commented Oct 29, 2024

Did some additional Testing for MultiThreading based on this comment from @adinauer.

The results below show Test runs for:

  1. java.util.Random single-threaded
  2. java.util.random multi-threaded with 10 Threads
  3. java.util.concurrent.ThreadLocalRandom with 10 Threads
  4. A version in which we manage a ThreadLocal<java.util.Random> ourselves

Results

Benchmark Score   Error Units Threads Comparison  
uuidGenerationRandom 23829634,539 ± 101112,796 ops/s 1 100%  
uuidGenerationRandomMultiThreaded 2458588,180 ± 339397,214 ops/s 10 10%  
uuidGenerationThreadLocalRandomMultiThreaded 180884073,702 ± 2482343,414 ops/s 10 759% 157%
uuidGenerationThreadLocalRandomMultiThreadedSelfManaged 114937169,398 ± 1555655,255 ops/s 10 482% 100%

We can see, that ThreadLocalRandom is highly optimized, outpacing our self-managed version by ~50%. However, we still get an almost 5x increase compared to using a single Random instance.

Secure Random

Out of curiosity, I also did a benchmark run to compare UUID.randomUUID in single-threaded and multi-threaded scenarios:

Benchmark Score   Error Units Threads Comparison
uuidGenerationSecureRandom 3229714,852 ± 13916,035 ops/s 1 100%
uuidGenerationSecureRandomMultiThreaded 1567325,134 ± 36688,679 ops/s 10 49%

We can see, that SecureRandom performance is also impacted when running in multiple threads, however, not nearly as much as Random.

# Conflicts:
#	CHANGELOG.md
#	sentry/src/main/java/io/sentry/SpanId.java
#	sentry/src/main/java/io/sentry/protocol/SentryId.java
#	sentry/src/test/java/io/sentry/protocol/SentryIdTest.kt
@adinauer adinauer merged commit 741807a into 8.x.x Nov 11, 2024
30 checks passed
@adinauer adinauer deleted the feat/fast-id-generation branch November 11, 2024 04:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants