perf(NODE-6450): Lazy objectId hex string cache #722
Hi @SeanReece, thank you for submitting this pull request! As of now, we won't explore fully supporting lazy loading hex string cache until the next major release (see NODE-6480) for the following reasons:
For the time being, when |
Thanks for the review @aditi-khare-mongoDB. Good point about deep equality. That's the same thing holding up my other PR (#707) as well 😅 Being able to use By using a WeakMap instead of # to create an actual private __id I don't see any performance hit, so that's one option if we don't want to bump our target anytime soon. Another option is to provide a I can split the PR to implement the Edit: This is what I'm thinking in terms of using WeakMap for private members (if moving to ES2022 is not an option). Performance seems to be good with this implementation. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakMap#emulating_private_members
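For readers following along, here is a minimal sketch of the WeakMap approach to emulating a private member, in the spirit of the MDN page linked above. It is not the code from this PR: the class name `CachedId`, the module-level `hexCache`, and the `bytes` field are hypothetical names chosen for illustration.

```typescript
// Hypothetical module-level cache. Keys are held weakly, so an entry goes away
// once the instance that owns it is garbage collected.
const hexCache = new WeakMap<CachedId, string>();

// Illustrative stand-in for an ObjectId-like class (an assumption, not js-bson).
class CachedId {
  readonly bytes: Uint8Array;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  toHexString(): string {
    let hex = hexCache.get(this);
    if (hex === undefined) {
      // First call: convert the bytes and remember the result outside the instance.
      hex = '';
      for (let i = 0; i < this.bytes.length; i++) {
        const b = this.bytes[i];
        hex += (b < 16 ? '0' : '') + b.toString(16);
      }
      hexCache.set(this, hex);
    }
    return hex;
  }
}
```

Because the cached string lives in the WeakMap rather than on the instance, it never appears as an own property, so property enumeration on the instance only sees `bytes`.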
Hi again @SeanReece, thanks for sharing your thoughts and replying so promptly! Next week, the team will consider your points, discuss the best path forward, and get back to you.
Hey @SeanReece, thanks for bearing with us on this PR. Private properties implemented with WeakMaps work for us, if you're willing to make those changes. We'll need to figure out a way to test the caching behavior though; the existing tests rely on accessing the TS-private
@baileympearson great! I don’t mind implementing this. Good point about the tests. I’ll investigate how to best make these changes and make them testable 🤔
@baileympearson I've updated the code to hide __id using a WeakMap implementation. There is a slight performance hit vs direct property access, but it's still faster than using # with an ES2021 target, and there are still meaningful performance gains vs main. I've added a // TODO to convert this to a # private field when the target is updated to ES2022. As for the tests, I've added a Let me know if you have any questions or concerns about anything here.
Perf tests: main vs lazy-weakMap
One small change - otherwise LGTM!
@baileympearson Good catch. I've added it back. Any idea when v7 is planned for?
@SeanReece We don't have a target date for BSON v7 or driver v7.
Co-authored-by: Bailey Pearson <[email protected]>
Description
When ObjectId.cacheHexString is enabled, there is a performance impact on ObjectId creation, and the extra memory used by the hex string is consumed even if .toHexString() is never called on the ObjectId.
Lazily caching the hex string provides the following benefits:
Possible Negatives:
.toHexString() invocation. Overall performance is still consistent, but it is no longer front-loaded on ObjectId creation.
Performance
Custom benchmarks with cacheHexString=true
The new ObjectId(string) + toHexString() test sees a performance improvement because we were performing an unnecessary .toHex when a string is passed in.
Running granular benchmarks with cacheHexString=true
What is changing?
When cacheHexString=true we no longer call ByteUtils.toHex(buffer) in the constructor; this was already being checked/performed in the toHexString() call. If a string is passed into the constructor, we can cache it immediately, as there is no performance impact and no extra memory that needs to be allocated.
Is there new documentation needed for these changes?
No.
What is the motivation for this change?
We are often pulling many ObjectIds into memory (possibly millions) and need to deduplicate arrays of ObjectIds. In order to efficiently deduplicate ObjectIds (using Set) we need the string representation of each ObjectId, and then, when serializing documents to JSON, we need that string representation again. Converting an ObjectId to a hex string is a somewhat expensive operation that we'd like to avoid repeating where possible.
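To make the deduplication scenario concrete, here is a small sketch assuming only that each id exposes a `toHexString()` method. The `HexIdLike` interface and the `dedupeByHex` helper are hypothetical names introduced for illustration; they are not part of the library.

```typescript
// Hypothetical minimal shape: anything that can produce its hex string.
interface HexIdLike {
  toHexString(): string;
}

// Keep the first occurrence of each distinct id, comparing by hex string.
// A Set of object references would not work here, because two distinct
// instances holding the same bytes are different keys.
function dedupeByHex<T extends HexIdLike>(ids: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const id of ids) {
    const hex = id.toHexString(); // one conversion (or cache hit) per id
    if (!seen.has(hex)) {
      seen.add(hex);
      unique.push(id);
    }
  }
  return unique;
}
```

Every id is converted once here and, without caching, converted again when the surviving documents are later serialized to JSON.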
Enabling cacheHexString would improve the efficiency of these specific scenarios, but would lead to a performance impact and increased memory usage across the board for other scenarios that may never need the ObjectId string.
Implementing lazy hex string caching will allow us to enable cacheHexString with zero performance impact, while still seeing performance improvements in some scenarios.
Release Highlight
Cache the hex string of an ObjectId lazily
When ObjectId.cacheHexString is set to true, we no longer convert the buffer to a hex string in the constructor, since the cache is already being filled in any call to objectid.toHexString().
Additionally, if a string is passed into the constructor, we can cache it immediately, as there is no performance impact and no extra memory that needs to be allocated.
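The behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the js-bson source: `LazyId`, `hexToBytes`, `bytesToHex`, and `hasCachedHex` are hypothetical names, and the static `cacheHexString` flag only mirrors the role of `ObjectId.cacheHexString`.

```typescript
// Hypothetical helper: decode a validated hex string into bytes.
function hexToBytes(hex: string): Uint8Array {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Hypothetical helper: encode bytes as a lowercase hex string.
function bytesToHex(bytes: Uint8Array): string {
  let hex = '';
  for (let i = 0; i < bytes.length; i++) {
    hex += (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16);
  }
  return hex;
}

// Illustrative stand-in for an ObjectId-like class (an assumption).
class LazyId {
  // Opt-in flag playing the role of ObjectId.cacheHexString.
  static cacheHexString = false;

  readonly bytes: Uint8Array;
  private cachedHex?: string;

  constructor(input: string | Uint8Array) {
    if (typeof input === 'string') {
      if (!/^[0-9a-fA-F]{24}$/.test(input)) {
        throw new TypeError('expected a 24-character hex string');
      }
      this.bytes = hexToBytes(input);
      // The hex form was handed to us, so caching it needs no conversion.
      if (LazyId.cacheHexString) this.cachedHex = input.toLowerCase();
    } else {
      if (input.length !== 12) throw new TypeError('expected 12 bytes');
      this.bytes = input;
      // No conversion here: the cost is deferred to the first toHexString() call.
    }
  }

  toHexString(): string {
    if (this.cachedHex !== undefined) return this.cachedHex;
    const hex = bytesToHex(this.bytes);
    if (LazyId.cacheHexString) this.cachedHex = hex;
    return hex;
  }

  // Test-only accessor in this sketch, so the caching can be observed.
  hasCachedHex(): boolean {
    return this.cachedHex !== undefined;
  }
}
```

In this sketch, an id built from bytes pays for the conversion only if `toHexString()` is ever called, while an id built from a string has its cache filled at construction without any conversion work.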
This improves performance in situations where you are parsing ObjectIds from a string (e.g. JSON) and want to avoid recalculating the hex. It also improves situations where you have ObjectIds coming from BSON and only convert some of them to strings, perhaps after applying a filter to eliminate some.
With cacheHexString enabled, deserializing ObjectIds from BSON shows a ~80% performance improvement, and toString-ing ObjectIds that were constructed from a string is ~40% faster!
Thanks to @SeanReece for contributing this improvement!
Double check the following
npm run check:lint script
type(NODE-xxxx)[!]: description
feat(NODE-1234)!: rewriting everything in coffeescript