OAK-11165 - Cache the Path field in the NodeDocument class. #1758
base: trunk
Are there any concerns about increased memory usage from caching the Path object in memory in every case? I understand it can improve the performance of the indexing job, but it could increase memory consumption during normal operation of the repository. Have you evaluated this impact? Maybe we could have a way to enable or disable caching of this object depending on the needs: for example, enable it for indexing but keep it disabled elsewhere, at least until we assess any possible impact.
I have not evaluated the potential increase in memory usage in other situations; I was hoping someone more knowledgeable about this module would be able to comment. The main place where I can think of memory usage increasing is the NodeDocument cache. If [...] I was hoping that in most cases the lifetime of the Path instance created by calling [...] I'm OK with a way to enable/disable this feature; it could be a static flag set from an environment variable, disabled by default, that we would enable only for indexing jobs. This optimization could also benefit other uses of Oak, but I'm not sure how to evaluate that.
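The opt-in flag suggested above could look roughly like the following. This is only a sketch under stated assumptions: the property name, the environment variable name, and the `PathCacheFlag` and `FlaggedDocument` classes are hypothetical and are not part of Oak, and the path is modelled as a plain `String` cut out of an id assumed to have the form `depth:path`.

```java
// Hypothetical feature flag: the names below are invented for illustration only.
final class PathCacheFlag {
    static final String PROPERTY = "example.cacheNodeDocumentPath";
    static final String ENV_VAR = "EXAMPLE_CACHE_NODE_DOCUMENT_PATH";

    // The system property wins over the environment variable.
    // Boolean.parseBoolean(null) is false, so the feature is disabled by default.
    static boolean resolve(String propertyValue, String envValue) {
        if (propertyValue != null) {
            return Boolean.parseBoolean(propertyValue);
        }
        return Boolean.parseBoolean(envValue);
    }

    // Evaluated once when the class is loaded.
    static final boolean ENABLED =
            resolve(System.getProperty(PROPERTY), System.getenv(ENV_VAR));
}

// Stand-in for a document type; NOT Oak's NodeDocument.
final class FlaggedDocument {
    private final String id;
    private final boolean cacheEnabled;
    private String cachedPath;   // stays null while caching is disabled
    int computeCount;            // exposed only so the effect can be observed

    FlaggedDocument(String id, boolean cacheEnabled) {
        this.id = id;
        this.cacheEnabled = cacheEnabled;
    }

    String getPath() {
        if (!cacheEnabled) {
            return computePath();        // original behaviour: recompute every call
        }
        String p = cachedPath;
        if (p == null) {
            p = computePath();
            cachedPath = p;
        }
        return p;
    }

    // Assumes an id of the form "<depth>:<path>", a simplification for this sketch.
    private String computePath() {
        computeCount++;
        return id.substring(id.indexOf(':') + 1);
    }
}
```

An indexing job would then construct documents with `PathCacheFlag.ENABLED` set via `-Dexample.cacheNodeDocumentPath=true`, while other deployments keep the default and pay no extra memory per cached document.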
The reason why calling
I think there is a very high likelihood that
Compared to the "data" map, the memory usage of the path is small. When estimateMemoryUsage was implemented, the memory usage was estimated once, and the overheads (at that time, with an old JVM) are listed: 112 bytes overhead for just an entry, 64 bytes per map entry.
Creating a Path instance from a NodeDocument is an expensive operation, so it is worth caching. The performance improvement is noticeable in the indexing job, which calls #getPath twice on each NodeDocument that it processes. This is a similar idea to what is done here:
jackrabbit-oak/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/Path.java
Lines 297 to 306 in 35950be
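For readers without the diff at hand, the general idea of caching a derived value on first access can be sketched as below. This is not the code from this PR or from the referenced lines of Path.java: `SketchPath` and `SketchDocument` are hypothetical stand-ins, and the id format `depth:path` is an assumption made for the example. The field is written without synchronization, which is safe here only because `SketchPath` is immutable with final fields; in the worst case two threads each parse once and either result is equally valid.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for an immutable path type; NOT Oak's Path class.
final class SketchPath {
    private final List<String> elements;

    private SketchPath(List<String> elements) {
        this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
    }

    // Splits "/a/b" into ["a", "b"]; this is the "expensive" step being avoided.
    static SketchPath parse(String s) {
        List<String> parts = new ArrayList<>();
        for (String e : s.split("/")) {
            if (!e.isEmpty()) {
                parts.add(e);
            }
        }
        return new SketchPath(parts);
    }

    int depth() {
        return elements.size();
    }

    @Override
    public String toString() {
        return "/" + String.join("/", elements);
    }
}

// Stand-in for a document type; NOT Oak's NodeDocument.
final class SketchDocument {
    private final String id;
    private SketchPath path;   // null until the first getPath() call
    int parseCount;            // exposed only so the caching can be observed

    SketchDocument(String id) {
        this.id = id;
    }

    SketchPath getPath() {
        SketchPath p = path;   // read the field once into a local
        if (p == null) {
            // Assumes an id of the form "<depth>:<path>", a simplification.
            p = SketchPath.parse(id.substring(id.indexOf(':') + 1));
            parseCount++;
            path = p;          // publish for later calls
        }
        return p;
    }
}
```

With this shape, a caller that invokes `getPath()` twice per document, as the indexing job is described as doing, parses the id once and receives the same instance on the second call, at the cost of one extra reference field per document plus the retained path object.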