Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] improve graph performance #1391

Closed
wants to merge 9 commits into from

Conversation

pderaaij
Copy link
Collaborator

@pderaaij pderaaij commented Sep 2, 2024

Based on some performance findings found #1375 I started to experiment with some improvements. I've noticed the functionlistByIdentifier being called alot by other functions. In this function, there is a foreach loop running over all resources. A function which works fine for small notebooks, but large notebooks seems to be hit exponentionally.

In this concept I've added a new state object that lists all resources for an identifier. In this case, I choose the identifier to be the file-name of the note. Using that identifier, I list all resource paths with that same identifier. Using this I can remove the foreach loop and just use the identifier to list a small set of resources.

For the 10000 linked notes folder from this repo (https://github.com/rcvd/interconnected-markdown/tree/main/Markdown) this reduces the startup time to:

Foam Logging: info
[info - 21:39:06] Starting Foam
[info - 21:39:07] Loading from directories:
[info - 21:39:07] - /Users/paulderaaij/projects/interconnected-markdown/Markdown/10000s/10000
[info - 21:39:07]   Include: **/*
[info - 21:39:07]   Exclude: **/.foam/**,**/.vscode/**/*,**/_layouts/**/*,**/_site/**/*,**/node_modules/**/*,**/.git,**/.svn,**/.hg,**/CVS,**/.DS_Store,**/Thumbs.db
[info - 21:39:15] Workspace loaded in 7755ms
[info - 21:39:15] Graph loaded in 660ms
[info - 21:39:15] Tags loaded in 4ms
[info - 21:39:15] Loaded 10001 resources

Using the current master the numbers are:

Foam Logging: info
[info - 21:46:53] Starting Foam
[info - 21:46:53] Loading from directories:
[info - 21:46:53] - /Users/paulderaaij/projects/interconnected-markdown/Markdown/10000s/10000
[info - 21:46:53]   Include: **/*
[info - 21:46:53]   Exclude: **/.foam/**,**/.vscode/**/*,**/_layouts/**/*,**/_site/**/*,**/node_modules/**/*,**/.git,**/.svn,**/.hg,**/CVS,**/.DS_Store,**/Thumbs.db
[info - 21:47:01] Workspace loaded in 8147ms
[info - 21:47:27] Graph loaded in 25513ms
[info - 21:47:27] Tags loaded in 2ms
[info - 21:47:27] Loaded 10001 resources

A delta of almost 25 seconds for loading the graph. Not only a performance improvement for loading the workspace, but also for some other operations that uses listByIdentifier directly or via find. This should make several UI components work better and more responsive.

The downside is that it adds a new stateful object, but I don't see an other way to increase the performance of the foreach loop.

@riccardoferretti curious to your thoughts on this.

}
}

if (needle === '/page b.md') {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to remove this ;-)

@pderaaij pderaaij changed the title [WIP] Add state for identifier to reduce foreach [WIP] improve graph performance Sep 2, 2024
@riccardoferretti
Copy link
Collaborator

This is amazing @pderaaij ! I am back from holidays (you know, italy + august = holiday) so will look a bit more into the PR this week

@riccardoferretti
Copy link
Collaborator

Sharing some high level thoughts here.
As I was doing some research in the approach, I came across the trie data structure, which fundamentally feels like what we are kinda implementing here.
As I looked into it, I realized that using the reversed path (similar to what we do in getShortestIdentifier) and saving in in a trie would be the cleanest solution, as it would out of the box replace the existing indexing while giving us the features and performance we want.

What do you think? Again, this was at a cursory look, I might be missing something

@pderaaij
Copy link
Collaborator Author

pderaaij commented Sep 6, 2024

Never really looked into the trie datastructure. Will do some research and fiddle around!

@pderaaij
Copy link
Collaborator Author

I need a bit more time to figure out the right approach. Have been looking into tie data structures, but most implementation I found are immutable. A characteristic we don't want for our use case here. So, I need to find an implementation that does allow mutations. Did a quick checkout of https://rimbu.org, but couldn't find a right data structure off the bat for our use case. So, I require some more time to look around and find a solution. Ideas welcome of course.

@pderaaij
Copy link
Collaborator Author

I got a long way tonight using https://yomguithereal.github.io/mnemonist/trie-map. Need to finish some things and get the last use cases working. Will do a new performance test when things are working. Hopefully I can finish it later this week.

@pderaaij
Copy link
Collaborator Author

@riccardoferretti I've updated the implementation with my WIP so far.

It has a little impact on performance. Compared to the earlier version, I now load the 10000s workspace in 2 to 3 seconds. Which still is a massive improvement compared to the current implementation. I do feel the performance impact is acceptable given the ease of maintenance this implementation brings in the code.

I have one remaining issue and I hope you can help. Originally, a Map holds the insertion order. TrieMap obviously does not do that. But, this means that resources() needs to return a sorted Iterator. I can't find a correct way to sort the values and then return an IteratableIterator. Perhaps you have an idea here?

If we can resolve this issue, all tests will be green so we can do some further testing and enhancing before shipping it.

@pderaaij
Copy link
Collaborator Author

pderaaij commented Oct 9, 2024

@riccardoferretti it seems I was on the right track, but simply not sharp after a long day of work. Got it working now. Please review and check accordingly.

Copy link
Collaborator

@riccardoferretti riccardoferretti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is looking like the right approach, I left a few comments. Let me know what you think

packages/foam-vscode/src/core/model/workspace.ts Outdated Show resolved Hide resolved
packages/foam-vscode/src/core/model/workspace.ts Outdated Show resolved Hide resolved
packages/foam-vscode/src/core/model/workspace.ts Outdated Show resolved Hide resolved
this._resources.values()
).sort(Resource.sortByPath);

return resources.values();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the .values() again on this line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems so. The only way I can see to return an IterableIterator

@@ -70,17 +79,24 @@ export class FoamWorkspace implements IDisposable {
}

public listByIdentifier(identifier: string): Resource[] {
const needle = normalize('/' + identifier);
let needle = this.getReversedIdentifier(identifier);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you moved the addition of \ from here into the getReverseIdentifier fn. I wonder if there is any side effect from doing that?

Tbh, I am kinda relying on the tests passing here, I find this code (the existing one, that I wrote :) ) hard to properly process, I think at some point I will need a round to make it clearer for my future self and others :)\

I think fundamentally the semantics we want here is that resources can be found by either the identifier as is and by the identifier + default extension. Maybe we save both keys in the Trie.

In any case, just sharing some thoughts, I am ok with moving forward with the PR before sorting this out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, I haven't found any side effect. I guess it was done originally to always have a start of a path.

I agree with optimising the logic on a next round. Also to solve #1399.

packages/foam-vscode/src/core/model/workspace.ts Outdated Show resolved Hide resolved
packages/foam-vscode/src/core/model/workspace.ts Outdated Show resolved Hide resolved
packages/foam-vscode/src/core/model/workspace.ts Outdated Show resolved Hide resolved
@pderaaij pderaaij force-pushed the performance-upgrade branch from 87807e9 to 958a03c Compare October 11, 2024 17:53
@pderaaij
Copy link
Collaborator Author

@riccardoferretti somehow I managed to break the build in CI. Locally all works, but can't get it going on Actions. Could you have a look?

@riccardoferretti
Copy link
Collaborator

@pderaaij I think it's related to the markdown-it version change

@pderaaij
Copy link
Collaborator Author

Yeah, I was hoping that would fix it. Had the problems after rebasing the lock files. Which was a hassle on its own....

@pderaaij
Copy link
Collaborator Author

Well... I found the little bugger:

markdown-it/linkify-it#111

@pderaaij
Copy link
Collaborator Author

I am giving up on this PR. Going to do a fresh branch on master and open a new PR. Not trusting all the mess this has created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants