Cache DocBlocks #608
Codecov Report
@@ Coverage Diff @@
## master #608 +/- ##
============================================
+ Coverage 81.63% 81.67% +0.04%
- Complexity 910 914 +4
============================================
Files 61 62 +1
Lines 2075 2085 +10
============================================
+ Hits 1694 1703 +9
- Misses 381 382 +1
public function getDocBlock(Node $node)
{
    $cacheKey = $node->getStart() . ':' . $node->getUri();
    if (array_key_exists($cacheKey, $this->cache)) {
It seems this is a rather hot path, so using `if (isset(...) || array_key_exists(...))` could speed it up further.
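For context on the suggestion: `isset()` is a language construct and reports `false` both for missing keys and for keys whose value is `null`, so the `array_key_exists()` fallback is only reached in those two cases. The sketch below illustrates the semantics only; the `lookup` helper and the sample keys are hypothetical and not part of this PR, and whether the combination is measurably faster here would need benchmarking (PHP 7.4 and later already compile `array_key_exists()` to a dedicated opcode).

```php
<?php
// Hypothetical helper: returns [found, value] for a cache that may store null.
function lookup(array $cache, string $key): array
{
    // isset() is false for missing keys AND for keys mapped to null,
    // so array_key_exists() is only consulted in those two cases.
    if (isset($cache[$key]) || array_key_exists($key, $cache)) {
        return [true, $cache[$key]];
    }
    return [false, null];
}

$cache = ['10:file:///a.php' => 'docblock', '20:file:///a.php' => null];

var_dump(isset($cache['20:file:///a.php']));            // bool(false)
var_dump(array_key_exists('20:file:///a.php', $cache)); // bool(true)
```

With `isset()` alone, the second entry would be treated as a miss and recomputed on every call.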
Interesting - I could try how that affects the performance, but I'm not too optimistic. I think that forming the cache key dominates the performance impact of the function.
Is there a reason why `isset` alone is not sufficient? `$cache` doesn't have a type declaration unfortunately, but I assume it would not contain `null` values?
It has `null`s at least for fields where there is no comment text, but the cost for those fields is probably not big, since the comment is never parsed. But there was the following comment in the code, which is the primary reason I added this - I have not verified if that is still the case though:
try {
// create() throws when it thinks the doc comment has invalid fields.
// For example, a @see tag that is followed by something that doesn't look like a valid fqsen will throw.
return $this->docBlockFactory->create($text, $context);
} catch (\InvalidArgumentException $e) {
return null;
}
Maybe we could use a NullObject instead of `null` in these cases?
I'm not familiar with the `NullObject` pattern - so just define `class NullObject {}` and store those in the array, without `NullObject`s ever leaving the cache class? I'm not sure if that would make the code clearer here, since there would have to be a check for `instanceof NullObject` when reading the cache.
A NullObject would only use unneeded memory and put pressure on the GC.
Not with
class NullObject {
public static function getInstance(): self {
static $instance;
return $instance ?? ($instance = new self);
}
}
However, I'm still not convinced this would make the code clearer, especially since nullness only needs to be checked in one place.
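To make the trade-off in this thread concrete, here is a minimal sketch of a cache that stores a shared sentinel instead of `null`, so that `isset()` alone detects a hit. `SentinelCache` and the `$parse` callback are hypothetical names used for illustration; they are not code from this PR.

```php
<?php
// Hypothetical sentinel standing in for "this node has no usable DocBlock".
final class NullObject
{
    public static function getInstance(): self
    {
        static $instance;
        return $instance ?? ($instance = new self);
    }
}

// Hypothetical cache wrapper; $parse stands in for the real DocBlock parsing.
final class SentinelCache
{
    /** @var array<string, mixed> */
    private $cache = [];

    public function get(string $key, callable $parse)
    {
        if (!isset($this->cache[$key])) {
            // Store the sentinel instead of null so isset() alone detects a hit.
            $this->cache[$key] = $parse() ?? NullObject::getInstance();
        }
        $value = $this->cache[$key];
        // The sentinel never leaves the cache: translate it back to null.
        return $value instanceof NullObject ? null : $value;
    }
}

$calls = 0;
$cache = new SentinelCache();
$parse = function () use (&$calls) {
    $calls++;
    return null; // simulate a doc comment that fails to parse
};
$first = $cache->get('10:file:///a.php', $parse);
$second = $cache->get('10:file:///a.php', $parse);
```

The single shared instance avoids the per-entry allocation raised above, but the `instanceof` check on read remains, which is the readability cost mentioned in the last comment.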
        return $this->cache[$cacheKey];
    }
    $text = $node->getDocCommentText();
    return $this->cache[$cacheKey] = $text === null ? null : $this->createDocBlockFromNodeAndText($node, $text);
This is hard to read, could you add parentheses?
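For reference, the line parses the same way with or without parentheses, because `===` binds more tightly than `?:`, which in turn binds more tightly than `=`. A small sketch of the explicit grouping, where `parse()` is a hypothetical stand-in for `createDocBlockFromNodeAndText()`:

```php
<?php
// Hypothetical stand-in for createDocBlockFromNodeAndText().
function parse(string $text): string
{
    return strtoupper($text);
}

$cache = [];
$text = 'summary';

// As written in the diff: "=" binds more loosely than "?:", and "===" more tightly.
$a = $cache['a'] = $text === null ? null : parse($text);

// The same expression with the grouping made explicit.
$b = ($cache['b'] = (($text === null) ? null : parse($text)));
```

Both forms store the result in the cache and yield it, so the parentheses only help the reader.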
    /**
     * Maps file + node start positions to DocBlocks.
     */
    private $cache = [];
Please add an `@var` annotation.
@@ -346,6 +310,11 @@ public function resolveReferenceNodeToFqn(Node $node)
        return null;
    }

    public function clearCache()
Add a docblock.
@@ -141,6 +141,10 @@ public function __construct(ProtocolReader $reader, ProtocolWriter $writer)
                $e
            );
        }

        // When a request is processed, clear the caches of definition resolver as not to leak memory.
        $this->definitionResolver->clearCache();
I don't understand why the cache needs to be cleared after every single protocol message. This is a very hot code path that is even hit in the middle of indexing. Could you explain?
My intent was to
- clear the cache when some user action causes the cache to be filled to avoid memory leaks, and
- not have to worry about invalidating the cache on user edits.
You're right, the code does not fulfill that intent. I also did not realize that it could be hit in the middle of indexing - I imagine that this might cause some weird behavior if a user edit is performed on a file that is being parsed.
It shouldn't happen while a file is parsing, but whenever control is yielded back to the event loop (definitely between indexing files) or getting file contents.
The problem I have is that a cache is not useful if it is constantly cleared when it doesn't have to be. For example, moving the mouse over a document can trigger thousands of hover requests within milliseconds. All of these would cause the whole cache to be cleared every time.
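The effect described here can be reproduced with a toy model. `CountingCache` and `simulate()` below are hypothetical and only count how often a "parse" happens when the cache is cleared after every message versus never; they do not reflect the language server's real request handling.

```php
<?php
// Hypothetical memoizing parser that counts how often real parsing happens.
final class CountingCache
{
    public $parses = 0;
    private $cache = [];

    public function get(string $key): string
    {
        if (!isset($this->cache[$key])) {
            $this->parses++;                 // simulate an expensive DocBlock parse
            $this->cache[$key] = "doc:$key";
        }
        return $this->cache[$key];
    }

    public function clear(): void
    {
        $this->cache = [];
    }
}

// Simulate 1000 hover requests over the same symbol.
function simulate(bool $clearAfterEachMessage): int
{
    $cache = new CountingCache();
    for ($i = 0; $i < 1000; $i++) {
        $cache->get('42:file:///a.php');
        if ($clearAfterEachMessage) {
            $cache->clear();
        }
    }
    return $cache->parses;
}

echo simulate(true), "\n";  // 1000: every request re-parses
echo simulate(false), "\n"; // 1: parsed once, then served from the cache
```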
Yeah - it currently only speeds things up when the same docblock is parsed multiple times on the same request. If it's moved into `PhpDocument`, this issue could also be resolved.
@felixfbecker - I started to move this to `PhpDocument` and found that it was rather difficult to pass `PhpDocument` to all functions requiring access to docblocks.
Should I
- Pass `PhpDocument` as an argument individually to all `DefinitionResolver` functions requiring it, even if it requires adding the same argument in many places.
- Have a per-document `DefinitionResolver`, containing a reference to `PhpDocument`.
- Have `PhpDocumentFactory` in `DefinitionResolver`, and access the document through the URI of a node. This would require some changes though, since `PhpDocument` parses in its constructor, so it would not be accessible via `PhpDocumentFactory` during the initial parse.
- Something else?
I see. Maybe create a dedicated class for getting docblocks that caches them? The only constraint is that the memory needs to be disposed when a PhpDocument is disposed.
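One possible shape for such a class, as a sketch only: entries are grouped by document URI so that everything belonging to one document can be released in a single step. `PerDocumentDocBlockCache`, `forget()`, and `size()` are hypothetical names, and `$parse` stands in for the real DocBlock parsing; this is not the `CachingDocBlockFactory` from the diff.

```php
<?php
// Hypothetical sketch, not the class from this PR: entries are grouped by
// document URI so that one document's entries can be dropped together.
final class PerDocumentDocBlockCache
{
    /** @var array<string, array<int, mixed>> URI => (node start => DocBlock or null) */
    private $byUri = [];

    public function get(string $uri, int $start, callable $parse)
    {
        // array_key_exists() so that a memoized null still counts as a hit.
        if (!isset($this->byUri[$uri]) || !array_key_exists($start, $this->byUri[$uri])) {
            $this->byUri[$uri][$start] = $parse();
        }
        return $this->byUri[$uri][$start];
    }

    // Intended to be called when the corresponding document is disposed.
    public function forget(string $uri): void
    {
        unset($this->byUri[$uri]);
    }

    public function size(): int
    {
        return array_sum(array_map('count', $this->byUri));
    }
}

$cache = new PerDocumentDocBlockCache();
$cache->get('file:///a.php', 10, function () { return 'A10'; });
$cache->get('file:///a.php', 20, function () { return null; });
$cache->get('file:///b.php', 10, function () { return 'B10'; });
$sizeBefore = $cache->size();
$cache->forget('file:///a.php');
$sizeAfter = $cache->size();
```

Whoever disposes a `PhpDocument` would still have to remember to call `forget()`, which is the coupling the constraint above is about.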
class CachingDocBlockFactory
{
    /**
     * Maps file + node start positions to DocBlocks.
In the pre-tolerant parser version I used to save the docblock instance on the Node itself, so they would naturally get cached and garbage collected when the document gets garbage collected. The `CachingDocBlockFactory` instance is saved in the DefinitionResolver, which is a long-living singleton-like object. Why not save this map on `PhpDocument` instances? That way it is not a cache that needs to be cleared on seemingly unrelated events like new messages, it is simply a memoization store to remember lazily computed docblocks. It might actually be just as fast to eagerly compute them all because we need to do that anyway to get the hover contents for the index.
This makes a lot of sense. Perhaps the name resolution cache from the Scope thing could also live in `PhpDocument` then?
Is there something I can do to help get this into master?
Any updates?
@felixfbecker I wonder whether you still have enough power (given the circumstances) to maintain this project? If so, I'd maybe invest some time in this in the following weeks. I'd be happy to hear back from you. :)
This PR adds caching for DocBlocks in `DefinitionResolver`. The cache is cleared after each file is parsed and after each server operation to avoid memory leaks. This makes indexing ~17% faster on my machine.
Performance.php before (total 76 sec):
Performance.php after (total 63 sec):