Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor / performance: Track scope information while traversing #609

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

Declspeck
Copy link
Contributor

@Declspeck Declspeck commented Feb 24, 2018

This PR makes the resolution of variable types go forwards instead of backwards. It keeps an active scope. In my opinion, this makes the code more understandable (although the number of lines increased.) Code duplication is removed between completion for variables and resolving them in DefinitionResolver.

Additionally, the scope is used to cache results of getResolvedName, since it is quite slow, at least with the current Tolerant PHP Parser. This cache is invalidated when a namespace declaration is met.

There are no tests for Scope yet, in case there are severe issues with the general approach.

Edit: This increases performance of Performance.php by ~10% on my machine.

@codecov
Copy link

codecov bot commented Feb 24, 2018

Codecov Report

Merging #609 into master will increase coverage by 0.38%.
The diff coverage is 88.31%.

@@             Coverage Diff              @@
##             master     #609      +/-   ##
============================================
+ Coverage     81.63%   82.02%   +0.38%     
+ Complexity      910      882      -28     
============================================
  Files            61       65       +4     
  Lines          2075     2097      +22     
============================================
+ Hits           1694     1720      +26     
+ Misses          381      377       -4
Impacted Files Coverage Δ Complexity Δ
src/SignatureInformationFactory.php 100% <100%> (ø) 10 <6> (ø) ⬇️
src/SignatureHelpProvider.php 98.66% <100%> (ø) 26 <0> (ø) ⬇️
src/Scope/GetScopeAtNode.php 100% <100%> (ø) 0 <0> (?)
src/Scope/Variable.php 100% <100%> (ø) 1 <1> (?)
src/Scope/TreeTraverser.php 100% <100%> (ø) 37 <37> (?)
src/Scope/Scope.php 100% <100%> (ø) 4 <4> (?)
src/CompletionProvider.php 98.4% <100%> (+3.88%) 75 <0> (-33) ⬇️
src/Server/TextDocument.php 75.37% <75%> (ø) 56 <0> (ø) ⬇️
src/DefinitionResolver.php 86.11% <76.97%> (-1.22%) 299 <275> (-31)
src/TreeAnalyzer.php 92.85% <96.66%> (-1.43%) 47 <38> (-6)
... and 5 more

}

if ($node instanceof ClassLike
&& (in_array($childName, ['classMembers', 'interfaceMembers','traitMembers'], true))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to strictly compare here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really - I thought it would be good form, just like using === instead of == - do you think there's a reason to change it?

Copy link
Contributor

@jens1o jens1o Feb 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow my benchmarks are showing different results:
https://gist.github.com/jens1o/621c1307ee0b9d618839211688d46dba

PS C:\nginx\html\projects\php7.2-playground> php .\in_array_strict_benchmark.php
Took 0.42s with strict compare
Took 0.35s without strict compare
Took 5.47s with strict compare
Took 0.43s without strict compare
Took 0.54s with strict compare
Took 0.47s without strict compare

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your benchmark shows searching for an empty string which does not is the case here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic is it expexcted that strictly searching for any empty string via in_array is slower then when doing it non-strict?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, the in_array strict path is slightly less optimized than the non-strict one. Though if you want to micro-optimize this, the way to do it is to use \in_array (with strict), which will transform to an HT lookup. Or even better, just directly implement it as an HT lookup so you're not dependent on optimization behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't actually mean performance-wise but style-wise.

I think it's better because the code does not perform convoluted type casts under the hood. It's not that the code would be more incorrect that way, just that it feels that way - when I'm searching for a bug, I don't have to wonder if this comparison here does something funny.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... although I would have expected the strict version also to be faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that specific benchmark is rather useless, as the second array contains 0 as first element, which is (non-strict) equal to the empty string, so what you see there is the difference between matching on the first element in the array vs scanning through all 10000 elements and not finding anything.

Good point!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that strict comparison is always easier to reason about.



// TODO: Handle use (&$x) when $x is not defined in scope.
// TODO: Handle list(...) = $a;
Copy link
Contributor

@jens1o jens1o Feb 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and PHP7.1+ [...] = $a;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I didn't even know about that construct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added a TODO comment for that and foreach ($a as [...])

@@ -716,6 +603,7 @@ public function resolveExpressionNodeToType($expr)
if ($token === PhpParser\TokenKind::NullReservedWord) {
return new Types\Null_;
}
return new Types\Mixed_;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect this case to ever get hit? Or is it for future-proofing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were added accidentally (same goes the Types\Mixed_ below) - I tried various things in the same branch if they improved performance, including bailing early here. It did not really affect performance, but I apparently forgot to revert some changes. I can revert this.

$traverser = new TreeTraverser($definitionResolver);
$resultScope = null;
$traverser->traverse(
$sourceFile,
Copy link
Owner

@felixfbecker felixfbecker Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why you would want to traverse the whole source file from the root if you already have the node and every node is linked to its parent. This seems very inefficient, as you will visit a lot of nodes that you don't care about.

Intuitively, a scope is anything enclosed in by the closest function node, so only that should the nodes inside that boundary should be visited for any needed operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps Scope is a misnomer, since it also contains information about other names in effect - in particular the resolved name cache depends on the current namespace. Maybe ParsingContext could be more appropriate.

However, after reading your comment, I now realize that scanning the whole file is not necessary, since the code is not interested in what the current namespace is, only when it changes.

Also, the scope contains $this variable and $currentSelf, these depend on the class. These could be handled separately and start the scanning from the function like you said. It might be more efficient on large files.

class Scope
{
/**
* @var Variable|null "Variable" representing this/self
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...or null if the scope is not inside a class

use Microsoft\PhpParser\Node\QualifiedName;

/**
* Contains information about variables at a point.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate what "point" means here? How is the scope boundary defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scope when evaluating the expression at the node being traversed, if it were an expression.

... perhaps? I'm not sure how to express it clearly.

public function getResolvedName(QualifiedName $name)
{
$nameStr = (string)$name;
if (array_key_exists($nameStr, $this->resolvedNameCache)) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming there are no null values in the array so isset would be better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'll fix it. I just copy-pasted code from the DocBlock thingie.

/**
* Traversers AST with Scope information.
*/
class TreeTraverser
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser already exposes utilities to traverse the tree (and uses the iterator pattern instead of the visitor pattern, which is superior imo). Given that this is a fair amount of code, could you explain why this is needed? What is the difference between this scope-tree traverser and a general traverser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look into the parser code and see if we could use it - my initial guess is that we can't though, since the traversing needs to happen in the following fashion:

Scope = []                      //    Scope 1 - Root
    Scope = [$this]             //    Scope 2 - Class entered 
        Scope = [$this, $a]     //    Scope 3 - Function entered
    Scope = [$this]             // !! Scope 2 - Same scope again 
Scope = []                      // !! Scope 1 - Same scope again

The lines marked with !! - if we only get a callback per Node or Token, we cannot know when a scope is exited. Additionally, we'd need to manage a stack manually instead of relying on the call stack.

Further, we need information on the context of the element - if we have a BracedStatementList (or whatever it's called), we need to know if it is a functionBodyOrSemicolon to start a new Scope, so the Node alone is not sufficient.

If the parser does not provide a traverser that fits those two needs, I think that working around them might make the code more confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this, and getDescendantNodesAndTokens or getDescendantNodes cannot really be used, since we won't know when a node is exited and the scope should be restored.

I think that TreeTraverser could be converted into a generator yielding something - e.g. rename Scope to TraversingContext which contains both $node and $variables etc. Do you think that's worth pursuing?

(I accidentally posted this with a wrong GitHub account first - sorry about the noise)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So first of all I still think parent traversal is the better way to go.
And then getDescendentNodes afaik takes a callback to decide whether a node should be entered or not.
For traversal that needs full control over the recursive aspect I did a PR a while ago to tolerant-php-parser with RecursiveIterator support: microsoft/tolerant-php-parser#139
I didn't need it at the time though and it turned out it was a tad slower than the generators. It would be interesting to see how it compares to this TreeTraverser and the previous implementation, especially on latest PHP 7.2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RecursiveIterator thing certainly looks interesting. It would require manual handling of a stack though, if I understand it correctly.

I think that traversing the code backwards to find types has a few problems, which I tried to solve with this:

1. Duplicated work:

Find references on line 2:

1 $a = new A;     // 2.                    Ok, `$a is A`
2 $b = $a->foo(); // 1. Start traversing - need to find what $a is
3 $b->bar();

After that, find references on line 3:

1 $a = new A;     // 3. Duplicated work  - Ok, `$a is A`
2 $b = $a->foo(); // 2. Duplicated work  - Need to find out what $a is
3 $b->bar();      // 1. Start Traversing - need to find what $b is to get reference to bar

This could of course be solved with some sort of memoization. It would probably end up building a scope, but in reverse order.

2. Slowness of parsing backwards in Tolerant PHP Parser

Getting a previous sibling gets its parent, enumerates its children, and when it finds the original node, it returns the previously met one.

(3. Harder to understand - this is mostly a matter of preference maybe)

Since code is evaluated from top to bottom, an approach evaluating code from top to bottom is easier to understand, at least for me.

I'm going to sleep now, so I won't reply for a while. I might take a look at the other PRs and comments tomorrow.

Copy link
Owner

@felixfbecker felixfbecker Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Duplicated work:

I don't think this problem has anything to do with how to traverse. The difference is that this PR introduces a Scope object, where before this was always computed JIT. I never had any performance problems with find-refs on variables though because their scope is naturally small in PHP, so it's a CPU vs memory usage trade off.

  1. Slowness of parsing backwards in Tolerant PHP Parser
    Getting a previous sibling gets its parent, enumerates its children, and when it finds the original node, it returns the previously met one.

This is a good point. With PHPParser I added the siblings as properties so that was fast. We could do the same for tolerant-parser.


Are you maintaining the same behaviour as before with the new traverser?

I believe before, I would always look for the closest assignment to the variable, in case it got overridden:

1 $a = 123;
2 $a = 'abc';
3 $a // should jump to L2

For that I find it very natural to search backwards and confusing to search top-down. If you intend to get to L1, then I would agree that top-down is more natural, but perf-wise (leaving aside 2.) the assumption is that variables are defined close to the reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never had any performance problems with find-refs on variables though because their scope is naturally small in PHP, so it's a CPU vs memory usage trade off.

It might be that the performance benefits here come from caching getResolvedName results in the Scope object. I started working on this because XDebug profiling showed finding variable types as a significant cost. XDebug has very high overhead though, so it might be completely inaccurate. It might be worth trying to cache the getResolvedName results alone, especially if you don't like the Scope stuff.

Are you maintaining the same behaviour as before with the new traverser?

I believe so - the variables in Scope are overwritten when a new assignment is met, so on line 3 $a would be a string.

}

if ($node instanceof ClassLike
&& (in_array($childName, ['classMembers', 'interfaceMembers','traitMembers'], true))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that strict comparison is always easier to reason about.

@@ -175,10 +177,15 @@ private function getDocBlock(Node $node)
*
* @param Node $node
* @param string $fqn
* @param Scope|null $scope Scope at the point of Node. If not provided, will be computed from $node.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you ensure that the scope passed in is the scope of the node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to think about this a bit more.

One thing that comes to mind is that the Scope could contain information about the Node or FunctionLike it is defined at? The added check would have a performance penalty (which I guess might be significant), since you'd need to find the closest FunctionLike, ClassLike, or SourceFileNode ancestor every time you wanted to check if the scope belongs to a node.

@@ -726,6 +614,7 @@ public function resolveExpressionNodeToType($expr)
if ($def !== null) {
return $def->type;
}
return new Types\Mixed_;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixfbecker This is not related to this PR, I left it here by accident. Should I revert it or let it stay?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants