Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the ability to force the use of an index #22559

Merged
merged 16 commits into from
Oct 22, 2024
Merged

Conversation

snake14
Copy link
Contributor

@snake14 snake14 commented Sep 5, 2024

Description:

Added the ability to force the use of an index when aggregating logs. This is to try and improve efficiency when the query engine doesn't always automatically select the correct index.

Internal ticket: PG-3720

Review

@snake14
Copy link
Contributor Author

snake14 commented Sep 8, 2024

Thank you for the input @michalkleiner . I have committed your suggested changes.

@sgiehl
Copy link
Member

sgiehl commented Sep 9, 2024

@snake14 Did you check the resulting queries on a bigger database using different segment types and also on mysql and mariadb? We need to ensure that forcing an index doesn't slow down anything. Also mysql and mariadb might behave differently in terms of query optimisation.
Maybe there could also be other ways to get mysql to use the index, e.g. by changing order of where conditions or similar.
In the end we need some reliable numbers to be able to merge that.

@snake14
Copy link
Contributor Author

snake14 commented Sep 9, 2024

@snake14 Did you check the resulting queries on a bigger database using different segment types and also on mysql and mariadb? We need to ensure that forcing an index doesn't slow down anything. Also mysql and mariadb might behave differently in terms of query optimisation. Maybe there could also be other ways to get mysql to use the index, e.g. by changing order of where conditions or similar. In the end we need some reliable numbers to be able to merge that.

@sgiehl I haven't tested these specific queries with a large database, but a similar change has been live for the Funnels plugin for a few months. This PR is based on those findings. I suppose I could put this feature behind a config, have it deployed to Cloud using a patch, and profile a customer's archiving process. What do you think @patrickli ?

@patrickli
Copy link

yeah sure why not

Copy link
Contributor

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@github-actions github-actions bot added the Stale The label used by the Close Stale Issues action label Sep 23, 2024
@snake14
Copy link
Contributor Author

snake14 commented Sep 23, 2024

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

These changes have been put into a patch for Cloud. It's been merged and should be deployed this week. We'll then activate the changes for a customer, see if we can profile their archiving process, and compare to previous profiling results.

@github-actions github-actions bot removed the Stale The label used by the Close Stale Issues action label Sep 24, 2024
Copy link
Contributor

github-actions bot commented Oct 1, 2024

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@github-actions github-actions bot added the Stale The label used by the Close Stale Issues action label Oct 1, 2024
@snake14
Copy link
Contributor Author

snake14 commented Oct 7, 2024

@sgiehl The change didn't appear to make a difference for a couple plugins, but it appeared to make an improvement for CustomVariables. Please see this comment on our internal ticket. I believe that profiling was done using the site of one of our larger Cloud customers. Is that good enough for merging or is there something else that you would like to see tested?

I have PRs for the UsersFlow and MultiChannelConversionAttribution plugins which I could also patch to Cloud and compare, but I don't know if that's necessary. They might already be using the index similar to the first two plugins I tested. This is mainly to ensure that the expected index is always used.

@github-actions github-actions bot removed the Stale The label used by the Close Stale Issues action label Oct 8, 2024
@snake14
Copy link
Contributor Author

snake14 commented Oct 14, 2024

@sgiehl Just trying to check in again if we'd be able to merge this PR? Cloud testing went well. When profiling a customer's site with the index change enabled/disabled, two plugins had no noticeable difference (likely already using index) while one plugin had a reasonable improvement in performance. As no harm to performance was noticed and there was some improvement, I think that we're probably safe to merge this, especially since it's only used on a case-by-case basis and not all queries.

@sgiehl
Copy link
Member

sgiehl commented Oct 15, 2024

@snake14 should we remove the config option to enable/disable that again? If it was well tested and it works as expected, I don't think the config option will be of use anymore.

@snake14
Copy link
Contributor Author

snake14 commented Oct 15, 2024

@snake14 should we remove the config option to enable/disable that again? If it was well tested and it works as expected, I don't think the config option will be of use anymore.

@sgiehl I'd kind of like to keep it if possible. I've currently got several plugin PRs checking for method_exists(SettingsPiwik::class, 'isForceSiteDateIndexEnabled') so that we don't have to increase the minimum required Matomo version for the plugin.

@sgiehl
Copy link
Member

sgiehl commented Oct 17, 2024

@snake14 I still don't see any value in keeping that config check code. If there is no reason for the need to disable it, then there is no need to keep the code.
And in terms of plugins I don't see the problem. Didn't you 'only' introduce an additional parameter to a method? So if you always pass that new parameter to the method, the parameter would be simply ignored from Matomo versions where it was not yet defined, right?

@snake14
Copy link
Contributor Author

snake14 commented Oct 17, 2024

@sgiehl That makes sense. I don't see any harm in leaving it and defaulting it to on in the future, though.

I don't really like the idea of making method calls that don't match the method signature; maybe it's just from my years working with a more strict language like Java. However, I have other plugin PRs that are using LogAggregator->generateQuery() directly and need to be able to know whether or not to pass the from table as an array or not. Otherwise, it would cause type exceptions. I was trying to avoid making more Core changes as this PR has taken quite a bit of time already.

@sgiehl
Copy link
Member

sgiehl commented Oct 17, 2024

In your plugins you can also simply check for the Matomo version this is included in

@snake14
Copy link
Contributor Author

snake14 commented Oct 17, 2024

In your plugins you can also simply check for the Matomo version this is included in

@sgiehl Good point. I forgot that I had Matomo version checks in my PRs before I switched to the config. I have removed the config.

@snake14 snake14 requested a review from sgiehl October 21, 2024 01:34
@sgiehl sgiehl added this to the 5.2.0 milestone Oct 22, 2024
Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me now.

@sgiehl sgiehl merged commit b8cbe71 into 5.x-dev Oct 22, 2024
25 of 26 checks passed
@sgiehl sgiehl deleted the PG-3720-allow-force-index branch October 22, 2024 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants