
First try on benchmark CI #315

Merged: 1 commit into JuliaGraphs:benchx on Nov 28, 2023

Conversation

@filchristou (Contributor) commented Nov 16, 2023

#310

Options

As mentioned in the Discourse thread https://discourse.julialang.org/t/recommendation-for-ci-benchmark-to-catch-regression/104698, the available options for CI benchmarking are:

Implementation

In this approach PkgBenchmark.jl and BenchmarkCI.jl are used.

Benchmarks

Regarding the benchmarks in the ./benchmark/* directory, I did nothing but translate them into a compatible formulation.
This means that a SUITE::BenchmarkGroup must be defined in ./benchmark/benchmark.jl.
All of the options above need the same form (with minor changes for benchmark-action/github-action-benchmark).
This means we can continue authoring more benchmarks with the security that the code stays reusable even if the CI approach changes in the future.
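For concreteness, a minimal sketch of what such a file could look like (the group names and the single benchmark are purely illustrative, not the actual contents of this PR):

    # ./benchmark/benchmark.jl (illustrative sketch)
    using BenchmarkTools
    using Graphs

    SUITE = BenchmarkGroup()              # the top-level SUITE::BenchmarkGroup
    SUITE["core"] = BenchmarkGroup()
    # time nv() on a freshly built graph; the setup code is not measured
    SUITE["core"]["nv"] = @benchmarkable nv(g) setup=(g = complete_graph(100))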

CI

The CI logic is described in the single file /.github/workflows/benchmark.yaml.
There are two paths (sketched below): one if the event is a pull_request and one if it is a push.

If the event is a pull_request, a comparison is made (using BenchmarkCI.judge) between the base and the target refs.
This also posts a summary in the PR as a comment.
The base and target benchmarks are uploaded as artifacts in the workflow.

If the event is a push (or a merge), the benchmarks for that commit are run (using PkgBenchmark.benchmarkpkg).
A median estimate of the benchmarks is uploaded as an artifact in the workflow.
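In rough Julia terms, the two paths boil down to something like the following (a simplified sketch; the exact invocations live in benchmark.yaml and may differ):

    using BenchmarkCI, PkgBenchmark

    if get(ENV, "GITHUB_EVENT_NAME", "") == "pull_request"
        # Run the suite on the target and base refs, then post the comparison
        # as a PR comment; both result sets become workflow artifacts.
        BenchmarkCI.judge()
        BenchmarkCI.postjudge()
    else
        # push / merge: run the whole suite once for this commit and keep the
        # results, from which a median estimate is uploaded as an artifact.
        results = PkgBenchmark.benchmarkpkg(pwd())
        PkgBenchmark.writeresults("benchmark-results.json", results)
    end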

Further discussion

Strategy

Benchmarks are just running on ubuntu-latest and julia-v1.
I don't see any reason to populate the workflow with more configurations (operating systems or Julia versions).

Artifacts uploads

Artifact uploads remain available for only 90 days.
After that, the only things that remain are the comments posted in the PRs.
If we want to store all benchmark results, we can discuss some alternatives:

  • uploading them to another GitHub repo
  • uploading them as part of the current repo (the files need to be small; that's the thinking behind the median estimates)
  • uploading them to another cloud service

I would like that to be part of this repo, since it's way easier to manage, but the data is also completely useless for users who install the package.

Permissions and approval

Right now the PR benchmark runs every time a new update is pushed to the branch.
That might not be necessary.
I think it would be better to have some kind of manual trigger to start the machinery once other, more obvious issues are already resolved.
Also, it is insecure to allow everyone to run or modify the workflow, but I think that's already solved at the organization level.

Single and multithread

Right now both the single-threaded and the multi-threaded benchmark code is given 2 threads.
We can separate that (see the sketch below).
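For example, a possible way to separate them later would be PkgBenchmark's BenchmarkConfig (this is not what the current workflow does, just a sketch):

    using PkgBenchmark

    serial_config   = BenchmarkConfig(env = Dict("JULIA_NUM_THREADS" => 1))
    parallel_config = BenchmarkConfig(env = Dict("JULIA_NUM_THREADS" => 2))

    benchmarkpkg(pwd(), serial_config)    # run the suite single-threaded
    benchmarkpkg(pwd(), parallel_config)  # run the suite with 2 threads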

Example

See how a PR can look: filchristou#3
which triggers this workflow run:
https://github.com/filchristou/Graphs.jl/actions/runs/6889672830
After the PR is merged, the benchmarks run again:
https://github.com/filchristou/Graphs.jl/actions/runs/6889854479

Overall, I cannot promise it will work perfectly right away, since I was testing things in my fork and not on the main or a protected branch. Also, I am not very familiar with the Graphs.jl way of doing things, so I guess we can tune several preferences in the future.

@filchristou (Contributor Author) commented:

For experimentation reasons we can make a benchx branch here actually.


codecov bot commented Nov 16, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (cc3052f) 97.26% compared to head (e1e7882) 97.26%.
Report is 1 commit behind head on benchx.

❗ Current head e1e7882 differs from pull request most recent head bf18af6. Consider uploading reports for the commit bf18af6 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           benchx     #315   +/-   ##
=======================================
  Coverage   97.26%   97.26%           
=======================================
  Files         115      115           
  Lines        6795     6795           
=======================================
  Hits         6609     6609           
  Misses        186      186           

☔ View full report in Codecov by Sentry.

@gdalle (Member) left a comment

Thank you so much for taking this on! The general setup seems okay to me. I didn't review the benchmark contents in detail because that's not the point of this PR: let's first get it working.

suite = BenchmarkGroup()
include("core.jl")
serialbenchmarks = [
"serial/core.jl",
Member:

use joinpath?

Contributor Author:

Ah yeah, with benchdir = dirname(@__FILE__) and then joinpath(benchdir, ...). Yeah, that's the way it should be.
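Something like this (a minimal sketch based on the diff context above):

    benchdir = dirname(@__FILE__)
    include(joinpath(benchdir, "core.jl"))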

@@ -0,0 +1 @@
[New single-line JSON file caching the BenchmarkTools Parameters (samples, evals, time/memory tolerances, seconds) for every group in the benchmark suite; the full contents are omitted here.]
Member:

do we need this?

Contributor Author:

Yes. The reason is described in the Caching Parameters section of the BenchmarkTools.jl docs (see the sketch below). Basically we win twofold:

  • tuning is slow, and it doesn't have to happen every time we benchmark.
  • better consistency, since the same evaluations/samples are used in every benchmark run.
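For reference, the caching pattern from the BenchmarkTools docs looks roughly like this (the tune.json file name and the SUITE variable are assumed from this PR):

    using BenchmarkTools
    # load the cached parameters instead of re-tuning on every run
    paramsfile = joinpath(@__DIR__, "tune.json")
    loadparams!(SUITE, BenchmarkTools.load(paramsfile)[1], :evals, :samples)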

Member:

Can we find a way to automate the re-generation of the JSON every time the suite object is updated? Otherwise people will forget.
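A hypothetical helper script for that could look like the following (file names are assumed; nothing like this is part of the PR yet):

    # e.g. ./benchmark/retune.jl
    using BenchmarkTools
    include("benchmark.jl")        # defines SUITE
    tune!(SUITE)                   # re-tune evals/samples for every benchmark
    BenchmarkTools.save(joinpath(@__DIR__, "tune.json"), params(SUITE))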

"complete100" => complete_digraph(100), "path500" => path_digraph(500)
)

GRAPHS = Dict{String,Graph}(
const GRAPHS = Dict{String,Graph}(
Member:

IIUC benchmarking is always done on all of these, and the total time is reported?

Contributor Author:

Yes. The benchmark results are accumulated across all Graphs and DiGraphs. I find it too granular to get benchmarks for each one of the graphs; for me, it's better to add them up (see the sketch below).
Now, the const is not really needed, I think, since the graph is always interpolated into the benchmark, but it also does no harm.
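To illustrate the accumulation idea (a sketch only; the graph names and the benchmarked function are illustrative):

    using BenchmarkTools, Graphs

    GRAPHS = Dict{String,Graph}(
        "complete100" => complete_graph(100),
        "path500"     => path_graph(500),
    )

    # The dict is interpolated with $, so the reported time is the total over
    # all graphs rather than one result per graph.
    bench = @benchmarkable begin
        for g in values($GRAPHS)
            connected_components(g)
        end
    end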

@gdalle (Member) commented Nov 21, 2023

Benchmarks are just running on ubuntu-latest and julia-v1.

Agreed

Artifact uploads remain available for only 90 days.

It would be useful to track the performance of the main branch over longer periods of time. For pull requests, however, we don't really care. Where are benchmarks for Julia itself stored, for example?

Right now the PR benchmark runs every time a new update is pushed to the branch.

Can we base this on a GitHub label, like benchmark, which repo admins would add to a PR when it's nearly ready?

For experimentation reasons we can make a benchx branch here actually.

Knock yourself out!

@gdalle (Member) commented Nov 21, 2023

I'd be curious to have @simonschoelly's opinion on the general approach

@filchristou (Contributor Author) commented:

Can we base this on a GitHub label, like benchmark, which repo admins would add to a PR when it's nearly ready?

I believe that is technically possible. Maybe the label should be more like to-benchmark, because with benchmark there might be confusion that the PR itself was made for performance/benchmarking reasons?

@filchristou (Contributor Author) commented:

The PR is failing because the benchmark baseline (i.e. the master branch) doesn't follow the requirements. I believe this is okay, and I wouldn't want to introduce complicated if statements just for this first PR. From the second PR on, once the master branch is properly structured, there should be no problem.
But as I said before, we can test it further in a separate benchx branch.

@filchristou (Contributor Author) commented:

Where are benchmarks for Julia itself stored, for example?

In a separate GitHub repo:
https://github.com/JuliaCI/NanosoldierReports

@gdalle (Member) commented Nov 23, 2023

See JuliaGraphs/JuliaGraphs-meta#12 for the roadmap

@filchristou changed the base branch from master to benchx on November 28, 2023, 09:29
(WIP)
TODO: store results and PR label triggering
@filchristou (Contributor Author) commented:

I merged into the branch with the following TODOs pending before pushing to master:

  • make sure consecutive PR benchmarks work as intended
  • find a way to trigger PR benchmarks (e.g. via labels)
  • long-term storage of benchmark results upon push to master (GitHub post or another repo)

@filchristou merged commit b274d35 into JuliaGraphs:benchx on Nov 28, 2023
1 check failed
filchristou added a commit that referenced this pull request Mar 15, 2024
* First try on benchmark CI (#315)

(WIP)
TODO: store results and PR label triggering

* Test consecutive PR (#346)

Merge all new advancements on the `benchx` branch in order to do a follow up consecutive PR to test github actions

* Trivial change, test PR (#347)

* Trivial change, test PR

* Add PR write permission

* Test full permissions

* Granular permissions

* try write-all

* try label event

* labeled in pull_request

* Store results as artifact and trigger a consecutive workflow_run

* Workflow chain almost complete (WIP)

* correct .benchmarkci path

* Add benchx support branch

* Just try pull_request_target instead

* Remove target

* Get rid of s to get the chain going

---------

Co-authored-by: Guillaume Dalle <[email protected]>

* Simplify for master merge

* Fixed formatting

* Integrate review comments

---------

Co-authored-by: Guillaume Dalle <[email protected]>