Add functionality to train TTM on multiple repos #603
Comments
Currently the workflow also fails for repositories with more than ~120 PRs (redhat-et/time-to-merge-tool#4), even when the GitHub API rate limit is not reached. Explore replacing the data collection notebook with a script, to rule out timeouts caused by Jupyter notebook cells running for too long.
Also evaluate the size of the generated dataset and the storage space available on the GitHub workflow worker.
Currently the workflow succeeds on repos with ~400 PRs. During the workflow the GitHub API token got rate limited, but the workflow kept running and the data download resumed when the API rate limit was restored after an hour.
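For reference, the pause-and-resume behaviour on rate limiting could be made explicit in a collection script roughly as follows. This is only a sketch, not the tool's current code: the function name `seconds_until_reset` and the `buffer_s` parameter are hypothetical, and it assumes the response headers `X-RateLimit-Remaining` and `X-RateLimit-Reset` (epoch seconds) that the GitHub REST API returns.

```python
import time


def seconds_until_reset(headers, now=None, buffer_s=5):
    """Return how long to sleep before the next request (0.0 if quota remains).

    `headers` is a mapping of response headers from the last GitHub API call.
    """
    now = time.time() if now is None else now
    # If the header is missing, assume quota is still available.
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is the epoch second at which the quota is refilled.
    reset_at = int(headers.get("X-RateLimit-Reset", now))
    # Small buffer (an assumption) to avoid waking up slightly too early.
    return max(0.0, reset_at - now) + buffer_s
```

A script would call `time.sleep(seconds_until_reset(response.headers))` before the next request, so a rate-limited run waits instead of failing.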
Triggered workflows on larger repos with ~750 PRs; awaiting their results.
As a next step, I will modify the data collection step to collect PRs across an organization and monitor a sample workflow.
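One possible shape for organization-wide collection is sketched below. It is an illustration under assumptions, not the planned implementation: `paginate`, `iter_org_pull_requests`, and the injected `get_json(url, params)` callable are hypothetical names, and the sketch relies on the GitHub REST endpoints `/orgs/{org}/repos` and `/repos/{owner}/{repo}/pulls` with their `per_page`, `page`, and `state` query parameters.

```python
def paginate(get_json, url, per_page=100, **extra_params):
    """Yield items from a paginated list endpoint until an empty page is returned."""
    page = 1
    while True:
        params = {"per_page": per_page, "page": page, **extra_params}
        items = get_json(url, params)
        if not items:
            return
        yield from items
        page += 1


def iter_org_pull_requests(get_json, org, api="https://api.github.com"):
    """Yield (repo_name, pr_number) for every PR in every repo of an organization."""
    for repo in paginate(get_json, f"{api}/orgs/{org}/repos"):
        name = repo["name"]
        # state="all" includes closed and merged PRs, which are the ones
        # a time-to-merge model needs; the API default is open PRs only.
        for pr in paginate(get_json, f"{api}/repos/{org}/{name}/pulls", state="all"):
            yield name, pr["number"]
```

Passing the HTTP call in as `get_json` keeps authentication and rate-limit handling in one place and lets the iteration logic be tested without network access.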
Example of jobs with a large number of PRs:
While running, the workflow pauses several times upon reaching the rate limit. However, each time it failed after running for around 6 hrs, with an
The workflow fails each time after 6 hrs, even if the job is still running. This seems to be in line with GitHub's usage limits, which do not allow jobs within a workflow to run longer than 6 hrs.
Even if we split a workflow into multiple workflows or jobs, we will need to save intermediate results from the first job. However, for repos such as
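To make the idea of saving intermediate results concrete, here is a minimal checkpointing sketch. It assumes PR data is JSON-serializable and that a job stops itself before the 6 hr limit; `collect_with_checkpoint`, `fetch_pr`, `checkpoint_path`, and `budget_s` are hypothetical names. Persisting the checkpoint file somewhere that outlives the worker (storage bucket, artifact, etc.) is not shown.

```python
import json
import os
import time


def collect_with_checkpoint(pr_numbers, fetch_pr, checkpoint_path, budget_s,
                            clock=time.monotonic):
    """Fetch PRs until done or until the time budget is used up.

    Returns (results, finished). Already-saved PRs are skipped on a rerun.
    """
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)  # resume from a previous job's results

    start = clock()
    for number in pr_numbers:
        key = str(number)  # JSON object keys are always strings
        if key in done:
            continue
        if clock() - start >= budget_s:
            break  # stop cleanly before the job would be killed
        done[key] = fetch_pr(number)
        # Write to a temp file and rename, so an interrupted job never
        # leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)

    finished = all(str(n) in done for n in pr_numbers)
    return done, finished
```

A follow-up job could call the same function with the same checkpoint file and would only fetch the PRs that are still missing; with a budget of, say, 5.5 hrs (an arbitrary choice), each job would end before hitting the limit.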
Add functionality to train the time-to-merge model on multiple repos or a GitHub org.