"Sync Repository | Wait for finish the repository sync" throws "Timeout exceeded" #393

Open
bogdanmuresan opened this issue May 16, 2024 · 4 comments
Labels
bug Something isn't working inactive new

Comments

@bogdanmuresan
Summary

"Sync Repository | Wait for finish the repository sync" throws "Timeout exceeded" after about 16 minutes (1000 seconds).

Should the async value be increased or made configurable?
https://github.com/ansible/galaxy_collection/blob/devel/roles/collection_repository_sync/tasks/main.yml#L19

16 minutes seems like a small amount of time for syncing the remote repos. We have a scheduled job running it every 3 days, and sometimes the community one takes 30-40 minutes to complete.

Thank you.

Issue Type

  • Bug Report
@bogdanmuresan bogdanmuresan added bug Something isn't working new labels May 16, 2024
@sean-m-sullivan
Contributor

I want to make sure we are on the same page: have you set the following variables, and does the async job still die once the 1000-second timeout is reached?
https://github.com/ansible/galaxy_collection/tree/devel/roles/collection_repository_sync#asynchronous-retry-variables

If you have a LONG list of repos, you may also need to increase the request timeout, ah_request_timeout.

I am for adding that as a variable; I just want to make sure that is indeed the problem.

@bogdanmuresan
Author

bogdanmuresan commented May 16, 2024

Hi Sean,

I am not 100% sure that is the problem, but based on the Ansible docs (https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_async.html#run-tasks-concurrently-poll-0) it seems to be. My understanding from that paragraph is that when poll is 0, the timeout happens when the task runs for longer than the async value.

This is our config:

---
ah_collection_repositories:
  - name: rh-certified
    remote: rh-certified
    wait: true
    timeout: 3600
  - name: community
    remote: community
    wait: true
    timeout: 3600

ah_configuration_collection_repository_sync_async_retries: 360
ah_configuration_collection_repository_sync_async_delay: 10

Our AAP job fails on attempt 99:

{
  "started": 1,
  "finished": 1,
  "stdout": "",
  "stderr": "",
  "stdout_lines": [],
  "stderr_lines": [],
  "ansible_job_id": "j447553509370.765",
  "results_file": "/home/runner/.ansible_async/j447553509370.765",
  "msg": "Timeout exceeded",
  "child_pid": 769,
  "invocation": {
    "module_args": {
      "jid": "j447553509370.765",
      "mode": "status",
      "_async_dir": "/home/runner/.ansible_async"
    }
  },
  "_ansible_no_log": false,
  "attempts": 99,
  "changed": false,
  "__collection_repository_sync_job_async_result_item": {
    "failed": 0,
    "started": 1,
    "finished": 0,
    "ansible_job_id": "j447553509370.765",
    "results_file": "/home/runner/.ansible_async/j447553509370.765",
    "changed": false,
    "__collection_repository_sync_item": {
      "name": "rh-certified",
      "remote": "rh-certified",
      "wait": true,
      "timeout": 3600
    },
    "ansible_loop_var": "__collection_repository_sync_item"
  },
  "ansible_loop_var": "__collection_repository_sync_job_async_result_item",
  "_ansible_item_label": {
    "failed": 0,
    "started": 1,
    "finished": 0,
    "ansible_job_id": "j447553509370.765",
    "results_file": "/home/runner/.ansible_async/j447553509370.765",
    "changed": false,
    "__collection_repository_sync_item": {
      "name": "rh-certified",
      "remote": "rh-certified",
      "wait": true,
      "timeout": 3600
    },
    "ansible_loop_var": "__collection_repository_sync_item"
  }
}

So it fails after 99 * 10 = 990 seconds. The AH pulp_ansible.app.tasks.collections.sync task itself completes successfully in about 20 minutes.
When the sync finishes in, for example, 14 minutes, there is no timeout error.

Based on all this, I suspect the hardcoded 1000 in https://github.com/ansible/galaxy_collection/blob/devel/roles/collection_repository_sync/tasks/main.yml#L19 is the cause.

@Tompage1994
Contributor

At first glance I think the solution here would be to set the async to be equal to ah_configuration_collection_repository_sync_async_retries * ah_configuration_collection_repository_sync_async_delay
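
For illustration, that suggestion could look roughly like the sketch below. This is not the role's actual task file: the first task name, the `ansible.builtin.command` stand-in, and the `__sync_job` / `__sync_result` register names are hypothetical, and it assumes the `async` keyword accepts a templated value in the Ansible version in use.

```yaml
# Sketch only; the real tasks in roles/collection_repository_sync/tasks/main.yml may differ.
- name: Start a long-running job in the background (hypothetical task)
  ansible.builtin.command: sleep 1200  # stand-in for the actual repository sync module call
  # Derive the async limit from the retry settings instead of hardcoding 1000.
  async: "{{ (ah_configuration_collection_repository_sync_async_retries | int) * (ah_configuration_collection_repository_sync_async_delay | int) }}"
  poll: 0
  register: __sync_job  # hypothetical variable name

- name: Wait for the background job to finish
  ansible.builtin.async_status:
    jid: "{{ __sync_job.ansible_job_id }}"
  register: __sync_result  # hypothetical variable name
  until: __sync_result.finished
  retries: "{{ ah_configuration_collection_repository_sync_async_retries }}"
  delay: "{{ ah_configuration_collection_repository_sync_async_delay }}"
```

With this shape, the background job's time limit and the polling window (retries times delay) are driven by the same two variables, so raising one no longer leaves the other at 1000 seconds.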

@sean-m-sullivan
Contributor

sean-m-sullivan commented May 16, 2024

ah_configuration_collection_repository_sync_async_retries * ah_configuration_collection_repository_sync_async_delay+1

Just in case
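
As a quick check of what that expression yields with the configuration reported earlier in this thread (retries 360, delay 10), here is a small illustrative task. The task itself and the `sync_retries` / `sync_delay` names are made up for the example. In Jinja2, `*` binds tighter than `+`, so the one-second margin is added after the multiplication.

```yaml
# Illustrative only; not part of the role.
- name: Show the async limit derived from the retry settings
  ansible.builtin.debug:
    msg: "{{ sync_retries * sync_delay + 1 }}"  # (360 * 10) + 1 = 3601 seconds
  vars:
    sync_retries: 360  # ah_configuration_collection_repository_sync_async_retries in the report
    sync_delay: 10     # ah_configuration_collection_repository_sync_async_delay in the report
```

The extra second keeps the async limit just above the last poll, so the final `async_status` check should not race against the job being killed at the limit.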
