cannot scan rustc software - keeps timing out - after 24 hours #1623

Open
ilovemaui opened this issue Mar 12, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@ilovemaui

Describe the bug
So I have been trying for a week now to scan Rust for licenses, and no matter what I do, the scan times out after 24 hours. I am not sure how to get this scan to complete, or whether the 24-hour timeout can be extended.

System configuration

  • scancode.io version 34.9.5
  • running the app using Docker
  • Windows laptop:
    Device name: PF4FF87R
    Processor: 12th Gen Intel(R) Core(TM) i7-1270P 2.20 GHz
    Installed RAM: 32.0 GB (31.6 GB usable)
    Device ID: 2BFEED46-1F14-4226-9935-88F501727E7A
    Product ID: 00330-80000-00000-AA029
    System type: 64-bit operating system, x64-based processor
    Pen and touch: No pen or touch input is available for this display
    Edition: Windows 10 Enterprise
    Version: 22H2
    Installed on: 8/25/2023
    OS build: 19045.5487
    Experience: Windows Feature Experience Pack 1000.19061.1000.0

To Reproduce
Steps to reproduce the behavior:

  1. Download the source input as listed above
  2. Create a new project
  3. Load the downloaded input
  4. Select scan_single_package
  5. Create the pipeline
  6. After 24 hours, I see the error:
    Task exceeded maximum timeout value (86400 seconds)

Traceback:
  File "/opt/scancodeio/aboutcode/pipeline/__init__.py", line 199, in execute
    step(self)
  File "/opt/scancodeio/scanpipe/pipelines/scan_single_package.py", line 107, in run_scan
    scanning_errors = scancode.run_scan(
                      ^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/scanpipe/pipes/scancode.py", line 749, in run_scan
    _success, results = scancode_run_scan(
                        ^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.12/site-packages/scancode/cli.py", line 944, in run_scan
    scan_success = run_scanners(
                   ^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.12/site-packages/scancode/cli.py", line 1186, in run_scanners
    scan_success = scan_codebase(
                   ^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.12/site-packages/scancode/cli.py", line 1294, in scan_codebase
    scan_timings) = next(scans)
                    ^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.12/site-packages/scancode/pool.py", line 74, in wrap
    result = func(self, timeout=timeout or 3600)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/multiprocessing/pool.py", line 861, in next
    self._cond.wait(timeout)
  File "/usr/local/lib/python3.12/threading.py", line 359, in wait
    gotit = waiter.acquire(True, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scancodeio/.venv/lib/python3.12/site-packages/rq/timeouts.py", line 63, in handle_death_penalty
    raise self._exception('Task exceeded maximum timeout value ({0} seconds)'.format(self._timeout))
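The last frame comes from rq's timeout handling (handle_death_penalty in rq/timeouts.py): the worker arms a timer when the task starts, and when the budget (here 86400 seconds, i.e. 24 hours) runs out, an exception is raised in the task's thread wherever it happens to be blocked. In this case that was a lock wait while collecting scan results. A minimal sketch of that signal-based pattern, assuming a Unix host; TaskTimeout and run_with_timeout are illustrative names, not rq's actual API:

```python
import signal
import time


class TaskTimeout(Exception):
    """Raised inside the running task when its time budget is exhausted."""


def run_with_timeout(func, timeout_seconds):
    """Call func(); raise TaskTimeout if it runs longer than timeout_seconds.

    Unix-only sketch: SIGALRM is delivered to the main thread, so the
    exception surfaces wherever that thread is currently blocked
    (a sleep, a lock wait, waiting on a worker pool result).
    """

    def handle_death_penalty(signum, frame):
        raise TaskTimeout(
            f"Task exceeded maximum timeout value ({timeout_seconds} seconds)"
        )

    previous_handler = signal.signal(signal.SIGALRM, handle_death_penalty)
    signal.setitimer(signal.ITIMER_REAL, timeout_seconds)
    try:
        return func()
    finally:
        # Disarm the timer and restore whatever handler was there before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous_handler)


if __name__ == "__main__":
    try:
        # A "scan" that needs 2 seconds but is given a 0.1-second budget.
        run_with_timeout(lambda: time.sleep(2), 0.1)
    except TaskTimeout as exc:
        print(exc)
```

The point of the sketch is that the timeout is a hard wall on the whole task: it does not matter how far the scan has progressed, which is why the run above is lost after 24 hours rather than returning partial results.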

ilovemaui added the bug (Something isn't working) label Mar 12, 2025
@mjherzog
Member

I do not know how to diagnose these errors, but I ran the scan_codebase pipeline on this archive (also with SCIO v34.9.5, on an Ubuntu server). I tried this because a source archive is not really a package. The scan completed in 5.9 hours with the following key stats:

  • 4,887 Packages
  • 77,588 Dependencies
  • 669,547 Resources (629,443)
  • 1,203 Messages - all are errors with several different patterns

Unfortunately, the XLSX (80 MB) and JSON (200+ MB) output files are too large to upload here, which shows how large this codebase is. I have uploaded the MESSAGES sheet from the scan here:

static.rust-lang.org-dist-rustc-1.85.0-src-Errors.xlsx

@ilovemaui
Author

So it is now 7:16 AM and my job has been running again for 10 hours. It is at "Running 6/8 run_scan", and so far there are 68 messages, all from the extract archives step (they appear to be files in a bad format). I'm not sure whether these affect the overall scan or just cause it to skip files it doesn't understand, but I don't have any faith this will succeed. So I'm stuck and not sure what to do.

@tdruez
Contributor

tdruez commented Mar 13, 2025

or possibly to extend the 24 hour timeout.

@ilovemaui See https://scancodeio.readthedocs.io/en/latest/application-settings.html#scancodeio-task-timeout
You can raise the value of SCANCODEIO_TASK_TIMEOUT to multiple days before your next attempt.
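The default matches the error in the report: 24 hours is 24 × 3600 = 86400 seconds. To check what a new value works out to before restarting, here is a small sketch that assumes the d/h/m/s suffix convention rq uses for job timeouts; the forms SCANCODEIO_TASK_TIMEOUT actually accepts are defined on the settings page linked above, and timeout_to_seconds is a hypothetical helper:

```python
# Seconds per unit suffix (assumed rq-style convention: d, h, m, s).
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def timeout_to_seconds(value):
    """Convert 86400, "86400", "24h" or "3d" into a number of seconds."""
    if isinstance(value, int):
        return value
    text = str(value).strip().lower()
    if text.isdigit():
        return int(text)
    number, unit = text[:-1], text[-1:]
    if unit not in UNIT_SECONDS or not number.isdigit():
        raise ValueError(f"Unrecognized timeout format: {value!r}")
    return int(number) * UNIT_SECONDS[unit]


if __name__ == "__main__":
    for candidate in ("24h", "72h", "3d"):
        print(candidate, "->", timeout_to_seconds(candidate), "seconds")
```

For comparison, the scan_codebase run reported in this thread took 5.9 hours on a server, so a budget of a few days leaves a wide margin for a slower laptop running Docker.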

@mjherzog
Member

Some stats from the error MESSAGES:

  • 1115 are: "DiscoveredPackage matching query does not exist" for Model: assemble_package. All of these are from package_uid: pkg:bazel/test?uuid=580de36d-9500-4e31-bb81-a8033dbd6c42
    resource_path: rustc-1.85.0-src.tar.xz-extract/rustc-1.85.0-src/src/tools/enzyme/enzyme/test/ActivityAnalysis
  • 68 are: various errors for Model: extract_archives. 32 of these are from resource_path: /var/scancodeio/workspace/projects/rust-cargo-test-scan-codebase-65964946/codebase/rustc-1.85.0-src.tar.xz-extract/rustc-1.85.0-src/vendor/lzma-sys-0.1.20/xz-5.2/tests/files/. Most other cases have /test/ or /tests/ in the path.
  • 11 are: "value too long for type character varying(256)" for Model: DiscoveredDependency.
  • 9 are: "Processing interrupted: timeout after 120 seconds": 7 for license scanning and 2 for copyrights.
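These four categories account for every message in the scan summary: 1115 + 68 + 11 + 9 = 1203. A small sketch for producing this kind of breakdown from the exported MESSAGES sheet; the row format (a dict with a "model" key, however the sheet is loaded) is an assumption, and count_by_model is a hypothetical helper:

```python
from collections import Counter


def count_by_model(rows):
    """Tally project messages per model (pipeline step or database model).

    Assumes each row is a dict with a "model" key, e.g. one row of the
    MESSAGES sheet loaded with a spreadsheet library of your choice.
    """
    return Counter(row["model"] for row in rows)


# The breakdown reported above, used as a sanity check: the categories
# should add up to the 1,203 messages shown in the scan summary.
reported = {
    "assemble_package": 1115,
    "extract_archives": 68,
    "DiscoveredDependency": 11,
    "license/copyright timeouts": 9,
}
assert sum(reported.values()) == 1203

if __name__ == "__main__":
    sample = [{"model": "extract_archives"}, {"model": "assemble_package"},
              {"model": "extract_archives"}]
    print(count_by_model(sample).most_common())
```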

So the primary source of errors seems to be large test files, which is a common issue when scanning.
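Because most of the failing resources sit under test/ or tests/ directories, a rerun could leave those paths out. As a sketch of how glob-style patterns would select the paths reported above, using Python's fnmatchcase (the patterns and is_test_path are hypothetical; how to configure path exclusion in ScanCode.io itself should be checked in its documentation):

```python
from fnmatch import fnmatchcase

# Hypothetical glob patterns for test directories. In fnmatch, "*" also
# matches "/", so these hit a test/ or tests/ segment at any depth.
TEST_PATTERNS = ["*/test/*", "*/tests/*"]


def is_test_path(path, patterns=TEST_PATTERNS):
    """Return True if the resource path falls under a test directory."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


if __name__ == "__main__":
    print(is_test_path(
        "rustc-1.85.0-src.tar.xz-extract/rustc-1.85.0-src/src/tools/"
        "enzyme/enzyme/test/ActivityAnalysis"
    ))
```

The trade-off is that anything skipped is not scanned at all, so licenses or copyrights that appear only in test files would be missing from the results.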
