Weird (not transient) S3 AccessDenied #88

Open
benoit74 opened this issue Nov 25, 2024 · 2 comments

@benoit74
Contributor

https://farm.openzim.org/pipeline/db06d0ef-e3b1-4c1c-b146-cb9285987761/debug

[mindtouch2zim::MainThread::2024-11-24 11:47:13,504] ERROR:Failed to download medium/latex.codecogs.com/gif.latex?
rate=k[A]^m[B]^n from S3 cache
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 201, in _download_from_s3_cache
    self.s3_storage.download_matching_fileobj(  # pyright: ignore[reportUnknownMemberType]
  File "/usr/local/lib/python3.12/site-packages/kiwixstorage/__init__.py", line 800, in download_matching_fileobj
    raise exc
  File "/usr/local/lib/python3.12/site-packages/kiwixstorage/__init__.py", line 796, in download_matching_fileobj
    remote = self.get_object(key, bucket_name).get()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/boto3/resources/factory.py", line 581, in do_action
    response = action(self, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/boto3/resources/action.py", line 88, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 1023, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/joblib/_utils.py", line 72, in __call__
    return self.func(**kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 598, in __call__
    return [func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 76, in process_asset
    self._process_asset_internal(
  File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 88, in _process_asset_internal
    asset_content = self.get_asset_content(
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 250, in get_asset_content
    return self._get_image_content(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 163, in _get_image_content
    if s3_data := self._download_from_s3_cache(s3_key=s3_key, meta=meta):
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 208, in _download_from_s3_cache
    raise Exception(f"Failed to download {s3_key} from S3 cache") from exc
Exception: Failed to download medium/latex.codecogs.com/gif.latex?
rate=k[A]^m[B]^n from S3 cache
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/entrypoint.py", line 291, in main
    ).run()
      ^^^^^
  File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/processor.py", line 273, in run
    self.run_with_creator(creator)
  File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/processor.py", line 438, in run_with_creator
    for _ in res:
             ^^^
  File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 1650, in _get_outputs
    yield from self._retrieve()
  File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 1754, in _retrieve
    self._raise_error_fast()
  File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 1789, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 745, in get_result
    return self._return_or_raise()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 763, in _return_or_raise
    raise self._result
Exception: Failed to download medium/latex.codecogs.com/gif.latex?
rate=k[A]^m[B]^n from S3 cache
[mindtouch2zim::MainThread::2024-11-24 11:47:13,508] ERROR:Generation failed with the following error: Failed to download medium/latex.codecogs.com/gif.latex?
rate=k[A]^m[B]^n from S3 cache
benoit74 added the "bug" (Something isn't working) and "question" (Further information is requested) labels on Nov 25, 2024
benoit74 added this to the 0.1 milestone on Nov 25, 2024
benoit74 self-assigned this on Nov 25, 2024
benoit74 changed the title from "Weird (transient?) S3 AccessDenied" to "Weird (not transient) S3 AccessDenied" on Nov 25, 2024
@benoit74
Contributor Author

The error is not transient: I restarted the recipe and it happened again. To be investigated.

@benoit74
Contributor Author

It was hard to see at first glance, but it looks like there is a control character (a newline) in the path / S3 key: medium/latex.codecogs.com/gif.latex?\nrate=k[A]^m[B]^n
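As an aside on why the newline was easy to miss: the error message interpolates the key as-is, so the log simply wraps onto a second line. A minimal sketch, assuming the key is exactly the one quoted above; the variable names are illustrative and this is not the actual mindtouch2zim code:

```python
# Key copied from the log above, with the newline written as an escape.
key = "medium/latex.codecogs.com/gif.latex?\nrate=k[A]^m[B]^n"

# Plain interpolation emits the raw newline, so the message spans two lines.
plain = f"Failed to download {key} from S3 cache"

# The !r conversion applies repr(), which shows the newline as a visible \n escape.
escaped = f"Failed to download {key!r} from S3 cache"

print(plain)
print(escaped)
```

Logging keys and paths with `!r` (or `%r`) keeps the message on a single line and makes control characters visible.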

And this is what causes the S3 AccessDenied: botocore does not seem to like control characters in the S3 key. At least, I can easily reproduce the AccessDenied when I use such a key.
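One way to catch such keys before they reach S3 is to scan them for control characters. The sketch below is only an illustration: `find_control_chars` is a hypothetical helper, not part of mindtouch2zim, kiwixstorage, or botocore, and it assumes that "control character" means Unicode category `Cc`.

```python
import unicodedata


def find_control_chars(key: str) -> list[tuple[int, str]]:
    """Return (index, escaped character) for each control character in key."""
    return [
        (index, repr(char))
        for index, char in enumerate(key)
        # "Cc" is the Unicode general category for control characters
        # such as newline, tab, or NUL.
        if unicodedata.category(char) == "Cc"
    ]


key = "medium/latex.codecogs.com/gif.latex?\nrate=k[A]^m[B]^n"
print(find_control_chars(key))
```

A caller could log the offending positions and skip or fail early on the asset, instead of surfacing an AccessDenied from the GetObject call.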

Since the S3 key is computed directly from the ZIM path, and since I have not managed to build a ZIM path with a control character inside (our normalization code already ensures URLs are valid), I have no clue what is really going on.

Let's wait for #84 to be implemented so that the issue can be reproduced more easily.
