[mindtouch2zim::MainThread::2024-11-24 11:47:13,504] ERROR:Failed to download medium/latex.codecogs.com/gif.latex?
rate=k[A]^m[B]^n from S3 cache
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 201, in _download_from_s3_cache
self.s3_storage.download_matching_fileobj( # pyright: ignore[reportUnknownMemberType]
File "/usr/local/lib/python3.12/site-packages/kiwixstorage/__init__.py", line 800, in download_matching_fileobj
raise exc
File "/usr/local/lib/python3.12/site-packages/kiwixstorage/__init__.py", line 796, in download_matching_fileobj
remote = self.get_object(key, bucket_name).get()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/boto3/resources/factory.py", line 581, in do_action
response = action(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/boto3/resources/action.py", line 88, in __call__
response = getattr(parent.meta.client, operation_name)(*args, **params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 569, in _api_call
return self._make_api_call(operation_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 1023, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/joblib/_utils.py", line 72, in __call__
return self.func(**kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 598, in __call__
return [func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 76, in process_asset
self._process_asset_internal(
File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 88, in _process_asset_internal
asset_content = self.get_asset_content(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/backoff/_sync.py", line 105, in retry
ret = target(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 250, in get_asset_content
return self._get_image_content(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 163, in _get_image_content
if s3_data := self._download_from_s3_cache(s3_key=s3_key, meta=meta):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/asset.py", line 208, in _download_from_s3_cache
raise Exception(f"Failed to download {s3_key} from S3 cache") from exc
Exception: Failed to download medium/latex.codecogs.com/gif.latex?
rate=k[A]^m[B]^n from S3 cache
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/entrypoint.py", line 291, in main
).run()
^^^^^
File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/processor.py", line 273, in run
self.run_with_creator(creator)
File "/usr/local/lib/python3.12/site-packages/mindtouch2zim/processor.py", line 438, in run_with_creator
for _ in res:
^^^
File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 1650, in _get_outputs
yield from self._retrieve()
File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 1754, in _retrieve
self._raise_error_fast()
File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 1789, in _raise_error_fast
error_job.get_result(self.timeout)
File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 745, in get_result
return self._return_or_raise()
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/joblib/parallel.py", line 763, in _return_or_raise
raise self._result
Exception: Failed to download medium/latex.codecogs.com/gif.latex?
rate=k[A]^m[B]^n from S3 cache
[mindtouch2zim::MainThread::2024-11-24 11:47:13,508] ERROR:Generation failed with the following error: Failed to download medium/latex.codecogs.com/gif.latex?
rate=k[A]^m[B]^n from S3 cache
It was hard to see at first sight, but it looks like there is a control character (a newline) in the path / S3 key: medium/latex.codecogs.com/gif.latex?\nrate=k[A]^m[B]^n
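To make this kind of hidden character easier to spot, here is a minimal sketch that prints the key with `repr()` and lists the position of every control character. The helper name `find_control_chars` is hypothetical and is not part of mindtouch2zim; the key is the one from the log above.

```python
import unicodedata


def find_control_chars(key: str) -> list[tuple[int, str]]:
    """Return (index, repr) for every control character found in key.

    Hypothetical helper for illustration; "Cc" is the Unicode general
    category for control characters such as newline or tab.
    """
    return [
        (index, repr(char))
        for index, char in enumerate(key)
        if unicodedata.category(char) == "Cc"
    ]


# Key taken from the log, with the newline written explicitly.
key = "medium/latex.codecogs.com/gif.latex?\nrate=k[A]^m[B]^n"

# repr() shows the newline as an escape instead of breaking the line.
print(repr(key))
print(find_control_chars(key))
```

Logging keys with `repr()` (or `{s3_key!r}` in an f-string) would have kept the error message on a single line and made the newline visible right away.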
This is what causes the S3 AccessDenied: botocore does not seem to like control characters in the S3 key. At least, I can easily reproduce the AccessDenied when I use such a key.
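Since the AccessDenied message hides the real cause, one option is to reject such keys before any request is sent. This is only a sketch under assumptions: `check_s3_key` is a hypothetical function, not existing mindtouch2zim or kiwixstorage code, and where it would be called is left open.

```python
import unicodedata


def check_s3_key(key: str) -> str:
    """Return key unchanged, or raise ValueError if it has control characters.

    Hypothetical guard for illustration: failing early gives a clearer
    error than the AccessDenied returned for such a key.
    """
    bad = sorted({repr(char) for char in key if unicodedata.category(char) == "Cc"})
    if bad:
        raise ValueError(
            f"S3 key {key!r} contains control characters: {', '.join(bad)}"
        )
    return key
```

Calling it with the key from the log raises a `ValueError` that names the offending character, while a normal key is returned as-is.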
Since the S3 key is computed directly from the ZIM path, and since I did not manage to build a ZIM path with a control character in it (our normalization code already ensures URLs are valid), I have no clue what is really going on.
Let's wait for #84 to be implemented so that we can reproduce the issue more easily.
https://farm.openzim.org/pipeline/db06d0ef-e3b1-4c1c-b146-cb9285987761/debug