chore: duplicate the uploaded assets for duplicated page #6311
Conversation
Walkthrough
This pull request introduces a new mechanism for handling asset duplication during page or entity copying. A new background task (`copy_s3_objects`) extracts asset references from an entity's description, duplicates the underlying S3 objects, and updates the description to point at the copies.
Actionable comments posted: 0
🧹 Nitpick comments (3)
apiserver/plane/bgtasks/copy_s3_object.py (3)
34-40: Caution: swallowing exceptions can mask underlying errors.
This function returns an empty list if parsing fails, potentially hiding issues with malformed HTML or BeautifulSoup usage. Consider logging the exception or re-raising it to facilitate troubleshooting.
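A minimal sketch of the logging variant, assuming the parser is BeautifulSoup and that assets are referenced via a `src` attribute (the function body here is illustrative, not the PR's actual implementation):

```python
import logging

from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)

def extract_asset_ids(description_html, tag_name):
    # Illustrative variant: same contract as the PR's helper, but parse
    # failures are logged instead of silently producing an empty list.
    try:
        soup = BeautifulSoup(description_html, "html.parser")
        return [node.get("src") for node in soup.find_all(tag_name) if node.get("src")]
    except Exception:
        logger.exception("Failed to extract asset ids from description_html")
        return []
```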
62-78: Add request timeouts and handle unexpected responses.
Using a timeout parameter in `requests.post(...)` would help avoid indefinite waits if the external service is slow. Also consider handling unexpected response formats or status codes for greater resilience.
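A hedged sketch of a bounded call (the `(connect, read)` timeout values and the helper name are illustrative, not the PR's code):

```python
import requests

def post_with_timeout(url, payload):
    # Bound both the connection and read phases so a slow live server
    # fails fast instead of blocking the worker indefinitely.
    try:
        response = requests.post(url, json=payload, timeout=(5, 30))
        response.raise_for_status()  # surface 4xx/5xx instead of parsing garbage
        return response.json()
    except requests.RequestException:
        return {}  # mirrors the task's current fallback value
```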
80-143: Handle concurrency and address line-length warnings.
- This Celery task can race with other tasks for the same entity. If multiple tasks try to copy assets concurrently, unexpected duplicates or data inconsistencies could appear. Consider gating tasks, or verifying whether a task for the same entity is already in progress (a lock sketch follows the diff below).
- Several lines exceed 88 characters (e.g., lines 85, 86, 105). Complying with line-length standards improves readability and passes automated lint checks.
Below is a suggested way to split line 105 as an example:
```diff
- destination_key = f"{workspace.id}/{uuid.uuid4().hex}-{original_asset.attributes.get('name')}"
+ asset_name = original_asset.attributes.get("name")
+ destination_key = f"{workspace.id}/{uuid.uuid4().hex}-{asset_name}"
```
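On the concurrency point above, one way to gate duplicate tasks is a cache-based lock. A minimal sketch assuming Django's cache framework with an atomic backend such as Redis (key format, TTL, and function names are illustrative):

```python
from django.core.cache import cache

LOCK_TTL_SECONDS = 300  # assumed upper bound on how long one copy task runs

def try_acquire_copy_lock(entity_identifier):
    # cache.add only sets the key if it is absent, so on atomic backends
    # it doubles as a lightweight mutex: the first caller wins.
    return cache.add(f"copy_s3_objects:{entity_identifier}", "1", LOCK_TTL_SECONDS)

def release_copy_lock(entity_identifier):
    cache.delete(f"copy_s3_objects:{entity_identifier}")
```

The task would acquire the lock on entry, return early if `try_acquire_copy_lock` is False, and release it in a `finally` block.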
🧰 Tools
🪛 Ruff (0.8.2)
85-85: Line too long (104 > 88) (E501)
86-86: Line too long (101 > 88) (E501)
105-105: Line too long (106 > 88) (E501)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
apiserver/plane/app/views/page/base.py (2 hunks)
apiserver/plane/bgtasks/copy_s3_object.py (1 hunks)
apiserver/plane/settings/common.py (1 hunks)
apiserver/plane/settings/storage.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Build and Lint on Pull Request
apiserver/plane/settings/common.py
[error] 176-176: Line too long (90 > 88 characters)
🪛 Ruff (0.8.2)
apiserver/plane/bgtasks/copy_s3_object.py
85-85: Line too long (104 > 88) (E501)
86-86: Line too long (101 > 88) (E501)
105-105: Line too long (106 > 88) (E501)
🔇 Additional comments (7)
apiserver/plane/bgtasks/copy_s3_object.py (3)
17-31: Consider raising an error or logging more information for unrecognized entity types.
By returning an empty dictionary if `entity_type` is not found, the caller may not detect that an unsupported type was used. Incorporating at least a warning log or raising an explicit exception could help pinpoint issues early.
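A sketch of the fail-loud variant (the mapping keys and field names below are placeholders, not the PR's actual values):

```python
import logging

logger = logging.getLogger(__name__)

# Placeholder mapping: the real entity types and field names live in the PR.
ENTITY_ID_FIELDS = {
    "ISSUE_DESCRIPTION": "issue_id",
    "PAGE_DESCRIPTION": "page_id",
}

def get_entity_id_field(entity_type, entity_id):
    field = ENTITY_ID_FIELDS.get(entity_type)
    if field is None:
        # Raise instead of returning {} so an unsupported type fails early
        # and visibly, rather than silently creating an unlinked asset.
        logger.warning("Unrecognized entity_type %r", entity_type)
        raise ValueError(f"Unsupported entity_type: {entity_type}")
    return {field: entity_id}
```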
42-52: Replacement logic looks correct; watch out for partial matches.
Currently, the logic replaces `src` attributes only when there is an exact match. Ensure that unintended partial matches (e.g., query parameters) don't occur, or handle them separately if needed.
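One way to guard against partial matches is to compare only the final path segment of the `src` URL. A sketch, assuming the asset id is the last path component (which may not match the PR's exact URL format):

```python
from urllib.parse import urlparse

def src_refers_to_asset(src_value, asset_id):
    # Strip any query string (e.g. "?width=300") and compare the last
    # path segment exactly, so "abc123" never matches "abc1234".
    last_segment = urlparse(src_value).path.rstrip("/").split("/")[-1]
    return last_segment == str(asset_id)
```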
54-59: Consider concurrency or versioning for shared entities.
When `update_description` is called by multiple tasks in parallel, the last update might overwrite earlier changes. If concurrency is a concern, consider implementing an optimistic locking strategy or a revision check.
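A minimal optimistic-locking sketch using a conditional UPDATE (function and argument names are illustrative): the write only succeeds if the description is unchanged since it was read.

```python
from django.db import transaction

def save_description_if_unchanged(model_class, entity_id, expected_html, new_html):
    # A filtered update acts as a compare-and-swap: it matches zero rows
    # if another task already rewrote the description.
    with transaction.atomic():
        updated_rows = model_class.objects.filter(
            id=entity_id, description_html=expected_html
        ).update(description_html=new_html)
    return updated_rows == 1  # False means: re-read and retry, or give up
```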
apiserver/plane/settings/storage.py (1)
155-167: Copy logic is well-structured; consider verifying source object existence.
The `copy_object` method uses a try-except block to handle `ClientError`, but you could also verify that the source object exists before attempting the copy. This would allow for more specific error messages and reduce reliance on exception handling as flow control.
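A sketch of the existence check against the raw boto3 client (the project's `S3Storage` wrapper may expose this differently):

```python
from botocore.exceptions import ClientError

def source_object_exists(s3_client, bucket, key):
    # HEAD is cheap: it confirms the key exists before CopyObject runs,
    # which allows a precise "source missing" error message.
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] in ("404", "NoSuchKey"):
            return False
        raise  # propagate permission errors and other unexpected failures
```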
apiserver/plane/settings/common.py (1)
339-340: Verify that LIVE_BASE_URL is properly set for all environments.
Ensure that this new environment variable is set in each deployment environment, or handle cases when it's absent to prevent failures in external service calls.
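A sketch of a fail-fast settings check (whether to raise or fall back to a default is a deployment decision, not something the PR specifies):

```python
import os

from django.core.exceptions import ImproperlyConfigured

LIVE_BASE_URL = os.environ.get("LIVE_BASE_URL")
if not LIVE_BASE_URL:
    # Fail at startup rather than on the first background task that
    # tries to call the live server.
    raise ImproperlyConfigured("LIVE_BASE_URL must be set")
```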
apiserver/plane/app/views/page/base.py (2)
42-42: Import statement is correct.
Bringing in `copy_s3_objects` here is in line with the asynchronous task usage.
590-599: Ensure error handling for the asynchronous duplication task.
The `copy_s3_objects.delay(...)` call does not track success or failure once initiated. Consider providing a callback or updating relevant logs/fields so you can track the copy and alert the user if asset copying fails.
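One option is Celery's `link_error` callback, sketched below (the errback name is illustrative; the `copy_s3_objects` argument order follows the signature shown in this review):

```python
from celery import shared_task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@shared_task
def log_copy_failure(request, exc, traceback):
    # Invoked by Celery if the linked task raises; at minimum this makes
    # silent asset-copy failures visible in the worker logs.
    logger.error("copy_s3_objects task %s failed: %r", request.id, exc)

# Instead of copy_s3_objects.delay(...):
# copy_s3_objects.apply_async(
#     args=[entity_name, entity_identifier, project_id, slug, user_id],
#     link_error=log_copy_failure.s(),
# )
```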
Actionable comments posted: 3
🔭 Outside diff range comments (1)
apiserver/plane/bgtasks/copy_s3_object.py (1)
80-149: 🛠️ Refactor suggestion: Refactor for consistency and maintainability.
A few suggestions to improve the code:
- Move the hard-coded tag name to a constant
- Make return values consistent
- Break long lines for better readability
```diff
+# Constants
+IMAGE_COMPONENT_TAG = "image-component"
+
 @shared_task
 def copy_s3_objects(entity_name, entity_identifier, project_id, slug, user_id):
     """
     Step 1: Extract asset ids from the description_html of the entity
     Step 2: Duplicate the assets
-    Step 3: Update the description_html of the entity with the new asset ids (change the src of img tag)
-    Step 4: Request the live server to generate the description_binary and description for the entity
+    Step 3: Update the description_html of the entity with the new asset ids
+        (change the src of img tag)
+    Step 4: Request the live server to generate the description_binary and
+        description for the entity
     """
     try:
         model_class = {"PAGE": Page, "ISSUE": Issue}.get(entity_name)
         if not model_class:
             raise ValueError(f"Unsupported entity_name: {entity_name}")

         entity = model_class.objects.get(id=entity_identifier)
-        asset_ids = extract_asset_ids(entity.description_html, "image-component")
+        asset_ids = extract_asset_ids(entity.description_html, IMAGE_COMPONENT_TAG)
         duplicated_assets = []
         workspace = entity.workspace
         storage = S3Storage()

         original_assets = FileAsset.objects.filter(
             workspace=workspace, project_id=project_id, id__in=asset_ids
         )

         for original_asset in original_assets:
-            destination_key = f"{workspace.id}/{uuid.uuid4().hex}-{original_asset.attributes.get('name')}"
+            destination_key = (
+                f"{workspace.id}/"
+                f"{uuid.uuid4().hex}-{original_asset.attributes.get('name')}"
+            )
             # ... rest of the code ...
-        return
+        return duplicated_assets  # Return the list even on success
     except Exception as e:
         log_exception(e)
         return []
```
🧰 Tools
🪛 Ruff (0.8.2)
85-85: Line too long (104 > 88) (E501)
86-86: Line too long (101 > 88) (E501)
105-105: Line too long (106 > 88) (E501)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apiserver/plane/bgtasks/copy_s3_object.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
apiserver/plane/bgtasks/copy_s3_object.py
85-85: Line too long (104 > 88) (E501)
86-86: Line too long (101 > 88) (E501)
105-105: Line too long (106 > 88) (E501)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
apiserver/plane/bgtasks/copy_s3_object.py (1)
1-15: LGTM! Well-organized imports.
The imports are properly categorized and all necessary dependencies are included.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
apiserver/plane/bgtasks/copy_s3_object.py (1)
82-151: 🛠️ Refactor suggestion: Refactor the task into smaller functions and improve error handling.
The task function is complex and handles multiple responsibilities. Consider breaking it down into smaller, focused functions for better maintainability.
```diff
 @shared_task
+def duplicate_file_asset(original_asset, workspace, user_id, project_id, entity_identifier, storage):
+    destination_key = f"{workspace.id}/{uuid.uuid4().hex}-{original_asset.attributes.get('name')}"
+    duplicated_asset = FileAsset.objects.create(
+        attributes={
+            "name": original_asset.attributes.get("name"),
+            "type": original_asset.attributes.get("type"),
+            "size": original_asset.attributes.get("size"),
+        },
+        asset=destination_key,
+        size=original_asset.size,
+        workspace=workspace,
+        created_by_id=user_id,
+        entity_type=original_asset.entity_type,
+        project_id=project_id,
+        storage_metadata=original_asset.storage_metadata,
+        **get_entity_id_field(original_asset.entity_type, entity_identifier),
+    )
+    storage.copy_object(original_asset.asset, destination_key)
+    return {
+        "new_asset_id": str(duplicated_asset.id),
+        "old_asset_id": str(original_asset.id),
+    }
+
+def update_entity_with_external_data(entity, external_data):
+    if external_data:
+        entity.description = external_data.get("description")
+        entity.description_binary = base64.b64decode(
+            external_data.get("description_binary")
+        )
+        entity.save()
+
+@shared_task
 def copy_s3_objects(entity_name, entity_identifier, project_id, slug, user_id):
     try:
         model_class = {"PAGE": Page, "ISSUE": Issue}.get(entity_name)
         if not model_class:
             raise ValueError(f"Unsupported entity_name: {entity_name}")

         entity = model_class.objects.get(id=entity_identifier)
         asset_ids = extract_asset_ids(entity.description_html, "image-component")
         duplicated_assets = []
         workspace = entity.workspace
         storage = S3Storage()

         original_assets = FileAsset.objects.filter(
             workspace=workspace, project_id=project_id, id__in=asset_ids
         )

         for original_asset in original_assets:
-            destination_key = f"{workspace.id}/{uuid.uuid4().hex}-{original_asset.attributes.get('name')}"
-            duplicated_asset = FileAsset.objects.create(
-                attributes={
-                    "name": original_asset.attributes.get("name"),
-                    "type": original_asset.attributes.get("type"),
-                    "size": original_asset.attributes.get("size"),
-                },
-                asset=destination_key,
-                size=original_asset.size,
-                workspace=workspace,
-                created_by_id=user_id,
-                entity_type=original_asset.entity_type,
-                project_id=project_id,
-                storage_metadata=original_asset.storage_metadata,
-                **get_entity_id_field(original_asset.entity_type, entity_identifier),
-            )
-            storage.copy_object(original_asset.asset, destination_key)
-            duplicated_assets.append(
-                {
-                    "new_asset_id": str(duplicated_asset.id),
-                    "old_asset_id": str(original_asset.id),
-                }
-            )
+            duplicated_assets.append(
+                duplicate_file_asset(
+                    original_asset, workspace, user_id, project_id,
+                    entity_identifier, storage
+                )
+            )

         if duplicated_assets:
             FileAsset.objects.filter(
                 pk__in=[item["new_asset_id"] for item in duplicated_assets]
             ).update(is_uploaded=True)

             updated_html = update_description(
                 entity, duplicated_assets, "image-component"
             )

             external_data = sync_with_external_service(entity_name, updated_html)
-            if external_data:
-                entity.description = external_data.get("description")
-                entity.description_binary = base64.b64decode(
-                    external_data.get("description_binary")
-                )
-                entity.save()
+            update_entity_with_external_data(entity, external_data)

-        return
+        return duplicated_assets
     except Exception as e:
         log_exception(e)
-        return []
+        raise  # Re-raise the exception to let Celery handle retries
```
Also, consider adding Celery retry configuration for handling transient failures:
```diff
-@shared_task
+@shared_task(
+    bind=True,
+    max_retries=3,
+    retry_backoff=True,
+    retry_backoff_max=600,
+    retry_jitter=True
+)
-def copy_s3_objects(entity_name, entity_identifier, project_id, slug, user_id):
+def copy_s3_objects(self, entity_name, entity_identifier, project_id, slug, user_id):
```
🧰 Tools
🪛 Ruff (0.8.2)
87-87: Line too long (104 > 88) (E501)
88-88: Line too long (101 > 88) (E501)
107-107: Line too long (106 > 88) (E501)
♻️ Duplicate comments (1)
apiserver/plane/bgtasks/copy_s3_object.py (1)
63-80: 🛠️ Refactor suggestion: Add retry mechanism and improve error handling.
The external service call lacks a timeout and retry mechanism, which could affect reliability.
```diff
+from requests.adapters import HTTPAdapter
+from requests.packages.urllib3.util.retry import Retry
+
 def sync_with_external_service(entity_name, description_html):
     try:
+        session = requests.Session()
+        retry_strategy = Retry(
+            total=3,
+            backoff_factor=1,
+            status_forcelist=[500, 502, 503, 504],
+        )
+        adapter = HTTPAdapter(max_retries=retry_strategy)
+        session.mount("http://", adapter)
+        session.mount("https://", adapter)
+
         data = {
             "description_html": description_html,
             "variant": "rich" if entity_name == "PAGE" else "document",
         }
-        response = requests.post(
+        response = session.post(
             f"{settings.LIVE_BASE_URL}/convert-document/",
             json=data,
             headers=None,
+            timeout=(5, 30),  # (connect timeout, read timeout)
         )
         if response.status_code == 200:
             return response.json()
+        else:
+            log_exception(f"External service returned {response.status_code}: {response.text}")
     except requests.RequestException as e:
         log_exception(e)
     return {}
```
🧹 Nitpick comments (1)
apiserver/plane/bgtasks/copy_s3_object.py (1)
56-61: Add transaction management for entity updates.
The `update_description` function updates and saves the entity without transaction management. This could lead to data inconsistency if the save operation fails.
```diff
+from django.db import transaction
+
 def update_description(entity, duplicated_assets, tag):
-    updated_html = replace_asset_ids(entity.description_html, tag, duplicated_assets)
-    entity.description_html = updated_html
-    entity.save()
-    return updated_html
+    with transaction.atomic():
+        updated_html = replace_asset_ids(
+            entity.description_html, tag, duplicated_assets
+        )
+        entity.description_html = updated_html
+        entity.save()
+    return updated_html
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apiserver/plane/bgtasks/copy_s3_object.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
apiserver/plane/bgtasks/copy_s3_object.py
87-87: Line too long (104 > 88) (E501)
88-88: Line too long (101 > 88) (E501)
107-107: Line too long (106 > 88) (E501)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
apiserver/plane/bgtasks/copy_s3_object.py (2)
1-15: LGTM! Well-organized imports.
The imports are properly grouped and all imported modules are used in the code.
17-32: LGTM! Clean and maintainable entity mapping.
The function provides a comprehensive mapping of entity types to their ID fields using a clean dictionary-based approach.
Description
This pull request ensures that uploaded assets are duplicated along with the page when a page is duplicated.