chore: duplicate the uploaded assets for duplicated page #6311
Conversation
Walkthrough
This pull request introduces a new mechanism for handling asset duplication during page or entity copying. A new background task (`copy_s3_objects`) extracts asset references from an entity's description, duplicates the underlying S3 objects, and updates the description to point at the copies.
Actionable comments posted: 0
🧹 Nitpick comments (3)
apiserver/plane/bgtasks/copy_s3_object.py (3)
34-40: Caution: swallowing exceptions can mask underlying errors.
This function returns an empty list if parsing fails, potentially hiding issues with malformed HTML or BeautifulSoup usage. Consider logging the exception or re-raising it to facilitate troubleshooting.
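A minimal sketch of the logging variant, assuming the parser is BeautifulSoup and that assets are referenced via a `src` attribute (the function body here is illustrative, not the PR's actual implementation):

```python
import logging

from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)

def extract_asset_ids(description_html, tag_name):
    # Illustrative variant: same contract as the PR's helper, but parse
    # failures are logged instead of silently producing an empty list.
    try:
        soup = BeautifulSoup(description_html, "html.parser")
        return [node.get("src") for node in soup.find_all(tag_name) if node.get("src")]
    except Exception:
        logger.exception("Failed to extract asset ids from description_html")
        return []
```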
62-78: Add request timeouts and handle unexpected responses.
Using a timeout parameter in `requests.post(...)` would help avoid indefinite waits if the external service is slow. Also consider handling unexpected response formats or status codes for greater resilience.
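A hedged sketch of a bounded call (the `(connect, read)` timeout values and the helper name are illustrative, not the PR's code):

```python
import requests

def post_with_timeout(url, payload):
    # Bound both the connection and read phases so a slow live server
    # fails fast instead of blocking the worker indefinitely.
    try:
        response = requests.post(url, json=payload, timeout=(5, 30))
        response.raise_for_status()  # surface 4xx/5xx instead of parsing garbage
        return response.json()
    except requests.RequestException:
        return {}  # mirrors the task's current fallback value
```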
80-143: Handle concurrency and address line-length warnings.
- This Celery task can race with other tasks for the same entity. If multiple tasks try to copy assets concurrently, unexpected duplicates or data inconsistencies could appear. Consider gating tasks, or verifying whether a task for the same entity is already in progress (a lock sketch follows the diff below).
- Several lines exceed 88 characters (e.g., lines 85, 86, 105). Complying with line-length standards improves readability and passes automated lint checks.
Below is a suggested way to split line 105 as an example:
```diff
- destination_key = f"{workspace.id}/{uuid.uuid4().hex}-{original_asset.attributes.get('name')}"
+ asset_name = original_asset.attributes.get("name")
+ destination_key = f"{workspace.id}/{uuid.uuid4().hex}-{asset_name}"
```
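On the concurrency point above, one way to gate duplicate tasks is a cache-based lock. A minimal sketch assuming Django's cache framework with an atomic backend such as Redis (key format, TTL, and function names are illustrative):

```python
from django.core.cache import cache

LOCK_TTL_SECONDS = 300  # assumed upper bound on how long one copy task runs

def try_acquire_copy_lock(entity_identifier):
    # cache.add only sets the key if it is absent, so on atomic backends
    # it doubles as a lightweight mutex: the first caller wins.
    return cache.add(f"copy_s3_objects:{entity_identifier}", "1", LOCK_TTL_SECONDS)

def release_copy_lock(entity_identifier):
    cache.delete(f"copy_s3_objects:{entity_identifier}")
```

The task would acquire the lock on entry, return early if `try_acquire_copy_lock` is False, and release it in a `finally` block.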
🧰 Tools
🪛 Ruff (0.8.2)
85-85: Line too long (104 > 88) (E501)
86-86: Line too long (101 > 88) (E501)
105-105: Line too long (106 > 88) (E501)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
apiserver/plane/app/views/page/base.py (2 hunks)
apiserver/plane/bgtasks/copy_s3_object.py (1 hunks)
apiserver/plane/settings/common.py (1 hunks)
apiserver/plane/settings/storage.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Build and Lint on Pull Request
apiserver/plane/settings/common.py
[error] 176-176: Line too long (90 > 88 characters)
🪛 Ruff (0.8.2)
apiserver/plane/bgtasks/copy_s3_object.py
85-85: Line too long (104 > 88) (E501)
86-86: Line too long (101 > 88) (E501)
105-105: Line too long (106 > 88) (E501)
🔇 Additional comments (7)
apiserver/plane/bgtasks/copy_s3_object.py (3)
17-31: Consider raising an error or logging more information for unrecognized entity types.
By returning an empty dictionary if `entity_type` is not found, the caller may not detect that an unsupported type was used. Incorporating at least a warning log or raising an explicit exception could help pinpoint issues early.
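A sketch of the fail-loud variant (the mapping keys and field names below are placeholders, not the PR's actual values):

```python
import logging

logger = logging.getLogger(__name__)

# Placeholder mapping: the real entity types and field names live in the PR.
ENTITY_ID_FIELDS = {
    "ISSUE_DESCRIPTION": "issue_id",
    "PAGE_DESCRIPTION": "page_id",
}

def get_entity_id_field(entity_type, entity_id):
    field = ENTITY_ID_FIELDS.get(entity_type)
    if field is None:
        # Raise instead of returning {} so an unsupported type fails early
        # and visibly, rather than silently creating an unlinked asset.
        logger.warning("Unrecognized entity_type %r", entity_type)
        raise ValueError(f"Unsupported entity_type: {entity_type}")
    return {field: entity_id}
```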
42-52: Replacement logic looks correct; watch out for partial matches.
Currently, the logic replaces `src` attributes only when there is an exact match. Ensure that unintended partial matches (e.g., query parameters) don't occur, or handle them separately if needed.
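One way to guard against partial matches is to compare only the final path segment of the `src` URL. A sketch, assuming the asset id is the last path component (which may not match the PR's exact URL format):

```python
from urllib.parse import urlparse

def src_refers_to_asset(src_value, asset_id):
    # Strip any query string (e.g. "?width=300") and compare the last
    # path segment exactly, so "abc123" never matches "abc1234".
    last_segment = urlparse(src_value).path.rstrip("/").split("/")[-1]
    return last_segment == str(asset_id)
```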
54-59: Consider concurrency or versioning for shared entities.
When `update_description` is called by multiple tasks in parallel, the last update might overwrite earlier changes. If concurrency is a concern, consider implementing an optimistic locking strategy or a revision check.
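A minimal optimistic-locking sketch using a conditional UPDATE (function and argument names are illustrative): the write only succeeds if the description is unchanged since it was read.

```python
from django.db import transaction

def save_description_if_unchanged(model_class, entity_id, expected_html, new_html):
    # A filtered update acts as a compare-and-swap: it matches zero rows
    # if another task already rewrote the description.
    with transaction.atomic():
        updated_rows = model_class.objects.filter(
            id=entity_id, description_html=expected_html
        ).update(description_html=new_html)
    return updated_rows == 1  # False means: re-read and retry, or give up
```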
apiserver/plane/settings/storage.py (1)
155-167: Copy logic is well-structured; consider verifying source object existence.
The `copy_object` method uses a try-except block to handle `ClientError`, but you could also verify that the source object exists before attempting the copy. This would allow for more specific error messages and reduce reliance on exception handling as flow control.
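A sketch of the existence check against the raw boto3 client (the project's `S3Storage` wrapper may expose this differently):

```python
from botocore.exceptions import ClientError

def source_object_exists(s3_client, bucket, key):
    # HEAD is cheap: it confirms the key exists before CopyObject runs,
    # which allows a precise "source missing" error message.
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] in ("404", "NoSuchKey"):
            return False
        raise  # propagate permission errors and other unexpected failures
```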
apiserver/plane/settings/common.py (1)
339-340: Verify that LIVE_BASE_URL is properly set for all environments.
Ensure that this new environment variable is set in each deployment environment, or handle cases when it's absent to prevent failures in external service calls.
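A sketch of a fail-fast settings check (whether to raise or fall back to a default is a deployment decision, not something the PR specifies):

```python
import os

from django.core.exceptions import ImproperlyConfigured

LIVE_BASE_URL = os.environ.get("LIVE_BASE_URL")
if not LIVE_BASE_URL:
    # Fail at startup rather than on the first background task that
    # tries to call the live server.
    raise ImproperlyConfigured("LIVE_BASE_URL must be set")
```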
apiserver/plane/app/views/page/base.py (2)
42-42: Import statement is correct.
Bringing in `copy_s3_objects` here is in line with the asynchronous task usage.
590-599: Ensure error handling for the asynchronous duplication task.
The `copy_s3_objects.delay(...)` call does not track success or failure once initiated. Consider providing a callback or updating relevant logs/fields so you can track the copy and alert the user if asset copying fails.
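One option is Celery's `link_error` callback, sketched below (the errback name is illustrative; the `copy_s3_objects` argument order follows the signature shown in this review):

```python
from celery import shared_task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@shared_task
def log_copy_failure(request, exc, traceback):
    # Invoked by Celery if the linked task raises; at minimum this makes
    # silent asset-copy failures visible in the worker logs.
    logger.error("copy_s3_objects task %s failed: %r", request.id, exc)

# Instead of copy_s3_objects.delay(...):
# copy_s3_objects.apply_async(
#     args=[entity_name, entity_identifier, project_id, slug, user_id],
#     link_error=log_copy_failure.s(),
# )
```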
Actionable comments posted: 3
🔭 Outside diff range comments (1)
apiserver/plane/bgtasks/copy_s3_object.py (1)
80-149: 🛠️ Refactor suggestion: Refactor for consistency and maintainability.
A few suggestions to improve the code:
- Move the hard-coded tag name to a constant
- Make return values consistent
- Break long lines for better readability
```diff
+# Constants
+IMAGE_COMPONENT_TAG = "image-component"
+
 @shared_task
 def copy_s3_objects(entity_name, entity_identifier, project_id, slug, user_id):
     """
     Step 1: Extract asset ids from the description_html of the entity
     Step 2: Duplicate the assets
-    Step 3: Update the description_html of the entity with the new asset ids (change the src of img tag)
-    Step 4: Request the live server to generate the description_binary and description for the entity
+    Step 3: Update the description_html of the entity with the new asset ids
+        (change the src of img tag)
+    Step 4: Request the live server to generate the description_binary and
+        description for the entity
     """
     try:
         model_class = {"PAGE": Page, "ISSUE": Issue}.get(entity_name)
         if not model_class:
             raise ValueError(f"Unsupported entity_name: {entity_name}")

         entity = model_class.objects.get(id=entity_identifier)
-        asset_ids = extract_asset_ids(entity.description_html, "image-component")
+        asset_ids = extract_asset_ids(entity.description_html, IMAGE_COMPONENT_TAG)
         duplicated_assets = []
         workspace = entity.workspace
         storage = S3Storage()

         original_assets = FileAsset.objects.filter(
             workspace=workspace, project_id=project_id, id__in=asset_ids
         )

         for original_asset in original_assets:
-            destination_key = f"{workspace.id}/{uuid.uuid4().hex}-{original_asset.attributes.get('name')}"
+            destination_key = (
+                f"{workspace.id}/"
+                f"{uuid.uuid4().hex}-{original_asset.attributes.get('name')}"
+            )
             # ... rest of the code ...
-        return
+        return duplicated_assets  # Return the list even on success
     except Exception as e:
         log_exception(e)
         return []
```
🧰 Tools
🪛 Ruff (0.8.2)
85-85: Line too long (104 > 88) (E501)
86-86: Line too long (101 > 88) (E501)
105-105: Line too long (106 > 88) (E501)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apiserver/plane/bgtasks/copy_s3_object.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
apiserver/plane/bgtasks/copy_s3_object.py
85-85: Line too long (104 > 88) (E501)
86-86: Line too long (101 > 88) (E501)
105-105: Line too long (106 > 88) (E501)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
apiserver/plane/bgtasks/copy_s3_object.py (1)
1-15: LGTM! Well-organized imports.
The imports are properly categorized and all necessary dependencies are included.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
apiserver/plane/bgtasks/copy_s3_object.py (1)
82-151: 🛠️ Refactor suggestion: Refactor the task into smaller functions and improve error handling.
The task function is complex and handles multiple responsibilities. Consider breaking it down into smaller, focused functions for better maintainability.
```diff
 @shared_task
+def duplicate_file_asset(original_asset, workspace, user_id, project_id, entity_identifier, storage):
+    destination_key = f"{workspace.id}/{uuid.uuid4().hex}-{original_asset.attributes.get('name')}"
+    duplicated_asset = FileAsset.objects.create(
+        attributes={
+            "name": original_asset.attributes.get("name"),
+            "type": original_asset.attributes.get("type"),
+            "size": original_asset.attributes.get("size"),
+        },
+        asset=destination_key,
+        size=original_asset.size,
+        workspace=workspace,
+        created_by_id=user_id,
+        entity_type=original_asset.entity_type,
+        project_id=project_id,
+        storage_metadata=original_asset.storage_metadata,
+        **get_entity_id_field(original_asset.entity_type, entity_identifier),
+    )
+    storage.copy_object(original_asset.asset, destination_key)
+    return {
+        "new_asset_id": str(duplicated_asset.id),
+        "old_asset_id": str(original_asset.id),
+    }
+
+def update_entity_with_external_data(entity, external_data):
+    if external_data:
+        entity.description = external_data.get("description")
+        entity.description_binary = base64.b64decode(
+            external_data.get("description_binary")
+        )
+        entity.save()
+
+@shared_task
 def copy_s3_objects(entity_name, entity_identifier, project_id, slug, user_id):
     try:
         model_class = {"PAGE": Page, "ISSUE": Issue}.get(entity_name)
         if not model_class:
             raise ValueError(f"Unsupported entity_name: {entity_name}")

         entity = model_class.objects.get(id=entity_identifier)
         asset_ids = extract_asset_ids(entity.description_html, "image-component")
         duplicated_assets = []
         workspace = entity.workspace
         storage = S3Storage()

         original_assets = FileAsset.objects.filter(
             workspace=workspace, project_id=project_id, id__in=asset_ids
         )

         for original_asset in original_assets:
-            destination_key = f"{workspace.id}/{uuid.uuid4().hex}-{original_asset.attributes.get('name')}"
-            duplicated_asset = FileAsset.objects.create(
-                attributes={
-                    "name": original_asset.attributes.get("name"),
-                    "type": original_asset.attributes.get("type"),
-                    "size": original_asset.attributes.get("size"),
-                },
-                asset=destination_key,
-                size=original_asset.size,
-                workspace=workspace,
-                created_by_id=user_id,
-                entity_type=original_asset.entity_type,
-                project_id=project_id,
-                storage_metadata=original_asset.storage_metadata,
-                **get_entity_id_field(original_asset.entity_type, entity_identifier),
-            )
-            storage.copy_object(original_asset.asset, destination_key)
-            duplicated_assets.append(
-                {
-                    "new_asset_id": str(duplicated_asset.id),
-                    "old_asset_id": str(original_asset.id),
-                }
-            )
+            duplicated_assets.append(
+                duplicate_file_asset(
+                    original_asset, workspace, user_id, project_id,
+                    entity_identifier, storage
+                )
+            )

         if duplicated_assets:
             FileAsset.objects.filter(
                 pk__in=[item["new_asset_id"] for item in duplicated_assets]
             ).update(is_uploaded=True)

             updated_html = update_description(
                 entity, duplicated_assets, "image-component"
             )

             external_data = sync_with_external_service(entity_name, updated_html)
-            if external_data:
-                entity.description = external_data.get("description")
-                entity.description_binary = base64.b64decode(
-                    external_data.get("description_binary")
-                )
-                entity.save()
+            update_entity_with_external_data(entity, external_data)

-        return
+        return duplicated_assets
     except Exception as e:
         log_exception(e)
-        return []
+        raise  # Re-raise the exception to let Celery handle retries
```
Also, consider adding Celery retry configuration for handling transient failures:
```diff
-@shared_task
+@shared_task(
+    bind=True,
+    max_retries=3,
+    retry_backoff=True,
+    retry_backoff_max=600,
+    retry_jitter=True
+)
-def copy_s3_objects(entity_name, entity_identifier, project_id, slug, user_id):
+def copy_s3_objects(self, entity_name, entity_identifier, project_id, slug, user_id):
```
🧰 Tools
🪛 Ruff (0.8.2)
87-87: Line too long (104 > 88) (E501)
88-88: Line too long (101 > 88) (E501)
107-107: Line too long (106 > 88) (E501)
♻️ Duplicate comments (1)
apiserver/plane/bgtasks/copy_s3_object.py (1)
63-80: 🛠️ Refactor suggestion: Add retry mechanism and improve error handling.
The external service call lacks a timeout and retry mechanism, which could affect reliability.
```diff
+from requests.adapters import HTTPAdapter
+from requests.packages.urllib3.util.retry import Retry
+
 def sync_with_external_service(entity_name, description_html):
     try:
+        session = requests.Session()
+        retry_strategy = Retry(
+            total=3,
+            backoff_factor=1,
+            status_forcelist=[500, 502, 503, 504],
+        )
+        adapter = HTTPAdapter(max_retries=retry_strategy)
+        session.mount("http://", adapter)
+        session.mount("https://", adapter)
+
         data = {
             "description_html": description_html,
             "variant": "rich" if entity_name == "PAGE" else "document",
         }
-        response = requests.post(
+        response = session.post(
             f"{settings.LIVE_BASE_URL}/convert-document/",
             json=data,
             headers=None,
+            timeout=(5, 30),  # (connect timeout, read timeout)
         )
         if response.status_code == 200:
             return response.json()
+        else:
+            log_exception(f"External service returned {response.status_code}: {response.text}")
     except requests.RequestException as e:
         log_exception(e)
     return {}
```
🧹 Nitpick comments (1)
apiserver/plane/bgtasks/copy_s3_object.py (1)
56-61: Add transaction management for entity updates.
The `update_description` function updates and saves the entity without transaction management. This could lead to data inconsistency if the save operation fails.
```diff
+from django.db import transaction
+
 def update_description(entity, duplicated_assets, tag):
-    updated_html = replace_asset_ids(entity.description_html, tag, duplicated_assets)
-    entity.description_html = updated_html
-    entity.save()
-    return updated_html
+    with transaction.atomic():
+        updated_html = replace_asset_ids(
+            entity.description_html, tag, duplicated_assets
+        )
+        entity.description_html = updated_html
+        entity.save()
+    return updated_html
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apiserver/plane/bgtasks/copy_s3_object.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
apiserver/plane/bgtasks/copy_s3_object.py
87-87: Line too long (104 > 88) (E501)
88-88: Line too long (101 > 88) (E501)
107-107: Line too long (106 > 88) (E501)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
apiserver/plane/bgtasks/copy_s3_object.py (2)
1-15: LGTM! Well-organized imports.
The imports are properly grouped and all imported modules are used in the code.
17-32: LGTM! Clean and maintainable entity mapping.
The function provides a comprehensive mapping of entity types to their ID fields using a clean dictionary-based approach.
Description
This pull request ensures that uploaded assets are duplicated along with the page when a page is duplicated.