Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add libreoffice extension #1420

Closed
wants to merge 1 commit into from

Conversation

amuwal
Copy link
Contributor

@amuwal amuwal commented Oct 20, 2024

Closes #1361

✅ Checklist

  • I have followed every step in the contributing guide
  • The PR title follows the convention.
  • I ran and tested the code works

Testing

  • Tested in dev
  • Tested in production

[Describe the steps you took to test this change]

  • Build CLI
  • Create hello-world project in references
  • Create libreofficeConvertAndUploadToS3 task in hello-world/src/trigger/libreofficeTask.ts
    • Download a .docx file
    • Convert the file to .pdf
    • Upload to s3
  • Run libreofficeConvertAndUploadToS3 task
  • Verify converted file exists in s3 bucket
Ran pnpm exec trigger deploy --profile prod -p proj_abc in /references/hello-world to test in production

test-repo: https://github.com/amuwal/trigger.dev/tree/libreoffice-test


Changelog

[Short description of what has changed]
added libreoffice extension for file conversion


Screenshots

[Screenshots]
simplescreenrecorder-2024-10-20_10.25.43.webm

💯

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced the LibreOffice extension for enhanced document processing within the build environment.
    • Added functionality to download .docx files, convert them to .pdf, and upload to Amazon S3.
  • New Dependencies

    • Integrated @aws-sdk/client-s3 for S3 interactions.
    • Added libreoffice-convert for document conversion tasks.
  • Improvements

    • Enhanced error handling and cleanup processes during file uploads.

Copy link

changeset-bot bot commented Oct 20, 2024

⚠️ No Changeset found

Latest commit: 54f21a8

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

coderabbitai bot commented Oct 20, 2024

Walkthrough

The pull request introduces a new libreoffice extension within the @trigger.dev/build package, enabling the conversion of .docx and .pptx files to PDF format. This includes updates to the package.json files to support the new extension and its dependencies. A new class, LibreOfficeExtension, is implemented to manage the build process, and additional functionality is provided in libreofficeTasks.ts for downloading, converting, and uploading files to AWS S3. The extension is integrated into the build configuration, allowing for seamless usage.

Changes

File Change Summary
packages/build/package.json Added new export for libreoffice extension and updated typesVersions section.
packages/build/src/extensions/libreoffice.ts Introduced LibreOfficeExtension class with onBuildComplete method for build integration.
references/v3-catalog/package.json Added new dependencies: @aws-sdk/client-s3 and libreoffice-convert.
references/v3-catalog/src/trigger/libreofficeTasks.ts Created functions for downloading, converting, and uploading files to S3.
references/v3-catalog/trigger.config.ts Added libreoffice extension to the build configuration.

Assessment against linked issues

Objective Addressed Explanation
Create a libreoffice build extension to allow docx/pptx to PDF conversion (#1361)

Possibly related PRs

  • New Doc: create your own build extension reference #1347: This PR adds documentation for creating build extensions, which is relevant as the main PR introduces a new build extension for libreoffice.
  • New puppeteer examples #1355: This PR includes examples and documentation for using Puppeteer, which is similar in nature to the libreoffice extension introduced in the main PR, as both involve extending the build capabilities.

Suggested reviewers

  • ericallam

Poem

In the meadow, we hop and play,
With LibreOffice here to stay.
Convert those docs with glee and cheer,
To PDFs, oh what a year!
In builds we trust, let’s dance and sing,
For every file, our joy will bring! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (3)
packages/build/src/extensions/libreoffice.ts (1)

8-32: LGTM: Well-implemented LibreOfficeExtension class with a minor suggestion.

The LibreOfficeExtension class correctly implements the BuildExtension interface and provides appropriate logic for installing LibreOffice in non-dev environments. The installation instructions align with the lightweight requirement mentioned in the PR objectives.

Consider adding error handling to the onBuildComplete method. For example:

 async onBuildComplete(context: BuildContext, manifest: BuildManifest) {
   if (context.target === "dev") {
     return;
   }

   context.logger.debug(`Adding ${this.name} to the build`);

   const instructions = [
     `RUN apt-get update && apt-get install -y \
   libreoffice \
   --no-install-recommends \
   && rm -rf /var/lib/apt/lists/*`,
   ];

+  try {
     context.addLayer({
       id: "libreoffice",
       image: {
         instructions,
       },
     });
+  } catch (error) {
+    context.logger.error(`Failed to add ${this.name} to the build: ${error.message}`);
+    throw error;
+  }
 }

This will provide better error reporting if the addLayer operation fails.

references/v3-catalog/package.json (1)

38-38: Consider updating libreoffice-convert to the latest version

The addition of libreoffice-convert is crucial for the LibreOffice conversion functionality outlined in the PR objectives. However, the specified version ^1.6.0 is not the latest available. Consider updating to the most recent version (2.1.1 as of my last update) to ensure you have the latest features and bug fixes.

Would you like me to provide a diff for updating to the latest version?

references/v3-catalog/trigger.config.ts (1)

Line range hint 1-95: Summary: LibreOffice extension successfully integrated into the build configuration.

The changes in this file are minimal and focused, successfully integrating the LibreOffice extension into the project's build configuration. The new import and the addition to the extensions array are consistent with the PR objectives and follow the existing patterns in the file. These changes should enable the conversion of .docx and .pptx files to PDF format as intended, without disrupting the existing build process or other extensions.

To ensure smooth operation and maintainability:

  1. Consider adding documentation comments above the libreoffice() extension in the extensions array, briefly explaining its purpose and any specific configuration options.
  2. If there are any specific environment variables or system requirements for the LibreOffice extension, consider adding them to the project's setup documentation or README file.
  3. Ensure that the LibreOffice extension is tested in various environments (development, staging, production) to verify its compatibility with other extensions and the overall build process.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 67542a5 and 54f21a8.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (5)
  • packages/build/package.json (3 hunks)
  • packages/build/src/extensions/libreoffice.ts (1 hunks)
  • references/v3-catalog/package.json (2 hunks)
  • references/v3-catalog/src/trigger/libreofficeTasks.ts (1 hunks)
  • references/v3-catalog/trigger.config.ts (2 hunks)
🧰 Additional context used
🔇 Additional comments (11)
packages/build/src/extensions/libreoffice.ts (3)

1-2: LGTM: Imports are appropriate and concise.

The imports from "@trigger.dev/core/v3" and "@trigger.dev/core/v3/build" are correctly used in the implementation.


4-6: LGTM: Well-implemented factory function.

The libreoffice() function is a clean and simple factory function that follows good practices. It's correctly exported for external use.


1-32: Great implementation of the LibreOffice extension!

This implementation successfully addresses the requirements outlined in the PR objectives and linked issue #1361. Key points:

  1. The extension is lightweight, avoiding unnecessary installations as requested.
  2. It provides a clean factory function for easy instantiation.
  3. The LibreOfficeExtension class correctly implements the BuildExtension interface.
  4. The implementation skips installation for dev targets, which is efficient.

The code is well-structured and follows good practices. Great job on delivering this feature!

references/v3-catalog/package.json (2)

19-19: LGTM: Addition of @aws-sdk/client-s3 dependency

The addition of @aws-sdk/client-s3 is appropriate for the project's needs, particularly for S3 interactions mentioned in the PR objectives. The version ^3.675.0 allows for compatible updates while starting from a recent version, which is a good practice for security and feature support.


Line range hint 1-85: Overall package.json changes look good

The additions to the package.json file align well with the PR objectives of creating a LibreOffice extension for file conversion. The new dependencies (@aws-sdk/client-s3 and libreoffice-convert) are appropriate for the intended functionality. No conflicts or issues are apparent with the existing dependencies or project structure.

references/v3-catalog/trigger.config.ts (2)

8-8: LGTM: New LibreOffice extension import added correctly.

The import statement for the LibreOffice extension is properly placed and follows the existing pattern for extension imports. This addition aligns with the PR objective of creating a LibreOffice build extension for file conversion.


91-91: LGTM: LibreOffice extension added to the build configuration.

The libreoffice() extension is correctly added to the extensions array in the build configuration. This addition completes the integration of the LibreOffice extension into the project's build process.

To ensure the extension is properly recognized and configured, please run the following verification script:

This script will help confirm that the LibreOffice extension is properly imported, configured in the build process, and that the necessary files are present in the project structure.

✅ Verification successful

Verification Successful: LibreOffice extension is correctly configured.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the LibreOffice extension is properly imported and configured

# Test 1: Check if the libreoffice import is present
echo "Checking for LibreOffice import..."
grep -n "import.*libreoffice.*from.*@trigger.dev/build/extensions/libreoffice" references/v3-catalog/trigger.config.ts

# Test 2: Verify the libreoffice extension is added to the extensions array
echo "Verifying LibreOffice extension in build configuration..."
sed -n '/extensions:/,/]/p' references/v3-catalog/trigger.config.ts | grep -n "libreoffice()"

# Test 3: Check if the LibreOffice extension file exists
echo "Checking for LibreOffice extension file..."
ls -l packages/build/src/extensions/libreoffice.ts

echo "Verification complete."

Length of output: 815

packages/build/package.json (4)

57-59: LGTM. Type definitions for LibreOffice extension added correctly.

The addition of the LibreOffice extension type definitions is consistent with the existing structure and follows the package's conventions.


Line range hint 1-185: Overall assessment: LibreOffice extension successfully integrated.

The changes to package.json effectively integrate the LibreOffice extension into the @trigger.dev/build package. The modifications are consistent with the existing structure and follow the package's conventions. Make sure to run the suggested verification scripts after building the package to ensure all necessary files are generated correctly.


167-177: LGTM. Export configuration for LibreOffice extension added correctly.

The export configuration for the LibreOffice extension is consistent with the existing structure and correctly specifies paths for both CommonJS and ES modules.

After building the package, please run the following script to verify the existence of the compiled files:

#!/bin/bash
# Verify the existence of compiled LibreOffice extension files
files=(
    "dist/esm/extensions/libreoffice.d.ts"
    "dist/esm/extensions/libreoffice.js"
    "dist/commonjs/extensions/libreoffice.d.ts"
    "dist/commonjs/extensions/libreoffice.js"
)

for file in "${files[@]}"; do
    if [ ! -f "$file" ]; then
        echo "Error: $file not found"
        exit 1
    fi
done

echo "All LibreOffice extension compiled files exist."

31-32: LGTM. Verify the existence of the LibreOffice extension file.

The addition of the LibreOffice extension export is consistent with the existing structure.

Please run the following script to ensure the LibreOffice extension file exists:

✅ Verification successful

Verified: LibreOffice extension file exists at the correct path.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the existence of the LibreOffice extension file
if [ -f "src/extensions/libreoffice.ts" ]; then
    echo "LibreOffice extension file exists."
else
    echo "Error: LibreOffice extension file not found."
    exit 1
fi

Length of output: 379


Script:

#!/bin/bash

# Verify the existence of the source LibreOffice extension file
if [ -f "src/extensions/libreoffice.ts" ]; then
    echo "Source LibreOffice extension file exists."
else
    echo "Error: Source LibreOffice extension file not found."
    exit 1
fi

# Verify the export path in package.json
grep '"./extensions/libreoffice": "\./src/extensions/libreoffice\.ts"' packages/build/package.json > /dev/null
if [ $? -eq 0 ]; then
    echo "Export path for LibreOffice extension is correctly set in package.json."
else
    echo "Error: Export path for LibreOffice extension is missing or incorrect in package.json."
    exit 1
fi

Length of output: 164


Script:

#!/bin/bash

# Search for TypeScript files related to LibreOffice in the codebase
fd "libreoffice.ts"

Length of output: 64

const downloadFile = async (url: string, outputPath: string): Promise<void> => {
const response = await fetch(url);
const buffer = await response.arrayBuffer();
fs.writeFileSync(outputPath, Buffer.from(buffer));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid blocking the event loop by using asynchronous file system methods

The use of synchronous file system methods (fs.writeFileSync, fs.readFileSync, fs.unlinkSync) blocks the event loop and can degrade performance in an asynchronous context.

Consider using the asynchronous versions of these methods (fs.promises.writeFile, fs.promises.readFile, fs.promises.unlink) to ensure non-blocking I/O operations.

Examples:

In downloadFile function:

- fs.writeFileSync(outputPath, Buffer.from(buffer));
+ await fs.promises.writeFile(outputPath, Buffer.from(buffer));

In uploadToS3 function:

- Body: fs.readFileSync(file_path),
+ Body: await fs.promises.readFile(file_path),

In the main task:

- const docxFile = fs.readFileSync(docxPath);
+ const docxFile = await fs.promises.readFile(docxPath);

...

- fs.writeFileSync(pdfPath, pdfBuffer);
+ await fs.promises.writeFile(pdfPath, pdfBuffer);

In the cleanup section:

- fs.unlinkSync(docxPath);
+ await fs.promises.unlink(docxPath);

- fs.unlinkSync(pdfPath);
+ await fs.promises.unlink(pdfPath);

Also applies to: 34-34, 60-60, 63-63, 73-73, 76-76

Comment on lines +8 to +14
const s3Client = new S3Client({
region: process.env.AWS_REGION || "auto",
credentials: {
accessKeyId: process.env.AWS_ACCESS_KEY_ID || "",
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY || "",
},
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

AWS environment variables are not set.

One or more AWS environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION) are not properly configured, which may lead to authentication errors when interacting with AWS services.

🔗 Analysis chain

Ensure AWS credentials and region are properly set

In the S3 client initialization, the region and credentials default to "auto" and empty strings if environment variables are not set. This may lead to authentication errors when interacting with AWS services.

You can verify that the necessary environment variables are set by running the following script:

Please ensure that these environment variables are configured correctly in your deployment environment.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if AWS environment variables are set
# Expected: Variables should be set and not empty

if [[ -z "$AWS_ACCESS_KEY_ID" || -z "$AWS_SECRET_ACCESS_KEY" || -z "$AWS_REGION" ]]; then
  echo "One or more AWS environment variables are not set."
else
  echo "All AWS environment variables are properly set."
fi

Length of output: 124

Comment on lines +17 to +22
const downloadFile = async (url: string, outputPath: string): Promise<void> => {
const response = await fetch(url);
const buffer = await response.arrayBuffer();
fs.writeFileSync(outputPath, Buffer.from(buffer));
console.log(`Downloaded file to ${outputPath}`);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Check for HTTP response status in downloadFile function

The downloadFile function does not check if the HTTP response is successful before reading the array buffer. If the fetch response is an error (e.g., 404 Not Found), this could lead to writing invalid data to disk.

Consider adding a check for response.ok and handle errors appropriately. Here's a suggested fix:

 const downloadFile = async (url: string, outputPath: string): Promise<void> => {
   const response = await fetch(url);
+  if (!response.ok) {
+    throw new Error(`Failed to download file: ${response.status} ${response.statusText}`);
+  }
   const buffer = await response.arrayBuffer();
   fs.writeFileSync(outputPath, Buffer.from(buffer));
   console.log(`Downloaded file to ${outputPath}`);
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const downloadFile = async (url: string, outputPath: string): Promise<void> => {
const response = await fetch(url);
const buffer = await response.arrayBuffer();
fs.writeFileSync(outputPath, Buffer.from(buffer));
console.log(`Downloaded file to ${outputPath}`);
};
const downloadFile = async (url: string, outputPath: string): Promise<void> => {
const response = await fetch(url);
if (!response.ok) {
throw new Error(`Failed to download file: ${response.status} ${response.statusText}`);
}
const buffer = await response.arrayBuffer();
fs.writeFileSync(outputPath, Buffer.from(buffer));
console.log(`Downloaded file to ${outputPath}`);
};

Comment on lines +52 to +63
const docxPath = path.join(process.cwd(), "sample.docx");
let pdfPath = "";

try {
// Step 1: Download the .docx file
await downloadFile(docUrl, docxPath);

// Step 2: Convert the downloaded .docx file to .pdf
const docxFile = fs.readFileSync(docxPath);
const pdfBuffer = await convertAsync(docxFile, ".pdf", undefined);
pdfPath = path.join(process.cwd(), "libre-docx-to-pdf-for-s3.pdf");
fs.writeFileSync(pdfPath, pdfBuffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Store temporary files in the system's temporary directory

Currently, temporary files are stored in the current working directory using process.cwd(). This can lead to clutter and potential conflicts.

Consider using the system's temporary directory for storing temporary files:

+ import os from "os";
...
- const docxPath = path.join(process.cwd(), "sample.docx");
+ const docxPath = path.join(os.tmpdir(), "sample.docx");

...

- pdfPath = path.join(process.cwd(), "libre-docx-to-pdf-for-s3.pdf");
+ pdfPath = path.join(os.tmpdir(), "libre-docx-to-pdf-for-s3.pdf");

This change ensures that temporary files are stored appropriately and reduces the risk of file collisions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const docxPath = path.join(process.cwd(), "sample.docx");
let pdfPath = "";
try {
// Step 1: Download the .docx file
await downloadFile(docUrl, docxPath);
// Step 2: Convert the downloaded .docx file to .pdf
const docxFile = fs.readFileSync(docxPath);
const pdfBuffer = await convertAsync(docxFile, ".pdf", undefined);
pdfPath = path.join(process.cwd(), "libre-docx-to-pdf-for-s3.pdf");
fs.writeFileSync(pdfPath, pdfBuffer);
import os from "os";
const docxPath = path.join(os.tmpdir(), "sample.docx");
let pdfPath = "";
try {
// Step 1: Download the .docx file
await downloadFile(docUrl, docxPath);
// Step 2: Convert the downloaded .docx file to .pdf
const docxFile = fs.readFileSync(docxPath);
const pdfBuffer = await convertAsync(docxFile, ".pdf", undefined);
pdfPath = path.join(os.tmpdir(), "libre-docx-to-pdf-for-s3.pdf");
fs.writeFileSync(pdfPath, pdfBuffer);

@ericallam
Copy link
Member

This doesn't seem to be necessary, as we could just use the aptGet extension to add libreoffice to the build.

@amuwal
Copy link
Contributor Author

amuwal commented Oct 21, 2024

This doesn't seem to be necessary, as we could just use the aptGet extension to add libreoffice to the build.

you are right, but there was an issue for it
shall i close the pr?

@amuwal amuwal closed this Oct 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a libreoffice build extension to allow docx/pptx to PDF conversion
2 participants