Refactor zip archive processing to use dynamic zip generation and streaming #104

Closed
servilla opened this issue Feb 21, 2023 · 2 comments
Labels
development Deployed to development environment feature New feature production Deployed to production environment staging Deployed to staging environment

Comments

@servilla
Collaborator

When a user requests a zip archive file, the current approach first checks whether the zip file exists in a cache and, if it does, begins streaming it. If the zip archive does not exist, it must first be created and only then streamed. As a result, the first user to request an archive pays the price of a long wait while the zip file is created. This is not critical for small volumes of data, but an archive of multiple GBs may cause that first request to time out. In addition, the cached zip archive files require additional disk storage.

For these reasons, we should refactor the workflow so that, instead of storing cached versions of the zip archive file, the zip archive is created dynamically and streamed back to the user in real time. We expect dynamic compression to add a small overhead, but we do not believe the user will notice it.
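To illustrate the idea, here is a minimal sketch of dynamic zip generation using `java.util.zip.ZipOutputStream` from the standard library. It is not the Data Package Manager implementation: the class name `StreamingZipSketch`, the method `streamZip`, and the entry names are hypothetical, and in a real service the `OutputStream` would presumably be the HTTP response stream and the sources would come from the data store.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StreamingZipSketch {

    /**
     * Compresses each named source into the zip stream as it is read, so the
     * archive is never written to a cache file or held fully in memory.
     * Closing the ZipOutputStream also closes the underlying stream.
     */
    public static void streamZip(Map<String, InputStream> sources, OutputStream out)
            throws IOException {
        byte[] buffer = new byte[8192];
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, InputStream> source : sources.entrySet()) {
                zip.putNextEntry(new ZipEntry(source.getKey()));
                try (InputStream in = source.getValue()) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        zip.write(buffer, 0, n);
                    }
                }
                zip.closeEntry();
                // Push finished entries downstream so the client starts
                // receiving data before later entries have been read.
                zip.flush();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical package contents; in-memory stand-ins for data store files.
        Map<String, InputStream> sources = new LinkedHashMap<>();
        sources.put("metadata.xml",
                new ByteArrayInputStream("<eml/>".getBytes(StandardCharsets.UTF_8)));
        sources.put("data.csv",
                new ByteArrayInputStream("a,b\n1,2\n".getBytes(StandardCharsets.UTF_8)));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        streamZip(sources, out);
        System.out.println("zip bytes written: " + out.size());
    }
}
```

Because entries are written without a known size up front, the first bytes can leave the server as soon as the first entry starts compressing, which is what removes the long wait on the first request.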

@servilla servilla added the feature New feature label Feb 21, 2023
@servilla
Collaborator Author

Successful completion of this issue will resolve #78 since cached versions of the zip archive file will no longer be required.

@servilla
Collaborator Author

The ensuing discussion on this issue produced two options: address this in the existing Java code base (e.g., the Data Package Manager service) or use a Python web framework. This particular service call could be easily implemented in Python since it does not depend on any other Java classes. We ultimately decided to stay within the Java code base for the following reasons:

  1. Java supports streaming zip content (as does Python).
  2. The packaging contents already exist within the current Zip Archive processing.
  3. The current processing already has access to the data store (read-only access would have to be added to a server where a Python app would exist).
  4. We do not have a decided-upon Python framework pattern for building out existing PASTA services.

@servilla servilla added the development Deployed to development environment label May 3, 2023
@servilla servilla added production Deployed to production environment staging Deployed to staging environment labels Jun 1, 2023
@servilla servilla closed this as completed Jun 1, 2023
@servilla servilla added the EDI label Oct 31, 2023