Refactor zip archive processing to use dynamic zip generation and streaming #104

servilla · 2023-02-21T18:46:58Z

When a user requests a zip archive file, the current approach first checks whether the zip file already exists in a cache; if it does, the service begins streaming it. If it does not, the service must first create the zip archive file and only then begin streaming it. As a result, the user who makes the first request pays the price of a long wait while the zip file is created. This is not critical for small volumes of data, but for packages totaling multiple GB the first request may time out. In addition, the cached zip archive files consume additional disk storage.

For these reasons, we should refactor the workflow from storing cached copies of the zip archive file to one where the zip archive is generated dynamically and streamed back to the user as it is created. We expect on-the-fly compression to add a small overhead, but we do not believe it will be noticeable to users.
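A minimal sketch of the proposed streaming approach, using only `java.util.zip` from the JDK. The class name `ZipStreamer`, the method `streamZip`, and the idea of passing a flat list of file paths are hypothetical illustrations, not the Data Package Manager's actual API. Each file is compressed and written to the caller's output stream (for example, an HTTP response body) as it is read, so no archive is staged on disk and the first bytes can reach the client almost immediately.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper illustrating on-the-fly zip generation; not PASTA's actual class. */
final class ZipStreamer {

    private ZipStreamer() {}

    /**
     * Compresses each file into a zip entry and writes it straight to {@code out}
     * (e.g., an HTTP response body). Nothing is cached on disk.
     */
    public static void streamZip(List<Path> files, OutputStream out) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        for (Path file : files) {
            // Entries are named by file name only; a real data package would likely
            // need its own directory layout (assumption, not shown here).
            zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
            // Copies the file through the deflater in buffered chunks, so memory
            // use stays bounded regardless of file size.
            Files.copy(file, zip);
            zip.closeEntry();
        }
        // finish() writes the central directory without closing 'out',
        // which the caller (e.g., the web container) still owns.
        zip.finish();
    }
}
```

Single-pass streaming works because the zip format places each entry's local header before its data and writes the central directory at the end; for deflated entries whose size is not known in advance, `ZipOutputStream` records the CRC and sizes in a data descriptor after the entry data. If the endpoint is exposed through JAX-RS (an assumption here), `streamZip` could be called from inside a `StreamingOutput` implementation.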

servilla · 2023-02-21T18:48:20Z

Successful completion of this issue will resolve #78 since cached versions of the zip archive file will no longer be required.

servilla · 2023-02-21T19:10:17Z

The ensuing discussion of this issue considered two options: addressing it in the existing Java code base (e.g., the Data Package Manager service) or using a Python web framework. This particular service call could easily be implemented in Python because it does not depend on any other Java classes. We ultimately decided to stay within the Java code base for the following reasons:

- Java supports streaming zip content (as does Python).
- The logic for assembling the package contents already exists within the current zip archive processing.
- The current processing already has access to the data store (read-only access would have to be granted to whichever server hosted a Python app).
- We do not have an agreed-upon Python framework pattern for building out existing PASTA services.
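The first reason above (Java supports streaming zip content) can be illustrated end to end with the JDK's built-in `com.sun.net.httpserver` package. This is a self-contained sketch, not PASTA's web stack: the class `ZipHttpDemo`, the `/archive` path, and the `package.zip` filename are hypothetical. The key detail is that the final archive size is unknown when the response starts, so no `Content-Length` can be sent; passing `0` to `sendResponseHeaders` makes the server use chunked transfer encoding instead.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical demo server that streams a zip of the given files; not PASTA's API. */
final class ZipHttpDemo {

    private ZipHttpDemo() {}

    /** Starts a loopback server on {@code port} (0 = any free port) serving /archive. */
    public static HttpServer start(int port, List<Path> files) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/archive", exchange -> handle(exchange, files));
        server.start();
        return server;
    }

    private static void handle(HttpExchange exchange, List<Path> files) throws IOException {
        exchange.getResponseHeaders().set("Content-Type", "application/zip");
        exchange.getResponseHeaders().set("Content-Disposition", "attachment; filename=\"package.zip\"");
        // A length of 0 selects chunked transfer encoding: the archive size is
        // unknown until compression of the last entry finishes.
        exchange.sendResponseHeaders(200, 0);
        // Closing the ZipOutputStream writes the central directory and closes
        // the response body, which completes the exchange.
        try (ZipOutputStream zip = new ZipOutputStream(exchange.getResponseBody())) {
            for (Path file : files) {
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip); // bytes are sent to the client as chunks fill
                zip.closeEntry();
            }
        }
    }
}
```

With this design the client starts receiving data as soon as the first compressed chunk is flushed, instead of waiting for a complete archive to be written, which is the time-out scenario described at the top of this issue.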

servilla added the feature label (New feature) Feb 21, 2023

servilla assigned rogerdahl Feb 21, 2023

servilla added the development label (Deployed to development environment) May 3, 2023

servilla added the production (Deployed to production environment) and staging (Deployed to staging environment) labels Jun 1, 2023

servilla closed this as completed Jun 1, 2023

rogerdahl added this to Consolidated Issues Oct 18, 2023

rogerdahl moved this to Done in Consolidated Issues Oct 18, 2023

servilla added the EDI label Oct 31, 2023
