A microservice to build and stream dynamically zipped bundles of remote files. The service does not store files; rather, it stores references to files, which it can stream to users as a zipped package. Designed to be backend-swappable, it is capable of working with just about any database and any filestore. It aims to be fast, have a low memory footprint, and support tens of thousands of concurrent connections [citation needed].
- Use cases
- Supported backends
- Models
- Configuration
- API
- Getting Started
- Logging
- Code Coverage
- Docker
- FAQ
- Contributing
- Attributions
- History
- License
Consider a situation where you store millions of files on Amazon S3 for a client. The client occasionally asks you to make a subset of their files available for download. ZipStream aims to simplify this. You submit references to the client's files to ZipStream and send your client a link containing the bundle's identifier. When your client visits the link, a zip file containing all of the referenced S3 assets is streamed to them. Because the service retrieves the remote data in small chunks, zips it, and streams it to users on the fly, the memory overhead is extremely low and the service imposes no maximum zip file size.
Naturally, if the zipped bundle is to be downloaded many times over, it is likely more efficient to generate the zipped bundle once, store it in a filestore, and send the client a URI to that zipped file. In this case, it would be better to use ZipStream's /bundle endpoint for a one-time generation of the to-be-shared zip file. This can be done efficiently by streaming from one service to the other.
Example
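var AWS = require('aws-sdk');
var s3Stream = require('s3-upload-stream');
var request = require('request');

// Define s3-upload-stream with S3 credentials
// (read from the environment here rather than hard-coded).
var awsConn = new AWS.S3({
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY
});

// Establish input stream: POST the file references to ZipStream's /bundle endpoint
var inStream = request({
  url: 'https://myZipstreamServer.com/bundle',
  method: 'POST',
  json: {
    files: [
      {
        src: 'https://i.imgur.com/CMMdGGX.jpg',
        dst: 'rock-collection.jpg'
      }
    ]
  }
})
  .on('error', function (err) {
    console.error(err);
  });

// Establish output stream: a multipart upload to S3
var outStream = s3Stream(awsConn)
  .upload({
    'Bucket': 'myBucket',
    'Key': 'my-key.zip'
  })
  .on('error', function (err) {
    console.error(err);
  })
  .on('uploaded', function (details) {
    console.log('Upload complete:', details);
  });

// Send it
inStream.pipe(outStream);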
Consider a situation where you would like to allow a customer to export a number of files of their choosing, perhaps in a shopping-cart fashion. ZipStream makes this simple by providing a one-off zip creation endpoint. The endpoint expects a POST request containing references to externally hosted files and responds with a streamed zip of those files. Using this, a developer can set up a web form that lets users select the files they want and initiates the download on submit.
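For the shopping-cart case, the form handler only has to turn the user's selections into the JSON body that /bundle expects. The following is a minimal sketch. The `buildBundlePayload` helper and the `url`/`name` fields on each cart item are hypothetical illustrations and are not part of ZipStream.

```javascript
// Hypothetical helper: turns shopping-cart selections into a /bundle request body.
// Assumes each cart item looks like { url: 'https://...', name: 'optional-name.jpg' }.
function buildBundlePayload(items, filename) {
  const files = items.map((item) => {
    const ref = { src: item.url };
    // dst is optional; ZipStream defaults it to the path of src when omitted.
    if (item.name) ref.dst = item.name;
    return ref;
  });
  const payload = { files };
  if (filename) payload.filename = filename; // optional for /bundle
  return payload;
}

// The form's submit handler would then POST JSON.stringify(payload) to
// https://myZipstreamServer.com/bundle with Content-Type: application/json.
```

The payload builder does not depend on where the request is sent from, so it can also be reused by a server-side handler.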
The system relies on two backend types: Database and Filestore.
The database backend is responsible for managing data that represent ZipStream Bundle instances.
To add support for a new database backend, a file should be placed in the backends/db directory and offer four functions: create, read, update, and delete. Each function must take in a FileRef object and return a Promise object.
To see what's on the database backends radar, see our Issues.
The filestore backend is responsible for returning files in a Readable stream format. Multiple filestore backends may be enabled. When examining a submitted FileRef, the system looks for a backend matching the protocol of the src attribute (e.g. a src of s3://myBucket/my/key.jpg would use the s3 backend).
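The protocol lookup described above can be sketched as follows. The `backends` object is a hypothetical stand-in for the loaded backends/fs modules enabled via FS_INTERFACES. This is an illustration of the idea, not ZipStream's own dispatch code.

```javascript
// Derive the backend name from a src, e.g. 's3://myBucket/my/key.jpg' -> 's3'.
// new URL() throws on malformed input, which surfaces as an invalid FileRef.
function backendNameFor(src) {
  const { protocol } = new URL(src); // e.g. 's3:' or 'https:' (lowercased)
  return protocol.slice(0, -1);      // strip the trailing colon
}

// Pick the enabled backend for a src, or fail loudly if none matches.
function resolveBackend(src, backends) {
  const name = backendNameFor(src);
  // Own-property check so names like "constructor" never hit Object.prototype.
  if (!Object.prototype.hasOwnProperty.call(backends, name)) {
    throw new Error(`No filestore backend enabled for protocol "${name}"`);
  }
  return backends[name];
}
```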
To add support for a new filestore backend, a file should be placed in the backends/fs directory and offer a single function: getStream. This function should take in a src argument and return a readable stream.
- Amazon S3
- HTTP(S) - Any valid URL. For security reasons, IP addresses are not supported
To see what's on the filestore backends radar, see our Issues.
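As an illustration of the getStream contract, here is a minimal sketch of a hypothetical `mem` filestore backend that serves objects from an in-process Map, which could be useful as a test fixture. The file name backends/fs/mem.js, the mem:// scheme, and the exported `store` are all assumptions for this example.

```javascript
const { Readable } = require('stream');

// Hypothetical backends/fs/mem.js: serves objects from an in-process Map.
// Sources look like mem://<key>.
const store = new Map();

function getStream(src) {
  const key = src.replace(/^mem:\/\//, '');
  if (!store.has(key)) {
    // Report a missing object as a stream error rather than throwing, so the
    // caller handles it like any other failed download.
    const failing = new Readable({ read() {} });
    failing.destroy(new Error(`No such object: ${src}`));
    return failing;
  }
  // objectMode: false makes this a plain byte stream that emits Buffer chunks.
  return Readable.from([Buffer.from(store.get(key))], { objectMode: false });
}

module.exports = { getStream, store };
```

To enable such a backend, its name would be added to FS_INTERFACES (e.g. s3,https,mem).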
FileRef
- src: String, protocol and location of file (depending on backend), such as S3 Bucket
- dst: String, optional, desired path of file when in bundle, defaults to path of src value

Bundle
- id: String, UUIDv4
- secret: String, UUIDv4
- files: Array, FileRef objects
- expirationDate: Number, unix timestamp representing the date at which the record should be deleted
- filename: String, desired filename of bundle (should end with .zip)
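The two models can be written down as JSDoc types, together with a sketch of how a missing dst might be filled in. The `withDefaultDst` helper is hypothetical. It assumes that "path of src" means the URL path without its leading slash, so s3://some-other-bucket-2/bar.gif would land in the zip as bar.gif.

```javascript
/**
 * @typedef {Object} FileRef
 * @property {string} src   protocol and backend-specific location, e.g. s3://bucket/key
 * @property {string} [dst] path of the file inside the zip
 */

/**
 * @typedef {Object} Bundle
 * @property {string} id              UUIDv4
 * @property {string} secret          UUIDv4
 * @property {FileRef[]} files
 * @property {number} expirationDate  unix timestamp
 * @property {string} filename        should end with .zip
 */

// Returns a copy of the FileRef with dst filled in. Assumption: "path of src"
// means the URL path without its leading slash, so the bucket or host is dropped.
function withDefaultDst(fileRef) {
  if (fileRef.dst) return { ...fileRef };
  const { pathname } = new URL(fileRef.src);
  return { ...fileRef, dst: decodeURIComponent(pathname.replace(/^\/+/, '')) };
}
```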
- NODE_ENV: String, environment mode of server. Defaults to 'development'.
- PORT: Number, port number for the ZipStream server. Defaults to 4040.
- DB_INTERFACE: String, name of the database backend to be used by the ZipStream instance. Defaults to 'postgres'.
- FS_INTERFACES: String, comma-separated names of filestore backends to be supported by the ZipStream instance. Defaults to 's3,https'.
- DB_CNXN: String, connection string used to connect to the Postgres database backend. Defaults to 'postgres://localhost:5432/zipstream'.
- AWS_REGION: String, AWS region to be used by the S3 filestore backend and DynamoDB database backend. Defaults to 'us-west-2'.
- TABLE_NAME: String, name of the table used by the database backend. Defaults to 'bundles'.
- DATA_LIFETIME: Number, lifespan of a Bundle record, in minutes. Defaults to 10080 (1 week). Note that the codebase does not manage the removal of expired bundles; it is up to individual backends to set up expiration logic via TTL or database trigger settings.
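The following is a sketch of reading these variables with their documented defaults, plus the arithmetic that turns DATA_LIFETIME into a Bundle's expirationDate. `loadConfig` and `expirationDate` are hypothetical helpers, not ZipStream's own config module. The conversion assumes the timestamp is in seconds, as the example value 1503029550 in the API responses suggests.

```javascript
// Read configuration from an env-like object, falling back to the documented defaults.
function loadConfig(env = process.env) {
  return {
    nodeEnv: env.NODE_ENV || 'development',
    port: Number(env.PORT || 4040),
    dbInterface: env.DB_INTERFACE || 'postgres',
    fsInterfaces: (env.FS_INTERFACES || 's3,https')
      .split(',')
      .map((s) => s.trim())
      .filter(Boolean),
    dbCnxn: env.DB_CNXN || 'postgres://localhost:5432/zipstream',
    awsRegion: env.AWS_REGION || 'us-west-2',
    tableName: env.TABLE_NAME || 'bundles',
    dataLifetime: Number(env.DATA_LIFETIME || 10080), // minutes
  };
}

// Unix timestamp (seconds) at which a record created at nowMs should expire.
// Minutes are converted to seconds before being added to "now".
function expirationDate(dataLifetimeMinutes, nowMs = Date.now()) {
  return Math.floor(nowMs / 1000) + dataLifetimeMinutes * 60;
}
```

With the default lifetime, a record expires 10080 × 60 = 604800 seconds (one week) after creation.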
Returns a bundle of provided files.
URL : /bundle
Method : POST
Data constraints
{
"filename": "[optional, filename of zip to be returned]",
"files": [
{
"src": "[protocol and filestore-backend-related location of file]",
"dst": "[optional, desired path of file when in bundle, defaults to path of `src` value]"
}
]
}
Data example
{
"files": [
{
"src": "s3://my-aws-bucket-1/path/to/foo.jpg",
"dst": "foo.jpg"
},
{
"src": "s3://some-other-bucket-2/bar.gif"
}
]
}
Streaming download of zipped bundle.
Code : 200 OK
Download of bundle containing foo.jpg and bar.gif.
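From Node.js, the /bundle response can be written straight to disk without buffering the whole zip. The following is a sketch assuming Node 18+ (global `fetch`, `Readable.fromWeb`) and a ZipStream server mounted at the root of `baseUrl`. The helper names `bundleRequest` and `downloadBundle` are hypothetical.

```javascript
const fs = require('fs');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Builds the fetch() arguments for POST /bundle.
function bundleRequest(baseUrl, files, filename) {
  const body = { files };
  if (filename) body.filename = filename; // optional
  return [
    new URL('/bundle', baseUrl).toString(),
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  ];
}

// Streams the zipped response to outPath chunk by chunk.
async function downloadBundle(baseUrl, files, outPath, filename) {
  const res = await fetch(...bundleRequest(baseUrl, files, filename));
  if (!res.ok) throw new Error(`ZipStream responded with status ${res.status}`);
  await pipeline(Readable.fromWeb(res.body), fs.createWriteStream(outPath));
}
```

For example, `downloadBundle('http://localhost:4040', [{ src: 's3://my-aws-bucket-1/path/to/foo.jpg', dst: 'foo.jpg' }], 'out.zip')` would save the zip as out.zip.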
Creates a bundle.
URL : /
Method : POST
Data constraints
{
"filename": "[desired filename of bundle (should end with .zip)]"
}
Or, optionally:
{
"filename": "[desired filename of bundle]",
"files": [
{
"src": "[protocol and filestore-backend-related location of file]",
"dst": "[optional, desired path of file when in bundle, defaults to path of `src` value]"
}
]
}
Data example
{
"filename": "my-awesome-bundle.zip",
"files": [
{
"src": "s3://my-aws-bucket-1/path/to/foo.jpg",
"dst": "foo.jpg"
},
{
"src": "s3://some-other-bucket-2/bar.gif"
}
]
}
JSON representation of created bundle.
Code : 201 CREATED
Content example
{
"expirationDate" : 1503029550,
"secret" : "bd1be533-3c5c-4395-bda9-73dd288c5487",
"filename" : "my-awesome-bundle.zip",
"files": [
{
"src": "s3://my-aws-bucket-1/path/to/foo.jpg",
"dst": "foo.jpg"
},
{
"src": "s3://some-other-bucket-2/bar.gif"
}
],
"id" : "c4f6f218-afc4-4af1-ae1b-b22e9b058f26"
}
Download a bundle.
URL : /:id/
Method : GET
Streaming download of zipped bundle.
Code : 200 OK
Download of my-awesome-bundle.zip containing foo.jpg and bar.gif.
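The create-then-share flow combines the two endpoints above. You POST to / to create the bundle, keep the returned secret for later edits, and hand out only the public /:id/ link. The following is a sketch assuming Node 18+ `fetch` and a server mounted at the root of `baseUrl`. `createBundle` and `downloadLink` are hypothetical helper names.

```javascript
// Creates a bundle and returns its JSON representation
// ({ id, secret, filename, files, expirationDate }).
async function createBundle(baseUrl, filename, files = []) {
  // files is optional for this endpoint, so it is only sent when non-empty.
  const body = files.length ? { filename, files } : { filename };
  const res = await fetch(new URL('/', baseUrl), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (res.status !== 201) throw new Error(`Unexpected status ${res.status}`);
  return res.json();
}

// Public download URL for a bundle: GET /:id/ needs no secret.
function downloadLink(baseUrl, bundle) {
  return new URL(`/${encodeURIComponent(bundle.id)}/`, baseUrl).toString();
}
```

Because the download link contains only the id, the secret can stay with whoever manages the bundle.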
Retrieves bundle information.
URL : /:id/:secret/
Method : GET
JSON representation of retrieved bundle.
Code : 200 OK
Content example
{
"expirationDate" : 1503029550,
"secret" : "bd1be533-3c5c-4395-bda9-73dd288c5487",
"filename" : "my-awesome-bundle.zip",
"files": [
{
"src": "s3://my-aws-bucket-1/path/to/foo.jpg",
"dst": "foo.jpg"
},
{
"src": "s3://some-other-bucket-2/bar.gif"
}
],
"id" : "c4f6f218-afc4-4af1-ae1b-b22e9b058f26"
}
Appends files to a bundle.
URL : /:id/:secret/
Method : PUT
Data constraints
{
"files": [
{
"src": "[protocol and filestore-backend-related location of file]",
"dst": "[optional, desired path of file when in bundle, defaults to path of `src` value]"
}
]
}
Data example
{
"filename": "my-awesome-bundle.zip",
"files": [
{
"src": "s3://one-more-bucket-3/another/file.pdf",
"dst": "another-one.pdf"
}
]
}
JSON representation of bundle with newly appended data.
Code : 200 OK
Content example
{
"expirationDate" : 1503029550,
"secret" : "bd1be533-3c5c-4395-bda9-73dd288c5487",
"filename" : "my-awesome-bundle.zip",
"files": [
{
"src": "s3://my-aws-bucket-1/path/to/foo.jpg",
"dst": "foo.jpg"
},
{
"src": "s3://some-other-bucket-2/bar.gif"
},
{
"src": "s3://one-more-bucket-3/another/file.pdf",
"dst": "another-one.pdf"
}
],
"id" : "c4f6f218-afc4-4af1-ae1b-b22e9b058f26"
}
Deletes a bundle.
URL : /:id/:secret/
Method : DELETE
JSON representation of deleted bundle.
Code : 200 OK
Content example
{
"expirationDate" : 1503029550,
"secret" : "bd1be533-3c5c-4395-bda9-73dd288c5487",
"filename" : "my-awesome-bundle.zip",
"files": [
{
"src": "s3://my-aws-bucket-1/path/to/foo.jpg",
"dst": "foo.jpg"
},
{
"src": "s3://some-other-bucket-2/bar.gif"
},
{
"src": "s3://one-more-bucket-3/another/file.pdf",
"dst": "another-one.pdf"
}
],
"id" : "c4f6f218-afc4-4af1-ae1b-b22e9b058f26"
}
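The three endpoints under /:id/:secret/ differ only in HTTP method and body, so a client for them stays small. The following is a sketch assuming Node 18+ `fetch` and a server mounted at the root of `baseUrl`. `secretUrl`, `callBundle`, `getBundle`, `appendFiles`, and `deleteBundle` are hypothetical names.

```javascript
// URL for the secret-protected routes: /:id/:secret/
function secretUrl(baseUrl, { id, secret }) {
  return new URL(
    `/${encodeURIComponent(id)}/${encodeURIComponent(secret)}/`,
    baseUrl
  ).toString();
}

// Sends one request and returns the bundle JSON that each route responds with.
async function callBundle(baseUrl, bundle, method, body) {
  const init = { method };
  if (body !== undefined) {
    init.headers = { 'Content-Type': 'application/json' };
    init.body = JSON.stringify(body);
  }
  const res = await fetch(secretUrl(baseUrl, bundle), init);
  if (!res.ok) throw new Error(`${method} failed with status ${res.status}`);
  return res.json();
}

const getBundle = (base, b) => callBundle(base, b, 'GET');
const appendFiles = (base, b, files) => callBundle(base, b, 'PUT', { files });
const deleteBundle = (base, b) => callBundle(base, b, 'DELETE');
```

Anyone holding both id and secret can read, modify, or delete the bundle. The secret is therefore best kept server-side, while only the /:id/ download link is shared.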
Clone the repo:
git clone git@github.com:Cadasta/zipstream.git
cd zipstream
Install yarn:
npm install -g yarn
Install dependencies:
yarn
Set environment variables:
cp .env.example .env
Start server:
# Start server
yarn start
# Selectively set DEBUG env var to get logs
DEBUG=zipstream:* yarn start
Refer to the debug package documentation to learn how to selectively turn on logs.
Tests:
# Run tests written in ES6
yarn test
# Run test along with enforced code coverage (configured via package.json)
yarn test:coverage
# Run tests on file change
yarn test:watch
TODO: Full test coverage
Lint:
# Lint code with ESLint
yarn lint
# Run lint on any file change
yarn lint:watch
Other gulp tasks:
# Wipe out dist and coverage directory
gulp clean
# Default task: Wipes out dist and coverage directory. Compiles using babel.
gulp
# 1. Compile to ES5
yarn build
# 2. Upload dist/ to your server
scp -rp dist/ user@dest:/path
# 3. Install production dependencies only
yarn --production
# 4. Use any process manager to start your services
pm2 start dist/index.js
In production, you need to make sure your server is always up, so you should ideally use one of the process managers recommended here.
The universal logging library winston is used for logging. It supports multiple transports; a transport is essentially a storage device for your logs. Each instance of a winston logger can have multiple transports configured at different levels. For example, one may want error logs to be stored in a persistent remote location (like a database), but all logs output to the console or a local file. We just log to the console for simplicity; you can configure more transports as your requirements dictate.
Logs detailed info about each API request to the console during development.
Logs the stack trace of each error to the console along with other details. You should ideally store all error messages persistently.
Get a code coverage summary by executing yarn test. yarn test also generates an HTML code coverage report in the coverage/ directory. Open lcov-report/index.html in that directory to view it.
Why isn't this written as an AWS Lambda service?
This service would indeed be a good use case for AWS Lambda. In fact, we initially began building it out as a Serverless app. Ultimately, Lambda's 5-minute maximum execution time turned us off the idea, as we cater to clients in remote, low-bandwidth regions where a download of 5+ minutes is likely. If you're interested in running this codebase on AWS Lambda, we'd love to hear how it goes!
Pull Requests are very welcome! If you would like to add a new feature, we recommend first creating an Issue to discuss the idea; however, this is not mandatory.
Inspired by Teamwork's s3zipper. Built from the express-mongoose-es6-rest-api boilerplate.
For the list of all changes see the CHANGELOG.