🗃️ Managing the safe, long-term storage of our digital collections in the cloud

wellcomecollection/storage-service

storage-service

This is the Wellcome Collection storage service. It manages the storage of our digital collections, including:

  • Uploading files to cloud storage providers like Amazon S3 and Azure Blob
  • Verifying fixity information on our files (checksums, sizes, filenames)
  • Reporting on the contents of our digital archive through machine-readable APIs and search tools

Requirements

The storage service is designed to:

  • Ensure the safe, long-term (i.e. decades) storage of our digital assets
  • Provide a scalable mechanism for identifying, retrieving, and storing content
  • Support bulk processing of content, e.g. for file format migrations or batch analysis
  • Follow industry best practices around file integrity and audit trails
  • Enable us to meet NDSA Level 4 for both digitised and "born-digital" assets

High-level design

The user uploads a "bag" to the storage service. This bag should use the BagIt packaging format. The user could be a person, or an automated workflow system like Goobi or Archivematica.
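To illustrate what a bag looks like on disk, here is a minimal sketch that builds a BagIt-style directory using only the Python standard library. The function name `make_bag` is hypothetical, and this is not how the storage service or its upstream workflow systems create bags; it only shows the basic layout from the BagIt specification (RFC 8493): a `bagit.txt` declaration, a `data/` payload directory, and a payload manifest of checksums. The choice of SHA-256 is an assumption made for the example.

```python
import hashlib
from pathlib import Path
from typing import Dict


def make_bag(bag_dir: Path, files: Dict[str, bytes]) -> Path:
    """Write `files` (relative name -> bytes) as a minimal BagIt bag.

    Hypothetical helper for illustration only.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(files.items()):
        path = data_dir / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest paths are relative to the bag root and use forward slashes.
        manifest_lines.append(f"{digest}  data/{name}")

    # The bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # One line per payload file: checksum, whitespace, path.
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir
```

Calling `make_bag(Path("my-bag"), {"hello.txt": b"hello"})` would produce `my-bag/bagit.txt`, `my-bag/manifest-sha256.txt`, and `my-bag/data/hello.txt`. Real bags usually also carry a `bag-info.txt` with descriptive metadata, which this sketch omits.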

The storage service verifies the fixity information in the bag (checksums, file sizes, filenames). If the fixity information is correct, it replicates the bag to multiple storage locations, split across different cloud providers and geographic locations.
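The checksum and filename parts of that verification can be sketched as follows. This is an illustrative example under stated assumptions, not the storage service's actual verifier: the function name `verify_bag` is hypothetical, it assumes a `manifest-sha256.txt` payload manifest in the layout defined by the BagIt specification, and it leaves out file-size checks and tag manifests.

```python
import hashlib
from pathlib import Path
from typing import List


def verify_bag(bag_dir: Path) -> List[str]:
    """Return a list of fixity problems; an empty list means the bag passed.

    Hypothetical helper for illustration only.
    """
    problems = []
    listed = set()

    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to bag root>".
        expected, rel_path = line.split(maxsplit=1)
        listed.add(rel_path)
        path = bag_dir / rel_path
        if not path.is_file():
            problems.append(f"missing file: {rel_path}")
            continue
        # Reads the whole file into memory; fine for a sketch, but large
        # files would be hashed in chunks in practice.
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")

    # Every payload file must also be named in the manifest.
    for path in sorted((bag_dir / "data").rglob("*")):
        rel_path = path.relative_to(bag_dir).as_posix()
        if path.is_file() and rel_path not in listed:
            problems.append(f"unlisted file: {rel_path}")

    return problems
```

In this sketch, replication to other storage locations would only go ahead when `verify_bag` returns an empty list; any reported problem would stop the bag from being stored.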

The storage service stores exactly the bytes you give it; no more, no less. It does not do any introspection of the bag contents, or change its behaviour based on the files a bag contains.

The storage service runs entirely in AWS, with no on-premise infrastructure required.

For more detailed information about the design, see our documentation.

Documentation

We have documentation about the storage service, which includes:

  • How-to guides explaining how to do common operations, e.g. ingest a new bag or look up a stored bag
  • Reference material explaining how the storage service works
  • Notes for developers who want to modify or extend the storage service

Usage

We run two instances of the storage service at Wellcome. Each instance is completely separate; they don't share any files or storage.

If you want to store files in the storage service, you should run your own instance; the instances we run are only for use at Wellcome. We publish our Docker images and infrastructure code so that other people can run the storage service.

For instructions, see our documentation.

Getting started: use Terraform and AWS to run a storage service demo

We have a Terraform configuration that spins up an instance of the storage service. You can use this to try the storage service in your own AWS account.

License

MIT.