ARCHITECTURE.md


DocumentViewer Architecture

DocumentViewer catalogs documents in S3, and allows users to query and view them.

Catalog

The S3 bucket storing the documents includes metadata files which list every file present, along with information about each file, such as the name and date of birth of the rider the document relates to. When the DocumentViewer application starts up, the catalog gets all metadata files in the S3 bucket and reads them to build an internal catalog of all files in the bucket. This data is held in-memory using an ETS table.
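The startup step described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the application's code, which holds its catalog in an ETS table. The metadata layout is an assumption: one comma-separated line per document with path, last name, first name, and ISO date of birth. The names `CatalogEntry`, `parse_metadata`, and `build_catalog` are hypothetical, and the S3 download is replaced by a plain dictionary of already-fetched file contents.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    """One document in the catalog (field names are assumed, not from the source)."""
    path: str
    last_name: str
    first_name: str
    date_of_birth: date


def parse_metadata(text: str) -> list[CatalogEntry]:
    """Parse one metadata file, assuming 'path,last,first,YYYY-MM-DD' per line."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        path, last, first, dob = (field.strip() for field in row)
        entries.append(CatalogEntry(path, last, first, date.fromisoformat(dob)))
    return entries


def build_catalog(metadata_files: dict[str, str]) -> dict[str, CatalogEntry]:
    """Merge every metadata file into one in-memory catalog keyed by document path.

    `metadata_files` maps a metadata file's key to its text content; in the
    real application this content would come from the S3 bucket at startup.
    """
    catalog: dict[str, CatalogEntry] = {}
    for text in metadata_files.values():
        for entry in parse_metadata(text):
            catalog[entry.path] = entry
    return catalog
```

Because the catalog is built once from whatever metadata files exist at that moment, any batch uploaded later stays invisible until the function runs again.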

The metadata files that are read for the catalog are generated by the company that performs the bulk document scanning for us. They include a metadata file with each batch of documents they upload.

Currently there is no mechanism to refresh the catalog when additional files are uploaded. Instead, redeploying or otherwise restarting the DocumentViewer application rebuilds the catalog from scratch.

Web Interface

A web interface allows users to query the in-memory catalog to find documents. They can view a file of interest in the browser or download it; in either case the document is streamed out of S3. All user interactions are logged as an audit trail.
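As a rough illustration of querying an in-memory catalog while keeping an audit trail, here is a minimal Python sketch. It is not the application's implementation: the `Document` fields, the `search` function, its filter parameters, and the shape of the audit records are all assumptions made for the example, and the audit trail is modeled as a plain list rather than real logging.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Document:
    """A catalog record (hypothetical fields chosen for this example)."""
    path: str
    last_name: str
    date_of_birth: date


def search(
    catalog: list[Document],
    audit_log: list[tuple],
    user: str,
    last_name: Optional[str] = None,
    date_of_birth: Optional[date] = None,
) -> list[Document]:
    """Return documents matching every supplied filter, recording the query first."""
    # Record who searched for what before doing any work, so the audit
    # trail is written even if the query returns nothing.
    audit_log.append((user, "search", last_name, date_of_birth))
    return [
        doc
        for doc in catalog
        if (last_name is None or doc.last_name.lower() == last_name.lower())
        and (date_of_birth is None or doc.date_of_birth == date_of_birth)
    ]
```

Writing the audit record before filtering means an empty result set still leaves a trace of the query.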