DocumentViewer catalogs documents stored in S3 and allows users to query and view them.
The S3 bucket storing the documents includes metadata files that list every file present, along with information about each one, such as the name and date of birth of the rider the document relates to. When the DocumentViewer application starts up, it fetches every metadata file in the S3 bucket and reads them to build an internal catalog of all files in the bucket. The catalog is held in memory in an ETS table.
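A minimal sketch of the startup step follows, assuming the metadata files are CSV with `key`, `rider_name`, and `date_of_birth` columns. The real format produced by the scanning company is not described here, so those column names, `CatalogEntry`, `parse_metadata`, and `build_catalog` are all hypothetical. The sketch is in Python rather than against ETS, with a plain dict standing in for the ETS table. Fetching the files from S3 (for example with boto3's `list_objects_v2` paginator and `get_object`) is left out so the example runs on its own.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    """One document described by a metadata file (field names are hypothetical)."""
    key: str              # S3 object key of the document
    rider_name: str
    date_of_birth: date


def parse_metadata(text: str) -> list[CatalogEntry]:
    """Parse one metadata file, assumed here to be CSV with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        CatalogEntry(
            key=row["key"],
            rider_name=row["rider_name"],
            date_of_birth=date.fromisoformat(row["date_of_birth"]),
        )
        for row in reader
    ]


def build_catalog(metadata_files: dict[str, str]) -> dict[str, CatalogEntry]:
    """Merge every metadata file into one in-memory table keyed by document key.

    `metadata_files` maps each metadata file's S3 key to its contents; in the
    real application these would be fetched from the bucket at startup.
    """
    catalog: dict[str, CatalogEntry] = {}
    # Files are processed in key order, so if two files describe the same
    # document key, the entry from the file that sorts last wins.
    for _, text in sorted(metadata_files.items()):
        for entry in parse_metadata(text):
            catalog[entry.key] = entry
    return catalog


files = {
    "batch-001/metadata.csv": "key,rider_name,date_of_birth\nbatch-001/a.pdf,Jane Doe,1990-04-12\n",
    "batch-002/metadata.csv": "key,rider_name,date_of_birth\nbatch-002/b.pdf,John Roe,1985-11-03\n",
}
catalog = build_catalog(files)
print(len(catalog))  # 2
```

Keying the table by document key keeps lookups by key constant-time, much as a `set`-type ETS table would.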
The metadata files that are read for the catalog are generated by the company that performs the bulk document scanning for us. They include a metadata file with each batch of documents they upload.
There is currently no mechanism to refresh the catalog when additional files are uploaded. Instead, redeploying or otherwise restarting the DocumentViewer application rebuilds the catalog from scratch.
A web interface allows users to query the in-memory catalog to find documents. They can view a document of interest in the browser or download it; in either case, the document is streamed from S3. All user interactions are logged as an audit trail.
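The query and audit flow can be sketched as follows. This is a minimal Python illustration, not the application's own code: the catalog is a plain dict standing in for the ETS table, and the search filters (`name`, `dob`), `AuditLog`, `search`, `stream_document`, and the `fetch_chunks` callable are hypothetical. The audit log is kept in memory purely for demonstration. In a real deployment `fetch_chunks` might wrap boto3's `get_object(...)["Body"].iter_chunks()`.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Callable, Iterator, Optional


@dataclass(frozen=True)
class CatalogEntry:
    key: str              # S3 object key of the document (hypothetical fields)
    rider_name: str
    date_of_birth: date


@dataclass
class AuditLog:
    """Append-only record of user actions (in-memory stand-in for real storage)."""
    events: list = field(default_factory=list)

    def record(self, user: str, action: str, **details) -> None:
        self.events.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            **details,
        })


def search(catalog: dict, audit: AuditLog, user: str,
           name: Optional[str] = None, dob: Optional[date] = None) -> list:
    """Return entries matching every supplied filter, logging the query first."""
    audit.record(user, "search", name=name, dob=dob.isoformat() if dob else None)
    needle = name.casefold() if name else None
    results = [
        e for e in catalog.values()
        if (needle is None or needle in e.rider_name.casefold())
        and (dob is None or e.date_of_birth == dob)
    ]
    return sorted(results, key=lambda e: e.key)


def stream_document(audit: AuditLog, user: str, key: str,
                    fetch_chunks: Callable[[str], Iterator[bytes]],
                    download: bool = False) -> Iterator[bytes]:
    """Log the view or download immediately, then return a chunk iterator.

    This is deliberately not a generator, so the audit entry is written when
    the function is called rather than when the caller starts reading bytes.
    """
    audit.record(user, "download" if download else "view", key=key)
    return fetch_chunks(key)
```

Returning a chunk iterator instead of reading the whole object lets the web layer pass bytes to the browser as they arrive from S3, which matches the streaming behaviour described above.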