Merge pull request #191 from jpappel/documentation
Documentation Improvements
bjcoleman authored Oct 21, 2024
2 parents 30508d7 + d06ce55 commit 90d81f1
Showing 3 changed files with 51 additions and 12 deletions.
34 changes: 27 additions & 7 deletions docs/client.md
@@ -1,12 +1,32 @@
# Client Documentation


## Summary
The clients are their own objects that request work from the job queue, perform the work by making calls to [regulations.gov](https://www.regulations.gov/) for data downloads, and save the results.

## Description
The goal is
that the client will request and complete work in order to download data from
[regulations.gov](https://www.regulations.gov/).
Clients are components of the Mirrulations system responsible for downloading data from Regulations.gov. Unless stopped, a client repeatedly attempts the following steps:

1. Get work from the job queue
2. Perform the job by downloading data from Regulations.gov
3. Save downloaded regulation data

To accomplish their task, each client interacts externally with Regulations.gov and AWS S3. Internally, each client interacts with the database (Redis) and queue (RabbitMQ).

## Details

Every 3.6 seconds, a client attempts to get a job, perform the job, and save the resulting data.
Download jobs have three fields: `job_id`, `url`, and `job_type`.
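
The cycle described above can be sketched roughly as follows. This is a minimal illustration, not the Mirrulations implementation: the `Job` class, `run_client`, and the injected `get_job`, `download`, and `save` callables are hypothetical names standing in for the queue, Regulations.gov, and storage. Only the three job fields and the 3.6 second interval come from this document.

```python
import time
from dataclasses import dataclass

POLL_INTERVAL_SECONDS = 3.6  # interval stated in this document


@dataclass(frozen=True)
class Job:
    """A download job with the three documented fields."""
    job_id: str
    url: str
    job_type: str


def run_client(get_job, download, save, max_cycles, sleep=time.sleep):
    """Run a bounded number of get/perform/save cycles (hypothetical helper).

    get_job returns a Job or None, download returns the fetched data, and
    save stores it. Sleeping after each cycle only approximates a fixed
    3.6 second period, since the work itself also takes time.
    """
    completed = 0
    for _ in range(max_cycles):
        job = get_job()
        if job is not None:
            save(job, download(job))
            completed += 1
        sleep(POLL_INTERVAL_SECONDS)
    return completed
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a long-running client would loop indefinitely rather than for `max_cycles` iterations.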

### Getting Work

A client gets work by removing a job from the queue and updating its current job in the database.

If no work is currently available, the client waits 3.6 seconds before trying again.
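
As a rough sketch of this step, the example below uses an in-memory `deque` and `dict` as stand-ins for RabbitMQ and Redis. The function name `get_work`, the `max_attempts` parameter, and the database key format are assumptions made for illustration and are not taken from the Mirrulations code.

```python
import time
from collections import deque

RETRY_DELAY_SECONDS = 3.6  # wait time stated in this document


def get_work(queue, database, client_id, sleep=time.sleep, max_attempts=None):
    """Remove a job from the queue and record it as the client's current job.

    queue is a deque of job dicts and database is a dict; both are in-memory
    stand-ins. The key format below is hypothetical. Returns None only when
    max_attempts is set and every attempt found the queue empty.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        if queue:
            job = queue.popleft()
            database[f"client:{client_id}:current_job"] = job["job_id"]
            return job
        attempts += 1
        sleep(RETRY_DELAY_SECONDS)  # no work available, wait before retrying
    return None
```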

### Data Download

After receiving a job, a client attempts to download the remote resource pointed to by the job's `url`. If the `job_type` is a comment, any attachments are also downloaded. The client updates the database after the job is completed.

If an unrecoverable error occurs during download, the client marks the job as invalid in the database. Invalid jobs will not be retried by other clients.
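
The download step might look something like the sketch below. Everything named here is an assumption for illustration: the `UnrecoverableDownloadError` type, the `"comment"` value for `job_type`, the `fetch` and `list_attachment_urls` callables, and the `completed_jobs` and `invalid_jobs` database keys. The `dict` stands in for Redis.

```python
COMMENT_JOB_TYPE = "comment"  # assumed value; the real job_type string may differ


class UnrecoverableDownloadError(Exception):
    """Hypothetical error for downloads that should not be retried."""


def perform_job(job, fetch, list_attachment_urls, database):
    """Download a job's resource, plus attachments when the job is a comment.

    fetch(url) returns the downloaded data; list_attachment_urls(data) returns
    the attachment URLs found in a comment. Both are injected stand-ins.
    Returns the downloaded results, or None if the job was marked invalid.
    """
    try:
        results = {"data": fetch(job["url"]), "attachments": []}
        if job["job_type"] == COMMENT_JOB_TYPE:
            for attachment_url in list_attachment_urls(results["data"]):
                results["attachments"].append(fetch(attachment_url))
    except UnrecoverableDownloadError:
        # Record the job as invalid so it is not handed out again.
        database.setdefault("invalid_jobs", set()).add(job["job_id"])
        return None
    database.setdefault("completed_jobs", set()).add(job["job_id"])
    return results
```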

### Saving Data

After the data is downloaded, the client saves it. By default, data is saved to disk and to the `mirrulations` AWS S3 bucket.
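
A minimal sketch of saving to both destinations is shown below, assuming the data is JSON-serializable. The function names, the `key` path layout, and the `s3_put` callable are hypothetical; in a real deployment `s3_put` might wrap an S3 client upload call, but it is injected here so the example runs without AWS credentials.

```python
import json
from pathlib import Path

BUCKET_NAME = "mirrulations"  # bucket name stated in this document


def save_to_disk(root, key, data):
    """Write data as JSON under root/key, creating parent directories."""
    path = Path(root) / key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return path


def save_result(root, key, data, s3_put):
    """Save data to disk, then pass it to the S3 uploader (hypothetical helper).

    s3_put(bucket, key, body) is an injected stand-in for the upload call.
    """
    path = save_to_disk(root, key, data)
    s3_put(BUCKET_NAME, key, json.dumps(data))
    return path
```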
4 changes: 0 additions & 4 deletions docs/production.md
@@ -1,7 +1,3 @@




## Production Environment Documentation

The system is Dockerized into a number of components:
25 changes: 24 additions & 1 deletion docs/work_generator.md
@@ -1,4 +1,27 @@
# Work Generator Documentation

## Summary
The work generator interacts with Regulations.gov directly. It uses a personal API key to check whether anything new has been posted on the website. If there is something new, meaning the gathered link(s) are not in Redis, the work generator generates jobs for the client to complete. It takes up to 250 jobs at a time. Each job includes a `job_id`, a `url`, and a `job_type`. When a comment job is generated, another job is created for attachments.

The work generator has three functions:

1. Creation of download jobs
2. Updating the stored total docket, document, and comment counts from Regulations.gov
3. Updating the stored current size of the AWS S3 Bucket

To accomplish its functions, the work generator externally interacts with Regulations.gov and AWS. Internally, the work generator updates values in the database (Redis) and job queue (RabbitMQ).

## Details

Once started, the work generator attempts to perform its functions every 6 hours.

### Creating Download Jobs

The work generator iterates over dockets, documents, and comments modified on Regulations.gov since its last run, and creates download jobs for them in the job queue. A download job has three fields: `job_id`, `url`, and `job_type`. Jobs are added to the queue in batches of up to 250, with a minimum of 3.6 seconds between batches.
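
The batching behavior could be sketched as follows. The helper names `batched` and `enqueue_jobs` are hypothetical, and a plain list stands in for the RabbitMQ queue. Only the batch size of 250 and the 3.6 second gap come from this document.

```python
import time
from itertools import islice

BATCH_SIZE = 250            # maximum jobs per batch, from this document
BATCH_DELAY_SECONDS = 3.6   # minimum gap between batches, from this document


def batched(items, size):
    """Yield successive lists of at most size items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def enqueue_jobs(jobs, queue, sleep=time.sleep):
    """Add jobs to the queue in batches, pausing between consecutive batches.

    queue is a list standing in for the job queue. Returns the batch count.
    """
    batches_sent = 0
    for batch in batched(jobs, BATCH_SIZE):
        if batches_sent:
            sleep(BATCH_DELAY_SECONDS)  # pause only between batches
        queue.extend(batch)
        batches_sent += 1
    return batches_sent
```

With 600 jobs, this sketch sends three batches (250, 250, and 100) and pauses twice.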

### Docket, Document, and Comment Counts

Before creating any download jobs, the work generator queries Regulations.gov via its API for total counts of dockets, documents, and comments. It stores these counts within the database.

### AWS S3 Bucket Size

Before creating any download jobs, the work generator attempts to query the size of the AWS S3 bucket where downloaded dockets, documents, and comments are stored. It stores this size within the database.
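
The ordering described in these subsections, with counts and bucket size recorded before any jobs are created, can be sketched as a single run. The function `run_once`, the injected callables, and the database keys are hypothetical names for illustration; a `dict` stands in for Redis, and error handling for a failed bucket query is left out.

```python
RUN_INTERVAL_SECONDS = 6 * 60 * 60  # the 6 hour period stated in this document


def run_once(database, fetch_counts, fetch_bucket_size, create_jobs):
    """Perform one work generator run in the documented order.

    fetch_counts returns the docket, document, and comment totals,
    fetch_bucket_size returns the bucket size, and create_jobs enqueues
    download jobs and returns how many were created. All are stand-ins.
    """
    database["regulations_counts"] = fetch_counts()   # hypothetical key
    database["s3_bucket_size"] = fetch_bucket_size()  # hypothetical key
    return create_jobs()
```

A scheduler would call `run_once` and then wait `RUN_INTERVAL_SECONDS` before the next run.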
