Merge pull request #191 from jpappel/documentation
Documentation Improvements
Showing 3 changed files with 51 additions and 12 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,32 @@ | ||
# Client Documentation | ||
|
||
|
||
## Summary | ||
The clients are their own objects that request work from the job queue, perform the work by making calls to [regulations.gov](https://www.regulations.gov/) for data downloads, and save the results. | ||
|
||
## Description | ||
The goal is | ||
that the client will request and complete work in order to download data from | ||
[regulations.gov](https://www.regulations.gov/). | ||
Clients are components of the Mirrulations system responsible for downloading data from Regulations.gov. Unless stopped, a client repeatedly attempts the following steps: | ||
|
||
1. Get work from the job queue | ||
2. Perform the job by downloading data from Regulations.gov | ||
3. Save downloaded regulation data | ||
|
||
To accomplish its task, each client interacts externally with Regulations.gov and AWS S3. Internally, each client interacts with the database (Redis) and the job queue (RabbitMQ). | ||
|
||
## Details | ||
|
||
Every 3.6 seconds a client attempts to get a job, perform it, and save the resulting data. | ||
Download jobs have three fields: `job_id`, `url`, and `job_type`. | ||
|
||
### Getting Work | ||
|
||
A client gets work by removing a job from the queue and updating its current job in the database. | ||
|
||
If no work is available, the client waits 3.6 seconds before trying again. | ||
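
The polling behavior described above can be sketched roughly as follows. This is a minimal illustration, not the Mirrulations implementation: a `deque` stands in for the RabbitMQ queue, a plain dict stands in for Redis, and the names `get_job`, `run_client`, and the `client:<id>:current_job` key are hypothetical.

```python
import time
from collections import deque

POLL_INTERVAL_SECONDS = 3.6  # interval stated in this document


def get_job(queue, database, client_id):
    """Remove one job from the queue and record it as the client's current job.

    Returns the job dict, or None when the queue is empty.
    `queue` (a deque) and `database` (a dict) are in-memory stand-ins.
    """
    if not queue:
        return None
    job = queue.popleft()
    # Hypothetical key layout; the real database schema is not shown here.
    database[f"client:{client_id}:current_job"] = job["job_id"]
    return job


def run_client(queue, database, client_id, handle_job, max_attempts, sleep=time.sleep):
    """Attempt to get and handle a job `max_attempts` times.

    The client waits POLL_INTERVAL_SECONDS after every attempt, whether or
    not a job was available. Returns the number of jobs handled.
    """
    completed = 0
    for _ in range(max_attempts):
        job = get_job(queue, database, client_id)
        if job is not None:
            handle_job(job)
            completed += 1
        sleep(POLL_INTERVAL_SECONDS)
    return completed
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a real client would loop until stopped rather than for a fixed number of attempts.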
|
||
### Data Download | ||
|
||
After receiving a job, a client attempts to download the remote resource pointed to by the job's `url`. If the `job_type` indicates a comment, any attachments are also downloaded. The client updates the database after the job is completed. | ||
|
||
If an unrecoverable error occurs during download, the client marks the job as invalid in the database. Invalid jobs will not be retried by other clients. | ||
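
A rough sketch of this download step, under several assumptions: the `fetch` and `fetch_attachments` callables are caller-supplied stand-ins for the real HTTP calls, the `"comments"` value for `job_type` is a guess at the actual value, `UnrecoverableDownloadError` is a hypothetical exception type, and a dict of sets stands in for the database.

```python
class UnrecoverableDownloadError(Exception):
    """Hypothetical error type for failures that should not be retried."""


def perform_job(job, fetch, fetch_attachments, database):
    """Download a job's resource and record the outcome in `database`.

    `fetch(url)` returns the downloaded payload and
    `fetch_attachments(payload)` returns a list of attachment payloads.
    Returns the downloaded result, or None if the job was marked invalid.
    """
    try:
        result = {"data": fetch(job["url"]), "attachments": []}
        # Assumed job_type value for comment jobs.
        if job["job_type"] == "comments":
            result["attachments"] = fetch_attachments(result["data"])
    except UnrecoverableDownloadError:
        # Invalid jobs are recorded so that no client retries them.
        database.setdefault("invalid_jobs", set()).add(job["job_id"])
        return None
    database.setdefault("completed_jobs", set()).add(job["job_id"])
    return result
```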
|
||
### Saving Data | ||
|
||
After a download completes, the client saves the data. By default, data is saved to disk and to the `mirrulations` AWS S3 bucket. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,3 @@ | ||
|
||
|
||
|
||
|
||
## Production Environment Documentation | ||
|
||
The system is Dockerized into a number of components: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,27 @@ | ||
# Work Generator Documentation | ||
|
||
## Summary | ||
The work generator interacts with Regulations.gov directly. It uses a personal API key to check whether anything new has been posted on the website. If there is something new (that is, if the gathered links are not in Redis), the work generator generates jobs for the client to complete. It takes up to 250 jobs at a time. Each job includes a `job_id`, a `url`, and a `job_type`. When a comment job is generated, another job is created for its attachments. | ||
|
||
The work generator has three functions: | ||
|
||
1. Creation of download jobs | ||
2. Updating the stored total docket, document, and comment counts from Regulations.gov | ||
3. Updating the stored current size of the AWS S3 Bucket | ||
|
||
To accomplish its functions, the work generator externally interacts with Regulations.gov and AWS. Internally, the work generator updates values in the database (Redis) and job queue (RabbitMQ). | ||
|
||
## Details | ||
|
||
Once started, the work generator attempts to perform its functions every 6 hours. | ||
|
||
### Creating Download Jobs | ||
|
||
The work generator iterates over dockets, documents, and comments modified on Regulations.gov since its last run, and creates download jobs for them in the job queue. A download job has three fields: `job_id`, `url`, and `job_type`. Jobs are added to the queue in batches of up to 250, with at least 3.6 seconds between batches. | ||
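
The batching rule above might look something like the sketch below. It is illustrative only: `enqueue_in_batches` and the `publish` callable are hypothetical names, and `publish` stands in for whatever call adds a job to the RabbitMQ queue.

```python
import time

BATCH_SIZE = 250              # maximum jobs per batch, per this document
BATCH_INTERVAL_SECONDS = 3.6  # minimum pause between batches, per this document


def enqueue_in_batches(jobs, publish, sleep=time.sleep):
    """Publish `jobs` in batches of at most BATCH_SIZE.

    Pauses BATCH_INTERVAL_SECONDS before every batch except the first.
    Returns the number of batches published.
    """
    batches = 0
    for start in range(0, len(jobs), BATCH_SIZE):
        if batches:
            sleep(BATCH_INTERVAL_SECONDS)
        for job in jobs[start:start + BATCH_SIZE]:
            publish(job)
        batches += 1
    return batches
```

With 600 jobs, for example, this publishes batches of 250, 250, and 100 jobs with two pauses in between.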
|
||
### Docket, Document, and Comment Counts | ||
|
||
Before creating any download jobs, the work generator queries Regulations.gov via its API for total counts of dockets, documents, and comments. It stores these counts in the database. | ||
|
||
### AWS S3 Bucket Size | ||
|
||
Before creating any download jobs, the work generator attempts to query the current size of the AWS S3 bucket that stores downloaded dockets, documents, and comments. It stores this value in the database. |