Merge pull request #191 from jpappel/documentation

Documentation Improvements
MoravianUniversity · Oct 21, 2024 · 90d81f1
2 parents 30508d7 + d06ce55
commit 90d81f1

Showing 3 changed files with 51 additions and 12 deletions.
diff --git a/docs/client.md b/docs/client.md
@@ -1,12 +1,32 @@
 # Client Documentation
 
-
 ## Summary
-The clients are their own objects that will request work from the job queue, and perform the work by making calls to [regulations.gov]
-(https://www.regulations.gov/) for data downloads, and saves the results. 
 
-## Description 
-The goal is 
-that the client will request and complete work in order to download data from 
-[regulations.gov](https://www.regulations.gov/).
+Clients are components of the Mirrulations system responsible for downloading data from Regulations.gov. Unless stopped, a client repeatedly attempts the following steps:
+
+1. Get work from the job queue
+2. Perform the job by downloading data from Regulations.gov
+3. Save downloaded regulation data
+
+To accomplish this, each client interacts externally with Regulations.gov and AWS S3, and internally with the database (Redis) and the job queue (RabbitMQ).
+
+## Details
+
+Every 3.6 seconds, a client attempts to get a job, perform it, and save the resulting data.
+Download jobs have three fields: `job_id`, `url`, and `job_type`.
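As an illustration of the three-field job record, here is a minimal Python sketch. It assumes, hypothetically, that a job arrives as a JSON message with keys matching the field names; the `Job` class and `parse_job` helper are illustrative names, not part of Mirrulations.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A download job with the three documented fields."""
    job_id: str
    url: str
    job_type: str


def parse_job(raw: str) -> Job:
    """Build a Job from a JSON message (hypothetical wire format)."""
    data = json.loads(raw)
    # Only the three documented fields are read; any extra keys are ignored.
    return Job(job_id=str(data["job_id"]), url=data["url"], job_type=data["job_type"])
```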
+
+### Getting Work
+
+A client gets work by removing a job from the queue and recording it as its current job in the database.
+
+If no work is available, the client waits 3.6 seconds before trying again.
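The get-work step with its 3.6-second wait can be sketched as follows. This is a minimal sketch under stated assumptions: a standard-library `queue.Queue` stands in for RabbitMQ and a plain dict stands in for the Redis record of each client's current job. The `get_job` function and its `max_attempts` parameter are hypothetical and exist only to make the loop testable.

```python
import queue
import time
from typing import Any, Callable, Dict, Optional

WAIT_SECONDS = 3.6  # documented delay when no work is available


def get_job(job_queue: "queue.Queue[Any]",
            current_jobs: Dict[str, Any],
            client_id: str,
            sleep: Callable[[float], None] = time.sleep,
            max_attempts: Optional[int] = None) -> Optional[Any]:
    """Remove a job from the queue and record it as this client's current job.

    job_queue stands in for RabbitMQ and current_jobs for Redis (assumptions).
    Returns None only if max_attempts is reached without finding work.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            job = job_queue.get_nowait()
        except queue.Empty:
            sleep(WAIT_SECONDS)  # no work yet: wait, then try again
            continue
        current_jobs[client_id] = job  # record the job as the current one
        return job
    return None
```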
+
+### Data Download
+
+After receiving a job, a client attempts to download the remote resource at the job's `url`. If the job is a comment job (as indicated by `job_type`), the comment's attachments are also downloaded. The client updates the database once the job is complete.
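The comment-attachment rule can be sketched like this. The `fetch` and `attachment_urls` callables are hypothetical hooks (an HTTP GET and a function that extracts attachment links from a comment payload), and `"comment"` is an assumed `job_type` value; neither reflects the actual Regulations.gov response format.

```python
from typing import Callable, Dict, List


def perform_job(job_type: str, url: str,
                fetch: Callable[[str], bytes],
                attachment_urls: Callable[[bytes], List[str]]) -> Dict[str, bytes]:
    """Download the resource at url and, for comment jobs, its attachments.

    fetch and attachment_urls are hypothetical hooks supplied by the caller.
    Returns a mapping from each downloaded URL to its content.
    """
    results = {url: fetch(url)}
    if job_type == "comment":  # assumed job_type value for comment jobs
        for attachment_url in attachment_urls(results[url]):
            results[attachment_url] = fetch(attachment_url)
    return results
```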
+
+If an unrecoverable error occurs during download, the client marks the job as invalid in the database. Invalid jobs are not retried by other clients.
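One way to express the invalid-job rule is shown below. This is a sketch only: `UnrecoverableError` is a hypothetical exception type, and a Python `set` stands in for wherever the database records invalid job IDs. Because every client checks the same record before starting, a job marked invalid once is skipped by all clients afterwards.

```python
from typing import Callable, Optional, Set, TypeVar

T = TypeVar("T")


class UnrecoverableError(Exception):
    """Hypothetical error type for failures that retrying cannot fix."""


def run_job(job_id: str, download: Callable[[], T], invalid_jobs: Set[str]) -> Optional[T]:
    """Run download(); on an unrecoverable error, record job_id as invalid.

    invalid_jobs stands in for the database (an assumption). Jobs already
    recorded there are skipped, so they are never retried.
    """
    if job_id in invalid_jobs:
        return None  # previously marked invalid: do not retry
    try:
        return download()
    except UnrecoverableError:
        invalid_jobs.add(job_id)
        return None
```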
+
+### Saving Data
 
+Once downloaded, the data is saved. By default, it is written both to disk and to the `mirrulations` AWS S3 bucket.
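Saving to both destinations can be sketched with `pathlib` plus an injectable upload hook. The key layout and the `save` helper are hypothetical. With boto3, the hook could be `lambda key, data: s3.put_object(Bucket="mirrulations", Key=key, Body=data)`, where `s3 = boto3.client("s3")`; keeping it injectable lets the disk path be exercised without AWS credentials.

```python
from pathlib import Path
from typing import Callable, Optional


def save(key: str, data: bytes, root: Path,
         upload: Optional[Callable[[str, bytes], None]] = None) -> Path:
    """Write data to root/key on disk and optionally hand it to an S3 upload hook.

    The key layout and the upload hook are illustrative assumptions.
    """
    path = root / key
    path.parent.mkdir(parents=True, exist_ok=True)  # create intermediate folders
    path.write_bytes(data)
    if upload is not None:
        upload(key, data)  # e.g. forward the same object to the S3 bucket
    return path
```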
diff --git a/docs/production.md b/docs/production.md
@@ -1,7 +1,3 @@
-
-
-
-
 ## Production Environment Documentation
 
 The system is Dockerized into a number of components:

diff --git a/docs/work_generator.md b/docs/work_generator.md
@@ -1,4 +1,27 @@
 # Work Generator Documentation
 
 ## Summary
-The work generator interacts with Regulations.gov directly. It uses a personal API key to check to see if anything new has been posted on the website. If there is something new-- meaning if the gathered link(s) are not in Redis-- the work generator will generate jobs for the client to complete. It takes up to 250 jobs at a time. The jobs include a job_id, a url, and a job_type. When a comment job is generated, another job is created for attachments.
+
+The work generator has three functions:
+
+1. Creating download jobs
+2. Updating the stored total docket, document, and comment counts from Regulations.gov
+3. Updating the stored current size of the AWS S3 Bucket
+
+To accomplish its functions, the work generator interacts externally with Regulations.gov and AWS. Internally, it updates values in the database (Redis) and the job queue (RabbitMQ).
+
+## Details
+
+Once started, the work generator attempts to perform its functions every 6 hours.
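The 6-hour cycle can be sketched as a simple loop. The three callables passed to `run_cycle` are hypothetical stand-ins for the work generator's functions, and the relative order of the counts and bucket-size updates is arbitrary here; the documentation only states that both happen before any jobs are created.

```python
import time
from typing import Callable, Optional

INTERVAL_SECONDS = 6 * 60 * 60  # documented 6-hour period


def run_cycle(update_counts: Callable[[], None],
              update_bucket_size: Callable[[], None],
              create_jobs: Callable[[], None]) -> None:
    """One pass of the work generator's three functions (hypothetical hooks).

    Counts and bucket size are refreshed before any jobs are created.
    """
    update_counts()
    update_bucket_size()
    create_jobs()


def run_forever(cycle: Callable[[], None],
                sleep: Callable[[float], None] = time.sleep,
                cycles: Optional[int] = None) -> None:
    """Run cycle() repeatedly, sleeping INTERVAL_SECONDS after each pass.

    Sleeping after each pass is a simplification: the effective period is
    6 hours plus the pass's own run time. cycles bounds the loop for testing.
    """
    done = 0
    while cycles is None or done < cycles:
        cycle()
        done += 1
        sleep(INTERVAL_SECONDS)
```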
+
+### Creating Download Jobs
+
+The work generator iterates over dockets, documents, and comments modified on Regulations.gov since its last run and creates a download job for each in the job queue. A download job has three fields: `job_id`, `url`, and `job_type`. Jobs are added to the queue in batches of up to 250, with at least 3.6 seconds between batches.
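The batching rule (at most 250 jobs per batch, at least 3.6 seconds between batches) might look like the following. This is a minimal sketch, not the Mirrulations code: `publish` is a hypothetical hook that would enqueue one job in RabbitMQ, and `sleep` and `clock` are injectable so the timing can be tested without real delays. The gap is measured between the starts of consecutive batches, so time spent publishing counts toward the 3.6 seconds.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")

BATCH_SIZE = 250
MIN_GAP_SECONDS = 3.6


def batches(items: Iterable[T], size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def enqueue_in_batches(jobs: Iterable[T],
                       publish: Callable[[T], None],
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> None:
    """Publish jobs in batches, keeping >= MIN_GAP_SECONDS between batch starts."""
    last_start: Optional[float] = None
    for batch in batches(jobs):
        if last_start is not None:
            remaining = MIN_GAP_SECONDS - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)  # wait out the rest of the minimum gap
        last_start = clock()
        for job in batch:
            publish(job)
```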
+
+### Docket, Document, and Comment Counts
+
+Before creating any download jobs, the work generator queries the Regulations.gov API for the total counts of dockets, documents, and comments, and stores these counts in the database.
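The count update can be sketched as below. Two assumptions are made and should be checked against the Regulations.gov API documentation: that `fetch_meta(collection)` (a hypothetical hook) returns the `meta` object of a list response, and that the total is reported in its `totalElements` field. A dict stands in for Redis, and the `regulations_total_*` key names are hypothetical.

```python
from typing import Callable, Dict

COLLECTIONS = ("dockets", "documents", "comments")


def update_counts(fetch_meta: Callable[[str], Dict],
                  db: Dict[str, int]) -> Dict[str, int]:
    """Store the total count for each collection in db.

    fetch_meta is a hypothetical hook; reading "totalElements" from the
    response's meta object is an assumption about the API's shape.
    db stands in for Redis, and the key names are illustrative.
    """
    for collection in COLLECTIONS:
        db[f"regulations_total_{collection}"] = int(fetch_meta(collection)["totalElements"])
    return db
```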
+
+### AWS S3 Bucket Size
+
+Before creating any download jobs, the work generator attempts to query the current size of the AWS S3 bucket to which dockets, documents, and comments are downloaded, and stores this size in the database.
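One way to compute a bucket's size is to sum the `Size` of every object across the pages of an S3 `list_objects_v2` listing. The sketch below takes the pages as input, so it does not depend on how Mirrulations actually obtains the value. With boto3, the pages could come from `boto3.client("s3").get_paginator("list_objects_v2").paginate(Bucket="mirrulations")`. Note that this approach lists every object, so it slows down as the bucket grows.

```python
from typing import Dict, Iterable


def bucket_size(pages: Iterable[Dict]) -> int:
    """Sum object sizes (in bytes) across list_objects_v2-style response pages."""
    total = 0
    for page in pages:
        # Pages with no objects omit "Contents", so default to an empty list.
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total
```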