From d8c485938ad476f7ea3d4c3287ffe26a214d90d2 Mon Sep 17 00:00:00 2001
From: OnToNothing <126170980+OnToNothing@users.noreply.github.com>
Date: Sun, 27 Oct 2024 23:42:05 -0400
Subject: [PATCH 1/2] Update Documentation for Redis DB, Readme, and client md

---
 README.md        | 2 +-
 docs/client.md   | 1 -
 docs/database.md | 2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 8002c6c2..86cd18f2 100644
--- a/README.md
+++ b/README.md
@@ -135,4 +135,4 @@ This project is currently being developed by a student research team at Moravian
 * [Stocker Daniel](https://www.linkedin.com/in/daniel-stocker-453936159/)
 ## Faculty
-* Ben Coleman (colemanb@moravian.edu)
+* Ben Coleman (colemanb@moravian.edu)
\ No newline at end of file
diff --git a/docs/client.md b/docs/client.md
index 5f49ef2c..d6fd85ec 100644
--- a/docs/client.md
+++ b/docs/client.md
@@ -9,4 +9,3 @@ The clients are their own objects that will request work from the job queue, and
 The goal is that the client will request and complete work in order to download data from [regulations.gov](https://www.regulations.gov/).
-
diff --git a/docs/database.md b/docs/database.md
index 607ddbf5..7fe13f15 100644
--- a/docs/database.md
+++ b/docs/database.md
@@ -57,4 +57,4 @@ unique ids for each job.
 ## Client IDs
 The 'last_client_id' variable is used by the work server to ensure that it
-generates unique client ids.
+generates unique client ids. 
\ No newline at end of file
From fd2bffd7f4c9c91fdd203658a1564daae4dcfe2e Mon Sep 17 00:00:00 2001
From: OnToNothing <126170980+OnToNothing@users.noreply.github.com>
Date: Sun, 27 Oct 2024 23:56:10 -0400
Subject: [PATCH 2/2] Update documentation for Redis DB and Readme

---
 README.md        |  4 +--
 docs/database.md | 64 +++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 54 insertions(+), 14 deletions(-)

diff --git a/README.md b/README.md
index 86cd18f2..3c2614eb 100644
--- a/README.md
+++ b/README.md
@@ -31,17 +31,15 @@ With the API limiting that is in place, it would take us months to download all
 ## Getting Started
-
 If you are interested in becoming a developer, see `docs/developers.md`.
 To run Mirrulations, you need Python 3.9 or greater ([MacOSX](https://docs.python-guide.org/starting/install3/osx/) or [Windows](https://docs.python-guide.org/starting/install3/win/)) on your machine to run this, as well as [redis](https://redis.io/) if you are running a server
-You will also need a valid API key from Regulations.gov to participate. To apply for a key, you must simply [contact the Regulations Help Desk](regulations@erulemakinghelpdesk.com) and provide your name, email address, organization, and intended use of the API. If you are not with any organizations, just say so in your message. They will email you with a key once they've verified you and activated the key.
+You will also need a valid API key from Regulations.gov to participate. To apply for a key, you must simply complete the API key request form (https://open.gsa.gov/api/regulationsgov/) and provide your name, email address, organization, and intended use of the API. After review the key will be sent by email.
 To download the actual project, you will need to go to our [GitHub page](https://github.com/MoravianUniversity/mirrulations) and [clone](https://help.github.com/articles/cloning-a-repository/) the project to your computer. 
-
 ### Disclaimers
 --------
 "Regulations.gov and the Federal government cannot verify and are not responsible for the accuracy or authenticity of the data or analyses derived from the data after the data has been retrieved from Regulations.gov."
diff --git a/docs/database.md b/docs/database.md
index 7fe13f15..406545b3 100644
--- a/docs/database.md
+++ b/docs/database.md
@@ -2,8 +2,33 @@
 ## Database Format
-We use [Redis](https://redis.io/) to store jobs as well as key values that must
-be remembered.
+We use [Redis](https://redis.io/) to store jobs as well as key values.
+
+## Database Structure
+
+The Redis database is structured with the following keys:
+
+regulations_total_comments
+num_dockets_done
+num_documents_done
+num_attachments_done
+last_job_id
+jobs_in_progress
+num_pdf_attachments_done
+num_jobs_documents_waiting
+num_jobs_comments_waiting
+dockets_last_timestamp
+invalid_jobs
+regulations_total_dockets
+client_jobs
+num_extractions_done
+regulations_total_documents
+mirrulations_bucket_size
+num_comments_done
+documents_last_timestamp
+num_jobs_dockets_waiting
+comments_last_timestamp
+
 ## Job Management
@@ -11,14 +36,19 @@ The REDIS database has three "queues", with the names: `jobs_waiting_queue`, `jobs_in_progress`, and `jobs_done`.
-`jobs_waiting_queue` is a list, while 'jobs_in_progress' and 'jobs_done' are hashes.
-Each stores jobs for clients to process.
+The keys serve the following functions:
+
+jobs_waiting_queue: A list holding JSON strings representing each job.
+
+jobs_in_progress: A hash storing jobs currently being processed.
-Keys will be integers, the job ids of the jobs.
-These keys will be mapped to integers, the values to be processed.
+jobs_done: A hash storing completed jobs.
-Additionally, the database has an integer value storing the number of clients:
-`total_num_client_ids`.
+The keys client_jobs and total_num_client_ids are used for storing client information.
+
+client_jobs: A hash mapping job IDs to client IDs. 
+total_num_client_ids: An integer value storing the number of clients.
 ## Redis Format
 ## `jobs_waiting_queue`
@@ -54,7 +84,19 @@ timestamp seen when querying regulations.gov.
 The `last_job_id` variable is used by the work generator to ensure it generates
 unique ids for each job.
-## Client IDs
-The 'last_client_id' variable is used by the work server to ensure that it
-generates unique client ids. 
\ No newline at end of file
+## Job Statistics Keys
+
+DOCKETS_DONE: Tracks the number of completed dockets.
+
+DOCUMENTS_DONE: Tracks the number of completed documents.
+
+COMMENTS_DONE: Tracks the number of completed comments.
+
+ATTACHMENTS_DONE: Tracks the number of completed attachments.
+
+PDF_ATTACHMENTS_DONE: Tracks the number of completed PDF attachments.
+
+EXTRACTIONS_DONE: Tracks the number of completed extractions.
+
+MIRRULATION_BUCKET_SIZE: Stores the size of the mirrulations bucket.
\ No newline at end of file
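
Reviewer note on the `docs/database.md` changes: the key layout documented in this patch (`last_job_id` for unique ids, `jobs_waiting_queue` as a list of JSON strings, `jobs_in_progress` and `jobs_done` as hashes, `client_jobs` mapping job IDs to client IDs) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the project's actual code: the `InMemoryRedis` class is a hypothetical stand-in that only mimics the `incr`, `rpush`, `lpop`, `hset`, `hget`, and `hdel` commands so the example runs without a server; the job fields (`job_id`, `url`, `job_type`) are invented for illustration; and the choice to drop the `client_jobs` entry when a job finishes is a guess at the intended lifecycle.

```python
import json


class InMemoryRedis:
    """Hypothetical stand-in that mimics only the Redis commands used below."""

    def __init__(self):
        self._data = {}

    def incr(self, key):
        # Treat a missing key as 0, then add one.
        self._data[key] = int(self._data.get(key, 0)) + 1
        return self._data[key]

    def rpush(self, key, value):
        self._data.setdefault(key, []).append(value)
        return len(self._data[key])

    def lpop(self, key):
        items = self._data.get(key, [])
        return items.pop(0) if items else None

    def hset(self, key, field, value):
        self._data.setdefault(key, {})[str(field)] = value

    def hget(self, key, field):
        return self._data.get(key, {}).get(str(field))

    def hdel(self, key, field):
        removed = self._data.get(key, {}).pop(str(field), None)
        return 0 if removed is None else 1


def add_job(db, url, job_type):
    """Work generator side: take a unique id from last_job_id and enqueue JSON."""
    job_id = db.incr("last_job_id")
    job = {"job_id": job_id, "url": url, "job_type": job_type}  # assumed fields
    db.rpush("jobs_waiting_queue", json.dumps(job))
    return job_id


def claim_job(db, client_id):
    """Move the oldest waiting job to jobs_in_progress and record its client."""
    raw = db.lpop("jobs_waiting_queue")
    if raw is None:
        return None  # nothing is waiting
    job = json.loads(raw)
    db.hset("jobs_in_progress", job["job_id"], raw)
    db.hset("client_jobs", job["job_id"], client_id)
    return job


def finish_job(db, job_id):
    """Move a job from jobs_in_progress to jobs_done (assumed lifecycle)."""
    raw = db.hget("jobs_in_progress", job_id)
    if raw is None:
        return False  # unknown id or job was never claimed
    db.hdel("jobs_in_progress", job_id)
    db.hdel("client_jobs", job_id)
    db.hset("jobs_done", job_id, raw)
    return True
```

With a real Redis client the same three functions should work as long as the client exposes these commands under the same names, though a real client may return bytes rather than strings unless configured to decode responses.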