Initial metadata extraction implementation #159

jack11wagner · 2023-04-19T19:40:13Z

No description provided.

…mment are found
* This change uses the save_json method in the savers
* Do note: The DiskSaver and S3 saver were using data with the full data dict before
* I changed the Savers and the client to have the parameter for saving json be data["results"] instead of the full data dictionary
* This is a better way of implementing the save_json methods since what is passed in is the json we want to save rather than needing to find the key we want in the savers themselves.
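The save_json change described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `SketchDiskSaver` and `save_job_results` are hypothetical names, and only the `save_json` method name and the `data["results"]` key come from the description.

```python
import json
import os


class SketchDiskSaver:
    """Hypothetical stand-in for the project's DiskSaver."""

    def save_json(self, path, data):
        # `data` is already the JSON payload to persist, so the saver
        # does not need to know which key of the job dict to pick.
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        with open(path, "w", encoding="utf-8") as file:
            json.dump(data, file)


def save_job_results(saver, path, data):
    # Hypothetical client-side helper: the caller selects data["results"]
    # and hands only that part to the saver.
    saver.save_json(path, data["results"])
```

With this split, any saver that implements `save_json(path, data)` can be swapped in without knowing the layout of the full data dictionary.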

nikovacs · 2023-04-20T15:55:17Z

mirrulations-client/src/mirrclient/client.py

+            meta["extraction_status"][file_name] = "Not Attempted"
+        meta_save_path = f"{meta_save_dir}/extraction-metadata.json"
+        self.saver.save_json(meta_save_path, meta)
+        return meta_save_path, meta


This method does not have to return anything. Otherwise good.
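A minimal sketch of what the method could look like without a return value. Only the `extraction_status` key, the "Not Attempted" status, the `extraction-metadata.json` file name, and the `save_json` call come from the diff above; the function name, its parameters, and `InMemorySaver` are assumptions made for illustration.

```python
class InMemorySaver:
    """Hypothetical saver that records what it was asked to save."""

    def __init__(self):
        self.saved = {}

    def save_json(self, path, data):
        self.saved[path] = data


def save_extraction_metadata(saver, meta_save_dir, file_names):
    # Assumed shape: one status entry per attachment file name.
    meta = {"extraction_status": {}}
    for file_name in file_names:
        meta["extraction_status"][file_name] = "Not Attempted"
    meta_save_path = f"{meta_save_dir}/extraction-metadata.json"
    # Saving is the only side effect callers need, so nothing is returned.
    saver.save_json(meta_save_path, meta)
```

Tests can inspect what the saver received instead of relying on a returned path and dictionary.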

jack11wagner · 2023-04-21T02:51:47Z

Example extraction-metadata.json

jack11wagner · 2023-04-21T02:53:20Z

Not writing to S3 yet.

jack11wagner · 2023-04-21T18:22:03Z

After discussion with Dr. Coleman, and given the time constraints at the end of the semester, we are putting the extraction metadata task off for now. At the moment, extraction metadata is a JSON file listing all the attachments in a docket, with each status initialized to "Not Attempted". We never got the extractor to update this JSON because of challenges with loading the file and problems with this change in S3.

jack11wagner · 2023-04-24T18:08:22Z

It is still unclear what exactly extraction metadata would need to contain, so this PR will need to be revisited at a later date.

jack11wagner added 3 commits April 19, 2023 15:39

Initial metadata extraction implementation

b31d9a8

Static Fixes

75e198e

nikovacs approved these changes Apr 20, 2023

jack11wagner marked this pull request as draft April 20, 2023 18:30

jack11wagner added 2 commits April 20, 2023 22:47

Add save_meta function to disk_saver

a2c9bba

Client uses new save_meta function in the DiskSaver

1f444b1

nikovacs added 5 commits April 21, 2023 11:45

os.remove is not necessary. Writing new meta will overwrite

10d6a41

Add save_meta method. Add static update meta method.

4e2faa9

refactor save_meta to use Saver's update_meta method

cba49a7

add save_meta to s3_saver

3aee40b

linter fixes

3d5229f
