Initial metadata extraction implementation #159
base: main
…mment are found
* This change uses the save_json method in the savers.
* Note: the DiskSaver and S3 saver were previously passed the full data dict.
* I changed the savers and the client so that the argument to save_json is data["results"] instead of the full data dictionary.
* This is a better way to implement the save_json methods, since what is passed in is the JSON we want to save, rather than the savers having to find the right key themselves.
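A minimal sketch of the saver contract this commit describes, assuming a `save_json(path, data)` method that serializes exactly what it is given. The `Saver` base class, the `DiskSaver` body, and the `save_results` client helper are hypothetical illustrations, not the project's actual code; the S3 saver is omitted.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class Saver(ABC):
    """Hypothetical saver interface: save_json receives the JSON payload itself."""

    @abstractmethod
    def save_json(self, path: str, data: Any) -> None:
        ...


class DiskSaver(Saver):
    """Writes the given payload to a local file as JSON (illustrative only)."""

    def save_json(self, path: str, data: Any) -> None:
        target = Path(path)
        # Create parent directories so nested save paths work.
        target.parent.mkdir(parents=True, exist_ok=True)
        with target.open("w", encoding="utf-8") as f:
            # No key lookup here: the caller already chose what to save.
            json.dump(data, f, indent=2)


def save_results(saver: Saver, path: str, data: dict) -> None:
    """Hypothetical client-side call: pass data["results"], not the full dict."""
    saver.save_json(path, data["results"])
```

With this split, the savers do not need to know which key of the response holds the payload; that decision stays in the client.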
```python
meta["extraction_status"][file_name] = "Not Attempted"
meta_save_path = f"{meta_save_dir}/extraction-metadata.json"
self.saver.save_json(meta_save_path, meta)
return meta_save_path, meta
```
This method does not have to return anything. Otherwise good.
Not writing to S3 yet.
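A minimal sketch of the reviewed method with the return dropped, as the review suggests, assuming it receives the docket's attachment file names and a saver exposing `save_json(path, data)`. The function name `init_extraction_metadata`, the `file_names` parameter, and the `InMemorySaver` stand-in are hypothetical; only the `extraction_status` key, the "Not Attempted" value, and the `extraction-metadata.json` file name come from the diff.

```python
from typing import Any, Dict, Iterable, Protocol


class JsonSaver(Protocol):
    """Anything with a save_json(path, data) method, e.g. a disk or S3 saver."""

    def save_json(self, path: str, data: Any) -> None:
        ...


class InMemorySaver:
    """Hypothetical stand-in saver that keeps saved payloads in a dict."""

    def __init__(self) -> None:
        self.saved: Dict[str, Any] = {}

    def save_json(self, path: str, data: Any) -> None:
        self.saved[path] = data


def init_extraction_metadata(
    saver: JsonSaver, meta_save_dir: str, file_names: Iterable[str]
) -> None:
    """Mark every attachment as "Not Attempted" and save the metadata file.

    Returns nothing; the saved file at meta_save_dir is the only output.
    """
    meta: Dict[str, Dict[str, str]] = {"extraction_status": {}}
    for file_name in file_names:
        meta["extraction_status"][file_name] = "Not Attempted"
    meta_save_path = f"{meta_save_dir}/extraction-metadata.json"
    saver.save_json(meta_save_path, meta)
```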
After discussing it with Dr. Coleman and considering the time constraints at the end of the semester, we are putting the extraction metadata task on hold for now. At the moment, extraction metadata is a JSON file listing all the attachments in a docket, with every status initialized to "Not Attempted". We never got the extractor to update this JSON because of the challenges with loading the file and the problems with this change in S3.
It is still unclear exactly what the extraction metadata would need to contain, so this PR will need to be revisited at a later date.
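The extractor-side update that this PR never reached could look roughly like the sketch below, assuming the extractor loads the metadata JSON into a dict, changes one attachment's entry after each extraction attempt, and then saves the dict back with `save_json`. The function names and the "Success"/"Failed" status strings are hypothetical; this PR only defines "Not Attempted". Loading and re-saving through S3, where the work stalled, is left out.

```python
from typing import Dict, List

# Only "Not Attempted" appears in this PR; the other two values are assumptions.
NOT_ATTEMPTED = "Not Attempted"
SUCCESS = "Success"  # hypothetical
FAILED = "Failed"  # hypothetical


def update_extraction_status(
    meta: Dict[str, Dict[str, str]], file_name: str, extracted_ok: bool
) -> None:
    """Record the outcome for one attachment in the metadata dict, in place."""
    statuses = meta["extraction_status"]
    if file_name not in statuses:
        # Only attachments listed when the metadata was initialized are tracked.
        raise KeyError(f"{file_name!r} is not listed in the extraction metadata")
    statuses[file_name] = SUCCESS if extracted_ok else FAILED


def pending_files(meta: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the attachments the extractor has not tried yet."""
    return [
        name
        for name, status in meta["extraction_status"].items()
        if status == NOT_ATTEMPTED
    ]
```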