
Conversation

@MalloyDelacroix
Owner

Adds file hashing and duplicate content handling.

MalloyDelacroix and others added 30 commits May 29, 2025 11:38
As files are downloaded, each chunk updates an md5 hash.  When the download is complete, a hex digest of that hash is used to populate the md5 attribute of the content being stored for the download.
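A minimal sketch of that chunked-hashing approach, assuming a requests-style streaming response; the function name and `chunk_size` are illustrative rather than the project's actual code.

```python
import hashlib

def stream_download_with_hash(response, file_path, chunk_size=1024 * 1024):
    """Write a streamed HTTP response to disk while updating an MD5 hash
    one chunk at a time; the hex digest is returned so the caller can
    store it on the content record."""
    md5 = hashlib.md5()
    with open(file_path, 'wb') as file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                file.write(chunk)
                md5.update(chunk)
    return md5.hexdigest()
```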
Modify the method that handles duplicate content to take action according to the settings manager.  This allows the user to control whether duplicates are deleted.
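A hedged sketch of settings-driven duplicate handling; `delete_duplicate_content` is an assumed flag name, not necessarily the option this PR adds.

```python
import os

def handle_duplicate(file_path, settings_manager):
    """Apply the user's duplicate policy to a file whose MD5 matches
    content already recorded in the database."""
    if getattr(settings_manager, 'delete_duplicate_content', False):
        os.remove(file_path)   # user chose to discard duplicate downloads
        return 'deleted'
    return 'kept'              # duplicates are kept on disk
```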
Reddit objects and the reddit object list have both been updated to include duplicate-hash controls at the individual reddit object level.
Previously, when the download settings widget was loaded, it displayed the Master User List as the selected list but did not load that list's settings into the UI.  This has been corrected.
When new fields are added to the default dicts in the settings manager, an existing dict loaded from a config file will not contain them until the user saves the settings.  This adds a method that merges new default fields into those loaded dicts so the proper default values of newly added settings are applied before any user interaction.
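A minimal sketch of that default-merging idea; the function name and the recursion into nested dicts are assumptions.

```python
def merge_new_defaults(loaded: dict, defaults: dict) -> dict:
    """Add keys that exist in the current defaults but are missing from a
    dict loaded from an older config file, without overwriting user values."""
    for key, value in defaults.items():
        if key not in loaded:
            loaded[key] = value
        elif isinstance(value, dict) and isinstance(loaded[key], dict):
            merge_new_defaults(loaded[key], value)  # handle nested settings dicts
    return loaded
```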
Extracted methods `should_use_multi_part`, `download_with_multipart`, `should_use_hash`, `download_with_hash`, and `download_without_hash` to improve code readability, and added unit tests.
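Roughly how the extracted helpers might be wired together; the signatures below are assumptions based only on the method names.

```python
def perform_download(downloader, response, file_path):
    """Choose a download strategy by delegating to the extracted helpers."""
    if downloader.should_use_multi_part(response):
        return downloader.download_with_multipart(response, file_path)
    if downloader.should_use_hash():
        return downloader.download_with_hash(response, file_path)
    return downloader.download_without_hash(response, file_path)
```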
Removed the duplicate controls from the settings manager and restructured them to be individually controllable for each reddit object.  Also added more options for how to handle duplicates once they are detected.
Mocking `general_utils.ensure_content_download_path` caused a side effect in other tests.  The mocking was moved to `setUpClass`, and the original method is restored in `tearDownClass`.
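A sketch of the class-level patching pattern described above; the import path in the patch target is an assumption.

```python
import unittest
from unittest.mock import patch

class DownloaderTests(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        # Patch once for the whole class so the mock cannot leak into
        # other test modules through a forgotten per-test patch.
        cls.path_patcher = patch('DownloaderForReddit.utils.general_utils.ensure_content_download_path')
        cls.mock_ensure_path = cls.path_patcher.start()

    @classmethod
    def tearDownClass(cls):
        # Stop the patcher so the original function is restored for later tests.
        cls.path_patcher.stop()
```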
The `get_base_path` method was being tested separately from `make_directory`, but `make_directory` calls `get_base_path` to build the directory path.  The tests for `get_base_path` are now combined into the `make_directory` test cases so that the `get_base_path` method can be removed from the base extractor class.
This method was reworked into the filename generator class and is no longer needed in the BaseExtractor.
Implement `ensure_file_path` to handle file path uniqueness and directory creation with error handling. Add unit tests to validate functionality, including directory creation, naming conflicts, and permission errors.
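A minimal sketch of what an `ensure_file_path` helper could look like under those requirements; the real implementation and its error handling may differ.

```python
import os

def ensure_file_path(file_path):
    """Create the parent directory if needed and return a path that does not
    collide with an existing file, or None if the directory is not writable."""
    directory, file_name = os.path.split(file_path)
    try:
        os.makedirs(directory, exist_ok=True)
    except PermissionError:
        return None  # caller logs the failure and skips the download
    base, ext = os.path.splitext(file_name)
    count = 1
    unique_path = file_path
    while os.path.exists(unique_path):
        unique_path = os.path.join(directory, f'{base}({count}){ext}')
        count += 1
    return unique_path
```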
Replaced `is_duplicate_hash` with `is_duplicate_content`, delegating logic to `DuplicateHandler`. Removed redundant tests and updated remaining test cases to align with the refactor.
Adjust the UI to include advanced duplicate handling options.  Add hash-content functionality and sync methods for the example path previews.
Make duplicate output messages cleaner and change the message type from `debug` to `info`.
Databases that existed prior to the v3.17.0 release did not receive the default values for the duplicate controls introduced in this update.  This method iterates through the existing database and updates the necessary objects with the correct default values.
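A hedged sketch of such a backfill migration, assuming a SQLAlchemy-style session; the model class and attribute names are passed in as placeholders rather than taken from the project.

```python
def backfill_duplicate_defaults(session, model, defaults):
    """Set the new duplicate-control attributes on rows created before the
    columns existed.  `defaults` maps attribute names to default values."""
    for obj in session.query(model).all():
        for attr, value in defaults.items():
            if getattr(obj, attr, None) is None:
                setattr(obj, attr, value)
    session.commit()
```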
@MalloyDelacroix MalloyDelacroix merged commit fb64fc3 into master Jul 16, 2025
2 checks passed
@MalloyDelacroix MalloyDelacroix mentioned this pull request Jul 16, 2025