Duplicate processing #399
Merged
As files are downloaded, each chunk updates an md5 hash. When the download is complete, a hex digest of that hash is used to populate the md5 attribute of the content being stored for the download.
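The chunk-by-chunk hashing described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `write_chunks_with_md5` and its parameters are hypothetical names, and the chunk source is assumed to be any iterable of `bytes`.

```python
import hashlib
import io


def write_chunks_with_md5(chunks, out_file):
    """Write each chunk to out_file while feeding it to an MD5 hash.

    Hypothetical helper: returns the hex digest once every chunk is written.
    """
    md5 = hashlib.md5()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            out_file.write(chunk)
            md5.update(chunk)
    return md5.hexdigest()


# Usage: an in-memory buffer stands in for the downloaded file.
buffer = io.BytesIO()
digest = write_chunks_with_md5([b"hello ", b"world"], buffer)
```

Hashing while writing avoids reading the file a second time after the download completes.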
Modify the method that handles duplicate content to take action according to the settings manager. This allows the user to control whether duplicates are deleted.
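One way a settings-driven duplicate check might look is sketched below. The function name, the `seen_hashes` dict, and the `delete_duplicates` flag are all assumptions for illustration; the real method reads its behavior from the settings manager.

```python
import os


def handle_duplicate(path, md5, seen_hashes, delete_duplicates):
    """Return True if the file at path duplicates previously seen content.

    seen_hashes maps an MD5 hex digest to the first path stored with it.
    delete_duplicates is a stand-in for the user-controlled setting.
    """
    if md5 not in seen_hashes:
        seen_hashes[md5] = path  # first time this content has been seen
        return False
    if delete_duplicates:
        os.remove(path)  # the user opted to delete duplicate files
    return True
```

With the flag set to `False`, the duplicate is still reported but the file is left on disk.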
Reddit objects and reddit object lists have both been updated to include a duplicate hash control at the individual reddit object level.
Previously, when the download settings widget was loaded, it displayed the Master User List but did not load that list's settings into the UI. This has been corrected.
When new fields are added to default dicts in the settings manager and an existing dict is loaded from a config file, it will not have the new fields present until the user saves the settings. This adds a method to add new default fields to those existing loaded dicts so the proper default values of newly added settings are considered before any user interaction.
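A minimal sketch of back-filling new default fields into an already loaded dict follows. `add_missing_defaults` is a hypothetical name, and the recursion into nested dicts is an assumption about how the settings are shaped.

```python
import copy


def add_missing_defaults(loaded, defaults):
    """Add keys from defaults that are absent in loaded, keeping saved values.

    Nested dicts are merged recursively. Default values are deep-copied so
    later edits to the loaded settings never alter the defaults themselves.
    """
    for key, value in defaults.items():
        if key not in loaded:
            loaded[key] = copy.deepcopy(value)
        elif isinstance(value, dict) and isinstance(loaded[key], dict):
            add_missing_defaults(loaded[key], value)
    return loaded


# Usage: a config saved before "hash_duplicates" and "date" existed.
defaults = {"hash_duplicates": True, "columns": {"name": 100, "date": 80}}
loaded = {"columns": {"name": 250}}
add_missing_defaults(loaded, defaults)
```

Values the user already saved (here `name: 250`) are left untouched; only missing keys receive defaults.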
Extracted methods `should_use_multi_part`, `download_with_multipart`, `should_use_hash`, `download_with_hash`, and `download_without_hash` to improve code readability, and added unit tests.
Removed the duplicate controls from the settings manager and restructured them to be individually controllable for each reddit object. Also added more options for how to handle duplicates once they are detected.
Mocking `general_utils.ensure_content_download_path` caused a side effect in other tests. This mocking was moved to `setUpClass`, and the method is restored to the original version in `tearDownClass`.
The `get_base_path` method was being tested separately from `make_directory`, but `make_directory` calls `get_base_path` to build the directory path. The tests for `get_base_path` are now combined into the `make_directory` test cases so that the `get_base_path` method can be removed from the base extractor class.
This method was reworked into the filename generator class and is no longer needed in the `BaseExtractor`.
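The class-level patch-and-restore pattern can be sketched like this. The `general_utils` object below is a stand-in namespace created only for the example (the real module is not shown in this PR text), and the test class name is hypothetical.

```python
import types
import unittest
from unittest import mock

# Stand-in for the real general_utils module (assumption for this sketch).
general_utils = types.SimpleNamespace(
    ensure_content_download_path=lambda content: "real"
)


class TestDownloader(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        # Patch once for the whole class instead of inside each test.
        cls._patcher = mock.patch.object(
            general_utils, "ensure_content_download_path", return_value="mocked"
        )
        cls._patcher.start()

    @classmethod
    def tearDownClass(cls):
        # Restore the original function so other test classes are unaffected.
        cls._patcher.stop()

    def test_uses_mock(self):
        self.assertEqual(
            general_utils.ensure_content_download_path(None), "mocked"
        )
```

Stopping the patcher in `tearDownClass` is what prevents the mock from leaking into unrelated tests.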
Implement `ensure_file_path` to handle file path uniqueness and directory creation with error handling. Add unit tests to validate functionality, including directory creation, naming conflicts, and permission errors.
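A rough sketch of the idea behind `ensure_file_path` is shown below. The actual signature and naming scheme are not given in this PR text, so `ensure_unique_path`, its parameters, the `name(1).ext` suffix format, and the `None` return on failure are all assumptions.

```python
import os


def ensure_unique_path(dir_path, name, ext):
    """Create dir_path if needed and return a file path that does not exist yet.

    Returns None when the directory cannot be created (for example, on a
    permission error). Hypothetical helper, not the project's implementation.
    """
    try:
        os.makedirs(dir_path, exist_ok=True)
    except OSError:  # PermissionError and similar failures are subclasses
        return None
    candidate = os.path.join(dir_path, f"{name}.{ext}")
    count = 1
    while os.path.exists(candidate):
        # Naming conflict: append a counter until the path is free.
        candidate = os.path.join(dir_path, f"{name}({count}).{ext}")
        count += 1
    return candidate
```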
Replaced `is_duplicate_hash` with `is_duplicate_content`, delegating logic to `DuplicateHandler`. Removed redundant tests and updated remaining test cases to align with the refactor.
Adjust UI to include advanced duplicate handling options. Add hash content functionality and sync methods for example path previews.
Make duplicate output messages cleaner and change the message type from `debug` to `info`.
Databases that existed prior to the v3.17.0 release did not receive the default values for the duplicate controls introduced in this update. This method iterates through the existing database and updates the necessary objects with the correct default values.
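The back-fill described above might look roughly like the following. This sketch uses the standard-library `sqlite3` module purely for illustration; the table name `reddit_object`, the column names, and the default values are hypothetical, and the project's own database layer may work differently.

```python
import sqlite3

# Hypothetical column names and defaults for the new duplicate controls.
DEFAULTS = {"hash_duplicates": 1, "duplicate_action": "delete"}


def apply_duplicate_defaults(conn, defaults):
    """Fill NULL duplicate-control columns with defaults; return rows changed.

    Column names come from the trusted defaults dict above, never from user
    input, so interpolating them into the SQL string is safe here.
    """
    updated = 0
    for column, value in defaults.items():
        cursor = conn.execute(
            f"UPDATE reddit_object SET {column} = ? WHERE {column} IS NULL",
            (value,),
        )
        updated += cursor.rowcount
    conn.commit()
    return updated
```

Rows that already carry a value are skipped by the `IS NULL` filter, so running the update more than once is harmless.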
Adds file hashing and duplicate content handling.