Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to bypass checking for .osxphotos_export.db in the export folder. #1774

Open
oPromessa opened this issue Dec 31, 2024 · 1 comment
Labels
feature request New feature or request

Comments

@oPromessa
Copy link
Contributor

oPromessa commented Dec 31, 2024

Is your feature request related to a problem? Please describe.
Slow start of export taking ~12 minutes while searching for .osxphotos_export.db files in the export destination folder (which is SMB remotely mounted and is composed of thousands of folders).

2024-12-06 03:44:10.781197 -- Warning: export database '/Users/Shared/Pictures/Logs/.osxphotos_export.db' is in a different directory than export destination '/Volumes/photo/XXX'
2024-12-06 03:56:44.550763 -- Using osxphotos export database: version 10.0 located at /Users/XXX/Pictures/Pictures/Logs/.osxphotos_export.db

Describe the solution you'd like
For expert users which are certain no other export.db exists in the export folder have an option to bypass this check.
Other than that, optimize the os.walk in some way and maybe stop as soon as the first occurrence of a .osxphotos_export.db fileis found?

other_db_files = find_files_in_branch(dest, OSXPHOTOS_EXPORT_DB)

If you agree in adding an option like --no-check-exportdb -- open to other suggestions 😄 -- I could try to build up a PR.

Describe alternatives you've considered
Maybe add a warning in case a .osxphotos_export.db is found while running the delete code for --update. Hmm, not applicable on all cases 😞 !

Additional context
N/A

@oPromessa oPromessa added the feature request New feature or request label Dec 31, 2024
@RhetTbull
Copy link
Owner

This seems reasonable. I see a few things that are related.

There's already a --ignore-exportdb option:

-F, --ignore-exportdb  If exporting to a directory that already contains an export database and --update is
                       not specified, do not prompt to continue but instead continue the export. Normally,
                       if you export to a directory that already contains an export database and do not
                       specify --update, osxphotos will prompt you to continue. This is because you may be
                       inadvertently merging two export sets. Use --ignore-exportdb to skip this prompt and
                       continue the export. The resulting export database will contain the combined state of
                       both export sets. Short option is '-F' (mnemonic: force export). See also --update.

This is a for a different use case (user ran osxphotos export /path without --update but wants to use the same export database). However, I think that the two use cases are similar enough ("export path contains a database") that perhaps it makes sense to use this flag for both cases? This might be easier and less confusing than adding another flag. The option would then mean: if there's an export database in any part of the export path tree, ignore it.

Thus, you'd add a check to the call to find_files_in_branch() so it doesn't get called if ignore_exportdb is set since if we're going to ignore it, there's no reason to check:

# sanity check export into previous export dest without --update or --exportdb
if (
not update
and not force_update
and not exportdb
and not no_exportdb
and pathlib.Path(pathlib.Path(dest) / OSXPHOTOS_EXPORT_DB).exists()
):
rich_click_echo(
f"[warning]Warning: found previous export database in '{dest}' but --update not specified; "
"osxphotos will not consider state of previous export which may result in duplicate files. "
"Please confirm that you want to continue without using --update",
err=True,
)
if not ignore_exportdb and not click.confirm("Do you want to continue?"):
return 1
# check that export isn't in the parent or child of a previously exported library
other_db_files = find_files_in_branch(dest, OSXPHOTOS_EXPORT_DB)
if other_db_files:
rich_click_echo(
"[warning]WARNING: found other export database files in this destination directory branch. "
+ "This likely means you are attempting to export files into a directory "
+ "that is either the parent or a child directory of a previous export. "
+ "Proceeding may cause your exported files to be overwritten.",
err=True,
)

While you're at it, it would be good to short-circuit find_files_in_branch() so it returns as soon as it finds a file instead of appending files to a list. (But should probably rename it find_first_file_in_branch to be clear. I checked and it is only used in this one place:

❯ rg find_files_in_branch
osxphotos/cli/export.py
1772:    other_db_files = find_files_in_branch(dest, OSXPHOTOS_EXPORT_DB)
3003:def find_files_in_branch(pathname, filename):

def find_files_in_branch(pathname, filename):
"""Search a directory branch to find file(s) named filename
The branch searched includes all folders below pathname and
the parent tree of pathname but not pathname itself.
e.g. find filename in children folders and parent folders
Args:
pathname: str, full path of directory to search
filename: str, filename to search for
Returns: list of full paths to any matching files
"""
pathname = pathlib.Path(pathname).resolve()
files = []
# walk down the tree
for root, _, filenames in os.walk(pathname):
# for directory in directories:
for fname in filenames:
if fname == filename and pathlib.Path(root) != pathname:
files.append(os.path.join(root, fname))
# walk up the tree
path = pathlib.Path(pathname)
for root in path.parents:
filenames = os.listdir(root)
for fname in filenames:
filepath = os.path.join(root, fname)
if fname == filename and os.path.isfile(filepath):
files.append(os.path.join(root, fname))
return files

RhetTbull pushed a commit that referenced this issue Jan 5, 2025
…t.db in the export folder. (#1775)

* enhancement-1774: first adjustments.
Missing adjustment to find_first_file_in_branch.

* enhancement-1774: Adjustment to find_first_file_in_branch to return list of first file found.

* * Reusing option --ignore-exportdb
* Removed exclusivity of options --update and --ignore-exportdb
* Search for files logic will also **not be done** in case options --exportdb or --no-exportdb are used.
* Adjusted option definition. Maybe a bit lengthy 😞 !
* Renamed function find_files_in_branch into find_first_file_in_branch and revised logic to return on first occurrence found, when --ignore-exportdb is not active.
* When --update is not used, added warnings when --ignore-exportdb is in use to make it compatible with interactive questions.

* Addressing topics:

- find_first_file_in_branch should now return either None or a single str with the file path instead of a list or empty list.
- Change other_db_files to other_db_file and the subsequent warning that lists out the files to now say "file" vs "files".

* adjusted test "test_export_update_parent_folder" and "test_export_update_child_folder"

* * Error in Tests: [Fatal Python error: Segmentation fault](https://github.com/RhetTbull/osxphotos/actions/runs/12612637331/job/35149593923#step:9:261)

* Unable to re-run Tests up-stream :(

* Adjusting some long lines and forcing commit!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants