watch=true very slow with large mailboxes #584

Open
half-duplex opened this issue Jan 1, 2025 · 1 comment
Comments

@half-duplex

I have 17,000 historical reports in a folder in o365 that I'm trying to process with parsedmarc.
Processing more than one batch seems to require watch=true, but with watch=true it works through them extremely slowly (less than 1 report per second).

This appears to be because the batch_size param is passed to connector.fetch_messages() on the first call but not on subsequent ones. The first batch is reasonably fast, but every call after that runs a slow, expensive query that lists all 17,000 items in the mailbox.
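
To illustrate the suspected behaviour, here is a minimal sketch. It does not use parsedmarc's real classes: `FakeConnector`, `delete()`, `drain()` and the `items_listed` counter are hypothetical stand-ins, and only the `fetch_messages(batch_size)` call shape is taken from the description above. It assumes that a call without batch_size enumerates the whole folder, and it compares passing the limit on every call against passing it only on the first one.

```python
from typing import List, Optional


class FakeConnector:
    """Hypothetical stand-in for a mailbox connector; not parsedmarc's real class."""

    def __init__(self, total: int) -> None:
        self.remaining = list(range(total))
        self.items_listed = 0  # how many message IDs the server had to enumerate

    def fetch_messages(self, batch_size: Optional[int] = None) -> List[int]:
        # Assumption: without a batch_size the whole folder is listed (the slow path).
        limit = len(self.remaining) if batch_size is None else batch_size
        listed = self.remaining[:limit]
        self.items_listed += len(listed)
        return listed

    def delete(self, ids: List[int]) -> None:
        done = set(ids)
        self.remaining = [m for m in self.remaining if m not in done]


def drain(connector: FakeConnector, batch_size: int, pass_every_time: bool) -> int:
    """Process the folder in batches and return the total number of items listed."""
    first = True
    while connector.remaining:
        use_limit = first or pass_every_time
        ids = connector.fetch_messages(batch_size if use_limit else None)
        connector.delete(ids[:batch_size])  # only one batch is processed per pass
        first = False
    return connector.items_listed


if __name__ == "__main__":
    for every_time in (True, False):
        listed = drain(FakeConnector(1000), 10, pass_every_time=every_time)
        print(f"batch_size passed on every call={every_time}: {listed} items listed")
```

In this toy model the listing work grows linearly with the folder size when the limit is passed on every call, and roughly quadratically when it is passed only once, which would be consistent with the slowdown described above.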

Bad workaround: Set batch_size=250 or so, so the expensive query only runs about 68 times (17,000 / 250) instead of 1,700, and deal with the duplicates if it crashes between starting processing and moving/deleting the emails. (Also, don't use watch=true; see #416.)
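
The trade-off in this workaround is plain arithmetic: the number of expensive listing queries is the mailbox size divided by the batch size, rounded up. A small sketch, assuming one listing query per batch and assuming the 1,700 figure corresponds to a batch size of 10 for 17,000 messages (`listing_queries` is a hypothetical helper, not part of parsedmarc):

```python
import math


def listing_queries(total_messages: int, batch_size: int) -> int:
    """Number of batches (and so listing queries) needed to drain the folder."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(total_messages / batch_size)


if __name__ == "__main__":
    total = 17_000  # mailbox size from the report above
    for size in (10, 100, 250, 500):
        print(f"batch_size={size}: {listing_queries(total, size)} listing queries")
```

Larger batches mean fewer listing queries, but also more reports that get reprocessed as duplicates if a run crashes before the batch is moved or deleted.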

@nhairs
Contributor

nhairs commented Jan 1, 2025

If you are only working with DMARC reports, you may be interested in trying out nhairs/parsedmarc-fork, which is designed to be more stable when handling lots of reports.

Note that it is a WIP so it might not have all the functionality you need (do let me know though!).
