Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multi-threading support #94

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

ErichDonGubler
Copy link

This (currently WIP) PR proposes switching to ignore's parallel walker API. It's not complete yet (threads are still only set to 1), but I figured that I should start discussion with y'all here upstream before I got too carried away, since there's actually quite a few design considerations here, and I haven't even validated that this is something y'all want in the first place! :)

See individual commits for more details on changes. Note that this is based on #93, and commits specifically for this PR start with the commit titled refactor: make `Stats` multithread-friendly.

Do our best to limit visiting paths to at most once, in preparation for
allowing multiple paths to be specified in a search-and-replace
operation. If we didn't do this, multiple path cases like this would be
handled incorrectly:

```
$ cat a.txt
foo

$ # We want to replace `foo` with `foobar`...
ruplacer foo foobar a.txt a.txt
<snip>

$ # ...but we got `foobarbar` instead:
$ cat a.txt
foobarbar
```

It's important to note what the trade-off for deduplication is. For each
path, we now call `std::fs::canonicalize`, and put the canonicalized
path in a set built using the the [`patricia_tree` crate] for somewhat
efficiently storage.

Interestingly, the above is also a potential issue with following
symlinks, which has never been allowed before. Perhaps this change could
make it sufficiently safe to add an option for following symlinks?

[`patricia_tree` crate]: https://docs.rs/patricia_tree/0.3.1/
This will be necessary for when we shortly use the multi-threaded API
for `walkdir`, where we will need to use `&self` instead of `&mut self`
as the receiver for methods.
…h_file`

We need this for two interconnected reasons:

1. We won't be able to take by `&mut` _anything_ when using this method
   in the `walkdir` parallel API. This means at least changing the `&mut
   self` receiver to `&self`.
2. We will be adding a field in the next commit that won't be `Sync`.
   This only leaves field splitting as the minimum viable path for
   refactoring `patch_file`.
This takes care of one part of migrating to multiple threads but not
everything. See the large `NOTE` comment for further details on what
needs to be done still.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant