Add support to multiple WORDLISTs #80

Bisaloo · 2024-02-14T10:17:59Z

Automated spellchecking in my organization is causing friction because we use a lot of domain-specific vocabulary that gets erroneously flagged by spelling.

I understand this is solved by adding the words to the WORDLIST on a case-by-case basis, but the extra step generates frustration for something that is perceived as a false positive.

To solve this, it would be helpful to pre-seed a WORDLIST with common domain-specific words that we use across the organization.
However, this doesn't play very well with the currently recommended spelling workflow: WORDLIST is often regenerated automatically, and any pre-seeded words that are unused at that stage get removed.

A potential solution would be to have two WORDLISTs:

- the existing one, which is package-specific and which developers would update via update_wordlist()
- a custom one, which is manually curated and could contain words that are not yet used in the specific project

In terms of interface, one way I could see this working would be to turn spell_check_package(use_wordlist = TRUE) into spell_check_package(wordlist = "inst/WORDLIST"). In other words, replace the current boolean argument with a (vector of) path(s) to WORDLIST files.

For the problem described here, the solution would then be:

spell_check_package(wordlist = c("inst/WORDLIST", "inst/DOMAIN_WORDLIST"))
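To make the proposed vector-of-paths argument more concrete, here is a minimal base-R sketch of how several wordlist files could be merged into a single set of words to ignore. `read_wordlists()` is a hypothetical helper, not part of the spelling package; it assumes the one-word-per-line format of inst/WORDLIST and simply skips paths that don't exist, so a package without a DOMAIN_WORDLIST still works.

```r
# Hypothetical helper (NOT part of the spelling package): merge several
# wordlist files into one character vector of words to ignore.
read_wordlists <- function(paths) {
  # Skip files that don't exist, e.g. a package with no DOMAIN_WORDLIST
  paths <- paths[file.exists(paths)]
  # Read every file, assuming one word per line as in inst/WORDLIST
  words <- unlist(lapply(paths, readLines, warn = FALSE, encoding = "UTF-8"))
  # Trim whitespace, drop blank lines, and remove duplicates across files
  words <- trimws(words)
  unique(words[nzchar(words)])
}

# Example: combine the package wordlist with an organization-wide one
# wordlist <- read_wordlists(c("inst/WORDLIST", "inst/DOMAIN_WORDLIST"))
```

The merged vector could then be passed as the `ignore` argument of `spell_check_files()`; how `spell_check_package()` itself would consume it is exactly what this proposal leaves open.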

This feature would also probably help with #64.

If you agree that's a desirable change, I'm happy to submit a PR for this.


jeroen · 2024-02-15T12:57:04Z

I think this makes sense. Does this also require changes in update_wordlist() if people start using custom files for their wordlists? Or shall we just keep inst/WORDLIST as the primary wordlist that is always included?

Bisaloo · 2024-02-20T08:21:05Z

Hmm, that's a good question. I'm not quite sure, to be honest.

Maybe a two-step approach is reasonable? We could start by offering more flexibility in spell_check_package() and allowing multiple wordlists, with the idea that users will still use inst/WORDLIST as the primary wordlist, and then update update_wordlist() later if use cases that require it are observed or reported.

Otherwise, it seems easy to update update_wordlist() at the same time, but I'm afraid of falling into a case of speculative generality.

jeroen · 2024-03-04T13:39:20Z

I added a relatively simple solution so that you can set a SPELLING_WORDLIST environment variable (e.g. in ~/.Renviron) and those words will also be ignored.

jeroen · 2024-03-06T11:39:46Z

I released a new CRAN version if you want to test this in your organization.

jeroen closed this as completed in bfd9241 Mar 4, 2024

Bisaloo mentioned this issue Mar 21, 2024

update_wordlist() claims that it will remove from WORDLIST words from SPELLING_WORDLIST #82

Open
