Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to multiple WORDLISTs #80

Closed
Bisaloo opened this issue Feb 14, 2024 · 4 comments
Closed

Add support to multiple WORDLISTs #80

Bisaloo opened this issue Feb 14, 2024 · 4 comments

Comments

@Bisaloo
Copy link
Member

Bisaloo commented Feb 14, 2024

Automated spellchecking in my organization is causing friction because we're using a lot of domain-specific vocabulary that get erroneously flagged by spelling.

I understand this is solved by adding the words on a case-by-case basis to the WORDLIST but the extra step is generating frustration for something that is perceived as a false positive.

To solve this, it would be helpful to pre-seed a WORDLIST with common domain-specific words that we use across the organization.
However, this doesn't play very well with the current spelling recommended workflow since WORDLIST is often regenerated automatically and these pre-seeded words will be removed if unused at this stage.

A potential solution would be to have two WORDLISTs:

  • the current existing one, which is package specific and that developers would update via update_wordlist()
  • a custom one, that is manually curated, and could contain words that are not yet used in the specific project

In terms of interface, a good way I could see this working would be to turn spell_check_package(use_wordlist = TRUE) into spell_check_package(wordlist = "inst/WORDLIST"). In other words, replacing the current boolean argument by a (vector of) paths to the WORDLIST.

For the problem described here, the solution would then be:

spell_check_package(wordlist = c("inst/WORDLIST", "inst/DOMAIN_WORDLIST"))

This feature would also probably help with #64.

If you agree that's a desirable change, I'm happy to submit a PR for this.

@jeroen
Copy link
Member

jeroen commented Feb 15, 2024

I think this makes sense. Does this also require changes in update_wordlist() if people start using custom files for their wordlists? Or shall we just keep inst/WORDLIST as the primary wordlist that is always included?

@Bisaloo
Copy link
Member Author

Bisaloo commented Feb 20, 2024

Hmm, that's a good question. I'm not quite sure to be honest.

Maybe a two steps approach is reasonable? We could start by offering more flexibility in spell_check_package() and allow multiple wordlists, with the idea that users will still use inst/WORDLIST as the primary wordlist, and then update update_wordlist() in a second time if use cases that require it are observed or reported.

Otherwise, it seems easy to update update_wordlist() in the same movement, but I'm afraid to fall in a case of speculative generality.

@jeroen jeroen closed this as completed in bfd9241 Mar 4, 2024
@jeroen
Copy link
Member

jeroen commented Mar 4, 2024

I added a relatively simple solution so that you can set a SPELLING_WORDLIST environment variable (e.g. in ~/.Renviron) and those words will also be ignored.

@jeroen
Copy link
Member

jeroen commented Mar 6, 2024

I released a new CRAN version if you want to test this in your organization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants