Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added One Sentence per line rule (#66) #719

Closed
wants to merge 5 commits into from

Conversation

aepfli
Copy link
Contributor

@aepfli aepfli commented Feb 14, 2023

New configurable and fixable rule to fix ensure only one sentence per line. It is configurable, what defines the ending of a sentence as well as ignored words.

Additionally to the new rule, i adapted the logic to create the yaml file, as it did not work properly with arrays.

Signed-off-by: Simon Schrottner [email protected]

@DavidAnson
Copy link
Owner

Some initial thoughts after looking at this on my phone before bed:

  • More context is available in issue style: sentence per line #66
  • As I mention there, the notion of optional rules is a new one for the library and I’m not sure this is how I want to go about introducing it
  • The implementation of this rule seems fairly English-centric and I wonder if that could be more general (ex: punctuation characters, A-Z range checking, ignore words)
  • My fear is this rule will be pretty challenging to get right, though it’s a little hard to say without seeing tests or playing around with it
  • It should probably include the same double-width punctuation characters that other rules allow
  • The handling of in-line code spans is probably missing a few cases based on how tricky I know that is to get right
  • The current JSON conversion to YAML for the example files is a hack, but what is it these changes fix?
  • The YAML changes could be an interesting PR on their own

@aepfli
Copy link
Contributor Author

aepfli commented Feb 14, 2023

YAML BUG

I thought about extracting the yaml bug too.

if there is a configuration parameter, which is an array of elements, and you have multiple default elements, it is not generating valid yaml

     "defaults": [ "a", "b", "c"]

generates

defaults: [
   "a"
   "b"
   "c"
]

instead of:

defaults: 
   - "a"
   - "b"
   - "c"

By using YAML.stringify we do have the appropriate behavior, with the usage of https://www.npmjs.com/package/yaml (additional dependency)

Configuration options

I added 2 - one for words and one for punctuation.

regarding the range checking, I can also provide a regex and make it configurable. This way, people can adapt to their needs.

double-width characters

can you point me to a rule, so I can get a better glimpse - just to be on the safe side?
another option would also be to use regex \s for all kinds of whitespace characters

regarding inline code spaces

I have not really touched the logic of the original file, as I thought it would be an excellent first step to add this as a basic rule like it is. Then write tests, and after that, rethink maybe the structuring. As I am unfamiliar with the APIs etc., my next guess would be to explore the tokenizer and see if we can leverage this better. instead of parsing line by line, my hopes are high that the tokenizer already provides a lot of information (naive thinking).

conclusion

the big question for me is, do we want to pursue on this adventure, do you @DavidAnson think it is worth, or more like generating too much overhead?

@DavidAnson
Copy link
Owner

I like the YAML change and would be happy to see a PR for that on its own. (See CONTRIBUTING.md)

Here is an example of a rule supporting double-width characters: https://github.com/DavidAnson/markdownlint/blob/main/doc/md026.md

I’d want tests to be part of the initial commit in order to be sure there aren’t false-positives. (Better parsing support should be coming soon via the micromark parser.)

I like the idea of this rule and sentence-per-line is the pattern I use with my blog for years. I’m glad this option exists! However, I’m not sure about introducing optional rules at this point.

Also, I haven’t done any playing around with the rule to see how it behaves in real life. The outcome of that may sway my thinking. :)

lib/md054.js Outdated

if (
lineEndings.includes(line[i]) &&
line[i + 1] === " " &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna have to deal with multiple spaces after a period, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #719 (comment) - this is also covered with this.

lib/md054.js Outdated
if (
lineEndings.includes(line[i]) &&
line[i + 1] === " " &&
isCapitalizedAlphabetCharacter(line[i + 2]) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all sentences start with capitalized alphabet characters. For example, the typescript-eslint project's naming is explicitly lowercase; our docs sometimes start with it lowercased.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

improved, now people can define a regex, how a sentence start looks like - ^\s+(\w|[*_'"]) - also to cover emphasizes, etc. at the beginning, and I am checking the rest of the string, instead of just a little look ahead. I think this should make it easier to detect such cases

@aepfli
Copy link
Contributor Author

aepfli commented Feb 15, 2023

extracted yaml bug into #721/#722

@aepfli aepfli force-pushed the one_sentence_per_line branch 3 times, most recently from a9062b5 to b5a53c9 Compare February 15, 2023 11:16
@DavidAnson
Copy link
Owner

I'm going to want to try this myself locally, but it could be a couple of days before I get a chance. In the meanwhile, there are some CI failures to look at. To be clear, I still have reservations about introducing an optional rule. I don't want to get your hopes up or waste your time. But maybe some of the improvements you've made here end up going into the separate implementation that exists today.

@aepfli
Copy link
Contributor Author

aepfli commented Feb 16, 2023

Even if this won't be used, I had my fair share of fun and exploration, and I learned a lot about markdown-lint and considerations regarding such a tool. It was easy to start and a pleasure hacking around - I haven't had that for a while. :) so, thank you already for the time and input.

Currently, the rule is no longer optional; I removed that for now, but this also generates most of the errors. All the other tests etc., still need to be prepared for this rule. It is a massive change in structure. Even the readme has a lot of violations. That was also the reason for my initial approach to an optional rule. If we activate this rule as a default, it is a significant breaking change - hence that we have to be careful how we handle this. Many people like the reasoning behind it, but communicating this with a release will be hard.

@DavidAnson
Copy link
Owner

Even if this won't be used, I had my fair share of fun and exploration, and I learned a lot about markdown-lint and considerations regarding such a tool. It was easy to start and a pleasure hacking around - I haven't had that for a while. :) so, thank you already for the time and input.

That's nice to hear, thank you!

PS - I agree this can't become required AND enforce the style - many people have other preferences.

New configurable and fixable rule to fix ensure only one sentence
per line. It is configurable, what defines the ending of a
sentence as well as ignored words.

Additionally to the new rule, i adapted the logic to create the
yaml file, as it did not work properly with arrays.

Signed-off-by: Simon Schrottner <[email protected]>
@aepfli aepfli changed the base branch from main to next February 21, 2023 07:20
@aepfli aepfli force-pushed the one_sentence_per_line branch 2 times, most recently from 7c9d811 to dd624c5 Compare February 21, 2023 07:23
@aepfli
Copy link
Contributor Author

aepfli commented Feb 21, 2023

I rebased on to next - to remove #722 out of the changeset :)

@aepfli
Copy link
Contributor Author

aepfli commented Feb 21, 2023

sidenote: the build is failing, because we do have a lot of violations within the code, and i am not sure, how to best exclude this rule :) without my optional approach :)

@DavidAnson
Copy link
Owner

I played around with this a bit just now using the latest state of the branch. Here's the text I used:

# MD054/sentences-per-line

This should? Produce a violation.

So should! This example.

Abbreviations (e.g. like) these (i.e. should) not.

"Sentences in." Quotes should, too.

Pausing... for... thought... should not?

This rule does weird things if a sentence is
already wrapped. It should maybe unwrap in
cases like this?

Some of what I think is wrong behavior seem like bugs (the first three scenarios seem to have code to handle that's not working), or missing behavior (sentences inside quotation marks), but it's less clear how to handle ellipses in a way that won't annoy half the audience half the time.

However, the last example is much more significant - and it's the reason MD013/line-length does not offer fixes today. If this rule is willing to break lines to isolate sentences, it should arguably be able to re-combine lines to make sentences as well. Something like the following is a violation of the proposed rule in my mind:

This is a
single sentence.

But I worry that handling both directions may prove unreasonably challenging.

I love that this rule exists as an option for folks, but scenarios like these reinforce my resistance to including it in the library for now.

@aepfli
Copy link
Contributor Author

aepfli commented Feb 28, 2023

I understand your rational, and I am fine with this. I am just curious about your tests and i hope this is okay.

I ran your test MD

# MD054/sentences-per-line

This should? Produce a violation.

So should! This example.

Abbreviations (e.g. like) these (i.e. should) not.

"Sentences in." Quotes should, too.

Pausing... for... thought... should not?

This rule does weird things if a sentence is
already wrapped. It should maybe unwrap in
cases like this?

which resulted in

# MD054/sentences-per-line

This should?
Produce a violation.

So should!
This example.

Abbreviations (e.g. like) these (i.e. should) not.

"Sentences in." Quotes should, too.

Pausing...
for...
thought...
should not?

This rule does weird things if a sentence is
already wrapped.
It should maybe unwrap in
cases like this?

Some of what I think is wrong behavior seem like bugs (the first three scenarios seem to have code to handle that's not working)

  • what do you mean by not working, for me those seem perfectly fine, as e.g. and i.e. are excluded as line endings in the config with ignored_words (theoretically, we could also add ... here if we don't want to split on those)

or missing behavior (sentences inside quotation marks)

  • that is also an intriguing case of language diversity :) - American vs. British English. Where do you put the period if a sentence ends after the quotation (at least, that is something that I was told)? Grammarly also always wants me to use ." instead of ". but besides this, it should be handled like inline code blocks. But overall, those "enclosing/wrapping" structures should behave the same. Doesn't matter if it is ", ', (), [] or a code block - and defining a general rule for this is hard.

Anyways, I am okay with your concerns and accept this. I am figuring out an easy way to include a custom rule within the already existing docker images. I'll likely need to start a discussion with the CLI docker image providers. It would be nice to have logic to include rules and let the image handle them independently.

Thank you for the effort - at least a slight improvement regarding yaml generation was the outcome of my adventure, and I could contribute something.

If you decide to allow incubating or optional rules in the future (with no guarantee that they will stick around) - please ping me in this thread. i would happily try to move this forward, but I think a lot of user input is also needed.

@mschoettle
Copy link

I only recently discovered this “one sentence per line” rule/semantic line breaks and would love if it could be enforced by a linter. So I’m quite interested in this rule. @aepfli could you let me know if you figure out an easy way to include this or decide to make this rule available as a separate package?

@aepfli
Copy link
Contributor Author

aepfli commented Feb 28, 2023

@mschoettle, there is https://github.com/JoshuaKGoldberg/sentences-per-line which is already available and served as the basis for this pr (but I made quite some improvements) - but you can use it if you want to declare it - you can add it as a custom rule to your markdown lint configuration.

@DavidAnson
Copy link
Owner

@aepfli Thanks for your understanding!

I agree it's weird we saw different behavior. I did my testing via a fresh GitHub Codespace opened from this PR yesterday. I ran "build-demo" and then tested via the redirected demo site using "npx serve demo" and the current version of Chromium on Raspberry Pi OS. The new rule was definitely present and functioning because I saw violations from it and I was able to experiment with fixing them automatically as well. My guess is things behave differently in the browser environment, though that is fairly unusual. I'd be happy to look into this further if you can't tell why it worked differently for me.

Regarding incorporating rules into CLI Docker containers, I happen to be the maintainer for those as well. :) My current thinking is to maybe create a second Docker image that builds on the base one for the CLI, and includes a variety of "third-party" rules. None would be activated by default, but all would be readily available in the Docker context. Does that seem interesting?

@DavidAnson
Copy link
Owner

BTW, I have a specific approach in mind for Docker if we go that route. Don't bother with a PR.

@aepfli
Copy link
Contributor Author

aepfli commented Feb 28, 2023

the question now is, if @JoshuaKGoldberg wants to see those improvements ported to https://github.com/JoshuaKGoldberg/sentences-per-line or if i do start with a fork :) it is up to you @JoshuaKGoldberg :)

@DavidAnson
Copy link
Owner

Here's the idea for a supplementary Docker image: DavidAnson/markdownlint-cli2@next...thirdparty

It simply includes everything here: https://www.npmjs.com/search?q=keywords:markdownlint-rule

@aepfli
Copy link
Contributor Author

aepfli commented Mar 3, 2023

This rule is still "haunting" me somehow, so here is a different approach.

Instead of parsing per line, I parse per inline token wrapped with a paragraph (maybe this is a naive approach).
Then I parse through the child tokens, considered text, and only check if they contain multiple punctuations. and if I find one, I break. Softbreaks will delimiter my usage. This works for a lot of cases.

replace the forEachLine with this code :) it is so much fun to think about it :)

    const relevantTokens = [];
    for (let i = 0; i < params.tokens.length; i++) {
      const token = params.tokens[i];
      if (token.type === "paragraph_open" &&
        params.tokens[i + 1].type === "inline") {
        relevantTokens.push(params.tokens[i + 1]);
      }
    }

    for (const relevantToken of relevantTokens) {

      for (const token of relevantToken.children) {
        const lineNumber = token.lineNumber;
        if (token.type === "text") {
          const content = token.content;
          for (let i = 0; i < content.length - 2; i += 1) {

            if (lineEndings.includes(content[i])) {
              const sentence = sentenceStart.exec(content.substr(i + 1));
              if (
                sentence !== null &&
                !isAfterIgnoredWord(ignoredWords, content, i)
              ) {
                const spaces = sentence[1];
                const pointInLine = token.line.indexOf(content) + i;
                addError(
                  onError,
                  lineNumber,
                  null,
                  // eslint-disable-next-line max-len
                  content.substr(Math.max(0, i - (contextSize / 2)), contextSize),
                  [ pointInLine, spaces.length ],
                  {
                    "lineNumber": lineNumber,
                    "editColumn": pointInLine + 2,
                    "deleteCount": spaces.length,
                    "insertText": "\n"
                  }
                );
              }
            }
          }
        }
      }
    }

@DavidAnson
Copy link
Owner

Here's the idea for a supplementary Docker image: DavidAnson/[email protected]

It simply includes everything here: https://www.npmjs.com/search?q=keywords:markdownlint-rule

This image is now available for early use (from the next branch): https://hub.docker.com/r/davidanson/markdownlint-cli2-rules

@aepfli
Copy link
Contributor Author

aepfli commented Mar 4, 2023

I went ahead for now and created https://github.com/aepfli/markdownlint-rule-max-one-sentence-per-line - there is also the backlink to @JoshuaKGoldberg's first version missing - but I cover now a lot of cases, and also handle indentation properly (or at least I think I handle it properly.

Additionally, I renamed the rule to max-one-sentence-per-line, which imho reduces the need to fix if a sentence is split over multiple lines :)

Feedback is highly welcome :)

@mschoettle
Copy link

Here's the idea for a supplementary Docker image: DavidAnson/[email protected]
It simply includes everything here: https://www.npmjs.com/search?q=keywords:markdownlint-rule

This image is now available for early use (from the next branch): https://hub.docker.com/r/davidanson/markdownlint-cli2-rules

Is there any way to have these extra rules available within pre-commit as well?

@DavidAnson
Copy link
Owner

@mschoettle, if you use the "-docker" ids with precommit, this should work after I add new entries for the -rule container image here: https://github.com/DavidAnson/markdownlint-cli2/blob/main/.pre-commit-hooks.yaml

(And after the next publish to take that change from "next" branch to "main".)

@DavidAnson
Copy link
Owner

@mschoettle The davidanson/markdownlint-cli2-rules Docker container image can now be used with pre-commit by pointing to this repository's next branch in a project's .pre-commit-config.yaml like so:

repos:
- repo: https://github.com/DavidAnson/markdownlint-cli2
  rev: next
  hooks:
  - id: markdownlint-cli2-rules-docker

@DavidAnson
Copy link
Owner

@aepfli Builds of the Docker container image davidanson/markdownlint-cli2-rules will now contain the markdownlint-rule-max-one-sentence-per-line rule: DavidAnson/markdownlint-cli2@58e9c6d.

@aepfli
Copy link
Contributor Author

aepfli commented Mar 5, 2023

@DavidAnson do you think it make sense to also add all those npm rules to https://github.com/DavidAnson/vscode-markdownlint ?

background, we do have Hugo docs running, and do all the linting with docker files and make targets - but if a contributor is using vscode - it will try to load the rule and throw an exception:

[11:25:06 AM] ERROR: Exception while linting with markdownlint-cli2:
Error: Cannot find module 'markdownlint-rule-max-one-sentence-per-line'
Require stack:
- /home/simonschrottner/.vscode/extensions/davidanson.vscode-markdownlint-0.49.0/bundle.js
- /usr/share/code/resources/app/out/vs/loader.js
- /usr/share/code/resources/app/out/bootstrap-amd.js
- /usr/share/code/resources/app/out/bootstrap-fork.js
-  / Invalid host defined options
	at x (/home/simonschrottner/.vscode/extensions/davidanson.vscode-markdownlint-0.49.0/bundle.js:37:6800)
	at async Promise.all (index 0)
	at async Promise.all (index 0)
	at async T (/home/simonschrottner/.vscode/extensions/davidanson.vscode-markdownlint-0.49.0/bundle.js:37:9515)
	at async L (/home/simonschrottner/.vscode/extensions/davidanson.vscode-markdownlint-0.49.0/bundle.js:37:13839)

If yes, i am gladly opening a pr - i am also not sure, if it might be worth discussing this in the vscode extension repo - but it seem like we do discuss a lot already about this within this ticket :D

@DavidAnson
Copy link
Owner

@aepfli That’s a reasonable idea to consider, but I don’t see a good path to it. There are two main concerns I have:

The first is that everyone would be opted into loading/parsing custom rules rather than choosing to do so by referencing a dedicated Docker container image. The new rules add unknown size and complexity and take some control out of the user’s hands by opting them into a specific version of each rule. Without a way to opt out, the support burden for all of this will be on me.

Relatedly, the second reason is that extension scripts get bundled and need to work in web browser context for scenarios like github.dev. I have made specific changes to the library, CLI host, and extension to allow this and there is no guarantee custom rules will have done the same. There are some coding patterns that cannot be handled easily and I don’t want to introduce inconsistency if things can’t work the same in all contexts.

The theme here is that doing so is likely to sign me up to support code I didn’t write and have no knowledge of. Unlike the Docker scenario where installing custom rules might be difficult, most VS Code scenarios with custom rules should have a way to do so more easily for users, such as via “npm install”.

@DavidAnson
Copy link
Owner

I'm going to close this PR because you published the corresponding rule. Congrats, again! Happy to continue any discussion here, though.

@DavidAnson DavidAnson closed this Mar 7, 2023
@JoshuaKGoldberg
Copy link
Contributor

Hi @aepfli! Sorry for the week-long delay in responding - I was out on vacation. Just catching up now.

Additionally, I renamed the rule to max-one-sentence-per-line, which imho reduces the need to fix if a sentence is split over multiple lines :)

That does reduce the workload, nice idea 😄. But - in my experience with lint rules, it's generally good to not be too specific in the lint rule name. You never know when someone will make a feature request such as switching a rule's direction to enforce a minimum, or changing the number from "one" to a configurable option. JoshuaKGoldberg/sentences-per-line#10 is an example of someone asking for the rule to enforce not splitting sentences across lines. And long lint rule names are harder to read through.

the question now is, if @JoshuaKGoldberg wants to see those improvements ported to https://github.com/JoshuaKGoldberg/sentences-per-line or if i do start with a fork :) it is up to you @JoshuaKGoldberg :)

Definitely up for seeing those improvements ported to sentences-per-line! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants