-
-
Notifications
You must be signed in to change notification settings - Fork 737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added One Sentence per line rule (#66) #719
Conversation
Some initial thoughts after looking at this on my phone before bed:
|
YAML BUGI thought about extracting the yaml bug too. if there is a configuration parameter, which is an array of elements, and you have multiple default elements, it is not generating valid yaml
generates defaults: [
"a"
"b"
"c"
] instead of: defaults:
- "a"
- "b"
- "c" By using Configuration optionsI added 2 - one for words and one for punctuation. regarding the range checking, I can also provide a regex and make it configurable. This way, people can adapt to their needs. double-width characterscan you point me to a rule, so I can get a better glimpse - just to be on the safe side? regarding inline code spacesI have not really touched the logic of the original file, as I thought it would be an excellent first step to add this as a basic rule like it is. Then write tests, and after that, rethink maybe the structuring. As I am unfamiliar with the APIs etc., my next guess would be to explore the tokenizer and see if we can leverage this better. instead of parsing line by line, my hopes are high that the tokenizer already provides a lot of information (naive thinking). conclusionthe big question for me is, do we want to pursue on this adventure, do you @DavidAnson think it is worth, or more like generating too much overhead? |
I like the YAML change and would be happy to see a PR for that on its own. (See CONTRIBUTING.md) Here is an example of a rule supporting double-width characters: https://github.com/DavidAnson/markdownlint/blob/main/doc/md026.md I’d want tests to be part of the initial commit in order to be sure there aren’t false-positives. (Better parsing support should be coming soon via the micromark parser.) I like the idea of this rule and sentence-per-line is the pattern I use with my blog for years. I’m glad this option exists! However, I’m not sure about introducing optional rules at this point. Also, I haven’t done any playing around with the rule to see how it behaves in real life. The outcome of that may sway my thinking. :) |
lib/md054.js
Outdated
|
||
if ( | ||
lineEndings.includes(line[i]) && | ||
line[i + 1] === " " && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna have to deal with multiple spaces after a period, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #719 (comment) - this is also covered with this.
lib/md054.js
Outdated
if ( | ||
lineEndings.includes(line[i]) && | ||
line[i + 1] === " " && | ||
isCapitalizedAlphabetCharacter(line[i + 2]) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all sentences start with capitalized alphabet characters. For example, the typescript-eslint
project's naming is explicitly lowercase; our docs sometimes start with it lowercased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
improved, now people can define a regex, how a sentence start looks like - ^\s+(\w|[*_'"])
- also to cover emphasizes, etc. at the beginning, and I am checking the rest of the string, instead of just a little look ahead. I think this should make it easier to detect such cases
7df0645
to
7a0ec65
Compare
a9062b5
to
b5a53c9
Compare
I'm going to want to try this myself locally, but it could be a couple of days before I get a chance. In the meanwhile, there are some CI failures to look at. To be clear, I still have reservations about introducing an optional rule. I don't want to get your hopes up or waste your time. But maybe some of the improvements you've made here end up going into the separate implementation that exists today. |
Even if this won't be used, I had my fair share of fun and exploration, and I learned a lot about markdown-lint and considerations regarding such a tool. It was easy to start and a pleasure hacking around - I haven't had that for a while. :) so, thank you already for the time and input. Currently, the rule is no longer optional; I removed that for now, but this also generates most of the errors. All the other tests etc., still need to be prepared for this rule. It is a massive change in structure. Even the readme has a lot of violations. That was also the reason for my initial approach to an optional rule. If we activate this rule as a default, it is a significant breaking change - hence that we have to be careful how we handle this. Many people like the reasoning behind it, but communicating this with a release will be hard. |
That's nice to hear, thank you! PS - I agree this can't become required AND enforce the style - many people have other preferences. |
New configurable and fixable rule to fix ensure only one sentence per line. It is configurable, what defines the ending of a sentence as well as ignored words. Additionally to the new rule, i adapted the logic to create the yaml file, as it did not work properly with arrays. Signed-off-by: Simon Schrottner <[email protected]>
Signed-off-by: Simon Schrottner <[email protected]>
Signed-off-by: Simon Schrottner <[email protected]>
b5a53c9
to
c6ae62b
Compare
7c9d811
to
dd624c5
Compare
dd624c5
to
de10ce8
Compare
I rebased on to next - to remove #722 out of the changeset :) |
sidenote: the build is failing, because we do have a lot of violations within the code, and i am not sure, how to best exclude this rule :) without my optional approach :) |
I played around with this a bit just now using the latest state of the branch. Here's the text I used: # MD054/sentences-per-line
This should? Produce a violation.
So should! This example.
Abbreviations (e.g. like) these (i.e. should) not.
"Sentences in." Quotes should, too.
Pausing... for... thought... should not?
This rule does weird things if a sentence is
already wrapped. It should maybe unwrap in
cases like this? Some of what I think is wrong behavior seem like bugs (the first three scenarios seem to have code to handle that's not working), or missing behavior (sentences inside quotation marks), but it's less clear how to handle ellipses in a way that won't annoy half the audience half the time. However, the last example is much more significant - and it's the reason MD013/line-length does not offer fixes today. If this rule is willing to break lines to isolate sentences, it should arguably be able to re-combine lines to make sentences as well. Something like the following is a violation of the proposed rule in my mind: This is a
single sentence. But I worry that handling both directions may prove unreasonably challenging. I love that this rule exists as an option for folks, but scenarios like these reinforce my resistance to including it in the library for now. |
I understand your rational, and I am fine with this. I am just curious about your tests and i hope this is okay. I ran your test MD # MD054/sentences-per-line
This should? Produce a violation.
So should! This example.
Abbreviations (e.g. like) these (i.e. should) not.
"Sentences in." Quotes should, too.
Pausing... for... thought... should not?
This rule does weird things if a sentence is
already wrapped. It should maybe unwrap in
cases like this? which resulted in
Anyways, I am okay with your concerns and accept this. I am figuring out an easy way to include a custom rule within the already existing docker images. I'll likely need to start a discussion with the CLI docker image providers. It would be nice to have logic to include rules and let the image handle them independently. Thank you for the effort - at least a slight improvement regarding yaml generation was the outcome of my adventure, and I could contribute something. If you decide to allow |
I only recently discovered this “one sentence per line” rule/semantic line breaks and would love if it could be enforced by a linter. So I’m quite interested in this rule. @aepfli could you let me know if you figure out an easy way to include this or decide to make this rule available as a separate package? |
@mschoettle, there is https://github.com/JoshuaKGoldberg/sentences-per-line which is already available and served as the basis for this pr (but I made quite some improvements) - but you can use it if you want to declare it - you can add it as a custom rule to your markdown lint configuration. |
@aepfli Thanks for your understanding! I agree it's weird we saw different behavior. I did my testing via a fresh GitHub Codespace opened from this PR yesterday. I ran "build-demo" and then tested via the redirected demo site using "npx serve demo" and the current version of Chromium on Raspberry Pi OS. The new rule was definitely present and functioning because I saw violations from it and I was able to experiment with fixing them automatically as well. My guess is things behave differently in the browser environment, though that is fairly unusual. I'd be happy to look into this further if you can't tell why it worked differently for me. Regarding incorporating rules into CLI Docker containers, I happen to be the maintainer for those as well. :) My current thinking is to maybe create a second Docker image that builds on the base one for the CLI, and includes a variety of "third-party" rules. None would be activated by default, but all would be readily available in the Docker context. Does that seem interesting? |
BTW, I have a specific approach in mind for Docker if we go that route. Don't bother with a PR. |
the question now is, if @JoshuaKGoldberg wants to see those improvements ported to https://github.com/JoshuaKGoldberg/sentences-per-line or if i do start with a fork :) it is up to you @JoshuaKGoldberg :) |
Here's the idea for a supplementary Docker image: DavidAnson/markdownlint-cli2@next...thirdparty It simply includes everything here: https://www.npmjs.com/search?q=keywords:markdownlint-rule |
This rule is still "haunting" me somehow, so here is a different approach. Instead of parsing per line, I parse per replace the const relevantTokens = [];
for (let i = 0; i < params.tokens.length; i++) {
const token = params.tokens[i];
if (token.type === "paragraph_open" &&
params.tokens[i + 1].type === "inline") {
relevantTokens.push(params.tokens[i + 1]);
}
}
for (const relevantToken of relevantTokens) {
for (const token of relevantToken.children) {
const lineNumber = token.lineNumber;
if (token.type === "text") {
const content = token.content;
for (let i = 0; i < content.length - 2; i += 1) {
if (lineEndings.includes(content[i])) {
const sentence = sentenceStart.exec(content.substr(i + 1));
if (
sentence !== null &&
!isAfterIgnoredWord(ignoredWords, content, i)
) {
const spaces = sentence[1];
const pointInLine = token.line.indexOf(content) + i;
addError(
onError,
lineNumber,
null,
// eslint-disable-next-line max-len
content.substr(Math.max(0, i - (contextSize / 2)), contextSize),
[ pointInLine, spaces.length ],
{
"lineNumber": lineNumber,
"editColumn": pointInLine + 2,
"deleteCount": spaces.length,
"insertText": "\n"
}
);
}
}
}
}
}
} |
Signed-off-by: Simon Schrottner <[email protected]>
This image is now available for early use (from the |
I went ahead for now and created https://github.com/aepfli/markdownlint-rule-max-one-sentence-per-line - there is also the backlink to @JoshuaKGoldberg's first version missing - but I cover now a lot of cases, and also handle indentation properly (or at least I think I handle it properly. Additionally, I renamed the rule to Feedback is highly welcome :) |
Is there any way to have these extra rules available within pre-commit as well? |
@mschoettle, if you use the "-docker" ids with precommit, this should work after I add new entries for the -rule container image here: https://github.com/DavidAnson/markdownlint-cli2/blob/main/.pre-commit-hooks.yaml (And after the next publish to take that change from "next" branch to "main".) |
@mschoettle The repos:
- repo: https://github.com/DavidAnson/markdownlint-cli2
rev: next
hooks:
- id: markdownlint-cli2-rules-docker |
@aepfli Builds of the Docker container image |
@DavidAnson do you think it make sense to also add all those npm rules to https://github.com/DavidAnson/vscode-markdownlint ? background, we do have Hugo docs running, and do all the linting with docker files and make targets - but if a contributor is using vscode - it will try to load the rule and throw an exception:
If yes, i am gladly opening a pr - i am also not sure, if it might be worth discussing this in the vscode extension repo - but it seem like we do discuss a lot already about this within this ticket :D |
@aepfli That’s a reasonable idea to consider, but I don’t see a good path to it. There are two main concerns I have: The first is that everyone would be opted into loading/parsing custom rules rather than choosing to do so by referencing a dedicated Docker container image. The new rules add unknown size and complexity and take some control out of the user’s hands by opting them into a specific version of each rule. Without a way to opt out, the support burden for all of this will be on me. Relatedly, the second reason is that extension scripts get bundled and need to work in web browser context for scenarios like github.dev. I have made specific changes to the library, CLI host, and extension to allow this and there is no guarantee custom rules will have done the same. There are some coding patterns that cannot be handled easily and I don’t want to introduce inconsistency if things can’t work the same in all contexts. The theme here is that doing so is likely to sign me up to support code I didn’t write and have no knowledge of. Unlike the Docker scenario where installing custom rules might be difficult, most VS Code scenarios with custom rules should have a way to do so more easily for users, such as via “npm install”. |
I'm going to close this PR because you published the corresponding rule. Congrats, again! Happy to continue any discussion here, though. |
Hi @aepfli! Sorry for the week-long delay in responding - I was out on vacation. Just catching up now.
That does reduce the workload, nice idea 😄. But - in my experience with lint rules, it's generally good to not be too specific in the lint rule name. You never know when someone will make a feature request such as switching a rule's direction to enforce a minimum, or changing the number from "one" to a configurable option. JoshuaKGoldberg/sentences-per-line#10 is an example of someone asking for the rule to enforce not splitting sentences across lines. And long lint rule names are harder to read through.
Definitely up for seeing those improvements ported to sentences-per-line! 🚀 |
New configurable and fixable rule to fix ensure only one sentence per line. It is configurable, what defines the ending of a sentence as well as ignored words.
Additionally to the new rule, i adapted the logic to create the yaml file, as it did not work properly with arrays.
Signed-off-by: Simon Schrottner [email protected]