[parser] Implement full featured CSS parser #2

alexander-akait · 2023-05-19T10:45:52Z

Just an idea for the future and for further discussion: maybe we can join forces and write a full-featured CSS parser plus at-rule/value parsers from scratch. I am afraid we can't rewrite postcss because of some specific logic (and it would probably take longer), so uniting around a CSS parser would be great for any JS tooling. We can open an issue for this

Briefly, the situation:

We have postcss, and it has certain problems that, unfortunately, have not been resolved for a long time, such as the lack of a spec-compliant CSS tokenizer and parser, and of selector, at-rule and value parsers
There is the csstree parser, but it is pretty slow at resolving issues
There are Rust-based parsers like lightningcss and swc, but unfortunately they are not extensible enough to support all syntaxes. This is probably solvable, so it's just a discussion point for now
There is csstools with its own value and at-rule parsers
We have postcss-value-parser, postcss-values-parser and postcss-selector-parser. All of them have rather serious limitations and are not very actively maintained. They solve almost all current problems, but whenever a new syntax appears it is usually a problem. Another big problem is the postcss design: we need to reparse selectors, values and at-rules in each rule, which is very bad for performance
We have to decide whether we will use JS-based or Rust-based tooling (I support either decision; the main thing is to find an approach that suits everyone)
Combining efforts will allow us to solve problems more quickly and avoid duplicated work
We need to think about the AST and its structure: we can align with CSSOM, or we can design our own AST (yes, it will most likely be identical to CSSOM but with some additional nodes/properties) to enable deep analysis and fixes
We need to think about non-standard CSS syntaxes (like Sass/Less/etc.) and how to design for extensibility. But we can start with only CSS and improve it later. CSS is error-tolerant by default, so everything we can't parse will be a ListOfComponentValues
Any other thoughts?

Feel free to give feedback

I decided to start the discussion here, as I think this is the most appropriate place; in the future we may move it or break it into more detailed parts.

ybiquitous · 2023-05-19T12:02:32Z

@alexander-akait Thanks for raising this parser discussion! This is an exciting topic.

So, do you think this issue covers stylelint/stylelint#5586?

I also think @csstools/* tools like @csstools/media-query-list-parser or @csstools/css-tokenizer by @romainmenke are great achievements in this area.

ybiquitous · 2023-05-19T12:08:23Z

We have postcss, and it has certain problems that, unfortunately, have not been resolved for a long time, such as the lack of a spec-compliant CSS tokenizer and parser, and of selector, at-rule and value parsers

Does PostCSS's author recognize these problems? And can we not add parser improvements upstream to PostCSS?

Because Stylelint is currently part of the PostCSS ecosystem, I think it would be best for backward compatibility if PostCSS accepted requests for parser improvements.

romainmenke · 2023-05-20T08:54:16Z

Thank you for bringing this up and for gathering all this info 🙇

I think the performance of PostCSS and the CSS tooling ecosystem built around PostCSS is a complicated subject :)

On the one hand PostCSS itself is really really fast. But the way it is designed means that there will be a lot of duplicate work on selectors, values, at-rule preludes, ...

It also is written in JS, so it's bound by the constraints of JS engines.
There are a lot of factors here, but in short, it will never be Rust.

Imho it isn't realistically possible to create a new parser that solves everything :

equally fast or faster than PostCSS
can support non-standard syntaxes (less, scss, css-in-js, ...)
can support an ecosystem of plugins
will still be fast when a lot of plugins are run
can parse CSS in its entirety
has a user friendly API surface
has a complete and correct Object Model
is written in JavaScript

If the constraint that is most important is performance, then it makes more sense to me that Rust or something similar is chosen as a starting point and that other aspects are sacrificed.

However, what I consider to be the most valuable part of PostCSS is not performance but the community and existing adoption.

There is a very large chance that people already have PostCSS as part of their stack, so the barrier to adding a tool based on PostCSS is low.

There is also very little friction within the active PostCSS community, a lot of people are open to collaborate towards a common goal. ( like this here :) )

Starting a community from scratch around a new toolset is not something I am personally interested in :) Not that what I am or am not interested in should stop anyone

Does PostCSS's author recognize these problems?

Yes : postcss/postcss#1145

It's a known issue that it is a waste that each plugin needs to parse values, selectors, at-rule preludes over and over again.

And can we not add parser improvements upstream to PostCSS?

I tried and "succeeded" : postcss/postcss#1812

I wrote more about why I advised not to merge that : postcss/postcss#1145 (comment)

TL;DR: the cost of rolling out that change was too high.
It would have fractured the PostCSS ecosystem in a way that it might not recover from.

But it would have made it possible for multiple consumers to parse from an existing token array instead of starting from a string. For a tool that is mostly read heavy like Stylelint it would have meant a serious performance gain.

PostCSS as a host/driver for plugins just works really well and the reason that it is successful is also the reason why we have a performance issue. By hiding a lot of complexity and only exposing a limited Object Model it is much easier to create a simple plugin. But it becomes harder to have a performant "tool chain".

My current approach in postcss-preset-env for better performance is to try and guard each parsing of values, selectors, ... with a small test.

If I want to create a fallback for the ic unit, first check if ic exists as a substring of the value. It might be part of a word like pick and that is fine, but most of the values would be skipped without further parsing.
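A minimal sketch of that guard, assuming a hypothetical `expensiveParse` helper that stands in for a real value parser (it is not the postcss-preset-env implementation). The cheap substring test runs first, so most values never reach the parser:

```javascript
// Hypothetical stand-in for an expensive value parser (not a real library call).
let parseCount = 0;
function expensiveParse(value) {
	parseCount++;
	// Split into number and unit pairs such as "2ic" or "10px".
	return [...value.matchAll(/(-?\d*\.?\d+)([a-z%]+)/gi)].map((m) => ({
		number: Number(m[1]),
		unit: m[2].toLowerCase(),
	}));
}

// Cheap guard: only parse when "ic" appears somewhere in the value.
// False positives (e.g. "pick") are fine, they just cost one parse.
function findIcDimensions(value) {
	if (!value.toLowerCase().includes('ic')) return [];
	return expensiveParse(value).filter((d) => d.unit === 'ic');
}
```

With this shape, `findIcDimensions('10px')` returns early without calling the parser, while `findIcDimensions('pick')` pays for one parse and still returns an empty list.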

We might be able to do similar things in Stylelint?
Best to discuss that in its own issue.

Something that I haven't tried yet, but that I think could work is to cache parsed values.

a shared cache between plugins, packages,...
source string is the cache key
cache value contains
- tokens
- optionally more specialized results (component values, media queries, ...)

Each time you take something out of the cache the entry is removed.
If you didn't mutate it, or if you produced more useful parsed values, you add them back to the cache.

This would be extremely sensitive to bugs and any bug would be hard to fix.
But it would allow read heavy tasks to share work.
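The take-and-return idea could look roughly like the sketch below. All names (`sharedCache`, `take`, `put`, `toyTokenize`) are hypothetical, and the whitespace splitter is only a placeholder for a real tokenizer:

```javascript
// Hypothetical shared cache, keyed by the source string.
const sharedCache = new Map();

// Taking an entry removes it, so two consumers never hold the same object.
function take(source) {
	const entry = sharedCache.get(source);
	sharedCache.delete(source);
	return entry;
}

// Callers put the entry back only if they did not mutate it.
function put(source, entry) {
	sharedCache.set(source, entry);
}

let tokenizeCalls = 0;
// Stand-in tokenizer: splits on whitespace. A real one would follow the CSS spec.
function toyTokenize(source) {
	tokenizeCalls++;
	return source.split(/\s+/).filter(Boolean);
}

// A read-only consumer: reuses cached tokens when present and returns them afterwards.
function countTokens(source) {
	const entry = take(source) ?? { tokens: toyTokenize(source) };
	const count = entry.tokens.length;
	put(source, entry); // nothing was mutated, so it is safe to share again
	return count;
}
```

A consumer that mutates the tokens would simply skip the `put` call, which is where the bug sensitivity comes from: one forgotten rule and a stale entry leaks to every other reader.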

Also best to discuss this in its own issue.

My current goal with packages like @csstools/css-tokenizer is to lower the barrier to creating high quality parsers for CSS. I want people to have great tooling, even for modern syntax. I don't want there to be a gap of years between a feature landing in browsers and tooling catching up.

Because it is unopinionated, follows the CSS specification, and doesn't support non-standard syntax it is also really stable. Either it implements the specification correctly or it has a bug and a bug can always be fixed in a patch release. (we might still do semver major from time to time, but these should be rare)

Many things can also be done at the tokenizer level:

upper vs. lower casing of idents/functions
unknown units
normalizing whitespace
removing comments
...
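As an illustration of token-level work, here is a sketch over a hypothetical `[type, text]` token shape (real tokenizers, including @csstools/css-tokenizer, carry more data per token). It removes comments, collapses whitespace and lower-cases idents without building any tree:

```javascript
// Hypothetical token shape: [type, text]. Real tokenizers carry more data.
const tokens = [
	['ident', 'COLOR'],
	['colon', ':'],
	['whitespace', '   '],
	['comment', '/* brand */'],
	['whitespace', '\n'],
	['ident', 'Red'],
];

function cleanTokens(input) {
	const out = [];
	for (const [type, text] of input) {
		if (type === 'comment') continue; // remove comments
		if (type === 'whitespace') {
			// normalize whitespace: collapse runs into a single space
			if (out.length > 0 && out[out.length - 1][0] === 'whitespace') continue;
			out.push(['whitespace', ' ']);
			continue;
		}
		// lower-case idents, leave every other token untouched
		out.push([type, type === 'ident' ? text.toLowerCase() : text]);
	}
	return out;
}

const serialized = cleanTokens(tokens).map(([, text]) => text).join('');
```

Here `serialized` is `color: red`. Blanket lower-casing is only for demonstration; some idents (custom properties, for instance) are case-sensitive, so a real rule would be more selective.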

On top of the tokenizer there are the parser algorithms. They are currently limited and only implement the basics for component values. Ideally we extend these to cover more of the CSS syntax.

These allow you to do more, because structures like blocks, functions are fully parsed.
But there isn't any Object Model specific to your context.

To actually have a useful Object Model another layer is needed: specialized parsers that are only invoked when relevant, such as the media query list parser.

This has a complete Object Model but that is also what makes it massive. There are so many node types in this sub-syntax alone.

We need to think about non-standard CSS syntaxes (like Sass/Less/etc.) and how to design for extensibility. But we can start with only CSS and improve it later. CSS is error-tolerant by default, so everything we can't parse will be a ListOfComponentValues

I don't personally use non-standard CSS syntax, everything is plain CSS in a file that has a .css extension. (I am not a frontend developer, so more accurate to say that the team I work in writes plain CSS)

My main reasons not to support these are that they do not have a true standards body behind them and that I lack familiarity with these syntaxes.

Correctly following one specification is difficult enough.
Also adding support for several syntaxes that do not even have a specification is not something I want to spend my time on.

But having said that, all tools I've created are composable and modular.
You can rewrite the token stream to make scss look like css, or have different parsing algorithms and then pass on the result of that to one of the specialized parsers like for media queries.

I want people to be able to re-use the complex and hard parts.

Some questions :

are there specific parts of CSS that lack detailed parsers and that this lack of a parser is blocking specific features?
has anyone done any research on what is fast enough for specific tools (minifiers, bundlers, linters, ...) ¹

¹ Faster is always better, but at some point people don't notice the gains anymore.

ybiquitous · 2023-05-20T10:52:10Z

@romainmenke Thanks for sharing postcss/postcss#1145. Now I understand the context very well. 👍🏼

ybiquitous · 2023-05-22T13:25:13Z

@romainmenke I'll try answering your questions as far as I know:

are there specific parts of CSS that lack detailed parsers and that this lack of a parser is blocking specific features?

I don't remember completely, but this project may have some blockers due to insufficient parser libraries.

has anyone done any research on what is fast enough for specific tools (minifiers, bundlers, linters, ...)

Unfortunately, I don't know.

ybiquitous · 2023-05-22T14:09:02Z

I've tried listing the parser libraries used by Stylelint. Some are barely maintained 😓

| Name | Version | Last published | Unpacked size |
|:-----|:--------|:---------------|---------------:|
| `postcss` | 8.4.23 | Apr 20, 2023 | 194KB |
| `postcss-media-query-parser` | 0.2.3 | Oct 27, 2016 | n/a |
| `postcss-resolve-nested-selector` | 0.1.1 | Feb 19, 2016 | n/a |
| `postcss-safe-parser` | 6.0.0 | Jun 14, 2021 | 5.2KB |
| `postcss-selector-parser` | 6.0.13 | May 16, 2023 | 186KB |
| `postcss-value-parser` | 4.2.0 | Nov 29, 2021 | 27KB |
| `@csstools/css-parser-algorithms` | 2.1.1 | Apr 10, 2023 | 31KB |
| `@csstools/css-tokenizer` | 2.1.1 | Apr 10, 2023 | 59KB |
| `@csstools/media-query-list-parser` | 2.0.4 | Apr 10, 2023 | 122KB |
| `@csstools/selector-specificity` | 2.2.0 | Mar 21, 2023 | 17KB |
| `css-tree` | 2.3.1 | Dec 15, 2022 | 1.2MB |

Script used to create the table

import { spawnSync } from 'child_process';

const allDeps = JSON.parse(
	spawnSync('npm', ['view', '--json', 'stylelint@15.6.2', 'dependencies']).stdout.toString(),
);

const parserDeps = [
	'postcss',
	'postcss-media-query-parser',
	'postcss-resolve-nested-selector',
	'postcss-safe-parser',
	'postcss-selector-parser',
	'postcss-value-parser',

	'@csstools/css-parser-algorithms',
	'@csstools/css-tokenizer',
	'@csstools/media-query-list-parser',
	'@csstools/selector-specificity',

	'css-tree',
];

const dateFormat = new Intl.DateTimeFormat('en', { dateStyle: 'medium' });
const sizeFormat = new Intl.NumberFormat('en', { notation: 'compact' });

console.log(`| Name | Version | Last published | Unpacked size |`);
console.log(`|:-----|:--------|:---------------|---------------:|`);

for (const name of parserDeps) {
	const version = allDeps[name];
	if (!version) {
		throw new Error(`${name} is not in dependencies`);
	}

	let dep = JSON.parse(
		spawnSync('npm', ['view', '--json', `${name}@${version}`]).stdout.toString(),
	);
	if (Array.isArray(dep)) {
		dep = dep.at(-1);
	}

	const lastPublished = dateFormat.format(new Date(dep.time[dep.version]));
	const size = dep.dist.unpackedSize ? sizeFormat.format(dep.dist.unpackedSize) + 'B' : 'n/a';

	console.log(
		`| [\`${name}\`](https://www.npmjs.com/package/${name}) | ${dep.version} | ${lastPublished} | ${size} |`,
	);
}

EDIT: This list reflects Stylelint 15.6.2

ybiquitous · 2023-05-22T14:30:02Z

Problems with dependent parsers:

unmaintained parsers need to be replaced with alternatives
plugin authors need to be notified when a parser is unmaintained or replaced
- https://github.com/stylelint/stylelint/blob/6c85850f1135085f76948beede43cb6933a2cd60/docs/developer-guide/rules.md?plain=1#L96-L101
css-tree is large and may become unmaintained

romainmenke · 2023-05-22T15:16:49Z

Of that list only these seem immediately problematic to me :

postcss-resolve-nested-selector
postcss-media-query-parser

They have not been updated even though the CSS specifications that are relevant to them changed years ago.

postcss-value-parser has a few open issues which are hard to fix but these are edge cases, not entire unsupported features. Maybe this one can be handled more on a case by case basis?

For css-tree, the situation is hard for me to judge. It might be a temporary gap in active maintenance?
Would be good to reach out.

I really like the syntax checking it offers and it's not trivial to re-create this feature.

alexander-akait · 2023-05-23T13:21:33Z

Oh, there are a lot of messages

Imho it isn't realistically possible to create a new parser that solves everything :

equally fast or faster than PostCSS
can support non-standard syntaxes (less, scss, css-in-js, ...)
can support an ecosystem of plugins
will still be fast when a lot of plugins are run
can parse CSS in its entirety
has a user friendly API surface
has a complete and correct Object Model
is written in JavaScript

I fully disagree:

The PostCSS tokenizer is not CSS-compliant. We already have problems with it, and trust me, there will be more and more of them in the future, and once again I will have to create more hacks. If necessary, I can list and point to all of them
Because we lack a good tokenizer, we don't have a ListOfComponentValues for declarations, at-rules, selectors, etc.
Because of the above, we need to reparse them in each plugin (very, very bad for performance)
Because of the above, we don't have a good CSS AST. For example, look at the Comment node: working with comments in postcss is hell, and we have around 9k hacks to make it work (and to be able to get their content). Compare the babel/acorn comment implementations: there are no comments in the AST, and you can easily identify trailing and remaining comments
It can be extensible, even at a lower level; look at acorn, for example, and how it is implemented, no need
Postcss has not made any improvements for a long time; it just froze at some stage of development
No grammar parsing, so stylelint has two CSS parsers, one for grammar and another for syntax analysis (csstree and postcss)
No ability to associate structures, results of multiple parsing passes, grammar parsing or other information with AST nodes; you need to stringify everything and return it

By default the CSS tokenizer (and the CSS parser too) is error-tolerant, so we don't need to worry much about non-standard CSS, because per the spec it will be a ListOfComponentValues if we can't apply a grammar.

If the constraint that is most important is performance, then it makes more sense to me that Rust or something similar is chosen as a starting point and that other aspects are sacrificed.
However, what I consider to be the most valuable part of PostCSS is not performance but the community and existing adoption.
There is a very large chance that people already have PostCSS as part of their stack, so the barrier to adding a tool based on PostCSS is low.
There is also very little friction within the active PostCSS community, a lot of people are open to collaborate towards a common goal. ( like this here :) )
Starting a community from scratch around a new toolset is not something I am personally interested in :) Not that what I am or am not interested in should stop anyone

I propose that we not appeal to emotion but return to reality: if the tool is not going to solve problems and does not give us the opportunity to solve them, then it's time to change the tool.

Some questions :

are there specific parts of CSS that lack detailed parsers and that this lack of a parser is blocking specific features?
has anyone done any research on what is fast enough for specific tools (minifiers, bundlers, linters, ...) ¹

Yes and yes. We simply have incredible performance issues and bugs

Now let's get back to being more constructive:

By performance, I don't mean that we should have the speed of C++ or Rust; it should be acceptable. Here is a clear example of the problem: cssnano has around 14-16 plugins under the hood, and in almost every one we parse selectors and values. It's the same here in stylelint. If this is not a clear performance problem, then I give up immediately
All parsers are split into small packages, like postcss-value-parser, postcss-selector-parser, postcss-media-parser, postcss--again-again-again-parser. Some of them are abandoned, some are simply not maintained; they are not coordinated with each other, have different ASTs and fail to resolve issues promptly
We do not have a proper grammar parser, which is why the code is full of complex loops and even more complex conditions. Here is a not-so-small part of my code; reading and maintaining it takes a crazy effort. A grammar parser would make it possible to simplify this by at least half
I already mentioned the problems with comments: if you need to get a comment like /* i-need-to-ignore-the-next-line */ (it can be in any place), you need to do magic things
I've been talking about this and pointing out these problems for a long time, but there is no movement
It is worth adding that I have already implemented a tokenizer and parser in Rust https://github.com/swc-project/swc/blob/main/crates/swc_css_parser/src/lexer/mod.rs, but I am here because I need a JS solution and I would like to consolidate our efforts here for the stylelint team, @csstools and other teams
I fully understand that this is a big task and that we don't need to replace everything right now; I would add that this is not even possible

That is why I suggest the following steps:

Start working with the CSS tokenizer: implement it and test it
Unite around the value/at-rule and selector parsers and reuse this tokenizer there
Agree on a basic AST node (where and how we store positions, where and how we store comments, etc.)
Improve PostCSS to allow storing structures, so multiple plugins can reuse the results of parsing
Implement a general CSS parser (as in the spec) and release postcss-new-parser (maybe with a better name), which will generate the PostCSS AST but using our tokenizer and parser
Deprecate postcss-value-parser (it is already just a tokenizer-level parser, so we won't need it anymore), and make postcss-selector-parser and postcss-at-rules-parser (there are multiple parsers) utilities for our parser
Focus on grammar parsing and simplify the parsers above once we have it
Focus on extensibility: we can allow overriding tokenizer steps and the parser (they are all in the spec, we don't need to invent anything new), so we can implement basic support for SCSS/Less/etc. I think we have enough basic structures (like Rule/AtRule/Declaration/etc.)
At this point we already have our own full-featured parser with grammar parsing. If we want, we can start to replace PostCSS and implement transformers with plugin support (and PostCSS AST support), so postcss plugins will still work
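The step about storing structures so that several plugins can reuse parse results might be sketched as a side table keyed by node. This is only an illustration under assumptions: `parsedValues`, `parseValue` and `getParsedValue` are hypothetical names, not PostCSS APIs, and the node is a plain object standing in for a declaration:

```javascript
// Hypothetical side table: node object -> { source, parsed }.
const parsedValues = new WeakMap();

let parseCalls = 0;
// Stand-in for a real value parser.
function parseValue(source) {
	parseCalls++;
	return source.split(/\s+/).filter(Boolean);
}

// Returns the parsed value for a declaration-like node, reparsing only when
// the node's value string changed since the last parse.
function getParsedValue(node) {
	const cached = parsedValues.get(node);
	if (cached && cached.source === node.value) return cached.parsed;
	const parsed = parseValue(node.value);
	parsedValues.set(node, { source: node.value, parsed });
	return parsed;
}

// Two "plugins" reading the same node share one parse.
const decl = { prop: 'margin', value: '0 auto' };
const first = getParsedValue(decl);
const second = getParsedValue(decl);
```

A `WeakMap` keeps the cached structure from outliving the node, and comparing the stored source string against `node.value` gives a simple invalidation rule when another plugin rewrites the value.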

Some steps can be split into several; I am fine with that. I would also like to add: I've spent quite a bit of time on a lot of tools and parsers in the postcss ecosystem, and I'm honestly tired. This is perhaps my last attempt to consolidate all this. If it fails again, I will be very upset again, and ultimately this will lead to us simply losing most of our community in the near future

ybiquitous · 2023-05-23T13:52:54Z

@alexander-akait What a big challenge! 👍🏼 👍🏼 👍🏼

I totally agree with a JS solution rather than Rust, since there is a big JS/CSS community here.

Additionally, I agree with starting with a CSS tokenizer and value/at-rule/selector/etc parsers. We will be able to try them in the Stylelint codebase easily.

romainmenke · 2023-05-23T16:44:49Z

By performance, I don't mean that we should have the speed of C++ or Rust; it should be acceptable. Here is a clear example of the problem: cssnano has around 14-16 plugins under the hood, and in almost every one we parse selectors and values. It's the same here in stylelint. If this is not a clear performance problem, then I give up immediately

Yeah, the performance issue is absolutely clear, I know it very well :)
But my point was more that I don't think users of PostCSS see/experience this problem.

LightningCSS for example is (on the surface) a combo of :

postcss-import
postcss-preset-env
cssnano

Even though it is so much faster, people aren't really that interested. They think it is very cool, but very few are switching to it.
The cost of switching tools is higher than the cost of waiting a few 100ms, even if 90% of that time is useless re-parsing.

I've spent quite a bit of time on a lot of tools and parsers in the postcss ecosystem, and I'm honestly tired, and perhaps this is my last attempt to somehow consolidate all this, if it fails again, I will be upset too much again

I can understand this, and I feel this too, but this is also exactly why I am hesitant.

How can we do a project like this sustainably?

funding
sufficient maintainers
ease of adoption
...

The tokenizer is not something we have to start all over, right?
Is there a reason we can not use our existing tokenizer?

https://github.com/csstools/postcss-plugins/tree/main/packages/css-tokenizer#readme

ybiquitous · 2023-05-24T13:27:31Z

How can we do a project like this sustainably?

Yes, this is really a headache for us. 😓
But at least, I believe we can provide a place where the Stylelint community members can easily join.

Is there a reason we can not use our existing tokenizer?

Personally, I think @csstools/css-tokenizer is a great starting point.

silverwind · 2023-05-25T17:16:51Z

Would https://github.com/servo/rust-cssparser be suitable to integrate? It's the CSS parser that Firefox uses. Though its docs do indicate it does not parse into selectors or properties, so it's probably only half a parser.

silverwind · 2023-05-25T22:33:08Z

We need to think about non-standard CSS syntaxes (like Sass/Less/etc.) and how to design for extensibility. But we can start with only CSS and improve it later. CSS is error-tolerant by default, so everything we can't parse will be a ListOfComponentValues

CSS preprocessors are on their way out with CSS now having variables, nesting and color modification. I see no compelling reason anymore to use them.

ybiquitous · 2023-05-26T00:42:58Z

Would servo/rust-cssparser be suitable to integrate?

It's interesting. But I believe it may be hard for our community to maintain Rust code.

CSS preprocessors are on their way out with CSS now having variables, nesting and color modification. I see no compelling reason anymore to use them.

I think it's important to keep backward compatibility and extendability for CSS-like syntaxes (Sass/Less etc.) because there are big communities already. At least, we should allow anyone to extend and customize our new parser for such syntaxes.

silverwind · 2023-05-26T09:15:45Z

I think it's important to keep backward compatibility and extendability for CSS-like syntaxes (Sass/Less etc.) because there are big communities already. At least, we should allow anyone to extend and customize our new parser for such syntaxes.

One way of supporting preprocessors would be to transpile the Sass/Less code with source maps to CSS, lint the CSS, and then report back the errors with the position obtained through the source map. Maybe this is already how it works with the existing customSyntax option, not sure.

ota-meshi · 2023-05-26T11:11:32Z

I personally don't think it's a good idea to rely on using source maps. I think autocorrection breaks syntax in most cases.

silverwind · 2023-05-26T11:20:27Z

Right, --fix would not work via such a sourcemap transformation I assume.

Mouvedia · 2023-05-26T13:08:29Z

Do we have a flamegraph of node_modules/.bin/jest --runInBand ?
In short we need some metrics/profiling first.

romainmenke · 2023-05-27T22:10:32Z

https://github.com/stylelint/stylelint/blob/main/lib/rules/color-named/index.js#L63-L128

color-named is a good example of a performance issue.

It eagerly parses declaration values with postcss-value-parser, without a fast abort.

It then walks the value AST and again eagerly parses with colord.

We also have a color value parser built on top of our tokenizer and parser algorithms :
https://github.com/csstools/postcss-plugins/tree/main/packages/css-color-parser#readme

The input to this specialized parser is not a string but component values.
So there isn't any expensive serializing and re-parsing to make tools work together.

As much logic as possible can be done first at the token level, then on component values, and only when really needed on fully parsed color values.

Each step only does the minimal amount of work.

alexander-akait · 2023-05-28T20:43:58Z

@romainmenke

The tokenizer is not something we have to start all over right?
Is there a reason we can not use our existing tokenizer?

https://github.com/csstools/postcss-plugins/tree/main/packages/css-tokenizer#readme

I am fine with it.

My suggestions are:

move it to its own repo; we can still be under the csstools org, just to avoid mixing postcss-plugins work and parser work
maybe we can move all parser related things there?
I looked at the code and it looks like a fully CSS-compliant tokenizer,
Also I found some possible memory and perf improvements (for example we store each character - https://github.com/csstools/postcss-plugins/blob/main/packages/css-tokenizer/src/tokenizer.ts#L92; it means that if we have 3 MB of CSS, we will store the tokens plus 3 MB of characters, which is not good)
if we want to support Sass/Less/any custom syntax, we need to make it extensible. It is not hard: we should just allow overriding the token logic here https://github.com/csstools/postcss-plugins/blob/main/packages/css-tokenizer/src/tokenizer.ts#L81 and running a custom function, so this section needs to be refactored
Implement callbacks on each token (useful for bundlers): to bundle CSS you don't need the full AST (in most cases), and to avoid running loops twice we can implement callback options
because it is TypeScript we need to verify the output, because TypeScript generates additional code and that can affect performance significantly in some cases
comments in the source code with links to the CSS spec and descriptions would be useful; this is not necessary, but developers usually do it for convenience (look at acorn/typescript/etc. for examples)
we need to move it into one file, because each import/require degrades start time (this is pretty obvious for a parser: for each file Node.js executes fs calls, and they cost time)
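The per-token callback suggestion can be sketched with a toy scanner. It is deliberately not spec compliant and is not the @csstools/css-tokenizer API; `RULES` and `scan` are hypothetical names. The scanner reports only a type and offsets, so it allocates neither a token array nor a per-character copy of the input:

```javascript
// Toy scanner, NOT spec compliant: it only knows comments, whitespace,
// ident-like words and single-character delimiters.
const RULES = [
	['comment', /\/\*[\s\S]*?\*\//y],
	['whitespace', /\s+/y],
	['word', /[-\w]+/y],
];

// Calls onToken(type, start, end) for each token instead of collecting them.
function scan(css, onToken) {
	let pos = 0;
	while (pos < css.length) {
		let type = 'delim';
		let end = pos + 1;
		for (const [name, re] of RULES) {
			re.lastIndex = pos; // sticky regexes match exactly at lastIndex
			if (re.test(css)) {
				type = name;
				end = re.lastIndex;
				break;
			}
		}
		onToken(type, pos, end);
		pos = end;
	}
}

// A consumer that only cares about words slices them out on demand.
const css = 'a{color:red}/* note */ b { color : blue }';
const words = [];
scan(css, (type, start, end) => {
	if (type === 'word') words.push(css.slice(start, end));
});
```

A bundler-style consumer can react to the few tokens it needs in a single pass, and a full parser can still be layered on the same callback by collecting tokens itself.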

Maybe I missed something else, but that is not a problem; we can discuss it in the repository if we all agree

romainmenke · 2023-05-28T21:32:08Z

move it to own repo, we can still be under csstools org, just to avoid mixing postcss-plugins works and parser works

I don't have ownership, admin or publish permissions for either the github org or the npm org for csstools. Either that needs to change and must be extended at least to you (@alexander-akait) or a different space must be created for this effort.

It might be better to do a clean slate start.
(We can transfer existing code, test suites, ...)

I personally prefer to work in a mono repo because that makes it easier to spot regressions.
Are you ok with having a single git repository for all tokenizer, parser related work?

I agree on all points of feedback related to the current tokenizer.

ybiquitous · 2023-05-29T09:17:56Z

@alexander-akait @romainmenke If you wish, providing repositories for parsers etc. under the github.com/stylelint org may be possible.

@stylelint/owners Any thoughts?

ntwb · 2023-05-30T08:07:33Z

If you wish, providing repositories for parsers etc. under the github.com/stylelint org may be possible.
No objections to hosting under the github.com/stylelint org

Would servo/rust-cssparser be suitable to integrate?

It's interesting. But I believe it may be hard for our community to maintain Rust code.

This is something to be aware of. Historically, Stylelint has had difficulty attracting contributors at various times. It has been quite challenging at times both to allow Stylelint to be extended by plugins and to have Stylelint depend on other packages while keeping this ecosystem maintained

Another consideration is the eslint/rfcs#99

This RFC specifies a plugin format that would allow ESLint plugins to fully define their own languages, effectively expanding ESLint from a JavaScript-focused linter into a more general-purpose linter.

The goal here is to take the boring parts of a linter (file finding, configuration, etc.) and separate that out from the JS-specific parts so no one needs to rebuild the boring parts over and over again.

I've not fully thought all of this through, but if we are writing a new tokenizer/parser, having ESLint under the hood to simplify and streamline maintenance of the underlying CLI and API aspects of Stylelint is also worth thinking about, IMHO

ybiquitous · 2023-05-31T14:03:43Z

@ntwb Thanks for the comment. As you mentioned, Stylelint has needed more maintainers.

I personally think @alexander-akait's suggestion is great not only for the Stylelint community but also for other JS/CSS communities. However, unfortunately, supporting the challenge under the Stylelint organization may be risky because of that maintainer shortage. 😓

alexander-akait · 2023-06-01T17:43:03Z

@romainmenke

I personally prefer to work in a mono repo because that makes it easier to spot regressions.
Are you ok with having a single git repository for all tokenizer, parser related work?

Yes, of course, tokenizer/parser/traverser/serializer, these are things related to the parser process, so it would be great to have them all in one place.

@ntwb

Another consideration is the eslint/rfcs#99
I've not fully thought all of this through, but if we are writing a new tokenizer/parser, having ESLint under the hood to simplify and streamline maintenance of the underlying CLI and API aspects of Stylelint is also worth thinking about, IMHO

It's so funny, because I offered to do this 5 years ago, when we were just starting work, but was refused everywhere, now it's official.

And I started from a simple idea: we should make a core for any linter. CLI logic, rule logic, configuration(s), ignoring and extending, options for parsers and rules, fix logic, etc.: we had to duplicate all of this. My reasoning was that we could avoid this by collaborating and combining the work, and now I see how it has all come to this. But unfortunately it is a little late: our code has become more complicated and it would now be quite difficult to rewrite all this (yeah, we can just create a rule and run stylelint inside that rule, but that looks like one big monolithic and poorly configurable rule).

But now we can avoid some mistakes too

JS has https://github.com/estree/estree, so any parsers that follow estree are compatible, and I think we have to do the same. Yes, it takes time and I definitely can't do it alone, BUT if we do this, we will become independent of the parser and its implementation in the future: Rust/JS/Zig/C++/C, whatever you want. I still think that rewriting everything in Rust is a utopia at this moment (the future is foggy and we do not know what will happen tomorrow, but we can influence it). Yes, it would be great and it would give us good performance and much more, but if we look at the world realistically, we will see that, unfortunately, not many people know Rust, and most of our users know only JS (some TS too). But this does not mean that we should not build the right foundation. If we get to this in time, it will be fine, but for now we can just agree on some documents for AST structures and maybe a basic API.

conartist6 · 2023-06-03T20:55:41Z

Hey I just want to introduce myself. I'm working on a shared parser/linter/formatter core, and it is my explicit goal (and full-time job) to unify what can be unified across this ecosystem. I believe myself to be several (important) steps ahead of ESLint in this regard, and as they have also shown me nothing but indifference it seems that I am their open competitor. My project is still flying under the radar for the moment, but I plan for that to change in a major way, and soon.

romainmenke · 2023-06-13T17:45:12Z

Might be an interesting read : https://railsatscale.com//2023-06-12-rewriting-the-ruby-parser/

ybiquitous · 2023-06-16T00:17:27Z

Thanks for sharing the article. I read it. We wish for a "Universal Parser" for CSS, too!

silverwind · 2023-06-16T12:48:05Z

The best CSS parser ought to be the one that browsers use. I wonder if Blink's CSS parser could be leveraged 😆.

romainmenke · 2023-06-16T12:57:45Z

The best CSS parser ought to be the one that browsers use.

Yes and no :)

They are the best because they are extremely well tested and are used in the wild by billions.

But browsers only need to parse CSS for a limited use case.
Their parsers don't have to preserve as much debug info (like whitespace or comments).

Those parsers also don't have to support non-standard syntax like scss, less, ....

LightningCSS, for example, uses Servo's CSS tokenizer/parser, and that is what makes it good and extremely fast. But it's also the source of all the limitations of LightningCSS.

LightningCSS cannot be used to build a linter because it discards too many tokens.

conartist6 · 2023-06-16T13:27:15Z

LightningCSS cannot be used to build a linter because it discards too many tokens.

This is where I come in! cst-tokens takes the output of an existing parser and uses it to rebuild a tree in which every source character is present in the token stream. Doing this requires defining the syntax of CSS in a cst-tokens parser grammar, but the parser need not be complete: it does not need to know how to resolve ambiguity. The traversal code simply uses the output of the first-pass parser for that purpose. In this way my project's functionality is closely related to that of ungrammar (which you should also look into though I am focused on extensible grammars and they are not).

The cst-tokens CST is also a pure superset of the AST it decorates, and is meant to have all the APIs needed to build any kind of parser, formatter, and linter functionality. It allows comment attachment rules for ambiguous comments to be well-defined, while always preserving the ability to see all possible comment attachments for any given node.
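As an illustration of what "every source character is present in the token stream" means, here is a minimal sketch. It is not the cst-tokens API and it is nowhere near a spec-compliant CSS tokenizer; `tokenizeLossless` and its token types are hypothetical. The property it demonstrates is that concatenating the token values reproduces the input exactly, whitespace and comments included.

```javascript
// Hypothetical, heavily simplified tokenizer. Alternatives, in order:
// comment | whitespace | identifier | any single character (fallback).
// Because of the fallback, every character lands in exactly one token.
function tokenizeLossless(source) {
  const pattern = /(\/\*[\s\S]*?\*\/)|(\s+)|(-{0,2}[A-Za-z_][\w-]*)|([\s\S])/y;
  const tokens = [];
  let match;
  while ((match = pattern.exec(source)) !== null) {
    const type =
      match[1] !== undefined ? "comment"
      : match[2] !== undefined ? "whitespace"
      : match[3] !== undefined ? "ident"
      : "delim";
    tokens.push({ type, value: match[0] });
  }
  return tokens;
}

const source = "a { /* note */ color : red }\n";
const tokens = tokenizeLossless(source);

// Round trip: nothing was discarded.
console.log(tokens.map((token) => token.value).join("") === source); // true
```

A linter or formatter can rely on that round trip to report exact positions and to rewrite the file without disturbing text it did not mean to touch.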

conartist6 · 2023-06-16T16:07:26Z

Another reason there's a strong case for a concrete syntax wrapper around an existing AST is that you don't really have to risk breaking anything!! You use the same parser -- you're just adding a new validator and retokenizer layer, so for your users AND your lint rules the language is guaranteed not to have changed at all!

The downside is that the technology isn't ready for production usage yet, and won't be for a little while. Serious users will want to see the library hit 1.0.0, a goal which I've ensured that I can reach and am working directly towards.

I'm essentially here asking for help doing the work that makes everything I am describing possible. With the right help I could get to 1.0.0 a lot faster!

romainmenke · 2023-06-16T17:42:48Z

I think it's important to find a place for this effort so that we can split this thread.

I don't want to engage too much on specifics but I also don't want to appear dismissive of people reaching out like @conartist6 .

I think many people care about this issue and want to collaborate.

Maybe any new repository is fine?
It only needs to serve as a temporary home for discussion and issues.

A place where we can align on priorities, goals, ...

ybiquitous · 2023-06-17T04:54:48Z

I can provide a new repository in the github.com/stylelint org, which would be a temporary home for our collaboration. It would work until we find a more appropriate home (org).

For example, how about github.com/stylelint/css-parser? I can invite a few people as the repository owner at first.

scripthunter7 · 2023-06-17T16:34:25Z

I'm also interested in this project, and as my time allows, I am happy to help with the planning / implementation. Are you planning to create a Discord server or similar communication platform?

alexander-akait · 2023-06-17T21:38:44Z

I like the idea of a CST, but unfortunately generic solutions are often much worse in performance due to overhead (though I would like to look at the benchmarks). The original CSS tokens (from the syntax spec) already have everything: whitespace, tokens, etc. It is also good to be aligned with the spec for maintenance purposes.

If someone wants to start, that would be great; I'm a little busy right now. And yes, in any case we need to start with the tokenizer, and we already have solutions for that (we can reuse them).

ybiquitous · 2023-06-18T13:34:57Z

@alexander-akait @romainmenke
I've created a repository for this project and invited you as an admin.
https://github.com/stylelint/css-parser

Please freely use it. Since the repo may be temporary, you don't need to follow the Stylelint organization rules.

Are you planning to create a Discord server or similar communication platform?

@scripthunter7
We have no plan at this point, but it's possible to consider it if such a platform is required. I want to leave its decision up to the admins.

romainmenke · 2023-06-19T21:29:07Z

Thank you @ybiquitous,

I will try to get the ball rolling in a few issues in the next few weeks.

silverwind · 2023-06-23T11:19:24Z

I recently saw @keithamus's csslex, maybe it is something to consider using.

romainmenke · 2023-06-23T12:05:32Z

Thank you for sharing this @silverwind
That package looks really great!

I've started a list of tokenizers here : #1

ybiquitous · 2023-06-23T12:32:26Z

@romainmenke You can transfer this issue to stylelint/css-parser if you wish it. Of course, no problem with as-is. 👍🏼

conartist6 · 2023-07-17T20:06:32Z

I'm still working on my solution. It won't be fast in the way Rust is zoom-zoom close-to-the-metal fast, but it will be incremental, streaming, extensible, and easy to maintain -- properties that should prove highly advantageous to linters. Right now I'm working on defining an XML-based serialization format that allows my disambiguated trees to be easily sent over a wire. It's a fun example to check out because it both defines the syntax and shows how the parser core works to define syntaxes. https://gist.github.com/conartist6/5adbbf28d11497467848f530756c1c2a

conartist6 · 2023-07-17T20:12:54Z

As for the zoom-zoom part, making that method of defining syntax fast is mostly just a matter of doing some code transformation. For example if you have a production like this:

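// `eat` and `tok` are grammar helpers assumed to be in scope (their import is not shown here).
export const productions = {
  *Identifier() {
    // This expression is re-evaluated every time the production runs.
    yield eat(tok`Identifier`);
  }
};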

There's a bunch of associated cost from evaluating eat(tok`Identifier`) repeatedly. But I could eliminate that cost using a hoisting transform that would change the code to something like

// Evaluated once, when the module loads, and reused by every run of the production.
const hoisted_1 = eat(tok`Identifier`);
export const productions = {
  *Identifier() {
    yield hoisted_1;
  }
};

Now you can see that there's actually a pretty small amount of logic necessary to process any given production!
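The effect of the hoisting transform can be checked with stand-ins. In the sketch below `eat` and `tok` are stubs written for this example, not the real helpers; the only assumption carried over is that each call builds a new instruction object. A counter shows how many instructions get built when each version of the production runs 1000 times.

```javascript
// Stubs (assumptions): the real helpers are not reproduced here.
let allocations = 0;
const tok = (strings) => ({ kind: "tokenMatcher", type: strings[0] });
const eat = (matcher) => {
  allocations += 1; // count every instruction object that gets built
  return { verb: "eat", matcher };
};

// Original form: the instruction is rebuilt on every run.
const naive = {
  *Identifier() {
    yield eat(tok`Identifier`);
  },
};

// Hoisted form: the instruction is built once, up front.
const hoisted_1 = eat(tok`Identifier`);
const hoisted = {
  *Identifier() {
    yield hoisted_1;
  },
};

// Drive a production to completion `times` times.
function run(productions, times) {
  for (let i = 0; i < times; i++) {
    for (const instruction of productions.Identifier()) {
      void instruction; // a real engine would interpret the instruction here
    }
  }
}

allocations = 0;
run(naive, 1000);
const naiveCount = allocations; // 1000

allocations = 0;
run(hoisted, 1000);
const hoistedCount = allocations; // 0

console.log({ naiveCount, hoistedCount });
```

Hoisting like this is only safe if whatever consumes the instruction treats it as immutable, since every run now shares the same object.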

conartist6 · 2023-07-17T20:17:10Z

What you gain for your effort is the ability to process chunked streams. You don't need to have the entire source in a single stream, as many parsers require so that they can store indexes into the string as state.

For a linter this means gaining the ability to lint files larger than fit in memory. Memory usage would be driven more by the complexity of language and query rules than by the size of the file being linted.
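A minimal sketch of the chunked-stream idea, assuming a toy token definition (runs of whitespace versus runs of non-whitespace) rather than real CSS tokens. `tokenizeChunks` is a hypothetical name. The tokenizer keeps only the unfinished tail of the input in memory, so a token split across two chunks is still emitted whole.

```javascript
// Hypothetical chunk-aware tokenizer over any iterable of string chunks.
function* tokenizeChunks(chunks) {
  let pending = ""; // the only source text held in memory
  for (const chunk of chunks) {
    pending += chunk;
    const pattern = /\s+|\S+/y;
    let consumed = 0;
    let match;
    while ((match = pattern.exec(pending)) !== null) {
      // A match that touches the end of the buffer may continue in the
      // next chunk, so hold it back instead of emitting it.
      if (pattern.lastIndex === pending.length) break;
      yield match[0];
      consumed = pattern.lastIndex;
    }
    pending = pending.slice(consumed);
  }
  // End of input: whatever is left is one complete token.
  if (pending !== "") yield pending;
}

// "color:" is split across the first two chunks, "red;" across the last two.
const streamed = [...tokenizeChunks(["col", "or: r", "ed;\n"])];
const whole = [...tokenizeChunks(["color: red;\n"])];

console.log(JSON.stringify(streamed) === JSON.stringify(whole)); // true
```

In this sketch the buffer grows with the longest single token rather than with the file, which is the kind of memory behaviour described above.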

conartist6 · 2023-07-17T20:19:41Z

Also tokens that index into strings tend to perform badly when you want to insert a token. The structure requires invalidating all other tokens because the indexes of all tokens after the change will need to be updated by some offset.
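The cost can be seen in a small comparison. Both token representations below are hypothetical simplifications: one stores `start`/`end` offsets into a source string, the other stores its own text. Inserting one token near the start of the list forces the offset-based version to rewrite every later token.

```javascript
// Offset-based tokens: inserting text shifts every token after the insert point.
// Assumes 0 <= at < tokens.length.
function insertIndexed(tokens, at, text) {
  const start = tokens[at].start;
  let touched = 0;
  for (let i = at; i < tokens.length; i++) {
    tokens[i].start += text.length;
    tokens[i].end += text.length;
    touched += 1;
  }
  tokens.splice(at, 0, { start, end: start + text.length });
  return touched; // number of existing tokens that had to be rewritten
}

// Value-based tokens: existing tokens are left untouched.
function insertValued(tokens, at, text) {
  tokens.splice(at, 0, { value: text });
  return 0;
}

// Source "a{b}" as four single-character tokens.
const indexed = [
  { start: 0, end: 1 },
  { start: 1, end: 2 },
  { start: 2, end: 3 },
  { start: 3, end: 4 },
];
const valued = [{ value: "a" }, { value: "{" }, { value: "b" }, { value: "}" }];

const touchedIndexed = insertIndexed(indexed, 1, " "); // 3
const touchedValued = insertValued(valued, 1, " "); // 0

console.log(touchedIndexed, touchedValued);
console.log(valued.map((token) => token.value).join("")); // "a {b}"
```

The array `splice` still moves references around in both versions; a linked list or tree would avoid even that, but the existing token objects stay valid in the value-based form either way.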

Mouvedia · 2024-02-02T16:15:42Z

related: biomejs/biome#268

alexander-akait changed the title ~~Implement full featured CSS parser~~ [parser] Implement full featured CSS parser May 19, 2023

silverwind mentioned this issue May 23, 2023

Add no-invalid-position-var-function stylelint/stylelint#6859

Open

This was referenced May 27, 2023

Fix alpha-value-notation performance with improved benchmark script stylelint/stylelint#6864

Merged

Fix at-rule-property-required-list performance stylelint/stylelint#6865

Merged

This was referenced May 28, 2023

Fix color-* performance stylelint/stylelint#6868

Merged

Fix performance of constructor parsing rules stylelint/stylelint#6869

Open

Mouvedia mentioned this issue May 31, 2023

Refactor to use PostCSS Visitors API stylelint/stylelint#5354

Open

2 tasks

romainmenke transferred this issue from stylelint/stylelint Jun 23, 2023

jeddy3 mentioned this issue Jun 23, 2023

Use SWC as parser stylelint/stylelint#5586

Closed
