Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attach input location to tokens (add spans feature) #10

Open
not-my-profile opened this issue Nov 30, 2021 · 5 comments
Open

Attach input location to tokens (add spans feature) #10

not-my-profile opened this issue Nov 30, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@not-my-profile
Copy link
Contributor

not-my-profile commented Nov 30, 2021

Hey, thanks for this library ... it looks really promising :) I am working on an HTML linter for which I require the spans of parser errors, tag names, attribute names and attribute values. These spans would ideally be reported as core::ops::Range<usize>, so that I can pass them directly to the codespan_reporting library (codespan_reporting::diagnostic::Label::range in particular). Since span tracking is of course overhead it would be behind an off-by-default feature flag.

I recently implemented this in my fork of the html5ever tokenizer ... which I frankly would love to abandon for a more sound library :) If you are interested in this I can probably implement it.

@untitaker
Copy link
Owner

untitaker commented Nov 30, 2021

i think the easiest way to hack this in without adding overhead to the common case is to wrap the Reader in one that keeps track of the currently read character, then you can get the current file position and attach it to all the tokens you retrieve from the tokenizer. File position is not really a span but maybe close enough?

i'm interested in this feature but I have no idea how it could be done without adding overhead. probably using a generic that resolves to () for the common case?

@untitaker untitaker added the enhancement New feature or request label Nov 30, 2021
@not-my-profile
Copy link
Contributor Author

File position is not really a span but maybe close enough?

I want to output the nicest error messages possible ... for which I need spans ... not just the current position.

I have no idea how it could be done without adding overhead

I was thinking of a bunch of #[cfg(feature = "spans")]{ ... } in the giant (unreadable) match block.

@untitaker
Copy link
Owner

yes compile-time feature probably works, ideally it would be configurable per tokenizer instance for testability reasons though. perhaps it can be done via generics

@not-my-profile
Copy link
Contributor Author

I am afraid I don't see how this could be nicely done with generics. What are these testability reasons you speak of?

@untitaker
Copy link
Owner

if I want to for example test with codespans, and once without, or benchmark with/without codespans I would have to recompile everything many times

generics idea:

enum Token<S> {
    StartTag(StartTag<S>),
    ...
}

struct StartTag<S> {
    name: String,
    attributes: BTreeMap<String, String>,
    span: S,
}

type Span = Range<char>;
type TokenWithoutSpans = Token<()>;
type TokenWithSpans = Token<Span>;

and make DefaultEmitter also generic over S.

Your entire tokenizer already has the emitter as type param, so now it can be Tokenizer<DefaultEmitter<()>, _> for running without spans, and Tokenizer<DefaultEmitter<Span>, _> for with spans

so during monomorphization you get two separately optimized codepaths

@untitaker untitaker changed the title Add optional spans feature to report source code spans Attach input location to tokens (add spans feature) Dec 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants