Get rid of overly long lines in generated .hs file #84
Comments
Apparently,
For instance, consider
alex_base :: AlexAddr
alex_base = AlexA# "\xf8\xff\xff\xff\xd5\xff\xff\xff\xfa\xff\xff\xff\x00\x00\x00\x00\xa5\xff\xff\xff\xa6\xff\xff\xff\xa7\xff\xff\xff\xa8\xff\xff\xff\xa9\xff\xff\xff\xaa\xff\xff\xff\xab\xff\xff\xff\xac\xff\xff\xff\xad\xff\xff\xff\xae\xff\xff\xff\xaf\xff\xff\xff\xbb\xff\xff\xff\xbc\xff\xff\xff\xf6\xff\xff\xff\x10\x00\x00\x00\x11\x00\x00\x00\x2c\x00\x00\x00\x2d\x00\x00\x00\x2e\x00\x00\x00\x2f\x00\x00\x00\x30\x00\x00\x00\x31\x00\x00\x00\x32\x00\x00\x00\x33\x00\x00\x00\x34\x00\x00\x00\x35\x00\x00\x00\x00\x00\x00\x00\x87\x00\x00\x00\x00\x00\x00\x00"#
while this produces a segfault
alex_base :: AlexAddr
alex_base = AlexA# "\xf8\
\\xff\xff\xff\xd5\xff\xff\xff\xfa\xff\xff\xff\x00\x00\x00\x00\xa5\xff\xff\xff\xa6\xff\xff\xff\xa7\xff\xff\xff\xa8\xff\xff\xff\xa9\xff\xff\xff\xaa\xff\xff\xff\xab\xff\xff\xff\xac\xff\xff\xff\xad\xff\xff\xff\xae\xff\xff\xff\xaf\xff\xff\xff\xbb\xff\xff\xff\xbc\xff\xff\xff\xf6\xff\xff\xff\x10\x00\x00\x00\x11\x00\x00\x00\x2c\x00\x00\x00\x2d\x00\x00\x00\x2e\x00\x00\x00\x2f\x00\x00\x00\x30\x00\x00\x00\x31\x00\x00\x00\x32\x00\x00\x00\x33\x00\x00\x00\x34\x00\x00\x00\x35\x00\x00\x00\x00\x00\x00\x00\x87\x00\x00\x00\x00\x00\x00\x00"#
This could be due to CPP interacting badly with the backslash-end-of-line sequence; see https://downloads.haskell.org/~ghc/master/users-guide/phases.html#cpp-and-string-gaps
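To make the string-gap mechanism concrete, here is a minimal sketch of what Haskell itself guarantees when no preprocessor is involved: a backslash, some whitespace (including a newline), and another backslash form a "gap" that contributes nothing to the string. The names `oneLine`, `gapped` and `sameBytes` are made up for illustration. As far as I understand the GHC user's guide section linked above, a C preprocessor may elide the backslash-newline pair before GHC ever sees it, which would change what the second literal means, so this sketch assumes the file is compiled without `-cpp`.

```haskell
-- A short byte table written on one line.
oneLine :: String
oneLine = "\xf8\xff\xff\xff\xd5\xff\xff\xff"

-- The same table split with a Haskell string gap: the backslash at the
-- end of the first line opens the gap, the first backslash on the second
-- line closes it, and the following "\xd5" is an ordinary hex escape.
gapped :: String
gapped = "\xf8\xff\xff\xff\
         \\xd5\xff\xff\xff"

-- Without CPP in the pipeline both literals denote the same characters.
sameBytes :: Bool
sameBytes = oneLine == gapped
```

If a preprocessor joins the two physical lines first, the closing backslash ends up directly in front of `\xd5`, so the lexer would see an escaped backslash followed by the plain characters `xd5`, which is one plausible way to end up with a corrupted table.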
I think the Haskell multi-line string syntax is a bad choice, given the standard Linux behaviour of a backslash at the end of a line.
But then, what would be the best way to split these long bytearrays into manageable chunks?
Well, knowing that a C preprocessor is run over them always gives you options. Just use the CPP standard, so you end up with
```
alex_base :: AlexAddr
alex_base = AlexA# "\xf8\xff\xff\xff\xd5\xff\xff\xff\xfa\xff\xff\xff\x00\
\x00\x00\x00\xa5\xff\xff\xff\xa6\xff\xff\xff\xa7\xff\
\xff\xff\xa8\xff\xff\xff\xa9\xff\xff\xff\xaa\xff\xff\xff\xab\xff\xff\xff\xac\xff\xff\xff\xad\xff\xff\xff\xae\xff\xff\xff\xaf\xff\xff\xff\xbb\xff\xff\xff\xbc\xff\xff\xff\xf6\xff\xff\xff\x10\x00\x00\x00\x11\x00\x00\x00\x2c\x00\x00\x00\x2d\x00\x00\x00\x2e\x00\x00\x00\x2f\x00\x00\x00\x30\x00\x00\x00\x31\x00\x00\x00\x32\x00\x00\x00\x33\x00\x00\x00\x34\x00\x00\x00\x35\x00\x00\x00\x00\x00\x00\x00\x87\x00\x00\x00\x00\x00\x00\x00"#
```
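For comparison, the two splitting conventions discussed in this thread can be sketched as a tiny generator. This is not Alex's actual code generator; `SplitStyle`, `separator`, `escapeByte`, `chunksOf` and `renderSplit` are hypothetical names invented for this example. `StringGap` emits Haskell string gaps (backslash, newline, indentation, backslash), while `CppContinuation` emits the bare backslash-newline that a C preprocessor is expected to splice away, which is why its continuation lines carry no indentation.

```haskell
import Data.List (intercalate)
import Data.Word (Word8)
import Numeric (showHex)

-- Hypothetical switch between the two conventions from the thread.
data SplitStyle = StringGap | CppContinuation

-- Text placed between two chunks of escapes.
separator :: SplitStyle -> String
separator StringGap       = "\\\n  \\"  -- gap: backslash, newline, indent, backslash
separator CppContinuation = "\\\n"      -- relies on CPP joining the lines

-- Render one byte as a two-digit hex escape such as \x0a.
escapeByte :: Word8 -> String
escapeByte w = "\\x" ++ replicate (2 - length digits) '0' ++ digits
  where digits = showHex w ""

-- Split a list into pieces of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | null xs   = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- Render a byte table as an unboxed string literal with at most
-- perLine bytes on each physical line.
renderSplit :: SplitStyle -> Int -> [Word8] -> String
renderSplit style perLine bytes =
  "\"" ++ intercalate (separator style) chunks ++ "\"#"
  where chunks = map (concatMap escapeByte) (chunksOf (max 1 perLine) bytes)
```

Something like `putStrLn (renderSplit StringGap 16 table)` would then print a literal whose lines stay short regardless of how large `table` is. Which of the two styles survives a given preprocessor is exactly the open question here.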
On 11 June 2017 at 14:57, Sergey Vinokurov wrote:
> But, then, what would be the best way to split these long bytearrays into manageable chunks?
Apparently fixed by #107.
Unfortunately, the #107 fix was reverted since it didn't play well with
This is the original discussion that led to the revert #116. Perhaps cpphs is less of an issue nowadays and the fix could be restored? Thinking today about the
Ah sorry, I was mistaken. The problem still exists for I suppose the
I did some benchmarking of the fast-tags lexer on a set of Hackage packages (up to 10000 Haskell files totalling 70 MB), and the GHC bytestring literal is slightly but consistently better than arrays performance-wise. I also tried to mess around with the generated lexers: manually switching boxed arrays to unboxed ones, replacing indexing with unsafe indexing that doesn't do bounds checking, and even replacing arrays with vectors from the
My benchmark results as reported by
However, runtime is not the only thing that changes when the alex array backend is used. Big array constants that are generated when
Even after stripping, the difference is dramatic
I think the
PS: I can share my messy benchmarks if anyone is curious.
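For readers unfamiliar with the two table representations compared in the benchmark comment above, here is a rough sketch of what each lookup looks like. It is not taken from Alex's templates or from the benchmark code; `Table`, `byteAt`, `int32At`, `demoTable`, `arrayTable` and `lookupArray` are illustrative names, and the little-endian byte assembly is an assumption that matches the byte order of the `alex_base` literal quoted earlier in the thread.

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Array.Base (unsafeAt)
import Data.Array.Unboxed (UArray, listArray)
import Data.Bits (shiftL, (.|.))
import Data.Char (ord)
import Data.Int (Int32)
import GHC.Exts (Addr#, Char (C#), Int (I#), indexCharOffAddr#)

-- A table stored as an unboxed string literal, similar in spirit to AlexA#.
data Table = Table Addr#

-- Read a single byte at a byte offset; no bounds checking is performed.
byteAt :: Table -> Int -> Int
byteAt (Table addr) (I# i) = ord (C# (indexCharOffAddr# addr i))

-- Assemble the n-th 32-bit entry from four bytes, least significant first.
int32At :: Table -> Int -> Int32
int32At t n =
  fromIntegral (b 0 .|. (b 1 `shiftL` 8) .|. (b 2 `shiftL` 16) .|. (b 3 `shiftL` 24))
  where b k = byteAt t (4 * n + k)

-- First two entries of the alex_base literal from the thread, plus 0x10.
demoTable :: Table
demoTable = Table "\xf8\xff\xff\xff\xd5\xff\xff\xff\x10\x00\x00\x00"#

-- The same three entries as an unboxed array.
arrayTable :: UArray Int Int32
arrayTable = listArray (0, 2) [-8, -43, 16]

-- Array lookup that skips bounds checking; takes a zero-based offset.
lookupArray :: Int -> Int32
lookupArray = unsafeAt arrayTable
```

The literal lives in the object file as raw bytes, whereas the array has to be constructed from a list at run time unless the compiler manages to optimise that away. That difference is at least consistent with the size and speed observations reported above, though this sketch does not measure either.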
Great investigation, @sergv!
@andreasabel I have an idea. I propose restoring my previous commit, which makes lines shorter at the cost of meddling with the preprocessor; this should mostly work with recent gcc and clang. For other setups I'll add a flag to alex, similar to --ghc or --debug, that will disable long lines, so that e.g. projects that use only cpphs could always pass this flag and it'll work for them. Please share any objections.
@sergv: Sorry for the delayed response, and thanks for the investigation. What you suggest could be a solution, but I wonder whether this issue is worth any complications, as it is purely aesthetic.
To verify this, we would have to diversify our CI.
I can take a look at restoring the commit and diversifying CI. It wouldn't cause that many complications logic-wise (see my old PR), yet I recall being pissed whenever I had to open Alex-generated files. I haven't opened any lately, and admittedly it's the editors that are wrong for not being able to handle files with very long lines. Yet in practice we have the editors we have, and it looks like the issue can be alleviated in alex. As a user I definitely did want this; that's why I made the earlier PR that fixed it.
I suggest creating another workflow (independent of the semi-autogenerated
Some packages provide the generated files to ease installation. If a user wants to open these in an editor or run tooling over them, this presents challenges.
e.g. http://hackage.haskell.org/package/Agda-2.4.2.2/docs/src/Agda-Syntax-Parser-Lexer.html#lexer
Consider splitting the generated code over multiple lines for