Skip to content

Conversation

yglukhov
Copy link
Member

No description provided.

@yglukhov yglukhov marked this pull request as draft July 22, 2025 13:15
@yglukhov yglukhov marked this pull request as ready for review July 22, 2025 17:44
@yglukhov
Copy link
Member Author

toml_serialization failure unrelated, and I can't reproduce it locally, and can't tell what's the problem. @arnetheduck maybe you could have a look?

@arnetheduck
Copy link
Contributor

#25066
status-im/nim-faststreams#76
should work now

@yglukhov
Copy link
Member Author

Rest of failures unrelated too, I think

@Araq
Copy link
Member

Araq commented Jul 23, 2025

I hate this shit. Just make the compiler emit copyMem for for loops instead of patching every single copy loop ever written.

@arnetheduck
Copy link
Contributor

Just make the compiler emit copyMem

ugh please no..

this is actually entirely solvable in library code - the problem is mainly that too much happens in the compiler already that is of poor quality and hard to address, and taking this shortcut means that you no longer can build custom data structures that benefit from the same optimizations. The right way here which would be extensible is to have primitives in the language and library that can be reused and remove many of the compiler magics (like ArrPut and so on)..

see also https://github.com/status-im/nim-stew/blob/35f935d0997724d3c61f57ab9fd0992e6f114f7f/stew/assign2.nim - ie this little snippet gets rid of most of bad performance in nim codegen - the biggest problem with it is that the compiler sneaks in calls to the compiler-generated = in lots of places which remains terribly slow and hard to fix.

The ideal scenario would be that assignment becomes pluggable the way assign2 works, and then we can implement library functions that call = for the right types.

This is also why we want setLenUninit and friends to work consistently and sanely - so that a Table and Deque can be written that is as efficient as seq and other magic compiler-based types.

@yglukhov
Copy link
Member Author

Library code is a lot easier to reason about than compiler code. If anything, I'd prefer to reduce the number of compiler magics/builtins, by moving them out to lib.

@Araq if you're concerned with this copying pattern recurring too much, we could just introduce a lib function to reuse it (e.g. template arrayCopy[T](dstArray, srcArray, dstIdx, srcIdx, len)), but I'm not aware of that many places to bother about it.

@arnetheduck
Copy link
Contributor

arnetheduck commented Jul 24, 2025

I'm not aware of that many places

=, []=, add(var seq, openarray), etc - there's plenty if you go looking for them - with the right primitives (setLenUnit, something like arrayCopy etc) they all boil down to the same reusable pattern though which is why this is so amenable to library solutions .. @ = setLenUninit + arrayCopy, []= is arrayCopy, add is setLenUninit + arrayCopy and so on and they tend to differ in whether they overwrite or extend things mainly.

the other big slowdown factor is that nim inserts range changes even for cases where this is not needed - ie for i in 0..<x.len: x[i] = y[i] cannot be optimized by the C compiler because it has to raise the defect at the right iteration after having copied the parts that fit - semantically, this of course doesn't matter for an array copy, but this is not something the C optimizer can deal with - sometimes it can be avoided by performing the length check explicitly before the loop which allows the compiler to make some assumptions in removing later range checks but this only works if aliasing information is available which it often is not (the range check rereads the length of the x and y on every iteration, and if aliasing is possible it cannot hoist this value from the loop and therefore cannot remove the in-loop check either).

There are plenty of these little nuggets that could be solved easily if we moved implementations out from the compiler and into the library, as they are much more easy to reason about there.

@Araq
Copy link
Member

Araq commented Jul 24, 2025

Range check elimination is also a complex compiler analysis, you don't get simplicity out of moving these things to libraries (how would that even work anyway for range checks elimination).

@yglukhov
Copy link
Member Author

Range check elimination is also a complex compiler analysis

Maybe something like {.push rangeChecks: off.}/{.pop.} in strategic places could help?

@arnetheduck
Copy link
Contributor

arnetheduck commented Jul 24, 2025

don't get simplicity out of moving these things

of course you do: you just disable it explicitly in your arrayCopy etc primitives without waiting for the complex compiler analysis to be written. When the compiler finally understands these fine points (I doubt it will), all you need to do is change the library function and everything that uses the smart primitives remains unchanged. This makes the implementations of add etc simple and the complexity remains concentrated in a single tightly controlled spot.

@ringabout ringabout closed this Sep 8, 2025
@ringabout ringabout reopened this Sep 8, 2025
@Araq Araq merged commit 49e66e8 into nim-lang:devel Sep 10, 2025
30 checks passed
Copy link
Contributor

Thanks for your hard work on this PR!
The lines below are statistics of the Nim compiler built from 49e66e8

Hint: mm: orc; opt: speed; options: -d:release
183046 lines; 9.306s; 660.699MiB peakmem

narimiran pushed a commit that referenced this pull request Sep 12, 2025
Co-authored-by: ringabout <[email protected]>
(cherry picked from commit 49e66e8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants