Compilation JIT #88
What is gained by having a Forth compiled by a JIT? There are plenty of Forths that compile to native code. That’s a much simpler option compared to, for instance, firmforth (which I assume is what you found).
JIT is fairly complex and good native code compilation is not very
difficult. But you probably hit on the issue in that most production forth
systems are embedded and often real-time so JIT wouldn't be appropriate.
I'm curious to explore JIT for interpreted code running under Linux and
other general purpose operating systems. Would be curious to see examples
of doing it in a forth-like manner. The canonical implementation of my new
ActorForth is in Python which could definitely benefit from such things.
Naturally it will be a meta-compiler with compiled output to target
environments but we also see it as a useful alternative to bash scripts,
build scripts, and even certain uses of python where it stays in its
interpretive environment. JIT could be interesting for those use cases.
-- Ben Scherrey
…On Mon, Jun 22, 2020, 9:22 AM, mgcaret wrote the comment quoted above.
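To make the "JIT in a Forth-like manner" idea concrete, here is a minimal sketch (my own illustration, not ActorForth's actual design) of what a Python-hosted word compiler could look like: instead of walking a word's body list on every call, the body is compiled once into a straight-line Python function via `exec()`, so later calls skip the dispatch loop entirely. All names here are invented for illustration.

```python
# Each primitive is a snippet of Python operating on a data stack.
PRIMS = {
    "+":    "stack.append(stack.pop() + stack.pop())",
    "dup":  "stack.append(stack[-1])",
    "drop": "stack.pop()",
}

def interpret(body, stack):
    # The slow path: one dict lookup + one exec per word, every call.
    for word in body:
        exec(PRIMS[word])

def jit(body):
    # The "JIT": emit one flat Python function for the whole body, once.
    src = "def word(stack):\n" + "".join(f"    {PRIMS[w]}\n" for w in body)
    ns = {}
    exec(src, ns)
    return ns["word"]

double = jit(["dup", "+"])   # roughly  : double  dup + ;
stack = [21]
double(stack)                # stack is now [42]
```

The interesting part is that the "machine code" being generated is just Python source, so the host VM's own compiler does the heavy lifting; a real native JIT would emit machine instructions instead, but the structure is the same.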
I have been using threaded code Forth systems for years and I have never encountered a case where a more complicated compiling strategy would have been worth the tradeoffs. On PC-class machines, a threaded code Forth runs so blazingly fast that making it scorchingly fast is just pointless. On tiny tiny systems, native compilation has its advantages, but you have to go way down to AVR-class micros before that becomes compelling. ARM and ESP32 chips costing about $1 run a threaded code Forth fast enough for any reasonable purpose - typically faster than the I/O systems.
Thanks Quentin - that's a pretty interesting project with some impressive
results.
…-- Ben Scherrey
On Wed, Jun 24, 2020 at 5:34 AM Quentin Quadrat ***@***.***> wrote:
@scherrey <https://github.com/scherrey> https://github.com/anse1/firmforth
But since I discovered https://github.com/PeterForth/ForthWin-Download, which uses "subroutine threading" (meaning the compiled code consists only of direct calls in machine code), I realized while editing this ticket that a JIT is not needed.
I think there's a case for running native code out of flash for really tiny devices with say 1K RAM or less. That said, maybe it's better to spend a few more cents for a part with more memory.
When optimizing native-code compilers are used with Forth, the result is that Forth code runs far faster than when threaded code is used. Whether the performance gains of native-code compilation are 'needed' will of course depend on your specific case.
This has more to do with the target processor than with native vs. threaded. On CPUs that do not impose severe penalties for consistently mispredicted indirect jumps, threaded code can be quite fast. I don't believe it will ever be as fast as native code (just as C won't ever be as fast as well-written, hand-rolled assembly). But a direct- or subroutine-threaded Forth with NEXT inlined and a reasonably good set of CODE words can be very fast indeed on many CPUs, including modern ones like the ARM (esp. in THUMB mode). And the small cost in performance is often more than made up for in size savings. That said, since this discussion thread is about optimizations... On threaded Forths it's trivial to implement word-pair reduction optimizations, and those can end up being quite effective. For example:
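As an illustration of such reduction rules (the pair equivalences below are standard Forth identities, but the rule table and the reducer itself are my own sketch, not this commenter's code):

```python
# Each known two-word sequence reduces to a single primitive
# (or to nothing at all, when the pair cancels out).
RULES = {
    ("SWAP", "DROP"): "NIP",
    ("SWAP", "OVER"): "TUCK",
    ("OVER", "OVER"): "2DUP",
    ("DUP",  "DROP"): None,    # a no-op: DUP then DROP cancel
}

def reduce_pairs(words):
    out = []
    for w in words:
        out.append(w)
        # Re-check the tail after each reduction so rules can cascade.
        while len(out) >= 2 and tuple(out[-2:]) in RULES:
            b, a = out.pop(), out.pop()
            repl = RULES[(a, b)]
            if repl is not None:
                out.append(repl)
    return out

reduce_pairs(["OVER", "OVER", "SWAP", "DROP"])   # → ["2DUP", "NIP"]
```

Because the check happens as each word is compiled, the optimization costs one table lookup per word and nothing at run time.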
With rules like the above, common word pairs collapse into single primitives at compile time, essentially for free.
My FCC does a similar "phrasing" optimization. It uses fairly straightforward direct-threaded code as a baseline, but the compiler buffers up to 4 words; when it needs to emit one (or drain the queue because we're at a control-flow boundary), it checks the buffered words against its table of optimized phrases. I collected the (fairly large) set of optimized phrases by running a slate of small benchmarks and some more real-world applications, and having the compiler keep track of all the 4-, 3- and 2-word phrases it saw, and how often. Then I optimized the most common/most optimizable ~64 of each length. This gave it a serious speedup, and for a time it was neck-and-neck with Gforth-fast on its target platforms. Further work on Gforth has since surpassed FCC substantially on most of the benchmarks, alas. I should say that this approach differs from the one described above.
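The phrase-census step described here can be sketched as follows (the sample word stream and all names are invented for illustration; FCC's actual tooling may differ):

```python
from collections import Counter

def phrase_census(streams, max_len=4):
    """Count every 2- to max_len-word window seen across compiled code."""
    counts = Counter()
    for words in streams:
        for n in range(2, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts

# A made-up emit stream from some benchmark run:
census = phrase_census([["DUP", "@", "+", "DUP", "@", "+"]])
```

Picking the top ~64 entries per length with `census.most_common()` then gives the candidate phrases worth hand-optimizing.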
👍 Yep! Your particular strategy seems quite nice. Furthering the discussion: there's nothing that requires "phrase" optimizations (or "reductions", as I called them) to be hand-rolled CODE words, although that's typically what they are. As an example of where that wouldn't be the case: during an old work project, which used subroutine threading and was a bit resource-constrained, I came across situations like this (simplified for clarity):
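As a purely hypothetical stand-in for that kind of situation (the names and phrase table below are my own invention, not the original project's code): in a subroutine-threaded compiler, a recognized phrase can compile to a single call to an already-compiled colon definition instead of a hand-written primitive, trading a sliver of speed for code size.

```python
# Colon definitions already present in the dictionary, keyed by name.
COLON_WORDS = {
    "TUCK": ("SWAP", "OVER"),   # : TUCK  SWAP OVER ;
}

def compile_calls(words):
    """Subroutine-threaded codegen: one CALL per word, except that a run
    of words matching an existing colon body becomes one CALL to it."""
    calls = []
    i = 0
    while i < len(words):
        for name, body in COLON_WORDS.items():
            if tuple(words[i:i + len(body)]) == body:
                calls.append(f"CALL {name}")   # one call instead of two
                i += len(body)
                break
        else:
            calls.append(f"CALL {words[i]}")
            i += 1
    return calls

compile_calls(["DUP", "SWAP", "OVER", "+"])
# → ["CALL DUP", "CALL TUCK", "CALL +"]
```

The "optimized" phrase here is just ordinary high-level Forth, which is exactly the point being made: no CODE word required.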
So, while one might think that such a phrase must be open-coded as a primitive, it can instead compile to a call into an existing high-level definition. The above became even more common when talking about …
Does anyone know of a recent comparison of Forth performance, either between different Forth implementations or against, say, C? This one is by Anton Ertl of Gforth fame, from 1996. I can't figure out how to read that results table, so I'll go with quotes instead. It shows Gforth being much slower than when Forth was translated to C via f2c and then compiled to native code.
It also states that:
But again, that's from 1996. This one, run by the VFX Forth folks, shows a 2012 benchmark where gforth-fast took around 4 times as long as VFX Forth and iForth (both native-code-compiling Forths) to run its LZ77 compression benchmark. This one from 2018 shows SwiftForth taking 131% as long as C++/g++, and Gforth taking 602% as long as C++/g++. The benchmark problem was to loop through the integers from 2 to 100,000, naively checking the primeness of each integer and printing the ones which are prime. I don't think it used gforth-fast though, and it might be unduly measuring startup time rather than throughput. The comparisons linked from here are also ancient. Edit: a new performance comparison was just done (discussed here). gforth-fast took just over 6x as long as C. I was a bit surprised to see Python outperform Gforth, but I believe this is due to the Python script leaning heavily on the highly optimised Python standard library (written in C); in the Forth solution, all the heavy lifting is done in Forth.
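For reference, the 2018 benchmark problem as described above reads roughly like this in Python (a sketch reconstructed from the description, not the benchmark's actual source; I've assumed trial division as the "naive" primality check, and a smaller limit is used below for illustration):

```python
def is_prime(n):
    # Naive trial division up to sqrt(n).
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# The original ran to 100,000 and printed each prime found.
primes = [n for n in range(2, 100) if is_prime(n)]
# len(primes) → 25
```

Since the benchmark also prints every prime, a good chunk of the measured time may be I/O and interpreter startup rather than compilation quality, which supports the caveat above.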
From the old-timer hobbyist: this question is as old as Forth, I think. It raises the question of why anyone would use language X when it's not as fast as C. The Forth ITC VM is a compromise between size and performance and, as Mitch mentioned, it is a pretty damned good compromise.

I went through the exercise of building a word CODE, ( xt -- ) that is the basis of a simple JIT: it "steals" code words from the Forth kernel and compiles them inline as headless code words, removing NEXT. The XT of these headless words is compiled into the colon definition. This can typically double the speed of critical areas of a program. You can optimize loops and branches in a similar manner with even more compiler complexity. However, it is not optimized native code; it is still stack-machine code running on a register machine (unless you use Forth hardware). There are still a large number of PUSH/POP removals that could be done as the simplest optimization, for example.

Subroutine threading gives a nice speed boost and simpler code inlining for intrinsic instructions, but on 16-bit machines the code is 2X bigger. On some older machines the subroutine-call overhead was worse than DTC implementations. That is less relevant today, of course, where 32 bits is the norm.

Native code generation is not too difficult, but good optimized native code generation is not trivial IMHO, and so all the same rules apply that one naturally sees in C. So although VFX and iForth** are very fast, they only optimize small sections of the code at a time. The big C compilers can and do perform multiple passes, and so get the chance to optimize things that optimizing native-code Forth compilers just don't look at. The C approach can make very fast code, but I am told it can also cause demons to fly out your nose*. :-)

*undefined behaviour
** Stephen Pelc has said they stopped comparing VFX to their old compilers when the speed differences hit 12X. :-)
Hi,
It seems to me that Just-In-Time (JIT) compilation is an old technique (~ the 1980s). Why is this technique not used nowadays in modern Forths, which keep using virtual machines and byte code? I'm talking about Forths not designed to live on bare metal but within an OS (i.e. Linux). After all, many scripting languages, even Lua, use this technique. I know of only one Forth project using JIT. Why so few? I know the main idea of how to implement a JIT, and I read a basic example for a stack language in an article (unfortunately, it dove directly into the code and the problems with the security of executing code from memory segments in Linux, without describing the main principles of JIT).
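For the curious, the Linux-specific hurdle alluded to here (securely executing code that lives in data memory) boils down to mapping a page with PROT_EXEC and jumping into it. Below is a minimal sketch assuming x86-64 Linux; the six code bytes encode `mov eax, 42; ret`, and everything else is generic. Note that real JITs map the page writable first and then mprotect() it to read+execute, since hardened systems often reject writable-and-executable mappings; the single W|X mapping here is only for brevity.

```python
import ctypes
import mmap
import platform

# Hand-assembled x86-64: mov eax, 42 ; ret
CODE = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])

if platform.machine() in ("x86_64", "AMD64"):
    # Anonymous mapping that is readable, writable, and executable.
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(CODE)
    # Treat the page's address as a C function returning int, and call it:
    fn = ctypes.CFUNCTYPE(ctypes.c_int)(
        ctypes.addressof(ctypes.c_char.from_buffer(buf)))
    result = fn()       # the CPU executes our hand-written bytes
else:
    result = 42         # skipped on non-x86-64 hosts
```

Everything beyond this point is "just" a compiler: a Forth JIT would emit the bytes for each word's semantics instead of a canned constant.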