v1.0.0 causes crashes during execution #403
Comments
Maybe if you have the code that was working before using this |
The master branch runs the previous version of MIR (v0.x) which works fine. I could try dumping out the asm for both. |
I am not sure how to generate assembly output. I generated the MIR output from the two versions, along with the C input file. |
After doing a search and replace with regular expressions on
Also, in several functions the type of some variables has changed from
|
thank you @mingodad I need to find a reduced example of the issue - I will try to see if I can minimize failing tests; but if the issue is size related then it may be difficult to minimize. |
Probably what I said here |
Running on Ubuntu-18.04 64bits with valgrind:
|
Here is my modified script to run the tests:
|
thank you |
One more piece of info - I wasn't setting an opt level, so it was using the default level. Opt level 0 and 1 do not appear to have the bug. |
It seems related to size somehow. One of the failures is in a file containing many tests (the C code above is from that file); test 88 fails. |
Literally, if I remove one test, it fails after running 1 more test from the end. |
I wonder if the source file is too large causing something to overflow; maybe an integer value is overflowing? Seems to be related to opt level 2. |
for hard to debug stuff like this |
@rofl0r thanks for the suggestion; it may indeed be a good idea to look back at the commits to see when it starts to fail. But the problem is that the bbv branch had many commits, and I am not sure at what points in the commit history the branch was usable. |
just go back to where it branched off. if it works there, you can tag it |
Thank you for working on this bug. Even providing the test case requires a lot of effort. I really appreciate your involvement. Although I tested the release well, one person is not enough. In order to work on this issue I need a standalone C program (the bug could be in the C code, e.g. C undefined behavior, or in the c2mir compiler, although I don't see that in your comparison of the different MIR versions). The C program should be executable and produce a result that says whether the program works or not. The C code can be very big; that does not matter to me. With such a program I can bisect the optimizations and the functions of the program and find the origin of the bug. I understand that providing a standalone program requires even more effort from you, but without such a program it is practically impossible to find the cause and fix the bug. |
I guess the issue might be related to the size of a function; Lua code that is all part of a single script goes into a single top-level function, making it large. It is hard to find such examples in regular C code. It's impossible to create a standalone C program. I think the best approach is the one suggested by @rofl0r. Even this is a lot of effort, but I will have a go at it over the weekend. If we can narrow down the commit where the bug started, it might help find the issue. |
But have you fixed the problems that |
@mingodad I don't think those are real problems - it's reporting issues with standard C functions. I have ASAN enabled and no failures are reported there - but of course I don't think ASAN covers JIT output. |
Results from git bisect:
I realize that this may be an interim issue that may have been fixed - but I would require a patch to continue bisecting. CORRECTED |
I am going to do a different bisect run to see if I can find a later point with a different issue. |
okay so my second bisect run ignores other errors and focuses on the issue with "large" functions.
To me this looks like a bug:
|
It seems that you didn't understand the
Valgrind is interpreting the machine code and detected use of uninitialized memory in
|
It is possible that there is a separate bug (notice that the call stack has luaG_runerror - so it is already in the process of reporting an error). I will have a look at it, but right now I don't think it is related. The code that is being reported is this:
runerror tries to get more info - and it's looking for a line number; I think the bug is that this doesn't make sense when using the source to JIT compiler. I will fix this. |
Well, I checked this, and actually what I said above does not apply in Lua tests, because they are compiled from Lua bytecodes. |
I can confirm that reverting this commit fixes the issue. |
Yes, you are absolutely right. I fixed this in 19d4a62 . Although I have a question: how important is generated code speed for the test case on which the problem occurs? Without this optimization (coalescing), the generated code will be quite bad. It will have a lot of moves, since a copy is added for each phi result and operand. When there are a lot of vars (and consequently the conflict matrix is big), I could do coalescing on a live-range basis. The current coalescing based on the conflict graph gives better quality than coalescing based on live-range uses. |
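To make the coalescing discussion concrete, here is a minimal sketch of conflict-based move coalescing. It is not MIR's implementation: `NVARS`, `move_t`, `find` and `coalesce_moves` are hypothetical names, and the conflict matrix is a plain boolean array chosen for readability. Each move `dst = src` is dropped when the two variables can share a location, and the merged class inherits the conflicts of both sides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NVARS 4 /* illustrative number of variables, not a MIR constant */

/* Hypothetical move instruction: dst = src. */
typedef struct { int dst, src; } move_t;

/* Union-find lookup with path halving: returns the class representative. */
static int find (int *parent, int v) {
  while (parent[v] != v) {
    parent[v] = parent[parent[v]];
    v = parent[v];
  }
  return v;
}

/* conflict[i][j] is true when i and j must not share a location.
   Returns the number of moves that become redundant after coalescing.
   The matrix is updated in place so that rows of class representatives
   always hold the union of their members' conflicts. */
static size_t coalesce_moves (bool conflict[NVARS][NVARS], const move_t *moves,
                              size_t nmoves, int *parent) {
  size_t removed = 0;
  for (int v = 0; v < NVARS; v++) parent[v] = v;
  for (size_t m = 0; m < nmoves; m++) {
    int a = find (parent, moves[m].dst);
    int b = find (parent, moves[m].src);
    if (a == b) { /* already coalesced: the move is a no-op */
      removed++;
      continue;
    }
    if (conflict[a][b]) continue; /* must stay separate: keep the move */
    for (int v = 0; v < NVARS; v++) /* a inherits b's conflicts */
      if (conflict[b][v]) conflict[a][v] = conflict[v][a] = true;
    parent[b] = a;
    removed++;
  }
  return removed;
}
```

With many variables the `conflict` matrix grows quadratically, which is the cost being weighed in this thread.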
I can confirm the fix worked, thank you for looking into it. I assume this was size related; how is this feature tested? Presumably you do not have a test case with a function large enough to hit the limit? Regarding impact, I do not know yet as I haven't measured, but I think it's okay to penalize very large functions rather than fail completely or fail at runtime. I can of course limit JIT compilation based on function size. Usually such large functions are the top-level script, and their performance is unlikely to be important. Another question: in v0 there was no limit, but this limit was added in v1, is that right? What was the reason for it? |
v0 finds conflicts based on live ranges. It is difficult to evaluate the complexity of the algorithm, but I would say it is "close to" linear. Therefore there is no limit. v1 finds conflicts based on a conflict graph, which is superior. For example, v0 considers
whereas v1 recognizes that there is no conflict. The complexity (in time and space) of the conflict-graph-based algorithm can be quadratic in the worst case. Therefore I introduced a limit. Using the v0 algorithm as a backup when we hit the limit of the v1 algorithm could be a solution. I'll work on it. It will not be a big addition to the MIR-generator code. |
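The limit-plus-backup idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the MIR-generator code: `live_range_t`, `conflicts_t`, the `conflicts_*` functions and the `max_matrix_vars` cap are all hypothetical. A quadratic bit matrix is built only when the variable count is under the cap; above it, queries fall back to a conservative live-range overlap test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical live range: the variable is live in [start, end). */
typedef struct { int start, end; } live_range_t;

typedef struct {
  size_t nvars;
  uint64_t *bits;         /* nvars x nvars bit matrix; NULL in fallback mode */
  const live_range_t *lr; /* per-variable ranges used by the fallback */
} conflicts_t;

static bool ranges_overlap (live_range_t a, live_range_t b) {
  return a.start < b.end && b.start < a.end;
}

/* Build the matrix only when nvars <= max_matrix_vars (an illustrative cap,
   assumed small enough that nvars * nvars cannot overflow size_t).
   Returns false only if allocation fails. */
static bool conflicts_init (conflicts_t *c, const live_range_t *lr, size_t nvars,
                            size_t max_matrix_vars) {
  c->nvars = nvars;
  c->lr = lr;
  c->bits = NULL;
  if (nvars > max_matrix_vars) return true; /* too big: fall back to ranges */
  size_t nwords = (nvars * nvars) / 64 + 1; /* quadratic space, never zero */
  c->bits = calloc (nwords, sizeof (uint64_t));
  return c->bits != NULL;
}

static void set_bit (conflicts_t *c, size_t i, size_t j) {
  size_t n = i * c->nvars + j;
  c->bits[n / 64] |= (uint64_t) 1 << (n % 64);
}

/* Record a conflict found by the precise analysis; ignored in fallback mode. */
static void conflicts_add (conflicts_t *c, size_t i, size_t j) {
  assert (i < c->nvars && j < c->nvars);
  if (c->bits == NULL) return;
  set_bit (c, i, j);
  set_bit (c, j, i);
}

/* Precise answer from the matrix, or a conservative one from the ranges. */
static bool conflicts_p (const conflicts_t *c, size_t i, size_t j) {
  assert (i < c->nvars && j < c->nvars);
  if (c->bits == NULL) return ranges_overlap (c->lr[i], c->lr[j]);
  size_t n = i * c->nvars + j;
  return (c->bits[n / 64] >> (n % 64)) & 1;
}

static void conflicts_finish (conflicts_t *c) {
  free (c->bits);
  c->bits = NULL;
}
```

The fallback may report conflicts that the matrix would not, so large functions lose some coalescing quality but still compile correctly, which matches the trade-off described above.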
Ravi tests crash during execution, and I am not yet able to narrow it down. Initial investigation appears to show that if I isolate the code that fails, then it passes, but when run in the larger test program, it fails. This makes me think that it's something to do with the size of the function being compiled.
I am flagging this for two reasons:
The failure is basically causing runtime assertions to trigger in Ravi - that means some corruption somewhere.
Example of failure (there are other failures similar to this)
https://github.com/dibyendumajumdar/ravi/actions/runs/9374498129/job/25810662116