Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

64bit expressions used in #[el]if statements #1389

Closed
omm opened this issue Jan 30, 2025 · 26 comments
Closed

64bit expressions used in #[el]if statements #1389

omm opened this issue Jan 30, 2025 · 26 comments
Labels
bug C C compiler

Comments

@omm
Copy link

omm commented Jan 30, 2025

Hi,

@jmalak can you look at the original message 64bit expressions used in #[el]if statements about subj?

Thanks

@alcz
Copy link

alcz commented Jan 30, 2025

It happens on Win32/Win64 hosted compilers (I tested both). Linux hosted is said to be okay.

This was done by Przemyslaw to workaround truncations in these macros:
harbour/core@e66aab7
harbour/core@6c8c5cd

@jmalak
Copy link
Member

jmalak commented Jan 30, 2025

The issue is caused by C standard.
C89 (default for OW) use 32-bit expression in pre-processor.
C99 define intmax_t/uintmax_t expression size in pre-processor, you need to use -za99 C compiler option to this behaviour.
I don't believe that on Linux it is different, it has nothing to do with host platform.

@alcz
Copy link

alcz commented Jan 30, 2025

I suppose "LONGLONG" ints shouldn't be enabled in compiler too if preprocessor in this mode is strictly limited? Why enable 64-bit ints for compiler in C89 mode?

@jmalak
Copy link
Member

jmalak commented Jan 30, 2025

It is for backward compatibility with previous OW and Watcom versions and compatibility with C standard.
Any C89/C90 compliant compiler will have same problem.
Anyway such code uses undefined behaviour for C89/90 standard.

@alcz
Copy link

alcz commented Jan 30, 2025

Now i'm just curious what was different in a ~2020 OW, which didn't manifest anything like that...

@jmalak
Copy link
Member

jmalak commented Jan 30, 2025

Do you want to say that in current version is bug?
If yes, then specify which one and we can fix it.
But if in older version something worked in undefined behaviour then it doesn't means now it is wrong if doesn't.

@jmalak
Copy link
Member

jmalak commented Jan 30, 2025

New preprocessor was introduced by following commit

Commit 546c585
jmalak
committed
on Dec 6, 2020
implement new preprocessor constant expression processing with support for C99 standard (modified copy of ppexpn.c from C++ compiler)
in C89/90 mode constant expression is evaluated with 32-bit precision
in C99 mode constant expression is evaluated with 64-bit precision (OW intmax_t type)
master
·
Last-CI-build
2021-01-01-Build

@alcz
Copy link

alcz commented Jan 30, 2025

Version 2.0 beta Feb 5 2020 06:09:49 (32-bit)
is something I incidentally have on disk here. If you're interested, we can probably make a self contained example in a spare time. As far as undefined behaviours go. I can recheck current Linux host for these truncations too, if it's true that is different. I didn't test that myself.

@jmalak
Copy link
Member

jmalak commented Jan 30, 2025

It is irrelevant what was doing older version, only C standard is relevant.
If it is something in undefined behaviour no reason to change it, but if it is against standard then it is bug and must be fixed.

@alcz
Copy link

alcz commented Jan 30, 2025

Hmm, don't need to explain that , we're quite used to what's possible under ANSI C. Anyway OpenWatcom also set up kind of it's own standard after all these years ;-)

@alcz
Copy link

alcz commented Jan 30, 2025

Version 2.0 beta Feb 5 2020 06:09:49 (32-bit) here "OK"
current version #error's with BUG block uncommented

Older didn't truncate PP in this scenario either. We looked as far as Open Watcom C++ 12.70.8 (32-bit) on our hbtest for similar regression.

#include "limits.h"
#include "stdint.h"
#include "stdio.h"

/* extracted from hbdefs.h */
   #if ! defined( LONGLONG_MAX )
      #if defined( _I64_MAX )
         #define LONGLONG_MAX       _I64_MAX
      #elif defined( LLONG_MAX )
         #define LONGLONG_MAX       LLONG_MAX
      #elif defined( LONG_LONG_MAX )
         #define LONGLONG_MAX       LONG_LONG_MAX
      #else
         #define LONGLONG_MAX       9223372036854775807LL
      #endif
   #endif

void main()
{

#if LONGLONG_MAX < LONG_MAX
    #error "BUG"
#endif

   printf("LONG_MAX %d\n", LONG_MAX );
   printf("LONGLONG_MAX %I64d\n", LONGLONG_MAX );
}

(redacted, preprocessor case only)

@jmalak
Copy link
Member

jmalak commented Jan 30, 2025

Sorry, but C++ compiler work with different standard C++98 I don't know how much it is different from C89 or C99.
Anyway try to use some expression, because we are talking about calculation in preprocessor not in C code.
Your case is only about macro definition and macro value, which should be OK.

Maybe it looks like old version worked correctly in some situation but how can have precision if all internal variables were 32-bit. It is not about macro value.

@alcz
Copy link

alcz commented Jan 30, 2025

Case just shows what we know about it, thanks for looking at it. Lastly, I can confirm that cross building from Linux results are the same as Win32/Win64 - there is no difference between host platforms, so the "Win32 version" may be discarded from title of this issue.

@omm omm changed the title 64bit expressions used in #[el]if statements in Win32 version 64bit expressions used in #[el]if statements Jan 30, 2025
@alcz
Copy link

alcz commented Feb 1, 2025

Older preprocessor overflowed on ops (as one could expect), but somehow processed comparisons of constants in macros.

#if LONG_MAX + 1 < LONG_MAX
  #error "BUG arithmetic"
#endif

is different to (in pre December 2020)

#if 2147483648 < 2147483647
  #error "BUG comparison"
#endif

If compiler exposes 64-bit datatypes in C89 mode, then preprocessor I think should at least emit truncation warnings, as it may result in very unsafe code silently, if someone trusted OpenWatcom's preprocessor.

Also there is this what fails now:

#if 2147483648I64 < 2147483647I64
  #error "BUG comparison I64"
#endif

@jmalak
Copy link
Member

jmalak commented Feb 1, 2025

Thanks for you investigation.

But I think old behaviour mask the fact that all processing of expressions is 32-bit because comparison of simple constant items doesn't go to the expression evaluation and was processed differently.
But in latest version where switching is strict between 32 and 64-bit processing it applied also on comparison of simple constant items.

Question is what is correct?

anyway you look on this from 64-bit point of view, but earlier it was 32-bit world and nobody has this problem that signal something as overflow was non-sense and is not required by standard.

Take into account that

#if x < y

and

#if x - 1 + 1 < y

can be a problem in old version.

@alcz
Copy link

alcz commented Feb 1, 2025

Such warning, if you implement it, would be not for C standards of course (if having the cleanest C89 compiler is your goal). That would be for OpenWatcom users that implemented some 64-bit code over the years and will upgrade to current. They could have not use -za99 just like us, because 64-bit int was available anyway.

@jmalak
Copy link
Member

jmalak commented Feb 1, 2025

I looked at it from all sides and came up with the following less incorrect solution.

I will modify OW 2.0 to use only 64-bit processing in the pre-processor for both standards (C89 and C99).

Here are some facts from history and reasons for my decision.

  • WATCOM compiler was designed as 32-bit according to the C89 standard
  • C89 standard did not have 64-bit types
  • WATCOM I don't know in which version, added 64-bit types as extension, but never completely fixed the pre-processor to process 64-bit numbers correctly, apparently they only partially modified the pre-processor and did not solve the rest of problems
  • the preprocessor in all previous versions of OW has these bugs too by design
  • there is no point in going back to some previous solution, because they all had these bugs by design

I assume that no one will have a problem with that, the pre-processor will work correctly even if it is not 100% backward compatible.

@jmalak jmalak added bug C C compiler labels Feb 1, 2025
@jmalak
Copy link
Member

jmalak commented Feb 1, 2025

Now preprocessor of C compiler is fixed that use 64-bit precision for constant expression evaluation in both modes C89 and C99.
But it has nothing to do with your problem.
You need understand what is preprocesor number token and how it is evaluated.

By example by C standard folowing text is simple preprocessor number token 0XZB54.
What value has this preprocessor number?
It is reason why test with number 2147483648I64 is failing, because preprocessor is not able determine value of this preprocessor number. Preprocessor expression evaluator don't understand I64 on the end of this token.
It is terrible but for preprocessor any token starting with dot or digit is number token and can contain any characters except keywords etc.. It can be very bizard what is preprocessor doing because preprocessor number is different from C number.

You can try it with other compilers but using of 2147483648I64 in preprocessor #if condition is invalid.

Probably some diagnostics message should be add into preprocessor expression evaluator if incorrect number is passed to preprocessor, because it can be typical mistake. most of programmers don't know how preprocesor tokenize input stream and what preprocessor number is and think that preprocessor process C numbers but it is not true.

@jmalak
Copy link
Member

jmalak commented Feb 2, 2025

C compiler preprocesor number doesn't have any type and by example don't know octal or hexadecimal number format only integer numbers in decadic representation therefore you can not use most of C macros in preprocessor #if statement because results can be really surprising.

@jmalak
Copy link
Member

jmalak commented Feb 2, 2025

Anyway I will check if reprocessing tokens in constant expression in preprocessor is correct with regards to C99 standard.
There are some assumption about replacement identifiers by 0 etc. I need review also C standard about limits.h macros value if there is defined some requirement about format with regard to preprocessor. I thing if there are format nnnnnnI64 it is not compatible with preprocessor and shoud be changed to nnnnnnnnnn only etc. the source of problems can be also there.
If you know about some testing sources for check preprocessor functionality it helps me.

@jmalak
Copy link
Member

jmalak commented Feb 2, 2025

Please recheck with fixed precision of constant expression evaluation.

#if 2147483648I64 < 2147483647I64
#error "BUG comparison I64"
#endif

works for me in both modes.

jmalak added a commit to jmalak/open-watcom-v2 that referenced this issue Feb 2, 2025
correct preprocessor in C89 mode to use 64-bit precision for expression evaluation

Now same behavior in C89 and C99 mode

correct now wrong tests for preprocessor after fixing precision
jmalak added a commit that referenced this issue Feb 2, 2025
correct preprocessor in C89 mode to use 64-bit precision for expression evaluation

Now same behavior in C89 and C99 mode

correct now wrong tests for preprocessor after fixing precision
@alcz
Copy link

alcz commented Feb 2, 2025

Thank you for progressing with these. Yes, the idea of this2147483647I64, etc. was extracted by me from OW limits.h, so it was checked rather out of curiosity what if I'm using them indirectly.

@jmalak
Copy link
Member

jmalak commented Feb 2, 2025

I was not sure how it is going exactly it is long time than I touch pre-processing I remember that it is really a little strange.
It is first tokenizing by preprocessor rules to preprocessor tokens and if it is #if expression then it is retokenized to C tokens and next evaluated because you need follow C stndard translation phases (I thik there are 7 levels).
OW doesn't use preprocessed source for next phase it do it by these retokenizing during input stream tokenization.

I hope that it will works, because there should not be change, change is only in expression evaluation on last tokenization that it should process C tokens. It means that 2147483647I64 should be processed without problems because it is valid C tokens 2147483647 + I + 64 and it is processed as constant number.

@jmalak
Copy link
Member

jmalak commented Feb 2, 2025

Please recheck and confirm if OK or you have still some issues.

@alcz
Copy link

alcz commented Feb 2, 2025

I've downloaded Last-CI-build and redone the tests, so far it looks good to me. Thanks Jiří!

@jmalak
Copy link
Member

jmalak commented Feb 2, 2025

Thanks for confirmation.

@jmalak jmalak closed this as completed Feb 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug C C compiler
Projects
None yet
Development

No branches or pull requests

3 participants