Assertion failed: d[]*PRP_BASE result has unexpected carryout! #33

Open
efvb opened this issue Jun 13, 2024 · 8 comments
Labels
bug Something isn't working

Comments


efvb commented Jun 13, 2024

I am using the code from main as of May 5th, compiled with GCC 11 using a tuned Makefile with -O3 -flto=auto -DUSE_ARM_V8_SIMD -mcpu=neoverse-n1.
This is on an Ampere Arm VM.

I also have the restart file, if needed.

Mlucas 21.0.1

https://www.mersenneforum.org/mayer/README.html

INFO: testing qfloat routines...
System total RAM = 23236, free RAM = 11262
INFO: 11262 MB of free system RAM detected.
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 11.4.1 20230605 (Red Hat 11.4.1-2.1.0.1).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using prefetch.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing IMUL routines...
INFO: Testing 64-bit 2^p (mod q) functions with 100000 random (p, q odd) pairs...
INFO: System has 4 available processor cores.
INFO: testing FFT radix tables...
Set affinity for the following 4 cores: 0.1.2.3.
User did not set LowMem in mlucas.ini ... allowing all test types.
User did not set CheckInterval in mlucas.ini ... using default.
NTHREADS = 4
Setting ITERS_BETWEEN_CHECKPOINTS = 10000.
looking for worktodo.txt file...
worktodo.txt file found...reading next assignment...
worktodo.txt entry: PRP=16A20FCE71DF730C49E6B299E558CC11,1,2,11396863,-1,99,0,3,5,"61547191186328636909209"

INFO: Maximum recommended exponent for FFT length (576 Kdbl) = 11450805; p[ = 11396863]/pmax_rec = 0.9952892395.
Initial DWT-multipliers chain length = [hiacc] in carry step.
INFO: restart file p11396863 found...reading...
Computing 2990000-squaring residue R (mod known prime q = 61547191186328636909209)
A: R == 44566622423702911301179 (mod q)
B: R == 44566622423702911301179 (mod q)
mers_mod_square: Init threadpool of 4 threads
Using 4 threads in carry step
ERROR: at line 2157 of file ../src/Mlucas.c
Assertion failed: d[]*PRP_BASE result has unexpected carryout!

[1]+ Exit 1 ./Mlucas -cpu 0:3

Member

tdulcet commented Jun 13, 2024

Thanks for the bug report!

As with #10, it would be helpful if you could try disabling LTO, so we could rule that out.

If that does not help, could you add this printf call:

printf("c_uint64_ptr[j] = %llu\n\nj = %u\n", c_uint64_ptr[j], j);

above that assert statement on line 2157:

Mlucas/src/Mlucas.c

Lines 2154 to 2157 in 1839858

// Use mi64 routines to compute d[]*PRP_BASE and do ensuing equality check:
itmp64 = ((MODULUS_TYPE == MODULUS_TYPE_FERMAT) ? 3ull : (uint64)PRP_BASE); // Fermat-mod uses PRP_BASE to store 2 for random-shift-offset scheme
c_uint64_ptr[j] = mi64_mul_scalar(c_uint64_ptr, itmp64, c_uint64_ptr, j);
ASSERT(HERE, c_uint64_ptr[j] == 0ull, "d[]*PRP_BASE result has unexpected carryout!");
This might give us more information about why the assertion is failing.
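To make "carryout" concrete, here is a minimal sketch of a multiword-times-scalar multiply that returns the word carried out of the top. It is not the Mlucas `mi64_mul_scalar` source: `mul_scalar_sketch` is a hypothetical name, the little-endian word order is an assumption, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT the Mlucas mi64_mul_scalar implementation.
   Multiplies the len-word integer x[] (least-significant word first,
   an assumption) by the scalar a, stores the low len words in y[],
   and returns the word that carries out of the top. y may alias x. */
static uint64_t mul_scalar_sketch(const uint64_t *x, uint64_t a,
                                  uint64_t *y, size_t len)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < len; i++) {
        /* A 64x64-bit product plus a 64-bit carry always fits in 128 bits. */
        unsigned __int128 t = (unsigned __int128)x[i] * a + carry;
        y[i] = (uint64_t)t;            /* low 64 bits stay in place */
        carry = (uint64_t)(t >> 64);   /* high 64 bits move to the next word */
    }
    return carry;  /* nonzero means the product needs more than len words */
}
```

In this sketch, a nonzero return value corresponds to the condition the assertion rejects: the product no longer fits in the `j` words that held the input.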

You might also want to try forcing a larger FFT length with the -fft option, as this exponent is very close to the limit for the current FFT length and could be hitting a roundoff error (ROE). Would you mind sharing the .stat file for this exponent, which would show the ROE values?

Author

efvb commented Jun 19, 2024

p11396863.stat.txt

Hi
Stat file attached. I will recompile and retry over the weekend.

Author

efvb commented Jun 25, 2024

Now compiled without LTO and with the printf line added: same issue, but with more info.
(edited)

Mlucas 21.0.1

https://www.mersenneforum.org/mayer/README.html

INFO: testing qfloat routines...
System total RAM = 23236, free RAM = 2102
INFO: 2102 MB of free system RAM detected.
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 11.4.1 20230605 (Red Hat 11.4.1-2.1.0.1).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using prefetch.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing IMUL routines...
INFO: Testing 64-bit 2^p (mod q) functions with 100000 random (p, q odd) pairs...
INFO: System has 4 available processor cores.
INFO: testing FFT radix tables...
Set affinity for the following 4 cores: 0.1.2.3.
User did not set LowMem in mlucas.ini ... allowing all test types.
User did not set CheckInterval in mlucas.ini ... using default.
NTHREADS = 4
Setting ITERS_BETWEEN_CHECKPOINTS = 10000.
looking for worktodo.txt file...
worktodo.txt file found...reading next assignment...
worktodo.txt entry: PRP=16A20FCE71DF730C49E6B299E558CC11,1,2,11396863,-1,99,0,3,5,"61547191186328636909209"

INFO: Maximum recommended exponent for FFT length (576 Kdbl) = 11450805; p[ = 11396863]/pmax_rec = 0.9952892395.
Initial DWT-multipliers chain length = [hiacc] in carry step.
INFO: restart file p11396863 found...reading...
Computing 2990000-squaring residue R (mod known prime q = 61547191186328636909209)
A: R == 44566622423702911301179 (mod q)
B: R == 44566622423702911301179 (mod q)
mers_mod_square: Init threadpool of 4 threads
Using 4 threads in carry step

c_uint64_ptr[j] = 1

j = 178076
ERROR: at line 2159 of file ../src/Mlucas.c
Assertion failed: d[]*PRP_BASE result has unexpected carryout!
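The printed `j = 178076` matches the number of 64-bit words needed to hold a residue of p = 11396863 bits. The sketch below only checks that arithmetic. Reading `j` as the word count is an inference from the numbers, and `words_for_bits` and `spare_bits` are hypothetical helpers, not Mlucas functions.

```c
#include <stdint.h>

/* Hypothetical helper: number of 64-bit words needed to hold `bits` bits. */
static uint64_t words_for_bits(uint64_t bits)
{
    return (bits + 63) / 64;   /* ceiling division by 64 */
}

/* Hypothetical helper: unused bits left in the top word when `bits`
   bits are stored in whole 64-bit words. */
static uint64_t spare_bits(uint64_t bits)
{
    return words_for_bits(bits) * 64 - bits;
}
```

For p = 11396863 this gives 178076 words with 1 spare bit in the top word. This is arithmetic only and does not by itself establish the cause of the failure.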

Author

efvb commented Jun 25, 2024

which -fft value should I try?

Member

tdulcet commented Jun 25, 2024

Thank you for the additional information. From the provided .stat file, I see the max ROE before this failure was 0.3125, which should be OK.
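For context, ROE in FFT-based squaring is usually measured as the distance between each floating-point output and the nearest integer. A value approaching 0.5 means the rounding can no longer be trusted, which is why 0.3125 is considered acceptable. The following is a minimal sketch of that measurement. `frac_error` and `max_frac_error` are hypothetical names, and Mlucas's own error tracking is not shown in this thread.

```c
#include <stddef.h>

/* Hypothetical helper: distance from x to the nearest integer.
   Assumes |x| is small enough to fit in a long long. */
static double frac_error(double x)
{
    /* Round half away from zero by offsetting, then truncating. */
    double nearest = (double)(long long)(x + (x >= 0.0 ? 0.5 : -0.5));
    double err = x - nearest;
    return err < 0.0 ? -err : err;
}

/* Hypothetical helper: largest fractional error over n outputs. */
static double max_frac_error(const double *x, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double e = frac_error(x[i]);
        if (e > worst)
            worst = e;
    }
    return worst;
}
```

With the made-up inputs `{3.0, 7.25, -2.3125, 10.0}`, `max_frac_error` returns 0.3125, the same magnitude as the maximum reported in the .stat file.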

> Now compiled without -O3

Sorry, I meant building it without Link Time Optimization (LTO) enabled, so without the -flto=auto flag. It may be easier to just run the makemake.sh script, which should automatically use the correct (known stable) compiler options for the system.

> which -fft value should I try?

The next largest one listed in your mlucas.cfg file, most likely 640K, so try with -fft 640.

CC: @xanthe-cat

Author

efvb commented Jun 26, 2024

@tdulcet,
I really meant to write "without LTO" instead of "without -O3".
I used this line:
CFLAGS = -fdiagnostics-color -Wall -g -O3 -mcpu=neoverse-n1 -DUSE_ARM_V8_SIMD # -flto=auto

I used -fft 640 and the same error happened.

I will try a run from the beginning without LTO to reproduce the issue.

Author

efvb commented Jun 26, 2024

Same issue from a fresh run without LTO.

...
INFO: Maximum recommended exponent for FFT length (576 Kdbl) = 11450805; p[ = 11396863]/pmax_rec = 0.9952892395.
Initial DWT-multipliers chain length = [hiacc] in carry step.
INFO: restart file p11396863 found...reading...
Computing 5990000-squaring residue R (mod known prime q = 61547191186328636909209)
A: R == 26966841112747424620161 (mod q)
B: R == 26966841112747424620161 (mod q)
mers_mod_square: Init threadpool of 4 threads
Using 4 threads in carry step
c_uint64_ptr[j] = 1

j = 178076
ERROR: at line 2159 of file ../src/Mlucas.c
Assertion failed: d[]*PRP_BASE result has unexpected carryout!

tdulcet added the bug (Something isn't working) label Jun 26, 2024

Member

tdulcet commented Jul 3, 2024

Thanks for running it again and for the information.


Another user reported a similar issue today when performing a regular PRP test. Here is the output they provided:

Mlucas 20.1.1

    http://www.mersenneforum.org/mayer/README.html

INFO: testing qfloat routines...
INFO: 32768 MB of available system RAM detected.
CPU Family = ARM Embedded ABI, OS = OS X, 64-bit Version, compiled with Gnu-C-compatible [llvm/clang], Version 15.0.0 (clang-1500.3.9.4).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using prefetch.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation. 
INFO: testing IMUL routines...
INFO: Testing 64-bit 2^p (mod q) functions with 100000 random (p, q odd) pairs...
INFO: System has 10 available processor cores.
INFO: testing FFT radix tables...
User did not set LowMem in mlucas.ini ... allowing all test types.
User did not set CheckInterval in mlucas.ini ... using default.
No CPU set or threadcount specified ... running single-threaded.
Set affinity for the following 1 cores: 0.
Setting ITERS_BETWEEN_CHECKPOINTS = 10000.
 looking for worktodo.txt file...
 worktodo.txt file found...reading next assignment...
 worktodo.txt entry: PRP=3425336B523EA2F0C86FD2E9670FE12F,1,2,127979711,-1,77,0
INFO: Maximum recommended exponent for FFT length (7168 Kdbl) = 134847983; p[ = 127979711]/pmax_rec = 0.9490665574.
Initial DWT-multipliers chain length = [long] in carry step.
 INFO: restart file p127979711 found...reading...
mers_mod_square: Init threadpool of 1 threads
Using 1 threads in carry step
ERROR: at line 2157 of file ../src/Mlucas.c
Assertion failed: d[]*PRP_BASE result has unexpected carryout!

And here is the end of their .stat file for this exponent:

[2024-07-01 21:16:23] M127979711 Iter# = 4000000 [ 3.13% complete] clocks = 00:08:09.197 [ 48.9198 msec/iter] Res64: EE265C0AAA7538E9. AvgMaxErr = 0.103215234. MaxErr = 0.156250000. Residue shift count = 56244167.
Restarting M127979711 at iteration = 3990000. Res64: F877F2E964401A12, residue shift count = 37917547
M127979711: using FFT length 7168K = 7340032 8-byte floats, initial residue shift count = 37917547
This gives an average   17.435851914542063 bits per digit
The test will be done in form of a 3-PRP test.
Using complex FFT radices       224        32        32        16
[2024-07-03 08:07:25] M127979711 Iter# = 4000000 [ 3.13% complete] clocks = 00:08:14.202 [ 49.4202 msec/iter] Res64: EE265C0AAA7538E9. AvgMaxErr = 0.103210938. MaxErr = 0.156250000. Residue shift count = 56244167.
Restarting M127979711 at iteration = 3990000. Res64: F877F2E964401A12, residue shift count = 37917547
M127979711: using FFT length 7168K = 7340032 8-byte floats, initial residue shift count = 37917547
This gives an average   17.435851914542063 bits per digit
The test will be done in form of a 3-PRP test.
Using complex FFT radices       224        32        32        16
[2024-07-03 11:25:23] M127979711 Iter# = 4000000 [ 3.13% complete] clocks = 00:08:17.074 [ 49.7075 msec/iter] Res64: EE265C0AAA7538E9. AvgMaxErr = 0.103210938. MaxErr = 0.156250000. Residue shift count = 56244167.
Restarting M127979711 at iteration = 3990000. Res64: F877F2E964401A12, residue shift count = 37917547
M127979711: using FFT length 7168K = 7340032 8-byte floats, initial residue shift count = 37917547
This gives an average   17.435851914542063 bits per digit
The test will be done in form of a 3-PRP test.
Using complex FFT radices       224        32        32        16
[2024-07-03 12:20:21] M127979711 Iter# = 4000000 [ 3.13% complete] clocks = 00:08:12.683 [ 49.2684 msec/iter] Res64: EE265C0AAA7538E9. AvgMaxErr = 0.103210938. MaxErr = 0.156250000. Residue shift count = 56244167.
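The "average bits per digit" figure in the .stat excerpt is the exponent divided by the FFT length in doubles (7168K = 7340032). A small sketch of that calculation follows. `avg_bits_per_digit` is a hypothetical helper, and treating "K" as 1024 follows the 7168K = 7340032 line in the log.

```c
#include <stdint.h>

/* Hypothetical helper: average number of exponent bits carried by each
   double in the FFT, given the exponent p and the FFT length in Kdoubles. */
static double avg_bits_per_digit(uint64_t p, uint64_t fft_kdbl)
{
    uint64_t n_doubles = fft_kdbl * 1024;   /* e.g. 7168K = 7340032 doubles */
    return (double)p / (double)n_doubles;
}
```

`avg_bits_per_digit(127979711, 7168)` gives about 17.4359, matching the log. By the same formula, the first report's exponent 11396863 at 576K comes to about 19.32 bits per digit.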

Forcing a larger FFT length also did not help:

% ./Mlucas -fft 8192

    Mlucas 20.1.1

    http://www.mersenneforum.org/mayer/README.html

INFO: testing qfloat routines...
INFO: 32768 MB of available system RAM detected.
CPU Family = ARM Embedded ABI, OS = OS X, 64-bit Version, compiled with Gnu-C-compatible [llvm/clang], Version 15.0.0 (clang-1500.3.9.4).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
INFO: Using prefetch.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation. 
INFO: testing IMUL routines...
INFO: Testing 64-bit 2^p (mod q) functions with 100000 random (p, q odd) pairs...
INFO: System has 10 available processor cores.
INFO: testing FFT radix tables...
User did not set LowMem in mlucas.ini ... allowing all test types.
User did not set CheckInterval in mlucas.ini ... using default.
No CPU set or threadcount specified ... running single-threaded.
Set affinity for the following 1 cores: 0.
Setting ITERS_BETWEEN_CHECKPOINTS = 10000.
 looking for worktodo.txt file...
 worktodo.txt file found...reading next assignment...
 worktodo.txt entry: PRP=3425336B523EA2F0C86FD2E9670FE12F,1,2,127979711,-1,77,0
INFO: Maximum recommended exponent for FFT length (7168 Kdbl) = 134847983; p[ = 127979711]/pmax_rec = 0.9490665574.
Initial DWT-multipliers chain length = [long] in carry step.
 INFO: restart file p127979711 found...reading...
mers_mod_square: Init threadpool of 1 threads
Using 1 threads in carry step
ERROR: at line 2157 of file ../src/Mlucas.c
Assertion failed: d[]*PRP_BASE result has unexpected carryout!

Both of the affected users are using ARM systems with ASIMD.
