Description
Describe the bug
Systems larger than approximately 831-833 atoms always crash with a segmentation fault. The crash does not seem to depend on the composition of the system (I tried several types, from one long linear molecule to many small molecules with different atoms, and all behave the same) or on the coordinates (molecules near each other in different orientations, or very far apart). It also does not seem related to the OpenMP stack size.
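Since the failure appears to track atom count rather than chemistry or geometry, synthetic inputs of any size are a convenient way to bisect the threshold. The following is a minimal sketch, not the script used to make the gist files: the function name `water_grid_xyz`, the 3.0 Å grid spacing, and the approximate water geometry (O-H 0.96 Å, H-O-H 104.5°) are all assumptions chosen for illustration.

```python
import itertools
import math


def water_grid_xyz(n_molecules, spacing=3.0):
    """Return XYZ-format text for n_molecules waters on a cubic grid (Angstrom).

    Hypothetical helper: geometry and spacing are illustrative assumptions.
    """
    # Approximate water geometry: O at the origin, two H atoms in the xy-plane.
    half_angle = math.radians(104.5) / 2
    hx = 0.96 * math.sin(half_angle)
    hy = 0.96 * math.cos(half_angle)
    template = [("O", 0.0, 0.0, 0.0), ("H", hx, hy, 0.0), ("H", -hx, hy, 0.0)]

    # Smallest cube edge (in grid cells) that holds all molecules.
    side = 1
    while side ** 3 < n_molecules:
        side += 1

    # XYZ header: atom count, then a free-form comment line.
    lines = [str(3 * n_molecules), f"{n_molecules} water molecules on a grid"]
    cells = itertools.islice(itertools.product(range(side), repeat=3), n_molecules)
    for i, j, k in cells:
        for symbol, x, y, z in template:
            lines.append(
                f"{symbol} {x + i * spacing:12.6f} "
                f"{y + j * spacing:12.6f} {z + k * spacing:12.6f}"
            )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 277 waters = 831 atoms, 278 waters = 834 atoms.
    for n in (277, 278):
        with open(f"grid_water{n}.xyz", "w") as handle:
            handle.write(water_grid_xyz(n))
```

The output file names `grid_water277.xyz` and `grid_water278.xyz` are placeholders, kept distinct from the gist files so the two sets of inputs are not confused.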
To Reproduce
Using the provided water278.xyz file:
https://gist.github.com/aizvorski/641a987e7dfa89eba4ce241c68409768#file-water278-xyz
$ OMP_NUM_THREADS=1 OMP_MAX_ACTIVE_LEVELS=1 OMP_STACKSIZE=200G time -v /home/ubuntu/bin/xtb-6.5.1/bin/xtb water278.xyz --gfn 2 --chrg "0"
...
* xtb version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11
...
...................................................
: SETUP :
:.................................................:
: # basis functions 1668 :
: # atomic orbitals 1668 :
: # shells 1112 :
: # electrons 2224 :
: max. iterations 250 :
: Hamiltonian GFN2-xTB :
: restarted? false :
: GBSA solvation false :
: PC potential false :
: electronic temp. 300.0000000 K :
: accuracy 1.0000000 :
: -> integral cutoff 0.2500000E+02 :
: -> integral neglect 0.1000000E-07 :
: -> SCF convergence 0.1000000E-05 Eh :
: -> wf. convergence 0.1000000E-03 e :
: Broyden damping 0.4000000 :
...................................................
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
xtb 000000000305452D Unknown Unknown Unknown
xtb 0000000003271BC0 Unknown Unknown Unknown
xtb 000000000099DF21 xtb_disp_coordina 396 coordinationnumber.f90
xtb 00000000031D4B83 Unknown Unknown Unknown
xtb 0000000003186C16 Unknown Unknown Unknown
xtb 0000000003155085 Unknown Unknown Unknown
xtb 000000000099DCA0 xtb_disp_coordina 396 coordinationnumber.f90
xtb 000000000099B2C8 xtb_disp_coordina 340 coordinationnumber.f90
xtb 00000000008E7399 xtb_scf_mp_scf_.A 519 scf_module.F90
xtb 00000000006125A3 xtb_xtb_calculato 257 calculator.f90
xtb 000000000041800F xtb_prog_main_mp_ 580 main.F90
xtb 000000000042512B MAIN__ 55 primary.f90
xtb 00000000004020EE Unknown Unknown Unknown
xtb 0000000003273060 Unknown Unknown Unknown
xtb 0000000000401FD7 Unknown Unknown Unknown
Command exited with non-zero status 174
Command being timed: "/home/ubuntu/bin/xtb-6.5.1/bin/xtb water278.xyz --gfn 2 --chrg 0"
User time (seconds): 0.15
System time (seconds): 0.03
Percent of CPU this job got: 97%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.19
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 108560
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 28220
Voluntary context switches: 1
Involuntary context switches: 449
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 174
For comparison, the input file water277.xyz, with one fewer water molecule, succeeds:
https://gist.github.com/aizvorski/7b4215388491126090ba83b6ae4ab341#file-water277-xyz
$ OMP_NUM_THREADS=1 OMP_MAX_ACTIVE_LEVELS=1 OMP_STACKSIZE=200G time -v /home/ubuntu/bin/xtb-6.5.1/bin/xtb water277.xyz --gfn 2 --chrg "0"
...
* xtb version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11
...
...................................................
: SETUP :
:.................................................:
: # basis functions 1662 :
: # atomic orbitals 1662 :
: # shells 1108 :
: # electrons 2216 :
: max. iterations 250 :
: Hamiltonian GFN2-xTB :
: restarted? true :
: GBSA solvation false :
: PC potential false :
: electronic temp. 300.0000000 K :
: accuracy 1.0000000 :
: -> integral cutoff 0.2500000E+02 :
: -> integral neglect 0.1000000E-07 :
: -> SCF convergence 0.1000000E-05 Eh :
: -> wf. convergence 0.1000000E-03 e :
: Broyden damping 0.4000000 :
...................................................
iter E dE RMSdq gap omega full diag
1 -1415.6386943 -0.141564E+04 0.204E-07 8.73 0.0 T
2 -1415.6386943 0.886757E-11 0.119E-07 8.73 29040.2 T
3 -1415.6386943 -0.106866E-10 0.207E-08 8.73 100000.0 T
*** convergence criteria satisfied after 3 iterations ***
# Occupation Energy/Eh Energy/eV
-------------------------------------------------------------
1 2.0000 -0.7271272 -19.7861
... ... ... ...
1102 2.0000 -0.3682050 -10.0194
1103 2.0000 -0.3664023 -9.9703
1104 2.0000 -0.3625255 -9.8648
1105 2.0000 -0.3584824 -9.7548
1106 2.0000 -0.3570151 -9.7149
1107 2.0000 -0.3556497 -9.6777
1108 2.0000 -0.3359206 -9.1409 (HOMO)
1109 -0.0151621 -0.4126 (LUMO)
1110 -0.0061251 -0.1667
1111 0.0011029 0.0300
1112 0.0020212 0.0550
1113 0.0029399 0.0800
... ... ...
1662 0.4675880 12.7237
-------------------------------------------------------------
HL-Gap 0.3207585 Eh 8.7283 eV
Fermi-level -0.1755413 Eh -4.7767 eV
SCC (total) 0 d, 0 h, 0 min, 17.350 sec
SCC setup ... 0 min, 0.037 sec ( 0.211%)
Dispersion ... 0 min, 0.080 sec ( 0.462%)
classical contributions ... 0 min, 0.011 sec ( 0.063%)
integral evaluation ... 0 min, 0.634 sec ( 3.651%)
iterations ... 0 min, 11.684 sec ( 67.342%)
molecular gradient ... 0 min, 4.016 sec ( 23.145%)
printout ... 0 min, 0.889 sec ( 5.125%)
:::::::::::::::::::::::::::::::::::::::::::::::::::::
:: SUMMARY ::
:::::::::::::::::::::::::::::::::::::::::::::::::::::
:: total energy -1405.892124104588 Eh ::
:: gradient norm 0.203225946340 Eh/a0 ::
:: HOMO-LUMO gap 8.728283439762 eV ::
::.................................................::
:: SCC energy -1415.638694336316 Eh ::
:: -> isotropic ES 8.569566870483 Eh ::
:: -> anisotropic ES -0.289563022977 Eh ::
:: -> anisotropic XC -0.213130853940 Eh ::
:: -> dispersion -0.253146647874 Eh ::
:: repulsion energy 9.734657198491 Eh ::
:: add. restraining 0.000000000000 Eh ::
:: total charge -0.000000000003 e ::
:::::::::::::::::::::::::::::::::::::::::::::::::::::
...
-------------------------------------------------
| TOTAL ENERGY -1405.892124104588 Eh |
| GRADIENT NORM 0.203225946340 Eh/α |
| HOMO-LUMO GAP 8.728283439762 eV |
-------------------------------------------------
------------------------------------------------------------------------
* finished run on 2022/10/02 at 00:43:41.395
------------------------------------------------------------------------
total:
* wall-time: 0 d, 0 h, 0 min, 18.069 sec
* cpu-time: 0 d, 0 h, 0 min, 18.065 sec
* ratio c/w: 1.000 speedup
SCF:
* wall-time: 0 d, 0 h, 0 min, 17.377 sec
* cpu-time: 0 d, 0 h, 0 min, 17.376 sec
* ratio c/w: 1.000 speedup
normal termination of xtb
Command being timed: "/home/ubuntu/bin/xtb-6.5.1/bin/xtb water277.xyz --gfn 2 --chrg 0"
User time (seconds): 17.69
System time (seconds): 0.37
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:18.07
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 594568
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 228439
Voluntary context switches: 1
Involuntary context switches: 483
Swaps: 0
File system inputs: 0
File system outputs: 368
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
This does not appear to be caused by running out of memory or by too low a setting for OMP_STACKSIZE. The test machine has more than 200 GB of memory, and the memory actually in use when the crash happens (maximum resident set size reported by time -v) is just over 100 MB.
Setting the stack size deliberately low with the largest input that succeeds, water277.xyz:
- OMP_STACKSIZE=1M OMP_NUM_THREADS=1 succeeds; the stack size does not seem to matter when there is only one thread
- OMP_STACKSIZE=50M OMP_NUM_THREADS=2 succeeds
- OMP_STACKSIZE=20M OMP_NUM_THREADS=2 fails, and the exact failure seems non-deterministic: either SIGSEGV in xtb_coulomb_klopm during the iterations, or "Command terminated by signal 11" after the iterations finish
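The stack-size experiment above can be automated so that each combination is run the same way. This is a rough sketch under stated assumptions: `sweep` is a hypothetical helper, and it only sets the three environment variables already used in this report before launching whatever command it is given.

```python
import itertools
import os
import subprocess


def sweep(command, stack_sizes, thread_counts):
    """Run command once per (OMP_STACKSIZE, OMP_NUM_THREADS) pair.

    Returns a dict mapping (stack_size, threads) to the process return code.
    Hypothetical helper for illustration only.
    """
    results = {}
    for stack, threads in itertools.product(stack_sizes, thread_counts):
        # Copy the current environment and override the OpenMP settings.
        env = dict(
            os.environ,
            OMP_STACKSIZE=stack,
            OMP_NUM_THREADS=str(threads),
            OMP_MAX_ACTIVE_LEVELS="1",
        )
        proc = subprocess.run(
            command,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX a negative return code means the child was killed by that
        # signal number, so -11 corresponds to SIGSEGV.
        results[(stack, threads)] = proc.returncode
    return results


if __name__ == "__main__":
    xtb_cmd = [
        "/home/ubuntu/bin/xtb-6.5.1/bin/xtb",
        "water277.xyz", "--gfn", "2", "--chrg", "0",
    ]
    for combo, code in sweep(xtb_cmd, ["1M", "20M", "50M"], [1, 2]).items():
        print(combo, code)
```

Because the 20M, two-thread failure looks non-deterministic, a single run per combination may not be conclusive; calling `sweep` several times and comparing the results would give a better picture.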
GDB backtrace:
$ OMP_NUM_THREADS=1 OMP_MAX_ACTIVE_LEVELS=1 OMP_STACKSIZE=200G gdb /home/ubuntu/bin/xtb-6.5.0/bin/xtb
(gdb) run water278.xyz --gfn 1 --chrg "0"
Program received signal SIGSEGV, Segmentation fault.
0x000000000099cf41 in xtb_disp_coordinationnumber_mp_ncoordlatp_.A ()
(gdb) bt
#0 0x000000000099cf41 in xtb_disp_coordinationnumber_mp_ncoordlatp_.A ()
#1 0x00000000031d3f83 in __kmp_invoke_microtask ()
#2 0x0000000003186016 in __kmp_fork_call ()
#3 0x0000000003154485 in __kmpc_fork_call ()
#4 0x000000000099ccc0 in xtb_disp_coordinationnumber_mp_ncoordlatp_.A ()
#5 0x000000000099a438 in xtb_disp_coordinationnumber_mp_getcoordinationnumberlp_ ()
#6 0x00000000008e6429 in xtb_scf_mp_scf_.A ()
#7 0x0000000000611d33 in xtb_xtb_calculator_mp_singlepoint_.A ()
#8 0x00000000004177f3 in xtb_prog_main_mp_xtbmain_.A ()
#9 0x000000000042492b in MAIN__ ()
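The backtrace shows the fault inside an OpenMP parallel region reached from MAIN__, which with OMP_NUM_THREADS=1 runs on the initial thread. OMP_STACKSIZE sets the stack size of threads created by the OpenMP runtime, whereas the initial thread's stack is bounded by the process limit (what `ulimit -s` reports). Whether that limit plays any role in this crash is an untested assumption, not a diagnosis. The sketch below, using hypothetical helper names and POSIX-only calls, reports the current limit and reruns a command with the soft limit raised to the hard limit so the two cases can be compared.

```python
import resource
import subprocess


def describe_stack_limit():
    """Return the soft and hard RLIMIT_STACK values as a readable string."""
    def fmt(value):
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value // 1024} kB"

    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return f"soft={fmt(soft)} hard={fmt(hard)}"


def run_with_max_stack(command):
    """Run command in a child whose soft stack limit equals the hard limit.

    The limit is changed only in the child, between fork and exec, so the
    calling process keeps its own limit. Returns the child's return code.
    """
    def raise_limit():
        _, hard = resource.getrlimit(resource.RLIMIT_STACK)
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))

    return subprocess.run(command, preexec_fn=raise_limit).returncode


if __name__ == "__main__":
    print("stack limit:", describe_stack_limit())
    code = run_with_max_stack([
        "/home/ubuntu/bin/xtb-6.5.1/bin/xtb",
        "water278.xyz", "--gfn", "2", "--chrg", "0",
    ])
    print("return code:", code)
```

If the return code is the same with the raised limit as without it, the process stack limit can be ruled out as a factor.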
Expected behaviour
No crash.
Additional context
Using xtb 6.5.1 binary downloaded from https://github.com/grimme-lab/xtb/releases/download/v6.5.1/xtb-6.5.1-linux-x86_64.tar.xz
xtb --version gives version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11
OS: Ubuntu 18.04.4 LTS
Hardware: AMD EPYC 7B13 CPU, 224GB RAM
(also tested on Ubuntu 20.04 LTS, Intel i7-10510U, 48GB RAM: same behavior)
(also tested on xtb-6.5.0 and 6.4.1: same)