Skip to content

sethtroisi/prime-gap

Folders and files

NameName
Last commit message
Last commit date

Latest commit

b64c160 · Feb 3, 2025
Feb 2, 2021
Jan 19, 2021
Oct 28, 2020
Aug 9, 2024
Aug 25, 2023
Aug 25, 2023
Jan 19, 2021
Aug 25, 2023
Feb 5, 2020
Jul 16, 2021
Aug 25, 2023
Aug 25, 2023
Oct 1, 2021
Feb 3, 2025
Aug 3, 2021
Aug 9, 2024
Aug 3, 2021
Aug 5, 2021
Aug 2, 2021
Aug 5, 2021
Aug 25, 2023
Feb 3, 2025
Aug 2, 2021
Aug 2, 2021
Aug 9, 2024
Jul 28, 2021
Aug 25, 2023
May 18, 2021
Jul 14, 2021

Repository files navigation

Combined Sieve - a new program to find prime gaps.

A fast prime gap searching suite.

Table of Contents

Table of contents generated with markdown-toc

Overview

This is a suite of tools (combined_sieve, gap_stats, gap_test.py and gap_test_gpu) which are useful for searching for prime gaps centered around m * P#/d.

The flow is to sieve many intervals with combined_sieve --save -u <filename>

Then calculate statistics about these intervals with gap_stats --save -u <filename>

Then test some portion of these intervals with gap_test.py -u <filename>.txt --prp-top-percent 25

To quickly test a known 30 merit prime gap (Setup must already be complete)

$ make
$ sqlite3 test.db < schema.sql
$ FN=809_1841070_11244000_1000_s15000_l200M.txt
$ ./combined_sieve --save --search-db test.db -u $FN
$ ./gap_stats --save --search-db test.db -u $FN --min-merit 20
$ ./gap_test.py --search-db test.db -u $FN
...
23142  30.1485  11244911 * 809#/1841070 -5788 to +17354	RECORD!
...

see [EXTENDED.md](for labourious notes and details)

Tools

See detailed instructions on each combined_sieve, gap_stats, gap_test, and more in EXTENDED.md tools.

Flow

  1. combined_sieve
    • Takes a set of parameters ([m_start, m_start + m_inc), P#, d, sieve-length, max-prime)
      • Use combined_sieve --mstart 1 --minc 100 --max-prime 1 -d 4 -p <P> for some insight on choose of d
    • Performs combined sieve algorithm
    • Produces <PARAMS>.txt, list of all numbers near m * P#/d with no small factors.
      • each row is m: -count_low, +count_high | -18 -50 -80 ... | +6 +10 +12 ...
    • Writes initial range entry (with time_sieve) into range table.
  2. gap_stats
    • Calculates some statistics over each range in the interval
      • Expected gap in both directions
      • Probability of being a record gap, of being a missing gap, of being gap > --min-merit
    • Store statistics in m_stats table
    • Compute prob(gap) over all m and save in range_stats table
  3. gap_test_simple / gap_test.py --unknown-filename <PARAM>.txt
    • Runs PRP tests on most likely m's from <PARAM>.txt
    • Stores results (as they are calculated in result & m_stats
      • Allows for saving of progress and graceful restart
    • gap_test.py can produce graphs of distribution (via --plots)

Database

Often I run combined_sieve and gap_stats for a BUNCH of ranges not I never end up testing.

misc/show_ranges.sh updates and shows ranges from prime-gap-search.db.

misc/drop_range.sh deletes ranges when none of them have ever been tested

misc/record_check.py searches through prime-gap-search.db for records (over gaps.db)

misc/finaliz.py -p <P> -d <D> dumps m_stats to <UNKNOWN>.csv and <UNKNOWN>.stats.txt files. It also DELETES DATA from prime-gap-search.db use it when you are done searching that P#/d


misc/run.sh <UNKNOWN_FILENAME> runs combined_sieve, gap_stats, and prints the gap_test.py command

misc/status.py to check if a filename has been processed 100% (and can be deleted)

Benchmarks

See EXTENDED.md benchmark and slightly out of date BENCHMARKS.md

Theory

Theory and justification for some calculations in present in THEORY.md

Setup

#$ sudo apt install libgmp10 libgmp-dev
#$ sudo apt install mercurial build-essential automake autoconf bison make libtool texinfo m4
# Building gmp from source is required until 6.3
$ hg clone https://gmplib.org/repo/gmp/
$ cd gmp
$ ./.bootstrap
$ mkdir build
$ cd build
$ ../configure
$ make
$ make check
$ make install
$ sudo apt install sqlite3

$ sudo apt install libmpfr-dev libmpc-dev libomp-dev libsqlite3-dev libbenchmark-dev

# Make sure this is >= python3.7 and not python2
$ python -m pip install --user gmpy2==2.1.0b5 primegapverify

# For misc/double_check.py
$ sudo apt install gmp-ecm
$ sudo apt install sympy
$ git clone https://github.com/sethtroisi/prime-gap.git
$ cd prime-gap

$ mkdir -p logs/ unknowns/
$ sqlite3 prime-gap-search.db < schema.sql

# There are a handful of constants (MODULE_SEARCH_SECS, PRIME_RANGE_SEC, ...)
# in gap_common.cpp `combined_sieve_method2_time_estimate` that can be set for
# more accurate `combined_sieve` timing.
  • prime-gap-list
    $ git clone --depth 10 https://github.com/primegap-list-project/prime-gap-list.git
    $ sqlite3 gaps.db < prime-gap-list/allgaps.sql

Notes

  • gap_test_gpu is experimental using https://github.com/NVlabs/CGBN
    • It's much faster for <2,048 bits.
  • Clang (CC=clang++-12) might speed up combined_sieve slightly.
  • There is some support for using OpenPFGW PrimeWiki SourceForge
    • Unpack somewhere and link binary as pfgw64
    • may also need to chmod a+x pfgw64
    • modify is_prime in gap_utils.py
  • Multiple layers of verification of combined_sieve
    • Can compare --method1 output with --method2
    • Can add use make clean combined_sieve VALIDATE_FACTORS=1 or make clean combined_sieve VALIDATE_LARGE=1
    • misc/double_check.py double checks using ecm, to check that small factor aren't found for numbers in unknown-file.txt
      • python misc/double_check.py --unknown-filename <unknown_file.txt> -c 10
    • skipped PRP/s is checked in THEORY.md
  • Compression
    • method1 doesn't support compression at this time so it also makes verifying combined_sieve slightly harder.
      • misc/convert_rle.py can convert existing files to RLE if you want to double check / shrink old files.
    • Run Length Encoding
      • can be forced with --rle` saves ~60% of space
    • --bitcompress
      • Saves ~95% of space, but makes code very hard
      • Stores 7 bits per byte (to avoid ascii control characters) of is_composite array
      • Reduces size of is_composite by removing any number coprime to K# or d

Quick test of all functions

This is somewhat automated with ./misc/tests.sh which should print a green Success!

quick test commands and output

$ PARAMS="-p 907 -d 2190 --mstart 1 --minc 200 --max-prime 100 --sieve-length 11000"
$ make combined_sieve gap_stats gap_test_simple
$ time ./combined_sieve --method1 -qqq --save-unknowns $PARAMS
$ time ./combined_sieve           -qqq --save-unknowns $PARAMS
$ md5sum unknowns/907_2190_1_200_s11000_l100M.{txt,m1.txt}
15a5cbff7301262caf047028c05f0525  unknowns/907_2190_1_200_s11000_l100M.txt
15a5cbff7301262caf047028c05f0525  unknowns/907_2190_1_200_s11000_l100M.m1.txt

$ ./gap_stats --unknown-filename 907_2190_1_200_s11000_l100M.txt
# Verify RECORD @ 1,7,13,101,137
# Verify "avg missing prob : 0.0000000"
# Verify "RECORD : top  50% (    26) sum(prob) = 1.50e-05 (avg: 5.77e-07)"
# Verify "RECORD : top 100% (    53) sum(prob) = 2.15e-05 (avg: 4.06e-07)"

# Optionally validate combined_sieve results with ecm
$ python misc/double_check.py --seed=123 --unknown-filename unknowns/907_2190_1_200_s11000_l100M.txt -c 5 --B1 1000
1 * 907# / 2190 	unknowns 218 + 212 = 430
			+1423  had trivial factor of 2 skipping
	+6570 	should have small factor
		ecm: 2099 (1*907#/2190+6570)/2099
			+1373  had trivial factor of 2 skipping
			-1439  had trivial factor of 2 skipping
			+7789  had trivial factor of 2 skipping
			-2306  had trivial factor of 3 skipping
			+4377  had trivial factor of 2 skipping
	-4222 	should have small factor
		ecm: 3325199 (1*907#/2190+-4222)/3325199
		factor 3325199: 1319 2521
...
17 * 907# / 2190 	unknowns 216 + 208 = 424
	+2722 	shouldn't have small factor
		ecm: 17*907#/2190+2722
...
4150 trivially composite, 86 unknowns, 179 known composites
ecm found 178 composites: known 171 composite, 7 were unknown


$ ./gap_test_simple --unknown-filename 907_2190_1_200_s11000_l100M.txt -q --min-merit 8
Testing m * 907#/2190, m = 1 + [0, 200)
Min Gap ~= 7734 (for merit > 9.0)

Reading unknowns from '907_2190_1_200_s11000_l100M.txt'

53 tests M_start(1) + mi(0 to 198)

7750  9.0185  1 * 907#/2190 -450 to +7300
	m=1  218 <- unknowns -> 212 	 450 <- gap -> 7300
	    tests     1          (15.29/sec)  0 seconds elapsed
	    unknowns  430        (avg: 430.00), 98.05% composite  50.70% <- % -> 49.30%
	    prp tests 144        (avg: 144.00) (2201.5 tests/sec)
7462  8.6548  17 * 907#/2190 -6982 to +480
6938  8.0460  19 * 907#/2190 -6722 to +216
	m=37  214 <- unknowns -> 234 	1296 <- gap -> 250
	    tests     10         (34.35/sec)  0 seconds elapsed
	    unknowns  4262       (avg: 426.20), 98.06% composite  48.97% <- % -> 51.03%
	    prp tests 643        (avg: 64.30) (2208.7 tests/sec)
8520  9.8689  53 * 907#/2190 -4084 to +4436
	m=113  207 <- unknowns -> 215 	  40 <- gap -> 972
	    tests     30         (41.87/sec)  1 seconds elapsed
	    unknowns  12676      (avg: 422.53), 98.08% composite  49.31% <- % -> 50.69%
	    prp tests 1550       (avg: 51.67) (2163.0 tests/sec)
8382  9.6979  143 * 907#/2190 -1800 to +6582
7678  8.8820  163 * 907#/2190 -432 to +7246
	m=199  216 <- unknowns -> 205 	 192 <- gap -> 30
	    tests     53         (43.88/sec)  1 seconds elapsed
	    unknowns  22377      (avg: 422.21), 98.08% composite  49.97% <- % -> 50.03%
	    prp tests 2590       (avg: 48.87) (2144.5 tests/sec)


$ python gap_test.py --unknown-filename 907_2190_1_200_s11000_l100M.txt --min-merit 8
...
7750  9.0185  1 * 907#/2190 -7300 to +450
	  1  218 <- unknowns ->  212	7300 <- gap ->  450
7462  8.6548  17 * 907#/2190 -480 to +6982
6938  8.0460  19 * 907#/2190 -216 to +6722
	 37  214 <- unknowns ->  234	 250 <- gap -> 1296
	    tests     10         (35.7/sec)  0 seconds elapsed
	    unknowns  4262       (avg: 426.20), 98.06% composite  48.97% <- % -> 51.03%
	    prp tests 643        (avg: 64.30) (2294.854 tests/sec)
	    best merit this interval: 9.018 (at m=1)
8520  9.8689  53 * 907#/2190 -4436 to +4084
	113  207 <- unknowns ->  215	 972 <- gap ->   40
	    tests     30         (43.1/sec)  1 seconds elapsed
	    unknowns  12676      (avg: 422.53), 98.08% composite  49.31% <- % -> 50.69%
	    prp tests 1550       (avg: 51.67) (2226.529 tests/sec)
	    best merit this interval: 9.869 (at m=53)
8382  9.6979  143 * 907#/2190 -6582 to +1800
7678  8.8820  163 * 907#/2190 -7246 to +432
	199  216 <- unknowns ->  205	  30 <- gap ->  192
	    tests     53         (45/sec)  1 seconds elapsed
	    unknowns  22377      (avg: 422.21), 98.08% composite  49.97% <- % -> 50.03%
	    prp tests 2590       (avg: 48.87) (2199.432 tests/sec)
	    best merit this interval: 9.698 (at m=143)