'Postponed Sieve algorithm with start logic

Based on the answer by Will Ness, I've been using a JavaScript adaptation for the postponed sieve algorithm:

function * primes() {
    yield 2;
    yield 3;
    yield 5;
    yield 7;
    const sieve = new Map();
    const ps = primes();
    ps.next() && ps.next();
    for (let p = 3, i = 9; true; i += 2) {
        let s = sieve.get(i);
        if (s !== undefined) {
            sieve.delete(i);
        } else if (i < p * p) {
            yield i;
            continue;
        } else {
            s = 2 * p;
            p = ps.next().value;
        }
        let k = i + s;
        while (sieve.has(k)) k += s;
        sieve.set(k, s);
    }
}

But now that I need to add start point to it, I am struggling to wrap my head around it, for the logic here isn't straightforward.

When start is a prime, I need it to be the first value. And when start is not a prime, I need the sequence to start with the first prime after start.

Will Ness suggested in one of the comments:

You would have to come up with the valid sieve dictionary for the start point. It's easy to get the needed primes - just run this algo up to sqrt(start), then you need to find each core prime's multiple just above (or is it just below?) the start, while minding the duplicates.

Wielding this into reality however is not that simple (or at least for me) :|

Can anybody help with updating this algorithm for such *primes(start) implementation (preferably in JavaScript as above)?

function * primes(start) {

  // how to amend it to support 'start' logic???

}

Conclusion

Following the excellent answer by Will Ness, I decided to share the final code I used, through a public library - primes-generator. All main algorithms can be found in src/sieve.ts.

Solution 1:^[1]

(update: added working JS code at the bottom of this answer).

This is the sieve of Eratosthenes:

primes = {2,3,...} \ ?_{(p ? primes)} {p², p²+p, ...} = {2} ? oddPrimes , oddPrimes = {3,5,...} \ ?_{(p ? oddPrimes)} {p², p²+2p, ...}

where \ is set difference (read "minus"), ? set union, and ? big union of sets.

An example might illustrate:

{2,3,4,5,6,7,8,9,10,11,12,13,14,15,...}
 \             |
   { 4,  6,  8,| 10,   12,   14,   16,   18, ...}
    \          .
             { 9,      12,      15,      18,    21,  ...}
       \ 
                                               { 25, 30, 35, ...}
          \                                     { 49, 56, 63, ...}
            \                                     { 121, 132, 143, ...}
              \  ........

or for the odd primes,

{3,5,7,9,11,13,15,17,19,21,23,25,27,29,31, ...}
 \                    |
     { 9,      15,    | 21,      27,      33, ...}
   \                          .
                            { 25,            35, ...}
     \                                        { 49, 63, ...}
      \                                         { 121, 143, ...}
       \  ........

The code you refer to in the question implements this approach. At any point in time, when considering a certain candidate, sieve exists in a certain state, as are the rest of the variables in the loop. So we just need to re-create this state directly.

Let's say we're considering i = 19 as the current candidate. At that point we have sieve = { (21, 6) } and p = 5.

This means for a candidate i, sieve contains multiples of all the primes q such that q^2 < i, and p is the next prime after the qs.

Each of the multiples is the smallest one not smaller than i, and there are no duplicates in the sieve. Then it's in a consistent state and can be continued from that point on.

Thus, in pseudocode:

primes() = ..... // as before

primesFrom(start) =
  let
  { primes.next()
  ; ps = takeWhile( (p => p^2 < start) , primes() )
  ; p = primes.next_value()
  ; mults = map( (p => let 
                       { s = 2*p 
                       ; n = (start-p)/s  // integer division NB!
                       ; r = (start-p)%s
                       ; m = p + (r>0 ? n+1 : n)*s
                       }
                       ( m, s) )
                , ps)
  }
  for each (m,s) in mults
    if m in sieve
       increment m by s until m is not in sieve
    add (m,s) to sieve

and then do the same loop as before.

As requested, the JS code:

function *primesFrom(start) {
    if (start <= 2) { yield 2; }
    if (start <= 3) { yield 3; }
    if (start <= 5) { yield 5; }
    if (start <= 7) { yield 7; }
    const sieve = new Map();
    const ps = primesFrom(2);
    ps.next();                 // skip the 2
    let p = ps.next().value;   // p==3
    let psqr = p * p;          // p^2, 9
    let c = psqr;              // first candidate, 9
    let s = 6;                 // step value
    let m = 9;                 // multiple

    while( psqr < start )      // must adjust initial state
    {
         s = 2 * p;            
         m = p + s * Math. ceil( (start-p)/s );  // multiple of p
         while (sieve.has(m)) m += s;
         sieve.set(m, s);
         p = ps.next().value;
         psqr = p * p;
    }
    if ( start > c) { c = start; }
    if ( c%2 === 0 ) { c += 1; }
    
    for ( ;  true ;  c += 2 )     // main loop
    {
        s = sieve.get(c);
        if (s !== undefined) {
            sieve.delete(c);      // c composite
        } else if (c < psqr) {
            yield c;              // c prime
            continue;
        } else {                  // c == p^2
            s = 2 * p;
            p = ps.next().value;
            psqr = p * p;
        }
        m = c + s;
        while (sieve.has(m)) m += s;
        sieve.set(m, s);
    }
}

Correctly produces the first 10 primes above 500,000,000 in no time at all on ideone:

Success #stdin #stdout 0.03s 17484KB
500000003,500000009,500000041,500000057,500000069,
500000071,500000077,500000089,500000093,500000099

Apparently it does so with the whopping recursion depth of 5 (five) invocations.

The power of the repeated squaring! or its inverse, the log log operation:

log₂( log₂( 500000000 )) == 4.85

Solution 2:^[2]

I have modified Will Ness's amazing answer

To allow any start value for printing. As explained below, I think there is no way to avoid it having to compute all the earlier primes, if you want an infinite sequence of higher primes.
Adjusted variable names to assist understanding the algorithm.
Changed what is stored in the map from 2 times a prime factor, to just the prime factor, to make it easier for readers to follow the algorithm.
Moved one part of the code into a subfunction, again for ease of reader understanding.
Changed the control flow of the 3-way choice in the middle of the algorithm, and added comments, that simplify understanding. It is probably very slightly slower, because it no longer tests the commonest-true condition first, but it is easier for readers.

function* primeGenerator() {
  yield 2;
  yield 3;
  yield 5;
  yield 7;
  const multiplesWithAFactor = new Map();

  // We only need to consider potential factors  that are themselves prime
  // Start with 3 and get further prime factors on demand from a child instance of ourself.
  let biggestFactorConsidered = 3;
  const childStreamOfPrimes = primeGenerator();
  childStreamOfPrimes.next(); // Discard the 2
  childStreamOfPrimes.next(); // Discard the 3

  let candidate = 7; // We have already yielded 7 as a prime above, so on the first iteration of the while loop we will be testing 9.
  while (true) {
    candidate += 2;

    // Assess candidate, into one of three outcomes: square, non-square multiple, prime. This order is arranged for ease of understanding, but for maximum speed efficiency you should test in the order in Will Ness's version.
    const factorOfCandidate = multiplesWithAFactor.get(candidate);
    
    if (candidate === biggestFactorConsidered * biggestFactorConsidered) {
    // A square, so from now on we will need to consider the next factor, too.
      markNextUnmarkedMultiple(candidate, biggestFactorConsidered);
      biggestFactorConsidered = childStreamOfPrimes.next().value;
    } else if (factorOfCandidate) {
    // A non-square multiple. Can forget that fact now, and instead mark the next unmarked multiple of the same prime factor
      multiplesWithAFactor.delete(candidate);
      markNextUnmarkedMultiple(candidate, factorOfCandidate);
    } else {
    // Prime
      yield candidate;
    }
  }

  // This is a subfunction for ease of understanding, but for maximum efficiency you should put this in the while loop above, and have the "yield candidate" avoid calling it, via a "continue" statement.
  function markNextUnmarkedMultiple(candidate, factor) {
    let nextMultiple = candidate + 2 * factor;
    while (multiplesWithAFactor.has(nextMultiple)) {
      nextMultiple += 2 * factor;
    }
    multiplesWithAFactor.set(nextMultiple, factor);
  }
}

const maxItems = 20;
const minimumPrintablePrime = 5e8;
console.log("Running");
const primeObject = primeGenerator();
let count = 0;
do {
  const prime = primeObject.next().value;
  if (prime > minimumPrintablePrime) { // If you don't like seeing this process of reading non-printed primes out here in the parent code, you can do it within the primeGenerator itself. Either way, _someone_ has to call the prime generator for all primes up to the square root of the starting value.
    console.log(prime);
    count++;
  }
} while (count < maxItems);

The algorithm is astonishingly economical on storage

It uses several instances of the prime generator. You call the "main" one, which runs along examining candidates, while storing a set of future candidates that it knows are multiples of smaller prime factors.

At any one time, when considering candidate N, its stored map consists of an entry for each prime factor up to sqrt(N) but stored in the map at the next multiple of the prime factor which the algorithm will reach, as it explores forward from the current candidate.

Whenever the candidate is present in this map of multiples, it can reject that candidate. The map tells us, for that multiple, what was the prime factor which told us that number was a multiple. For example, 15 is marked as a multiple of 3. When the algorithm comes to 15, it realises it is a multiple of 3, so it rejects the 15.

From now on, it no longer needs to remember that 15 is a multiple, because 15 will never be a candidate again in this instance of the algorithm. It can delete the 15 entry from its map of multiples, and replace it with the next multiple of 3. However the very next multiple will always be even, because all candidates (e.g. 15) and all factors (e.g. 3) are odd. Therefore it only needs to tell the map that 15 + 2x3 is a multiple, i.e. it can step forward by 2 x the factor. Moreover, if the resulting number, 21, is already marked as a multiple, it need not mark that number: it can advance by a further 2x3 to 15 + 2x3 + 2x3, etc, until it finds a number that is not already marked as a multiple.

Cleverly, this process ensures that every factor (e.g. 3) remains marking a multiple in the map forever. To assess a candidate N, the map will only contain one entry for every prime up to sqrt(N).

When the candidate rises above the square of the largest factor considered so far, the algorithm needs to get the next factor. It simply uses a second instance of itself to get the next prime in the sequence. I was at first worried about this creating vast memory demands, but that is not so. Assessing a number ~2^64, would need all the second instance for ~2^32, which in turn would call a third instance for ~2^16, and so on. Only a handful of instances of the generator need to be created even for primes of enormous size, and if an instance's map has factors up to F, the next instance only needs factors up to sqrt(F), which rapidly becomes small.

Even the Ness algorithm still needs to build a map of all the factors up to sqrt(N)

It needs the map to contain all those factors, so that it can correctly reject candidates around N.

But it also will _eventually_ need the factors between sqrt(N) and N

Because only that way can it be confident of numbers from N to N^2 being prime.

My conclusion

I think it needs to iterate through, in order to work. I can't see a way of starting it at an arbitrary candidate, e.g. 10^8, without pre-filling the map with (all the) primes up to 10^4.

The most efficient way of doing that pre-filling, is by running the algorithm itself, which is in fact what it does if you simply ask it to yield (but do not print) all the primes up to 10^8.

My suspicion is that the algorithm is so efficient, that the best way to get it into position to start at an arbitrary candidate, e.g. 1e8, is by running the algorithm itself, i.e. there is no shortcut. Remarkable situation!

This seems to be confirmed in Will's updated answer, in which the algorithm calls itself to prefill the sieve. At the end of the prefilling, the sieve contains one entry for each prime up to sqrt(start). My version is much slower, but this is not because it collects more primes than Will's; it is because it (a) tests for the 3 conditions in a different order (to make it easier for humans to follow) and (b) extracts some code into a subfunction (again to make it easier to understand). Obviously Will's approach is better for production code!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2