How to make a prime decomposition function faster?

I want to make a prime decomposition function faster, but it needs to use a precomputed list of primes. This is what I have so far:

def decompose(n):
    factors = dict()
    m = n
    while m > 1:
        for i in prime_list:
            if i > m:
                break
            else:
                if m%i == 0:
                    m = m//i
                    if i in factors.keys():
                        factors[i] += 1
                    else:
                        factors[i] = 1

    return factors

Here `prime_list` is, of course, the precomputed list of primes. Any help is much appreciated :)

Solution 1:[1]

The two main performance optimizations you were missing were:

  • Factor out as many instances of the smaller primes as possible, to avoid re-iterating through the entire primes list (and to factor faster, since smaller primes are more likely to be present, even multiple times)
  • Early-exit when we can definitely say that the remaining value is prime (it's the last prime factor)

With this in mind, here's my implementation:

from collections import Counter

def decompose_fast(n):
    prime_factors = Counter()

    for factor in PRIMES:
        if factor ** 2 > n:
            # This check will allow us to early-exit extremely quickly
            #
            # We've already divided out every factor less than `factor`,
            # so 'k * factor = n' would imply 'k >= factor', but if
            # 'factor ** 2 > n', no such k exists. 'n' is prime
            prime_factors[n] += 1
            return prime_factors

        q, r = divmod(n, factor)
        while not r:
            # we want to find every instance of `factor` in `n`, so
            # we keep dividing and updating until the remainder is nonzero
            #
            # It's far more likely to have a second instance of a small factor
            # than just one instance of a large factor, so we want to retry
            # the small factor first. Plus, it allows us to early-exit above
            n = q
            prime_factors[factor] += 1
            q, r = divmod(n, factor)

        if n == 1:
            # This will only hit if the largest prime factor has a power
            # of two or more. Otherwise, our early-exit check will hit
            return prime_factors

    # If we exhaust the `PRIMES` list without hitting either of our exit conditions,
    # then either `n` has factors larger than our largest prime, or `n` is prime,
    # but we cannot be certain since 'sqrt(n)' is larger than our largest prime.
    raise ValueError(f"Unable to factor input '{n}' with precomputed prime list")
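
As a quick sanity check (assuming `PRIMES` has been populated with at least the primes up to 11, so the square-root early exit can trigger), the function returns a `Counter` keyed by prime:

>>> decompose_fast(360)
Counter({2: 3, 3: 2, 5: 1})
>>> decompose_fast(97)
Counter({97: 1})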

Performance:

from timeit import timeit
from random import randrange

runs = 10
batch_size  = int(1e3)
number_size = int(1e10)

nums = [randrange(number_size) for _ in range(batch_size)]

code = "for n in nums: decompose_fast(n)"
setup = "from __main__ import decompose_fast, nums"
print(timeit(code, setup=setup, number=runs) / runs / batch_size)

Output (average time taken, in seconds, to factorize a number up to 10 billion):

0.0002759509642000012

For reference, my prime list included primes up to a hundred thousand, which covers sqrt(10^10) and is therefore sufficient for the benchmark above. I was not able to benchmark your existing solution for comparison, because it kept hanging whenever I'd throw it a reasonably large semiprime. I think it's safe to say that sub-millisecond times are probably an improvement, however.
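
The answer doesn't show how `PRIMES` was built; here's a minimal sketch, assuming a plain Sieve of Eratosthenes up to the hundred-thousand bound mentioned above (the `sieve_primes` helper and `PRIME_LIMIT` name are illustrative, not from the original post):

def sieve_primes(limit):
    # Classic Sieve of Eratosthenes: mark every composite up to `limit`
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            # i is prime, so every multiple from i*i upward is composite
            for multiple in range(i * i, limit + 1, i):
                is_prime[multiple] = False
    return [i for i, prime in enumerate(is_prime) if prime]

# The bound must be at least sqrt() of the largest number you intend to
# factor; 10**5 covers inputs up to 10**10, matching the benchmark above
PRIME_LIMIT = 100_000
PRIMES = sieve_primes(PRIME_LIMIT)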

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Dillon Davis