How to make a prime decomposition function faster?
I want to make a prime decomposition function faster, but it needs to use a precomputed list of primes. This is what I have so far:
def decompose(n):
    factors = dict()
    m = n
    while m > 1:
        for i in prime_list:
            if i > m:
                break
            else:
                if m % i == 0:
                    m = m // i
                    if i in factors.keys():
                        factors[i] += 1
                    else:
                        factors[i] = 1
    return factors
Where prime_list is obviously the precomputed list of primes. Any help is much appreciated :)
Solution 1:[1]
The main two performance optimizations you were missing were:
- Factor as many instances of the smaller primes as possible, to avoid re-iterating through the entire primes list (and to also factor faster, since smaller primes are more likely to be present, even multiple times)
- Early exit when we can definitely say that the remaining value is prime (it's the last remaining prime factor)
With this in mind, here's my implementation:
from collections import Counter

def decompose_fast(n):
    prime_factors = Counter()
    for factor in PRIMES:
        if factor ** 2 > n:
            # This check will allow us to early-exit extremely quickly.
            #
            # We've already divided out every factor less than `factor`,
            # so 'k * factor = n' would imply 'k >= factor', but if
            # 'factor ** 2 > n', no such k exists. 'n' is prime.
            prime_factors[n] += 1
            return prime_factors
        q, r = divmod(n, factor)
        while not r:
            # We want to find every instance of `factor` in `n`, so we
            # keep dividing and updating until the remainder is not '0'.
            #
            # It's far more likely to have a second instance of a small factor
            # than just one instance of a large factor, so we want to retry
            # the small factor first. Plus, it allows us to early-exit above.
            n = q
            prime_factors[factor] += 1
            q, r = divmod(n, factor)
        if n == 1:
            # This will only hit if the largest prime factor has a power
            # of two or more. Otherwise, our early-exit check will hit.
            return prime_factors
    # If we exhaust the `PRIMES` list without hitting either of our exit conditions,
    # then either `n` has factors larger than our largest prime, or `n` is prime,
    # but we cannot be certain, since 'sqrt(n)' is larger than our largest prime.
    raise ValueError(f"Unable to factor input '{n}' with precomputed prime list")
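For a quick sanity check, here is the same function restated in condensed form (with the prime list passed as a parameter purely so the snippet runs standalone; the original answer uses a module-level `PRIMES` list instead):

```python
from collections import Counter

def decompose_fast(n, primes):
    # Condensed restatement of the function above, parameterized on the
    # prime list so this example is self-contained.
    factors = Counter()
    for p in primes:
        if p * p > n:
            # No factor <= sqrt(n) remains, so n itself is prime
            factors[n] += 1
            return factors
        q, r = divmod(n, p)
        while not r:
            # Divide out every instance of p before moving on
            n = q
            factors[p] += 1
            q, r = divmod(n, p)
        if n == 1:
            return factors
    raise ValueError(f"Unable to factor input '{n}'")

primes = [2, 3, 5, 7, 11, 13, 17, 19]
print(decompose_fast(360, primes))  # 360 = 2**3 * 3**2 * 5
```

Note the early exit: factoring 360 never touches primes past 5, because after dividing out 2s and 3s the remainder 5 satisfies `p * p > n` at `p = 5`.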
Performance:
from timeit import timeit
from random import randrange
runs = 10
batch_size = int(1e3)
number_size = int(1e10)
nums = [randrange(number_size) for _ in range(batch_size)]
code = "for n in nums: decompose_fast(n)"
setup = "from __main__ import decompose_fast, nums"
print(timeit(code, setup=setup, number=runs) / runs / batch_size)
Output (average time taken in seconds to factorize a number up to 10 billion)
0.0002759509642000012
For reference, my prime list included primes up to a hundred thousand. I was not able to benchmark your existing solution for comparison, because it kept hanging whenever I threw it a reasonably large semiprime. I think it's safe to say that sub-millisecond times are probably an improvement, however.
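The answer assumes a precomputed prime list up to a hundred thousand but does not show how it was built. One common way to generate such a list (a Sieve of Eratosthenes, offered here as an illustrative sketch rather than the answerer's actual code) is:

```python
def sieve(limit):
    """Return all primes up to and including `limit`, via the Sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Mark every multiple of p starting at p*p (smaller
            # multiples were already marked by smaller primes)
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]

PRIMES = sieve(100_000)  # matches the list size described in the answer
```

A list this size is enough to factor any number up to 10 billion, since sqrt(1e10) = 100,000, which is exactly what the benchmark above exercises.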
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dillon Davis |