'Cryptographically secure pseudo random shuffle a list or array in python
I am in need of a shuffle function that uses CSPRNG (Cryptographically Secure Pseudo Random Number Generator) and can be seeded manually for the same output for the same seed.
The built-in random.shuffle()
in Python can be manually seeded but is not suitable for cryptographic use and will be removed in version 3.11 of python.
Crypto.Random.random.shuffle()
from PyCryptodome does not accept a seed as far as I can gather.
Currently, I have settled for the Mersenne Twister used by the built-in random.shuffle()
function to shuffle a list of numbers but this is far from ideal. Shuffling a numpy array is also welcome as the built-in numpy.random.shuffle
is not suitable for cryptographic purposes.
The array/list may have elements upwards of 10 billion so performance and randomness are key.
The bandaid code is below.
import numpy as np
from random import seed, shuffle
array = np.arange(10) # [0 1 2 3 4 5 6 7 8 9]
print(array)
seed(1)
shuffle(array)
print(array) # [6 8 9 7 5 3 0 4 1 2]
Solution 1:[1]
CSPRNG's often have problems with re-seeding and such, drawing entropy from an entropy pool within the operating system during operation. So instead it is better to use a stream cipher, e.g. AES in counter mode.
Then it is also important that the shuffle operation always is performed in the same way. Similarly, the numbers from the generated bit stream should always operate in the same way. If those are optimized or otherwise changed the result will be a different shuffle, breaking the scheme.
In the end you are better off programming this yourself, so you are sure that the code behind the method contracts doesn't change.
The requirements for this are:
- a stream cipher with a seed the size of the key;
- an implementation of "rejection sampling" to get random numbers in a range;
- the Fisher-Yates shuffle to create a fully random shuffle.
It is possible to hash the seed and take the leftmost bytes if the size is not compliant with the stream cipher.
Demonstration of the idea (not fully tested or well-designed):
import numpy as np
from Crypto.Cipher import ChaCha20
from Crypto.Random import get_random_bytes
array = np.arange(100) # [0 1 2 3 4 5 6 7 8 9]
seed = bytes([0x00]) * 32 # use SHA-256 to hash different size seeds
nonce_rfc7539 = bytes([0x00]) * 12
cipher = ChaCha20.new(key=seed, nonce=nonce_rfc7539)
zerobuf = bytes([0x00]) * 5
def sample(max):
# rejection sampling using rand(0..n * max) % max
# the value 2 is in there to make sure the number of bits is at least
# two higher than max, so that the chance of each candicate succeeding
# is higher
stream_size = (max.bit_length() + 2 + 7) // 8
max_stream_value = 1 << (stream_size * 8)
max_candidate = max_stream_value - max_stream_value % max
while True:
stream = cipher.encrypt(zerobuf[0:stream_size])
candidate = int.from_bytes(stream, "big")
if (candidate < max_candidate):
break
return candidate % max
def shuffle(list):
# do the Fisher-Yates shuffle
for i in range(len(list) - 1, 0, -1):
j = sample(i + 1)
list[i],list[j] = list[j],list[i]
# test only
print(array)
for i in range(0, 100):
shuffle(array)
print(array)
Solution 2:[2]
SECRETS module contains CSPRNG based functions like CHOICE, which returns a randomly-chosen element. So you could use this function in Fisher-Yates shuffle algorithm implementation like this one:
import secrets
x_list = list(range(256))
y_list = []
print(x_list)
while len(x_list) > 0:
picked_item = secrets.choice(x_list)
y_list.append(picked_item)
x_list.remove(picked_item)
print(y_list)
Not sure about the performance and if it will handle well such a big list. It took 2.5 min to shuffle the list containing 256000 elements. But for less demanding applications it might do the job.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Agaragar |