tf.data.Dataset, map functionality and random

While manipulating a tf.data.Dataset, I get a behavior whose origin I cannot understand. The dataset is a simple integer buffer, and I want to add a random integer to each number (this is the important point). TF provides a map function to apply a transformation (here, a random generator) to each element of the dataset. If I write:

from random import seed, randint
import tensorflow as tf

seed(0)
dataset = tf.data.Dataset.from_tensor_slices([1, 1, 1, 1, 1, 1])
dataset = dataset.map(lambda x: x + randint(0, 9))
print(list(dataset.as_numpy_iterator()))

This code does not return what I want. The random generator is called only once (it returns 6), and that single value is added to every element of the buffer. I get [7, 7, 7, 7, 7, 7].

However, if I code:

seed(0)
dataset = tf.data.Dataset.from_tensor_slices([1, 1, 1, 1, 1, 1]) 
dataset = dataset.map(lambda x: x + tf.random.uniform([], minval=0, maxval=9, dtype=tf.dtypes.int32, seed=2))
print(list(dataset.as_numpy_iterator()))

This returns [7, 8, 1, 9, 1, 4], which is what I need. I am confused: why does the first version not work? The map function is applied to each element, but randint(0,9) is evaluated only once. Any suggestions?

Thank you,

Timocafé



Solution 1:[1]

In the second case, generating the random integer is part of the graph, because you use the TF API. Each time the graph runs (once per element), the random integer is generated again. In the first case, randint is a plain Python call, not a TF operation. tf.data.Dataset.map traces the lambda once to build the graph, so randint(0,9) runs during that single trace and its result is embedded in the graph as a constant. Every element then gets the same value added.
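To make the "traced once, then replayed" idea concrete, here is a minimal pure-Python analogy. It is not how TensorFlow is implemented: the `Symbol` class and the `trace` helper are hypothetical names invented for this sketch, and only addition is supported. It only shows why a Python-side random call ends up frozen as a constant.

```python
import random


class Symbol:
    """Hypothetical stand-in for a symbolic tensor: it records the
    operation instead of computing a value."""

    def __add__(self, other):
        # By the time we get here, `other` is already a plain Python int,
        # so all we can record is a fixed constant.
        return ("add", other)


def trace(fn):
    """Hypothetical tracer: call fn ONCE with a symbolic input and
    return a function that replays the recorded operation."""
    op, constant = fn(Symbol())
    assert op == "add"
    return lambda value: value + constant


random.seed(0)
# randint runs exactly once, while the lambda is being traced.
traced = trace(lambda x: x + random.randint(0, 9))

# Replaying the traced function never calls randint again.
result = [traced(v) for v in [1, 1, 1, 1, 1, 1]]
print(result)  # all six entries are identical
```

In real TensorFlow code, the fix is the one shown in the question: use an operation such as tf.random.uniform, which is recorded in the graph and re-executed for each element. Wrapping the Python call in tf.py_function is another option when the logic has to stay in Python.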

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Red Yang