How to acquire a tf.data.Dataset's shape?
I know the dataset has output_shapes, but it only shows something like this:
data_set: DatasetV1Adapter shapes: {item_id_hist: (?, ?), tags: (?, ?), client_platform: (?,), entrance: (?,), item_id: (?,), lable: (?,), mode: (?,), time: (?,), user_id: (?,)}, types: {item_id_hist: tf.int64, tags: tf.int64, client_platform: tf.string, entrance: tf.string, item_id: tf.int64, lable: tf.int64, mode: tf.int64, time: tf.int64, user_id: tf.int64}
How can I get the total number of elements in my dataset?
Solution 1:[1]
Where the length is known you can call:
tf.data.experimental.cardinality(dataset)
If this does not give a usable count (it returns a negative value when the length is unknown or infinite), it's important to know that a TensorFlow Dataset is, in general, lazily evaluated. This means that in the general case we may need to iterate over every record before we can find the length of the dataset.
For example, assuming you have eager execution enabled and it's a small 'toy' dataset that fits comfortably in memory, you could just enumerate it into a new list and grab the last index (then add 1 because lists are zero-indexed):
dataset_length = [i for i,_ in enumerate(dataset)][-1] + 1
Of course, this is inefficient at best and, for large datasets, will fail entirely because everything needs to fit into memory for the list. In such circumstances, I can't see any alternative other than iterating through the records while keeping a manual count.
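As a rough sketch of that manual count, Dataset.reduce can stream through the records once and keep only a running counter in memory. The toy dataset below is an assumption for illustration (a filtered range, so its length cannot be inferred statically), and eager execution is assumed.

```python
import tensorflow as tf

# Toy dataset (illustrative only): filter() hides the length from TensorFlow,
# so the elements have to be counted by iterating.
dataset = tf.data.Dataset.range(10).filter(lambda x: x % 2 == 0)

# Stream through the dataset once, carrying a running count as the state.
# Only the counter is kept in memory, not the records themselves.
count = dataset.reduce(tf.constant(0, dtype=tf.int64), lambda n, _: n + 1)
dataset_length = int(count.numpy())
print(dataset_length)
```

This still costs a full pass over the data, but it avoids building a list of every element.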
Solution 2:[2]
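The following code materializes the dataset and reads off its shape:
# Materialize every element as a NumPy array (requires eager execution).
dataset_to_numpy = list(dataset.as_numpy_iterator())
# The leading dimension of the stacked result is the number of elements.
shape = tf.shape(dataset_to_numpy)
print(shape)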
It produces output like this:
tf.Tensor([1080 64 64 3], shape=(4,), dtype=int32)
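The code is simple to write, but it still takes time to iterate over the dataset and loads every element into memory. It also works only when each element is a single tensor and all elements have the same shape.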
For more information about tf.data.Dataset, see its API documentation.
Solution 3:[3]
As of 4/15/2022, with TF v2.8, you can get the number of elements using:
dataset.cardinality().numpy()
ref: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cardinality
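Note that cardinality() does not always return a real count. The sketch below, using toy datasets chosen purely for illustration, shows the three cases: a known length, an unknown length (after filter), and an infinite dataset (after repeat). It assumes a TF 2.x release where the tf.data.UNKNOWN_CARDINALITY and tf.data.INFINITE_CARDINALITY constants are available.

```python
import tensorflow as tf

known = tf.data.Dataset.range(42)             # length can be inferred statically
filtered = known.filter(lambda x: x % 2 == 0)  # length depends on the data
repeated = known.repeat()                      # never ends

known_size = int(known.cardinality().numpy())
filtered_size = int(filtered.cardinality().numpy())
repeated_size = int(repeated.cardinality().numpy())

print(known_size)
# Negative sentinel values signal that no count is available.
print(filtered_size == tf.data.UNKNOWN_CARDINALITY)
print(repeated_size == tf.data.INFINITE_CARDINALITY)
```

When the result is the unknown sentinel, counting by iteration is the remaining option.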
Solution 4:[4]
To see element shapes and types, print dataset elements directly instead of using as_numpy_iterator (see https://www.tensorflow.org/api_docs/python/tf/data/Dataset):
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
for element in dataset:
    print(element)
Break out of the for loop after the first element to see the shape of a single tensor:
dataset = tf.data.Dataset.from_tensor_slices((X_s, y_s))
for element in dataset:
    print(element)
    break
The output is a tuple of two tensors, and the shape of each is printed:
(<tf.Tensor: shape=(13,), dtype=float32, numpy=
array([ 0.9521966 , 0.68100524, 1.973123 , 0.7639558 , -0.2563337 ,
2.394438 , -1.0058318 , 0.01544279, -0.69663054, 1.0873381 ,
-2.2745786 , -0.71442884, -2.1488726 ], dtype=float32)>, <tf.Tensor: shape=(2,), dtype=float32, numpy=array([0., 1.], dtype=float32)>)
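If only the per-element shapes and types are needed, the dataset's element_spec property reports them without iterating at all. In this minimal sketch, X_s and y_s are placeholder arrays made up to match the shapes in the output above; they are not the original data.

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays (assumed): 5 samples with 13 features and 2 label values each.
X_s = np.random.rand(5, 13).astype("float32")
y_s = np.zeros((5, 2), dtype="float32")

dataset = tf.data.Dataset.from_tensor_slices((X_s, y_s))

# element_spec mirrors the structure of one element: here a (features, labels) tuple.
features_spec, labels_spec = dataset.element_spec
print(features_spec.shape, features_spec.dtype)
print(labels_spec.shape, labels_spec.dtype)
```

Dimensions that are not known statically show up as None in these shapes, which corresponds to the ? entries in the question's output.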
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Stewart_R |
Solution 2 | Shivam Roy |
Solution 3 | Vincent Yuan |
Solution 4 | Alex Punnen |