How to feed a list of numpy arrays into a TensorFlow model?

I have a large list of numpy arrays that I want to feed into a TensorFlow model. I cannot concatenate them into a single array because of RAM constraints. The code below recreates a dataset like mine:

import numpy as np

train_data_list = []
number_of_patients = 20

for i in range(number_of_patients):
    sample_size = int(np.random.randint(low=2000, high=30000, size=1))  # samples per patient
    sequence_length = 1024  # subsequence length
    feature_size = 3        # number of features, e.g. vital sign 1, vital sign 2, vital sign 3

    random_data = np.random.rand(sample_size, sequence_length, feature_size)
    train_data_list.append(random_data)

This produces a list of numpy arrays, one per patient. I have my TensorFlow model set up and want to feed this data in, but the model does not accept a list of arrays, and I cannot concatenate my data into one single numpy array.



Solution 1:[1]

This is easy using h5py: store the arrays in a single HDF5 file on disk and read slices back as needed. It took me some time to fix all the related error messages; setting os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async' also helped with GPU memory-allocation issues on earlier CUDA versions.

[ Sample ]:

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
DataSet
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

# Create hdf5 file
hdf5_file = h5py.File(database_buffer, mode='w')

# Train images
hdf5_file['x_train'] = train_images
hdf5_file['y_train'] = train_labels

# Test images
hdf5_file['x_test'] = test_images
hdf5_file['y_test'] = test_labels

hdf5_file.close()

# Visualize dataset train sample
hdf5_file = h5py.File(database_buffer,  mode='r')

# Load features
x_train = hdf5_file['x_train'][0: 50000]
x_test = hdf5_file['x_test'][0: 10000]
y_train = hdf5_file['y_train'][0: 50000]
y_test = hdf5_file['y_test'][0: 10000]

# pick a random training sample
index = random.randint(0, 50000)
image = hdf5_file['x_train'][index]
label = hdf5_file['y_train'][index]
plt.imshow(image)
plt.show()
print(label)

[ Output ]:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
None
cuda_malloc_async
[5]
2022-04-05 07:01:09.480563: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-04-05 07:01:10.068919: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:214] Using CUDA malloc Async allocator for GPU: 0
2022-04-05 07:01:10.069200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4634 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
Epoch 1/10
2022-04-05 07:01:13.067862: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8100
1563/1563 [==============================] - 27s 12ms/step - loss: 0.7749 - accuracy: 0.1023
Epoch 2/10
1563/1563 [==============================] - 18s 12ms/step - loss: 0.6487 - accuracy: 0.1027
Epoch 3/10
1563/1563 [==============================] - 18s 12ms/step - loss: 0.6039 - accuracy: 0.1017
Epoch 4/10
1563/1563 [==============================] - 19s 12ms/step - loss: 0.3459 - accuracy: 0.0975
Epoch 5/10
1563/1563 [==============================] - 18s 12ms/step - loss: 0.3192 - accuracy: 0.0946
Epoch 6/10
1563/1563 [==============================] - 18s 11ms/step - loss: 0.2996 - accuracy: 0.0933
Epoch 7/10
1285/1563 [=======================>......] - ETA: 3s - loss: 0.2811 - accuracy: 0.0863
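The same HDF5 approach carries over to the asker's per-patient arrays: write each patient's array to its own dataset, then read samples back lazily so only one slice is ever in RAM. A minimal sketch, assuming an illustrative file name `patients.h5` and `patient_i` dataset keys (both made up here), with small sample sizes so it runs quickly:

```python
import numpy as np
import h5py

path = "patients.h5"  # hypothetical file name for this sketch
rng = np.random.default_rng(0)
sequence_length, feature_size = 1024, 3

# Write each patient's array to its own dataset -- nothing is concatenated in RAM.
with h5py.File(path, "w") as f:
    for i in range(3):  # tiny sample sizes for the sketch
        sample_size = int(rng.integers(low=5, high=10))
        f.create_dataset(
            f"patient_{i}",
            data=rng.random((sample_size, sequence_length, feature_size)),
        )

# Read back lazily: h5py only loads the slice you index.
def sample_generator(path):
    with h5py.File(path, "r") as f:
        for key in f.keys():
            for row in range(f[key].shape[0]):
                yield f[key][row]  # one (sequence_length, feature_size) sample

first = next(sample_generator(path))
print(first.shape)  # (1024, 3)
```

A generator like this can then be wrapped with `tf.data.Dataset.from_generator` (declaring an `output_signature` of shape `(1024, 3)`) so the model streams samples from disk instead of requiring one giant array.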


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: Martijn Pieters