Saving the shuffled images in a batch to disk with the original filename
I have a dataset in a single directory that I wish to split into a training and a validation set, then save all images of each set to a different directory.
I'm trying to do this with the tf.keras.preprocessing.image_dataset_from_directory() and tf.keras.preprocessing.image.save_img() functions, together with the tf.data.Dataset.file_paths attribute.
The code looks something like this:
import os

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory

# Build the training and validation splits from the same directory,
# using the same seed so the split itself is reproducible.
train_dataset = image_dataset_from_directory(PATH_DS,
                                             shuffle=True,
                                             labels='inferred',
                                             label_mode='categorical',
                                             class_names=class_names,
                                             batch_size=1,
                                             image_size=[1080, 1920],
                                             validation_split=0.15,
                                             subset="training",
                                             seed=456)
validation_dataset = image_dataset_from_directory(PATH_DS,
                                                  shuffle=True,
                                                  labels='inferred',
                                                  label_mode='categorical',
                                                  class_names=class_names,
                                                  batch_size=1,
                                                  image_size=[1080, 1920],
                                                  validation_split=0.15,
                                                  subset="validation",
                                                  seed=456)

# The file paths belonging to each split.
filepaths_val = validation_dataset.file_paths
filepaths_train = train_dataset.file_paths

# Write each (single-image) batch back to disk, named after the file path
# that is assumed to correspond to it.
for idx, (batch, filepath) in enumerate(zip(train_dataset.as_numpy_iterator(), train_dataset.file_paths)):
    images, labels = batch
    tf.keras.preprocessing.image.save_img(
        os.path.join(PATH_WD, f"test/train/{class_names[np.argmax(labels[0])]}/{os.path.basename(filepath)}"),
        images[0], "channels_last", "png")
I need to have the images shuffled because they have filenames such that an alphanumerical sort would result in data leakage between the sets.
The problem I am running into seems to be that the dataset iterator has a random initialization. The file_paths object is just a list that I can slice, and I've already verified that a given seed always returns the same file paths.
However, iterating over the dataset always returns the elements in a different order. I've tried the Dataset.unbatch() method, as_numpy_iterator(), etc.; every time I start iterating, the first element I get back is different.
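For illustration, here is a minimal sketch of the mismatch described above, continuing the snippet (it reuses train_dataset and the numpy import; the exact output depends on the shuffle behaviour):

# file_paths is fixed for a given seed...
print(train_dataset.file_paths[0])

# ...but each pass over the dataset can start with a different image, so
# zipping the iterator with file_paths does not pair images with their own filenames.
first_pass_images, _ = next(iter(train_dataset.as_numpy_iterator()))
second_pass_images, _ = next(iter(train_dataset.as_numpy_iterator()))
print(np.array_equal(first_pass_images, second_pass_images))  # typically False with shuffle=True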
Solution 1:[1]
Your loss function has to be differentiable over the whole domain, which means no sharp turning points. In layman's terms, it has to be "smooth".
Solution 2:[2]
What you are taking is the argmax of the tensor, and this operator is not differentiable. In practical terms, this means you can't backpropagate through that operation, i.e. call backward on the result of pred.max(1).indices or, similarly, pred.argmax(1).
Here have a look:
>>> pred = torch.rand(10, 10, requires_grad=True)
>>> values, indices = pred.max(1)
>>> values.grad_fn # can be backpropagated on:
<MaxBackward0 at 0x7febc10d2ed0>
>>> indices.grad_fn # can't be backpropagated on:
None
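Not part of the original answer, but as a hedged illustration of the usual workaround: build the loss from the raw logits (or from the differentiable values returned by max) rather than from the indices, e.g. with cross-entropy:

import torch
import torch.nn.functional as F

pred = torch.rand(10, 10, requires_grad=True)  # logits for 10 samples, 10 classes
target = torch.randint(0, 10, (10,))           # integer class labels

# Differentiable: the loss is computed from the raw logits, not from argmax.
loss = F.cross_entropy(pred, target)
loss.backward()  # works; pred.grad is now populated

# Non-differentiable: argmax breaks the graph, so this line would raise a RuntimeError.
# pred.argmax(1).float().mean().backward()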
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | jjaskulowski |
Solution 2 | |