Converting a TensorFlow dataset to a pandas DataFrame
I am very new to deep learning and computer vision, and I want to build a face recognition project. I downloaded some images from the Internet and converted them to a TensorFlow dataset by following this article from the TensorFlow documentation. Now I want to convert that dataset to a pandas DataFrame so that I can save it as CSV files. I have tried a lot but have been unable to do it. Can someone help me with it? Here is the code that builds the dataset, followed by some of my failed attempts.
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
filenames = tf.constant(['al.jpg', 'al2.jpg', 'al3.jpg', 'al4.jpeg','al5.jpeg', 'al6.jpeg','al7.jpg','al8.jpeg', '5.jpg', 'hrit8.jpeg', 'Hrithik-Roshan.jpg', 'Hrithik.jpg', 'hriti1.jpeg', 'hriti2.jpg', 'hriti3.jpeg', 'hritik4.jpeg', 'hritik5.jpg', 'hritk9.jpeg', 'index.jpeg', 'sah.jpeg', 'sah1.jpeg', 'sah3.jpeg', 'sah4.jpg', 'sah5.jpg','sah6.jpg','sah7.jpg'])
labels = tf.constant([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 2])
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label
dataset = dataset.map(_parse_function)
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(26)
iterator = dataset.make_one_shot_iterator()
image,labels = iterator.get_next()
sess = tf.Session()
print(sess.run([image, labels]))
Initially I just tried df = pd.DataFrame(dataset) and got the following error:
ValueError Traceback (most recent call last)
<ipython-input-15-d5503ae4603d> in <module>()
----> 1 df = pd.DataFrame((dataset))
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
402 dtype=values.dtype, copy=False)
403 else:
--> 404 raise ValueError('DataFrame constructor not properly called!')
405
406 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
After that I came across this article and realized my mistake: in TensorFlow, values exist only within a session. So I tried the following code:
with tf.Session() as sess:
    df = pd.DataFrame(sess.run(dataset))
Please pardon me if I made a silly mistake; I wrote the code above by analogy with print(sess.run(dataset)).
It produced a much bigger error:
TypeError: Fetch argument <BatchDataset shapes: ((?, 28, 28, 3), (?,)), types: (tf.float32, tf.int32)> has invalid type <class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'>, must be a string or Tensor. (Can not convert a BatchDataset into a Tensor or Operation.)
Solution 1:[1]
I think you could use map like this. I assumed that you want to add a NumPy array to a DataFrame, as described here. You still have to append the arrays one by one and work out how a whole array fits into a single column of the DataFrame.
import tensorflow as tf
import pandas as pd
filenames = tf.constant(['C:/Machine Learning/sunflower/50987813_7484bfbcdf.jpg'])
labels = tf.constant([1])
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
sess = tf.Session()
def convert_to_dataframe(filename, label):
    # This runs after _parse_function, so `filename` here is actually
    # the resized image array, not the original path string.
    print(pd.DataFrame.from_records(filename))
    return filename, label
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label
dataset = dataset.map(_parse_function)
dataset = dataset.map( lambda filename, label: tf.py_func(convert_to_dataframe,
[filename, label],
[tf.float32,tf.int32]))
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(26)
iterator = dataset.make_one_shot_iterator()
image,labels = iterator.get_next()
sess.run([image, labels])
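If the end goal is a CSV of pixel values, one possible approach is to take the NumPy arrays that sess.run([image, labels]) returns and flatten each image into a single row. The following is only a minimal sketch, not the method used above: the helper name batch_to_dataframe and the px_ column names are made up for illustration, and random arrays stand in for the decoded images.

```python
import numpy as np
import pandas as pd

def batch_to_dataframe(images, labels):
    """Flatten a batch of images into one DataFrame row per image.

    images: array of shape (N, H, W, C); labels: array of shape (N,).
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    n = images.shape[0]
    # Each image becomes one row of H*W*C pixel values.
    flat = images.reshape(n, -1)
    columns = ["px_{}".format(i) for i in range(flat.shape[1])]
    df = pd.DataFrame(flat, columns=columns)
    # Put the label in the first column so it is easy to find in the CSV.
    df.insert(0, "label", labels)
    return df

# Stand-in data with the same shapes as the batch in the question:
# 26 images of 28x28 pixels with 3 channels, labels in {0, 1, 2}.
rng = np.random.RandomState(0)
images = rng.rand(26, 28, 28, 3).astype(np.float32)
labels = rng.randint(0, 3, size=26).astype(np.int32)

df = batch_to_dataframe(images, labels)
print(df.shape)  # (26, 2353): one label column plus 28*28*3 pixel columns
```

With real data, the two arrays returned by sess.run([image, labels]) would replace the stand-in arrays, and df.to_csv(...) would write the result to disk.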
Solution 2:[2]
One easy way is to convert the dataset with tfds.as_dataframe, save the result to a regular CSV file, and then read that file back into a pandas DataFrame whenever you need it.
import tensorflow_datasets as tfds
# Construct a tf.data.Dataset
ds = tfds.load('civil_comments/CivilCommentsCovert', split='train')
# Convert the dataset to a DataFrame (tfds returns a styled pandas DataFrame)
df = tfds.as_dataframe(ds)
# Save the DataFrame to a CSV file
df.to_csv("/.../.../Desktop/covert_toxicity.csv")
# Read the CSV file back as usual to get the DataFrame you need
import pandas as pd
file_path = "/.../.../Desktop/covert_toxicity.csv"
df = pd.read_csv(file_path, header = 0, sep=",")
df
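Keep in mind that a CSV stores only flat columns, so image data read back from it has to be reshaped. Here is a small sketch of that round trip, assuming a hypothetical layout of one label column followed by flattened pixel columns. The column layout and the tiny 2x2 image size are illustrative, and an in-memory buffer stands in for a file on disk.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical data: 4 tiny 2x2 RGB images with integer labels.
height, width, channels = 2, 2, 3
images = np.arange(4 * height * width * channels, dtype=np.float32)
images = images.reshape(4, height, width, channels)
labels = np.array([0, 1, 2, 1])

# Flatten each image to one row and write everything as CSV text.
df = pd.DataFrame(images.reshape(len(images), -1))
df.insert(0, "label", labels)
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read the CSV back and restore the original image shape.
buffer.seek(0)
restored = pd.read_csv(buffer)
restored_labels = restored["label"].to_numpy()
restored_images = (
    restored.drop(columns="label")
    .to_numpy(dtype=np.float32)
    .reshape(-1, height, width, channels)
)
print(restored_images.shape)  # (4, 2, 2, 3)
```

The image height, width and channel count are not stored in the CSV, so they have to be known (or saved separately) to undo the flattening.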
Solution 3:[3]
A simpler way to convert a TensorFlow object to a DataFrame is to convert it to a NumPy array and pass that to the pandas DataFrame constructor.
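import pandas as pd
# Needs eager execution (TF 2.x); string tensors come back as bytes, so decode them.
names = [name.decode() for name in filenames.numpy()]
df = pd.DataFrame({'filename': names, 'label': labels.numpy()})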
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Tayéhi Mouné |
Solution 2 | letsBeePolite |
Solution 3 | ebenezer agbozo |