Converting a TensorFlow dataset to a pandas DataFrame

I am very new to deep learning and computer vision, and I want to do a face recognition project. For that, I downloaded some images from the Internet and converted them to a TensorFlow dataset with the help of this article from the TensorFlow documentation. Now I want to convert that dataset to a pandas DataFrame so that I can save it as CSV files. I tried a lot but am unable to do it. Can someone help me with it? Here is the code for making the dataset, followed by some of the wrong code I tried.

import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


filenames = tf.constant(['al.jpg', 'al2.jpg', 'al3.jpg', 'al4.jpeg','al5.jpeg', 'al6.jpeg','al7.jpg','al8.jpeg', '5.jpg', 'hrit8.jpeg', 'Hrithik-Roshan.jpg', 'Hrithik.jpg', 'hriti1.jpeg', 'hriti2.jpg', 'hriti3.jpeg', 'hritik4.jpeg', 'hritik5.jpg', 'hritk9.jpeg', 'index.jpeg', 'sah.jpeg', 'sah1.jpeg', 'sah3.jpeg', 'sah4.jpg', 'sah5.jpg','sah6.jpg','sah7.jpg'])
labels = tf.constant([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 2])
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))


def _parse_function(filename, label):
    # Read the file, decode it as a 3-channel JPEG and resize it to 28x28
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label
dataset = dataset.map(_parse_function)
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(26)
iterator = dataset.make_one_shot_iterator()
image,labels = iterator.get_next()

sess = tf.Session()

print(sess.run([image, labels]))

Initially I just tried df = pd.DataFrame(dataset).

Then I got the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-15-d5503ae4603d> in <module>()
----> 1 df = pd.DataFrame((dataset))

 ~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
402                                          dtype=values.dtype, copy=False)
403             else:
--> 404                 raise ValueError('DataFrame constructor not properly called!')
405 
406         NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!
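The DataFrame constructor accepts things like a dict of arrays, a list of rows or a 2-D array. A tf.data.Dataset is none of these (it describes an input pipeline rather than holding the values), so pandas cannot interpret it and raises this error. A minimal sketch of the same failure, using a plain object() as a stand-in for the dataset (an assumption, so that the example runs without TensorFlow), next to an input the constructor does accept:

```python
import numpy as np
import pandas as pd

# Stand-in for an object pandas cannot interpret as tabular data
# (assumption: it plays the role of the tf.data.Dataset in the question).
not_tabular = object()

try:
    pd.DataFrame(not_tabular)
except ValueError as err:
    print("ValueError:", err)

# A dict mapping column names to 1-D arrays is accepted.
table = pd.DataFrame({"label": np.array([1, 0, 2])})
print(table.shape)  # (3, 1)
```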

Then I came across this article and realized my mistake: in TensorFlow, everything exists only within a session. So I tried the following code:

with tf.Session() as sess:
    df = pd.DataFrame(sess.run(dataset))

Please pardon me if this is a silly mistake; I wrote the code above by analogy with print(sess.run(dataset)) and got an even bigger error:

 TypeError: Fetch argument <BatchDataset shapes: ((?, 28, 28, 3), (?,)), types: (tf.float32, tf.int32)> has invalid type <class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'>, must be a string or Tensor. (Can not convert a BatchDataset into a Tensor or Operation.)
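This second error says the same thing from TensorFlow's side: sess.run only fetches tensors and operations, and the dataset object is neither. The tensors returned by iterator.get_next() do qualify, and sess.run([image, labels]) returns them as ordinary NumPy arrays; from there, building a DataFrame and a CSV is plain NumPy/pandas work. A minimal sketch, assuming one batch with the shapes from the error message, (26, 28, 28, 3) float32 images and (26,) int32 labels, and using random arrays as stand-ins for the real session output. The pixel_i column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Stand-ins for `images_np, labels_np = sess.run([image, labels])`
# (assumption: one batch of 26 RGB images resized to 28x28).
rng = np.random.default_rng(0)
images_np = rng.random((26, 28, 28, 3), dtype=np.float32)
labels_np = rng.integers(0, 3, size=26, dtype=np.int32)

# Flatten each image into one row of 28 * 28 * 3 = 2352 pixel values.
flat = images_np.reshape(len(images_np), -1)
df = pd.DataFrame(flat, columns=[f"pixel_{i}" for i in range(flat.shape[1])])
df["label"] = labels_np

# With no path, to_csv returns the CSV text; pass a filename to write a file instead.
csv_text = df.to_csv(index=False)
print(df.shape)  # (26, 2353)
```

One row per image keeps every value numeric, which is what most tools expect when they read the CSV back.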


Solution 1:[1]

I think you could use map with tf.py_func, like this. I assumed that you want to add a NumPy array to a DataFrame as described here. But you have to append the arrays one by one, and also figure out how each whole array fits into one column of the DataFrame.

import tensorflow as tf
import pandas as pd


filenames = tf.constant(['C:/Machine Learning/sunflower/50987813_7484bfbcdf.jpg'])
labels = tf.constant([1])
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))

sess = tf.Session()

def convert_to_dataframe(image, label):
    # After _parse_function, the first argument is the decoded (28, 28, 3) image array, not a filename
    print(pd.DataFrame.from_records(image))
    return image, label


def _parse_function(filename, label):
    # Read the file, decode it as a 3-channel JPEG and resize it to 28x28
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label

dataset = dataset.map(_parse_function)
# py_func runs the Python function on NumPy values; Tout must match what it returns
dataset = dataset.map(lambda image, label: tf.py_func(convert_to_dataframe,
                                                      [image, label],
                                                      [tf.float32, tf.int32]))

dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(26)
iterator = dataset.make_one_shot_iterator()
image,labels = iterator.get_next()


sess.run([image, labels])
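If you follow this answer's idea of appending the arrays one by one, one way to make a whole image fit into a single column is to store each array as an object in one cell. A minimal sketch, assuming (image, label) pairs shaped like the (28, 28, 3) images the map step produces (random stand-ins here); the image and label column names are hypothetical. Collecting plain rows in a list and building the DataFrame once at the end avoids repeatedly copying a growing DataFrame:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-ins for (image, label) pairs coming out of the pipeline one at a time.
pairs = [(rng.random((28, 28, 3), dtype=np.float32), label) for label in [1, 0, 2]]

rows = []
for image, label in pairs:
    # Keep the whole array in one cell; the "image" column gets object dtype.
    rows.append({"image": image, "label": label})

df = pd.DataFrame(rows)
print(df.dtypes)
print(df.loc[0, "image"].shape)  # (28, 28, 3)
```

Such a column is convenient in memory, but to_csv would write each array as its string representation, so for a CSV you would typically flatten each image into its own set of numeric columns instead.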

Solution 2:[2]

One easy way to do it is to convert the dataset with tfds.as_dataframe, save it to a normal CSV file, and then read the CSV file directly into a pandas DataFrame.

import tensorflow_datasets as tfds

# Construct a tf.data.Dataset
ds = tfds.load('civil_comments/CivilCommentsCovert', split='train')
# Convert the dataset into a pandas DataFrame (a tfds StyledDataFrame)
df = tfds.as_dataframe(ds)
# Save the DataFrame to a CSV file; index=False skips writing the row index as an extra column
df.to_csv("/.../.../Desktop/covert_toxicity.csv", index=False)

# Read the CSV file as usual to get the DataFrame you need
import pandas as pd
file_path = "/.../.../Desktop/covert_toxicity.csv"
df = pd.read_csv(file_path, header=0, sep=",")
df
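One detail when round-tripping through CSV: by default to_csv also writes the DataFrame's index, and read_csv then brings it back as an extra "Unnamed: 0" column. A minimal sketch with a small hypothetical frame and an in-memory buffer in place of the Desktop path, showing the difference index=False makes:

```python
import io

import pandas as pd

df = pd.DataFrame({"text": ["a comment", "another one"], "toxicity": [0.1, 0.7]})

# Default: the index is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(df.to_csv()), header=0, sep=",")
print(list(with_index.columns))  # ['Unnamed: 0', 'text', 'toxicity']

# index=False: the columns come back exactly as they were.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)), header=0, sep=",")
print(list(without_index.columns))  # ['text', 'toxicity']
```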

Solution 3:[3]

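A simpler way to convert a TensorFlow object to a DataFrame is to convert it to a NumPy array with .numpy() and pass that to the pandas DataFrame constructor. Note that .numpy() only works in eager mode (the default in TensorFlow 2.x), not on the graph tensors used with tf.Session in the question.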

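import pandas as pd

# Needs eager execution (TF 2.x). One column per variable; file names come back as bytes, so decode them.
df = pd.DataFrame({"filename": [f.decode() for f in filenames.numpy()], "label": labels.numpy()})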
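The key detail is how the arrays map onto columns: a 1-D array of labels becomes a single column, so passing one column name per file makes pandas expect as many columns as there are files, and it raises a shape error. A minimal sketch with NumPy arrays standing in for filenames.numpy() and labels.numpy() (assumption: three of the question's files, to keep it short), using byte strings because that is what a tf.string tensor gives back:

```python
import numpy as np
import pandas as pd

# Stand-ins for filenames.numpy() (byte strings) and labels.numpy().
filenames_np = np.array([b"al.jpg", b"hrit8.jpeg", b"sah.jpeg"], dtype=object)
labels_np = np.array([1, 0, 2], dtype=np.int32)

# One label per row but one column name per file: the shapes do not match.
try:
    pd.DataFrame(labels_np, columns=filenames_np)
except ValueError as err:
    print("ValueError:", err)

# One column per variable instead, with the file names decoded to str.
df = pd.DataFrame({"filename": [f.decode() for f in filenames_np], "label": labels_np})
print(df)
```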

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tayéhi Mouné
Solution 2 letsBeePolite
Solution 3 ebenezer agbozo