Returning results from a Python script to a variable in a Jupyter notebook

I have a Python script that returns a pandas DataFrame, and I want to run the script from a Jupyter notebook and store the result in a variable.

The data are in a file called data.csv. A shortened version of dataframe.py, the script whose result I want to access in my notebook, is:

# dataframe.py
import pandas as pd
import sys

def return_dataframe(file):
    df = pd.read_csv(file)
    return df

if __name__ == '__main__':
    return_dataframe(sys.argv[1])

I tried running:

data = !python dataframe.py data.csv

in my Jupyter notebook, but data does not contain the DataFrame that dataframe.py is supposed to return.



Solution 1:[1]

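The `!` syntax captures only what the command writes to standard output. The value returned by return_dataframe is discarded when the script exits, so I changed the script to write the DataFrame to stdout as CSV: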

# dataframe.py 
import pandas as pd
import sys

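def return_dataframe(f):  # `file` was a built-in name in Python 2; avoid shadowing it
    df = pd.read_csv(f)
    return df

if __name__ == '__main__':
    # Write the DataFrame to stdout as CSV so the notebook can capture it
    return_dataframe(sys.argv[1]).to_csv(sys.stdout, index=False)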
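To illustrate why the script has to print its result, here is a minimal sketch that runs two throwaway child processes with `subprocess`, as a rough stand-in for what `!python ...` does in a notebook. The inline scripts and the helper `run` are hypothetical and exist only for this demonstration; they are not part of dataframe.py.

```python
import subprocess
import sys

# Hypothetical child script: the function returns a value but nothing is
# printed, which mirrors the original dataframe.py.
silent = "def f():\n    return 'a,b\\n1,2'\nf()"

# Same function, but the result is written to stdout.
printing = "def f():\n    return 'a,b\\n1,2'\nprint(f())"


def run(code):
    """Run code in a child Python process and return its stdout as lines."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


silent_out = run(silent)
printing_out = run(printing)
print(silent_out)
print(printing_out)
```

The first list is empty because a returned value never leaves the child process; the second holds the two printed lines, which is all the parent process can see.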

Then, in the notebook, convert the captured output (an IPython.utils.text.SList) into a DataFrame, as shown in the comments on the question "Convert SList to Dataframe":

import pandas as pd
data = !python3 dataframe.py data.csv
# The CSV header line lands in row 0 and all values stay strings
df = pd.DataFrame(data=data)[0].str.split(',',expand=True)
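An alternative, sketched here under assumptions, is to join the captured lines back into CSV text and hand that to `pd.read_csv`, which takes the column names from the first line and infers numeric dtypes. So that it runs outside a notebook, the example uses a plain list named `captured_lines` with made-up values in place of the SList; in a notebook you would pass `data` instead.

```python
import io

import pandas as pd

# Stand-in for the SList produced by `data = !python3 dataframe.py data.csv`.
# An SList behaves like a list of output lines; these values are invented.
captured_lines = ["name,score", "alice,1.5", "bob,2.0"]

# Rebuild the CSV text and let pandas parse it: the first line becomes the
# header and the score column is parsed as numbers rather than strings.
csv_text = "\n".join(captured_lines)
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)
```

Because pandas does the parsing, quoted fields that contain commas are handled too, which a plain split on `,` would not do.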

If the script does nothing more than load the CSV file, the subprocess is unnecessary and you can read the file directly in the notebook:

df = pd.read_csv('data.csv')

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1