Pickle encoding utf-8 issue

I'm trying to pickle a pandas DataFrame to my local directory so I can work on it in another Jupyter notebook. The write appears to succeed at first, but when I try to read the file in a new Jupyter notebook, the read fails.

When I open the pickle file I appear to have written, its only contents are:

Error! /Users/.../income.pickle is not UTF-8 encoded Saving disabled. See console for more details.

I also checked and the pickle file itself is only a few kilobytes.

Here's my code for writing the pickle:


with open('income.pickle', 'wb', encoding='UTF-8') as to_write:
    pickle.dump(new_income_df, to_write)

And here's my code for reading it:


with open('income.pickle', 'rb') as read_file:
    income_df = pickle.load(read_file)

Also when I return income_df I get this output:

Series([], dtype: float64)

It's an empty Series that raises errors when I try to call most Series methods on it.

If anyone knows a fix for this I'm all ears. Thanks in advance!

EDIT:

This is the solution I arrived at:

with open('cleaned_df', 'wb') as to_write:
    pickle.dump(df, to_write)

with open('cleaned_df','rb') as read_file:
    df = pickle.load(read_file)

This was much simpler than I expected.
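
For anyone comparing the two versions of the write, the difference is the encoding argument: open() does not accept encoding together with a binary mode such as 'wb'. Below is a minimal sketch of that behaviour, assuming a plain list of dicts as a hypothetical stand-in for the DataFrame and a temporary directory instead of the real path:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the DataFrame; any picklable object behaves the same here.
records = [{"name": "a", "income": 100}, {"name": "b", "income": 101}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "income.pickle")

    # Binary mode plus an encoding argument is rejected by open() itself,
    # so pickle.dump is never reached.
    try:
        with open(path, "wb", encoding="UTF-8") as to_write:
            pickle.dump(records, to_write)
        open_error = None
    except ValueError as exc:
        open_error = str(exc)

    # Without the encoding argument the round trip works.
    with open(path, "wb") as to_write:
        pickle.dump(records, to_write)
    with open(path, "rb") as read_file:
        restored = pickle.load(read_file)

print(open_error)           # message of the ValueError raised by open()
print(restored == records)  # True
```

Pickle files are binary, so opening one in a text editor will not show readable content even when the write succeeded.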



Solution 1:[1]

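Pickle can serialize a pandas DataFrame, and pandas also has its own DataFrame.to_pickle and pd.read_pickle helpers. The problem in the question's code is the encoding argument: open() raises a ValueError when encoding is combined with a binary mode such as 'wb', so the dump never runs. If you would rather avoid pickle for passing data between notebooks, here are some alternatives.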

1) You can write only the data from the DataFrame to a csv file.

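# Write the DataFrame with its "to_csv" method, then read it back with pd.read_csv.
import pandas as pd
# index=False skips the index; otherwise read_csv returns it as an extra "Unnamed: 0" column.
new_income_df.to_csv("mydata.csv", index=False)
new_income_df2 = pd.read_csv("mydata.csv")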
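
One detail worth checking with the CSV route: by default to_csv also writes the index, and read_csv then returns it as an extra "Unnamed: 0" column. A minimal sketch of the difference, assuming a small made-up DataFrame and an in-memory buffer instead of a real file:

```python
import io

import pandas as pd

# Made-up data standing in for new_income_df.
df = pd.DataFrame({"income": [100, 101]})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
default_columns = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
restored = pd.read_csv(without_index)

print(default_columns)         # ['Unnamed: 0', 'income']
print(list(restored.columns))  # ['income']
```

If the index carries meaning, keep it in the file and pass index_col=0 to pd.read_csv instead.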

2) If your data can be produced by a function saved in a regular Python module (a .py file), you can import and call that function from a Jupyter notebook. With the autoreload extension, the notebook also picks up changes you make to the function afterwards. See the autoreload documentation: https://ipython.org/ipython-doc/3/config/extensions/autoreload.html

# Saved as "mymodule1.py" (from notebook1.ipynb).
import pandas as pd
def funcdata():
    new_income_df = pd.DataFrame(data=[100, 101])
    return new_income_df

# notebook2.ipynb
%load_ext autoreload
%autoreload 2
import pandas as pd
import mymodule1  # import by module name, without the .py extension
df2 = mymodule1.funcdata()
print(df2)
# Change the data inside funcdata() in mymodule1.py and see if it changes here.

3) You can share data between Jupyter notebooks using the %store magic command.
See: https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
See also: Share data between IPython Notebooks

# %store example, first Jupyter notebook.
from sklearn import datasets
dataset = datasets.load_iris()
%store dataset

# from a new Jupyter notebook read.
%store -r dataset

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
