'Pretty Printing a pandas dataframe
How can I print a pandas dataframe as a nice text-based table, like the following?
+------------+---------+-------------+
| column_one | col_two | column_3 |
+------------+---------+-------------+
| 0 | 0.0001 | ABCD |
| 1 | 1e-005 | ABCD |
| 2 | 1e-006 | long string |
| 3 | 1e-007 | ABCD |
+------------+---------+-------------+
Solution 1:[1]
I've just found a great tool for that need, it is called tabulate.
It prints tabular data and works with DataFrame
.
from tabulate import tabulate
import pandas as pd
df = pd.DataFrame({'col_two' : [0.0001, 1e-005 , 1e-006, 1e-007],
'column_3' : ['ABCD', 'ABCD', 'long string', 'ABCD']})
print(tabulate(df, headers='keys', tablefmt='psql'))
+----+-----------+-------------+
| | col_two | column_3 |
|----+-----------+-------------|
| 0 | 0.0001 | ABCD |
| 1 | 1e-05 | ABCD |
| 2 | 1e-06 | long string |
| 3 | 1e-07 | ABCD |
+----+-----------+-------------+
Note:
To suppress row indices for all types of data, pass
showindex="never"
orshowindex=False
.
Solution 2:[2]
pandas >= 1.0
If you want an inbuilt function to dump your data into some github markdown, you now have one. Take a look at to_markdown
:
df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
print(df.to_markdown())
| | A | B |
|:---|----:|----:|
| a | 1 | 1 |
| a | 2 | 2 |
| b | 3 | 3 |
Here's what that looks like on github:
Note that you will still need to have the tabulate
package installed.
Solution 3:[3]
A simple approach is to output as html, which pandas does out of the box:
df.to_html('temp.html')
Solution 4:[4]
If you are in Jupyter notebook, you could run the following code to interactively display the dataframe in a well formatted table.
This answer builds on the to_html('temp.html') answer above, but instead of creating a file displays the well formatted table directly in the notebook:
from IPython.display import display, HTML
display(HTML(df.to_html()))
Credit for this code due to example at: Show DataFrame as table in iPython Notebook
Solution 5:[5]
You can use prettytable to render the table as text. The trick is to convert the data_frame to an in-memory csv file and have prettytable read it. Here's the code:
from StringIO import StringIO
import prettytable
output = StringIO()
data_frame.to_csv(output)
output.seek(0)
pt = prettytable.from_csv(output)
print pt
Solution 6:[6]
I used Ofer's answer for a while and found it great in most cases. Unfortunately, due to inconsistencies between pandas's to_csv and prettytable's from_csv, I had to use prettytable in a different way.
One failure case is a dataframe containing commas:
pd.DataFrame({'A': [1, 2], 'B': ['a,', 'b']})
Prettytable raises an error of the form:
Error: Could not determine delimiter
The following function handles this case:
def format_for_print(df):
table = PrettyTable([''] + list(df.columns))
for row in df.itertuples():
table.add_row(row)
return str(table)
If you don't care about the index, use:
def format_for_print2(df):
table = PrettyTable(list(df.columns))
for row in df.itertuples():
table.add_row(row[1:])
return str(table)
Solution 7:[7]
Following up on Mark's answer, if you're not using Jupyter for some reason, e.g. you want to do some quick testing on the console, you can use the DataFrame.to_string
method, which works from -- at least -- Pandas 0.12 (2014) onwards.
import pandas as pd
matrix = [(1, 23, 45), (789, 1, 23), (45, 678, 90)]
df = pd.DataFrame(matrix, columns=list('abc'))
print(df.to_string())
# outputs:
# a b c
# 0 1 23 45
# 1 789 1 23
# 2 45 678 90
Solution 8:[8]
Maybe you're looking for something like this:
def tableize(df):
if not isinstance(df, pd.DataFrame):
return
df_columns = df.columns.tolist()
max_len_in_lst = lambda lst: len(sorted(lst, reverse=True, key=len)[0])
align_center = lambda st, sz: "{0}{1}{0}".format(" "*(1+(sz-len(st))//2), st)[:sz] if len(st) < sz else st
align_right = lambda st, sz: "{0}{1} ".format(" "*(sz-len(st)-1), st) if len(st) < sz else st
max_col_len = max_len_in_lst(df_columns)
max_val_len_for_col = dict([(col, max_len_in_lst(df.iloc[:,idx].astype('str'))) for idx, col in enumerate(df_columns)])
col_sizes = dict([(col, 2 + max(max_val_len_for_col.get(col, 0), max_col_len)) for col in df_columns])
build_hline = lambda row: '+'.join(['-' * col_sizes[col] for col in row]).join(['+', '+'])
build_data = lambda row, align: "|".join([align(str(val), col_sizes[df_columns[idx]]) for idx, val in enumerate(row)]).join(['|', '|'])
hline = build_hline(df_columns)
out = [hline, build_data(df_columns, align_center), hline]
for _, row in df.iterrows():
out.append(build_data(row.tolist(), align_right))
out.append(hline)
return "\n".join(out)
df = pd.DataFrame([[1, 2, 3], [11111, 22, 333]], columns=['a', 'b', 'c'])
print tableize(df)
Output: +-------+----+-----+ | a | b | c | +-------+----+-----+ | 1 | 2 | 3 | | 11111 | 22 | 333 | +-------+----+-----+
Solution 9:[9]
Update: an even better solution is to simply put the variable name of the dataframe on the last line of the cell. It will automatically print in a pretty format.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Data1': np.linspace(0,10,11), 'Data2': np.linspace(10,0,11)})
df
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | Jon-Eric |
Solution 4 | Mark Andersen |
Solution 5 | Ofer |
Solution 6 | ejrb |
Solution 7 | sigint |
Solution 8 | Pafkone |
Solution 9 |