Databricks display() function equivalent or alternative in Jupyter

I'm in the process of migrating Databricks Spark notebooks to Jupyter notebooks. Databricks provides the convenient display(data_frame) function to visualize Spark DataFrames and RDDs, but there's no direct equivalent in Jupyter (I'm not sure, but I think it's a Databricks-specific function). I tried:

dataframe.show()

But that's a plain-text rendering, and it breaks when the DataFrame has many columns. So I'm trying to find an alternative to display() that can render Spark DataFrames better than show() does. Is there any equivalent or alternative to it?



Solution 1:[1]

When you use Jupyter, instead of df.show(), use myDF.limit(10).toPandas().head(). Since we often work with many columns, pandas truncates the view by default, so set the pandas column display option to its maximum.

# Alternative to the Databricks display function.
import pandas as pd
pd.set_option('display.max_columns', None)

myDF.limit(10).toPandas().head()
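
If wide frames still wrap or truncate in the output, the related pandas display options can be raised the same way. A small sketch, assuming a recent pandas version where these option names exist:

import pandas as pd

pd.set_option('display.max_columns', None)   # never hide columns
pd.set_option('display.max_rows', 100)       # show up to 100 rows before truncating
pd.set_option('display.width', None)         # auto-detect the available output width
pd.set_option('display.max_colwidth', None)  # don't truncate long cell values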

Solution 2:[2]

First recommendation: when you use Jupyter, don't use df.show(); use df.limit(10).toPandas().head() instead, which gives a clean display, arguably even better than the Databricks display().

Second recommendation: use a Zeppelin notebook. Just use z.show(df.limit(10)).

Additionally, in Zeppelin:

  1. Register your DataFrame as a SQL table: df.createOrReplaceTempView('tableName')
  2. Insert a new paragraph beginning with %sql, then query your table to get a rich tabular display (see the sketch after this list).
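
A minimal sketch of that flow, assuming a PySpark DataFrame named df in a Zeppelin notebook; 'tableName' is just an illustrative name:

# In a Python paragraph: expose the DataFrame to the SQL interpreter.
df.createOrReplaceTempView('tableName')

# In a separate paragraph, switch interpreters and query the view:
# %sql
# SELECT * FROM tableName LIMIT 10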

Solution 3:[3]

In recent IPython, you can just use display(df): if df is a pandas DataFrame, it will just work. On older versions you might need a from IPython.display import display first. IPython will also automatically display the result of the last expression in a cell if it is a DataFrame. Of course, the representation depends on the library you used to build your DataFrame. If you are using PySpark and it does not define a nice representation by default, then you'll need to teach IPython how to display the Spark DataFrame, the same way projects have taught IPython how to display Spark contexts and Spark sessions.
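
A minimal sketch of that approach, assuming PySpark inside an IPython/Jupyter session; it registers an HTML formatter so every Spark DataFrame is rendered through pandas (collecting at most 10 rows to the driver):

from IPython import get_ipython
from pyspark.sql import DataFrame

def spark_df_to_html(df):
    # Render a small sample as an HTML table via pandas.
    return df.limit(10).toPandas().to_html()

# Tell IPython to use this formatter for all Spark DataFrames.
html_formatter = get_ipython().display_formatter.formatters['text/html']
html_formatter.for_type(DataFrame, spark_df_to_html)

Note that newer PySpark versions can do something similar natively: enabling the spark.sql.repl.eagerEval.enabled configuration gives DataFrames an HTML representation in notebooks out of the box.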

Solution 4:[4]

Without converting to a pandas DataFrame, use the snippet below; it makes show() render the DataFrame as a proper grid.

# Stop the notebook from wrapping preformatted output, so show()'s
# fixed-width table stays aligned instead of breaking across lines.
from IPython.core.display import HTML
display(HTML("<style>pre { white-space: pre !important; }</style>"))

df.show()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Duncan Lukkenaer
Solution 2:
Solution 3: Matt
Solution 4: Itachi