I have a Spark/Scala job in which I do this:

1: Compute a big DataFrame df1 and cache it in memory
2: Use df1 to compute dfA
3: Read raw data into df2 (again, …
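A minimal sketch of that caching pattern. The question is in Scala, but the DataFrame API is the same shape in PySpark, which is used for all sketches here; the source paths and column names are assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

# 1: compute a big DataFrame and cache it (hypothetical source path)
df1 = spark.read.parquet("/data/raw1")
df1.cache()
df1.count()  # an action, so the cache is actually materialized

# 2: derive dfA from the cached df1 (hypothetical columns)
dfA = df1.groupBy("key").agg(F.sum("value").alias("total"))

# 3: read more raw data into df2 and cache it as well
df2 = spark.read.parquet("/data/raw2")
df2.cache()
```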
I am working in Databricks. I have a DataFrame which contains 500 rows, and I would like to create two DataFrames, one containing 100 rows and the other containing the remaining 400 rows.
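One way to do that split, sketched in PySpark; the ordering column is an assumption, since a deterministic 100/400 split needs a stable sort key:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.range(500).withColumnRenamed("id", "row_id")  # stand-in for the 500-row frame

# Number the rows over a stable ordering, then filter into two frames
w = Window.orderBy("row_id")  # assumes row_id gives the order you want
numbered = df.withColumn("rn", F.row_number().over(w))
first_100 = numbered.filter(F.col("rn") <= 100).drop("rn")
rest_400 = numbered.filter(F.col("rn") > 100).drop("rn")

print(first_100.count(), rest_400.count())  # 100 400
```

For 500 rows the single-partition window is harmless; df.randomSplit([0.2, 0.8]) is the simpler alternative when the two counts only need to be approximate.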
I have an existing Spark DataFrame that has columns as such:

--------------------
pid | response
--------------------
12  | {"status":"200"}

response is a string …
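Assuming the goal is to pull fields out of that JSON string, a PySpark sketch using from_json; the schema is guessed from the single sample row shown:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(12, '{"status":"200"}')], ["pid", "response"])

# Parse the JSON string into a struct, then promote the field to a top-level column
schema = StructType([StructField("status", StringType())])
parsed = (df.withColumn("resp", F.from_json("response", schema))
            .select("pid", F.col("resp.status").alias("status")))
parsed.show()
# +---+------+
# |pid|status|
# +---+------+
# | 12|   200|
# +---+------+
```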
I have a working sample application that reads CSV files into a DataFrame. The DataFrame can be stored to a Hive table in parquet format using the method df.…
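The write call itself is cut off above; a sketch of the usual pattern, with Hive support enabled and a hypothetical input path and table name:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("csv-to-hive")
         .enableHiveSupport()  # needed so saveAsTable targets the Hive metastore
         .getOrCreate())

df = spark.read.option("header", "true").csv("/data/input.csv")  # hypothetical path

# Persist the frame as a parquet-backed Hive table (table name is an assumption)
df.write.mode("overwrite").format("parquet").saveAsTable("mydb.sample_table")
```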
I'm trying to convert a Pandas DF into a Spark one. DF head:

10000001,1,0,1,12:35,OK,10002,1,0,9,f,NA,24,24,0,3,9,0,0,1,1,0,0,4,543
10000001,2,0,1,12:36,OK,10002,1…
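A conversion sketch; the column names are made up, and casting everything to str first is a common workaround when mixed-type or NA columns (like the f and NA values in the sample rows) trip up Spark's schema inference:

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Tiny stand-in for the real frame; the sample rows above have 25 columns
pdf = pd.DataFrame({
    "id":     [10000001, 10000001],
    "seq":    [1, 2],
    "time":   ["12:35", "12:36"],
    "status": ["OK", "OK"],
})

# Mixed-type / NA columns often break inference; forcing strings sidesteps that
sdf = spark.createDataFrame(pdf.astype(str))
sdf.printSchema()
```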
I have already researched a lot but could not find a solution. The closest question I could find here is "Why my SPARK works very slowly with mongoDB". I am trying to …
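Assuming this is the MongoDB Spark connector, a read sketch; the URI, database, and collection are placeholders, and the short format name "mongo" is the connector 2.x/3.x spelling:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("mongo-read")
         # placeholder URI; connector 10.x renames this key to
         # spark.mongodb.read.connection.uri
         .config("spark.mongodb.input.uri", "mongodb://localhost:27017/mydb.mycoll")
         .getOrCreate())

df = spark.read.format("mongo").load()
df.printSchema()
```

If the slowness in the cited question is the symptom here too, one common cause is scanning the whole collection; the connector can push DataFrame filters down to MongoDB as an aggregation $match, so filtering early keeps the scan on the Mongo side.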
I am currently running a Java Spark application in Tomcat and receiving the following exception:

Caused by: java.io.IOException: Mkdirs failed to create file:/…
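That error typically means the JVM's process user (here, the account Tomcat runs as) lacks permission to create the target directory on the local filesystem. A hedged sketch of the mitigation, pointing Spark's scratch space and the output at directories that user can write; the paths are assumptions, and the real fix is usually filesystem permissions for the service account:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tomcat-spark")
         # scratch dir the service user can write (hypothetical path)
         .config("spark.local.dir", "/var/spark-tmp")
         .getOrCreate())

df = spark.range(10)
# output dir the service user can write (hypothetical path)
df.write.mode("overwrite").parquet("/var/spark-out")
```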