Getting an error while encoding when reading a text file

I have a pipe-delimited file and need to strip the first two rows off of it. So I read it into an RDD, exclude the first two rows, and turn it into a DataFrame.

// Read the raw file as lines.
val rdd = spark.sparkContext.textFile("/path/to/file")

// Drop the first two rows; they sit at the start of the first partition.
val rdd2 = rdd.mapPartitionsWithIndex { (id_x, iter) => if (id_x == 0) iter.drop(2) else iter }

// Split each line on the pipe delimiter and wrap the resulting fields in a Row.
val rdd3 = rdd2.map(_.split("\\|")).map(x => org.apache.spark.sql.Row(x: _*))

val df = spark.sqlContext.createDataFrame(rdd3, schema)
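
(For reference, schema is a StructType I build separately. A minimal sketch of its shape, with placeholder column names and every field typed as a string, would be something like this:)

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Placeholder names; the real schema has one StructField per pipe-delimited column.
val schema = StructType(Seq(
  StructField("col1", StringType, nullable = true),
  StructField("col2", StringType, nullable = true)
))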

That all works; I can call show on the DataFrame and it behaves as expected. However, when I attempt to save it as a Parquet file, I get a huge stack trace.
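
The save is a plain Parquet write; the call looks roughly like this (the output path is a placeholder):

df.write.parquet("/path/to/output")

The stack trace is full of errors like the following: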

if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null
else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString,
  validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, col1), StringType),
  true, false) AS col1#5637

This repeats for every column in the DataFrame, as far as I can tell. What am I doing wrong? Is it something to do with the UTF8 bit?


