Getting "Error while encoding" when reading a text file
I have a pipe-delimited file that I need to strip the first two rows off of. So I read it into an RDD, exclude the first two rows, and make it into a DataFrame.
val rdd = spark.sparkContext.textFile("/path/to/file")
// Drop the first two lines; they sit in the first partition
val rdd2 = rdd.mapPartitionsWithIndex { (idx, iter) => if (idx == 0) iter.drop(2) else iter }
// Split each line on the pipe delimiter and wrap the fields in a Row
val rdd3 = rdd2.map(_.split("\\|")).map(x => org.apache.spark.sql.Row(x: _*))
val df = spark.sqlContext.createDataFrame(rdd3, schema)
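For context, schema above is defined elsewhere in my code; judging by the error output below, the columns are StringType. A minimal hypothetical stand-in (column names invented for illustration) would be something like:

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical stand-in for the real schema, which isn't shown here;
// the error output suggests the columns are plain nullable strings
val schema = StructType(Seq(
  StructField("col1", StringType, nullable = true),
  StructField("col2", StringType, nullable = true)
))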
That all works, and I can call show on the DataFrame and it displays fine. However, when I attempt to save it as a Parquet file, I get a huge stack trace with errors like:
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString,
validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]),
0, col1), StringType), true, false) AS col1#5637
This repeats for every column in the DataFrame, as far as I can tell. What am I doing wrong? Is it something to do with the UTF8 bit?
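In case it helps narrow this down, a quick sanity check (a sketch reusing the rdd2 and schema values above) would be to count lines whose field count differs from the schema width; note that split with a -1 limit keeps trailing empty fields, unlike the plain split in my code:

// Count lines whose field count doesn't match the number of schema columns;
// split("\\|", -1) preserves trailing empty fields, which plain split drops
val expected = schema.length
val badRows = rdd2.map(_.split("\\|", -1)).filter(_.length != expected).count()
println(s"Lines with unexpected field count: $badRows")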