Spark: skip top rows with spark-excel

I have an Excel file with damaged rows at the top (the first 3 rows) that need to be skipped. I'm using the spark-excel library to read the file, but I can't find such an option on its GitHub page. Is there a way to achieve this?

This is my code:

Dataset<Row> ds = session.read().format("com.crealytics.spark.excel")
                                .option("location", filePath)
                                .option("sheetName", "Feuil1")
                                .option("useHeader", "true")
                                .option("delimiter", "|")
                                .option("treatEmptyValuesAsNulls", "true")
                                .option("inferSchema", "true")
                                .option("addColorColumns", "false")
                                .load(filePath);


Solution 1:[1]

This issue is fixed in spark-excel 0.9.16; see the corresponding issue on GitHub.

Solution 2:[2]

I have looked at the source code, and there is no option for this:

https://github.com/crealytics/spark-excel/blob/master/src/main/scala/com/crealytics/spark/excel/DefaultSource.scala

You should fix your Excel file and remove the first 3 rows. Otherwise, you would need to create a patched version of the library that supports skipping rows, which would be far more effort than correcting the Excel sheet.

Solution 3:[3]

You can use the skipFirstRows option (I believe it is deprecated after version 0.11).

Library Dependency : "com.crealytics" %% "spark-excel" % "0.10.2"

Sample Code :

val df = sparkSession.read.format("com.crealytics.spark.excel")
      .option("location", inputLocation)
      .option("sheetName", "sheet1")
      .option("useHeader", "true")
      .option("skipFirstRows", "2") // Number of top rows to skip
      .load(inputLocation)

Hope it helps! Feel free to let me know in comments if you have any doubts/issues. Thanks!
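
For newer spark-excel releases, where skipFirstRows may no longer be accepted, the README describes a dataAddress option that tells the reader where the data starts, so pointing it at the first good row should have the same effect. The sketch below is only an illustration under that assumption: DataAddress and SkipRowsExample are hypothetical names, not part of spark-excel, and the commented-out read call assumes a version that supports dataAddress (and "header" in place of "useHeader"). Check the README for the version you actually use.

```java
// Hypothetical helper, not part of spark-excel: builds a dataAddress string
// that starts reading at column A of the first row after the skipped ones.
final class DataAddress {
    private DataAddress() {}

    static String skipping(String sheetName, int rowsToSkip) {
        if (rowsToSkip < 0) {
            throw new IllegalArgumentException("rowsToSkip must be >= 0");
        }
        // Excel rows are 1-based, so skipping 3 rows means starting at row 4.
        // Assumes the sheet name contains no single quote.
        return "'" + sheetName + "'!A" + (rowsToSkip + 1);
    }
}

class SkipRowsExample {
    public static void main(String[] args) {
        String address = DataAddress.skipping("Feuil1", 3);
        System.out.println(address); // prints 'Feuil1'!A4

        // Assumed usage with a spark-excel version that supports dataAddress:
        // Dataset<Row> ds = session.read()
        //         .format("com.crealytics.spark.excel")
        //         .option("dataAddress", address)
        //         .option("header", "true")   // "useHeader" in older releases
        //         .option("inferSchema", "true")
        //         .load(filePath);
    }
}
```

With 3 damaged rows, the address points at row 4, so that row is treated as the header when the header option is enabled.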

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Abdennacer Lachiheb
Solution 2 Tarun Lalwani
Solution 3 Hema Priya Velaga