Spark load csv file in jar from resources folder

I am trying to create a Spark application in Scala that reads a .csv file located in the src/main/resources directory and saves it to the local HDFS instance. Everything works fine when I run it locally, but when I bundle it as a .jar file and deploy it on a server, something goes wrong.

This is my code, which is located in src/main/scala. My data file is at src/main/resources/dataset.csv:

val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(getClass.getResource("dataset.csv").toString())

However, when I build a jar by calling sbt package and deploy it to my server, I receive the following error:

Exception in thread "main" java.lang.IllegalArgumentException: 
java.net.URISyntaxException: 
Relative path in absolute URI: jar:file:/root/./myapp_2.11-0.1.jar!/dataset.csv

How can I correctly link to my file?



Solution 1:[1]

Use getPath() on the URL object returned from getResource to get an absolute path:

getClass.getResource("data.csv").getPath()

This returns a path like:

/upload-data-scala-project/target/scala-2.11/classes/data.csv

Using toString will give you a string representation of the URL like:

file:/upload-data-scala-project/target/scala-2.11/classes/data.csv

which does not begin with a slash and is therefore interpreted as a relative path.
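
The two string forms can be compared directly on java.net.URL, without Spark. The sketch below is illustrative only: the URLs are built by hand from the paths quoted in this thread rather than obtained from getResource, and the object name UrlForms is hypothetical. It also prints the same properties for the jar: URL from the error message, where the protocol is jar and getPath is not a plain filesystem path, so this approach may not be enough once the resource is packaged inside a jar.

```scala
import java.net.URL

object UrlForms {
  // URL of a resource that sits as a plain file under target/.../classes
  val fileUrl = new URL("file:/upload-data-scala-project/target/scala-2.11/classes/data.csv")

  // URL of a resource packaged inside a jar (copied from the error message in the question)
  val jarUrl = new URL("jar:file:/root/./myapp_2.11-0.1.jar!/dataset.csv")

  def main(args: Array[String]): Unit = {
    println(fileUrl.toString)    // full URL, including the "file:" scheme
    println(fileUrl.getPath)     // path component only, starting with "/"
    println(jarUrl.getProtocol)  // protocol of the packaged resource
    println(jarUrl.getPath)      // still contains the jar location and the "!/" separator
  }
}
```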

Solution 2:[2]

When you keep a file in your resources folder and deploy the code to a cluster, the resources folder ends up at a location determined by the path configured in your deployment setup. Accordingly, you can specify the file by referring to the complete path of the resources folder.
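
One way to obtain a complete path regardless of how the application is packaged is to copy the bundled resource out of the jar to a known location and load it from there. The following is a minimal sketch and not part of either answer: the object ResourceExtractor and its method copyToTempFile are hypothetical names, and the sketch assumes that the driver and all executors can see the temporary file (for example, when running in local mode on a single server). On a multi-node cluster the file would instead need to be placed on shared storage such as HDFS.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path, StandardCopyOption}

object ResourceExtractor {
  // Copies the stream into a new temporary file and returns its absolute path.
  // The stream is closed whether or not the copy succeeds.
  def copyToTempFile(in: InputStream, prefix: String, suffix: String): Path = {
    val tmp = Files.createTempFile(prefix, suffix)
    try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp.toAbsolutePath
  }
}

// Possible usage inside the Spark application, assuming dataset.csv sits at the
// root of the classpath and `spark` is an existing SparkSession:
//
//   val in = getClass.getResourceAsStream("/dataset.csv") // null if the resource is missing
//   val localPath = ResourceExtractor.copyToTempFile(in, "dataset", ".csv")
//   val df = spark.read
//     .format("csv")
//     .option("header", "true")
//     .option("inferSchema", "true")
//     .load(localPath.toUri.toString)
```

Reading the resource as a stream avoids handing a jar: URL to Spark, which is what triggers the URISyntaxException shown in the question.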

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2    Erick