Spark load csv file in jar from resources folder
I am trying to create a Spark application in Scala that reads a .csv file located in the src/main/resources directory and saves it to the local HDFS instance. Everything works fine when I run it locally. However, when I bundle it as a .jar file and deploy it on a server, something goes wrong.
This is my code, which is located in src/main/scala. My data file is at src/main/resources/dataset.csv:
val df = spark.read
.format("csv")
.option("header", "true")
.option("inferSchema", "true")
.load(getClass.getResource("dataset.csv").toString())
When I build a jar with sbt package and deploy it to my server, however, I receive the following error:
Exception in thread "main" java.lang.IllegalArgumentException:
java.net.URISyntaxException:
Relative path in absolute URI: jar:file:/root/./myapp_2.11-0.1.jar!/dataset.csv
How can I correctly link to my file?
Solution 1:[1]
Use getPath() on the URL object returned from getResource to get an absolute path:
getClass.getResource("data.csv").getPath()
This returns a path such as:
/upload-data-scala-project/target/scala-2.11/classes/data.csv
Using toString instead gives you a string representation of the URL, including its scheme, such as:
file:/upload-data-scala-project/target/scala-2.11/classes/data.csv
which does not begin with a leading slash and is thus interpreted as a relative path.
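The difference between the two forms can be checked without Spark. The sketch below uses only java.net classes; the object name UrlForms and the helper uriError are made up for illustration. It also tries to reproduce the "Relative path in absolute URI" message from the question, under the assumption that Hadoop's path handling treats the text before the first colon as the scheme and passes the rest to java.net.URI as the path. Note that for a resource packaged in a jar, the path part after "jar:" is itself a "file:..." string, so it is worth verifying that getPath() gives a usable location in that case.

```scala
import java.net.{URI, URISyntaxException}

object UrlForms {
  // Path taken from the answer above; the file does not need to exist for this demo.
  val fileUrl = URI.create("file:/upload-data-scala-project/target/scala-2.11/classes/data.csv").toURL

  val asString: String = fileUrl.toString // keeps the "file:" scheme prefix
  val asPath: String = fileUrl.getPath    // scheme removed, begins with "/"

  // Hypothetical helper. Assumption: this mirrors how the scheme and path end up
  // in java.net.URI when Spark/Hadoop parses the string passed to load().
  // Returns the failure reason, or None if the URI is accepted.
  def uriError(scheme: String, path: String): Option[String] =
    try { new URI(scheme, null, path, null, null); None }
    catch { case e: URISyntaxException => Some(e.getReason) }

  def main(args: Array[String]): Unit = {
    println(asString)
    println(asPath)
    // Scheme "jar" followed by a path that does not start with "/": rejected.
    println(uriError("jar", "file:/root/./myapp_2.11-0.1.jar!/dataset.csv"))
    // Scheme "file" followed by an absolute path: accepted.
    println(uriError("file", asPath))
  }
}
```

Running main prints both string forms, then the reason the jar-style string is rejected, then None for the plain file path.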
Solution 2:[2]
When you keep a file in your resources folder and deploy the code to a cluster, the resources folder ends up at a location determined by the path configured in your deployment setup. Accordingly, you can specify the file by referring to the complete path of that resources folder.
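If the file has to stay inside the jar rather than sit at a known path on the server, one possible workaround (not taken from either answer) is to read the resource as a stream and copy it to a temporary file, which does have a complete filesystem path. This is a minimal sketch: the object name ResourceCopy and the method copyToTempFile are hypothetical, and it assumes the temporary file is readable wherever Spark reads it, for example with a local master.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path, StandardCopyOption}

object ResourceCopy {
  // Hypothetical helper: copies the stream to a fresh temporary file,
  // closes the stream, and returns the file's path.
  def copyToTempFile(in: InputStream, suffix: String): Path = {
    val tmp = Files.createTempFile("resource-", suffix)
    try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp.toFile.deleteOnExit() // best-effort cleanup when the JVM exits
    tmp
  }
}

// Possible usage inside the Spark application. Assumptions: dataset.csv sits at
// the root of the classpath, and `spark` is the existing SparkSession.
//
//   val in  = getClass.getResourceAsStream("/dataset.csv") // null if not found
//   val tmp = ResourceCopy.copyToTempFile(in, ".csv")
//   val df  = spark.read
//     .format("csv")
//     .option("header", "true")
//     .option("inferSchema", "true")
//     .load(tmp.toUri.toString)
```

Passing tmp.toUri.toString keeps the "file:" scheme explicit, which matters if the default filesystem on the server is HDFS. On a multi-node cluster the executors may not see a file created on the driver, so this approach would need to be checked against the actual deployment.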
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Erick |