Import custom UDF from jar to Spark
I am using a Jupyter notebook to run Spark. My problem arises when I try to register a UDF from my custom imported jar.
This is how I create the UDF in my custom jar:
package com.udf;

import org.apache.spark.sql.api.java.UDF1;

public class TestUDF implements UDF1<String, String> {
    public String call(String arg) throws Exception {
        return doSomething(arg);
    }
    // ...
}
Then I try to add the jar to the Jupyter notebook session like this:
val spark = SparkSession.builder
  .master("yarn")
  .appName("Spark SQL")
  .config("spark.jars", "/user/.../test.udf.jar")
  .getOrCreate()
or like this
spark.sparkContext.addJar("/user/.../test.udf.jar")
I am not sure whether either of these actually ships the jar, but at least there is no error message. Then I try to register my UDF like this:
spark.udf.register("myUDF", TestUDF.call)
I get an error message:
not found: value TestUDF
(Tried some other names but also not found)
This approach seems legitimate, but I couldn't find any explanation that covers both importing the jar and accessing a UDF from it. Am I missing something important? Could anyone help me with this?
Edit:
Maybe TestUDF should be imported explicitly, like this:
import com.udf.TestUDF
but this import attempt returns an error: object udf is not a member of package com
and registering
spark.udf.register("myUDF", new TestUDF(), StringType)
returns not found: type TestUDF
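One detail that may explain the "not found" errors: `spark.jars` set via `.config(...)` only takes effect if the SparkSession and its JVM have not been created yet, and in a notebook the JVM is often already running. A common workaround (the jar path below is illustrative, not the one from the question) is to supply the jar when the shell or kernel is launched, so the class is on the driver and executor classpaths before the session starts:

```shell
# Illustrative launch command: ship the UDF jar up front
# (or set spark.jars in spark-defaults.conf before the kernel starts).
spark-shell --master yarn --jars /path/to/test-udf.jar
```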
Solution 1:
Try first adding your jar file by using the SparkContext (this answer uses the PySpark API):
spark.sparkContext.addPyFile("yourjarfilepath")
then
spark.udf.registerJavaFunction("myUDF", "com.udf.TestUDF", StringType())
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | thalearningmenace |