How to use Apache Spark to query a Hive table with Kerberos?
I am attempting to use Scala with Apache Spark locally to query a Hive table that is secured with Kerberos. I have no issues connecting and querying the data programmatically without Spark. The problem appears only when I try to connect and query through Spark.
My code when run locally without Spark:
import java.sql.DriverManager
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// keytab, principal, krb5Conf, jaasConf, user and password are Strings defined elsewhere
Class.forName("org.apache.hive.jdbc.HiveDriver")
System.setProperty("kerberos.keytab", keytab)
System.setProperty("kerberos.principal", principal)
System.setProperty("java.security.krb5.conf", krb5Conf)
System.setProperty("java.security.auth.login.config", jaasConf)
val conf = new Configuration
conf.set("hadoop.security.authentication", "Kerberos")
UserGroupInformation.setConfiguration(conf)
UserGroupInformation.createProxyUser("user", UserGroupInformation.getLoginUser)
UserGroupInformation.loginUserFromKeytab(user, keytab)
UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()
// Refresh the credentials using whichever login method is active
if (UserGroupInformation.isLoginKeytabBased) {
  UserGroupInformation.getLoginUser.reloginFromKeytab()
} else if (UserGroupInformation.isLoginTicketBased) {
  UserGroupInformation.getLoginUser.reloginFromTicketCache()
}
// org.apache.hive.jdbc.HiveDriver only accepts URLs that start with jdbc:hive2://
val con = DriverManager.getConnection("jdbc:hive2://hdpe-hive.company.com:10000", user, password)
val rs = con.prepareStatement("select * from table limit 5").executeQuery()
Does anyone know how I can pass the keytab, krb5.conf and jaas.conf to my Spark initialization function so that I can authenticate with Kerberos and obtain the TGT?
My Spark initialization function:
conf = new SparkConf().setAppName("mediumData")
  .setMaster(numCores)
  .set("spark.driver.host", "localhost")
  .set("spark.ui.enabled", "true") // enable the Spark UI
  .set("spark.sql.shuffle.partitions", defaultPartitions)
sparkSession = SparkSession.builder.config(conf).enableHiveSupport().getOrCreate()
I do not have files such as hive-site.xml, core-site.xml.
Thank you!
Solution 1:[1]
Looking at your code, you need to pass the following options to the spark-submit command in the terminal. Each extraJavaOptions key carries both -D flags in a single quoted value, because repeating the same --conf key keeps only the last value.
spark-submit --master yarn \
--principal YOUR_PRINCIPAL_HERE \
--keytab YOUR_KEYTAB_HERE \
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=JAAS_CONF_PATH -Djava.security.krb5.conf=KRB5_PATH" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=JAAS_CONF_PATH -Djava.security.krb5.conf=KRB5_PATH" \
--class YOUR_MAIN_CLASS_NAME_HERE code.jar
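The two -D flags in the command above can also be assembled or applied from Scala, which may help when the job is started from an IDE rather than through spark-submit. The following is a minimal sketch, not part of the original answer: the object name `KerberosJvmSetup` and both method names are hypothetical, and only the two `java.security.*` property names come from the question.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper; the names here are illustrative and not part of Spark or Hadoop.
object KerberosJvmSetup {

  /** Joins both -D flags into the single value that one extraJavaOptions key expects. */
  def javaOptions(jaasConfPath: String, krb5ConfPath: String): String =
    s"-Djava.security.auth.login.config=$jaasConfPath " +
      s"-Djava.security.krb5.conf=$krb5ConfPath"

  /** Sets the same two properties on the JVM this code is running in. */
  def applyToCurrentJvm(jaasConfPath: String, krb5ConfPath: String): Unit = {
    // Fail early with a readable message instead of an opaque login error later.
    Seq(jaasConfPath, krb5ConfPath).foreach { path =>
      require(Files.isReadable(Paths.get(path)), s"Cannot read $path")
    }
    System.setProperty("java.security.auth.login.config", jaasConfPath)
    System.setProperty("java.security.krb5.conf", krb5ConfPath)
  }
}
```

In local mode the driver and the executors share one JVM, so calling `KerberosJvmSetup.applyToCurrentJvm(jaasConfPath, krb5ConfPath)` at the top of `main`, before `UserGroupInformation.loginUserFromKeytab` and `SparkSession.builder`, should have the same effect as the extraJavaOptions flags. This assumes that no Kerberos or Hadoop class has read its configuration before the call.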
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | PHPirate |