How to use Apache Spark to query a Hive table with Kerberos?

I am attempting to use Scala with Apache Spark locally to query a Hive table that is secured with Kerberos. I have no issues connecting and querying the data programmatically without Spark. However, the problem comes when I try to connect and query through Spark.

My code when run locally without Spark:

    import java.sql.DriverManager

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation

    // Register the Hive JDBC driver.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // keytab, principal, krb5ConfPath, jaasConfPath, user and password are
    // local variables holding the Kerberos settings for this environment.
    System.setProperty("kerberos.keytab", keytab)
    System.setProperty("kerberos.principal", principal)
    System.setProperty("java.security.krb5.conf", krb5ConfPath)
    System.setProperty("java.security.auth.login.config", jaasConfPath)

    val conf = new Configuration
    conf.set("hadoop.security.authentication", "Kerberos")

    UserGroupInformation.setConfiguration(conf)
    UserGroupInformation.createProxyUser("user", UserGroupInformation.getLoginUser)
    UserGroupInformation.loginUserFromKeytab(user, keytab)
    UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()

    if (UserGroupInformation.isLoginKeytabBased) {
      UserGroupInformation.getLoginUser.reloginFromKeytab()
    } else if (UserGroupInformation.isLoginTicketBased) {
      UserGroupInformation.getLoginUser.reloginFromTicketCache()
    }

    // The org.apache.hive.jdbc.HiveDriver expects a jdbc:hive2:// URL.
    val con = DriverManager.getConnection("jdbc:hive2://hdpe-hive.company.com:10000", user, password)
    val rs = con.prepareStatement("select * from table limit 5").executeQuery()

Does anyone know how I could include the keytab, krb5.conf, and jaas.conf in my Spark initialization function so that I can authenticate with Kerberos and obtain the TGT?

My Spark initialization function:

    conf = new SparkConf().setAppName("mediumData")
      .setMaster(numCores)
      .set("spark.driver.host", "localhost")
      .set("spark.ui.enabled", "true") // enable the Spark UI
      .set("spark.sql.shuffle.partitions", defaultPartitions)

    sparkSession = SparkSession.builder.config(conf).enableHiveSupport().getOrCreate()

I do not have files such as hive-site.xml or core-site.xml.

Thank you!



Solution 1:[1]

Looking at your code, you need to set the following properties when launching the job with spark-submit from the terminal. Note that the krb5 and JAAS options have to be combined into a single extraJavaOptions value for the driver and for the executors, because repeating the same --conf key only keeps the last value.

spark-submit --master yarn \
--principal YOUR_PRINCIPAL_HERE \
--keytab YOUR_KEYTAB_HERE \
--conf spark.driver.extraJavaOptions="-Djava.security.auth.login.config=JAAS_CONF_PATH -Djava.security.krb5.conf=KRB5_PATH" \
--conf spark.executor.extraJavaOptions="-Djava.security.auth.login.config=JAAS_CONF_PATH -Djava.security.krb5.conf=KRB5_PATH" \
--class YOUR_MAIN_CLASS_NAME_HERE code.jar
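
Since the question initializes Spark in code rather than through spark-submit, here is a minimal sketch of how the same settings might be applied programmatically on the SparkConf. This is not part of the original answer: the paths and principal are placeholders, and spark.kerberos.principal / spark.kerberos.keytab are the Spark 3.x property names (Spark 2.x uses spark.yarn.principal / spark.yarn.keytab).

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Placeholder paths and principal -- substitute real values for your cluster.
    val kerberosJavaOpts =
      "-Djava.security.krb5.conf=/path/to/krb5.conf " +
      "-Djava.security.auth.login.config=/path/to/jaas.conf"

    val conf = new SparkConf()
      .setAppName("mediumData")
      .setMaster("local[*]")
      .set("spark.kerberos.principal", "user@EXAMPLE.COM")    // --principal equivalent (Spark 3.x)
      .set("spark.kerberos.keytab", "/path/to/user.keytab")   // --keytab equivalent (Spark 3.x)
      .set("spark.driver.extraJavaOptions", kerberosJavaOpts)
      .set("spark.executor.extraJavaOptions", kerberosJavaOpts)

    val spark = SparkSession.builder.config(conf).enableHiveSupport().getOrCreate()

When a principal and keytab are supplied this way, Spark is expected to handle the Kerberos login and ticket renewal itself (at least when the job goes through spark-submit), so the manual UserGroupInformation calls from the question should not be needed. Note that enableHiveSupport() still needs to know where the metastore is, so without a hive-site.xml on the classpath the metastore URI (hive.metastore.uris) would also have to be provided through the session configuration.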

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: PHPirate