Databricks local test fails with java.lang.NoSuchMethodError: org.apache.hadoop.security.HadoopKerberosName.setRuleMechanism

I have a unit test for Databricks code, and I want to run it locally on Windows. Unfortunately, when I run pytest from PyCharm, it throws the following exception:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.HadoopKerberosName.setRuleMechanism(Ljava/lang/String;)V
at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:84)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2747)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2747)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:368)
at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:368)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
at scala.Option.map(Option.scala:230)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

From the source code, the error originates in the SparkSession initialization:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .master("local[2]") \
        .appName("Helper Functions Unit Testing") \
        .getOrCreate()

I searched for this error, and most of the results are related to Maven configuration, adding a dependency on hadoop-auth. However, for PySpark I don't know how to deal with it. Does anyone have experience with or insight into this error?



Solution 1:[1]

My workaround here was to move to Python 3.7 and change the PySpark version to 3.0, and then it seems to be fine, so the problem is an inconsistent environment and dependencies. This is specific to my case; from my search on the web, most reports relate to adding the hadoop-auth.jar dependency via Maven for the Hadoop configuration.
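If you want your tests to fail fast with a clear message when the local environment drifts from a combination that is known to work, a small guard along the lines of the sketch below can run before the SparkSession is built. The versions here (Python 3.7, PySpark 3.0.x) only mirror the combination that happened to work in my case; substitute your own known-good pair.

    # Minimal environment guard for the test module; the versions are
    # placeholders for whatever combination is known to work in your setup.
    import sys

    import pyspark
    import pytest

    EXPECTED_PYTHON = (3, 7)         # known-good interpreter version
    EXPECTED_PYSPARK_PREFIX = "3.0"  # known-good PySpark release line


    def ensure_known_good_env():
        # Call this from a fixture or at the top of a test.
        if sys.version_info[:2] != EXPECTED_PYTHON:
            pytest.skip(f"tests verified on Python {EXPECTED_PYTHON}, "
                        f"running {sys.version_info[:2]}")
        if not pyspark.__version__.startswith(EXPECTED_PYSPARK_PREFIX):
            pytest.skip(f"tests verified on PySpark {EXPECTED_PYSPARK_PREFIX}.x, "
                        f"running {pyspark.__version__}")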

Solution 2:[2]

I encountered this error in a Maven project written in Scala, not Python. What did it for me was adding not only the hadoop-auth dependency, as the OP mentioned, but also the hadoop-common dependency in my pom file, like so:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>3.1.2</version>
</dependency>

Replace 3.1.2 with whatever version you're using. However, I also found that I had to track down other dependencies that conflicted with hadoop-common and hadoop-auth and add exclusions inside their declarations, like so:

<exclusions>
    <exclusion>
        <artifactId>hadoop-common</artifactId>
        <groupId>org.apache.hadoop</groupId>
    </exclusion>
    <exclusion>
        <artifactId>hadoop-auth</artifactId>
        <groupId>org.apache.hadoop</groupId>
    </exclusion>
</exclusions>
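Since the original question is about PySpark rather than a Maven build, the analogue of "whatever version you're using" is the Hadoop version bundled with the local PySpark installation. A quick way to check it, assuming PySpark was installed with pip so its jars live next to the Python package, is to list the Hadoop jars it ships:

    # List the Hadoop jars bundled with the pip-installed PySpark package,
    # so any hadoop-common/hadoop-auth versions you add can be matched to them.
    import glob
    import os

    import pyspark

    jars_dir = os.path.join(os.path.dirname(pyspark.__file__), "jars")
    for jar in sorted(glob.glob(os.path.join(jars_dir, "hadoop-*.jar"))):
        print(os.path.basename(jar))  # the file names carry the Hadoop version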

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Tommy Tan
Solution 2: Chris Gong