Spark Elasticsearch: Multiple ES-Hadoop versions detected in the classpath

I'm new to Spark. I'm trying to run a Spark job that loads data into Elasticsearch. I built a fat jar from my code and passed it to spark-submit:

spark-submit \
  --class CLASS_NAME \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 20 \
  --executor-cores 5 \
  --executor-memory 32G \

The Maven dependency for elasticsearch-hadoop is:


When I don't include the elasticsearch-hadoop jar file in the EXTERNAL_JAR_FILES list, I get this error:

Caused by: java.lang.ClassNotFoundException: org.elasticsearch.spark.rdd.CompatUtils
  at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:344)
  at org.elasticsearch.hadoop.util.ObjectUtils.loadClass(ObjectUtils.java:73)
  ... 26 more

If I include it in the EXTERNAL_JAR_FILES list, I get this error:

java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one

  at org.elasticsearch.hadoop.util.Version.<clinit>(Version.java:73)
  at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:572)
  at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
  at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:97)
  at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:97)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:108)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

Is there anything that needs to be done to overcome it?

Solution 1:[1]

The problem was solved by not bundling the elasticsearch-hadoop jar into the fat jar I built. I set the dependency's scope to provided.
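
The pom entry itself is not shown in this answer. A minimal sketch of what a provided-scope declaration could look like follows. The `es.hadoop.version` property is a hypothetical placeholder, not something taken from the question, and the comments reflect the usual default behaviour of Maven fat-jar plugins rather than the poster's exact build.

```xml
<!-- Sketch only: es.hadoop.version is a hypothetical property. Set it to the
     version of the elasticsearch-hadoop jar that is supplied at submit time. -->
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>${es.hadoop.version}</version>
  <!-- provided: on the compile classpath, but normally left out of the fat jar -->
  <scope>provided</scope>
</dependency>
```

With this scope the connector classes are no longer packaged with the application, so the elasticsearch-hadoop jar has to reach the cluster some other way, for example through the external jar list passed to spark-submit. That leaves exactly one copy on the classpath.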


Solution 2:[2]

I solved this problem


Note the <scope>provided</scope> setting.

Then you can submit with this command:

bin/spark-submit \
--master local[*] \
--class xxxxx  \
--jars https://repo1.maven.org/maven2/org/elasticsearch/elasticsearch-hadoop/7.4.2/elasticsearch-hadoop-7.4.2.jar \

Solution 3:[3]

I ran into this issue after switching my project's build from SBT to Maven (POM). On investigating, I found two copies of the jar on the classpath: one from the .ivy2 folder and another from .mvn. I deleted the one from .ivy2 and the issue disappeared. Hope it helps somebody.

Solution 4:[4]

This happened to me when one of my dependencies pulled in a different version of elasticsearch-spark than the one explicitly declared in my pom file. For example, I had added the elasticsearch-spark-30_2.12 dependency, but a separate dependency used elasticsearch-spark-13_2.10, so I added an exclusion to that separate dependency in my pom file.
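
The original exclusion snippet is not preserved here. A sketch of such an exclusion might look like the following. The outer coordinates (`com.example:some-library:1.0.0`) are hypothetical stand-ins for whichever dependency brings in the older connector; only the excluded artifact name comes from the answer above.

```xml
<dependency>
  <!-- Hypothetical coordinates: replace with the dependency that
       transitively pulls in the older elasticsearch-spark artifact. -->
  <groupId>com.example</groupId>
  <artifactId>some-library</artifactId>
  <version>1.0.0</version>
  <exclusions>
    <!-- Keep the older connector off the classpath so that only
         elasticsearch-spark-30_2.12 remains. -->
    <exclusion>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-spark-13_2.10</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

To find which dependency introduces the second version, running `mvn dependency:tree -Dincludes=org.elasticsearch` should list every path that leads to an org.elasticsearch artifact.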



This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   pkgajulapalli
Solution 2   geekyouth
Solution 3   Kundan Singh Thakur
Solution 4   Chris Gong