Run Spark program locally with IntelliJ

I tried to run a simple test in IntelliJ IDEA. Here is my code:

import org.apache.spark.sql.functions._
import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}

object hbasetest {

  val spconf = new SparkConf()
  val spark = SparkSession.builder().master("local").config(spconf).getOrCreate()
  import spark.implicits._

  def main(args : Array[String]) {
    val df = spark.read.parquet("file:///Users/cy/Documents/temp")
    df.show()
    spark.close()
  }
}

My dependency list:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.1.0</version>
  <!--<scope>provided</scope>-->
</dependency>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.1.0</version>
  <!--<scope>provided</scope>-->
</dependency>

When I click the Run button, it throws an exception:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.TaskID.<init>(Lorg/apache/hadoop/mapreduce/JobID;Lorg/apache/hadoop/mapreduce/TaskType;I)V

I checked this post, but the situation didn't change after making the modification. Can I get some help with running a local Spark application in IDEA? Thanks.

Update: I can run this code with spark-submit. I would like to run it directly with the Run button in IDEA.



Solution 1:[1]

Are you using the Cloudera sandbox to run this application? In the POM.xml I could see the CDH dependency '2.6.0-mr1-cdh5.5.0'.

If you are using Cloudera, please use the dependencies below for your Spark Scala project, because the 'spark-core_2.10' artifact version changes.

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.0.0-cdh5.1.0</version>
  </dependency>
</dependencies>

I used the reference below to run my Spark application.

Reference: http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/

Solution 2:[2]

Here are the settings I use for Run/Debug configuration in IntelliJ:

*Main class:*
org.apache.spark.deploy.SparkSubmit

*VM Options:*
-cp <spark_dir>/conf/:<spark_dir>/jars/* -Xmx6g

*Program arguments:*
--master
local[*]
--conf
spark.driver.memory=6G
--class
com.company.MyAppMainClass
--num-executors
8
--executor-memory
6G
<project_dir>/target/scala-2.11/my-spark-app.jar
<my_spark_app_args_if_any>

The spark-core and spark-sql jars are referenced in my build.sbt as "provided" dependencies, and their versions must match the version of Spark installed in spark_dir. I currently use Spark 2.0.2 with hadoop-aws jar version 2.7.2.
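For reference, a minimal build.sbt along these lines might look like the sketch below; the project name and the exact versions are placeholders and should be matched to the Spark installed in spark_dir:

name := "my-spark-app"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // "provided" keeps these jars out of the packaged artifact; at run time they
  // come from <spark_dir>/conf/ and <spark_dir>/jars/* supplied via -cp above
  "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.2" % "provided",
  // hadoop-aws as mentioned above; adjust the scope to your own setup
  "org.apache.hadoop" % "hadoop-aws" % "2.7.2"
)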

Solution 3:[3]

This reply may be late, but I just had the same issue. You can run with spark-submit, so you probably already have the related dependencies. My solution is:

  • Change the related dependencies in the IntelliJ Module Settings for your project from provided to compile. You may only need to change some of them, but you will have to experiment; the brute-force solution is to change all of them (a build-file analogue is sketched after this list).

  • If you hit a further exception after this step, such as some dependencies being "too old", change the order of the related dependencies in the module settings.
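For sbt users, the build-file analogue of that module-settings change is simply to drop the "provided" qualifier so the Spark jars are on the classpath when IntelliJ launches the app with the Run button. A rough sketch, using the asker's Spark 2.1.0 versions:

libraryDependencies ++= Seq(
  // compile (default) scope: no "provided", so these jars are available
  // when the application is started directly from the IDE
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-sql"  % "2.1.0"
)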

Solution 4:[4]

I ran into this issue as well, and I also had an old Cloudera Hadoop reference in my code. (You have to click the 'edited' link in the original poster's link to see his original pom settings.)

I could leave that reference in as long as I put the following at the top of my dependencies (order matters!). You should match the version against your own Hadoop cluster.

<dependency>
  <!-- THIS IS REQUIRED FOR LOCAL RUNNING IN INTELLIJ -->
  <!-- IT MUST REMAIN AT TOP OF DEPENDENCY LIST TO 'WIN' AGAINST OLD HADOOP CODE BROUGHT IN-->
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.6.0-cdh5.12.0</version>
  <scope>provided</scope>
</dependency>

Note that in the 2018.1 version of IntelliJ, you can check "Include dependencies with 'Provided' scope", which is a simple way to keep your pom scopes clean.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jose praveen
Solution 2 Denis Makarenko
Solution 3 Kwitter
Solution 4 sethcall