py4j.protocol.Py4JJavaError: An error occurred while calling o63.save. : java.lang.NoClassDefFoundError: org/apache/spark/Logging
I am new to Spark and big data components such as HBase. I am trying to write Python code in PySpark that connects to HBase and reads data from it. I'm using the following versions:
- Spark version:
spark-3.1.2-bin-hadoop2.7
- Python version:
3.8.5
- HBase version:
hbase-2.3.5
I have installed standalone HBase and Spark locally on Ubuntu 20.04.
Code:
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext.getOrCreate()
sqlc = SQLContext(sc)
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'
df = sc.parallelize([("1","Abby","Smith","K","3456main","Orlando","FL","45235"),
                     ("2","Amaya","Williams","L","123Orange","Newark","NJ","27656"),
                     ("3","Alchemy","Davis","P","Warners","Sanjose","CA","34789")]) \
    .toDF(schema=['key','firstName','lastName','middleName','addressLine','city','state','zipCode'])
df.show()
catalog=''.join('''{
"table":{"namespace":"emp_data","name":"emp_info"},
"rowkey":"key",
"columns":{
"key":{"cf":"rowkey","col":"key","type":"string"},
"fName":{"cf":"person","col":"firstName","type":"string"},
"lName":{"cf":"person","col":"lastName","type":"string"},
"mName":{"cf":"person","col":"middleName","type":"string"},
"addressLine":{"cf":"address","col":"addressLine","type":"string"},
"city":{"cf":"address","col":"city","type":"string"},
"state":{"cf":"address","col":"state","type":"string"},
"zipCode":{"cf":"address","col":"zipCode","type":"string"}
}
}'''.split())
#Writing
print("Writing into HBase")
df.write\
.options(catalog=catalog)\
.format(data_source_format)\
.save()
#Reading
print("Readig from HBase")
df = sqlc.read\
.options(catalog=catalog)\
.format(data_source_format)\
.load()
print("Program Ends")
Error Message:
Writing into HBase
Traceback (most recent call last):
  File "/mnt/c/Codefiles/pyspark_test.py", line 36, in <module>
    df.write
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1107, in save
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o63.save.
: java.lang.NoClassDefFoundError: org/apache/spark/Logging
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
Solution 1:[1]
Check whether Java is installed and JAVA_HOME is set in the environment variables.
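A quick way to check both from Python before launching Spark (a minimal sketch; nothing here is specific to the setup in the question):

import os
import shutil
import subprocess

# Is a java executable visible on the PATH?
java_path = shutil.which("java")
print("java on PATH:", java_path)

# Is JAVA_HOME set in the environment?
print("JAVA_HOME:", os.environ.get("JAVA_HOME"))

# If java was found, print the version that would actually be used.
if java_path:
    subprocess.run(["java", "-version"])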
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Tarun Teja |