AWS EMR s3a filesystem not found
I am running an EMR cluster. It was working fine, but it suddenly started throwing the error below whenever my Python Spark script tries to access files on S3:
py4j.protocol.Py4JJavaError: An error occurred while calling o36.json.
: java.lang.RuntimeException: java.lang.ClassNotFoundException:
Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
How can we resolve this?
Thanks in advance.
Solution 1:[1]
It was an issue with Spark's dependencies. I had to add the jars config in spark-defaults.conf:
spark.jars.packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.2
For details, see: https://gist.github.com/eddies/f37d696567f15b33029277ee9084c4a0
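If you would rather not edit spark-defaults.conf, the same packages can be supplied at submit time. A minimal sketch (the script name is a placeholder; the coordinates match the config line above):

```shell
# Pass the AWS SDK and hadoop-aws dependencies on the command line
# instead of spark-defaults.conf (script name is hypothetical)
spark-submit \
  --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.2 \
  my_s3_script.py
```

Spark resolves the listed Maven coordinates at launch and puts them on both the driver and executor classpaths.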
Solution 2:[2]
- Download hadoop-aws-3.2.1.jar (or any version above 2.7.10, depending on your EMR version) and put it in /usr/lib/spark/jars
- Download the latest AWS SDK jar and put it in /usr/lib/spark/jars
- Update /usr/lib/spark/conf/spark-defaults.conf
- Update spark.driver.extraClassPath: append the full paths of these 2 new jars, separated by a colon
- Run spark-submit after that
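The steps above can be sketched as a few commands. The jar versions, Maven URLs, and paths are examples (aws-java-sdk-bundle 1.11.375 is the SDK version hadoop-aws 3.2.1 was built against); match them to your EMR release:

```shell
# Assumed jar versions and mirror URLs -- adjust to your EMR release
cd /usr/lib/spark/jars
sudo wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.1/hadoop-aws-3.2.1.jar
sudo wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar

# Append both jars to the driver classpath (colon-separated, as described above)
echo "spark.driver.extraClassPath /usr/lib/spark/jars/hadoop-aws-3.2.1.jar:/usr/lib/spark/jars/aws-java-sdk-bundle-1.11.375.jar" \
  | sudo tee -a /usr/lib/spark/conf/spark-defaults.conf
```

Note that if spark-defaults.conf already sets spark.driver.extraClassPath, you should append to the existing value rather than add a second line.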
Note: I used AWS EMR version 6.0+
Solution 3:[3]
For Amazon EMR, use the "s3://" prefix. The S3A connector is the ASF's open-source one; Amazon has their own (closed-source) connector, which is the only one they support.
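In a PySpark script on EMR that looks like the following. This is a sketch, not a tested example: the bucket and key are placeholders, and `spark` is assumed to be the active SparkSession:

```python
# On EMR, the supported connector is selected by the "s3://" scheme (EMRFS),
# so no extra s3a jars are needed. Bucket and path are hypothetical.
df = spark.read.json("s3://my-bucket/path/data.json")  # instead of "s3a://..."
```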
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Raghav salotra |
| Solution 2 | Shyam Prasad |
| Solution 3 | stevel |