Category "amazon-emr"

Problem reading data from HBase on an AWS EMR cluster using a Java Spring Boot client

I'm trying to write a simple API application to read data from HBase on an AWS EMR cluster. But I get an UnknownHostException when I try to send the reques

AWS EMR EMRFS Kerberos login on policy refresh

I installed Kerberos on an EC2 server, and on a second EC2 server I installed Apache Ranger (with Kerberos auth added in the core-site file, hadoop.security.authentica

Package sparse_dot_topn in Pyspark AWS EMR Jupyter install error

Running a PySpark notebook in Jupyter on AWS EMR and trying to install the Python package "sparse_dot_topn" version 0.2.9, I'm getting an error I don't understand

Spark on Rapids single node

I'm trying to run Tpcds on Rapids single node on EMR using this guide: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html But getting res

S3DistCp - Split source into multiple jobs

I need to copy data from S3 to HDFS on an EMR cluster. I'm trying to reduce the execution time of my job. Looking at the logs, the map input of the job is 1_000_

Airflow 2 XCom in Task Groups

I have two tasks inside a TaskGroup that need to pull xcom values to supply the job_flow_id and step_id. Here's the code: with TaskGroup('execute_my_steps') a

mount_workspace_dir notebook magic not working in EMR Studio

In an EMR Studio Python3 notebook, I execute the following: %mount_workspace_dir . And receive the following error: UsageError: Line magic function `%mount_wor

How to use Java runtime 11 in an AWS EMR cluster

I'm creating a cluster in AWS EMR, and when Spark runs my application I get the error below: Exception in thread "main" java.lang.UnsupportedClassVersionError:

AWS EMRFS S3 Ranger plugin error for Amazon S3

I am trying to integrate AWS EMR with Apache Ranger. Of the three plugins (Hive, Spark, and S3), two are working (Hive and Spark), but I am getting an error while trying

Delta Table / Athena And Spark

I have a Delta table that can be read from Athena. When I try to query the data from Spark, I get the following error: Caused by: org.apache.sp

Spark SQL error from EMR notebook with AWS Glue table partition

I'm testing some pyspark code in an EMR notebook before I deploy it and keep running into this strange error with Spark SQL. I have all my tables and metadata i

AWS EMR s3a filesystem not found

I am running an EMR instance. It was working fine, but it suddenly started giving the error below when I try to access S3 files from a Python Spark script: py4

AWS EMR: Enable auto-termination policy in CloudFormation

I am trying to enable the auto-termination policy in EMR. Here is the documentation https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-auto-termination-poli

Spark Catalog w/ AWS Glue: database not found

I've created an EMR cluster with the Glue Data Catalog. When I invoke the spark-shell, I am able to successfully list tables stored within a Glue database via s

Missing Jupyter Notebook Kernels in VSCode

I have multiple people working on the same AWS EMR cluster to run some Spark jobs. This is being done through Jupyter Notebooks which are created/modified usin

Setup Apache Sedona on EMR

I want to be able to use Apache Sedona for distributed GIS computing on AWS EMR. We need the right bootstrap script to have all dependencies. I tried setting up

Increasing Spark application timeout in Jupyter/Livy

I'm using a shared EMR cluster with JupyterHub installed. If my cluster is under heavy load, I get an error. How do I increase the timeout for a spark applicati

Airflow/Luigi for AWS EMR automatic cluster creation and pyspark deployment

I am new to Airflow automation. I don't know if it is possible to do this with Apache Airflow (or Luigi, etc.) or whether I should just write a long bash file to do this. I
