Category "amazon-emr"

Airflow 2 XCom in Task Groups

I have two tasks inside a TaskGroup that need to pull xcom values to supply the job_flow_id and step_id. Here's the code: with TaskGroup('execute_my_steps') a

mount_workspace_dir notebook magic not working in EMR Studio

In an EMR Studio Python3 notebook, I execute the following: %mount_workspace_dir . And receive the following error: UsageError: Line magic function `%mount_wor

How to use Java runtime 11 in an AWS EMR cluster

I'm creating a cluster in AWS EMR, and when Spark runs my application I get the error below: Exception in thread "main" java.lang.UnsupportedClassVersionError:

AWS EMRFS S3 ranger plugin error for amazon s3

I am trying to integrate AWS EMR with Apache Ranger. Of the three plugins (Hive, Spark, and S3), the Hive and Spark plugins are working, but I am getting an error while trying

Delta Table / Athena And Spark

I have a Delta table that can be read from Athena. When I try to get the data through a query from Spark, I get the following error: Caused by: org.apache.sp

Spark SQL error from EMR notebook with AWS Glue table partition

I'm testing some pyspark code in an EMR notebook before I deploy it and keep running into this strange error with Spark SQL. I have all my tables and metadata i

AWS EMR s3a filesystem not found

I am running an EMR instance. It was working fine, but it suddenly started giving the error below when I try to access S3 files from a Python Spark script: py4

AWS EMR: Enable auto-termination-policy in cloudformation

I am trying to enable auto termination policy in EMR. Here is the documentation https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-auto-termination-poli

Spark Catalog w/ AWS Glue: database not found

I've created an EMR cluster with the Glue Data Catalog. When I invoke the spark-shell, I am able to successfully list tables stored within a Glue database via s

Missing Jupyter Notebook Kernels in VSCode

I have multiple people working on the same AWS EMR cluster to run some Spark jobs. This is being done through Jupyter Notebooks which are created/modified usin

Setup Apache Sedona on EMR

I want to be able to use Apache Sedona for distributed GIS computing on AWS EMR. We need the right bootstrap script to have all dependencies. I tried setting up

Increasing Spark application timeout in Jupyter/Livy

I'm using a shared EMR cluster with JupyterHub installed. If my cluster is under heavy load, I get an error. How do I increase the timeout for a Spark applicati

Airflow/Luigi for AWS EMR automatic cluster creation and pyspark deployment

I am new to Airflow automation. I don't know whether it is possible to do this with Apache Airflow (or Luigi, etc.) or whether I should just write a long bash file to do this. I

Trino iceberg connector "getTablesWithParameter for GlueHiveMetastore is not implemented"

I'm running Trino on EMR version 6.5. I have added the Iceberg connector for Trino, and I want it to use a Glue catalog. These are the configuration under

How do you full-text search in an Amazon S3 bucket?

What are the options for building a solution on the AWS native platform to full-text search one or more Amazon S3 buckets? We have a process that will be stori

"HTTPConnectionPool(host='127.0.0.1', port=9000): [Errno 111] Connection refused" error on AWS EMR when loading Stanford NLP model

I am using the Stanford CoreNLP model in an algorithm, which includes a Java client to the server (StanfordCoreNLPClient) in order to interact with CoreNLP