Category "aws-glue"

How to create External Table without specifying columns in Redshift?

I have a folder containing files in parquet format. I used a crawler to create a table defined in the Glue Data Catalog, which came to 2500+ columns. I want to create
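
Possibly useful here: the usual way to avoid hand-declaring 2500+ columns is to not write the external table DDL at all, and instead mount the Glue database as an external schema so Spectrum reads the column list from the crawler-built table. A minimal sketch, assuming psycopg2 connectivity and placeholder cluster, database, and role names:

```python
import psycopg2

# Hypothetical endpoint and credentials; substitute your own.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...",
)
conn.autocommit = True  # external DDL is best run outside a transaction
with conn.cursor() as cur:
    # Map the whole Glue database; no column list is ever written by hand.
    cur.execute("""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_schema
        FROM DATA CATALOG
        DATABASE 'my_glue_db'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
    """)
    # The crawler-defined parquet table is now queryable with all its columns.
    cur.execute("SELECT * FROM spectrum_schema.my_parquet_table LIMIT 10")
    print(cur.fetchall())
```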

Glue 3.0 has Psycopg2 but "No module named 'psycopg2._psycopg'"?

I'm using AWS Glue 3.0 and am trying to connect to Redshift using Psycopg2. At first I was uploading a .whl file version of it, and it would give me the error abo
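
That traceback usually means the wheel's compiled `_psycopg` extension was built for a different Python/glibc than the Glue 3.0 workers run. One hedged workaround (job name, role, and paths below are placeholders) is to skip the .whl upload and let Glue pip-install a prebuilt binary wheel at job start via `--additional-python-modules`:

```python
import boto3

glue = boto3.client("glue")

# Hypothetical job definition; DefaultArguments is the relevant part.
glue.update_job(
    JobName="my-redshift-job",
    JobUpdate={
        "Role": "arn:aws:iam::123456789012:role/MyGlueRole",
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": "s3://my-bucket/scripts/job.py",
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",
        "DefaultArguments": {
            # Glue installs this at startup with a build matching the worker.
            "--additional-python-modules": "psycopg2-binary==2.9.9",
        },
    },
)
```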

Spark SQL error from EMR notebook with AWS Glue table partition

I'm testing some pyspark code in an EMR notebook before I deploy it and keep running into this strange error with Spark SQL. I have all my tables and metadata i

Is version control possible in AWS Glue ETL jobs?

I am fairly new to AWS Glue. I have tried creating some jobs and it works fine; now I want to take it a step further. Say we have other developers working and n
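
Glue has no version control of its own, but a job's script is just an S3 object, so one hedged pattern is to keep the script in Git and have CI publish it. A sketch of that deploy step, with placeholder bucket and job names:

```python
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# 1. Push the reviewed script from the repo checkout to S3.
s3.upload_file("jobs/etl_job.py", "my-artifacts-bucket", "glue/etl_job.py")

# 2. Re-point the Glue job at the freshly uploaded script.
job = glue.get_job(JobName="my-etl-job")["Job"]
glue.update_job(
    JobName="my-etl-job",
    JobUpdate={
        "Role": job["Role"],
        "Command": {**job["Command"],
                    "ScriptLocation": "s3://my-artifacts-bucket/glue/etl_job.py"},
        "GlueVersion": job.get("GlueVersion", "3.0"),
    },
)
```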

Get all tables and fields from glue data catalog

Thanks for taking the time to read this! I have multiple tables within an AWS Glue catalog database and want to create an ER diagram from that database. It sho
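
The catalog API hands over everything a diagram needs. A minimal sketch (database name is a placeholder) that paginates over every table and prints its columns:

```python
import boto3

glue = boto3.client("glue")
paginator = glue.get_paginator("get_tables")

for page in paginator.paginate(DatabaseName="my_glue_db"):
    for table in page["TableList"]:
        print(table["Name"])
        for col in table["StorageDescriptor"]["Columns"]:
            print(f"  {col['Name']}: {col['Type']}")
```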

SPARK SQL create table does not show / read all columns as expected

I am trying to create a table in Spark SQL by providing the schema and giving the location. However, when I run a SELECT on the table, I see only half the columns. (

Does AWS Glue support positional arguments

How to capture a Glue job's arguments by position rather than using the getResolvedOptions function and passing the arguments as key value pairs?
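
As far as I can tell there is no positional API: Glue hands everything to the script through sys.argv, so positional handling is plain Python. The wrinkle is that Glue injects its own --key value pairs alongside yours, so something like this illustrative (not official) filter is needed:

```python
import sys

# Collect bare values, skipping each injected "--key value" pair.
positional = []
args = iter(sys.argv[1:])
for arg in args:
    if arg.startswith("--"):
        next(args, None)  # drop the value that belongs to this --key
    else:
        positional.append(arg)

print(positional)  # the bare arguments that were passed, in order
```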

Copy and Merge files to another S3 bucket

I have a source bucket where small 5KB JSON files will be inserted every second. I want to use AWS Athena to query the files by using an AWS Glue Datasource and
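
Athena degrades badly on millions of 5KB objects, so the usual move is a periodic compaction job that rewrites them as fewer, larger files. A sketch with placeholder buckets, runnable as a Glue/Spark job:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-json").getOrCreate()

# Read every tiny JSON object under the source prefix...
df = spark.read.json("s3://source-bucket/events/")

# ...and rewrite it as a handful of larger files for the Athena/Glue table.
(df.coalesce(8)
   .write.mode("append")
   .json("s3://merged-bucket/events/"))
```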

How to copy data from Amazon S3 to DDB using AWS Glue

I am following the AWS documentation on how to transfer a DDB table from one account to another. There are two steps: export the DDB table into Amazon S3, then use a Glue job t
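
The Glue half of that transfer is a short job. A sketch assuming a JSON export at a placeholder path and a placeholder target table; dynamodb.throughput.write.percent throttles how much write capacity the import consumes:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the exported table data from S3.
dyf = glue_context.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-export-bucket/ddb-export/"]},
    format="json",
)

# Write it into the destination-account table via the DynamoDB connector.
glue_context.write_dynamic_frame_from_options(
    frame=dyf,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": "target-table",
        "dynamodb.throughput.write.percent": "0.5",
    },
)
```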

Non-Partitioned Table Schema not updated with Glue ETL Job

We have an ETL job that uses the below code snippet to update the catalog table: sink = glueContext.getSink(connection_type='s3', path=config['glue_s3_path_bc']
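
For comparison, the documented getSink recipe only rewrites the catalog schema when enableUpdateCatalog and setCatalogInfo are both supplied; without them the write succeeds but the table definition never changes. A sketch with placeholder names standing in for the config values:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

sink = glueContext.getSink(
    connection_type="s3",
    path="s3://my-bucket/bc/",          # stands in for config['glue_s3_path_bc']
    enableUpdateCatalog=True,           # omitted => catalog schema stays stale
    updateBehavior="UPDATE_IN_DATABASE",
)
sink.setCatalogInfo(catalogDatabase="my_db", catalogTableName="my_table")
sink.setFormat("glueparquet")
sink.writeFrame(dynamic_frame)          # the job's output DynamicFrame
```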

Having trouble setting up multiple tables in AWS glue from a single bucket

So, I've used Glue before, but it's been with a single file <> single folder relationship. What I'm trying to do now is to have a structure like this crea
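
One hedged fix is to register each top-level folder as its own include path, which steers the crawler toward one table per prefix instead of one merged table for the bucket. A sketch with a placeholder layout:

```python
import boto3

glue = boto3.client("glue")
glue.create_crawler(
    Name="per-folder-crawler",
    Role="arn:aws:iam::123456789012:role/MyGlueRole",
    DatabaseName="my_glue_db",
    Targets={"S3Targets": [
        # One include path per intended table.
        {"Path": "s3://my-bucket/orders/"},
        {"Path": "s3://my-bucket/customers/"},
        {"Path": "s3://my-bucket/invoices/"},
    ]},
)
```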

Spark Catalog w/ AWS Glue: database not found

I've created an EMR cluster with the Glue Data Catalog. When I invoke the spark-shell, I am able to successfully list tables stored within a Glue database via s
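
A sketch of the configuration that usually matters here (not necessarily this asker's root cause): Glue databases only resolve when the session is built with the Glue metastore factory and Hive support, and some entry points fall back silently to the default in-memory catalog, which has no such database:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Route the Hive metastore client at the Glue Data Catalog.
    .config("spark.hadoop.hive.metastore.client.factory.class",
            "com.amazonaws.glue.catalog.metastore."
            "AWSGlueDataCatalogHiveClientFactory")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("SHOW DATABASES").show()
print(spark.catalog.listDatabases())
```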

Not able to populate AWS Glue ETL Job metrics

I am trying to populate the maximum possible Glue job metrics for some testing; below is the setup I have created: a crawler reads data (dummy customer data of 500
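
One prerequisite that is easy to miss: job metrics only flow to CloudWatch when the job itself carries the --enable-metrics argument. A sketch that patches an existing job in place (job name is a placeholder):

```python
import boto3

glue = boto3.client("glue")
job = glue.get_job(JobName="my-etl-job")["Job"]

glue.update_job(
    JobName="my-etl-job",
    JobUpdate={
        "Role": job["Role"],
        "Command": job["Command"],
        "DefaultArguments": {
            **job.get("DefaultArguments", {}),
            "--enable-metrics": "true",  # presence of the flag is what matters
        },
    },
)
```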

exclusions doesn't work in AWS Glue ETL job S3 connection

According to the AWS Glue documentation, we can use exclusions to exclude files when the connection type is s3: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-
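
In my experience the usual trip-up is the value's type: "exclusions" has to be a JSON-encoded string of glob patterns, and a plain Python list tends not to take effect. A sketch with placeholder paths:

```python
import json

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

dyf = glue_context.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://my-bucket/data/"],
        # A string containing a JSON list, not the list itself.
        "exclusions": json.dumps(["**/_metadata", "**.tmp"]),
    },
    format="parquet",
)
```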

Trino iceberg connector "getTablesWithParameter for GlueHiveMetastore is not implemented"

I'm running Trino on EMR version 6.5 and have added the Iceberg connector for Trino, and I want it to use a Glue catalog. These are the configurations under

AWS Glue Jupyter Notebook Failed to authenticate user

When I started a job with the IAM role AWSGlueServiceNotebookRoleDefault, I got this error: Failed to authenticate user due to missing information in request. No info

AWS Glue Crawler Not Creating Table

I have a crawler I created in AWS Glue that does not create a table in the Data Catalog after it successfully completes. The crawler takes roughly 20 seconds

AWS Glue Job extracts columns that are not present in Catalog table

It looks like my earlier post was not clear. Here is what I am looking for: I have an AWS Glue catalog table consisting of 29 columns; the source table has 31 columns.
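
That behavior is expected in the sense that from_catalog projects whatever the underlying files contain; the catalog's 29 columns are not enforced on read. A sketch that trims the frame back to the catalog's declared columns (names are placeholders):

```python
import boto3

from awsglue.context import GlueContext
from awsglue.transforms import SelectFields
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_db", table_name="my_table")

# Ask the catalog which columns it actually declares (the 29)...
table = boto3.client("glue").get_table(
    DatabaseName="my_db", Name="my_table")["Table"]
wanted = [c["Name"] for c in table["StorageDescriptor"]["Columns"]]

# ...and drop the extra source columns from the frame.
dyf = SelectFields.apply(frame=dyf, paths=wanted)
```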

Load data from S3 into Aurora Serverless using AWS Glue

According to Moving data from S3 -> RDS using AWS Glue, I found that an instance is required to add a connection to a data target. However, my RDS is a serve
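
Since a serverless cluster exposes no public endpoint, the job needs a Glue network/JDBC connection in the same VPC; the write itself is then ordinary JDBC. A sketch with placeholder endpoint, credentials, and table, assuming a MySQL-compatible Aurora cluster:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Source data from S3 (CSV with a header row, as an example).
dyf = glue_context.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/input/"]},
    format="csv",
    format_options={"withHeader": True},
)

# JDBC write into the Aurora Serverless cluster.
glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="mysql",
    connection_options={
        "url": "jdbc:mysql://my-cluster.cluster-abc123.us-east-1"
               ".rds.amazonaws.com:3306/mydb",
        "user": "admin",
        "password": "...",
        "dbtable": "target_table",
    },
)
```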

Redshift Spectrum table doesn't recognize array

I ran a crawler on a JSON S3 file to update an existing external table. Once finished, I checked SVL_S3LOG to see the structure of the external table a
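
If the array column exists but ordinary SELECTs choke on it, that is normal Spectrum behavior: nested types are only reachable through the nested-data syntax, where the array is unnested by aliasing it in FROM. A sketch with placeholder schema, table, and column names:

```python
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...",
)
with conn.cursor() as cur:
    # Each row of t fans out into one row per element of t.my_array_col.
    cur.execute("""
        SELECT t.id, elem
        FROM spectrum_schema.my_table AS t, t.my_array_col AS elem
        LIMIT 10
    """)
    print(cur.fetchall())
```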