Category "aws-glue"

Use AWS CDK to trigger AWS Glue Workflow from EventBridge

As mentioned in this link, it's not supported by Level 2 constructs. But, it's possible to use Level 1 Constructs to implement it. Can anyone show me how to do

AWS Glue 3.0 PySpark: different behavior when installing dependencies using wheels vs installing same dependencies with Glue itself

I'm having a problem launching a PySpark job that connects to Redshift via the awswrangler library. Everything works fine if using --additional-python-modules: aws

Glue Dynamic Frame Parse text file with ¶ delimiter

I have a text file which looks like the one below. HDR¶20200101 BDY¶1¶Jimmy BDY¶1¶Something TRL¶123 I would like to parse it to a Glue Dyn

Glue-Spark transform to Postgres time data type

Postgres has a time data type. I am trying to insert rows into postgres from a glue job. Given the code: applymapping1 = ApplyMapping.apply(frame = SelectFromCo

Unable to connect to Oracle database using cx_oracle from AWS Glue

I am trying to connect to an Oracle database from AWS Glue using cx_oracle, but I am getting this error message: DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle

AWS Glue 2.0, local pyspark development, testing confusion

I'm new to Glue jobs and I'm looking to try to use Glue 2.0 to run PySpark jobs (python 3) that require the following python libraries as defined in my requirem

Glue Crawler: The number of unique events received is 0 for the target

I've created a crawler that pulls messages from SQS when new objects are added to S3, but when it runs, the message "The number of unique events received is 0 for

How to connect and query MySQL DB from python shell job in AWS Glue

I was using sqlalchemy to create a connection and query a MySQL DB; however, Glue doesn't seem to support "sqlalchemy" or even "pymysql". Is there a way to do this

How to create External Table without specifying columns in Redshift?

I have a folder containing files in Parquet format. I used a crawler to create a table in the Glue Data Catalog, which has 2500+ columns. I want to create

Glue 3.0 has Psycopg2 but "No module named 'psycopg2._psycopg'"?

I'm using AWS Glue 3.0 and am trying to connect to Redshift using Psycopg2. At first I was uploading a whl file version of it and it would give me the error abo

Spark SQL error from EMR notebook with AWS Glue table partition

I'm testing some pyspark code in an EMR notebook before I deploy it and keep running into this strange error with Spark SQL. I have all my tables and metadata i

Is version control possible in AWS Glue ETL jobs?

I am fairly new to AWS Glue. I have tried creating some jobs and they work fine; now I want to take it a step further. Say we have other developers working and n

Get all tables and fields from glue data catalog

Thanks for taking the time to read this! I have multiple tables within an AWS Glue catalog database and want to create an ER diagram from that database. It sho

Spark SQL create table does not show / read all columns as expected

I am trying to create a table in Spark SQL by providing the schema and giving the location. However, when I run a select on the table, I see only half the columns. (

Does AWS Glue support positional arguments?

How to capture a Glue job's arguments by position rather than using the getResolvedOptions function and passing the arguments as key value pairs?

Copy and Merge files to another S3 bucket

I have a source bucket where small 5KB JSON files will be inserted every second. I want to use AWS Athena to query the files by using an AWS Glue Datasource and

How to copy data from Amazon S3 to DDB using AWS Glue

I am following AWS documentation on how to transfer a DDB table from one account to another. There are two steps: export the DDB table into Amazon S3; use a Glue job t

Non-Partitioned Table Schema not updated with Glue ETL Job

We have an ETL job that uses the below code snippet to update the catalog table: sink = glueContext.getSink(connection_type='s3', path=config['glue_s3_path_bc']

Having trouble setting up multiple tables in AWS Glue from a single bucket

So, I've used Glue before, but it's been with a single file <> single folder relationship. What I'm trying to do now is to have a structure like this crea

Spark Catalog w/ AWS Glue: database not found

I've created an EMR cluster with the Glue Data Catalog. When I invoke the spark-shell, I am able to successfully list tables stored within a Glue database via s
