I have a folder containing files in Parquet format. I used a crawler to create a table in the Glue Data Catalog, which has 2500+ columns. I want to create
I'm using AWS Glue 3.0 and am trying to connect to Redshift using Psycopg2. At first I was uploading a whl file version of it and it would give me the error abo
I'm testing some pyspark code in an EMR notebook before I deploy it and keep running into this strange error with Spark SQL. I have all my tables and metadata i
I am fairly new to AWS Glue. I have tried creating some jobs and they work fine; now I want to take it a step further. Say we have other developers working and n
Thanks for taking the time to read this! I have multiple tables within an AWS Glue catalog database and want to create an ER diagram from that database. It sho
I am trying to create a table in Spark SQL by providing the schema and the location. However, when I run a select on the table, I see only half the columns. (
How can I capture a Glue job's arguments by position rather than using the getResolvedOptions function and passing the arguments as key-value pairs?
I have a source bucket where small 5KB JSON files will be inserted every second. I want to use AWS Athena to query the files by using an AWS Glue Datasource and
I am following the AWS documentation on how to transfer a DDB table from one account to another. There are two steps: 1. Export DDB table into Amazon S3 2. Use a Glue job t
We have an ETL job that uses the below code snippet to update the catalog table: sink = glueContext.getSink(connection_type='s3', path=config['glue_s3_path_bc']
So, I've used Glue before, but it's been with a single file <> single folder relationship. What I'm trying to do now is to have a structure like this crea
I've created an EMR cluster with the Glue Data Catalog. When I invoke the spark-shell, I am able to successfully list tables stored within a Glue database via s
I am trying to populate as many Glue job metrics as possible for some testing. Below is the setup I have created: A crawler reads data (dummy customer data of 500
According to the AWS Glue documentation, we can use exclusions to exclude files when the connection type is s3: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-
I'm running Trino on EMR version 6.5. I have added the Iceberg connector for Trino and want it to use a Glue catalog. These are the configuration under
When I start a job with the IAM role AWSGlueServiceNotebookRoleDefault, I get this error: Failed to authenticate user due to missing information in request. No info
I have a crawler I created in AWS Glue that does not create a table in the Data Catalog after it successfully completes. The crawler takes roughly 20 seconds
Looks like my earlier post was not clear. Here is what I am looking for: I have an AWS Glue catalog table consisting of 29 columns. The source table has 31 columns.
According to Moving data from S3 -> RDS using AWS Glue I found that an instance is required to add a connection to a data target. However, my RDS is a serve
I have run a crawler on a JSON S3 file to update an existing external table. Once finished, I checked the SVL_S3LOG to see the structure of the external table a