Not able to populate AWS Glue ETL Job metrics
I am trying to populate as many Glue job metrics as possible for some testing. This is the setup I have created:
- A crawler reads data (dummy customer data of 500 rows) from a CSV file placed in an S3 bucket.
- Another crawler crawls the tables created in the Redshift cluster.
- An ETL job then reads the data from the CSV file in S3 and loads it into a Redshift table.
The job runs without any issue and I can see the final data being loaded into the Redshift table. However, only the following 5 CloudWatch metrics are populated:
- glue.jvm.heap.usage
- glue.jvm.heap.used
- glue.s3.filesystem.read_bytes
- glue.s3.filesystem.write_bytes
- glue.system.cpuSystemLoad
There are approximately 20 more metrics that are not being populated.
Any suggestions on how to populate those remaining metrics as well?
Solution 1:[1]
I ran into the same issue. Do your glue.s3.filesystem.read_bytes and glue.s3.filesystem.write_bytes metrics have any data?
One possible reason is that AWS Glue job metrics are not emitted if the job completes in less than 30 seconds.
Solution 2:[2]
When running the job, enable the metrics option under the Monitoring tab.
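If the job is started from code instead of the console, the same setting can be passed as the `--enable-metrics` special job parameter. Below is a minimal sketch using boto3. The helper names are hypothetical, and the value "true" is an assumption, since Glue only needs the key to be present.

```python
def build_run_arguments(extra=None):
    """Return job-run arguments with Glue job metrics switched on."""
    args = dict(extra or {})
    # `--enable-metrics` is a Glue special parameter. Only the key needs to
    # be present; the value "true" used here is an assumption.
    args["--enable-metrics"] = "true"
    return args


def start_job_with_metrics(job_name, extra=None):
    """Start a Glue job run with metrics enabled (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=build_run_arguments(extra),
    )
    return response["JobRunId"]
```

Calling `start_job_with_metrics("my-etl-job")` (a placeholder job name) would start a run with metrics enabled for that run only.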
Solution 3:[3]
- Double-check that CloudWatch metrics are enabled for your job.
- Make sure your job runs long enough, say more than 3 minutes, so that the metrics have time to be pushed to CloudWatch.
- To achieve this, you can add a sleep to your code.
- Assuming that you are using Glue version 2.0+ for the above job, note that AWS Glue version 2.0+ does not use dynamic allocation, hence the ExecutorAllocationManager metrics are not available. Fall back to Glue 1.0 and you should be able to confirm that all the documented metrics are available.
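The sleep suggested above could be added at the end of the job script roughly as follows. This is only a sketch: the 180-second default mirrors the "> 3 mins" suggestion, and the helper names are hypothetical.

```python
import time


def remaining_padding(started_at, min_runtime_s=180.0, now=None):
    """Seconds still needed to reach min_runtime_s since started_at."""
    now = time.monotonic() if now is None else now
    return max(0.0, min_runtime_s - (now - started_at))


def pad_runtime(started_at, min_runtime_s=180.0, sleep=time.sleep):
    """Sleep just long enough for the job to have run min_runtime_s in total."""
    delay = remaining_padding(started_at, min_runtime_s)
    if delay > 0:
        sleep(delay)
    return delay
```

Record `job_start = time.monotonic()` at the top of the script and call `pad_runtime(job_start)` after the final write, so a job that already ran long enough is not delayed further.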
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Shirui Xu |
Solution 2 | Shubham Jain |
Solution 3 |