Is version control possible in AWS Glue ETL jobs?

I am fairly new to AWS Glue. I have created some jobs and they work fine; now I want to take it a step further. Say several developers are working on the same jobs and we need a way to distinguish between the changes each developer makes to a job or job script (that is, to manage changes to the code). Is there something in AWS Glue similar to the versioning of mappings and workflows in Informatica? I can see there is versioning on objects in the Data Catalog, but there isn't enough information on this in the AWS documentation. Any help is appreciated. Thanks.



Solution 1:[1]

This AWS blog post describes how to implement source control, testing, and CI/CD for Glue applications:

https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-serverless-aws-glue-etl-applications-using-aws-developer-tools/

Solution 2:[2]

Since Glue job scripts are stored in S3, I see two options:

  1. Adding the git commit SHA1 in the filename
  2. Using S3's own versioning capabilities

Adding the git commit SHA1 in the filename

The following suggestion by Tyler Treat might work to achieve this:

"You’ll also notice that I append the Git commit SHA to the name of the file uploaded to S3. This way, you’ll know exactly what version of the code the script contains and the bucket will retain a history of each script. This is useful when you need to debug a job or revert to a previous version."

Source: https://blog.realkinetic.com/continuous-deployment-for-aws-glue-c8abd50d7d58

He uses this in the context of a continuous deployment setup, where each Glue job script deployed to S3 gets the Git commit SHA1 hash embedded in its filename. That way, the history of the script builds up in the storage bucket.
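As a rough illustration of that idea (not the code from the linked post), the naming scheme could be sketched in Python as follows. The function names, the `scripts` prefix, and the `-<sha>` filename layout are assumptions made for this example; `upload_file` is the standard boto3 S3 client call.

```python
import os
import posixpath
import subprocess


def current_commit_sha(repo_dir="."):
    """Return the full SHA-1 of HEAD for the Git repository at repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def versioned_key(script_path, commit_sha, prefix="scripts"):
    """Build an S3 key that embeds the commit SHA in the script's filename."""
    stem, ext = os.path.splitext(os.path.basename(script_path))
    return posixpath.join(prefix, f"{stem}-{commit_sha}{ext}")


def upload_script(local_path, bucket, commit_sha, prefix="scripts"):
    """Upload local_path to S3 under a key that embeds the commit SHA."""
    import boto3  # imported lazily so the helpers above work without boto3

    key = versioned_key(local_path, commit_sha, prefix)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

A deployment step might call `upload_script("jobs/my_job.py", "DOC-EXAMPLE-BUCKET1", current_commit_sha())`. Because every commit produces a new key, the Glue job's script location would also have to be pointed at the new file after each upload.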

Using S3's own versioning capabilities

Another way might be to enable versioning on the storage bucket itself. I have not tested how this plays out with Glue, but to try it, follow these steps:

  1. Enable versioning on the bucket itself using the following AWS CLI command:

    aws s3api put-bucket-versioning --bucket DOC-EXAMPLE-BUCKET1 --versioning-configuration Status=Enabled

    Source: https://docs.aws.amazon.com/AmazonS3/latest/userguide/manage-versioning-examples.html

  2. Versioning will happen automatically from then on. In case you need to roll back to a previous version, you can:

    a) Copy a previous version of the object into the same bucket. The copied object becomes the current version of that object and all object versions are preserved.

    b) Permanently delete the current version of the object. When you delete the current object version, you, in effect, turn the previous version into the current version of that object.

Source: https://docs.aws.amazon.com/AmazonS3/latest/userguide/RestoringPreviousVersions.html
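Rollback option (a) could be scripted roughly like this with boto3. This is an untested sketch in the same spirit as the steps above: the function names are made up for this example, `s3` is assumed to be a `boto3.client("s3")`, and only the first page of `list_object_versions` results (up to 1000 versions) is inspected.

```python
def previous_version_id(versions, key):
    """Return the VersionId of the newest non-current version of key, or None."""
    older = [
        v for v in versions
        if v["Key"] == key and not v["IsLatest"]
    ]
    if not older:
        return None
    return max(older, key=lambda v: v["LastModified"])["VersionId"]


def restore_previous_version(s3, bucket, key):
    """Copy the previous version of key over itself, making it current again."""
    # Prefix matching can return other keys too, so previous_version_id
    # filters on the exact key.
    response = s3.list_object_versions(Bucket=bucket, Prefix=key)
    version_id = previous_version_id(response.get("Versions", []), key)
    if version_id is None:
        raise LookupError(f"No previous version found for s3://{bucket}/{key}")
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key, "VersionId": version_id},
    )
    return version_id
```

Copying rather than deleting keeps every version in the bucket, so the rollback itself can be undone later.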

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kun
Solution 2 benvdh