Is there any way to read the contents of an S3 file from an AWS Step Function?
I have a particular workflow where I want to pass a list of 500 JSON strings from a Lambda function to a step function (stepFunction1), and then iterate over the list in that step function's Map state. From there, I want to pass each item in the list to a separate step function (stepFunction2), where additional work will be done.
My problem is that my list of 500 JSON strings exceeds the AWS service limit when passed to stepFunction1. I have tried splitting the list into several smaller segments, but this leads to several invocations of stepFunction1 running concurrently, which I can't have due to other limitations. My next idea was to store the list of JSON strings in an S3 bucket, access it from stepFunction1, and then iterate through it from there. Is there any way to achieve this? Is it possible to read a file in S3 from an AWS state machine? I'm a bit stumped here.
Solution 1:[1]
One solution is to store the items in an Amazon DynamoDB table and directly access them from AWS Step Functions.
Here's an example of how to retrieve an item from DynamoDB:
"Read Next Message from DynamoDB": {
  "Type": "Task",
  "Resource": "arn:aws:states:::dynamodb:getItem",
  "Parameters": {
    "TableName": "MyTable",
    "Key": {
      "MessageId": {"S.$": "$.List[0]"}
    }
  },
  "ResultPath": "$.DynamoDB",
  "Next": "Do something"
}
You can find more information about calling DynamoDB APIs with Step Functions in the documentation.
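The state above expects the execution input to carry a list of keys under List. As a rough sketch of the producing side, the Lambda function could write each JSON string to the table and pass only the keys to stepFunction1. The helper names, the Payload attribute, and the use of UUIDs as keys are assumptions for illustration; the clients are passed in so the helpers can be tried without AWS access.

```python
import json
import uuid


def store_messages(dynamodb_client, table_name, messages):
    """Write each JSON string to DynamoDB and return the generated keys."""
    message_ids = []
    for message in messages:
        message_id = str(uuid.uuid4())  # assumption: random UUIDs as keys
        dynamodb_client.put_item(
            TableName=table_name,
            Item={
                "MessageId": {"S": message_id},
                "Payload": {"S": message},  # hypothetical attribute name
            },
        )
        message_ids.append(message_id)
    return message_ids


def start_workflow(sfn_client, state_machine_arn, message_ids):
    """Start stepFunction1 with only the list of keys as input."""
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"List": message_ids}),
    )
```

Inside a Lambda function these helpers would typically be called with boto3.client("dynamodb") and boto3.client("stepfunctions"). Passing only keys keeps the execution input small, whatever the size of the individual messages.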
Solution 2:[2]
Step Functions works very well with AWS Lambda functions, so you can design a workflow around them easily.
You could read the S3 file from a Lambda function. That Lambda can be developed and tested on its own and then used as a task in a step function.
I would advise first creating a single Lambda function that reads and processes the S3 file, and then trying Step Functions if it fits your scenario.
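A minimal sketch of such a Lambda function in Python might look like the following. The event fields bucket and key, the helper names, and the summarize step are assumptions for illustration; the real processing would replace summarize.

```python
import json


def load_items(s3_client, bucket, key):
    """Fetch a JSON file from S3 and return the parsed content."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read())


def summarize(items):
    """Placeholder processing step: return a small result, not the full list."""
    return {"count": len(items)}


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; it is imported here so
    # the helpers above can be exercised locally without it.
    import boto3

    # "bucket" and "key" are hypothetical event fields.
    items = load_items(boto3.client("s3"), event["bucket"], event["key"])
    return summarize(items)
```

Returning a small summary rather than the full list keeps the Lambda result under the same payload limit that the question runs into when this function is later used as a task in a state machine.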
Solution 3:[3]
You can use the S3 GetObject API through the Step Functions AWS SDK integration. It returns the contents of your JSON file as a string in the Body field of the task result, so you can parse it into JSON in ResultSelector with the intrinsic function "States.StringToJson", like "myJson.$": "States.StringToJson($.Body)".
The code example could be:
{
  "StartAt": "GetObject",
  "States": {
    "GetObject": {
      "Type": "Task",
      "Parameters": {
        "Bucket": "<YOUR S3 Bucket Name>",
        "Key": "<YOUR JSON File Name>"
      },
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "End": true,
      "ResultSelector": {
        "myJson.$": "States.StringToJson($.Body)"
      }
    }
  },
  "Comment": "S3 -> JSON",
  "TimeoutSeconds": 60
}
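To connect this back to the question, the GetObject state can be followed by a Map state that starts stepFunction2 once per item. The sketch below builds such a definition in Python and prints it as JSON. It rests on several assumptions: the file contains a JSON array, the parsed array still fits within the Step Functions payload limit once it is held as state data, the parent should wait for each child execution (hence the .sync:2 resource), and each item is wrapped under a hypothetical "item" field. The state names and build_definition are made up for illustration.

```python
import json


def build_definition(bucket, key, child_state_machine_arn, max_concurrency=1):
    """Return a state machine definition: read the S3 file, then map over it."""
    return {
        "Comment": "S3 -> JSON -> Map",
        "StartAt": "GetObject",
        "States": {
            "GetObject": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
                "Parameters": {"Bucket": bucket, "Key": key},
                "ResultSelector": {"myJson.$": "States.StringToJson($.Body)"},
                "Next": "ForEachItem",
            },
            "ForEachItem": {
                "Type": "Map",
                "ItemsPath": "$.myJson",
                # 1 processes items one at a time; 0 removes the limit.
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "StartStepFunction2",
                    "States": {
                        "StartStepFunction2": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::states:startExecution.sync:2",
                            "Parameters": {
                                "StateMachineArn": child_state_machine_arn,
                                # "item" is a hypothetical field name.
                                "Input": {"item.$": "$"},
                            },
                            "End": True,
                        }
                    },
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_definition(
        "<YOUR S3 Bucket Name>",
        "<YOUR JSON File Name>",
        "<ARN of stepFunction2>",
    )
    print(json.dumps(definition, indent=2))
```

The execution role of stepFunction1 would also need permission to read the object and to start executions of stepFunction2.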
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Dennis Traub |
Solution 2 | Traycho Ivanov |
Solution 3 | Kotaro Doi |