Is there any way to read the contents of an S3 file from an AWS Step Function?

I have a particular workflow where I want to pass a list of 500 JSON strings from a Lambda function to a Step Function (stepFunction1) and then iterate over the list in that Step Function's Map state. From there, I want to pass each item in the list to a separate Step Function (stepFunction2), where additional work will be done.

My problem is that my list of 500 JSON strings exceeds the AWS service limit on payload size when passed to stepFunction1. I have tried splitting the list into several smaller segments, but this leads to several invocations of stepFunction1 running concurrently, which I can't have due to other limitations. My next idea was to store the list of JSON strings in an S3 bucket, access it from stepFunction1, and then iterate through it from there. Is there any way to achieve this? Is it possible to read a file in S3 from an AWS state machine? I'm a bit stumped here.



Solution 1:[1]

One solution is to store the items in an Amazon DynamoDB table and directly access them from AWS Step Functions.

Here's an example of how to retrieve an item from DynamoDB:

"Read Next Message from DynamoDB": {
  "Type": "Task",
  "Resource": "arn:aws:states:::dynamodb:getItem",
  "Parameters": {
    "TableName": "MyTable",
    "Key": {
      "MessageId": {"S.$": "$.List[0]"}
    }
  },
  "ResultPath": "$.DynamoDB",
  "Next": "Do something"
}

You can find more information about calling DynamoDB APIs with Step Functions in the documentation.
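To process all 500 items rather than just the first, the getItem task above could sit inside a Map state that iterates over the list of keys. A minimal sketch, assuming the state input carries a MessageIds array of key values (the table name and "Do something" state are placeholders from the example above); MaxConcurrency of 1 keeps the iterations sequential, which matches the asker's no-concurrency constraint:

"Iterate Over Messages": {
  "Type": "Map",
  "ItemsPath": "$.MessageIds",
  "MaxConcurrency": 1,
  "Iterator": {
    "StartAt": "Read Message from DynamoDB",
    "States": {
      "Read Message from DynamoDB": {
        "Type": "Task",
        "Resource": "arn:aws:states:::dynamodb:getItem",
        "Parameters": {
          "TableName": "MyTable",
          "Key": {
            "MessageId": {"S.$": "$"}
          }
        },
        "End": true
      }
    }
  },
  "Next": "Do something"
}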

Solution 2:[2]

Step Functions works very well with AWS Lambda functions, so you can easily design a clean workflow.

You could read the S3 object from a Lambda function. That Lambda can work on its own and also be invoked as a task within a Step Function.

I would advise you to first create a single Lambda function that reads and processes the S3 file, and then later wire it into a Step Function if that fits your scenario.
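Wiring such a Lambda into the state machine would look something like the sketch below. The function name ReadListFromS3 and the next state are hypothetical; the assumption is that the Lambda fetches the object, parses it, and returns the item list in its response payload:

{
  "Read List from S3": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "ReadListFromS3",
      "Payload": {
        "Bucket": "<YOUR S3 Bucket Name>",
        "Key": "<YOUR JSON File Name>"
      }
    },
    "ResultSelector": {
      "List.$": "$.Payload"
    },
    "Next": "Iterate List"
  }
}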

Solution 3:[3]

You can use the S3 GetObject API. It returns your JSON file stored in S3 as a string under the Body field of the state output, so you can then convert it to JSON in a ResultSelector with the intrinsic function States.StringToJson, like "myJson.$": "States.StringToJson($.Body)".

An example state machine definition could be:

{
  "StartAt": "GetObject",
  "States": {
    "GetObject": {
      "Type": "Task",
      "Parameters": {
        "Bucket": "<YOUR S3 Bucket Name>",
        "Key": "<YOUR JSON File Name>"
      },
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "End": true,
      "ResultSelector": {
        "myJson.$": "States.StringToJson($.Body)"
      }
    }
  },
  "Comment": "S3 -> JSON",
  "TimeoutSeconds": 60
}
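From there, the parsed array under myJson can feed a Map state directly, which covers the asker's iteration requirement in one state machine. A sketch extending the example above, with a placeholder Pass state standing in for the per-item work and MaxConcurrency of 1 to keep the iterations sequential:

{
  "StartAt": "GetObject",
  "States": {
    "GetObject": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "Parameters": {
        "Bucket": "<YOUR S3 Bucket Name>",
        "Key": "<YOUR JSON File Name>"
      },
      "ResultSelector": {
        "myJson.$": "States.StringToJson($.Body)"
      },
      "Next": "Iterate Items"
    },
    "Iterate Items": {
      "Type": "Map",
      "ItemsPath": "$.myJson",
      "MaxConcurrency": 1,
      "Iterator": {
        "StartAt": "Process Item",
        "States": {
          "Process Item": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "End": true
    }
  },
  "Comment": "S3 -> JSON -> Map",
  "TimeoutSeconds": 60
}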

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Dennis Traub
Solution 2: Traycho Ivanov
Solution 3: Kotaro Doi