Two-Phase Starting of an AWS Step Function?

Scenario

I'm looking for a way to create an instance of a step function that waits for me to start it. Pseudo code would look like this.

StateMachine myStateMachine = new();

string executionArn = myStateMachine.ExecutionArn;

myStateMachine.Start();

Use Case

We need a way to reliably store the Execution ARN of a step function in a database. If we fail to write the Execution ARN to the database, we won't call the Start method, and the step function should time out. If starting the step function fails, the database operation is rolled back.

These are the steps we plan to take

  1. A local transaction is started
  2. The step function instance is created, but not started
  3. The ExecutionArn of the created step function instance is recorded in a database
  4. The step function is started
  5. The local transaction is committed

Is there a simple way to start a step function like this?

Below is the result of some research I've done on this so far.

Manual Callbacks

Following the information in this article https://aws.amazon.com/blogs/compute/implementing-serverless-manual-approval-steps-in-aws-step-functions-and-amazon-api-gateway/, I created an empty activity, used this activity as the first step in the step function, and added a timeout of 30 seconds to the activity step. I expected that if I didn't send a success to that activity task, the step would time out and the workflow would fail, but that isn't happening. Even though I set the timeout to 30 seconds, the step is not timing out. I'm guessing the timeout governs how long the step function waits to be able to schedule the activity, not how long it waits to move on from the activity step.

I've also considered using an SQS SendMessage step with Wait for callback checked and a similar timeout, but that would require me to create a throw-away SQS queue just to hold messages I never intend to read. I'm also guessing the timeout would behave the same way here as it does with an activity.
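For reference, here is a minimal sketch of what such a callback step could look like in Amazon States Language, built as a Python dict so it can be dumped to JSON. The queue URL, the state name `DoWork`, and the `ExecutionArn` field in the message body are assumptions made for illustration. `TimeoutSeconds` is the documented Task-state field, but this sketch does not settle whether it fires the way the question expects.

```python
import json


def callback_gate_state(queue_url: str, next_state: str, timeout_seconds: int = 30) -> dict:
    """Build a Task state that sends a task token to SQS and pauses for a callback."""
    return {
        "Type": "Task",
        # Optimized SQS integration; the .waitForTaskToken suffix pauses the
        # execution until SendTaskSuccess or SendTaskFailure is called.
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": queue_url,
            "MessageBody": {
                # The context object supplies the token and the execution ARN.
                "TaskToken.$": "$$.Task.Token",
                "ExecutionArn.$": "$$.Execution.Id",
            },
        },
        "TimeoutSeconds": timeout_seconds,
        "Next": next_state,
    }


if __name__ == "__main__":
    # Hypothetical queue URL and next-state name.
    state = callback_gate_state(
        "https://sqs.us-east-1.amazonaws.com/123456789012/start-gate", "DoWork"
    )
    print(json.dumps(state, indent=2))
```

The application would later resume the step by calling SendTaskSuccess with the token it read from the queue.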

Wait State

There may be something I can do with a Wait state and parallel branches by following the accepted answer in this SO article: Does AWS Step Functions have a timeout feature?, but before I go down that route I want to see if something simpler can be done.

Global Timeout

I have found that step functions have a global timeout. It is useful here in conjunction with a step that pauses until my application explicitly resumes it, but only if the timeout can be reasonably low (like 20 minutes) while the step function stays viable for all use cases. For instance, if the step function should take at most 2 or 3 minutes to run, all is fine. But if another step in the step function can take longer than 20 minutes, I either can't use the global timeout anymore or have to set it to something very high, which I don't want to do.
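To make the trade-off concrete, here is a small sketch of where the global timeout lives in a state machine definition: it is the top-level `TimeoutSeconds` field, next to `StartAt` and `States`. The activity ARN, the state name, and the 45 and 50 minute figures are made up for illustration.

```python
import json


def build_definition(start_at: str, states: dict, global_timeout_seconds: int) -> dict:
    """Wrap a set of states in a definition with an execution-wide timeout."""
    return {
        "StartAt": start_at,
        # Top-level TimeoutSeconds bounds the whole execution, not one state.
        "TimeoutSeconds": global_timeout_seconds,
        "States": states,
    }


if __name__ == "__main__":
    states = {
        "LongStep": {
            "Type": "Task",
            # Hypothetical activity that may run for up to 45 minutes.
            "Resource": "arn:aws:states:us-east-1:123456789012:activity:long-step",
            "TimeoutSeconds": 45 * 60,
            "End": True,
        }
    }
    # A 20-minute global limit would cut LongStep short, so the limit has to
    # be raised past the slowest step, which weakens it as a start guard.
    definition = build_definition("LongStep", states, 50 * 60)
    print(json.dumps(definition, indent=2))
```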

Is there anything I can do here easily that I'm overlooking?

Thanks



Solution 1:[1]

Two-phase initialization of a step function cannot be done. We've worked around this by:

  1. Our Application: Write a row in our DB to indicate the intent to start a step function
  2. Our Application: Start the step function
  3. Our Application: Record the ExecutionArn of the step function instance in the created row
  4. Step Function: Wait indefinitely at its first step, an SQS step that waits for a callback
  5. Our Application: Poll the SQS queue and either abort the step function or allow it to proceed to the next step by sending a callback to the SQS step. (This is the 2nd phase.)
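The application side of these steps could be sketched roughly as follows. This is an illustration under assumptions, not our actual code: `sfn` and `sqs` are assumed to be boto3 clients (`boto3.client("stepfunctions")` and `boto3.client("sqs")`), the `executions` table and its columns are hypothetical, the SQS step is assumed to put `TaskToken` and `ExecutionArn` in the message body, and the proceed-or-abort rule is passed in as a hypothetical `should_proceed` callable because it is application-specific.

```python
import json
import sqlite3
from typing import Callable


def begin_start(db: sqlite3.Connection, sfn, state_machine_arn: str, payload: dict) -> str:
    """Steps 1-3: record the intent, start the execution, store its ARN."""
    cur = db.execute("INSERT INTO executions (status) VALUES ('PENDING')")
    row_id = cur.lastrowid
    db.commit()  # the intent row is persisted before the execution is started

    resp = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    db.execute(
        "UPDATE executions SET execution_arn = ?, status = 'STARTED' WHERE id = ?",
        (resp["executionArn"], row_id),
    )
    db.commit()
    return resp["executionArn"]


def is_recorded(db: sqlite3.Connection, execution_arn: str) -> bool:
    """Example rule: proceed only if the ARN made it into the database."""
    row = db.execute(
        "SELECT 1 FROM executions WHERE execution_arn = ?", (execution_arn,)
    ).fetchone()
    return row is not None


def release_or_abort(sfn, sqs, queue_url: str, should_proceed: Callable[[str], bool]) -> None:
    """Step 5: poll the gate queue and resume or stop each waiting execution."""
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=10
    )
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        arn = body["ExecutionArn"]
        if should_proceed(arn):
            # Second phase: the callback lets the SQS step complete.
            sfn.send_task_success(taskToken=body["TaskToken"], output="{}")
        else:
            sfn.stop_execution(
                executionArn=arn,
                error="StartAborted",  # hypothetical error name
                cause="Execution was not confirmed by the application",
            )
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```

A poller would call `release_or_abort(sfn, sqs, queue_url, lambda arn: is_recorded(db, arn))` in a loop. With that particular rule, a message can arrive in the window between steps 2 and 3, so a real poller would likely want a grace period before aborting.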

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 omatase