'Saga Choreography implementation problems

I am designing and developing a microservice platform based on the specifications of http://microservices.io/

The entire framework integrates through socket thus removing the overhead of multiple HTTP requests (like most REST APIs).

A service registry host receives the registry of multiple microservice hosts, each microservice is responsible for a domain of the business. Another host we call a router (or API gateway) is responsible for exposing the microservices for consumption by third parties.

We will use the structure of Sagas (in choreography style) to distribute the requisitions, so we have some doubts:

  • Should a microservice issue the event in any process manager or should it be passed directly to the next microservice responsible for the chain of events? (the same logic applies to rollback)
  • Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
  • If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas for example (as a result pagination type)?

I think the main point is that in this router and microservice structure, who is responsible for building the Sagas and propagating their events.



Solution 1:[1]

The article Patterns for Microservices — Sync vs. Async does a great job defining many of the terms used here and has animated gifs demonstrating sync vs. async and orchestrated vs. choreographed as well as hybrid setups.

I know the OP answered his own question for his use case, but I want to try and address the questions raised a bit more generally in lieu of the linked article.

Should a microservice issue the event in any process manager or should it be passed directly to the next microservice responsible for the chain of events?

To use a more general term, a process manager is an orchestrator. A concrete implementation of this may involve a stateful actor that orchestrates a workflow, keeping track of the progress in some way. Since a saga is workflow itself (composed of both forward and compensating actions), it would be the job of the process manager to keep track of the state the saga until completion (success or failure). This typically involves the actor sending synchronous* calls to services waiting for some result before going to the next step. Parallel operations can of course be introduced and what not, but the point is that this actor dictates the progression of the saga.

This is fundamentally different from the choreography model. With this model there is no central actor keeping track of the state of a saga, but rather the saga progresses implicitly via the events that each step emits. Arguably, this is a more pure case of an event-driven model since there is no coordination.

That said, the challenge with this model is observing the state at any given point in time. With the orchestration model above, in theory, each actor could be queried for the state of the saga. In this choreographed model, we don't have this luxury, so in practice a correlation ID is added to every message corresponding to (in this case) a saga. If the messages are queryable in some way (the event bus supports it or through some other storage means), then the messages corresponding to a saga could be queried and the saga state could be reconstructed.. (effectively an event sourced modeled).

Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?

This is an interesting question by itself and one that I have been thinking about quite a lot. The easiest and default answer would be.. hard code the saga plans and map them to the incoming message types. E.g. message A triggers plan X, message B triggers plan Y, etc.

However, I have been thinking about what a control plane might look like that manages these plans and provides the mechanism for pushing changes dynamically to message handlers and/or orchestrators dynamically. The two specific use cases in mind are changes in authorization policies or dynamically adding new steps to a plan.

If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas for example (as a result pagination type)?

The way I have approached this is to include references to the large data if these are objects such as a file or something. For data that are inherently streams themselves, a parallel channel could be referenced that a consumer could read from once it receives the message. I think the important distinction here is to decouple thinking about the messages driving the workflow from where the data is physically materialized which depends on the data representation.

Solution 2:[2]

For microservices, every microservice should be responsible for its domain business.

Should a microservice issue the event in any process manager or should it be passed directly to the next microservice responsible for the chain of events? (the same logic applies to rollback)

All events are not passed to the next microservice, but are published, then all microservices interested in the events should subscribe to them.

If there is rollback, you should consider orchestration.

Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?

The microservice who publish the event will certainly know how to build it. There are no chain of events, because every microservice interested in the event will subscribe it separately.

If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas for example (as a result pagination type)?

Only publish the data others may be interested, not all. In most cases, the data are not large, and message queue can handle them efficiently

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Byron Ruth
Solution 2 yedf