Using SAGAS to maintain data consistency in Microservices

Back in the age when monoliths ruled the world, we had ACID transactions to take care of data consistency. What we used to do was create a transaction object and perform all the required database operations within that transaction's scope. This guaranteed the Atomicity, Consistency, Isolation, and Durability of every database action performed within the scope of that transaction.

But that day and age is long gone. Most back-end systems are now moving towards microservices and other, more or less distributed, architecture patterns. If you implement the microservices architecture pattern correctly, you will have one database per service. The following is an example of the microservices architecture pattern.

The question is: can we apply ACID properties across multiple databases, each of which belongs to a different microservice?

Imagine you are designing a system for an e-commerce online store. Assume that each customer has a credit limit. The system should check a customer’s credit limit before placing an order for that customer.

Example :

This is not a challenge in a system designed using a monolithic architecture, because we have the ACID properties to manage all database transactions. If we handle our transactions properly (we used to do it without even knowing much about it in those good old days :D), the ACID properties ensure the integrity of each transaction irrespective of the number of requests made to the API. The following is a very basic example of the usage of transactions.
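As a stand-in for a basic local transaction, here is a minimal sketch using Python's standard-library `sqlite3` module. The table layout, the `available_credit` column, and the `open_store` and `place_order` helpers are assumptions made for illustration; they are not taken from any real store. The point is only that the order insert and the credit update either both commit or both roll back.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database holding both tables (monolith style)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            available_credit INTEGER NOT NULL
        );
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            customer_id INTEGER NOT NULL,
            total INTEGER NOT NULL
        );
        INSERT INTO customers (id, available_credit) VALUES (1, 100);
        """
    )
    return conn


def place_order(conn: sqlite3.Connection, customer_id: int, total: int) -> bool:
    """Insert the order and deduct the credit in ONE local transaction."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                (customer_id, total),
            )
            cur = conn.execute(
                "UPDATE customers SET available_credit = available_credit - ? "
                "WHERE id = ? AND available_credit >= ?",
                (total, customer_id, total),
            )
            if cur.rowcount != 1:
                # Not enough credit: raising here also undoes the INSERT above.
                raise ValueError("insufficient credit")
        return True
    except ValueError:
        return False
```

Calling `place_order(conn, 1, 60)` twice on a fresh store should accept the first order and reject the second, leaving no trace of the rejected order in the `orders` table.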

So, in a monolithic application we can guarantee that concurrent transactions for a customer will be serialized. But how can we achieve this in our microservices back end? Remember, we have two independent services to maintain customers (credit limits) and orders.

We would have loosely coupled, encapsulated services and data management in our Order and Customer services, as in the above diagram.

There won't be any issue if there is only one request coming into these services. But that is not the reality. What happens when 10 or 20 other services are calling your service simultaneously? In a situation like this, how can we maintain data consistency across multiple databases?

So how can we overcome this problem? How about using a two-phase commit (2PC)? Well, it sounds like a good solution, but 2PC in a distributed architecture inherently has the following problems.

  • The 2PC coordinator is a single point of failure
  • It is chatty and creates a lot of network traffic: O(4n) messages, and O(n²) with retries
  • Reduced throughput due to locks
  • Not supported by many NoSQL databases
  • CAP theorem (2PC impacts availability)

So what can we do?

SAGAS to the rescue

So the SAGA mechanism comes to the rescue for distributed systems.

The principle behind the SAGA is fairly simple: you get rid of the distributed transaction agent(s) and come up with a set of self-coordinated local transactions. This idea was first introduced by Hector Garcia-Molina and Kenneth Salem (Princeton University) in 1987.

So if we come back to our example, the first step is to create the order saga. Once that is done, the second step is triggered, which is to reserve credit in the customer service. Once the customer service reserves the credit, the order service checks the reservation and approves the order. The customer service can then update the credit limit of that customer when the order status is updated.
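The happy path of these steps can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `CustomerService` and `OrderService` are hypothetical in-memory stand-ins, each owning its own dictionary as its "database", and the direct method call stands in for whatever messaging a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerService:
    """Hypothetical customer service; owns credit data only."""
    credit_limits: dict[int, int]
    reserved: dict[int, int] = field(default_factory=dict)

    def reserve_credit(self, customer_id: int, amount: int) -> bool:
        # Local transaction 2: reserve credit if the limit allows it.
        used = self.reserved.get(customer_id, 0)
        if used + amount > self.credit_limits[customer_id]:
            return False
        self.reserved[customer_id] = used + amount
        return True


@dataclass
class OrderService:
    """Hypothetical order service; owns order data only."""
    customers: CustomerService
    orders: dict[int, dict] = field(default_factory=dict)

    def create_order(self, order_id: int, customer_id: int, total: int) -> str:
        # Local transaction 1: create the order in a PENDING state.
        self.orders[order_id] = {
            "customer_id": customer_id,
            "total": total,
            "status": "PENDING",
        }
        # Ask the customer service to reserve credit (step 2 of the saga).
        reserved = self.customers.reserve_credit(customer_id, total)
        # Local transaction 3: approve or reject based on the reservation.
        status = "APPROVED" if reserved else "REJECTED"
        self.orders[order_id]["status"] = status
        return status
```

Note that no step here spans both "databases": each service only ever changes its own data, and the order sits in a visible PENDING state until the saga finishes.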

So easy, eh? Nope.. 👎 The problems start when you want to roll back. Each individual database now executes its own private transaction. There are no automatic rollbacks, and if you are in the middle of a saga you need to manually undo everything that came before it in the respective services. So let's examine how SAGAS can solve this problem.

Solution: every transaction Ti has a compensating transaction Ci

What they suggested in the original paper (as a remedy to the problem we discussed earlier) is to have a compensating transaction for each transaction. This compensating transaction contains what should be undone when a rollback is required.
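To make the idea concrete, here is a minimal sketch of a saga runner in Python. The `run_saga` function and the `Step` alias are hypothetical names chosen for this illustration, and the sketch assumes that compensations themselves never fail, which a real system cannot take for granted.

```python
from typing import Callable

# One saga step: (Ti, Ci) = (local transaction, its compensating transaction).
Step = tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    """Run each Ti in order; if one fails, run the Ci of completed steps in reverse."""
    compensations: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # The failed step did not complete, so only earlier steps are undone.
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensate)
    return True
```

For example, if T1 and T2 succeed and T3 raises, the runner calls C2 and then C1 and reports failure. The order matters: later steps may depend on earlier ones, so they are undone first.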

C1 and C2 are the execution blocks that know what to do in order to roll back a particular step of the distributed transaction. This solves the problem, but it makes the API design more complicated. Following are the options that we have when designing the API.

1. Send the response when the SAGA completes. With this method, the response contains the outcome of the SAGA. This is more or less a blocking call, and it can reduce the availability of the service.

2. Send the response immediately (async). In this method the service response does not contain the result of the saga. The client needs to poll or be notified about the result of the saga. An event-based mechanism can be used to notify the client.

Now the question arises: who would manage these SAGAS? There are two main ways we can do it.
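The second option can be sketched as follows. `OrderApi`, `submit_order`, `run_worker_once`, and `get_status` are hypothetical names, and the background worker is simulated by an explicit method call so that the example stays deterministic; a real service would process the queue on another thread or in another process.

```python
import uuid
from collections import deque


class OrderApi:
    """Hypothetical API that answers immediately and finishes the saga later."""

    def __init__(self, credit_limit: int) -> None:
        self.credit_limit = credit_limit
        self.statuses: dict[str, str] = {}
        self.queue: deque[tuple[str, int]] = deque()

    def submit_order(self, total: int) -> str:
        # Respond right away with an id; the saga outcome is not known yet.
        saga_id = str(uuid.uuid4())
        self.statuses[saga_id] = "PENDING"
        self.queue.append((saga_id, total))
        return saga_id

    def run_worker_once(self) -> None:
        # Stand-in for a background worker that completes one queued saga.
        if not self.queue:
            return
        saga_id, total = self.queue.popleft()
        if total <= self.credit_limit:
            self.credit_limit -= total
            self.statuses[saga_id] = "APPROVED"
        else:
            self.statuses[saga_id] = "REJECTED"

    def get_status(self, saga_id: str) -> str:
        # The client polls this until the status leaves PENDING.
        return self.statuses[saga_id]
```

The client keeps the returned id and polls `get_status` (or waits for a notification) instead of holding a connection open while the saga runs.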

  1. Choreography: This is more or less a distributed decision-making engine. The downside of this method is that it leads to high coupling between sagas and services. And as you know, high coupling is a big NO NO when it comes to distributed computing!
  2. Orchestration: This is more or less centralized decision making. So isn't centralization bad when it comes to distributed computing? Yes, it is, but in this instance the orchestrator is the main service that is responsible for the transaction (in our example, the order service). Hence it is the responsibility of the order service (the orchestrator in this instance) to make sure that this particular request is executed without any issues. That is why orchestration is the preferred solution here.

The above picture depicts saga orchestration. The orchestrator can be implemented in two different ways.

  1. Implicit Orchestrator
  2. Explicit Orchestrator

Implicit Orchestrator

The implicit orchestrator is simpler to implement. We can build it into an existing domain object as well. One of the problems with this is that it violates the Single Responsibility Principle (SRP), because apart from performing order-related functions, the object now has to take on the orchestration responsibilities as well. This is not ideal, because there might be cyclic dependencies between services via events.

The following diagram depicts an event-based implicit orchestrator. We can use any distributed message queue to implement the event-driven architecture.
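A real message queue does not fit in a short snippet, so the sketch below uses a tiny in-process, synchronous `EventBus` as a stand-in. All class names and event names (`OrderCreated`, `CreditReserved`, `CreditLimitExceeded`) are assumptions made for illustration. Notice how `OrderService` both manages orders and reacts to saga events, which is exactly the mixing of responsibilities described above.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical synchronous stand-in for a distributed message queue."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, event: dict) -> None:
        for handler in list(self.handlers[event_type]):
            handler(event)


class CustomerService:
    def __init__(self, bus: EventBus, credit: dict[int, int]) -> None:
        self.bus = bus
        self.credit = credit
        bus.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, event: dict) -> None:
        available = self.credit.get(event["customer_id"], 0)
        if event["total"] <= available:
            self.credit[event["customer_id"]] = available - event["total"]
            self.bus.publish("CreditReserved", event)
        else:
            self.bus.publish("CreditLimitExceeded", event)


class OrderService:
    """Order domain logic AND saga coordination in one place (implicit)."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.orders: dict[int, str] = {}
        bus.subscribe("CreditReserved", self.on_credit_reserved)
        bus.subscribe("CreditLimitExceeded", self.on_credit_limit_exceeded)

    def create_order(self, order_id: int, customer_id: int, total: int) -> None:
        self.orders[order_id] = "PENDING"
        self.bus.publish(
            "OrderCreated",
            {"order_id": order_id, "customer_id": customer_id, "total": total},
        )

    def on_credit_reserved(self, event: dict) -> None:
        self.orders[event["order_id"]] = "APPROVED"

    def on_credit_limit_exceeded(self, event: dict) -> None:
        # Compensation for the pending order: mark it as rejected.
        self.orders[event["order_id"]] = "REJECTED"
```

The two services never call each other directly, yet each depends on the other's events, which is how the cyclic dependency mentioned above creeps in.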

Explicit Orchestrator

The explicit orchestrator does not violate the SRP. It adds an extra, dedicated component to the order service to manage the sagas.

The following diagram depicts the usage of an explicit orchestrator.
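In code, the difference is that the saga logic moves out of the order domain object into its own class. The sketch below is a minimal illustration: `CreateOrderSaga`, `OrderStore`, and `CustomerClient` are hypothetical names, and the in-memory dictionaries stand in for each service's database.

```python
class OrderStore:
    """Hypothetical order persistence; knows nothing about sagas."""

    def __init__(self) -> None:
        self.statuses: dict[int, str] = {}

    def create_pending(self, order_id: int) -> None:
        self.statuses[order_id] = "PENDING"

    def approve(self, order_id: int) -> None:
        self.statuses[order_id] = "APPROVED"

    def reject(self, order_id: int) -> None:
        self.statuses[order_id] = "REJECTED"


class CustomerClient:
    """Hypothetical client for the customer service."""

    def __init__(self, credit: dict[int, int]) -> None:
        self.credit = credit

    def reserve_credit(self, customer_id: int, amount: int) -> bool:
        available = self.credit.get(customer_id, 0)
        if amount > available:
            return False
        self.credit[customer_id] = available - amount
        return True


class CreateOrderSaga:
    """Dedicated component whose only job is to coordinate the saga steps."""

    def __init__(self, orders: OrderStore, customers: CustomerClient) -> None:
        self.orders = orders
        self.customers = customers

    def run(self, order_id: int, customer_id: int, total: int) -> str:
        self.orders.create_pending(order_id)                   # T1
        if self.customers.reserve_credit(customer_id, total):  # T2
            self.orders.approve(order_id)                      # T3
        else:
            self.orders.reject(order_id)                       # C1: undo T1
        return self.orders.statuses[order_id]
```

Here `OrderStore` stays focused on orders, and the whole flow, including the compensation, can be read top to bottom in `CreateOrderSaga.run`.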

Summary

Data consistency across multiple services is one of the major challenges in microservices and other distributed architecture patterns. This was not a problem for a monolithic architecture because of the ACID properties of database transactions. SAGAS are one way to overcome this problem in distributed patterns.

There are a few saga frameworks that have been built for .NET and Java, such as NSaga and Eventuate Tram. Stay tuned for another post on how to implement a saga in a microservices architecture pattern.

Thanks for reading !

CTO @ ZorroSign | Seasoned Software Architect | Expertise in AI/ML , Blockchain , Distributed Systems and IoT | Lecturer | Speaker | Blogger