Distributed Architecture: Unraveling Data Consistency

Data consistency is a perennial concern in distributed systems. Once data is spread across multiple services, data sources, and applications, the natural question is: how do we keep it consistent across the board?

Unified Terminology

Before we dive into the nitty-gritty of data consistency, let’s establish a common language. In the context of distributed architecture, we’ll use the following terminology:

  • Domain: a virtual classification that groups multiple systems together. For instance, the online banking and mobile banking channels both belong to the electronic channels domain.
  • Single Application: the conventional term for one system in a microservice architecture. Each application is designed to operate independently and runs as a set of peer processes (also known as application instances) that together provide its services.
  • Database: each application normally has its own database; in some cases, a single application uses multiple databases.
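
The relationships between these three terms can be sketched as a small data model. This is only an illustration: the class names, the fields, and the shared_databases helper are assumptions made for this example and do not come from any framework. The helper lists databases used by more than one application, which is the situation behind question 3 below.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Database:
    name: str


@dataclass
class Application:
    """One system in the microservice architecture (hypothetical model)."""
    name: str
    instances: List[str] = field(default_factory=list)        # peer processes
    databases: List[Database] = field(default_factory=list)   # usually exactly one


@dataclass
class Domain:
    """A virtual grouping of applications."""
    name: str
    applications: List[Application] = field(default_factory=list)


def shared_databases(domain: Domain) -> Dict[str, List[str]]:
    """Map each database used by more than one application to those applications."""
    users: Dict[str, List[str]] = {}
    for app in domain.applications:
        for db in app.databases:
            users.setdefault(db.name, []).append(app.name)
    return {db: apps for db, apps in users.items() if len(apps) > 1}
```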

Frequently Asked Questions

Now that we have a unified language, let’s tackle some common questions:

  1. Data consistency between systems: How do we ensure data consistency across multiple systems?
  2. Data consistency within a system: How do we ensure data consistency across multiple services within a single system?
  3. Data consistency between applications: How do we ensure data consistency between multiple applications that share a database?
  4. Data consistency with multiple databases: How do we ensure data consistency when multiple applications correspond to multiple databases?

Principles for Data Consistency

To address these questions, we need to analyze the situation from a business perspective, rather than a technical one. The principle is to identify the business needs and requirements, and then design the system accordingly.

The goal is to solve the problem of distributed data consistency, rather than focusing on the technical aspect of distributed transactions. With this principle in mind, let’s examine each of the four cases:

1. Data consistency between systems

To achieve data consistency between systems, we can use the TCC (Try-Confirm-Cancel) compensation mode, in which the framework (the Business Coordinator) automatically invokes the confirm or cancel operation of each participating system. This approach reduces human intervention and provides a retry mechanism in case of failure.
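
To make the flow concrete, here is a minimal sketch of a coordinator for TCC, read here in its usual sense of Try-Confirm-Cancel. Everything in it is an assumption made for illustration: the Participant interface, the run_tcc and call_with_retry names, and the fixed retry count are hypothetical and are not the API of any particular Business Coordinator framework. It also assumes that confirm and cancel are idempotent, so that retrying them is safe.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Participant:
    """One system taking part in the transaction (hypothetical interface)."""
    name: str
    try_: Callable[[], bool]      # reserve resources; return False to reject
    confirm: Callable[[], None]   # commit the reservation; assumed idempotent
    cancel: Callable[[], None]    # release the reservation; assumed idempotent


def call_with_retry(action: Callable[[], None], max_attempts: int = 3) -> None:
    """Retry a confirm or cancel call a bounded number of times."""
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            return
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: surface the error for manual follow-up


def run_tcc(participants: List[Participant]) -> bool:
    """Try every participant, then confirm all of them or cancel the reserved ones."""
    reserved: List[Participant] = []
    for p in participants:
        if p.try_():
            reserved.append(p)
        else:
            # One participant rejected: release earlier reservations in reverse order.
            for r in reversed(reserved):
                call_with_retry(r.cancel)
            return False
    for p in reserved:
        call_with_retry(p.confirm)
    return True
```

The coordinator, not the caller, decides when confirm and cancel run, which is what removes most of the manual intervention. The bounded retry only covers transient failures; a call that still fails after the last attempt is raised to the caller.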

2. Data consistency within a system

For data consistency within a system, we can use the Huawei SAGA model, which relies on a shared transaction coordinator. The coordinator records each service call as an event and invokes the TCC and compensation services when they are needed, for example when a later call fails.
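
The idea of recording service calls and compensating on failure can be sketched as follows. This is a generic saga-style coordinator written for illustration, not the API of the Huawei implementation: the Step and SagaCoordinator names, the in-memory event list, and the event labels are all assumptions. A real coordinator would persist the event log so that it survives a crash.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One service call and the call that undoes it (hypothetical interface)."""
    name: str
    action: Callable[[], None]
    compensation: Callable[[], None]


@dataclass
class SagaCoordinator:
    # Event log of (step name, status); kept in memory only for this sketch.
    events: List[Tuple[str, str]] = field(default_factory=list)

    def run(self, steps: List[Step]) -> bool:
        """Run steps in order; on failure, compensate completed steps in reverse."""
        completed: List[Step] = []
        for step in steps:
            self.events.append((step.name, "started"))
            try:
                step.action()
            except Exception:
                self.events.append((step.name, "failed"))
                for done in reversed(completed):
                    done.compensation()
                    self.events.append((done.name, "compensated"))
                return False
            self.events.append((step.name, "completed"))
            completed.append(step)
        return True
```

Because every call is recorded before and after it runs, the log shows exactly which steps need compensation at the moment of failure.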

3. Data consistency between applications

For data consistency between applications that share a database, we can use a shared transaction coordinator mode, similar to the Huawei SAGA model. This approach ensures data consistency by recording event service calls and invoking TCC and compensation services.

4. Data consistency with multiple databases

This arrangement is often an anti-pattern, because it means databases are shared across applications. However, if it cannot be avoided, we can use a multi-database and multi-application processing approach, which follows the same shared transaction coordinator mode described in the previous case.

Conclusion

Data consistency is a critical concern in distributed architecture. By analyzing the situation from a business perspective and using a unified language, we can design systems that ensure data consistency across multiple services, applications, and databases. While there is no one-size-fits-all solution, we can use TCC compensation mode, shared transaction coordinator mode, and multi-database and multi-application processing approaches to achieve data consistency in various scenarios.