High Availability Architecture: Ensuring Business Continuity
Detailed Analysis Section
1. Website Usability and Availability
For a website to be effective, users must be able to reach it, so availability is a critical aspect of its overall quality. The key metric is annual availability, derived from the total time the site was unavailable during the year. The annual unavailability ratio is (site unavailable time / total time in the year) × 100%, and availability is its complement: availability = (1 − site unavailable time / total time in the year) × 100%. This metric helps evaluate how well a website's architecture is designed.
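The availability formula can be turned into a quick calculator. This is a minimal sketch that assumes a 365-day year and measures downtime in minutes. The function names `annual_availability` and `downtime_budget` are hypothetical helpers for illustration. The second helper inverts the formula to show how much downtime a given availability target permits.

```python
# Minutes in a 365-day year (leap years are ignored in this sketch).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def annual_availability(downtime_minutes, total_minutes=MINUTES_PER_YEAR):
    """Availability in percent: (1 - downtime / total time) * 100."""
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must lie between 0 and the total period")
    return (1 - downtime_minutes / total_minutes) * 100


def downtime_budget(availability_pct, total_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime per period allowed by an availability target."""
    return (1 - availability_pct / 100) * total_minutes


# Print the yearly downtime budget for some common targets.
for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget(target):.2f} min/year")
```

For example, a "four nines" target (99.99%) leaves a budget of 525,600 × 0.0001 = 52.56 minutes of downtime per year.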
2. Website Unavailability Time (Downtime)
Website unavailability time, also known as downtime, is the period during which the website is not accessible to users. It is usually caused by a fault in the system. For each fault, downtime is calculated as: downtime = time the fault was repaired − time the fault was discovered (reported).
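In practice, downtime is computed from two incident timestamps, when the fault was reported and when it was repaired. Here is a minimal sketch using Python's standard `datetime` module. The function name `downtime_minutes` and the incident times are hypothetical values chosen for illustration.

```python
from datetime import datetime


def downtime_minutes(reported_at, repaired_at):
    """Downtime = time the fault was repaired - time the fault was discovered/reported."""
    if repaired_at < reported_at:
        raise ValueError("repair time cannot precede the report time")
    return (repaired_at - reported_at).total_seconds() / 60


# Hypothetical incident: reported at 14:05:00, repaired at 14:47:30.
reported = datetime(2024, 3, 1, 14, 5)
repaired = datetime(2024, 3, 1, 14, 47, 30)
print(downtime_minutes(reported, repaired))  # 42.5
```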
3. Fault Classification
Faults are classified by severity. Each class carries a weight, shown in parentheses, that is used when calculating fault points:
- Accident-level fault: a serious fault in which the whole site is unavailable (100)
- Class A fault: site access is sluggish, or core functions are unavailable (20)
- Class B fault: non-core functions are unavailable, or a small number of users cannot access core functions (5)
- Class C fault: all other faults (1)
4. Fault Points Calculation
Fault points are calculated as: fault duration (minutes) × fault weight. For example, a Class A fault that lasts 10 minutes scores 10 × 20 = 200 points.
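The classification and scoring rule can be combined in a short sketch. Each fault class is modeled as an enum whose value is its weight, taken from the list above. Fault points are then duration in minutes times that weight. The names `FaultClass` and `fault_points` are hypothetical, and so is the incident list. Summing points over several incidents is an illustrative assumption, not something the text prescribes.

```python
from enum import Enum


class FaultClass(Enum):
    """Fault classes, with each value set to the class weight."""
    ACCIDENT = 100  # whole site unavailable
    A = 20          # sluggish access, or core functions unavailable
    B = 5           # non-core functions down, or few users lose core functions
    C = 1           # all other faults


def fault_points(duration_minutes, fault_class):
    """Fault points = fault duration (minutes) x fault weight."""
    return duration_minutes * fault_class.value


# Hypothetical incidents: (duration in minutes, class).
incidents = [(10, FaultClass.A), (3, FaultClass.ACCIDENT), (60, FaultClass.C)]
total = sum(fault_points(minutes, cls) for minutes, cls in incidents)
print(total)  # 10*20 + 3*100 + 60*1 = 560
```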
High Availability Architecture
A typical website follows a basic layered model. In a large, high-traffic site the layers are split more finely, but the servers can still generally be grouped into three layers: the application layer, the service layer, and the data layer.
Application Layer
The application layer typically consists of a cluster of servers that provide the same service behind a load balancer. When a server in the cluster becomes unavailable, the load-balancing device detects the failure (for example, through periodic heartbeat checks), stops sending traffic to that server, and routes requests to the remaining available servers. This is how high availability is achieved for applications.
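The following sketch shows load-balancer failover: round-robin selection that skips servers marked as down. The `LoadBalancer` class and server names are hypothetical. In this sketch a failed health check is simulated by calling `mark_down` by hand, whereas a real load-balancing device would update server status from its own heartbeat checks.

```python
import itertools


class LoadBalancer:
    """Round-robin balancer that skips servers currently marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = {server: True for server in self.servers}
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        # In a real device this would follow a failed heartbeat/health check.
        self.healthy[server] = False

    def mark_up(self, server):
        self.healthy[server] = True

    def pick(self):
        # Visit each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy servers available")


lb = LoadBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")  # simulate a failed health check on app-2
print([lb.pick() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Because requests keep flowing to the healthy servers, losing one node reduces capacity but does not make the application unavailable.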
Service Layer
Like the application layer, the service layer achieves high availability through clustering. The application layer reaches the servers in this layer through a distributed service invocation framework, which routes each call to an available server in the service cluster.
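The sketch below shows client-side failover of the kind a service invocation framework can perform. It tries the service providers in random order and moves on to the next one when a call fails. All of the following are hypothetical: `invoke_with_failover`, the `call` transport function, `ServiceUnavailable`, and the provider names. A real framework would get its provider list from a service registry and use its own transport.

```python
import random


class ServiceUnavailable(Exception):
    """Raised when no provider of a service can handle the call."""


def invoke_with_failover(providers, request, call):
    """Call a service, failing over to another provider on connection errors.

    `call(provider, request)` is a hypothetical transport function that raises
    ConnectionError when the provider cannot be reached.
    """
    candidates = list(providers)
    random.shuffle(candidates)  # spread load across the service cluster
    last_error = None
    for provider in candidates:
        try:
            return call(provider, request)
        except ConnectionError as exc:
            last_error = exc  # provider unreachable; try the next one
    raise ServiceUnavailable(f"all providers failed: {last_error}")


# Simulated transport: 'svc-a' is down, 'svc-b' answers.
def fake_call(provider, request):
    if provider == "svc-a":
        raise ConnectionError(f"{provider} unreachable")
    return f"{provider} handled {request}"


print(invoke_with_failover(["svc-a", "svc-b"], "getUser(42)", fake_call))
# prints "svc-b handled getUser(42)"
```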
Data Layer
The data layer is special because it holds the data itself on data servers. To prevent data loss and interrupted access when a server fails, each write is synchronously replicated to multiple servers, which keeps redundant copies of the data.
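Synchronous replication can be sketched as a write that only reports success after the replicas acknowledge it. In the sketch below, `Replica` is a hypothetical in-memory stand-in for a data server, and `replicated_write` is a hypothetical helper. By default it requires every replica to acknowledge. The optional `min_acks` parameter (for example, a majority) is a common relaxation included as an assumption, because the text does not specify one.

```python
class Replica:
    """In-memory stand-in for a data server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.alive = True

    def write(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.store[key] = value


def replicated_write(replicas, key, value, min_acks=None):
    """Write to every replica synchronously; succeed only with enough acks.

    min_acks defaults to len(replicas), i.e. fully synchronous replication.
    """
    required = len(replicas) if min_acks is None else min_acks
    acks = 0
    for replica in replicas:
        try:
            replica.write(key, value)
            acks += 1
        except ConnectionError:
            pass  # a real system would log this and repair the lagging replica
    return acks >= required


servers = [Replica("db-1"), Replica("db-2"), Replica("db-3")]
print(replicated_write(servers, "user:42", "alice"))              # True
servers[2].alive = False
print(replicated_write(servers, "user:43", "bob"))                # False
print(replicated_write(servers, "user:44", "carol", min_acks=2))  # True
```

After `db-3` fails, `db-1` and `db-2` still hold `user:42`, so the data stays readable.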
High Availability of Applications
The application layer, also known as the business logic layer, implements the application's business logic. Application servers are usually designed to be stateless: they keep no per-user context between requests, so any server in the cluster can handle any request, which makes load balancing relatively easy. Sessions are the exception. In a clustered environment session management is complex, and several approaches are used: session replication, session binding (sticky sessions), storing the session in a cookie, and a dedicated session server.
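The session server approach is what keeps application servers stateless. The sketch below shows how it works. `SessionServer` is a hypothetical in-memory stand-in for a shared session store, and `AppServer` represents a stateless application server. Because the session lives outside the application servers, a request can be served by a different server from the one that created the session, for example after a failover.

```python
import uuid


class SessionServer:
    """Shared session store (hypothetical stand-in for a session server)."""

    def __init__(self):
        self._sessions = {}

    def create(self, data):
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = dict(data)
        return session_id

    def get(self, session_id):
        return self._sessions.get(session_id)


class AppServer:
    """Stateless application server: per-user state lives in the session server."""

    def __init__(self, name, sessions):
        self.name = name
        self.sessions = sessions

    def handle(self, session_id):
        session = self.sessions.get(session_id)
        user = session["user"] if session else "anonymous"
        return f"{self.name} served {user}"


store = SessionServer()
sid = store.create({"user": "alice"})  # session created at login via app-1
# app-1 fails; the load balancer sends the next request to app-2.
print(AppServer("app-2", store).handle(sid))  # app-2 served alice
```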
High Availability Service
Reusable business service modules provide basic business functions that many applications share. These services are usually deployed independently, spread across multiple servers, and invoked remotely by applications. Because they are stateless, they can be made highly available through load balancing. Additional strategies for service availability include tiered management (giving core services priority for resources and deployment), timeout settings, asynchronous calls, and service degradation.
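Timeout settings and service degradation are often combined. If a dependent service is too slow or fails, the caller stops waiting and returns a degraded fallback instead of blocking. The sketch below uses the standard `concurrent.futures` module. The helper `call_with_timeout` and the two recommendation functions are hypothetical, and the timeout values are arbitrary choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s, fallback):
    """Call a service with a timeout; degrade to `fallback` if it is slow or fails."""
    future = _executor.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best effort only: an already-running thread is not stopped
        return fallback
    except Exception:
        return fallback  # the service raised an error: degrade as well


def slow_recommendations():
    time.sleep(0.5)  # simulates an overloaded downstream service
    return ["item-1", "item-2"]


def fast_recommendations():
    return ["item-9"]


print(call_with_timeout(fast_recommendations, 0.2, fallback=[]))  # ['item-9']
print(call_with_timeout(slow_recommendations, 0.2, fallback=[]))  # [] (degraded)
```

Degrading a non-core feature, such as recommendations, to an empty result keeps the core page available when that dependency misbehaves.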
Highly Available Data
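Highly available data storage relies on two mechanisms: data backup and failover. Data availability involves three concerns: data persistence, data accessibility, and data consistency. The related CAP theorem states that a distributed data store cannot simultaneously guarantee Consistency, Availability, and Partition tolerance. Because network partitions cannot be ruled out, large sites usually trade some consistency for availability.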
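Failover means that when the primary data server becomes unreachable, reads and writes switch to a backup copy. The sketch below shows this for reads. `DataNode` and `FailoverClient` are hypothetical names. The sketch assumes that the backup already holds a current copy, kept up to date by the replication or backup described above. For simplicity it drops a failed primary permanently, whereas a real system would resynchronize the recovered node and return it to service.

```python
class DataNode:
    """Hypothetical data server holding a copy of the data."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = dict(data or {})
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data.get(key)


class FailoverClient:
    """Reads from the primary and fails over to the next backup on error."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # nodes[0] is the current primary

    def read(self, key):
        while self.nodes:
            try:
                return self.nodes[0].read(key)
            except ConnectionError:
                # Primary unreachable: drop it and promote the next backup.
                self.nodes.pop(0)
        raise RuntimeError("no data node available")


primary = DataNode("db-primary", {"k": "v"})
backup = DataNode("db-backup", {"k": "v"})  # assumed in sync via replication
client = FailoverClient([primary, backup])
primary.alive = False
print(client.read("k"))      # v (served by db-backup)
print(client.nodes[0].name)  # db-backup
```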
Highly Available Site Quality Assurance
Quality assurance for a highly available site centers mainly on the site release process, because every major release changes the production system and can introduce new faults. This process includes monitoring and evaluating the site's performance, scalability, and reliability.
Site Performance Monitoring
Monitoring a website's performance is critical for optimizing operations and maintenance and for guiding infrastructure design. Key monitoring data include user behavior logs, server performance metrics (CPU, memory, etc.), and operational data (cache hit rate, average response latency, number of messages sent per minute, and total number of tasks waiting to be processed).
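Operational metrics such as cache hit rate and average response latency come from simple counters collected over a monitoring window. The sketch below aggregates them and raises alerts when they cross a threshold. The class `MetricsWindow`, the function `check_alerts`, and the threshold values (80% hit rate, 200 ms latency) are hypothetical choices for illustration, not recommended settings.

```python
from dataclasses import dataclass, field


@dataclass
class MetricsWindow:
    """Operational metrics collected over one monitoring window (e.g. one minute)."""
    cache_hits: int = 0
    cache_misses: int = 0
    latencies_ms: list = field(default_factory=list)
    messages_sent: int = 0
    pending_tasks: int = 0

    def record_cache(self, hit):
        if hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    def cache_hit_rate(self):
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0

    def avg_latency_ms(self):
        if not self.latencies_ms:
            return 0.0
        return sum(self.latencies_ms) / len(self.latencies_ms)


def check_alerts(window, min_hit_rate=0.8, max_latency_ms=200.0):
    """Return alert messages for metrics that cross the (hypothetical) thresholds."""
    alerts = []
    if window.cache_hit_rate() < min_hit_rate:
        alerts.append(f"cache hit rate low: {window.cache_hit_rate():.0%}")
    if window.avg_latency_ms() > max_latency_ms:
        alerts.append(f"average latency high: {window.avg_latency_ms():.0f} ms")
    return alerts


w = MetricsWindow()
for hit in [True, True, True, False, False]:
    w.record_cache(hit)
w.latencies_ms += [120, 180, 450]
print(w.cache_hit_rate())  # 0.6
print(check_alerts(w))     # ['cache hit rate low: 60%', 'average latency high: 250 ms']
```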