Data Jiangtang | Achieving High-Performance Data Export
In this article:
- Ready to Work: Understanding the Basics of Data Export
- Inquiry Response Process: How to Optimize Data Retrieval
- Export Process: A Step-by-Step Guide to Efficient Data Transfer
Author: Jiang Buxing. Source: Data Jiangtang
This article is approximately 1800 words long, and we recommend reading it in about 8 minutes.
When data is stored in files, a good calculation engine is essential to getting better performance out of it. As databases grow, however, analysis that runs directly against the database can disrupt normal production operation. To mitigate this, we export production data from the database to files, taking advantage of the file system's superior IO performance. Files also allow flexible compression, which makes this approach ideal for historical data that no longer changes.
The Challenge of Cold Export
Cold export means suspending the query and analysis system while data is exported to the file, and resuming it only once the export is complete. This approach is suitable when there is a window in which no queries run. For example, if no queries run at night, we can append the new historical data to the original file overnight without affecting the system's performance. However, building an index on the exported file can be time-consuming and may not be feasible while the export is still running.
The Hot Export Conundrum
Hot export, on the other hand, means the query and analysis system is never shut down. The system must keep responding to requests while data is being exported. However, a file cannot keep its index consistent while it is being written to, which leads to problems for queries that run during the export. A database can handle this with its transactional consistency and write capabilities, but that places a significant burden on the target database, especially when dealing with large amounts of data.
A Simple Solution: File Subdivision
To enjoy the benefits of high-performance file-based computing while supporting hot export, we can employ a simple strategy: file subdivision. By saving each day's export to a new file, the export never writes to a file that queries are reading, so it does not affect the system's performance. When a new day's export is completed, we open the new file for queries, and the older files can be maintained and updated in the background.
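The file subdivision idea can be sketched as follows. This is a minimal illustration, not the author's implementation: the CSV layout (`order_id,amount`), the date-based file names, the `.part` suffix, and the function names `export_day` and `query_total` are all assumptions made for the example.

```python
import csv
from datetime import date
from pathlib import Path


def export_day(data_dir: Path, day: date, rows: list) -> Path:
    """Write one day's rows to a temporary file, then publish it by renaming."""
    final_path = data_dir / f"{day.isoformat()}.csv"
    tmp_path = data_dir / f"{day.isoformat()}.csv.part"  # hypothetical naming
    with tmp_path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    # The rename is the moment the new file is "opened for queries".
    tmp_path.replace(final_path)
    return final_path


def query_total(data_dir: Path) -> float:
    """Sum the amount column across every published daily file."""
    total = 0.0
    # "*.csv" does not match "*.csv.part", so unfinished exports are skipped.
    for path in sorted(data_dir.glob("*.csv")):
        with path.open(newline="") as f:
            for _order_id, amount in csv.reader(f):
                total += float(amount)
    return total
```

Because a query only sees files that have been renamed into place, an export in progress never changes what a running query reads.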
Ready to Work: Ensuring Seamless Data Transfer
To achieve seamless data transfer, the export process must not interfere with the system's normal operation. We keep two copies of the same data file, A and B, and initially use A for queries. We then create a table in the production database that records which file is currently in use, as well as the queries being executed. Queries read from the data files, while this bookkeeping is written to the production database, so the system remains responsive during the export process.
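The preparation step might look like the sketch below. SQLite stands in for the production database, and the table names `current_file` and `running_queries`, their columns, and the `_A`/`_B` file suffixes are hypothetical choices for illustration; the article does not specify a schema.

```python
import shutil
import sqlite3
from pathlib import Path


def prepare(conn: sqlite3.Connection, source: Path) -> tuple:
    """Duplicate the data file as A and B and record A as the file in use."""
    file_a = source.with_name(source.stem + "_A" + source.suffix)
    file_b = source.with_name(source.stem + "_B" + source.suffix)
    shutil.copyfile(source, file_a)
    shutil.copyfile(source, file_b)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS current_file (
            id   INTEGER PRIMARY KEY CHECK (id = 1),  -- single-row table
            name TEXT NOT NULL                        -- 'A' or 'B'
        );
        CREATE TABLE IF NOT EXISTS running_queries (
            query_id  TEXT PRIMARY KEY,               -- unique code per query
            file_name TEXT NOT NULL                   -- file the query reads
        );
        INSERT OR REPLACE INTO current_file (id, name) VALUES (1, 'A');
        """
    )
    return file_a, file_b
```

Keeping the "current file" in a single-row table means the switch between A and B later on is one small database update rather than any change to the files themselves.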
Inquiry Response Process: Optimizing Data Retrieval
To answer a query, the system looks up the data file currently in use, generates a unique code for the query, and records in the production database which data file the query runs against. It then runs the query against that data file and returns the results. Because every running query is recorded together with its file, the export process can tell which queries are still running on each file, and the system remains responsive during the export process.
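A minimal sketch of this query-side bookkeeping follows, again with SQLite standing in for the production database. It assumes two hypothetical tables already exist: `current_file(id, name)` holding a single row with `'A'` or `'B'`, and `running_queries(query_id, file_name)`. Deleting the record when the query finishes is also an assumption; the article only says the query is recorded.

```python
import sqlite3
import uuid
from contextlib import contextmanager


@contextmanager
def registered_query(conn: sqlite3.Connection):
    """Record a query against the current file; remove the record when done."""
    query_id = uuid.uuid4().hex  # the unique code for this query
    with conn:
        # A single statement reads the current file and records the query,
        # so a switch between A and B cannot slip in between the two steps.
        conn.execute(
            "INSERT INTO running_queries (query_id, file_name) "
            "SELECT ?, name FROM current_file WHERE id = 1",
            (query_id,),
        )
    (file_name,) = conn.execute(
        "SELECT file_name FROM running_queries WHERE query_id = ?",
        (query_id,),
    ).fetchone()
    try:
        yield file_name  # the caller runs its query against this file
    finally:
        with conn:
            conn.execute(
                "DELETE FROM running_queries WHERE query_id = ?", (query_id,)
            )
```

A caller would wrap each query as `with registered_query(conn) as name:` and read only from the file that `name` refers to for the duration of the block.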
Export Process: A Step-by-Step Guide
The export process starts by reading the currently recorded file, A or B. Suppose the record says B, so A is not receiving new queries. We first wait for all queries still running on A to complete. Once they are finished, we export the new data to file A and complete the maintenance work on A, such as building its index. We then record A as the current file, so that new queries use it, and wait for all queries still running on B to complete. Finally, we add the same data to B and complete the maintenance work on B. At every moment one complete, unchanging file is available to queries, so the system remains responsive during the export process.
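The export steps can be sketched as below. This is one reading of the process under stated assumptions: SQLite stands in for the production database, the tables `current_file(id, name)` and `running_queries(query_id, file_name)` are hypothetical, and `append_and_maintain` is a placeholder callback for "export the new data to this file and rebuild its index".

```python
import sqlite3
import time
from typing import Callable


def other(name: str) -> str:
    """Return the name of the other copy."""
    return "B" if name == "A" else "A"


def wait_until_idle(conn: sqlite3.Connection, file_name: str,
                    poll_seconds: float = 0.1) -> None:
    """Block until no recorded query is still using file_name."""
    while conn.execute(
        "SELECT COUNT(*) FROM running_queries WHERE file_name = ?",
        (file_name,),
    ).fetchone()[0]:
        time.sleep(poll_seconds)


def hot_export(conn: sqlite3.Connection,
               append_and_maintain: Callable[[str], None]) -> None:
    """Update both copies in turn while one of them always serves queries."""
    (active,) = conn.execute(
        "SELECT name FROM current_file WHERE id = 1"
    ).fetchone()
    idle = other(active)
    wait_until_idle(conn, idle)    # stragglers left from the previous switch
    append_and_maintain(idle)      # export new data, finish maintenance
    with conn:                     # new queries now go to the updated copy
        conn.execute("UPDATE current_file SET name = ? WHERE id = 1", (idle,))
    wait_until_idle(conn, active)  # let queries on the old copy finish
    append_and_maintain(active)    # bring the second copy up to date
```

Polling with `time.sleep` is the simplest way to wait; a real system would probably also want a timeout for queries that never finish.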
Achieving Real-Time T + 0 Queries
To achieve real-time T + 0 queries, we can run a hybrid computation across the file system and the production database. By relying on the consistency of the database and on the backup copy of the file, we can support hot export without interrupting the system's normal operation. However, this approach requires careful design and implementation to avoid conflicts between concurrent queries and writes.
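The article does not spell out how the hybrid computation divides the work, so the following is only one plausible sketch: exported history is read from the file, and rows not yet exported are read from the database. The `orders` table with `day` and `amount` columns, the `day,amount` file layout, and the `cutoff` parameter are all assumptions made for illustration.

```python
import csv
import sqlite3
from pathlib import Path


def t0_total(conn: sqlite3.Connection, history_file: Path, cutoff: str) -> float:
    """Sum amounts from the exported file plus database rows from cutoff on."""
    file_total = 0.0
    with history_file.open(newline="") as f:
        for _day, amount in csv.reader(f):
            file_total += float(amount)
    # Rows dated on or after `cutoff` are assumed not to be in the file yet,
    # so each row is counted exactly once.
    (db_total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE day >= ?",
        (cutoff,),
    ).fetchone()
    return file_total + db_total
```

The cutoff has to move in step with the export; storing it next to the record of the current file would be one way to keep the two consistent.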
Conclusion
Achieving high-performance data export depends on getting two processes right: the query response process and the export process. With file subdivision, we can enjoy the benefits of high-performance file-based computing while supporting hot export. With a hybrid computation across the file system and the production database, we can achieve real-time T + 0 queries without interrupting the system's normal operation.