Tencent’s Big Data Platform: A Deep Dive into the Architecture and Challenges
The Current State of Tencent’s Data Platform
Tencent’s data platform is a behemoth, comprising over 200 units divided into four key areas: basic platform, core applications, product packaging, and quality control. The platform is responsible for managing the vast amounts of data generated by Tencent’s user base, which includes:
- 829 million active QQ IM accounts
- 521 million active QQ accounts on smart devices
- 438 million active Weixin and WeChat accounts (combined)
- 645 million active Qzone accounts
- 497 million active Qzone accounts on smart devices
- 88 million registered accounts for value-added services
The scale of Tencent’s data is staggering: daily message intake peaks at 1 trillion, daily data intake reaches 200 TB, and the number of concurrent sorting service interfaces reaches 10,000.
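To put these peak figures in perspective, the back-of-the-envelope calculation below converts them into per-second averages. It is only a sketch: it assumes decimal terabytes, assumes the load is spread evenly over the day, and assumes the two peaks describe the same day, none of which the figures above confirm.

```python
# Rough averages derived from the peak daily figures quoted in the article.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

peak_messages_per_day = 1_000_000_000_000  # 1 trillion messages
peak_bytes_per_day = 200 * 10**12          # 200 TB, assuming 1 TB = 10^12 bytes

# Assumption: traffic is spread evenly over the day (real traffic is bursty).
avg_messages_per_second = peak_messages_per_day / SECONDS_PER_DAY
avg_bytes_per_second = peak_bytes_per_day / SECONDS_PER_DAY

# Assumption: both peaks refer to the same day, so they can be divided.
avg_bytes_per_message = peak_bytes_per_day / peak_messages_per_day

print(f"{avg_messages_per_second:,.0f} messages/s")
print(f"{avg_bytes_per_second / 10**9:.2f} GB/s")
print(f"{avg_bytes_per_message:.0f} bytes/message")
```

Under these assumptions the averages come to roughly 11.6 million messages per second, about 2.3 GB per second, and 200 bytes per message. Real peaks within the day would be higher than these averages.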
Platform Architecture Design Ideas
Tencent’s platform architecture is designed around three key principles: data openness, specialization, and cost-performance. The platform is designed to be:
- Data openness: Providing a self-service platform through which data analysts can access data directly, reducing labor costs and meeting the rapidly growing demand for data insights.
- Specialization: Offering integrated, automated platforms for developing data services, so that businesses can build data applications quickly.
- Cost-performance: Optimizing data storage and computing solutions, eliminating redundant computation and storage, and building large-scale clusters to enhance platform capabilities and reduce costs.
Platform Build Process
The platform was built to support large-scale data ingestion from Internet services and to process massive data both in real time and offline. Its core comprises TDW (Tencent Distributed Warehouse), TRC (Tencent Real-time Computing), and TDbank (Tencent Data Bank).
- TDW is a self-developed distributed data warehouse built on the open-source Hadoop + Hive architecture and heavily optimized for compatibility with commercial databases and for scalability.
- TRC is a real-time computing platform based on the open-source Storm project, rewritten to improve stability and efficiency.
- TDbank is a real-time data access and distribution system that collects business data in near real time and adapts to heterogeneous data sources.
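The division of labor among the three components can be illustrated with a toy, in-memory sketch. It assumes that the collection layer fans each record out to both a streaming consumer and an offline store; that routing is an assumption made for illustration. All class and function names here (`Collector`, `RealtimeCounter`, `OfflineWarehouse`, `parse_csv`, `parse_json`) are hypothetical and do not correspond to any real TDbank, TRC, or TDW API.

```python
import json
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    source: str
    event: str
    day: str  # partition key, e.g. "2014-01-01"


class RealtimeCounter:
    """Stand-in for a TRC-like streaming job: updates counts as records arrive."""

    def __init__(self):
        self.counts = Counter()

    def on_record(self, record):
        self.counts[record.event] += 1


class OfflineWarehouse:
    """Stand-in for a TDW-like store: appends to day partitions for batch queries."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def on_record(self, record):
        self.partitions[record.day].append(record)

    def count_by_event(self, day):
        return Counter(r.event for r in self.partitions[day])


class Collector:
    """Stand-in for a TDbank-like layer: adapts heterogeneous sources, then fans out."""

    def __init__(self, sinks):
        self.sinks = sinks
        self.adapters = {}

    def register_adapter(self, source, parse):
        # One parser per source format is the "heterogeneous source adaptation".
        self.adapters[source] = parse

    def ingest(self, source, raw):
        record = self.adapters[source](raw)
        for sink in self.sinks:
            sink.on_record(record)


def parse_csv(raw):
    event, day = raw.split(",")
    return Record("csv_log", event, day)


def parse_json(raw):
    obj = json.loads(raw)
    return Record("json_log", obj["event"], obj["day"])


realtime = RealtimeCounter()
warehouse = OfflineWarehouse()
collector = Collector(sinks=[realtime, warehouse])
collector.register_adapter("csv_log", parse_csv)
collector.register_adapter("json_log", parse_json)

collector.ingest("csv_log", "login,2014-01-01")
collector.ingest("json_log", '{"event": "login", "day": "2014-01-01"}')
collector.ingest("csv_log", "pay,2014-01-02")

print(realtime.counts)                        # running totals across all days
print(warehouse.count_by_event("2014-01-01")) # batch query over one partition
```

The point of the sketch is the separation of concerns: source-specific parsing lives in the collection layer, so the real-time and offline consumers see one uniform record type.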
TOD and MTA
TOD (Tencent Open Data) is a platform built on large-scale computing clusters that provides data collection, processing, and self-service capabilities. Its advantages include:
- No need to purchase physical devices
- No worries about data expansion
- Only business logic needs to be developed; deployment, operation, and monitoring are handled by the platform
MTA (Tencent Cloud Analysis) is a professional operations platform for mobile application data, supporting iOS and Android. Its advantages include:
- Real-time multi-dimensional analysis
- User profile analysis
- Cloud publishing
- Real-time analysis at second-level granularity
- Operation and maintenance monitoring
- Game model analysis
Challenges Faced
Tencent’s biggest challenge is technical: it must quickly catch up with, and then lead, technological change in order to cope with rapid shifts in its Internet business and the increasing depth of big data applications.
Hardware Resources
Tencent uses custom hardware: over 8,000 PC servers, each equipped with twelve 2 TB SATA hard drives, 64 GB of memory, and dual 32-core CPUs. These hardware resources are managed by the GAIA scheduling system, which allocates them to TDW, TRC, and other systems.
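The per-server figures can be multiplied out to estimate aggregate cluster capacity. The sketch below treats 8,000 servers as a lower bound, assumes decimal units, and assumes 3-way replication, which is the HDFS default. Whether Tencent actually uses that replication factor is not stated here, so the usable-capacity and retention numbers are illustrative only.

```python
# Aggregate capacity implied by the per-server hardware figures in the article.
servers = 8_000              # "over 8,000", so treat the results as lower bounds
drives_per_server = 12
tb_per_drive = 2
gb_memory_per_server = 64

raw_disk_pb = servers * drives_per_server * tb_per_drive / 1_000  # TB -> PB
total_memory_tb = servers * gb_memory_per_server / 1_000          # GB -> TB

# Assumption: 3 copies of every block (the HDFS default), no compression.
replication_factor = 3
usable_disk_pb = raw_disk_pb / replication_factor

# Assumption: every day ingests the 200 TB peak quoted earlier in the article.
daily_ingest_tb = 200
days_of_peak_ingest = usable_disk_pb * 1_000 / daily_ingest_tb

print(f"raw disk:      {raw_disk_pb:.0f} PB")
print(f"total memory:  {total_memory_tb:.0f} TB")
print(f"usable disk:   {usable_disk_pb:.0f} PB at {replication_factor}x replication")
print(f"retention:     {days_of_peak_ingest:.0f} days of peak ingest")
```

Under these assumptions the cluster holds at least 192 PB of raw disk and 512 TB of memory, which leaves about 64 PB of usable space, or roughly 320 days of data at the peak ingest rate before compression or derived tables are taken into account.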
Future Challenges
Tencent’s future challenges include:
- How to widen the reach of data openness so that more people can share in big data services
- How to identify more user pain points so that its services fit the needs of more users
- How to contribute its experience and advice to the policy-making process
- How to respond positively to, and cooperate with, the national implementation of related policies
In conclusion, Tencent’s big data platform is a complex and sophisticated system, designed to manage the company’s vast amounts of data and meet the rapidly growing demand for data insights. The platform still faces significant challenges, however: keeping pace with technological change, widening data openness, and improving cost-performance.