Optimizing LinkedIn’s Instant Messaging System: A Technical Deep Dive
Foreword
In today’s fast-paced digital landscape, instant messaging systems are expected to handle a massive volume of concurrent connections without breaking a sweat. LinkedIn’s instant messaging system is no exception: it maintains hundreds of thousands of persistent connections, each of which has to be served efficiently. The team behind the system recently described on the official LinkedIn engineering blog how they selected and tuned the underlying technology to reach that scale.
Choosing the Underlying Technology
Basic Requirements
To push data to clients, the IM server must hold a persistent connection open rather than rely on the traditional “request-response” model. For this, LinkedIn chose Server-Sent Events (SSE), a technique that lets the server push a stream of data to the client over an ordinary HTTP connection. SSE is attractive because it is simple, widely supported, and needs nothing more than that single HTTP connection between client and server.
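LinkedIn’s own server runs on Play (see below), but the essence of SSE is just a long-lived HTTP response with the text/event-stream content type, in which each event is a “data:” line followed by a blank line. The minimal sketch below uses the JDK’s built-in HttpServer with a hypothetical /events endpoint and sample messages; it is only meant to illustrate that wire format, not LinkedIn’s implementation.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class SseSketch {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/events", (HttpExchange exchange) -> {
            // An SSE response is a normal HTTP response that never "finishes":
            // the server keeps the connection open and writes events as they occur.
            exchange.getResponseHeaders().set("Content-Type", "text/event-stream");
            exchange.sendResponseHeaders(200, 0); // length 0 = stream until closed
            OutputStream out = exchange.getResponseBody();
            for (int i = 1; i <= 3; i++) {
                // Each event: a "data:" line terminated by a blank line.
                out.write(("data: message " + i + "\n\n").getBytes(StandardCharsets.UTF_8));
                out.flush();
                try { Thread.sleep(1000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }
            exchange.close();
        });
        server.start();
    }
}
```

A client that opens this endpoint with the EventSource interface receives each “data:” line as a separate message event over the same HTTP connection.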
SSE and EventSource
On the client side, SSE is consumed through the EventSource interface, which is supported by all modern browsers as well as iOS and Android, making SSE viable across LinkedIn’s platforms. WebSockets were also considered but raise more compatibility concerns; choosing SSE over WebSockets was a deliberate trade-off in favor of a reliable, broadly compatible solution.
Development Language and Framework
LinkedIn’s instant messaging system is written in Java, with the Actor model as the programming model and Akka as the library implementing it. The Play framework ties the pieces together: it integrates cleanly with both EventSource and Akka, which made it a natural fit for this project.
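In the Actor model, each persistent connection can be handled by its own lightweight actor that reacts to messages pushed to it. The sketch below uses Akka’s classic Java API to illustrate that pattern; ConnectionActor, ChatMessage, and the console output are hypothetical names standing in for whatever LinkedIn’s actual connection-handling code does.

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class ConnectionActor extends AbstractActor {

    // Hypothetical message type: one chat message to push down an open SSE connection.
    public static final class ChatMessage {
        public final String text;
        public ChatMessage(String text) { this.text = text; }
    }

    public static Props props() {
        return Props.create(ConnectionActor.class, ConnectionActor::new);
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(ChatMessage.class, msg -> {
                    // In a real system this would write an SSE event to the client's connection.
                    System.out.println("pushing to client: " + msg.text);
                })
                .build();
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("messaging");
        // One actor per persistent connection; messages are delivered asynchronously.
        ActorRef connection = system.actorOf(ConnectionActor.props(), "connection-1");
        connection.tell(new ChatMessage("hello"), ActorRef.noSender());
    }
}
```

Because actors are cheap to create and communicate only through asynchronous messages, this style scales naturally to very large numbers of concurrent connections on a single machine.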
Optimization Process
Maximum Number of Socket Connections
During performance testing, LinkedIn ran into a surprise: concurrent connections stopped scaling at 128. Investigation traced the ceiling to the kernel parameter net.core.somaxconn, which limits how many pending TCP connections a listening socket may hold in its backlog and defaults to 128 on most Linux distributions. Raising this parameter in /etc/sysctl.conf resolved the issue.
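The summary above does not state the exact value LinkedIn chose; the snippet below is only a hedged example of the kind of change described.

```
# /etc/sysctl.conf -- example value, not necessarily the figure LinkedIn used
net.core.somaxconn = 1024
```

The setting takes effect after running sysctl -p. Note that it only raises the ceiling: the kernel silently truncates the backlog requested by listen() to net.core.somaxconn, so the server’s listening socket must also ask for a larger backlog.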
Ephemeral Port Exhaustion
The load balancer opens each connection to a server node from an ephemeral (temporary) port, and that port is only returned to the pool once the connection terminates. With long-lived SSE connections, ports are held far longer than in an ordinary request-response workload, so the ephemeral port pool can be exhausted; this deserves special attention when choosing and configuring a load balancer.
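The post summarized here does not spell out the mitigation, but on Linux the resource in question is the ephemeral port range, shown below purely to illustrate the knob involved; the values are common defaults and examples, not LinkedIn’s configuration.

```
# A common default ephemeral port range -- roughly 28,000 source ports
# per (source IP, destination IP, destination port) combination.
net.ipv4.ip_local_port_range = 32768 61000

# Widening the range (example values) is one way to postpone exhaustion;
# adding more load balancer IPs or more server-side ports is another.
# net.ipv4.ip_local_port_range = 10240 65535
```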
File Descriptor Limit
As the load test was ramped up further, LinkedIn hit a SocketException: Too many open files. Every persistent connection consumes a file descriptor, and the process had run out of them. Raising the file descriptor limits in /etc/security/limits.conf and /etc/sysctl.conf resolved the issue.
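The two files named above control different limits: the per-process limit (the nofile entries in limits.conf) and the system-wide ceiling on open file handles (fs.file-max in sysctl.conf). Below is a hedged example of what such a change looks like, with placeholder values rather than LinkedIn’s actual figures.

```
# /etc/security/limits.conf -- per-process limit on open file descriptors (example values)
*    soft    nofile    128000
*    hard    nofile    128000

# /etc/sysctl.conf -- system-wide ceiling on open file handles (example value)
fs.file-max = 500000
```

The limits.conf entries apply to new login sessions; ulimit -n shows the limit the current process actually has.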
Summary
In conclusion, optimizing LinkedIn’s instant messaging system required a deep understanding of the underlying technology, including SSE, EventSource, and the Actor model. By choosing the right technology and tuning kernel parameters, file descriptor limits, and load balancer settings, LinkedIn was able to scale the system to handle hundreds of thousands of concurrent persistent connections.