System Design - How do Websockets Work

Published: at 06:23 AM

Context

The WebSocket protocol was standardized by the IETF as RFC 6455 in 2011. The current specification allowing web applications to use this protocol is known as WebSockets. It is a living standard maintained by the WHATWG and a successor to The WebSocket API from the W3C.

Neo Kim Story

Neo Kim likes to explain concepts through a story, which in most cases is a great idea and makes them easier to remember. This time, Neo used a story about two people who want to work together on a Google document.

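In the traditional HTTP request-response cycle, the client sends a request and the server answers with a single response. The server cannot send anything unless the client asks first.

For short, frequent messages, the overhead is expensive: every exchange carries a full set of HTTP headers and may require setting up a new connection.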

Then there is short polling. Imagine the page automatically refreshing every few seconds to check for progress.

The drawback is that choosing the right interval is a trade-off: a short interval wastes requests, while a long one delays updates. Either way, there is extra overhead from repeated connection requests and empty responses.
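To make the cost of empty responses concrete, here is a minimal sketch of a short-polling loop. It does not talk to a real server: `make_fake_server` is a hypothetical stand-in for an HTTP endpoint, and the interval and attempt limits are arbitrary values chosen for illustration.

```python
import time
from typing import Callable, Optional, Tuple


def short_poll(
    fetch: Callable[[], Optional[str]],
    interval_s: float,
    max_attempts: int,
) -> Tuple[Optional[str], int]:
    """Call fetch() every interval_s seconds until it returns data.

    Returns (data, attempts). Each attempt stands for one full HTTP
    request, including the ones that come back empty.
    """
    for attempt in range(1, max_attempts + 1):
        data = fetch()
        if data is not None:
            return data, attempt
        time.sleep(interval_s)  # wait before asking again
    return None, max_attempts


def make_fake_server(ready_after: int) -> Callable[[], Optional[str]]:
    """Hypothetical endpoint: answers with no data until the Nth call."""
    calls = 0

    def fetch() -> Optional[str]:
        nonlocal calls
        calls += 1
        return "document updated" if calls >= ready_after else None

    return fetch


result, attempts = short_poll(make_fake_server(ready_after=4), interval_s=0.01, max_attempts=10)
print(result, attempts)  # document updated 4
```

In this run, three of the four requests returned nothing, which is the overhead described above.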

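Then there is long polling. The server holds each request open until it has new data or a timeout expires, and the client then immediately sends the next request.

The server load can be high since the connections are kept open, which consumes resources. The client also has to make a separate request whenever it wants to send data to the server.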
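The idea of holding a request open can be sketched with a blocking queue. This is only a simulation under stated assumptions: `long_poll_handler` and `publish_later` are hypothetical names, and a thread-safe queue stands in for the held HTTP connection.

```python
import queue
import threading
from typing import Optional

# Pending updates that the (simulated) server wants to deliver.
updates: "queue.Queue[str]" = queue.Queue()


def long_poll_handler(timeout_s: float) -> Optional[str]:
    """Hypothetical server handler: hold the request until data or timeout."""
    try:
        # Blocks while the "connection" is held open.
        return updates.get(timeout=timeout_s)
    except queue.Empty:
        # Timed out with nothing to send; the client would re-issue the request.
        return None


def publish_later(message: str, delay_s: float) -> threading.Timer:
    """Simulate another user editing the document after delay_s seconds."""
    timer = threading.Timer(delay_s, updates.put, args=(message,))
    timer.start()
    return timer


writer = publish_later("paragraph 2 edited", delay_s=0.05)
print(long_poll_handler(timeout_s=1.0))   # paragraph 2 edited
writer.join()
print(long_poll_handler(timeout_s=0.05))  # None
```

The first call returns as soon as the update arrives rather than after a fixed interval, but the handler is occupied for the whole wait, which is where the server-side resource cost comes from.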

There are also server-sent events (SSE), where the server streams events to the client over a single long-lived HTTP response.

The server still has to keep the connection to the client open and the communication is unidirectional.
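On the wire, SSE is a plain-text stream (`text/event-stream`) in which each event is a group of `field: value` lines ended by a blank line. The sketch below formats and parses that shape. It is deliberately simplified: it handles only the `event` and `data` fields, and the event names used are made up for illustration.

```python
from typing import Iterator, List, Optional, Tuple


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream format."""
    lines: List[str] = [f"event: {event}"] if event else []
    # Multi-line data is sent as several data: lines.
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"  # a blank line ends the event


def _field_value(line: str, name: str) -> str:
    """Return the value after 'name:', dropping one optional leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(stream: str) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs; other fields are ignored in this sketch."""
    event, data = "message", []  # "message" is the default event type
    for line in stream.split("\n"):
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))


stream = format_sse("cursor moved", event="presence") + format_sse("line 1\nline 2")
print(list(parse_sse(stream)))
# [('presence', 'cursor moved'), ('message', 'line 1\nline 2')]
```

Nothing in this format lets the client write back on the same stream, which matches the unidirectional limitation noted above.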

Neo mentioned that Google Docs uses WebSockets, although there is some uncertainty about that (see the reddit, toolingant, and stackoverflow discussions). The bidirectional nature of the communication is similar to a telephone call: once the line is open, either side can speak at any time.

WebSocket is distinct from HTTP, the protocol used to serve most webpages. Although the two are different, RFC 6455 states that WebSocket “is designed to work over HTTP ports 443 and 80 as well as to support HTTP proxies and intermediaries”, which makes it compatible with existing HTTP infrastructure. To achieve this compatibility, the WebSocket handshake uses the HTTP Upgrade header to switch from the HTTP protocol to the WebSocket protocol.
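As an illustration of the Upgrade handshake, the sketch below builds the server's `101 Switching Protocols` reply from a client request. The `Sec-WebSocket-Accept` computation (SHA-1 over the client key plus a fixed GUID, then Base64) follows RFC 6455; the function names and the very loose header parsing are assumptions made for this example, not a complete server.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(request: str) -> str:
    """Build the 101 response for a raw HTTP upgrade request (minimal parsing)."""
    headers = {}
    for line in request.split("\r\n")[1:]:  # skip the request line
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    if headers.get("upgrade", "").lower() != "websocket":
        raise ValueError("not a WebSocket upgrade request")
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(headers['sec-websocket-key'])}\r\n"
        "\r\n"
    )


# Sample request; the host and path are placeholders.
request = (
    "GET /chat HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
    "Sec-WebSocket-Version: 13\r\n"
    "\r\n"
)
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
print(handshake_response(request))
```

The key and accept values are the sample pair from RFC 6455. After this exchange, the same TCP connection stops carrying HTTP and carries WebSocket frames instead.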

A WebSocket connection goes through the following steps over its lifecycle.

  1. Opening handshake (HTTP request and response)
  2. Data and control (close, ping, pong) messages, each of which can be composed of one or more frames. Splitting a message into frames lets the sender start transmitting the initial data before the complete length of the message is known.
  3. Closing handshake (two close frames) to close the connection
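The framing in steps 2 and 3 can be sketched with a small encoder and decoder. The byte layout (FIN bit and opcode in the first byte, mask bit and payload length in the second, optional extended length and 4-byte masking key) follows RFC 6455; the function names are hypothetical, and the sketch skips validation, extensions, and reserved bits.

```python
import struct
from typing import Tuple

OP_CONT, OP_TEXT, OP_CLOSE, OP_PING, OP_PONG = 0x0, 0x1, 0x8, 0x9, 0xA


def encode_frame(opcode: int, payload: bytes, mask_key: bytes = b"", fin: bool = True) -> bytes:
    """Build one frame. Pass a 4-byte mask_key for client-to-server frames."""
    first = (0x80 if fin else 0x00) | opcode
    mask_bit = 0x80 if mask_key else 0x00
    n = len(payload)
    if n < 126:
        header = bytes([first, mask_bit | n])
    elif n < 65536:
        header = bytes([first, mask_bit | 126]) + struct.pack("!H", n)
    else:
        header = bytes([first, mask_bit | 127]) + struct.pack("!Q", n)
    if mask_key:
        # Masking XORs each payload byte with the key, cycling every 4 bytes.
        payload = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + payload


def decode_frame(frame: bytes) -> Tuple[bool, int, bytes]:
    """Return (fin, opcode, unmasked payload) for a single complete frame."""
    fin, opcode = bool(frame[0] & 0x80), frame[0] & 0x0F
    masked, n = bool(frame[1] & 0x80), frame[1] & 0x7F
    pos = 2
    if n == 126:
        (n,) = struct.unpack_from("!H", frame, pos)
        pos += 2
    elif n == 127:
        (n,) = struct.unpack_from("!Q", frame, pos)
        pos += 8
    key = b""
    if masked:
        key = frame[pos:pos + 4]
        pos += 4
    payload = frame[pos:pos + n]
    if masked:
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload


# A single-frame text message, unmasked (server to client).
print(encode_frame(OP_TEXT, b"Hello").hex())                    # 810548656c6c6f
# The same message masked, as a client would send it.
print(encode_frame(OP_TEXT, b"Hello", bytes.fromhex("37fa213d")).hex())  # 818537fa213d7f9f4d5158
# One message split across two frames: FIN is clear on the first, set on the last.
print(encode_frame(OP_TEXT, b"Hel", fin=False).hex())           # 010348656c
print(encode_frame(OP_CONT, b"lo").hex())                       # 80026c6f
# A close frame carrying status code 1000 (normal closure).
print(encode_frame(OP_CLOSE, struct.pack("!H", 1000)).hex())    # 880203e8
```

The two-frame example shows why fragmentation helps with streaming: the first frame only declares its own length, so the sender does not need to know the length of the whole message up front.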

Nginx, Apache HTTP Server, Internet Information Services (IIS), and lighttpd all support WebSockets.

References

  1. Neo Kim Blog article
  2. wiki
  3. MDN doc
  4. CSDN article
