WebSocket is crucial for signaling in WebRTC. In this blog post we dive deep into how WebSocket operates in WebRTC, from the TCP handshake to closing the connection. To follow along, you can download the trace from my GitHub repository here.
- TCP Handshake
WebSocket communication begins with a TCP handshake, establishing a reliable connection between the client and the server.

Note: In this trace the server listens on port 7880, but WebSocket servers generally use port 80 (ws://) or 443 (wss://).
- HTTP Upgrade Request
After the TCP handshake, the client sends an HTTP Upgrade request to transition to the WebSocket protocol. The server responds with a status code 101 Switching Protocols, confirming the upgrade.
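
The upgrade exchange can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the trace: the host `localhost:7880` and the path `/signal` are placeholders, and the helper names are made up for this example. The GUID and the way the server derives `Sec-WebSocket-Accept` from the client's `Sec-WebSocket-Key` come from RFC 6455.

```python
import base64
import hashlib
import os

# Fixed GUID that RFC 6455 defines for deriving Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_client_key() -> str:
    """Return a random 16-byte nonce, base64-encoded, for Sec-WebSocket-Key."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str) -> str:
    """Build the HTTP/1.1 request a client sends to ask for the upgrade."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",  # blank line terminates the header block
    ]
    return "\r\n".join(lines)


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server must return with 101."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    key = new_client_key()
    # Host and path are hypothetical placeholders, not taken from the trace.
    print(build_upgrade_request("localhost:7880", "/signal", key))
    print("Expected Sec-WebSocket-Accept:", expected_accept(key))
```

When you look at the 101 Switching Protocols response in a capture, the `Sec-WebSocket-Accept` header should match what `expected_accept` returns for the key in the request.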

- Persistent WebSocket Connection
Once the protocol switches, the WebSocket connection becomes a persistent, full-duplex channel for signaling. This is used to exchange Session Description Protocol (SDP) messages and Interactive Connectivity Establishment (ICE) candidates.
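
To make the signaling exchange concrete, here is a small sketch of what such messages might look like. WebRTC does not define a wire format for signaling, so the JSON envelope below (the `type`, `sdp`, `candidate`, `sdpMid`, and `sdpMLineIndex` fields and all function names) is an assumption chosen for illustration; the application in the trace may encode its messages differently.

```python
import json


def encode_offer(sdp: str) -> str:
    """Wrap an SDP offer in a hypothetical JSON signaling envelope."""
    return json.dumps({"type": "offer", "sdp": sdp})


def encode_ice_candidate(candidate: str, sdp_mid: str, sdp_mline_index: int) -> str:
    """Wrap one ICE candidate in the same hypothetical envelope."""
    return json.dumps(
        {
            "type": "candidate",
            "candidate": candidate,
            "sdpMid": sdp_mid,
            "sdpMLineIndex": sdp_mline_index,
        }
    )


def dispatch(raw: str) -> str:
    """Decode a received text message and describe what it carries."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind in ("offer", "answer"):
        return f"{kind} with {len(msg['sdp'])} characters of SDP"
    if kind == "candidate":
        return f"ICE candidate for mid {msg['sdpMid']}"
    raise ValueError(f"unknown signaling message type: {kind!r}")


if __name__ == "__main__":
    print(dispatch(encode_offer("v=0\r\n")))
    print(dispatch(encode_ice_candidate(
        "candidate:1 1 udp 2122260223 192.0.2.1 54321 typ host", "0", 0)))
```

Each encoded string would be sent as one WebSocket text message, and the receiving side would run something like `dispatch` on every message it gets.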


- Masked Messages
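WebSocket clients (browsers) must mask every frame they send to the server. RFC 6455 requires this for security reasons: masking makes the bytes on the wire unpredictable, which protects intermediaries such as proxies from cache-poisoning attacks. Frames from the server to the client are not masked. This is seen in the trace, where messages from the client are marked as [MASKED].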
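
Masking itself is a simple XOR with a 4-byte key that the client picks per frame. The sketch below is a simplified illustration under a few assumptions: the function names are made up for this example, and the frame builder only handles short text payloads (up to 125 bytes) sent in a single frame.

```python
import os
from typing import Optional


def mask_payload(payload: bytes, masking_key: bytes) -> bytes:
    """XOR each payload byte with the masking key, cycling through its 4 bytes."""
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


def build_masked_text_frame(text: str, masking_key: Optional[bytes] = None) -> bytes:
    """Build a single-frame, masked text message as a client would send it."""
    payload = text.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("this sketch only handles payloads up to 125 bytes")
    if masking_key is None:
        masking_key = os.urandom(4)  # a fresh random key for every frame
    first_byte = 0x80 | 0x1              # FIN bit set, opcode 0x1 (text)
    second_byte = 0x80 | len(payload)    # MASK bit set, 7-bit payload length
    return bytes([first_byte, second_byte]) + masking_key + mask_payload(payload, masking_key)


if __name__ == "__main__":
    key = bytes.fromhex("37fa213d")
    frame = build_masked_text_frame("Hello", key)
    print(frame.hex())
    # Applying the same key again recovers the original payload.
    print(mask_payload(frame[6:], key))
```

Because XOR is its own inverse, the server unmasks by applying the same key that it reads from the frame header.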

- Media Transport
After signaling, media is sent directly between peers using WebRTC's own transport protocols: audio and video travel over SRTP with keys negotiated via DTLS, and data channels run over SCTP on top of DTLS. WebSocket is no longer involved in the media path.

- Termination Process
A WebSocket connection terminates through a process called the closing handshake, which either the client or the server can initiate. Here’s how it works:

- Step 1: Close Frame Sent
  - Either the client or the server sends a Close frame (Opcode: 0x8) to indicate that it wants to close the connection.
  - The Close frame may include a status code (2 bytes) and an optional reason (text string).
- Step 2: Acknowledgment
  - The receiving party acknowledges the Close frame by sending its own Close frame back.
  - This confirms that both sides agree to terminate the connection.
- Step 3: TCP Connection Closure
  - Once both Close frames are exchanged, the underlying TCP connection is closed.
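
The Close frame layout described in the steps above can be sketched as follows. This is an illustrative example rather than a full implementation: the function names are made up, and the frame builder produces an unmasked frame as a server would send it (a client would additionally mask it). Status code 1000 means normal closure.

```python
import struct
from typing import Optional, Tuple


def build_close_payload(code: int = 1000, reason: str = "") -> bytes:
    """Encode the 2-byte status code (network byte order) plus an optional UTF-8 reason."""
    return struct.pack("!H", code) + reason.encode("utf-8")


def build_server_close_frame(code: int = 1000, reason: str = "") -> bytes:
    """Build an unmasked Close frame (FIN set, opcode 0x8)."""
    payload = build_close_payload(code, reason)
    if len(payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    return bytes([0x80 | 0x8, len(payload)]) + payload


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Decode a Close frame payload into (status code, reason)."""
    if len(payload) == 0:
        return None, ""  # a Close frame may carry no status code at all
    if len(payload) == 1:
        raise ValueError("a status code needs 2 bytes")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


if __name__ == "__main__":
    frame = build_server_close_frame(1000, "bye")
    print(frame.hex())
    print(parse_close_payload(frame[2:]))
```

The peer that receives this frame would reply with its own Close frame, typically echoing the same status code, before the TCP connection is torn down.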
FAQs
- Why is WebSocket used for signaling in WebRTC?
WebSocket provides a persistent, real-time, full-duplex connection, which suits the exchange of the SDP messages and ICE candidates required for WebRTC setup.
- Does WebRTC specify a signaling protocol?
No, WebRTC doesn’t mandate a signaling protocol. WebSocket, SIP, or custom protocols can be used.
- Can SIP work over WebSocket in WebRTC?
Yes, SIP over WebSocket (as per RFC 7118) is a common setup for WebRTC.
- Can SIP be used without WebSocket in WebRTC?
Yes, SIP can work over other transports like TCP or UDP, but browsers do not give web pages raw TCP or UDP sockets, so a browser-based client cannot speak SIP over these transports natively.
- Is WebSocket supported by browsers because it uses HTTP/HTTPS?
Yes, WebSocket starts with an HTTP/HTTPS handshake, so it reuses standard web infrastructure and ports (80/443) that browsers already work with.
- What are the advantages of using WebSocket without SIP?
Simplicity: You can create lightweight, custom protocols tailored to your application’s needs.
Flexibility: You are not tied to SIP’s structure or semantics, which might be overkill for some applications.
Reduced Overhead: SIP includes many headers and mechanisms that may be unnecessary for certain applications.
Broader Use Cases: WebSocket is not limited to VoIP or RTC but can be used for any real-time data exchange.
- What are the different types of WebSocket connections?
Unencrypted WebSocket (ws://): Runs over plain TCP and transmits data in clear text, so it is not secure. Best for non-sensitive use on trusted networks (e.g., ws://example.com/chat).
Encrypted WebSocket or WebSocket Secure (wss://): Runs over TLS, which encrypts data for confidentiality and integrity. Ideal for internet use (e.g., wss://example.com/chat).
- TCP is a Layer 4 protocol; which layer does WebSocket belong to?
Like SIP, WebSocket is an application-layer protocol, which corresponds to Layer 7 of the OSI model.
Key Takeaways
• WebSocket is a vital part of WebRTC for signaling and connection setup.
• The process starts with a TCP handshake, followed by an HTTP upgrade.
• Persistent WebSocket connections enable the exchange of signaling data, as seen in the provided trace.
• After signaling, WebRTC handles media directly over peer-to-peer connections.
• WebSocket connections close gracefully through a combination of WebSocket Close frames and the underlying TCP FIN/ACK exchange.
Additional Resources
SIP: https://de.wikipedia.org/wiki/Session_Initiation_Protocol
SDP: https://en.wikipedia.org/wiki/Session_Description_Protocol
ICE: https://en.wikipedia.org/wiki/Interactive_Connectivity_Establishment
Akash Gupta
Senior VoIP Engineer and AI Enthusiast

AI and VoIP Blog
