HTML 5

Draft Recommendation — 7 July 2008

7. Communication

7.1 Event definitions

Messages in cross-document messaging and in server-sent DOM events use the message event.

The following interface is defined for this event:

interface MessageEvent : Event {
  readonly attribute DOMString data;
  readonly attribute DOMString origin;
  readonly attribute DOMString lastEventId;
  readonly attribute Window source;
  void initMessageEvent(in DOMString typeArg, in boolean canBubbleArg, in boolean cancelableArg, in DOMString dataArg, in DOMString originArg, in DOMString lastEventIdArg, in Window sourceArg);
  void initMessageEventNS(in DOMString namespaceURI, in DOMString typeArg, in boolean canBubbleArg, in boolean cancelableArg, in DOMString dataArg, in DOMString originArg, in DOMString lastEventIdArg, in Window sourceArg);
};

The initMessageEvent() and initMessageEventNS() methods must initialise the event in a manner analogous to the similarly-named methods in the DOM3 Events interfaces. [DOM3EVENTS]

The data attribute represents the message being sent.

The origin attribute represents, in cross-document messaging, the origin of the document that sent the message (typically the scheme, hostname, and port of the document, but not its path or fragment identifier).

The lastEventId attribute represents, in server-sent DOM events, the last event ID string of the event source.

The source attribute represents, in cross-document messaging, the Window from which the message came.

7.2 Server-sent DOM events

This section describes a mechanism for allowing servers to dispatch DOM events into documents that expect them. The event-source element provides a simple interface to this mechanism.

7.2.1 The RemoteEventTarget interface

Any object that implements the EventTarget interface must also implement the RemoteEventTarget interface.

interface RemoteEventTarget {
  void addEventSource(in DOMString src);
  void removeEventSource(in DOMString src);
};

When the addEventSource(src) method is invoked, the user agent must resolve the URL specified in src, and if that succeeds, add the resulting absolute URL to the list of event sources for that object. The same URL can be registered multiple times. If the URL fails to resolve, then the user agent must raise a SYNTAX_ERR exception.

When the removeEventSource(src) method is invoked, the user agent must resolve the URL specified in src, and if that succeeds, remove the resulting absolute URL from the list of event sources for that object. If the same URL has been registered multiple times, removing it must remove only one instance of that URL for each invocation of the removeEventSource() method. If the URL fails to resolve, the user agent does nothing.
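
The registration semantics above can be sketched as follows. This is a non-normative illustration in Python, not part of the interface: the class name EventSourceList and its base_url parameter are hypothetical, urljoin stands in for URL resolution, and the SYNTAX_ERR case for URLs that fail to resolve is not modelled.

```python
from urllib.parse import urljoin


class EventSourceList:
    """Hypothetical model of the list of event sources for one object."""

    def __init__(self, base_url):
        self.base_url = base_url  # assumed base for resolving relative URLs
        self.sources = []         # the same URL may appear more than once

    def add_event_source(self, src):
        # Resolve src and register the absolute URL, even if already present.
        self.sources.append(urljoin(self.base_url, src))

    def remove_event_source(self, src):
        # Remove at most one registration of the resolved URL per call.
        url = urljoin(self.base_url, src)
        if url in self.sources:
            self.sources.remove(url)
```

Registering the same URL twice and removing it once leaves one registration in place, which is the behaviour the removeEventSource() text requires.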

7.2.2 Connecting to an event stream

Each object implementing the EventTarget and RemoteEventTarget interfaces has a list of event sources that are registered for that object.

When a new URL is added to this list, the user agent should, as soon as all currently executing scripts (if any) have finished executing, and if the specified URL isn't removed from the list before they do so, fetch the resource identified by that URL.

When an event source is removed from the list of event sources for an object, if that resource is still being fetched, then the relevant connection must be closed.

Since connections established to remote servers for such resources are expected to be long-lived, UAs should ensure that appropriate buffering is used. In particular, while line buffering may be safe if lines are defined to end with a single U+000A LINE FEED character, block buffering or line buffering with different expected line endings can cause delays in event dispatch.

Each event source in the list must have associated with it the following: a reconnection time and a last event ID string.

In general, the semantics of the transport protocol specified by the URLs for the event sources must be followed, including HTTP caching rules.

For HTTP connections, the Accept header may be included; if included, it must contain only formats of event framing that are supported by the user agent (one of which must be text/event-stream, as described below).

Other formats of event framing may also be supported in addition to text/event-stream, but this specification does not define how they are to be parsed or processed.

Such formats could include systems like SMS-push; for example, servers could use Accept headers and HTTP redirects to an SMS-push mechanism as a kind of protocol negotiation to reduce network load in GSM environments.

User agents should use the Cache-Control: no-cache header in requests to bypass any caches for requests of event sources.

If the event source's last event ID string is not the empty string, then a Last-Event-ID HTTP header must be included with the request, whose value is the value of the event source's last event ID string.
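
As a non-normative illustration, the request headers described above could be assembled as in the following Python sketch. The function name build_request_headers is hypothetical, and the sketch assumes a user agent that chooses to send the optional Accept header and supports only text/event-stream.

```python
def build_request_headers(last_event_id):
    """Return request headers for fetching an event source (illustrative)."""
    headers = {
        "Accept": "text/event-stream",  # optional; lists supported framings
        "Cache-Control": "no-cache",    # ask caches to be bypassed
    }
    # Last-Event-ID is only sent when the stored string is not empty.
    if last_event_id != "":
        headers["Last-Event-ID"] = last_event_id
    return headers
```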

For connections to domains other than the document's domain, the semantics of the Access-Control HTTP header must be followed. [ACCESSCONTROL]

HTTP 200 OK responses with a Content-Type header specifying the type text/event-stream that are either from the document's domain or explicitly allowed by the Access-Control HTTP headers must be processed line by line as described below.

For the purposes of such successfully opened event streams only, user agents should ignore HTTP cache headers, and instead assume that the resource indicates that it does not wish to be cached.

If such a resource completes loading (i.e. the entire HTTP response body is received or the connection itself closes), the user agent should request the event source resource again after a delay equal to the reconnection time of the event source.

HTTP 200 OK responses that have a Content-Type other than text/event-stream (or some other supported type), and HTTP responses whose Access-Control headers indicate that the resource is not to be used, must be ignored and must prevent the user agent from refetching the resource for that event source.

HTTP 201 Created, 202 Accepted, 203 Non-Authoritative Information, and 206 Partial Content responses must be treated like HTTP 200 OK responses for the purposes of reopening event source resources. They are, however, likely to indicate an error has occurred somewhere and may cause the user agent to emit a warning.

HTTP 204 No Content and 205 Reset Content responses must be treated as if they were 200 OK responses with the right MIME type but no content, and should therefore cause the user agent to refetch the resource after a delay equal to the reconnection time of the event source.

HTTP 300 Multiple Choices responses should be handled automatically if possible (treating the responses as if they were 302 Found responses pointing to the appropriate resource), and otherwise must be treated as HTTP 404 responses.

HTTP 301 Moved Permanently responses must cause the user agent to reconnect using the new server-specified URL instead of the previously specified URL for all subsequent requests for this event source. (It doesn't affect other event sources with the same URL unless they also receive 301 responses, and it doesn't affect future sessions, e.g. if the page is reloaded.)

HTTP 302 Found, 303 See Other, and 307 Temporary Redirect responses must cause the user agent to connect to the new server-specified URL, but if the user agent needs to again request the resource at a later point, it must return to the previously specified URL for this event source.

HTTP 304 Not Modified responses should be handled like HTTP 200 OK responses, with the content coming from the user agent cache. A new request should then be made after a delay equal to the reconnection time of the event source.

HTTP 305 Use Proxy, HTTP 401 Unauthorized, and 407 Proxy Authentication Required should be treated transparently as for any other subresource.

Any other HTTP response code not listed here should cause the user agent to stop trying to process this event source.

DNS errors must be considered fatal, and cause the user agent to not open any connection for that event source.

For non-HTTP protocols, UAs should act in equivalent ways.
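
The response handling rules above can be summarised, non-normatively, as a lookup from status code to action. In the Python sketch below the function name, the action labels and the can_choose flag (whether the user agent can pick a choice for a 300 response automatically) are all hypothetical. Access-Control checks and DNS errors are not modelled, and for a 304 response the content type is assumed to be that of the cached resource.

```python
EVENT_STREAM = "text/event-stream"


def classify_response(status, content_type="", can_choose=False):
    """Map an HTTP status code to a hypothetical action label."""
    mime = content_type.split(";")[0].strip().lower()
    if status in (200, 201, 202, 203, 206, 304):
        # Processed only when the framing is supported; otherwise the
        # response is ignored and the resource is never refetched.
        return "process-then-reconnect" if mime == EVENT_STREAM else "stop"
    if status in (204, 205):
        return "reconnect-after-delay"
    if status == 300:
        # Handled like 302 when possible, otherwise like 404.
        return "redirect-temporary" if can_choose else "stop"
    if status == 301:
        return "redirect-permanent"
    if status in (302, 303, 307):
        return "redirect-temporary"
    if status in (305, 401, 407):
        return "handle-as-any-subresource"
    return "stop"
```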

7.2.3 Parsing an event stream

This event stream format's MIME type is text/event-stream.

The event stream format is (in pseudo-BNF):

<stream>          ::= <bom>? <event>*
<event>           ::= [ <comment> | <field> ]* <newline>
<comment>         ::= <colon> <any-char>* <newline>
<field>           ::= <name-char>+ [ <colon> <space>? <any-char>* ]? <newline>

# characters:
<bom>             ::= a single U+FEFF BYTE ORDER MARK character
<space>           ::= a single U+0020 SPACE character (' ')
<newline>         ::= a U+000D CARRIAGE RETURN character
                      followed by a U+000A LINE FEED character
                      | a single U+000D CARRIAGE RETURN character
                      | a single U+000A LINE FEED character
                      | the end of the file
<colon>           ::= a single U+003A COLON character (':')
<name-char>       ::= a single Unicode character other than
                      U+003A COLON, U+000D CARRIAGE RETURN and U+000A LINE FEED
<any-char>        ::= a single Unicode character other than
                      U+000D CARRIAGE RETURN and U+000A LINE FEED

Event streams in this format must always be encoded as UTF-8.

Lines must be separated by either a U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair, a single U+000A LINE FEED (LF) character, or a single U+000D CARRIAGE RETURN (CR) character.

7.2.4 Interpreting an event stream

Bytes or sequences of bytes that are not valid UTF-8 sequences must be interpreted as the U+FFFD REPLACEMENT CHARACTER.

One leading U+FEFF BYTE ORDER MARK character must be ignored if any are present.

The stream must then be parsed by reading everything line by line, with a U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair, a single U+000A LINE FEED (LF) character, a single U+000D CARRIAGE RETURN (CR) character, and the end of the file being the four ways in which a line can end.
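
The decoding and line-splitting requirements above can be sketched in Python as follows. This is a non-normative illustration: the function name decode_and_split is hypothetical, and the sketch works on a complete response body rather than on an incrementally received stream.

```python
import re


def decode_and_split(raw):
    """Decode an event stream body and split it into lines (illustrative)."""
    # Invalid UTF-8 sequences become U+FFFD REPLACEMENT CHARACTER.
    text = raw.decode("utf-8", errors="replace")
    # Ignore one leading U+FEFF BYTE ORDER MARK, if present.
    if text.startswith("\ufeff"):
        text = text[1:]
    # CRLF is listed first so that it is not split as CR followed by LF.
    lines = re.split(r"\r\n|\r|\n", text)
    # A terminator at the very end leaves an empty trailing piece; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```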

When a stream is parsed, a data buffer and an event name buffer must be associated with it. They must be initialized to the empty string.

Lines must be processed, in the order they are received, as follows:

If the line is empty (a blank line)

Dispatch the event, as defined below.

If the line starts with a U+003A COLON character (':')

Ignore the line.

If the line contains a U+003A COLON character (':')

Collect the characters on the line before the first U+003A COLON character (':'), and let field be that string.

Collect the characters on the line after the first U+003A COLON character (':'), and let value be that string. If value starts with a single U+0020 SPACE character, remove it from value.

Process the field using the steps described below, using field as the field name and value as the field value.

Otherwise, the string is not empty but does not contain a U+003A COLON character (':')

Process the field using the steps described below, using the whole line as the field name, and the empty string as the field value.

Once the end of the file is reached, the user agent must dispatch the event one final time, as defined below.

The steps to process the field given a field name and a field value depend on the field name, as given in the following list. Field names must be compared literally, with no case folding performed.

If the field name is "event"

Set the event name buffer to the field value.

If the field name is "data"

If the data buffer is not the empty string, then append a single U+000A LINE FEED character to the data buffer. Append the field value to the data buffer.

If the field name is "id"

Set the event stream's last event ID to the field value.

If the field name is "retry"

If the field value consists of only characters in the range U+0030 DIGIT ZERO ('0') to U+0039 DIGIT NINE ('9'), then interpret the field value as an integer in base ten, and set the event stream's reconnection time to that integer. Otherwise, ignore the field.

Otherwise

The field is ignored.

When the user agent is required to dispatch the event, then the user agent must act as follows:

  1. If the data buffer is an empty string, set the data buffer and the event name buffer to the empty string and abort these steps.

  2. If the event name buffer is not the empty string but is also not a valid NCName, set the data buffer and the event name buffer to the empty string and abort these steps.

  3. Otherwise, create an event that uses the MessageEvent interface, with the event name message, which does not bubble, is cancelable, and has no default action. The data attribute must be set to the value of the data buffer, the origin attribute must be set to the origin of the event stream's URL, the lastEventId attribute must be set to the last event ID string of the event source, and the source attribute must be set to null.

  4. If the event name buffer has a value other than the empty string, change the type of the newly created event to equal the value of the event name buffer.

  5. Set the data buffer and the event name buffer to the empty string.

  6. Dispatch the newly created event at the RemoteEventTarget object to which the event stream is registered.

If an event doesn't have an "id" field, but an earlier event did set the event source's last event ID string, then the event's lastEventId field will be set to the value of whatever the last seen "id" field was.
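
The line processing and dispatch steps above can be sketched in Python as follows. This is a non-normative illustration under stated assumptions: the function name parse_event_stream and the dictionary representation of events are hypothetical, the input is an already decoded string, the NCName check is a rough approximation rather than the real grammar, and an empty "retry" value is ignored.

```python
import re

# Rough stand-in for the NCName production; an assumption, not the real grammar.
NCNAME_APPROX = re.compile(r"[^\W\d][\w.\-]*\Z")


def parse_event_stream(text):
    """Return the events a decoded stream would dispatch, as dicts."""
    events = []
    state = {"data": "", "name": "", "last_id": "", "retry": None}

    def dispatch():
        data, name = state["data"], state["name"]
        state["data"] = state["name"] = ""   # buffers are always reset
        if data == "":
            return                           # step 1: nothing to dispatch
        if name != "" and not NCNAME_APPROX.match(name):
            return                           # step 2: invalid event name
        events.append({"type": name or "message", "data": data,
                       "lastEventId": state["last_id"]})

    for line in re.split(r"\r\n|\r|\n", text):
        if line == "":
            dispatch()
            continue
        if line.startswith(":"):
            continue                         # comment line
        # Without a colon, the whole line is the name and the value is empty.
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]                # drop one leading space only
        if field == "event":
            state["name"] = value
        elif field == "data":
            if state["data"] != "":
                state["data"] += "\n"
            state["data"] += value
        elif field == "id":
            state["last_id"] = value
        elif field == "retry":
            if value and all(c in "0123456789" for c in value):
                state["retry"] = int(value)  # kept for a reconnecting client
    dispatch()                               # end of file
    return events
```

For instance, parse_event_stream("data: YHOO\ndata: -2\ndata: 10\n\n") yields a single event of type message whose data is the three values joined by newlines.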

The following event stream, once followed by a blank line:

data: YHOO
data: -2
data: 10

...would cause an event message with the interface MessageEvent to be dispatched on the event-source element, whose data attribute would contain the string YHOO\n-2\n10 (where \n represents a newline).

This could be used as follows:

<event-source src="http://stocks.example.com/ticker.php"
              onmessage="var data = event.data.split('\n'); updateStocks(data[0], data[1], data[2]);">

...where updateStocks() is a function defined as:

function updateStocks(symbol, delta, value) { ... }

...or some such.

The following stream contains four blocks. The first block has just a comment, and will fire nothing. The second block has two fields with names "data" and "id" respectively; an event will be fired for this block, with the data "first event", and will then set the last event ID to "1" so that if the connection died between this block and the next, the server would be sent a Last-Event-ID header with the value "1". The third block fires an event with data "second event", and also has an "id" field, this time with no value, which resets the last event ID to the empty string (meaning no Last-Event-ID header will now be sent in the event of a reconnection being attempted). Finally, the last block just fires an event with the data "third event". Note that the last block doesn't have to end with a blank line; the end of the stream is enough to trigger the dispatch of the last event.

: test stream

data: first event
id: 1

data: second event
id

data: third event

The following stream fires just one event:

data

data
data

data:

The first and last blocks do nothing, since they do not contain any actual data (the data buffer remains at the empty string, and so nothing gets dispatched). The middle block fires an event with the data set to a single newline character.

The following stream fires two identical events:

data:test

data: test

This is because the space after the colon is ignored if present.

7.2.5 Notes

Legacy proxy servers are known to, in certain cases, drop HTTP connections after a short timeout. To protect against such proxy servers, authors can include a comment line (one starting with a ':' character) every 15 seconds or so.
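
On the server side, events and keep-alive comments might be serialised as in the following non-normative Python sketch. The function names format_event and keepalive_comment, and the comment text "keep-alive", are hypothetical. Scheduling the comment every 15 seconds or so is left to the server's own timer and is not shown.

```python
import re


def format_event(data, event=None, event_id=None):
    """Serialise one event as a text/event-stream block (illustrative)."""
    lines = []
    if event is not None:
        lines.append("event: " + event)
    # Each line of the payload needs its own "data" field.
    for part in re.split(r"\r\n|\r|\n", data):
        lines.append("data: " + part)
    if event_id is not None:
        lines.append("id: " + event_id)
    # The trailing blank line is what triggers dispatch in the user agent.
    return "\n".join(lines) + "\n\n"


def keepalive_comment():
    """Return a comment line, which user agents ignore."""
    return ": keep-alive\n"
```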

Authors wishing to relate event source connections to each other or to specific documents previously served might find that relying on IP addresses doesn't work, as individual clients can have multiple IP addresses (due to having multiple proxy servers) and individual IP addresses can have multiple clients (due to sharing a proxy server). It is better to include a unique identifier in the document when it is served and then pass that identifier as part of the URL in the src attribute of the event-source element.

Implementations that support HTTP's per-server connection limitation might run into trouble when opening multiple pages from a site if each page has an event-source to the same domain. Authors can avoid this using the relatively complex mechanism of using unique domain names per connection, or by allowing the user to enable or disable the event-source functionality on a per-page basis.

7.3 Web sockets

To enable Web applications to maintain bidirectional communications with their originating server, this specification introduces the WebSocket interface.

This interface does not allow for raw access to the underlying network. For example, this interface could not be used to implement an IRC client without proxying messages through a custom server.

7.3.1 Introduction

This section is non-normative.

An introduction to the client-side and server-side of using the direct connection APIs.

7.3.2 The WebSocket interface

interface WebSocket {
  // constructor
  [Constructor] WebSocket(in DOMString url);
  readonly attribute DOMString URL;

  // ready state
  const unsigned short CONNECTING = 0;
  const unsigned short OPEN = 1;
  const unsigned short CLOSED = 2;
  readonly attribute int readyState;

  // networking
           attribute EventListener onopen;
           attribute EventListener onread;
           attribute EventListener onclose;
  void send(in DOMString data);
  void disconnect();
};

WebSocket objects must also implement the EventTarget interface. [DOM3EVENTS]

The WebSocket constructor takes one argument, url, which specifies the URL to which to connect. When a WebSocket object is created, the UA must parse this argument and verify that the URL parses without failure and has a <scheme> component whose value is either "ws" or "wss", when compared case-insensitively. If both conditions hold, then the user agent must asynchronously establish a Web Socket connection to url. Otherwise, the constructor must raise a SYNTAX_ERR exception.

The URL attribute must return the value that was passed to the constructor.

The readyState attribute represents the state of the connection. It can have the following values:

CONNECTING (numeric value 0)
The connection has not yet been established.
OPEN (numeric value 1)
The Web Socket connection is established and communication is possible.
CLOSED (numeric value 2)
The connection has been closed or could not be opened.

When the object is created its readyState must be set to CONNECTING (0).

The send(data) method transmits data using the connection. If the connection is not established (readyState is not OPEN), it must raise an INVALID_STATE_ERR exception. If the connection is established, then the user agent must send data using the Web Socket.

The disconnect() method must close the Web Socket connection or connection attempt, if any. If the connection is already closed, it must do nothing. Closing the connection causes a close event to be fired and the readyState attribute's value to change, as described below.
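
The constructor check and the readyState rules above can be modelled, non-normatively, with the following Python sketch. No networking is performed: the class name WebSocketModel, the connection_established() hook that stands in for a completed handshake, and the sent and events lists are hypothetical, and ValueError and InvalidStateError stand in for the SYNTAX_ERR and INVALID_STATE_ERR exceptions.

```python
from urllib.parse import urlsplit

CONNECTING, OPEN, CLOSED = 0, 1, 2


class InvalidStateError(Exception):
    """Stand-in for the INVALID_STATE_ERR exception."""


class WebSocketModel:
    def __init__(self, url):
        # urlsplit lowercases the scheme, so the comparison is case-insensitive.
        if urlsplit(url).scheme not in ("ws", "wss"):
            raise ValueError("SYNTAX_ERR: scheme must be ws or wss")
        self.URL = url
        self.ready_state = CONNECTING
        self.sent = []      # data handed to the connection
        self.events = []    # names of events that would be fired

    def connection_established(self):
        # Hypothetical hook called once the handshake has succeeded.
        self.ready_state = OPEN
        self.events.append("open")

    def send(self, data):
        if self.ready_state != OPEN:
            raise InvalidStateError("connection is not established")
        self.sent.append(data)

    def disconnect(self):
        if self.ready_state == CLOSED:
            return          # already closed: do nothing
        self.ready_state = CLOSED
        self.events.append("close")
```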

7.3.3 WebSocket Events

The open event is fired when the Web Socket connection is established.

The close event is fired when the connection is closed (whether by the author, calling the disconnect() method, or by the server, or by a network error).

No information regarding why the connection was closed is passed to the application in this version of this specification.

The read event is fired when data is received for a connection. It uses the WebSocketReadEvent interface:

interface WebSocketReadEvent : Event {
  readonly attribute DOMString data;
  void initWebSocketReadEvent(in DOMString typeArg, in boolean canBubbleArg, in boolean cancelableArg, in DOMString dataArg);
  void initWebSocketReadEventNS(in DOMString namespaceURI, in DOMString typeArg, in boolean canBubbleArg, in boolean cancelableArg, in DOMString dataArg);
};

The initWebSocketReadEvent() and initWebSocketReadEventNS() methods must initialise the event in a manner analogous to the similarly-named methods in the DOM3 Events interfaces. [DOM3EVENTS]

The data attribute represents the data that was received.

When the user agent is to fire a read event with data data, the user agent must dispatch an event whose name is read, with no namespace, which does not bubble but is cancelable, which uses the WebSocketReadEvent interface, and whose data attribute is set to data, at the given object.

Events that would be fired during script execution (e.g. between the WebSocket object being created — and thus the connection being established — and the current script completing; or, during the execution of a read event handler) must be buffered, and those events queued up and each one individually fired after the script has completed.


The following are the event handler DOM attributes that must be supported by objects implementing the WebSocket interface:

onopen

Must be invoked whenever an open event is targeted at or bubbles through the WebSocket object.

onread

Must be invoked whenever a read event is targeted at or bubbles through the WebSocket object.

onclose

Must be invoked whenever a close event is targeted at or bubbles through the WebSocket object.

7.3.4 The Web Socket protocol

7.3.4.1. Client-side requirements

This section only applies to user agents.

7.3.4.1.1. Handshake

When the user agent is to establish a Web Socket connection to url, it must run the following steps, in the background (without blocking scripts or anything like that):

  1. Resolve the URL url.

  2. If the <scheme> component of the resulting absolute URL is "ws", set secure to false; otherwise (the <scheme> component is "wss"), set secure to true.

  3. Let host be the value of the <host> component in the resulting absolute URL.

  4. If the resulting absolute URL has a <port> component, then let port be that component's value; otherwise, if secure is false, let port be 81, otherwise let port be 815.

  5. Let resource name be the value of the <path> component (which might be empty) in the resulting absolute URL.

  6. If resource name is the empty string, set it to a single character U+002F SOLIDUS (/).

  7. If the resulting absolute URL has a <query> component, then append a single U+003F QUESTION MARK (?) character to resource name, followed by the value of the <query> component.

  8. If the user agent is configured to use a proxy to connect to port port, then connect to that proxy and ask it to open a TCP/IP connection to the host given by host and the port given by port.

    For example, if the user agent uses an HTTP proxy, then if it were to try to connect to port 80 on server example.com, it might send the following lines to the proxy server:

    CONNECT example.com:80 HTTP/1.1

    If there was a password, the connection might look like:

    CONNECT example.com:80 HTTP/1.1
    Proxy-authorization: Basic ZWRuYW1vZGU6bm9jYXBlcyE=

    Otherwise, if the user agent is not configured to use a proxy, then open a TCP/IP connection to the host given by host and the port given by port.

  9. If the connection could not be opened, then fail the Web Socket connection and abort these steps.

  10. If secure is true, perform a TLS handshake over the connection. If this fails (e.g. the server's certificate could not be verified), then fail the Web Socket connection and abort these steps. Otherwise, all further communication on this channel must run through the encrypted tunnel. [RFC2246]

  11. Send the following bytes to the remote side (the server):

    47 45 54 20

    Send the resource name value, encoded as US-ASCII.

    Send the following bytes:

    20 48 54 54 50 2f 31 2e  31 0d 0a 55 70 67 72 61
    64 65 3a 20 57 65 62 53  6f 63 6b 65 74 0d 0a 43
    6f 6e 6e 65 63 74 69 6f  6e 3a 20 55 70 67 72 61
    64 65 0d 0a

    The string "GET ", the path, " HTTP/1.1", CRLF, the string "Upgrade: WebSocket", CRLF, and the string "Connection: Upgrade", CRLF.

  12. Send the following bytes:

    48 6f 73 74 3a 20

    Send the host value, encoded as US-ASCII, if it represents a host name (and not an IP address).

    Send the following bytes:

    0d 0a

    The string "Host: ", the host, and CRLF.

  13. Send the following bytes:

    4f 72 69 67 69 6e 3a 20

    Send the ASCII serialization of the origin of the script that invoked the WebSocket constructor.

    Send the following bytes:

    0d 0a

    The string "Origin: ", the origin, and CRLF.

  14. If the client has any authentication information or cookies that would be relevant to a resource with a URL that has a scheme of http if secure is false and https if secure is true and is otherwise identical to url, then HTTP headers that would be appropriate for that information should be sent at this point. [RFC2616] [RFC2109] [RFC2965]

    Each header must be on a line of its own (each ending with a CR LF sequence). For the purposes of this step, each header must not be split into multiple lines (despite HTTP otherwise allowing this with continuation lines).

    For example, if the user agent had a username and password that applied to that URL, it could send:

    Authorization: Basic d2FsbGU6ZXZl

  15. Send the following bytes:

    0d 0a

    Just a CRLF (a blank line).

  16. Read the first 85 bytes from the server. If the connection closes before 85 bytes are received, or if the first 85 bytes aren't exactly equal to the following bytes, then fail the Web Socket connection and abort these steps.

    48 54 54 50 2f 31 2e 31  20 31 30 31 20 57 65 62
    20 53 6f 63 6b 65 74 20  50 72 6f 74 6f 63 6f 6c
    20 48 61 6e 64 73 68 61  6b 65 0d 0a 55 70 67 72
    61 64 65 3a 20 57 65 62  53 6f 63 6b 65 74 0d 0a
    43 6f 6e 6e 65 63 74 69  6f 6e 3a 20 55 70 67 72
    61 64 65 0d 0a

    The string "HTTP/1.1 101 Web Socket Protocol Handshake", CRLF, the string "Upgrade: WebSocket", CRLF, the string "Connection: Upgrade", CRLF.

    What if the response is a 401 asking for credentials?

  17. Let headers be a list of name-value pairs, initially empty.

  18. Header: Let name and value be empty byte arrays.

  19. Read a byte from the server.

    If the connection closes before this byte is received, then fail the Web Socket connection and abort these steps.

    Otherwise, handle the byte as described in the appropriate entry below:

    If the byte is 0x0d (ASCII CR)
    If the name byte array is empty, then jump to the headers processing step. Otherwise, fail the Web Socket connection and abort these steps.
    If the byte is 0x0a (ASCII LF)
    Fail the Web Socket connection and abort these steps.
    If the byte is 0x3a (ASCII ":")
    Move on to the next step.
    If the byte is in the range 0x41 .. 0x5a (ASCII "A" .. "Z")
    Append a byte whose value is the byte's value plus 0x20 to the name byte array and redo this step for the next byte.
    Otherwise
    Append the byte to the name byte array and redo this step for the next byte.

    This reads a header name, terminated by a colon, converting upper-case ASCII letters to lowercase, and aborting if a stray CR or LF is found.

  20. Read a byte from the server.

    If the connection closes before this byte is received, then fail the Web Socket connection and abort these steps.

    Otherwise, handle the byte as described in the appropriate entry below:

    If the byte is 0x20 (ASCII space)
    Ignore the byte and move on to the next step.
    Otherwise
    Treat the byte as described by the list in the next step, then move on to that next step for real.

    This skips past a space character after the colon, if necessary.

  21. Read a byte from the server.

    If the connection closes before this byte is received, then fail the Web Socket connection and abort these steps.

    Otherwise, handle the byte as described in the appropriate entry below:

    If the byte is 0x0d (ASCII CR)
    Move on to the next step.
    If the byte is 0x0a (ASCII LF)
    Fail the Web Socket connection and abort these steps.
    Otherwise
    Append the byte to the value byte array and redo this step for the next byte.

    This reads a header value, terminated by a CRLF.

  22. Read a byte from the server.

    If the connection closes before this byte is received, or if the byte is not a 0x0a byte (ASCII LF), then fail the Web Socket connection and abort these steps.

    This skips past the LF byte of the CRLF after the header.

  23. Append an entry to the headers list that has the name given by the string obtained by interpreting the name byte array as a UTF-8 byte stream and the value given by the string obtained by interpreting the value byte array as a UTF-8 byte stream.

  24. Return to the header step above.

  25. Headers processing: If there is not exactly one entry in the headers list whose name is "websocket-origin", or if there is not exactly one entry in the headers list whose name is "websocket-location", or if there are any entries in the headers list whose names are the empty string, then fail the Web Socket connection and abort these steps.

  26. Handle each entry in the headers list as follows:

    If the entry's name is "websocket-origin"
    Assume the value is a URL. If the value does not have the same origin as the script that invoked the WebSocket constructor, then fail the Web Socket connection and abort these steps.
    If the entry's name is "websocket-location"
    If the value is not exactly equal to the absolute URL that resulted from the first step of this algorithm, then fail the Web Socket connection and abort these steps.
    If the entry's name is "set-cookie" or "set-cookie2" or another cookie-related header name
    Handle the cookie as defined by the appropriate spec, except pretend that the resource's URL actually has a scheme of http if secure is false and https if secure is true and is otherwise identical to url. [RFC2109] [RFC2965]
    Any other name
    Ignore it.

  27. Change the readyState attribute's value to OPEN (1).

  28. Fire a simple event named open at the WebSocket object.

  29. The Web Socket connection is established. Now the user agent must send and receive to and from the connection as described in the next section.
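
The request that the user agent sends during the steps above can be sketched in Python as follows. This is a non-normative illustration: the function names are hypothetical, urlsplit stands in for URL resolution, an empty query component is not distinguished from a missing one, and the proxy, TLS and cookie or authentication steps are omitted.

```python
import ipaddress
from urllib.parse import urlsplit


def derive_connection_parameters(url):
    """Return (secure, host, port, resource name) for a ws or wss URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError("scheme must be ws or wss")
    secure = parts.scheme == "wss"
    host = parts.hostname or ""
    # Default ports are the ones given in the steps above.
    port = parts.port if parts.port is not None else (815 if secure else 81)
    resource = parts.path or "/"
    if parts.query:
        resource += "?" + parts.query
    return secure, host, port, resource


def build_client_handshake(url, origin):
    """Return the bytes sent by the client before reading the response."""
    _, host, _, resource = derive_connection_parameters(url)
    try:
        ipaddress.ip_address(host)
        host_value = ""          # the host value is only sent for host names
    except ValueError:
        host_value = host
    lines = [
        "GET " + resource + " HTTP/1.1",
        "Upgrade: WebSocket",
        "Connection: Upgrade",
        "Host: " + host_value,
        "Origin: " + origin,
        "",                      # the blank line that ends the request
    ]
    return ("\r\n".join(lines) + "\r\n").encode("ascii")
```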

To fail the Web Socket connection, the user agent must close the Web Socket connection, and may report the problem to the user (which would be especially useful for developers). However, user agents must not convey the failure information to the script in a way distinguishable from the Web Socket being closed normally.
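
The response-reading steps of the handshake (reading the fixed 85-byte prefix, parsing the headers and checking them) can be sketched in Python as follows. This is a non-normative illustration: the names are hypothetical, failing the Web Socket connection is modelled as raising HandshakeFailure, the input is a byte string rather than a live connection, the origin check is simplified to string equality, and cookie handling is omitted.

```python
EXPECTED_PREFIX = (b"HTTP/1.1 101 Web Socket Protocol Handshake\r\n"
                   b"Upgrade: WebSocket\r\n"
                   b"Connection: Upgrade\r\n")


class HandshakeFailure(Exception):
    """Stand-in for failing the Web Socket connection."""


def check_headers(headers, origin, location):
    names = [name for name, _ in headers]
    if (names.count("websocket-origin") != 1
            or names.count("websocket-location") != 1 or "" in names):
        raise HandshakeFailure("missing, repeated or empty header name")
    for name, value in headers:
        if name == "websocket-origin" and value != origin:
            raise HandshakeFailure("origin mismatch")
        if name == "websocket-location" and value != location:
            raise HandshakeFailure("location mismatch")
    return headers


def parse_handshake_response(data, origin, location):
    if data[:85] != EXPECTED_PREFIX:
        raise HandshakeFailure("unexpected first 85 bytes")
    stream = iter(data[85:])

    def read_byte():
        try:
            return next(stream)
        except StopIteration:
            raise HandshakeFailure("connection closed early") from None

    headers = []
    while True:
        name, value = bytearray(), bytearray()
        b = read_byte()
        while b != 0x3A:                  # read the name up to the colon
            if b == 0x0D and not name:
                return check_headers(headers, origin, location)
            if b in (0x0D, 0x0A):
                raise HandshakeFailure("stray CR or LF in header name")
            name.append(b + 0x20 if 0x41 <= b <= 0x5A else b)
            b = read_byte()
        b = read_byte()
        if b == 0x20:                     # skip one optional space
            b = read_byte()
        while b != 0x0D:                  # read the value up to the CR
            if b == 0x0A:
                raise HandshakeFailure("stray LF in header value")
            value.append(b)
            b = read_byte()
        if read_byte() != 0x0A:           # the LF of the CRLF pair
            raise HandshakeFailure("CR not followed by LF")
        headers.append((name.decode("utf-8", "replace"),
                        value.decode("utf-8", "replace")))
```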

7.3.4.1.2. Data framing

Once a Web Socket connection is established, the user agent must run through the following state machine for the bytes sent by the server.

  1. Try to read a byte from the server. Let frame type be that byte.

    If no byte could be read because the Web Socket connection is closed, then abort.

  2. Handle the frame type byte as follows:

    If the high-order bit of the frame type byte is set (i.e. if frame type anded with 0x80 returns 0x80)

    Run these steps. If at any point during these steps a read is attempted but fails because the Web Socket connection is closed, then abort.

    1. Let length be zero.

    2. Length: Read a byte, let b be that byte.

    3. Let bv be the integer corresponding to the low 7 bits of b (the value you would get by anding b with 0x7f).

    4. Multiply length by 128, add bv to that result, and store the final result in length.

    5. If the high-order bit of b is set (i.e. if b anded with 0x80 returns 0x80), then return to the step above labeled length.

    6. Read length bytes.

    7. Discard the read bytes.

    If the high-order bit of the frame type byte is not set (i.e. if frame type anded with 0x80 returns 0x00)

    Run these steps. If at any point during these steps a read is attempted but fails because the Web Socket connection is closed, then abort.

    1. Let raw data be an empty byte array.

    2. Data: Read a byte, let b be that byte.

    3. If b is not 0xff, then append b to raw data and return to the previous step (labeled data).

    4. Interpret raw data as a UTF-8 string, and store that string in data.

    5. If frame type is 0x00, fire a read event at the WebSocket object with data data. Otherwise, discard the data.

  3. Return to the first step to read the next byte.
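
The state machine above can be sketched as follows. This is a minimal illustration, not a normative implementation: it assumes the bytes received so far are available as a single Uint8Array, and it collects the decoded strings in an array instead of firing events at a WebSocket object. The function name parseFrames is hypothetical.

```javascript
// Sketch of the client-side framing state machine over a complete buffer.
// parseFrames is an illustrative name; it returns the text of every 0x00
// frame instead of firing events.
function parseFrames(bytes) {
  const messages = [];
  let i = 0;
  while (i < bytes.length) {
    const frameType = bytes[i++];
    if (frameType & 0x80) {
      // High-order bit set: read a length, 7 bits per byte, most significant first.
      let length = 0;
      let b;
      do {
        if (i >= bytes.length) return messages; // connection closed mid-frame
        b = bytes[i++];
        length = length * 128 + (b & 0x7f);
      } while (b & 0x80);
      if (i + length > bytes.length) return messages; // payload incomplete
      i += length; // read and discard the payload
    } else {
      // High-order bit not set: collect bytes up to the 0xff terminator.
      const start = i;
      while (i < bytes.length && bytes[i] !== 0xff) i++;
      if (i >= bytes.length) return messages; // no terminator before close
      const data = new TextDecoder('utf-8').decode(bytes.subarray(start, i));
      i++; // consume the 0xff byte
      if (frameType === 0x00) messages.push(data); // other frame types are discarded
    }
  }
  return messages;
}
```

For instance, a length byte sequence of 0x81 0x00 yields a length of 1 × 128 + 0 = 128, so the 128 bytes that follow are skipped.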

If the user agent is faced with content that is too large to be handled appropriately, then it must fail the Web Socket connection.


Once a Web Socket connection is established, the user agent must use the following steps to send data using the Web Socket:

  1. Send a 0x00 byte to the server.

  2. Encode data using UTF-8 and send the resulting byte stream to the server.

  3. Send a 0xff byte to the server.
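
As an illustration of the three sending steps, the following sketch builds the complete frame as one byte array. The function name encodeTextFrame is hypothetical, and actually writing the bytes to the connection is left out.

```javascript
// Sketch: wrap a string in a 0x00 ... 0xff text frame.
// encodeTextFrame is an illustrative name, not part of any API.
function encodeTextFrame(data) {
  const payload = new TextEncoder().encode(data); // UTF-8 bytes
  const frame = new Uint8Array(payload.length + 2);
  frame[0] = 0x00;                 // start of frame
  frame.set(payload, 1);           // UTF-8 encoded data
  frame[frame.length - 1] = 0xff;  // end of frame
  return frame;
}
```

The 0xff terminator is unambiguous because the byte 0xff never occurs in valid UTF-8.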

7.3.4.2. Server-side requirements

This section only applies to servers.

7.3.4.2.1. Minimal handshake

This section describes the minimal requirements for a server-side implementation of Web Sockets.

Listen on a port for TCP/IP. Upon receiving a connection request, open a connection and send the following bytes back to the client:

48 54 54 50 2f 31 2e 31  20 31 30 31 20 57 65 62
20 53 6f 63 6b 65 74 20  50 72 6f 74 6f 63 6f 6c
20 48 61 6e 64 73 68 61  6b 65 0d 0a 55 70 67 72
61 64 65 3a 20 57 65 62  53 6f 63 6b 65 74 0d 0a
43 6f 6e 6e 65 63 74 69  6f 6e 3a 20 55 70 67 72
61 64 65 0d 0a

Send the string "WebSocket-Origin" followed by a U+003A COLON (":") followed by the ASCII serialization of the origin from which the server is willing to accept connections, followed by a CRLF pair (0x0d 0x0a).

For instance:

WebSocket-Origin: http://example.com

Send the string "WebSocket-Location" followed by a U+003A COLON (":") followed by the URL of the Web Socket script, followed by a CRLF pair (0x0d 0x0a).

For instance:

WebSocket-Location: ws://example.com:80/demo

Send another CRLF pair (0x0d 0x0a).

Read (and discard) data from the client until four bytes 0x0d 0x0a 0x0d 0x0a are read.

If the connection isn't dropped at this point, go to the data framing section.
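
The bytes a server sends in this minimal handshake can be sketched as a single string. The function name buildServerHandshake and its parameters are hypothetical. The sketch also assumes a single space after each colon, following the "For instance" lines above.

```javascript
// Sketch: assemble the server's side of the minimal handshake.
// buildServerHandshake, origin and location are illustrative names.
function buildServerHandshake(origin, location) {
  return (
    // The fixed 85-byte preamble given in hexadecimal above.
    'HTTP/1.1 101 Web Socket Protocol Handshake\r\n' +
    'Upgrade: WebSocket\r\n' +
    'Connection: Upgrade\r\n' +
    // The two fields that depend on the deployment.
    'WebSocket-Origin: ' + origin + '\r\n' +
    'WebSocket-Location: ' + location + '\r\n' +
    // The final CRLF pair that ends the handshake.
    '\r\n'
  );
}
```

A server might call buildServerHandshake('http://example.com', 'ws://example.com:80/demo') and write the result to the connection before reading and discarding the client's handshake.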

7.3.4.2.2. Handshake details

The previous section ignores the data that is transmitted by the client during the handshake.

The data sent by the client consists of a number of fields separated by CR LF pairs (bytes 0x0d 0x0a).

The first field consists of three tokens separated by space characters (byte 0x20). The middle token is the path being opened. If the server supports multiple paths, then the server should echo the value of this field in the initial handshake, as part of the URL given on the WebSocket-Location line (after the appropriate scheme and host).

The remaining fields consist of name-value pairs, with the name part separated from the value part by a colon and a space (bytes 0x3a 0x20). Of these, several are interesting:

Host (bytes 48 6f 73 74)

The value gives the hostname that the client intended to use when opening the Web Socket. It would be of interest in particular to virtual hosting environments, where one server might serve multiple hosts, and might therefore want to return different data.

The right host has to be output as part of the URL given on the WebSocket-Location line of the handshake described above, to verify that the server knows that it is really representing that host.

Origin (bytes 4f 72 69 67 69 6e)

The value gives the scheme, hostname, and port (if it's not the default port for the given scheme) of the page that asked the client to open the Web Socket. It would be interesting if the server's operator had deals with operators of other sites, since the server could then decide how to respond (or indeed, whether to respond) based on which site was requesting a connection.

If the server supports connections from more than one origin, then the server should echo the value of this field in the initial handshake, on the WebSocket-Origin line.

Other fields

Other fields can be used, such as "Cookie" or "Authorization", for authentication purposes.
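
A server that wishes to inspect these fields could split the client's handshake as sketched below. The function name parseClientHandshake and the shape of its return value are hypothetical. The sketch assumes that the whole handshake has already been read into a string.

```javascript
// Sketch: split the client's handshake into its path and name-value fields.
// parseClientHandshake is an illustrative name.
function parseClientHandshake(text) {
  const fields = text.split('\r\n');
  // The first field is three space-separated tokens; the middle one is the path.
  const path = fields[0].split(' ')[1];
  const headers = {};
  for (const field of fields.slice(1)) {
    if (field === '') break; // the empty field marks the end of the handshake
    const sep = field.indexOf(': ');
    if (sep === -1) continue; // not a name-value pair; ignore it
    headers[field.slice(0, sep)] = field.slice(sep + 2);
  }
  return { path: path, headers: headers };
}
```

With the result, a server supporting several hosts and paths could form the WebSocket-Location value as 'ws://' + headers.Host + path and echo headers.Origin on the WebSocket-Origin line.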

7.3.4.2.3. Data framing

This section only describes how to handle content that this specification allows user agents to send (text). Unlike the requirements on user agents defined earlier, which handle any content including possible future extensions to the protocol, it does not handle arbitrary content.

The server should run through the following steps to process the bytes sent by the client:

  1. Read a byte from the client. Assuming everything is going according to plan, it will be a 0x00 byte. Behaviour for the server is undefined if the byte is not 0x00.

  2. Let raw data be an empty byte array.

  3. Data: Read a byte, let b be that byte.

  4. If b is not 0xff, then append b to raw data and return to the previous step (labeled data).

  5. Interpret raw data as a UTF-8 string, and apply whatever server-specific processing should occur for the resulting string.

  6. Return to the first step to read the next byte.
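
Because bytes can arrive from the client in arbitrary chunks, a server might implement these steps incrementally, as sketched below. The class name TextFrameDecoder and its callback are hypothetical. Since behaviour is undefined when a frame does not start with 0x00, throwing an error in that case is only one possible choice.

```javascript
// Sketch: incremental decoder for 0x00 ... 0xff text frames.
// TextFrameDecoder and onMessage are illustrative names.
class TextFrameDecoder {
  constructor(onMessage) {
    this.onMessage = onMessage;
    this.raw = null; // null means "waiting for the 0x00 frame-start byte"
    this.decoder = new TextDecoder('utf-8');
  }

  // Feed the next chunk of bytes (an array or Uint8Array) from the client.
  push(chunk) {
    for (const b of chunk) {
      if (this.raw === null) {
        // Assumption: reject anything other than 0x00 (the spec leaves this undefined).
        if (b !== 0x00) throw new Error('unexpected frame type byte: ' + b);
        this.raw = [];
      } else if (b === 0xff) {
        // End of frame: interpret the collected bytes as UTF-8 and hand them over.
        this.onMessage(this.decoder.decode(Uint8Array.from(this.raw)));
        this.raw = null;
      } else {
        this.raw.push(b);
      }
    }
  }
}
```

A frame that is split across two chunks is delivered once, when its 0xff byte arrives.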


The server should run through the following steps to send strings to the client:

  1. Send a 0x00 byte to the client to indicate the start of a string.

  2. Encode data using UTF-8 and send the resulting byte stream to the client.

  3. Send a 0xff byte to the client to indicate the end of the message.

7.3.4.3. Closing the connection

To close the Web Socket connection, either the user agent or the server closes the TCP/IP connection. There is no closing handshake. Whether the user agent or the server closes the connection, it is said that the Web Socket connection is closed.

Servers may close the Web Socket connection whenever desired.

User agents should not close the Web Socket connection arbitrarily.

When the Web Socket connection is closed, the readyState attribute's value must be changed to CLOSED (2), and the user agent must fire a simple event named close at the WebSocket object.

7.4 Cross-document messaging

Web browsers, for security and privacy reasons, prevent documents in different domains from affecting each other; that is, cross-site scripting is disallowed.

While this is an important security feature, it prevents pages from different domains from communicating even when those pages are not hostile. This section introduces a messaging system that allows documents to communicate with each other regardless of their source domain, in a way designed to not enable cross-site scripting attacks.

7.4.1 Processing model

When a script invokes the postMessage(message, targetOrigin) method on a Window object, the user agent must follow these steps:

  1. If the value of the targetOrigin argument is not a single U+002A ASTERISK character ("*"), and parsing it as a URL fails, then throw a SYNTAX_ERR exception and abort the overall set of steps.

  2. Return from the postMessage() method, but asynchronously continue running these steps.

  3. Wait for the Window object on which the method was invoked to have finished executing any pending scripts.

  4. If the targetOrigin argument has a value other than a single literal U+002A ASTERISK character ("*"), and the active document of the browsing context of the Window object on which the method was invoked does not have the same origin as targetOrigin, then abort these steps silently.

  5. Create an event that uses the MessageEvent interface, with the event name message, which does not bubble, is cancelable, and has no default action. The data attribute must be set to the value passed as the message argument to the postMessage() method, the origin attribute must be set to the Unicode serialization of the origin of the script that invoked the method, the lastEventId attribute must be set to the empty string, and the source attribute must be set to the Window object of the default view of the browsing context for which the Document object with which the script is associated is the active document.

  6. Dispatch the event created in the previous step at the Window object on which the method was invoked.
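
The targetOrigin checks in steps 1 and 4 can be sketched as a single function. This is only an approximation of the processing model: the function name shouldDeliver is hypothetical, the recipient is represented by its URL, origins are compared using the origin property of the URL interface, and a SyntaxError stands in for the SYNTAX_ERR exception.

```javascript
// Sketch: decide whether a message would be delivered to a recipient.
// shouldDeliver and recipientUrl are illustrative names.
function shouldDeliver(targetOrigin, recipientUrl) {
  // The wildcard matches any recipient.
  if (targetOrigin === '*') return true;
  let target;
  try {
    target = new URL(targetOrigin).origin;
  } catch (e) {
    // Step 1: a targetOrigin that cannot be parsed is an error for the caller.
    throw new SyntaxError('targetOrigin is not a valid URL');
  }
  // Step 4: a mismatch is not an error; the message is silently dropped.
  return target === new URL(recipientUrl).origin;
}
```

For instance, shouldDeliver('http://b.example.org/', 'https://b.example.org/page') is false because the schemes differ, so the sender learns nothing about the recipient.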

Authors should check the origin attribute to ensure that messages are only accepted from domains that they expect to receive messages from. Otherwise, bugs in the author's message handling code could be exploited by hostile sites.

Authors should not use the wildcard keyword ("*") in the targetOrigin argument in messages that contain any confidential information, as otherwise there is no way to guarantee that the message is only delivered to the recipient for which it was intended.

For example, if document A contains an object element that contains document B, and script in document A calls postMessage() on the Window object of document B, then a message event will be fired on that Window object, marked as originating from document A. The script in document A might look like:

var o = document.getElementsByTagName('object')[0];
o.contentWindow.postMessage('Hello world', 'http://b.example.org/');

To register an event handler for incoming events, the script would use addEventListener() (or similar mechanisms). For example, the script in document B might look like:

window.addEventListener('message', receiver, false);
function receiver(e) {
  if (e.origin == 'http://example.com') {
    if (e.data == 'Hello world') {
      e.source.postMessage('Hello', e.origin);
    } else {
      alert(e.data);
    }
  }
}

This script first checks that the domain is the expected domain, and then looks at the message, which it either displays to the user, or responds to by sending a message back to the document which sent the message in the first place.

The integrity of this API is based on the inability of scripts of one origin to post arbitrary events (using dispatchEvent() or otherwise) to objects in other origins.

Implementors are urged to take extra care in the implementation of this feature. It allows authors to transmit information from one domain to another domain, which is normally disallowed for security reasons. It also requires that UAs be careful to allow access to certain properties but not others.