This section is non-normative.
An introduction to marking up a document.
Every XML and HTML document in an HTML UA is represented by a
Document object. [DOM3CORE]
Document objects are assumed to be XML documents unless they are flagged as being HTML documents when they are created. Whether a document is
an HTML document or an XML document affects the
behavior of certain APIs, as well as a few CSS rendering rules. [CSS21]
A Document object created by the createDocument() API on the DOMImplementation
object is initially an XML
document, but can be made into an HTML document by calling document.open() on it.
All Document objects (in user agents implementing this
specification) must also implement the HTMLDocument interface, available using
binding-specific methods. (This is the case whether or not the document in
question is an HTML document
or indeed whether it contains any HTML
elements at all.) Document objects must also implement
the document-level interface of any other namespaces found in the document
that the UA supports. For example, if an HTML implementation also supports
SVG, then the Document object must implement HTMLDocument and SVGDocument.
Because the HTMLDocument interface is now obtained
using binding-specific casting methods instead of simply being the primary
interface of the document object, it is no longer defined as inheriting
from Document.
interface HTMLDocument {
// resource metadata management
[PutForwards=href] readonly attribute Location location;
readonly attribute DOMString URL;
attribute DOMString domain;
readonly attribute DOMString referrer;
attribute DOMString cookie;
readonly attribute DOMString lastModified;
readonly attribute DOMString compatMode;
attribute DOMString charset;
readonly attribute DOMString characterSet;
readonly attribute DOMString defaultCharset;
readonly attribute DOMString readyState;
// DOM tree accessors
attribute DOMString title;
attribute DOMString dir;
attribute HTMLElement body;
readonly attribute HTMLCollection images;
readonly attribute HTMLCollection embeds;
readonly attribute HTMLCollection plugins;
readonly attribute HTMLCollection links;
readonly attribute HTMLCollection forms;
readonly attribute HTMLCollection anchors;
readonly attribute HTMLCollection scripts;
NodeList getElementsByName(in DOMString elementName);
NodeList getElementsByClassName(in DOMString classNames);
// dynamic markup insertion
attribute DOMString innerHTML;
HTMLDocument open();
HTMLDocument open(in DOMString type);
HTMLDocument open(in DOMString type, in DOMString replace);
Window open(in DOMString url, in DOMString name, in DOMString features);
Window open(in DOMString url, in DOMString name, in DOMString features, in boolean replace);
void close();
void write(in DOMString text);
void writeln(in DOMString text);
// user interaction
Selection getSelection();
readonly attribute Element activeElement;
boolean hasFocus();
attribute boolean designMode;
boolean execCommand(in DOMString commandId);
boolean execCommand(in DOMString commandId, in boolean showUI);
boolean execCommand(in DOMString commandId, in boolean showUI, in DOMString value);
boolean queryCommandEnabled(in DOMString commandId);
boolean queryCommandIndeterm(in DOMString commandId);
boolean queryCommandState(in DOMString commandId);
boolean queryCommandSupported(in DOMString commandId);
DOMString queryCommandValue(in DOMString commandId);
readonly attribute HTMLCollection commands;
};
Since the HTMLDocument
interface holds methods and attributes related to a number of disparate
features, the members of this interface are described in various different
sections.
User agents must raise a security exception
whenever any of the members of an HTMLDocument object are accessed by
scripts whose effective script origin is not the
same as the
Document's effective script origin.
The URL attribute
must return the document's address.
The referrer attribute must
return either the address of the active document of the source
browsing context at the time the navigation was started (that is, the
page which navigated the browsing context to the current document), or the
empty string if there is no such originating page, or if the UA has been
configured not to report referrers in this case, or if the navigation was
initiated for a hyperlink with a noreferrer keyword.
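The conditions above can be sketched as a small decision function. This is only an illustration: the function name and the fields of the navigation record (sourceAddress, reportReferrers, noreferrer) are hypothetical and are not defined by this specification.

```javascript
// Sketch of the referrer rules. All names here are illustrative assumptions.
// navigation.sourceAddress: address of the source browsing context's active
//   document when the navigation started, or null if there is none.
// navigation.reportReferrers: false if the UA is configured not to report
//   a referrer in this case.
// navigation.noreferrer: true if the hyperlink carried the noreferrer keyword.
function referrerFor(navigation) {
  const {
    sourceAddress = null,
    reportReferrers = true,
    noreferrer = false,
  } = navigation;
  if (sourceAddress === null || !reportReferrers || noreferrer) {
    return ""; // no originating page, or reporting is suppressed
  }
  return sourceAddress;
}
```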
In the case of HTTP, the referrer DOM attribute will match the Referer (sic) header that was sent when fetching the
current page.
Typically user agents are configured to not report referrers
in the case where the referrer uses an encrypted protocol and the current
page does not (e.g. when navigating from an https:
page to an http: page).
The cookie
attribute represents the cookies of the resource.
On getting, if the sandboxed
origin browsing context flag is set on the browsing context of the document, the user agent
must raise a security exception. Otherwise, it
must return the same string as the value of the Cookie HTTP header it would include if fetching the
resource indicated by the document's
address over HTTP, as per RFC 2109 section 4.3.4
or later specifications. [RFC2109] [RFC2965]
On setting, if the sandboxed origin browsing
context flag is set on the browsing context
of the document, the user agent must raise a security
exception. Otherwise, the user agent must act as it would when
processing cookies if it had just attempted to fetch the document's
address over HTTP, and had received a response
with a Set-Cookie header whose value was the specified value,
as per RFC 2109 sections 4.3.1, 4.3.2, and 4.3.3 or later specifications.
[RFC2109] [RFC2965]
Since the cookie attribute is accessible across frames,
the path restrictions on cookies are only a tool to help manage which
cookies are sent to which parts of the site, and are not in any way a
security feature.
The lastModified attribute,
on getting, must return the date and time of the Document's
source file's last modification, in the user's local timezone, in the
following format:
All the numeric components above, other than the year, must be given as two digits in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE representing the number in base ten, zero-padded if necessary.
The Document's source file's last modification date and
time must be derived from relevant features of the networking protocols
used, e.g. from the value of the HTTP Last-Modified
header of the document, or from metadata in the file system for local
files. If the last modification date and time are not known, the attribute
must return the string 01/01/1970 00:00:00.
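As a rough sketch of how such a string could be produced, the function below formats a modification time in local time. It assumes the components are ordered month/day/year hours:minutes:seconds, which the fallback string 01/01/1970 00:00:00 does not by itself disambiguate; the function and helper names are illustrative only.

```javascript
// Zero-pad a number to two base-ten digits.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// `modified` is a Date, or null when the last modification time is unknown.
// Assumption: components are ordered month/day/year hours:minutes:seconds.
function formatLastModified(modified) {
  if (modified === null) {
    return "01/01/1970 00:00:00";
  }
  const date = [
    pad2(modified.getMonth() + 1), // getMonth() is zero-based
    pad2(modified.getDate()),
    String(modified.getFullYear()), // the year is not padded to two digits
  ].join("/");
  const time = [
    pad2(modified.getHours()),
    pad2(modified.getMinutes()),
    pad2(modified.getSeconds()),
  ].join(":");
  return date + " " + time;
}
```

The local-time getters of Date are used because the attribute reports the time in the user's local timezone.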
A Document is always set to one of three modes: no quirks mode, the default; quirks
mode, used typically for legacy documents; and limited quirks mode, also known as "almost standards"
mode. The mode is only ever changed from the default by the HTML parser, based on the presence, absence, or value
of the DOCTYPE string.
The compatMode DOM attribute
must return the literal string "CSS1Compat" unless
the document has been set to quirks mode by the HTML parser, in which case it must instead return the
literal string "BackCompat".
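The mapping from the three modes to the two return values can be sketched as follows. The mode names used as strings here ("no-quirks", "quirks", "limited-quirks") are labels chosen for this illustration, not values defined by this specification.

```javascript
// Sketch: only quirks mode yields "BackCompat"; both other modes,
// including limited quirks mode, yield "CSS1Compat".
function compatModeFor(mode) {
  return mode === "quirks" ? "BackCompat" : "CSS1Compat";
}
```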
As far as parsing goes, the quirks I know of are:
Documents have an associated character encoding. When a Document
object is created, the document's character
encoding must be initialized to UTF-16. Various algorithms during page
loading affect this value, as does the charset setter. [IANACHARSET]
The charset DOM attribute must,
on getting, return the preferred MIME name of the document's character encoding. On setting, if the
new value is an IANA-registered alias for a character encoding, the document's character encoding must be set to that
character encoding. (Otherwise, nothing happens.)
The characterSet DOM
attribute must, on getting, return the preferred MIME name of the document's character encoding.
The defaultCharset DOM
attribute must, on getting, return the preferred MIME name of a character
encoding, possibly the user's default encoding, or an encoding associated
with the user's current geographical location, or any arbitrary encoding
name.
Each document has a current document readiness.
When a Document object is created, it must have its current document readiness set to the string
"loading". Various algorithms during page loading affect this value. When
the value is set, the user agent must fire a simple
event called readystatechanged at the
Document object.
The readyState DOM attribute
must, on getting, return the current document
readiness.
The html element of a document is
the document's root element, if there is one and it's an html element, or null otherwise.
The head element of a document is
the first head element that is a child of
the html element, if there is one,
or null otherwise.
The title element of a document is
the first title element in the document
(in tree order), if there is one, or null otherwise.
The title attribute must, on
getting, run the following algorithm:
If the root element is an svg
element in the "http://www.w3.org/2000/svg"
namespace, and the user agent supports SVG, then the getter must return
the value that would have been returned by the DOM attribute of the same
name on the SVGDocument interface.
Otherwise, it must return a concatenation of the data of all the child
text nodes of the title element, in tree order, or
the empty string if the title
element is null.
On setting, the following algorithm must be run:
If the root element is an svg
element in the "http://www.w3.org/2000/svg"
namespace, and the user agent supports SVG, then the setter must defer
to the setter for the DOM attribute of the same name on the
SVGDocument interface. Stop the algorithm here.
If the title element is null
and the head element is null, then
the attribute must do nothing. Stop the algorithm here.
If the title element is null,
then a new title element must be
created and appended to the head
element.
The children of the title
element (if any) must all be removed.
A Text node whose data is the new value being
assigned must be appended to the title
element.
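The getter and setter steps can be sketched over a toy tree of plain objects. This is a simplified illustration, not a DOM implementation: the node shapes and helper names are hypothetical, and the SVG branch of both algorithms is omitted.

```javascript
// Toy node constructors (hypothetical shapes, for illustration only).
function el(localName, children = []) {
  return { type: "element", localName, children };
}
function text(data) {
  return { type: "text", data };
}
function isElement(node, name) {
  return node.type === "element" && node.localName === name;
}

// "The html element": the root element if it is an html element, else null.
function htmlElement(doc) {
  return doc.root && isElement(doc.root, "html") ? doc.root : null;
}

// "The head element": the first head child of the html element, else null.
function headElement(doc) {
  const html = htmlElement(doc);
  if (html === null) return null;
  return html.children.find((c) => isElement(c, "head")) || null;
}

// "The title element": the first title element in tree order, else null.
function titleElement(doc) {
  const stack = doc.root ? [doc.root] : [];
  while (stack.length > 0) {
    const node = stack.pop();
    if (isElement(node, "title")) return node;
    if (node.type === "element") {
      // Push children in reverse so they are visited in document order.
      for (let i = node.children.length - 1; i >= 0; i--) {
        stack.push(node.children[i]);
      }
    }
  }
  return null;
}

// Getter: concatenate the data of the title element's child text nodes.
function getTitle(doc) {
  const title = titleElement(doc);
  if (title === null) return "";
  return title.children
    .filter((c) => c.type === "text")
    .map((c) => c.data)
    .join("");
}

// Setter: follows the non-SVG steps listed above.
function setTitle(doc, value) {
  let title = titleElement(doc);
  const head = headElement(doc);
  if (title === null && head === null) return; // nothing to attach to
  if (title === null) {
    title = el("title");
    head.children.push(title);
  }
  title.children.length = 0; // remove all existing children
  title.children.push(text(value));
}
```

Note that the getter only looks at child text nodes, so text inside element children of the title element does not contribute to the result.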
The title attribute on the HTMLDocument interface should shadow the
attribute of the same name on the SVGDocument interface when
the user agent supports both HTML and SVG.
The body element of a document is the first
child of the html element that is
either a body element or a
frameset element. If there is no such element, it is null. If
the body element is null, then when the specification requires that events
be fired at "the body element", they must instead be fired at the
Document object.
The body
attribute, on getting, must return the body
element of the document (either a body element, a frameset element, or
null). On setting, the following algorithm must be run:
If the new value is not a body or
frameset element, then raise a
HIERARCHY_REQUEST_ERR exception and abort these steps.
Otherwise, if the body element is not null, replace it with the new value
as if the root element's replaceChild() method had been called
with the new value and the
incumbent body element as its two arguments respectively, then abort
these steps.
The images
attribute must return an HTMLCollection rooted at the
Document node, whose filter matches only img elements.
The embeds
attribute must return an HTMLCollection rooted at the
Document node, whose filter matches only embed elements.
The plugins attribute must
return the same object as that returned by the embeds attribute.
The links
attribute must return an HTMLCollection rooted at the
Document node, whose filter matches only a elements with href attributes and area elements with href attributes.
The forms
attribute must return an HTMLCollection rooted at the
Document node, whose filter matches only form
elements.
The anchors attribute must
return an HTMLCollection
rooted at the Document node, whose filter matches only
a elements with name attributes.
The scripts attribute must
return an HTMLCollection
rooted at the Document node, whose filter matches only
script elements.
The getElementsByName(name) method takes a string name, and must return a live NodeList
containing all the a, applet, button, form,
iframe,
img, input, map, meta,
object,
select, and textarea elements in that document
that have a name attribute whose value is
equal to the name
argument.
The getElementsByClassName(classNames) method takes a string that
contains an unordered set of unique space-separated
tokens representing classes. When called, the method must return a
live NodeList object containing all the elements in the
document that have all the classes specified in that argument, having
obtained the classes by splitting a string on spaces. If there are no tokens specified
in the argument, then the method must return an empty
NodeList.
The getElementsByClassName()
method on the HTMLElement
interface must return a live NodeList with the nodes that the
HTMLDocument getElementsByClassName() method
would return when passed the same argument(s), excluding any elements that
are not descendants of the HTMLElement object on which the method was
invoked.
HTML, SVG, and MathML elements define which classes they are in by
having an attribute in the per-element partition with the name class containing a space-separated list of classes to
which the element belongs. Other specifications may also allow elements in
their namespaces to be labeled as being in specific classes. UAs must not
assume that all attributes of the name class for
elements in any namespace work in this way, however, and must not assume
that such attributes, when used as global attributes, label other elements
as being in specific classes.
Given the following XHTML fragment:
<div id="example">
<p id="p1" class="aaa bbb"/>
<p id="p2" class="aaa ccc"/>
<p id="p3" class="bbb ccc"/>
</div>
A call to
document.getElementById('example').getElementsByClassName('aaa')
would return a NodeList with the two paragraphs
p1 and p2 in it.
A call to getElementsByClassName('ccc bbb') would
only return one node, however, namely p3. A call to
document.getElementById('example').getElementsByClassName('bbb ccc ')
would return the same thing.
A call to getElementsByClassName('aaa,bbb') would return
no nodes; none of the elements above are in the "aaa,bbb" class.
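The matching rule behind these results can be sketched with plain objects standing in for the three paragraphs. The sketch returns a static array of ids rather than a live NodeList, the helper names are illustrative, and the set of space characters used for splitting (space, tab, LF, FF, CR) is an assumption stated here rather than quoted from this section.

```javascript
// Split a string on runs of space characters, dropping empty tokens.
// Assumption: space characters are U+0020, U+0009, U+000A, U+000C, U+000D.
function splitOnSpaces(s) {
  return s.split(/[ \t\n\f\r]+/).filter((token) => token.length > 0);
}

// True if the element's class attribute contains every requested class.
// An argument with no tokens matches nothing.
function hasAllClasses(classAttribute, classNames) {
  const wanted = splitOnSpaces(classNames);
  if (wanted.length === 0) return false;
  const present = new Set(splitOnSpaces(classAttribute));
  return wanted.every((name) => present.has(name));
}

// Stand-ins for the paragraphs in the fragment above.
const paragraphs = [
  { id: "p1", className: "aaa bbb" },
  { id: "p2", className: "aaa ccc" },
  { id: "p3", className: "bbb ccc" },
];

// Returns the ids of matching elements, in document order.
function idsByClassName(elements, classNames) {
  return elements
    .filter((e) => hasAllClasses(e.className, classNames))
    .map((e) => e.id);
}
```

Because the argument is treated as an unordered set, 'ccc bbb' and 'bbb ccc ' select the same element, while 'aaa,bbb' is a single token that no element carries.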
The dir attribute on the HTMLDocument interface is defined along
with the dir content
attribute.
Elements, attributes, and attribute values in HTML are defined (by this
specification) to have certain meanings (semantics). For example, the
ol element represents an ordered list, and
the lang attribute
represents the language of the content.
Authors must not use elements, attributes, and attribute values for purposes other than their appropriate intended semantic purpose.
For example, the following document is non-conforming, despite being syntactically correct:
<!DOCTYPE html>
<html lang="en-GB">
<head> <title> Demonstration </title> </head>
<body>
<table>
<tr> <td> My favourite animal is the cat. </td> </tr>
<tr>
<td>
—<a href="http://example.org/~ernest/"><cite>Ernest</cite></a>,
in an essay from 1992
</td>
</tr>
</table>
</body>
</html>
...because the data placed in the cells is clearly not tabular data
(and the cite element mis-used). A
corrected version of this document might be:
<!DOCTYPE html>
<html lang="en-GB">
<head> <title> Demonstration </title> </head>
<body>
<blockquote>
<p> My favourite animal is the cat. </p>
</blockquote>
<p>
—<a href="http://example.org/~ernest/">Ernest</a>,
in an essay from 1992
</p>
</body>
</html>
This next document fragment, intended to represent the heading of a corporate site, is similarly non-conforming because the second line is not intended to be a heading of a subsection, but merely a subheading or subtitle (a subordinate heading for the same section).
<body>
<h1>ABC Company</h1>
<h2>Leading the way in widget design since 1432</h2>
...
The header element should be used in
these kinds of situations:
<body>
<header>
<h1>ABC Company</h1>
<h2>Leading the way in widget design since 1432</h2>
</header>
...
Through scripting and using other mechanisms, the values of attributes, text, and indeed the entire structure of the document may change dynamically while a user agent is processing it. The semantics of a document at an instant in time are those represented by the state of the document at that instant in time, and the semantics of a document can therefore change over time. User agents must update their presentation of the document as this occurs.
HTML has a progress
element that describes a progress bar. If its "value" attribute is
dynamically updated by a script, the UA would update the rendering to show
the progress changing.
The nodes representing HTML elements in the DOM must implement, and expose to scripts, the interfaces listed for them in the relevant sections of this specification. This includes HTML elements in XML documents, even when those documents are in another context (e.g. inside an XSLT transform).
Elements in the DOM represent things; that is, they have intrinsic meaning, also known as semantics.
For example, an ol element
represents an ordered list.
The basic interface, from which all the HTML
elements' interfaces inherit, and which must be used by elements that
have no additional requirements, is the HTMLElement interface.
interface HTMLElement : Element {
// DOM tree accessors
NodeList getElementsByClassName(in DOMString classNames);
// dynamic markup insertion
attribute DOMString innerHTML;
// metadata attributes
attribute DOMString id;
attribute DOMString title;
attribute DOMString lang;
attribute DOMString dir;
attribute DOMString className;
readonly attribute DOMTokenList classList;
readonly attribute DOMStringMap dataset;
// user interaction
attribute boolean irrelevant;
void click();
void scrollIntoView();
void scrollIntoView(in boolean top);
attribute long tabIndex;
void focus();
void blur();
attribute boolean draggable;
attribute DOMString contentEditable;
readonly attribute DOMString isContentEditable;
attribute HTMLMenuElement contextMenu;
// styling
readonly attribute CSSStyleDeclaration style;
// data templates
attribute DOMString template;
readonly attribute HTMLDataTemplateElement templateElement;
attribute DOMString ref;
readonly attribute Node refNode;
attribute DOMString registrationMark;
readonly attribute DocumentFragment originalContent;
// event handler DOM attributes
attribute EventListener onabort;
attribute EventListener onbeforeunload;
attribute EventListener onblur;
attribute EventListener onchange;
attribute EventListener onclick;
attribute EventListener oncontextmenu;
attribute EventListener ondblclick;
attribute EventListener ondrag;
attribute EventListener ondragend;
attribute EventListener ondragenter;
attribute EventListener ondragleave;
attribute EventListener ondragover;
attribute EventListener ondragstart;
attribute EventListener ondrop;
attribute EventListener onerror;
attribute EventListener onfocus;
attribute EventListener onkeydown;
attribute EventListener onkeypress;
attribute EventListener onkeyup;
attribute EventListener onload;
attribute EventListener onmessage;
attribute EventListener onmousedown;
attribute EventListener onmousemove;
attribute EventListener onmouseout;
attribute EventListener onmouseover;
attribute EventListener onmouseup;
attribute EventListener onmousewheel;
attribute EventListener onresize;
attribute EventListener onscroll;
attribute EventListener onselect;
attribute EventListener onstorage;
attribute EventListener onsubmit;
attribute EventListener onunload;
};
The HTMLElement interface holds
methods and attributes related to a number of disparate features, and the
members of this interface are therefore described in various different
sections of this specification.
The following attributes are common to and may be specified on all HTML elements (even those not defined in this specification):
class
contenteditable
contextmenu
dir
draggable
id
irrelevant
lang
ref
registrationmark
style
tabindex
template
title
In addition, the following event handler content attributes may be specified on any HTML element:
onabort
onbeforeunload
onblur
onchange
onclick
oncontextmenu
ondblclick
ondrag
ondragend
ondragenter
ondragleave
ondragover
ondragstart
ondrop
onerror
onfocus
onkeydown
onkeypress
onkeyup
onload
onmessage
onmousedown
onmousemove
onmouseout
onmouseover
onmouseup
onmousewheel
onresize
onscroll
onselect
onstorage
onsubmit
onunload
Also, custom data
attributes (e.g. data-foldername or data-msgid) can be specified on any HTML
element, to store custom data specific to the page.
In HTML documents, elements in the HTML namespace may have an xmlns attribute specified, if, and only if, it has the
exact value "http://www.w3.org/1999/xhtml". This does not
apply to XML documents.
In HTML, the xmlns attribute has
absolutely no effect. It is basically a talisman. It is allowed merely to
make migration to and from XHTML mildly easier. When parsed by an HTML parser, the attribute ends up in no namespace, not
the "http://www.w3.org/2000/xmlns/" namespace like namespace
declaration attributes in XML do.
In XML, an xmlns attribute is part of
the namespace declaration mechanism, and an element cannot actually have
an xmlns attribute in no namespace specified.
id attribute
The id attribute represents
its element's unique identifier. The value must be unique in the subtree
within which the element finds itself and must contain at least one
character. The value must not contain any space characters.
If the value is not the empty string, user agents must associate the
element with the given value (exactly, including any space characters) for
the purposes of ID matching within the subtree the element finds itself
(e.g. for selectors in CSS or for the getElementById() method
in the DOM).
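The authoring requirements on id values (non-empty, no space characters, unique within the subtree) could be checked along the following lines. This is a sketch of a conformance check only, not of user agent ID matching; the function name, the problem labels, and the exact set of space characters (space, tab, LF, FF, CR) are assumptions made for illustration.

```javascript
// Assumption: space characters are U+0020, U+0009, U+000A, U+000C, U+000D.
const SPACE_CHARACTER = /[ \t\n\f\r]/;

// `ids` lists the id attribute values found in one subtree, in tree order.
// Returns a list of { id, problem } records; an empty list means no problems.
function checkIds(ids) {
  const problems = [];
  const seen = new Set();
  for (const id of ids) {
    if (id.length === 0) {
      problems.push({ id, problem: "empty" });
    } else if (SPACE_CHARACTER.test(id)) {
      problems.push({ id, problem: "contains-space" });
    } else if (seen.has(id)) {
      problems.push({ id, problem: "duplicate" });
    }
    seen.add(id);
  }
  return problems;
}
```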
Identifiers are opaque strings. Particular meanings should not be
derived from the value of the id attribute.
This specification doesn't preclude an element having multiple IDs, if
other mechanisms (e.g. DOM Core methods) can set an element's ID in a way
that doesn't conflict with the id attribute.
The id DOM attribute must reflect the id content attribute.
title attribute
The title attribute
represents advisory information for the element, such as would be
appropriate for a tooltip. On a link, this could be the title or a
description of the target resource; on an image, it could be the image
credit or a description of the image; on a paragraph, it could be a
footnote or commentary on the text; on a citation, it could be further
information about the source; and so forth. The value is text.
If this attribute is omitted from an element, then it implies that the
title attribute of the
nearest ancestor HTML
element with a title attribute set is also relevant to this
element. Setting the attribute overrides this, explicitly stating that the
advisory information of any ancestors is not relevant to this element.
Setting the attribute to the empty string indicates that the element has
no advisory information.
If the title
attribute's value contains U+000A LINE FEED (LF) characters, the content
is split into multiple lines. Each U+000A LINE FEED (LF) character
represents a line break.
Some elements, such as link and
abbr, define additional semantics for the
title attribute beyond
the semantics described above.
The title DOM
attribute must reflect the title content attribute.
lang (HTML only) and xml:lang (XML only) attributes
The lang attribute
specifies the primary language for the element's
contents and for any of the element's attributes that contain text. Its
value must be a valid RFC 3066 language code, or the empty string. [RFC3066]
The xml:lang
attribute is defined in XML. [XML]
If these attributes are omitted from an element, then it implies that the language of this element is the same as the language of the parent element. Setting the attribute to the empty string indicates that the primary language is unknown.
The lang attribute may
be used on elements of HTML documents. Authors must
not use the lang
attribute in XML documents.
The xml:lang
attribute may be used on elements of XML
documents. Authors must not use the xml:lang attribute in HTML
documents.
To determine the language of a node, user agents must look at the
nearest ancestor element (including the element itself if the node is an
element) that has an xml:lang attribute set or is an HTML element and has a
lang attribute set. That
attribute specifies the language of the node.
If both the xml:lang attribute and the lang attribute are set on an
element, user agents must use the xml:lang attribute, and the lang attribute must be ignored for the purposes of determining
the element's language.
If no explicit language is given for the root element, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language. In the absence of any language information, the default value is unknown (the empty string).
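The lookup described above can be sketched as a walk up the ancestor chain. The node shape used here (type, isHTML, attrs, parent) and the function name are hypothetical; protocolLanguage stands for language information from a higher-level protocol, if any.

```javascript
// Sketch of determining the language of a node.
// node.type: "element" or another node type such as "text"
// node.isHTML: true for HTML elements
// node.attrs: object mapping attribute names to values
// node.parent: parent node, or null at the top of the tree
function languageOf(node, protocolLanguage = "") {
  for (let cur = node; cur; cur = cur.parent) {
    if (cur.type !== "element") continue;
    const attrs = cur.attrs || {};
    // xml:lang wins over lang when both are set on the same element.
    if ("xml:lang" in attrs) return attrs["xml:lang"];
    // lang is only honored on HTML elements.
    if (cur.isHTML && "lang" in attrs) return attrs.lang;
  }
  // No explicit language anywhere: fall back to the protocol-level
  // language, or the empty string (unknown).
  return protocolLanguage;
}
```

An element that sets lang to the empty string stops the walk and yields the empty string, so an explicitly unknown language is not overridden by the protocol-level fallback.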
User agents may use the element's language to determine proper processing or rendering (e.g. in the selection of appropriate fonts or pronunciations, or for dictionary selection).
The lang DOM attribute
must reflect the lang content attribute.
xml:base attribute (XML only)
The xml:base
attribute is defined in XML Base. [XMLBASE]
The xml:base
attribute may be used on elements of XML
documents. Authors must not use the xml:base attribute in HTML
documents.
dir attribute
The dir attribute
specifies the element's text directionality. The attribute is an enumerated attribute with the keyword ltr mapping to the state ltr, and the keyword
rtl mapping to the state rtl. The attribute
has no defaults.
If the attribute has the state ltr, the element's directionality is left-to-right. If the attribute has the state rtl, the element's directionality is right-to-left. Otherwise, the element's directionality is the same as its parent element, or ltr if there is no parent element.
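A minimal sketch of this resolution follows, using plain objects with a dir value and a parent link. The object shape and function name are hypothetical, and the case-insensitive keyword comparison is an assumption about enumerated attributes that is not stated in this paragraph.

```javascript
// Sketch: resolve an element's directionality from its dir attribute,
// falling back to the parent element, then to "ltr" at the top.
// element.dir: the dir attribute value, or undefined if absent
// element.parent: the parent element, or null
function directionality(element) {
  // Assumption: keywords are matched case-insensitively.
  const value = (element.dir || "").toLowerCase();
  if (value === "ltr" || value === "rtl") {
    return value;
  }
  // Missing or unrecognized value: inherit, defaulting to left-to-right.
  return element.parent ? directionality(element.parent) : "ltr";
}
```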
The processing of this attribute depends on the presentation layer. For example, CSS 2.1 defines a mapping from this attribute to the CSS 'direction' and 'unicode-bidi' properties, and defines rendering in terms of those properties.
The dir DOM attribute on
an element must reflect the dir content attribute of that element, limited to only known values.
The dir DOM
attribute on HTMLDocument objects
must reflect the dir content attribute of the
html element, if any, limited to only
known values. If there is no such element, then the attribute must
return the empty string and do nothing on setting.
class attribute
Every HTML element
may have a class
attribute specified.
The attribute, if specified, must have a value that is an unordered set of unique space-separated tokens representing the various classes that the element belongs to.
The classes that an HTML
element has assigned to it consist of all the classes returned when
the value of the class
attribute is split on
spaces.
Assigning classes to an element affects class matching in
selectors in CSS, the getElementsByClassName() method
in the DOM, and other such features.
Authors may use any value in the class attribute, but are encouraged to use
values that describe the nature of the content, rather than values that
describe the desired presentation of the content.
The className
and classList DOM
attributes must both reflect the class content attribute.
style attribute
All elements may have the style content attribute set. If specified, the
attribute must contain only a list of zero or more semicolon-separated (;)
CSS declarations. [CSS21]
The attribute, if specified, must be parsed and treated as the body (the part inside the curly brackets) of a declaration block in a rule whose selector matches just the element on which the attribute is set. For the purposes of the CSS cascade, the attribute must be considered to be a 'style' attribute at the author level.
Documents that use style attributes on any of their elements must
still be comprehensible and usable if those attributes were removed.
In particular, using the style attribute to hide and show content, or to
convey meaning that is otherwise not included in the document, is
non-conforming.
The style DOM
attribute must return a CSSStyleDeclaration whose value
represents the declarations specified in the attribute, if present.
Mutating the CSSStyleDeclaration object must create a style attribute on the element (if there
isn't one already) and then change its value to be a value representing
the serialized form of the CSSStyleDeclaration object. [CSSOM]
In the following example, the words that refer to colors are marked up
using the span element and the style attribute to make
those words show up in the relevant colors in visual media.
<p>My sweat suit is <span style="color: green; background: transparent">green</span> and my eyes are <span style="color: blue; background: transparent">blue</span>.</p>
A custom data attribute is an attribute whose name
starts with the string "data-", has at least one character
after the hyphen, is XML-compatible, and has
no namespace.
Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements.
Every HTML element may have any number of custom data attributes specified, with any value.
The dataset DOM
attribute provides convenient accessors for all the data-* attributes on an
element. On getting, the dataset DOM attribute must return a DOMStringMap object, associated with the
following three algorithms, which expose these attributes on their
element:
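Getting a value from a name: let name be the concatenation of the string
data- and the name passed to the algorithm.
Setting a name to a value: let name be the concatenation of the string
data- and the name passed to the algorithm. If
setAttribute() would have raised an exception when
setting an attribute with the name name, then this
must raise the same exception.
Deleting a name: let name be the concatenation of the string
data- and the name passed to the algorithm.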
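One way to picture the three algorithms is as thin wrappers that prefix the key with data- and then operate on the element's attributes. The sketch below uses a Map as a stand-in for an element's attribute list; the element shape, the helper names, and the name check inside the stand-in setAttribute() are all hypothetical approximations.

```javascript
// Hypothetical stand-in for an element: attributes are kept in a Map.
function makeElement() {
  return { attributes: new Map() };
}

// Rough, illustrative approximation of the name validation a real
// setAttribute() performs; not the actual DOM rule.
const NAME_PATTERN = /^[A-Za-z_:][A-Za-z0-9_:.-]*$/;

function setAttribute(element, name, value) {
  if (!NAME_PATTERN.test(name)) {
    throw new Error("INVALID_CHARACTER_ERR: " + name);
  }
  element.attributes.set(name, String(value));
}

// Getting: look up the attribute named "data-" + key.
function datasetGet(element, key) {
  const name = "data-" + key;
  return element.attributes.has(name)
    ? element.attributes.get(name)
    : undefined;
}

// Setting: any exception raised by setAttribute() propagates unchanged.
function datasetSet(element, key, value) {
  setAttribute(element, "data-" + key, value);
}

// Deleting: remove the attribute named "data-" + key, if present.
function datasetDelete(element, key) {
  element.attributes.delete("data-" + key);
}
```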
If a Web page wanted an element to represent a space ship, e.g. as part
of a game, it would have to use the class
attribute along with data-* attributes:
<div class="spaceship" data-id="92432"
data-weapons="laser 2" data-shields="50%"
data-x="30" data-y="10" data-z="90">
<button class="fire"
onclick="spaceships[this.parentNode.dataset.id].fire()">
Fire
</button>
</div>
Authors should carefully design such extensions so that when the attributes are ignored and any associated CSS dropped, the page is still usable.
User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values.
All the elements in this specification have a defined content model, which describes what nodes are allowed inside the elements, and thus what the structure of an HTML document or fragment must look like.
As noted in the conformance and terminology sections, for the
purposes of determining if an element matches its content model or not, CDATASection nodes in the
DOM are treated as equivalent to Text nodes, and entity reference nodes are treated as if they
were expanded in place.
The space characters are always allowed between elements. User agents represent these characters between elements in the source markup as text nodes in the DOM. Empty text nodes and text nodes consisting of just sequences of those characters are considered inter-element whitespace.
Inter-element whitespace, comment nodes, and processing instruction nodes must be ignored when establishing whether an element matches its content model or not, and must be ignored when following algorithms that define document and element semantics.
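The filtering step can be sketched as follows: before checking a content model, drop the child nodes that must be ignored. The node shapes and type labels are hypothetical, and the set of space characters (space, tab, LF, FF, CR) is an assumption not listed in this section.

```javascript
// Assumption: space characters are U+0020, U+0009, U+000A, U+000C, U+000D.
// The pattern also matches the empty string, covering empty text nodes.
const ONLY_SPACE_CHARACTERS = /^[ \t\n\f\r]*$/;

function isInterElementWhitespace(node) {
  return node.type === "text" && ONLY_SPACE_CHARACTERS.test(node.data);
}

// Returns the children that count when matching a content model:
// everything except inter-element whitespace, comments, and
// processing instructions.
function significantChildren(element) {
  return element.children.filter(
    (node) =>
      !isInterElementWhitespace(node) &&
      node.type !== "comment" &&
      node.type !== "processing-instruction"
  );
}
```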
An element A is said to be preceded or followed by a second element B if A and B have the same parent node and there are no other element nodes or text nodes (other than inter-element whitespace) between them.
Authors must not use elements in the HTML namespace anywhere except where they are explicitly allowed, as defined for each element, or as explicitly required by other specifications. For XML compound documents, these contexts could be inside elements from other namespaces, if those elements are defined as providing the relevant contexts.
The SVG specification defines the SVG foreignObject
element as allowing foreign namespaces to be included, thus allowing
compound documents to be created by inserting subdocument content under
that element. This specification defines the XHTML html element as being allowed where subdocument
fragments are allowed in a compound document. Together, these two
definitions mean that placing an XHTML html element as a child of an SVG
foreignObject element is conforming. [SVG]
The Atom specification defines the Atom content
element, when its type attribute has the value
xhtml, as requiring that it contains a single HTML
div element. Thus, a div element is allowed in that context, even
though this is not explicitly normatively stated by this specification.
[ATOM]
In addition, elements in the HTML namespace may be orphan nodes (i.e. without a parent node).
For example, creating a td element and
storing it in a global variable in a script is conforming, even though
td elements are otherwise only supposed to
be used inside tr elements.
var data = {
name: "Banana",
cell: document.createElement('td'),
};
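Each element in HTML falls into zero or more categories that group elements with similar characteristics together. The following categories are used in this specification:
Metadata content
Flow content
Sectioning content
Heading content
Phrasing content
Embedded content
Interactive content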
Some elements have unique requirements and do not fit into any particular category.
Metadata content is content that sets up the presentation or behavior of the rest of the content, or that sets up the relationship of the document with other documents, or that conveys other "out of band" information.
Elements from other namespaces whose semantics are primarily metadata-related (e.g. RDF) are also metadata content.
Most elements that are used in the body of documents and applications are categorized as flow content.
As a general rule, elements whose content model allows any flow content should have either at least one
descendant text node that is not inter-element
whitespace, or at least one descendant element node that is embedded content. For the purposes of this
requirement, del elements and their
descendants must not be counted as contributing to the ancestors of the
del element.
This requirement is not a hard requirement, however, as there are many cases where an element can be empty legitimately, for example when it is used as a placeholder which will later be filled in by a script, or when the element is part of a template and would on most pages be filled in but on some pages is not relevant.
Sectioning content is content that defines the scope of headers, footers, and contact information.
Each sectioning content element potentially has a heading. See the section on headings and sections for further details.
Heading content defines the header of a section (whether explicitly marked up using sectioning content elements, or implied by the heading content itself).
Phrasing content is the text of the document, as well as elements that mark up that text at the intra-paragraph level. Runs of phrasing content form paragraphs.
All phrasing content is also flow content. Any content model that expects flow content also expects phrasing content.
As a general rule, elements whose content model allows any phrasing content should have either at least one
descendant text node that is not inter-element
whitespace, or at least one descendant element node that is embedded content. For the purposes of this
requirement, nodes that are descendants of del elements must not be counted as contributing to
the ancestors of the del element.
Most elements that are categorized as phrasing content can only contain elements that are themselves categorized as phrasing content, not any flow content.
Text nodes that are not inter-element whitespace are phrasing content.
Embedded content is content that imports another resource into the document, or content from another vocabulary that is inserted into the document.
All embedded content is also phrasing content (and flow content). Any content model that expects phrasing content (or flow content) also expects embedded content.
Elements that are from namespaces other than the HTML namespace and that convey content but not metadata are embedded content for the purposes of the content models defined in this specification. (For example, MathML or SVG.)
Some embedded content elements can have fallback content: content that is to be used when the external resource cannot be used (e.g. because it is of an unsupported format). The element definitions state what the fallback is, if any.
Parts of this section should eventually be moved to DOM3 Events.
Interactive content is content that is specifically intended for user interaction.
Certain elements in HTML can be activated, for instance a elements, button elements, or
input elements when their type attribute is set
to radio. Activation of those elements can happen in various
(UA-defined) ways, for instance via the mouse or keyboard.
When activation is performed via some method other than clicking the
pointing device, the default action of the event that triggers the
activation must, instead of activating the element directly, be to
fire a click event on the same
element.
The default action of this click event,
or of the real click event if the element
was activated by clicking a pointing device, must be to fire a further DOMActivate event at the same
element, whose own default action is to go through all the elements the
DOMActivate event bubbled through
(starting at the target node and going towards the Document
node), looking for an element with an activation
behavior; the first element, in reverse tree order, to have one, must
have its activation behavior executed.
The above doesn't happen for arbitrary synthetic events
dispatched by author script. However, the click() method can be used to make it happen
programmatically.
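The search for an activation behavior described above can be sketched as follows. This is a toy model that runs without a DOM, not the event API itself: the node shape (`name`, `parent`, `activationBehavior`) and both function names are hypothetical.

```javascript
// Hypothetical node shape: { name, parent, activationBehavior }.
// Walks from the event target towards the Document node and returns the
// first node that has an activation behavior, or null if none does.
function findActivationTarget(target) {
  for (let node = target; node !== null; node = node.parent) {
    if (typeof node.activationBehavior === "function") return node;
  }
  return null;
}

// Models the default action of the DOMActivate event.
function runDomActivateDefaultAction(target) {
  const element = findActivationTarget(target);
  return element ? element.activationBehavior() : undefined;
}

const documentNode = { name: "#document", parent: null };
const link = {
  name: "a",
  parent: documentNode,
  activationBehavior: () => "follow link",
};
const emphasis = { name: "em", parent: link };

console.log(runDomActivateDefaultAction(emphasis)); // follow link
```

Activating the em element runs the behavior of the enclosing a element, because the em element has no activation behavior of its own.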
For certain form controls, this process is complicated further by changes that must happen around the click event. [WF2]
Most interactive elements have content models that disallow nesting interactive elements.
Some elements are described as transparent; they have "transparent" as their content model. Some elements are described as semi-transparent; this means that part of their content model is "transparent" but that is not the only part of the content model that must be satisfied.
When a content model includes a part that is "transparent", that part must not contain content that would not be conformant if all transparent and semi-transparent elements in the tree were replaced, in their parent element, by the children in the "transparent" part of their content model, retaining order.
When a transparent or semi-transparent element has no parent, then the part of its content model that is "transparent" must instead be treated as accepting any flow content.
A paragraph is typically a block of text with one or more sentences that discuss a particular topic, as in typography, but can also be used for more general thematic grouping. For instance, an address is also a paragraph, as is a part of a form, a byline, or a stanza in a poem.
Paragraphs in flow content are defined
relative to what the document looks like without the ins and del
elements complicating matters. Let view be a view of
the DOM that replaces all ins and del elements in the document with their contents.
Then, in view, for each run of phrasing content uninterrupted by other types of
content, in an element that accepts content other than phrasing content, let first be
the first node of the run, and let last be the last
node of the run. For each run, a paragraph exists in the original DOM from
immediately before first to immediately after last. (Paragraphs can thus span across ins and del
elements.)
A paragraph is also formed by p elements.
The p element can be used to
wrap individual paragraphs when there would otherwise not be any content
other than phrasing content to separate the paragraphs from each other.
In the following example, there are two paragraphs in a section. There is also a header, which contains phrasing content that is not a paragraph. Note how the comments and inter-element whitespace do not form paragraphs.
<section>
  <h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in this example.
  <p>This is the second.</p>
  <!-- This is not a paragraph. -->
</section>
The following example takes that markup and puts ins and del
elements around some of the markup to show that the text was changed
(though in this case, the changes don't really make much sense,
admittedly). Notice how this example has exactly the same paragraphs as
the previous one, despite the ins and
del elements.
<section>
  <ins><h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in</ins> this example<del>.
  <p>This is the second.</p></del>
  <!-- This is not a paragraph. -->
</section>
For HTML documents, and for HTML elements in HTML documents, certain APIs defined in DOM3 Core become case-insensitive or case-changing, as sometimes defined in DOM3 Core, and as summarized or required below. [DOM3CORE]
This does not apply to XML documents or to elements that are not in the HTML namespace despite being in HTML documents.
Element.tagName, Node.nodeName, and Node.localName
These attributes return tag names in all uppercase and attribute names in all lowercase, regardless of the case with which they were created.
Document.createElement()
The canonical form of HTML markup is all-lowercase; thus, this method will lowercase the argument before creating the requisite element. Also, the element created must be in the HTML namespace.
This doesn't apply to Document.createElementNS(). Thus, it is possible, by
passing this last method a tag name in the wrong case, to create an
element that claims to have the tag name of an element defined in this
specification, but doesn't support its interfaces, because it really has
another tag name not accessible from the DOM APIs.
Element.setAttributeNode()
When an Attr node is set on an HTML element, it must have its name
lowercased before the element is affected.
This doesn't apply to Element.setAttributeNodeNS().
Element.setAttribute()
When an attribute is set on an HTML element, the name argument must be lowercased before the element is affected.
This doesn't apply to Element.setAttributeNS().
Document.getElementsByTagName() and Element.getElementsByTagName()
These methods (but not their namespaced counterparts) must compare the given argument case-insensitively when looking at HTML elements, and case-sensitively otherwise.
Thus, in an HTML document with nodes in multiple namespaces, these methods will be both case-sensitive and case-insensitive at the same time.
Document.renameNode()
If the new namespace is the HTML namespace, then the new qualified name must be lowercased before the rename takes place.
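The case rules listed above can be sketched with a toy model that runs without a DOM. The element shape (`namespace`, `localName`) and the three function names are hypothetical, and JavaScript's built-in case conversion is assumed to be an adequate stand-in for the lowercasing and uppercasing the specification intends.

```javascript
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";
const SVG_NAMESPACE = "http://www.w3.org/2000/svg";

// Toy model of Document.createElement() in an HTML document: the argument
// is lowercased and the element is placed in the HTML namespace.
function createHtmlElement(name) {
  return { namespace: HTML_NAMESPACE, localName: name.toLowerCase() };
}

// Toy model of Element.tagName: uppercase for HTML elements only.
function tagName(element) {
  return element.namespace === HTML_NAMESPACE
    ? element.localName.toUpperCase()
    : element.localName;
}

// Toy model of the comparison made by getElementsByTagName():
// case-insensitive for HTML elements, case-sensitive otherwise.
function matchesTagName(element, query) {
  if (element.namespace === HTML_NAMESPACE) {
    return element.localName.toLowerCase() === query.toLowerCase();
  }
  return element.localName === query;
}

const div = createHtmlElement("DIV");
const foreignObject = { namespace: SVG_NAMESPACE, localName: "foreignObject" };

console.log(div.localName);                                  // div
console.log(tagName(div));                                   // DIV
console.log(matchesTagName(div, "Div"));                     // true
console.log(matchesTagName(foreignObject, "foreignobject")); // false
```

The last two lines show how a single lookup can be case-insensitive for one element and case-sensitive for another in the same document.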
The document.write() family of methods and
the innerHTML
family of DOM attributes enable script authors to dynamically insert
markup into the document.
bz argues that innerHTML should be called something else on XML documents and XML elements. Is the sanity worth the migration pain?
Because these APIs interact with the parser, their behavior varies depending on whether they are used with HTML documents (and the HTML parser) or XHTML in XML documents (and the XML parser). The following table cross-references the various versions of these APIs.
| | document.write() | innerHTML |
|---|---|---|
| For documents that are HTML documents | document.write() in HTML | innerHTML in HTML |
| For documents that are XML documents | document.write() in XML | innerHTML in XML |
Regardless of the parsing mode, the document.writeln(...) method
must call the document.write() method with the same
argument(s), and then call the document.write() method with, as its
argument, a string consisting of a single line feed character (U+000A).
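The relationship between the two methods can be shown with a small sketch. The `RecordingDocument` class is hypothetical and only records the calls it receives; it does not parse anything.

```javascript
// Toy document that records each call to write(); not the real DOM.
class RecordingDocument {
  constructor() {
    this.calls = [];
  }

  write(...args) {
    this.calls.push(args.join(""));
  }

  // writeln() is write() with the same arguments, followed by a second
  // write() whose only argument is a single line feed (U+000A).
  writeln(...args) {
    this.write(...args);
    this.write("\n");
  }
}

const doc = new RecordingDocument();
doc.writeln("<p>", "Hello", "</p>");
console.log(doc.calls.length); // 2
```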
The open()
method comes in several variants with different numbers of arguments.
When called with two or fewer arguments, the method must act as follows:
Let type be the value of the first argument, if
there is one, or "text/html" otherwise.
Let replace be true if there is a second argument and it has the value "replace", and false otherwise.
If the document has an active parser
that isn't a script-created parser, and
the insertion point associated with that
parser's input stream is not undefined (that is,
it does point to somewhere in the input stream), then the
method does nothing. Abort these steps and return the
Document object on which the method was invoked.
This basically causes document.open() to be ignored when it's called
in an inline script found during the parsing of data sent over the
network, while still letting it have an effect when called
asynchronously or on a document that is itself being spoon-fed using
these APIs.
onbeforeunload, onunload, reset timers, empty event queue, kill any pending transactions, XMLHttpRequests, etc
If the document has an active parser, then stop that parser, and throw away any pending content in the input stream. What about if it doesn't, because it's either like a text/plain, or Atom, or PDF, or XHTML, or image document, or something?
Remove all child nodes of the document.
Change the document's character encoding to UTF-16.
Create a new HTML parser and associate it with
the document. This is a script-created
parser (meaning that it can be closed by the document.open() and
document.close() methods, and that the
tokeniser will wait for an explicit call to document.close()
before emitting an end-of-file token).
If type does not have the value
"text/html", then act as if the
tokeniser had emitted a start tag token with the tag name "pre", then
set the HTML parser's tokenisation stage's content model flag to PLAINTEXT.
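If replace is false, then:
Remove all the entries in the browsing context's session history after the current entry in its Document's History object.
Remove any earlier entries that share the same Document.
Add a new entry just before the last entry that is associated with the text that was parsed by the previous parser associated with the Document object, as well as the state of the document at
the start of these steps. (This allows the user to step backwards in
the session history to see the page before it was blown away by the
document.open() call.)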
Finally, set the insertion point to point at just before the end of the input stream (which at this point will be empty).
Return the Document on which the method was invoked.
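The core of the steps above, for calls with two or fewer arguments, can be sketched as a toy model. Every field name here (`children`, `parser`, `scriptCreated`, `insertionPoint`, `plaintext`, `lastOpen`) is hypothetical. Unload handling and the session history changes are deliberately not modelled; the replace flag is computed only to show how the arguments are read.

```javascript
// Toy model of document.open() with two or fewer arguments.
function openDocument(doc, ...args) {
  const type = args.length >= 1 ? args[0] : "text/html";
  const replace = args.length >= 2 && args[1] === "replace";

  // An active parser that is not script-created and has a defined
  // insertion point means the call is ignored.
  const parser = doc.parser;
  if (parser && !parser.scriptCreated && parser.insertionPoint !== undefined) {
    return doc;
  }

  // Stop any old parser by discarding it, then reset the document.
  doc.children = [];
  doc.characterEncoding = "UTF-16";
  doc.parser = {
    scriptCreated: true,
    inputStream: "",
    insertionPoint: 0, // just before the end of the (empty) input stream
    plaintext: type !== "text/html",
  };
  doc.lastOpen = { type, replace };
  return doc;
}

// A document still being parsed from the network: open() does nothing.
const parsing = {
  children: ["html"],
  parser: { scriptCreated: false, insertionPoint: 12 },
};
openDocument(parsing);
console.log(parsing.children.length); // 1

// A document with no active parser: open() resets it.
const loaded = { children: ["html"], parser: null };
openDocument(loaded, "text/plain", "replace");
console.log(loaded.children.length); // 0
console.log(loaded.parser.plaintext); // true
```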
We shouldn't hard-code text/plain there. We
should do it some other way, e.g. hand off to the section on
content-sniffing and handling of incoming data streams, the part that
defines how this all works when stuff comes over the network.
When called with three or more arguments, the open() method on the
HTMLDocument object must call the
open() method on the
Window interface of the object returned
by the defaultView attribute
of the DocumentView interface of the HTMLDocument object, with the same
arguments as the original call to the open() method, and return whatever that method
returned. If the defaultView
attribute of the DocumentView interface of the HTMLDocument object is null, then the
method must raise an INVALID_ACCESS_ERR exception.
The close()
method must do nothing if there is no script-created parser associated with the
document. If there is such a parser, then, when the method is called, the
user agent must insert an explicit "EOF"
character at the insertion point of the
parser's input stream.
In HTML, the document.write(...)
method must act as follows:
If the insertion point is undefined, the
open() method
must be called (with no arguments) on the document object. The insertion point will point at just before the end
of the (empty) input stream.
The string consisting of the concatenation of all the arguments to the method must be inserted into the input stream just before the insertion point.
If there is a pending external script, then the method must now return without further processing of the input stream.
Otherwise, the tokeniser must process the characters that were
inserted, one at a time, processing resulting tokens as they are
emitted, and stopping when the tokeniser reaches the insertion point or
when the processing of the tokeniser is aborted by the tree construction
stage (this can happen if a script
start tag token is emitted by the tokeniser).
If the document.write() method was called
from script executing inline (i.e. executing because the parser parsed a
set of script tags), then this is a
reentrant invocation of the parser.
Finally, the method must return.
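How text is placed relative to the insertion point can be illustrated with a minimal sketch. The `stream` object and the `insertAtInsertionPoint` function are hypothetical; tokenisation, pending external scripts, and the implicit call to open() are not modelled.

```javascript
// Toy input stream: `chars` holds the stream, `insertionPoint` is an index.
function insertAtInsertionPoint(stream, ...args) {
  // All arguments are concatenated into one string.
  const text = args.join("");
  stream.chars =
    stream.chars.slice(0, stream.insertionPoint) +
    text +
    stream.chars.slice(stream.insertionPoint);
  // The text goes in just before the insertion point, so the point ends up
  // immediately after the newly inserted characters.
  stream.insertionPoint += text.length;
  return stream;
}

const stream = { chars: "<p>after</p>", insertionPoint: 0 };
insertAtInsertionPoint(stream, "<p>one</p>");
insertAtInsertionPoint(stream, "<p>", "two", "</p>");
console.log(stream.chars); // <p>one</p><p>two</p><p>after</p>
```

Successive writes therefore appear in call order, ahead of whatever content was already waiting in the stream after the insertion point.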
In HTML, the innerHTML DOM attribute of all
HTMLElement and HTMLDocument nodes returns a serialization
of the node's children using the HTML syntax.
On setting, it replaces the node's children with new nodes that result
from parsing the given value. The formal definitions follow.
On getting, the innerHTML DOM attribute must return the
result of running the HTML fragment serialization
algorithm on the node.
On setting, if the node is a document, the innerHTML DOM
attribute must run the following algorithm:
If the document has an active parser, then stop that parser, and throw away any pending content in the input stream. What about if it doesn't, because it's either like a text/plain, or Atom, or PDF, or XHTML, or image document, or something?
Remove the child nodes of the Document whose innerHTML
attribute is being set.
Create a new HTML parser, in its initial state,
and associate it with the Document node.
Place into the input stream for the HTML parser just created the string being assigned
into the innerHTML attribute.
Start the parser and let it run until it has consumed all the
characters just inserted into the input stream. (The
Document node will have been populated with elements and a
load event will have
fired on its body
element.)
Otherwise, if the node is an element, then setting the innerHTML DOM
attribute must cause the following algorithm to run instead:
Invoke the HTML fragment parsing
algorithm, with the element whose innerHTML attribute is being set as the
context element, and the string being assigned into
the innerHTML attribute as the input. Let new children be the result
of this algorithm.
Remove the children of the element whose innerHTML
attribute is being set.
Let target document be the ownerDocument of the Element node whose
innerHTML attribute is being set.
Set the ownerDocument of all the nodes in new children to the target document.
Append all the new children nodes to the node
whose innerHTML attribute is being set,
preserving their order.
script elements inserted
using innerHTML do not execute when they are
inserted.
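The order of the element setter's steps can be sketched as follows. This is a toy model: the node shape and the `setInnerHtml` name are hypothetical, and the fragment parser is passed in as a parameter because the HTML fragment parsing algorithm itself is out of scope here. The stub parser in the usage lines is a stand-in that only splits a string.

```javascript
// Toy model of the innerHTML setter on an element.
// `parseFragment(contextElement, markup)` must return an array of new nodes.
function setInnerHtml(element, markup, parseFragment) {
  // Parse first, with the element as the context element.
  const newChildren = parseFragment(element, markup);

  // Then remove the existing children.
  element.children = [];

  // Adopt the new nodes into the element's document and append them in order.
  const targetDocument = element.ownerDocument;
  for (const child of newChildren) {
    child.ownerDocument = targetDocument;
    element.children.push(child);
  }
}

// Stand-in parser for illustration only: one text node per "|"-separated part.
const stubParser = (context, markup) =>
  markup.split("|").map((data) => ({ type: "text", data, ownerDocument: null }));

const ownerDoc = { name: "#document" };
const list = {
  ownerDocument: ownerDoc,
  children: [{ type: "text", data: "old", ownerDocument: ownerDoc }],
};

setInnerHtml(list, "a|b", stubParser);
console.log(list.children.map((child) => child.data).join(",")); // a,b
```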
In an XML context, the document.write() method
must raise an INVALID_ACCESS_ERR exception.
The innerHTML attribute, on the other hand, is
usable in an XML context.
In an XML context, the innerHTML DOM attribute on HTMLElements must return a string in the
form of an internal
general parsed entity, and on HTMLDocuments must return a string in the
form of a document
entity. The string returned must be XML namespace-well-formed and must
be an isomorphic serialization of all of that node's child nodes, in
document order. User agents may adjust prefixes and namespace declarations
in the serialization (and indeed might be forced to do so in some cases to
obtain namespace-well-formed XML). For the innerHTML
attribute on HTMLElement objects,
if any of the elements in the serialization are in no namespace, the
default namespace in scope for those elements must be explicitly declared
as the empty string.
(This doesn't apply to the innerHTML attribute on HTMLDocument objects.) [XML] [XMLNS]
If any of the following cases are found in the DOM being serialized, the
user agent must raise an INVALID_STATE_ERR exception:
Document node with no child element nodes.
DocumentType node that has an external subset public
identifier or an external subset system identifier that contains both a
U+0022 QUOTATION MARK ('"') and a U+0027 APOSTROPHE ("'").
Attr node, Text node,
CDATASection node, Comment node, or
ProcessingInstruction node whose data contains characters
that are not matched by the XML Char production. [XML]
CDATASection node whose data contains the string "]]>".
Comment node whose data contains two adjacent U+002D
HYPHEN-MINUS (-) characters or ends with such a character.
ProcessingInstruction node whose target name is the
string "xml" (case insensitively).
ProcessingInstruction node whose target name contains a
U+003A COLON (":").
ProcessingInstruction node whose data contains the
string "?>".
These are the only ways to make a DOM unserializable. The DOM
enforces all the other XML constraints; for example, trying to set an
attribute with a name that contains an equals sign (=) will raise an
INVALID_CHARACTER_ERR exception.
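Several of the cases listed above are simple string checks, which the following sketch illustrates. It is a partial toy model: the node shape (`type`, `data`, `target`, `children`) and the function name are hypothetical, the DocumentType and XML Char production checks are omitted, and a real implementation would raise INVALID_STATE_ERR rather than return a message.

```javascript
// Returns a reason string when the node cannot be serialized, else null.
function unserializableReason(node) {
  switch (node.type) {
    case "document":
      return node.children.some((child) => child.type === "element")
        ? null
        : "Document has no child element nodes";
    case "cdata":
      return node.data.includes("]]>") ? "CDATASection data contains ]]>" : null;
    case "comment":
      if (node.data.includes("--")) return "Comment data contains two adjacent hyphens";
      if (node.data.endsWith("-")) return "Comment data ends with a hyphen";
      return null;
    case "pi":
      if (node.target.toLowerCase() === "xml") return "Processing instruction target is xml";
      if (node.target.includes(":")) return "Processing instruction target contains a colon";
      if (node.data.includes("?>")) return "Processing instruction data contains ?>";
      return null;
    default:
      return null;
  }
}

console.log(unserializableReason({ type: "comment", data: "fine" }) === null); // true
console.log(unserializableReason({ type: "comment", data: "a--b" }) !== null); // true
console.log(unserializableReason({ type: "pi", target: "XML", data: "" }) !== null); // true
```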
On setting, in an XML context, the innerHTML DOM attribute on HTMLElements and HTMLDocuments must run the following
algorithm:
The user agent must create a new XML parser.
If the innerHTML attribute is being set on an
element, the user agent must feed the parser just created
the string corresponding to the start tag of that element, declaring all
the namespace prefixes that are in scope on that element in the DOM, as
well as declaring the default namespace (if any) that is in scope on
that element in the DOM.
The user agent must feed the parser just created the
string being assigned into the innerHTML attribute.
If the innerHTML attribute is being set on an
element, the user agent must feed the parser the string
corresponding to the end tag of that element.
If the parser found a well-formedness error, the attribute's setter
must raise a SYNTAX_ERR exception and abort these steps.
The user agent must remove the child nodes of the node whose innerHTML
attribute is being set.
If the attribute is being set on a Document node, let
new children be the children of the document,
preserving their order. Otherwise, the attribute is being set on an
Element node; let new children be the
children of the document's root element, preserving their order.
If the attribute is being set on a Document node, let
target document be that Document node.
Otherwise, the attribute is being set on an Element node;
let target document be the ownerDocument of that Element.
Set the ownerDocument of all the nodes in new children to the target document.
Append all the new children nodes to the node
whose innerHTML attribute is being set,
preserving their order.
script elements inserted
using innerHTML do not execute when they are
inserted.
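For the element case, the string fed to the XML parser in the steps above (start tag, assigned value, end tag) can be sketched as follows. The element shape (`qualifiedName`, `defaultNamespace`, `prefixes`) and both function names are hypothetical, and the sketch assumes that only namespace declarations, not the element's other attributes, need to appear on the synthesized start tag.

```javascript
// Escapes a namespace name for use inside a double-quoted attribute value.
function escapeAttribute(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical element shape:
// { qualifiedName, defaultNamespace: string or null, prefixes: { prefix: uri } }
function buildParserInput(element, markup) {
  let startTag = "<" + element.qualifiedName;
  // Declare the default namespace in scope, if any.
  if (element.defaultNamespace !== null) {
    startTag += ' xmlns="' + escapeAttribute(element.defaultNamespace) + '"';
  }
  // Declare every namespace prefix in scope.
  for (const [prefix, uri] of Object.entries(element.prefixes)) {
    startTag += " xmlns:" + prefix + '="' + escapeAttribute(uri) + '"';
  }
  startTag += ">";
  return startTag + markup + "</" + element.qualifiedName + ">";
}

const div = {
  qualifiedName: "div",
  defaultNamespace: "http://www.w3.org/1999/xhtml",
  prefixes: { svg: "http://www.w3.org/2000/svg" },
};
console.log(buildParserInput(div, "<p>Hi</p>"));
// <div xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg"><p>Hi</p></div>
```

The children of the root element that results from parsing this string are then the new children described in the later steps.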