|Extensible Markup Language|
|Filename extension||.xml|
|Internet media type||application/xml|
|Uniform type identifier||public.xml|
|Developed by||W3C|
|Standard||1.0 (Fourth Edition), 1.1 (Second Edition)|
The Extensible Markup Language (XML) is a general-purpose specification for creating custom markup languages.
An XML document organizes its information as a tree, in the same way as a tree data structure.
The only required component of an XML document is its root element, and there must be exactly one. The root element, like every other element, consists of a start tag (such as <book>), an end tag (such as </book>), and the content between the two tags.
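The one-root rule can be checked with any conforming parser. The sketch below assumes Python's standard-library xml.etree.ElementTree module; is_well_formed is a hypothetical helper written for this illustration, and it reports every kind of well-formedness error, not just an extra root.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as one well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<book>Content</book>"))        # True
print(is_well_formed("<book></book><book></book>"))  # False: a second top-level element
print(is_well_formed(""))                            # False: no root element at all
```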
The root element may optionally be preceded by an XML declaration, which states the XML version in use and the character encoding of the document:
<?xml version="1.0" encoding="UTF-8"?>
Comments are written between <!-- and -->:
<!-- This is a comment -->
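As an illustration of how a parser treats these two constructs, the sketch below (assuming Python's standard-library xml.etree.ElementTree) parses a document with a declaration and comments. The input is given as bytes so that the encoding named in the declaration applies; by default ElementTree's tree builder discards comments, so only the element survives.

```python
import xml.etree.ElementTree as ET

# Bytes input: the parser reads the encoding from the XML declaration.
doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a comment -->
<book>
  <!-- Comments may also appear inside element content -->
  <chapter>The Moviegoer</chapter>
</book>"""

root = ET.fromstring(doc)
print(root.tag)                       # book
print([child.tag for child in root])  # ['chapter']: comments are not kept by default
```

On Python 3.8 and later, comments can be retained by parsing with ET.XMLParser(target=ET.TreeBuilder(insert_comments=True)).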
Elements may be nested inside one another and may be repeated, subject to whatever rules the particular type of document imposes:
<book>
  <chapter>The Moviegoer</chapter>
  <chapter>The Adapter</chapter>
  <index>
    <entry target="7">Movie</entry>
    <entry target="10">Combinatorics</entry>
  </index>
</book>
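To show how nesting and repetition map onto a tree, here is a short sketch using Python's standard-library xml.etree.ElementTree on the same book document. Repeated elements come back as a list in document order, and a path such as "index/entry" descends one level into the nested element.

```python
import xml.etree.ElementTree as ET

doc = """<book>
  <chapter>The Moviegoer</chapter>
  <chapter>The Adapter</chapter>
  <index>
    <entry target="7">Movie</entry>
    <entry target="10">Combinatorics</entry>
  </index>
</book>"""

book = ET.fromstring(doc)
# findall("chapter") matches direct children only, in document order.
chapters = [c.text for c in book.findall("chapter")]
# "index/entry" walks into the nested <index> element.
entries = [e.text for e in book.findall("index/entry")]
print(chapters)  # ['The Moviegoer', 'The Adapter']
print(entries)   # ['Movie', 'Combinatorics']
```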
Elements can carry attributes. In the example above, the first entry element has an attribute named target with the value "7". All XML content and attribute values are plain text; it is up to the application or a schema to interpret certain values, such as numbers.
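The point about values being text can be seen directly in a parser. This sketch assumes Python's standard-library xml.etree.ElementTree; the conversion with int() is a choice made by the application code, not something XML itself performs, and the two sorts show why that choice matters.

```python
import xml.etree.ElementTree as ET

index = ET.fromstring(
    '<index><entry target="7">Movie</entry>'
    '<entry target="10">Combinatorics</entry></index>'
)

raw = index.find("entry").get("target")
print(repr(raw))  # '7': the attribute value arrives as a string

# As strings, "10" sorts before "7"; the application must decide the values are numbers.
as_strings = sorted(e.get("target") for e in index.iter("entry"))
as_numbers = sorted(int(e.get("target")) for e in index.iter("entry"))
print(as_strings)  # ['10', '7']
print(as_numbers)  # [7, 10]
```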
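Elements must be properly nested: an element that starts inside another element must also end inside it. For example, <b><i>text</b></i> is not well-formed XML, because the i element is still open when its parent b is closed.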
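A conforming parser rejects overlapping tags outright rather than guessing what was meant. The sketch below assumes Python's standard-library xml.etree.ElementTree; check_nesting is a hypothetical helper, and the exact wording of the error message depends on the Python version, so only its prefix is relied on here.

```python
import xml.etree.ElementTree as ET

def check_nesting(text: str) -> str:
    """Hypothetical helper: describe whether `text` is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return f"not well-formed: {err}"
    return "well-formed"

print(check_nesting("<b><i>text</i></b>"))  # well-formed
print(check_nesting("<b><i>text</b></i>"))  # not well-formed: mismatched tag ...
```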
An element without content can also be written as a single empty-element tag that ends with a slash, such as <entry/> or <entry />. This also applies to the root element. The space between the tag name and the slash is optional.
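The short and long forms are equivalent after parsing. This sketch assumes Python's standard-library xml.etree.ElementTree: all three spellings produce an element with no text and no children, and ElementTree writes such elements back out in the short form unless short_empty_elements=False is passed.

```python
import xml.etree.ElementTree as ET

forms = ["<entry></entry>", "<entry/>", "<entry />"]
parsed = [ET.fromstring(f) for f in forms]

# Each form yields tag 'entry', no text, and zero children.
print([(e.tag, e.text, len(e)) for e in parsed])

print(ET.tostring(parsed[0]))                              # b'<entry />'
print(ET.tostring(parsed[1], short_empty_elements=False))  # b'<entry></entry>'
```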
Element names are case-sensitive: <Book> and <book> are different elements, and <book></Book> is not well-formed.
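A quick way to see case sensitivity in practice, sketched with Python's standard-library xml.etree.ElementTree: the two spellings are kept as distinct tags, and searches match the exact case only.

```python
import xml.etree.ElementTree as ET

shelf = ET.fromstring("<shelf><Book/><book/></shelf>")
tags = [child.tag for child in shelf]
print(tags)                         # ['Book', 'book']: two distinct element names
print(len(shelf.findall("book")))   # 1: the lowercase name matches only one child
```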
Entity references let text (element content and attribute values) contain characters that would otherwise be read as XML syntax. XML predefines five of them:
- &lt; (less-than sign): escapes the less-than sign, which is used to open a tag.
- &gt; (greater-than sign): escapes the greater-than sign, which is used to close a tag.
- &amp; (ampersand): escapes the ampersand, which is used to begin other entity references.
- &quot; (quotation mark): escapes the quotation mark, which would otherwise end a double-quoted attribute value early.
- &apos; (apostrophe): escapes the apostrophe, which would otherwise end a single-quoted attribute value early.
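Other characters can be written as numeric character references: the character's Unicode code point in decimal between &# and ;, or in hexadecimal between &#x and ;. For example, &#169; and &#xA9; both represent the copyright sign ©.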
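Both directions of escaping can be observed with a standard parser. The sketch below assumes Python's standard-library xml.etree.ElementTree: on input, entity and numeric character references are replaced by the characters they stand for; on output, the serializer inserts the necessary references itself.

```python
import xml.etree.ElementTree as ET

# Parsing: references are decoded into ordinary characters.
p = ET.fromstring(
    '<p note="say &quot;hi&quot;">a &lt; b &amp;&amp; c &gt; d &#169; &#xA9;</p>'
)
print(p.text)         # a < b && c > d, followed by two copyright signs
print(p.get("note"))  # say "hi"

# Serializing: ElementTree escapes markup characters automatically.
q = ET.Element("p", note='say "hi"')
q.text = "a < b & c"
print(ET.tostring(q))  # b'<p note="say &quot;hi&quot;">a &lt; b &amp; c</p>'
```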
More entities can be declared in an XML document's Document Type Definition (DTD).
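As a sketch of a declared entity, the example below puts an entity declaration in the document's internal DTD subset and parses it with Python's standard-library xml.etree.ElementTree, whose underlying expat parser expands internal entities. The entity name publisher and its value "Example Press" are hypothetical. Because entity expansion is also the mechanism behind "billion laughs" attacks, the Python documentation advises caution when parsing untrusted XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity declared in the internal DTD subset.
doc = """<!DOCTYPE book [
  <!ENTITY publisher "Example Press">
]>
<book>Published by &publisher;.</book>"""

book = ET.fromstring(doc)
print(book.text)  # Published by Example Press.
```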
The following terms describe how elements relate to one another. They are borrowed from trees in mathematics, which an XML document resembles.
- Root element: the topmost element, which contains every other element. It normally comes directly after the XML declaration and/or the document type declaration (DTD).
- Child: an element is a child of another element if it is directly inside it. The children of an element are all of the elements directly inside it.
- Parent: an element is the parent of another element if the other element is its child. Because the document is a tree, every element has at most one parent; the only element with no parent is the root element.
- Descendant: an element B is a descendant of another element A if there is a direct path, going "down", from A to B. Recursively, B is a descendant of A if B is a child of A or B is a child of a descendant of A.
- Ancestor: an element A is an ancestor of another element B if B is a descendant of A.
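The terms above can be made concrete with a short traversal. This sketch assumes Python's standard-library xml.etree.ElementTree, whose elements do not store a reference to their parent; the child-to-parent dictionary parent_of and the ancestors generator are hypothetical helpers built here for the illustration.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book><chapter>The Moviegoer</chapter><chapter>The Adapter</chapter>"
    "<index><entry target='7'>Movie</entry>"
    "<entry target='10'>Combinatorics</entry></index></book>"
)

# Build a child -> parent map once, since elements have no parent pointer.
parent_of = {child: parent for parent in book.iter() for child in parent}

def ancestors(elem):
    """Yield the parent, then the grandparent, and so on up to the root."""
    while elem in parent_of:
        elem = parent_of[elem]
        yield elem

index = book.find("index")
entry = index.find("entry")

print(book.tag)                            # root element: book
print([c.tag for c in book])               # children: ['chapter', 'chapter', 'index']
print(parent_of[entry].tag)                # parent of the first entry: index
print([d.tag for d in book.iter()][1:])    # descendants (iter() also yields book itself)
print([a.tag for a in ancestors(entry)])   # ancestors: ['index', 'book']
print(book in parent_of)                   # False: the root has no parent
```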