File Formats Wiki
HTML (HyperText Markup Language)

Filename extension: .html, .htm
Internet media type: text/html
Type code: TEXT
Uniform type identifier: public.html
Developed by: World Wide Web Consortium

HTML (HyperText Markup Language) is a markup language for web pages.


HTML uses tags in angle brackets with attributes, similar to XML (HTML is derived from SGML). A reformulation of HTML that complies with XML, called XHTML, also exists. The last published recommendation of HTML is HTML 4.01 (succeeded by XHTML 1.0), while HTML 5 was published as a Working Draft by the W3C.


An HTML document starts with an html element (the root element), which in turn contains a head element and a body element:
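A minimal sketch of that skeleton follows; the title and paragraph text are placeholders chosen for illustration.

```html
<html>
  <head>
    <!-- general information about the page -->
    <title>Example page</title>
  </head>
  <body>
    <!-- content presented to the user -->
    <p>Hello, world.</p>
  </body>
</html>
```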


(Tag names may be written in either uppercase or lowercase.)

The document may also contain a doctype for validation and version information:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "">

Document head

The document head contains general information about the page rather than actual content, such as the title, styles, and scripts. The head is defined by the <head> element.


An HTML document is required to have a title. The title appears on the title bar of a browser, or on the document's tab in a tabbed browser. The title is enclosed in <title> tags and may contain only text (including any defined entities). Example:

        <title>The Telcom Website</title>

Metadata provides information about the document itself. A common use is to tell search engines how the page should be processed. It is defined using the <meta /> element. The name attribute specifies a name for the property, and the content attribute specifies its value.
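As a rough illustration, metadata entries might look like the following. The names "description" and "keywords" are conventional choices rather than names taken from this article, and the values are placeholders.

```html
<head>
  <title>Example page</title>
  <!-- name identifies the property, content carries its value -->
  <meta name="description" content="A short summary of the page.">
  <meta name="keywords" content="html, markup, example">
</head>
```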

Document relationships

The <link /> element defines links and resources associated with the current document, specified using the href attribute. The relationship type is specified using the rel (forward link) and rev (reverse link) attributes.
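A short sketch of document relationships, assuming hypothetical file names (style.css, chapter1.html, chapter3.html). Only rel is shown; rev takes the same kinds of values but describes the link in the reverse direction.

```html
<head>
  <title>Chapter 2</title>
  <!-- the linked resource is a stylesheet for this document -->
  <link rel="stylesheet" type="text/css" href="style.css">
  <!-- forward links to the neighbouring documents in a series -->
  <link rel="prev" href="chapter1.html">
  <link rel="next" href="chapter3.html">
</head>
```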


The <base /> element defines an absolute URI, given in its href attribute, against which relative URIs in the document are resolved.
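For instance, with a base such as the one below (the example.com address and image path are made up for illustration), a relative image path is resolved against it:

```html
<head>
  <title>Example page</title>
  <base href="http://www.example.com/docs/">
</head>
<body>
  <!-- resolves to http://www.example.com/docs/images/logo.png -->
  <p><img src="images/logo.png" alt="Logo"></p>
</body>
```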

Document body

The document body contains content to be presented to the user. The body is defined by the <body> element.

Element identifiers

Elements may be identified using the id and class attributes. An id attribute must be unique within the document; it is a "name" or identifier for one specific element. In contrast, multiple elements may be assigned the same class attribute value, in which case they all belong to the same class.
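The difference can be sketched as follows; the identifier and class names ("intro", "note", "warning") are arbitrary examples.

```html
<!-- id: at most one element in the document may use "intro" -->
<p id="intro" class="note">This paragraph has a unique identifier.</p>

<!-- class: any number of elements may share "note" -->
<p class="note">This paragraph shares a class with the one above.</p>
<p class="note warning">An element may also list several classes, separated by spaces.</p>
```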

Grouping elements

Elements may be assigned some kind of grouping using the <div> (block logical division) and <span> (inline logical division) elements. These specify "parts" of the document to which CSS styles may be applied.
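A small sketch of both grouping elements with CSS attached. The class names and style rules are illustrative assumptions; the style element itself belongs in the document head.

```html
<style type="text/css">
  .sidebar   { border: 1px solid gray; }
  .highlight { background-color: yellow; }
</style>

<div class="sidebar">
  <p>This block is grouped by a div, and part of it is
     <span class="highlight">grouped inline by a span</span>.</p>
</div>
```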


Text is often placed in a <p> (paragraph) element, which specifies a single paragraph. It can also be placed in a <blockquote> (quoted text) or <pre> (preformatted text) element.

The blockquote element marks up long (block-level) quotations, while the <q> element marks up short inline quotations, usually rendered between quotation marks. Either may identify its source with the cite attribute.

The pre element indicates preformatted text, often rendered in a fixed-width (monospace) font. Whitespace is also significant inside a pre element. Some elements are not allowed inside a pre element: img, object, big, small, sub and sup.

Line breaks may be defined using the <br> element.

Inline logical divisions may be defined using the <span> element, in which one may specify properties specific to the enclosed text, such as CSS styles or HTML anchors.

The <address> element specifies contact information. It may contain only inline text and elements, including line breaks.
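The text elements described above might be combined as in this sketch. All text, the cited URI and the e-mail address are placeholders.

```html
<p>A paragraph of ordinary text,<br>broken onto a second line.</p>

<!-- a block-level quotation; cite gives the URI of the source -->
<blockquote cite="http://www.example.com/source.html">
  <p>A longer quotation, set off as its own block.</p>
</blockquote>

<!-- a short inline quotation -->
<p>She said <q>a short inline quotation</q> and left.</p>

<!-- spaces and line breaks inside pre are preserved -->
<pre>
column A    column B
1           2
</pre>

<address>
  Example Author<br>
  <a href="mailto:author@example.com">author@example.com</a>
</address>
```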

Phrase elements

These are elements that are used inline in text.

  • em (emphasis, often rendered in italics)
  • strong (stronger emphasis, often rendered in bold)
  • cite (citation or reference to a source)
  • dfn (definition of a term)
  • code (source code fragment, often rendered in a fixed-width font)
  • samp (sample output, especially from a program or script)
  • var (source code variable)
  • abbr / acronym (abbreviation or acronym, a definition of the enclosed text may be placed in a title attribute)
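A brief sketch using the phrase elements listed above in running text; the sentences are invented for illustration.

```html
<p>The <dfn>root element</dfn> of an
   <abbr title="HyperText Markup Language">HTML</abbr> document is
   <em>always</em> <code>html</code>, as described in
   <cite>the HTML 4.01 Recommendation</cite>.</p>

<p>Set <var>count</var> to 3 and the program prints <samp>3 items</samp>.
   <strong>Do not</strong> leave it unset.</p>
```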

Superscript and subscript

  • sup (superscripted text)
  • sub (subscripted text)
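Two common uses, shown as a minimal sketch:

```html
<p>E = mc<sup>2</sup></p>
<p>Water is H<sub>2</sub>O.</p>
```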

Document changes

Changes to the document may be marked using the <ins> (inserted text, usually rendered as underlined text) and <del> (deleted text, usually rendered with a strikethrough) elements.
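For example, a correction to a sentence could be marked up like this (the sentence is a placeholder):

```html
<p>The meeting is on <del>Tuesday</del> <ins>Wednesday</ins>.</p>
```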


Headings

Headings may be defined using the h1, h2, h3, h4, h5 and h6 elements. These function similarly to the p element, but they describe the topic of the section that follows. h1 is rendered as the most important heading, while h6 is the least important.
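A sketch of a small heading hierarchy, with placeholder titles and text:

```html
<h1>Guide title</h1>
<p>Introductory paragraph.</p>

<h2>First section</h2>
<p>Text of the first section.</p>

<h3>A subsection</h3>
<p>Text of the subsection.</p>
```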


Lists

  • ul - Unordered list. These are usually rendered with bullets.
  • ol - Ordered list. These are usually rendered with numbers or letters automatically determined by the browser or renderer.
  • li - List item. Used to enumerate items in an unordered or ordered list.
  • dl - Definition list. These contain terms and their definitions.
  • dt - Term. This is a term in a definition list, and may only contain inline elements and text.
  • dd - Definition. This is a definition of a term in a definition list.
  • dir and menu - Deprecated list elements.
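The three non-deprecated list types can be sketched as follows; the items are placeholders.

```html
<!-- unordered list -->
<ul>
  <li>Apples</li>
  <li>Oranges</li>
</ul>

<!-- ordered list: numbering is supplied by the renderer -->
<ol>
  <li>Preheat the oven.</li>
  <li>Bake for 20 minutes.</li>
</ol>

<!-- definition list: each dt term is followed by its dd definition -->
<dl>
  <dt>HTML</dt>
  <dd>A markup language for web pages.</dd>
  <dt>CSS</dt>
  <dd>A style sheet language.</dd>
</dl>
```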


Tables

  • table - Table. The base element for defining a table.
  • caption - Table caption. States what the table is about. At most one caption is allowed and, if present, it must immediately follow the table start tag.
  • thead - Table header
  • tbody - Table body
  • tfoot - Table footer

These are row groups defined in a table. They may be used, for example, to separate header rows from table data. Table data are usually placed in the tbody element, while thead and tfoot may contain column-specific information. When a row group is present, it must contain at least one tr element. In HTML 4.01, the tfoot element is placed before the tbody element.

  • colgroup - Column group. Creates a division of columns in the table.
  • col - Column. May specify formatting information for a single column. May also nest within a colgroup element.
  • tr - Table row. Contains a row of cells in the table.
  • th and td - Table header cell and table data cell. Header cells are used as a description of the contents of a row or column, while data cells are the data of the table.
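Putting the table elements together, a sketch might look like this. The element order follows HTML 4.01 (caption, column groups, thead, tfoot, then tbody), and the figures are invented for illustration.

```html
<table>
  <caption>Quarterly sales</caption>
  <colgroup>
    <col>
    <!-- formatting for the second column only -->
    <col align="right">
  </colgroup>
  <thead>
    <tr><th>Quarter</th><th>Units</th></tr>
  </thead>
  <tfoot>
    <tr><td>Total</td><td>30</td></tr>
  </tfoot>
  <tbody>
    <tr><td>Q1</td><td>10</td></tr>
    <tr><td>Q2</td><td>20</td></tr>
  </tbody>
</table>
```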


Links

  • a - Anchor. Links to another anchor or another page using the href attribute, which contains a URI to the target anchor or page. An anchor may be defined using the name attribute of an a element, or the id attribute of any element (these must be unique within the document). The enclosed text defines the text to be displayed for the link.
  • link - document relationship. See Document head.
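A sketch of a link to another page and of the two ways to define an anchor; the URI and anchor names are hypothetical.

```html
<!-- a link to another page -->
<p><a href="http://www.example.com/about.html">About this site</a></p>

<!-- two ways to define an anchor: name on an a element, or id on any element -->
<p><a name="notes">Notes</a></p>
<h2 id="install">Installation</h2>

<!-- links to those anchors use a fragment identifier -->
<p><a href="#notes">Jump to the notes</a> or
   <a href="#install">jump to installation</a>.</p>
```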

Images and objects

  • img - Image. Renders an image to be displayed on the page. The URI of the image is placed in the src attribute. Alternate text, to be rendered when the image cannot be displayed or is not yet loaded, is placed in the alt attribute. (This is often called "alt text".)
  • map - Client-side image map. Defines a structure that can be used to render an image map. The name attribute defines an anchor to the image map.

To use an image map, a usemap attribute is added that links to the defined image map. It may be used in either the img or the object element.

  • area - Client-side image map area. Defines a single region in the image map. Contained within the map element. The shape attribute specifies the region to be used: default for the entire region, rect for a rectangle, circle for a circle, and poly for a polygonal region. The coords attribute specifies points in the map; their meaning depends on the shape of the area.
  • object - Object. Embeds content other than HTML, to be rendered either by another program or by the renderer itself if supported. Examples are Flash files, sound files and Java applets. Alternate text, HTML content or even other object elements may be placed in its content.
  • param - Object parameter. These are parameters to be passed to the object upon initialization. Any number of param elements may be contained in an object element, but they must precede any "alternate" content. The name of the parameter is placed in a name attribute, while its value is in a value attribute.
  • applet - Applet. Deprecated in favor of object.
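The image, image map and object elements might be used as in this sketch. The file names, coordinates and the "autoplay" parameter are hypothetical, since the parameters an object accepts depend on the program that renders it. The shape value circle is the one defined in HTML 4.01.

```html
<!-- usemap points at the map's name, written as a fragment identifier -->
<p><img src="plan.png" alt="Floor plan" usemap="#planmap"></p>

<map name="planmap">
  <!-- rect: left, top, right, bottom -->
  <area shape="rect" coords="0,0,100,50" href="kitchen.html" alt="Kitchen">
  <!-- circle: center x, center y, radius -->
  <area shape="circle" coords="150,75,25" href="garden.html" alt="Garden">
  <!-- default: everything not covered by another area -->
  <area shape="default" href="house.html" alt="Whole house">
</map>

<p>
  <object data="intro.mp3" type="audio/mpeg">
    <!-- param elements come before the alternate content -->
    <param name="autoplay" value="false">
    Your browser cannot play this clip;
    <a href="intro.mp3">download it</a> instead.
  </object>
</p>
```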

Font styles

  • tt - Teletype. Renders text in a typewriter-like (teletype) font. Usually rendered with a fixed-width font.
  • b - Bold. Renders text in boldface.
  • i - Italic. Renders text in italics.
  • big - Big. Renders text in a bigger font size.
  • small - Small. Renders text in a smaller font size.
  • font and basefont - Font. Renders text with additional font styles. Deprecated in favor of CSS.
  • u - Underline. Renders text underlined. Deprecated.
  • s and strike - Strikethrough. Renders text with a strikethrough. Deprecated.


Comments

Comments are enclosed in <!-- and -->; they are ignored by the renderer and not displayed. This syntax is inherited from SGML.
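For example (the comment text is arbitrary):

```html
<!-- This note is for authors only and is not displayed. -->
<p>This paragraph is displayed.</p>
<!--
  A comment may span several lines.
-->
```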
