XHTML is a Web standard that is a fusion between HTML and XML. The idea is to create a language that can be parsed as XML, but still has the same basic syntax as the HTML that Web designers are used to.
XHTML is also backward compatible with HTML 4. Any Web browser that can read HTML 4 documents should be able to read XHTML documents without a problem. XHTML works well in all the currently shipping popular browsers, such as Firefox, Safari, Internet Explorer, and Opera.
This tip includes some information about differences between HTML and XHTML. It shows how a site can be converted to XHTML and how the validator provided by the World Wide Web Consortium (W3C) can be used to confirm that the site is valid XHTML (or valid HTML for that matter).
XHTML is the current standard for HTML. Sites that are XHTML compliant today have the best chance of being compliant with future versions of HTML and with future Web browsers. In addition, XHTML is much easier to deal with programmatically than traditional HTML so it is easier to use new technologies such as AJAX with a site that is XHTML compliant.
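As a rough illustration of that last point, a well-formed XHTML page can be read with an ordinary XML parser. The sketch below uses Python's standard xml.etree.ElementTree module. The sample page and the helper name link_targets are made up for this example, and the page omits the doctype and xmlns attribute that a real XHTML document would carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page. A real XHTML document would also have a doctype
# and an xmlns attribute; both are left out to keep the sketch short.
PAGE = """<html>
  <head><title>Example</title></head>
  <body>
    <p>See the <a href="http://www.w3.org/TR/xhtml1/">XHTML spec</a>.</p>
    <p>Try the <a href="http://validator.w3.org/">validator</a>.</p>
  </body>
</html>"""


def link_targets(xhtml):
    """Return the href of every <a> element, in document order."""
    root = ET.fromstring(xhtml)
    return [a.get("href") for a in root.iter("a") if a.get("href")]


print(link_targets(PAGE))
```

The same call would fail on most traditional HTML pages, because unclosed tags and bare attributes are not well-formed XML.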
There is no need for a wholesale migration of existing sites from HTML to either HTML 4 or XHTML, but it is good practice for Web designers to get familiar with XHTML and to try to use XHTML on new projects.
The biggest reason not to use XHTML is browser compatibility. However, all currently shipping popular browsers support XHTML, and (using the transitional doctype) XHTML is largely backward compatible with older browsers as well. Unless you need to support a large number of older browsers, XHTML should work fine for the vast majority of normal Web users. XHTML 1.0 was introduced in 2000 and updated in 2002. It is the successor to the HTML 4 standard, which was released in 1997 and updated by HTML 4.01 in 1999.
The W3C provides a validator which you can use to test whether your site passes XHTML validation or not. Visit the validator (its URL is listed at the end of this tip) and enter the URL of a Web site. For example, http://www.wired.com currently passes validation.
For your own sites you can start by manually setting the “Doctype” to “XHTML 1.0 Transitional” or one of the other types and seeing how you fare. Most HTML sites will be flagged with lots of errors. The remainder of this tip has information about how you can convert your site to XHTML so it does pass validation.
XHTML and HTML documents identify which standard they are written to by including a doctype as the first line of the file. If you include a doctype at the start of your file then you must check that the file validates properly. A doctype is essentially a promise to the Web browser (or other parser) that you have ensured that your page will parse according to the rules.
XHTML 1.0 has three doctypes:
XHTML Strict – This doctype declares that your file passes XHTML validation and does not use any HTML extensions to XHTML. In particular, your site should use CSS for all of its styles and should not rely on HTML attributes like bgcolor, bordercolor, width, etc.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML Transitional – This doctype declares that your file passes XHTML validation, but allows the HTML extensions. This is usually the easiest XHTML standard to try for if you are already familiar with HTML and if you are not using CSS for all your site styles yet.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
XHTML Frameset – There is also a doctype for XHTML Frameset which is required if your site uses frames.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
You can insert the doctype while you are working on your XHTML conversion since it will tell the validator what standard you are going for.
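When converting many pages, it can help to see which of the three doctypes each file currently declares. The following is a minimal sketch in Python that looks for the public identifier in the doctype line. The function name declared_doctype and the sample string are made up for this illustration, and the check is a simple text match, not a validation.

```python
import re

# Public identifiers of the three XHTML 1.0 doctypes.
XHTML_DOCTYPES = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "frameset",
}

# Matches the start of a doctype declaration and captures its public identifier.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"')


def declared_doctype(source):
    """Return 'strict', 'transitional', 'frameset', or None if no XHTML 1.0 doctype is found."""
    match = DOCTYPE_RE.search(source)
    if match is None:
        return None
    return XHTML_DOCTYPES.get(match.group(1))


# Hypothetical sample document.
SAMPLE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    "<html><head><title>Example</title></head><body><p>Hello</p></body></html>"
)

print(declared_doctype(SAMPLE))
```

A result of None means the file has no XHTML 1.0 doctype yet, so it is a candidate for conversion.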
Differences Between HTML and XHTML
This section contains a list of differences between HTML 4 and XHTML. By adopting these principles on new sites you can be ready to perform XHTML validation when you want. If you are converting a site to XHTML you should work through these differences before checking the site for validation.
– Every tag must have an opening and a closing tag. It is common in HTML to include just the opening paragraph tag <p>, but in XHTML every paragraph should be enclosed in <p> … </p> tags. Some commonly omitted closing tags include those in lists </li>, table cells </td> or </th>, and select options </option>. It is good practice to always close these tags in HTML as well.
– Standalone tags must also be closed. In XML a tag with no contents can be closed by placing a space and forward slash before the final angle bracket. So, the HTML break tag becomes <br />. Similarly, a form input becomes <input name="name" value="value" /> and an image tag becomes <img src="src" />.
Note – Tags that normally enclose content should not be written using this shorthand even if they happen to be empty. Using <p /> is technically proper XHTML, but may cause problems in some browsers. This applies to the <script> tag, <div>, <span>, <a>, etc. Similarly, standalone tags should be written in shorthand. Although <br></br> is technically proper XHTML, it may cause problems in some browsers.
– All tag names should be in lowercase. Use <h1> rather than <H1>.
– All tag attributes must have both a name and a value (e.g. type="hidden"). The HTML attributes which are just a single word like checked, selected, nowrap, multiple, etc. must be written with a name and value as in checked="checked", selected="selected", nowrap="nowrap", multiple="multiple", etc.
– All tag attribute names should be in lowercase. This includes src="", href="", type="", action="", and the handlers onclick="", onsubmit="", etc. Similarly, attribute values that come from a predefined list should be in lowercase. This includes method="post", method="get", type="text", type="password", etc.
– Attribute values should not contain line breaks or extra whitespace.
– Ampersands in attributes and URLs must be written as &amp;. This is actually true of HTML as well, but many Web browsers are forgiving of stray ampersands. For example, the following is a properly written URL with parameters.
<a href="default.lasso?name=value&amp;name2=value2">Link</a>
– The characters < and > should be encoded as &lt; and &gt; if they are not being used as part of tag markup.
– All entities starting with & must be ended by a semicolon. &lt will work in many browsers, but it should always be written &lt; in XHTML.
– In a URL the # character can be used to jump to a named anchor in HTML. In XHTML the name after the # refers to the ID of any element on the page. You can use <a name="target" id="target"></a> to create a target which will work in either HTML or XHTML. Note that IDs should be unique throughout a page.
– HTML entities should be avoided when possible (other than &amp; &lt; &gt;). Instead, the numeric variants should be used. For example, a non-breaking space can be generated with &#160; rather than &nbsp;.
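Several of the differences listed above are XML well-formedness rules, so a quick first check is possible with any XML parser before going to the W3C validator. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the sample fragments are made up for this illustration. Passing this check does not mean a page is valid XHTML, since the parser does not know which tags and attributes the doctype allows and accepts uppercase tag names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Pairs of (HTML habit, XHTML equivalent) based on the list above.
EXAMPLES = [
    # Standalone tags must be closed.
    ("<p>one<br>two</p>", "<p>one<br />two</p>"),
    # Attributes need both a name and a value.
    ('<p><input type="checkbox" checked /></p>',
     '<p><input type="checkbox" checked="checked" /></p>'),
    # Ampersands in URLs must be written as &amp;.
    ('<p><a href="default.lasso?name=value&name2=value2">Link</a></p>',
     '<p><a href="default.lasso?name=value&amp;name2=value2">Link</a></p>'),
    # Named HTML entities are unknown to a plain XML parser.
    ("<p>a&nbsp;b</p>", "<p>a&#160;b</p>"),
]

for html_style, xhtml_style in EXAMPLES:
    print(is_well_formed(html_style), is_well_formed(xhtml_style))
```

In each pair the first fragment is rejected by the parser and the second is accepted.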
XHTML is much stricter about tag placement than HTML. There are two types of tags in HTML: block level and inline. Block level tags define the structure of the page and include <p>, <div>, <pre>, <blockquote>, <h1>, <h2>, etc. Inline tags define the style of text within a block and include <a>, <b>, <img>, <input>, <span>, etc.
The structure of your XHTML document should be a <body> tag which surrounds one or more block level tags. Each block level tag should then contain text and inline tags. The <body> tag should not contain any text or inline tags directly; they should all be surrounded by a block level tag. (The Strict doctype enforces this rule; the Transitional doctype is more lenient.)
A couple of common errors are:
– Placing text or other inline tags within the <body> directly. For example a <br /> cannot be placed between paragraphs to create extra whitespace. Instead use <div><br /></div>.
– Placing an inline tag around one or more block level tags. For example you might use bold <b> tags around a series of paragraph <p> tags, but this is not valid. Instead, you should have each <p> tag contain a <b> tag that surrounds its entire contents.
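The two errors above can be spotted mechanically once a page is well-formed. The following is a rough sketch using Python's standard xml.etree.ElementTree module. The function name nesting_problems and the sample documents are made up for this illustration, the tag lists are deliberately partial (the full lists are defined by the XHTML DTDs), and the sample markup uses no XML namespace.

```python
import xml.etree.ElementTree as ET

# Partial tag lists for the sketch; the XHTML DTDs define the full lists.
BLOCK_TAGS = {"p", "div", "pre", "blockquote", "h1", "h2", "h3", "h4", "h5", "h6",
              "ul", "ol", "table", "form"}
INLINE_TAGS = {"a", "b", "i", "em", "strong", "img", "input", "span", "br"}


def nesting_problems(xhtml):
    """Return a list of human-readable nesting problems found in the document."""
    root = ET.fromstring(xhtml)
    problems = []

    body = root if root.tag == "body" else root.find("body")
    if body is not None:
        # Text directly inside <body> sits before the first child or after any child.
        stray_text = [body.text] + [child.tail for child in body]
        if any(text and text.strip() for text in stray_text):
            problems.append("text directly inside <body>")
        for child in body:
            if child.tag in INLINE_TAGS:
                problems.append(f"inline <{child.tag}> directly inside <body>")

    # Block level tags that are direct children of inline tags.
    for parent in root.iter():
        if parent.tag in INLINE_TAGS:
            for child in parent:
                if child.tag in BLOCK_TAGS:
                    problems.append(f"block <{child.tag}> inside inline <{parent.tag}>")
    return problems


BAD = "<body><p>One</p><br /><b><p>Two</p><p>Three</p></b></body>"
GOOD = "<body><p>One</p><div><br /></div><p><b>Two</b></p><p><b>Three</b></p></body>"

for problem in nesting_problems(BAD):
    print(problem)
print(nesting_problems(GOOD))
```

The sketch only reports problems; the W3C validator remains the authority on whether a page is valid.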
The <form> tag also requires some special treatment. Just like the <body> tag, this tag can only contain block level tags (as with <body>, it is the Strict doctype that enforces this). In particular, placing <input> tags directly within the <form> tag is not allowed. The common practice of placing hidden inputs at the top of a form is not valid in XHTML. Instead, all of the elements within a <form> should be placed in a block level tag such as <p> or <div>.
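It is common to do something like this in HTML:
<form … >
<input type="hidden" name="name" value="value">
…
</form>
but it needs to be written like this in XHTML:
<form … >
<div><input type="hidden" name="name" value="value" /></div>
…
</form>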
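A minimal sketch of how that rewrite could be automated on an existing page follows, assuming the form is already well-formed XML and uses no XML namespace. It uses Python's standard xml.etree.ElementTree module. The function name wrap_form_controls, the list of controls, and the sample form are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Form controls that are inline tags (partial list for the sketch).
INLINE_CONTROLS = {"input", "select", "textarea", "label", "button"}


def wrap_form_controls(form):
    """Move controls that are direct children of <form> into one leading <div>."""
    loose = [child for child in form if child.tag in INLINE_CONTROLS]
    if not loose:
        return form
    wrapper = ET.Element("div")
    for child in loose:
        form.remove(child)
        wrapper.append(child)
    # The wrapper goes first, which suits hidden inputs at the top of a form.
    form.insert(0, wrapper)
    return form


# Hypothetical form with a hidden input placed directly inside <form>.
FORM = (
    '<form action="default.lasso" method="post">'
    '<input type="hidden" name="name" value="value" />'
    '<p><input type="submit" value="Send" /></p>'
    "</form>"
)

converted = wrap_form_controls(ET.fromstring(FORM))
print(ET.tostring(converted, encoding="unicode"))
```

Because all loose controls are gathered into a single <div> at the top, this approach fits hidden inputs best; visible controls scattered through a form would need to be wrapped where they stand.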
Finally, table tags must all be properly nested in XHTML. Every table row, table cell, and table header must have both its opening and closing tags. Most HTML browsers will fix up tables for you by inserting tags automatically where needed (such as if you start a table row or table cell without closing the prior row or cell), but XHTML is strict about every tag having a matching closing tag.
The W3C validator can be found here: <http://validator.w3.org/>
The XHTML 1.0 specification can be found here: <http://www.w3.org/TR/xhtml1/>