Chapter 19 Good HTML
The purpose of a web page is the message. The message of the page is sometimes in the words, sometimes in the images, and sometimes in other elements of the page, but the message is rarely in the code itself. So why bother writing good HTML? The way the message is delivered can have an impact on how it is received. HTML is one link in the chain of media that carries your web-based message. Ask a painter why they choose one type of canvas over another, or ask a musician why they choose a type of string or reed or bow for their instrument. You may not be able to discern what type of strings are on a guitar by listening to a recording (although Bill says he can), but the choice does affect the overall quality of the experience.

Why Write Good HTML?

There are both subjective and objective reasons for writing good HTML. Subjectively, it may or may not be important to you that you do as good a job as possible on every level of every project that you take on. We feel that doing something well is its own reward, but we recognize that it's not always practicable.

On the other hand, there are some very pragmatic reasons to at least make sure that your HTML is correct, in spite of the fact that it may already work. As a practical illustration, here's a page that works fine in browsers that are based on the original NCSA Mosaic (including Microsoft Internet Explorer and older Netscape browsers), but does not work in the current Netscape:

    <HTML>
    <HEAD>
    <TITLE> Bad Table </TITLE>
    </HEAD>
    <BODY BGCOLOR=WHITE>
    <TABLE>
    <TR><TD>
    <H1>This entire page is in a table. </H1>
    </BODY>
    </HTML>

Notice that there is no end tag for the TABLE element. [...] In the case of the missing table end tags, there were a number of web sites that virtually "disappeared" when Netscape 3 was released. A similar problem happened with body backgrounds with the release of Netscape 4 (see the example later in this chapter).

HTML Terminology

Probably the single most important thing you can learn about HTML is the distinction between tags, attributes, containers, and elements. Once you understand these terms, it will be much easier for you to tell when your code is correct. Here's what they mean:

[...]

What You See AIN'T What You Get

WYSIWYG editors are a wonderful invention, and we encourage you to use them for prototyping your web sites. A WYSIWYG editor can greatly reduce the time it takes you to lay out, view, and re-lay out your site while you are designing it. But for production work, we implore you to be careful. An excellent example of the problem is the "disappearing background" problem that happened with the release of Netscape 4.

The HTML specification allows for one BODY element per document. However, there are evidently some WYSIWYG editors that don't follow this rule. We have seen a number of sites with two or more BODY tags. [...]

    <HTML>
    <HEAD>
    <TITLE> Bad Body </TITLE>
    </HEAD>
    <body>
    <BODY background=white.gif>
    <H1>This document has two BODY tags. </H1>
    </BODY>
    </HTML>

Later releases of Navigator 4 (beginning with 4.03) accumulate attributes from BODY tags. But you really can't count on a browser guessing what your HTML means when it's not correct. For instance, Mosaic 3.0 (the last version) also shows a gray background for this error. The best defense is good HTML.

Cleaning Up After a WYSIWYG Editor

As an example of the sorts of things you need to watch out for with your WYSIWYG editors, I have created a little page using Allaire's HomeSite. Here's a screenshot of the page in the editor:

[Figure: the page in the HomeSite editor]

Now here's what it looks like in Netscape Navigator:

[Figure: the same page in Netscape Navigator]

Notice anything different? Let's look at the code and see if we can fix it up.

    <!-- This document was created with HomeSite 2.5 -->
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    <HTML>
    <HEAD>
    <TITLE>Test Page</TITLE>
    </HEAD>
    <BODY BACKGROUND="/usr/BILL/htmlbook/working/ch19/lgreentile.gif" TEXT="Navy" LINK="Olive" VLINK="#999933" ALINK="Silver">
    <TABLE BORDER=0 CELLSPACING=8 CELLPADDING=5 VALIGN="TOP" BGCOLOR="#CCFF99" WIDTH=350>
    <TR>
    <TD>Something here</TD>
    <TD>Something Else</TD>
    </TR>
    <TR>
    <TD>Something New</TD>
    <TD>Something Blue</TD>
    </TR>
    <TR>
    <TD>Other things</TD>
    <TD>Things X, Y, and Z</TD>
    </TR>
    <TR>
    <TD>The Cat in the Hat</TD>
    <TD>Dr Seuss' Toothbrush</TD>
    </TR>
    </TABLE>
    <P> Here's a paragraph created in Home Site. It has <B>bold and <I>italic text in it.</I></B></P>
    </BODY>
    </HTML>

The most glaring problem in the HTML above is that the background image didn't show up in the browser (even though it was fine in the editor's preview screen). Notice that the URL for the BACKGROUND attribute is an absolute path on the local machine. [...]

The point here is for you to expect flaws in the code that the editor puts out. Always expect to have to fix the code that an automated tool generates. Some people say that the tools will get better, and that's probably true. But the fact remains that after 20 years of trying, there are still no automated tools for any programming language that do as good a job as a careful human. The promise of artificial intelligence that can better a human's creative efforts is yet to be realized. We don't expect that overall situation to change any time soon.

We also noticed that the tool doesn't break its lines to fit an 80-column screen (this is important for those of us who use multiple platforms to work on the same files), and the use of tabs for indenting is also not portable. Again, these are easy problems to fix, but they require effort. Always prepare for more complicated pages to have more complicated problems.

As a rule, we feel that the WYSIWYG editors are excellent tools for prototyping (indeed, we use them as such), but not for production use. If you must create and maintain a large and complex web site with constantly updated information (like a large news or periodical site), we recommend that you either create custom tools for that particular site (as most of the major sites do) or retain the services of a programmer to do that for you. For large one-time sites that won't change much over time, you can prototype with your WYSIWYG editor and then modify or rewrite the code by hand to make it correct.

Common HTML Gotchas

There are many HTML "gotchas" that we see a lot on the web. Of course, each of us has our own peculiar predilection for error, and as such, our problems will not always fit nicely into a preordained list. But we've compiled a short list of the most frequent HTML problems we see on public web pages, which you may want to watch out for anyway.

What's in a Quote?

Quotation marks (either double " or single ') are used in HTML to contain the values of some attributes. When do you need to use quotes? If all the characters in the value are letters (a-z and A-Z), numbers (0-9), periods (.), or hyphens (-), you don't need to use quotes. If the value contains any other character, you need to use quotes. When in doubt, use the quotes. They can't hurt.
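
The rule above is mechanical enough to check with a few lines of code. Here is a minimal sketch in Python; the function names `needs_quotes` and `format_attribute` are ours, and treating an empty value as one that needs quotes is an assumption of this sketch rather than something stated above. It does not try to escape a value that itself contains a double quote.

```python
import re

# Characters the rule allows in an unquoted attribute value:
# letters, digits, periods, and hyphens.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9.-]+")


def needs_quotes(value: str) -> bool:
    """Return True if an attribute value should be wrapped in quotes."""
    # An empty value never matches, so it is reported as needing quotes
    # (an assumption of this sketch).
    return _SAFE_VALUE.fullmatch(value) is None


def format_attribute(name: str, value: str) -> str:
    """Render NAME=value, adding double quotes only when the rule requires them."""
    if needs_quotes(value):
        return f'{name}="{value}"'
    return f"{name}={value}"
```

For example, `format_attribute("BGCOLOR", "white")` leaves the value bare, while a value containing a slash or colon, as most URLs do, comes back quoted.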

The most common type of value that requires quotes, and often doesn't have them, is the URL (for example, [...]).

Hanging Quotes

On the other hand, you have to use your quotes in matching pairs! For example, this doesn't work well:

    <HTML>
    <HEAD>
    <TITLE> Bad Quotes </TITLE>
    </HEAD>
    <BODY BGCOLOR=white>
    <P>This is a <a href="link.html>link</a> with a missing quote.
    <P>You won't see any of this text until <a href="link.html">after</a> this other link.
    </BODY>
    </HTML>

Notice the missing quote in the first link. You don't see it? Look here then. [...]

The folks at Netscape gave us this handy-dandy missing quote finder in their View:Source menu, starting with version 3. When you view the source of a document with a missing quote, all the text that's affected will be highlighted and blinking. Try this for yourself: find the bad-quote.html file in the chap19 folder of the CD-ROM and look at it in Netscape Navigator. Be sure to select View:Source. See it blink? Tell a friend.

Straddling Containers

Considering the fact that a container--along with all of its content--is a single distinct element, it is reasonable that one container can have other containers as part of its content. That's why you can write something like this:

[...]

In this perfectly legal example, the [...]

Now consider this example:

[...]

Here we decided to end the [...]

It is perfectly legal to have one element contain another element, as long as the inner element is valid content for the outer element. But it is not legal to have two elements straddle each other. As with many common HTML errors, this may work in some browsers today, and it may not work in later versions of those same browsers.

Line Endings

Unless you are actually trying to make your HTML unreadable (some people actually want to make it a little tougher to "steal" their code), you should keep your lines under 80 characters wide (75 is a good rule of thumb). That makes it easier to view your source code in the browser and to work on it on the widest possible variety of platforms. You should also set your editor to use UNIX line-endings, especially if your server runs under UNIX. There are three different types of line-endings:

    UNIX: a line feed (LF)
    Macintosh: a carriage return (CR)
    DOS and Windows: a carriage return followed by a line feed (CR LF)

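
If your editor has no such setting, converting a file to UNIX line-endings and checking its line widths are both easy to script. The following is a minimal sketch in Python, assuming the file is small enough to hold in memory; the function names `to_unix_line_endings` and `lines_over` are ours, and the default limit of 75 is the rule of thumb mentioned above.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CR LF (DOS/Windows) and lone CR (Macintosh) endings to LF (UNIX)."""
    # Replace CR LF pairs first so the lone-CR pass does not turn
    # one line break into two.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def lines_over(text: str, limit: int = 75) -> list:
    """Return the 1-based numbers of lines longer than `limit` characters."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

Working on bytes rather than text keeps the conversion independent of the file's character encoding.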

The line-endings are invisible to you, but visible to your web server and many HTML editors. You will probably find the setting for UNIX line-endings in the Preferences menu of your HTML editor or word processor.

Entities vs. Numbers vs. Embedded Characters

HTML uses something called "entities" for characters outside of the normal English alphanumeric character set (there is a complete list of them in the HTML 4.0 Reference chapter). Named entities (e.g., [...]

Color Names not Browser-Safe

Remember that the named colors (e.g., [...]

Empty ALT Attributes

The [...]

Case-Sensitive File Names

Most web servers run under UNIX, which uses case-sensitive file names. Most web authors use Mac or PC platforms, which do not. That means that if you have a file named [...]

Relative vs. Absolute Links

Always use relative links when possible. (See Chapter 12, "Organization".) Absolute links will become a major headache for you when you eventually have to move your site to another machine, or even just another folder on the same machine. (Some WYSIWYG editors use absolute links by default.)

Chapter 19 Summary

Writing good HTML is not required. No one is going to force you to do it, and most people won't even notice if you don't. But it's a discipline that will serve you well in the long run. It will make life easier for you when new tools and browsers are released and whenever you need to make substantial changes to your site (which will likely be more often than you plan for).

In this chapter, you have seen some of the common problems with incorrect HTML, and how to correct them when you encounter them. We encourage you to use the HTML reference that accompanies this book (it's on the CD-ROM, and available as a printed booklet) as an authoritative source of correct HTML syntax.