
UNIX Unleashed, Internet Edition

- 15 -

HTML--A Brief Overview

by David B. Horvath, CCP

The information superhighway is often mentioned in the mainstream media these days. When the media use that term, they are often referring to the World Wide Web, and they describe it as if it were something being hardwired together. In reality, the World Wide Web (often referred to as just "the Web") is a collection of systems on the Internet that run software that communicates using a common protocol.

This may sound like a description of the Internet in general, because most systems on the Internet use a common communications protocol (TCP/IP), and the model is indeed similar. But instead of people having to write down or remember the addresses, locations, or names of the resources they need, the software provides the links.

The user starts at one location and then connects to other locations and resources. Three categories of software are required to perform these tasks: the server (providing the information), the Web page, and the client software (known as a browser). Major corporations run their own Web servers; smaller companies and individuals use Internet Service Providers (ISPs) to hold their Web pages. The Web browser, which is the client portion of the equation, can be GUI (most are) or CUI (Character User Interface, most frequently used by UNIX users).

It is the Web page that provides the programming flexibility of the Web itself. Although the language used to write Web pages looks complex at first, new material can be created quite easily and modified quickly. With ISPs providing inexpensive or even free Web services to their customers, many people are setting up their own pages. The top-level Web page of an individual, company, or organization is referred to as the home page because it is the starting point when looking at their Web pages. Each Web page can contain many links or connections to other Web pages and resources.

What Are URLs?

The links between Web pages (or means of accessing resources through the Web) are made through the Uniform Resource Locator (URL for short). The URL specifies the protocol, user name and password (often omitted), system name, location, and name of the desired file. When working with a Web page, the typical URL looks like the following:

http://www.host.domain/directory/file.html

Several protocols are available, as shown in Table 15.1:

Table 15.1. Available World Wide Web protocols.

Protocol   Description
file       Get a file on the current (client) system
ftp        File Transfer Protocol
gopher     Gopher information service protocol (superseded by http)
http       HyperText Transfer Protocol
mailto     Send e-mail
news       Network News Transfer Protocol (NNTP)
telnet     Terminal session communications

With the exception of the http protocol, these have been available on the Internet for several years. Only http is new with the Web.

Chapter 21, "Introducing HyperText Transfer Protocol (HTTP)," provides much more detail on http itself.

What Is Hypertext?

Hypertext is the description applied to any document that contains links to other portions of the document or to other documents. Instead of reviewing the document in a linear manner (reading a book from beginning to end), it is possible to jump around to other areas. Normal documents often have hypertext-like entries; the reference to Chapter 21 (for more information on http) in the previous section is a link to another portion of this book. The primary difference between a reference and a hypertext link is the effort involved in getting to the other area.

With book references, it is up to the user to find the page that the reference is on (through the table of contents or index), and then physically move to it. With hypertext links, the link is executed (by selecting it via mouse or hotkey), and the software gets the material for the reader.

With many tools, you are able to jump to new material via the hypertext link and then back to your original location. With a book, you have to keep your finger or a bookmark at the original location.

Hypertext does not provide any new capability; it just makes an existing capability much easier to take advantage of.

Description of HTML

The programming of individual Web pages is done through HTML (HyperText Markup Language), which is based on SGML (Standard Generalized Markup Language). The HTML code describes what the page should look like to the client software (Web browser) and describes links to other pages.

The language itself defines a set of codes or tags (requests in troff terminology) that tell the Web browser how to display text, images, and links. Like troff requests, HTML tags are ASCII text. The language standard provides guidelines on how these items should be displayed, but it is up to the client software to determine the final form.

When coding HTML, you will encounter WYSIPWYG (What You See Is Probably What You Get). When working with a GUI-based word processor, you have the ability to work in WYSIWYG (What You See Is What You Get) mode--the image on the screen is exactly how it will appear on paper. Because individual Web browsers interpret HTML slightly differently, the results will vary between products. The HTML specifications provide only general guidelines on displaying elements, so there can be wide variation.

The HTML language elements, also known as markup tags or just tags, begin with the less than symbol (<) and end with the greater than symbol (>). Immediately following the less than symbol is the command name (which is not case sensitive). Many commands are followed by attributes and assigned values. Be careful with the assigned values because they may be case sensitive.
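As a small illustration, assume a server holds a file named Logo.gif (the file names here are hypothetical, and the <IMG> tag itself is covered later in this chapter). The first two tags below are equivalent because tag and attribute names are not case sensitive. The third tag may refer to a different file, because many servers, including UNIX systems, treat file names as case sensitive:

```html
<!-- Equivalent: tag and attribute names ignore case -->
<IMG SRC="Logo.gif">
<img src="Logo.gif">

<!-- Possibly a different file: on a UNIX server the value
     "logo.gif" does not match the file Logo.gif -->
<img src="logo.gif">
```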

The tags describe document elements (document parts or sections). Just as the pic request .PS requires a matching .PE, some tags require a closure tag; others do not. A closure tag consists of the less than symbol, a slash (/), the command name without any attributes, and the greater than symbol. When working with tags that require closure, be very careful when nesting them, because a closure tag closes the most recently opened command of that type.
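The following fragment is a minimal sketch of the nesting rule, using arbitrary text. In the correct form, the inner tag is closed before the outer one. In the nested list, each </UL> closes the most recently opened list:

```html
<!-- Correct: <EM> is closed before the enclosing <STRONG> -->
<STRONG>Important, <EM>and emphasized</EM></STRONG>

<!-- Incorrect: the tags overlap, so results vary by browser -->
<STRONG>Important, <EM>and emphasized</STRONG></EM>

<!-- Nested tags of the same type -->
<UL>
<LI> Outer item
<UL>
<LI> Inner item
</UL>  <!-- closes the inner (most recent) list -->
<LI> Another outer item
</UL>  <!-- closes the outer list -->

<!-- <HR> takes no closure tag -->
<HR>
```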

Some of the elements include:

Notice that the <title> tag requires a closure tag in the form of </title>. The <hr> (horizontal rule) tag does not.

There are several versions of HTML. The original was, of course, version 1. Every browser available should be able to recognize version 1 HTML elements. All but the oldest browsers will support version 2 elements. Most browsers should support version 3, which introduces HTML elements to support tables. As with any standard, it is always evolving and growing.

Several browser vendors (Netscape and Microsoft, for example) have added their own non-standard elements to HTML. When a Web page is coded using the extensions of a particular browser, you will often see a message similar to:

This page optimized for the XYZ browser.

The message is often followed by a graphical representation of the browser's trademark.


NOTE: Most Web browsers simply ignore any HTML tags that they do not recognize. If you code a tag incorrectly or use a newer HTML version than the browser supports, you will get odd results but no error message. If you are unlucky, the Web browser itself will crash. Some tools can verify the syntax of your HTML code.

My personal suggestion is that you code for the majority of Web browsers so that the most people can view your page. The official standard is maintained by the World Wide Web Consortium. You can get more information on standard HTML at the following Web page:

http://www.w3.org/

Using a Web Browser

Your operating system may come with a Web browser, you may have received a copy with other software, or you may have to download one from the Internet. Whatever the source, browsers fall into two basic types: GUI and CUI.

When the Web began, most of the users were connected through UNIX systems with character (or text) interfaces. This precluded the use of pretty graphics to represent links and limited the way that text could be represented. As usage has grown, the majority of users now have GUI interfaces that provide much more capability.

The individual Web browsers all behave a little differently, so you will have to learn how yours works. In general, they all have a location for you to enter a URL and provide some status information on the transfer of data between the host and your client. A good place to start is the home page for your browser. Most browsers have a button or menu option that will fill in the URL for you and go right to that page.

Most will also have a back button or menu option. This should take you to the page you previously visited. This is equivalent to your finger in the book when you look at another section. Most browsers will support multiple levels of previous pages so you can follow a link completely away from your original location and get back there again.

As shown in the section for URLs, one of the types is file. By using this type, you can create HTML files on your client system and look at them before placing them on a server for the world to see.

Some vendors are taking advantage of the file URL type when distributing documentation or other materials (sales literature, for instance). Instead of having to provide a tool for you to look at their information or coding to a proprietary standard (like the Microsoft Windows help facility), they code in HTML. To use their documents, you start up your Web browser and point to their files.

Your machine is not cluttered with different viewers and the vendor's material can be viewed on many different types of machines.

Coding HTML

Coding HTML documents has traditionally been a manual process, just as with troff. With the increased popularity (consumer demand) and business use of the Web, GUI-based Web authoring tools have become available. Although these tools are available and relatively inexpensive (often free or included with other software), there is still value in being able to code basic HTML. After all, even though GUI word processors exist, troff is still used in some applications.

This chapter provides an introduction to HTML only--it covers the important language elements and provides examples of their usage.


NOTE: In general, you can give your HTML files any name, but they should have a suffix of .htm or .html. Check with your system administrator for the location in which to place your Web pages; most servers look for them in a directory called public_html under your home directory. If you want people to be able to get your top-level page automatically, name it index.htm (or index.html) or welcome.htm (or welcome.html). Your system administrator can tell you the exact form the name should take.

See the section on GUI tools later in this chapter for more information.

A Minimal HTML Document

The minimum reasonable HTML document contains four elements: <html>, <head>, <title>, and <body>.

Figure 15.1 shows the output of the minimal HTML document using the Mosaic Web browser from NCSA (the National Center for Supercomputing Applications). Listing 15.1 shows the source for it.

Figure 15.1.
Minimal HTML document viewed through Mosaic.


NOTE: You will notice that the activity indicator (the square, postage-stamp-sized box near the upper-right corner of the browser) is black in the Mosaic examples. This is because I ran the browser locally (on my PC) instead of over a network connection. It was much faster that way, and the results are the same.

The activity indicator in the Netscape examples shows the AT&T "World" logo instead of the Netscape "N" logo because I use a version from the AT&T Worldnet service (the software and the service were free).


Listing 15.1. Source for minimal HTML document.

<html>
<head>
<title> This is the Title </title>
</head>
<body>
This is the body of the text
</body>
</html>

The text enclosed in the <title> tag is displayed at the top of the window. There may be only one title; if you include more than one in the <head> section, usually only the last one will actually be displayed. The block contained within the <head> tag is used to set up the document and show the title. The block contained within the <body> tag is where most tags and text are placed.

As you see from the URL in the figure, this HTML document was displayed from a file on my system; it was not placed on a Web server for the world to see.

This minimal HTML document demonstrates the parts of a document, but it is not very useful on its own. A real page requires many more tags and much more text.

Font Control

Within the body of the document, you can control the fonts in which your text is displayed. To start off, six levels of headings are available, specified using the tags <h1> through <h6>.

Figure 15.2 shows the behavior of the heading tags using the Mosaic Web browser. Listing 15.2 shows the source for it.

Figure 15.2.
Heading tags viewed through Mosaic.

Listing 15.2. Source for heading tags.

<html>
<head>
<title> Heading Font Control </title>
</head>
<body>
<H1> Heading Level 1 - ABCDEF abcdef &lt;H1&gt; </h1>
<H2> Heading Level 2 - ABCDEF abcdef &lt;H2&gt; </h2>
<H3> Heading Level 3 - ABCDEF abcdef &lt;H3&gt; </h3>
<H4> Heading Level 4 - ABCDEF abcdef &lt;H4&gt; </h4>
<H5> Heading Level 5 - ABCDEF abcdef &lt;H5&gt; </h5>
<H6> Heading Level 6 - ABCDEF abcdef &lt;H6&gt; </h6>
<H7> Heading Level 7 - ABCDEF abcdef &lt;H7&gt; </h7>
<H8> Heading Level 8 - ABCDEF abcdef &lt;H8&gt; </h8>
<H9> Heading Level 9 - ABCDEF abcdef &lt;H9&gt; </h9>
<H10> Heading Level 10 - ABCDEF abcdef &lt;H10&gt; </h10>
</body>
</html>

Looking at Figure 15.2, you will notice that the lines start to get weird after heading level 6. When you go beyond what the standard allows, things can get odd.

Because the less than and greater than signs have special meaning in HTML, you must use special character representations to print them. These take the form of an ampersand (&), followed by a mnemonic, followed by a semicolon (;) to complete the special character. In Listing 15.2, &lt; and &gt; were used. To print an ampersand itself, you would use &amp;.
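For example, in the minimal sketch below the browser should display the first line as 5 < 7 & 7 > 5. The second line should show the literal text <H1> rather than starting a heading:

```html
<!-- Each special character is written as &mnemonic; -->
<P>5 &lt; 7 &amp; 7 &gt; 5
<P>The tag &lt;H1&gt; starts a level 1 heading.
```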

Using a version of Netscape Navigator, the same source produces a slightly different screen, as shown in Figure 15.3.

Figure 15.3.
Heading tags viewed through Netscape Navigator.

For the other fonts, there are logical and physical style tags. With logical font tags, it is up to the browser to decide how to display them. Logical tags include <EM> for emphasis (usually displayed in italics), <STRONG> for important text (usually displayed in bold), and others.

Figure 15.4 shows the behavior of the logical font style tags using the Mosaic Web browser. Figure 15.5 shows the behavior of the logical font style tags using the Netscape Navigator browser. Listing 15.3 shows the source for it.

Figure 15.4.
Logical font styles viewed through Mosaic.

In Figure 15.4, it is not very obvious that the different font types are really different. The differences are much more obvious in Figure 15.5, because Netscape Navigator supports these styles better.

Figure 15.5.
Logical font styles viewed through Netscape Navigator.

Listing 15.3. Source for logical font styles.

<html>
<head>
<title> Logical Font Styles  </title>
</head>
<body>
<ADDRESS> Postal or E-mail address - ABCDEF abcdef &lt;ADDRESS&gt;
</address> <br>
<CITE> Citations  - ABCDEF abcdef &lt;CITE&gt; </cite> <br>
<CODE> Program Code - ABCDEF abcdef &lt;CODE&gt; </code> <br>
<EM> Emphasis - ABCDEF abcdef &lt;EM&gt; </em> <br>
<KBD> Keyboard Input - ABCDEF abcdef &lt;KBD&gt; </kbd> <br>
<SAMP> Literal (Sample) Characters - ABCDEF abcdef &lt;SAMP&gt;
</samp> <br>
<STRONG> Strong or Important - ABCDEF abcdef &lt;STRONG&gt;
</strong> <br>
<VAR> Variable Name - ABCDEF abcdef &lt;VAR&gt; </var> <br>
</body>
</html>

In Listing 15.2, all of the headings except the invalid ones appeared on their own lines (by definition, a heading gets its own line). Font style tags do not start new lines on their own, so it is necessary to tell the browser to go to a new line with the <br> (line break) tag.

Physical tags include <i> for italics, <b> for bold, and others.

Figure 15.6 shows the behavior of the physical font style tags using the Mosaic Web browser. Figure 15.7 shows the behavior of the physical font style tags using the Netscape Navigator browser. Listing 15.4 shows the source for it.

Figure 15.6.
Physical font styles viewed through Mosaic.

Mosaic supports the standard physical font styles, but treats the Netscape extensions as plain text. Netscape Navigator supports the standard physical font styles and its own extensions. Although not obvious from the screen in Figure 15.7, the <BLINK> tag line does actually blink.

Figure 15.7.
Physical font styles viewed through Netscape Navigator.

Listing 15.4. Source for physical font styles.

<html>
<head>
<title> Physical Font Styles  </title>
</head>
<body>
<B> Bold - ABCDEF abcdef &lt;B&gt; </b> <br>
<I> Italics  - ABCDEF abcdef &lt;I&gt; </i> <br>
<S> Strike Out - ABCDEF abcdef &lt;S&gt; </s> <br>
<U> Underline - ABCDEF abcdef &lt;U&gt; </u> <br>
<TT> Typewriter Text - ABCDEF abcdef &lt;TT&gt; </tt> <br>
<BLINK> Blink (Netscape extension) - ABCDEF abcdef &lt;BLINK&gt;
</blink> <br>
<FONT SIZE=1> font size 1 (Netscape Extension) - ABCDEF abcdef &lt;FONT SIZE=1&gt;
</FONT> <br>
<FONT SIZE=3> font size 3 (Netscape Extension) - ABCDEF abcdef &lt;FONT SIZE=3&gt;
</FONT> <br>
<FONT SIZE=5> font size 5 (Netscape Extension) - ABCDEF abcdef &lt;FONT SIZE=5&gt;
</FONT> <br>
<FONT SIZE=7> font size 7 (Netscape Extension) - ABCDEF abcdef &lt;FONT SIZE=7&gt;
</FONT> <br>
</body>
</html>

Physical font styles can be combined to produce multiple effects like bold italics or bold underlined.
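Here is a minimal sketch of such combinations. Each inner tag is closed before the tag that encloses it:

```html
<!-- Nest physical style tags; close them in reverse order -->
<B><I> Bold italics </I></B> <br>
<B><U> Bold underlined </U></B> <br>
<B><I><U> Bold, italic, and underlined </U></I></B> <br>
```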

Formatting Text

When text appears in an HTML document, the browser decides how to display it. You can control the fonts, and you can also control how the text is formatted. By default, you enter your text free-form, and the browser automatically wraps it to fit the window.

A new paragraph starts with the <P> tag, and if you want to force a line break, you use the <br> tag. The browser decides how to format the text except that it always starts a new paragraph at the beginning of a line (with a blank line above it) and will start text on a new line (without a blank line above it) when you use the line break.

If you have text that is a quotation, put it between <blockquote> tags--it will normally appear indented, the same way that quotations appear in books. If you have text that requires very specific formatting, you can contain it within a <pre> (preformatted) block--it will appear the way you entered it.

Figure 15.8 demonstrates these text formatting tags with the Mosaic Web browser. Figure 15.9 shows the same HTML document with the Netscape Navigator browser. Listing 15.5 shows the source for it.

Figure 15.8.
Text formatting tags viewed through Mosaic.

Mosaic does not support the <blockquote> tag and could not fit the first paragraph entirely on the first line. Netscape Navigator handled these properly.

Figure 15.9.
Text formatting tags viewed through Netscape Navigator.

Listing 15.5. Source for text formatting tags.

<html>
<head>
<title> Text Formatting </title>
</head>
<body>
<p>This is normal text that was typed in on two lines. It will show
as one line if the window is wide enough
<p>This paragraph breaks right here <br> and then continues on the next
line.
<p>A horizontal rule (line) appears below this line <hr> and above this one.
<p><address> this is an address field that is its own paragraph
<br> that takes 2 lines </address>
<p><blockquote> This text is treated as a block quote and is usually
indented </blockquote>
<p><pre>   This text is preformatted.
   I've put 3 spaces before each of 2 lines.
</pre>
</body>
</html>

The heading and paragraph tags were extended as part of HTML version 3. In the new version, the text can be aligned to the left (default), center, or right. Netscape also supports the <center> tag to center text.

Figure 15.10 demonstrates the extended text formatting tags using the Mosaic Web browser. The Netscape Navigator browser behaves the same way and is not shown. Listing 15.6 shows the source for it.

Figure 15.10.
Extended text formatting tags viewed through Mosaic.

Listing 15.6. Source for extended text formatting tags.

<html>
<head>
<title> Extended Text Formatting </title>
</head>
<body>
<h5 align=left> This heading is left aligned </h5>
<h5 align=center> This heading is centered </h5>
<h5 align=right> This heading is right aligned </h5>
<p align=left> This text is left aligned <br> even on a second line </p>
<p align=center> This text is centered <br> even on a second line</p>
<p align=right> This text is right aligned <br> even on a second line</p>
<center> This text is centered <br> on multiple lines using the <br>
Netscape extensions </center>
</body>
</html>

Lists

HTML supports the following five different types of lists: unordered, ordered, directory, menu, and glossary (or definition) lists.

With the exception of glossary (or definition) lists, each element within the list is specified by the <li> tag (list item).

Unordered lists are specified using the <ul> tag and appear with bullets. At the end of the list, the </ul> tag is used. If another <ul> tag is coded within an unordered list, another level of list will be created (an indented sublist). The bullets used for sublists may be the same as or different from those in the list above them.

Ordered lists are specified using the <ol> tag and are sequentially numbered. At the end of the list, the </ol> tag is used. If another <ol> tag is coded within an ordered list, another level of list will be created (an indented sublist). The numbering sequence starts over for each sublist.

Unordered lists can contain ordered lists and vice versa.

Figure 15.11 demonstrates the unordered and ordered lists with the Mosaic Web browser. Figure 15.12 shows the same HTML document with the Netscape Navigator browser. Listing 15.7 shows the source for it.

Figure 15.11.
Unordered and ordered lists viewed through Mosaic.

Mosaic uses the same bullets at all levels of the unordered list while Netscape Navigator uses different ones.

Figure 15.12.
Unordered and ordered lists viewed through Netscape Navigator.

Listing 15.7. Source for unordered and ordered lists.

<html>
<head>
<title> Ordered and Unordered Lists </title>
</head>
<body>
<h5> Unordered List </h5>
<ul>
<li> First Item
<li> Second Item
<ul>
<li>First sub-item under Second Item
<li>Second sub-item under second item
</ul>
<li> Last Item
</ul>
<h5> Ordered List </h5>
<ol>
<li> First Item
<li> Second Item
<ol>
<li>First sub-item under Second Item
<li>Second sub-item under second item
<ul>
<li>Unordered list within ordered sub-list
</ul>
</ol>
<li> Last Item
</ol>
</body>
</html>

Directory lists are specified using the <dir> tag and appear with bullets. At the end of the list, the </dir> tag is used. If another <dir> tag is coded within a directory list, another level of list will be created (an indented sublist). The bullets used for sublists may be the same as or different from those in the list above them.

Menu lists are specified using the <menu> tag and typically appear as a compact list with one line per item. At the end of the list, the </menu> tag is used. In some versions, when another <menu> tag is coded within a menu list, another level of list will be created (an indented sublist). The bullets used for sublists may be the same as or different from those in the list above them.

When working with directory lists and menus, the behavior between browsers differs greatly. You may not be able to nest these lists and the display format can vary (menu list lines often have the bullet omitted).

Figure 15.13 demonstrates the directory and menu lists with the Mosaic Web browser. Figure 15.14 shows the same HTML document with the Netscape Navigator browser. Listing 15.8 shows the source for it.

Figure 15.13.
Directory and menu lists viewed through Mosaic.

Notice the difference in fonts and the fact that the Mosaic menu list only contains one level (no sublists allowed).

Figure 15.14.
Directory and menu lists viewed through Netscape Navigator.

Listing 15.8. Source for directory and menu lists.

<html>
<head>
<title> Directory and Menu Lists </title>
</head>
<body>
<h5> Directory List </h5>
<dir>
<li> First Item
<li> Second Item
<dir>
<li>First sub-item under Second Item
<li>Second sub-item under second item
</dir>
<li> Last Item
</dir>
<h5> Menu List </h5>
<menu>
<li> First Item
<li> Second Item
<menu>
<li>First sub-item under Second Item
<li>Second sub-item under second item
<ul>
<li>Unordered list within ordered sub-list
</ul>
</menu>
<li> Last Item
</menu>
</body>
</html>

Glossary or definition lists are specified using the <dl> tag. At the end of the list, the </dl> tag is used. Each item within the list can consist of two parts: item being defined (specified with the <dt> tag) and the definition (specified with the <dd> tag). Like the unordered list, you can create sub-definition lists by coding another <dl> tag within an existing glossary list.

Figure 15.15 demonstrates the glossary or definition list with the Mosaic Web browser. The behavior of the Netscape Navigator browser is similar. Listing 15.9 shows the source for it.

Figure 15.15.
Glossary or definition list viewed through Mosaic.

Listing 15.9. Source for glossary or definition list.

<html>
<head>
<title> Definition or Glossary List </title>
</head>
<body>
<h5> Definition or Glossary List </h5>
<dl>
<dt> Item to be defined
<dd> This is the definition for the item being defined.  See how the two are
separated to provide visual information when viewing them.
<dl>
<dt>Item to be defined in a sub-list
<dd>Definition of sub-list item.
</dl>
<dt> Last Item
<dd> last definition.
</dl>
</body>
</html>

Extensions to Lists

Netscape provides a number of extensions to ordered and unordered lists. The bullet type can be specified for an entire unordered list and for individual items at a specific sublist level. The numbering type can likewise be specified for an entire ordered list and for individual items at a specific sublist level. The starting point for an entire ordered list, and for each sublist, can also be specified.

Figure 15.16 demonstrates the Netscape extensions to unordered and ordered lists. The Mosaic browser is not shown because it ignores the extensions. Listing 15.10 shows the source for it.

Figure 15.16.
Netscape extensions to ordered and unordered lists.

Listing 15.10. Source for Netscape extensions to ordered and unordered lists.

<html>
<head>
<title> Netscape Extensions - Unordered and Ordered Lists </title>
</head>
<body>
<h5> Unordered List </h5>
<ul type=square>
<li> First item
<li type=disc> second item (disc)
<li> Third Item
<ul type=circle>
<li>Sub list (circle)
</ul>
<li> last item in unordered list
</ul>
<h5> Ordered List </h5>
<ol type=A>
<li> First item (upper case letters)
<li type=a> second item (lower case letters)
<li> Third Item
<ol type=I>
<li>Sub list (Upper case roman)
<li type=i> second sub list item (lower case roman)
</ol>
<li type=1 value=9> another item in ordered list (numeric)
<li value=7> last item in ordered list
</ol>
</body>
</html>

Notice that the bullet type can be changed for an entire unordered list or sublist (instead of the default for that level) and for each individual line item. The number type can be changed for an entire ordered list or sublist (instead of the default numeric type) and for each individual line item. The numeric value can also be changed in the same way for ordered lists, even if the result does not make sense, as the preceding example shows: item 9 appears before item 7 when they should really be items 4 and 5, respectively.

Hypertext Tags

In addition to displaying text, the capability to link to other objects, Web pages, or resources is what provides the power behind HTML. Links can take two forms: anchors and images. Anchors are used to provide the actual hypertext links that turn the World Wide Web into a web, which is a collection of interconnected resources. Image tags allow you to load images into your Web page for people to view, adding pictures and drawings to the text.

Anchors

Anchors are used to connect an image or textual description with an action. The action can be any URL, including a jump to a marked section within a document. When the user clicks the associated image or text, the browser follows the link to that URL.

There are 10 possible actions associated with anchors:

1. Transfer to a new HTML document. <A HREF="http://www.dca.net/"> Text that describes the link</A>

2. Create a positional marker in an HTML document. <A NAME="Section1">This is a positional marker</a>

3. Jump to a positional marker in the current HTML document. <A HREF="#Section1">Go to Section 1</a>

4. Jump to a positional marker in a new HTML document. <A HREF="http://www.host.domain/page.html#Section1">Go to Section 1</a>

5. Get an image file. <A HREF="http://www.host.domain/file.gif">Display the picture</a> (This anchor can be used to load other types of files including sound, video, and executable code.)

6. Create a telnet (terminal emulation) session. <A HREF="telnet://host.domain">Log into host.domain</a>

7. Create an ftp (file transfer protocol) session. <A HREF="ftp://ftp.host.domain">Use FTP to get files</a>

8. Create a gopher (resource search utility) session. <A HREF="gopher://gopher.host.domain">Use Gopher to find files</a>

9. Create an e-mail message. <A HREF="mailto:name@host.domain">Send mail to the webmaster</a>

10. Load a file from the disk attached to the client system. <A HREF="file:///c:/directory/file.ext">Look at file.ext on this machine</a>

These anchors are shown in Figure 15.17 using the Mosaic Web browser. The behavior of the Netscape Navigator browser is similar. Listing 15.11 shows the source for it.

Figure 15.17.
Anchors viewed through Mosaic.

Listing 15.11. Source for anchors.

<html>
<head>
<title> Anchors </title>
</head>
<body>
<A HREF="http://www.dca.net/"> Text that describes the link</A> <br>
<A NAME="Section1">This is a positional marker</a> <br>
<A HREF="#Section1">Go to Section 1</a> <br>
<A HREF="http://www.host.domain/page.html#Section1">
Go to Section 1 in another document</a> <br>
<A HREF="http://www.host.domain/file.gif">Display the picture</a> <br>
<A HREF="telnet://host.domain">Log into host.domain</a> <br>
<A HREF="ftp://ftp.host.domain">Use FTP to get files</a> <br>
<A HREF="gopher://gopher.host.domain">Use Gopher to find files</a> <br>
<A HREF="mailto:name@host.domain">Send mail to the webmaster</a> <br>
<A HREF="file:///c:/directory/file.ext">
Look at file.ext on this machine</a> <br>
</body>
</html>

The individual anchors with HREF parameters are usually displayed in a color--frequently blue when the page is loaded, green after the link has been exercised, and red if there is an error executing it.

You can apply other tags within the description of the link (the text format tags, for instance). You can also include an image. When you use an image, be sure to include a text description for people who use a CUI browser or who do not want to wait for the image to download.
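The fragment below sketches both techniques. The URL and the file news.gif are hypothetical placeholders. The ALT text and the words after the image keep the link usable when the image is not shown:

```html
<!-- A text format tag inside the link description -->
<A HREF="http://www.host.domain/news.html">Read the <B>latest</B> news</A> <br>

<!-- An image plus text inside the same link;
     the ALT text stands in for the image in CUI browsers -->
<A HREF="http://www.host.domain/news.html">
<IMG SRC="news.gif" ALT="[News]"> Read the latest news</A> <br>
```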

Images

Image tags are used to show an image when an HTML document is loaded. The difference between the image tag and an anchor that points to an image file is this: the image named in an image tag is displayed automatically, while the image file named in an anchor is retrieved only when the user takes action (clicking on the description).

The most common image formats are .gif and .jpeg, with others often supported.

The general format of the image tag is:

<IMG SRC="URL" ALIGN=TOP ALT="[Text in place of image]">

where URL is any valid URL and ALIGN specifies the alignment of the image relative to the surrounding text. All browsers support alignments of TOP, BOTTOM (the default), and MIDDLE. Some browsers also support LEFT and RIGHT.
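To contrast the image tag with the anchor approach described at the start of this section, the sketch below references the same hypothetical image file both ways. The <IMG> tag displays the image when the page loads. The anchor retrieves it only when the user selects the link:

```html
<!-- Displayed automatically, aligned with the top of the text -->
<P>Our logo <IMG SRC="http://www.host.domain/file.gif" ALIGN=TOP ALT="[logo]">

<!-- Retrieved only when the user selects the link -->
<P><A HREF="http://www.host.domain/file.gif">Display the picture</A>
```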

Figure 15.18 shows image placeholders when image loading is disabled with the Mosaic Web browser. Figure 15.19 shows the same HTML document with the Netscape Navigator browser (which cannot deal with .bmp file types). Listing 15.12 shows the source for it.

Figure 15.18.
Images viewed through Mosaic.

The final two images are on the next screen of the viewer. Mosaic does not understand the ALIGN=LEFT or ALIGN=RIGHT attributes, so it ignores them. Netscape Navigator understands those attributes, but it is not configured to display .bmp files, so it shows a broken-image icon for each image.

Figure 15.19.
Images viewed through Netscape Navigator.

Listing 15.12. Source for images.

<html>
<head>
<title> Images </title>
</head>
<body>
<p>First text goes right here
<IMG SRC="file:///c:/doublecd/mvlgo.bmp" ALIGN=TOP ALT="[my picture]">
<p>Text goes here
<IMG SRC="file:///c:/doublecd/mvlgo.bmp" ALIGN=BOTTOM ALT="[my picture]">
<p>More text
<IMG SRC="file:///c:/doublecd/mvlgo.bmp" ALIGN=MIDDLE ALT="[my picture]">
<p>even more text
<IMG SRC="file:///c:/doublecd/mvlgo.bmp" ALIGN=LEFT ALT="[my picture]">
<p>next to final text
<IMG SRC="file:///c:/doublecd/mvlgo.bmp" ALIGN=RIGHT ALT="[my picture]">
<p>final text
</body>
</html>

Images can be combined with anchors to show a picture and then load something else when the user clicks on them.

Figure 15.20 shows images and anchors combined using the Mosaic Web browser. The Netscape Navigator browser behaves in a similar manner. Listing 15.13 shows the source for it.


Figure 15.20.
Images with anchors viewed through Mosaic.

Listing 15.13. Source for images.

<html>
<head>
<title> Images and Anchors</title>
</head>
<body>
<p>First text goes right here
<A HREF="file:///c:/bmp/map5.bmp">
<IMG SRC="file:///c:/doublecd/mvlgo.bmp" ALT="[Small Logo]"> Click on the
image to see a bigger version </A>
<p>Lots more text will go here and maybe more anchors will follow.  More
images (lots more) and even more text would then follow.
</body>
</html>

Whenever you use this technique, include text that describes the action to take ("Click on the image to see a bigger version"). That way, people who use CUI browsers or who have disabled image display can still use your pages.

The difference between the image tag and the anchor used to get an image is that image tags are automatically loaded when the HTML document is loaded from the server. Anchors that load images only load the images when the user selects them.
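To make the distinction concrete, here is a rough Python sketch that is not part of the original chapter. It uses the standard library's html.parser module to sort the resources in a page into those a browser fetches automatically (IMG SRC) and those it fetches only on a click (A HREF). It also flags images that lack the ALT text recommended above. The class name ResourceLister is made up for this illustration.

```python
from html.parser import HTMLParser

class ResourceLister(HTMLParser):
    """Hypothetical helper: classify a page's resources by when they load."""

    def __init__(self):
        super().__init__()
        self.on_load = []      # IMG SRC: fetched when the page loads
        self.on_click = []     # A HREF: fetched only when the user selects it
        self.missing_alt = []  # images with no ALT text for CUI browsers

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.on_load.append(attrs["src"])
            if not attrs.get("alt"):
                self.missing_alt.append(attrs["src"])
        elif tag == "a" and "href" in attrs:
            self.on_click.append(attrs["href"])

# The body of Listing 15.13.
page = """<p>First text goes right here
<A HREF="file:///c:/bmp/map5.bmp">
<IMG SRC="file:///c:/doublecd/mvlgo.bmp" ALT="[Small Logo]"> Click on the
image to see a bigger version </A>"""

lister = ResourceLister()
lister.feed(page)
lister.close()
print("Loaded with the page:", lister.on_load)
print("Loaded on click:", lister.on_click)
print("Images without ALT:", lister.missing_alt)
```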

A Brief Description of Forms

HTML forms provide a means of collecting input from the user. A form defines a series of fields that accept different types of input: text, hidden, image, password, checkbox, radio, textarea, select, submit, and reset. Table 15.2 describes how each is used.

Table 15.2. Form field types.

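Field Type Description
text Used for input of normal text
hidden Not displayed or editable; sends a fixed value to the server, often used to identify the form when it is received
image Graphical submit button drawn from the specified image
password Accepts user input without echoing it
checkbox Lets the user select any number of choices in a group
radio Lets the user select exactly one choice in a group
textarea Multiple-line user input area
select option Pull-down or scrollable selection list
submit Sends the completed form to the server
reset Restores the form's fields to their initial values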

Figure 15.21 shows a form that uses most of these field types, as displayed by the Mosaic Web browser. Figure 15.22 shows the same HTML document with the Netscape Navigator browser. Listing 15.14 shows the source for it.

Figure 15.21.
Form viewed through Mosaic.

Figure 15.22.
Form viewed through Netscape Navigator.

Listing 15.14. Source for form.

<html>
<head>
<title> Forms </title>
</head>
<body>
<FORM ACTION="http://www.host.domain/cgi-bin/handle_form.pl" METHOD="POST">
Choose your option <SELECT NAME="Selection list" SIZE=1>
<OPTION>First
<OPTION SELECTED> Default
<OPTION>Third
</SELECT> <br>
<INPUT TYPE="HIDDEN" NAME="NotSeen" SIZE=10>
Enter Text Here <INPUT TYPE="TEXT" NAME="Text Input" SIZE=20 MAXLENGTH=25>
   Enter Your Password here
<INPUT TYPE="PASSWORD" NAME="Pswd" SIZE=6 MAXLENGTH=12> <br>
Pick one of the following <br>
<INPUT TYPE="RADIO" NAME="Radio" VALUE="First"> First <BR>
<INPUT TYPE="RADIO" NAME="Radio" VALUE="Second" CHECKED> Second <br>
Pick any of the following <br>
<INPUT TYPE="CHECKBOX" NAME="check" VALUE="First"> First
<INPUT TYPE="CHECKBOX" NAME="check" VALUE="Second" CHECKED> Second
<INPUT TYPE="CHECKBOX" NAME="check" VALUE="third" CHECKED> Third <br>
Enter your comments <TEXTAREA NAME="Comments" ROWS=2 COLS=60></textarea>
<p>When done, press the button below <br>
<INPUT TYPE="SUBMIT" VALUE="Submit This Form">
<INPUT TYPE="RESET" VALUE="Clear">
</FORM>
</body>
</html>
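When the user submits a form, the browser sends each field as a NAME=value pair. The following Python sketch is not part of the original chapter. It uses the field names from Listing 15.14 with made-up values and encodes them the way a browser does for the standard form encoding. Only checked boxes and radio buttons are sent. Fields that share a NAME, such as the "check" boxes, produce repeated pairs. The password travels as ordinary text.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical values a user might enter in the form of Listing 15.14.
fields = [
    ("Selection list", "Default"),
    ("NotSeen", ""),
    ("Text Input", "Hello, world"),
    ("Pswd", "secret"),
    ("Radio", "Second"),
    ("check", "Second"),
    ("check", "third"),
    ("Comments", "Nice & simple"),
]

# Form encoding: spaces become "+", other special characters become %xx.
body = urlencode(fields)
print(body)  # with METHOD="POST", this is the body of the request

# With METHOD="GET", the same string is appended to the ACTION URL.
action = "http://www.host.domain/cgi-bin/handle_form.pl"
print(action + "?" + body)

# A server-side script can decode the string back into lists of values.
decoded = parse_qs(body, keep_blank_values=True)
print(decoded["check"])
```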

Chapter 17, Introduction to CGI, provides detailed information about forms and the individual tags and attributes. Chapters 18, 19, and 20 show examples using forms with CGI.

A Brief Description of Tables

HTML tables present data in tabular form. You define a series of rows and columns and fill the cells with data.

Figure 15.23 shows a simple table with borders using the Netscape Navigator browser. The Mosaic browser does not support tables. Listing 15.15 shows the source for it.

Figure 15.23.
Table viewed through Netscape Navigator.

Listing 15.15. Source for table.

<html>
<head>
<title> Tables </title>
</head>
<body>
<Table border>
<CAPTION> <H5>Table with Border Caption </H5> </CAPTION>
<TR ALIGN=LEFT VALIGN=MIDDLE> <TH>First<br>column
<TH> second <br> column <th> third column </TR>
<TR> <TD> 1 <TD> 100 </TR>
<TR> <TD> 2 <TD> 200 </TR>
<TR> <TD> 3 <TD> 300 <TD> comment </TR>
</table>
</body>
</html>
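CGI programs often build tables like this one on the fly. The following Python sketch is not part of the original chapter, and the function name html_table is hypothetical. It generates a bordered table with a caption, a header row, and data rows. It writes the closing TD, TH, and TR tags explicitly, although Listing 15.15 shows that HTML lets you omit them. It also escapes the cell text so that characters such as < and & display literally.

```python
from html import escape

def html_table(caption, headers, rows):
    """Hypothetical helper: return HTML for a bordered table."""
    lines = ["<TABLE BORDER>", f"<CAPTION>{escape(caption)}</CAPTION>"]
    # One header row of TH cells.
    lines.append("<TR>" + "".join(f"<TH>{escape(h)}</TH>" for h in headers) + "</TR>")
    # One TR per data row; rows may have different numbers of cells.
    for row in rows:
        lines.append("<TR>" + "".join(f"<TD>{escape(str(c))}</TD>" for c in row) + "</TR>")
    lines.append("</TABLE>")
    return "\n".join(lines)

print(html_table("Table with Border Caption",
                 ["First column", "second column", "third column"],
                 [[1, 100], [2, 200], [3, 300, "comment"]]))
```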

Tools

As the Web has increased in popularity, so has the number of tools available to create Web pages. In the beginning, HTML was coded by hand with text or programming editors. Now there are many dedicated tools, and many applications are "Web enabled," which generally means they can create HTML or directly follow HTML anchors to retrieve resources.

Most GUI word processors will save documents in HTML format. There are GUI application development tools that will create HTML and CGI scripts in addition to their own proprietary language. You can find the following tools out on the Internet: HoTMetaL (HTML editor), HTML Assistant (HTML editor), Internet Assistant (Microsoft add-in to Word 6 to build Web pages and browse), RTFTOHTM (creates HTML from Rich Text Format files), and others.

And of course, you can always code HTML directly using a text editor.

CGI Scripts and Java Applets

While HTML provides a means of displaying and connecting text, graphics, and other items, it provides no means of procedural programming. To process data (such as the contents of HTML forms), you need CGI programs or scripts. CGI stands for Common Gateway Interface, a standard that defines how a Web server passes a request (including any form data) to a program on the host and returns that program's output to the browser. Most CGI programs are written in scripting languages such as UNIX shell scripts or Perl, but they can also be written in a compiled language such as C or C++.

There are three different ways to execute a CGI script. A normal anchor executes the script when the user clicks it. An image tag whose SRC points to the script executes it when the Web page loads. A form's ACTION executes the script when the user clicks the submit button.

Chapters 17 through 20 provide much more detail on CGI programming.
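As a preview, here is a minimal CGI program sketched in Python. The chapter's own examples use shell, Perl, C, or C++, so treat this as an illustrative assumption rather than the book's method. It follows the CGI convention for input. GET data arrives in the QUERY_STRING environment variable, and POST data arrives on standard input, with its length in CONTENT_LENGTH. The program writes a Content-type header, a blank line, and then the HTML page. The field name "Text Input" comes from Listing 15.14. The page it returns is hypothetical.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs

def read_form(environ, stdin):
    """Return submitted fields as a dict of lists (CGI conventions)."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the encoded form data is on standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = stdin.read(length)
    else:
        # GET: the encoded form data follows the "?" in the URL.
        data = environ.get("QUERY_STRING", "")
    return parse_qs(data, keep_blank_values=True)

def respond(fields):
    """Build the response: a header, a blank line, then the HTML page."""
    text = fields.get("Text Input", [""])[0]
    return ("Content-type: text/html\n\n"
            "<html><head><title>Thanks</title></head><body>\n"
            f"<p>You entered: {escape(text)}</p>\n"
            "</body></html>\n")

if __name__ == "__main__":
    sys.stdout.write(respond(read_form(os.environ, sys.stdin)))
```

The user's input is escaped before it is echoed back, so a stray < or & in the form does not break the returned page.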

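Java is an object-oriented language that is syntactically similar to C and C++. Java source code is compiled into bytecode for a virtual machine rather than for a particular processor. As a result, the same compiled program can run on any system that provides a Java virtual machine, such as the one built into a Java-enabled Web browser, which interprets the bytecode into native machine instructions. A Java program run this way from within a Web page is known as a Java applet. Java applications, by contrast, run as executable programs on individual machines without the assistance of a Web browser.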

The primary conceptual difference between Java and CGI is that Java runs on the client system and CGI runs on the host. In addition, Java is essentially one language while CGI is a standard method of communicating data to a server and can be written in many different languages.

When running a Java applet from within a Web page, you use the following:

<applet code="Javaname.class" width=400 height=400>
<param name="variable1" value="123">
<param name="variable2" value="your name">
</applet>

The width and height attributes set the size, in pixels, of the area the applet occupies on the page. The param tags set named values (similar to UNIX environment variables) that the Java applet can read.

To the end user, a CGI program is safer because it runs on the host. If there is a problem with the program (for example, a virus or Trojan horse), it affects the server rather than the user's machine. With Java, it is possible (though improbable) that a program could harm a client machine.


CAUTION: Some Web sites store small data files, known as cookies, on your client machine. These are generally used to track usage and to retain information about you between visits to that site (for instance, your preferences or your stock portfolio). The data is stored on your machine instead of the host. If this bothers you, you can disable cookies in your Web browser. Cookies are data files, not executable code.
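To see that a cookie is only data, consider the following Python sketch, which is not part of the original chapter. It uses the standard library's http.cookies module to show what a site sends and what the browser returns. The cookie names and values are hypothetical. The server sends a Set-Cookie header with a page. On a later visit, the browser sends the stored name=value pairs back in a Cookie header, which the server parses as plain text.

```python
from http.cookies import SimpleCookie

# Server side: a hypothetical site remembers a color preference.
outgoing = SimpleCookie()
outgoing["prefs"] = "blue"
outgoing["prefs"]["path"] = "/"
print(outgoing.output())  # the header line sent along with the page

# Next visit: the browser returns the stored pairs in a Cookie header.
incoming = SimpleCookie("prefs=blue; visits=3")
print(incoming["prefs"].value, incoming["visits"].value)
```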

Special Characters

The remainder of this chapter contains tables summarizing the special characters and general HTML tags.

Table 15.3 provides a summary of the special characters available with HTML. You can get the complete list at http://www.w3.org/hypertext/WWW/MarkUp/Entities.html.

Table 15.3. Special characters.

Entity Description
&lt; < (less than symbol)
&gt; > (greater than symbol)
&amp; & (ampersand)
&quot; " (double quote)
&#174; Registered trademark symbol (R)
&#169; Copyright symbol (c)
&#nnn; Character whose decimal code in the ISO Latin-1 character set is nnn (for example, &#65; is A)
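Programs that generate HTML, such as CGI scripts, must convert these characters to entities before writing text into a page. The following Python sketch is not part of the original chapter. It uses the standard library's html module to escape text for display and to decode entities, including numeric references, back into characters. By default, html.escape also converts the single quote to &#x27;.

```python
import html

# Escape text so that it displays literally inside an HTML page.
print(html.escape('if a < b && c > d say "hi"'))

# Decode named and numeric entities back into characters.
print(html.unescape("&#174; &#169; &lt;TITLE&gt;"))
```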

Tag Summary

The following tables summarize the HTML tags and actions. You can get the current specification at http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html.

Information on image maps is available at http://hoohoo.ncsa.uiuc.edu/docs/setup/admin/NewImagemap.html.

Forms information is available at http://hoohoo.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html.

Check out http://www.javasoft.com for information on Java programming.

Table 15.4. Summary of tags--structure.

Tag Description
<HTML> </HTML> Contains entire document
<TITLE> </TITLE> Defines document title (usually shown in the browser's title bar)
<HEAD> </HEAD> Contains document header information, such as the title
<BODY> </BODY> Contains majority of document
<!-- comment --> Contains comment text
<ISINDEX> Provides search prompt for searchable documents

Table 15.5. Summary of tags--formatting.

Tag Description
<Hn> </Hn> Heading levels where n is 1 through 6
<Hn ALIGN=xxx> </Hn> HTML 3--defines alignment for heading: LEFT, CENTER, or RIGHT
<P> </P> Defines paragraph start, usually has blank line before; the closing tag is optional
<P ALIGN=xxx> </P> HTML 3--defines alignment for paragraph text: LEFT, CENTER, or RIGHT
<ADDRESS> </ADDRESS> Defines address block
<BLOCKQUOTE> </BLOCKQUOTE> Defines a block containing a quotation
<PRE> </PRE> Defines preformatted text block
<PRE WIDTH=nn> </PRE> Defines preformatted text block of specified size
<CENTER> </CENTER> Netscape--Centers paragraph
<BR> Line break (forces a new line)
<HR> Horizontal rule (draws a line)
<B> </B> Physical format: bold
<I> </I> Physical format: italics
<S> </S> Physical format: strikethrough
<U> </U> Physical format: underline
<TT> </TT> Physical format: typewriter (monospace)
<BLINK> </BLINK> Netscape--physical format: flashing
<FONT SIZE=n> </FONT> Netscape--specifies font size where n is 1 through 7
<BASEFONT SIZE=n> Netscape--default font size for document where n is 1 through 7
<EM> </EM> Logical format: emphasis
<STRONG> </STRONG> Logical format: strong
<CITE> </CITE> Logical format: citation
<CODE> </CODE> Logical format: program code
<KBD> </KBD> Logical format: keyboard input
<SAMP> </SAMP> Logical format: output samples
<VAR> </VAR> Logical format: program variables

Table 15.6. Summary of tags--lists.

Tag Description
<UL> </UL> Lists--unordered, use with <LI>
<OL> </OL> Lists--ordered (numbered), use with <LI>
<DIR> </DIR> Lists--directory, use with <LI>
<MENU> </MENU> Lists--menu, use with <LI>
<LI> Lists--list element or item
<DL> </DL> Lists--definition or glossary, use <DT> and <DD>
<DT> Lists--definition term
<DD> Lists--definition of term
<UL TYPE=xxx> Netscape--bullet type for unordered list where xxx is DISC, CIRCLE, or SQUARE
<LI TYPE=xxx> Netscape--bullet type for unordered list item where xxx is DISC, CIRCLE, or SQUARE
<OL TYPE=xxx> Netscape--number format for ordered list where xxx is A, a, I, i, 1
<LI TYPE=xxx> Netscape--number format for ordered list item where xxx is A, a, I, i, 1
<OL START=n> Netscape--starting number for ordered list
<LI VALUE=n> Netscape--number for this ordered list item (later items count up from it)

Table 15.7. Summary of tags--links.

Tag Description
<A HREF="URL"> </A> Links to another document, image, or resource
<A HREF="#label"> </A> Jumps to predefined location in this document
<A HREF="URL#label"> </A> Jumps to predefined location in another document
<A NAME="label"> </A> Defines a location

Table 15.8. Summary of tags--images.

Tag Description
<IMG SRC="URL" flags> Loads and displays image based on flags
ALIGN=xxx Flag--displays image where xxx is TOP, BOTTOM, or MIDDLE, can also be LEFT, CENTER, RIGHT
ALT="[description]" Flag--text to describe image if not displayed
ISMAP Flag--specifies an imagemap (used to provide links based on cursor position in image)
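With a server-side image map, the IMG tag carries the ISMAP flag and sits inside an anchor whose HREF names an imagemap program on the server. When the user clicks the image, the browser requests that URL with "?x,y" appended. Here x and y are the pixel coordinates of the click, measured from the image's upper-left corner, and a CGI program receives them in QUERY_STRING. The following Python sketch is not part of the original chapter. It shows how such a program might map a click to a destination. The rectangular regions and the URLs are hypothetical.

```python
# Hypothetical regions: (left, top, right, bottom) in pixels -> destination URL.
REGIONS = [
    ((0, 0, 99, 49), "http://www.host.domain/north.html"),
    ((0, 50, 99, 99), "http://www.host.domain/south.html"),
]
DEFAULT = "http://www.host.domain/index.html"

def parse_click(query):
    """Turn an ISMAP query string such as '37,62' into integers (x, y)."""
    x, y = query.split(",")
    return int(x), int(y)

def lookup(query):
    """Return the URL of the first region containing the click, else DEFAULT."""
    x, y = parse_click(query)
    for (left, top, right, bottom), url in REGIONS:
        if left <= x <= right and top <= y <= bottom:
            return url
    return DEFAULT

print(lookup("37,62"))
```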

Table 15.9. Summary of tags--forms.

Tag Description
<FORM ACTION="URL" METHOD=xxx> </FORM> Used to contain elements of a form, xxx is GET or POST, which determines how the form data is sent to the specified URL (GET appends it to the URL; POST sends it in the body of the request)
<SELECT flags> </SELECT> Pulldown selection list
NAME="variable" SELECT flag--name of input field
<OPTION> text SELECT--option that can be selected
<OPTION SELECTED> text SELECT--option selected by default
<TEXTAREA flags> </TEXTAREA> Accepts multiple line input
ROWS=n TEXTAREA flag--number of rows to display
COLS=m TEXTAREA flag--number of columns to display
NAME="variable" TEXTAREA flag--name of input field
<INPUT flags> Input field
TYPE="xxx" INPUT flag--field type where xxx is CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, or TEXT
CHECKED INPUT flag--checkbox or radio button initially set
NAME="variable" INPUT flag--name of input field
SIZE=nnn INPUT flag--text field size in characters
MAXLENGTH=nnn INPUT flag--maximum number of characters acceptable for text field
VALUE="text" INPUT flag--initial value or value when selected

Table 15.10. Summary of tags--tables.

Tag Description
<TABLE flags> </TABLE> Used to contain elements of a table
BORDER TABLE flag--draw border around table
ALIGN=xxx TABLE flag--specify table alignment where xxx is BLEEDLEFT (flush with window), BLEEDRIGHT, CENTER, LEFT (flush with margin), JUSTIFY, or RIGHT
COLSPEC="string" TABLE flag--define column justification (C--center, D--decimal align, J--justify, L--left, and R--right) and widths
UNITS=unit TABLE flag--specifies units for column width
<CAPTION flag> </CAPTION> Used to contain description of table
ALIGN=xxx CAPTION flag--specifies location of caption (TOP or BOTTOM--above or below table)
<TR flag> </TR> Used to contain row of a table
ALIGN=xxx TR flag--specifies alignment where xxx is LEFT, CENTER, RIGHT
VALIGN=xxx TR flag--specifies vertical alignment where xxx is TOP, MIDDLE, BOTTOM
<TD flag> </TD> Used to contain table data or cells
<TH flag> </TH> Used to contain table column header
ALIGN=xxx TD/TH flag--same as TR ALIGN
VALIGN=xxx TD/TH flag--same as TR VALIGN
COLSPAN=n TD/TH flag--allow item to span n columns
ROWSPAN=n TD/TH flag--allow item to span n rows

Summary

The World Wide Web is a very popular place, and it is growing by leaps and bounds for both commercial and personal purposes. Even with the increasing availability of GUI HTML authoring tools, there is still a need to understand the underlying language.

If you see an interesting Web page, you can view the HTML source in many Web browsers. You can look at the techniques used and learn from them. Try it!

Figures 15.24 and 15.25 show document source using the Mosaic and Netscape Navigator browsers, respectively. What may not be apparent is that Netscape colors the tag names while Mosaic does not.

Figure 15.24.
Document Source viewed through Mosaic.

Figure 15.25.
Document Source viewed through Netscape Navigator.



©Copyright, Macmillan Computer Publishing. All rights reserved.