How CGI Scripting Works

Forms: Sending Input

We have seen that the creation of CGI scripts is pretty easy. The Web server executes any executable placed in the cgi-bin directory, and any output that the executable sends to stdout appears in the browser that called the script. Now what we need is a way to send input into a script. The normal way to send input is to use an HTML form.

You see forms all over the Web. Any page where you have been able to type something in is a form. You see them in search engines, guest books, questionnaires, etc. The home page for HowStuffWorks.com contains at least two mini-forms, one for the "How did you get here?" sidebar and one for the suggestions sidebar (yes, a single HTML page can contain multiple forms). You create the form on your HTML page, and in the HTML tags for the form you specify the name of the CGI script to call when the user clicks the Submit button on the form. The values that the user enters into the form are packaged up and sent to the script, which can then use them in any way it likes.

You have actually been seeing this sort of thing constantly and may not have known that it was happening. For example, go to http://www.lycos.com, type the word "test" into the "Search for:" box and press the "Go Get It!" button. The URL of the result page will look like this:

http://www.lycos.com/cgi-bin/pursuit?matchmode=and&cat=lycos&query=test&x=10&y=9

You can see that the Lycos home page contains a form. Lycos has a script named pursuit in its cgi-bin directory. The form sends five parameters to the script:

  1. matchmode=and
  2. cat=lycos
  3. query=test
  4. x=10
  5. y=9

The third one is the search string we entered. The other four mean something to the script as well. The CGI script queries the Lycos database for the word "test" and then returns the results. That's the heart of any search engine!
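
To make the parameter list concrete, here is a minimal sketch in C of how a script might pull one value out of a query string like the one above. The function name find_param and its signature are hypothetical, chosen for illustration; this is not the code Lycos used. It only splits on "&" and "=", and it does not translate substituted characters.

```c
#include <string.h>

/* Hypothetical helper: copy the value of parameter `name` from a query
   string such as "matchmode=and&cat=lycos&query=test" into `out`.
   Returns 1 if the parameter was found, 0 otherwise.
   No decoding of '+' or other substituted characters is done here. */
int find_param(const char *query, const char *name, char *out, size_t outsize)
{
    size_t namelen = strlen(name);
    const char *p = query;

    if (outsize == 0) return 0;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, '&');   /* end of this name=value pair */
        size_t pairlen = end ? (size_t)(end - p) : strlen(p);

        /* A match needs the name followed immediately by '='. */
        if (pairlen > namelen &&
            strncmp(p, name, namelen) == 0 && p[namelen] == '=') {
            size_t vallen = pairlen - namelen - 1;
            if (vallen >= outsize) vallen = outsize - 1;  /* truncate to fit */
            memcpy(out, p + namelen + 1, vallen);
            out[vallen] = '\0';
            return 1;
        }
        p = end ? end + 1 : NULL;           /* move on to the next pair */
    }
    return 0;
}
```

Called with the query string from the Lycos URL and the name "query", this sketch would copy "test" into the output buffer.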

Let's create a simple form to try this out. Create a file named simpleform.htm and enter the following HTML into it:

<html>
<body>
  <h1>A super-simple form</h1>
  <FORM METHOD=GET ACTION="https://www.howstuffworks.com/cgi-bin/simpleform.cgi">
  Enter Your Name:
  <input name="Name" size=20 maxlength=50>
  <P>
  <INPUT TYPE=submit value="Submit">
  <INPUT TYPE=reset value="Reset">
  </FORM>
</body>
</html>

This HTML creates a form that uses the GET method and sends its data to the CGI script at https://www.howstuffworks.com/cgi-bin/simpleform.cgi. Inside the form is a text input area plus the standard Submit and Reset buttons.

The file https://www.howstuffworks.com/cgi-bin/simpleform.cgi referenced by the form is a C program. It started life as this piece of C code placed in a file named simpleform.c:

#include <stdio.h>
#include <stdlib.h>

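int main(void)
{
  /* QUERY_STRING is unset when the program runs outside a Web server. */
  const char *query = getenv("QUERY_STRING");

  printf("Content-type: text/html\n\n");
  printf("<html>\n");
  printf("<body>\n");
  printf("<h1>The value entered was: ");
  printf("%s</h1>\n", query ? query : "(none)");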
  printf("</body>\n");
  printf("</html>\n");
  return 0;
}

It was compiled with the following command:

gcc simpleform.c -o simpleform.cgi

The resulting simpleform.cgi executable was then placed in the cgi-bin directory. This program simply picks up the value sent by the form and displays it. For example, if you enter the name John Smith, you see the following:

The value entered was: Name=John+Smith

Name is the identifier for the text input field in the form (each input field on a form should have a unique identifier), and John+Smith is a typical name that might be entered on the form. Note that the "+" replaces the space character.
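
Turning John+Smith back into John Smith is a small translation step. Here is a minimal sketch in C, assuming the standard form encoding in which "+" stands for a space and a percent sign followed by two hexadecimal digits stands for a single byte. The function name url_decode is hypothetical, not part of any library.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: decode a form-encoded string in place.
   '+' becomes a space and "%XX" (two hex digits) becomes the byte XX.
   The result is never longer than the input, so no extra buffer is needed. */
void url_decode(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' &&
                   isxdigit((unsigned char)s[1]) &&
                   isxdigit((unsigned char)s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;   /* ordinary character, copied unchanged */
        }
    }
    *out = '\0';
}
```

In a real script it is usually safer to decode each value only after the string has been split into fields, so that an encoded "&" or "=" inside the data is not mistaken for a separator.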

From this example, you can see that the basic process of setting up a form and getting data from a form into a CGI script is fairly straightforward. Here are a few details to keep in mind:

  • Each input field on the form should have a unique identifier.
  • The form needs to use either the GET or the POST method. The GET method has the advantage that you can see the form's values in the URL sent to the script, and that makes debugging easier.
  • Browsers and Web servers limit the length of a URL, so only a limited number of characters can be sent via the GET method. POST is preferred for large forms.
  • Data that comes in via the GET method is received by looking at the QUERY_STRING environment variable (usually read with the getenv function in C or $ENV{'QUERY_STRING'} in Perl). Data that comes in via the POST method is available through STDIN, using fread or fgets in C or read in Perl; the CONTENT_LENGTH environment variable gives the number of bytes to read. Avoid gets in C, because it cannot limit how much it reads into a buffer.
  • The data that comes in has all of the fields concatenated together in a single string, with name=value pairs separated by ampersands, and many characters are substituted and therefore need translation. For example, all spaces are replaced with pluses, and many punctuation characters arrive as a percent sign followed by two hexadecimal digits.
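
For the POST case, here is a minimal sketch in C of reading the form data from an input stream. The function name read_post_body and the max_length cap are assumptions made for this example; a real script would pass stdin and the number found in CONTENT_LENGTH.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a POST body of `length` bytes from `in`
   (stdin in a real CGI script). Returns a malloc'd, NUL-terminated
   buffer that the caller must free, or NULL if the length is out of
   range or memory cannot be allocated. */
char *read_post_body(FILE *in, long length, long max_length)
{
    char *buf;
    size_t got;

    /* Refuse negative or oversized input instead of trusting the client. */
    if (length < 0 || length > max_length) return NULL;

    buf = malloc((size_t)length + 1);
    if (buf == NULL) return NULL;

    got = fread(buf, 1, (size_t)length, in);  /* may return fewer bytes */
    buf[got] = '\0';
    return buf;
}
```

A script might obtain the length with getenv("CONTENT_LENGTH") and strtol, then call read_post_body(stdin, length, 65536). The 65536 limit is an arbitrary choice for illustration.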

The QUERY_STRING environment variable brings up the topic of environment variables in general. There are a number of environment variables that you can examine in your CGI scripts, including:

  • AUTH_TYPE
  • CONTENT_LENGTH
  • CONTENT_TYPE
  • GATEWAY_INTERFACE
  • HTTP_ACCEPT
  • HTTP_USER_AGENT
  • PATH_INFO
  • PATH_TRANSLATED
  • QUERY_STRING
  • REMOTE_ADDR
  • REMOTE_HOST
  • REMOTE_IDENT
  • REMOTE_USER
  • REQUEST_METHOD
  • SCRIPT_NAME
  • SERVER_NAME
  • SERVER_PORT
  • SERVER_PROTOCOL
  • SERVER_SOFTWARE

There are all sorts of interesting pieces of information buried in these environment variables, including the length of the input string (CONTENT_LENGTH), the request method used (GET or POST; REQUEST_METHOD lets you figure out whether to look in STDIN or QUERY_STRING for the input), the IP address of the user's machine (REMOTE_ADDR), and so on. For complete descriptions of these variables, see CGI Environment Variables.
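
The REQUEST_METHOD check described above can be sketched in C as follows. The enum and the function name source_for_method are hypothetical names invented for this example.

```c
#include <string.h>

/* Where the form data for a request can be found. */
typedef enum { INPUT_NONE, INPUT_QUERY_STRING, INPUT_STDIN } input_source;

/* Hypothetical helper: map the value of REQUEST_METHOD to the place
   where the form data lives. `method` may be NULL if the variable is
   not set, for example when the program runs outside a Web server. */
input_source source_for_method(const char *method)
{
    if (method == NULL) return INPUT_NONE;
    if (strcmp(method, "GET") == 0) return INPUT_QUERY_STRING;
    if (strcmp(method, "POST") == 0) return INPUT_STDIN;
    return INPUT_NONE;   /* other methods are not handled in this sketch */
}
```

A script would typically call source_for_method(getenv("REQUEST_METHOD")), which needs stdlib.h for getenv, and then read either QUERY_STRING or STDIN accordingly.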