WORLD WIDE WEB COMPONENTS
Browsers and Servers
CGI Processing Model (Common Gateway Interface)
© Norman White, 2013
The WWW is an example of Client/Server Computing
Server computers are located all around the world and respond to requests (messages) from computers running browser software (Netscape, IE)
Browser applications understand HTML (and now JavaScript, Java, etc.)
Server Browser Interaction (simple)
[Diagram: Browser (B) sends an http request to the Server]
Browser sends an http request to the server (e.g. GET index1.html)
index1.html file
<html>
<head>
<title> Sample Title</title>
</head>
<body>
Here is some text and a picture <img src="pic1.gif">
</body>
</html>
Server Response
[Diagram: Browser (B) sends an http request; Server returns the HTML file]
Server retrieves the file
Server sends the file (index1.html) to the browser
Browser displays file
[Diagram: Browser (B) has received the HTML file index1.html from the Server]
Browser “formats” index1.html
This may mean retrieving more files in order to display it
Browser asks for next file
[Diagram: Browser (B) holds index1.html and sends GET pic1.gif to the Server]
index1.html contains a reference to pic1.gif
Browser then requests pic1.gif
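The step where the browser discovers that it needs another file can be imitated in a few lines of shell. This is only a rough sketch: the function name list_image_refs is made up for illustration, and it assumes at most one <img> tag per line with a double-quoted src attribute. A real browser uses a full HTML parser.

```shell
#!/bin/sh
# list_image_refs (hypothetical helper): print the src of each <img> tag
# read from standard input. Assumes one <img> per line, double-quoted src.
list_image_refs() {
    sed -n 's/.*<img[^>]*src="\([^"]*\)".*/\1/p'
}

# A small page modeled on index1.html; each name printed is a file
# the browser would have to request next.
printf '%s\n' \
    '<html><body>' \
    'Here is some text and a picture <img src="pic1.gif">' \
    '</body></html>' | list_image_refs
```

Each name printed corresponds to one further GET request from the browser to the server.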
Server sends pic1.gif
[Diagram: Server answers GET pic1.gif by returning pic1.gif to the Browser (B)]
Server next sends pic1.gif
[Diagram: Browser (B) now holds both index1.html and pic1.gif]
Browser displays pic1.gif
Processing Non-HTML files
Web server sends a header in front of each file identifying the file type (HTML, GIF, JPEG, etc.)
Most Browsers understand HTML, GIF and TEXT
Browsers can be configured to call external programs to handle new types of files
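As a rough illustration of the header idea, the sketch below picks a Content-type header from a file's extension. The function name content_type_for and the particular mapping table are assumptions made for this example; real web servers use a much larger, configurable table.

```shell
#!/bin/sh
# content_type_for FILE (hypothetical helper): print the header line a server
# might send in front of FILE, chosen by file extension.
content_type_for() {
    case "$1" in
        *.html|*.htm) echo "Content-type: text/html" ;;
        *.gif)        echo "Content-type: image/gif" ;;
        *.jpg|*.jpeg) echo "Content-type: image/jpeg" ;;
        *.txt)        echo "Content-type: text/plain" ;;
        *)            echo "Content-type: application/octet-stream" ;;
    esac
}

content_type_for index1.html
content_type_for pic1.gif
```

The browser reads this header first and uses it to decide whether to display the file itself or hand it to another program.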
Helper Apps
These programs are called HELPER applications and dramatically extend the capabilities of the browser, since they can be developed independently of the client software
Examples: QuickTime viewers, sound players, VRML viewers, etc.
To see the currently configured viewers go to options on the Browser title bar
Plugins
Browser functionality can also be extended by adding plugins.
Plugins are not standalone applications, but executable code that is dynamically linked into the browser when necessary.
Forms and CGI Programming
HTML provides an easy-to-use FORM capability, which allows a wide variety of input forms to be easily generated.
Form data types include
  Text input - One line of text
  Textarea - Multiple lines of text
  Check boxes (on/off)
  Radio boxes (1 of N)
  Etc.
Forms Processing Logic
The output of the form is formatted and sent to the server, along with the name of a program to process the contents of the form.
The web server takes the information from the form and passes it on as input to a Common Gateway Interface (CGI) program.
The output of the CGI program is sent back to the client browser as an HTML (or other) file.
CGI programming extends power of WWW
CGI programs can do an almost unlimited set of activities ...
  Look up info in a database and send it to the browser.
  Take input from the user and add it to a file.
  Take input and send it to a standard business application.
A CGI program can be in any language that runs on the server, including a shell language like sh or bash.
CGI Programming
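[Diagram: Browser (B) sends form content over http to the http server; the server passes it as input to the CGI program; the program's output (HTML) goes back through the server to the browser]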
(Note, all processing is on server)
What do you need to do for CGI?
Develop form to collect information from users
Write and test CGI program to handle form information
Put the name of the CGI program in the “ACTION” statement of the form. Note: program can be on another server.
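A minimal sketch of the first and third steps: a shell function that writes a one-field form whose ACTION names the CGI program. The function name print_form is invented for this example, and the field name "userid" and program name mycgiprog.cgi are placeholders only.

```shell
#!/bin/sh
# print_form ACTION (hypothetical helper): write a one-field HTML form whose
# ACTION attribute names the CGI program that will process it.
print_form() {
    cat <<EOF
<form method="GET" action="$1">
  Userid: <input type="text" name="userid">
  <input type="submit" value="Submit">
</form>
EOF
}

print_form mycgiprog.cgi
```

Pasting the printed form into any HTML page is enough for the browser to send the field to the named program when the user presses Submit.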
CGI
Two Processing Options
Two types of FORM processing options, GET and POST
  GET - parameters are sent as additions to the URL string. Individual parameters are separated by &
  POST - data is sent in the message body. This is a more general method and can handle more input data.
CGI Processing - review
Server sends form (in html document) to client
Client displays form, and user fills in fields
Client sends form info to server (SUBMIT button)
Server runs the CGI program named in the ACTION section of the FORM
CGI program parses the data as input
Output of the CGI program is sent by the server to the client (i.e. it should be HTML)
CGI Advantages and Disadvantages
Advantages
  Very general model; easy to do really neat things like front-ending existing applications, databases, etc.
  Many toolkits are available to do common things
Disadvantages
  All processing is done on the server, which may overload the server
  Interaction is all through forms
  Lots of data traffic back and forth
Solution
  HTML5 and its features
GET vs. POST
Alternative CGI methods
GET format
  Information is passed as a series of variable=value pairs separated by “&” to the program named in the action statement, by adding them on to the URL (after a “?”)
Simple example: a one-line form with a field named “userid” and “ACTION=mycgiprog.cgi”
  User enters “nwhite”
  Browser sends the following to the web server
http://www.stern.nyu.edu/~nwhite/mycgiprog.cgi?userid=nwhite
GET Processing
Server Side
Web server takes the information after the “?” and creates an environment variable named “QUERY_STRING”, then executes the program “mycgiprog.cgi”
QUERY_STRING contains userid=nwhite
CGI program retrieves value of QUERY_STRING from the environment variable, does appropriate processing, and (optionally) sends an HTML response back
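What mycgiprog.cgi might do with QUERY_STRING can be sketched as follows. The function name get_userid is invented for this example, and the sketch assumes the field appears once and its value needs no URL decoding. The last three lines only simulate the web server; in a real request the server sets QUERY_STRING before starting the program.

```shell
#!/bin/sh
# get_userid (hypothetical helper): print the value of the "userid" field
# found in the QUERY_STRING variable.
get_userid() {
    # Put each variable=value pair on its own line, then keep the userid one.
    printf '%s\n' "$QUERY_STRING" | tr '&' '\n' | sed -n 's/^userid=//p'
}

# Simulate what the web server does before running the CGI program.
QUERY_STRING='userid=nwhite'
export QUERY_STRING
get_userid
```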
Digression – Environment Variables
Both Windows and Unix support environment variables. These are user session variables which contain character strings. Many are automatically created when the user logs in, like PATH, PROMPT, etc. Any program can create or retrieve the value of environment variables, so they are often used to pass small amounts of information from one application to another. Different operating systems have different methods for setting and retrieving environment variables. For example, in Unix, you can retrieve an environment variable's value by putting a $ in front of its name, e.g. $PATH. In Windows, you put % around it, e.g. %PATH%.
Try this in Unix: echo $PATH
Or in Windows: echo %PATH%
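The way one program passes a value to another through the environment can be tried directly in a Unix shell. In this small sketch the variable name GREETING is made up for illustration; the child program here is simply another sh.

```shell
#!/bin/sh
# Start from a clean slate in case GREETING already exists.
unset GREETING

# A plain shell variable is private to this shell, so the child sees nothing.
GREETING='hello from the parent'
sh -c 'echo "child sees: [$GREETING]"'

# Once exported, the variable is part of the environment that child
# processes inherit. A web server hands QUERY_STRING to a CGI program this way.
export GREETING
sh -c 'echo "child sees: [$GREETING]"'
```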
GET method – more than one parameter
What if we want to have more than one field?
No problem: QUERY_STRING can contain many variable=value pairs separated by “&”
e.g. userid=nwhite&password=junk&fname=Norman
Possible problem: how big can environment variables be (how many characters)?
GET only useful for limited input
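One possible way to take a multi-field query string apart is ordinary shell word splitting with “&” as the separator. This is a sketch only: the function name print_pairs is invented here, and URL decoding of the values is left out.

```shell
#!/bin/sh
# print_pairs QUERY (hypothetical helper): print each variable=value pair of
# a query string on its own line. The body runs in a subshell, written with
# ( ), so the IFS and "set -f" changes do not leak into the caller.
print_pairs() (
    IFS='&'
    set -f                    # no filename expansion while splitting
    for pair in $1; do
        name=${pair%%=*}      # text before the first "="
        value=${pair#*=}      # text after the first "="
        echo "$name -> $value"
    done
)

print_pairs 'userid=nwhite&password=junk&fname=Norman'
```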
POST Method
POST method is more general since it can handle lots of input
Input is passed as a sequence of characters on standard input (stdin)
Variable1=value1&Variable2=value2 …
The environment variable CONTENT_LENGTH is set to the number of characters of input.
The environment variable REQUEST_METHOD is set to POST (instead of GET)
Input processing logic needs to be (slightly) different for GET and POST methods
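The difference in input handling might look like the sketch below. The function name read_form_data is invented for this example; REQUEST_METHOD, CONTENT_LENGTH and QUERY_STRING are the variables set by the web server. The last lines only simulate a POST request so the function can be tried from a terminal.

```shell
#!/bin/sh
# read_form_data (hypothetical helper): print the raw variable=value string
# for either request method.
read_form_data() {
    if [ "$REQUEST_METHOD" = "POST" ]; then
        # POST: read exactly CONTENT_LENGTH bytes from standard input.
        dd bs=1 count="$CONTENT_LENGTH" 2>/dev/null
    else
        # GET: the data arrived in the QUERY_STRING environment variable.
        printf '%s' "$QUERY_STRING"
    fi
}

# Simulate a POST request carrying 13 characters of form data.
REQUEST_METHOD=POST
CONTENT_LENGTH=13
printf 'userid=nwhite' | read_form_data
echo
```

After this step the rest of the program can treat GET and POST input identically.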
CGI Output
CGI output is passed back to the browser, hence it has to be something (HTML) the browser can understand, like:
Content-type: text/html   (indicates an HTML file)
(an empty line, REQUIRED)
<HTML><HEAD>
<TITLE>output of HTML from CGI script</TITLE>
</HEAD><BODY>
<H1>Sample output</H1>
What do you think of <STRONG>this?</STRONG>
</BODY></HTML>
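A shell CGI program could produce output of this shape with a small helper like the one sketched here. The function name emit_page and its two arguments are assumptions made for this example.

```shell
#!/bin/sh
# emit_page TITLE BODY (hypothetical helper): write a complete CGI response
# to standard output.
emit_page() {
    echo "Content-type: text/html"
    echo ""                          # the required empty line ends the headers
    echo "<html><head><title>$1</title></head>"
    echo "<body>$2</body></html>"
}

emit_page "Sample output" "What do you think of <strong>this?</strong>"
```

If the empty line is missing, the server cannot tell where the header ends and the page begins.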
Simple Example – GET Method
List the contents of your “websys” directory
Create a shell script named lister.cgi which contains:
    #!/bin/sh
    # lister.cgi - send a listing of the current directory as an HTML page
    echo "Content-type: text/html"
    echo ""
    echo "<html><head><title>Listing</title>"
    echo "</head><body><p>"
    ls -alt
    echo "</body></html>"
Questions
What system will lister.cgi run on?
What user will be running the program?
What directory will the program be running in?
What will the output look like?
What if I wanted to list someone else’s files?
What are the security issues here?
GET Method
List the contents of your “websys” directory, with options passed as part of the URL (type=XXX)
Create a shell script named listera.cgi which contains:
    #!/bin/sh
    # listera.cgi - list the files whose extension is given by "type" in the URL
    # Turn the variable=value pairs in QUERY_STRING into shell variables
    eval QS=`echo $QUERY_STRING`
    eval `echo $QS | sed -e "s/\&/ /g"`
    echo "Content-type: text/html"
    echo ""
    echo "<html><head><title>Listing</title>"
    echo "</head><body><p>"
    ls *.$type
    echo "</body></html>"
Where is the magic?
What do the following two lines of shell script do?
    eval QS=`echo $QUERY_STRING`
  Creates a shell variable, QS, that contains the contents of QUERY_STRING
    eval `echo $QS|sed -e "s/\&/ /g"`
  Evaluates the contents of QS where all &’s are replaced with blanks.
  This allows us to pass any number of variables to the program and get their values.
(Note: we could have done this in one line)
    eval `echo $QUERY_STRING|sed -e "s/\&/ /g"`
Important elements here:
  The | symbol (pipe) sends output from one program to the input of another.
  The ` (backquote) character captures the output of a program/process.
  eval is the Unix command to evaluate a string of characters as if it was just typed in.
Try it, what happens?
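To run it, put it in your websys directory and make it readable and executable:
    chmod +rx listera.cgi
Type the following as a URL
    http://www.stern.nyu.edu/~userid/websys/listera.cgi?type=html
where “userid” is your userid, and the part after the “?” (type=html) supplies the variable(s) for QUERY_STRING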
What is happening?
Web server sees a .cgi program in your websys directory.
It executes it, after putting everything after the ? into an environment variable named QUERY_STRING.
The program converts the information in QUERY_STRING to one or more shell variables, by replacing all the &’s with blanks, and then executing the resulting string.
    var1=value1&var2=value2&var3=value3 …
becomes
    var1=value1 var2=value2 var3=value3 …
Executing that statement creates variables var1, var2, var3 with values value1, value2, value3
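Because eval executes whatever text arrives in the URL, anyone who can type a URL can try to get extra commands run on the server. One cautious alternative, sketched here under assumptions, is to pull out only the field that is needed and accept it only if it is made of letters and digits. The function name safe_type is invented for this example, and this is not the method used in listera.cgi.

```shell
#!/bin/sh
# safe_type QUERY (hypothetical helper): print the value of the "type" field,
# but only if it consists entirely of letters and digits.
safe_type() {
    value=$(printf '%s\n' "$1" | tr '&' '\n' | sed -n 's/^type=//p' | head -n 1)
    case "$value" in
        ''|*[!A-Za-z0-9]*) return 1 ;;   # empty, or contains an unexpected character
        *) echo "$value" ;;
    esac
}

safe_type 'type=html'
safe_type 'type=html;rm%20-rf' || echo "rejected"
```

A script could then run ls on the checked value and never pass the raw query string to eval.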
POST METHOD
In the POST method, form data is NOT passed as part of the URL, instead it is passed to the “STANDARD INPUT” of the CGI program.
Advantages
  Not limited by the max size of environment variables
  Users can’t see the input fields in the URL
Disadvantages
  A little harder to handle
  Users can’t save/send the link plus form data, e.g. send the results of a search to someone else
Conclusion
World-Wide-Web model is much more powerful than it appears on the surface
Easily integrated with existing applications
Easy to add new functionality
CGI model can do lots of things…
  Update files
  Link to corporate databases
  Specialized applications
Caveats
Problems with CGI Model
  Need to parse input
  Overhead
    Need to start up a new program for every request
  Scalability
    All processing is on the server; what happens as usage grows?
  Reliability
    How do we replicate for redundancy?