Date post: 27-Dec-2015
Upload: maud-watkins
ES 101-02. Module 5 Uniform Resource Locators, Hypertext Transfer Protocol, & Common Gateway Interface

This Lecture

• Uniform Resource Locators (URL)
• Hypertext Transfer Protocol (HTTP)
• Common Gateway Interface (CGI)

Definitions

• We previously discussed the Domain Name System, or DNS
  – Distributed database hosted by DNS servers
  – Maps mnemonic host names to IP addresses
    • Names are easier for humans to remember
  – Universal registration, i.e. every domain name on the Internet is unique

• In order to find resources on a particular server, we must introduce the concept of a URL

Uniform Resource Locators

• The URL allows a client browser to send search data to a server for further processing

• URLs are a scheme for specifying Internet resources using a single line of printable ASCII characters
  – No control characters are allowed

• The URL structure and syntax allow the web client to access all major Internet protocols via TCP
  – File Transfer Protocol (FTP)
  – Hypertext Transfer Protocol (HTTP)
  – Etc.

• URLs can also be used within HTML documents to provide “links” to other documents

URL Contents

• A URL contains the following:
  – Protocol to use when accessing the server, e.g. HTTP
  – Internet Domain Name of the site on which the server is running, and the address of the requested server
  – Port number of the target application
  – Location of the resource in the directory structure

• Example of a URL:

http://www.cern.ch/hypertext/WWW/RDBgate/Implementation.html

URL Contents (cont’d)

• The previous URL references the file: Implementation.html

• This file is located in the directory: /hypertext/WWW/RDBgate, which is located on the server www.cern.ch

• The protocol used is HTTP

Note that this is an exact reference. Abbreviated references are allowed under certain conditions.
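
The decomposition above can be reproduced with a few lines of Python. This is only a sketch: it assumes the standard library's `urllib.parse.urlsplit` as the parser, and the variable names are chosen for illustration.

```python
import posixpath
from urllib.parse import urlsplit

# The example URL from the slide.
url = "http://www.cern.ch/hypertext/WWW/RDBgate/Implementation.html"
parts = urlsplit(url)

protocol = parts.scheme        # protocol used to reach the server
server = parts.hostname        # domain name of the server
# Split the path into the directory and the file it contains.
directory, filename = posixpath.split(parts.path)

print(protocol)   # http
print(server)     # www.cern.ch
print(directory)  # /hypertext/WWW/RDBgate
print(filename)   # Implementation.html
```

No port appears in this URL, so `parts.port` is `None` here.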

Allowed Characters in URLs

• Every URL must be written using printable ASCII characters

• This ensures that URLs can be sent by electronic mail
  – Many mail programs would mishandle control characters

• However, any non-printable ASCII character can be included in a URL by using a character encoding scheme

ASCII/IRA Character Set

Character Encoding

• Any ASCII control character can be represented by the character sequence %xy, where “xy” is the two-digit hexadecimal code of the character of interest

• It follows that a literal “%” character can’t be used in a URL; it must itself be encoded, as %25

• There are other disallowed characters:
  – “Space” and “TAB” characters, double quotation marks ("), and a “Slash” that is not acting as a path separator are examples of characters that must be encoded
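
The %xy scheme can be tried out in Python. This is a minimal sketch: `percent_encode_char` is a helper written for this example, the sample text is made up, and `quote` / `unquote` come from the standard `urllib.parse` module.

```python
from urllib.parse import quote, unquote

def percent_encode_char(ch):
    """Encode one ASCII character as %xy, where xy is its two-digit hex code."""
    return "%{:02X}".format(ord(ch))

# Characters named on the slide and their encoded forms.
tab = percent_encode_char("\t")       # %09
space = percent_encode_char(" ")      # %20
percent = percent_encode_char("%")    # %25
dquote = percent_encode_char('"')     # %22
slash = percent_encode_char("/")      # %2F

# The library routine applies the same rule to a whole string.
# safe="" asks it to encode the slash as well.
text = 'my "big" file/100%.txt'
encoded = quote(text, safe="")
decoded = unquote(encoded)

print(encoded)  # my%20%22big%22%20file%2F100%25.txt
print(decoded == text)
```

Decoding the encoded string gives back the original text, so no information is lost.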

Ports and IP Addresses

• Port numbers and IP addresses usually do not need to be written in the URL: the port is “assumed”, and the IP address is found through DNS

• However, they can be included within a URL without causing problems: http://www.address.edu:80/path/subdir/file.ext

• If a port number is not included in the URL, the client assumes the default port number for that protocol

• As an example, using the “HTTP” protocol implies port “80”
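
A small sketch of the default-port rule, assuming Python's `urllib.parse`. The `DEFAULT_PORTS` table and the `effective_port` helper are written for this example and list only two protocols.

```python
from urllib.parse import urlsplit

# Illustrative table of well-known default ports (not exhaustive).
DEFAULT_PORTS = {"http": 80, "ftp": 21}

def effective_port(url):
    """Return the port a client would contact for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port               # port written explicitly in the URL
    return DEFAULT_PORTS[parts.scheme]  # otherwise fall back to the protocol default

explicit = effective_port("http://www.address.edu:80/path/subdir/file.ext")
implied = effective_port("http://www.address.edu/path/subdir/file.ext")
print(explicit, implied)
```

Both calls give the same port, since 80 is the HTTP default whether or not it is written out.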

Ports and IP Addresses (cont’d)

• Numeric IP addresses can be used in place of domain names: http://132.206.9.22/pathname

• You could also include the username and password in the URL
  – This is not recommended, since the password is not encrypted. Very bad security practice!!
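
The sketch below shows why a password in a URL is risky: anyone who can read the URL can read the password. It assumes Python's `urllib.parse` and `ipaddress` modules. The username "alice" and password "secret" are invented for the example, and `describe_host` is a helper written here.

```python
import ipaddress
from urllib.parse import urlsplit

def describe_host(url):
    """Report the host, whether it is numeric, and any credentials in the URL."""
    parts = urlsplit(url)
    try:
        ipaddress.ip_address(parts.hostname)
        kind = "numeric IP address"
    except ValueError:
        kind = "domain name"
    return parts.hostname, kind, parts.username, parts.password

numeric = describe_host("http://132.206.9.22/pathname")
named = describe_host("http://www.address.edu/path/subdir/file.ext")
# Hypothetical credentials: they sit in the URL as plain, readable text.
with_login = describe_host("http://alice:secret@132.206.9.22/pathname")

print(numeric)
print(named)
print(with_login)
```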

Partial URLs

• If you are within a given HTML document, it is not necessary to specify the complete URL

• Any information not included in the URL is assumed to be the same as that used to access the current document

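• Partial URLs are very useful when constructing large collections of HTML documents that will be kept “together”
  – Caveat: The links keep working only while the documents stay in the same positions relative to one another. If you move some of the documents to a different folder or server without the rest, the links will not work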
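
How a browser fills in the missing parts of a partial URL can be sketched with `urllib.parse.urljoin`. The base URL is the earlier CERN example; the partial URLs ("Overview.html" and so on) are made up for illustration.

```python
from urllib.parse import urljoin

# URL of the document currently being viewed.
base = "http://www.cern.ch/hypertext/WWW/RDBgate/Implementation.html"

# Same directory: protocol, server, and directory are taken from the base.
sibling = urljoin(base, "Overview.html")
# One directory up from the current document.
parent = urljoin(base, "../Status.html")
# Path given from the server root: only protocol and server are reused.
rooted = urljoin(base, "/index.html")

print(sibling)  # http://www.cern.ch/hypertext/WWW/RDBgate/Overview.html
print(parent)   # http://www.cern.ch/hypertext/WWW/Status.html
print(rooted)   # http://www.cern.ch/index.html
```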

URL Forms

• Let’s look at a couple of examples:
  – File Transfer Protocol (FTP)
  – Hypertext Transfer Protocol (HTTP)

FTP URLs

• FTP URLs designate the files and directories that are accessible using the FTP protocol

• In the absence of any username and password, anonymous FTP access is assumed
  – This connects you to the server as user “anonymous” with a password equal to your email address
  – Examples:
    • ftp://internet.address.edu/path/
    • ftp://ftp.prenhall.com/pub/esm/computer_science.s-041/stallings/Figures/DCC7e_PDF_Figures/CHAP-02/
  – Note that the final “slash” indicates a directory

• The web browser would display this URL as a directory of contents
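
The FTP rules above (anonymous login by default, final slash means directory) can be sketched as follows. `ftp_target` is a helper written for this example, and the email address "guest@example.com" is a placeholder, not a real account.

```python
from urllib.parse import urlsplit

def ftp_target(url, email="guest@example.com"):
    """Work out the login and the kind of resource an FTP URL refers to."""
    parts = urlsplit(url)
    if parts.username:
        user, password = parts.username, parts.password
    else:
        # No credentials in the URL: fall back to anonymous FTP.
        user, password = "anonymous", email
    return {
        "host": parts.hostname,
        "user": user,
        "password": password,
        "path": parts.path,
        "is_directory": parts.path.endswith("/"),
    }

target = ftp_target("ftp://internet.address.edu/path/")
print(target)
```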

FTP Directory Example

HTTP URLs

• HTTP URLs designate files, directories, or server-side programs that are accessible using the HTTP protocol
  – Example: http://www.site.edu:3232/cgi-bin/srch

• This example references the program “srch” at the site www.site.edu, accessible through the HTTP server listening on port 3232

• An HTTP URL must always point to either a file or a directory
  – A directory is indicated by terminating the URL with a “slash”
  – Example: http://www.site.edu/htmldocs/ (note the final slash)

HTTP History

• HTTP is a protocol utilized for transmitting information with the efficiency necessary for making hypertext “jumps”

• It is documented in the IETF standards as RFC 2616
• It is a transaction-oriented, client/server protocol
• The most common use of HTTP is to handle communications between a web browser (client), and a web server
  – Other examples: Accessing a CD using HTTP

• To provide reliability, HTTP utilizes TCP

TCP/IP Architecture

Hypertext Transfer Protocol

• In order to develop interactive HTML documents, we need to first review the interaction between a WWW client (browser) and an HTTP server
  – A web site is a directory of interactive HTML documents and programs

• This interaction involves two distinct, but closely related issues
  – HTTP communication methods
  – How an HTTP server handles a client request

HTTP Communication Methods

• HTTP provides a number of communication methods, such as:
  – GET, POST, HEAD, etc.

• These methods allow a client to receive information from the server, and send information to the server

HTTP Request Handling

• If the client requests a file, the server simply locates the file and sends it to the client
  – If the file is not available, an error message is returned to the client

• Consider the situation when the client wants to send information to the server for more complicated processing
  – The HTTP server software does not do this processing, but hands it off to another program via the Common Gateway Interface (cgi-bin)
  – The program that receives the processing request is referred to as a “gateway program”
  – This implies that there are two interfaces to the HTTP server
    • HTTP client interactions
    • CGI interactions

Gateway Programs

• Gateway programs can be referenced using URLs
• When the HTTP server needs to activate the program, it invokes the CGI mechanism to pass the data to the target program

• The CGI program acts on the data, and returns the result to the HTTP server

• In order to understand the CGI program, we must first discuss the HTTP protocol

• After this discussion, we will cover the CGI

HTTP Overview

• HTTP is an Internet-based, client/server protocol that has been designed for the rapid and efficient delivery of HTML documents

• The client can make multiple concurrent requests of the HTTP server
  – Each request is processed individually
  – The server has no recollection of previous connections
  – This type of protocol is “stateless”

• Statelessness is a very important feature of HTTP
  – Speeds up processing of requests

HTTP Communications

• All HTTP communications utilize 8-bit characters
• This allows the safe transmission of any type of data, such as HTML documents
• An HTTP connection has four stages:
  – Open the connection
  – Request
  – Response
  – Close the connection
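
The four stages can be watched end to end without touching the Internet. The sketch below is one possible illustration, not the lecture's own code: it starts a tiny HTTP server on the local machine using Python's `http.server`, then talks to it over a raw TCP socket. The handler class, the page content, and the use of an operating-system-chosen port instead of port 80 are all assumptions made so that the example is self-contained.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for a web server: answers every GET with one HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch_once(host, port, path):
    """Perform one complete HTTP transaction over a raw TCP socket."""
    # Stage 1: open the connection.
    with socket.create_connection((host, port), timeout=5) as conn:
        # Stage 2: send the request.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        # Stage 3: read the response until no more data arrives.
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Stage 4: leaving the "with" block closes the connection.
    return b"".join(chunks)

# Port 0 lets the operating system pick any free port for this local test.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

raw = fetch_once("127.0.0.1", server.server_address[1], "/")
server.shutdown()
server.server_close()

status_line = raw.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```

Each call to `fetch_once` opens a fresh connection and closes it when the transaction is done, which matches the one-transaction-per-connection picture described in this lecture.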

HTTP Open Connection

• The client contacts the server at the correct IP address, using TCP Port 80

• Note that the DNS servers allow mapping mnemonic names to IP addresses

• TCP Port 80 is a “well known” port

TCP Well Known Ports

HTTP Request

• The client sends a message to the server requesting service.

• The client request contains HTTP request headers that define the “method” requested for the transaction

• The request header is followed by information about the capabilities of the client, followed by the data to be sent to the HTTP server, if any
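
A request message laid out in this order (method, client capabilities, then data) might look like the sketch below. `build_request` is a helper written for this example, and the User-Agent string "es101-sketch/0.1" and the form data "q=urls" are made up. The URLs are the ones used earlier in the lecture.

```python
def build_request(method, path, host, body=b""):
    """Assemble the bytes of a simple HTTP/1.0 request message."""
    lines = [
        f"{method} {path} HTTP/1.0",     # request line: names the method
        f"Host: {host}",
        "User-Agent: es101-sketch/0.1",  # capabilities of the client
        "Accept: text/html",
    ]
    if body:
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body)}")
    # A blank line separates the headers from the data, if any.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

get_request = build_request("GET", "/htmldocs/", "www.site.edu")
post_request = build_request("POST", "/cgi-bin/srch", "www.site.edu", b"q=urls")

print(get_request.decode("ascii"))
print(post_request.decode("ascii"))
```

The GET request carries no data, so it ends at the blank line; the POST request carries the form data after it.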

HTTP Response

• The server sends a response to the client
• The response is composed of “response headers” describing the state of the transaction
• The response header is then followed by any data required for the client
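
The layout of a response can be seen by taking one apart. In this sketch `parse_response` is a helper written for the example, and the sample response (a "file not available" error) is hand-written rather than captured from a real server.

```python
def parse_response(raw):
    """Split raw response bytes into status code, reason, headers, and data."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

# Hand-written sample: the error returned when a file is not available.
sample = (
    b"HTTP/1.0 404 Not Found\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 18\r\n"
    b"\r\n"
    b"<h1>Not Found</h1>"
)

code, reason, headers, body = parse_response(sample)
print(code, reason)
print(headers)
print(body)
```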

HTTP Close Connection

• The connection is closed by the client

HTTP Procedure

• The procedure outlined previously implies that only a single download or process can be handled per connection

• This has some implications regarding handling of a request

• Consider the following scenarios:
  – Single Transaction per Connection
  – Statelessness of the Connection

Single Transaction per Connection

• Suppose HTTP is utilized to access an HTML document that contains ten different images

• As a result, retrieving the document requires 11 distinct connections
  – One request for the HTML document
  – Ten additional requests for the images
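
The count of 11 can be checked with a short sketch. It assumes one connection per transaction, as described above, and ignores anything a real browser might do to save work. `ImageCounter` and `connections_needed` are written for this example, and the page with ten images is generated on the spot.

```python
from html.parser import HTMLParser

class ImageCounter(HTMLParser):
    """Collect the src of every <img> tag found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.image_sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)

def connections_needed(html_text):
    """One connection for the document itself, plus one per image."""
    counter = ImageCounter()
    counter.feed(html_text)
    return 1 + len(counter.image_sources)

# A made-up page containing ten images.
images = "".join(f'<img src="pic{i}.gif">' for i in range(10))
page = f"<html><body>{images}</body></html>"

total = connections_needed(page)
print(total)  # 11
```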

Statelessness of the Connection

• Suppose a user retrieves a “fill-in” HTML form from the HTTP server

• The user would then enter their username and password in order to access restricted data

• After the client submits the form data, the HTTP server hands off the information to the CGI program

• The CGI program then processes the data, and returns it as an HTML document, which is then delivered to the client

• Note that the HTTP server would not retain any knowledge of this connection. The state information would be included in the form data
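
One way to picture state travelling inside the form data is the sketch below. It is an illustration only: `handle_form` stands in for the gateway program, the field names "username" and "step" and the user "maud" are invented, and hidden form fields are just one possible way to carry the state.

```python
from html import escape
from urllib.parse import parse_qs, urlencode

def handle_form(form_data):
    """Process one submission; everything it knows comes from form_data."""
    fields = parse_qs(form_data)
    user = fields.get("username", [""])[0]
    step = int(fields.get("step", ["0"])[0]) + 1
    # The reply embeds the state in hidden fields, so the next
    # submission carries it back to the server.
    return (
        '<form method="POST" action="/cgi-bin/srch">'
        f'<input type="hidden" name="username" value="{escape(user, quote=True)}">'
        f'<input type="hidden" name="step" value="{step}">'
        "</form>"
    )

first = handle_form(urlencode({"username": "maud", "step": 0}))
second = handle_form(urlencode({"username": "maud", "step": 1}))
print(first)
print(second)
```

Nothing is stored between the two calls; the second call only "remembers" the first because the form data says so.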

Eavesdropping

• Recall that all HTML information is passed back and forth between the client and server in unencrypted ASCII character format

• This implies that a machine could “listen” on Port 80 to the data sent between the HTTP server and the client

• If security is required, a secure form of HTTP must be used (HTTPS)
  – Secure communication is beyond the scope of this course

Common Gateway Interface

• This is the standard method for communication between HTTP servers, and server-side gateway programs

• When access to a gateway program is required, the CGI process activates the program, and sends it any data required

• When the processing is finished, the CGI process sends the information back to the HTTP server

• Gateway programs can be compiled programs written in any high-level language, or scripting language
  – High-level languages: C, C++, Pascal
  – Scripting languages: perl, tcl, Unix shell, etc.
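
A gateway program can be very small. The sketch below is written in Python rather than one of the languages listed above, purely for illustration. It relies on the usual CGI conventions: the server passes request details in environment variables such as REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH, and the program writes a header, a blank line, and then the document to standard output. The field name "q" and the simulated request are assumptions made for the example.

```python
import io
from html import escape
from urllib.parse import parse_qs

def run_gateway(environ, stdin, stdout):
    """Read the request data the server handed over and write an HTML reply."""
    if environ.get("REQUEST_METHOD", "GET") == "POST":
        # POST data arrives on standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET data arrives in the QUERY_STRING variable.
        raw = environ.get("QUERY_STRING", "")
    fields = parse_qs(raw)
    term = fields.get("q", [""])[0]  # "q" is a hypothetical field name
    # Header, blank line, then the document returned to the HTTP server.
    stdout.write("Content-Type: text/html\r\n\r\n")
    stdout.write(
        f"<html><body><p>You searched for: {escape(term)}</p></body></html>\n"
    )

# Simulate the server invoking the gateway for /cgi-bin/srch?q=hypertext
output = io.StringIO()
run_gateway(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "q=hypertext"},
    io.StringIO(""),
    output,
)
response = output.getvalue()
print(response)
```

When run by a real server, the same function would be called with `os.environ`, `sys.stdin` and `sys.stdout` in place of the simulated values.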

Common Gateway Interface (cont’d)

• Gateway programs reside in the “/cgi-bin” folder of the server

Next Lecture(s)

• This presentation concludes our discussion of HTTP/URLs, which are Layer 5 constructs

• The next topic of discussion will be on utilities that are of use in web development, and HTML

• At the conclusion of these lectures, we will discuss how to use these tools to build a web site

