Clarke, R. J (2001) L909-06: 1
Office Automation & Intranets
BUSS 909
Lecture 6: Web Architecture and Standards
Notices (1)
Assignment 2 is available from the BUSS909 Intranet and includes a Marking Criteria sheet
there are files on the intranet that provide information needed for the assignment:
Organising Structures and Schemes
Media & Content Classification
Navigation, Labeling and Searching
Notices (2)
Additional files have been placed on the BUSS909 Intranet:
a file on the fundamentals of ‘Information Theory and Systems Theory’ called sl909-00.ppt
an introduction to different types of services on the internet is available in a file called sl909-03.ppt
Agenda (1)
WWW Basics
Web Server Overview
Web Documents & Trees
Hypertext Transfer Protocol (HTTP)
Serving a Web Document- Example
WWW Basics
WWW and the Internet
Web Client and Web Server Software
Uniform Resource Locators (URLs)
Hypertext Transfer Protocol (HTTP)
Hypertext Markup Language (HTML)
Uniform Resource Locators
Uniform Resource Locators (1) Definition
a Uniform Resource Locator (URL) is the address of a network resource. URLs for the WWW actually contain several components
the first component identifies the URL scheme or protocol being used to transfer information
Uniform Resource Locators (2) Some Popular URL Schemes
Scheme    Description
http      Hypertext Transfer Protocol
https     HTTP using Secure Sockets Layer (SSL)
mailto    E-mail Address
ftp       File Transfer Protocol
finger    Finger protocol
gopher    Gopher protocol
wais      Wide Area Information Server
news      Usenet news
nntp      Usenet news via Network News Transfer Protocol (NNTP)
snews     Usenet news via SSL-encrypted NNTP
file      Host-specific filenames
irc       Internet Relay Chat session
telnet    Telnet interactive session
Uniform Resource Locators (3) Server Name & Resource
the second component identifies the name of a server sitting on the Internet from which a resource is being requested
the third component identifies the path to the resource on the server, that is, the subdirectory and the file name- most likely an HTML document
Uniform Resource Locators (4) ‘Complete URL’ to UOW Home Page
http://www.uow.edu.au/index.html
URL scheme: http
server name: www.uow.edu.au
server’s subdirectory and resource file name: /index.html
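The three components of the complete URL can be pulled apart programmatically. The following is a minimal sketch using Python's standard urllib.parse module; the variable names are illustrative only.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.uow.edu.au/index.html")

scheme = parts.scheme    # first component: the URL scheme
server = parts.hostname  # second component: the server name
path = parts.path        # third component: subdirectory and file name

print(scheme, server, path)
```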
Uniform Resource Locators (5) Incomplete URL to UOW Home Page
However, the shorter URL
http://www.uow.edu.au/
points to the ‘home page’ of that server
Web servers have a default filename, often default.html or index.html
Note: either this URL or the previous one enables the user to view the home page for the UOW web site
Uniform Resource Locators (6) Omitting the Scheme in Web URLs
Because of the popularity of the WWW, the scheme is occasionally omitted
web browsers are able to substitute this part of a web URL
the URL terra.uow.edu.au is interpreted by Netscape as http://terra.uow.edu.au/
Uniform Resource Locators (7) Partial or Relative Web URLs
a partial or relative URL is one which omits the protocol, host, port, or leading part of the path; the missing parts are taken from the URL of the document that references it
eg. rsch-ss.htm when referenced by
http://www.uow.edu.au/commerce/buss/research.htm
is a relative form of
http://www.uow.edu.au/commerce/buss/rsch-ss.htm
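A browser resolves a relative URL against the URL of the referencing document. As a small sketch of that resolution, assuming Python's standard urllib.parse module stands in for the browser:

```python
from urllib.parse import urljoin

# URL of the document that contains the relative link
base = "http://www.uow.edu.au/commerce/buss/research.htm"

# The relative URL inherits scheme, host and directory from the base
absolute = urljoin(base, "rsch-ss.htm")
print(absolute)
```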
Uniform Resource Locators (8) Anchors in Web URLs
Web URLs support the use of a # sign after the HTML filename to indicate an anchor
for example, http://www.uow.edu.au/residences/inter_house/#Facilities refers to the “Facilities” section of the document inter_house.htm
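The anchor can be separated from the rest of the URL. This sketch uses urldefrag from Python's standard urllib.parse module; note that the part after the # is used by the browser to position the document and is not sent to the server.

```python
from urllib.parse import urldefrag

# Split the URL into the part requested from the server and the anchor
url, anchor = urldefrag(
    "http://www.uow.edu.au/residences/inter_house/#Facilities")

print(url)
print(anchor)
```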
Uniform Resource Locators (9) Preserving State Information in URLs ...
WWW is inherently stateless
once a request from a client is answered by an HTTP server, the transaction is effectively concluded
the transaction’s current status is lost; that is, it is normally not recorded for future transactions
Uniform Resource Locators (10) … Preserving State Information in URLs ...
state information must be available for many uses like:
electronic commerce across internet (shopping carts), extranet (EDI), etc
researching on the web with search engines, which generally involves multiple attempts at converging on a small set of useful sources
Uniform Resource Locators (11) … Preserving State Information in URLs ...
however, state can be preserved for the duration of a user’s session by placing additional information into the URL
this information is typically sent to the CGI-BIN area on the server
the CGI-BIN area is where user-provided executable routines are placed for execution during a user’s session
Uniform Resource Locators (12) … Preserving State Information in URLs ...
conventions exist for passing state information to CGI routines
search parameters can form state information- for example, the search term “intranets” can be sent as a parameter to the query routine located in the CGI-BIN area of the AltaVista search engine
Uniform Resource Locators (13) … Preserving State Information in URLs
Everything after the ? is the parameter string that is passed to the query routine located on the AltaVista site
http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=intranets&search=Search
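To see how a CGI routine might recover the state carried in such a URL, the parameter string can be decoded into name/value pairs. This is a sketch only, using Python's standard urllib.parse module rather than whatever the AltaVista query routine actually used.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://www.altavista.com/cgi-bin/query"
       "?pg=q&kl=XX&q=intranets&search=Search")

parts = urlsplit(url)
routine = parts.path            # the routine in the CGI-BIN area
params = parse_qs(parts.query)  # everything after the ?, as name -> values

print(routine)
print(params["q"])
```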
Web Server Overview
Web Server Components
Relationship to HTTP
Limits of Web Servers
Web Documents & Trees
MIME file extensions and types
Documents, Links and Anchors
Document Tree Organisation
Hypertext Transfer Protocol
browser and server communicate using HTTP
simple set of rules designed to be suitable for hypermedia systems distributed across networks
must understand this protocol in order to understand the WWW
HTTP defines a simple request-response ‘conversation’
Hypertext Transfer Protocol
HTTP does define how to correctly format the request and the response
the client- often but not necessarily a browser- is the requesting program and establishes a connection to the receiving program or server
the server replies with a response including the requested information if possible
Hypertext Transfer Protocol
HTTP does not define:
how the network connection is made or managed, or
how the information is actually transmitted
(this is done by lower-level protocols such as TCP/IP)
HTTP requests consist of a method, a Uniform Resource Identifier (URI), a protocol version, and other information
Hypertext Transfer Protocol HTTP Requests: Methods ...
HTTP Methods- commonly supported methods include:
GET- returns the object; retrieves the information
HEAD- returns only information about the object, but not the object itself
POST- sends information to the server for processing or storage (eg. input to scripts)
Hypertext Transfer Protocol ... HTTP Requests: Methods
some HTTP methods are not supported by many browsers because they may put the integrity of the server at risk:
PUT- send a new copy of an existing object
DELETE- permanently remove an object
other methods may be added to the standard in the future- HTTP is extensible and has evolved, slowly
Hypertext Transfer Protocol HTTP Requests: Information Client -> Server
User-Agent: kind of browser making request
If-Modified-Since: the object is returned only if it is newer than a specified date (can save the cost of a retrieval)
Accept: the MIME types and formats the browser has been configured to accept (can save the cost of downloading an unreadable document)
Authorization: user password etc. as required
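Putting the method, URI, protocol version and client information together gives a complete request. The sketch below formats one as text; build_request and the browser name ExampleBrowser/0.1 are hypothetical names chosen for illustration.

```python
def build_request(method, uri, headers, version="HTTP/1.0"):
    """Format an HTTP request: request line, header lines, blank line."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Lines end with CRLF; an empty line marks the end of the headers
    return "\r\n".join(lines) + "\r\n\r\n"

request = build_request("GET", "/sample.htm", [
    ("User-Agent", "ExampleBrowser/0.1"),  # hypothetical browser name
    ("Accept", "text/html"),
    ("Accept", "image/*"),
])
print(request)
```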
Serving Documents- Example
Serving Documents- Example 1: Server waits for a new request
httpd program waits for a client’s request to arrive from somewhere on the Internet
server listens to a port until someone calls it; until that occurs it is dormant
Serving Documents- Example 2: Request arrives from client ...
ultimately a request is sent by a client to the server, triggered either by typing a URL or selecting an HTML anchor
the network software (client) locates the server computer and sets up a 2-way network connection from the client to the server
Serving Documents- Example ... 2: Request arrives from client
the client uses Internet protocols and the name service (DNS) to locate the server and initiate a connection with it
once the connection is established the client sends the HTTP request:
GET /sample.htm HTTP/1.0
the request is sent over the network in ASCII; the server receives it and saves it
Serving Documents- Example 3: server parses the request ...
server decodes the request using HTTP protocol to determine what to do
there are three important pieces of information:
the method instructs the server as to what action should be taken. The GET method is used to locate and read the file and return it to the client ...
Serving Documents- Example ... 3: server parses the request
the document (/sample.htm) can be fetched by the server because it knows where it is in the document tree, and
the protocol version being used by the browser (HTTP/1.0), so that the contents can eventually be returned to the client. The reply is sent back over the same connection as the request. (Note that the server need not find the client on the Internet or make a new connection)
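The three pieces of information sit on the first line of the request, separated by spaces. A minimal sketch of how a server might pull them apart follows; parse_request_line is a hypothetical helper, and real servers do considerably more validation.

```python
def parse_request_line(line):
    """Split an HTTP request line into (method, document, protocol)."""
    method, document, protocol = line.strip().split(" ")
    return method, document, protocol

method, document, protocol = parse_request_line("GET /sample.htm HTTP/1.0\r\n")
print(method, document, protocol)
```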
Serving Documents- Example 4: Read other information (if necessary) ...
the httpd program reads the rest of the request, if needed
using HTTP/1.0 the browser is expected to send additional information about itself to the server
this meta-information describes the browser and its capabilities which may be needed by the server to reply to the request
Serving Documents- Example ... 4: Read other information (if necessary)
for example:
User-agent: Mosaic for X Windows/2.4
Accept: text/plain
Accept: text/html
Accept: image/*
indicates the browser is Mosaic, configured to display plain text, HTML, and any kind of image
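A server can collect this meta-information into a lookup table before acting on the request. The sketch below assumes the header lines have already been read from the connection; parse_headers is a hypothetical helper, and repeated fields such as Accept are kept as a list.

```python
def parse_headers(lines):
    """Collect 'Name: value' lines into a dict of lowercase name -> values."""
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers

headers = parse_headers([
    "User-agent: Mosaic for X Windows/2.4",
    "Accept: text/plain",
    "Accept: text/html",
    "Accept: image/*",
])
print(headers["accept"])
```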
Serving Documents- Example 5: Do the requested method ...
Assuming no errors, the httpd program executes the request
to GET a document requires looking up the file /sample.htm in its document tree using the operating system’s standard file services
there are two alternative courses of action depending on success or failure
Serving Documents- Example ... 5: Do the requested method (Success) ...
the httpd daemon sends a result code and the information that describes the type of information expected by the client
as the document is found a code 200 (everything is OK) is sent and the document will follow
the information is an HTML document so the Content-type: text/html; the document is 1066 bytes long so the Content-length: 1066
the server software and the file date are also included
Serving Documents- Example ... 5: Do the requested method (Success)
the header sent to the client might look something like this:
HTTP/1.0 200 Document follows
Server: NCSA/1.4
Date: Thu, 20 Jul 1996 22:00:00 GMT
Content-type: text/html
Content-length: 1066
Last-modified: Thu, 20 Jul 1996 20:38:40 GMT
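A header of this shape can be generated from the status code and a list of fields. In the sketch below, format_response_header is a hypothetical helper and the body is a 1066-byte stand-in for the real HTML document.

```python
def format_response_header(status, reason, fields):
    """Format a status line and header fields, ending with a blank line."""
    lines = [f"HTTP/1.0 {status} {reason}"]
    lines += [f"{name}: {value}" for name, value in fields]
    return "\r\n".join(lines) + "\r\n\r\n"

body = b"x" * 1066  # stand-in for the 1066-byte HTML document

header = format_response_header(200, "Document follows", [
    ("Server", "NCSA/1.4"),
    ("Content-type", "text/html"),
    ("Content-length", len(body)),  # computed from the document itself
])
print(header)
```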
Serving Documents- Example 5: Do the requested method (Failure) ...
if the requested file could not be found or read then the status code will not be 200
the most common problem is that the name of the requested file is misspelt so the server cannot find it
if the requested file was called smple.htm it would not be found- the server would send a status code 404
Serving Documents- Example ... 5: Do the requested method (Failure) ...
the response might look like this:
HTTP/1.0 404 Not Found
Server: NCSA/1.4
Date: Thu, 20 Jul 1996 22:00:00 GMT
Content-type: text/html
Content-length: 0
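The choice between the success and failure paths comes down to whether the requested name exists in the document tree. The sketch below models the tree as an in-memory dictionary with hypothetical contents, and uses 404, the status code HTTP assigns to Not Found; a real httpd would look the file up on disk.

```python
# Hypothetical document tree: path -> file contents
DOCUMENT_TREE = {"/sample.htm": b"<html>sample</html>"}

def lookup(path):
    """Return (status, reason, body) for a requested path."""
    body = DOCUMENT_TREE.get(path)
    if body is None:
        return 404, "Not Found", b""   # misspelt or missing file
    return 200, "Document follows", body

print(lookup("/sample.htm")[:2])
print(lookup("/smple.htm")[:2])
```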
Serving Documents- Example 6: Finish Up
when the file is completely sent or an error message is sent, the httpd server has finished its work- it closes the file if it was open, and closes the network connection to the client
the client receives and formats the data- the server knows nothing about how it is displayed
the httpd server listens for another request (go back to step 1)
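Steps 2 to 6 can be sketched end to end for a single request. This is an illustration under simplifying assumptions, not how any particular httpd is written: the document tree is a hypothetical in-memory dictionary, only the request line is examined, and a local socket pair stands in for a connection across the Internet.

```python
import socket

# Hypothetical document tree: path -> file contents
DOCUMENT_TREE = {"/sample.htm": b"<html>sample</html>"}

def serve_one(conn):
    """Handle one request on an established connection, then close it."""
    request = conn.recv(4096).decode("ascii")
    _method, path, _version = request.split("\r\n")[0].split(" ")
    body = DOCUMENT_TREE.get(path)  # only a GET-style lookup is sketched
    if body is None:
        reply = b"HTTP/1.0 404 Not Found\r\nContent-length: 0\r\n\r\n"
    else:
        header = ("HTTP/1.0 200 Document follows\r\n"
                  "Content-type: text/html\r\n"
                  f"Content-length: {len(body)}\r\n\r\n")
        reply = header.encode("ascii") + body
    conn.sendall(reply)
    conn.close()  # step 6: closing the connection ends the transaction

# A connected pair of sockets stands in for client and server ends
client, server_side = socket.socketpair()
client.sendall(b"GET /sample.htm HTTP/1.0\r\n\r\n")
serve_one(server_side)

response = b""
while True:
    chunk = client.recv(4096)
    if not chunk:  # empty read: the server has closed the connection
        break
    response += chunk
client.close()
print(response.decode("ascii"))
```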
Web Server Operations
a web server has a collection of information in a document tree and it serves it according to the HTTP protocol
web servers are reactive programs: they wait until a request is made, attempt to satisfy it, and then repeat the cycle
the previous example is only slightly simplified
Web Server Operations Handling Multiple Requests (1)
if a server processes one request at a time, but can receive many simultaneous requests, then delays will occur:
an image may take several seconds to serve
without a priority scheme, small jobs that can be serviced quickly take an inordinate amount of time to serve
with a large number of hits servers can go down- the backlog can be too great
Web Server Operations Handling Multiple Requests (2)
web servers are therefore designed to handle as many requests as possible simultaneously
several strategies are available to do this (the last two are more difficult unless special software is used):
clone a copy of the httpd program for each request- very easy under UNIX
multithreading the httpd program
spreading the work amongst several helper programs
Web Server Operations Cloning Servers (1)
each request is processed by a new copy of the httpd program
the original server called the parent immediately returns to listening for another request
the new copy called the child performs the processing
Web Server Operations Cloning Servers (2)
the parent passes the network connection to the child at the time that it is first spawned
when the child has serviced the request, it terminates
the web server hardware may have many copies of the httpd program running simultaneously
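The parent/child arrangement can be sketched with the UNIX fork call, which Python exposes as os.fork (UNIX-like systems only). The function names are hypothetical, the child's reply is a fixed string, and a local socket pair stands in for a connection accepted from the network.

```python
import os
import socket

def handle(conn):
    """The child's work: answer the request on the inherited connection."""
    conn.recv(4096)
    conn.sendall(b"HTTP/1.0 200 Document follows\r\n\r\n")
    conn.close()

def clone_for_request(conn):
    """Fork a child to service conn; the parent returns straight away."""
    pid = os.fork()
    if pid == 0:
        # Child: inherits the connection, processes it, then terminates
        try:
            handle(conn)
        finally:
            os._exit(0)
    conn.close()  # parent: drops its copy and is free to listen again
    return pid

client, server_side = socket.socketpair()
client.sendall(b"GET /sample.htm HTTP/1.0\r\n\r\n")
child_pid = clone_for_request(server_side)
reply = client.recv(4096)
os.waitpid(child_pid, 0)  # reap the finished child
client.close()
print(reply)
```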
Web Server Operations Multithreaded Execution
many mechanisms can be used for implementing this approach
server may monitor the progress of several connections, switching between them as necessary
when a lengthy process is in operation the server may switch to another pending task
when the pending process is complete it can return to the previous lengthy process
server closes the network connections of any finished processes
this can be an extremely efficient method
Web Server Operations Servers as Cooperating Sets of Programs
the httpd server itself can be made a set of cooperating programs specialised to perform particular tasks
One program reads the requests from the network, another allocates them to specialised helper programs
the scheme is very efficient: the number of helpers can be adjusted to meet the number of requests, the type of requests (generally less common) or the size of the system
Web Server Operations Multiple Web Services on the same Servers
more than one web service can run on the same computer
any number of httpd programs can run on a UNIX machine as long as each has a unique port number
the following web services are on the same computer but different ports (the superuser sets up port 80 servers, but users can own and operate unrestricted ports above 1024):
http://www.rods.org/index.htm (port 80)
http://www.rods.org:8080/index.htm (port 8080)
http://www.rods.org:8081/index.htm (port 8081)
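The port a client will contact can be read from the URL, falling back to 80 when none is given. A small sketch using Python's standard urllib.parse module; port_of is a hypothetical helper.

```python
from urllib.parse import urlsplit

def port_of(url, default=80):
    """Return the port named in the URL, or the HTTP default of 80."""
    port = urlsplit(url).port
    return default if port is None else port

ports = [port_of(u) for u in (
    "http://www.rods.org/index.htm",
    "http://www.rods.org:8080/index.htm",
    "http://www.rods.org:8081/index.htm",
)]
print(ports)
```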
Web Server Operations Establishing a Two-Way Network Connection
client must look up the network address of the server using its name
the client’s system software sends a packet to the server, requesting a connection
the server’s system software sends a packet back to the client, agreeing to set up a connection
the client program is connected to the new network connection
the server program is connected to the new network connection
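These steps map onto a handful of socket calls. The sketch below runs both ends on one machine, so the name localhost and an operating-system-chosen port are assumptions standing in for a real server name and port 80; the packets that request and agree to the connection are exchanged by the system software inside connect and accept.

```python
import socket

# Server side: listen on a free local port (port 0 lets the OS choose)
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

# Client: look up the network address of the server using its name
family, socktype, proto, _canon, address = socket.getaddrinfo(
    "localhost", port, socket.AF_INET, socket.SOCK_STREAM)[0]

# Client: request a connection; the system software does the handshake
client = socket.socket(family, socktype, proto)
client.connect(address)

# Server program is attached to its end of the new connection
server_conn, client_address = listener.accept()

# The connection is two-way: either end can now send to the other
client.sendall(b"hello")
received = server_conn.recv(5)
server_conn.sendall(b"world")
answered = client.recv(5)

for s in (client, server_conn, listener):
    s.close()
print(received, answered)
```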