Alexandros Labrinidis, Univ. of Pittsburgh 2 CMU . CS 415 . 07 November 2002
Examples of db-powered web sites
- Google.com (search engine)
- Amazon.com (shopping)
- eBay.com (auctions)
- WellsFargo.com (online banking)
- weather.com (forecasts)
- expedia.com (travel)
- NSF.gov (proposal submission)
- my.yahoo.com (personalized newspaper)
- NYTimes.com (electronic newspaper)
Motivating Example: nytimes.com
Why is WebDB so popular?
- Arguments from web “producers”:
  - easy to “publish” databases over the Web
  - wealth of information available
  - enable personalization
  - allow targeted advertising
- Arguments from web “consumers”:
  - no need to install special software
  - easy to learn: uniform user interface
  - personalized content
- Today – even seemingly static sites have WebDB
Typical WebDB architecture – 3 tiers
- Web server: handles HTTP requests
- Application server: web workflow
- DB server: data storage & queries
[Figure: User 1, User 2, User 3, ..., User n -> internet -> web server -> app server -> db server]
Typical WebDB architecture – 2 tiers
2-tiers: incorporate application server within web server
[Figure: User 1, User 2, User 3, ..., User n -> internet -> web & app server -> db server]
HTTP: HyperText Transfer Protocol
Client request:
nitrogen{4} telnet www.google.com 80
Trying 216.239.37.101...
Connected to www.google.com.
Escape character is '^]'.
GET /index.html HTTP/1.0
[two carriage returns]
HTTP: HyperText Transfer Protocol
Server response:
HTTP/1.0 200 OK
Content-Length: 2532
Connection: Close
Server: GWS/2.0
Date: Thu, 07 Nov 2002 16:57:59 GMT
Content-Type: text/html
Cache-control: private
Set-Cookie: PREF=ID=24bce47555c1db8b:TM=1036688279:LM=1036688279:S=t4XqRr3VPTPwKMEp; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com

<html>….</html>
Connection closed by foreign host.
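The request/response exchange can be sketched in a few lines of Python. This is a minimal illustration, not a full HTTP client: the helper names `build_request` and `parse_response` are hypothetical, the sample response reuses a few headers from the slide with an abbreviated body, and no network connection is made.

```python
def build_request(path, version="HTTP/1.0"):
    """Build the bytes typed into telnet: a request line plus an empty line."""
    return f"GET {path} {version}\r\n\r\n".encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into status code, reason, headers and body."""
    # The first empty line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Simplification: a repeated header overwrites the earlier one.
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


# Abbreviated copy of the server response (assumed body, for illustration).
SAMPLE_RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Connection: Close\r\n"
    b"Server: GWS/2.0\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html>...</html>"
)

status, reason, headers, body = parse_response(SAMPLE_RESPONSE)
```

Keeping the parsing separate from any socket code makes the protocol format itself easy to inspect and test.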
HTML: HyperText Markup Language
- page wrapped in <html></html>
- many formatting commands:
  - <font color="red">SOMETHING IN RED</font>
  - CSS: Cascading Style Sheets
- Form example:
<form action="/activate.cgi" method=POST>
  Please give your name:
  <input type=text name="username" size=15 maxlength=30>
  <input type=submit value="Submit Name">
</form>
CGI: Common Gateway Interface
- Introduced in the early 90s
- Protocol for exchanging form data between client and server
1. User requests form
2. Web server sends form to user
3. User submits form
4. CGI forwards form data to application
5. Application processes data & generates output
6. Web server sends output to user
7. User receives output
[Figure labels: network, db server]
CGI encoding
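- method=GET
  - Form data are encoded as part of the URL (e.g. http://www.google.com/search?q=cmu)
  - The CGI program receives them in the QUERY_STRING environment variable
- method=POST
  - Form data are sent in the body of the request
  - The CGI program reads them from standard input (length given in the CONTENT_LENGTH environment variable)
- encoding of query string:
  - (name,value) pairs from FORM
  - name1=value1&name2=value2&…
  - space → +
  - other special characters as %XX (hexadecimal)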
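The encoding rules, and the way a CGI program picks up form data, can be sketched with Python's standard `urllib.parse` module. The helper names `encode_form` and `read_form` are hypothetical, and the sketch assumes the standard CGI convention in which GET data arrives in `QUERY_STRING` and a POST body arrives on standard input; a text stream stands in for the real (byte-oriented) standard input.

```python
import io
from urllib.parse import parse_qs, urlencode


def encode_form(fields):
    """Encode (name, value) pairs as name1=value1&name2=value2&..."""
    # urlencode turns spaces into '+' and other special characters into %XX.
    return urlencode(fields)


def read_form(environ, stdin):
    """Return the submitted form fields as a dict, CGI-style.

    environ: mapping of CGI environment variables
    stdin:   text stream holding the request body (used for POST only)
    """
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        raw = environ.get("QUERY_STRING", "")
    # parse_qs returns a list per name; keep the first value for simplicity.
    return {name: values[0] for name, values in parse_qs(raw).items()}


encoded = encode_form({"username": "Ada Lovelace", "q": "a&b"})
from_get = read_form(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "q=cmu"}, io.StringIO("")
)
```

Passing `environ` and `stdin` as arguments, rather than reading `os.environ` and `sys.stdin` directly, keeps the sketch testable outside a web server.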
Cookies
- HTTP is connection-less
  - a new connection is established with every request
  - HTTP/1.1 supports persistent connections (but not popular)
- Q: How to maintain state over a session?
  - A: Cookies
- Cookies are text-only strings that are stored in the browser's memory (and on disk)
- Example stored cookie:
  .google.com TRUE / FALSE 2147368447 PREF ID=0a2612f3162aa05f:TM=1035999053:LM=1035999053:S=ko0-i9cOsU6nMaW6
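A cookie round trip can be sketched as follows: the server sends a `Set-Cookie` header, the browser stores the name/value pair, and later requests carry it back in a `Cookie` header. This is a simplified hand-written parser for illustration only; the helper names are hypothetical and the sketch ignores domain matching, expiry checks, and quoting.

```python
def parse_set_cookie(header_value):
    """Split a Set-Cookie value into (name, value, attributes)."""
    parts = [p.strip() for p in header_value.split(";")]
    # Only the first '=' separates name from value; the value may contain '='.
    name, _, value = parts[0].partition("=")
    attrs = {}
    for part in parts[1:]:
        if not part:
            continue
        key, _, val = part.partition("=")
        attrs[key.strip().lower()] = val.strip()
    return name, value, attrs


def cookie_header(jar):
    """Build the Cookie header a browser would send back to the server."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


SET_COOKIE = (
    "PREF=ID=24bce47555c1db8b:TM=1036688279:LM=1036688279:S=t4XqRr3VPTPwKMEp; "
    "expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com"
)

name, value, attrs = parse_set_cookie(SET_COOKIE)
jar = {name: value}  # the browser's stored state for this domain
```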
Cookies and Databases
- Use cookie to store client-ID
- Storing client-ID enables personalization
  - Example: NYTimes.com
- Storing client-ID enables access to restricted areas
  - Example: NYTimes.com (subscription-based access to articles)
- Use cookie to prohibit duplicate submissions on polls
  - Randomly generated ID
  - Matched against database of those already “voted”
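The poll case can be sketched with a randomly generated client ID checked against a table of IDs that have already voted. This is only an illustration under stated assumptions: the table name `voted`, the helper names, and the use of an in-memory SQLite database are all hypothetical, and in a real site the ID would travel in a cookie.

```python
import secrets
import sqlite3


def open_poll_db():
    """Create an in-memory database with one row per client that has voted."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE voted (client_id TEXT PRIMARY KEY)")
    return db


def new_client_id():
    """Random ID that the server would hand to the browser in a cookie."""
    return secrets.token_hex(16)


def record_vote(db, client_id):
    """Return True if the vote is accepted, False if this ID already voted."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO voted (client_id) VALUES (?)", (client_id,))
        return True
    except sqlite3.IntegrityError:
        # The primary key rejects a second row for the same client ID.
        return False


db = open_poll_db()
client = new_client_id()
first_attempt = record_vote(db, client)
second_attempt = record_vote(db, client)
```

Letting the primary key enforce uniqueness avoids a separate "check, then insert" step that two simultaneous requests could both pass.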
Cookies and Privacy
- Cookies only store text strings given by servers
  - Include domain, expiration date, etc.
- Only a server from the same domain can access a previously stored cookie
- The case of Doubleclick:
  - Same cookie used by an advertising agency
  - Match individuals with browsing profiles that span multiple sites
  - Huge data mining opportunity
  - Huge controversy
Client-side programs
- Extend web user interface by running programs at clients
  - Allow for sophisticated UIs
  - Must be careful of malicious code (from untrusted servers)
- Java
  - Full-fledged programming language
  - Protection capabilities
- Other client-side scripting languages:
  - JavaScript (Netscape)
  - VBScript (MSFT)
  - Flash/Shockwave (Macromedia)
Server-side programs
- Implement server applications and workflow
  - Programs that generate HTML
  - Embedded HTML
- Java:
  - Servlets
  - Java Server Pages (JSP – SUN)
- Server-side scripting:
  - Active Server Pages (ASP – MSFT)
  - PHP
  - mod_perl
Embedded HTML: PHP
- PHP stands for “PHP: Hypertext Preprocessor”
  - http://www.php.net
- Example:
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    <?php echo "Hi, I'm a PHP script!"; ?>
  </body>
</html>
Generating HTML: mod_perl
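- mod_perl: efficiently use perl to generate HTML
  - http://perl.apache.org
- Example:
#!/bin/perl
# The header and the blank line after it tell the browser that HTML follows
print "Content-type: text/html\n\n";
# Here-document: print everything up to the EOF marker
print <<EOF;
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    Hi, I was generated by mod_perl!
  </body>
</html>
EOF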
DB interface: mod_perl
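- Two modules – allow for portability:
  - DBI: database independent library
  - DBD: database dependent driver
- Example:
use DBI;
# The "dbi:Oracle:..." prefix selects the DBD::Oracle driver;
# "mydb" is a placeholder database name
my $dbh = DBI->connect("dbi:Oracle:mydb", "user", "pass")
    or die $DBI::errstr;
my $stmt = $dbh->prepare("SELECT * FROM foo");
$stmt->execute();
# Fetch and print one row at a time
while (my @row = $stmt->fetchrow_array()) {
    print "Row: @row\n";
}
$dbh->disconnect();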
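The same split between a database-independent interface and a database-dependent driver exists in other languages. As a rough analogue (not part of mod_perl), Python's DB-API plays the role of DBI and each driver module plays the role of a DBD. The sketch below assumes the standard `sqlite3` driver with an in-memory database; the table contents and helper names are made up for illustration.

```python
import sqlite3


def fetch_all_rows(connect, query):
    """Run a query through any DB-API connection factory and return all rows.

    Only `connect` is driver-specific; the cursor/execute/fetch calls are the
    database-independent part.
    """
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()
    finally:
        conn.close()


def demo_connect():
    """Driver-dependent part: open SQLite and create a small sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO foo VALUES (?, ?)", [(1, "a"), (2, "b")])
    return conn


rows = fetch_all_rows(demo_connect, "SELECT * FROM foo ORDER BY id")
for row in rows:
    print("Row:", row)
```

Switching to another database would, in principle, mean replacing only `demo_connect` with that driver's connection call.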
Improving Performance: CGI
- CGI forks new process on every call
  - context switch
  - re-connect to db server
- mod_perl: maintain pool of processes
  - no context switch: assign call to existing process
  - scales better: 10 times faster than CGI
  [Labrinidis & Roussopoulos, SIGMOD Record, Mar 2000]
- mod_perl with persistent db connections: re-use db connections
  - no need to re-connect to db server
  - twice as fast as plain mod_perl
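The difference between re-connecting on every call and re-using a connection can be illustrated by counting how many connections each style opens. This sketch does not measure speed and does not reproduce the cited results; `CountingConnector` and both handler functions are hypothetical, and an in-memory SQLite database stands in for the db server.

```python
import sqlite3


class CountingConnector:
    """Opens database connections and counts how many were opened."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        return sqlite3.connect(":memory:")


def handle_cgi_style(connector, n_requests):
    """CGI style: every request opens and closes its own connection."""
    for _ in range(n_requests):
        conn = connector.connect()
        conn.execute("SELECT 1").fetchone()
        conn.close()


def handle_persistent(connector, n_requests):
    """Persistent style: one long-lived connection serves every request."""
    conn = connector.connect()
    try:
        for _ in range(n_requests):
            conn.execute("SELECT 1").fetchone()
    finally:
        conn.close()


cgi = CountingConnector()
handle_cgi_style(cgi, 5)

persistent = CountingConnector()
handle_persistent(persistent, 5)
```

For the same five requests, the first connector opens a connection per request while the second opens exactly one, which is the cost the persistent approach removes.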
Improving Performance: Caching
- Web Caching
  - store static content close to the users
  - avoid re-transmitting over the network
  - consistency: Time-To-Live (TTL)
- Dynamic Web Caching
  - store dynamic content (from db) and re-use if possible
  - avoid re-computing from database
  - what to cache:
    - entire web pages
    - query results
    - HTML fragments
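One way to combine these ideas is a small cache of query results whose entries expire after a Time-To-Live. This is a minimal sketch under stated assumptions, not a description of any particular caching system: the `TTLCache` class, its method names, and the injectable `clock` parameter (used so the expiry can be demonstrated without waiting) are all hypothetical.

```python
import time


class TTLCache:
    """Cache computed results and re-use them until their TTL expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, or call compute() and cache it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: no database work needed
        value = compute()  # missing or expired: re-compute from the database
        self._entries[key] = (now + self._ttl, value)
        return value


# Demonstration with a fake clock and a counter standing in for a db query.
fake_now = [0.0]
query_runs = [0]


def run_query():
    query_runs[0] += 1
    return ["row1", "row2"]


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get_or_compute("SELECT * FROM foo", run_query)
second = cache.get_or_compute("SELECT * FROM foo", run_query)  # served from cache
fake_now[0] = 61.0  # move past the TTL
third = cache.get_or_compute("SELECT * FROM foo", run_query)  # re-computed
```

The same structure could hold entire pages or HTML fragments instead of query results; only the key and the `compute` function would change.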