Date post: | 21-Dec-2015 |
ANALYSE AF WEBADFÆRD (ANALYSIS OF WEB BEHAVIOUR) - OAW
SUMMARY, LECTURE 2
• Users, Visits, Pageviews
• Reach, Acquisition rate, Conversion rate, Retention rate, Loyalty
• Abandonment, Attrition, Churn
• Recency, Frequency, Monetary value, Duration, Yield
• Acquisition cost, Conversion cost, Net yield, Connect rate
WEB SERVERS
• A Web server is a program that, using the client/server model and the World Wide Web's Hypertext Transfer Protocol (HTTP), serves the files that form Web pages to Web users (whose computers contain HTTP clients that forward their requests). Every computer on the Internet that contains a Web site must have a Web server program. Two leading Web servers are Apache, the most widely-installed Web server, and Microsoft's Internet Information Server (IIS). Other Web servers include Novell's Web Server for users of its NetWare operating system and IBM's family of Lotus Domino servers, primarily for IBM's OS/390 and AS/400 customers.
whatis.com, Feb. 2002
THE WEB SERVER LOG
• An access log is a list of all the requests for individual files that people have requested from a Web site. These files will include the HTML files and their imbedded graphic images and any other associated files that get transmitted. The access log (sometimes referred to as the "raw data") can be analyzed and summarized by another program. In general, an access log can be analyzed to tell you:
– The number of visitors (unique first-time requests) to a home page
– The origin of the visitors in terms of their associated server's domain name (for example, visitors from .edu, .com, and .gov sites and from the online services)
– How many requests for each page at the site, which can be presented with the pages with most requests listed first
– Usage patterns in terms of time of day, day of week, and seasonally
whatis.com, Feb. 2002
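The "pages with most requests listed first" summary can be sketched in a few lines of Python. This is only an illustration: the function name `top_pages` and the sample list of paths are assumptions, and a real analysis would first extract the requested path from each log line.

```python
from collections import Counter

def top_pages(paths, n=3):
    """Return the n most requested paths together with their request counts."""
    return Counter(paths).most_common(n)

# Hypothetical list of requested paths, one per log entry.
paths = ["/", "/Internet", "/", "/courses", "/Internet", "/"]
print(top_pages(paths, 2))
```

The same counting approach works for hosts, hours of the day or weekdays once those values have been extracted from the log.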
THE WEB SERVER LOG
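• The log format sets the boundaries for any type of log analysis
• Common Log Format (CLF) and Extended Common Log Format (ECLF)

Data element   CLF   ECLF
Host           yes   yes
Ident          yes   yes
Authuser       yes   yes
Time           yes   yes
Request        yes   yes
Status         yes   yes
Bytes          yes   yes
Referrer       no    yes
User-agent     no    yes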
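The data elements listed above can be pulled out of a log line with a regular expression. The following is a minimal Python sketch, assuming the field order of the Extended CLF; the names `ECLF_PATTERN` and `parse_line` are illustrative and not part of any standard tool.

```python
import re

# Hypothetical pattern: the seven CLF fields, optionally followed by the
# two extra ECLF fields (referrer and user agent).
ECLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of the log fields, or None if the line does not match."""
    match = ECLF_PATTERN.match(line.strip())
    return match.groupdict() if match else None

sample = ('80.62.239.98 - - [22/Oct/2001:04:12:28 +0200] '
          '"GET /people/tofte/leftorange.htm HTTP/1.1" 200 1279 '
          '"http://www.it-c.dk/people/tofte/" '
          '"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"')
fields = parse_line(sample)
print(fields["host"], fields["status"], fields["bytes"])
```

For a plain CLF line the `referrer` and `user_agent` entries are simply `None`, so the same function can be used on both formats.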
AN EXAMPLE, IT-C.DK (Oct 2001)
212.97.237.62 - - [22/Oct/2001:02:22:24 +0200] "GET / HTTP/1.1" 304 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
212.97.237.62 - - [22/Oct/2001:02:22:30 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
131.202.130.143 - - [22/Oct/2001:02:27:57 +0200] "GET /research/bed/ HTTP/1.1" 200 9079 "http://google.yahoo.com/bin/query?p=Boolean+expression&hc=0&hs=0" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
131.202.130.143 - - [22/Oct/2001:02:27:58 +0200] "GET /research/bed/icons/Book.gif HTTP/1.1" 200 227 "http://www.itu.dk/research/bed/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
131.202.130.143 - - [22/Oct/2001:02:27:58 +0200] "GET /research/bed/icons/Tools.gif HTTP/1.1" 200 251 "http://www.itu.dk/research/bed/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
131.202.130.143 - - [22/Oct/2001:02:27:58 +0200] "GET /people/hra/hoved_logo4.gif HTTP/1.1" 200 3643 "http://www.itu.dk/research/bed/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
209.185.143.138 - - [22/Oct/2001:02:43:58 +0200] "HEAD /people/kfl/fltk-1.0.4-linux-intel.rpm HTTP/1.0" 200 0 "-" "Slurp.so/1.0 ([email protected]; http://www.inktomi.com/slurp.html)"
216.200.130.207 - - [22/Oct/2001:03:03:08 +0200] "HEAD /courses/W2/F2001/ HTTP/1.0" 200 0 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"
216.200.130.207 - - [22/Oct/2001:03:03:10 +0200] "GET /courses/W2/F2001/ HTTP/1.0" 200 39357 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"
133.11.12.2 - - [22/Oct/2001:03:04:57 +0200] "HEAD /people/birkedal/papers/index.html HTTP/1.0" 200 0 "-" "-"
213.122.171.29 - - [22/Oct/2001:03:11:40 +0200] "HEAD /people/birkedal/realizability/index.html HTTP/1.0" 200 0 "-" "Mozilla/3.0 (compatible)"
202.70.68.176 - - [22/Oct/2001:03:22:03 +0200] "GET / HTTP/1.1" 200 77 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
202.70.68.176 - - [22/Oct/2001:03:22:07 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
199.172.149.172 - - [22/Oct/2001:03:31:12 +0200] "GET /people/jm/ HTTP/1.0" 200 1539 "-" "ArchitextSpider"
199.172.149.173 - - [22/Oct/2001:03:39:14 +0200] "GET /research/ddd/ HTTP/1.0" 200 2342 "-" "ArchitextSpider"
12.75.131.29 - - [22/Oct/2001:03:42:35 +0200] "GET /connection HTTP/1.1" 404 272 "http://www1.umn.edu/twincities/directory/indexi.html" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"
12.75.131.29 - - [22/Oct/2001:03:42:50 +0200] "GET / HTTP/1.1" 200 77 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"
12.75.131.29 - - [22/Oct/2001:03:42:51 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"
12.75.131.29 - - [22/Oct/2001:03:43:10 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"
12.75.131.29 - - [22/Oct/2001:03:43:13 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"
61.9.192.142 - - [22/Oct/2001:03:45:24 +0200] "GET / HTTP/1.1" 304 0 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
61.9.192.142 - - [22/Oct/2001:03:45:26 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
61.9.192.142 - - [22/Oct/2001:03:46:11 +0200] "POST /main/cgi-bin/people.cgi HTTP/1.1" 200 2206 "http://www.it-c.dk/English/find_person/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
61.9.192.142 - - [22/Oct/2001:03:47:34 +0200] "GET /courses HTTP/1.1" 301 299 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
66.7.131.158 - - [22/Oct/2001:03:47:57 +0200] "GET /courses/GP/F2000/index.html HTTP/1.0" 200 4393 "-" "Openfind data gatherer, Openbot/3.0+([email protected];+http://www.openfind.com.tw/robot.html)"
130.226.133.8 - - [22/Oct/2001:04:00:47 +0200] "GET /sysadm/software/lprng/printcap HTTP/1.0" 200 2012 "-" "Wget/1.6"
130.226.133.92 - - [22/Oct/2001:04:01:05 +0200] "GET /sysadm/software/lprng/printcap HTTP/1.0" 200 2012 "-" "Wget/1.6"
130.226.141.6 - - [22/Oct/2001:04:02:00 +0200] "GET /sysadm/software/lprng/printcap HTTP/1.0" 200 2012 "-" "Wget/1.6"
130.226.141.15 - - [22/Oct/2001:04:02:00 +0200] "GET /sysadm/software/lprng/printcap HTTP/1.0" 200 2012 "-" "Wget/1.6"
130.226.143.195 - - [22/Oct/2001:04:02:49 +0200] "GET /sysadm/software/lprng/printcap HTTP/1.0" 200 2012 "-" "Wget/1.6"
66.7.131.158 - - [22/Oct/2001:04:08:21 +0200] "GET /courses/GP/F2000/Eksempler/JavaSoftwareSolutions/chap07/Doodle.html HTTP/1.0" 200 255 "-" "Openfind data gatherer, Openbot/3.0+([email protected];+http://www.openfind.com.tw/robot.html)"
80.62.239.98 - - [22/Oct/2001:04:12:28 +0200] "GET /people/tofte HTTP/1.1" 301 298 "http://www.it-c.dk/Internet/itu/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
80.62.239.98 - - [22/Oct/2001:04:12:28 +0200] "GET /people/tofte/leftorange.htm HTTP/1.1" 200 1279 "http://www.it-c.dk/people/tofte/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
80.62.239.98 - - [22/Oct/2001:04:12:29 +0200] "GET /people/tofte/pics/spacer22.GIF HTTP/1.1" 404 286 "http://www.itu.dk/people/tofte/leftorange.htm" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
80.62.239.98 - - [22/Oct/2001:04:12:30 +0200] "GET /people/tofte/Tofte2.jpg HTTP/1.1" 200 10618 "http://www.it-c.dk/people/tofte/madscontents.htm" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
66.7.131.158 - - [22/Oct/2001:04:15:28 +0200] "GET /courses/GP/F2000/Eksempler/JavaSoftwareSolutions/chap11/MirroredPictures.html HTTP/1.0" 200 228 "-" "Openfind data gatherer, Openbot/3.0+([email protected];+http://www.openfind.com.tw/robot.html)"
66.7.131.158 - - [22/Oct/2001:04:23:33 +0200] "GET /courses/GP/F2000/Eksempler/Tekstfiler/places.txt HTTP/1.0" 200 90 "-" "Openfind data gatherer, Openbot/3.0+([email protected];+http://www.openfind.com.tw/robot.html)"
199.172.149.172 - - [22/Oct/2001:04:24:14 +0200] "GET /people/hra/notes-index.html HTTP/1.0" 200 1670 "-" "ArchitextSpider"
66.7.131.158 - - [22/Oct/2001:04:29:43 +0200] "GET /courses/GP/F2000/hold.html HTTP/1.0" 200 5212 "-" "Openfind data gatherer, Openbot/3.0+([email protected];+http://www.openfind.com.tw/robot.html)"
61.9.149.155 - - [22/Oct/2001:04:33:22 +0200] "GET /courses/W2/ssh.html HTTP/1.1" 200 2602 "-" "-"
203.58.38.86 - - [22/Oct/2001:04:36:34 +0200] "GET /~haas/GC/c-tut.html HTTP/1.0" 200 77 "http://www.student.dtu.dk/~c971714/GC/c-tut.html" "Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)"
203.58.38.86 - - [22/Oct/2001:04:36:37 +0200] "GET /~haas/GC/c-tut.php HTTP/1.0" 200 24819 "http://www.itu.dk/~haas/GC/c-tut.html" "Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)"
64.55.148.54 - - [22/Oct/2001:04:43:30 +0200] "GET /people/slauesen/ HTTP/1.0" 200 11173 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"
64.55.148.54 - - [22/Oct/2001:04:47:21 +0200] "GET /main/projektboers.html HTTP/1.0" 200 75604 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"
198.81.17.166 - - [22/Oct/2001:04:48:13 +0200] "POST /main/cgi-bin/people.cgi HTTP/1.0" 200 928 "http://www.it-c.dk/English/find_person/" "Mozilla/4.0 (compatible; MSIE 5.01; AOL 6.0; Windows 98)"
HOST
• Fully qualified domain name of the client or its IP address if the name is unavailable
• The address to which the server’s response will be sent
• Reverse address lookup (from IP address to host name) is possible on the fly, but in most cases it is performed while post-processing the log instead
• Important issues: dial-up connections and proxies, where one address does not necessarily correspond to one user
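Reverse lookup during post-processing could look roughly like the Python sketch below. The function name `resolve_hosts` and the injectable `lookup` parameter are assumptions made for illustration; by default the sketch uses `socket.gethostbyaddr` from the standard library and looks each distinct address up only once.

```python
import socket

def resolve_hosts(ip_addresses, lookup=socket.gethostbyaddr):
    """Map each distinct IP address to a host name, one lookup per address."""
    names = {}
    for ip in ip_addresses:
        if ip in names:
            continue  # already resolved, skip the repeated lookup
        try:
            names[ip] = lookup(ip)[0]  # first element is the primary host name
        except OSError:
            names[ip] = ip  # no reverse entry available: keep the IP address
    return names
```

Calling `resolve_hosts(["80.62.239.98", "80.62.239.98"])` would therefore perform a single DNS query. The result depends on the DNS data available at the time of post-processing, which may differ from what was valid when the request was made.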
IDENT
• Identifier supplied by client applications that support identd (identification daemon)
• Used with mail, FTP and IRC; rarely with HTTP
• Also referred to as RFC 931
AUTHUSER
• The authenticated user name (if user authentication is required for that file)
TIME
• Usually the time when the web server completed responding to the HTTP request
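• Format: [DD/Mon/YYYY:HH:MM:SS +ZZZZ], where Mon is the three-letter month abbreviation and +ZZZZ is the offset from UTC, e.g. [22/Oct/2001:04:12:28 +0200]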
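A timestamp in this format can be converted to a time-zone-aware value with Python's standard `datetime` module. This is a small sketch; the function name `parse_log_time` is illustrative, and it assumes the surrounding square brackets have already been removed.

```python
from datetime import datetime

def parse_log_time(value):
    """Parse a log timestamp such as '22/Oct/2001:04:12:28 +0200'."""
    # %b expects English month abbreviations, which holds in the default C locale.
    return datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")

stamp = parse_log_time("22/Oct/2001:04:12:28 +0200")
print(stamp.isoformat())  # 2001-10-22T04:12:28+02:00
```

Having the offset attached makes it possible to compare entries from servers in different time zones and to group requests by hour or weekday.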
REQUEST
• The actual request from the user client. Typically it looks like the following:
"GET /people/tofte/leftorange.htm HTTP/1.1"
• Different types of requests (methods): GET, POST, HEAD
• Protocol version included (HTTP/1.1)
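The request field consists of three space-separated parts: method, requested path and protocol version. A minimal Python sketch for splitting it (the name `split_request` is an assumption, and malformed request lines are simply reported as `None`):

```python
def split_request(request):
    """Split a request field into (method, path, protocol)."""
    parts = request.split()
    if len(parts) != 3:
        return None  # malformed or truncated request line
    method, path, protocol = parts
    return method, path, protocol

print(split_request("GET /people/tofte/leftorange.htm HTTP/1.1"))
```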
STATUS
• A three-digit status code, which the server returns to the browser
– Five classes of codes: Informational (100 series), Success (200 series), Redirection (300 series), Client error (400 series), Server error (500 series)
• Examples: 200 OK, 302 Redirect, 401 Unauthorized, 403 Forbidden, 404 Not Found
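Because the class of a status code is given by its first digit, grouping log entries by class is straightforward. A short Python sketch, where the names `STATUS_CLASSES` and `status_class` are illustrative:

```python
# First digit of the status code mapped to its class.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def status_class(code):
    """Return the class name for a status code given as int or string."""
    return STATUS_CLASSES.get(int(code) // 100, "Unknown")

print(status_class(404), "/", status_class("301"))
```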
BYTES
• For GET requests: Number of bytes returned by the server to the client.
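Summing the bytes field over all entries gives a rough estimate of the traffic served by the site. A minimal Python sketch, assuming the byte counts have already been extracted as strings; the name `total_bytes` is illustrative, and a "-" value is treated as zero bytes.

```python
def total_bytes(byte_fields):
    """Sum the bytes field over many log entries, skipping '-' placeholders."""
    return sum(int(value) for value in byte_fields if value != "-")

# Bytes values of the four requests from 80.62.239.98 in the it-c.dk example log.
print(total_bytes(["298", "1279", "286", "10618"]))
```

The figure is only an estimate: it reflects what the server logged for each response and not, for instance, traffic answered by proxies or browser caches.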
REFERRER
• Indicates the page where the visitor was located when making the request
• Important for path analysis
• Can be used for referral schemes and for measuring banner effects etc.
• RFC 2068 (HTTP/1.1):
– Note: Because the source of a link may be private information or may reveal an otherwise private information source, it is strongly recommended that the user be able to select whether or not the Referer field is sent.
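When a referrer is present, the referring host and any search terms in its query string can be extracted with Python's standard `urllib.parse` module. This is a sketch only: the name `referrer_info` is illustrative, and the query parameter that holds the search terms (`p` in the example log entry) differs between search engines, so it is passed in as an assumption.

```python
from urllib.parse import urlsplit, parse_qs

def referrer_info(referrer, query_param="p"):
    """Return (referring host, search terms or None) for a referrer field."""
    if referrer in ("", "-"):
        return None, None  # no referrer was sent
    parts = urlsplit(referrer)
    terms = parse_qs(parts.query).get(query_param)
    return parts.hostname, (terms[0] if terms else None)

print(referrer_info(
    "http://google.yahoo.com/bin/query?p=Boolean+expression&hc=0&hs=0"))
```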
USERAGENT
• Browser name/version (operating system)
– "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
– "Mozilla/3.0 (Macintosh; I; PPC)"
• Note regarding Mozilla:
– Mozilla was Netscape Communications' nickname for Navigator, its Web browser, and, more recently, the name of an open source public collaboration aimed at making improvements to Navigator.
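The user agent field also makes it possible to separate robots from browsers before counting visits. The Python sketch below is a crude heuristic, not a complete robot list: the markers in `ROBOT_MARKERS` are simply the crawler and tool names that occur in the it-c.dk example log, and both names are illustrative.

```python
# Lower-case substrings of robot user agents seen in the example log (assumed list).
ROBOT_MARKERS = ("slurp", "ask jeeves", "architextspider", "openbot", "wget")

def is_robot(user_agent):
    """Return True if the user agent contains one of the known robot markers."""
    agent = user_agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)

print(is_robot("Mozilla/2.0 (compatible; Ask Jeeves)"))
print(is_robot("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"))
```

Since the field is supplied by the client, it can be empty ("-") or misleading, so the result should be treated as an indication only.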
USERAGENT - STATISTICS
• Example from early 2002

Rank  User agent                 Share (%)
1     MSIE 5.0                   61.48
2     MSIE 6.0                   22.99
3     MSIE 5.5                   6.75
4     MSIE 4.0                   4.62
5     Unresolved: Java Enabled   0.87
6     Netscape 4.7               0.73
7     MSIE (AOL) 5.5             0.35
8     MSIE 3.0                   0.34
9     MSIE (AOL) 5.0             0.25
10    Netscape 4.5               0.24
21    Web TV                     0.02
25    Opera 5.1                  0.01
MORE OPTIONS
• Filename
• Time-to-serve
• IP address
• Server port
• URL requested
• Cookie
THE QUIZ
1. The referrer indicates where in the world the user is located.
2. Apache installed on Windows 2000 is an open source web server.
3. A web server receives information from the client (the browser).
4. A web server sends information to the client (the browser).
5. Web server failures return a 30x status code.
6. It is possible to calculate an estimate of the website's traffic (e.g. GB per month) from the web server log.
7. One IP number in the web server log is by definition one user.
8. A line in a web server log file is at most 80 characters long.
9. Microsoft has a market share of less than one third of all web servers in the world.
10. User agent information is part of the Common Logfile Format.
AN EXAMPLE
80.62.239.98 - - [22/Oct/2001:04:12:28 +0200] "GET /people/tofte/leftorange.htm HTTP/1.1" 200 1279 "http://www.it-c.dk/people/tofte/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
MORE INFORMATION
• Apache HTTP Server Documentation, Log Files
– http://httpd.apache.org/docs/logs.html
• Microsoft IIS Log Format
– http://www.microsoft.com/windows2000/en/server/iis/htm/core/iiabtlg.htm#MicrosoftIISLogFormat
• HTTP/1.1 Documentation
– http://www.w3.org/Protocols/rfc2068/rfc2068