Post on 10-Feb-2016
Evaluating Web Server Log Analysis Tools
David Strom
david@strom.com
SD’98 2/13/98
SD'98 (c) David Strom, Inc. 2
Summary
• Examine different log files
• What you can and can’t learn from your logs
• Pros and cons of various tools
Different types of log files
• Access
• Error
• Referral
• Other
Access logs
• Domain name
• Date, time
• Server command processed and result
• URL of visitor
• Bytes transmitted
Sample access log data
• rm258.fav.usu.edu [31/May/1995:09:03:23 +0600] "GET /NEI.html HTTP/1.0" 302 396
• rm258.fav.usu.edu [31/May/1995:09:03:28 +0600] "GET /xculture/nei/nei.html HTTP/1.0" 200 2114
• rm258.fav.usu.edu [31/May/1995:09:03:30 +0600] "GET /gifs/sedlbutton.gif HTTP/1.0" 200 1336
• 129.71.83.161 [31/May/1995:09:20:32 +0600] "GET /RELs.html HTTP/1.0" 304 0
• Leslie-Francis.tenet.edu [31/May/1995:09:36:06 +0600] "GET / HTTP/1.0" 200 1867
• ls973.ulib.albany.edu [31/May/1995:09:40:52 +0600] "GET /viii1.html HTTP/1.0" 404 244
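As a rough illustration of how entries like these can be picked apart, here is a minimal Python sketch. The regular expression and the `parse_access_line` helper are assumptions written for the abbreviated layout shown on this slide (host, timestamp, request, status, bytes); they are not taken from any particular analyzer.

```python
import re
from datetime import datetime

# Pattern for the abbreviated layout on this slide (an assumption):
# host [timestamp] "method path protocol" status bytes
LINE_RE = re.compile(
    r'^(?P<host>\S+) \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_access_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["when"] = datetime.strptime(rec["when"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # Some servers log "-" when no bytes were sent
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

sample = 'rm258.fav.usu.edu [31/May/1995:09:03:23 +0600] "GET /NEI.html HTTP/1.0" 302 396'
record = parse_access_line(sample)
print(record["host"], record["path"], record["status"], record["size"])
```

Returning None for lines that do not match lets a caller count and inspect malformed entries rather than stopping at the first one.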
Errors reported in your logs
• Clients that time out (or leave in frustration!)
• Scripts that don’t produce any output
• Server bugs
• User authentication or configuration problems
Sample error log data
• [Thu May 30 07:25:32 1996] send timed out for bamberg.sedl.org
• [Thu May 30 07:57:41 1996] send timed out for kenya.sedl.org
• [Thu May 30 08:23:11 1996] send timed out for ppp092.kyoto-inet.or.jp
• [Thu May 30 09:15:52 1996] access to /usr/local/www/htdocs/scimath/compass/vol03 failed for 170.211.67.51, reason: File does not exist
• [Thu May 30 09:57:56 1996] send timed out for dd10-048.compuserve.com
• [Thu May 30 10:47:25 1996] read timed out for ncia110b.ncia.net
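A quick way to see which kinds of errors dominate is to bucket the messages. The sketch below is one possible approach in Python; the category names and the `classify` and `summarize_errors` helpers are hypothetical, and the matching is based only on the message wording in the sample above.

```python
import re
from collections import Counter

# "[timestamp] message" as in the sample error log above
ERROR_RE = re.compile(r'^\[(?P<when>[^\]]+)\] (?P<message>.*)$')

def classify(message):
    """Assign a rough category based on the message wording."""
    if "timed out" in message:
        return "timeout"
    if "File does not exist" in message:
        return "missing file"
    return "other"

def summarize_errors(lines):
    """Count error log lines per category; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        m = ERROR_RE.match(line.strip())
        if m:
            counts[classify(m.group("message"))] += 1
    return counts

sample_lines = [
    "[Thu May 30 07:25:32 1996] send timed out for bamberg.sedl.org",
    "[Thu May 30 09:15:52 1996] access to /usr/local/www/htdocs/scimath/compass/vol03 "
    "failed for 170.211.67.51, reason: File does not exist",
    "[Thu May 30 10:47:25 1996] read timed out for ncia110b.ncia.net",
]
summary = summarize_errors(sample_lines)
print(summary)
```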
Referral logs
• Who links to your site?
• Who downloads your pages?
Sample referral log data
• http://www.isisnet.com/ ->/change/welcome.html
• http://www.ipl.org/ref/RR/EDU/Research-rr.html ->/welcome.html
• http://www.tenet.edu/snp/main.html ->/policy/networks/toc.html
• http://www.tenet.edu/new/main.html ->/policy/networks/toc.html
• http://guide-p.infoseek.com/NS/Titles?qt=teacher+training ->/resources/SCIMAST/announcement.html
• http://www.tenet.edu/new/main.html ->/policy/networks/toc.html
• http://www.tenet.edu/new/main.html ->/policy/networks/toc.html
• http://www.nwrel.org/national/regional-labs.html ->/welcome.html
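To answer "who links to your site?" from entries like these, one option is to split each line at the arrow and tally the referring hosts. The following Python sketch assumes the "referrer -> local path" layout shown above; `parse_referral` and `referring_hosts` are illustrative names, not part of any tool mentioned in this talk.

```python
from collections import Counter
from urllib.parse import urlsplit

def parse_referral(line):
    """Split 'referrer ->/local/path' into (referrer, path), or None."""
    referrer, sep, target = line.partition("->")
    if not sep:
        return None
    return referrer.strip(), target.strip()

def referring_hosts(lines):
    """Count how many referrals came from each remote host."""
    hosts = Counter()
    for line in lines:
        parsed = parse_referral(line)
        if parsed:
            hosts[urlsplit(parsed[0]).hostname] += 1
    return hosts

sample_lines = [
    "http://www.isisnet.com/ ->/change/welcome.html",
    "http://www.tenet.edu/snp/main.html ->/policy/networks/toc.html",
    "http://www.tenet.edu/new/main.html ->/policy/networks/toc.html",
]
hosts = referring_hosts(sample_lines)
print(hosts.most_common())
```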
Common log format
• Output by most standard servers
• Needed by most third-party log analyzers
• hoohoo.ncsa.uiuc.edu/docs/setup/httpd/Overview.html
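For reference, a full common log format line carries two more fields than the abbreviated samples earlier in this talk: the remote identity and the authenticated user name, each usually logged as "-". The Python sketch below shows one way to match that layout; the example line is hypothetical, built from the earlier sample data with the two placeholders added.

```python
import re

# Common log format: host ident authuser [date] "request" status bytes
CLF_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)

# Hypothetical line: the sample data with "-" placeholders for ident and authuser
line = ('rm258.fav.usu.edu - - [31/May/1995:09:03:28 +0600] '
        '"GET /xculture/nei/nei.html HTTP/1.0" 200 2114')
fields = CLF_RE.match(line).groupdict()
print(fields["host"], fields["request"], fields["status"], fields["bytes"])
```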
Extended/custom log formats
• Log whatever you wish in whatever order you wish
• Useful if you will read them regularly!
• But can’t work with the analyzers
• Now in IIS v4, NSCP v3, others.
What you can learn from your log files
• Hits per day
• Domain origins
• The path people take in and around your web
• Problem areas
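As an example of the first item, hits per day can be computed by grouping access log timestamps by date. This is a minimal Python sketch, assuming timestamps in the format of the sample access log; the `hits_per_day` helper is illustrative and the 01/Jun entry is made up for the example.

```python
from collections import Counter
from datetime import datetime

def hits_per_day(timestamps):
    """Count requests per calendar day (in the server's logged time zone)."""
    days = Counter()
    for stamp in timestamps:
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        days[when.date().isoformat()] += 1
    return days

timestamps = [
    "31/May/1995:09:03:23 +0600",
    "31/May/1995:09:20:32 +0600",
    "01/Jun/1995:08:15:00 +0600",  # hypothetical entry, not from the sample log
]
daily = hits_per_day(timestamps)
print(sorted(daily.items()))
```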
HITS
• (How Idiots Track Success)
• Nobody uses this word anymore
• Doesn’t really measure individual users, just access
• Caching servers and proxies mess up these statistics
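To make the gap between hits and visitors concrete, the Python sketch below compares raw hits, page requests (hits minus image files), and unique hosts for a handful of requests from the sample access log. The image suffix list and the `hit_summary` helper are assumptions for illustration, and unique hosts still undercount people behind proxies.

```python
# File suffixes treated as embedded images rather than pages (an assumption)
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")

def hit_summary(requests):
    """requests: iterable of (host, path) pairs taken from an access log."""
    requests = list(requests)
    pages = [(h, p) for h, p in requests if not p.lower().endswith(IMAGE_SUFFIXES)]
    return {
        "hits": len(requests),
        "page_requests": len(pages),
        "unique_hosts": len({h for h, _ in requests}),
    }

requests = [
    ("rm258.fav.usu.edu", "/NEI.html"),
    ("rm258.fav.usu.edu", "/xculture/nei/nei.html"),
    ("rm258.fav.usu.edu", "/gifs/sedlbutton.gif"),
    ("129.71.83.161", "/RELs.html"),
]
stats = hit_summary(requests)
print(stats)
```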
Domain origins
• Where users are coming from -- sometimes
• Just because they are from ibm.net doesn’t mean they work at IBM!
• Forgotten accounts, friends and family using the account
• Hacked user names
• Proxies don’t help here either
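A simple domain-origin report tallies the last label of each visiting hostname, with the caveats above in mind. The Python sketch below is one way to do it; the `origin` helper and the "unresolved" label for bare IP addresses are illustrative choices, and the hostnames come from the sample logs earlier in this talk.

```python
from collections import Counter

def origin(host):
    """Return the last label of a hostname, or 'unresolved' for a bare IP address."""
    labels = host.lower().rstrip(".").split(".")
    if all(label.isdigit() for label in labels):
        return "unresolved"
    return labels[-1]

hosts = [
    "rm258.fav.usu.edu",
    "129.71.83.161",
    "Leslie-Francis.tenet.edu",
    "ppp092.kyoto-inet.or.jp",
    "dd10-048.compuserve.com",
]
origins = Counter(origin(h) for h in hosts)
print(origins.most_common())
```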
The path people take in and around your web
• Search engines help sometimes
• Which search site was the most popular front door
• Who links to you and why
• Is there a pattern or a random walk?
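When a referrer comes from a search engine, the query string often shows what the visitor was looking for. The Python sketch below pulls the search terms out of the Infoseek referrer from the sample referral log; the `search_terms` helper is illustrative, and the `qt` parameter name is taken from that one sample URL, so other search sites would need their own parameter names.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms(referrer, param="qt"):
    """Return the decoded value of one query parameter, or None if absent."""
    query = parse_qs(urlsplit(referrer).query)
    values = query.get(param)
    return values[0] if values else None

referrer = "http://guide-p.infoseek.com/NS/Titles?qt=teacher+training"
terms = search_terms(referrer)
print(terms)
```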
Problem areas to deal with
• Broken links (locally)
• Broken outbound links
• Time outs (sunspots?)
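Broken local links show up in the access log as requests answered with status 404. A minimal Python sketch of that check follows; the `broken_paths` helper is illustrative, and the records are (path, status) pairs drawn from the sample access log.

```python
from collections import Counter

def broken_paths(records):
    """records: iterable of (path, status) pairs; returns 404 counts per path."""
    return Counter(path for path, status in records if status == 404)

records = [
    ("/NEI.html", 302),
    ("/viii1.html", 404),
    ("/", 200),
    ("/RELs.html", 304),
]
broken = broken_paths(records)
print(broken.most_common())
```

Sorting by count puts the most frequently requested missing pages first, which is usually where fixing a link pays off soonest.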
What you can’t learn from your logs
• Who are these people, anyway?
– No specific user names
– Is it a bot or a real human?
• How long did they view a page?
– Most people don’t spend much time on your web
– Where did they go visit next?
What technologies are available?
• Built-in analyzer tools
• Sites that capture user info
• Secure sites with registration
• Build your own from perl
• Third-party tools
Built-in tools
• WebSite, website.ora.com
• IIS with Site Server, www.microsoft.com/iis
• Netscape servers, www.netscape.com
• Easy to use but limited
WebSite Professional v2
• Win NT, 95
• Best web server for learning about logs, best docs
• QuickStats module for instant analysis:
– single report but nice set of information
– shows today, last two days requests and unique hosts
– IP addresses of visitors, average requests/hour
IIS Site Server
• NT Server v4 w/SP3 only
• Lots of preconfigured reports
• Two versions, Express and Full (customized reports)
• backoffice.microsoft.com/products/siteserver/express/
Netscape v3 web servers
• Various NT, Unix versions
• Reports for a few variables but nothing too extensive
• Best to use a third-party tool here
Sites that capture user info
• WebCounter, www.digits.com -- third-party hit counter
• Someone else does the programming and debugging
• But beyond your control
Secure sites with registration
• You know your users
• But many won’t register, or forget their passwords
• Requires scripting, database integration, more maintenance
Build your own from perl
• Needs some in-house support
• Works best with Unix-based webs
• Examples:
– refstats, members.aol.com/htmlguru/refstats.html
– surfreport, bienlogic.com/SurfReport/
Third-party tools
• WebTracker, www.CQMInc.com/webtrack
• WebTrends, www.webtrends.com
• net.Genesis, www.netgen.com
• MarketWave, www.marketwave.com
• IIS Assistant, www.go-iis.com
Third-party tools (cont’d)
• Can make very pretty reports
• Customizable
• Make sure they support your particular log format
• Not that expensive, mostly run on Windows