Post on 25-Dec-2015
Visualization of the Webpage Popularity for Ping Wales
Visualization of the Popularity of the Web Access for Ping Wales
Xiaochuan Huang (George)
Supervised by Dr Markus Roggenbach
Department of Computer Science
University of Wales Swansea
Nov. 2005 @ Gregynog
Overview
1. A Regular Website Report
2. Specification
3. Technologies Involved
4. A First Approach
1. A Regular Website Report
What the project is about: our customer, Ping Media Ltd; the website, Ping Wales; what they need; and the technical infrastructure
Introducing similar tools: log file analyzers; AWStats and Analog 6.0; graphic statistics generated by AWStats and Analog
Why this application is necessary: the customer's needs; the shortcomings of existing applications; an extendable project
2. Specification
Components: the filter/parser; the analyzer; two databases; visualization
Going through the processes:
Take daily log file -> parse with DB1 -> output filtered result -> write result into DB2
Given a specified duration -> access DB2 -> generate the records -> output a visualized report
3. Technologies Involved
The Apache log files: introduction; format
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
Example entry in the combined format:
220.244.224.104 - - [12/Jan/2005:00:12:38 +0000] "GET /hardware/toshiba-small-80gb-hdd.html HTTP/1.0" 200 11020 "http://www.pingwales.co.uk/business/apple-keynote.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041204 Epiphany/1.4.4"
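Log string analysis:
(%h) 220.244.224.104: the IP address of the client
(%l) -: the RFC 1413 identity of the client (a hyphen means the information is not available)
(%u) -: the userid of the person requesting the page (again "-" when not available)
(%t) [12/Jan/2005:00:12:38 +0000]: the time the request was received
(\"%r\") "GET /hardware/toshiba-small-80gb-hdd.html HTTP/1.0": the request line, i.e. method, requested page and client protocol
(%>s) 200: the status code sent back to the client
(%b) 11020: the size of the object returned to the client, in bytes, excluding HTTP headers
(\"%{Referer}i\") "http://www.pingwales.co.uk/business/apple-keynote.html": the site that the client reports having been referred from
(\"%{User-agent}i\"): identifying information about the client's browser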
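The field breakdown above maps directly onto a regular expression. The following is a minimal Ruby sketch, not the project's code: the pattern, the `parse_log_line` name and the hash keys are our own, and it assumes the quoted fields (request, referer, user agent) contain no escaped quotes.

```ruby
require "time"

# Apache "combined" LogFormat:
# %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"
COMBINED_LOG = /\A
  (?<host>\S+)\s+         # %h   client IP address
  (?<ident>\S+)\s+        # %l   RFC 1413 identity, "-" if not available
  (?<user>\S+)\s+         # %u   userid, "-" if not available
  \[(?<time>[^\]]+)\]\s+  # %t   time the request was received
  "(?<request>[^"]*)"\s+  # %r   request line: method, page, protocol
  (?<status>\d{3})\s+     # %>s  status code
  (?<bytes>\d+|-)\s+      # %b   response size in bytes, "-" if none
  "(?<referer>[^"]*)"\s+  # Referer header
  "(?<agent>[^"]*)"       # User-agent header
\z/x

# Parses one log line into a Hash, or returns nil if the line does not match.
def parse_log_line(line)
  m = COMBINED_LOG.match(line.chomp)
  return nil unless m

  method, path, protocol = m[:request].split(" ", 3)
  {
    ip:       m[:host],
    time:     Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z"),
    method:   method,
    path:     path,
    protocol: protocol,
    status:   m[:status].to_i,
    bytes:    m[:bytes] == "-" ? 0 : m[:bytes].to_i,
    referer:  m[:referer],
    agent:    m[:agent]
  }
end

sample = '220.244.224.104 - - [12/Jan/2005:00:12:38 +0000] ' \
         '"GET /hardware/toshiba-small-80gb-hdd.html HTTP/1.0" 200 11020 ' \
         '"http://www.pingwales.co.uk/business/apple-keynote.html" ' \
         '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041204 Epiphany/1.4.4"'

hit = parse_log_line(sample)
puts hit[:ip]      # 220.244.224.104
puts hit[:path]    # /hardware/toshiba-small-80gb-hdd.html
puts hit[:status]  # 200
```

Returning `nil` for lines that do not match lets the caller skip malformed entries instead of aborting on the whole daily file.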
Programming language: Ruby, an interpreted scripting language for quick and easy object-oriented programming
% ruby
puts "Hello, world!"
^D
Hello, world!
% cd sample
% ruby eval.rb
ruby> a = "Hello, world!"
"Hello, world!"
ruby> puts a
Hello, world!
nil
ruby> ^D
%
Database access: MySQL
The two databases
Accessing the databases from Ruby
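Ruby's standard library ships no MySQL driver, so the sketch below uses hypothetical in-memory stand-ins to show the interface the two databases need: DB1 answers whether a path is a known article page, and DB2 stores filtered hits and returns those in a time window. The class and method names are our own; in the real system each method would wrap a SQL query against MySQL, roughly as the comments indicate.

```ruby
require "set"

# Hypothetical stand-in for DB1: the article page addresses on Ping Wales.
class ArticleDB
  def initialize(paths)
    @paths = Set.new(paths)
  end

  # Roughly: SELECT 1 FROM articles WHERE path = ?
  def article?(path)
    @paths.include?(path)
  end
end

# Hypothetical stand-in for DB2: one row per filtered page visit.
class VisitDB
  Visit = Struct.new(:ip, :time, :path, :status)

  def initialize
    @rows = []
  end

  # Roughly: INSERT INTO visits (ip, time, path, status) VALUES (?, ?, ?, ?)
  def insert(ip, time, path, status)
    @rows << Visit.new(ip, time, path, status)
  end

  # Roughly: SELECT * FROM visits WHERE time BETWEEN ? AND ?
  def between(start_time, end_time)
    @rows.select { |v| v.time >= start_time && v.time <= end_time }
  end
end

db1 = ArticleDB.new(["/hardware/toshiba-small-80gb-hdd.html"])
db2 = VisitDB.new
db2.insert("220.244.224.104", Time.utc(2005, 1, 12, 0, 12, 38),
           "/hardware/toshiba-small-80gb-hdd.html", 200)

puts db1.article?("/hardware/toshiba-small-80gb-hdd.html")          # true
puts db2.between(Time.utc(2005, 1, 12), Time.utc(2005, 1, 13)).size  # 1
```

Keeping both stores behind small methods like these means the filter and analyzer do not need to change when the stand-ins are swapped for real MySQL-backed implementations.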
4. A First Approach
Load the daily log file
Parsing/Filtering
while not end of file
  read hits line by line
  for each hit, getIP(%h), getTime(%t), getReq(\"%r\"), getSt(%>s)
  check if even(first(getSt())); if so, go through the articles database looking for getIP()
  if it is there, write the hit to database 2; then go to the next hit
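The parsing/filtering loop can be sketched in Ruby as follows. Two readings here are our assumptions, not the slide's: we take even(first(getSt())) to mean "keep successful 2xx responses" (a literal even-first-digit test would also admit 4xx errors), and we look the requested page (getReq) up in the articles database, since DB1 holds page addresses rather than client IPs. A Ruby `Set` stands in for DB1 and an `Array` for DB2; `filter_log` and `HIT` are hypothetical names.

```ruby
require "set"
require "stringio"
require "time"

# Captures only the fields the filter needs: %h, %t, "%r" and %>s.
HIT = /\A(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) /

# Reads hits line by line and appends the ones worth keeping to `visits`
# (an Array standing in for database 2); `article_paths` stands in for DB1.
def filter_log(io, article_paths, visits)
  io.each_line do |line|
    m = HIT.match(line)
    next unless m                            # skip malformed lines

    ip, time_str, request, status = m.captures  # getIP, getTime, getReq, getSt
    path = request.split(" ")[1]             # "GET /x.html HTTP/1.0" -> "/x.html"
    next unless status.start_with?("2")      # assumption: 2xx (success) only
    next unless article_paths.include?(path) # is it an article page in DB1?

    visits << { ip: ip,
                time: Time.strptime(time_str, "%d/%b/%Y:%H:%M:%S %z"),
                path: path }
  end
  visits
end

log = StringIO.new(<<~LOG)
  220.244.224.104 - - [12/Jan/2005:00:12:38 +0000] "GET /hardware/toshiba-small-80gb-hdd.html HTTP/1.0" 200 11020 "-" "-"
  220.244.224.104 - - [12/Jan/2005:00:12:40 +0000] "GET /missing.html HTTP/1.0" 404 210 "-" "-"
  10.0.0.7 - - [12/Jan/2005:00:13:02 +0000] "GET /style.css HTTP/1.0" 200 512 "-" "-"
LOG

articles = Set.new(["/hardware/toshiba-small-80gb-hdd.html"])
kept = filter_log(log, articles, [])
puts kept.size  # 1
```

The 404 hit fails the status check and the stylesheet request is not an article, so only the first line reaches database 2. Passing an open `File` instead of the `StringIO` would run the same loop over a real daily log.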
Analyzing
Specify StartingTime and EndTime; build an array/stack: myArray
Read through the records in database 2; for each hit within the specified time:
  if getIP() is in myArray, then counter += 1
  otherwise, write this hit to myArray and initialize its counter
Sort myArray by the counter of each element
Write out the top N results to a file, for visualizing
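A minimal Ruby sketch of the analyzing step, under stated assumptions: it uses a `Hash` of counters in place of the slide's linearly searched myArray (same counts, constant-time lookup), and the `key:` parameter chooses what is counted. `:path` ranks pages, which is what page popularity needs; `:ip` reproduces the slide's literal getIP(). The method names and the sample paths are hypothetical.

```ruby
# Counts the visits whose time lies in [start_time, end_time] and returns the
# top `n` as [key, count] pairs, most visited first.
# key: :path ranks pages; key: :ip counts per client, as the slide's getIP() does.
def top_n(visits, start_time, end_time, n, key: :path)
  counts = Hash.new(0)                  # key => counter (the slide's myArray)
  visits.each do |v|
    next unless v[:time] >= start_time && v[:time] <= end_time
    counts[v[key]] += 1                 # a new key starts at 0, then becomes 1
  end
  # Highest count first; ties broken by key so the order is deterministic.
  counts.sort_by { |k, c| [-c, k] }.first(n)
end

# Writes one "count<TAB>key" line per entry, e.g. to a file for the visualizer.
def write_report(pairs, io)
  pairs.each { |k, c| io.puts("#{c}\t#{k}") }
end

visits = [
  { ip: "220.244.224.104", time: Time.utc(2005, 1, 12, 0, 12, 38), path: "/a.html" },
  { ip: "10.0.0.7",        time: Time.utc(2005, 1, 12, 9, 30, 0),  path: "/b.html" },
  { ip: "10.0.0.8",        time: Time.utc(2005, 1, 12, 17, 5, 0),  path: "/a.html" }
]

top = top_n(visits, Time.utc(2005, 1, 12), Time.utc(2005, 1, 13), 2)
write_report(top, $stdout)
```

In practice `write_report` would be given an open `File` so the visualization tool can read the results later.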
Water flow model
[Diagram: Daily Log File -> Filter (checks Database 1: webpage address DB) -> Database 2 (page visit records) -> Analyzer (period entry; records) -> Visualization Tool -> Graphic Report]
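The slides do not say which visualization tool produces the graphic report. As a stand-in only, the Ruby sketch below renders the analyzer's top-N [page, count] pairs as a text bar chart scaled to a fixed width; `bar_chart` is a hypothetical helper, and a real report would more likely be drawn with a charting library.

```ruby
# Renders [label, count] pairs as a horizontal text bar chart.
# The largest count gets a bar `width` characters long; others scale down.
def bar_chart(pairs, width: 40)
  return "" if pairs.empty?

  max = [pairs.map { |_, c| c }.max, 1].max         # avoid dividing by zero
  label_width = pairs.map { |label, _| label.length }.max
  pairs.map do |label, count|
    bar = "#" * (count * width / max)               # integer scaling
    "#{label.ljust(label_width)} | #{bar} #{count}"
  end.join("\n")
end

puts bar_chart([["/a.html", 2], ["/b.html", 1]], width: 10)
# /a.html | ########## 2
# /b.html | ##### 1
```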
Summary
What I have done so far & what I am planning to do next
End…
Hey, wake up, there he ends!! LOL
George, 21/11/2005 @ Gregynog