MASTER OF SOFTWARE ENGINEERING PORTFOLIO
by
ERIC F. DAVIS
B.S., Kansas State University, 2003
A REPORT
submitted in partial fulfillment of the requirements for the degree
MASTER OF SOFTWARE ENGINEERING
Department of Computing and Information Sciences
College of Engineering
KANSAS STATE UNIVERSITY
Manhattan, Kansas
2008
Approved by:
Major Professor
Dr. William H. Hsu
Abstract
The KDD-Research Entity Search Tool (KREST) is a standalone application that allows a user to
find specific pieces of data available on the World Wide Web. It allows web crawling, web
searching, and the option to perform entity searches. The web searches and entity searches can
be performed on web pages that are loaded into the program or on pages that were crawled using
the web crawling portion of the application. The benefit of having an all-in-one tool like KREST
is that it allows the user to find the specific piece of contact information that they want, such as
an email address, a phone number, or a street address. It attempts to replace the current tedious
search method of opening an Internet browser, going to a search engine, attempting to
determine the proper search term, and then wading through matching pages until they find the
desired information.
Table of Contents
List of Figures
List of Tables
CHAPTER 1 - Vision Document
CHAPTER 2 - Project Plan
CHAPTER 3 - Software Quality Assurance Plan
CHAPTER 4 - Architectural Design
CHAPTER 5 - Technical Inspection Checklist
CHAPTER 6 - Component Design
CHAPTER 7 - Test Plan
CHAPTER 8 - Test Assessment Evaluation
CHAPTER 9 - User’s Manual
CHAPTER 10 - Project Evaluation
References
Appendix A - Source Metrics
List of Figures
Figure 1.1 Project Overview
Figure 1.2 Project Block Diagram
Figure 1.3 KREST Data Flow Diagram
Figure 1.4 System Use Case
Figure 2.1 Project Schedule
Figure 4.1 Package View
Figure 4.2 KREST Application Package
Figure 4.3 Controller Package
Figure 4.4 KrestController Class
Figure 4.5 KrestAboutDialog Class
Figure 4.6 WebCrawler Class
Figure 4.7 SiteVisitor Class
Figure 4.8 ThreadController Class
Figure 4.9 HTTPReader Class
Figure 4.10 WebSearcher Class
Figure 4.11 EntitySearcher Class
Figure 4.12 View Package
Figure 4.13 KrestView Class
Figure 4.14 CrawlerObserver Class
Figure 4.15 SearchObserver Class
Figure 4.16 EntityObserver Class
Figure 4.17 Model Package
Figure 4.18 KrestModel Class
Figure 4.19 KrestObjectLibrary Class
Figure 4.20 WebObject Class
Figure 4.21 Webpage Class
Figure 4.22 KrestEntity Class
Figure 4.23 AddressEntity Class
Figure 4.24 EmailEntity Class
Figure 4.25 FaxEntity Class
Figure 4.26 PhoneEntity Class
Figure 4.27 ZipEntity Class
Figure 4.28 OverarchingEntity Class
Figure 4.29 Web Crawl Sequence Diagram
Figure 4.30 Web Search Sequence Diagram
Figure 4.31 Entity Search Sequence Diagram
Figure 6.1 Package View
Figure 6.2 KREST Application Package
Figure 6.3 Controller Package
Figure 6.4 KrestController Class
Figure 6.5 KrestAboutDialog Class
Figure 6.6 FileLoader Class
Figure 6.7 WebCrawler Class
Figure 6.8 SiteVisitor Class
Figure 6.9 ThreadController Class
Figure 6.10 HTTPReader Class
Figure 6.11 WebSearcher Class
Figure 6.12 EntitySearcher Class
Figure 6.13 View Package
Figure 6.14 KrestView Class
Figure 6.15 CrawlerObserver Class
Figure 6.16 SearchObserver Class
Figure 6.17 EntityObserver Class
Figure 6.18 Model Package
Figure 6.19 KrestModel Class
Figure 6.20 KrestObjectLibrary Class
Figure 6.21 WebObject Class
Figure 6.22 Webpage Class
Figure 6.23 KrestEntity Class
Figure 6.24 AddressEntity Class
Figure 6.25 EmailEntity Class
Figure 6.26 FaxEntity Class
Figure 6.27 PhoneEntity Class
Figure 6.28 ZipEntity Class
Figure 6.29 OverarchingEntity Class
Figure 9.1 Opening KREST Screen
Figure 9.2 Completed Breadth-First Web Crawl
Figure 9.3 Depth-First Crawl in Progress
Figure 9.4 Saving a Web Crawl
Figure 9.5 Stopping a Web Crawl
Figure 9.6 Resetting a Web Crawl
Figure 9.7 Performing a Web Search
Figure 9.8 Filtering the Web Search by Back-link Count
Figure 9.9 Performing an Entity Search
Figure 9.10 How to Load Data into KREST
Figure 9.11 How to Save Entity Search Results
Figure 9.12 KREST Application with Exit Methods Circled
Figure 9.13 How to Access the Help Menu
Figure 10.2 Phase Breakdown
Figure 10.3 Project Activity Breakdown
Figure 10.4 Phase 1 Activity Breakdown
Figure 10.5 Phase 2 Activity Breakdown
Figure 10.6 Phase 3 Activity Breakdown
List of Tables
Table 2.1 COCOMO Effort Adjustment Factors
Table 2.2 Project Effort Adjustment Factor Values
Table 5.1 Technical Inspection Checklist
Table 6.1 Detailed Description of the KrestApplication Class
Table 6.2 Detailed Description of the KrestController Class
Table 6.3 Detailed Description of the KrestAboutDialog Class
Table 6.4 Detailed Description of the FileLoader Class
Table 6.5 Detailed Description of the WebCrawler Class
Table 6.6 Detailed Description of the SiteVisitor Class
Table 6.7 Detailed Description of the ThreadController Class
Table 6.8 Detailed Description of the HTTPReader Class
Table 6.9 Detailed Description of the WebSearcher Class
Table 6.10 Detailed Description of the EntitySearcher Class
Table 6.11 Detailed Description of the KrestView Class
Table 6.12 Detailed Description of the CrawlerObserver Class
Table 6.13 Detailed Description of the SearchObserver Class
Table 6.14 Detailed Description of the EntityObserver Class
Table 6.15 Detailed Description of the KrestModel Class
Table 6.16 Detailed Description of the KrestObjectLibrary Class
Table 6.17 Detailed Description of the WebObject Class
Table 6.18 Detailed Description of the Webpage Class
Table 6.19 Detailed Description of the KrestEntity Class
Table 6.20 Detailed Description of the AddressEntity Class
Table 6.21 Detailed Description of the EmailEntity Class
Table 6.22 Detailed Description of the FaxEntity Class
Table 6.23 Detailed Description of the PhoneEntity Class
Table 6.24 Detailed Description of the ZipEntity Class
Table 6.25 Detailed Description of the OverarchingEntity Class
Table 7.1 Test Case 1
Table 7.2 Test Case 2
Table 7.3 Test Case 3
Table 7.4 Test Case 4
Table 7.5 Test Case 5
Table 8.1 Test Results Summary
Table 8.2 Test Log for Test Case 1
Table 8.3 Test Log for Test Case 2
Table 8.4 Test Log for Test Case 3
Table 8.5 Test Log for Test Case 4
Table 8.6 Test Log for Test Case 5
Table 10.1 Project Phase Completion Dates
CHAPTER 1 - Vision Document
1 Introduction
1.1 Motivation
The motivation for this project is to improve upon the current state of web searching.
Searching for contact information on the web is currently a tedious task: it
involves trying to determine a proper search string, followed by wading through
matching pages looking for the contact information desired. The goal of this project is
to improve upon this process by providing the contact information after one search,
without requiring the user to wade through pages that match the search string.
1.2 KDD-Research Entity Search Tool (KREST)
The KDD-Research Entity Search Tool (KREST) is the answer to the search-related
problems mentioned above. It breaks apart web pages into entities, such as email
addresses, phone numbers, and fax numbers. Specific entity results are returned to the
user, rather than the old way of returning page matches. KREST also allows the user to
perform a web crawl from a given starting webpage, and to perform a traditional web
search in addition to performing an entity search.
Entity search works by allowing the user to specify what specific type of information
they are looking for. To the end user, using entity search will seem somewhat like
using a database to query for information -- just the information requested will be
returned, without any additional filler. Rather than being forced to search for a general
term like “Amazon Customer Service”, entity search will allow the user to specify that
they are looking for the Amazon Customer Service phone number by entering a search
term like “Amazon Customer Service #phone”. Alternatively, if the user were looking
for the email addresses of the professors at Kansas State University who teach database
courses, they could search for that specifically with a search term such as “Kansas State
University professor database #email”. Upon receiving a search term, the entity search
tool will look for the pages that match the search text. Those pages that match the
search text will then be broken apart to extract the requested entities (if they exist on those
pages). The entities that match will then be returned to the user with links to the pages
that contained the information in case the user wishes to verify the information.
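The query format described above can be illustrated with a short sketch. It assumes, based only on the example search terms, that an entity type is marked by a leading `#` and that every other token is an ordinary keyword. The class and method names (`EntityQuery`, `parse`) are hypothetical and are not taken from the KREST source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: splits a search string into plain keywords and
// requested entity types (tokens prefixed with '#').
final class EntityQuery {
    final List<String> keywords = new ArrayList<String>();
    final List<String> entityTypes = new ArrayList<String>();

    static EntityQuery parse(String query) {
        EntityQuery result = new EntityQuery();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query yields a single empty token
            }
            if (token.startsWith("#") && token.length() > 1) {
                // Assumption: entity names are case-insensitive.
                result.entityTypes.add(token.substring(1).toLowerCase(Locale.ROOT));
            } else {
                result.keywords.add(token);
            }
        }
        return result;
    }
}
```

For example, `EntityQuery.parse("Amazon Customer Service #phone")` would place `Amazon`, `Customer`, and `Service` in `keywords` and `phone` in `entityTypes`.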
1.3 Terms & Definitions
Actor – For UML purposes, the actor is the end user of the system.
Entity – A specific piece of information, such as an email address or a phone number.
Knowledge Discovery in Databases (KDD) – A group headed by Dr. William Hsu
whose primary focus is data mining.
Sequence Diagram – A graphical design used to display the order in which objects
interact during a certain period.
Unified Modeling Language (UML) – A standard notation used to model software
systems and the real-world objects they represent.
Use Case Diagram – A behavioral diagram defined by UML. It provides a graphical
depiction of system functionality in terms of actors.
2 Project Overview
The Project Overview section provides information about the structure and goals of the
KREST project.
Figure 1.1 Project Overview
2.1 Introduction
Figure 1.1 provides a high-level overview of what the KREST project is working to
achieve. It will allow the user to perform web crawls, web searches, and entity
searches all within the same tool. It will be a self-contained application that works
separately from the user’s normal Internet browser. The KREST environment will
update and extract data from a database that stores previously crawled web pages.
Figure 1.2 Project Block Diagram
Figure 1.2 provides a block diagram of how the KREST project will operate. The user
will interact with the KREST tool within the KREST environment. The KREST
environment makes calls to the Application Level where the Web Crawler Service, the
Web Search Service or the Entity Search Service perform the work. Each of these
services makes use of the Website Database in the Storage Level. All of the work
being performed is done on the Java Virtual Machine, which in turn runs on the
user’s actual system hardware.
Figure 1.3 KREST Data Flow Diagram
Figure 1.3 provides a view of how data will be used throughout the program, especially
for entity searching. The database will contain web pages, to which specific entity
instantiations are linked.
2.2 Project Goal
The goal of the KREST project is to create an application that provides the ability to
perform entity searches on either previously loaded data or crawled web pages. The
project should be able to reproduce the findings from Tao Cheng’s entity search work
[2], which is searching for contact information based on a publicly available dataset of
web pages.
2.3 Project Purpose
The purpose of the KREST project is to provide a tool that allows enhanced web
searching by way of entity search. It is also to provide a standalone application that
will speed up searches on the client end. The developed application will act as a
platform for future KDD students to perform entity search testing, and provide a good
base for future entity search enhancements.
3 Project Requirements
The Project Requirements section details all of the requirements for the KREST project.
Each requirement is discussed in detail, along with its associated requirement number
and the planned release that will fulfill it (i.e., Demo 1, Demo 2, or Final
Release). All of the project’s critical requirements are noted.
Figure 1.4 System Use Case
The requirements are broken out into four distinct sections based on the Use Case diagram
found in Figure 1.4: Application Requirements, Web Crawler Requirements, Web Search
Requirements, and Entity Search Requirements. This makes it easier to track the
requirements between different parts of the application, and also makes it easier to refine
and add requirements as the project progresses.
3.1 Application Requirements
This section details all of the requirements related to the main application that are not
specific to the web crawler, the web search, or the entity search pieces. The
requirements are numbered ARI 1XX, where ARI stands for Application Requirement
Item.
3.1.1 ARI 100 [Critical Requirement]
The program shall provide a GUI for user interaction. This is a critical requirement
because the usefulness of the system would be extremely limited if it were offered only
in a command-line format.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.1.2 ARI 101
The application shall be executable in a single step (i.e., without having to perform any
setup steps).
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.1.3 ARI 102
The application shall have a menu bar that contains at a minimum: a File menu and a
Help menu.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.1.4 ARI 103 [Critical Requirement]
The application shall allow the user to load a data set of web pages. This is a critical
requirement because in order to reproduce the findings of [2], the same data set needs
to be used.
• Build Release Applicability: Final Release
3.1.5 ARI 104
The application shall allow the user to save entity search results.
• Build Release Applicability: Final Release
3.1.6 ARI 105
The application's Help menu shall contain at a minimum an ‘About’ menu item.
• Build Release Applicability: Demo 2, Final Release
3.1.7 ARI 106
The application's menu bar shall contain shortcut keys.
• Build Release Applicability: Demo 2, Final Release
3.1.8 ARI 107 [Critical Requirement]
The application shall be platform independent. This is a critical requirement because
while the application is being developed on Windows, the goal is to allow it to
be used on Linux and Unix as well.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.1.9 ARI 108
The application shall be able to be minimized.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.1.10 ARI 109
The application shall be able to be closed without having to perform a Control-C from
the command line.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.2 Web Crawler Requirements
This section details all of the requirements related to the web crawling portion of the
project. The requirements are numbered WCRI 1XX, where WCRI stands for Web
Crawling Requirement Item.
3.2.1 WCRI 100 [Critical Requirement]
The user shall have the ability to perform a web crawl based on a starting website. This
is a critical requirement because without the web crawling portion of the project, the
usefulness of the project is extremely limited (it would be limited to only using
user-loaded data sets). By allowing user-specified web crawls to be performed, the user can
tailor the search to their needs.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.2.2 WCRI 101 [Critical Requirement]
The user shall be allowed to specify the starting website (if none is specified,
http://www.cis.ksu.edu will be used). This is a critical requirement because allowing
the user to specify the start point to crawl from allows a good web crawl to take place.
Without allowing the user to specify the start point, there would not be any usefulness
to the web crawler.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.2.3 WCRI 102
The user shall have the ability to specify the maximum depth of the web crawl.
• Build Release Applicability: Demo 2, Final Release
3.2.4 WCRI 103
The user shall have the ability to specify a log file in which to save the results of the
crawl.
• Build Release Applicability: Demo 2, Final Release
3.2.5 WCRI 104 [Critical Requirement]
The user shall be allowed to specify the maximum number of websites to crawl before
stopping. This is a critical requirement because without allowing the user to specify
how many websites to search, it would have to be bounded by the application, which is
not a good solution. By allowing the user to specify the maximum number of websites,
it allows much better control over the web crawl.
• Build Release Applicability: Demo 1, Demo 2, Final Release
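The depth limit of WCRI 102 and the page limit of WCRI 104 can be illustrated with a bounded breadth-first crawl. This is only a sketch under stated assumptions: the `links` map is a stand-in for fetching a page and extracting its outgoing links, and the names `BoundedCrawlSketch` and `crawl` are hypothetical rather than taken from the KREST design.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a breadth-first crawl bounded by depth and page count.
final class BoundedCrawlSketch {
    // links: assumed stand-in for "download this URL and list its outgoing links".
    static List<String> crawl(Map<String, List<String>> links, String start,
                              int maxDepth, int maxPages) {
        List<String> crawled = new ArrayList<String>();
        Set<String> seen = new HashSet<String>();
        Deque<String> urls = new ArrayDeque<String>();
        Deque<Integer> depths = new ArrayDeque<Integer>(); // parallel to urls
        urls.add(start);
        depths.add(0);
        seen.add(start);
        while (!urls.isEmpty() && crawled.size() < maxPages) {
            String url = urls.remove();
            int depth = depths.remove();
            crawled.add(url);
            if (depth >= maxDepth) {
                continue; // do not follow links past the maximum depth
            }
            List<String> outgoing = links.get(url);
            if (outgoing == null) {
                continue; // page has no known links
            }
            for (String next : outgoing) {
                if (seen.add(next)) { // enqueue each URL only once
                    urls.add(next);
                    depths.add(depth + 1);
                }
            }
        }
        return crawled;
    }
}
```

In this sketch the starting page has depth 0, so a maximum depth of 1 visits the start page and the pages it links to directly. Whichever limit is reached first ends the crawl.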
3.2.6 WCRI 105
The user shall be allowed to stop the crawl at any time before it finishes.
• Build Release Applicability: Demo 2, Final Release
3.2.7 WCRI 106
The user shall be notified when the crawl is complete.
• Build Release Applicability: Demo 2, Final Release
3.2.8 WCRI 107
The user shall be kept apprised of the total number of pages left to crawl.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.2.9 WCRI 108
The user shall be apprised of the total number of pages crawled.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.2.10 WCRI 109 [Critical Requirement]
The crawler shall follow the Robots Exclusion Protocol. This is a critical requirement
because it keeps the web crawler from crawling pages that are not intended to be
crawled. If a robot protocol is not specified for a domain, all pages will be considered
crawlable.
• Build Release Applicability: Demo 2, Final Release
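The behavior required by WCRI 109 can be sketched as a small rule check. The example below is a deliberately simplified reading of a robots.txt file: it honors only `Disallow` lines in groups addressed to `User-agent: *`, and it ignores `Allow` lines, wildcards, and groups that list several user agents. The class name `RobotsRulesSketch` is hypothetical, and passing `null` is this sketch's assumed way to represent a domain with no robots file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified robots.txt check (Disallow prefixes for "*" only).
final class RobotsRulesSketch {
    private final List<String> disallowed = new ArrayList<String>();

    // robotsTxt may be null when the domain publishes no robots file;
    // in that case every path is treated as crawlable.
    RobotsRulesSketch(String robotsTxt) {
        if (robotsTxt == null) {
            return;
        }
        boolean appliesToAll = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                appliesToAll = value.equals("*");
            } else if (field.equals("disallow") && appliesToAll && !value.isEmpty()) {
                disallowed.add(value); // an empty Disallow means "allow everything"
            }
        }
    }

    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

A crawler following this sketch would build one rule set per host and call `isAllowed` on each path before requesting it.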
3.2.11 WCRI 110 [Critical Requirement]
The crawler shall use multiple threads to avoid putting too much stress on an individual
web host. This is a critical requirement because it will help prevent overloading a web
host with numerous requests one right after another.
• Build Release Applicability: Demo 2, Final Release
3.3 Web Search Requirements
This section details all of the requirements related to the web search portion of the
project. The requirements are numbered WSRI 1XX, where WSRI stands for Web
Search Requirement Item.
3.3.1 WSRI 100 [Critical Requirement]
The user shall be allowed to search over previously crawled web pages. This is a
critical requirement because it is important to provide a web search functionality
similar to what is available on the web for comparison to the entity search portion of
the project.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.3.2 WSRI 101 [Critical Requirement]
The user shall have a box to enter search terms. This is a critical requirement because
without a box for the user to enter search terms, it would not be possible to provide a
web search capability.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.3.3 WSRI 102
The user shall be allowed to specify the minimum number of back-links required for a
page containing the search term to be considered a match.
• Build Release Applicability: Demo 2, Final Release
3.3.4 WSRI 103
The URLs that match the search terms shall be sorted in order of number of back-links.
• Build Release Applicability: Demo 2, Final Release
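The back-link filter of WSRI 102 and the ordering of WSRI 103 can be sketched together. The example assumes that the pages matching the search terms have already been found and that their back-link counts are available in a map. The requirement does not state the sort direction, so descending order (most back-links first) is an assumption, as is the tie-break on the URL. The names `BackLinkRankingSketch` and `rank` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: filter matching URLs by a minimum back-link count,
// then order them by back-link count.
final class BackLinkRankingSketch {
    static List<String> rank(final Map<String, Integer> backLinks, int minBackLinks) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : backLinks.entrySet()) {
            if (entry.getValue() >= minBackLinks) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result, new Comparator<String>() {
            public int compare(String a, String b) {
                // Assumption: most back-links first; ties broken by URL.
                int byCount = backLinks.get(b).compareTo(backLinks.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        return result;
    }
}
```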
3.3.5 WSRI 104 [Critical Requirement]
The URLs that match the search terms shall be displayed in a scrollable text box. This
is a critical requirement because all of the results need to be shown to the user in a
useful fashion.
• Build Release Applicability: Demo 1, Demo 2, Final Release
3.4 Entity Search Requirements
This section details all of the requirements related to the entity search portion of the
project. The requirements are numbered ESRI 1XX, where ESRI stands for Entity
Search Requirement Item.
3.4.1 ESRI 100 [Critical Requirement]
The user shall have the ability to search for entities from previously crawled websites.
This is a critical requirement because providing an entity search capability is the
primary thrust of the project.
• Build Release Applicability: Demo 2, Final Release
3.4.2 ESRI 101 [Critical Requirement]
The user shall have a box to enter search terms. This is a critical requirement because
without a box for the user to enter search terms, it would not be possible to provide an
entity search capability.
• Build Release Applicability: Demo 2, Final Release
3.4.3 ESRI 102 [Critical Requirement]
There shall be entities for, at a minimum: email address, phone number, fax number, street
address, and zip code. This is a critical project requirement because this is the
minimum amount of information required to reproduce the findings from Tao Cheng’s
work [2].
• Build Release Applicability: Demo 2, Final Release
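One common way to recognize entities such as those in ESRI 102 is with regular expressions. The sketch below is an illustration under stated assumptions, not the extraction method used by KREST: the EntityPatterns class and its patterns are hypothetical, the phone pattern assumes ten-digit US numbers, and the zip pattern will also match any other standalone five-digit number. Fax numbers and street addresses are omitted because they need surrounding context to identify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical patterns for three of the contact entities; assumptions, not the KREST rules. */
final class EntityPatterns {
    // Simplified email shape: local part, '@', domain, dot, top-level domain.
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    // Ten-digit US phone number with optional parentheses and separators.
    static final Pattern PHONE =
            Pattern.compile("\\(?\\d{3}\\)?[-. ]?\\d{3}[-. ]?\\d{4}");
    // Five-digit zip code with an optional four-digit extension.
    static final Pattern ZIP =
            Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");

    /** Returns every substring of text that matches the given pattern, in order. */
    static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<String>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }
}
```

For example, EntityPatterns.findAll(EntityPatterns.EMAIL, pageText) would return the candidate email addresses on a page, which could then be scored and ranked as described in ESRI 104.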
3.4.4 ESRI 103
There shall be an overarching entity that gathers all contact info.
• Build Release Applicability: Demo 2, Final Release
3.4.5 ESRI 104
The entity search results shall be ranked based on highest score.
• Build Release Applicability: Final Release
3.4.6 ESRI 105 [Critical Requirement]
The user shall be allowed to specify search terms in addition to entity terms. This is a
critical requirement because without being allowed to specify additional search terms, it
would not be possible to return any interesting results.
• Build Release Applicability: Demo 2, Final Release
3.4.7 ESRI 106 [Critical Requirement]
The entities that match the search terms shall be displayed in a scrollable text box. This
is a critical requirement because all of the results need to be shown to the user in a
useful fashion.
• Build Release Applicability: Demo 2, Final Release
4 Assumptions
• Java Runtime Environment 1.3.1 or later will be installed on the computer running the
application.
• In order to run a search, the user will have an active Internet connection.
• In order to perform a Web Crawl in a reasonable amount of time, the user will have a
high-speed Internet connection (DSL or better).
• The user will need a minimum of 512 MB of memory.
• The user will have a computer with a minimum speed of 1.6 GHz.
5 Constraints
• Java will be used for the web crawling. While it will not be as efficient as using other
languages, there is much web functionality defined in the JDK, making it easier to write
the web crawler.
• Entity Search is being limited to searching for contact info entities. An excellent future
enhancement would be to add other entity types.
6 Environment
• Eclipse 3.3.0 will be used as the IDE.
• Java version JDK 1.5 will be used.
• The Jigloo plugin for Eclipse will be used for GUI development.
CHAPTER 2 - Project Plan
1 Task Breakdown
1.1 Project Phases
The project is broken into three distinct phases: the Inception Phase, the Elaboration
Phase, and the Production Phase.
1.1.1 Inception Phase
The inception phase is focused on creating the scope of the project, and developing the
formal project requirements. A vision document will be developed during this phase,
which details the project scope and requirements. A project plan will also be created
during this phase that describes the project schedule and effort estimate. A software
quality assurance plan will also be designed which will list the required project
documentation as well as the steps that will be taken to ensure a quality project is
delivered.
An initial prototype is created during this phase that will show project feasibility. It
will demonstrate some of the project requirements listed in the vision document.
The inception phase is complete when the developer has delivered a prototype as well
as all required documentation to the supervisory committee, and the supervisory
committee has reviewed and approved all items. The first presentation will be given at
the end of this phase.
1.1.2 Elaboration Phase
During the elaboration phase the architecture of the project will be finalized into an
architectural design plan. In addition, all documents from the inception phase will be
updated to include any revisions noted by the supervisory committee from the first
project presentation. The project requirements will be formally specified
using OCL. Also, a formal test plan will be developed that will include the method of
testing, as well as the way of documenting, tracking, and fixing bugs found. Two
fellow MSE students will perform technical inspections of the architectural design and
will report on their findings.
A second prototype will be created during this phase that expands upon the first
prototype. It will demonstrate some of the more challenging project requirements, as
well as showing features requested by the supervisory committee.
The elaboration phase is complete after the developer delivers the second version of the
prototype and all required documentation, and the supervisory committee has given its
approval. The second presentation will be given at the end of this phase.
1.1.3 Production Phase
The production phase focuses on project implementation and testing. During this phase
the developer will complete the coding of the project, as well as produce all supporting
documentation (User Manual, Project Evaluation, Test Logs, etc.).
The production phase is complete when the developer has completed all required
functionality in the project, has delivered the project and all supporting documentation
to the supervisory committee, and the supervisory committee has reviewed and
approved all items. The final presentation will be given at the end of this phase.
1.2 Project Schedule
The current schedule for the project is displayed in Figure 2.1. If viewing this document
in digital format, the chart can be seen better by increasing the zoom. (A PDF version
of the Gantt chart is also available on the project website.) Note: This schedule held
through both the Inception and Elaboration phases of the project.
Figure 2.1 Project Schedule
2 Cost Estimate
Barry Boehm’s Constructive Cost Model (COCOMO) will be used to estimate project
effort and time. The COCOMO model was developed in the early 1980s and has a
wide range of applicability to software projects.
2.1 Elaboration Phase - COCOMO
Intermediate COCOMO will be used, which is an extension of Basic COCOMO. It
includes an Effort Adjustment Factor (EAF) variable, which adjusts the level of effort
due to estimated project attributes. The KDD-Research Entity Search Tool project is an
Organic Project in COCOMO terms, because it will be a relatively small software
project with somewhat flexible requirements, and a developer with application
programming experience.
Effort will be estimated using the formula:
Effort = 3.2 * EAF * (KLOC)^1.05
where KLOC represents the number of thousands of lines of source code developed. Time
will be estimated in months using the formula:
Time = 2.5 * Effort^0.38
There are a total of 15 Effort Adjustment Factors, each of which takes a value within a
given range. Each factor is classified as very low, low, nominal, high, very high, or
extra high. This classification gives a value to the adjustment factor. The 15 Effort
Adjustment Factors can be found in Table 2.1.
Table 2.1 COCOMO Effort Adjustment Factors
Identifier   Effort Adjustment Factor              Possible Range of Values
RELY         Required Software Reliability         0.75 – 1.40
DATA         Size of Application Database          0.94 – 1.16
CPLX         Complexity of the Product             0.70 – 1.65
TIME         Run-time Performance Requirements     1.00 – 1.66
STOR         Memory Constraints                    1.00 – 1.56
VIRT         Virtual Machine Volatility            0.87 – 1.30
TURN         Required Turnabout Time               0.87 – 1.15
ACAP         Analyst Capability                    1.46 – 0.71
AEXP         Applications Experience               1.29 – 0.82
PCAP         Software Engineer Capability          1.42 – 0.70
VEXP         Virtual Machine Experience            1.21 – 0.90
LEXP         Programming Language Experience       1.14 – 0.95
TOOL         Use of Software Tools                 1.24 – 0.82
MODP         Use of Modern Software Practices      1.24 – 0.83
SCED         Required Development Schedule         1.23 – 1.10
The values chosen for the KDD-Research Entity Search Tool are given in Table 2.2,
along with the reasoning for each value chosen.
Table 2.2 Project Effort Adjustment Factor Values
Identifier   Classification   Value   Reasoning
RELY         Low              0.88    Project is not safety critical, and does not have to be completely reliable
DATA         High             1.08    A large number of web pages are needed in order to perform a thorough search
CPLX         Nominal          1.00    Web crawling, Web Search, and Entity Search are not overly complicated concepts
TIME         Nominal          1.00    Response time is important yet not overly critical
STOR         Very High        1.21    Crawling and searching will require a lot of memory usage
VIRT         Low              0.87    Low complexity of the hardware and software
TURN         Low              0.87    Since this is a single developer project, the turnaround time on results is low
ACAP         High             0.86    Developer has 4+ years experience in software engineering
AEXP         High             0.91    Developer has 3+ years experience in applications development
PCAP         High             0.86    Developer has applicable experience
VEXP         Nominal          1.00    Developer has 2+ years experience developing for Java virtual machine
LEXP         High             0.95    Developer has 2+ years experience developing using Java
TOOL         Nominal          1.00    Moderate experience with tools being used
MODP         Very High        0.83    Developer has 4+ years experience in employing modern software engineering practices
SCED         Nominal          1.00    Project has a tight schedule, but some slippage is allowable
Based on these numbers, the value for EAF is: 0.95
The estimated size of the project is 2.25 KLOC. This estimate is based on the size of
the source code of other available web crawlers. Simple applet web crawlers
with minimal extra features average about 0.75 KLOC. The 2.25 KLOC estimate is
calculated by doubling the web crawler estimate to include entity search, plus an
additional 0.75 KLOC for additional GUI features that are not available in the applets.
Using these figures, the Effort and Time values are calculated as:
Effort = 3.2 * 0.95 * 2.25^1.05 = 7.12
Time = 2.5 * 7.12^0.38 = 5.27
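The arithmetic above can be checked with a short program. The following Java sketch simply evaluates the two Intermediate COCOMO formulas for an Organic Project using the EAF and KLOC values stated in this section. The CocomoEstimate class name is hypothetical and chosen only for this illustration.

```java
/** Hypothetical helper that evaluates the Organic-mode Intermediate COCOMO formulas. */
final class CocomoEstimate {
    /** Effort in staff months: 3.2 * EAF * KLOC^1.05. */
    static double effort(double eaf, double kloc) {
        return 3.2 * eaf * Math.pow(kloc, 1.05);
    }

    /** Development time in chronological months: 2.5 * Effort^0.38. */
    static double time(double effort) {
        return 2.5 * Math.pow(effort, 0.38);
    }

    public static void main(String[] args) {
        double eaf = 0.95;   // EAF value stated in this section
        double kloc = 2.25;  // estimated project size in KLOC
        double effort = effort(eaf, kloc);
        System.out.printf("Effort = %.2f staff months%n", effort);
        System.out.printf("Time   = %.2f months%n", time(effort));
    }
}
```

Run with the values above, the program reproduces the figures in this section to two decimal places (about 7.12 staff months and 5.27 months).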
This means that COCOMO estimates that 7.12 staff months will be necessary to
complete the project. The Time value estimates that the project can be completed in
5.27 chronological months. I believe that this estimate is fairly accurate. In fact, it is
very close to the estimate presented in the Gantt chart in Section 1.2. As can be seen in
the Gantt chart, additional time is needed for the project to produce additional
documentation for the MSE project.
The COCOMO model is not without its faults, however. It is based on projects that
were created by teams, so it may not apply perfectly to projects where
there is a single developer. This estimate also assumes a fairly steady development
time with little interruption; increased project complexity, increased scope, or
misjudged EAF values can cause the estimate to be off.
2.2 Production Phase Estimates
Estimates for the Production Phase were compiled differently than those for the
Elaboration Phase. At the completion of the Elaboration Phase, a total of 2K SLOC had
been developed. This represented the implementation of 29 out of 34 requirements,
which means that 85 percent of all requirements had been implemented. Assuming
that all requirements represent about the same amount of SLOC to develop, this means
that there are about 353 SLOC left to develop ((2000 / 0.85) - 2000).
At the completion of the Elaboration Phase, software productivity was calculated as
17.86 SLOC per hour. This means that the time remaining in software development
during the production phase should be about 20 hours (353 / 17.86). Because the
developer is only able to devote about 2 hours a day to software development, this
represents about 10 days' worth of coding remaining. The original estimates for testing
(21 days) and documentation (25 days) still hold. This means that the time required for
the Production Phase should be 56 days (10 + 21 + 25).
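The chain of calculations in this section can be written out as a small Java sketch. It follows the same rounding as the text (85 percent complete, whole SLOC, whole hours, whole days). The ProductionPhaseEstimate class and its method names are hypothetical and serve only to illustrate the arithmetic.

```java
/** Hypothetical helper that reproduces the Production Phase arithmetic step by step. */
final class ProductionPhaseEstimate {
    /** SLOC still to write, given the SLOC developed so far and the fraction of requirements done. */
    static long remainingSloc(double developedSloc, double fractionDone) {
        return Math.round(developedSloc / fractionDone - developedSloc);
    }

    /** Coding hours remaining at the measured productivity, rounded to whole hours. */
    static long codingHours(long sloc, double slocPerHour) {
        return Math.round(sloc / slocPerHour);
    }

    /** Calendar days of coding, rounded up, given the hours available per day. */
    static long codingDays(long hours, double hoursPerDay) {
        return (long) Math.ceil(hours / hoursPerDay);
    }

    public static void main(String[] args) {
        long sloc = remainingSloc(2000, 0.85);     // (2000 / 0.85) - 2000
        long hours = codingHours(sloc, 17.86);     // 17.86 SLOC per hour
        long days = codingDays(hours, 2.0);        // about 2 hours per day
        long total = days + 21 + 25;               // plus testing and documentation
        System.out.println(sloc + " SLOC, " + hours + " hours, " + total + " days");
    }
}
```

Note that the result is sensitive to the rounding of the completion percentage: using 29 / 34 without rounding to 0.85 would give a slightly smaller SLOC remainder.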
3 Architecture Elaboration Plan
This section details all of the documents and artifacts that are to be completed by the
end of the Elaboration phase before the second presentation.
3.1 Vision Document Revision
Suggestions from the supervisory committee during the first project presentation
regarding the vision document will be included in a revision of the vision document.
The document will also be updated to include a complete requirements listing. The
requirements will be ranked in order of importance, and will have unique identifiers.
The major professor will approve the changes to the document.
3.2 Project Plan Revision
Suggestions from the supervisory committee during the first project presentation
regarding the project plan will be included in a revision of the project plan. The Gantt
chart will be updated with any changes in schedule, and the COCOMO estimate will be
updated based on any changes regarding the cost estimate. The major professor will
approve the changes to the document.
3.3 Architectural Design
The architectural design document will use UML to create the architectural
components. It will include all state, sequence, class, and data models for the project.
The major professor will approve the architectural design document.
3.4 Prototype Development
The prototype developed during the Inception Phase will be expanded upon during the
Elaboration Phase. Additions will include new functionality, as well as suggestions
from the supervisory committee during the first project presentation. The features
implemented for the prototype will be approved by the major professor.
3.5 Test Plan
A test plan will be developed that ensures that all requirements specified in the Vision
Document are met. The document will contain detailed instructions on how to evaluate the
product, and will be approved by the major professor.
3.6 Formal Technical Inspections
Two MSE students will provide input into the project by completing formal technical
inspections. The inspectors will use a formal inspection checklist that will be produced
during the Elaboration Phase. Both inspectors will produce a report based on their
findings.
3.7 Formal Requirements Specification
The web crawling portion of the project will be specified using OCL. This section was
chosen rather than the entity search portion of the project because it will allow for a
more substantial formal specification. The specification will be done in OCL using the
USE (UML-based Specification Environment) tool. The major professor will approve
the formal requirements specification.
4 Software Production Plan
This section details all of the documents and artifacts that are to be completed by the end of
the Production phase before the third presentation.
4.1 Test Plan Revision
Suggestions from the supervisory committee during the second project presentation
regarding the test plan will be included in a revision of the document. The document
will also be updated with specific file names to be loaded. The major professor will
approve the changes to the document.
4.2 Architectural Design Revision
Suggestions from the supervisory committee during the second project presentation
regarding the design will be included in a revision of the Architectural Design
document. The major professor will approve the changes to the document.
4.3 Component Design
The component design document will use UML to convey detailed information about
the software components. It will include all attributes and methods for the classes in
the project. The major professor will approve the component design document.
4.4 Final Software Executable
The prototype developed during the Architecture Elaboration Phase will be expanded
upon during the Production Phase. Additions will include all remaining required
functionality, as well as late suggestions from the supervisory committee during the
second project presentation. The features implemented for the final executable will be
approved by the major professor.
4.5 Formal Technical Inspections
Two MSE students will provide input into the project by completing formal technical
inspections. The inspectors will use a formal inspection checklist that was produced
during the Elaboration Phase. Both inspectors will produce a report based on their
findings.
4.6 User’s Manual
At the completion of software development, the developer will create a User’s Manual,
which will act as a guide for using the completed system. The manual will be broken
up into different sections for performing various tasks within the system, and will act as
a basic walkthrough of the system. The manual will also list various troubleshooting
problems and solutions.
4.7 Test Assessment
At the completion of software development, the developer will run the tests contained
in the Test Plan document, and will record the results. The Test Assessment document
will contain the results of running these tests.
4.8 Technical Instructions for Reuse and Extension
At the completion of software development, the developer shall produce a guide that
explains how to reuse the project in the future for other MSE projects. The document
shall also describe how to extend various features within the project to adapt the project
for different types of use.
4.9 Project Assessment
At the completion of software development and testing, the developer will write up a
document containing the developer’s opinion on the project. The document will
describe in detail what went well, what could have been better, and what simply did not
work. The Project Assessment will also contain the final metrics for the project.
CHAPTER 3 - Software Quality Assurance Plan
1 Software Production Plan
This document defines the steps taken to ensure that the Knowledge Discovery in
Databases (KDD) Research Entity Search Tool project is a high quality product. All
required documentation for the project is listed.
2 Management
2.1 Organization
Supervisory Committee
• Dr. Scott DeLoach
• Dr. David Gustafson
• Dr. William Hsu
Major Professor
• Dr. William Hsu
Developer
• Eric Davis
Formal Technical Inspectors
• Steve Stampbach
• Tim Weninger
2.2 Tasks
All project tasks are discussed in detail in the Project Plan. The Project Plan includes a
Gantt chart that lays out all of the tasks and their deadlines.
2.3 Responsibilities
2.3.1 Supervisory Committee
The role of the supervisory committee is to prepare for and attend each of the three
project presentations that will occur at the end of each project phase. The committee
members will provide feedback and suggestions on the state of the project.
2.3.2 Major Professor
The role of the major professor is twofold: to act as a supervisory committee member,
and to meet weekly with the developer to discuss progress and expectations and to
provide suggestions.
2.3.3 Developer
The role of the developer is to produce the product and all supporting documentation.
The developer is responsible for maintaining a time log, and for meeting weekly with
the major professor to discuss the project.
2.3.4 Formal Technical Inspectors
The role of the formal technical inspectors is to complete a formal inspection of
the project’s architecture, design, and source code. They will submit a report on their
findings from the formal inspection.
3 Documentation
The official documentation requirements for MSE projects are defined at:
http://mse.cis.ksu.edu/online/mse-portfolio.htm. Additional documentation may be
required at the discretion of the major professor and developer. The planned
documentation for the project is listed in Section 11 of this document.
All project documentation will be available on the project website:
http://www.cis.ksu.edu/~efd3467/index.html
4 Standards, Practices, Conventions, and Metrics
4.1 Documentation Standards
IEEE standards will be followed for all applicable documentation throughout the
project.
4.2 Coding Standards
Java naming conventions will be followed for all source code developed. API
documentation for the source code will be generated using Javadoc.
4.3 Metrics
COCOMO will be used to estimate project effort.
5 Reviews and Audits
All documentation, source code, and executable products will be evaluated by members of
the supervisory committee at the conclusion of each phase of the project. Formal
inspections of the architecture, design, and source code will be conducted by the formal
technical inspectors when the coding is complete.
6 Testing
The Test Plan will list the test procedures and expected results of all tests in detail.
However, a brief description of the types of testing to be performed is given below.
General testing of the web crawling and web search portions of the project will be
performed by crawling the Kansas State University Department of Computing and
Information Sciences domain and searching for specific pieces of information. Sample
queries include “professor of machine learning”, “computer graphics”, and “enrollment
forms”. The results returned will then be verified manually to ensure that the pages being
returned actually contain the requested search strings.
The formal testing of the entity search portion of the project will follow the same tests as
found in Tao Cheng’s entity search work [2]. A dataset based on a 2006 general web crawl
from the WebBase Project will be used. The original data was over 2 TB, so it will have to
be scaled down in order to allow reasonable testing to occur. Once the data is scaled down,
entity searches will be performed on the data to see if the correct information can be extracted.
Sample queries include “Amazon Customer Service #phone”, “Bill Gates #email”, and
“Ebay Customer Service #phone”. The results given as the best results by the entity
searcher will be manually checked for accuracy.
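The sample queries above combine ordinary search terms with an entity term that is marked by a leading "#". A minimal Java sketch of how such a query could be split into its two parts is shown here. The EntityQuery class is hypothetical, and treating every "#"-prefixed token as an entity type is an assumption made for illustration rather than a description of the KREST parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for a parsed entity search query. */
final class EntityQuery {
    final List<String> keywords = new ArrayList<String>();
    final List<String> entityTypes = new ArrayList<String>();

    /** Splits a query on whitespace; tokens such as "#phone" become entity types. */
    static EntityQuery parse(String query) {
        EntityQuery parsed = new EntityQuery();
        for (String token : query.trim().split("\\s+")) {
            if (token.length() == 0) {
                continue; // an empty query yields a single empty token
            }
            if (token.startsWith("#") && token.length() > 1) {
                parsed.entityTypes.add(token.substring(1).toLowerCase());
            } else {
                parsed.keywords.add(token);
            }
        }
        return parsed;
    }
}
```

For the query "Amazon Customer Service #phone", this sketch yields the keywords Amazon, Customer, and Service and the single entity type phone.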
7 Problem Reporting and Corrective Actions
All problems found during testing will be recorded in the Software Problem Report
spreadsheet. Each problem found will list the problem, the estimated time to fix, the date
fixed, and the corrective action taken. If the problem cannot or will not be solved during
the project, it will be noted. All problems will be discussed with the major professor.
8 Tools, Technologies, and Methodologies
The following tools will be used for coding, testing, and documentation:
• Eclipse IDE – for software development
• Eclipse FatJar – for building executable JAR files
• Eclipse Jigloo Plug-in – for GUI development
• Microsoft Word – for documentation development
• Microsoft Excel – for risk and problem report tracking and time logs
• Microsoft PowerPoint – for project presentation creation
• Adobe Acrobat – for document conversion to PDF
• Microsoft Project – for project planning
• Microsoft Visio – for software design development
• USE 2.3.1 – for developing formal specifications
9 Code and Media Control
All developed source code will be controlled using CVS. The CVS repository is located at:
http://fingolfin.user.cis.ksu.edu/repos/KDD/projects/entitysearch.
All documents will be maintained on the developer’s personal computer with associated
version numbers. Change logs will be maintained in each document. All completed
project documentation will be available on the project website at:
http://www.cis.ksu.edu/~efd3467/index.html.
10 Risk Management
Software risks will be documented in the Software Risk Reporting and Mitigation
spreadsheet. The risks and potential mitigation strategies will be discussed with the major
professor as they appear.
11 Deliverables
The following are the deliverables for each phase of the project:
Phase I
• Vision Document
• Project Plan
• Prototype Demonstration
• Software Quality Assurance Plan
• Time Log
• Presentation
Phase II
• Vision Document
• Project Plan
• Software Requirements Specification
• Architecture Design
• Test Plan
• Software Risk Reporting and Mitigation Document
• Technical Inspection Checklist
• Executable Architecture Prototype
• Action Items
• Time Log
• Presentation
Phase III
• Component Design
• Source Code
• Executable Project
• User Manual
• Formal Technical Inspection Letters
• Project Evaluation
• Software Problem Reports
• Time Log
• Presentation
CHAPTER 4 - Architectural Design
1 Introduction
The purpose of this document is to provide an architectural design of the KDD-Research
Entity Search Tool (KREST). The document will illustrate class diagrams and sequence
diagrams. The purpose of each class in the diagrams will be given. Also, a formal
specification of the web crawler portion of the project will be given in Section 3.
1.1 Background
The purpose of KREST is to provide a multifunctional web search tool that runs as a
standalone application. The project allows the user to perform a web crawl, to perform
a basic web search over the crawled pages, and to perform an entity search over the
crawled pages. The project also allows the user to perform web searches and entity
searches based on datasets that can be loaded into the tool.
2 KDD-Research Entity Search Tool Architecture
2.1 Package View
The KREST project will follow the Model-View-Controller (MVC) architecture, with
an application class to kick off the project. This allows the screen to be easily updated
via changes to the model.
Figure 4.1 Package View
2.2 Application Package
Figure 4.2 KREST Application Package
2.2.1 Class Description
2.2.1.1 KrestApplication
The KrestApplication class is a very simple class that will be used to start up the
program. It will start up the KrestController and make it visible.
2.3 Controller Package
Figure 4.3 Controller Package
2.3.1 Class Description
2.3.1.1 KrestController
The KrestController class is the class responsible for getting all of the other parts up
and running. It is responsible for signaling the web crawls, web searches, and entity
searches to begin processing. It also controls displaying the form.
Figure 4.4 KrestController Class
2.3.1.2 KrestAboutDialog
The KrestAboutDialog class is a Dialog that displays information about the KREST
application.
Figure 4.5 KrestAboutDialog Class
2.3.1.3 WebCrawler
The WebCrawler class is responsible for setting up everything needed for a web
crawl, and starting the process to do it.
Figure 4.6 WebCrawler Class
2.3.1.4 SiteVisitor
The SiteVisitor is responsible for visiting individual web pages. Each instance of
the SiteVisitor class is a thread that represents a different web page being visited.
Figure 4.7 SiteVisitor Class
2.3.1.5 ThreadController
The ThreadController class is responsible for ensuring that no more than the
specified maximum number of web crawling threads are running at any one time.
The web crawling threads are instances of the SiteVisitor class. The
ThreadController maintains tickets to keep track of which threads are allowed to
run. If a thread has a ticket, it is allowed to run; otherwise, it sleeps while waiting
to grab a ticket.
Figure 4.8 ThreadController Class
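The ticket scheme described above can be illustrated with a counting semaphore. The Java sketch below is not the ThreadController implementation: it swaps in java.util.concurrent.Semaphore, so a waiting thread blocks rather than sleeping and polling, and the TicketController and TicketDemo names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical ticket holder: at most maxThreads tickets are out at any one time. */
final class TicketController {
    private final Semaphore tickets;

    TicketController(int maxThreads) {
        tickets = new Semaphore(maxThreads);
    }

    void acquireTicket() throws InterruptedException {
        tickets.acquire(); // blocks until a ticket is free
    }

    void releaseTicket() {
        tickets.release();
    }
}

final class TicketDemo {
    /** Starts the given number of visitor threads and returns the peak number running at once. */
    static int peakConcurrency(int visitors, int maxThreads) {
        final TicketController controller = new TicketController(maxThreads);
        final AtomicInteger running = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        Thread[] threads = new Thread[visitors];
        for (int i = 0; i < visitors; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        controller.acquireTicket();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        int now = running.incrementAndGet();
                        int seen;
                        do { // record the highest concurrency observed so far
                            seen = peak.get();
                        } while (now > seen && !peak.compareAndSet(seen, now));
                        Thread.sleep(10); // stand-in for visiting a web page
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                        controller.releaseTicket();
                    }
                }
            });
            threads[i].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return peak.get();
    }
}
```

With eight visitor threads and a maximum of three tickets, TicketDemo.peakConcurrency(8, 3) never reports more than three threads running together, which is the property the ThreadController is meant to guarantee.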
2.3.1.6 HTTPReader
The HTTPReader class is responsible for downloading the text of a given web page.
If the given web page does not exist, it will throw an exception.
Figure 4.9 HTTPReader Class
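The behavior described for the HTTPReader can be sketched with the standard java.net classes. This is an illustration only: the PageDownloader class is hypothetical, UTF-8 decoding is an assumption, and the exception relied on is the IOException that the JDK itself raises when a resource cannot be opened (for example, a FileNotFoundException for an HTTP 404 response).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;

/** Hypothetical downloader; not the HTTPReader class from the design. */
final class PageDownloader {
    /**
     * Returns the text found at the given address, one line at a time.
     * Throws IOException if the page does not exist or cannot be read.
     */
    static String download(String address) throws IOException {
        URL url = URI.create(address).toURL();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        try {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
            return text.toString();
        } finally {
            reader.close(); // always release the connection
        }
    }
}
```

A caller such as a site-visiting thread would wrap the call in a try/catch block and skip the page when an IOException is thrown.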
2.3.1.7 WebSearcher
The WebSearcher class is responsible for setting up everything needed for a web
search, and starting the process to do it.
Figure 4.10 WebSearcher Class
2.3.1.8 EntitySearcher
The EntitySearcher class is responsible for setting up everything needed for an entity
search, and starting the process to do it.
Figure 4.11 EntitySearcher Class
2.4 View Package
Figure 4.12 View Package
2.4.1 Class Description
2.4.1.1 KrestView
The KrestView class is an abstract class that can be extended by the
CrawlerObserver, the SearchObserver, and the EntityObserver classes. It is used to
update the display based on changes from the model.
Figure 4.13 KrestView Class
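The relationship between an abstract view and the model can be sketched as a simple observer arrangement. The Java below is a minimal illustration under stated assumptions: ViewSketch, ModelSketch, and CrawlerViewSketch are hypothetical stand-ins for KrestView, KrestModel, and CrawlerObserver, the update method name is invented for this example, and having the model notify the views directly is one possible wiring rather than the documented design.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstract view: concrete views decide how to redraw themselves. */
abstract class ViewSketch {
    abstract void update(ModelSketch model);
}

/** Hypothetical model that notifies registered views whenever its data changes. */
final class ModelSketch {
    private final List<ViewSketch> views = new ArrayList<ViewSketch>();
    private final List<String> visitedUrls = new ArrayList<String>();

    void addView(ViewSketch view) {
        views.add(view);
    }

    void addVisitedUrl(String url) {
        visitedUrls.add(url);
        for (ViewSketch view : views) {
            view.update(this); // push the change to every registered view
        }
    }

    List<String> getVisitedUrls() {
        return new ArrayList<String>(visitedUrls); // defensive copy
    }
}

/** Hypothetical crawler view that renders the visited URLs as lines of text. */
final class CrawlerViewSketch extends ViewSketch {
    final StringBuilder display = new StringBuilder();

    void update(ModelSketch model) {
        display.setLength(0);
        for (String url : model.getVisitedUrls()) {
            display.append(url).append('\n');
        }
    }
}
```

Because each view only depends on the model's accessor methods, a search view or entity view could be added in the same way without changing the model.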
2.4.1.2 CrawlerObserver
The CrawlerObserver class is responsible for updating the screen when the model
changes due to web crawling.
Figure 4.14 CrawlerObserver Class
2.4.1.3 SearchObserver
The SearchObserver class is responsible for updating the screen when the model
changes due to web searching.
Figure 4.15 SearchObserver Class
2.4.1.4 EntityObserver
The EntityObserver class is responsible for updating the screen when the model
changes due to entity searching.
Figure 4.16 EntityObserver Class
2.5 Model Package
Figure 4.17 Model Package
2.5.1 Class Description
2.5.1.1 KrestModel
The KrestModel class is responsible for holding the current KrestObjectLibrary
object, and making the appropriate pieces available to other classes.
Figure 4.18 KrestModel Class
2.5.1.2 KrestObjectLibrary
The KrestObjectLibrary class is responsible for holding onto all created
WebObjects.
Figure 4.19 KrestObjectLibrary Class
2.5.1.3 WebObject
The WebObject class is an abstract class that can be extended by both the
Webpage class and the KrestEntity class. It is used to hold data found based on
web crawls and web or entity searches.
Figure 4.20 WebObject Class
2.5.1.4 Webpage
The Webpage class is responsible for holding onto information about a single web
page. Each web page explored will have its own Webpage instance.
Figure 4.21 Webpage Class
2.5.1.5 KrestEntity
The KrestEntity class is an abstract class that can be extended by the
AddressEntity, EmailEntity, FaxEntity, PhoneEntity, ZipEntity, and
OverarchingEntity classes. It is used to hold data found based on entity searches.
Figure 4.22 KrestEntity Class
2.5.1.6 AddressEntity
The AddressEntity class is responsible for holding onto information about a single
address entity. Each address found during an entity search will have its own
instance of this class.
Figure 4.23 AddressEntity Class
2.5.1.7 EmailEntity
The EmailEntity class is responsible for holding onto information about a single
email entity. Each email address found during an entity search will have its own
instance of this class.
Figure 4.24 EmailEntity Class
2.5.1.8 FaxEntity
The FaxEntity class is responsible for holding onto information about a single fax
entity. Each fax number found during an entity search will have its own instance of
this class.
Figure 4.25 FaxEntity Class
2.5.1.9 PhoneEntity
The PhoneEntity class is responsible for holding onto information about a single
phone entity. Each phone number found during an entity search will have its own
instance of this class.
Figure 4.26 PhoneEntity Class
2.5.1.10 ZipEntity
The ZipEntity class is responsible for holding onto information about a single zip
entity. Each zip code found during an entity search will have its own instance of
this class.
Figure 4.27 ZipEntity Class
2.5.1.11 OverarchingEntity
The OverarchingEntity class is responsible for holding onto information about all
entity types. Each street address, email address, fax number, phone number, and
zip code found during an overarching entity search will have its own instance of
this class.
Figure 4.28 OverarchingEntity Class
2.6 Sequence Diagrams
The following three sub-sections show the sequence diagrams for three different user
actions: performing a web crawl, performing a web search, and performing an entity
search.
2.6.1 User Performs a Web Crawl
Prerequisites: KREST is already running.
Sequence of Events:
1. User presses the ‘Begin Crawl’ button.
2. The KrestController is notified that the crawl button was pressed, and tells the
WebCrawler to begin the crawl.
3. The WebCrawler tells the SiteVisitor to start visiting web pages.
4. SiteVisitor updates the model with the web pages visited.
5. SiteVisitor notifies the WebCrawler that the crawl is complete.
6. WebCrawler updates the screen with the latest information via the
CrawlerObserver.
Post-conditions: The KrestModel is updated with all pages visited, and the screen is
updated for the user.
Figure 4.29 Web Crawl Sequence Diagram
2.6.2 User Performs a Web Search
Prerequisites:
1. KREST is already running.
2. A web crawl has already been performed.
Sequence of Events:
1. User presses the ‘Begin Search’ button.
2. The KrestController is notified that the search button was pressed, and tells
the WebSearcher to begin the search.
3. The WebSearcher queries the KrestModel for all Webpages.
4. The WebSearcher searches through the crawled pages for the search terms.
5. WebSearcher updates the screen with the matching pages via the
SearchObserver.
Post-conditions: The screen is updated with all matching web pages for the user.
Figure 4.30 Web Search Sequence Diagram
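Steps 3 through 5 of this sequence can be sketched in Java as shown below. The matching rule is an assumption: the document does not say how WebSearcher compares page text to the search string, so a case-insensitive substring test is used here purely for illustration. The Webpage fields follow the names in the class descriptions, while the method signatures are simplified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WebSearchSketch {
    /** A crawled page: a unique name (its address) plus the page text. */
    static class Webpage {
        final String name;
        final String pageText;
        Webpage(String name, String pageText) { this.name = name; this.pageText = pageText; }
    }

    /** Receives the matching pages so the view can redraw its results table. */
    interface SearchObserver { void updateWebSearchResults(List<Webpage> results); }

    static class WebSearcher {
        private final String searchString;
        WebSearcher(String newSearchString) {
            this.searchString = newSearchString.toLowerCase(Locale.ROOT);
        }

        /** Steps 3-5: scan every crawled page, collect matches, notify the observer. */
        List<Webpage> beginSearch(List<Webpage> crawledPages, SearchObserver observer) {
            List<Webpage> matches = new ArrayList<>();
            for (Webpage page : crawledPages) {
                // Assumed matching rule: case-insensitive substring test.
                if (page.pageText.toLowerCase(Locale.ROOT).contains(searchString)) {
                    matches.add(page);
                }
            }
            observer.updateWebSearchResults(matches);
            return matches;
        }
    }

    public static void main(String[] args) {
        List<Webpage> pages = List.of(
            new Webpage("http://example.org/a", "Contact the KDD lab by email."),
            new Webpage("http://example.org/b", "Nothing relevant here."));
        new WebSearcher("kdd").beginSearch(pages,
            results -> results.forEach(p -> System.out.println(p.name)));
    }
}
```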
2.6.3 User Performs an Entity Search
Prerequisites:
1. KREST is already running.
2. A web crawl has already been performed.
Sequence of Events:
1. User presses the ‘Begin Search’ button.
2. The KrestController is notified that the search button was pressed, and tells
the EntitySearcher to begin the search.
3. The EntitySearcher queries the KrestModel for all Webpages.
4. The EntitySearcher searches through the crawled pages for the search terms,
and extracts corresponding entities.
5. EntitySearcher updates the screen with the entities found via the
EntityObserver.
Post-conditions: The screen is updated with all found entities.
Figure 4.31 Entity Search Sequence Diagram
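The extraction in step 4 can be illustrated with a short Java sketch based on regular expressions. The patterns below are simplified assumptions chosen for the example (US-style phone numbers and a loose email format); the document does not give the patterns KREST actually uses, and the extract helper is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntitySearchSketch {
    // Simplified, assumed patterns; production data would need stricter rules.
    static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    static final Pattern PHONE =
        Pattern.compile("\\(?\\d{3}\\)?[ .-]\\d{3}[.-]\\d{4}");

    /**
     * Scans each page and counts how often every distinct entity occurs.
     * Keys are the entity text, so each entity name appears only once.
     */
    static Map<String, Integer> extract(Pattern entityPattern, Iterable<String> pageTexts) {
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        for (String text : pageTexts) {
            Matcher m = entityPattern.matcher(text);
            while (m.find()) {
                occurrences.merge(m.group(), 1, Integer::sum);
            }
        }
        return occurrences;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
            "Call (785) 555-0100 or write to lab@example.edu.",
            "Questions? lab@example.edu");
        System.out.println(extract(EMAIL, pages));
        System.out.println(extract(PHONE, pages));
    }
}
```

Keying the result by entity text gives one entry per distinct entity with an occurrence count, which loosely corresponds to the per-entity instances and occurrence tracking described for the entity classes.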
3 Formal Specification
The project was formally specified in OCL and validated using USE 2.3.1. All of the
important classes, attributes, and operations were specified, along with invariants,
preconditions, and postconditions. The formal specification follows:
model krest
--
-- APPLICATION PACKAGE
--
class KrestApplication
operations
init()
end
--
-- CONTROLLER PACKAGE
--
class KrestController
attributes
operations
KrestController()
initGui()
crawlButtonActionPerformed()
resetCrawlerButtonActionPerformed()
searchButtonActionPerformed()
resetTables()
entitySearchButtonActionPerformed()
end
class KrestAboutDialog
operations
KrestAboutDialog()
end
class WebSearcher
attributes
matches: Set(WebObject)
searchString: String
operations
WebSearcher(newSearchString: String)
beginSearch()
end
class WebCrawler
attributes
siteVisitor: SiteVisitor
debugSwitch: Boolean
operations
WebCrawler()
beginCrawl(pageAddress: String, searchString: String, maxToCrawl: Integer,
minBacklinks: Integer, filePath: String, maxDepth: Integer)
stopCrawling()
performReset(partial: Boolean)
getMatches(): Set(WebObject)
getSiteVisitorThreads(): Integer
getSiteVisitor(): SiteVisitor
end
class ThreadController
attributes
crowdSize: Integer
maxCrowdSize: Integer
ticketDatabase: Integer
operations
ThreadController(maxCrowdSizes: Integer)
getTicket(): Integer
returnTicket(ticket: Integer)
findFreeTicket(): Integer
end
class EntitySearcher
attributes
entityMatches: Set(KrestEntity)
entityType: Integer
searchString: String
operations
EntitySearcher(newSearchString: String)
beginSearch()
end
class SiteVisitor
attributes
MAX_THREADS: Integer
threadLimiter: ThreadController
debugSwitch: Boolean
searchString: String
crawlCounter: Integer
pagesToCrawl: Integer
pagesVisited: Integer
maxCrawl: Integer
threadCount: Integer
fileName: String
pageAddress: String
keepProcessing: Boolean
threadList: Set(SiteVisitor)
maxDepth: Integer
currentDepth: Integer
pageDatabase: Set(Webpage)
pageToFetch: Webpage
operations
SiteVisitor(pageAddr: String, searchStr: String, maxToCrawl: Integer, filePath: String,
maxSearchableDepth: Integer, curDepth: Integer)
start()
run()
stopAllThreads()
resetCrawler(partial: Boolean)
getMatches(): Set(Webpage)
getThreadCount(): Integer
getCrawlCount(): Integer
getQueueCount(): Integer
loadPage(page: String): String
extractHyperTextLinks(page: String)
containsSearchString(page: String): Boolean
alreadyVisited(pageAddr: String): Boolean
markAsVisited(pageAddr: String)
end
class HTTPReader
attributes
HTTP_PORT: Integer
operations
HTTPReader()
downloadWWWPage(): String
end
--
-- MODEL PACKAGE
--
class KrestModel
attributes
library: KrestObjectLibrary
name: String
operations
KrestModel()
setName(newName: String)
getName(): String
addObject(webObject: WebObject)
removeObject(webObject: WebObject)
getData(): KrestObjectLibrary
addObserver(observer: KrestView)
end
class KrestObjectLibrary
attributes
objects: Set(WebObject)
operations
KrestObjectLibrary()
findObjectByName(name: String): WebObject
findObjectsByType(type: Integer): KrestObjectLibrary
getKeys(): Set(String)
addObject(newObject: WebObject)
removeObject(objectToRemove: WebObject)
getAllObjects(): Set(WebObject)
end
class WebObject
attributes
name: String
operations
WebObject(newName: String)
getName(): String
setName(newName: String)
end
class Webpage < WebObject
attributes
pageText: String
backlinksCount: Integer
backlinks: Set(String)
operations
Webpage(newName: String)
getText(): String
setText(newText:String)
getBacklinkCount(): Integer
addNewBacklink(backlinkName: String)
end
class KrestEntity < WebObject
attributes
entityName: String
entityPattern: String
numberOfOccurrences: Integer
occurrenceList: Set(KrestEntity)
operations
KrestEntity()
getName(): String
setName(newName: String)
addOccurrence(websiteFound: String)
getAllOccurrences(): Set(KrestEntity)
end
class AddressEntity < KrestEntity
attributes
streetAddress: String
cityString: String
stateString: String
operations
AddressEntity(newStreet: String, newCity: String, newState: String)
getStreet(): String
getCity(): String
getState(): String
setStreet(newStreet: String)
setCity(newCity: String)
setState(newState: String)
end
class PhoneEntity < KrestEntity
attributes
areaCode: String
phoneNumber: String
operations
PhoneEntity(newAreaCode: String, newPhoneNumber: String)
getAreaCode(): String
getPhoneNumber(): String
getAreaAndPhoneNumber(): String
setAreaCode(newAreaCode: String)
setPhoneNumber(newPhoneNumber: String)
end
class FaxEntity < KrestEntity
attributes
areaCode: String
faxNumber: String
operations
FaxEntity(newAreaCode: String, newFaxNumber: String)
getAreaCode(): String
getFaxNumber(): String
getFullFaxNumber(): String
setAreaCode(newAreaCode: String)
setFaxNumber(newFaxNumber: String)
end
class ZipEntity < KrestEntity
attributes
zipCode: String
operations
ZipEntity(newZipCode: String)
getZipCode(): String
setZipCode(newZipCode: String)
end
class OverarchingEntity < KrestEntity
attributes
phoneAreaCode: String
phoneNumber: String
faxAreaCode: String
faxNumber: String
streetString: String
cityString: String
stateString: String
zipString: String
emailAddress: String
operations
OverarchingEntity(phoneAreaCodeString: String, phoneNumberString: String,
faxAreaCodeString: String, faxNumberString: String, newStreetString: String, newCityString:
String, newStateString: String, newZipCodeString: String, newEmailAddress: String)
getPhoneAreaCode(): String
getPhoneNumber(): String
getAreaAndPhoneNumber(): String
getFaxAreaCode(): String
getFaxNumber(): String
getFaxAreaAndNumber(): String
getStreetAddress(): String
getCity(): String
getState(): String
getZipCode(): String
getEmailAddress(): String
setPhoneAreaCode(newAreaCode: String)
setPhoneNumber(newPhoneNumber: String)
setFaxAreaCode(newAreaCode: String)
setFaxNumber(newFaxNumber: String)
setStreetAddress(newStreetAddress: String)
setCity(newCity: String)
setState(newState: String)
setZipCode(newZipCode: String)
setEmailAddress(newEmailAddress: String)
end
class EmailEntity < KrestEntity
attributes
emailAddress: String
operations
EmailEntity(newEmail: String)
getEmailAddress(): String
setEmailAddrses(newEmail: String)
end
--
-- VIEW PACKAGE
--
class KrestView
attributes
crawler: CrawlObserver
search: SearchObserver
entity: EntityObserver
operations
KrestObserver()
end
class CrawlObserver
attributes
operations
CrawlerObserver()
updateCurrentlyCrawlingField(): String
updateCrawledURLsTextField(): String
updateQueuedSitesTextField(): String
updateCrawlProgressBar(): Integer
end
class EntityObserver
attributes
operations
EntityObserver()
updateEntitySearchResults(results: KrestObjectLibrary)
end
class SearchObserver
attributes
operations
SearchObserver()
updateWebSearchResults(results: KrestObjectLibrary)
end
--
-- ASSOCIATIONS
--
--
-- CONTROLLER PACKAGE
--
association Dialog between
KrestController[1]
KrestAboutDialog[0..1] role dialog
end
association Searcher between
KrestController[1]
WebSearcher[1] role searcher
end
association Crawler between
KrestController[1]
WebCrawler[0..1] role crawler
end
association Entity between
KrestController[1]
EntitySearcher[1] role entity
end
association Threads between
WebCrawler[1]
ThreadController[1] role threads
end
association Visitor between
WebCrawler[1]
SiteVisitor[1..*] role visitor
end
association Reader between
SiteVisitor[1]
HTTPReader[1] role reader
end
--
-- MODEL PACKAGE
--
association Library between
KrestModel[1]
KrestObjectLibrary[1] role library
end
association Objects between
KrestObjectLibrary[1]
WebObject[0..*] role objects
end
--
-- VIEW PACKAGE
--
association CrawlerView between
KrestView[1]
CrawlObserver[1] role crawlerview
end
association SearcherView between
KrestView[1]
SearchObserver[1] role searchview
end
association EntityView between
KrestView[1]
EntityObserver[1] role entityview
end
--
-- CONSTRAINTS
--
constraints
--
-- All WebSearcher matches must have unique names
--
context ws : WebSearcher
inv UniqueNamesWebSearcherMatches:
ws.matches->forAll(p1,p2 | p1 <> p2
implies p1.name <> p2.name)
--
-- Every ThreadController has a current crowd size, a max crowd size,
-- and a count of tickets in the database, each of which must be >= 0
--
context tc : ThreadController
inv PositiveCrowdSize:
tc.crowdSize >= 0
inv PositiveMaxCrowdSize:
tc.maxCrowdSize >= 0
inv PositiveDatabaseTicketsCount:
tc.ticketDatabase >= 0
--
-- All EntitySearcher matches must have unique names, and entity type
-- must be >= 0
--
context es : EntitySearcher
inv UniqueNamesEntitySearcherMatches:
es.entityMatches->forAll(p1,p2 | p1 <> p2
implies p1.entityName <> p2.entityName)
inv PositiveEntityType:
es.entityType >= 0
--
-- Every SiteVisitor has a MAX_THREADS value that must be >= 0, a crawlCounter
-- that must be >= 0, a pages left to crawl counter that must be >= 0, a pages
-- visited counter that must be >= 0, a max number of pages to crawl that must
-- be >= 0, a current thread count that must be >= 0, a maximum search depth
-- value that must be >= 0, a current depth count that must be >= 0 and <= the
-- max search depth, and a page database that only contains unique webpages
--
context sv : SiteVisitor
inv PositiveMaxThreads:
sv.MAX_THREADS >= 0
inv PositiveCrawlCounter:
sv.crawlCounter >= 0
inv PositivePagesToCrawl:
sv.pagesToCrawl >= 0
inv PositivePagesVisited:
sv.pagesVisited >= 0
inv PositiveMaxCrawlCount:
sv.maxCrawl >= 0
inv PositiveThreadCount:
sv.threadCount >= 0
inv PositiveMaxSearchDepth:
sv.maxDepth >= 0
inv PositiveCurrentDepth:
sv.currentDepth >= 0
inv CurrentDepthNotGreaterThanMaxDepth:
sv.currentDepth <= sv.maxDepth
inv UniqueWebpagesOnly:
sv.pageDatabase->forAll(p1,p2 | p1 <> p2
implies p1.name <> p2.name)
--
-- Every HTTPReader has an HTTP_PORT value between 0 and 65535
--
context hr : HTTPReader
inv PositivePortValue:
hr.HTTP_PORT >= 0
inv PortValueLessThanMax:
hr.HTTP_PORT <= 65535
--
-- All KrestObjectLibrary objects must have unique names
--
context lib : KrestObjectLibrary
inv UniqueNamesKrestObjectLibrary:
lib.objects->forAll(p1,p2 | p1 <> p2
implies p1.name <> p2.name)
--
-- Every Webpage object has a non-negative number of backlinks
--
context wp : Webpage
inv PositiveBacklinks:
wp.backlinksCount >= 0
--
-- All KrestEntities must have unique names, and the number of occurrences must
-- be non-negative
--
context ent : KrestEntity
inv UniqueNamesKrestEntities:
ent.occurrenceList->forAll(p1,p2 | p1 <> p2
implies p1.entityName <> p2.entityName)
inv PositiveOccurrenceCount:
ent.numberOfOccurrences >= 0
--
-- All WebObjects must be either a Webpage or KrestEntity, but not both
--
context wo: WebObject
inv IsOneOfItsSubtypes:
wo.oclIsKindOf(Webpage) or wo.oclIsKindOf(KrestEntity)
inv MutualExclusion1:
if wo.oclIsKindOf(Webpage) then not wo.oclIsKindOf(KrestEntity) else
wo.oclIsKindOf(KrestEntity) endif
--
-- OPERATIONS
--
--
-- Any added objects to the KrestModel must be new objects
--
context KrestModel::addObject(webObject: WebObject)
pre cond1 : library.objects->excludes(webObject)
post cond2 : library.objects = library.objects@pre->including(webObject)
post cond3 : (library.objects - library.objects@pre)->size() = 1
--
-- Deleting an object from the KrestModel must remove it while the other objects
-- remain unchanged
--
context KrestModel::removeObject(webObject: WebObject)
pre cond1 : library.objects->includes(webObject)
post cond2 : library.objects = library.objects@pre->excluding(webObject)
post cond3 : (library.objects@pre - library.objects)->size() = 1
--
-- Finding an object by name in the KrestObjectLibrary
--
context KrestObjectLibrary::findObjectByName(name: String): WebObject
post cond1 : result = objects->any(c1 | c1.name = name)
--
-- Any added objects to the KrestObjectLibrary must be new objects
--
context KrestObjectLibrary::addObject(newObject: WebObject)
pre cond1 : objects->excludes(newObject)
post cond2 : objects = objects@pre->including(newObject)
post cond3 : (objects - objects@pre)->size() = 1
--
-- Deleting an object from the KrestObjectLibrary must remove it while the other
-- objects remain unchanged
--
context KrestObjectLibrary::removeObject(objectToRemove: WebObject)
pre cond1 : objects->includes(objectToRemove)
post cond2 : objects = objects@pre->excluding(objectToRemove)
post cond3 : (objects@pre - objects)->size() = 1
--
-- Getting all objects from the KrestObjectLibrary returns all objects
--
context KrestObjectLibrary::getAllObjects(): Set(WebObject)
post cond1 : result = self.objects
--
-- Getting the name from the WebObject returns its name
--
context WebObject::getName(): String
post cond1 : result = self.name
--
-- Setting the name for the WebObject sets its name
--
context WebObject::setName(newName: String)
post cond1 : self.name = newName
--
-- Getting the page text from the Webpage returns its text
--
context Webpage::getText(): String
post cond1 : result = self.pageText
--
-- Setting the pageText for the Webpage sets its text
--
context Webpage::setText(newText: String)
post cond1 : self.pageText= newText
--
-- Getting the backlink count from the Webpage returns its count
--
context Webpage::getBacklinkCount(): Integer
post cond1 : result = self.backlinksCount
--
-- Any added backlinks to the Webpage must be new objects
--
context Webpage::addNewBacklink(backlinkName: String)
pre cond1 : backlinks->excludes(backlinkName)
post cond2 : backlinks = backlinks@pre->including(backlinkName)
post cond3 : (backlinks - backlinks@pre)->size() = 1
--
-- Getting the name from the KrestEntity returns its name
--
context KrestEntity::getName(): String
post cond1 : result = self.entityName
--
-- Setting the name for the KrestEntity sets its entityName
--
context KrestEntity::setName(newName: String)
post cond1 : self.entityName = newName
--
-- Any added occurrence of an entity will be a new occurrence, and will increment
-- the number of occurrences
--
context KrestEntity::addOccurrence(websiteFound: String)
pre cond1 : self.occurrenceList.entityName->excludes(websiteFound)
post cond2 : occurrenceList.entityName =
occurrenceList.entityName@pre->including(websiteFound)
post cond3 : (occurrenceList - occurrenceList@pre)->size() = 1
post cond4 : (numberOfOccurrences - numberOfOccurrences@pre) = 1
--
-- Getting all KrestEntity occurrences returns the list of occurrences
--
context KrestEntity::getAllOccurrences(): Set(KrestEntity)
post cond1 : result = self.occurrenceList
--
-- Creating a new AddressEntity sets the values passed in
--
context AddressEntity::AddressEntity(newStreet: String, newCity: String, newState:
String)
post cond1 : streetAddress = newStreet
post cond2 : cityString = newCity
post cond3 : stateString = newState
--
-- Getting the Street from AddressEntity returns the Street string
--
context AddressEntity::getStreet(): String
post cond1 : result = self.streetAddress
--
-- Getting the City from AddressEntity returns the city string
--
context AddressEntity::getCity(): String
post cond1 : result = self.cityString
--
-- Getting the stateString from AddressEntity returns the state string
--
context AddressEntity::getState(): String
post cond1 : result = self.stateString
--
-- Setting the street for AddressEntity stores the new street
--
context AddressEntity::setStreet(newStreet: String)
post cond1 : streetAddress = newStreet
--
-- Setting the city for AddressEntity stores the new city
--
context AddressEntity::setCity(newCity: String)
post cond1 : cityString = newCity
--
-- Setting the state for AddressEntity stores the new state
--
context AddressEntity::setState(newState: String)
post cond1 : stateString = newState
--
-- Creating a new PhoneEntity sets the values passed in
--
context PhoneEntity::PhoneEntity(newAreaCode: String, newPhoneNumber: String)
post cond1 : areaCode = newAreaCode
post cond2 : phoneNumber = newPhoneNumber
--
-- Getting the Area Code from PhoneEntity returns the area code
--
context PhoneEntity::getAreaCode(): String
post cond1 : result = self.areaCode
--
-- Getting the Phone Number from PhoneEntity returns the phone number
--
context PhoneEntity::getPhoneNumber(): String
post cond1 : result = self.phoneNumber
--
-- Getting the Area Code and Phone Number from PhoneEntity returns the area code
-- concatenated with the phone number
--
context PhoneEntity::getAreaAndPhoneNumber(): String
post cond1 : result = self.areaCode.concat(self.phoneNumber)
--
-- Setting the area code for PhoneEntity stores the new area code
--
context PhoneEntity::setAreaCode(newAreaCode: String)
post cond1 : areaCode = newAreaCode
--
-- Setting the phone number for PhoneEntity stores the new phone number
--
context PhoneEntity::setPhoneNumber(newPhoneNumber: String)
post cond1 : phoneNumber = newPhoneNumber
--
-- Creating a new FaxEntity sets the values passed in
--
context FaxEntity::FaxEntity(newAreaCode: String, newFaxNumber: String)
post cond1 : areaCode = newAreaCode
post cond2 : faxNumber = newFaxNumber
--
-- Getting the Area Code from FaxEntity returns the area code
--
context FaxEntity::getAreaCode(): String
post cond1 : result = self.areaCode
--
-- Getting the Fax Number from FaxEntity returns the fax number
--
context FaxEntity::getFaxNumber(): String
post cond1 : result = self.faxNumber
--
-- Getting the Area Code and Fax Number from FaxEntity returns the area code
-- concatenated with the fax number
--
context FaxEntity::getFullFaxNumber(): String
post cond1 : result = self.areaCode.concat(self.faxNumber)
--
-- Setting the area code for FaxEntity stores the new area code
--
context FaxEntity::setAreaCode(newAreaCode: String)
post cond1 : areaCode = newAreaCode
--
-- Setting the fax number for FaxEntity stores the new fax number
--
context FaxEntity::setFaxNumber(newFaxNumber: String)
post cond1 : faxNumber = newFaxNumber
--
-- Creating a new ZipEntity sets the value passed in
--
context ZipEntity::ZipEntity(newZipCode: String)
post cond1 : zipCode = newZipCode
--
-- Getting the Zip Code from ZipEntity returns the zip code
--
context ZipEntity::getZipCode(): String
post cond1 : result = self.zipCode
--
-- Setting the zip code for ZipEntity stores the new zip code
--
context ZipEntity::setZipCode(newZipCode: String)
post cond1 : zipCode = newZipCode
--
-- Creating a new EmailEntity sets the value passed in
--
context EmailEntity::EmailEntity(newEmail: String)
post cond1 : emailAddress = newEmail
--
-- Getting the Email Address from EmailEntity returns the email address
--
context EmailEntity::getEmailAddress(): String
post cond1 : result = self.emailAddress
--
-- Setting the Email Address for EmailEntity stores the new email address
--
context EmailEntity::setEmailAddrses(newEmail: String)
post cond1 : emailAddress = newEmail
--
-- Creating a new OverarchingEntity sets the values passed in
--
context OverarchingEntity::OverarchingEntity(phoneAreaCodeString: String,
phoneNumberString: String, faxAreaCodeString: String, faxNumberString: String,
newStreetString: String, newCityString: String, newStateString: String, newZipCodeString:
String, newEmailAddress: String)
post cond1 : self.phoneAreaCode = phoneAreaCodeString
post cond2 : self.phoneNumber = phoneNumberString
post cond3 : self.faxAreaCode = faxAreaCodeString
post cond4 : self.faxNumber = faxNumberString
post cond5 : self.streetString = newStreetString
post cond6 : self.cityString = newCityString
post cond7 : self.stateString = newStateString
post cond8 : self.zipString = newZipCodeString
post cond9 : self.emailAddress = newEmailAddress
--
-- Getting the Phone Area Code from OverarchingEntity returns the phone area code
--
context OverarchingEntity::getPhoneAreaCode(): String
post cond1 : result = self.phoneAreaCode
--
-- Getting the Phone Number from OverarchingEntity returns the phone number
--
context OverarchingEntity::getPhoneNumber(): String
post cond1 : result = self.phoneNumber
--
-- Getting the Area Code and Phone Number from OverarchingEntity returns the area
-- code concatenated with the phone number
--
context OverarchingEntity::getAreaAndPhoneNumber(): String
post cond1 : result = self.phoneAreaCode.concat(self.phoneNumber)
--
-- Getting the Fax Area Code from OverarchingEntity returns the fax area code
--
context OverarchingEntity::getFaxAreaCode(): String
post cond1 : result = self.faxAreaCode
--
-- Getting the Fax Number from OverarchingEntity returns the fax number
--
context OverarchingEntity::getFaxNumber(): String
post cond1 : result = self.faxNumber
--
-- Getting the Area Code and Fax Number from OverarchingEntity returns the
-- area code concatenated with the fax number
--
context OverarchingEntity::getFaxAreaAndNumber(): String
post cond1 : result = self.faxAreaCode.concat(self.faxNumber)
--
-- Getting the Street from OverarchingEntity returns the Street string
--
context OverarchingEntity::getStreetAddress(): String
post cond1 : result = self.streetString
--
-- Getting the City from OverarchingEntity returns the city string
--
context OverarchingEntity::getCity(): String
post cond1 : result = self.cityString
--
-- Getting the stateString from OverarchingEntity returns the state string
--
context OverarchingEntity::getState(): String
post cond1 : result = self.stateString
--
-- Getting the zip code from OverarchingEntity returns the zip code string
--
context OverarchingEntity::getZipCode(): String
post cond1 : result = self.zipString
--
-- Getting the email address from OverarchingEntity returns the email address
-- string
--
context OverarchingEntity::getEmailAddress(): String
post cond1 : result = self.emailAddress
--
-- Setting the phone area code for OverarchingEntity stores the new area code
--
context OverarchingEntity::setPhoneAreaCode(newAreaCode: String)
post cond1 : phoneAreaCode = newAreaCode
--
-- Setting the phone number for OverarchingEntity stores the new phone number
--
context OverarchingEntity::setPhoneNumber(newPhoneNumber: String)
post cond1 : phoneNumber = newPhoneNumber
--
-- Setting the fax area code for OverarchingEntity stores the new area code
--
context OverarchingEntity::setFaxAreaCode(newAreaCode: String)
post cond1 : faxAreaCode = newAreaCode
--
-- Setting the fax number for OverarchingEntity stores the new fax number
--
context OverarchingEntity::setFaxNumber(newFaxNumber: String)
post cond1 : faxNumber = newFaxNumber
--
-- Setting the street for OverarchingEntity stores the new street
--
context OverarchingEntity::setStreetAddress(newStreetAddress: String)
post cond1 : streetString = newStreetAddress
--
-- Setting the city for OverarchingEntity stores the new city
--
context OverarchingEntity::setCity(newCity: String)
post cond1 : cityString = newCity
--
-- Setting the state for OverarchingEntity stores the new state
--
context OverarchingEntity::setState(newState: String)
post cond1 : stateString = newState
--
-- Setting the zip code for OverarchingEntity stores the new zip code
--
context OverarchingEntity::setZipCode(newZipCode: String)
post cond1 : zipString = newZipCode
--
-- Setting the Email Address for OverarchingEntity stores the new email address
--
context OverarchingEntity::setEmailAddress(newEmailAddress: String)
post cond1 : emailAddress = newEmailAddress
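To show how contracts of this kind could carry over into an implementation, the following Java sketch checks the KrestObjectLibrary preconditions at run time. It is an illustration only and is not taken from the KREST source: the name-keyed map, the use of IllegalArgumentException, and the choice to reject duplicates by name (which folds the unique-names invariant into the addObject check) are all assumptions.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

class ObjectLibrarySketch {
    static class WebObject {
        private final String name;
        WebObject(String newName) { this.name = newName; }
        String getName() { return name; }
    }

    /** Keyed by name, so two stored objects can never share a name. */
    static class KrestObjectLibrary {
        private final Map<String, WebObject> objects = new LinkedHashMap<>();

        /** pre: no stored object has this name; post: the size grows by exactly one. */
        void addObject(WebObject newObject) {
            if (objects.containsKey(newObject.getName())) {
                throw new IllegalArgumentException("duplicate object: " + newObject.getName());
            }
            objects.put(newObject.getName(), newObject);
        }

        /** pre: the object is stored; post: only that object is removed. */
        void removeObject(WebObject objectToRemove) {
            // Map.remove(key, value) removes the entry only if it maps to this object.
            if (!objects.remove(objectToRemove.getName(), objectToRemove)) {
                throw new IllegalArgumentException("unknown object: " + objectToRemove.getName());
            }
        }

        /** Returns the stored object with this name, or null if there is none. */
        WebObject findObjectByName(String name) { return objects.get(name); }

        Collection<WebObject> getAllObjects() { return objects.values(); }
    }

    public static void main(String[] args) {
        KrestObjectLibrary library = new KrestObjectLibrary();
        WebObject page = new WebObject("http://example.org");
        library.addObject(page);
        System.out.println("Stored objects: " + library.getAllObjects().size());
        library.removeObject(page);
        System.out.println("Stored objects: " + library.getAllObjects().size());
    }
}
```

Failing fast on a violated precondition keeps the library in a state that always satisfies the invariant, at the cost of requiring callers to check for existing names before adding.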
CHAPTER 5 - Technical Inspection Checklist
1 Software Production Plan
This document provides a checklist to be used in the technical inspection of the Knowledge
Discovery in Databases (KDD) Research Entity Search Tool project. It provides a
guideline for the inspectors to follow to ensure that the Architectural Design Document and
the OCL formal specification model are both complete and correct.
2 Items to be Inspected
Vision Document 2.0 will need to be referenced by the inspectors while completing the
technical inspection.
2.1 UML Diagrams
• Class Diagrams
• Sequence Diagrams
• Class Descriptions
2.2 Formal Specification
• Class Diagrams
3 Formal Inspectors
• Steve Stampbach
Contact: [email protected]
• Tim Weninger
Contact: [email protected]
4 Formal Inspection Checklist
Table 5.1 Technical Inspection Checklist
Item # Inspection Item Pass/Fail/Partial Comments
TI-1 The symbols used in the class diagrams conform to UML standards.
TI-2 The symbols used in the sequence diagrams conform to UML standards.
TI-3 The classes in the class diagrams have corresponding descriptions provided in the Architectural Design Document.
TI-4 The descriptions of the classes in the Architectural Design Document are clear and concise.
TI-5 The classes in the formal specification are consistent with those in the Architectural Design Document (related to Web Crawling only).
TI-6 The attributes in the formal specification are consistent with the attributes of the corresponding class diagrams.
TI-7 The associations in the formal specification are present in the class diagrams as association links.
TI-8 The multiplicities in the formal specification are consistent with the multiplicities of the corresponding class diagrams.
TI-9 The sequence diagrams are clear and concise.
CHAPTER 6 - Component Design
1 Introduction
The purpose of this document is to provide a component design of the KDD-Research
Entity Search Tool (KREST). The document will illustrate class diagrams. The purpose of
each class in the diagrams will be given, as well as a description of the attributes and
methods.
1.1 Background
The purpose of KREST is to provide a multifunctional web search tool that runs as a
standalone application. The project allows the user to perform a web crawl, to perform
a basic web search over the crawled pages, and to perform an entity search over the
crawled pages. The project also allows the user to perform web searches and entity
searches based on datasets that can be loaded into the tool.
2 KDD-Research Entity Search Tool Architecture
2.1 Package View
The KREST project will follow the Model-View-Controller (MVC) architecture, with
an application class to kick off the project. This allows the screen to be easily updated
via changes to the model.
Figure 6.1 Package View
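The way an MVC split lets the screen follow changes to the model can be sketched in a few lines of Java. This is a generic observer-style illustration under stated assumptions, not the KREST implementation: the modelChanged callback, the pageCrawled method, and the use of plain strings for stored objects are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

class MvcSketch {
    /** View side: anything that redraws itself when the model changes. */
    interface KrestView { void modelChanged(List<String> objectNames); }

    /** Model side: stores object names and notifies every registered view. */
    static class KrestModel {
        private final List<String> objectNames = new ArrayList<>();
        private final List<KrestView> observers = new ArrayList<>();

        void addObserver(KrestView observer) { observers.add(observer); }

        void addObject(String name) {
            objectNames.add(name);
            // Pass a read-only copy so views cannot modify model state.
            for (KrestView v : observers) v.modelChanged(List.copyOf(objectNames));
        }
    }

    /** Controller side: turns a user action into a model update. */
    static class KrestController {
        private final KrestModel model;
        KrestController(KrestModel model) { this.model = model; }
        void pageCrawled(String address) { model.addObject(address); }
    }

    public static void main(String[] args) {
        KrestModel model = new KrestModel();
        model.addObserver(names -> System.out.println("Screen now lists: " + names));
        new KrestController(model).pageCrawled("http://example.org");
    }
}
```

Because the controller only talks to the model, and the model only talks to the view interface, a new display can be added by registering another observer without changing the crawl or search code.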
2.2 Application Package
Figure 6.2 KREST Application Package
2.2.1 Class Description
2.2.1.1 KrestApplication
The KrestApplication class is a very simple class that will be used to start up the
program. It starts up the KrestController and makes it visible.
Table 6.1 Detailed Description of the KrestApplication Class
Class Visibility Extends Implements
KrestApplication public JDialog none
Attribute Visibility Type Other
Function Visibility Parameters Returns Actions
init public void void Starts the application
Main public String void Main method
2.3 Controller Package
Figure 6.3 Controller Package
2.3.1 Class Description
2.3.1.1 KrestController
The KrestController class is the class responsible for getting all of the other parts up
and running. It is responsible for signaling the web crawls, web searches, and entity
searches to begin processing. It also controls displaying the form.
Figure 6.4 KrestController Class
Table 6.2 Detailed Description of the KrestController Class
Class Visibility Extends Implements
KrestController public JFrame none
Attribute Visibility Type Other
aboutAction private AbstractAction
aboutMenuItem private JMenuItem
crawlButton private JButton
crawlOptionPanel private JPanel
crawlProgressBar private JProgressBar
crawlProgressLabel private JLabel
crawledURLsLabel private JLabel
crawledURLsTextField private JTextField
currentlyCrawlingLabel private JLabel
currentlyCrawlingTextField private JTextField
entitySearchButton private JButton
entitySearchPanel private JPanel
entitySearchResultsTable private JTable
entitySearchScrollPane private JScrollPane
entitySearchStringLabel private JLabel
entitySearchStringTextField private JTextField
entitySearcher private EntitySearcher
exitAction private AbstractAction
exitMenuItem private JMenuItem
fileMenu private JMenu
helpMenu private JMenu
jSeparator1 private JSeparator
jSeparator2 private JSeparator
krestMenuBar private JMenuBar
krestTabbedPane private JTabbedPane
loadDataAction private AbstractAction
loadDataMenuItem private JMenuItem
logFileCheckbox private JCheckBox
logFileTextField private JTextField
matchingPagesTable private JTable
maxDepthRadioButton private JRadioButton
maxDepthTextField private JTextField
maxSitesComboBox private JComboBox
maxSitesRadioButton private JRadioButton
minBacklinksLabel private JLabel
minBacklinksTextField private JTextField
parentFrame private KrestController
queueSitesCountLabel private JLabel
queueSitesCountTextField private JTextField
resetCrawlerButton private JButton
saveResultsAction private AbstractAction
saveResultsMenuItem private JMenuItem
searchResultsScrolledPane private JScrollPane
searchStringLabel private JLabel
searchStringTextField private JTextField
serialVersionUID private Long final, static
startCrawlLabel private JLabel
startCrawlTextField private JTextField
view private KrestView
webCrawler private WebCrawler
webCrawlerPanel private JPanel
webSearchButton private JButton
webSearchPanel private JPanel
webSearcher private WebSearcher
Function Visibility Parameters Returns Actions
KrestController public void void Constructor to set up the class
crawlButtonActionPerformed private void void Starts the crawling action
entitySearchButtonActionPerformed private ActionEvent void Starts the entity search action
getAboutAction private void AbstractAction Gets the ‘About’ menu action
getCrawlOptionsPanel private void JPanel Gets the crawl options panel
getCrawlProgressBar private void JProgressBar Gets the crawl progress bar
getCrawlProgressLabel private void JLabel Gets the crawl progress label
getCrawledURLsLabel private void JLabel Gets the crawled URLs label
getCrawledURLsTextField private void JTextField Gets the crawled URLs text field
getCurrentlyCrawlingLabel private void JLabel Gets the currently crawling label
getCurrentlyCrawlingTextField private void JTextField Gets the currently crawling text field
getEntitySearchButton private void JButton Gets the entity search button
getEntitySearchResultsTable private void JTable Gets the entity search results table
getEntitySearchScrollPane private void JScrollPane Gets the entity search scroll pane
getEntitySearchStringLabel private void JLabel Gets the entity search string label
getEntitySearchStringTextField private void JTextField Gets the entity search string text field
getExitAction private void AbstractAction Gets the action for the exit menu item
getExitMenuItem private void JMenuItem Gets the exit menu item
getJSeparator2 private void JSeparator Gets the JSeparator
getLoadAction private void AbstractAction Gets the action for the load menu item
getLoadDataMenuItem private void JMenuItem Gets the load data menu item
getLogFileCheckbox private void JCheckBox Gets the log file checkbox
getMatchingPagesTable private void JTable Gets the table of matching pages
getMaxDepthRadioButton private void JRadioButton Gets the max depth radio button
getMaxDepthTextField private void JTextField Gets the max depth text field
getMaxSitesRadioButton private void JRadioButton Gets the max sites radio button
getMinBacklinksLabel private void JLabel Gets the min
back links label
getMinBacklinksTextField private void JTextField Gets the min
back links text
field
getQueueSitesCountLabel private void JLabel Gets the queue
sites count label
getQueueSitesCountTextFie
ld
private void JTextField Gets the queue
sites count text
field
getResetCrawlerButton public void JButton Gets the reset
crawler button
getSaveResultsAction private void AbstractActi
on
Gets the action
for saving
results
getSaveResultsMenuItem private void JMenuItem Gets the save
results menu
item
getSearchResultsScrolledPa
ne
private void JScrolledPan
e
Gets the scroll
pane containing
search results
getSearchStringLabel private void JLabel Gets the search
string label
getSearchStringTextField private void JTextField Gets the search
strin text field
getWebSearchButton private void JButton Gets the web
search button
getWebSearchPanel private void JPanel Gets the web
91
search panel
initGUI private void void Build and
display the GUI
resetCrawlerButtonActionP
erformed
private void void Action that
takes place
when the reset
crawler button is
pressed
searchButtonActionPerform
ed
private void void Action that
takes place
when the search
button is pressed
2.3.1.2 KrestAboutDialog
The KrestAboutDialog class is a Dialog that displays information about the KREST
application.
Figure 6.5 KrestAboutDialog Class
Table 6.3 Detailed Description of the KrestAboutDialog Class
Class Visibility Extends Implements
KrestAboutDialog public JDialog none
Attribute Visibility Type Other
serialVersionUID public Long final, static
Function Visibility Parameters Returns Actions
KrestAboutDialog public JFrame void Constructor which sets up the dialog
2.3.1.3 FileLoader
The FileLoader class is responsible for loading previously retrieved data into the
application.
Figure 6.6 FileLoader Class
Table 6.4 Detailed Description of the FileLoader Class
Class Visibility Extends Implements
FileLoader public JDialog none
Attribute Visibility Type Other
fileToLoad private String
library private KrestObjectLibrary
parent private KrestController
Function Visibility Parameters Returns Actions
FileLoader public String, KrestController void Constructor which sets up the class
readInWebBaseFile private void void Reads in the chosen file in the WebBase format
2.3.1.4 WebCrawler
The WebCrawler class is responsible for setting up everything needed for a web
crawl, and starting the process to do it.
Figure 6.7 WebCrawler Class
Table 6.5 Detailed Description of the WebCrawler Class
Class Visibility Extends Implements
WebCrawler public JDialog none
Attribute Visibility Type Other
debugSwitch public Boolean
siteVisitor private SiteVisitor static
Function Visibility Parameters Returns Actions
WebCrawler public none void Constructor which sets up the class
beginCrawl public String, String, Integer, Integer, String, Integer void Starts up the web crawl based upon the beginning page address
getMatches public void Vector Gets the matching web pages crawled
getSiteVisitor public void SiteVisitor Gets the SiteVisitor class object
getSiteVisitorThreads public void Integer Gets the current number of running SiteVisitor threads
performReset public Boolean void Resets the database by forcing a removal of all crawled pages
stopCrawling public void void Stops the current web crawl
2.3.1.5 SiteVisitor
The SiteVisitor is responsible for visiting individual web pages. Each instance of
the SiteVisitor class is a thread that represents a different web page being visited.
Figure 6.8 SiteVisitor Class
Table 6.6 Detailed Description of the SiteVisitor Class
Class Visibility Extends Implements
SiteVisitor public Thread none
Attribute Visibility Type Other
MAX_THREADS private Integer static, final
crawlCounter public Integer static
currentDepth private Integer
debugSwitch public Boolean
fileName public String
fileWriter private FileWriter
keepProcessing public Boolean
library public KrestObjectLibrary
maxCrawl public Integer
maxDepth private Integer
minBacklinks public Integer
observer public CrawlerObserver
pageAddress public String
pageDatabase public Hashtable
pageMatches public Vector
pageToFetch public URL
pagesToCrawl public Integer
pagesVisited public Integer
printStream private BufferedWriter
searchString public String
threadCount public Integer
threadLimiter public ThreadController
threadList private ArrayList
Function Visibility Parameters Returns Actions
SiteVisitor public String, String, Integer, Integer, String, Integer void Constructor to set up the class
SiteVisitor public String, String, Integer, Integer, String, Integer, Integer void Constructor to set up the class
alreadyVisited private String Boolean Checks to see whether or not a page has already been visited
containsSearchString private String Boolean Checks to see whether or not a page contains the search string
extractHyperTextLinks private String Vector Extracts the links from the web page
getCrawlCount public void Integer Gets the number of pages crawled
getMatches public void Vector Gets the matching web pages
getQueueCount public void Integer Gets the number of pages in the queue
getThreadCount public void Integer Gets the number of SiteVisitor threads running
loadPage private URL String Gets the text of the given URL
markAsVisited private String void Marks that the page has been visited so that it is not crawled again
resetCrawler public Boolean void Resets the crawl information
run public void void Starts the crawl
stopAllThreads public void void Stops all threads from crawling
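The way the visited-page check and the crawl limits interact can be illustrated with a small single-threaded sketch. It is not the KREST source: the class name CrawlLimitSketch, the crawl method, and the in-memory link map (standing in for loadPage and extractHyperTextLinks) are assumptions made for illustration, and KREST itself visits pages on separate SiteVisitor threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical, single-threaded illustration of crawl limits; not the KREST code. */
final class CrawlLimitSketch {

    /**
     * Breadth-first crawl over an in-memory link graph.
     * links maps a page address to the addresses it links to.
     * Returns the pages in the order they were visited.
     */
    static List<String> crawl(Map<String, List<String>> links, String start,
                              int maxDepth, int maxSites) {
        List<String> visitedOrder = new ArrayList<String>();
        Set<String> seen = new HashSet<String>();      // plays the role of markAsVisited
        Queue<String> queue = new ArrayDeque<String>();
        Queue<Integer> depths = new ArrayDeque<Integer>();

        queue.add(start);
        depths.add(0);
        seen.add(start);

        while (!queue.isEmpty() && visitedOrder.size() < maxSites) {
            String page = queue.remove();
            int depth = depths.remove();
            visitedOrder.add(page);

            if (depth >= maxDepth) {
                continue;                               // do not follow links any deeper
            }
            List<String> outgoing = links.get(page);
            if (outgoing == null) {
                continue;                               // page has no known links
            }
            for (String next : outgoing) {
                if (seen.add(next)) {                   // add() is false if already seen
                    queue.add(next);
                    depths.add(depth + 1);
                }
            }
        }
        return visitedOrder;
    }
}
```

Passing Integer.MAX_VALUE for the limit that is not in use mimics choosing one of the two limits and leaving the other unrestricted.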
2.3.1.6 ThreadController
The ThreadController class is responsible for ensuring that no more than the specified
maximum number of web crawling threads are running at any one time. The web
crawling threads are instances of the SiteVisitor class. The ThreadController
maintains tickets to keep track of which threads are allowed to run. If a thread has a
ticket, it is allowed to run; otherwise, it sleeps while waiting to grab a ticket.
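The ticket scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the KREST implementation: the class name TicketPoolSketch and the non-blocking tryGetTicket method are hypothetical, and the use of wait/notifyAll for the "sleep while waiting" behavior is an assumption about how the waiting could be done.

```java
import java.util.Arrays;

/** Hypothetical ticket pool limiting how many threads may run at once. */
final class TicketPoolSketch {
    // inUse[i] is true while ticket i is held by some thread
    private final boolean[] inUse;

    TicketPoolSketch(int maxCrowdSize) {
        if (maxCrowdSize <= 0) {
            throw new IllegalArgumentException("maxCrowdSize must be positive");
        }
        inUse = new boolean[maxCrowdSize];
    }

    /** Returns the index of the first free ticket, or -1 if all are taken. */
    private int findFreeTicket() {
        for (int i = 0; i < inUse.length; i++) {
            if (!inUse[i]) {
                return i;
            }
        }
        return -1;
    }

    /** Blocks the calling thread until a ticket is free, then claims it. */
    synchronized int getTicket() throws InterruptedException {
        int ticket;
        while ((ticket = findFreeTicket()) < 0) {
            wait();                       // woken when a ticket is returned
        }
        inUse[ticket] = true;
        return ticket;
    }

    /** Claims a ticket if one is free; returns -1 instead of blocking. */
    synchronized int tryGetTicket() {
        int ticket = findFreeTicket();
        if (ticket >= 0) {
            inUse[ticket] = true;
        }
        return ticket;
    }

    /** Hands one ticket back and wakes any waiting threads. */
    synchronized void returnTicket(int ticket) {
        inUse[ticket] = false;
        notifyAll();
    }

    /** Hands every ticket back, for example when a crawl is stopped. */
    synchronized void returnAllTickets() {
        Arrays.fill(inUse, false);
        notifyAll();
    }
}
```

A crawling thread would call getTicket before fetching a page and returnTicket in a finally block afterwards, so a ticket is released even if the fetch fails.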
Figure 6.9 ThreadController Class
Table 6.7 Detailed Description of the ThreadController Class
Class Visibility Extends Implements
ThreadController public none none
Attribute Visibility Type Other
crowdSize public Integer
maxCrowdSize public Integer
ticketDatabase public Integer []
Function Visibility Parameters Returns Actions
ThreadController public Integer void Constructor which sets up the class
findFreeTicket protected void Integer Finds the first available free ticket
getTicket public void Integer Grabs a ticket
returnAllTickets public void void Returns all tickets to the database
returnTicket public Integer void Returns a specific ticket to the database
2.3.1.7 HTTPReader
The HTTPReader class is responsible for downloading the text of a given web page.
If the given web page does not exist, it will throw an exception.
Figure 6.10 HTTPReader Class
Table 6.8 Detailed Description of the HTTPReader Class
Class Visibility Extends Implements
HTTPReader public none none
Attribute Visibility Type Other
HTTP_PORT public Integer final, static
in public DataInputStream
Function Visibility Parameters Returns Actions
checkRobotExclusionaryProtocol private URL Boolean Checks to see if crawling the page is allowed
downloadWWWPage public URL String Grabs the text of the web page at the given URL
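A check of the kind performed by checkRobotExclusionaryProtocol might look like the sketch below. It is a simplified illustration, not the KREST code: the class name RobotsSketch and the isAllowed method are hypothetical, the robots.txt text is passed in as a string rather than downloaded, and only "User-agent: *" groups with prefix "Disallow" rules are handled (no Allow lines, wildcards, or crawl delays).

```java
/** Hypothetical, simplified robots.txt check; handles only "User-agent: *" rules. */
final class RobotsSketch {

    /**
     * Returns false if the given path starts with a Disallow prefix listed
     * under "User-agent: *" in the supplied robots.txt text, true otherwise.
     */
    static boolean isAllowed(String robotsTxt, String path) {
        boolean appliesToAllAgents = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);      // strip trailing comment
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                            // blank or malformed line
            }
            String field = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (field.equalsIgnoreCase("User-agent")) {
                appliesToAllAgents = value.equals("*");
            } else if (field.equalsIgnoreCase("Disallow") && appliesToAllAgents) {
                // An empty Disallow value means nothing is disallowed.
                if (!value.isEmpty() && path.startsWith(value)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```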
2.3.1.8 WebSearcher
The WebSearcher class is responsible for setting up everything needed for a web
search, and starting the process to do it.
Figure 6.11 WebSearcher Class
Table 6.9 Detailed Description of the WebSearcher Class
Class Visibility Extends Implements
WebSearcher public none none
Attribute Visibility Type Other
debugSwitch public Boolean
matches public ArrayList
searchString public String
searchStrings public ArrayList
view public KrestView
Function Visibility Parameters Returns Actions
WebSearcher public KrestView void Constructor which sets up the class
beginSearch public String Integer Kicks off the search for web pages that contain the search strings
determineMatches private ArrayList ArrayList Finds all matches in the web page list
2.3.1.9 EntitySearcher
The EntitySearcher class is responsible for setting up everything needed for an entity
search, and starting the process to do it.
Figure 6.12 EntitySearcher Class
Table 6.10 Detailed Description of the EntitySearcher Class
Class Visibility Extends Implements
EntitySearcher public none none
Attribute Visibility Type Other
entityString public String
library public KrestObjectLibrary
matches public ArrayList
searchString public String
searchStrings public ArrayList
view public KrestView
Function Visibility Parameters Returns Actions
EntitySearcher public KrestView void Constructor that sets up the class
beginSearch public String Integer Kicks off the search for entities
determineEmailMatches private ArrayList ArrayList Finds all email entities in the matching web pages
determineFaxMatches private ArrayList ArrayList Finds all fax entities in the matching web pages
determineOverarchingMatches private ArrayList ArrayList Finds all entities in the matching web pages
determinePhoneMatches private ArrayList ArrayList Finds all phone number entities in the matching web pages
determineStreetAddressMatches private ArrayList ArrayList Finds all street address entities in the matching web pages
determineZipMatches private ArrayList ArrayList Finds all zip code entities in the matching web pages
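One plausible way to find entities such as email addresses, phone numbers, and zip codes in page text is with regular expressions. The sketch below is illustrative only: the class name EntityPatternSketch, the findAll helper, and the three patterns (which assume simple US-style formats) are assumptions, not the patterns used by KREST.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical entity patterns; the real KREST patterns may differ. */
final class EntityPatternSketch {
    // Simplified email pattern: local part, '@', domain, dot, top-level domain
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // US-style phone number with area code, e.g. (785) 555-0100 or 785-555-0100
    static final Pattern PHONE =
            Pattern.compile("\\(?(\\d{3})\\)?[-. ]?(\\d{3})[-. ](\\d{4})");

    // Five-digit zip code with an optional four-digit extension
    static final Pattern ZIP =
            Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");

    /** Returns every substring of pageText that matches the given pattern. */
    static List<String> findAll(Pattern pattern, String pageText) {
        List<String> found = new ArrayList<String>();
        Matcher matcher = pattern.matcher(pageText);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }
}
```

Patterns this loose will produce false positives on real pages (any five-digit number looks like a zip code), which is one reason ranking the results by how often an entity occurs is useful.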
2.4 View Package
Figure 6.13 View Package
2.4.1 Class Description
2.4.1.1 KrestView
The KrestView class is an abstract class that can be implemented by the
CrawlerObserver, the SearchObserver, and the EntityObserver classes. It is used to
update the display based on changes from the model.
Figure 6.14 KrestView Class
Table 6.11 Detailed Description of the KrestView Class
Class Visibility Extends Implements
KrestView public none none
Attribute Visibility Type Other
crawler private CrawlerObserver
entity private EntityObserver
search private SearchObserver
Function Visibility Parameters Returns Actions
KrestView public JTable, JTextField, JTable void Constructor to set up the view
addPageToSearchTable public String void Adds a new page to the search table
updateCurrentCount public Integer void Updates the number of crawled web pages
updateCurrentPage public String void Updates the current web page being crawled
updateCurrentProgress public Integer, Integer void Updates the current progress
updateEntitySearchResults public KrestObjectLibrary, String void Updates the screen with the entity search matches found
updateSitesToCrawl public Integer void Updates the number of sites left to crawl
updateWebSearchResults public ArrayList void Updates the screen with the web search matches found
2.4.1.2 CrawlerObserver
The CrawlerObserver class is responsible for updating the screen when the model
changes due to web crawling.
Figure 6.15 CrawlerObserver Class
Table 6.12 Detailed Description of the CrawlerObserver Class
Class Visibility Extends Implements
CrawlerObserver public none none
Attribute Visibility Type Other
crawlButton public JButton static
currentCount public JTextField static
currentPage public JTextField static
currentProgress public JProgressBar static
krestFrame public JFrame static
matchingPagesTable public JTable static
maxToCrawl public JComboBox static
sitesToCrawl public JTextField static
Function Visibility Parameters Returns Actions
CrawlerObserver public void Default constructor to set up the view
CrawlerObserver public JTextField, JTextField, JProgressBar, JComboBox, JTable, JFrame, JButton void Class constructor
addPageToSearchTable public String void Adds a new page to the crawled pages table
updateCurrentCount public Integer void Updates the current number of crawled pages
updateCurrentPage public String void Updates the current page being crawled
updateCurrentProgress public Integer, Integer void Updates the progress bar
updateSitesToCrawl public Integer void Updates the number of sites left to crawl
2.4.1.3 SearchObserver
The SearchObserver class is responsible for updating the screen when the model
changes due to web searching.
Figure 6.16 SearchObserver Class
Table 6.13 Detailed Description of the SearchObserver Class
Class Visibility Extends Implements
SearchObserver public none none
Attribute Visibility Type Other
matchingPagesTable public JTable
minBacklinksField public JTextField
Function Visibility Parameters Returns Actions
SearchObserver public JTable, JTextField void Constructor to set up the class
removeMatchesBelowBacklinkCount private ArrayList ArrayList Removes the matching web pages without the requisite number of back links
sortListByBacklinkCount private ArrayList ArrayList Sorts the matching web pages by decreasing number of back links
updateWebSearchResults public ArrayList void Updates the matching web pages found
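The filtering and ranking steps described in the table can be sketched as below. The method names follow the table, but the class name BacklinkRankingSketch, the nested Page type, and the method bodies are assumptions made for illustration rather than the KREST source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter-then-sort over matching pages, keyed on back-link count. */
final class BacklinkRankingSketch {

    /** Minimal stand-in for a matching web page. */
    static final class Page {
        final String url;
        final int backlinkCount;

        Page(String url, int backlinkCount) {
            this.url = url;
            this.backlinkCount = backlinkCount;
        }
    }

    /** Keeps only the pages with at least minBacklinks back links. */
    static List<Page> removeMatchesBelowBacklinkCount(List<Page> matches, int minBacklinks) {
        List<Page> kept = new ArrayList<Page>();
        for (Page page : matches) {
            if (page.backlinkCount >= minBacklinks) {
                kept.add(page);
            }
        }
        return kept;
    }

    /** Returns a copy of the list sorted by decreasing back-link count. */
    static List<Page> sortListByBacklinkCount(List<Page> matches) {
        List<Page> sorted = new ArrayList<Page>(matches);
        sorted.sort((a, b) -> Integer.compare(b.backlinkCount, a.backlinkCount));
        return sorted;
    }
}
```

Both methods return new lists and leave their input untouched, so the unfiltered matches remain available if the user changes the minimum back-link value.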
2.4.1.4 EntityObserver
The EntityObserver class is responsible for updating the screen when the model
changes due to entity searching.
Figure 6.17 EntityObserver Class
Table 6.14 Detailed Description of the EntityObserver Class
Class Visibility Extends Implements
EntityObserver public none none
Attribute Visibility Type Other
matchingEntitiesTable public JTable
Function Visibility Parameters Returns Actions
EntityObserver public JTable void Constructor to set up the class
sortListByMatchSize private ArrayList ArrayList Sorts the entities found by the number of pages they were found on
updateEntitySearchResults public KrestObjectLibrary, String void Updates the matching entities found
2.5 Model Package
Figure 6.18 Model Package
2.5.1 Class Description
2.5.1.1 KrestModel
The KrestModel class is responsible for holding the current KrestObjectLibrary
object, and making the appropriate pieces available to other classes.
Figure 6.19 KrestModel Class
Table 6.15 Detailed Description of the KrestModel Class
Class Visibility Extends Implements
KrestModel public none none
Attribute Visibility Type Other
library private KrestObjectLibrary static
name private String
Function Visibility Parameters Returns Actions
KrestModel public String void Constructor to set up the class
addObject public WebObject void Adds a new object to the model
findDataByName public String WebObject Find a specific object by name in the model
findObjectsByType public Integer ArrayList Find all objects of a specified type in the model
getData public void Enumeration Gets all of the objects in the model
getName public void String Gets the name of the model
removeObject public WebObject void Remove a specific object from the database
setName public String void Sets the name of the model
2.5.1.2 KrestObjectLibrary
The KrestObjectLibrary class is responsible for holding onto all created
WebObjects.
Figure 6.20 KrestObjectLibrary Class
Table 6.16 Detailed Description of the KrestObjectLibrary Class
Class Visibility Extends Implements
KrestObjectLibrary public none none
Attribute Visibility Type Other
objects public Hashtable static
Function Visibility Parameters Returns Actions
KrestObjectLibrary public void void Constructor to set up the class
findObjectByName public String WebObject Find a specific object in the database
findObjectsByType public Integer ArrayList Find all objects of a specified type in the database
getAllObjects public void Enumeration Gets all of the objects in the database
getKeys public void Enumeration Gets all of the keys of the database
removeObject public WebObject void Remove a specific object from the database
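A name-keyed library of this shape can be sketched with a Hashtable, as the attribute table suggests. The sketch is hypothetical: the class name ObjectLibrarySketch, the nested Item type (standing in for WebObject), the addObject method, and the integer type codes are assumptions, since the report does not state how object types are encoded.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

/** Hypothetical name-keyed object library backed by a Hashtable. */
final class ObjectLibrarySketch {
    // Type codes are illustrative assumptions, not KREST constants.
    static final int TYPE_WEBPAGE = 0;
    static final int TYPE_ENTITY = 1;

    /** Minimal stand-in for a WebObject: a name plus a type code. */
    static final class Item {
        final String name;
        final int type;

        Item(String name, int type) {
            this.name = name;
            this.type = type;
        }
    }

    private final Map<String, Item> objects = new Hashtable<String, Item>();

    /** Stores the item under its name, replacing any previous item with that name. */
    void addObject(Item item) {
        objects.put(item.name, item);
    }

    /** Returns the item with the given name, or null if there is none. */
    Item findObjectByName(String name) {
        return objects.get(name);
    }

    /** Returns every stored item whose type code equals the given one. */
    List<Item> findObjectsByType(int type) {
        List<Item> result = new ArrayList<Item>();
        for (Item item : objects.values()) {
            if (item.type == type) {
                result.add(item);
            }
        }
        return result;
    }

    /** Removes the item stored under the given item's name, if present. */
    void removeObject(Item item) {
        objects.remove(item.name);
    }
}
```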
2.5.1.3 WebObject
The WebObject class is an abstract class that is extended by both the
Webpage class and the KrestEntity class. It is used to hold data found based on
web crawls and web or entity searches.
Figure 6.21 WebObject Class
Table 6.17 Detailed Description of the WebObject Class
Class Visibility Extends Implements
WebObject public none none
Attribute Visibility Type Other
name protected String
Function Visibility Parameters Returns Actions
getName public void String Grabs the name of the object
setName public String void Gives a new name to the object
2.5.1.4 Webpage
The Webpage class is responsible for holding onto information about a single web
site. Each web site explored will have its own Webpage instance.
Figure 6.22 Webpage Class
Table 6.18 Detailed Description of the Webpage Class
Class Visibility Extends Implements
Webpage public WebObject none
Attribute Visibility Type Other
backlinkCount private Integer
backlinks private ArrayList
pageText private String
Function Visibility Parameters Returns Actions
Webpage public String void Constructor that sets up the class
addNewBacklink public String void Adds the name of a webpage that links to this page
getBacklinkCount public void Integer Grabs the number of pages that link to this one
getText public void String Grabs the text of this object
setText public String void Sets the text of this object
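The relationship between WebObject and Webpage, and the back-link bookkeeping, can be sketched as follows. The class names WebObjectSketch and WebpageSketch are hypothetical, and two details are assumptions rather than documented behavior: duplicate back links from the same page are ignored, and the count is derived from the list instead of being kept in a separate backlinkCount field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the abstract WebObject base class. */
abstract class WebObjectSketch {
    protected String name;

    WebObjectSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }
}

/** Hypothetical stand-in for Webpage: page text plus the pages linking to it. */
final class WebpageSketch extends WebObjectSketch {
    private final List<String> backlinks = new ArrayList<String>();
    private String pageText = "";

    WebpageSketch(String url) {
        super(url);
    }

    /** Records a page that links here; repeated links from one page count once. */
    void addNewBacklink(String linkingPage) {
        if (!backlinks.contains(linkingPage)) {
            backlinks.add(linkingPage);
        }
    }

    /** Number of distinct pages recorded as linking to this page. */
    int getBacklinkCount() {
        return backlinks.size();
    }

    String getText() {
        return pageText;
    }

    void setText(String pageText) {
        this.pageText = pageText;
    }
}
```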
2.5.1.5 KrestEntity
The KrestEntity class is an abstract class that is extended by the
AddressEntity, EmailEntity, FaxEntity, PhoneEntity, ZipEntity, and
OverarchingEntity classes. It is used to hold data found based on entity searches.
Figure 6.23 KrestEntity Class
Table 6.19 Detailed Description of the KrestEntity Class
Class Visibility Extends Implements
KrestEntity public WebObject none
Attribute Visibility Type Other
entityName protected String
entityPattern protected String
numberOfOccurrences public Integer
occurrenceList protected ArrayList
Function Visibility Parameters Returns Actions
addOccurrence public String void Adds a new webpage that contains the entity
getAllOccurrences public void ArrayList Grabs all instances of the entity from the database
getName public void String Grabs the name of this object
setName public String void Sets the name of this object
2.5.1.6 AddressEntity
The AddressEntity class is responsible for holding onto information about a single
address entity. Each address found during an entity search will have its own
instance of this class.
Figure 6.24 AddressEntity Class
Table 6.20 Detailed Description of the AddressEntity Class
Class Visibility Extends Implements
AddressEntity public KrestEntity none
Attribute Visibility Type Other
cityString private String
stateString private String
streetAddress private String
Function Visibility Parameters Returns Actions
AddressEntity public String, String, String void Constructor to set up the class
getCity public void String Gets the city
getState public void String Gets the state
getStreet public void String Gets the street
setCity public String void Sets the new city of the entity
setState public String void Sets the new state of the entity
setStreet public String void Sets the new street of the entity
2.5.1.7 EmailEntity
The EmailEntity class is responsible for holding onto information about a single
email entity. Each email address found during an entity search will have its own
instance of this class.
Figure 6.25 EmailEntity Class
Table 6.21 Detailed Description of the EmailEntity Class
Class Visibility Extends Implements
EmailEntity public KrestEntity none
Attribute Visibility Type Other
emailAddress private String
Function Visibility Parameters Returns Actions
EmailEntity public String void Constructor to set up the class
getEmailAddress public void String Grabs the email address associated with the object
setEmailAddress public String void Gives a new email address to the object
2.5.1.8 FaxEntity
The FaxEntity class is responsible for holding onto information about a single fax
entity. Each fax number found during an entity search will have its own instance of
this class.
Figure 6.26 FaxEntity Class
Table 6.22 Detailed Description of the FaxEntity Class
Class Visibility Extends Implements
FaxEntity public KrestEntity none
Attribute Visibility Type Other
areaCode private String
faxNumber private String
Function Visibility Parameters Returns Actions
FaxEntity public String, String void Constructor to set up the class
getAreaCode public void String Gets the area code of the fax number
getFaxNumber public void String Gets the fax number without the area code
getFullFaxNumber public void String Gets the fax number with the area code
setAreaCode public String void Sets the new area code of the entity
setPhoneNumber public String void Sets the new number of the entity
2.5.1.9 PhoneEntity
The PhoneEntity class is responsible for holding onto information about a single
phone entity. Each phone number found during an entity search will have its own
instance of this class.
Figure 6.27 PhoneEntity Class
Table 6.23 Detailed Description of the PhoneEntity Class
Class Visibility Extends Implements
PhoneEntity public KrestEntity none
Attribute Visibility Type Other
areaCode private String
phoneNumber private String
Function Visibility Parameters Returns Actions
PhoneEntity public String, String void Constructor to set up the class
getAreaAndPhoneNumber public void String Gets the phone number with the area code
getAreaCode public void String Gets the area code of the phone number
getPhoneNumber public void String Gets the phone number without the area code
setAreaCode public String void Sets the new area code of the entity
setPhoneNumber public String void Sets the new number of the entity
2.5.1.10 ZipEntity
The ZipEntity class is responsible for holding onto information about a single zip
entity. Each zip code found during an entity search will have its own instance of
this class.
Figure 6.28 ZipEntity Class
Table 6.24 Detailed Description of the ZipEntity Class
Class Visibility Extends Implements
ZipEntity public KrestEntity none
Attribute Visibility Type Other
zipCode private String
Function Visibility Parameters Returns Actions
ZipEntity public String void Constructor to set up the class
getZipCode public void String Grabs the zip code associated with the object
setZipCode public String void Gives a new zip code to the object
2.5.1.11 OverarchingEntity
The OverarchingEntity class is responsible for holding onto information about all
entity types. Each street address, email address, fax number, phone number, and
zip code found during an overarching entity search will have its own instance of
this class.
Figure 6.29 OverarchingEntity Class
Table 6.25 Detailed Description of the OverarchingEntity Class
Class Visibility Extends Implements
OverarchingEntity public KrestEntity none
Attribute Visibility Type Other
cityString private String
emailAddress private String
faxAreaCode private String
faxNumber private String
phoneAreaCode private String
phoneNumber private String
stateString private String
streetString private String
zipCode private String
Function Visibility Parameters Returns Actions
OverarchingEntity public String, String, String, String, String, String, String, String, String void Constructor to set up the class
getAreaAndPhoneNumber public void String Gets the area code and phone number
getEmailAddress public void String Gets the email address
getCity public void String Gets the city
getFaxAreaAndNumber public void String Gets the fax area code and fax number
getFaxAreaCode public void String Gets the fax area code
getFaxNumber public void String Gets the fax number without the area code
getPhoneAreaCode public void String Gets the phone area code
getPhoneNumber public void String Gets the phone number without the area code
getState public void String Gets the state
getStreetAddress public void String Gets the street
getZipCode public void String Gets the zip code
setEmailAddress public String void Sets the new email address of the entity
setCity public String void Sets the new city of the entity
setFaxAreaCode public String void Sets the new fax area code of the entity
setFaxNumber public String void Sets the new fax number of the entity
setPhoneAreaCode public String void Sets the new phone area code of the entity
setPhoneNumber public String void Sets the new phone number of the entity
setState public String void Sets the new state of the entity
setStreetAddress public String void Sets the new street of the entity
setZipCode public String void Sets the new zip code of the entity
CHAPTER 7 - Test Plan
1 Test Plan Identifier
KREST-Validation-V-1.0
2 Introduction
This document describes the methods that will be used to test the KDD-Research Entity
Search Tool (KREST). The project allows the user to perform a web crawl, to perform a
basic web search over the crawled pages, and to perform an entity search over the crawled
pages. Each task will be treated as a separate module of the system and will be tested with
respect to the associated requirements described in the vision document.
3 Test Items
The following items will be tested:
• General Application Related Items
• Web Crawler Items
• Web Search Items
• Entity Search Items
• Reproduction of results similar to those reported in [2], using the same data sets.
4 Tested Features
All features listed below will be tested. These features can also be found in the Vision
Document.
4.1 General Application Related Items
• ARI 100 – The program shall provide a GUI for user interaction.
• ARI 101 – The application shall be executable in a single step (e.g. without having
to perform any setup steps).
• ARI 102 – The application shall have a menu bar that contains at a minimum: a File
menu and a Help menu.
• ARI 103 – The application shall allow the user to load a data set of web pages.
• ARI 104 – The application shall allow the user to save entity search results.
• ARI 105 – The application's Help menu shall contain at a minimum an About menu
item.
• ARI 106 – The application's menu bar shall contain shortcut keys.
• ARI 107 – The application shall be platform independent.
• ARI 108 – The application shall be able to be minimized.
• ARI 109 – The application shall be able to be closed without having to perform a
Control-C from the command line.
4.2 Web Crawler Items
• WCRI 100 – The user shall have the ability to perform a web crawl based on a
starting website.
• WCRI 101 – The user shall be allowed to specify the starting website (if none is
specified, http://www.cis.ksu.edu will be used).
• WCRI 102 – The user shall have the ability to specify the maximum depth of the
web crawl.
• WCRI 103 – The user shall have the ability to specify a log file in which to save
the results of the crawl.
• WCRI 104 – The user shall be allowed to specify the maximum number of
websites to crawl before stopping.
• WCRI 105 – The user shall be allowed to stop the crawl at any time before it
finishes.
• WCRI 106 – The user shall be notified when the crawl is complete.
• WCRI 107 – The user shall be kept apprised of the total number of pages left to
crawl.
• WCRI 108 – The user shall be apprised of the total number of pages crawled.
4.3 Web Search Items
• WSRI 100 – The user shall be allowed to search over previously crawled web
pages.
• WSRI 101 – The user shall have a box to enter search terms.
• WSRI 102 – The user shall be allowed to specify the minimum number of back-
links required for a page containing the search term to be considered a match.
• WSRI 103 – The URLs that match the search terms shall be sorted in order of
number of back-links.
• WSRI 104 – The URLs that match the search terms shall be displayed in a
scrollable text box.
4.4 Entity Search Items
• ESRI 100 – The user shall have the ability to search for entities from previously
crawled websites.
• ESRI 101 – The user shall have a box to enter search terms.
• ESRI 102 – There shall be entities for, at a minimum: email address, phone number,
fax number, street address, and zip code.
• ESRI 103 – There shall be an overarching entity that gathers all contact info.
• ESRI 104 – The entity search results shall be ranked based on highest score.
• ESRI 105 – The user shall be allowed to specify search terms in addition to entity
terms.
• ESRI 106 – The entities that match the search terms shall be displayed in a
scrollable text box.
5 Features not to be Tested
The following two requirements will not be tested; instead, they will be verified by
inspection of the code.
• WCRI 109 – The crawler shall follow the robot exclusionary protocol.
• WCRI 110 – The crawler shall use multiple threads to avoid putting too much
stress on an individual web host.
6 Approach
Testing will be performed by running separate series of actions using KREST. The
sequences of actions will be defined in separate test cases, which can be found in Section
10. Each test case will list the action to be performed, the expected result, and the features
or requirements that map to that step.
7 Item Pass / Fail Criteria
Each test case will be considered successful if it meets the requirements mentioned in the
Vision document. A test case will fail if any requirement is not met as described.
8 Suspension Criteria and Resumption Requirements
8.1 Suspension Criteria
In the event of a test case failure, the running of the test case shall be halted. The
failure shall be logged in the Test Log, along with the likely cause and suggested
solutions to the problem.
8.2 Resumption Requirements
After a test case failure, the test case shall be rerun from the beginning once the
failure has been logged, its cause identified, and a solution implemented. Independent
test cases may continue to be executed in parallel with the effort to fix problems
encountered in other areas.
9 Test Deliverables
A Test Log document will be maintained during testing and delivered when
testing is complete. The Test Log will record the time and date of all test
cases run, as well as whether the tests passed or failed. In the event of a
failed test, the Test Log will also contain the reason for the failure as well as suggested
solutions.
10 Testing Tasks
10.1 Test Case 1: Application Items
This test case tests the basic application items.
Prerequisites: None.
Table 7.1 Test Case 1
Step # Action Performed Expected Outcome Requirements Met Pass/Fail
1 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the KREST
program starts up, with the
Web Crawler tab opened.
• Observe that the menu bar
contains a File menu and a
Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
2 Tester types Alt-H | A. • Observe that an About
dialog is opened.
ARI 105
ARI 106
3 Tester selects the “OK”
button from the About
dialog.
• Observe that the About
dialog closes.
4 Tester minimizes the
KREST application.
• Observe that the KREST
application is minimized.
ARI 108
5 Tester restores the
KREST application.
• Observe that the KREST
application is restored.
ARI 108
6 Tester types Alt-F | X. • Observe that the KREST
application closes.
ARI 109
7 Tester starts KREST by
running the .jar file on a
CIS Linux or Unix
machine.
• Observe that the KREST
program starts up, with the
Web Crawler tab opened.
• Observe that the menu bar
contains a File menu and a
Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 107
ARI 102
ARI 106
8 Tester types Alt-F | X. • Observe that the KREST
application closes.
ARI 109
10.2 Test Case 2: Web Crawler Items
This test case tests the web crawler requirements.
Prerequisites: Test Case 1 must have passed.
Table 7.2 Test Case 2
Step # Action Performed Expected Outcome Requirements Met Pass/Fail
1 Tester starts KREST by
double clicking on the .jar
file on a Windows PC.
• Observe that the KREST
program starts up, with
the Web Crawler tab
opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
2 Tester enters a valid
website to crawl in the
“Start Crawl At:” field.
• Field becomes populated. WCRI 100
WCRI 101
3 Tester selects the radio
button next to “Max
Depth to Explore”
• Observe that the radio
button next to “Max
Depth to Explore”
becomes selected.
• Observe that the radio
button next to “Max Sites
to Explore” becomes
deselected.
4 Tester selects the radio
button next to “Max Sites
to Explore”.
• Observe that the radio
button next to “Max Sites
to Explore” becomes
selected.
• Observe that the radio
button next to “Max
Depth to Explore”
becomes deselected.
5 Tester enters a number
between 10 and 25 in the
“Max Sites to Explore”
field.
• Observe that the field is
updated.
WCRI 104
6 Tester presses the “Begin
Crawl” button.
• Observe that the “Begin
Crawl” button is renamed
to a “Stop Crawl” button.
• Observe that the “Reset
Crawler” button becomes
sensitized.
• Observe that the
“Currently Crawling”
field continuously
updates with the current
website being explored.
• Observe that the
“Crawled URLs” field is
updated with the current
number of URLs that has
been crawled.
WCRI 108
• Observe that the “Sites in
the Queue” field is
updated with the number
of websites that could
still be explored.
• Observe that the “Current
Progress” progress bar is
updated based on the
number of pages left to
crawl.
• Observe that when the
“Current Progress”
progress bar reaches
100%, a dialog appears
notifying the operator
that the crawl is
complete.
• When the crawl is
complete, observe that
the “Stop Crawl” button
returns to a “Begin
Crawl” button.
• Observe that the Web
Search and Entity Search
tabs become sensitized.
WCRI 107
WCRI 106
7 Tester selects the “OK”
button from the dialog.
• Observe that the dialog
closes.
8 Tester presses the “Reset
Crawler” button.
• Observe that a
confirmation dialog
appears.
9 Tester presses the
“Cancel” button from the
confirmation dialog.
• Observe the confirmation
dialog disappears.
• Observe there are no
changes to the form.
10 Tester presses the “Reset
Crawler” button.
• Observe that a
confirmation dialog
appears.
11 Tester presses the “OK”
button from the
confirmation dialog.
• Observe the confirmation
dialog disappears.
• Observe the “Reset
Crawler” button is
desensitized.
• Observe the “Begin
Crawl” button is labeled
appropriately.
• Observe the “Currently
Crawling” field is empty.
• Observe the “Crawled
URLs” field is set to 0.
• Observe the “Sites in the
Queue” field is set to 0.
• Observe the “Crawl
Progress” progress bar is
reset.
12 Tester selects the
checkbox next to the
“Log File to Use” field.
• Observe the checkbox
becomes selected.
• Observe the field
becomes enabled.
WCRI 103
13 Tester enters a valid
filename where they
would like the log results
saved (or uses the default
file name).
• Observe the field is
updated.
14 Tester selects the radio
button next to “Max
Depth to Explore”.
• Observe that the radio
button next to “Max
Depth to Explore”
becomes selected.
• Observe that the radio
button next to “Max Sites
to Explore” becomes
deselected.
15 Tester enters a value
between 2 and 5 in the
“Max Depth to Explore”
field.
• Observe the field is
updated.
WCRI 102
16 Tester presses the “Begin
Crawl” button.
• Observe that the “Begin
Crawl” button is renamed
to a “Stop Crawl” button.
• Observe that the “Reset
Crawler” button becomes
sensitized.
• Observe that the
“Currently Crawling”
field continuously
updates with the current
website being explored.
• Observe that the
“Crawled URLs” field is
updated with the current
number of URLs that has
been crawled.
WCRI 108
• Observe that the “Sites in
the Queue” field is
updated with the number
of websites that could
still be explored.
• Observe that the “Current
Progress” progress bar is
updated based on the
number of pages left to
crawl.
• Before the crawl
completes, move to the
next step.
WCRI 107
17 Tester presses the “Stop
Crawl” button.
• Observe that the text of
the Button returns to
“Begin Crawl”.
• Observe that the crawl is
halted.
WCRI 105
18 Tester presses the “Reset
Crawler” button.
• Observe that a
confirmation dialog
appears.
19 Tester presses the “OK”
button from the
confirmation dialog.
• Observe the confirmation
dialog disappears.
• Observe the “Reset
Crawler” button is
desensitized.
• Observe the “Begin
Crawl” button is labeled
appropriately.
• Observe the “Currently
Crawling” field is empty.
• Observe the “Crawled
URLs” field is set to 0.
• Observe the “Sites in the
Queue” field is set to 0.
• Observe the “Crawl
Progress” progress bar is
reset.
20 Tester types Alt-F | X. • Observe that the KREST
application closes.
ARI 109
21 Tester opens the log file
that they specified for
crawling.
• Observe that the results
of the web crawl were
logged.
WCRI 103
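The two crawl limits exercised above (“Max Sites to Explore” and “Max Depth to Explore”) can be pictured with a small breadth-first sketch. This is not KREST's crawler: the class name `CrawlLimitSketch`, the method `crawl`, and the in-memory link map that stands in for fetched pages are all assumptions made for illustration. In the application the two limits are chosen with mutually exclusive radio buttons; the sketch simply accepts both and lets the caller pass a large value for the one that is not in use.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not KREST's crawler. Pages are simulated by an
// in-memory map from a URL to the URLs it links to, so nothing is fetched.
final class CrawlLimitSketch {

    // Breadth-first crawl from 'start'. Stops once 'maxSites' pages have been
    // crawled, and never follows links from pages at depth 'maxDepth'.
    static List<String> crawl(Map<String, List<String>> links,
                              String start, int maxSites, int maxDepth) {
        List<String> crawled = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();   // depth of each queued URL

        queue.add(start);
        depths.add(0);
        seen.add(start);

        while (!queue.isEmpty() && crawled.size() < maxSites) {
            String url = queue.remove();
            int depth = depths.remove();
            crawled.add(url);                         // the "Crawled URLs" count grows here

            if (depth == maxDepth) {
                continue;                             // depth limit reached: do not expand
            }
            for (String next : links.getOrDefault(url, List.of())) {
                if (seen.add(next)) {                 // enqueue each URL at most once
                    queue.add(next);                  // the "Sites in the Queue" count grows here
                    depths.add(depth + 1);
                }
            }
        }
        return crawled;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("d"),
            "c", List.of("d", "e"),
            "d", List.of("f"));

        // Limited by site count (depth limit effectively unused).
        System.out.println(crawl(links, "a", 2, 10));
        // Limited by depth (site limit effectively unused).
        System.out.println(crawl(links, "a", 100, 2));
    }
}
```

With the sample graph, the site-limited crawl visits only `a` and `b`, while the depth-limited crawl visits `a` through `e` and never reaches `f`, which sits three links away from the start page.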
10.3 Test Case 3: Web Search Items
This test case tests the web search requirements.
Prerequisites: Test Case 1 and Test Case 2 must both have passed.
Table 7.3 Test Case 3
Step
#
Action Performed Expected Outcome Requirements
Met
Pass/Fail
1 Tester starts KREST by
double clicking on the .jar
file on a Windows PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
2 Tester enters a valid
website to crawl in the
“Start Crawl At:” field.
• Field becomes
populated.
WCRI 100
WCRI 101
3 Tester enters a number
between 10 and 25 in the
“Max Sites to Explore”
field.
• Observe that the field is
updated.
WCRI 104
4 Tester presses the “Begin
Crawl” button.
• Observe that the “Begin
Crawl” button is
renamed to a “Stop
Crawl” button.
• Observe that the “Reset
Crawler” button
becomes sensitized.
• Observe that the
“Currently Crawling”
field continuously
updates with the current
website being explored.
• Observe that the
“Crawled URLs” field
is updated with the
current number of
URLs that has been
crawled.
• Observe that the “Sites
in the Queue” field is
updated with the
number of websites that
could still be explored.
WCRI 108
• Observe that the
“Current Progress”
progress bar is updated
based on the number of
pages left to crawl.
• Observe that when the
“Current Progress”
progress bar reaches
100%, a dialog appears
notifying the operator
that the crawl is
complete.
• When the crawl is
complete, observe that
the “Stop Crawl” button
returns to a “Begin
Crawl” button.
• Observe that the Web
Search and Entity
Search tabs become
sensitized.
WCRI 107
WCRI 106
5 Tester selects the “OK”
button from the dialog.
• Observe that the dialog
closes.
6 Tester selects the Web
Search tab.
• Observe that the Web
Search tab is now
raised.
7 Tester enters a string to
search for in the “Search
String” field.
• Observe that the field is
updated.
WSRI 100
WSRI 101
8 Tester enters a value
between 1 and 3 in the
“Min # of Backlinks”
field.
• Observe that the field is
updated.
WSRI 102
9 Tester presses the “Begin
Search” button.
• Observe that URLs that
contain the search string
in their text are listed in
the Search Results
scrollable box.
• Observe that the URLs
are sorted by decreasing
number of backlinks.
WSRI 104
WSRI 103
10 Tester enters a new string
to search for in the
“Search String” field.
• Observe that the field is
updated.
WSRI 100
WSRI 101
11 Tester presses the “Begin
Search” button.
• Observe that URLs that
contain the search string
in their text are listed in
the Search Results
scrollable box.
• Observe that the URLs
are sorted by decreasing
number of backlinks.
WSRI 104
WSRI 103
12 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
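The expected outcome of the search steps above (pages containing the search string, filtered by the “Min # of Backlinks” value and listed by decreasing backlink count) can be sketched as follows. The `Page` class, its fields, and the plain case-sensitive substring match are assumptions made for illustration; the report does not describe how KREST stores pages or matches text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ordering checked in Test Case 3.
// The Page class and its fields are illustrative, not KREST's data model.
final class BacklinkSearchSketch {

    static final class Page {
        final String url;
        final String text;
        final int backlinks;

        Page(String url, String text, int backlinks) {
            this.url = url;
            this.text = text;
            this.backlinks = backlinks;
        }
    }

    // Returns the URLs of pages whose text contains 'query' and that have at
    // least 'minBacklinks' backlinks, ordered by decreasing backlink count.
    static List<String> search(List<Page> pages, String query, int minBacklinks) {
        List<Page> hits = new ArrayList<>();
        for (Page p : pages) {
            // Assumption: a simple case-sensitive substring match.
            if (p.text.contains(query) && p.backlinks >= minBacklinks) {
                hits.add(p);
            }
        }
        // List.sort is stable, so pages with equal counts keep their input order.
        hits.sort((a, b) -> Integer.compare(b.backlinks, a.backlinks));

        List<String> urls = new ArrayList<>();
        for (Page p : hits) {
            urls.add(p.url);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
            new Page("u1", "contact KSU", 2),
            new Page("u2", "KSU sports", 5),
            new Page("u3", "weather", 9),
            new Page("u4", "KSU news", 0));

        System.out.println(search(pages, "KSU", 1));
    }
}
```

In the sample data, `u3` is dropped because its text does not contain the query and `u4` is dropped by the backlink threshold, leaving `u2` ahead of `u1`.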
10.4 Test Case 4: Entity Search Items
This test case tests the entity search requirements.
Prerequisites: Test Case 1 and Test Case 2 must both have passed.
Table 7.4 Test Case 4
Step
#
Action Performed Expected Outcome Requirements
Met
Pass/Fail
1 Tester starts KREST by
double clicking on the .jar
file on a Windows PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
2 Tester enters a valid
website to crawl in the
“Start Crawl At:” field.
• Field becomes
populated.
WCRI 100
WCRI 101
3 Tester enters a number
between 10 and 25 in the
“Max Sites to Explore”
field.
• Observe that the field is
updated.
WCRI 104
4 Tester presses the “Begin
Crawl” button.
• Observe that the “Begin
Crawl” button is
renamed to a “Stop
Crawl” button.
• Observe that the “Reset
Crawler” button
becomes sensitized.
• Observe that the
“Currently Crawling”
field continuously
updates with the current
website being explored.
• Observe that the
“Crawled URLs” field
is updated with the
current number of URLs
that has been crawled.
• Observe that the “Sites
in the Queue” field is
updated with the
number of websites that
could still be explored.
• Observe that the
“Current Progress”
progress bar is updated
based on the number of
pages left to crawl.
• Observe that when the
“Current Progress”
progress bar reaches
100%, a dialog appears
notifying the operator
that the crawl is
complete.
• When the crawl is
complete, observe that
the “Stop Crawl” button
returns to a “Begin
Crawl” button.
• Observe that the Web
Search and Entity
Search tabs become
sensitized.
WCRI 108
WCRI 107
WCRI 106
5 Tester selects the “OK”
button from the dialog.
• Observe that the dialog
closes.
6 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
7 Tester enters a string to
search for in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
8 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102
9 Tester presses the “Begin
Search” button.
• Observe that email
addresses that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
ESRI 106
ESRI 104
10 Tester deletes the old
value in the “Search
String” field and enters a
new string to search for.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
11 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102
12 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
ESRI 106
ESRI 104
13 Tester deletes the old
value in the “Search
String” field and enters a
new string to search for.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
14 Tester adds “#fax”
without the quotes in the
“Search String” field to
search for the fax number
of the previous term.
• Observe that the field is
updated.
ESRI 102
15 Tester presses the “Begin
Search” button.
• Observe that fax
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
ESRI 106
ESRI 104
16 Tester deletes the old
value in the “Search
String” field and enters a
new string to search for.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
17 Tester adds “#address”
without the quotes in the
“Search String” field to
search for the street
address of the previous
term.
• Observe that the field is
updated.
ESRI 102
18 Tester presses the “Begin
Search” button.
• Observe that street
addresses that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
ESRI 106
ESRI 104
19 Tester deletes the old
value in the “Search
String” field and enters a
new string to search for.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
20 Tester adds “#zip”
without the quotes in the
“Search String” field to
search for the zip code of
the previous term.
• Observe that the field is
updated.
ESRI 102
21 Tester presses the “Begin
Search” button.
• Observe that zip codes
that were contained on
the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
ESRI 106
ESRI 104
22 Tester deletes the old
value in the “Search
String” field and enters a
new string to search for.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
23 Tester adds “#all” without
the quotes in the “Search
String” field to search for
all of the contact info
of the previous term.
• Observe that the field is
updated.
ESRI 103
24 Tester presses the “Begin
Search” button.
• Observe that all contact
info that was contained
on the same pages that
matched the search
string is listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
ESRI 106
ESRI 104
25 Tester types Alt-F | S,
saves the entity search
results, and verifies that
the data was saved.
• Observe that a file
dialog appears
ARI 104
26 Tester enters a valid file
name and selects the
‘Save’ button.
• Observe that the entity
search results were
saved to the specified
file.
27 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
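The entity queries used above pair ordinary keywords with a tag such as “#email”, and the expected results are the entities found on pages matching the keywords, ordered by how often each was found. A minimal sketch of that behavior for the “#email” tag is given below. The class `EntityTagSketch`, the method `searchEmails`, the deliberately simple e-mail pattern, and the exact-substring keyword match are all assumptions; the report does not specify how KREST parses queries or recognizes entities, and the addresses in the sample use the reserved example.org domain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of an "#email" entity query; not KREST's implementation.
final class EntityTagSketch {

    // Deliberately simple e-mail pattern; real addresses are more varied.
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Splits "keywords #email" into its keyword part, then collects e-mail
    // addresses from pages containing the keywords, most frequent first.
    static List<String> searchEmails(List<String> pages, String query) {
        int tagAt = query.indexOf("#email");
        if (tagAt < 0) {
            throw new IllegalArgumentException("query has no #email tag");
        }
        String keywords = query.substring(0, tagAt).trim();

        // LinkedHashMap keeps first-seen order, which breaks ties predictably.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String page : pages) {
            if (!page.contains(keywords)) {
                continue;                      // page does not match the search string
            }
            Matcher m = EMAIL.matcher(page);
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }

        List<String> result = new ArrayList<>(counts.keySet());
        // Stable sort by decreasing number of times found.
        result.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return result;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
            "Jane Doe can be reached at jane@example.org.",
            "Jane Doe (jane@example.org) works with bob@example.org.",
            "Unrelated page: carol@example.org");

        System.out.println(searchEmails(pages, "Jane Doe #email"));
    }
}
```

Other tags such as “#phone” or “#zip” would follow the same shape with a different pattern; counting occurrences across matching pages is what produces the "sorted based on max number of times found" ordering that the test steps check.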
10.5 Test Case 5: Reproducing the Results of [2]
This test case tests the ability of the entity searcher to reproduce the results of the entity
search project described in [2].
Prerequisites: Test Case 1 and Test Case 2 must both have passed. The twelve datasets,
which represent a sampling of the original dataset found in Tao Cheng’s entity search
work [2], must be available for use.
Table 7.5 Test Case 5
Step
#
Action Performed Expected Outcome Requirements
Met
Pass/Fail
1 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
2 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
3 Tester loads the
‘Test_Data_1.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
4 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
5 Tester enters “Citibank
Customer Service” in the
“Search String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
6 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102
7 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed in
the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 800-967-
2400 should be
contained in the
matches.
ESRI 106
ESRI 104
8 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
9 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
10 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
11 Tester loads the
‘Test_Data_2.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
12 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
13 Tester enters “New York
DMV” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
14 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102
15 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed in
the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 800-342-
5368 should be
contained in the
matches.
ESRI 106
ESRI 104
16 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
17 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
18 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
19 Tester loads the
‘Test_Data_3.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
20 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
21 Tester enters “Amazon
Customer Service” in the
“Search String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
22 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102
23 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed in
the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 800-201-
7575 should be
contained in the
matches.
ESRI 106
ESRI 104
24 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
25 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
26 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
27 Tester loads the
‘Test_Data_4.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
28 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
29 Tester enters “EBay
Customer Service” in the
“Search String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
30 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102
31 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed in
the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 888-749-
3229 should be
contained in the
matches.
ESRI 106
ESRI 104
32 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
33 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
34 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
35 Tester loads the
‘Test_Data_5.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
36 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
37 Tester enters “Thinkpad
Customer Service” in the
“Search String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
38 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102
39 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed in
the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 877-338-
4465 should be contained
in the matches.
ESRI 106
ESRI 104
40 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
41 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
42 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
43 Tester loads the
‘Test_Data_6.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
44 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
45 Tester enters “Illinois
IRS” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
46 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102
47 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed in
the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 800-829-
3676 should be
contained in the
matches.
ESRI 106
ESRI 104
48 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
49 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
50 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
51 Tester loads the
‘Test_Data_7.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
52 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
53 Tester enters “Barnes &
Noble Customer Service”
in the “Search String”
field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
54 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102
55 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed in
the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 800-422-
7717 should be
contained in the
matches.
ESRI 106
ESRI 104
56 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
57 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
58 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
59 Tester loads the
‘Test_Data_8.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
60 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
61 Tester enters “Bill
Gates” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
62 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102
63 Tester presses the “Begin
Search” button.
• Observe that email
addresses that occurred
on the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that
should be contained in
the matches.
ESRI 106
ESRI 104
64 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
65 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
66 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
67 Tester loads the
‘Test_Data_9.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
68 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
69 Tester enters “Oprah
Winfrey” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
70 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102
71 Tester presses the “Begin
Search” button.
• Observe that email
addresses that occurred
on the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that
[email protected] should
be contained in the
matches.
ESRI 106
ESRI 104
72 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
73 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
74 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
75 Tester loads the
‘Test_Data_10.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
76 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
77 Tester enters “Elvis
Presley” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
78 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102
79 Tester presses the “Begin
Search” button.
• Observe that email
addresses that occurred
on the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that
should be contained in
the matches.
ESRI 106
ESRI 104
80 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
81 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
82 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
83 Tester loads the
‘Test_Data_11.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
84 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
85 Tester enters “Larry
Page” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
86 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102
87 Tester presses the “Begin
Search” button.
• Observe that email
addresses that occurred
on the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that
should be contained in
the matches.
ESRI 106
ESRI 104
88 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
89 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File menu
and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
90 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103
91 Tester loads the
‘Test_Data_12.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103
92 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now raised.
93 Tester enters “Arnold
Schwarzenegger” in the
“Search String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
94 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102
95 Tester presses the “Begin
Search” button.
• Observe that email
addresses that occurred
on the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that
ov should be contained
in the matches.
ESRI 106
ESRI 104
96 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109
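Each block of steps in Test Case 5 performs the same check by hand: load a dataset, run a “#phone” or “#email” query, and confirm that one expected value appears among the matches. That check could be expressed in a data-driven form, as sketched below for the phone-number cases. The class `ExpectedMatchSketch`, its methods, the phone pattern, and the sample page text are assumptions made for illustration. Only the dataset names and the two expected numbers are taken from the test steps; the text standing in for each dataset is invented, and the second entry deliberately omits its number to show a reported failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: checks "expected value is contained in the matches"
// for several datasets at once, as Test Case 5 does one dataset at a time.
final class ExpectedMatchSketch {

    // Matches numbers written as 800-967-2400; other phone formats are ignored.
    private static final Pattern PHONE =
        Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b");

    // Returns every phone number found in the given text, in order of appearance.
    static List<String> phonesIn(String pageText) {
        List<String> found = new ArrayList<>();
        Matcher m = PHONE.matcher(pageText);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Returns the names of the datasets whose text does NOT contain the
    // expected phone number; an empty list means every check passed.
    static List<String> failures(Map<String, String> datasetText,
                                 Map<String, String> expected) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String text = datasetText.getOrDefault(e.getKey(), "");
            if (!phonesIn(text).contains(e.getValue())) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Invented stand-in text; the real .pages files are not reproduced here.
        Map<String, String> datasetText = Map.of(
            "Test_Data_1.pages", "Citibank Customer Service: call 800-967-2400 today.",
            "Test_Data_2.pages", "New York DMV office hours are 9 to 5.");

        // Expected values as listed in the test steps.
        Map<String, String> expected = Map.of(
            "Test_Data_1.pages", "800-967-2400",
            "Test_Data_2.pages", "800-342-5368");

        System.out.println("Failed datasets: " + failures(datasetText, expected));
    }
}
```

A table of this kind keeps the expected values in one place, so adding a thirteenth dataset would mean adding one entry rather than another eight manual steps.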
CHAPTER 8 - Test Assessment Evaluation
1 Introduction
This document provides the results of performing functional qualification testing on the
KDD-Research Entity Search Tool (KREST) project. The project allows the user to
perform a web crawl, to perform a basic web search over the crawled pages, and to perform
an entity search over the crawled pages. Functional black-box testing was performed. The
functionality tested and the methods used are described in the Test Plan document.
2 Test Results Summary
Table 8.1 Test Results Summary
Test Case Main Functionality Tested Pass/Fail
Test Case 1 Application Functionality PASS
Test Case 2 Web Crawling Functionality PASS
Test Case 3 Web Searching PASS
Test Case 4 Entity Searching PASS
Test Case 5 Reproducing the results in [2] PASS
The specific requirements tested by each test case are listed throughout the test procedures
next to the actual step where they are tested.
3 Complete Test Results
3.1 Test Case 1: Application Items
This test case tests the basic application items.
Prerequisites: None.
Date Performed: 3/9/08
Issues Found: None
Comments: Test ran perfectly.
Table 8.2 Test Log for Test Case 1
Step
#
Action Performed Expected Outcome Requirements
Met
Pass/Fail
1 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
2 Tester types Alt-H | A. • Observe that an About
dialog is opened.
ARI 105
ARI 106
Pass
3 Tester selects the “OK”
button from the About
dialog.
• Observe that the About
dialog closes.
Pass
4 Tester minimizes the
KREST application.
• Observe that the
KREST application is
minimized.
ARI 108 Pass
5 Tester restores the
KREST application.
• Observe that the
KREST application is
restored.
ARI 108 Pass
6 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
7 Tester starts KREST by
running the .jar file on a
CIS Linux or Unix
machine.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 107
ARI 102
ARI 106
Pass
8 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
3.2 Test Case 2: Web Crawler Items
This test case tests the web crawler requirements.
Prerequisites: Test Case 1 must have passed.
Date Performed: 3/9/08
Issues Found: None
Comments: The crawler seemed to hang during the first attempt at breadth-first crawling.
The application was restarted, and the issue could not be reproduced. It is attributed to
the internet connection, which had been unreliable at the time.
Table 8.3 Test Log for Test Case 2
Step # Action Performed Expected Outcome Requirements Met Pass/Fail
1 Tester starts KREST by
double clicking on the .jar
file on a Windows PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
2 Tester enters a valid
website to crawl in the
“Start Crawl At:” field.
• Field becomes
populated.
WCRI 100
WCRI 101
Pass
3 Tester selects the radio
button next to “Max
Depth to Explore”
• Observe that the radio
button next to “Max
Depth to Explore”
becomes selected.
• Observe that the radio
button next to “Max
Sites to Explore”
becomes deselected.
Pass
4 Tester selects the radio
button next to “Max Sites
to Explore”.
• Observe that the radio
button next to “Max
Sites to Explore”
becomes selected.
• Observe that the radio
button next to “Max
Depth to Explore”
becomes deselected.
Pass
5 Tester enters a number
between 10 and 25 in the
“Max Sites to Explore”
field.
• Observe that the field is
updated.
WCRI 104 Pass
6 Tester presses the “Begin
Crawl” button.
• Observe that the “Begin
Crawl” button is
renamed to a “Stop
Crawl” button.
• Observe that the “Reset
Crawler” button
becomes sensitized.
• Observe that the
“Currently Crawling”
field continuously
updates with the current
website being explored.
• Observe that the
“Crawled URLs” field
is updated with the
current number of
URLs that has been
crawled.
• Observe that the “Sites
in the Queue” field is
updated with the
number of websites that
could still be explored.
• Observe that the
“Current Progress”
progress bar is updated
based on the number of
pages left to crawl.
• Observe that when the
“Current Progress”
progress bar reaches
100%, a dialog appears
notifying the operator
that the crawl is
complete.
• When the crawl is
complete, observe that
the “Stop Crawl” button
returns to a “Begin
Crawl” button.
• Observe that the Web
Search and Entity
Search tabs become
sensitized.
WCRI 108
WCRI 107
WCRI 106
Pass
7 Tester selects the “OK”
button from the dialog.
• Observe that the dialog
closes.
Pass
8 Tester presses the “Reset
Crawler” button.
• Observe that a
confirmation dialog
appears.
Pass
9 Tester presses the
“Cancel” button from the
confirmation dialog.
• Observe the
confirmation dialog
disappears.
• Observe there are no
changes to the form.
Pass
10 Tester presses the “Reset
Crawler” button.
• Observe that a
confirmation dialog
appears.
Pass
11 Tester presses the “OK”
button from the
confirmation dialog.
• Observe the
confirmation dialog
disappears.
• Observe the “Reset
Crawler” button is
desensitized.
• Observe the “Begin
Crawl” button is labeled
appropriately.
• Observe the “Currently
Crawling” field is
empty.
• Observe the “Crawled
URLs” field is set to 0.
• Observe the “Sites in
the Queue” field is set
to 0.
• Observe the “Crawl
Progress” progress bar
is reset.
Pass
12 Tester selects the
checkbox next to the
“Log File to Use” field.
• Observe the checkbox
becomes selected.
• Observe the field
becomes enabled.
WCRI 103 Pass
13 Tester enters a valid
filename where they
would like the log results
saved (or uses the default
file name).
• Observe the field is
updated.
Pass
14 Tester selects the radio
button next to “Max
Depth to Explore”.
• Observe that the radio
button next to “Max
Depth to Explore”
becomes selected.
• Observe that the radio
button next to “Max
Sites to Explore”
becomes deselected.
Pass
15 Tester enters a value
between 2 and 5 in the
“Max Depth to Explore”
field.
• Observe the field is
updated.
WCRI 102 Pass
16 Tester presses the “Begin
Crawl” button.
• Observe that the “Begin
Crawl” button is
renamed to a “Stop
Crawl” button.
• Observe that the “Reset
Crawler” button
becomes sensitized.
• Observe that the
“Currently Crawling”
field continuously
updates with the current
website being explored.
• Observe that the
“Crawled URLs” field
is updated with the
current number of
URLs that has been
crawled.
• Observe that the “Sites
in the Queue” field is
updated with the
number of websites that
could still be explored.
• Observe that the
“Current Progress”
progress bar is updated
based on the number of
pages left to crawl.
• Before the crawl
completes, move to the
next step.
WCRI 108
WCRI 107
Pass
17 Tester presses the “Stop
Crawl” button.
• Observe that the text of
the Button returns to
“Begin Crawl”.
• Observe that the crawl
is halted.
WCRI 105 Pass
18 Tester presses the “Reset
Crawler” button.
• Observe that a
confirmation dialog
appears.
Pass
19 Tester presses the “OK”
button from the
confirmation dialog.
• Observe the
confirmation dialog
disappears.
• Observe the “Reset
Crawler” button is
desensitized.
• Observe the “Begin
Crawl” button is labeled
appropriately.
• Observe the “Currently
Crawling” field is
empty.
• Observe the “Crawled
URLs” field is set to 0.
• Observe the “Sites in
the Queue” field is set
to 0.
• Observe the “Crawl
Progress” progress bar
is reset.
Pass
20 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
21 Tester opens the log file
that they specified for
crawling.
• Observe that the results
of the web crawl were
logged.
WCRI 103 Pass
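The two crawl limits exercised in this test case (“Max Sites to Explore” and “Max Depth to Explore”) can be illustrated with a minimal breadth-first sketch. This is not the KREST implementation: the class name, the method signature, and the in-memory link graph that stands in for fetched web pages are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: breadth-first crawl over an in-memory link graph,
// bounded by a maximum number of sites and a maximum link depth.
class BoundedCrawlSketch {

    // Returns the URLs visited, in breadth-first order.
    static List<String> crawl(Map<String, List<String>> links, String start,
                              int maxSites, int maxDepth) {
        List<String> crawled = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();   // URLs waiting to be crawled
        Deque<Integer> depths = new ArrayDeque<>(); // depth of each queued URL
        queue.add(start);
        depths.add(0);
        seen.add(start);
        while (!queue.isEmpty() && crawled.size() < maxSites) {
            String url = queue.remove();
            int depth = depths.remove();
            crawled.add(url);
            if (depth == maxDepth) {
                continue; // do not follow links beyond the depth limit
            }
            for (String next : links.getOrDefault(url, List.of())) {
                if (seen.add(next)) { // enqueue each URL at most once
                    queue.add(next);
                    depths.add(depth + 1);
                }
            }
        }
        return crawled;
    }
}
```

Because the user interface offers the two limits as alternatives, a caller of this sketch would pass `Integer.MAX_VALUE` for whichever limit is not selected.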
3.3 Test Case 3: Web Search Items
This test case tests the web search requirements.
Prerequisites: Test Case 1 and Test Case 2 must both have passed.
Date Performed: 3/9/08
Issues Found: None
Comments: The first search yielded no results (probably due to the small number of web
pages actually crawled). After the search term was changed, the test completed
properly.
Table 8.4 Test Log for Test Case 3
Step # Action Performed Expected Outcome Requirements Met Pass/Fail
1 Tester starts KREST by
double clicking on the .jar
file on a Windows PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
2 Tester enters a valid
website to crawl in the
“Start Crawl At:” field.
• Field becomes
populated.
WCRI 100
WCRI 101
Pass
3 Tester enters a number
between 10 and 25 in the
“Max Sites to Explore”
field.
• Observe that the field is
updated.
WCRI 104 Pass
4 Tester presses the “Begin
Crawl” button.
• Observe that the “Begin
Crawl” button is
renamed to a “Stop
Crawl” button.
• Observe that the “Reset
Crawler” button
becomes sensitized.
• Observe that the
“Currently Crawling”
field continuously
updates with the current
website being explored.
• Observe that the
“Crawled URLs” field
is updated with the
current number of
URLs that has been
crawled.
• Observe that the “Sites
in the Queue” field is
updated with the
number of websites that
could still be explored.
• Observe that the
“Current Progress”
progress bar is updated
based on the number of
pages left to crawl.
• Observe that when the
“Current Progress”
progress bar reaches
100%, a dialog appears
notifying the operator
that the crawl is
complete.
• When the crawl is
complete, observe that
the “Stop Crawl” button
returns to a “Begin
Crawl” button.
• Observe that the Web
Search and Entity
Search tabs become
sensitized.
WCRI 108
WCRI 107
WCRI 106
Pass
5 Tester selects the “OK”
button from the dialog.
• Observe that the dialog
closes.
Pass
6 Tester selects the Web
Search tab.
• Observe that the Web
Search tab is now
raised.
Pass
7 Tester enters a string to
search for in the “Search
String” field.
• Observe that the field is
updated.
WSRI 100
WSRI 101
Pass
8 Tester enters a value
between 1 and 3 in the
“Min # of Backlinks”
field.
• Observe that the field is
updated.
WSRI 102 Pass
9 Tester presses the “Begin
Search” button.
• Observe that URLs that
contain the search string
in their text are listed in
the Search Results
scrollable box.
• Observe that the URLs
are sorted by decreasing
number of backlinks.
WSRI 104
WSRI 103
Pass
10 Tester enters a new string
to search for in the
“Search String” field.
• Observe that the field is
updated.
WSRI 100
WSRI 101
Pass
11 Tester presses the “Begin
Search” button.
• Observe that URLs that
contain the search string
in their text are listed in
the Search Results
scrollable box.
• Observe that the URLs
are sorted by decreasing
number of backlinks.
WSRI 104
WSRI 103
Pass
12 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
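The web search behavior verified in steps 8 through 11 (keep pages whose text contains the search string, apply the “Min # of Backlinks” threshold, and list URLs by decreasing number of backlinks) can be sketched as follows. The `Page` record and the `search` method are hypothetical names introduced for illustration; the report does not show how KREST stores pages or counts backlinks.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of filtering crawled pages and ranking them by backlinks.
class BacklinkRankingSketch {

    // Assumed page representation: URL, page text, and backlink count.
    static final class Page {
        final String url;
        final String text;
        final int backlinks;

        Page(String url, String text, int backlinks) {
            this.url = url;
            this.text = text;
            this.backlinks = backlinks;
        }
    }

    // Returns URLs of pages containing the query, most backlinks first.
    static List<String> search(List<Page> pages, String query, int minBacklinks) {
        List<Page> hits = new ArrayList<>();
        for (Page p : pages) {
            if (p.backlinks >= minBacklinks && p.text.contains(query)) {
                hits.add(p);
            }
        }
        // Sort by decreasing number of backlinks.
        hits.sort(Comparator.comparingInt((Page p) -> p.backlinks).reversed());
        List<String> urls = new ArrayList<>();
        for (Page p : hits) {
            urls.add(p.url);
        }
        return urls;
    }
}
```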
3.4 Test Case 4: Entity Search Items
This test case tests the entity search requirements.
Prerequisites: Test Case 1 and Test Case 2 must both have passed.
Date Performed: 3/9/08
Issues Found: None
Comments: Completed the test flawlessly.
Table 8.5 Test Log for Test Case 4
Step # Action Performed Expected Outcome Requirements Met Pass/Fail
1 Tester starts KREST by
double clicking on the .jar
file on a Windows PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help
Menu
• Observe that the menu
items contain
shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
2 Tester enters a valid
website to crawl in the
“Start Crawl At:” field.
• Field becomes
populated.
WCRI 100
WCRI 101
Pass
3 Tester enters a number
between 10 and 25 in the
“Max Sites to Explore”
field.
• Observe that the field
is updated.
WCRI 104 Pass
4 Tester presses the “Begin
Crawl” button.
• Observe that the
“Begin Crawl” button
is renamed to a “Stop
Crawl” button.
• Observe that the
“Reset Crawler”
button becomes
sensitized.
• Observe that the
“Currently Crawling”
field continuously
updates with the
current website being
explored.
• Observe that the
“Crawled URLs” field
is updated with the
current number of
URLs that has been
crawled.
• Observe that the “Sites
in the Queue” field is
updated with the
number of websites
that could still be
explored.
• Observe that the
“Current Progress”
progress bar is updated
based on the number
of pages left to crawl.
• Observe that when the
“Current Progress”
progress bar reaches
100%, a dialog appears
notifying the operator
that the crawl is
complete.
• When the crawl is
complete, observe that
the “Stop Crawl”
button returns to a
“Begin Crawl” button.
• Observe that the Web
Search and Entity
Search tabs become
sensitized.
WCRI 108
WCRI 107
WCRI 106
Pass
5 Tester selects the “OK”
button from the dialog.
• Observe that the dialog
closes.
Pass
6 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
7 Tester enters a string to
search for in the “Search
String” field.
• Observe that the field
is updated.
ESRI 100
ESRI 101
ESRI 105
Pass
8 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field
is updated.
ESRI 102 Pass
9 Tester presses the “Begin
Search” button.
• Observe that email
addresses that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the
results are sorted based
on max number of
times found.
ESRI 106
ESRI 104
Pass
10 Tester deletes the old
value in the “Search
String” field and enters a new
string to search for in the
“Search String” field.
• Observe that the field
is updated.
ESRI 100
ESRI 101
ESRI 105
Pass
11 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field
is updated.
ESRI 102 Pass
12 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the
results are sorted based
on max number of
times found.
ESRI 106
ESRI 104
Pass
13 Tester deletes the old
value in the “Search
String” field and enters a new
string to search for in the
“Search String” field.
• Observe that the field
is updated.
ESRI 100
ESRI 101
ESRI 105
Pass
14 Tester adds “#fax”
without the quotes in the
“Search String” field to
search for the fax number
of the previous term.
• Observe that the field
is updated.
ESRI 102 Pass
15 Tester presses the “Begin
Search” button.
• Observe that fax
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the
results are sorted based
on max number of
times found.
ESRI 106
ESRI 104
Pass
16 Tester deletes the old
value in the “Search
String” field and enters a new
string to search for in the
“Search String” field.
• Observe that the field
is updated.
ESRI 100
ESRI 101
ESRI 105
Pass
17 Tester adds “#address”
without the quotes in the
“Search String” field to
search for the street
address of the previous
term.
• Observe that the field
is updated.
ESRI 102 Pass
18 Tester presses the “Begin
Search” button.
• Observe that street
addresses that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the
results are sorted based
on max number of
times found.
ESRI 106
ESRI 104
Pass
19 Tester deletes the old
value in the “Search
String” field and enters a new
string to search for in the
“Search String” field.
• Observe that the field
is updated.
ESRI 100
ESRI 101
ESRI 105
Pass
20 Tester adds “#zip”
without the quotes in the
“Search String” field to
search for the zip code of
the previous term.
• Observe that the field
is updated.
ESRI 102 Pass
21 Tester presses the “Begin
Search” button.
• Observe that zip codes
that were contained on
the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the
results are sorted based
on max number of
times found.
ESRI 106
ESRI 104
Pass
22 Tester deletes the old
value in the “Search
String” field and enters a new
string to search for in the
“Search String” field.
• Observe that the field
is updated.
ESRI 100
ESRI 101
ESRI 105
Pass
23 Tester adds “#all” without
the quotes in the “Search
String” field to search for
all of the contact info
of the previous term.
• Observe that the field
is updated.
ESRI 103 Pass
24 Tester presses the “Begin
Search” button.
• Observe that all
contact info that was
contained on the same
pages that matched the
search string is listed
in the Entity Search
Results scrollable box.
• Observe that the
results are sorted based
on max number of
times found.
ESRI 106
ESRI 104
Pass
25 Tester types Alt-F | S,
saves the entity search
results, and verifies that
the data was saved.
• Observe that the entity
search results were
saved to the specified
file.
ARI 104 Pass
26 Tester enters a valid file
name and selects the
‘Save’ button.
• Observe that the entity
search results were
saved to the specified
file.
Pass
27 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
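The entity search steps above append a tag such as “#email” to the search string and expect the matching entities to be listed by how often they were found. A minimal sketch of that flow for the “#email” tag is given below. It is an illustration under stated assumptions, not KREST's code: the class and method names are hypothetical, the e-mail pattern is deliberately simplified, and “sorted based on max number of times found” is interpreted here as descending occurrence count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of an "#email" entity search over page texts.
class EntitySearchSketch {

    // Simplified e-mail pattern, an assumption made for illustration only.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Query format assumed here: "<search term> #email".
    static List<String> emailsFor(List<String> pages, String query) {
        int tag = query.lastIndexOf("#email");
        if (tag < 0) {
            throw new IllegalArgumentException("query must contain #email");
        }
        String term = query.substring(0, tag).trim();

        // Count every e-mail address found on pages that contain the term.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String page : pages) {
            if (!page.contains(term)) {
                continue;
            }
            Matcher m = EMAIL.matcher(page);
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }

        // Most frequently found addresses first.
        List<String> result = new ArrayList<>(counts.keySet());
        result.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return result;
    }
}
```

The other tags exercised in this test case (“#phone”, “#fax”, “#address”, “#zip”, “#all”) would need their own patterns, which this sketch does not attempt.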
3.5 Test Case 5: Reproducing the Results of [2]
This test case tests the ability of the entity searcher to reproduce the results of the entity
search project described in [2].
Prerequisites: Test Case 1 and Test Case 2 must both have passed. Four datasets that
represent a sampling of the original dataset found in [2] must be available for use.
Date Performed: 3/11/08. Retested 3/12/08.
Issues Found: Entity Searcher was having trouble with case sensitivity of search
terms. Updated the check in the code, rebuilt, and retested.
Comments: Overall the test worked well after the fix.
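The report does not show the code change that resolved the case-sensitivity issue. One common way to make such a check case-insensitive is sketched below; the class and method names are hypothetical and the approach is an assumption, not the actual fix.

```java
import java.util.Locale;

// Hypothetical sketch of a case-insensitive search-term check.
class CaseInsensitiveMatchSketch {

    // True if pageText contains term, ignoring differences in letter case.
    static boolean containsIgnoreCase(String pageText, String term) {
        // Locale.ROOT keeps the lowercasing independent of the machine's locale.
        return pageText.toLowerCase(Locale.ROOT)
                       .contains(term.toLowerCase(Locale.ROOT));
    }
}
```

With a check of this kind, a search term such as “EBay Customer Service” would match page text that spells the name in any capitalization.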
Table 8.6 Test Log for Test Case 5
Step # Action Performed Expected Outcome Requirements Met Pass/Fail
1 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
2 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
3 Tester loads the
‘Test_Data_1.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
4 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
5 Tester enters “Citibank
Customer Service” in the
“Search String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
6 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
7 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 800-967-
2400 should be
contained in the
matches.
ESRI 106
ESRI 104
Pass
8 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
9 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
10 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
11 Tester loads the
‘Test_Data_2.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
12 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
13 Tester enters “New York
DMV” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
14 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
15 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 800-342-
5368 should be
contained in the
matches.
ESRI 106
ESRI 104
Pass
16 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
17 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
18 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
19 Tester loads the
‘Test_Data_3.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
20 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
21 Tester enters “Amazon
Customer Service” in the
“Search String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
22 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
23 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 800-201-
7575 should be
contained in the
matches.
ESRI 106
ESRI 104
Pass
24 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
25 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
26 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
27 Tester loads the
‘Test_Data_4.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
28 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
29 Tester enters “EBay
Customer Service” in the
“Search String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
30 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
31 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 888-749-
3229 should be
contained in the
matches.
ESRI 106
ESRI 104
Pass
32 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
33 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
34 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
35 Tester loads the
‘Test_Data_5.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
36 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
37 Tester enters “Thinkpad
Customer Service” in the
“Search String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
38 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
39 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 877-338-
4465 should be contained
in the matches.
ESRI 106
ESRI 104
Pass
40 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
41 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
42 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
43 Tester loads the
‘Test_Data_6.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
44 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
45 Tester enters “Illinois
IRS” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
46 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
47 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 800-829-
3676 should be
contained in the
matches.
ESRI 106
ESRI 104
Pass
48 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
49 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
50 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
51 Tester loads the
‘Test_Data_7.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
52 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
53 Tester enters “Barnes &
Noble Customer Service”
in the “Search String”
field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
54 Tester adds “#phone”
without the quotes in the
“Search String” field to
search for the phone
number of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
55 Tester presses the “Begin
Search” button.
• Observe that phone
numbers that were
contained on the same
pages that matched the
search string are listed
in the Entity Search
Results scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that 800-422-
7717 should be
contained in the
matches.
ESRI 106
ESRI 104
Pass
56 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
57 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
58 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
59 Tester loads the
‘Test_Data_8.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
60 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
61 Tester enters “Bill Gates”
in the “Search String”
field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
62 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
63 Tester presses the “Begin
Search” button.
• Observe that email
addresses that occurred
on the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that
should be contained in
the matches.
ESRI 106
ESRI 104
Pass
64 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
65 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
66 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
67 Tester loads the
‘Test_Data_9.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
68 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
69 Tester enters “Oprah
Winfrey” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
70 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
71 Tester presses the “Begin
Search” button.
• Observe that email
addresses that occurred
on the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that
[email protected] should
be contained in the
matches.
ESRI 106
ESRI 104
Pass
72 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
73 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
74 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
75 Tester loads the
‘Test_Data_10.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
76 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
77 Tester enters “Elvis
Presley” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
78 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
79 Tester presses the “Begin
Search” button.
• Observe that email
addresses that occurred
on the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that
should be contained in
the matches.
ESRI 106
ESRI 104
Pass
80 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
81 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
82 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
83 Tester loads the
‘Test_Data_11.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
84 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
85 Tester enters “Larry
Page” in the “Search
String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
86 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
87 Tester presses the “Begin
Search” button.
• Observe that email
addresses that occurred
on the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that
should be contained in
the matches.
ESRI 106
ESRI 104
Pass
88 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
89 Tester starts KREST by
double clicking on the
.jar file on a Windows
PC.
• Observe that the
KREST program starts
up, with the Web
Crawler tab opened.
• Observe that the menu
bar contains a File
menu and a Help Menu
• Observe that the menu
items contain shortcuts.
ARI 100
ARI 101
ARI 102
ARI 106
Pass
90 Tester presses Alt-F | L. • Observe dialog to load
file opens.
ARI 103 Pass
91 Tester loads the
‘Test_Data_12.pages’
dataset.
• Observe that the load
dialog disappears.
• Observe that the Web
Search and Entity
Search tabs become
enabled.
ARI 103 Pass
92 Tester selects the Entity
Search tab.
• Observe that the Entity
Search tab is now
raised.
Pass
93 Tester enters “Arnold
Schwarzenegger” in the
“Search String” field.
• Observe that the field is
updated.
ESRI 100
ESRI 101
ESRI 105
Pass
94 Tester adds “#email”
without the quotes in the
“Search String” field to
search for the email
address of the previous
term.
• Observe that the field is
updated.
ESRI 102 Pass
95 Tester presses the “Begin
Search” button.
• Observe that email
addresses that occurred
on the same pages that
matched the search
string are listed in the
Entity Search Results
scrollable box.
• Observe that the results
are sorted based on max
number of times found.
• Observe that
gov should be contained
in the matches.
ESRI 106
ESRI 104
Pass
96 Tester types Alt-F | X. • Observe that the
KREST application
closes.
ARI 109 Pass
4 Overall Results
KREST passed the formal qualification testing with flying colors, and is now ready for the
final MSE presentation.
CHAPTER 9 - User’s Manual
1 Introduction
This document describes how to set up and run the KDD-Research Entity Search Tool
(KREST). It explains how to run web crawls, web searches, and entity searches, and
how to load available data.
2 Application Setup
This section lists what is needed in order to run KREST.
2.1 Required Software
• Java Runtime Environment 1.3.1 or later
2.2 Recommended Hardware
• Minimum recommended processor speed: 1.6 GHz
• Minimum recommended RAM: 512 MB
• Minimum recommended internet connection: DSL or better
2.3 Required Files
• KREST.jar – This jar file contains everything necessary to run KREST. If you
want to view or modify the source code, it is available in
KREST-Source-final.zip. Simply download the source, make any modifications deemed
necessary, and rebuild the project. The FatJar plugin for Eclipse was used to
package everything necessary into the executable jar file.
2.4 Recommended Files
• WebBase Datasets – These can be created from WebBase at:
http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/. They represent
previously crawled pages. If you want to load in a large section of crawled pages
for web or entity searching, you should consider downloading datasets from there.
Instructions for how to download datasets are available on the WebBase website.
3 KREST
3.1 Running KREST
• Double click on the KREST.jar executable Jar file to start up the application. You
should see a screen like the one below.
Figure 9.1 Opening KREST Screen
3.2 Performing a Web Crawl
So you want to perform a web crawl? Before you can do that, there are several
decisions that you need to make:
• Where do you want to start the web crawl?
• Do you want to perform a breadth-first crawl? If so, how many pages do you
want to explore?
• Or would you rather perform a depth-limited crawl? If so, how many levels deep
would you like to explore?
3.2.1 Breadth-First Crawling
This is the type of crawling where you limit the scope of the web crawl by the number
of websites that you want to explore. First, enter the website where you would like to
begin exploring. After that, make sure that the ‘Max Sites to Explore’ circle is
selected, and enter the maximum number of websites that you want to have explored.
There is a drop-down box containing different amounts, or you can enter a specific
number.
It is important to note that if the crawler runs out of web pages to explore before it
reaches your maximum number of sites to explore, it will stop crawling. (However, it is
extremely rare for this to happen.)
Next, once you are satisfied with the start page and the maximum number of sites to
explore, press the ‘Begin Crawl’ button. You should see the fields at the bottom of the
KREST form start updating, with the progress bar moving to tell you how much
progress has been made in your web crawl. When the web crawl is complete, a box
will pop up telling you that the crawl has completed.
Figure 9.2 Completed Breadth-First Web Crawl
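As a rough illustration of the breadth-first limit described above, the following Java sketch visits pages in the order they are discovered and stops once the maximum number of sites is reached or no unvisited pages remain. It is not KREST's actual crawler code: the class name, the method name, and the in-memory links map (which stands in for downloading real pages) are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class BreadthFirstCrawlSketch {
    /**
     * Visits pages in breadth-first order starting at startUrl and stops
     * after maxSites pages or when no unvisited pages remain.
     * The links map (page URL -> outgoing URLs) is a hypothetical
     * stand-in for fetching and parsing real pages.
     */
    static List<String> crawl(Map<String, List<String>> links, String startUrl, int maxSites) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(startUrl);
        seen.add(startUrl);
        while (!frontier.isEmpty() && visited.size() < maxSites) {
            String url = frontier.remove();
            visited.add(url);
            // Queue each newly discovered link exactly once.
            for (String next : links.getOrDefault(url, Collections.<String>emptyList())) {
                if (seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return visited;
    }
}
```

If the link map runs out of pages before maxSites is reached, the loop simply ends early, which mirrors the behavior noted above.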
3.2.2 Depth-First Crawling
This is the type of crawling where you limit the scope of the web crawl by the depth of
the websites beyond the start page that you want to explore. First, enter the website
where you would like to begin exploring. After that, make sure that the ‘Max Depth to
Explore’ circle is selected, and enter the maximum depth of websites that you want to
have explored. The default depth of 3 can be modified, but keep in mind that
increasing it too much can leave the crawler going for a long time!
It is important to note that if the crawler runs out of web pages to explore before it
reaches your maximum depth to explore, it will stop crawling. (However, it is
extremely rare for this to happen.)
Next, once you are satisfied with the start page and the maximum depth to explore,
press the ‘Begin Crawl’ button. You should see the fields at the bottom of the KREST
form start updating, with the progress bar moving to tell you how much progress has
been made in your web crawl. When the web crawl is complete, the progress bar will stop
moving forward.
Figure 9.3 Depth-First Crawl in Progress
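The depth limit described above can be sketched in Java as a depth-first traversal that refuses to go more than a fixed number of links away from the start page. This is only an illustration under stated assumptions: the class and method names are hypothetical, the start page is counted as depth 0, and the links map replaces real page downloads. It does not reproduce KREST's own crawler.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DepthLimitedCrawlSketch {
    /** Returns every page reachable from startUrl within maxDepth links. */
    static List<String> crawl(Map<String, List<String>> links, String startUrl, int maxDepth) {
        List<String> visited = new ArrayList<>();
        visit(links, startUrl, 0, maxDepth, new HashMap<String, Integer>(), visited);
        return visited;
    }

    private static void visit(Map<String, List<String>> links, String url, int depth,
                              int maxDepth, Map<String, Integer> shallowest,
                              List<String> visited) {
        if (depth > maxDepth) {
            return; // beyond the requested depth
        }
        Integer previous = shallowest.get(url);
        if (previous != null && previous <= depth) {
            return; // already expanded from this depth or a shallower one
        }
        shallowest.put(url, depth);
        if (previous == null) {
            visited.add(url); // record each page only once
        }
        for (String next : links.getOrDefault(url, Collections.<String>emptyList())) {
            visit(links, next, depth + 1, maxDepth, shallowest, visited);
        }
    }
}
```

Tracking the shallowest depth at which each page was seen lets a page be expanded again when a shorter path to it is found, so no page within the limit is missed.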
3.2.3 Saving Web Crawl Information
If you want to save the information about the web crawl, click the box next to the “Log
File to Use:” field. You should see the field become editable. Either enter a new file
name, or use the one provided. When this box is selected, and the ‘Begin Crawl’ button
is pressed, all information about the web crawl will be written out to the file.
Figure 9.4 Saving a Web Crawl
3.2.4 Stopping a Web Crawl
Did you make a mistake in the page that you wanted to start crawling from? Is the
crawl taking too long, and you just want it to end? Don’t worry; you have the ability to
stop the web crawl at any point. Once you’ve started a web crawl, notice that the
‘Begin Crawl’ button has changed to a ‘Stop Crawl’ button. Simply press the ‘Stop
Crawl’ button at any point during a web crawl, and the crawl will immediately stop
with the status fields being reset to defaults. You may also be interested in the ability
to clear crawled pages out of the database, which is detailed in the next section.
Figure 9.5 Stopping a Web Crawl
3.2.5 Resetting the Crawled Pages
If you want to start over from scratch after having performed a web crawl, select the
‘Reset Crawler’ button. It will clear all of the previously crawled web pages out of the
database, and reset the fields on the form. If you are in the middle of a web crawl when
the ‘Reset Crawler’ button is pressed, it will stop the web crawl and reset the database.
The fields containing information about the crawl will also be reset.
Figure 9.6 Resetting a Web Crawl
3.3 Performing a Web Search
Performing a web search is simple with KREST. First, you must have either performed
a web crawl, or loaded pages through the application. (Loading Data is discussed in
Section 3.5). To perform a web search, click on the ‘Web Search’ tab, enter the term
that you would like to search for, and press the ‘Begin Search’ button. The pages that
contained the search terms will be listed in the ‘Search Results’ table. The matching
pages will be ranked according to number of back-links, that is, the number of pages
that link to that particular web page.
Figure 9.7 Performing a Web Search
3.3.1 Filtering the Web Search Results
Did you get too many results? Or only want to see the most significant ones? By using
the ‘Min # of Backlinks’ field, you can filter out the results that have fewer than a
chosen number of other pages linking to them. This helps ensure that you get the
highest quality results. Simply enter the minimum number of back-links required, and
press ‘Begin Search’ – results with fewer back-links will be filtered out automatically.
Figure 9.8 Filtering the Web Search by Back-link Count
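The back-link ranking and the ‘Min # of Backlinks’ filter can be pictured with the short Java sketch below. It is a hypothetical illustration, not the KREST source: the class name, the method names, the links map, and the alphabetical tie-break are all assumptions. A back-link count here is the number of distinct pages that link to a given URL.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class BacklinkRankSketch {
    /** Counts, for each URL, how many distinct pages link to it. */
    static Map<String, Integer> countBacklinks(Map<String, List<String>> links) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> outgoing : links.values()) {
            // A set makes a page that links twice to one target count once.
            for (String target : new HashSet<>(outgoing)) {
                counts.merge(target, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps matches with at least minBacklinks, most-linked first. */
    static List<String> rank(List<String> matches, Map<String, Integer> backlinks,
                             int minBacklinks) {
        List<String> kept = new ArrayList<>();
        for (String url : matches) {
            if (backlinks.getOrDefault(url, 0) >= minBacklinks) {
                kept.add(url);
            }
        }
        // Ties are broken alphabetically (an assumption for this sketch).
        kept.sort(Comparator.comparing((String url) -> backlinks.getOrDefault(url, 0))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return kept;
    }
}
```

With a minimum of 0 every matching page is kept; raising the minimum drops the pages that few other pages refer to.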
3.4 Performing an Entity Search
Performing an entity search is simple with KREST. First, you must have either
performed a web crawl, or loaded pages through the application. (Loading Data is
discussed in Section 3.5). To perform an entity search, click on the ‘Entity Search’ tab,
enter the term that you would like to search for, followed by the entity type that you
would like to find, and press the ‘Begin Search’ button. The matching entities, along
with the pages that contain them, will be listed in the ‘Search Results’ table. The
entities found will be ranked according to the number of web pages that contained each
entity.
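To make the ranking rule above concrete, here is a small Java sketch that finds email addresses on pages containing the search term and orders them by the number of pages on which each address appears. Everything in it is an assumption for illustration: the class and method names, the simplified email pattern, and the alphabetical tie-break are not taken from KREST.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityRankSketch {
    // Simplified, hypothetical email pattern; not the expression KREST uses.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** pages maps URL -> page text; returns emails ranked by page count. */
    static List<String> rankEmails(Map<String, String> pages, String searchTerm) {
        Map<String, Integer> pageCounts = new HashMap<>();
        for (String text : pages.values()) {
            if (!text.contains(searchTerm)) {
                continue; // only pages matching the search string are used
            }
            Set<String> onThisPage = new HashSet<>();
            Matcher matcher = EMAIL.matcher(text);
            while (matcher.find()) {
                onThisPage.add(matcher.group());
            }
            // Each page contributes at most one count per address.
            for (String email : onThisPage) {
                pageCounts.merge(email, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(pageCounts.keySet());
        ranked.sort(Comparator.comparing((String e) -> pageCounts.get(e))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return ranked;
    }
}
```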
To search for an entity, enter the type preceded by the pound (#) sign. Acceptable
entity types are Street Addresses (#address), Email Addresses (#email), Phone
Numbers (#phone), Fax Numbers (#Fax), and Zip Codes (#Zip). There is also an
Overarching entity (#all) that will pick up all entity information. If you do not enter a
valid entity type into the search box, a box will pop up notifying you of the valid entity
terms.
Figure 9.9 Performing an Entity Search
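The search string format described in this section (a term followed by a pound-sign entity type) could be split apart along the lines of the Java sketch below. It is a hedged illustration only: the class and method names are hypothetical, and treating the type names as case-insensitive is an assumption. The rule that the last entity type wins when several are given follows the Troubleshooting section of this manual.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class EntityQuerySketch {
    // Entity types listed in this manual, lower-cased for comparison.
    static final Set<String> VALID_TYPES = new HashSet<>(
            Arrays.asList("address", "email", "phone", "fax", "zip", "all"));

    /**
     * Splits a query such as "Elvis Presley #email" into
     * { search term, entity type }. Throws IllegalArgumentException when
     * the entity type is missing or not recognized.
     */
    static String[] parse(String query) {
        StringBuilder term = new StringBuilder();
        String type = null;
        for (String token : query.trim().split("\\s+")) {
            if (token.startsWith("#")) {
                String candidate = token.substring(1).toLowerCase(Locale.ROOT);
                if (!VALID_TYPES.contains(candidate)) {
                    throw new IllegalArgumentException("Unknown entity type: " + token);
                }
                type = candidate; // a later type replaces an earlier one
            } else if (!token.isEmpty()) {
                if (term.length() > 0) {
                    term.append(' ');
                }
                term.append(token);
            }
        }
        if (type == null) {
            throw new IllegalArgumentException("No entity type given");
        }
        return new String[] { term.toString(), type };
    }
}
```

In an application, the exception would be the natural place to show the pop-up that lists the valid entity terms.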
3.5 Loading Data
Sometimes you’d rather skip the web crawl and look at data that you already have on
your computer. In order to load previously crawled data, simply go to the ‘File’ menu
and select ‘Load Data’. A file dialog will appear asking you to select the location of
the previously crawled data. Once you select the right file, KREST will begin loading
– PLEASE NOTE: Loading in data can take a while. Once the file has been loaded, a
box will pop up notifying you that loading data is complete.
Figure 9.10 How to Load Data into KREST
3.6 Saving Entity Search Results
Need to save your entity search results out to a file? In order to save the results,
complete an entity search, and then select the ‘File’ menu and press ‘Save Results’. A file
dialog will pop up allowing you to select where the results will be saved.
Figure 9.11 How to Save Entity Search Results
3.7 Exiting KREST
Leaving so soon? You have two ways that you can shut down the KREST application:
• Click the ‘X’ button in the upper-right hand corner of the application.
• Go to the ‘File’ menu and select ‘Exit’.
Figure 9.12 KREST Application with Exit Methods Circled
3.8 Information About KREST
Want to find out who created KREST, and when it was created? Click on the ‘Help’
menu and select ‘About’. You’ll see a box pop up with information on the developer.
Figure 9.13 How to Access the Help Menu
3.9 Troubleshooting
Have a problem that wasn’t answered elsewhere in the manual? Your problem might be
answered here.
3.9.1 Crawler not Getting All Links on a Web Page
The Web Crawler is set to look for all instances of “http://….” in the html of the web
page. It is currently unable to extract partial links (such as “/cgi-bin/index.html”). This
is a feature that may be implemented in a future build.
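The limitation above can be demonstrated with a short Java sketch that pulls only full “http://” links out of a page. The regular expression, class name, and method name are assumptions chosen for this example and are not KREST's actual pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AbsoluteLinkSketch {
    // Matches "http://" followed by characters up to whitespace, a quote,
    // or an angle bracket. Hypothetical pattern for illustration only.
    private static final Pattern ABSOLUTE_LINK = Pattern.compile("http://[^\\s\"'<>]+");

    /** Returns every absolute http:// link found in the given HTML text. */
    static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = ABSOLUTE_LINK.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }
}
```

A partial link such as “/cgi-bin/index.html” never matches this pattern, because it does not begin with “http://”. Supporting such links would require combining them with the address of the page they were found on, for example with java.net.URI.resolve.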
3.9.2 Progress Bar not Updating During Depth-First Crawls
Depth-First crawling works differently than normal Breadth-First crawling. Since the
crawling keeps processing until it hits the max depth, there isn’t an easy way to track
when all of the pages at the max level have been processed. Because of this, the
progress bar will sometimes hang at 66%. If it appears that crawling has completed (the
crawled page shown is no longer changing), it is safe to move on to perform web or
entity searches.
3.9.3 Cannot Click on URLs in the Web Search Results
The URLs in the Web Search Results area are not clickable URLs. However, if you
want to visit one of the URLs that were found, simply click in the cell and highlight the
URL. Copy the text of the URL and paste it into your web browser.
3.9.4 Cannot Click on URLs in the Entity Search Results
Ideally, you would not need to click on the URLs in the Entity Search Results area, as
the information has already been extracted from the web pages. However, if you really
want to see the web page, simply click in the cell and highlight the URL. Copy the text
of the URL and paste it into your web browser.
3.9.5 Tried to Load Data, but Received an Error Message
Currently KREST is only able to load datasets downloaded from WebBase
(http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/). Trying to load any other
type of data will result in an error message being displayed.
3.9.6 Tried to Load Data, but Only Loaded X Number of Pages
The KREST application is currently limited to loading in about 32 MB worth of data
from a file. This is due to Java’s class size restrictions. All pages that were loaded
have been loaded properly, and you may perform web searches and entity searches on
the loaded pages.
3.9.7 Entity Search Results Don’t Match What I Expected for Overarching Results
Overarching results are based on the address. Once the address has been found on a
webpage, the other entities will be searched for from that point in the webpage.
Nothing before that point in the page will be recorded.
3.9.8 Searching for Multiple Entity Types
KREST is limited to searching for only one entity type at a time. If you want to search
for more than one at a time, you will need to combine them all using the
“#overarching” entity type. If you try to search for more than one entity type at once,
the last one will be used.
3.9.9 Miscellaneous Problem Not Mentioned Above
If you are reading this section after encountering a problem, then you may have found a
bug in the application. Please note the bug and email it to the developer at
[email protected] (Maintained through May 2008). If the issue is bad enough that it is
preventing you from running KREST, shut down KREST and restart it.
CHAPTER 10 - Project Evaluation
1 Introduction
This document describes in detail my experiences while working on the KDD-Research
Entity Search Tool (KREST) project throughout two semesters of CIS 895. It includes a
time log analysis, a source code analysis, as well as problems encountered and lessons
learned. Also included is a section which describes possible future work on the project.
2 Problems Encountered
During the course of the project, there were several areas that were frequent causes of
concern, where a majority of the debugging time ended up being spent.
2.1 Web Crawler Thread Control
In order to speed up web crawling, I implemented a system which allowed multiple
threads to crawl web pages at the same time. I have had very little previous experience
working with thread control, but for basic crawling, the system seemed to work pretty
well. However, I encountered a lot of problems when trying to do more complex things
like stopping a web crawl, restarting a web crawl, or starting a brand new crawl after
one has been stopped. I was eventually able to get past the problems, but had to spend
a lot more time debugging the issues than I had planned for.
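One common way to make a multi-threaded crawl stoppable is a shared stop flag that every worker checks before taking more work. The Java sketch below illustrates that idea under stated assumptions: it is not the thread control used in KREST, the class and method names are hypothetical, page downloads are replaced by a counter, and it relies on lambdas from Java 8 or later.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class StoppableCrawlSketch {
    private final Queue<String> frontier = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final AtomicInteger visited = new AtomicInteger(0);

    StoppableCrawlSketch(List<String> urls) {
        frontier.addAll(urls);
    }

    /** Asks all workers to finish their current page and then exit. */
    void requestStop() {
        stopRequested.set(true);
    }

    /** Runs the workers to completion and returns the pages processed. */
    int run(int threadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            pool.submit(() -> {
                // Each worker checks the flag before taking more work.
                while (!stopRequested.get()) {
                    String url = frontier.poll();
                    if (url == null) {
                        break; // frontier exhausted
                    }
                    visited.incrementAndGet(); // placeholder for fetching url
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return visited.get();
    }
}
```

Starting a brand new crawl after a stop is simplest with a fresh object, so that no flag or queue state carries over from the stopped crawl.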
2.2 Java Class Size Limitations
My initial plan for the project was to store all of the crawled or loaded web pages in a
Hashtable within the KrestObjectLibrary class. I wanted to avoid having to hook up to
a database, because I didn’t have much experience using JDBC calls, and I wanted to
keep the storage mechanism as simple as possible. I assumed that I would be able to
allocate as much space as was available to storing web pages. I later found out when
trying to test the crawl functionality that Java limits each class to 32 MB of heap space.
This limited the crawls to around 1500 to 2500 web pages, depending upon the size of
the pages. While not bad, I was hoping to be able to load over 50,000 web pages at a
time. Since I was still able to achieve significant results with the smaller number of
web pages, I did not need to look at adding in a real database; however, adding
database functionality for webpage storage would be a good add-on project.
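The 32 MB ceiling described above is what was observed during this project. As a hedged aside, the overall heap ceiling of a running JVM can be inspected with Runtime.maxMemory(), and it is normally raised with the -Xmx launch option (for example, java -Xmx256m -jar KREST.jar). Whether raising it would lift the limit seen in KREST was not tested here. The class and method names in the sketch below are hypothetical.

```java
class HeapLimitSketch {
    /** Maximum heap the running JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /**
     * Hypothetical guard: true if roughly bytesNeeded more bytes would
     * still fit under the JVM's maximum heap size.
     */
    static boolean hasRoomFor(long bytesNeeded) {
        Runtime runtime = Runtime.getRuntime();
        long used = runtime.totalMemory() - runtime.freeMemory();
        return runtime.maxMemory() - used >= bytesNeeded;
    }
}
```

A loader could call a guard like hasRoomFor before storing each page and stop cleanly instead of running out of heap space.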
2.3 Jigloo GUI Builder
For the project, I needed to build a graphical user interface for the application. I
wanted one that was integrated with Eclipse, my integrated development environment
(IDE) of choice. In order to do this, I chose to use Jigloo, which had been used by
previous CIS 895 students in building their projects. Upon inspection and running
small tests with the Eclipse plugin, the tool seemed to work well at building interfaces.
The larger the screens got, though, the longer Jigloo took to load each time. It also
took longer and longer to recompile after each change. I also struggled with the layouts
within the plugin; they did not seem to pack well when the GUI was built as an
executable. If I were doing the project from scratch again, I would go with a different
GUI builder.
3 Source Lines of Code (SLOC)
The estimate for the SLOC to be produced for the project was made at the end of Phase 1
of the project. The estimate was 2000 SLOC based on other available web crawling
projects. At the end of Phase 2, a new estimate was made, which anticipated that there
would be around 2350 SLOC.
The actual SLOC developed was 2960. A detailed breakdown of the SLOC produced for
the project can be seen in Appendix A.
I believe the original estimate was low due to the amount of extra code produced by using
the Jigloo GUI builder. The builder added in many extra “getter” methods for all of the
graphical widgets, most of which were not used. This accounted for about 350 to 400
SLOC. The other area that was larger than expected was the entity search portion of the
project. In order to search for specific entity types, extra code was needed, which resulted
in several hundred extra SLOC. Since this was where a majority of the remaining code
needed to be developed during Phase 3, it likely caused the gap.
Overall, I think that I did a decent job with the original estimate on the SLOC, although I
would’ve liked to have done better. The actual count of 2960 SLOC exceeded the original
estimate of 2000 by 48%, and exceeded the second estimate of 2350 by about 26%.
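The estimate errors quoted above can be checked with a few lines of Java. The sketch measures the overrun relative to the estimate, which is an assumption about how the percentages were meant; the class and method names are hypothetical.

```java
class EstimateErrorSketch {
    /** Percentage by which actual exceeds estimate, relative to the estimate. */
    static double percentOver(int estimate, int actual) {
        return 100.0 * (actual - estimate) / estimate;
    }
}
```

With the figures from this section, percentOver(2000, 2960) gives 48%, and percentOver(2350, 2960) gives just under 26%.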
4 Project Duration
The following table shows the preliminary estimated dates for the completion of the three
project phases, and the actual dates when they were finished. The actual completion dates
stayed very close to the estimated schedule.
Table 10.1 Project Phase Completion Dates
Phase   Expected Completion Date   Actual Completion Date
1       November 13, 2007          November 13, 2007
2       February 15, 2008          February 13, 2008
3       April 25, 2008             April 23, 2008
The figure below shows the total time spent working on the project during each phase of
the project.
Figure 10.1 Phase Breakdown
Time Spent Per Phase (in Hours)
Phase 1: 55.92 (37%)
Phase 2: 57.83 (38%)
Phase 3: 37.67 (25%)
The hours spent on Phases 1 and 2 were roughly equal, and Phase 3 took about two
thirds as many hours, despite the differences in the length of calendar time between
phases. This was due to trying to keep on schedule, so there was an attempt to cram
more work into a compressed amount of calendar time.
The following graph displays an overall breakdown of time spent on activities relating to
the project. Over 75% of the total project time went into documentation and code
development, which is to be expected. Additional charts will follow that will show the
activity breakdown per phase.
Figure 10.2 Project Activity Breakdown
Time Spent Per Project Activity (in Hours)
Coding: 57.92 (38.25%)
Documentation: 57.17 (37.75%)
Presentation: 11.58 (7.65%)
Reading: 9.50 (6.27%)
Environment: 4.58 (3.03%)
Discussion: 4.50 (2.97%)
Webpage: 2.00 (1.32%)
Research: 1.92 (1.27%)
Timelog: 1.50 (0.99%)
Integration: 0.75 (0.50%)
The chart below details the activity breakdown for Phase 1 of the project. Although
slightly over 50% of the time was spent coding and producing documentation, a large
chunk of time was also spent in discussion, reading, researching, and setting up the
project environment.
Figure 10.3: Phase 1 Activity Breakdown
Time Spent Per Project Activity During Phase 1 (in Hours)
Documentation: 17.50 (31.30%)
Coding: 12.58 (22.50%)
Reading: 9.50 (16.99%)
Environment: 4.58 (8.20%)
Discussion: 4.33 (7.75%)
Presentation: 3.67 (6.56%)
Research: 1.92 (3.43%)
Webpage: 1.00 (1.79%)
Timelog: 0.83 (1.49%)
Integration: 0.00 (0.00%)
The chart below details the activity breakdown for Phase 2 of the project. The amount
of time spent producing documentation was similar to Phase 1, but the amount of time
spent coding more than doubled (from 12.58 to 30.33 hours). The amount of time spent
preparing for the presentation also more than doubled (from 3.67 to 7.92 hours). It is
interesting to note that by Phase 2, little time was spent reading, researching, and
setting up the environment, as these activities were completed during Phase 1.
Figure 10.4: Phase 2 Activity Breakdown
Time Spent Per Project Activity During Phase 2 (in Hours)
Coding: 30.33 (52.45%)
Documentation: 18.67 (32.28%)
Presentation: 7.92 (13.69%)
Timelog: 0.67 (1.15%)
Webpage: 0.25 (0.43%)
The chart that follows details the activity breakdown for Phase 3 of the project. The
amount of time spent producing code dropped by about half when compared to Phase 2.
This is due to the coding of the project almost being complete by the time that Phase 3
began. Also, documentation grew to 56% of the time spent in the phase, a larger share
than in the other phases. This is due to the increased amount of documentation
required for Phase 3, as well as cleaning up previously released documents, and putting
together the portfolio from previous work.
Figure 10.5: Phase 3 Activity Breakdown
Time Spent Per Project Activity During Phase 3 (in Hours)
Documentation: 21.00 (56%)
Coding: 15.00 (40%)
Integration: 0.75 (2%)
Webpage: 0.75 (2%)
Discussion: 0.17 (0%)
Presentation: 0.00 (0%)
Timelog: 0.00 (0%)
5 Lessons Learned
Throughout the duration of the project, there were several topics that I learned that I could
apply in the future.
5.1 Eclipse IDE
I use C++ every day at work, and I really have not used Java or Eclipse for more than
brief assignments since graduating from Kansas State with my undergraduate degree in
2003. This was the first time I had developed anything significant in both Java and the
Eclipse IDE. It took me a while to figure out how to set everything up with the
development environment, but once I got past the learning curve, it was an extremely
powerful tool. Knowledge of how to use Eclipse will definitely be useful if I ever
switch to a project at work that uses Java.
5.2 Creation of Design Documents
On the projects that I have worked on since graduating with my undergraduate degree, I have
never been through the full software lifecycle – I have always come in during the coding
phase and stayed through integration before moving to a new project. Due to this, I have
spent a lot of time developing software based on design documents that were produced by
others, but I have never spent any time developing design documents from scratch. Using
Microsoft Visio to develop design documents was a good learning experience and will be
useful in the future.
6 Future Work
There are three areas that I would consider for project enhancements if I had more time to
work on the project.
6.1 Integration of Open Source Web Crawler
Currently KREST supports web crawling, web searching, and entity searching. Early
on during the project, I made the decision to implement the crawling capability rather
than using one of the available open source crawlers. This allowed me to learn how
web crawling works, and served as a base for future entity search development.
However, due to the time and scope limitations of the MSE project, the crawler is
limited in comparison to other open source crawlers. For instance, while the crawler
supports crawling over links found in web pages, it only supports full http:// URLs and
cannot handle partial paths. Also, while the crawler has its own thread control to
prevent the crawler from slamming the internet connection, it is not nearly as robust as
any of the open source crawlers.
6.2 Adding a Database to Hold Web Pages
In order to limit the scope of the project, a Java Hashtable object is used to hold
crawled Webpage objects rather than a full backend database. While the current
mechanism works well with the current project, it is also one of the main limitations of
the project. Due to the Hashtable object being stored by the KrestObjectLibrary class,
trying to load too many web pages will cause Java to run out of heap space as the
KrestObjectLibrary class will attempt to grow beyond the 32 MB class limit imposed
by Java.
In order to make the data storage more robust, a full database should probably be
implemented if a developer were to extend the project in the future. An added benefit of
adding a database would be the ability to store previously crawled pages over multiple
sessions, rather than having to start over from scratch each time the program is run.
6.3 Adding Ability to Load Different File Types
KREST is currently set up to load files created from the WebBase repository, available
at: http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/. The operator can go to
the WebBase website and download a specified number of webpages. Currently, the
crawler can handle about 32 MB of data, so in most cases this works out to roughly
1500 web pages.
In the future, it may be useful to use KREST as an alternate test bed, to compare against
other projects. In order to do this, KREST would have to be extended to load
additional file types.
Appendix A - Source Metrics
The project source metrics were determined using SLOC Metrics 3.0 available from
http://SLOCMetrics.com.
1. Project Summary Metrics
Table A.1 Overall Project SLOC Metrics
Project SLOC % SLOC Comments Blank Lines Total
All Source 100.0% 2960 1374 541 4875
2. Source Metrics By Package
Table A.2 Source Metrics by Package
Project SLOC % SLOC Comments Blank Lines Total
Application 0.61% 18 10 7 35
Controller 72.43% 2144 637 344 3125
Model 14.97% 443 474 119 1036
View 11.99% 355 253 71 679
Total 100.0% 2960 1374 541 4875
3. Source Metrics of the Application Package
Table A.3 Source Metrics of the Application Package
Project SLOC % SLOC Comments Blank Lines Total
KrestApplication.java 100.00% 18 10 7 35
Total 100.00% 18 10 7 35
4. Source Metrics of the Controller Package
Table A.4 Source Metrics of the Controller Package
Project SLOC % SLOC Comments Blank Lines Total
EntitySearcher.java 36.71% 787 82 84 953
KrestController.java 32.09% 688 210 72 970
SiteVisitor.java 15.72% 337 138 88 563
FileLoader.java 4.15% 89 5 16 110
HTTPReader.java 3.08% 66 32 22 120
WebSearcher.java 2.38% 51 30 15 96
ThreadController.java 2.19% 47 50 18 115
KrestAboutDialog.java 1.45% 31 15 9 55
WebCrawler.java 1.26% 27 50 13 90
Webpage.java 0.98% 21 25 7 53
Total 100.00% 2144 637 344 3125
5. Source Metrics of the Model Package
Table A.5 Source Metrics of the Model Package
Project SLOC % SLOC Comments Blank Lines Total
OverarchingEntity.java 26.64% 118 121 30 269
KrestObjectLibrary.java 18.74% 83 45 13 141
FaxEntity.java 9.03% 40 41 11 92
PhoneEntity.java 9.03% 40 41 11 92
AddressEntity.java 7.90% 35 46 10 91
KrestModel.java 7.22% 32 50 11 93
Webpage.java 5.42% 24 34 8 66
KrestEntity.java 5.19% 23 30 7 60
EmailEntity.java 4.29% 19 24 7 50
ZipEntity.java 4.29% 19 24 7 50
WebObject.java 2.26% 10 18 4 32
Total 100.00% 443 474 119 1036
6. Source Metrics of the View Package
Table A.6: Source Metrics of the View Package
Project SLOC % SLOC Comments Blank Lines Total
EntityObserver.java 25.07% 89 41 19 149
TextAreaRenderer.java 21.41% 76 53 9 138
CrawlerObserver.java 18.87% 67 56 13 136
SearchObserver.java 18.03% 64 31 15 110
KrestView.java 10.14% 36 60 11 107
TextAreaEditor.java 6.48% 23 12 4 39
Total 100.00% 355 253 71 679