
This Report is submitted to the University of Strathclyde in partial fulfilment of the Regulations for the Degree of MSc in Information Technology Systems

Decision system for stock market investors

Michael Witty

200590826

Supervised by: Dr Ian Ruthven
Department of Computer & Information Sciences
September 2007

Except where otherwise expressly indicated, the work reported in this document is my own. It has been performed during, and has not been submitted for assessment in connection with, any other award whatsoever.

Signed                         Date

Abstract

The aim of this research is to analyse how investors collect and use typical market indicators, and to investigate the ways in which current technology can be used to enable more informed, time-critical decisions.

It has been the Holy Grail of mathematicians, economists, investment banks and programmers alike to try to create systems and techniques which accurately predict stock market movements in order to ensure financial gain and eliminate risk. The findings of such research, however, are inconclusive as to whether the stock market can be predicted accurately enough to make significant profits.

The efficient market theory concludes that the market already reflects the value of an investment, since all relevant information is currently in the public domain. This conclusion is increasingly being challenged as more complex computer systems are directed towards the field and the vast repositories of data available on the Internet grow.

This research focuses not on predicting the market but on providing better tools to help investors make decisions based on current market conditions. Recent standards and technologies such as XML and Web 2.0 have provided solutions to some of the common problems of data retrieval, representation, organization and manipulation. This research looks at the use of such technologies.

Acknowledgements

I would like to thank Dr Ian Ruthven for his help and guidance during this project.

Table of Contents

1. Introduction
1.1. Problem Statement
1.2. Overview
1.3. Scope
2. Literature Review
2.1. The Stock Market
2.1.1 Fundamentals
2.1.2 Predictability
2.1.3 The Efficient Market Hypothesis
2.1.4 The Rise in On-line Trading
2.2. Information Visualisation
2.2.1 Existing Graphical Representations
2.2.2 Issues
2.2.3 Cognitive Maps
2.3. Functional View of Market Trading
2.3.1 Investor Goals
2.3.2 MSN Research Wizard
2.3.3 Functional Requirements
2.4. Data Retrieval
2.4.1 Web Content Mining
2.4.2 Extraction Techniques
2.4.3 Examples
2.4.4 Dapper.Net
2.5. Data Storage
2.5.1 XML
2.5.2 XPath
2.5.3 Storing XML
2.5.4 XML Databases
2.6. Transforming XML
2.6.1 XSL
2.6.2 SVG
2.6.3 XForms
3. Specification
3.1. Problem Statement
3.2. Stakeholder Analysis
3.3. User Goals
3.4. Use Cases
4. System Design
4.1. Methodology
4.2. Required Technologies
4.3. Decisions
4.4. Modules
4.4.1 Dapper Module
4.4.2 Database Connection Module
4.5. Proposed Architecture
5. Implementation
5.1. Issues and Design Changes
5.2. Module Implementation
5.2.1 Utilities Package
5.2.2 Dapp Manager Package
5.2.3 DBQuery Package
5.2.4 Style Sheet & Icon Design
5.3. Final System Architecture
5.4. Interfaces
6. Testing
6.1. Component Testing
6.2. Usability Testing
6.3. Speed & Accuracy Test
6.4. Results
7. Conclusion
8. Bibliography

1. Introduction

1.1. Problem Statement

For investors, making financial gains requires quick, informed decision-making based on many different sources of time-sensitive data. News articles, company profiles, financial indicators and general economic conditions are constantly changing, and all can affect the performance and financial well-being of a company and therefore its stock price.

The process of gathering and analysing this information is time consuming. For investors, the cost of their time and the charges incurred through buying and selling shares immediately reduce potential gains.

Another consideration is the time-dependent nature of trading: investors must not only collect and analyse the relevant data, they must also make a decision based on that data while it is still relevant. Although many sites and portals exist specifically aiming to provide a single source for investors, these sites themselves become vast and are still predominantly text based.

The task for investors is then not only to discover the relevant information but also to determine some meaning from it.

1.2. Overview

This paper aims to research and develop a graphical display method to present a holistic view of a market index, specifically the FTSE 100. The intention is that investors can narrow their research efforts by filtering which companies are worth investigating further, and ultimately be helped in deciding whether to buy, sell or hold a particular stock. A brief overview of the stock market is given, together with a functional analysis of the trading process. An analysis of existing systems and technologies is then used to develop a graphical tool with which traders can gain a quick insight into the market.

1.3. Scope

Some assumptions are made about the users of the final system. Most financial sites offer tutorials and helpers to introduce first-time investors to the intricacies of the stock market; the assumption has been made that the users and testers of this system will, where necessary, have the relevant knowledge and experience, and that writing such educational material into the final site is beyond the scope of this project. The final system is also intended as a proof of concept: it will demonstrate some possibilities in terms of retrieving data, but not every possible scenario will be implemented, the view being that the system is versatile enough to retrieve data from many diverse locations provided the user supplies appropriate configuration.

2. Literature Review

First, an understanding of the stock market and the available technologies is required in order to assess high-level functional requirements and identify technologies that can answer these requirements.

2.1. The Stock Market

Stock is the term for the outstanding capital of a company or corporation. This stock is divided into shares, which are traded on an exchange in a similar way to an auction; the difference is that in a stock market sellers and buyers do not trade on a highest- or lowest-offer-wins basis, but are instead matched based on the price at which they are willing to trade.

2.1.1 Fundamentals

Many exchanges exist globally; the major ones include the New York Stock Exchange (NYSE) in America, the Tokyo Stock Exchange (home of the Nikkei index) in Japan and, in the UK, the London Stock Exchange (LSE).

Each exchange consists of lists or indexes of companies grouped by market capitalization (the estimated total value of a company). If a company is listed on a particular index, investors can gauge how large the company is in terms of its financial value. This project is concerned with the FTSE 100, which lists the top UK companies traded on the London Stock Exchange.

The price at which shares are bought and sold is governed by many factors; the price of a stock can be thought of as a reflection of what the market is willing to pay. Expressed another way, the market price is a reflection of the perceived value of a company, and this value changes over time to reflect the company's financial performance and well-being. As Elinger observes, the market is searching for the right price1.

1 The Art of Investment, Elinger, A.

2.1.2 Predictability

'The predictability of financial markets has engaged the attention of market professionals & academic economists & statisticians for many years'2

Being able to predict how a market or individual share is going to behave in the future would be of great advantage to any investor, giving them a guaranteed profit on any investments they make. As such, this is exactly what many investors try to do. Several methods and techniques exist, from fundamental analysis to technical charting. The effectiveness of such techniques is constantly debated, as indeed is whether it is possible to predict market movements with any degree of accuracy at all. Some studies and theories challenge the reasoning behind such pursuits; one notable example is the efficient market hypothesis.

2.1.3 The Efficient Market Hypothesis

Malkiel3 popularised the Efficient Market Hypothesis in 1973. The findings of his research suggest that the market cannot be predicted using any of the formal techniques such as fundamental and technical analysis. These methods rely on quantitative data about companies and trade information such as prices and volume, which are freely accessible in the public domain.

It is proposed that, since this information is already freely available to all investors, the market already reflects any implications of this information. The debate continues between promoters of the EMH and the more traditional technical analysts; as yet no solid conclusions have been reached either way, and with so much attention from various research sources the debate is likely to continue. Recent advances in techniques, computing power and the larger data sets available via the Internet have fuelled this debate further4.

It has therefore become clear that an alternative approach is required: to provide investors with all the information and data they require, in a way that allows a quick overview and analysis of market activity to help make investing decisions. Mills5 proposes that investors need to gather and analyse this information as soon as it becomes available so that timely decisions can be made.

2 Predicting the Unpredictable, Mills, T.

3 A Random Walk Down Wall Street, Malkiel, B.

4 Predicting the Unpredictable, Mills, T.

5 Predicting the Unpredictable, Mills, T.

2.1.4 The Rise in On-line Trading

A number of factors have made on-line trading more popular in the past few years: increased availability of data, increased net usage, new technology, faster connections and favourable market conditions have all made investing in stock more attractive6.

This vast increase in on-line trading has given rise to many web sites offering trading tools for investors and market data portals. In addition, recent standards and technologies such as XML and Web 2.0 have enabled richer web-based applications, including the use of graphics.

6 Stock Market Psychology, Warneryd, K.

2.2. Information Visualisation

Information visualisation is concerned with the representation of data in a graphical format which successfully imparts information to the viewer. This idea is famously captured by the proverb 'a picture is worth a thousand words'.

Tufte7 takes this concept further by introducing the idea of data density. Text-based representations are limited by the viewer's ability to read and understand the text itself. Plain text can be thought of as one dimensional in its ability to communicate information, that dimension being the value the characters represent. Graphics, on the other hand, can represent more than one dimension through the use of colour, size, shape and context, which means more data can be represented over a given area.

Harris8 describes how the use of colour alone can help authors in the following ways:

Differentiate elements

Encode areas of equal value

Alert the viewer when a predetermined condition occurs

Identify particular values

Indicate similar items

Signify changes in direction, trends and conditions

Improve retention of information

Use gradations to indicate transitions from one set of conditions to another

It can be seen that many of these attributes lend themselves nicely to the stock market scenario, particularly in the identification of trends and changes in direction for numerical indicators.

7 Envisioning Information, Tufte, E.

8 Information Graphics, Harris, R.

2.2.1 Existing Graphical Representations

The idea of representing data using graphics is not new, even in the stock market scenario; various charts and display methods already exist.

Simple time series: probably the chart most synonymous with stock markets is the time series graph, which simply plots one variable against a set time period; from this an investor can see how the price has performed historically.

Figure 1: Example of traditional charting on Self Trade

Candlesticks: first devised by a Japanese rice trader, the idea of the candlestick diagram is to show price change over a certain period in relation to the highest and lowest price. Candlesticks are still used today on many sites such as Digital Look and Self Trade. They are a good example of how graphics can be used to store data in a smaller area: the example below shows that, using a box and two lines, the diagram can successfully communicate four pieces of information to a user at once. When combined with a time series chart, even more information can be imparted.

Figure 2: Candlestick example9

Heat-maps: the concept of the heat-map is to display a particular indicator's rate of change (most commonly the price change over a period) and communicate this change graphically by changing the colour of the graphic.

Digital Look10 provides one example of a heat-map currently available:

9 http://www.babypips.com/school/what_is_a_candlestick.html

10 http://www.digitallook.com/cgi-bin/dlmedia/investing/visual_tools/heat_maps?

Figure 3: Digital Look Heat-Map

MSN11 also provides a similar heat-map display; again this shows the price change for a certain period.

11 http://msn.moneyam.com/heatmaps/

Figure 4: MSN Heat-Map

It can be seen that most of these graphical tools attempt to map only one variable, and in almost all cases it is the change in price over a certain time period.

2.2.2 Issues

Spence12 observes that 'the mere re-arrangement of how the data is displayed can lead to a surprising degree of additional insight'.

It is clear that graphics can help; conversely, however, Tufte13 observes that the incorrect use of graphics can have a negative effect. Some common errors include the use of irrelevant decoration, information overload and the poor use of colour. These factors must therefore be considered when designing such interfaces. As a guide, the following requirements need to be addressed:

Selection of data – relevance to a task

Representation – how to represent abstract things

Presentation – spatial limitations

Scale dimensionality – how many dimensions and variables can be displayed

Re-arrangement, interaction & exploration

Internalisation – the mind's representation of an internal image

Externalisation – the display of what the user actually sees, i.e. the computer display

Mental models – human memory models

Invention, experience & skill

12 Information Visualisation, Spence, R.

13 Visual Explanations, Tufte, E.

2.2.3 Cognitive Maps

The next consideration in terms of information visualisation is how the user interacts with the graphic. A cognitive map is the navigational guide to an interface that the user constructs in memory; a simple real-world analogy is the London Underground map.

Most passengers on the Underground have one goal in mind: how to get from point A to point B, and the required connections between the two. As such, the Underground map uses colour to represent the different connecting routes and does not attempt to display other real-world data, such as accurate scales, because the user is not interested in this information.

Another analogy is to think of cognitive maps as the bridge between the real world, the computer display and the user's memory14.

The process of creating these maps can be illustrated by the following sequence:

Browse > CONTENT > model > INTERNAL MODEL > interpret > INTERPRETATION > formulate browsing strategy > BROWSING STRATEGY

To aid this process, context maps can be used to help users create such models. Such maps aim to give the viewer a basis on which to build their own cognitive map.

2.3. Functional View of Market Trading

To gain an understanding of how investors make decisions, and of the ways in which this data is analysed, a functional analysis of trading activities is undertaken.

14 Mental Models, Navigation

2.3.1 Investor Goals

All investors share a common goal: to achieve a return on their initial investment. On a very basic level the goal is always to buy when a stock is undervalued, before the market moves to reflect this, and conversely to sell when an investment is overvalued. Put simply: buy low and sell high.

The methods used to achieve this vary from person to person; individual goals and strategies differ between personalities and age groups. Investors can, however, be grouped into two general categories, as either active or passive traders, also known as short- and long-term traders.

Active traders aim to profit from the short-term natural fluctuations in price, or volatility. The frequency of these trades varies; the most extreme example is the day trader, who makes very large trades over short periods to take advantage of daily fluctuations in price.

Passive traders, in comparison, aim to take advantage of the market's long-term tendency to increase. They therefore trade very infrequently, buying shares periodically to add to their portfolio as opposed to selling. Most traders generally fall into this second category15.

15 Stock Market Psychology, Warneryd, K.

2.3.2 MSN Research Wizard

The MSN Research Wizard16 gives a good indication of what is involved when deciding to sell or buy shares. The page is a kind of expert system using MSN data to guide an investor through the process of assessing an individual company. The wizard looks mainly at fundamental data to gauge how good an investment is, and is split into five main sections.

The first step looks at the company's fundamentals: a set of indicators used to assess a company's financial well-being. Fundamentals can be used to determine how profitable a company has been to date, as well as giving an idea of the general state of its finances. The kinds of question it aims to answer include:

How much does the company sell and earn? (sales & income)

How fast is the company growing? (sales growth & income growth compared to industry)

How profitable is the company? (profit compared to industry over 1 and 5 years)

How is the company's financial health? (debt/equity ratio compared to industry)

Some investors use a company's past price performance as an indication of future performance. Many will argue that past prices have no bearing on future prices; likewise, some will argue that a company that has performed well to date should perform well in the future. This page therefore gives an overview of the stock's performance, measured as price change over the past 1, 3 and 12 months.

Following on from the fundamentals, the next section looks at the likely future price of the investment. Using the company's price-to-earnings ratio along with analyst expectations, an estimate is given of how the company is likely to perform over the coming two years.

A company's share price can also be affected by a number of social factors, such as news stories relating not only to the company itself but to general economic conditions. An extreme example of this is the Northern Rock bank crisis17, which saw the share price lose 30% of its value overnight. This dramatic drop in price was initiated after it emerged that the company had sought a loan from the Bank of England as a result of difficult financial conditions. Despite the fact that the fundamental business was sound, the panic that ensued as customers withdrew savings caused the market price to freefall. Recognising the importance of financial news, MSN has added a catalysts section to the wizard, which details any company-specific news stories that could impair or improve confidence in the company.

Finally, another predominant task in the decision process is considered: comparison. Looking at a single company profile can only impart information in a single context; to get meaning from this data a comparison is required, and in this case MSN allows comparative analysis with up to two other company profiles.

16 http://uk.moneycentral.msn.com/investor/research/wizards/srw.asp?Symbol=GB%3Abp%2E

17 http://news.bbc.co.uk/1/hi/business/7007076.stm

2.3.3 Functional Requirements

From our initial investigation it is clear that, in terms of making wise investments, knowledge is key. As J.K. Lasser's18 observes of Warren Buffett, one of America's most successful investors:

'He will seek out every last bit of information he can get, whether it's a company's return on equity or the fact that the CEO is a miser who takes after Ebenezer Scrooge himself.'

Using the MSN wizard as a guide, the functional tasks can be broken down as follows:

Determine the profitability of a company

Determine the return on investment

Determine the risk of the investment

Determine the value of the company

An insight into exactly how the data is analysed can also be gained. Most numerical indicators are analysed in the following ways:

Value in relation to highs and lows

Value in comparison with a base value such as market or sector

Difference between two values, spreads, rate of change

Trends and direction

Identification of changes in trend, turning points

The main functional requirements can be grouped into two main categories. The first is the retrieval and storage of data from the World Wide Web for analysis. Secondly, to make decisions the data must be analysed; this will involve some or all of the tasks described above, which investors already perform on the various sources available. A graphical interface is proposed which will allow users to explore and display the retrieved data in different ways to gain a better understanding of its meaning.

Each top-level requirement is investigated in turn to generate lower-level requirements.

18 Pick Stocks Like Warren Buffett, Lasser, J.K.

2.4. Data Retrieval

There is a wealth of information available to the investor via the modern Internet, and many companies have emerged which aim to provide content to investors for analysis, through sites such as MSN Money19, Digital Look20 and Self Trade21. As we have seen, however, relevant information can come from a wide range of sources, and accessing all these resources manually involves searching and browsing for content. Even with a comprehensive bookmark list of sites, this activity is time consuming and laborious. There is a requirement, therefore, to programmatically extract and consolidate this information.

19 http://money.uk.msn.com/

20 http://www.digitallook.com

21 http://www.selftrade.co.uk/

2.4.1 Web Content Mining

Web content mining is concerned with discovering information from the many sources available on the web22. Using data mining techniques, content can be analysed and extracted for use in other applications.

One problem with using a data repository as vast as the Internet is the dynamic nature of the content. In order to retrieve data in any circumstance, an application needs to know where to look and needs a reference for what it is looking for. In the context of the World Wide Web we are dealing with pages of content written in a range of formats; ASP, JSP and HTML pages may change in structure at any time and may not follow the strict rules associated with markup languages.

A further complication is that HTML generally doesn't contain any type information, so content will almost always be represented as a generic string type. This poses issues when trying to extract useful information for use by a strongly typed language such as Java.

Despite these issues, there are techniques and programs which solve these problems.

22 Web Content Mining with Java, Loton, T.

2.4.2 Extraction Techniques

A basic technique for retrieving web-based content is screen scraping. Screen scraping involves extracting data from its final output format, usually the visual display of the program being scraped; in the context of the web this means taking content from the browser directly. This can be achieved by a number of methods, such as regular expressions or dedicated APIs. The technique has limitations, however: because the data being extracted comes from a format designed with human readability in mind, additional processing is required to remove styling elements, and the data itself will not necessarily be structured in a way suitable for use by other programs, so contextual information must be added later.
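As an illustration, the sketch below shows the regular-expression flavour of scraping in Java. The HTML fragment and pattern are hypothetical; a real page's markup could change at any time, which is precisely the brittleness described above.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScrapeExample {
    public static void main(String[] args) {
        // Hypothetical fragment of a quote page; real pages vary and change without notice.
        String html = "<td class=\"label\">Last trade:</td><td class=\"value\">605.00p</td>";
        // The pattern is tied to presentation markup, so it breaks if the page layout changes.
        Pattern p = Pattern.compile("Last trade:</td><td class=\"value\">([0-9.]+)p");
        Matcher m = p.matcher(html);
        if (m.find()) {
            System.out.println("Extracted price: " + m.group(1)); // prints 605.00
        }
    }
}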

Tree builders are aimed specifically at web page extraction and take advantage of the markup language's structure. A tree builder attempts to create a tree representation of a web page in memory by matching start and end tags in the target document; the program then builds a representation of the structure in order to provide a navigable context. The designer of the particular extraction program dictates the way in which the tree is built and how extensively it caters for specific tag libraries. Once a tree representation has been created, data can be extracted based on its location in the document. This method is useful for retrieving data from many pages which have identical layouts for different content, such as stock prices, but can only work with supported formats.

The W3C introduced the Document Object Model23, or DOM, to address these issues; in their own words:

'The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content'

The introduction of this standard meant that API and program writers had a common interface to work from. Parsers can thus take the tree builder concept to the next level by building a DOM representation of the page in order to extract its content.
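As a minimal sketch, the DOM API shipped with Java can be used as follows. The snippet and element id are hypothetical, and a real page would usually need tidying into well-formed XHTML before DOM parsing.

import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomExample {
    public static void main(String[] args) throws Exception {
        // A well-formed snippet; real HTML usually needs cleaning before DOM parsing.
        String xhtml = "<html><body><span id=\"last\">605.00</span></body></html>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xhtml.getBytes("UTF-8")));
        // Navigate the tree: take the text content of the first <span> element.
        String price = doc.getElementsByTagName("span").item(0).getTextContent();
        System.out.println("Extracted price: " + price);
    }
}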

2.4.3 Examples

Implementation of a fully-fledged extraction program is time consuming and not the main focus of this project; there are many freely available programs for this task, two notable online examples being Yahoo Pipes24 and Dapper25.

Yahoo Pipes is a Web 2.0 application available exclusively on-line; it relies on structured data in the form of XML, RSS feeds and JSON as its target content types. The site consists of a graphical interface in which users add modules and connect them to create a customized output from existing web pages.

23 http://www.w3.org/DOM

24 http://pipes.yahoo.com/pipes/

25 http://www.dapper.net/

The modules themselves perform various tasks, affording users control over the data retrieved from selected URLs. The output is then displayed as a standard HTML page, which can be viewed by anyone who logs into the site.

Example: creating a simple RSS feed aggregator. The 'Fetch Feed' module is used to retrieve news stories from the BBC's business feed; this is simply connected to the output module.

Figure 5: Simple feed to retrieve RSS from the BBC

Multiple feeds can be combined in the Fetch Feed module, and a filter module is added to allow users to search the feeds for specific terms. A search module is added to provide user input on the main page.

Figure 6: Simple aggregator to combine two feeds

A search term box is added in the above example to filter only news items of interest from the three selected news feeds.

Figure 7: Output page for the aggregator with search term box

Other modules can be used to create more complex pipes: XML data can be extracted directly and manipulated, filtered or combined with other web sources to create useful pages. However, the application is limited to working with live data and its output is restricted to the standard output page; in addition, few sources of useful data are freely available in XML format.

2.4.4 Dapper.Net

Dapper (a concatenation of Data Mapper) is another online application that allows users to extract content from anywhere on the net and output it in various formats, including XML, JSON and RSS feeds. Dapper also provides a Java API allowing developers to connect their programs to Dapper to retrieve the extracted content.

Dapps are small retrieval applications created using the main site. Each Dapp is created to parse a specific web page. Initially this is achieved via a virtual browser within the site, whose user interface allows web content to be selected for retrieval; in the example below the last trade price element is selected. Each selected element can have some basic manipulation applied to remove preceding or trailing strings; in this case the trailing 'p' is removed.

Figure 8: Dapper UI showing selected content

Any number of elements can be added. Once the content has been selected, the user can add field names and group the output; these are reflected in the resulting XML output.

Figure 9: Preview showing output

Dapper is flexible enough to allow the content that is retrieved to be modified at a later date. The addition of the Java API, allowing external programs to interface with Dapper, makes it an ideal solution to the retrieval problem.

2.5. Data Storage

The second top-level requirement of the proposed design is the storage of the data retrieved by Dapper. The output format from Dapper is selected when the Dapp is created; the user has several options, including RSS feeds, JSON and standard HTML. Since we are using the data in another application it makes sense to retrieve it as XML.

2.5.1 XML

XML is a standard for data exchange that has become popular in desktop applications for configuration files, as well as on the web for storing and exchanging data. XML can be thought of as data about data: not only does it contain the actual data, but also contextual and structural information.

XML has many advantages. First is its high portability between applications and across platforms; the fact that it has been a W3C standard since 1998 means that a great many applications and application interfaces are available. For the example Dapp created in the previous section, the XML output would look as follows (the actual output has been simplified to show only the elements of interest; element names are indicative):

<PriceData>
  <dappName>MSNPriceData</dappName>
  <sourceURL>http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?symbol=GB:BP.L</sourceURL>
  <version>1.233</version>
  <accessed>2007-07-29 15:59:25</accessed>
  <last>605.00</last>
</PriceData>

Although there is only one actual piece of data, the last price, the Dapp gives plenty of other information within the XML document, such as the source of the data, when it was accessed and the name of the Dapp that accessed it.

The structure of XML is strict, in that every start tag must have a corresponding end tag and each document must have a single root element. In this example the <PriceData> tag is the root and all the other tags are nested within it. This characteristic allows logical grouping of elements in hierarchies.

2.5.2 XPath

XPath is a query language that enables the inspection of XML files. The language is a W3C standard and works on a hierarchical basis similar to a file system: an XPath expression navigates through the document structure to a particular node or set of nodes, depending on how far down the tree the path goes. This adds an interesting capability to XML documents, in that they can be treated as a very simple database provided an XPath interface is available.

For the document above, consider the following XPath expression:

//PriceData/last

The double slash at the start tells the processor to match <PriceData> elements anywhere in the document; the following step then navigates to the <last> element, which is a child of <PriceData>. The result is 605.00, the content of our <last> element.
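As a minimal sketch, the same expression can be evaluated from Java using the standard javax.xml.xpath API (the file name price.xml is assumed):

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathExample {
    public static void main(String[] args) throws Exception {
        // Parse the price document produced by the Dapp (file name assumed).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse("price.xml");
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Evaluate the expression discussed above; returns the text content of <last>.
        String last = xpath.evaluate("//PriceData/last", doc);
        System.out.println(last); // 605.00
    }
}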

2.5.3 Storing XML

XML on its own cannot provide a solution which will fully replace a relational database; although in theory the data could be extracted and continually added to one large XML file, the problems of organization, persistence, availability, security, and efficient search and update would still exist.

There is a need, therefore, to use an RDBMS to store XML data, and a number of possible solutions are available. One solution involves storing each XML document directly as a file within the database; however, this disregards the logical structure of the XML when performing queries on the resulting table. Another solution would be to create further reference tables to store some of the more important structural information about a document, which can then be queried; this will not cope well with changes to document structure, since the underlying tables will need to be updated to reflect such changes.

To gain the full advantage of XML, the document would therefore need to be decomposed before insertion into the database and then recomposed when it is extracted. XML schemas could also be used to ensure the structure is maintained. Although the database can now provide the same level of logical information as the original document, there are performance ramifications.

2.5.4 XML Databases

XML databases aim to give the best of both worlds. A native XML database allows the storage of individual documents in collections, which can be queried and updated using XPath and XUpdate (another standard, for performing updates on XML). Collections are more versatile than a traditional RDBMS in that they can store a set of generic XML documents regardless of whether they share the same structure. Collections can also be stored within collections, to provide further levels of grouping and to allow queries over multiple sources.

Apache Xindice is a Java implementation of a native XML database according to the XML:DB.org specifications. Xindice runs as a web application in a suitable container such as Tomcat; the way in which the database is accessed and added to is up to the designer of the application. Since Xindice is Java based there is a substantial API supporting most of its functions, although it is also possible to control it via a command line interface.

Because it is packaged as a web application, collections can be viewed via a web browser:

Figure 10: Xindice debug tool showing a collection of XML files

Xindice neatly answers our second requirement, to store the retrieved data, since this is already in XML format courtesy of Dapper. It also means we need not worry about tailoring for changes in the structure of incoming data, and a handy interface is provided for checking up on the collections.

2.6. Transforming XML

The final requirement is to represent the retrieved data in a graphical format; again the W3C and XML standards provide the answer. Two standards exist which can address the problem: XSL and SVG.

2.6.1 XSL

XSL stands for Extensible Stylesheet Language; XSL is to XML what CSS is to HTML. The W3C continues its mission to separate data from presentation with XSL Transformations, or XSLT for short. XSL allows designers to dynamically change the representation of XML data into other formats such as HTML and SVG.

Using our example output file from before, we add an extra line to reference the style sheet (the style sheet file name is illustrative):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="price.xsl"?>
<PriceData>
  <dappName>MSNPriceData</dappName>
  <sourceURL>http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?symbol=GB:BP.L</sourceURL>
  <version>1.233</version>
  <accessed>2007-07-29 15:59:25</accessed>
  <last>605.00</last>
</PriceData>

In this case we simply want to display this data in an HTML file along with some other information; a minimal style sheet to do so would look as follows:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head><title>Latest Price</title></head>
      <body>
        <p>Latest Stock Price: <xsl:value-of select="//PriceData/last"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

The style sheet uses our XPath expression to reference the content to display; the resulting HTML file looks as follows:

Figure 11: Result of XSL transform in Firefox

The advantage of this is the separation of data from presentation: we could use the same style sheet over and over again to display the price of different stocks.

XSL requires a parser to transform XML data. Most browsers support this as standard, so XML can be styled on the client side to produce the desired result; in the case of Firefox, Expat is used. It is also possible to style the data on the server side, using a third-party parser such as Apache Xalan, before passing the resulting transformed document to the client, in which case the client simply receives the HTML representation.
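A minimal server-side sketch using Java's standard transformation API, which Xalan implements (file names are illustrative):

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformExample {
    public static void main(String[] args) throws Exception {
        // Compile the style sheet once; it can be reused for many price documents.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("price.xsl"));
        // Apply it to the XML data and write the resulting HTML to a file.
        t.transform(new StreamSource("price.xml"), new StreamResult("price.html"));
    }
}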

2.6.2 SVG

SVG stands for Scalable Vector Graphics26, another W3C standard, which extends XML to a graphical format. SVG aims to address some of the current issues with web-based images, such as file size and varying screen resolutions. Vector graphics are images generated from a series of vectors drawn between defined co-ordinates; the data required to draw the image is stored as XML markup using SVG tags. One advantage of this format is the ability to scale the image without loss of quality or pixelation. One drawback is the need for a plug-in to be installed in the client browser: although Firefox and Opera support SVG as standard, IE still requires the Adobe plug-in.

A further advantage is that SVG is part of the W3C recommendations, so it can be coupled with XSL to generate graphics from XML, making it ideal for representing numerical data graphically.

Returning to our previous example, we use the same XML/XSL combination to draw a simple box that represents the price of a security. The XSL to achieve this would be as follows:
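(A minimal sketch, assuming the PriceData document above; the co-ordinates and colour are illustrative.)

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2000/svg">
  <xsl:template match="/">
    <!-- Store the price in a variable so it can be used inside attribute values. -->
    <xsl:variable name="price" select="//PriceData/last"/>
    <svg width="200" height="200">
      <!-- A box whose dimensions are the stock price divided by 10. -->
      <rect x="10" y="10" width="{$price div 10}" height="{$price div 10}" fill="blue"/>
    </svg>
  </xsl:template>
</xsl:stylesheet>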

26 http://www.w3.org/TR/SVG11/

The resulting SVG output is admittedly not very interesting; the dimensions of the box in this case are determined by the stock price divided by 10:

Figure 12: Simple box representation of a stock price

The XSL is slightly different in this case because we need to use our data as a value within the SVG markup; to do this, an XSL variable can be used to temporarily store the data so that it can be used in the transform.

XML, XSL and SVG together make a neat set of standards which answer our data presentation problem; once a suitable XSL template has been created it can be reused wherever required.

    2.6.3 XForms

    Xforms (short for XML forms) is one of the latest standards from W3 pitched in there

    own words as the latest generation of Web Forms to replace the outdated HTML form27.

    Xforms aim to make the task of creating web forms easier with many of the standard

    tasks involved in such an exercise incorporated into the specification; retrieving and

    saving data from local files, validation of user inputs and dynamic content are just a few

    examples. One of the main advantages of Xforms however is the ability to access and

    update XML content and provide logical bindings between data, even in separate XML

    files. Xforms also aim to provide a better user experience with some AJAX like

    functionality built in.

    Xforms are written in XML using Xform tags they access content in other XML files using

    the concept of bindings along with XPath to navigate the documents. A further benefit is

    the ability to make asynchronous submissions from the form without any laborious

    Javascript.

    27

    http://www.w3.org/TR/xforms/

    http://www.w3.org/TR/xforms/

  • 29

The following example illustrates a simple XForm. The XML file from the previous example is used to write a form giving the user access to the data (the markup below is a minimal sketch using XForms 1.0 elements; file names are illustrative):

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <title>My First Xform</title>
    <xf:model>
      <!-- The data model is referenced from an external file. -->
      <xf:instance src="price.xml"/>
      <!-- Saves the (possibly edited) instance back out as data.xml. -->
      <xf:submission id="save" method="put" action="data.xml"/>
    </xf:model>
  </head>
  <body>
    <xf:group>
      <xf:label>My XML Data</xf:label>
      <xf:input ref="/PriceData/last">
        <xf:label>Data:</xf:label>
      </xf:input>
      <xf:submit submission="save">
        <xf:label>Save</xf:label>
      </xf:submit>
    </xf:group>
  </body>
</html>

The above form will appear as follows in an XForms-enabled browser:

Figure 13: Our price data from before appears, but can now be edited

With XForms the designer defines a model representation of the data; this can be programmed directly into the form or referenced from an external file, as in the above example. Here the submission element tells the forms processor to save the file to the local file system as data.xml.

A further advantage of XForms is that it can access any other XML standard markup, such as XSL. Coupling these standards enables the form not only to access XML data but also to manipulate XSL files and therefore change the resulting SVG output.

3. Specification

Now that the initial requirements have been identified and the relevant technologies researched, the problem can be assessed in more detail; to generate further requirements, the problem statement is updated and the stakeholder analysis revisited.

3.1. Problem Statement

The initial problem statement was the retrieval and representation of data from the Internet. We now have to consider how this will be solved using the technologies identified. Specifically, the inclusion of Dapper adds additional functions and requirements from the point of view of administering and running the retrieval.

3.2. Stakeholder Analysis

The initial stakeholder analysis identifies the primary, secondary and tertiary stakeholders:

Primary stakeholders: Administrators

User profile: The administrators could also be private investors. At present the assumption is made that some external management is required for the site, whether by the investor using the site or by a third party.

Role: Ensure that errors caused by external factors, such as server downtime or changes to site structure, are dealt with; input will be required to respond to such problems and update Dapps as necessary.

Goals: Browse web sources for relevant information. Identify information which is of interest. Manage and update Dapps. Build up and maintain collections of data sources. Schedule tasks for Dapps to perform. Manage collected data.

Secondary stakeholders: Investors and end users

User profile: Investors and front-end users who will access the data retrieved via the graphical interface.

Role: The data that is retrieved, and the format in which it is eventually stored, will affect the people who use that data. Investors want information as soon as it is available, and spending time searching for this information is costly, both in terms of the investor's time and in their ability to make informed decisions.

Goals: Gain an overview of all relevant information relating to current or potential future investments. Select the output format for the data. Filter and search data. Extend administrator goals.

Tertiary stakeholders: Content owners

User profile: Web masters and web content owners.

Role: Maintaining web pages and content.

Goals: Attract users to their sites and, in some cases, generate revenue through advertising or subscription.

3.3. User Goals

The high-level goals identified from the problem statement and stakeholder analysis are used to define the top-level use cases:

Manage and update Dapps.

Build up and maintain collections of data sources.

Manage collected data.

Select different views of the data.

Filter and search data.

3.4. Use Cases

Figure 14: Use Case Diagram

4. System Design

4.1. Methodology

The design methodology used is a top-down, modular approach to development. Starting with high-level use cases, the interfaces and main functionality are determined; from here, functional requirements are elicited as separate modules based on their intended tasks. The previous sections have outlined the various specifications available to answer our three top-level requirements: data retrieval, storage and transformation. These standards follow a strict Model-View-Controller paradigm, so it makes sense to extend this paradigm to the whole application.

4.2. Required Technologies

Before development, some base technologies are required to support the system. Xindice runs as a web application in a suitable container; in this case Apache Tomcat is chosen. Once Xindice has been downloaded and unpacked, it is deployed to Tomcat and tested using the appropriate URL, in this case http://localhost:8282/xindice/?/db.

The top-level collection in Xindice is called db; the question mark indicates the debug page, which is automatically loaded when Xindice is accessed using the base URL. This is the only user interface provided as standard for Xindice: XML files can be viewed via this tool but not added or manipulated.

XForms and SVG cannot be viewed in all browsers by default. Extensions are required for most, and the level of functionality supported differs between implementations. Firefox is therefore chosen, since it provides good support for SVG, and the Mozilla XForms extension implements most of the XForms 1.0 functionality despite still being in a development stage.

To aid development the Eclipse IDE is used, since it supports most of the standards involved, with the exception of XForms and SVG. Firefox provides an error console which is useful for debugging XML content, and as such can provide useful feedback on SVG, XML, XSL and XForms errors.

4.3. Decisions

Although much of the necessary functionality can be implemented on the client side using browser extensions, a back end is still required to interface with Dapper and Xindice.

Java servlets were chosen to address this requirement, partly due to the Java API support for both Xindice and Dapper, but also because Java has plenty of XML and DOM APIs to allow XML data handling. XForms can post data as XML files directly to the server; to handle these files, the server-side application must therefore be able to access and manipulate XML.
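A minimal sketch of such a servlet, parsing an XML document posted by an XForms submission (the class name is illustrative):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class SubmissionServlet extends HttpServlet {
    // Receives an XML document posted by an XForms submission.
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(req.getInputStream());
            // The parsed DOM can now be inspected, transformed or passed on to Xindice.
            resp.setContentType("text/plain");
            resp.getWriter().println("Received root element: "
                    + doc.getDocumentElement().getNodeName());
        } catch (Exception e) {
            throw new ServletException(e);
        }
    }
}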

4.4. Modules

To simplify the design process the application is split into smaller modules, each addressing a specific function. Keeping with the grouping used so far, the main functions are data retrieval, storage and presentation.

4.4.1 Dapper Module

Much of the data retrieval requirement has been addressed by Dapper; however, Dapper.net provides a means to create Dapps but not to control their execution. The Dapps themselves can only execute against one URL at a time, whereas our requirement is to extract data from different sources and also from multiple pages in the same resource, which involves specifying parameters directly within the URL.

Our first requirement is therefore a means to execute a Dapp and to specify a URL for it to work on. From our use cases we also have the requirement to maintain collections of resources for the Dapp to retrieve from. Finally, there are two further requirements: to select the Dapp to be used, and to specify the storage location for the data.

The Dapper API is unfortunately very basic and looks incomplete; the interface options for Java are limited, so much of the above functionality needed to be implemented here.

Following our Model-View-Controller ideal, the functionality is divided. Implementing the data aspect of the module first, an XML file is created to store the model view of each Dapp. Unfortunately the Dapper API does not provide an obvious means to elicit certain parameters from the site, so the model will be our means of representing each Dapp. From the initial requirements, the following information needs to be stored (a sketch of such a document follows this list):

Dapp name

Storage location

Collection of resource locations (URLs)

    Using XML as a storage medium in this way not only makes sense because XML support

    is required for the other aspects of the design so extra effort is saved on implementing

    another means to store the configuration data.
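As an illustration, a minimal DOM4J sketch of assembling such a configuration document (the element names and values are illustrative, not the project's actual schema):

    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;

    public class DappConfigExample {
        public static Document buildConfig() {
            // Root element holding one Dapp definition.
            Document doc = DocumentHelper.createDocument();
            Element dapp = doc.addElement("dapp");
            dapp.addElement("name").setText("MsnMoneyQuotes");
            dapp.addElement("storageLocation").setText("/db/stocks");
            // The collection of resource locations the Dapp will be run against.
            Element urls = dapp.addElement("urls");
            urls.addElement("url").setText(
                "http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:{var}");
            return doc;
        }
    }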

The View aspect is taken care of by Xforms, again to take advantage of the XML standards and functionality on the client side. With Xforms, users can manipulate the XML files directly, addressing our requirements to add and update lists of resources.

The above provides a nice interface for some XML but won't actually do anything on its own, so the Controller aspect is required. Since the storage medium for the retrieved data is Xindice, which needs to run on Tomcat, a servlet container, it is logical to use Java servlets for our back-end functionality.


    4.4.2 Database Connection Module

To access Xindice, methods are required first to connect to a collection and then to perform queries on the data; again a servlet module is used for the Controller aspect. The data is stored in collections within the database. The top-level collection db contains database-specific files such as meta information and should not be used to store content, so further collections need to be created. Our first requirement is therefore to provide the ability for users to create collections within Xindice.

Once data is retrieved by Dapper it needs to be inserted into a specific collection; although the data retrieval task is handled by the Dapper module, the retrieved data is passed to the database connection for insertion. Xindice allows the programmer to specify a unique id for the document being inserted into the database; however, since we will be querying the XML content directly using XPath, a system for identifying documents by their id is not required. In addition, Xindice has a mechanism in place to automatically assign unique ids to files as they are added, which saves some development work.

Finally, a requirement exists to query the collections. The Xindice API provides a query engine which accepts an XPath string as input; the issue is therefore to provide the user with an interface that can be translated into an XPath query whilst remaining user friendly.
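Xindice exposes these operations through the standard XML:DB API; the following minimal sketch of connecting, inserting a document with an auto-generated id and running an XPath query is indicative only (the collection name, document content and query are illustrative):

    import org.xmldb.api.DatabaseManager;
    import org.xmldb.api.base.Collection;
    import org.xmldb.api.base.Database;
    import org.xmldb.api.base.ResourceIterator;
    import org.xmldb.api.base.ResourceSet;
    import org.xmldb.api.modules.XMLResource;
    import org.xmldb.api.modules.XPathQueryService;

    public class XindiceExample {
        public static void main(String[] args) throws Exception {
            // Register the Xindice driver with the XML:DB DatabaseManager.
            Database db = (Database) Class.forName(
                "org.apache.xindice.client.xmldb.DatabaseImpl").newInstance();
            DatabaseManager.registerDatabase(db);

            // Connect to a content collection below the top-level db collection.
            Collection col = DatabaseManager.getCollection(
                "xmldb:xindice://localhost:8282/db/stocks");

            // Insert a document; passing null lets Xindice assign a unique id.
            XMLResource res = (XMLResource)
                col.createResource(null, XMLResource.RESOURCE_TYPE);
            res.setContent("<quote><symbol>TSCO</symbol><last>410.5</last></quote>");
            col.storeResource(res);

            // Query the collection with an XPath expression.
            XPathQueryService qs = (XPathQueryService)
                col.getService("XPathQueryService", "1.0");
            ResourceSet results = qs.query("//quote[symbol='TSCO']");
            for (ResourceIterator it = results.getIterator(); it.hasMoreResources(); ) {
                System.out.println(it.nextResource().getContent());
            }
            col.close();
        }
    }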

To provide the XSL functions, a server-side component is proposed to work with the other servlets. A third-party processor such as Apache Xalan is required to achieve this. Although the browser can take care of processing XSL, some extension functions may be required to provide more robust support for numeric processing; many extension libraries exist which can be used for this purpose.
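Such a server-side transform can be invoked through the standard JAXP/TrAX interface, which Xalan implements; a minimal sketch, with hypothetical file names:

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class TransformExample {
        public static void main(String[] args) throws Exception {
            // Xalan is picked up as the default TransformerFactory when on the classpath.
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("icons.xsl"));
            // Transform the stored query output into an SVG document.
            t.transform(new StreamSource("queryResult.xml"),
                        new StreamResult("output.svg"));
        }
    }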

    From these initial design considerations a conceptual architecture was drawn up showing

    the relationship between the various components.


    4.5. Proposed Architecture

    Figure 15: Proposed System Architecture

    5. Implementation

The implementation approach was again top-down: first creating the user interfaces to address the user requirements, then developing the Java code to accommodate the intended functions. The final implementation differed slightly from the original design concept as problems and improvements were discovered throughout the implementation phase of the project.

    5.1. Issues and Design Changes

Two changes were made to the initial concept. Firstly, it was originally envisioned that the final system would behave much like a real-world web application, with login details and user-specific preferences. It was decided, however, that this kind of functionality did not add any major benefit to the project, nor did it help achieve the initial goals.

The second change was to the retrieval aspect. It can be seen from the conceptual architecture that the intention was to allow XSL transformations to be made on the data before it was inserted into the database; unnecessary data could be removed and additional information added to improve document retrieval. It was later decided to drop this function since the benefits would be minimal.

5.2. Module Implementation

    As with the overall architecture some changes were made during the development

    process to accommodate new information as it became available. The implementation of

    each module is discussed in detail.

    5.2.1 Utilities Package

The utilities package was added to provide some basic functions to each of the other modules rather than repeating code. Two core functions that both the database query and Dapp manager classes required were the ability to access Xindice and to manipulate XML documents using DOM4J.

The Database Connector class provides basic database functions such as connection, collection discovery, and insertion and retrieval of documents. Queries are also executed via the Database Connector by passing an XPath string expression to the executeQuery() method. Although no particular DOM implementation is favoured by any of the APIs, DOM4J was chosen because of the range of available functions. Within all the modules, XML files are manipulated or passed as DOM4J implementations of the Document interface.

There were few issues with the Database Connector because much of the functionality is available via the Xindice API and little additional functionality had to be coded.

    The XML helper class was implemented to carry out the XML document processing which

    became a common requirement between classes. The class handles saving, reading and

    converting XML between formats.

During the development process it became clear that the generic typing of the data retrieved by Dapper was going to cause problems with XSL. Some of the SVG transforms required numerical data without any formatting information included; for example, 1000 is retrieved as 1,000. To ensure the retrieved data is suitable for use with XSL, regular expressions and additional data validation had to be added, making the retrieval process more complicated. The solution was to add a user-defined content type field to the admin page so that users could specify what kind of data they were expecting to retrieve. The selection is used to apply regular expressions to the input strings to ensure the data will work with XSL.
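A minimal sketch of this kind of type-driven clean-up is shown below; the content type names and patterns are illustrative rather than the actual implementation:

    import java.util.regex.Pattern;

    public class ContentTypeCleaner {
        private static final Pattern NUMERIC = Pattern.compile("-?[\\d,]+(\\.\\d+)?%?");

        // Normalise a retrieved string according to the user-selected content type.
        public static String clean(String raw, String contentType) {
            String value = raw.trim();
            if ("number".equals(contentType)) {
                if (!NUMERIC.matcher(value).matches()) {
                    throw new IllegalArgumentException("Not numeric: " + raw);
                }
                // Strip separators and symbols XSL cannot handle numerically,
                // e.g. "1,000" becomes "1000".
                return value.replaceAll("[,%]", "");
            }
            return value; // plain text passes through unchanged
        }
    }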

Another feature of Xindice is that the XML files contained within a collection can be updated using XUpdate, so it would have been more elegant to store the relevant configuration files in a separate user collection within Xindice. The issue with doing this is that frequent database queries would need to be made because of the dependency between the Xforms and the XML files. It was therefore decided to keep the files static on the server and manipulate the documents using the XMLHelper class; methods were accordingly added to add and remove node sets from the document.
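A minimal DOM4J sketch of such helper methods (the class and method names are illustrative):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.dom4j.Document;
    import org.dom4j.Element;

    public class XMLHelperSketch {
        // Append a new child element with the given text under the root element.
        public static void addNode(Document doc, String name, String text) {
            doc.getRootElement().addElement(name).setText(text);
        }

        // Remove every child element of the root with a matching name.
        // A copy of the list is taken so elements can be detached safely mid-loop.
        public static void removeNodes(Document doc, String name) {
            List matches = new ArrayList(doc.getRootElement().elements(name));
            for (Iterator it = matches.iterator(); it.hasNext(); ) {
                ((Element) it.next()).detach();
            }
        }
    }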

    5.2.2 Dapp Manager Package


Despite the functionality provided by Dapper, the execution and management of the Dapps became more involved than expected. A DappImplementation class was required to store and manage the data sent via the Xforms, and an additional URLlist class was implemented to manage the list of variables being used for retrieval. Finally, a servlet is used to access the objects.

The XML data submitted by the Xform is used to instantiate the DappImplementation and URLlist objects. A base URL and a list of variables are specified by the user and stored in the Dapp configuration XML file. Once submitted, the URLlist class is responsible for generating URLs and keeping track of the current progress. The URL is created by replacing a predefined marker in the base URL with a variable, as follows:

    http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:{var}

    becomes

    http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:TSCO
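A minimal sketch of this substitution over a list of variables (class and method names hypothetical):

    import java.util.ArrayList;
    import java.util.List;

    public class URLBuilder {
        // Expand a base URL containing a {var} marker into one URL per variable.
        public static List<String> expand(String baseUrl, List<String> symbols) {
            List<String> urls = new ArrayList<String>();
            for (String symbol : symbols) {
                urls.add(baseUrl.replace("{var}", symbol));
            }
            return urls;
        }
    }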

The URL can now be passed to the Dapp for retrieval. Once the Dapp has executed, the resulting XML file is validated to see whether the retrieval was successful and whether the data is valid against our regular expression list. A copy of the output data elements is queried during validation and added to the configuration file. This was not part of the original functionality but was added later to support the use of regular expressions for typing data; the added benefit is that it can be used as a list of query parameters for the Xforms interface.


It was the intention to use the URLlist class as a progress reporter for the Administration interface. At present a particularly long list of variables will take a while to execute, and it would be desirable to provide feedback to the user on progress, which could be achieved by a servlet and JavaScript periodically querying the list. However, this feature was omitted due to time constraints.

    5.2.3 DBQuery Package

    The Database Query package contains the classes required for not only the database

    queries but also the XML transforms. The two were packaged together because they are

    both used from the same Xform interface.

The function of sorting and querying the data could be achieved in a number of ways, but it was decided that the styling function should be kept separate from the database query function. One solution would have been to send the result of the database query directly to the client with a reference to the relevant style sheet and allow the user's browser to perform the transform. This solution, however, means that every time a user changes the way in which a document is styled, another query must be submitted to the database, in addition to parsing and styling the output again. Moreover, the fact that the client's browser only sees the result of the transform prevents the interface from inspecting the raw output, which is needed for some of the contextual information. The decision was therefore made to keep the data retrieval and styling tasks separate.

When a query is submitted, the DBQuery servlet builds an XPath expression from the user input and queries a specified collection. The resulting XML output is not sent to the client but stored on the server. Once complete, a submit is triggered automatically to the Data Styler servlet, which transforms the output file into an SVG document and again stores the result on the server. The interface reads the SVG directly from this file. The advantage of this set-up is that changes to the style sheet can be applied to the database output without submitting another query to the database. A further advantage is that the database output is now directly accessible to the Xform, which provides additional functionality such as listing the available query parameters.
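A minimal sketch of this two-stage flow, with the query result cached in a server-side file that restyling can reuse without touching the database (paths and names hypothetical; the transform call mirrors the earlier Xalan sketch):

    import java.io.FileWriter;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class QueryStylePipeline {
        // Stage 1: store the raw query result on the server.
        public static void storeResult(String xml) throws Exception {
            FileWriter out = new FileWriter("queryResult.xml");
            out.write(xml);
            out.close();
        }

        // Stage 2: restyle the cached result; this can be called repeatedly
        // with different style sheets without re-querying Xindice.
        public static void style(String styleSheet) throws Exception {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(styleSheet));
            t.transform(new StreamSource("queryResult.xml"),
                        new StreamResult("result.svg"));
        }
    }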

    5.2.4 Style sheet & Icon Design

The style sheet design proved to be the most difficult part of the implementation. The task was to provide a set of predefined graphical representations that users could select and manipulate via the main interface. As we have seen, XSL provides a mechanism to use XML data to transform graphics; this can be any SVG parameter: dimension, colour, opacity, shape, etc.

The concept of an icon is used to represent an individual data entity; in this example we are looking at individual shares as extracted by Dapper. The icon provides a graphical representation of one or more pieces of data. The number of parameters differs between icons, the simpler ones displaying only one piece of information as a change in one of the graphical aspects of the design.

The problem with this concept is providing a context for comparison. Because the data can be over an arbitrary range, we need some base to compare each item to. To get round this problem, each variable is presented as a percentage of the group's maximum. For example, if we want to display the last price, the style sheet first needs to know what the maximum price is in the data set.

This is achieved through the use of exslt:math, a set of extension functions which can be used in addition to XSL. In this case the math:max function is used to determine the maximum value of an element in a node set. Once this value has been calculated, the individual elements can be compared to it to work out where they are placed on the scale. Since each value can now be calculated as a percentage, the style sheet can derive a corresponding percentage of a graphical value. In Figure 16 the opacity of each box represents each element's last price in relation to the maximum price of the data set being viewed.
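The underlying arithmetic is shown below in Java purely for clarity; the style sheet performs the equivalent calculation with math:max:

    public class OpacityScaling {
        // Scale each value to a fraction of the group's maximum,
        // mirroring what the style sheet does with math:max.
        public static double[] scale(double[] values) {
            double max = values[0];
            for (int i = 1; i < values.length; i++) {
                if (values[i] > max) max = values[i];
            }
            double[] opacity = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                opacity[i] = values[i] / max; // e.g. last price 205 with max 410 gives 0.5
            }
            return opacity;
        }
    }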


Figure 16: Simple box icon designs, similar to the heat-map concept

This example is fairly simple and essentially the same as a heat-map; to provide more interesting graphics, more complex icons needed to be designed using the same principle.

To allow representation using different icons without lots of server-side processing, the functionality of Xforms is taken advantage of again. As we have seen, Xforms can access any XML-based content and as such can access and modify XSL. The basic parameters of each icon are stored in the style sheet as global parameters; by implementing a simple input control with a reference to these parameters, the shapes can be manipulated.

In Figure 16 three global parameters are available: zoom, resolution and text size. The zoom function simply scales the graphics up by increasing the relevant dimensions; here SVG proves its usefulness, as the graphics remain crisp no matter how far a user zooms in. The text size parameter is self-explanatory, although the same function can be achieved via most browsers.

Finally, the resolution parameter is added as an exaggeration function. In Figure 17 a slightly more complex graphic is illustrated; this time the icon displays the difference between two parameters as a sloping line. On initial testing of this model it became clear that for some data sets the difference in slope between stocks was negligible, making distinction between icons difficult. To address this issue an exaggeration parameter was added to allow the user to multiply the slope by a certain factor, making small differences more visible.

Figure 17: Rate of change icon showing the difference between two variables

The icons themselves are based on separate templates within the style sheet. The selection of the icon to be used is achieved via the Xform interface and a binding to the template reference, allowing the user to select any template from the list.

To improve usability, the style sheet interface needed to be dynamic, in the sense that the number of user-specified variables available for each template differs and, as a consequence, the variables have different meanings in the context of the current template. A separate style sheet configuration file is provided to the Xform and bound to the style sheet; the result is that the Xform knows which controls to display and when. Looking at Figures 16 and 17, it can be seen that both the number of inputs available and the labelling of these inputs differ between icons.

The positioning of the icons on the screen was another problem which took considerable time to resolve. The dynamic nature and scalability of the icons meant hard-coding the positions on the page was not an option; instead each position needed to be calculated based on a starting point, the size of the icons and the screen width.
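A minimal sketch of such a calculation, laying the icons out in rows (the parameters and names are hypothetical):

    public class IconLayout {
        // Compute the (x, y) position of the i-th icon given a starting point,
        // the icon size and the available screen width.
        public static int[] position(int i, int startX, int startY,
                                     int iconWidth, int iconHeight, int screenWidth) {
            int perRow = Math.max(1, (screenWidth - startX) / iconWidth);
            int x = startX + (i % perRow) * iconWidth;
            int y = startY + (i / perRow) * iconHeight;
            return new int[] { x, y };
        }
    }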


    5.3. Final System Architecture

Figure 18: Final system architecture showing changes

    5.4. Interfaces

The final interfaces were partly governed by the functional requirements but also constrained by the capabilities of Xforms. As mentioned earlier, the Administration Xform had a few additions to support the specification of basic type information for the retrieved content. Xforms can be styled using CSS in much the same way as HTML, although the process is not quite as straightforward and again relies on browser support; as such only basic styling was used, mainly for positioning elements on the user interface.

Another of Xforms' advantages is the ability to dynamically display content and controls without making requests to a server. This function is used on the admin interface to provide page-style navigation through the various Dapps the user has created; the left and right arrow icons navigate between the Dapps, updating the relevant fields. This ability is also demonstrated by the add and remove controls, which allow the user to add new Dapps or variables and likewise remove them. The changes made by the user still need to be saved. If the dapp.xml file were stored on the local file system this would be easy through the use of Xforms' built-in put submission; since we are running from a server, we need to submit the XML and use a servlet to save the changes. Although this is not a perfect solution, Xforms makes the submission asynchronously so the user is not affected too much.

Figure 19: Admin interface showing Dapp configuration data

There are some outstanding issues with the XPath navigation. The original intention was that the Xform should be able to identify or generate the required XPath to a specific element by inspecting the database output and the Dapp configuration file. The Xform cannot handle groupings of data in this way, however, and some additional path information needs to be added by the user. For example, we ideally want the user to be able to enter any parameter that is available as either a search parameter or a styling parameter; this information is taken from the Dapp and output XML documents stored on the server. These elements only store the lowest-level element names and as such cannot be passed as useful XPath parameters, since the full path is required; in this case we first need to access the parent element, Fundamentals. This is perhaps an oversight in the design, but some modifications can be made to rectify it by adding more contextual information to the dapp.xml file.
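To illustrate the issue, suppose the output documents store an element name such as LastPrice (a hypothetical name) beneath the Fundamentals parent. The stored name alone is not a usable query step; the query engine needs the full path:

    LastPrice                     (stored element name only; not a valid path on its own)
    //Fundamentals/LastPrice      (full path including the parent element)

Storing the parent context in dapp.xml would allow the Xform to assemble the second form automatically.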


Figure 20: A slightly different presentation approach, where the width of the ring signifies a value

It can be seen in the above illustration that two sets of submission controls are provided to the user: one for the database query and one for the data styler. The SVG result is loaded automatically from the server into a separate iframe; when changes are made to the style sheet, the Xform waits until the submission is complete and then refreshes the frame to update the graphic.

    6. Testing

In order to test how effective the design is at fulfilling the requirements, the testing is divided into three categories: component testing and usability testing determine how well the functional requirements are met, while speed and accuracy tests examine the initial hypothesis that a graphical interface will be of advantage to an investor.

    6.1. Component Testing

At the software level, unit tests were carried out on each component to ensure it achieves the desired functionality. Each functional requirement is tested in turn to ensure the final design satisfies the original specification.
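As an illustration of this kind of component test, a minimal JUnit sketch against the URL-expansion logic sketched earlier (class names hypothetical):

    import java.util.Arrays;
    import java.util.List;
    import junit.framework.TestCase;

    public class URLBuilderTest extends TestCase {
        // Check that the {var} marker is substituted for each symbol in turn.
        public void testExpandSubstitutesMarker() {
            String base = "http://example.com/quotes.asp?Symbol=GB:{var}";
            List<String> urls = URLBuilder.expand(base, Arrays.asList("TSCO", "BP"));
            assertEquals(2, urls.size());
            assertEquals("http://example.com/quotes.asp?Symbol=GB:TSCO", urls.get(0));
        }
    }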


6.2. Usability Testing

After sufficient testing of the base components was completed, the user interface had to be tested to determine how effective the design is in terms of usability and also whether the solution provides a proof of concept.

To test the usability of the system an observational approach was taken, based on Nielsen's five quality attributes28:

Learnability: How easy is it for users to accomplish basic tasks the first time they encounter the design?

Test subjects were not given any background on the program and were asked to try to interact with it. They were also asked to describe what they were thinking and any assumptions they had about the interface. The observer did not respond to any direct questions at this point, in order to gauge how effective the interface was at communicating functionality.

Efficiency: Once users have learned the design, how quickly can they perform tasks?

After the initial tests, users were given the opportunity to ask questions to get a better understanding of the interface; they were then asked to repeat specific tasks in order to assess how easy it was to perform specific functions.

Memorability: When users return to the design after a period of not using it, how easily can they re-establish proficiency?

Test subjects were at this point asked to return to the program after a period of time in order to assess how easy it was to remember the affordances of the interface.

Errors: How many errors do users make, how severe are these errors, and how easily can they recover from them?

An observational approach was again taken to note any mistakes the users made and their impact on the system.

Satisfaction: How pleasant is it to use the design?

Finally, test subjects were asked to rate on a scale of 1 to 10 how pleasant they felt the interface was to use.


    6.3. Speed & Accuracy Test

To test how well the system answers the initial problem, an assessment is made of how well users can gain insight into the data represented by the system. Two factors were investigated: speed and accuracy.

    Experimental Set-up

A set of 100 shares representing the FTSE 100 was selected, the data set being a snapshot of the market on a specific date. For each date, test subjects were asked to identify a value in the set, first on the graphical interface and then on a plain-text representation of the same data.

The ordering of the symbols was changed between tests to ensure subjects did not memorize the position of a particular stock. Subjects were timed to see how long it took to identify a particular value and then assessed on how accurate they were.

For the first two tests the relative-size graphical icon was used. This representation changes the size of the icon relative to a specified value.

Figure 21: Relative size box graphic

28 http://www.useit.com/

    In the first test users were asked to identify which icon they thought represented the

    highest and lowest value in a collection.

For the second set of data, test subjects were asked to identify trends based on the daily price movement. For this test the two-variable box representation was used.

Figure 22: Two variable box graphic showing the relative difference

Task: correctly identify the steepest trend up and down in a collection.

Figure 22 shows the basic two-variable icon; the slope of the line indicates the difference between the specified variables.

To test the effectiveness of this design, users were asked to look at the graphic and identify which stock they thought was falling the fastest and which they thought was rising the fastest.

Decision times were recorded in all cases for comparison with timings obtained using a text-only representation.


Figure 23: Two variables relative to a third

Task: correctly identify the indicator nearest its highest and lowest extremes.

Figure 23 shows one of the more complex icon designs. Similar in concept to the candlestick diagram, it aims to show the direction and rate of change of the daily price in relation to its year-to-date high. As with the previous icon, the slope of the line indicates the rate of change and the colour reinforces its direction. The position of the line in relation to its containing box signifies how close the current price is to the highest value it has reached over the past year.

To test the effectiveness at communicating this information, users were again timed and asked to identify the stock they thought was closest to its year-to-date high and the one furthest away.

Figure 24 shows the text-only interface, which was implemented as a style sheet template in order to keep the surrounding interface the same and change as few test variables as possible. The above tasks were all repeated on this interface, again changing the sort order of the data to avoid test subjects memorizing data locations.


Figure 24: Text-only representation of a variable

    6.4. Results

    Usability Test.

Learnability: After observing a set of five subjects it became clear that more contextual information was required for the controls. One test subject commented that it was not immediately obvious what function some of the controls performed. Another issue was the openness of some of the controls; for example, the zoom control can be set to any value the user wants, and it is not immediately obvious how large that will make the icons.

Efficiency: On an initial attempt with no instruction, some users had difficulty working out what the controls did; however, after a quick demonstration most could manipulate the data confidently.

Memorability: After a day the users were asked to return to the interface and try out some basic tasks to see how easy they were to repeat. Most users achieved this successfully, and the main difficulty seemed to be the initial usage of the interface.

Errors: The most common errors users made were to compare parameters which were not suitable for any logical comparison, and to select scales that caused excessive distortion of the graphics. The first issue is hard to rectify: since the user can define any data source, an assumption is made that they will pick resources suitable for comparison. The second issue can be rectified by the addition of stricter limits to the interface.

Satisfaction: The overall satisfaction rating was 6 out of 10 from our five test subjects. There is evidently room for improvement in the interface; however, some of the test subjects had no prior knowledge of stock market trading and as such the overall purpose and context of the application was new to them.

    Speed-Accuracy Test.

The results of the speed and accuracy tests were more promising: in 80% of the test cases the users' decision-making was faster than when using the text-only interface. The accuracy figures were, however, less conclusive, with both textual and graphical accuracy rates of 60%. We would expect the accuracy rates for the graphical representations to be similar or lower, because they are not as definite as numerical figures.

The test group could have been larger, and more testing in this area is needed before drawing definite conclusions on the effectiveness of the interface; however, the initial results tend to support the hypothesis that a graphical system is better in terms of gaining quick insights into large sets of data.

7. Conclusion

The application provides a basic answer to the initial requirements, albeit a simplified one, but could easily be extended to give a wider range of functions. In its current state it demonstrates that, by using the available web standards, a flexible system can be developed which allows data to be retrieved, transformed and represented on the web. Our test results indicate that the system can effectively impart large amounts of information quickly to the viewer; however, further work is required to improve the user interface, mainly in the area of contextual information.

In terms of expansion, a user can easily add any content they like provided Dapper.net can extract it successfully; there are, however, limitations to the data that can be viewed and the graphical icons that can be displayed. Going forward it would be beneficial to provide another interface which allows users to create their own icons based on the retrieved data, giving personalized graphical representations.

8. Bibliography

Cleveland, William S.: Visualizing Data. Murray Hill, N.J.: AT&T Bell Laboratories; Summit, N.J.: Hobart Press, 1993.

Ellinger, A. G.: The Art of Investment, 3rd rev. ed. Bowes and Bowes, 1971.

Harris, Robert L.: Information Graphics: A Comprehensive Illustrated Reference. New York: Oxford University Press.

