WORKSHOP ON SPATIAL DATA USABILITY

ABSTRACTS

WAGENINGEN UR CENTRE FOR GEO-INFORMATION

November 2001


UNDERSTANDING SPATIAL DATA USABILITY

Gary J. Hunter

Department of Geomatics, University of Melbourne, Victoria 3010, Australia

Email: [email protected]

ABSTRACT

In recent scientific literature, a number of researchers have made mention of a seemingly new characteristic of spatial data known as ‘usability’. This apparent property is also receiving mention in the data mining and knowledge discovery literature, so it would seem to be something that is not wholly restricted to the spatial domain and has much broader impact as well. While concepts such as the use and value, and diffusion of spatial information have been the subject of research since the late-1980s, the current references to usability clearly represent something that is both novel and different. Accordingly, the purposes of this paper are initially to understand what is meant by usability and to assess whether it is a significant concept worthy of more detailed scientific pursuit. If this is so, then the secondary aims of the paper are to identify the elements that comprise usability, and to consider what the related research questions might be and how an appropriate research agenda should be shaped.

Keywords: Spatial data, usability, research agenda.

1. INTRODUCTION

In the public announcement for this workshop on spatial data usability, mention was made of the well-known example that occurred almost 150 years ago when a London doctor, John Snow, combined spatial data relating to the locations of cholera deaths with the positions of water pumps in that city, to test his theory about the source and transmission of an outbreak of that deadly disease that killed 600 people in its first ten days (UCLA, 2001). That famous example is now taught to students worldwide in fields such as geography and epidemiology, and serves as a perfect example of how spatial data can be very effectively applied in critical situations.

Similarly, there have been more modern applications of spatial data which, although they do not have the same impact as Dr. Snow's work, have nevertheless proven to be extremely valuable. For instance, exploratory data analysis techniques have been used to locate previously unidentified cancer clusters (Openshaw et al., 1987), while the value-added linkage of electronic telephone directories with street mapping products is causing people to replace their hardcopy “YellowPages” telephone directories with enhanced digital “YellowMap” alternatives.

On the other hand, a bold initiative of the 1970s to provide on-line interactive color maps of statistical data as part of the U.S. White House Information System (the Domestic Information Display System—DIDS) was completely abandoned within the space of a few short years (Cowen, 1982). More recently, there have been cases reported in the past two or three years where a lack of faith in the reliability of outputs from environmental models that employ spatial data has caused governments to abandon major projects—in essence because they are unwilling to proceed with their decision-making because of major concerns about the trustworthiness of the scientific information presented to them (Beven, 2000). Of course, there are many thousands of applications of spatial data that fall between these two extremes, but it is these particular cases at the outer limits that are both exciting and distressing, and are therefore deserving of closer scrutiny.

So there seems to be a common link between these examples, involving some fundamental characteristic that has resulted in these spatial data applications being either very successful or unsuccessful. It would appear the cases all demonstrate either a very high or very low degree of data ‘usefulness’ or ‘usability’, which in turn produces very positive or very negative economic, social, environmental or scientific impacts. Our interest here lies in knowing exactly what it is that distinguishes these cases from other, perhaps more mundane, examples. For instance, is low usability caused by a poor choice of data, models and algorithms for the given application, or is it simply a matter of bad data quality? Alternatively, is high usability proportional to the degree of ‘interestingness’ or ‘unexpectedness’ in the data (as data miners would say), or the result of data integration and value-adding? Or are these differences caused simply by some unpredictable, indescribable phenomenon that produces such extreme examples? At this stage we do not know, but given the very large expenditure of resources nowadays on the development of spatial information products, it would seem to be a goal worth pursuing to ensure that all spatial data is as ‘usable’ as possible. Clearly, with a better understanding of data usability we might be able to increase the number of ‘successes’ and reduce the ‘failures’ in the application of spatial data. Accordingly, this paper seeks to provide a better understanding of spatial data usability. Following this introduction, it examines the meaning of the concept within a technological setting in both non-spatial and spatial contexts, and then investigates what fundamental elements comprise usability. Finally, the paper focuses upon what the research questions associated with usability might be, and what priorities a future research agenda in this subject should adopt.


Usability – a scientific or a procedural issue for Geographical Information Science?

Peter A. Burrough, Utrecht University.

The organisers of this meeting have set up five questions on the theme of “usability”. The aim of this talk will be to discuss each of these questions in order to determine the usefulness to GI-practitioners of pursuing this topic and attempting to arrive at formal, reproducible and useful conclusions.

1. What do we mean by 'usability'? The Collins English Dictionary (mid 1990s) says: usable: able to be used. It gives as nouns: usability, useability and usableness. This is different from useful: able to be used advantageously, beneficially, or for several purposes; helpful, serviceable. Noun: usefulness.

Therefore usability is the degree to which an object or unit of information serves a defined purpose. This could be indicated on a binary scale or on a gradual scale – degrees of usefulness or usability.

Usability depends on context – something could be usable for one purpose and useless for another.

Usability may also depend on availability, formatting, coding, language, source, provenance, history, etc; many of these aspects may prevent a potentially useful object/piece of information from being used/useful.

Usability is different from quality/accuracy. These may determine usability, but may also be irrelevant if the context is wrong. For example, you may buy a new tyre for the car. The tyre is of high quality, but will not be usable if it does not fit the wheels.

2. Why is usability important? The degree of usability provides an indication to the user of the degree to which an object/unit of information will serve its intended purpose.

3. What are the characteristics of spatial data usability? What are the characteristics of spatial data use? What do users need? How long is a piece of string?

4. What are the research problems to be solved in spatial data usability? Is this a scientific or a practical question? Can you derive usability from metadata? Can you define usability in metadata? How do you set up a generic means for determining usability for a wide (unlimited) range of applications?

5. What should the research priorities be? Is there sufficient science in this to make research worthwhile? How much is common sense?

All these points will be discussed, mainly in the context of environmental modelling.

Services for Data Integration

Abstract for the Workshop on Spatial Data Usability

November 19-20, 2001, Center for Geo-Information, Wageningen UR, The Netherlands

Catharina Riedemann & Werner Kuhn
Institute for Geoinformatics
University of Münster
D-48149 Münster, Germany
Tel.: +49 (0)251 83-31963/34707

Email: {riedemann,kuhn}@ifgi.uni-muenster.de

Data users are interested in the information necessary to solve problems and make decisions; data handling and processing are not their concern. We therefore call data usable if they easily reveal their inherent information without demanding technical expertise. This is usually done by wrapping data into services, which give direct access to the relevant information.

A major problem with the exploitation of geospatial data is that the needed information can often only be obtained by combining various sources (e.g. finding locations for industrial plants requires topographic, infrastructure, environmental, and demographic data). Today this integration of existing data is mostly done by creating new datasets, but, among other problems, these additional datasets are difficult to update when the original data change. We seek a way to integrate data just-in-time by wrapping them in suitable services which immediately give the answer to a user’s question instead of producing a new dataset. This will eliminate the update problem, because at any time the most current data are accessed.

We envision an environment with data and service providers, where services can be coupled with data on demand to form a “wrapped object” exposing the desired information. The platform is the Internet. Promising technologies are available and under development that help to build the necessary infrastructure, among them the Web Services Description Language (WSDL), Universal Description, Discovery and Integration (UDDI), and the Simple Object Access Protocol (SOAP). The challenge is to ensure that only permissible operations are performed on the data. This will need research concerning the exposure and evaluation of data and operation semantics.
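
As a rough illustration of the "wrapped object" idea described above, the following Python sketch hides several (invented) data sources behind a single service operation that answers a siting question directly instead of handing out datasets; all class names, attributes and thresholds are hypothetical, not part of any existing geoservice API.

```python
# Illustrative sketch of the "wrapped object" idea: invented data sources are
# hidden behind a service that answers a question directly instead of
# delivering raw datasets. Names and thresholds are hypothetical.

from dataclasses import dataclass


@dataclass
class Parcel:
    parcel_id: str
    slope_pct: float            # from a hypothetical topographic dataset
    distance_to_road_m: float   # from a hypothetical infrastructure dataset
    in_protected_area: bool     # from a hypothetical environmental dataset


class SiteSuitabilityService:
    """Wraps the data and exposes only the permissible operation."""

    def __init__(self, parcels):
        self._parcels = {p.parcel_id: p for p in parcels}  # data stay hidden

    def is_suitable_for_plant(self, parcel_id: str) -> bool:
        # Integration happens "just in time": the user gets the answer,
        # never the underlying datasets, and always from the current data.
        p = self._parcels[parcel_id]
        return (p.slope_pct < 5.0
                and p.distance_to_road_m < 500.0
                and not p.in_protected_area)


if __name__ == "__main__":
    service = SiteSuitabilityService([
        Parcel("A1", slope_pct=2.0, distance_to_road_m=120.0, in_protected_area=False),
        Parcel("B7", slope_pct=9.0, distance_to_road_m=80.0, in_protected_area=False),
    ])
    print(service.is_suitable_for_plant("A1"))  # True
    print(service.is_suitable_for_plant("B7"))  # False (slope too steep)
```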

We examine case studies to gain crisp problem statements and test research results. In the end, we are interested in how the notion, the handling, and the results of data integration change and influence the future use of existing datasets.


Shared Earth System Models for the Dutch Subsurface

J.D. VAN WEES, R. VERSSEPUT, H.J. SIMMELINK, R. ALLARD, H. PAGNIER
Netherlands Institute of Applied Geoscience TNO – National Geological Survey,
P.O. Box 6012, 2600 JA Delft, The Netherlands
[email protected]

One of TNO-NITG’s primary missions is to map the Deep Subsurface of the Netherlands, up to a depth of approximately 5 000 m, at a scale of 1:250 000. Over the years, the institute has compiled a wealth of such mapping data and stored it in digital form. The horizons mapped with these data were used to construct a volumetrically consistent stratigraphic model: the three-dimensional atlas of the deep subsurface of the Netherlands. These data and parameters – which include borehole locations, seismic interpretations, horizons, fault lines and sub-crop lines – are now available for digital distribution.

As part of the work done on the atlas, TNO-NITG developed a dissemination and visualisation system for a broader public. This system will be available to all interested parties, free of charge, starting in the spring of 2001. All they need is a computer with an Internet connection to start the HTML navigator page designed by TNO-NITG. Users can select data on any given part of the Netherlands; the system will then copy all of the relevant data from the database into a zip file and send it to them by e-mail.
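
The Python sketch below illustrates, with invented tile names, bounding boxes and a stubbed mail step, the kind of select-by-area, zip and e-mail workflow described above; it is not TNO-NITG's actual implementation.

```python
# Sketch of a select-by-area, zip and e-mail workflow of the kind described
# above; NOT TNO-NITG's implementation. Tile names, bounding boxes and the
# mail step are invented placeholders.

import zipfile

# Hypothetical tile index: file name -> (xmin, ymin, xmax, ymax)
TILE_INDEX = {
    "horizons_31E.xyz": (120000, 450000, 140000, 470000),
    "faults_31F.xyz":   (140000, 450000, 160000, 470000),
}


def tiles_in_area(xmin, ymin, xmax, ymax):
    """Return the tiles whose bounding boxes intersect the requested area."""
    return [name for name, (tx0, ty0, tx1, ty1) in TILE_INDEX.items()
            if tx0 <= xmax and tx1 >= xmin and ty0 <= ymax and ty1 >= ymin]


def pack_request(area, out_path="subsurface_request.zip"):
    """Copy the relevant data into a zip file ready to be mailed."""
    with zipfile.ZipFile(out_path, "w") as zf:
        for name in tiles_in_area(*area):
            # A real system would read the tile from the database; write a stub.
            zf.writestr(name, "placeholder tile content\n")
    return out_path


if __name__ == "__main__":
    archive = pack_request((130000, 455000, 150000, 465000))
    print("Would e-mail", archive, "to the requesting user (e.g. via smtplib).")
```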

TNO-NITG also designed a three-dimensional viewer in Java 3D, a program that runs on any computer, so that these data could be visualised. Users can adjust it to their computer’s speed. The viewer is available free of charge for interactively viewing the digital atlas of the Dutch subsurface. The atlas can and will be further expanded to include fault structures and property models, among other things, in three dimensions. Moreover, the dissemination system and viewer provide a generic means of integrating data from other disciplines and sources.

The digital atlas provides a great deal of added value compared to standard, static information media such as maps and profiles. For instance, the system architecture makes it possible to distribute customised data and keep these constantly updated in response to the needs of society and the market. Another benefit is that users can interactively determine what part of the subsurface they want to see depicted and on what scale. Interested parties can also choose to look at a two-dimensional cross-section of any given area of the subsurface. They can peel off, as it were, layers from the volumetric model to get a clearer picture of the interconnection between the layers.

For all these reasons, spatial underground models that integrate knowledge and their visualisation will become increasingly important for obtaining greater insight into the complex geological structure of the subsurface. And that insight is crucial for the spatial planning of the deep subsurface, especially now that more intensive uses for it, such as gas and CO2 storage and geothermal energy generation, are being discussed. Seen in that light, this recently developed dissemination and visualisation system is a valuable policy instrument.


Analysing uncertainty propagation in GIS: why is it not that simple?

Gerard B.M. Heuvelink
Institute for Biodiversity and Ecosystem Dynamics
Universiteit van Amsterdam
Nieuwe Achtergracht 166
1018 WV Amsterdam
The Netherlands

E-mail: [email protected]

Attention to spatial accuracy assessment and uncertainty propagation in GIS has been with us ever since the introduction of Geographical Information Systems in the 1980s. Over the past 20 years, an impressive number of scientific articles have addressed the issue of uncertainty and error in relation to GIS. But have we made much progress? Speaking of uncertainty propagation in GIS, most of us would agree that the answer to this question should be a ‘yes’. We now have a number of techniques that enable us to track how error propagates in GIS operations. Arguably the most appealing technique – because it is intuitively clear, easily implemented and generally applicable – is Monte Carlo simulation. Given uncertain input to a GIS operation, the idea behind this method is to draw a realisation from the input probability distribution, submit it to the operation, compute and store the result, and to repeat this procedure many times, so that the collection of outputs approximates the true output probability distribution. Indeed, the Monte Carlo method is a straightforward and effective technique. The method is computationally demanding, but nowadays this can hardly be considered a serious drawback. In other words, have we not solved the problem of uncertainty propagation in GIS? Alas, we have not. There still are a number of fundamental problems to be resolved before we may expect to see standard GIS equipped with a universal ‘error propagation button’. In this presentation I discuss some of these problems. Not necessarily the most important ones, but rather those that I find the most interesting. These problems have to do with difficulties in the assessment of input error, difficulties in representing uncertain spatial attributes in conventional GIS databases, the ever-present mistiness about what uncertainty really is and problems concerning the scale or support of the model entities. Although the presentation is likely to generate more questions than it will provide answers, some suggestions for dealing with the problems raised will be proposed. Shall we see a GIS that is completely tuned to uncertain spatial attributes in the year 2010? Given the many fundamental problems to be resolved, it seems to me that this very much depends on our ability to launch a large-scale concerted initiative in this direction.
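
A minimal sketch of the Monte Carlo scheme described above, assuming a toy GIS operation (adding two raster layers) and normally distributed, spatially uncorrelated input errors; a realistic study would use spatially correlated error models and an actual GIS operation.

```python
# Minimal Monte Carlo uncertainty-propagation sketch. The rasters, error
# standard deviations and the "GIS operation" are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(42)

dem_a = np.full((50, 50), 10.0)   # hypothetical input raster A
dem_b = np.full((50, 50), 5.0)    # hypothetical input raster B
sigma_a, sigma_b = 0.5, 1.0       # assumed error standard deviations


def gis_operation(a, b):
    return a + b                  # stand-in for any GIS operation


n_runs = 1000
outputs = np.empty((n_runs,) + dem_a.shape)
for i in range(n_runs):
    # 1. draw a realisation from each input's probability distribution
    real_a = dem_a + rng.normal(0.0, sigma_a, dem_a.shape)
    real_b = dem_b + rng.normal(0.0, sigma_b, dem_b.shape)
    # 2. submit it to the operation, compute and store the result
    outputs[i] = gis_operation(real_a, real_b)

# 3. the collection of outputs approximates the output probability distribution
print("output mean:", round(float(outputs.mean()), 3))   # ~15.0
print("output std :", round(float(outputs.std()), 3))    # ~1.12 = sqrt(0.5^2 + 1.0^2)
```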


Why do we use geographic data so ineffectively?

Itzhak Benenson
Department of Geography and Human Environment, Tel Aviv University

[email protected]

Huge geographic databases at different resolutions have been constructed during the last decade, and they continue to grow intensively in extent, resolution, and layers of information. Surprisingly, the use of geographic data remains far behind this progress. Planning, decision-making, marketing, and locating procedures are still based on aggregate information, while geographic data at fine resolution remain unused.

Poor use of data is characteristic not only of politicians and managers, but also of researchers. For example, the exceptional GIS of the Israeli population census of 1995, which contains multiple data (including salaried income!) on all households in the country and is geo-referenced at the level of separate buildings, has been explored by only a couple of researchers. The public use of geo-data is not much better. As analysis of Internet GIS services demonstrates, existing location and way-finding engines are used far below their capacity.

If so, the problem might lie not in the data and methods, but in human cognition of geo-information. Initial analysis suggests that the problem lies in the artificial character of the basic GIS navigation and presentation operations, such as zoom, pan, and thematic mapping. These operations do not allow viewing of geo-data at a resolution that varies with the distance to objects, and thus do not fit human perception of space.

The current presentation and navigation tools can be modified. Several approaches that fit human cognition and are based on 3D visualization can be proposed. Three-dimensional visualization enables human-friendly presentation of the information, including varying resolution within one map window and continuous navigation. I review recent advances in these 3D approaches and propose ways to make them part of the standard GIS GUI.
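
As a toy illustration of the idea that display resolution should depend on the distance to objects, the sketch below picks a presentation level from the viewer-object distance; the thresholds and level names are invented, not taken from the abstract.

```python
# Toy distance-dependent level-of-detail selection; thresholds and level
# names are invented examples, not part of any described system.

def level_of_detail(distance_m: float) -> str:
    """Pick a presentation level from the viewer-object distance."""
    if distance_m < 200:
        return "buildings"        # individual buildings, full detail
    elif distance_m < 2000:
        return "blocks"           # aggregated city blocks
    elif distance_m < 20000:
        return "neighbourhoods"   # thematic aggregation
    return "city"                 # single generalised symbol


if __name__ == "__main__":
    for d in (50, 800, 5000, 60000):
        print(f"{d:6d} m -> {level_of_detail(d)}")
```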


Is interestingness an indicator of data usability?

Monica Wachowicz
Centre for Geo-Information
Wageningen UR
E-mail: [email protected]

Data mining consists of a variety of techniques and tools used for the discovery of previously unknown, valid, potentially useful, and understandable patterns in very large databases. These techniques differ in the types of data they can mine and the kinds of knowledge representation they use to convey the discovered patterns. Text, multimedia and geographic data are some examples of the forms of data being used for mining patterns. These patterns can in turn be represented as classification rules, association rules, clusters, sequential patterns, time series, contingency tables and concept hierarchies. However, the number of patterns being generated is often very large, and only a few of these patterns are likely to be of interest to the user or an organisation. To address this problem, researchers have been working on defining various measures of pattern 'interestingness'.

One approach to determining interestingness is to define it in objective terms, where interestingness is measured in terms of its structure and the type of data used in the mining task. The measures used to achieve this capture the statistical strength of a pattern, such as the amounts of 'confidence' and 'support' for a rule. An alternative approach is to define subjective measures of interestingness that do not depend solely on the statistical strength of a pattern, but also on the opinion of the user who examines the pattern. Subjective measures are based upon user beliefs or biases regarding relationships in the data. Some examples of subjective interestingness measures are 'unexpectedness' (a pattern is interesting if it is surprising to a user) and 'actionability' (a pattern is interesting if the user can act on it to his/her advantage).
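
A minimal sketch of the objective measures mentioned above, computing the support and confidence of an association rule over an invented set of transactions:

```python
# 'Support' and 'confidence' of an association rule A -> B over an invented
# transaction set (the items and transactions are illustrative only).

transactions = [
    {"flood_zone", "clay_soil", "crop_damage"},
    {"flood_zone", "sandy_soil"},
    {"flood_zone", "clay_soil", "crop_damage"},
    {"clay_soil"},
    {"flood_zone", "clay_soil"},
]


def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """P(consequent | antecedent) estimated from the transactions."""
    return support(antecedent | consequent) / support(antecedent)


rule_a = {"flood_zone", "clay_soil"}
rule_b = {"crop_damage"}
print("support   :", support(rule_a | rule_b))               # 0.4 (2 of 5)
print("confidence:", round(confidence(rule_a, rule_b), 2))   # 0.67
```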

Interestingness measures have already proven to be successful in reducing the number of rules that need to be considered in data mining tasks. As such, this paper proposes another potential application of interestingness to determine 'usability' (the state or quality of being useful) and utility ('the degree of usefulness') of very large data sets. Ultimately, different interestingness measures and their relationships could provide users with knowledge about patterns in the data, as opposed to knowledge about the data itself (for example, metadata).

The paper will address the following questions:

- Can interestingness measures be used for determining the usability and utility of very large data sets?

- Once we have determined an interestingness measure of a pattern of a particular data set, how can we search for patterns in other similar data sets using the same measure? Will this approach be a way to determine the usefulness and utility of similar data sets?

- When data sets change over time, patterns also keep changing with the data. How will an update to a data set affect the definition of interestingness measures? How can changes be incorporated into interestingness measures? Would it be possible to make use of knowledge about emerging, sequential, and constraint-based frequent patterns for defining usefulness and utility over time?


Data in the light of the Gnostical Theory of Spatial Uncertain Data

Dr. Karel SEVCIK, Australia, [email protected]

This paper is focused on quantitative data. Any individual datum is the product of a defined process called quantification. Quantification is studied by the theory of measurement and is usually applied in practice as measurement or counting. The quantification process defines not just the value of an individual datum, but also its mathematical structure. This structure precisely determines the datum's numeric, geometric and algebraic properties. The properties of sets of such data then follow directly, and the impossibility of composing data of different structures is a matter of course. This approach to an individual datum (accepted in the relativistic paradigm: e.g. special relativity, the theory of measurement, the gnostical theory of spatial uncertain data, etc.) is in sharp contrast with the still predominant statistical model, in which an individual datum is a random selection from some basket of a theoretical distribution (the paradigm of Newtonian mechanics, Euclidean geometry and statistics).

Each measured quantity has its ideal value (e.g. the concentration of gold in a sample). However, no measurement is absolutely precise, so repeated measurements produce different results. This difference is caused by the influence of uncertainty. Because these two components have the same group structure, they project into the same numerical result – an individual uncertain datum. The quantification process producing data can be described in a 2D plane, where one axis represents the ideal value and the second axis the uncertainty.

Analysis of the data structure, and mainly of the physical interpretation of its geometric properties, results in a new generation of information theory that describes the quantification process in detail. Its fundamental discovery lies in the definition, properties and interpretations of an estimation process as a counterpart of quantification. This complete information theory by P. Kovanic is called the Gnostical Theory of Uncertain Data (see references). GTUD offers very powerful, easily programmable, robust and universal tools for processing any set of quantitative data (even small and highly disturbed sets). Data are treated regardless of the shape of their distribution, because the only condition for the validity of estimates is the defined structure of the processed data.

Space has a structure well known from geometry and physics. Spatial/temporal structures are identical with data structures and with the structures of the quantification and estimation processes. The existence of the Gnostical Theory of Spatial Uncertain Data is just a logical consequence.

The above-mentioned paradigm puts the first proposed question of data usability in a new light. Each individual datum carries information on the property for which its measurement method was designed. The quality of measurement determines the damage to that information caused by various amounts of uncertainty. However, the influence of uncertainty is partially compensated by the estimation process (compensation of uncertainty is always below 100%, as follows from the laws of thermodynamics). If the quantification process is in order, then the data are in order; there is no question of unsuitable data (except for obvious errors not related to information theory – garbage-in, garbage-out systems).

The above turns the question of data usability into a question of the suitability of processing methods. The reason is obvious: if a processing method does not respect the data structure or fundamental natural laws (or both), its results are more or less false. The development of the relativistic paradigm and the related mathematical, physical and information theories has pinpointed many mistakes of previous paradigms. The most common mistakes include:

(1) random distribution of quantitative data (data are individual products of quantification);

(2) composition of squares of errors as a measure of the uncertainty of a set (uncertainty is part of a datum, and its measure is a cos or cosh function according to the geometry of the given process, i.e. like the power factor in physics);


(3) composition of squares of data errors into a semivariogram (spatial weights depend on the location of individual estimate points; the information and spatial properties of data are strictly separated);

(4) estimation of "in-average-all-sample" minimum variability (real errors might be much smaller; local disturbances "make" wide results doubtful);

(5) negative weights in kriging, which contradict the laws of thermodynamics (weight relates to entropy).

Spatial data are usable if the data information structure and the structure of the sampling space are known and quantitative. However, such data still need not satisfy the purpose of the investigation for which they are used. Although GTSUD extracts the maximum of the information contained in data, it cannot violate the laws of thermodynamics and "create" information. Other questions include sampling density and pattern, and the quality of data values. However, the density and pattern of sampling depend on the purpose of the data and the scale of the required result (a definition based on the semivariogram is, in this view, irrelevant). Data quality relates to the applied analytical methods and belongs to other professions rather than to information theory. Data quality can always be improved by a better sampling and/or analytical design, which does not always mean higher cost.

Present practical processing of spatial data is troubled much more by unrealistic expectations and trivial mistakes in data collection, analysis and processing than by serious theoretical problems. The insufficiency of statistical approaches for many tasks need not be discussed. A more advanced paradigm is also known (GTUD has been published for almost 20 years, special relativity and Riemannian algebra for about a century).

Research priorities are thus set by common scientific paradigm.

References:

Kovanic P. (1984): Gnostical Theory of Individual Data, Problems of Control and Information Theory, Vol. 13(4), pp. 259-274.

Kovanic P. (1984): Gnostical Theory of Small Samples of Real Data, Problems of Control and Information Theory, Vol. 13(5), pp. 303-319.

Kovanic P. (1984): On Relations between Information and Physics, Problems of Control and Information Theory, Vol. 13(6), pp. 383-399.

Kovanic P. (1986): A New Theoretical and Algorithmical Basis for Estimation, Identification and Control, Automatica, Vol. 22, No. 6, pp. 657-674.

SUMMARY OF GNOSTICAL THEORY OF SPATIAL UNCERTAIN DATA

Dr. Karel SEVCIK, Australia, [email protected]

Knowledge of the spatial properties of studied variables is fundamental for many scientific fields, from geology to, for example, engineering, information technology, automation and economics, and particularly for artificial intelligence and artificial sensing (vision).

Although the Theory of Regionalized Variables (geostatistics) represents a significant experiment, until now no modern scientific approach to these estimation problems has existed. The Gnostical Theory of Spatial Uncertain Data (GTSUD, or perhaps "geognostics"), the principles of which are summarized in this abstract, constitutes a new generation of approaches to quantitative spatial data. Some of the main results given by GTSUD are shown and critically compared with classical methods of geostatistics.

GTSUD grows from the mathematical properties of space and numbers. Each individual datum carries complete information and is considered a unique individual object. A spatial datum is composed of two separate parts: its uncertain value and its spatial location. Each part must have the structure of a quantitative numerical group. The kind of group structure completely determines the data model and the space model. Consequently, GTSUD is applicable to any quantitative data (measured or counted). There is no assumption about the data distribution or about spatial properties such as stationarity, homogeneity, trend, etc.

A natural consequence of the existence of information uncertainty (i.e. a difference between the uncertain information value of a datum (e.g. a measured ore concentration) and its ideal value) is a pair of information characteristics: information weight and information irrelevance. Because space (or time) is also a quantitative variable, the existence of spatial uncertainty (the difference between the location of a datum (sample) and the location of an estimate) naturally results in a pair of spatial characteristics: spatial weight and spatial irrelevance. Each individual spatial uncertain datum possesses these four characteristics regardless of its other properties (e.g. geostatistical ones). There is no relationship between the information and spatial characteristics (e.g. no need for stationarity, homogeneity or a model distribution).

The squares of weight and irrelevance have direct physical interpretations in the growth of entropy and the loss of information. Entropy and information form two mutually compensating fields. The mentioned functions result in the definition of two distribution functions of an individual spatial uncertain datum, one in the information structure and a second for space. The interpretation of all the above-mentioned functions is completely isomorphic with the interpretations of corresponding characteristics in the Special Theory of Relativity, and a significant correspondence with quantum mechanics has also been shown in the literature.

The proven additive composition of information weight and information irrelevance results in two kinds of distribution functions: the global distribution function (GDF) and the local distribution function (LDF). Although spatial weight and spatial irrelevance are also additive, they are not used for estimating a "spatial distribution"; instead, they serve to optimize the distribution estimates of the observed variable at the point of an estimate.

The global distribution function is very robust and describes data as one (homogeneous) cluster. It has no general statistical counterpart. The field of GDF estimates over the studied space is always unimodal, but it need not be continuous and may partially not exist at all. If data are not homogeneous in their values in some area, this estimate simply does not exist. The practical consequences are: (1) protection of the estimate against the influence of inhomogeneity, such as the nugget effect, and consequent extreme robustness; and (2) detection of spatial discontinuities in values, such as faults or different geochemical units. There is no need for any kind of test of the existence of the GDF, because it simply does not exist if at least one point of its derivative (the data density) is negative (the general probabilistic definition of a distribution function).

The local distribution function is infinitely flexible and can thus describe multimodal data. Its statistical counterpart can be found in Parzen's kernels. The practical consequences are: (1) separation of different objects, e.g. one map for the main concentration field and separate maps of nuggets, pollution or leached zones in a single estimate; and (2) detection of spatial discontinuities in values, such as faults or different geochemical units. There is also no need for testing, because this estimate always exists.

The quality of estimates is measured by the growth of entropy and the loss of information, which guarantees the best possible results. GTSUD extracts the maximum information from data, but cannot "make" more information than is contained in the data.

GTSUD produces simple, universal and strictly logical algorithms that are easily programmable for computers. Such programs are applicable to any data without the need for special knowledge or human interference, such as the "art of geostatistics". The properties of GTSUD protect applications from producing mistaken results (e.g. if data are inhomogeneous in their values at one point, no result at all is preferred for that point over a wrong global estimate, while a local estimate always exists but might have more than one value).


Opportunities and limitations of sharing spatial data

Elzbieta Bielecka
Institute of Geodesy and Cartography
Warsaw, Poland
[email protected]

Nowadays there is a rapid growth in the availability of digital spatial data and a growing need to use it in all kinds of GIS applications and to support decision-making processes. The development of communication technology makes it possible to collect datasets from a variety of sources and different types of application. Data providers make data available to users via the Internet. There appear to be a lot of databases, datasets and other geographical information, such as satellite imagery, aerial photographs and maps, in both digital and analogue form. It also becomes possible for every user to share spatial data rather than collect them from the very beginning. Sharing data requires, first of all, broad information about the scope of the data and the place where they are stored, and furthermore translation from the original data source into the user’s system and adaptation to the specific GIS application. Understanding the quality of the data, so that they can be applied consciously, is of utmost importance; this often becomes the limiting factor in data sharing and data usability.

GIS allows mixing data of different origin and accuracy, which leads to the assumption that objects in GIS databases are of different quality. Digital systems are capable of processing data more precisely than analogue ones, but their final accuracy still depends on the accuracy of their source data: the final accuracy depends on the quality of the original input data and on the precision with which the input data are processed.

Data usability means the effectiveness, efficiency and satisfaction with which users can achieve goals using the data in a particular environment. As user requirements are very diverse, usability is relative and can only be determined in the context of use. Users always define usability for certain applications, to fulfil the goal of a GI system. For most users, usability means that the data have to be in a form that can be handled with the tools the user possesses. Users need to know what data are in the database and what their characteristics are.

The same data should be shared by different applications, which is why one of the most important questions is how the data should be stored, maintained and made available to users in order to be usable.

Three conditions are important and necessary to enable high data usability:

1) Methodological – a methodology for building the GIS application (conceptual, logical and physical design, object definitions, data dictionary), metadata, standards for data recording, storing and transferring, available and clear documentation for further development, a reference system and reference data, data integration and harmonisation.

2) Organizational – rules for sharing data, and agreements between the actors involved in spatial data sharing.

3) Technical – computer communication technology, openness in software and hardware, interoperability.

Requests for spatial data are very high in both the public and private sectors in Poland. The most useful data are those collected in the National Land Information System, namely the land and building register, boundaries of administrative units, and cadastral and utilities maps. These data have the best usability because they are always stored in a similar way, the context of the information is always the same, and the responsibility, quality and all other necessary information about the data are well known. The LIS databases can easily be incorporated and integrated with other systems.


Selected topographical data (transport network, hydrography, settlements), DTMs and land use data are also of interest. Owing to the lack of common access to digital forms of these data, they have rather poor usability. The data are collected and maintained mostly by private companies, seldom updated, and poorly documented. A metadata system for geographic data, to all intents and purposes, does not exist. Sometimes data providers have a data description in written or digital form, sometimes only in their minds. The data are difficult to access and rather expensive.

The other limitations for sharing data by GIS applications in Poland are as follows:

- inadequate technical infrastructure;

- too many co-ordinate systems, defined in zones, which prevent the creation of seamless databases;

- lack of accepted standards for data recording, storing and transferring;

- unclear rules for sharing data.

Only recently have increased efforts been put into making data more usable. At the central level of administration, some essential decisions have been made concerning the NSDI project, the topographical database, the creation of a database for general data, and standardization. At the regional level, GIS systems are being built with topographic data as the reference data. However, since the GIS environments are different, the regional databases are harmonized at the conceptual level.


Usability issues in information visualization applications

Carla M.D.S. Freitas 1, Marco A. Winckler 2, Paulo R.G. Luzzardi 1, Luciana P. Nedel 1 and Marcelo S. Pimenta 1

1 Universidade Federal do Rio Grande do Sul, Instituto de Informática, Caixa Postal 15064, 91501-970 Porto Alegre, RS, Brazil

2 LIHS, Université Toulouse 1, Place Anatole France, 31042 Toulouse, France

In the last few years, the increasing volume of information provided by several applications, different instruments and mainly the Web has led to the development of techniques for selecting, from the bulk of data, the subset of information that is relevant for a particular goal or need. Research on visual query systems, data mining and interactive visualization techniques has resulted in a wide variety of visual presentation and interaction techniques that can be applied in different situations. However, although there is a great variety of models and techniques for information visualization [1], each application requires a particular study in order to determine whether the selected technique is useful and usable. This study is usually guided by the type of data to be represented and by the user tasks or analysis process that the visualization should support. Previous work at our research group [2, 3, 4] has resulted in a classification of both data categories and visual representations that provides a conceptual framework for developing new techniques [5, 6]. During these projects it has become evident that we cannot separate the visual aspects of data representation and interface issues from the interaction mechanisms that help a user browse and query the data set through its visual representation. Moreover, our experience confirms that evaluating these two aspects is an important issue that must be addressed with different approaches, including, of course, empirical tests with users, which have shown that users often have their own analysis tools and are not aware of the benefits of visualization techniques.

We separate usability issues into three main categories: visual representation usability, tool usability and data usability. Developing an application where visual data representation is the basis for interaction requires answering the following questions. Does the usability of visualization techniques affect data usability? How do we separate the visual representation and interaction aspects that affect a tool's usability from the modeling aspects that clearly affect data usability?

Our approach is to link interface usability knowledge with evaluation of the expressiveness, semantic content and interaction facilities of visualization techniques. Classical techniques employed for evaluating user interfaces, for example usability inspection methods and user testing, are being investigated to select an adequate framework for a methodology of usability testing at all three levels mentioned above. At present we have empirical evidence, collected from case studies, suggesting that we can distinguish these three categories.

References
[1] Card, S.K., Mackinlay, J.D. and Shneiderman, B. (eds.) Readings in Information Visualization - Using Visualization to Think. San Francisco, Morgan Kaufmann, 1999.
[2] Manssour, I., Freitas, C.M.D.S., Claudio, D.M. and Wagner, F.R. Visualizing and Exploring Meteorological Data Using a Tool-Oriented Approach. In: Earnshaw, R., Vonce, J. and Jone, H. (eds.) Visualization and Modeling, Cambridge, Academic Press, 1995. pp. 47-62.
[3] Freitas, C.M.D.S., Basso, K., Drehmer, M., Oliveira, J.B., Hofmann, L.S. and Freitas, T.R.O. Visualizing dolphins' behavior in a limited area. Unpublished case study. 1999.
[4] Basso, K. and Freitas, C.M.D.S. Visualization of geological prospecting data. In: International Symposium on Computer Graphics, Image Processing, and Vision Proceedings. Rio de Janeiro, Brazil, 1998. pp. 142-149.
[5] Manssour, I.H., Furuie, S., Nedel, L.P. and Freitas, C.M.D.S. A Framework to Visualize and Interact with Multimodal Medical Images. In: International Workshop on Volume Graphics 2001. Stony Brook, New York. IEEE Computer Society.
[6] Cava, R.A. and Freitas, C.M.D.S. Visualizing Hierarchies using a Modified Focus+Context Technique. In: IEEE Information Visualization 2001, Late Breaking Hot Topics Proceedings. (Contribution accepted as interactive poster.)


Users' perception of spatial data usability

Arnold Bregt
Wageningen UR, Centre for Geo-Information

Spatial data usability is a complex issue. What are the key factors that determine spatial data usability? How do different users judge the usability of spatial data? What is the best way to present spatial data in order to enable a user to assess its usability? Answers to the questions raised above are not easy. Clearly, the specific demand of a user at a certain moment in space and time plays a crucial role, but the characteristics and accessibility of the spatial data are also important.

In order to get some notion of users' perception of spatial data usability, a limited survey was held among 40 persons. Each person was asked to classify his or her own level of knowledge as that of a spatial data expert or a spatial data amateur: 20 persons were classified as spatial data experts and 20 as amateurs. Next, the question “What makes spatial data usable for you?” was put to all persons. The time for responding was limited, and instant reactions to this question were recorded. The results of this survey will be presented in the presentation.

Although limited in scope and scale, the survey clearly indicates some key aspects of spatial data usability.


Data usability for operational modelling in British forestry

Juan C. Suárez
Silv (N), Forest Research
Roslin
[email protected]

Forestry in Great Britain is evolving towards a multi-purpose role in which concerns over the environment and the provision of recreation match the more traditional requirements of timber production. There is, therefore, a need to realign research outputs to meet these new demands through the provision of specialised tools.

Models are a method for encoding knowledge in order to address the volume, complexity and uncertainty in our understanding of natural processes. Therefore, modelling can provide one of the most effective methods for technology transfer of research. It also provides forest managers with decision-making tools. The British Forest Research Agency is developing the CoreModel programme as a method for integrating models in a multimodel structure that gravitates around the use of a Process Based Model of tree growth and the use of an Object-Oriented design architecture, reinforced by the addition of GIS capabilities.

Models frequently offer limited adaptability in their predictions when confronted by new situations and applications (e.g. different species composition, irregular stand structures). Constant changes in the problem domain require existing models to be adaptable to new situations and still offer sensible predictions. In model integration, up-scaling and down-scaling operations are limited by the availability of data at different spatial and temporal scales. In forestry, each level operates at different spatial and temporal scales that vary from a few minutes or hours affecting individual leaves (physiological models) to thousands of years affecting an entire forest over hundreds or thousands of square kilometres (forest succession models).

There are different approaches to harmonising data scales when integrating models. Aggregation is generally perceived as computationally intensive, and the effort of running an entire successional sequence is certainly intimidating. Alternatively, the use of proportional multipliers can be an optimal solution when contrasted with empirical datasets. However, the use of multipliers may be constrained by the absence of mechanistic cause-and-effect information that could be useful in predicting the same response across different scales. Data modelling techniques such as data trends are used in situations where data are highly correlated with baseline data (e.g. temperature and elevation). In other situations, intermediate models can be used to create the data inputs required by other models; this is a process not exempt from uncertainties that are difficult to quantify in the absence of validation data. Natural processes that are random in appearance may pose a limit on our capabilities for data modelling or data usability. Gap models in forestry describe forest structure in terms of the competition for nutrients between trees; nevertheless, they are limited in terms of describing the rate of survival when the stand is affected by abiotic hazards such as wind or snow.

A case of model integration within the CoreModel framework is ForestGALES, a wind risk model for forestry plantations. One of the main problems in the prediction of wind damage is the level of aggregation of the crop characteristics as depicted in the current datasets. Stocking density, tree height and tree diameter are recorded in different fields as a single value representing each forest stand in the component table of the Forestry Commission Sub-Compartment Database. Monte Carlo methods and a height-diameter distribution have been applied to create a spatio-temporal dimension of stand variability. A second approach is the use of a commercial Lidar sensor to map spatial variability within the polygons representing each forest stand. This second approach may also be used for the analysis of locational effects not considered originally by the model.
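
A hedged sketch of the Monte Carlo idea described above: single stand-level height and diameter values are expanded into simulated individual trees drawn from an assumed height-diameter distribution; the bivariate-normal form and all parameter values are illustrative assumptions, not those used by Forest Research.

```python
# Simulate within-stand variability from single stand-level values. The
# distributional form and parameters are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(7)

stand_mean_height_m = 18.0   # single value from the stand record
stand_mean_dbh_cm = 22.0     # single value from the stand record

# Assumed standard deviations and height-diameter correlation within the stand
sd_h, sd_d, rho = 2.0, 3.0, 0.8
cov = np.array([[sd_h**2, rho * sd_h * sd_d],
                [rho * sd_h * sd_d, sd_d**2]])

n_trees = 500
heights, dbhs = rng.multivariate_normal(
    [stand_mean_height_m, stand_mean_dbh_cm], cov, size=n_trees).T

# The simulated trees could then be fed, tree by tree, into a wind-risk model
# instead of a single stand-level value.
print("simulated height range:", round(float(heights.min()), 1), "-", round(float(heights.max()), 1))
print("simulated dbh range   :", round(float(dbhs.min()), 1), "-", round(float(dbhs.max()), 1))
```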


Fitness for purpose as a component of data usability

Sytze de Bruin
Centre for Geo-Information
Wageningen UR
The Netherlands

[email protected]

At the outset of the workshop, there might be as many ideas about what ‘data usability’ means as there are participants. Possibly, we will end up with a list of components that all contribute to the concept. My intended contribution to that list is its interpretation in terms of ‘fitness for use’. I will deal with the assessment of fitness for use in cases where we know how data are to be used, but are uncertain whether or not a particular data set suits the intended objective. I will briefly discuss three case studies that demonstrate a decision-analytical method for assessing the expected utility of data, and address the data requirements of such an approach.

In the first case study, fitness for use is determined by the uncertainty in the data set and by the risk of undesirable consequences when making decisions based on that data. Here, the utility of the data set lies in its ability to control the probability of adverse consequences. Another aspect highlighted in this case study is that of spatial variability of uncertainty.

In the second case study, two candidate data sets provide information about a process that can be considered stochastic. The data sets will not be able to control this process, but rather provide information on its unknown realisation. Again, the expected utility of the information can be assessed before actually using the data sets.

Finally, the operational practicability of the decision analytical approach adopted in the first two case studies is discussed in a setting where the expected benefits and risks of decision consequences are unknown and where a proper model of data uncertainty cannot be specified.
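
As a hedged illustration of a decision-analytical assessment of expected data utility (not necessarily the author's exact formulation), the sketch below compares the expected loss of deciding without a data set to the expected loss when an imperfect data set is consulted; the prior, error rates and loss table are all invented.

```python
# Expected utility of an imperfect data set in a toy flood-protection decision.
# Prior, hit/false-alarm rates and losses are invented for illustration.

prior_flood = 0.3
loss = {("protect", True): 2.0, ("protect", False): 2.0,
        ("do_nothing", True): 10.0, ("do_nothing", False): 0.0}


def expected_loss(action, p_flood):
    return p_flood * loss[(action, True)] + (1 - p_flood) * loss[(action, False)]


def best_loss(p_flood):
    """Loss of the best action given the current belief about flooding."""
    return min(expected_loss(a, p_flood) for a in ("protect", "do_nothing"))


# Decision without the data set: act on the prior only
loss_without = best_loss(prior_flood)

# Decision with an imperfect flood map (assumed hit and false-alarm rates)
p_hit, p_false_alarm = 0.8, 0.1
loss_with = 0.0
for p_s_flood, p_s_dry in ((p_hit, p_false_alarm),           # map says 'flood'
                           (1 - p_hit, 1 - p_false_alarm)):  # map says 'dry'
    p_signal = p_s_flood * prior_flood + p_s_dry * (1 - prior_flood)
    posterior = p_s_flood * prior_flood / p_signal
    loss_with += p_signal * best_loss(posterior)

print("expected loss without data:", loss_without)            # 2.0
print("expected loss with data   :", round(loss_with, 2))     # ~1.22
print("expected utility of data  :", round(loss_without - loss_with, 2))
```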


Estimating the usability of old information

Anders Östman
Luleå University of Technology
Department of Environmental Engineering
SE-971 87 Luleå
SWEDEN
Email: [email protected]

Abstract

Several user surveys indicate that having current data is very important for the usefulness of a dataset. In today's standard proposals, currentness is specified by a date stamp in the lineage section. Despite the clear importance of the aging factor, very few attempts have been made to find a theoretical framework for managing outdated data.

Examples of such a framework are statistical reliability theory and maintenance engineering. Using these theories, it is possible to estimate the reduced usability of a dataset as well as an optimal data maintenance programme.

In this paper, reliability theories are reviewed in the light of geospatial data maintenance and usage. It is concluded that we often lack the information needed to estimate the aging effect with high precision. So far, this information has not been intended to form part of the quality specifications as defined by ISO. However, the statistical theories reviewed in this paper provide a solid foundation for further research activities.
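
As a hedged illustration of how standard reliability theory could express the aging effect discussed above, the sketch below uses an exponential survival function with an assumed constant change rate; the rate is an invented example, not a value from any survey or standard.

```python
# If features change at an assumed constant rate, the probability that a
# feature captured 'age' years ago is still correct follows the survival
# function R(t) = exp(-lambda * t). The change rate is an invented example.

import math


def still_valid_probability(age_years: float, change_rate_per_year: float) -> float:
    return math.exp(-change_rate_per_year * age_years)


change_rate = 0.05   # assumption: 5% of features change per year
for age in (1, 5, 10, 20):
    p = still_valid_probability(age, change_rate)
    print(f"{age:2d} years old -> P(feature still valid) = {p:.2f}")
```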


Robustness in Spatial Analysis

D. Josselin, THEMA, CNRS, France

Key-words: Robustness, Spatial Analysis, Data Quality, Statistical Tools Efficiency, Expert Approach, ESDA

Abstract

We propose to discuss the relationship between robustness and spatial analysis. Is robustness important in spatial analysis? How can it induce skewed knowledge and decisions? Are there different forms of robustness, and at which level of the spatial analysis process do they appear? Indeed, we'll try to find out at which stage of the spatial analysis process these questions may be pertinent, and whether it is possible to improve robustness in contexts of spatial decision support.

More precisely, we'll divide our text into four parts.

First, we'll give a global definition of the notion of robustness. We'll try to show why it is so consequential to take it into account in spatial analysis, in order to help actors make decisions while keeping in mind the reliability of the information throughout its processing. This will be highlighted using several examples and applications.

In a second part, we'll present three different aspects of « robustness » (in a broad sense) related to different levels in spatial analysis:

- the data: their « quality » (notions of accuracy, pertinence and completeness, notably);

- the statistical tools used to qualify and quantify a spatial phenomenon by exploring these data: the statistical tools' « efficiency » (resistance and robustness);

- the way the expert investigates the data to extract relevant information: the expert's « approaches » (global vs local; exploratory vs confirmatory).

In a third part, we'll present three propositions for improving robustness (a small numerical sketch of the statistical point follows this list):

- at the data level: a series of maps to put the data quality into perspective (application to French agricultural flows and the related spatial partitioning);

- at the statistical level: the example of robust estimations of the central value of a statistical distribution (for instance the « meadian », a robust estimator built on mean and median, compared with different robust M-estimators);

- at the expert level: different ways (propositions!) to model and explore a spatial phenomenon by coupling local and global analysis.
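
A small numerical illustration of the robustness point above (values invented); the simple mean/median blend below merely stands in for the « meadian », whose exact definition is not given in this abstract.

```python
# Central-value estimators under a single gross outlier (invented numbers).
# The mean/median blend is only a stand-in for the « meadian ».

import statistics

values = [10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 42.0]   # one gross outlier

mean = statistics.mean(values)
median = statistics.median(values)
blend = (mean + median) / 2          # stand-in "built on mean and median"

print(f"mean   = {mean:.2f}")        # pulled upward by the outlier (~14.61)
print(f"median = {median:.2f}")      # resistant (10.10)
print(f"blend  = {blend:.2f}")
```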

Finally, we conclude by discussing a global framework able to enhance robustness and to provide the expert with different complementary keys to improve their spatial analysis.
