IAEA-TECDOC-472
GEOLOGICAL DATA INTEGRATION TECHNIQUES
PROCEEDINGS OF A TECHNICAL COMMITTEE MEETING ORGANIZED BY THE
INTERNATIONAL ATOMIC ENERGY AGENCY AND HELD IN VIENNA, 13-17 OCTOBER 1986
A TECHNICAL DOCUMENT ISSUED BY THE INTERNATIONAL ATOMIC ENERGY AGENCY, VIENNA, 1988
Printed by the IAEA in Austria, September 1988
The IAEA does not normally maintain stocks of reports in this series. However, microfiche copies of these reports can be obtained from
INIS Clearinghouse, International Atomic Energy Agency, Wagramerstrasse 5, P.O. Box 100, A-1400 Vienna, Austria
Orders should be accompanied by prepayment of Austrian Schillings 100, in the form of a cheque or in the form of IAEA microfiche service coupons which may be ordered separately from the INIS Clearinghouse.
FOREWORD
The objectives of this Technical Committee are to bring together current knowledge on geological data handling and analysis technologies as developed in the mineral and petroleum industries for geological, geophysical, geochemical and remote sensing data that can be applied to uranium exploration and resource appraisal.
The recommendation for work on this topic was first made at the meeting of the NEA-IAEA Joint Group of Experts on R & D in Uranium Exploration Techniques (Paris, May 1984). In their report, processing of integrated data sets was considered to be extremely important in view of the very extensive data sets built up over recent years by large uranium reconnaissance programmes. A Technical Committee Meeting was convened in Vienna in October 1986 in order to provide a forum for the presentation of new techniques and concepts in geological data integration.
With the development of large, multidisciplinary data sets which include geochemical, geophysical, geological and remote sensing data, the ability of the geologist to easily interpret large volumes of information has been largely the result of developments in the field of computer science in the past decade. Advances in data management systems, image processing software, the size and speed of computer systems and significantly reduced processing costs have made large data set integration and analysis practical and affordable. The combined signatures which can be obtained from the different types of data significantly enhance the geologist's ability to interpret fundamental geological properties, thereby improving the chances of finding a significant ore body.
This volume is the product of one of a number of activities related to uranium geology and exploration during the past few years with the intent of bringing new technologies and exploration techniques to the IAEA Member States.
The Scientific Secretary of the Technical Committee is Mr. D.W. McCarn of the Nuclear Materials and Fuel Cycle Technology Section of the IAEA.
EDITORIAL NOTE
In preparing this material for the press, staff of the International Atomic Energy Agency have mounted and paginated the original manuscripts as submitted by the authors and given some attention to the presentation.
The views expressed in the papers, the statements made and the general style adopted are the responsibility of the named authors. The views do not necessarily reflect those of the governments of the Member States or organizations under whose auspices the manuscripts were produced.
The use in this book of particular designations of countries or territories does not imply any judgement by the publisher, the IAEA, as to the legal status of such countries or territories, of their authorities and institutions or of the delimitation of their boundaries.
The mention of specific companies or of their products or brand names does not imply any endorsement or recommendation on the part of the IAEA.
Authors are themselves responsible for obtaining the necessary permission to reproduce copyright material from other sources.
CONTENTS
GEOLOGICAL DATABASE MANAGEMENT SYSTEMS
Experiences with database management systems for regional reconnaissance and mineral resource data
    F. Wurzer, H. Kürzl
Adaptation of the relational model to large volume geoscience databases
    M.T. Holroyd
SAMINDABA - A South African mineral deposits database
    S.S. Hine, E.C.I. Hammerbeck
GEOSIS - A pilot study of a geoscience spatial information system
    A.L. Currie
INDUGEO - Indian uranium geological database
    K.N. Nagaraj, P. Srinivasa Murthy, S.G. Tewari
AIRBORNE RADIOMETRIC INTERPRETATION AND INTEGRATION
Analysis of airborne gamma ray spectrometric data from the volcanic region in the eastern part of China
    Shwcin Zhao
Data enhancement techniques for airborne gamma ray spectrometric data
    S.G. Tewari, K.N. Nagaraja, N.V. Surya Kumar
Integration of Landsat MSS imagery with aeromagnetic and airborne spectrometric data over a part of Mozambique
    G.R. Garrard
Integrated remote sensing approach to uranium exploration in India
    N.V.A.S. Perumal, S.N. Kak, V.J. Katti
Uranium exploration in the Precambrian of West Greenland using integrated gamma spectrometry and drainage geochemistry (Summary)
    A. Steenfelt
Analysis of integrated geologic data for uranium exploration in Egypt
    M.E. Mostafa
Anorthosite related polymetallic thorium-uranium deposits in the Namaqualand metamorphic complex, South Africa
    M.A.G. Andreoli, N.J.B. Andersen, J.N. Faune
Application of the geostatistical analyses to uranium geology (Ore reserve estimation and interpretation of airborne magnetic survey data)
    Shenghuang Tang, Yuxuan Xue, Jinqing Meng
Interpretation of airborne radiometric data to predict the occurrence of uranium
    J. Talvitie, H. Arkimaa, O. Aikäs
GEOCHEMICAL DATA INTERPRETATION AND INTEGRATION
Graphical displays of multivariate geochemical data on scatterplots and maps as an aid to detailed interpretation
    H. Kürzl
Geohydrology and its application in defining uranium and other metallogenic provinces
    M. Levin, F.A.G.M. Camisani-Calzolari, B.B. Hambleton-Jones
Interpretation and evaluation of regional geochemical anomalies of uranium
    C. Frick, S.W. Strauss
An ADP-system to predict massive Cu-Zn ore deposits using maps of different scales
    V.V. Kuosmanen, H.M. Arkimaa, LA. Suppala
WORKING GROUP REPORTS
Working Group 1: Remote sensing in geology
Working Group 2: Geophysical technique
Working Group 3: The interpretation of geochemical exploration data
Working Group 4: Database management system
List of Participants
GEOLOGICAL DATABASE MANAGEMENT SYSTEMS
EXPERIENCES WITH DATABASE MANAGEMENT SYSTEMS FOR REGIONAL RECONNAISSANCE AND MINERAL RESOURCE DATA
F. WURZER, H. KÜRZL
Mineral Resources Research Division, Joanneum Research Society, Leoben, Austria
Abstract
Presently the main task of the Mineral Resources Research Division (MRRD), Leoben, Styria, is to collect, store and process regional geological information related to the Austrian territory. Working on government contract, all available data per 1:50000 mapsheet are systematically gathered and digitized. The aim is to establish a consistent digital data processing and documentation system able to provide individual information to the public. For this purpose an elegant way of data integration supported by a database management system (DBMS) is necessary. The main applications are detailed mineral exploration and regional resource studies, especially in support of decisions in regional planning.
Different geological, geochemical, and geophysical data as well as facts concerning the mineral inventory, mining history, claims and a bibliography shall be incorporated in the system. Data types and structures are manifold and require special handling by the DBMS. Until now, however, no adequate solution to that problem seems to exist. Different approaches were tested, such as hierarchically structured flat file management (as supported, for example, by the VAX/VMS operating system), index sequential file structures, and relational DBMS like GRASP. Additionally VAX-DSM, an interpreter language which supports a highly structured hierarchical storage system, is in extensive use. While with this language it is possible to develop a well-functioning system for handling alphanumeric information, especially for bibliographic and mineral inventory data, GRASP has reasonable abilities to deal with point-related numeric data. Both systems, however, offer no sufficient solution to cases where thematic information is related or assigned to certain geographic areas. Therefore the applicability of commercially offered geographic information systems is at present being examined.
Recent results of the investigation suggest a combination of different DBMS's, each especially suited to the respective data structure. Well defined interfaces have to be designed to support convenient transfer and linking of data.
1. INTRODUCTION
Regional surveys (airborne geophysics and stream sediment geochemistry) funded by the Austrian government were carried out during the last decade. The major aim of these projects has been to increase domestic mineral exploration activities.
When the first data became available, it was obvious that the amount of data to be handled and processed would require a dedicated computer system. In fact no governmental or geoscientific institution in Austria had the capacity and experience to face that problem. Therefore the MRRD outlined a concept for the implementation of hardware and software, representing the basic requirements for a modern and comprehensive data processing system, well suited to handle all different types of "geo-data".
Meanwhile additional governmental contracts have been approved to gather additional regional data. For example, it is intended to establish a mineral inventory file for the whole Austrian territory. There is also the additional aim to supply governmental authorities as well as planning institutions with regional resource assessment and basic geological data. This was not obvious when originally establishing the data processing system, so it had to be adjusted time and again.
The data processing system itself is near the fifth year of realisation. Starting with a PDP-11/34 (Digital Equipment Corporation) we now work on a VAX 8300 with a VMS operating system (12 MB memory, 32-bit dual processor) and three hard disks with 456 MB storage capacity each. In addition to online terminals, graphic capabilities include two HP vector plotters, one digitizer, one inkjet hardcopy unit and two Tektronix graphic terminals (4115B, 4207).
Starting with a rough overview of the basic theory and definitions concerning database systems, the practical experiences will be outlined in this paper.
2. BASIC CONCEPTS AND DATA TYPES
Basically the overall purpose of a database system is to store, maintain, and retrieve information (or data). In the following there will be no clear distinction between the terms "data" and "information"; both words will be used synonymously. Of course an important difference can be seen: one can use data to refer to the values physically recorded in a database and information to refer to the meaning of those values as understood by some user.
In the following the four major components of a database system (data, hardware, software, and users) will be described and briefly commented on.
2.1 Data
Large amounts of data are gathered in the field of reconnaissance surveys, mineral exploration and mining. To preserve this information for future use and to satisfy present demands of current projects, it is very advisable to store these data in one or more common pools called database systems. A list of different sources of data is given in table 1.
Much of this data or information is represented on maps. This implies the need for handling the corresponding geographic information (coordinates of points, lines and especially information related to areas).
This information, however, can be very useful for different kinds of data integration (e.g. map overlay).
TABLE 1: Different sources of data to be stored in "geo-database systems".
Topography (e.g. digital terrain models)
Geology
Petrography and Mineralogy
Geochemistry
Geophysics
Exploration and Mining
Literature
Remote Sensing
2.2 Hardware
The hardware consists of the secondary storage devices (e.g. disks, drums, tapes plus the corresponding controllers) on which the database resides. The related aspects are not discussed here because they form a major topic of their own and the related problems are not peculiar to database systems themselves. Of course minimum demands have to be satisfied, especially to suit the idea of combining thematic and geographical information, such as graphical output devices (plotters, CRT's). It is also necessary to specify in advance which data have to be kept on disk to guarantee online retrieval. Depending on the amount of available data and mainly on the relative frequency of inquiries, different parts of the database can be kept on tape.
2.3 Software
The link between the users of a database system and the physical database itself is built by a layer of software called the database management system (DBMS), providing access to data, maintenance facilities, security mechanisms, and so on.
2.4 Users
The users of a database system can be divided into three broad groups. The first are the application programmers writing the programs which need access to stored data. The second group is the class of end users accessing the database via a query language. The third group are the database administrators, doing the important job of creating and establishing as well as maintaining and updating the database, together with all the necessary organisational and administrative duties.
2.5 Database Architecture
The architecture of a database is described at three levels: the conceptual, the internal, and the external level ([1], [2]). For each level a corresponding model exists.
The conceptual model describes the logical overview of all data. This has to be specified before a new database is implemented.
The internal model describes the physical organisation of the data in the computer system itself.
The external model corresponds to different views of users, i.e. different users have their own requirements, such as a specially structured subset of the stored data. The DBMS has to provide the facilities for the respective retrievals.
2.6 Basic Approaches
The three most commonly used approaches to database modelling are the following:
a) the network database system,
b) the hierarchical database system,
c) the relational database system.
At MRRD network type database systems are not used, mainly because retrieval operations become very slow when complex data structures are present.
3. PRACTICAL EXPERIENCE
The right selection of software for a database system is very important and strongly dependent on the group of users. The main part of this group is the in-house staff requiring quick access to data in connection with project work and documentation. In addition, a public service for off-line retrieval (via tape or printed reports) has to be established. Based on these principal demands the following DBMS's have been used by the MRRD to implement various databases.
The hierarchical database system VAX-DSM (Digital Standard MUMPS, [3]) is an unstructured interpreter language with a highly structured hierarchical storage system. The most important advantage is that defined data fields do not use any storage as long as they have no actual value assigned. This feature is very useful for highly heterogeneous information like descriptions. A rough scheme for a mineral inventory file based on a hierarchical concept is outlined in figure 1. It is realized in VAX-DSM on our DP-system. Many of the items are descriptions of variable field length, which can be easily handled without wasting storage capacity.
[Figure 1: tree diagram. Top node: location of a mineral occurrence within a defined mapsheet. Second level: geogr. info, commodities, geol. descr., econ. descr., outcrop type. Third level: genetic type, petrography.]
Fig. 1: Example for a hierarchical concept (mineral inventory file) - extensions are possible at any level
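The storage behaviour described above (a field occupies space only once a value is assigned) can be illustrated with a minimal Python sketch. This is not VAX-DSM code; it only mimics the idea with nested dictionaries. The mapsheet and occurrence identifiers, the helper names `set_field`, `get_field` and `count_values`, and the placement of "petrography" below "geol_descr" are assumptions made for illustration.

```python
# Hypothetical sketch: a hierarchical mineral inventory held in nested dicts.
# Levels are created only when a value is stored, so empty fields cost nothing.

def set_field(tree, path, value):
    """Store value at the given path, creating intermediate levels on demand."""
    node = tree
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value

def get_field(tree, path, default=None):
    """Return the value at path, or default if any level is missing."""
    node = tree
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

def count_values(tree):
    """Count the leaf values that are actually stored."""
    return sum(count_values(v) if isinstance(v, dict) else 1
               for v in tree.values())

inventory = {}
# Assumed identifiers: mapsheet "sheet_101", occurrences "occ_1" and "occ_2".
set_field(inventory, ("sheet_101", "occ_1", "commodities"), "Cu, Zn")
set_field(inventory, ("sheet_101", "occ_1", "geol_descr", "petrography"),
          "free text of any length")
set_field(inventory, ("sheet_101", "occ_2", "commodities"), "Fe")

print(count_values(inventory))  # number of stored values
print(get_field(inventory, ("sheet_101", "occ_2", "geol_descr", "petrography")))
```

The second occurrence has no geological description, and no entry exists for it; a new level can be attached at any point later by calling `set_field` with a longer path.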
Another advantage of VAX-DSM is the flexibility in formulating inquiries and also in changing the hierarchical order of items. The flexibility of VAX-DSM is based on the fact that it is not a DBMS in a strict sense but mainly a programming language. Of course this also has the disadvantage that features usually provided by a DBMS (e.g. facilities for access restrictions) have to be written as routines. This gives the ability to design special applications but requires a lot of programming.
The state of the art in computer science is represented by the relational approach. Objects and relations between them can be seen as tables. The scheme of a relational concept for a geochemical data file is given in figure 2. An existing relational system is GRASP (Geological Retrieval and Synopsis Program, [4]), developed in FORTRAN at the U.S. Geological Survey. The MRRD uses a self-developed system called DBS (Database System), which is based on the concept of GRASP but especially dedicated to its application software and its own projects.
ID      easting    northing    As     Al     ...
1563    808846.    5234000.    15.    5.4    ...

El    unit    anal. method    year    type
As    ppm     AAS             1984    str.sed.
Al    %       ICP             1984    str.sed.
...

Fig. 2: Example for a relational concept (stream sediments - analytical results)
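As a hedged illustration of how the two tables of figure 2 relate to each other, the following Python sketch links each analytical value of a sample to the unit and method recorded for that element. The values are taken from the figure; reading the unit of Al as percent, the dictionary layout and the function name `describe` are assumptions, and this is not how GRASP or DBS are implemented.

```python
# Sketch of the two relations in figure 2 as lists of rows (dicts).
samples = [
    {"ID": 1563, "easting": 808846.0, "northing": 5234000.0,
     "As": 15.0, "Al": 5.4},
]

elements = [
    {"El": "As", "unit": "ppm", "method": "AAS", "year": 1984, "type": "str.sed."},
    {"El": "Al", "unit": "%", "method": "ICP", "year": 1984, "type": "str.sed."},
]

def describe(sample, elements):
    """Attach unit and analytical method to every element value of one sample."""
    rows = []
    for meta in elements:
        el = meta["El"]
        if el in sample:  # the element name links the two tables
            rows.append((sample["ID"], el, sample[el], meta["unit"], meta["method"]))
    return rows

for row in describe(samples[0], elements):
    print(row)
```

The element name acts as the link between the tables: it is a column heading in the first and a key value in the second.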
The main advantages of relational DBMS's are the flexibility in retrieving data and in reorganizing a whole database. This is very important, because during the development or during the lifetime of a database it is often necessary to adjust the implementation or even the whole conceptual model to fulfil new requirements. The best way to create a new database and to make it valuable and frequently used is by an iterative procedure. At first a "pilot version" of the database is implemented on the computer, based on a conceptual model, which is then transformed to an internal model. Intensive use of that pilot version should verify the usefulness of the concept and highlight additional requirements, to be incorporated in a final version.
4. ADVANTAGES OF DATABASE SYSTEMS
a) Redundancy can be reduced: if every user stores his own files belonging to his respective applications, the same data are stored several times, using up additional storage capacity.
b) Inconsistencies can be avoided: our experience shows that it is almost impossible to control the updating of multiple copies of data (sometimes one does not even know about such copies). This can create inconsistencies, i.e. some users work on corrected and others on "old" data.
c) Data can be shared: the same data can be accessed by several users at the same time. If the content of a database is well documented, which is a necessity anyway, even inexperienced users, unaware of the availability of all the data, can be stimulated and get new ideas for further use. A good concept for a database can even foresee future requirements of applications, which should be satisfied.
d) Security restrictions can be applied with central control. Of course if these restrictions are not implemented, misuse cannot be excluded and database systems can even support it.
e) Integrity can be maintained: integrity checks help to avoid errors (e.g. the sum of fractions given in percent should not exceed 100 percent).
f) Data independence: a very important advantage of database systems is the independence between application programs and the physical file structure of the data used. Changing this structure does not necessitate rewriting whole programs because the DBMS provides access corresponding to the external model required.
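The integrity check mentioned under e) can be sketched in a few lines of Python. This is only an illustration of the kind of rule a DBMS could enforce on entry; the function name `check_fractions`, the tolerance parameter and the mineral names are assumptions.

```python
def check_fractions(fractions, tolerance=1e-9):
    """Return a list of integrity violations for fractions given in percent."""
    errors = []
    for name, value in fractions.items():
        if value < 0:
            errors.append(f"{name}: negative fraction {value}")
    total = sum(fractions.values())
    if total > 100 + tolerance:
        errors.append(f"sum of fractions is {total}, which exceeds 100 percent")
    return errors

# Hypothetical modal compositions of two samples.
good = {"quartz": 60.0, "feldspar": 30.0}
bad = {"quartz": 70.0, "feldspar": 40.0}

print(check_fractions(good))
print(check_fractions(bad))
```

A record would be accepted only if the returned list is empty; the messages tell the person entering the data what to correct.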
5. PROBLEMS CONCERNING GEOLOGICAL DATA
There are some basic problems which arise in setting up "geo database systems":
a) How can geographic information be handled in close connection to thematic information? For this purpose so-called geographic information systems (GIS) have been developed. Those systems (e.g. ARC/INFO [5]) provide the facilities to digitize, edit, and plot geographic information. Map layouts can be designed and regionalised processes analysed in a proper and efficient way.
b) The problem of variable record length should not be neglected. Let us consider the example of the mineral inventory file, especially the geological descriptions. The length of such descriptions varies enormously from zero to many pages. For existing relational systems (like GRASP), there is no appropriate solution to that problem, because a fixed length is assigned to each data field. By cutting down voluminous descriptions, information will be lost; by defining large data fields to fit even the largest description, storage capacity will be wasted. The storage management of VAX-DSM, for example, provides dynamic allocation to solve this problem.
c) The problem of updating is always evident, especially if one uses a database to manage all the data for one geoscientific project. For example, the results of a statistical analysis show some grouping of geochemical data. Further analysis should be carried out within groups and the database should contain this additional information and provide the corresponding access. This was one reason why MRRD developed its own project database DBS, where this problem is solved.
d) One major problem for every database is the question of data quality. Inconsistencies in the source data, errors introduced by the process of data entry, measurement errors, raster information with low or inconsistent density - all these problems have to be considered. In many situations there is no clear solution to all that but it is a challenge for future work.
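The trade-off described under b) can be made concrete with a small calculation. The sketch below compares fixed-length fields with dynamic allocation for three descriptions of very different length. It assumes one byte per character and ignores any per-record overhead; the function names and the sample texts are hypothetical.

```python
def fixed_length_cost(descriptions, field_width):
    """Bytes reserved, and characters cut off, when every description
    is stored in a field of field_width bytes."""
    used = field_width * len(descriptions)
    lost = sum(max(0, len(d) - field_width) for d in descriptions)
    return used, lost

def variable_length_cost(descriptions):
    """Bytes needed when each description takes only its own length."""
    return sum(len(d) for d in descriptions)

# An empty description, a short one, and a long one of 5000 characters.
descriptions = ["", "vein quartz", "x" * 5000]

print(fixed_length_cost(descriptions, 80))    # small field: text is cut off
print(fixed_length_cost(descriptions, 5000))  # large field: space is wasted
print(variable_length_cost(descriptions))     # dynamic allocation
```

With a narrow field most of the long description is lost; with a field wide enough for the longest text, roughly three times the storage actually needed is reserved.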
6. SUMMARY
Geological data have very heterogeneous forms, therefore not all the available information can or should be piled in one single database system. To guarantee transfer and linking of data between different DBMS, well defined interfaces have to be designed. This is especially necessary when combining all kinds of thematic information with the corresponding geographical information (e.g. basic topographical data like drainage systems etc.). Relational DB systems, the state of the art in computer sciences, can be highly recommended but hierarchical approaches are still useful to solve the problem of variable record length (e.g. for the incorporation of qualitative geological facts and descriptions).
REFERENCES
[1] DATE, C.J. (1981): An introduction to database systems, third edition. Addison Wesley Publ. Comp., Reading, Massachusetts.
[2] SCHLAGETER, G., STUCKY, W. (1983): Datenbanksysteme: Konzepte und Modelle. B.G. Teubner, Stuttgart.
[3] VAX-11 DSM Language Reference Manual, AA-J415B-TE, Digital Equipment Corporation, December 1982.
[4] GRASP Users Manual, Roger W. Bowen, U.S.G.S., Reston, Virginia, June 1982.
[5] ARC/INFO Users Manual Version 3.2, Environmental Systems Research Institute, Redlands, California, USA, April 1986.
ADAPTATION OF THE RELATIONAL MODEL TO LARGE VOLUME GEOSCIENCE DATABASES
M.T. HOLROYD
Dataplotting Services Inc., Don Mills, Ontario, Canada
Abstract
Over the last half decade, the Relational model has become the preferred model for Data Base Management System (DBMS) design and the Hierarchical model has come to be regarded as obsolete. The very large volume data sets from geophysical surveys are, however, intrinsically hierarchical - in both their data structures and in the data manipulation requirements of the digital compilation and cartography processes. Converting an interpolated map grid from its conventional form to a normalised relation would (at least) double the data storage volume and (at least) quadruple the data access time. Even with only moderately sized surveys this is an unacceptable penalty.
The relational model does, however, have distinct advantages over its predecessors in its clarity and simplicity of data definition and in the rigor with which data manipulation operations can be specified. These advantages could be of substantial benefit to geoscience data processing if the attendant disadvantages could be overcome.
One means to accomplish this is to employ an Algebraic data structure model in conjunction with the Relational model. The Algebraic model treats relational domains as vectors. It too employs a rigorous formal definition of data structure and manipulation processes. The relationships between domains in the same relation and between different relations are defined by arithmetical operators used in a logical sense. The data base can thus be described by an algebraic expression and data manipulation processes can be modelled exactly by algebraic manipulation of this expression.
The addition of the Algebraic model overcomes the above noted disadvantages of the Relational model as it permits Geoscience data sets to be viewed externally as fully normalised relations yet treated internally in their conventional, condensed form.
1. DATA BASE MANAGEMENT SYSTEMS (DBMS)
1.1 General definitions
At one time the bulk unit of data was regarded as the "file" in which all records were of the same form and content, hence simple "read and write" programs were usually adequate for data access and manipulation. Today, the bulk unit of data is the "data base" - a collection of many different interrelated "files" with different contents and structures.
Martin [1] defines a data base as follows:
"A data base may be defined as a collection of interrelated data stored together without harmful or unnecessary redundancy to serve one or more applications in an optimal fashion: the data are stored so that they are independent of programs which use the data; a common and controlled approach is used in adding new data and in modifying and retrieving existing data within the data base. One system is said to contain a collection of data bases if they are entirely separate in structure".
Date [2] provides a much briefer definition:
"A data base is a collection of stored operational data used by the application system of some particular enterprise".
Olle [3] defines the data base in contrast to the above mentioned file as follows:
"The difference between a data base and a file, in terms used prior to the advent of data processing, is perhaps analogous to the difference between a thoroughly cross-referenced set of files in cabinets in a library or in an office and a single file in one cabinet which is not cross-referenced in any way."
The relationship between the data base and the data base management system (DBMS) is described by Olle (op. cit.) as:
"A data base is a set of data stored in some special way in direct access computer storage. A DBMS is the software that handles the storage and retrieval of the records in this data base".
1.2 Data structure models and DBMS implementations
Current implementations of the DBMS fall into three general categories on the basis of the underlying structural model. The categories are "Hierarchical", "Network" and "Relational". The mathematical bases of such structures are described in detail by Berztiss [4].
Hierarchical systems are based upon the "tree" structure, and network systems on the network or plex structure. Relational systems are based upon Relational algebra. A comprehensive review and comparison of the properties of the three types is to be found in Date (op. cit.).
2. THE RELATIONAL DBMS AND GEOSCIENCE DATA BASES
2.1 The characteristics of Relational data and DBMS
Relational data bases contain one or more physically independent tables - rows and columns - of data. Such a table is referred to as a "Relation". Each row or record of the relation is referred to as a "tuple" (as in "N-tuple"). The tuples of any one Relation must all have the same logical structure and content. The data items in any one column of a Relation must all belong to the same "Domain" of information.
The standard data manipulation operations available in a Relational DBMS consist simply of "SELECT", "PROJECT" and "JOIN".
SELECT operates on a single table to select certain records according to content. PROJECT operates on a single table to extract (to "project out") entire columns. Either or both of these two operations applied to a table create a new table as a row and/or column subset of the original.
The JOIN operation takes the records of two separate tables and "joins" them together wherever the same value of a specific field is found in the records of both tables. This again creates a new table. A JOIN operation is, however, only meaningful if the JOIN is made on two columns, one in each table, which both belong to the same domain.
It was shown by Codd [5], the originator of this model,that an admixture of these three basic operations, appliedrecursively and repeatedly as required, could isolate anydefinable subset of the entire data base as a single Relation.Furthermore, it is not necessary to create any predefinedaccess paths, pointers, linkages or indices to attain this goal.
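The three operations can be sketched in a few lines of Python. This is a minimal illustration only, assuming a table is held as a list of dictionaries; the function names, the sample flight-line and map tables and their column names are hypothetical, and the removal of duplicate rows in PROJECT follows the usual convention of relational algebra rather than anything stated here.

```python
from typing import Any, Callable

Row = dict[str, Any]
Table = list[Row]

def select(table: Table, predicate: Callable[[Row], bool]) -> Table:
    """SELECT: keep the rows whose content satisfies the predicate."""
    return [dict(row) for row in table if predicate(row)]

def project(table: Table, columns: list[str]) -> Table:
    """PROJECT: keep only the named columns, dropping duplicate rows."""
    seen = set()
    result = []
    for row in table:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

def join(left: Table, right: Table, left_col: str, right_col: str) -> Table:
    """JOIN: pair rows wherever the two columns hold the same value."""
    return [{**l, **r} for l in left for r in right if l[left_col] == r[right_col]]

# Hypothetical sample data: flight lines and the maps they fall on.
lines = [
    {"line": 1, "map": 10, "length_km": 42.0},
    {"line": 2, "map": 10, "length_km": 40.5},
    {"line": 3, "map": 11, "length_km": 39.0},
]
maps = [{"map_no": 10, "scale": 50000}, {"map_no": 11, "scale": 250000}]

long_lines = select(lines, lambda r: r["length_km"] > 40.0)   # row subset
map_numbers = project(lines, ["map"])                          # column subset
lines_with_scale = join(lines, maps, "map", "map_no")          # same domain: map numbers
```

Each call returns a new table, so the results can be fed back into further SELECT, PROJECT or JOIN steps as required.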
In order for these operations to proceed correctly, however, it is necessary that the following rules apply to the Relations:
1) every record must have one or more fields whose value(s) is (are) unique within the relation (a unique key).
2) all other fields within the record must be "functionally dependent" on this key.
3) all other fields must be functionally independent of each other.
These rules together constitute what is called "third normal form" and the process of putting the data base in such order is known as "Normalisation". (The phrase "The Key, the whole Key, and nothing but the key, so help me Codd" is a helpful mnemonic for this.)
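As an illustration of these rules, the sketch below tests a small sample table for a unique key and for a dependency between two non-key fields. It is an assumption-laden example: the helper names and the sample station records are hypothetical, and a check on sample data can only reveal a violation, since functional dependence is a property of the data definition rather than of any particular set of records.

```python
def has_unique_key(rows, key):
    """Rule 1: the key fields take a distinct combination of values in every record."""
    keys = [tuple(row[k] for k in key) for row in rows]
    return len(keys) == len(set(keys))

def depends_on(rows, determinant, dependent):
    """True if equal determinant values always go with equal dependent values."""
    seen = {}
    for row in rows:
        d = tuple(row[c] for c in determinant)
        if seen.setdefault(d, row[dependent]) != row[dependent]:
            return False
    return True

# Hypothetical station records keyed by fiducial number.
stations = [
    {"fid": 1, "line": 1, "heading": "N", "mag": 5.1},
    {"fid": 2, "line": 1, "heading": "N", "mag": 5.3},
    {"fid": 3, "line": 2, "heading": "S", "mag": 5.1},
    {"fid": 4, "line": 2, "heading": "S", "mag": 5.2},
]

key_ok = has_unique_key(stations, ["fid"])                   # rule 1 holds for "fid"
line_is_key = has_unique_key(stations, ["line"])             # "line" alone is not a key
heading_follows_line = depends_on(stations, ["line"], "heading")  # non-key dependency
line_follows_mag = depends_on(stations, ["mag"], "line")
```

In this sample, "heading" is determined by "line", so the two non-key fields are not independent (rule 3); normalisation would move the heading into a separate relation keyed by line number.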
2.1.1 The general advantages of the Relational Model
The relational model offers certain potential advantages over the Hierarchical and Network models:-
1) It greatly simplifies the user's view of the data base. Instead of complex graphical structures, we now have sets of simple tables - data in the form we normally conceive of it.
2) No pre-defined retrieval bias is built in to the relational DBMS in the form of hierarchical linkages or network pointers, etc. Hence retrievals from any and all viewpoints are equally feasible.
3) Operations apply to whole tables and create new whole tables.
Consequently, the relational approach is currently very much favoured by the user community.
It must be noted, though, that the Relational model is employed in practice as a conceptual model which governs the users' view of the data base and defines formally the structures of the data base and the manipulation processes applicable to it. All existing Relational DBMS's (that are acceptably efficient) rely internally, beyond the user's view, on older, well proven techniques taken straight from their Hierarchical and Network progenitors in order to function efficiently.
2.2 The characteristics of large geoscience data bases
Among the largest geoscience data bases that currently exist are those derived from large scale aero-geophysical surveys, and the volume of data gathered by such surveys is steadily increasing. This increase is exhibited by all three possible degrees of freedom - size of survey area, data acquisition rates and number of survey parameters recorded.
Bristow [6] describes an airborne gamma ray spectrometry system capable of sampling up to 1024 energy channels at rates of up to 4 samples per second. Each channel/sample is a 16 bit word, hence the maximum data acquisition rate is 8 Kbytes/second or approximately 30 Megabytes/hr. 100 hours of survey flight would therefore acquire approximately 3 Gigabytes of digital data.
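The acquisition figures quoted from Bristow [6] can be worked through directly. The short calculation below is only a check of the arithmetic, assuming 2 bytes per 16 bit word and decimal units (1 Megabyte = 10^6 bytes, 1 Gigabyte = 10^9 bytes); the variable names are illustrative.

```python
CHANNELS = 1024           # energy channels per sample
SAMPLES_PER_SECOND = 4    # maximum sampling rate
BYTES_PER_WORD = 2        # one 16 bit word per channel

bytes_per_second = CHANNELS * SAMPLES_PER_SECOND * BYTES_PER_WORD
bytes_per_hour = bytes_per_second * 3600
bytes_per_100_hours = bytes_per_hour * 100

print(bytes_per_second)            # 8192 bytes, i.e. 8 Kbytes per second
print(bytes_per_hour / 1e6)        # about 29.5 Megabytes per hour
print(bytes_per_100_hours / 1e9)   # about 2.95 Gigabytes per 100 hours
```

On these assumptions, 100 hours of survey flight at the maximum rate amounts to roughly 3 Gigabytes of raw spectrometer data.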
The Thailand aerogeophysical survey [7] currently in progress contains about 1,000,000 kilometers of survey flight line. Magnetics, gamma-spectrometry, VLF and electronic navigation (ENS) data are all recorded digitally. The volume of raw data (as recorded in-flight) will be several Gigabytes. The processed final data (interpolated grids, etc) will be in the Terabyte range.
2.2.1 The logical content of survey data bases
Regardless of type, such survey data bases generally contain the following logical data entities:-
i) FIELD - A single number, e.g. a spatial coordinate, a geophysical measurement.
ii) STATION - A group of fields of different types all related to the same space/time point.
iii) LINE - An ordered sequence of stations.
iv) BLOCK - A group of lines which can be processed as an independent unit.
v) SURVEY - One or more blocks constituting an entire project.
vi) MAP - A subset of a block or survey within an arbitrarily defined polygonal boundary.
vii) GRID - A set of geodata values all of the same type arranged in grid order over all or part of the survey area.
2.2.2 The inherent structure of survey data bases
The essential function of a DBMS applied to geoscience data is to support efficient storage and retrieval of the above described logical entities.
These entities, however, clearly exhibit inherent structures more suited to the older Hierarchical model than to the Relational model. For example "Survey - Block - Line - Station - Field" is a strictly one-to-many structure, as are "Survey - Map - Line" and "Survey - Grid - Value", i.e. all are strict hierarchies.
Furthermore, the retrieval requirements when processing this data have strictly "top-down" hierarchical search paths, i.e. the basic requirement is to retrieve the lowest level entities - a line of stations - from a specified survey and map. This writer has never encountered an application requiring an inverse search, i.e. to find the line, map and survey which contain a specific aeromagnetic field value or Uranium count.
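The one-to-many chain "Survey - Block - Line - Station - Field" and its top-down search path can be sketched as nested Python structures. This is an illustrative model only: the class names mirror the logical entities listed above, while the attribute names and the sample values are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Station:
    fiducial: int
    fields: dict[str, float]  # e.g. coordinates and geophysical measurements

@dataclass
class Line:
    number: int
    stations: list[Station] = field(default_factory=list)

@dataclass
class Block:
    name: str
    lines: dict[int, Line] = field(default_factory=dict)

@dataclass
class Survey:
    name: str
    blocks: dict[str, Block] = field(default_factory=dict)

def stations_on_line(survey: Survey, block_name: str, line_number: int) -> list[Station]:
    """Top-down retrieval: survey -> block -> line -> stations."""
    return survey.blocks[block_name].lines[line_number].stations

# Hypothetical survey with one block and one line of two stations.
line_10 = Line(10, [Station(1, {"mag": 57012.4}), Station(2, {"mag": 57013.1})])
survey = Survey("demo", {"B1": Block("B1", {10: line_10})})

found = stations_on_line(survey, "B1", 10)
```

Each retrieval step follows a single parent-to-child link, which is the built-in access path that a hierarchical organisation provides.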
The above considerations indicate that this type of data is inherently suited to a hierarchical DBMS. Other considerations exist which indicate its inherent unsuitability to a Relational DBMS.
2.3 Geoscience data bases and the Relational DBMS
There is nothing inherent in the types of data entities described above that would actually prevent them from being put into and served by a relational DBMS. All of the different data sets encountered in geoscience could be structured as sets of relations (it can be shown, in fact, that any data base of any kind and complexity can be restructured as a set of relations with no loss of information).
The important question to answer is what are the advantages and disadvantages of doing so? The general advantages of the Relational model have been described, as have certain considerations which indicate the inherent suitability of survey data to a Hierarchical model. There also exist clear disadvantages in applying a Relational DBMS to survey data.
An obvious disadvantage is one of the cited general advantages of the Relational model, i.e. the absence of pre-defined access paths. As described above, survey data bases have distinctly preferred access paths. Hence no disadvantages ensue from having these built in, and substantial improvements in efficiency result from having them. Hence a Relational DBMS prevents exploitation of the inherent structures of the data base.
The greatest disadvantage, however, arises from the need to normalise the relations. To illustrate this, consider one of the larger data groups employed in survey data processing - the interpolated data grid.
Such grids, interpolated across and between the data traverses, are an essential prerequisite to mapping and three dimensional interpretation processes. The grids are normally stored in row or column order as sequences of binary numbers. Each number represents an interpolated value at one node of the grid. A separate "header" record defines associated grid parameters such as cell dimensions, orientation, coordinates of grid origin, data type, etc.
A single grid value can not be assumed to be unique within the set of grid values. Hence the data as usually stored violates the first rule of normalisation (section 2.1 above). In order to normalise the data, it would be necessary to store explicitly the row and column index numbers with each grid node. As no two grid nodes can have the same row/column indices, these two fields together would constitute the mandatory unique key in each record of the grid relation.
Hence, normalisation would triple the number of data items to be stored for any grid. Although it is technically simple to expand the grid data in this way, the resulting tripling of storage volume and retrieval time is totally unacceptable in practice.
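The cost of this expansion can be seen on a toy example. The sketch below is illustrative only: it assumes the grid is held in row order as a list of lists, and the function name and sample values are hypothetical.

```python
def expand_grid(grid):
    """Turn a row-ordered grid into (row, column, value) records with explicit indices."""
    return [
        (i, j, value)
        for i, row in enumerate(grid, start=1)
        for j, value in enumerate(row, start=1)
    ]

# Hypothetical 2 x 3 grid of interpolated values.
grid = [
    [50.1, 50.3, 50.2],
    [49.8, 50.0, 50.4],
]

records = expand_grid(grid)
items_as_stored = sum(len(row) for row in grid)      # grid values only
items_normalised = sum(len(r) for r in records)      # row, column and value per node
```

Every node now carries its row and column number as a unique key, so the normalised relation holds three items for each item in the grid as usually stored.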
Not all survey data groups are as intractable as grid data. The in-flight data set, for example, exists from the beginning as a fully normalised relation, i.e. each record contains a fiducial which is unique, and in-flight measurements which are functionally dependent on this key and independent of each other. Even so, such data is still not acceptable as-is for manipulation by a Relational DBMS because of the pre-requisites of the JOIN operation.
Creation of a specific map as a subset of the entire in-flight data set involves "joining" this data set with another relation defining the map involved. The in-flight data as stored, however, even though it is normalised, does not contain a domain in common with the map definition data. Hence, the requisite join can not be made.
In order to facilitate the join, data from the "map no." domain would have to be added as another column to the in-flight data. Likewise for block or survey extraction. This again would reduce storage and retrieval efficiency.
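A small sketch may make the point concrete. It assumes, purely for illustration, that relations are lists of dictionaries and that a shared column name stands for a shared domain; the record contents and the assignment of fiducials to maps are hypothetical.

```python
# In-flight records as stored: a unique fiducial plus a measurement.
in_flight = [
    {"fiducial": 1001, "mag": 57012.4},
    {"fiducial": 1002, "mag": 57013.1},
    {"fiducial": 1003, "mag": 57011.8},
]
# Map definition relation.
map_definitions = [{"map_no": 7, "title": "Sheet 7"}]

# No column in common, so there is nothing to join on.
shared = set(in_flight[0]) & set(map_definitions[0])

# Adding a "map_no" column to every in-flight record makes the join possible,
# at the price of one extra stored item per record.
fiducial_to_map = {1001: 7, 1002: 7, 1003: 8}   # hypothetical assignment
extended = [{**rec, "map_no": fiducial_to_map[rec["fiducial"]]} for rec in in_flight]

map_7 = [
    {**rec, **m}
    for rec in extended
    for m in map_definitions
    if rec["map_no"] == m["map_no"]
]
```

The extended records grow from two items to three each, which is the loss of storage efficiency referred to above; block and survey numbers would add further columns in the same way.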
3. ADAPTATION OF THE RELATIONAL MODEL TO GEOSCIENCE NEEDS
It can be concluded from the above observations that the Relational model, and current DBMS implementations of it, are generally unsuited to the needs of large geoscience survey data bases and, by extension, to any other geoscience data base with similar properties to the survey data base.
The model could, though, be extended and adapted to geoscience needs. A DBMS based on such an extension could offer the simplicity and power of the relational concept and still be highly efficient.
Any adaptation would have to resolve the problems of data storage and retrieval efficiency. The adaptation should also, in keeping with the Relational model, have a formal theoretical basis. It should not be simply an ad hoc superficial modification.
A potential candidate for the adaptation is the Algebraic data model. The properties of this model are described below.
3.1 The Algebraic data model
The Algebraic data model was derived from observation of a variety of geoscience data sets and the manipulation processes commonly applied to such data in geoscience data compilation systems.
A key feature of the model is the use of arithmetical operators to represent the logical relationships between various levels of components of the data set. This permits a data set to be represented symbolically as an algebraic expression.
It was shown that formal algebraic manipulation of such an expression exactly models all of the basic data manipulation operations carried out within the geoscience processing systems studied and permits formal derivation of subsequent data structures from the initial form of the data set.
A detailed description of the derivation of the model is given by Holroyd [8]. A summary description follows.
3.1.1 Definition of terms
A data item is an "attribute" of some real-world entity. An attribute possesses two components - a "type" and a "value".
The "attribute type" is a real-world property of the entity. Mass and colour are two examples of real-world properties of physical objects. The "attribute value" is an observed or measured value of the property, e.g. "50 kilograms" is a possible value for the attribute type mass.
A group of attributes which all describe an individual entity is an "aggregate". A data set is composed of many such aggregates, one for each of the entities that the data set describes.
A data set is in "cardinal form" if all of the entities described by the data set belong to the same class (i.e. all are described by the same set of attribute types), and there is a unique one-to-one correspondence between entities and aggregates. Such a data set is a simple table. Each row is an aggregate describing one and only one entity. Each column contains a single type of attribute. This is similar to a normalised relation. The difference is that there is no need for a unique key.
3.1.2 Arithmetical equivalents to logical relationships
Logical relationships between data set components can be expressed as arithmetical operators. This confers algebraic properties on the data set model.
The logical relationship between attributes of the same entity is equivalent to the simple multiplication operator. The logical relationship between aggregates describing members of the same entity set is equivalent to the addition operator. The logical relationship between aggregates describing members of different entity sets is the null operator.
3.1.3 Notational conventions for the data model
An attribute is represented by a letter for the type and an integer for the specific value, e.g.:-
A7 (1)
A letter followed by a lower case alphabetic subscript represents the general case, e.g. if "A" represented the property of colour, then "some or any particular colour" is represented by:-
Ai (2)
An aggregate of the attributes of a single entity is represented by a string of attribute symbols separated by commas to indicate the multiplicative relationship, e.g. description of a particular entity by specific values of three types of attributes is represented as:-
A7,B5,C2 (3)
A data set in cardinal form is represented by a sequence of aggregates separated by plus signs to indicate the additive relationship, e.g. a data set describing three entities of the same class is represented as:-
A7,B5,C6 + A2,B1,C1 + A1,B1,C9 (4)
A data set can be characterised by type simply by the attribute types it contains, regardless of any specific attribute values. Parentheses are used to indicate replication, e.g. the cardinal form in (4) above can be summarised as:-
(A,B,C) (5)
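The notation maps naturally onto simple data structures. The sketch below is one possible encoding, not part of the model itself: an attribute is taken to be a (type, value) pair, an aggregate a tuple of attributes and a data set a list of aggregates; the function names are hypothetical.

```python
def render(data_set):
    """Write a data set in the notation of expression (4)."""
    return " + ".join(",".join(f"{t}{v}" for t, v in agg) for agg in data_set)

def summary_form(data_set):
    """Return the attribute types shared by every aggregate, as in expression (5)."""
    type_sets = {tuple(t for t, _ in aggregate) for aggregate in data_set}
    if len(type_sets) != 1:
        raise ValueError("aggregates do not all share the same attribute types")
    return type_sets.pop()

# The three-entity data set of expression (4).
data_set = [
    (("A", 7), ("B", 5), ("C", 6)),
    (("A", 2), ("B", 1), ("C", 1)),
    (("A", 1), ("B", 1), ("C", 9)),
]

expression = render(data_set)
types = summary_form(data_set)
summary = "(" + ",".join(types) + ")"
```

A data set whose aggregates do not all carry the same attribute types is rejected by `summary_form`, which corresponds to the requirement that all entities in a cardinal form belong to the same class.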
3.1.4 Algebraic representation of cardinal form data
An expression representing a common aerogeophysical data set is:-
M1,L1,F1 + M1,L1,F2 + ... + M1,L2,Fi + ... + M2,L1,Fj + ... + Mk,Lk,Fk ... etc. (6)
A data set of this form is created by digitisation of the flight path maps of a survey. The entities are navigational fixes along flight lines on maps. The attributes of each entity are as follows:-
M - map number
L - line number
F - fiducial (serial) number
The first aggregate (M1,L1,F1) describes the first entity - the first fix on the flight path - by the map and line upon which it lies and by its fiducial number. The second aggregate (M1,L1,F2) describes the second flight path point, etc.
Note that, according to the notational convention, the second fix is on the same map and line as the first, but has a different fiducial number. Successive aggregates describe fixes on different lines, then on different maps.
The ellipsis (...) in the expression indicates that a sequence of aggregates, unspecified in number, has been omitted for brevity. The lower case subscripts ("i", "j" and "k") indicate the general case of the aggregate - "some/any specific value".
All of the information available for description of each entity resides in one aggregate and this aggregate contains no information descriptive of any other entity. Hence by definition this is the cardinal form of the data set. The summary representation of (6), as in (5), is:-
(M,L,F) (7)
3.1.5 Algebraic development of alternative forms
The above representation of the cardinal form of the data set was derived simply by observing an actual data set then describing it symbolically according to the notational convention.
Noting that the same map number is common to many successive aggregates, and that the same line number is common to many successive aggregates within one map, and that the relationship between attributes is multiplicative, then these common values can be factored out of (6) to produce the expression:-
M1( L1( F1 + F2 ... Fi ) + L2( Fj ... ) ... + Li( .. ) ) + M2( L1( .. ) ... ) ... Mi (8)
- which, in summary notation, is:-
(M(L(F))) (9)
Expression (9) is, in fact, a hierarchical structure. This same structure is developed within data compilation programs by the first process applied to the initial digitised track data in order to economise on storage and improve retrieval efficiency.
Here, the structure was developed solely by formal algebraic manipulation of the data set expression. This briefly demonstrates the simulation capabilities of the model.
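The factoring step can be imitated with nested dictionaries. The sketch below is an illustration under stated assumptions, not the compilation system's own code: aggregates are taken to be (map, line, fiducial) tuples, and the function names and sample values are hypothetical.

```python
def factorise(aggregates):
    """Factor common map and line numbers out of (M, L, F) aggregates."""
    tree = {}
    for m, l, f in aggregates:
        tree.setdefault(m, {}).setdefault(l, []).append(f)
    return tree

def multiply_out(tree):
    """Expand the hierarchy back into its cardinal form."""
    return [(m, l, f) for m, lines in tree.items() for l, fids in lines.items() for f in fids]

def count_items(tree):
    """Count stored items: each map and line number once, plus every fiducial."""
    return sum(1 + sum(1 + len(fids) for fids in lines.values()) for lines in tree.values())

# Hypothetical cardinal form: seven fixes on two maps.
cardinal = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3),
    (1, 2, 4), (1, 2, 5),
    (2, 1, 6), (2, 1, 7),
]

hierarchy = factorise(cardinal)
items_cardinal = 3 * len(cardinal)
items_hierarchy = count_items(hierarchy)
```

Multiplying the hierarchy out again returns the original aggregates, so no information is lost, while the number of stored items falls from 21 to 12 in this small example.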
More complex manipulation procedures can be carried out which bring more powerful operators into use. These are the "dot" (inner product) and "star" (outer product) operators. The use of these operators is shown in the next section.
3.2 Combination of the Relational and Algebraic models
At the conclusion of section 2.1.1 above, it was noted that the Relational model is employed conceptually to provide a formal framework for the user's view of data base structure and manipulation processes. What actually goes on within the workings of the DBMS is of no consequence provided that it appears to adhere to the formal rules of the Relational model.
Even so, the rules still do not permit, for example, storage of a data grid in a Relational DBMS in any other than the expanded, inefficient structure as described.
If, however, we permit a relation to be defined as an algebraic structure, it becomes possible to store a data grid in the usual efficient way and still maintain it as a fully normalised relation in the manner shown below.
The expression for the data grid with the preferred, efficient, structure is:-
G1 + G2 + G3 ... (10)
In summary form this is:-
(G) (11)
i.e. simply rows and columns of grid values alone.
The required relational structure is:-
I1,J1,G1 + I1,J2,G2 + ... + I2,J1,Gj + I2,J2,Gk ... (12)
In summary form this is:-
(I,J,G) (13)
i.e. rows and columns of grid values each with an explicit row number (I) and column number (J).
Note that the aggregates in the first grid row all have the same row number I1 and that the aggregates in the next row all have the same row number I2. The column numbers in the first row are all different. They begin at J1 and increase serially by 1 to Jm, where m is the number of columns in the grid. This sequence from 1 to m is repeated in all subsequent rows. Hence, there are many common factors in the expression.
The expression can therefore be factorised. Two hierarchical forms are possible:
(I(J,G)) (14)
or:-
(J(I,G)) (15)
In (14), the row number occurs once only at the start of each row. This is followed by a sequence of grid values each with an explicit (and different) column number. In (15) the grid matrix has been transposed to column order.
Both of these two forms reduce storage but neither is as efficient as the desired form (11). The outer and inner product operators, however, can be used to reduce the expression to essentially the desired form as:-
(I)*(J).(G) (16)
This structure consists of all unique row numbers stored once only, followed by all unique column numbers stored once only, followed by the entire grid stored as grid values only, in the same form as (11). When (16) is multiplied out, however, it becomes the normalised cardinal form in (13).
Hence (16) is formally the equivalent of (13) but requires considerably less data storage. With a grid of M columns and N rows, the number of items L1 in (13) is given by:-
L1 = 3 x ( M x N ) (17)
For (16), the number of items L2 is given by:-
L2 = M + N + ( M x N ) (18)
For a 1000 x 1000 point grid, L2 is only 0.2% larger than the optimum achieved in (11), whereas L1 is 200% larger, i.e. three times the optimum.
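Multiplying out expression (16) and the item counts of (17) and (18) can be reproduced in a few lines. This is a sketch under assumptions: the "star" operator is modelled with `itertools.product` and the "dot" operator with an element-by-element pairing, which is one reading of the outer and inner products named in the text; the function names and sample grid are hypothetical.

```python
from itertools import product

def multiply_out(rows, cols, values):
    """Expand (I)*(J).(G) into (I,J,G) aggregates in row order."""
    if len(values) != len(rows) * len(cols):
        raise ValueError("grid values do not match the row and column counts")
    pairs = product(rows, cols)                               # star: every row with every column
    return [(i, j, g) for (i, j), g in zip(pairs, values)]    # dot: pair indices with values

def item_counts(m, n):
    """Items stored for an M column by N row grid under (11), (13) and (16)."""
    optimum = m * n
    l1 = 3 * (m * n)          # expression (17)
    l2 = m + n + (m * n)      # expression (18)
    return optimum, l1, l2

rows, cols = [1, 2], [1, 2, 3]
values = [50.1, 50.3, 50.2, 49.8, 50.0, 50.4]   # grid values alone, as in (11)
cardinal = multiply_out(rows, cols, values)

optimum, l1, l2 = item_counts(1000, 1000)
l2_overhead_percent = 100 * (l2 - optimum) / optimum
l1_ratio = l1 / optimum
```

For the 1000 x 1000 grid the compact form stores 1,002,000 items against an optimum of 1,000,000, while the fully expanded relation stores 3,000,000, three times the size.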
Similarly, the flight path data set in (9) is formally the equivalent of its cardinal form (7). Hence (9) can be stored in the preferred, access and storage efficient, hierarchical form and still retain its formal definition as a normalised relation.
3.2.1 Implementation considerations
In an actual implementation of an "Algebraic/Relational" DBMS, very little need be added to the user's burden beyond that already carried for a standard Relational DBMS. The user could regard all data as being in a fully normalised form at all times and hence perform all the standard Relational operations without hindrance.
The Algebraic aspects would be an additional capability totally independent of Relational operations. Provision would simply be made for the user to define the preferred algebraic structures for the relations so as to optimise storage and retrieval efficiency.
A simple but powerful data manipulation language could be created on the formal algebraic basis of the model. For example, a command as simple as:-
(M,L,F) -> (M(L(F)))
- is sufficient to define fully and invoke the factorisation process to create a hierarchy from a table.
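How such a command might be interpreted can be sketched as follows. This is a hypothetical toy interpreter, not a description of any existing system: it accepts only a flat source table and a single-chain nesting such as (M(L(F))), and the function names and sample table are assumptions.

```python
import re

def parse_command(command):
    """Split '(M,L,F) -> (M(L(F)))' into source columns and nesting order."""
    source, target = (part.strip() for part in command.split("->"))
    columns = [name.strip() for name in source.strip("()").split(",")]
    nesting = re.findall(r"[A-Za-z]\w*", target)
    expected = "(" + "(".join(nesting) + ")" * len(nesting)
    if target.replace(" ", "") != expected or len(nesting) < 2:
        raise ValueError("only single-chain nestings such as (M(L(F))) are handled")
    if sorted(columns) != sorted(nesting):
        raise ValueError("target must use exactly the source attribute types")
    return columns, nesting

def run(command, table):
    """Factorise a table of tuples into the hierarchy named by the command."""
    columns, nesting = parse_command(command)
    tree = {}
    for row in table:
        record = dict(zip(columns, row))
        node = tree
        for name in nesting[:-2]:                 # descend through the outer levels
            node = node.setdefault(record[name], {})
        node.setdefault(record[nesting[-2]], []).append(record[nesting[-1]])
    return tree

table = [(1, 1, 1), (1, 1, 2), (1, 2, 3), (2, 1, 4)]   # hypothetical (M, L, F) rows
by_map = run("(M,L,F) -> (M(L(F)))", table)
by_line = run("(M,L,F) -> (L(M(F)))", table)
```

The same table can be factorised in a different order simply by changing the right-hand side of the command, which is the kind of economy the algebraic notation is intended to offer.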
4. CONCLUSIONS
It has been shown that commonly employed geoscience data bases have inherent structures which make them more suited to data models other than the Relational model and that Relational DBMS's have features inherently unsuited to many common data storage and manipulation requirements of geoscience data.
Hence, it can be concluded that Relational DBMS's, as they currently exist, are essentially inappropriate for many geoscience data management needs.
Specific features of geoscience data and the Relational model were examined to determine exactly where the most severe problems lay.
The properties of a new data model, the Algebraic model, were demonstrated. These properties were employed to find solutions to the problems.
It is concluded that a DBMS could be implemented with the Algebraic model as an adjunct to the Relational model. Such a system would make the beneficial features of the Relational method applicable to geoscience data without the attendant problems that currently make Relational systems inapplicable to such data.
REFERENCES
[1] Martin, J., Computer Data Base Organisation, Prentice Hall Inc., Englewood Cliffs, New Jersey (1975), 44.
[2] Date, C.J., An Introduction to Data Base Systems, Addison-Wesley Publishing Co., Reading, Mass. (1976), 1.
[3] Olle, T.N., "Data base and data base management", Encyclopedia of Computer Science, 3rd ed., Petrocelli/Charter, New York (1976), 389.
[4] Berztiss, A.T., Data Structures: Theory and Practice, Academic Press, New York (1975).
[5] Codd, E.F., "A data base sub-language founded on the relational calculus", Proc. ACM SIGFIDET Workshop on Data Description, Access and Control (1971).
[6] Bristow, Q., "A gamma-ray spectrometry system for airborne geological research", Current Research Part C, Geological Survey of Canada Paper 79-1C (1979), 55.
[7] (Anon), Technical Specifications, Bidding Documents for Airborne Geophysical Survey No. MRDP/01/1983 (prepared by Geological Survey of Canada) (1983).
[8] Holroyd, M.T., A System for Automated Compilation and Cartography of Earth Science Data With Special Provision for Aerogeophysical Data, PhD Thesis, University of Ottawa, Geology Department (1984).
SAMINDABA - A SOUTH AFRICAN MINERAL DEPOSITS DATABASE
S.S. HINE, E.C.I. HAMMERBECK
Geological Survey of South Africa, Pretoria, South Africa
Abstract
The South African Mineral Deposits database, SAMINDABA, is designed specifically to accommodate interactive computerized access to data on known mineral deposits within the Republic of South Africa. Data on a wide range of mineral deposits is being captured at present, inter alia on uranium deposits of the Karoo Sequence.
The mineral deposit data is subdivided into seven major groupings: DEPOSIT IDENTIFICATION DATA identifying and locating the mineral deposit; HOST ROCK DATA, a multiple data grouping describing the host rock(s) surrounding the ore body(ies); OREBODY DATA, a multiple data grouping describing the geological characteristics of the orebody(ies); EXPLORATION DATA describing the history and level of exploration carried out on the mineral deposit; EXPLOITATION DATA; RESOURCE DATA describing deposit resources sub-divided into the following subgroups: demonstrated economic reserves, demonstrated marginal reserves, demonstrated subeconomic resources, and inferred resources; and, finally, DATA REFERENCES describing references to the sources of the mineral deposit data.
SAMINDABA's primary functions include the capture, validation and updating of the mineral deposit data while at the same time maintaining a high level of security. Enquiries on and extraction from the database can be done on two levels. The first uses an on-line, menu driven enquiry system, SAMENQ, which allows a user, no matter what his level of computer expertise, to make enquiries using predefined search paths, key fields and output formats. The second level, using the database enquiry language, requires a specialized knowledge of the database structure and the enquiry language together with a much higher level of computer literacy, but allows a user freedom to enquire and output the enquiry results in any format that he wishes.
Due to the availability and high level of development of computer software no specific applications software was written for SAMINDABA but rather a facility was created in SAMENQ allowing a user to output the results of an enquiry to an interface file for input to various application packages as required. These, at present, include statistical, graphical, mapping and modelling packages.
1. INTRODUCTION
One of the primary functions of the Geological Survey is to provide basic geological information to promote the exploration for, and mining of, minerals in South Africa.
To date the collection, compilation and dissemination of data and information on mineral occurrences and deposits has been done manually. The ever increasing volume of such data and information is making it extremely difficult to manage this function efficiently, and hence the establishment of a computerized mineral deposits database was proposed.
A study was made of various existing geological databases, in particular CANMINDEX, CRIB, DASH and G-EXEC [1, 2, 3, 4]. This was followed by a detailed data analysis and, with the help of two computer consulting firms, the South African Mineral Deposits Database (SAMINDABA) was designed, written and implemented [5, 6].
2. SAMINDABA DESIGN PHILOSOPHY
SAMINDABA, a menu driven, on-line system, was designed to support a wide range of users with varying degrees of technical and computer expertise. Special attention was paid to the protection of confidential data down to element level. SAMINDABA functions can be broadly subdivided into three groups as depicted in figure 1.
1. Capture, validation and storage of mineral deposit data
2. Enquiries against the database and output of the data
3. Processing of the data
The first two of these functions support the whole spectrum of users and have extensive help, maintenance and system security facilities, while the processing function of SAMINDABA requires a greater technical knowledge and assumes that the user takes responsibility for the data. This processing portion of SAMINDABA is totally divorced from the database and its contents can in no way be affected by any of the available functions.
[Figure 1: block diagram. Boxes shown: SAMINDABA SYSTEM (system security, maintenance, help); SAMINDABA (data capture, data validation, data storage); SAMENQ (data confidentiality, enquiries, data output); SAMINDABA DATA PROCESSING; SAMSAS (browsing/editing, reporting, data manipulation, statistics, graphics); IGGS/DIGIMAP (mapping, mineral map, metallogenic map, gridding, contouring, kriging).]
FIG. 1. Schematic representation of relationships between SAMINDABA system, output and processing facilities
3. GEOLOGICAL CONCEPTS
A SAMINDABA mineral deposit is defined as an identified natural concentration of minerals which is geologically, geographically or otherwise distinguishable from neighboring concentrations. The data describing the deposit have been divided into seven logical groupings as follows:
1. Deposit Identification or Header Data
2. Host Rock Data
3. Orebody Data
4. Exploration Data
5. Exploitation Data
6. Resource Data
7. Data References
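The seven groupings can be pictured as a single record per deposit. The sketch below is purely illustrative and is not SAMINDABA's actual schema: the class and attribute names are hypothetical, each grouping is reduced to a dictionary of named data elements, and host rock and orebody data are held as lists because a deposit may have more than one of each.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class DataGroup:
    """One data grouping; 'elements' holds its data elements by name."""
    elements: dict[str, Any] = field(default_factory=dict)

@dataclass
class MineralDeposit:
    identification: DataGroup                                      # 1. header data
    host_rocks: list[DataGroup] = field(default_factory=list)      # 2. one per host rock
    orebodies: list[DataGroup] = field(default_factory=list)       # 3. one per orebody
    exploration: DataGroup = field(default_factory=DataGroup)      # 4.
    exploitation: DataGroup = field(default_factory=DataGroup)     # 5.
    resources: DataGroup = field(default_factory=DataGroup)        # 6.
    references: list[DataGroup] = field(default_factory=list)      # 7. one per reference

# Hypothetical example deposit with two host rocks and one orebody.
deposit = MineralDeposit(
    identification=DataGroup({"deposit_name": "Example Deposit", "commodity": "U"}),
    host_rocks=[DataGroup({"rock_class": "sedimentary"}), DataGroup({"rock_class": "igneous"})],
    orebodies=[DataGroup({"orebody_name": "Main"})],
)
group_names = list(vars(deposit))
```

Groupings for which no data has yet been captured simply remain empty, so a deposit can be entered with its header data alone and completed later.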
3.1 Deposit Identification or Header Data
The data in this group, as shown in figure 2, primarily identify and locate a mineral deposit and are subdivided into three groups of elements:
1. Deposit Identification
The deposit identification data grouping consists of the deposit name, synonym names, commodities, deposit type and the deposit status.
2. Deposit Locality
The locality data elements precisely locate a deposit both geographically and within certain boundaries (e.g. farm, province etc.). Geographically a deposit locality represents a point located approximately at the centre of the deposit within a farm boundary at surface elevation and defined by a longitude, latitude and elevation.
3. Data Origin
This group of data elements, which is also included with all the other six data groupings, describes who the contributor of the data was and what the confidentiality status of the data is.
[Figure 2: data element diagram. Deposit identification: deposit name, deposit synonym name, commodity, deposit type, deposit status. Deposit locality: reference point description, latitude, longitude, Lo X-co-ordinate, Lo Y-co-ordinate, central meridian, elevation, locality uncertainty, farm name, farm no., registration district, magisterial district, province. Data origin: compiler name, compiler institution, date compiled, comments, data confidentiality.]
FIG. 2. SAMINDABA deposit identification data elements
3.2 Host Rock Data
The host rock data grouping describes the rock or rocks surrounding the mineral deposit and contains the data elements as shown in figure 3. These data elements describe the rock type, geochronology, lithostratigraphy, structure, alteration, laboratory investigations and the economic status of the host rock or rocks.
[Figure 3: data element diagram. Host rock: deposit ID, host rock ID. Rock description data: rock class, petrographic category, petrographic name, mineral, morphology, relation to orebody. Host rock geochronology: event dated, dating method, age confidence, chronostratigraphic unit. Alteration type, minerals. Host rock lithostratigraphy: supergroup/sequence/complex name, group/suite name, subgroup/subsuite name, formation, member, bed, informal unit rank, informal unit name, relation to orebody. Attitude and structure: host rock strike, host rock dip/plunge, dip/plunge direction, structural modifier, modifier strike, modifier dip, modifier dip direction. Present economic status, economic potential.]
FIG. 3. SAMINDABA host rock data elements
3.3 Orebody Data
The orebody data group, which can be subdivided as shown in figure 4, contains the following data: descriptive, form and structure, mineralogy, geochemistry and physical properties.
[Figure 4: data element diagram. Orebody: orebody name. Orebody description: surface expression, exposure type, genetic class, genetic category, genetic type, orebody geochronology (event dated, dating method, age, age confidence, chronostratigraphic unit), orebody host rock relation. Orebody form and structure: morphology class, morphology category, morphology type, length, radius, width, maximum, minimum and average thickness, orebody strike, orebody dip/plunge, dip/plunge direction, structural modifier, modifier strike, modifier dip/plunge, dip/plunge direction, influence on ore, overburden lithology, overburden maximum, minimum and average thickness. Mineralogy, geochemistry and physical properties: mineral name, paragenic order, commodity, concentration, grade, element/compound, concentration, grade, physical property, physical property value, physical property unit, ore texture, petrographic description, laboratory method, sample representativeness.]
FIG. 4. SAMINDABA orebody data elements
3.4 Exploration and Exploitation
Exploration contains data elements describing when thedeposit was discovered, who discovered it and by what method,who has explored the deposit, what exploration techniques wereused and how well the deposit has been explored (fig. 5).
Exploitation data describes whether or not a deposit has beenmined, what type of mining activity took place, what commoditieswere exploited, their grades, beneficiation methods, cumulativeproduction and who owns the deposit at present (fig. 5).
EXPLORATION AND EXPLOITATION
  DEPOSIT ID
EXPLORATION HISTORY
  DISCOVERER, DISCOVERY METHOD, DISCOVERY DATE
  LEVEL OF EXPLORATION
  EXPLORED BY, EXPLORATION START DATE, EXPLORATION END DATE, EXPLORATION METHOD
  DRILLING METHOD, NUMBER OF BOREHOLES
EXPLOITATION HISTORY
  MINING STATUS
  MINE TYPE
  COMMODITY EXPLOITED, COMMODITY STATUS, COMMODITY SUBSTANCE CONCENTR., CONCENTRATION GRADE, YEAR OF FIRST PRODUCTION, YEAR OF LAST PRODUCTION
  CUMULATIVE PRODUCTION, PRODUCTION UNIT
  BENEFICIATION METHOD, EXTRACTION METHOD, END PRODUCT
  OWNER, OWNER HOLDING COMPANY
FIG. 5. SAMINDABA exploration and exploitation data elements
3.5 Resources and Reserves
The deposit resource and reserve data has been divided as follows (fig. 6):
1. Demonstrated Economic Reserves
2. Demonstrated Marginal Reserves
3. Demonstrated Subeconomic Resources
4. Inferred Resources
The data describing these reserves and resources include quantities, commodities, grades, cut-off values, physical property cut-off values and deposit dimensions and are repeated for each of the above sub-groups. In addition, a size classification and the area of influence are stored.
RESOURCES AND RESERVES
  DEPOSIT ID, RESOURCE AREA NAME, MINE NAME, SIZE CLASS
DEMONSTRATED ECONOMIC RESERVES
  QUANTITY, QUANTITY UNIT
  COMMODITY, COMMODITY SUBSTANCE CON., CONCENTRATION GRADE
  CHEMICAL/MINERALOGICAL CUT-OFF PARAMETER, CUT-OFF VALUE, CUT-OFF GRADE
  PHYSICAL PROPERTY CUT-OFF PARAMETER, CUT-OFF VALUE, MEASUREMENT UNIT
  DEPTH RANGE SMALLEST VALUE, DEPTH RANGE HIGHEST VALUE, STRIKE LENGTH, BULK DENSITY
DEMONSTRATED MARGINAL RESERVES
  DITTO
DEMONSTRATED SUBECONOMIC RESOURCES
  DITTO
INFERRED RESOURCES
  DITTO
DATA REFERENCES
  DEPOSIT ID, DATA SECTION, DATA SOURCE TYPE, DATA SOURCE DESCRIPTION
FIG. 6. SAMINDABA resource and reserve data elements
3.6 Data References
This data group contains all the references used to compile the information on the deposit. Included in this group, as shown in figure 6, is the type of reference and the data group to which the reference applies.
4. SAMINDABA MODELLING
A data model for SAMINDABA was developed employing relational data modelling, during which the geologically defined deposit was analyzed, the data reduced to simple basic elements and then grouped into logical relations with keys to each of the defined data elements. The resulting data model is schematically presented in figure 7.
RESOURCE AREA ENTITY LINK, RESOURCE AREA, DEPOSIT, DEPOSIT SYNONYM
DEPOSIT DOCUMENT, DEPOSIT COMMODITY, DATA REFERENCES, DEPOSIT MINING, DEPOSIT HEADER
HOST ROCK, HOST ROCK LITHOSTRAT., HOST ROCK ATTITUDE, HOST ROCK MINERALOGY, HOST ROCK ECON. STATUS, HOST ROCK LAB., HOST ROCK ALTERATION, HOST ROCK MODIFIER
OREBODY, OREBODY EXPOSURE, OREBODY DIMENSION, OREBODY ATTITUDE, OREBODY MODIFIER, OREBODY HOST-REL., OREBODY TEXTURE
DEPOSIT RESOURCES, DEPOSIT RESOURCE DIM., DEPOSIT DRILLING
OREBODY BOREHOLE QUANTITY, OREBODY EXPLORATION, OREBODY COMMODITY GRADE, OREBODY COMMODITY, GEOCHEMISTRY, OREBODY COMMODITY PROPERTY
OREBODY LAB., OREBODY MINERALOGY, OREBODY PETROGRAPHY
FIG. 7. Schematic outline of the SAMINDABA data model
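
The idea of grouping data elements into keyed relations can be sketched with a small in-memory SQLite database. The table names below loosely follow three of the boxes in figure 7, but the columns, key layout and sample rows are assumptions made for illustration only; they are not the actual SAMINDABA schema or data.

```python
import sqlite3

# Hypothetical relations loosely named after boxes in figure 7. Column names
# and keys are assumptions, not the real SAMINDABA record layouts.
SCHEMA = """
CREATE TABLE deposit_header (
    deposit_id   TEXT PRIMARY KEY,
    deposit_name TEXT NOT NULL
);
CREATE TABLE orebody (
    deposit_id   TEXT NOT NULL REFERENCES deposit_header(deposit_id),
    orebody_name TEXT NOT NULL,
    PRIMARY KEY (deposit_id, orebody_name)
);
CREATE TABLE orebody_commodity (
    deposit_id   TEXT NOT NULL,
    orebody_name TEXT NOT NULL,
    commodity    TEXT NOT NULL,
    grade        REAL,
    PRIMARY KEY (deposit_id, orebody_name, commodity),
    FOREIGN KEY (deposit_id, orebody_name)
        REFERENCES orebody(deposit_id, orebody_name)
);
"""


def build_demo_db():
    """Create the sketch schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the key relationships
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO deposit_header VALUES ('D001', 'Example deposit')")
    conn.execute("INSERT INTO orebody VALUES ('D001', 'Main lens')")
    conn.executemany(
        "INSERT INTO orebody_commodity VALUES (?, ?, ?, ?)",
        [("D001", "Main lens", "U", 0.12), ("D001", "Main lens", "Mo", 0.03)],
    )
    return conn


def commodities_for(conn, deposit_id):
    """List the commodities recorded for one deposit, in alphabetical order."""
    rows = conn.execute(
        "SELECT commodity FROM orebody_commodity "
        "WHERE deposit_id = ? ORDER BY commodity",
        (deposit_id,),
    )
    return [row[0] for row in rows]
```

Each repeating group (orebodies per deposit, commodities per orebody) becomes its own relation, and the composite keys carry the link back to the deposit header.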
Conceptually SAMINDABA contains only deposit area records. In practice, however, this posed a large problem, as much of the reserve and resource data is not calculated for a deposit area but rather for a mine lease area, as in the case of working mines, or for resource areas, in the case of resource studies. The solution to the problem was to introduce two additional records, a mine area record and a resource area record.
The resource area record, represented by a resource area reference point located at the approximate centre of a geologically defined resource area and by an x, y and z co-ordinate, can incorporate one or more mine and/or farm areas and contains the following data groupings:
1. Resource Area Header
2. Exploitation
3. Resources
4. Data References
Similarly, a mine area record, represented by a mine area reference point located at a convenient locality within a mine area, e.g. the main shaft, is defined by an x, y and z co-ordinate. It may incorporate one or more farms, carries a link to a resource area, and contains the following information:
1. Mine Area Header
2. Exploitation
3. Resources
4. Data References
This concept of three interlinked record types is demonstrated in figure 8.
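
As a rough illustration of the three interlinked record types, the sketch below models them as Python dataclasses. The link from a mine area to a resource area is described in the text; the optional links from a deposit to a mine area or resource area, and all field names, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceArea:
    name: str
    x: float  # reference point near the centre of the resource area
    y: float
    z: float


@dataclass
class MineArea:
    name: str
    x: float  # reference point at a convenient locality, e.g. the main shaft
    y: float
    z: float
    resource_area: Optional[ResourceArea] = None  # link described in the text


@dataclass
class Deposit:
    deposit_id: str
    # Assumed links: how a deposit refers to its mine or resource area
    # is not specified in the text.
    mine_area: Optional[MineArea] = None
    resource_area: Optional[ResourceArea] = None


def resource_area_of(deposit):
    """Follow the links from a deposit to its resource area, if any."""
    if deposit.resource_area is not None:
        return deposit.resource_area
    if deposit.mine_area is not None:
        return deposit.mine_area.resource_area
    return None
```

With such links, reserve figures held at mine or resource area level can be reached from any deposit that falls within that area.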
5. SAMINDABA FUNCTIONS
SAMINDABA was developed in a mainframe computer environment using a commercially available database management system and a fourth generation language. The primary SAMINDABA functions include the following:
1. Database maintenance
2. Data capture and validation
3. Database enquiries and output
RESOURCE AREA HEADER
MINE AREA HEADER
FIG. 8. Conceptual model showing the relationship between SAMINDABA resource areas, mine areas and deposits
5.1 Maintenance
All maintenance on the database, which includes the registration, insertion, updating and deletion of data records, synonym tables, validation tables, user profiles, programs, menus and help text, is the responsibility of the database administrator, who executes these functions in an on-line, menu assisted environment. Also included is a facility to provide management information on database usage and security violations. Only the database administrator has access to this section of SAMINDABA.
5.2 Data Capture and Validation
Because the volume of data to be prepared is low, all capturing of SAMINDABA data is menu assisted and executed in an on-line environment. The data capturing process may be sub-divided into the following groups:
1. Data records
2. Validation tables
3. Synonym tables
4. Help text
The data records are the primary source of data for SAMINDABA. Data is recorded by geologists, partially in alpha-numeric code, on a comprehensive prescribed form. In order to ensure consistency and standardization this is done with the help of a detailed coding manual [7].
Validation of SAMINDABA data is automatic, taking place at the time of capture, and includes the following:
1. Data integrity
   The data must not violate database logic. Checks are done to ensure that key fields are entered and relationships maintained, e.g. the element concentration unit cannot exist without the element name and concentration value.
2. Data duplication
   Duplication checks are done especially on multiple data elements.
3. Data format
   Data formats, i.e. numeric, alphabetic and alphanumeric, are checked so that, for example, numeric values are not entered in an alphabetic field.
4. Data sizing
   The data elements are checked to ensure that they do not exceed a certain length and that the number of characters is correct.
5. Data checks against validation tables
   Due to the nature of the data much of the validation is done by checking the entries against tables of allowed terms. SAMINDABA has 45 validation tables.
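
The five kinds of check listed above can be sketched for a single element-concentration entry as follows. The field names, the allowed-unit table and the length limit are hypothetical; the text does not give the actual SAMINDABA field definitions or the contents of its validation tables.

```python
import re

# Hypothetical validation table and size limit, for illustration only.
ALLOWED_UNITS = {"PPM", "PCT", "G/T"}
MAX_NAME_LENGTH = 20


def validate_concentration(record, seen_keys):
    """Return a list of error strings for one element-concentration entry.

    `record` maps field names to strings as captured from the form;
    `seen_keys` is a set of (deposit_id, element_name) pairs already captured.
    """
    errors = []
    name = record.get("element_name", "")
    value = record.get("concentration_value", "")
    unit = record.get("concentration_unit", "")

    # 1. Data integrity: a unit cannot exist without a name and a value.
    if unit and not (name and value):
        errors.append("integrity: unit given without element name and value")

    # 2. Data duplication: the same element may not be captured twice.
    if name:
        key = (record.get("deposit_id"), name)
        if key in seen_keys:
            errors.append("duplication: element already captured for this deposit")
        seen_keys.add(key)

    # 3. Data format: the value must be numeric, the name alphabetic.
    if value and not re.fullmatch(r"\d+(\.\d+)?", value):
        errors.append("format: concentration value is not numeric")
    if name and not name.isalpha():
        errors.append("format: element name is not alphabetic")

    # 4. Data sizing: the name must not exceed the assumed field length.
    if len(name) > MAX_NAME_LENGTH:
        errors.append("sizing: element name too long")

    # 5. Validation table: the unit must be one of the allowed terms.
    if unit and unit not in ALLOWED_UNITS:
        errors.append("table: unit is not an allowed term")

    return errors
```

Returning every error found, rather than stopping at the first, suits on-line capture, where the operator can then correct the whole entry in one pass.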
5.3 Database Enquiries and Output
All enquiries and output of SAMINDABA data are done through an enquiry system called SAMENQ (Samindaba Enquiries). This enquiry system ensures that users of the database cannot corrupt data or access data that is not available to them according to their security classification.
SAMENQ is a menu driven on-line system which allows any registered user, regardless of computer experience or knowledge of the database structure, to make enquiries against SAMINDABA.
The SAMENQ enquiry path is depicted in figure 9, showing two basic types of enquiries. The first is a simple enquiry where the user specifies area and/or commodity criteria followed by what output is required. The second type of enquiry is more complex. The user again specifies area and/or commodity criteria and then refines the search by selecting more specific criteria regarding other data elements on the database, followed by an output specification. It is envisaged that these two types of searches will satisfy 90 per cent of all enquiries made against the database.
AREA/COMMODITY SECTION
SIMPLE ENQUIRY
COMPLEX ENQUIRY
OUTPUT SPECIFICATION
FIG. 9. Diagrammatic representation of SAMENQ enquiry paths
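
The two enquiry paths can be sketched as one filter function: a simple enquiry passes only area and/or commodity criteria, while a complex enquiry adds further criteria on other data elements. The function name, the dictionary layout of a deposit and the field names are assumptions for illustration; SAMENQ itself is menu driven and its internals are not described here.

```python
def enquire(deposits, area=None, commodity=None, extra_criteria=None,
            output_fields=None):
    """Select deposits by area/commodity, optionally refine, then project.

    deposits       -- iterable of dicts, one per deposit (hypothetical layout)
    area           -- required value of the "area" field, or None for any
    commodity      -- commodity that must appear in "commodities", or None
    extra_criteria -- dict of field -> required value (complex enquiry only)
    output_fields  -- list of fields to return, or None for the full record
    """
    extra_criteria = extra_criteria or {}
    hits = []
    for dep in deposits:
        # Area and/or commodity criteria, common to both enquiry types.
        if area is not None and dep.get("area") != area:
            continue
        if commodity is not None and commodity not in dep.get("commodities", ()):
            continue
        # Refinement step used only by the complex enquiry.
        if any(dep.get(field) != wanted
               for field, wanted in extra_criteria.items()):
            continue
        # Output specification: full record or a selection of fields.
        if output_fields is None:
            hits.append(dict(dep))
        else:
            hits.append({field: dep.get(field) for field in output_fields})
    return hits
```

Leaving `extra_criteria` empty gives the simple enquiry; supplying it gives the complex one, so both paths share the same area/commodity selection and output steps.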
Output of SAMINDABA data resulting from a SAMENQ enquiry may be routed to a screen, printer or interface file in the following formats:
1. Deposit profiles (partial or complete)
2. Mineral map interface files
3. Metallogenic map interface files
4. SAS interface files
The output in the metallogenic format may be edited before being sent to an interface file, while specific data elements may be selected for output to a SAS interface file.
5.4 Database Security
Due to the nature and sensitivity of the data stored on SAMINDABA, security was one of the most important aspects of the database design. SAMINDABA security is applied at three levels:
1. The system level
2. The application level
3. The data level
5.4.1 System security
System level security refers to the standard security options available with the database management system (DBMS), which are in effect regardless of what applications are being used. A user is assigned a password-protected identification code (ID) which is linked to a profile specifying the files and applications to be used and the operating system and programming commands that can be executed. Thus, at this level it is possible to restrict a user from accessing applications and programming.
5.4.2 Application Security
Security at this level refers to controls programmed into the SAMINDABA system, which are only effective during a SAMINDABA session. The controls operate in a similar manner to the system security in that a user is assigned a password-protected ID linked to a profile. This profile is stored on a SAMINDABA database file and contains the following information:
1. The SAMINDABA user ID
2. The user password
3. Entry menu
4. SAMINDABA file access
5. Data ownership code
6. Printer identification code
The user password must be supplied before sign-on to SAMINDABA is effected. Once the user is signed on, the user profile linked to the ID determines which files may be accessed and at what level the menu hierarchy will be entered. For example, a user who only has authority to search the database will only be allowed to enter the system via the enquiry menu, where no updates, deletions or programming can be done. The user will not be aware that these functions even exist.
5.4.3 Data Security
At the data level it is possible to indicate the confidentiality status and ownership of the data. For practical considerations data elements are grouped together and each grouping is given a security status and data ownership code. At present there are three basic confidentiality options:
1. The entire document is not confidential
2. Certain of the specified data element groupings within the document are confidential
3. The entire document is confidential
If the document is not confidential any user may access the data. If only certain of the data element groupings within a document are confidential, the confidentiality indicator for each such grouping is set and the data ownership code is entered. In this case only users with the same data ownership code may access the data in that grouping. All other users will not be aware that the data exists.
In the case where an entire document is confidential, the confidentiality indicator on the deposit header record is set and the data ownership code entered. All data on this deposit can then only be accessed by users with the same ownership code on their profiles as that on the record.
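
The three confidentiality options can be sketched as a filter that decides which data element groupings a user may see. The dictionary layout, the constant names and the assumption that a grouping's access is decided by comparing ownership codes are illustrative; the actual record layout is not given in the text.

```python
# Hypothetical codes for the three confidentiality options.
NOT_CONFIDENTIAL = 1
PARTLY_CONFIDENTIAL = 2
FULLY_CONFIDENTIAL = 3


def visible_groupings(document, user_ownership_code):
    """Return {grouping name: data} for the groupings this user may access.

    `document` is assumed to look like:
        {"status": ..., "ownership_code": ...,
         "groupings": {name: {"data": ..., "confidential": bool,
                              "ownership_code": ...}}}
    """
    status = document["status"]

    # Option 3: the whole document is hidden from users with another code.
    if (status == FULLY_CONFIDENTIAL
            and document["ownership_code"] != user_ownership_code):
        return {}

    visible = {}
    for name, grouping in document["groupings"].items():
        # Option 2: flagged groupings are hidden unless the codes match.
        if status == PARTLY_CONFIDENTIAL and grouping.get("confidential"):
            if grouping.get("ownership_code") != user_ownership_code:
                continue
        # Option 1, or an unflagged or matching grouping: show the data.
        visible[name] = grouping["data"]
    return visible
```

Hidden groupings are simply omitted from the result, which mirrors the statement that other users are not aware that the data exists.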
6. SAMINDABA DATA PROCESSING
As stated above, the processing of data extracted from SAMINDABA is completely separated from the data input, validation, maintenance and enquiry functions.
All data extracted from SAMINDABA will have been routed via SAMENQ, where all the necessary security controls have been applied. Once a user has extracted the data, the user may process it freely, but it cannot be put back onto the database.
At present provision has been made to process the data using SAS, IGGS and DIGIMAP, three commercially available software packages which are briefly described below.
SAS is a data analysis system containing statistical routines as well as providing data management, querying, reporting, graphics and modelling facilities, all using an English-like language.
IGGS is an interactive, geo-facilities graphics support program written in FORTRAN IV and used to develop applications for data entry, editing, updating and displaying of geographically oriented data.
DIGIMAP, an application of IGGS, is an interactive mapping application. It provides rapid response to user requests for map displays. Several mathematical techniques are available as interpretation aids, e.g. gridding, contouring, kriging.
6.1 SAMINDABA/SAS
Once the data has been extracted from SAMINDABA a user may process it using SAMSAS, a specially developed system providing the capability to process SAMINDABA data using SAS. An interface program has been written which reads the SAMINDABA output file and converts it into a SAS dataset. Three basic methods of SAS processing are available:
1. Batch SAS processing
2. Interactive SAS processing
3. SAMSAS
Batch processing is used to process very large data sets or run very large programs which do not require interactive intervention and take a long time to run on the computer, e.g. the interface conversion program.
Interactive SAS processing is used for program development, processing small data sets and running programs that require user intervention, e.g. creating reports, graphs etc. This system assumes that the user is familiar with the SAMINDABA data principles and the SAS language.
SAMSAS is a menu driven system which was written specifically for inexperienced SAS users. This system allows a user to edit, browse, manipulate, statistically analyze, report and graphically process the data extracted from SAMINDABA.
6.2 SAMINDABA/IGGS/DIGIMAP
The second main type of processing currently available is a mapping function. This facility uses IGGS and DIGIMAP and gives the user the capability to produce maps. These may be simple mineral maps showing the locality of mineral deposits. Such maps may be produced on a commodity (or group of commodities) basis of selected areas, on any scale of 1:250 000 or larger. The relevant data on the mineral deposits portrayed is available in accompanying sets of complete or partial deposit profiles.
Provision will be made for the creation of more detailed metallogenic maps or complicated geological maps showing contacts, deposit boundaries etc. Using the same package the user can also create contour maps and do certain geostatistics, like gridding, trend analysis and kriging, which will permit detailed metallogenic analysis. Due to the nature of this type of processing it is a prerequisite that the user of this package has some computer and technical knowledge.
7. CONCLUSIONS
SAMINDABA is an advanced computerized storage system for South African mineral deposits data, covering a wide range of geographical, geological, historical and resource aspects on all types of deposits, e.g. precious metals and stones, ferrous and base metals, industrial minerals, as well as nuclear fuels. The primary objectives of SAMINDABA are to provide South African mineral deposit information in an efficient and timely manner in an attempt to assist in the exploration programmes, and by using this information and the processing tools available, to contribute to the further understanding of the processes taking place during the formation of mineral deposits.
The database is being implemented by various current projects of the Geological Survey, in particular metallogenic mapping, the compilation of mineral maps, and various commodity studies. Of special interest in the context of this publication is that uranium and attendant molybdenum deposits in the Karoo Sequence are receiving attention at present, while a start is also being made with the Witwatersrand gold and uranium deposits.
ACKNOWLEDGEMENTS
The authors are indebted to Mr M.D. du Plessis who played a leading part in the development of SAMINDABA. This paper is in part based on his earlier work. The Chief Director of the Geological Survey of South Africa kindly permitted publication of this article.
REFERENCES
[1] Picklyk, D.D., Rose, D.G., Laramee, R.M., Canadian Mineral Occurrence Index (CANMINDEX), Geol. Surv. Canada Paper No. 78-8, (1978), 27.
[2] Calkins, J.A., Kays, O., Keefer, E.K., CRIB - The mineral resources data bank of the U.S. Geological Survey, U.S. Geol. Surv. Circ. 681, (1972), 39.
[3] Mundry, E., Ein Dokumentations- und Abfrageprogramm für Schichtenverzeichnisse (DASCH), Geol. Jb., Bd. A7, (1973), 9.
[4] Jeffery, K.G., Gill, E.M., G-EXEC: a generalized FORTRAN system for data handling, in Computer-based systems for Geological field data, Geol. Surv. Canada Paper No. 74-63, (1975), 3.
[5] Hine, S.S., Vermaak, G., SAMINDABA Functional Specifications, Geol. Surv. S. Afr. Rep. (unpublished), (1986).
[6] Venter, I.M., Vermaak, G., Hine, S.S., SAMINDABA Technical Specification, Geol. Surv. S. Afr. Rep. 1986-0114 (1986).
[7] Barnardo, D.J., Bredell, J.H., Vorster, C.J., SAMINDABA Coding Manual, Geol. Surv. S. Afr. (unpublished) (1986), 105.
GEOSIS - A PILOT STUDY OF A GEOSCIENCE SPATIAL INFORMATION SYSTEM
A.L. CURRIE
Ontario Geological Survey,
Ministry of Northern Development and Mines,
Toronto, Ontario, Canada
Abstract
GEOSIS is a database system for geoscience and exploration data designed to cover Ontario. A pilot study is currently in progress to test all aspects of GEOSIS (e.g. data input methods for cost and throughput, data structures, response rates and user interface). The pilot study area (80 km by 20 km) is in north-western Ontario just north of Lake St Joseph, and is part of the Uchi belt.
The major data sets within the database are Precambrian geology and economic geology (i.e. assessment files, mineral occurrences data etc.) with subsidiary data sets such as remote-sensing data. GEOSIS is designed so that users can access the database using a telecommunicating microcomputer without any decrease in functionality compared to a directly connected graphics workstation.
Spatial information systems are based on the concept of spatial relationships (i.e. map data) and attributes of objects on maps. Data structures used in GEOSIS must be able to successfully integrate several fundamentally different kinds of data: 1. structured map graphic data (geoscience maps), 2. passive raster graphics (sketch maps from field notes and property descriptions), 3. raster data from remote sensing or image processing systems, 4. structured alphanumeric data (geological structural data) and 5. unstructured text (geologists' reports, property descriptions etc.).
The goal of the GEOSIS project is to provide users, in their place of work (office or field), with access to the geoscience data collected or managed by the Ministry of Northern Development and Mines.
1. INTRODUCTION
GEOSIS is a geoscience spatial information system that is designed to process and retrieve geoscience and mineral exploration data and that would eventually have Ontario-wide coverage. Spatial information systems incorporate in an integrated way the features of both map and alphanumeric database systems. The non-digital equivalent of these systems is a map with the associated data (tabular or text) that are linked to the map by location. Spatial information systems try to mimic the functionality of a paper map and associated data as used by an experienced user.
By the end of 1986 a pilot study, designed to test the various aspects of GEOSIS, will have been completed. The major topics to be explored are as follows:
1. data input methods for cost and throughput,
2. data structures so that user queries can be answered successfully,
3. response rates of the system to user queries, and the workstation costs in relation to response times, and
4. design of the user interface so that users can successfully use the system with minimum training.
The pilot study area (80 km by 20 km) is in northwestern Ontario just north of Lake St. Joseph, and is part of the Uchi volcanic belt. This area was selected because it had been recently mapped and the field notes were in digital form (Fig. 1). The area is also an area of active mineral exploration. The exploration history of the study area is not extensive but sufficient to test data input methods.
Spatial information systems by their very nature must be able to integrate data of different types from a wide