Databases October 2007
Geography 376: B. Klinkenberg 1
Attribute data handling in a GIS environment.
Chapter 10 Creating and maintaining geographic databases
B. Klinkenberg Geog 376 07
OutlineLinking data to placeDefinitionsCharacteristics of DBMSTypes of databaseRelational modelSQLDatabase design
Linking data to placeHaving defined how the geography can be modeled within a GIS, we now need to consider how the characteristics (or attributes) of the geographic features are associated with that geography.
GIS Data
SpatialData
AttributeLinkages
AttributeData
Databases October 2007
Geography 376: B. Klinkenberg 2
Storing attribute data
attribute data are stored separately from the coordinate datafeature identifier points to an attribute table:
point attribute tableline or arc attribute tablepolygon attribute table
1 2
3
polygon attribute table
similarly we can define point or line attribute tables if the spatial features are, for example, villages androads
Storing attribute data
A45233
A23482
A23431
Feature idpolygon id
Storing attribute datagood organization of the attribute data is very importantin socioeconomic GIS applications, the attribute data component is often much larger than the database component (e.g., few provinces, but hundreds of variables)
Id District Province101 Palma Merida102 S. Maria Merida103 Veralo Merida104 Bolo La Paz105 Jose La Paz106 Malabo La Paz107 Chilabo La Paz… … …
101102
103
P_Pop P_TFR Province397881 3.7 La Paz214084 3.2 Merida
… … …104
105107
106
Databases October 2007
Geography 376: B. Klinkenberg 3
OutlineLinking data to placeDefinitionsCharacteristics of DBMSTypes of databaseRelational modelSQLDatabase design
DefinitionsDatabase – an integrated set of data on a particular subjectGeographic (=spatial) database -database containing geographic data of a particular subject for a particular areaDatabase Management System (DBMS)– software to create, maintain and access databases
Storing dataThere are two fundamental ways to store data:
As a simple file (e.g., a text file)In a ‘database’
Simple file structures
Databases October 2007
Geography 376: B. Klinkenberg 4
Moving from files to databases
http://cs1.mcm.edu/~rob/class/dbms/notes/Chapt01/index.html
Advantages of Databases over Files
Avoids redundancy and duplicationReduces data maintenance costsApplications are separated from the data
Applications persist over timeSupport multiple concurrent applications
Better data sharingSecurity and standards can be defined and enforced
Disadvantages of Databases over Files
ExpenseComplexityPerformance – especially complex data typesIntegration with other systems can be difficult
Characteristics of DBMS (1)Data model support for multiple data types
MS Access: Text, Memo, Number, Date/Time, Currency, AutoNumber, Yes/No, BLOBs, OLE Object, Hyperlink, Lookup Wizard (other DBMS’s are similar)
Load data from files, databases and other applicationsIndex for rapid retrieval
Databases October 2007
Geography 376: B. Klinkenberg 5
Characteristics of DBMS (2)Query language – SQL (also QBE, …)Security – controlled access to data
Multi-level groupsControlled update using a transaction managerBackup and recoveryDBA tools
Configuration, tuning
Characteristics of DBMS (3)Applications
CASE tools (Computer-aided software engineering)
Forms builderReportwriterInternet Application Server
Programmable API (Application Programming Interface)
OutlineLinking data to placeDefinitionsCharacteristics of DBMSTypes of databasesRelational modelSQL
Geographic Information
System
Database Management
System
• Data load• Editing• Visualization• Mapping• Analysis
• Storage• Indexing• Security• Query
Data
System TaskRole of DBMS
Databases October 2007
Geography 376: B. Klinkenberg 6
Types of DBMS ModelsHierarchicalNetworkRelational - RDBMSObject-oriented - OODBMSObject-relational - ORDBMS
Hierarchical DBMS
Hierarchical and Network
Hierarchical
Network
Node X YI 1 4II 4 4III 6 4IV 4 1
Line From To Left Right1 I III O A2 I IV B O3 III IV O C4 I II A B5 II III A C6 II IV C B
Poly LinesA 1,4,5B 2,4,6C 3,5,6
1
1
5
4
3
2
6
2 3 4 5 6
A
B C
1
2 3
4 5
6
III
III
IV
O = “outside” polygon
Relational Tables: Topological data model
Keys
Foreign keys
Databases October 2007
Geography 376: B. Klinkenberg 7
Object-oriented DBMS
Inheritance, encapsulation
OverviewNetwork
essentially a programmer's database model efficient but inflexible and hard to understand
Relational its only complex data type is the relationit is the only complete data model aimed at users instead of programmers relational query languages are easier to use than full-blown programming languages rich underlying theory separation of implementation and design
Object-Oriented an extension of object-oriented programming no generally agreed upon formal data model great freedom regarding complex data structures inheritance user-defined types encapsulation
OutlineLinking data to placeDefinitionsCharacteristics of DBMSTypes of databasesRelational modelSQLDatabase design
Relational DBMS (1)Data stored as tuples (tup-el), conceptualized as tablesTable – data about a class of objects
Two-dimensional list (array)Rows = objectsColumns = object states (properties, attributes)
Databases October 2007
Geography 376: B. Klinkenberg 8
Table
Row = object
Column = property Table = Object Class
Row = record =tuple
[# rows = cardinality]
Column = field = attribute[# of columns = degree]
Table = file = relation
Text Table 4.1
FID = Primary Key= Index
Relational DBMS (2)Most popular type of DBMS
Over 95% of data in a DBMS is in a RDBMS
Commercial systemsIBM DB2InformixMicrosoft AccessMicrosoft SQL ServerOracleSybase
Relation Rules (Codd, 1970)Only one value in each cell (intersection of row and column)All values in a column are about the same subjectEach row is uniqueNo significance in column sequenceNo significance in row sequence
Databases October 2007
Geography 376: B. Klinkenberg 9
NormalizationProcess of converting tables to conform to Codd’s relational rulesSplit tables into new tables that can be joined at query time
The relational joinSeveral levels of normalization
Forms: 1NF, 2NF, 3NF, etc.Normalization creates many expensive joinsDe-normalization is OK for performance optimization
Relational JoinFundamental query operationOccurs because
NormalizationData created/maintained by different users, but integration needed for queries
Table joins use common keys (column values -- foreign keys)Table (attribute) join concept has been extended to geographic case
OutlineLinking data to placeDefinitionsCharacteristics of DBMSTypes of databaseRelational modelSQLDatabase design
SQLStructured (or Standard) Query Language –(pronounced SEQUEL)Developed by IBM in 1970sNow de facto and de jure standard for accessing relational databasesThree types of usage
Stand alone queriesHigh level programmingEmbedded in other applications
Databases October 2007
Geography 376: B. Klinkenberg 10
Types of SQL StatementsData Definition Language (DDL)
Create, alter and delete dataCREATE TABLE, CREATE INDEX
Data Manipulation Language (DML)Retrieve and manipulate dataSELECT, UPDATE, DELETE, INSERT
Data Control Languages (DCL)Control security of dataGRANT, CREATE USER, DROP USER
OutlineLinking data to placeDefinitionsCharacteristics of DBMSTypes of databaseRelational modelSQLDatabase design
Steps involved in database creationData investigation: consider the type, quantity and qualities of data to be included in the database; the nature of the entities and attributes is decided (inventory of data, needs analysis).
Data modeling: form a conceptual model of data by examining the relationships between entities and the characteristics of entities and attributes (logical design--infological model).
Steps involved in database creationDatabase design: creation of a practical design for the database. This step depends upon and is constrained by the software being used. Field names, specific attribute types and structures (e.g., tables) are decided (physical design--datalogicalmodel).
Database implementation: populating the database with attribute data. This is followed by monitoring and upkeep, fine tuning, modification and updating.
Databases October 2007
Geography 376: B. Klinkenberg 11
Database design perspectivesInfological problems deal with how to define the information to be provided by the system to satisfy the needs of its users. Datalogical problems are about how to design the structure and operation of the system and to take full advantage of current information technology available. Essentially, infological work refers to system analysis and conceptual modeling, and datalogical work to technical design and physical implementation of the system.
ERMThe identification of entitiesThe identification of relations between entitiesThe identification of attributes of entities (Infological steps)The derivation of tables from this (datalogicalsteps)
Entity Relationship Modelling
ERM ERM RelationsMapping an ER Model intoa table.
Example of 1:1 relations
Example of a 1:M relation
Example of a M:N relation
Databases October 2007
Geography 376: B. Klinkenberg 12
Prof. Börje Langefors recognized the importance of three contexts in the infological approach. They are the “organizational context, wherein organized collections of people/individuals are perceived; the language context, wherein organized collections of symbols and linguistic behaviors are perceived; and technical context, wherein organized collections of technical artifacts (computers, telecommunication technologies, software) are perceived” (Iivari & Lyytinen, 1998 ; p. 170).
http://isworld.student.cwru.edu/tiki/tiki-index.php?page=Langefors_Review
Database design perspectives
This quote perfectly describes the situation wrt GIS within an organization, as well.
SummaryDatabase – an integrated set of data on a particular subjectDatabases offer many advantages over filesRelational databases dominateDatabase design issues