Geographic information system and database
management
Dept. of Disaster Science and Management
University of Dhaka
GEOGRAPHIC INFORMATION SYSTEM AND DATABASE MANAGEMENT [email protected]
L-1
Definition:
A geographic information system (GIS) integrates hardware, software, and
data for capturing, managing, analyzing, and displaying all forms of geographically
referenced information. (Sir)
A geographic information system (GIS) is a system designed to capture, store,
manipulate, analyze, manage, and present all types of spatial or geographical data.
A geographic information system is a system for storing and manipulating geographical
information on a computer.
A geographic information system (GIS) lets us visualize, question, analyze, and interpret
data to understand relationships, patterns, and trends.
Components of GIS:
Hardware
Software
People
Data
Vector and Raster Data:
Vector data model: [data models] A representation of the world using points, lines, and
polygons. Vector models are useful for storing data that has discrete boundaries, such as
country borders, land parcels, and streets.
Raster data model: [data models] A representation of the world as a surface divided into
a regular grid of cells. Raster models are useful for storing data that varies continuously,
as in an aerial photograph, a satellite image, a surface of chemical concentrations, or
an elevation surface.
Concept of Layer:
Layers are the mechanism used to display geographic datasets in ArcMap, ArcGlobe,
and ArcScene. Each layer references a dataset and specifies how that dataset is
portrayed using symbols and text labels. When you add a layer to a map, you specify its
dataset and set its map symbols and labeling properties.
Remote Sensing:
Remote Sensing is the science and art of acquiring information (spectral, spatial,
and temporal) about objects, areas, or phenomena without coming into physical
contact with the objects, areas, or phenomena under investigation. Without direct
contact, some means of transferring information through space must be utilized.
RS Applications:
Change Detection
Land use
Sea Surface Temperature
Rainfall estimation
Yield Monitoring
Drought Monitoring
Urban sprawl Monitoring
GPS:
A GPS receiver calculates its position by precisely timing the signals sent by GPS satellites
high above the Earth. Each satellite continually transmits messages that include the time
the message was transmitted and the satellite's position at that time.
GPS Applications:
New data capturing
Navigation
Vehicle Tracking System
Geo-referencing etc.
L-2
ArcGIS Suite:
Desktop GIS
Server GIS
Online GIS
Mobile GIS
ESRI Data
ArcGIS Suite: Functionality:
L-3
RS and GIS Tools for Early Warning and Response
Use of GIS in Early Warning system and Response:
Use by FFWC and BMD for weather and flood forecasting
Storm surge inundation
Flood inundation
Salinity intrusion
Cyclone map
Disaster incidence database (DIDb)
GIS-based building inventory database
Building age and building density
Damage and loss estimation
GIS in Bangladesh
In Bangladesh, many organizations are providing Geographic Information Services in
various fields like infrastructure planning, agriculture, weather forecasting, change
monitoring, damage assessment etc.
Key Geographic Information Service Providers are -
1. Space Research and Remote Sensing Organization (SPARRSO)
2. Survey of Bangladesh (SoB)
3. Local Government Engineering Department (LGED)
4. Forest Department
5. Bangladesh Meteorological Department (BMD)
6. Flood Forecasting and Warning Centre (FFWC)
7. Geological Survey of Bangladesh (GSB)
8. Roads and Highways Department (RHD)
9. Directorate of Land Records and Surveys (DLRS)
10. Soil Resources Development Institute (SRDI)
11. Bangladesh Agricultural Research Council (BARC)
12. Institute of Water Modeling (IWM)
13. Centre for Environmental and Geographic Information Services (CEGIS)
14. Comprehensive Disaster Management Programme, MoDMR
L-4 GIS as an Information System
Information System:
A combination of hardware, software, infrastructure and trained personnel organized to
facilitate planning, control, coordination, and decision making in an organization.
Geographic Information System:
A Geographic Information system (GIS) integrates hardware, software, and data for
capturing, managing, analyzing, and displaying all forms of geographically
referenced information. (Sir)
Problems in Geospatial Data Management (BD):
No uniform standard
Poorly maintained
Out of date
Inaccurate
No data sharing
No data retrieval service
Benefits of GIS Implementation:
Geospatial data are better maintained in a standard format.
Revision and updating are easier.
Geospatial data and information are easier to search, analyze, and represent.
More value-added products can be produced.
Geospatial data can be shared and exchanged freely.
Staff productivity and efficiency improve.
Time and money are saved.
Better decisions can be made.
Difference between GIS and Manual Works:
MAPS | GIS | MANUAL WORKS
STORAGE | Standardized & integrated | Different scales on different standards
RETRIEVAL | Digital Database | Paper Maps, Census, Tables
UPDATING | Search by Computer | Manual Check
OVERLAY | Very Fast | Expensive & Time consuming
SPATIAL ANALYSIS | Easy | Complicated
DISPLAY | Cheap & Fast | Expensive
Basic Functions of GIS:
Functions | Sub Functions
Data Acquisition and Preprocessing | Digitizing, Editing, Topology Building, Projection Transformation, Format Conversion
Database Management and Retrieval | Data Archival, Hierarchical Modeling, Network Modeling, Relational Modeling, Attribute Query, etc.
Spatial Measurement and Analysis | Buffering, Overlay Operations, Connectivity Operations
Graphic Output and Visualization | Scale Transformation, Generalization, Topographic Map, Statistical Map
Areas of GIS Applications:
Area | GIS Application
Facilities Management | Locating underground pipes & cables, planning facility maintenance
Environmental and Natural Resources Management | Environmental impact analysis, disaster management and mitigation
Street Network | Locating houses and streets, car navigation, transportation planning, workforce distribution
Planning and Engineering | Urban planning, regional planning, development of public facilities
Land Information | Taxation, land use zonation, land acquisition
Technologies that Contribute to GIS:
Geography
Cartography
Computer-aided design
Surveying (GPS)
Photogrammetry
Statistics
Remote Sensing
Database Design
Modelling
GIS Information Infrastructures:
Social Infrastructure: Land Use, Religious Institutions, Cadastre, etc.
Urban Infrastructure: Fire Stations, Cable and Pipe Networks, Transportation
Environmental Infrastructure: Natural Resources, Pollution, Climate Change, Disaster
Economic Infrastructure: Marketing, Banking, Car Navigation
Educational Infrastructure: School Location, Literacy, Enrollment/Dropout
L-4 GIS Data Model
Data Model:
A data model in geographic information systems is a mathematical construct for
representing geographic objects or surfaces as data. For example, the vector data
model represents geography as collections of points, lines, and polygons; the raster
data model represents geography as cell matrices that store numeric values.
Types of geometric data model:
– Vector Model uses discrete points, lines and areas corresponding to discrete objects,
each with a name or code number as its attribute.
– Raster Model uses regularly spaced grid cells in a specific sequence. An element of the
grid is called a pixel (picture element).
Cell value: Each cell has a value
Cell size: Each cell has a width and height and is a portion of the entire area represented
by the raster
Cell location: The location of each cell is defined by its row and column location within
the raster matrix.
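As a minimal sketch of these three properties, the Python snippet below stores a small raster as rows of cell values and converts a row and column position into map coordinates. The class name Raster, the sample values, and the origin coordinates are hypothetical and chosen only for illustration; row 0 is assumed to be the top row of the grid.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    """A toy raster: a grid of cell values plus simple georeferencing."""
    values: list      # list of rows; row 0 is assumed to be the top row
    cell_size: float  # width and height of one square cell, in map units
    x_min: float      # x coordinate of the raster's left edge
    y_max: float      # y coordinate of the raster's top edge

    def cell_value(self, row, col):
        # Cell value: each cell holds exactly one value.
        return self.values[row][col]

    def cell_center(self, row, col):
        # Cell location: row/column position converted to map coordinates.
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_max - (row + 0.5) * self.cell_size
        return (x, y)


# Hypothetical 2 x 3 elevation raster with 30 m cells.
elevation = Raster(
    values=[[12, 15, 19],
            [11, 14, 18]],
    cell_size=30.0,
    x_min=500000.0,
    y_max=2600000.0,
)

print(elevation.cell_value(1, 2))   # 18
print(elevation.cell_center(1, 2))  # (500075.0, 2599955.0)
```

The cell size doubles as the spatial resolution of this toy raster: every value stands for a 30 m by 30 m portion of the area.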
Vector Data Model:
Geometry
– The real world features/objects can be classified as –
Point object such as electric pole,
Line object such as Upazila road
Area object such as lake
Topology
Means the relationships or connectivity between the spatial objects.
Point
Points are zero-dimensional objects that contain only a single coordinate pair. Points are
typically used to model singular, discrete features such as buildings, wells etc. Points have
only the property of location. Other types of point features include the node and the
vertex. Specifically, a point is a stand-alone feature, whereas vertices are defined as each
bend along a line or polygon feature that is not the intersection of lines or polygons.
Node
An intersection of more than two lines, strings or polygons, or the start or end point of a
string, identified by a node number.
Chain /Edge/Arc
A line or a string with a chain number, start and end node numbers, and left and right
neighboring polygons.
Polygon
An area with a polygon number and the series of chains that form the area in clockwise order
(a minus sign is assigned in case of anti-clockwise order).
Chain/Arc/Edge:
Chain ID, Start Node ID, End Node ID, Attributes.
Node:
Node ID, (x, y), adjacent chain IDs (positive for to node, negative for from node).
Chain geometry:
Chain ID, Start Coordinates, Point Coordinates, End Coordinates.
Chain topology:
Chain ID, Start Node ID, End Node ID, Left Polygon ID, Right Polygon ID, (Attributes).
Polygon topology:
Polygon ID, Series of Chain ID, in clockwise order (Attributes).
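The tables above can be sketched as simple Python records. The example below is an illustration only, not a real GIS file format: it encodes two hypothetical unit squares (polygons 1 and 2) that share one edge, uses 0 for the area outside both polygons, and finds the neighbours of a polygon purely from the chain topology table. All class and function names are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    xy: tuple


@dataclass
class Chain:
    chain_id: int
    start_node: int
    end_node: int
    left_polygon: int   # polygon on the left when walking start -> end
    right_polygon: int  # polygon on the right when walking start -> end
    points: list        # chain geometry: start, intermediate and end coordinates


# Node table: the two ends of the shared edge.
nodes = {1: Node(1, (1, 0)), 2: Node(2, (1, 1))}

# Chain geometry and topology tables (0 = outside).
chains = {
    1: Chain(1, 2, 1, 1, 0, [(1, 1), (0, 1), (0, 0), (1, 0)]),  # outer edge of polygon 1
    2: Chain(2, 1, 2, 2, 0, [(1, 0), (2, 0), (2, 1), (1, 1)]),  # outer edge of polygon 2
    3: Chain(3, 1, 2, 1, 2, [(1, 0), (1, 1)]),                  # shared edge
}

# Polygon topology: chain IDs in clockwise order,
# negative when the chain is traversed against its stored direction.
polygons = {1: [-3, -1], 2: [3, -2]}


def neighbours(polygon_id):
    """Return polygons that share a chain with polygon_id (outside excluded)."""
    found = set()
    for chain in chains.values():
        if chain.left_polygon == polygon_id:
            found.add(chain.right_polygon)
        elif chain.right_polygon == polygon_id:
            found.add(chain.left_polygon)
    found.discard(0)
    return found


print(neighbours(1))  # {2}
print(neighbours(2))  # {1}
```

Because adjacency is stored explicitly in the left and right polygon fields, no coordinate comparison is needed to answer the neighbour question.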
L-5 Raster Data Model
Raster Model
The raster model uses regularly spaced grid cells in a specific sequence. An element of the
grid is called a pixel (picture element).
Raster Data Model
The JPEG, BMP, and TIFF file formats (among others) are based on the raster data
model.
If you zoom deeply into a raster image, you will notice that it is composed of an array
of tiny square pixels (or picture elements). Each of these uniquely colored pixels, when
viewed as a whole, combines to form a coherent image.
These pixels are used as building blocks for creating points, lines, areas, networks, and
surfaces.
Accordingly, the vast majority of available raster GIS data are built on the square pixel.
The raster data model is referred to as a grid-based system. Each cell in a raster carries a
single value, which represents the characteristic of the spatial phenomenon at a location
denoted by its row and column. The data type for that cell value can be either integer
or floating-point.
The raster model will average all values within a given pixel to yield a single value.
Therefore, the more area covered per pixel, the less accurate the associated data
values.
The area covered by each pixel determines the spatial resolution of the raster model from
which it is derived. Specifically, resolution is determined by measuring one side of the
square pixel. A raster model with pixels measuring 1 km by 1 km (1 square kilometer) in
the real world would be said to have a spatial resolution of 1 km.
Raster Data Encoding:
a. Cell by cell Encoding
b. Run-Length Encoding
c. Quad tree Encoding
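As an illustration of the second method, run-length encoding replaces each run of identical neighbouring cell values in a row with a (value, count) pair. The sketch below is a minimal version applied to one raster row; the function names and the sample row are hypothetical, and real GIS formats differ in how they store the runs.

```python
def run_length_encode(row):
    """Compress one raster row into (value, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]


def run_length_decode(runs):
    """Expand (value, run length) pairs back into the original row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


row = [0, 0, 0, 1, 1, 0, 0, 0]
encoded = run_length_encode(row)
print(encoded)                             # [(0, 3), (1, 2), (0, 3)]
print(run_length_decode(encoded) == row)   # True
```

Cell-by-cell encoding would store all eight values of this row; run-length encoding stores three pairs, and the saving grows as runs of equal values get longer.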
Cell by cell Encoding
Quad Tree Encoding:
Comparison of Vector and Raster Data Models
Advantages:
Raster model | Vector model
It is a simple data structure. | It provides a more compact data structure.
Overlay operations can be done easily. | Topology building is easy, hence good for network analysis.
Representation of high spatial variability is possible. | Smooth graphical representation.
Disadvantages:
Raster model | Vector model
It is less compact and requires more storage capacity. | It is a relatively complex data structure.
Topological relationships are more difficult to represent. | Overlay operations are more difficult to implement.
Blocky appearance does not give a smooth image, especially along the edges. | It is discrete, so it is difficult to represent high spatial variability.
L-7 Map Projection
Coordinate System:
A coordinate system is a reference system used to represent the locations of
geographic features, imagery, GPS locations etc. within a common geographic
framework.
Each coordinate system is defined by:
Its measurement framework, which is either geographic (in which spherical
coordinates are measured from the earth's center) or planimetric.
Unit of measurement (feet or meters for projected coordinate systems or decimal
degrees for latitude–longitude).
The definition of the map projection for projected coordinate systems.
Other parameters such as spheroid of reference, datum, and projection parameters like
one or more standard parallels, central meridian, shifts in the x- and y-directions.
Types of Coordinate Systems:
GCS (Geographic Coordinate System)
A global or spherical coordinate system such as latitude/longitude. A geographic
coordinate system is a coordinate system that enables every location on the Earth to be
specified by a set of numbers or letters.
Projected coordinate system:
A projected coordinate system is based on a map projection, which provides various
mechanisms to project maps of the earth's spherical surface onto a two-dimensional
Cartesian coordinate plane. Projected coordinate systems are sometimes referred to as
map projections.
Geographic Coordinate Systems
A geographic coordinate system (GCS) uses a three-
dimensional spherical surface to define locations on the earth.
A GCS includes an angular unit measure, a prime meridian, and
a datum (based on a spheroid). A point is referenced by its longitude and latitude values.
Longitude and latitude are angles measured from the earth's center to a point on the
earth's surface. The angles often are measured in degrees (or in grads).
Some Definition:
1. Sphere
2. Spheroid
3. Meridian
4. Great circle
5. Equator
DATUM
Types of Datum:
Geocentric Datum
Local Datum
PROJECTED COORDINATE SYSTEM
Map Projection
Map projections refer to the methods and procedures that are used to transform the
spherical three-dimensional earth into two-dimensional planar surfaces. Specifically, map
projections are mathematical formulas that are used to translate latitude and longitude
on the surface of the earth to x and y coordinates on a plane.
Since there are an infinite number of ways this translation can be performed, there
are an infinite number of map projections. Generally, the projection surface (pictured as
a sheet of paper) is either flat and placed tangent to the globe (a planar or azimuthal
projection) or formed into a cone or cylinder and placed over the globe (conical and
cylindrical projections).
Every map projection distorts distance, area, shape, direction, or some combination
thereof.
Concept of map projection
To illustrate the concept of a map projection, imagine that we place a light bulb in
the center of a globe. When we turn the light bulb on, the outline of the continents
and the graticule will be “projected” as shadows on a nearby surface.
This is what is meant by map “projection”.
Distortion due to map projection
Distortions are inevitable during projection from a 3D surface to a 2D plane.
Map projections introduce distortions in shape, area, distance, and direction. A series of
trade-offs will need to be made with respect to such distortions, considering the purpose of
the map.
Characteristics of a Projected Coordinate System
A projected coordinate system is defined on a flat, two-dimensional surface.
A projected coordinate system has constant lengths, angles, and areas across the
two dimensions.
A projected coordinate system is always based on a geographic coordinate system
that is based on a sphere or spheroid. In a projected coordinate system, locations are
identified by x,y coordinates on a grid, with the origin at the center of the grid. The
two values are called the x-coordinate and the y-coordinate. At the origin, the coordinates
are x = 0 and y = 0. Horizontal lines above the origin and vertical lines to the right of
the origin have positive values; those below or to the left have negative values.
When working with data in a geographic coordinate system, it is sometimes useful
to equate the longitude values with the X axis and the latitude values with the Y
axis.
Types of map projection
A map projection uses mathematical formulas to relate spherical coordinates on
the globe to flat, planar coordinates.
Different projections cause different types of distortions. Some projections are designed
to minimize the distortion of one or two of the data's characteristics. A projection
could maintain the area of a feature but alter its shape
Map projections are designed for specific purposes. One map projection might be used
for large-scale data in a limited area, while another is used for a small-scale map
of the world. Map projections designed for small-scale data are usually based on
spherical rather than spheroidal geographic coordinate systems.
Conformal projections (Shape)
Conformal projections preserve local shape. A map projection accomplishes this by
maintaining all angles. In these projections, the meridians and parallels intersect at right
angles. The area enclosed by a series of arcs may be greatly distorted in the process. No
map projection can preserve the shapes of larger regions.
Equal area projections (Area)
Equal area projections preserve the area of displayed features.
To do this, the other properties—shape, angle, and scale—are distorted.
In these projections, the meridians and parallels may not intersect at right angles.
Equidistant projections (Distance)
Equidistant maps preserve the distances between certain points.
Most Equidistant projections have one or more lines in which the length of the line on a
map is the same length (at map scale) as the same line on the globe, regardless of
whether it is a great or small circle, or straight or curved.
For example, in the Sinusoidal projection, the equator and all parallels are their true
lengths.
In other Equidistant projections, the equator and all meridians are true.
No projection is equidistant to and from all points on a map.
True-direction projections (Direction)
The shortest route between two points on a curved surface such as the earth lies along
the great circle on which the two points lie.
Some True-direction projections are also conformal, equal area, or equidistant.
Map projection method
Because maps are flat, some of the simplest projections are made onto geometric
shapes that can be flattened without stretching their surfaces. These are called
developable surfaces. Some common examples are cones, cylinders, and planes. A
map projection systematically projects locations from the surface of a spheroid to
representative positions on a flat surface using mathematical algorithms.
The first step in projecting from one surface to another is creating one or more points of
contact. Each contact is called a point (or line) of tangency.
A Planar projection is tangential to the globe at one point. Tangential cones and
cylinders touch the globe along a line. If the projection surface intersects the globe
instead of only touching its surface, the resulting projection is a secant rather than a
tangent case. Whether the contact is tangent or secant, the contact points or lines are
significant because they define locations of zero distortion.
In general, distortion increases with the distance from the point of contact. Many
common map projections are classified according to the projection surface used: conic,
cylindrical, or planar.
Types of Method:
Conic
The simplest Conic projection is tangent to the globe along a line of latitude. This line is
called the standard parallel. The meridians are projected onto the conical surface,
meeting at the apex, or point, of the cone. Parallel lines of latitude are projected onto
the cone as rings.
The cone is then "cut" along any meridian to produce the final conic projection,
which has straight converging lines for meridians and concentric circular arcs for
parallels. The meridian opposite the cut line becomes the central meridian.
In general, the further you get from the standard parallel, the more distortion
increases. Thus, cutting off the top of the cone produces a more accurate
projection. You can accomplish this by not using the polar region of the projected data.
Conic projections are used for midlatitude zones that have an east–west orientation.
Somewhat more complex Conic projections contact the global surface at two
locations. These projections are called Secant projections and are defined by two
standard parallels. The distortion pattern for Secant projections is different between the
standard parallels than beyond them. Generally, a Secant projection has less overall
distortion than a Tangent projection. In even more complex Conic projections, the axis of the
cone does not line up with the polar axis of the globe. These types of projections are
called oblique.
Cylindrical
Like Conic projections, cylindrical projections can also have tangent or secant cases. The
Mercator projection is one of the most common cylindrical projections, and the equator
is usually its line of tangency. Meridians are geometrically projected onto the cylindrical
surface, and parallels are mathematically projected. This produces graticular angles of
90 degrees. The cylinder is "cut" along any meridian to produce the final cylindrical
projection.
The meridians are equally spaced, while the spacing between parallel lines of
latitude increases toward the poles. This projection is conformal and displays true
direction along straight lines. For more complex Cylindrical projections the cylinder is
rotated, thus changing the tangent or secant lines.
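The spacing pattern described above can be checked numerically. The sketch below uses the standard spherical Mercator formulas, x = R·λ and y = R·ln tan(π/4 + φ/2), with the equator as the line of tangency. The sphere radius of 6,371 km and the function name mercator_xy are assumptions made for this illustration; production GIS software would normally work on a spheroid.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean radius of a spherical earth


def mercator_xy(lon_deg, lat_deg, central_meridian_deg=0.0):
    """Project longitude/latitude (degrees) to spherical Mercator x, y (meters)."""
    lam = math.radians(lon_deg - central_meridian_deg)
    phi = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lam
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


# Meridians: equal steps in longitude give equal steps in x.
x10, _ = mercator_xy(10, 0)
x20, _ = mercator_xy(20, 0)

# Parallels: equal steps in latitude give growing steps in y.
ys = [mercator_xy(0, lat)[1] for lat in (0, 20, 40, 60, 80)]
gaps = [b - a for a, b in zip(ys, ys[1:])]

print(round(x20 / x10, 6))            # 2.0
print([round(g / 1000) for g in gaps])  # gap between parallels in km, increasing poleward
```

The growing gaps are the price of conformality: the north-south stretching must match the east-west stretching at every latitude so that angles are preserved.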
Transverse Cylindrical projections, such as the Transverse Mercator, use a meridian as the
tangential contact or lines parallel to meridians as lines of secancy. The standard
lines then run north–south, along which the scale is true. Oblique cylinders are rotated
around a great circle line located anywhere between the equator and the meridians. In
these more complex projections, most meridians and lines of latitude are no longer
straight. In all Cylindrical projections, the line of tangency or lines of secancy have
no distortion.
Planar
A Planar projection projects map data onto a flat surface touching the globe. This type of projection is
usually tangent to the globe at one point but may be secant. The point of contact
may be the North Pole, the South Pole, a point on the equator, or any point in between.
This point specifies the aspect and is the focus of the projection. The focus is
identified by a central longitude and a central latitude. Possible aspects are
Polar
Equatorial
Oblique
Polar aspects are the simplest form. Parallels of latitude are concentric circles
centered on the pole, and meridians are straight lines. Patterns of area and shape
distortion are circular about the focus. For this reason, Planar projections accommodate
circular regions better than rectangular regions. Planar projections are used most often
to map polar regions.
Some Planar projections view surface data from a specific point in space. The point of
view determines how the spherical data is projected onto the flat surface. The
perspective from which all locations are viewed varies between the different
Planar projections. The perspective point may be:
Gnomonic - the center of the earth
Stereographic - a surface point directly opposite from the focus
Orthographic - a point external to the globe, as if seen from a satellite or another planet
Map Projection – UTM
The Universal Transverse Mercator (UTM) conformal projection uses a 2-dimensional
Cartesian coordinate system to give locations on the surface of the Earth. The UTM system
is not a single map projection. The system instead divides the Earth into sixty zones, each
a six-degree band of longitude, and uses a secant transverse Mercator projection
in each zone.
Imagine making a straight north-south cut in the peel of an orange and repeating this
north-south cut, at equal intervals, until 60 strips (6° each), or zones, have been
detached. Each of these zones will then form the basis of a separate map projection.
This flattening action results in a slight distortion of the geographical features within
the zone, but because the zone is relatively narrow, the distortion is small and may
be ignored by most map-users.
The UTM system divides the Earth between 80°S and 84°N latitude into 60 zones, each
6° of longitude in width (60 × 6° = 360° of longitude). These zones are
numbered 1 to 60.
Each zone is segmented into 20 latitude bands. Each latitude band is 8 degrees
high, and is lettered starting from "C" at 80°S, increasing up to "X", skipping "I" and "O".
The last latitude band, "X", is extended an extra 4 degrees, so it ends at 84°N
latitude, thus covering the northernmost land on Earth. Latitude bands "A" & "B" and
"Y" & "Z" cover the polar regions using the UPS (Universal Polar Stereographic)
Coordinate System. Bands in the northern hemisphere start from "N".
Exceptions
UTM grid zones are uniform over the globe, except in two areas.
On the southwest coast of Norway, grid zone 32V is extended to 9° wide, and grid zone 31V
is correspondingly shrunk to 3° so that it covers only open water.
Around Svalbard, the four grid zones 31X (9° wide), 33X (12°), 35X (12°), and
37X (9° wide) are extended. The three grid zones 32X, 34X and 36X are not used.
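The zone and band rules can be sketched in a few lines of Python. This is a simplified illustration that deliberately ignores the Norway and Svalbard exceptions; the function names are hypothetical, and the coordinates used for Dhaka are approximate.

```python
import math

# 20 band letters from "C" to "X", with "I" and "O" skipped.
BAND_LETTERS = "CDEFGHJKLMNPQRSTUVWX"


def utm_zone_number(lon_deg):
    """Zone 1 starts at 180°W; each zone is 6° of longitude wide."""
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1


def utm_latitude_band(lat_deg):
    """Bands are 8° high from 80°S; band X is extended to 84°N."""
    if not -80.0 <= lat_deg <= 84.0:
        raise ValueError("outside UTM coverage; polar regions use UPS")
    index = min(int((lat_deg + 80.0) // 8), len(BAND_LETTERS) - 1)
    return BAND_LETTERS[index]


def utm_grid_zone(lon_deg, lat_deg):
    return f"{utm_zone_number(lon_deg)}{utm_latitude_band(lat_deg)}"


# Dhaka, approximately 90.41°E, 23.81°N.
print(utm_grid_zone(90.41, 23.81))  # 46Q
print(utm_latitude_band(0.0))       # N (first band of the northern hemisphere)
print(utm_latitude_band(83.5))      # X (the extended band)
```

A complete implementation would add the special cases for zones 31V and 32V and for the X-band zones around Svalbard before returning the zone number.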
Summary
Universal Transverse Mercator (UTM):
Conformal projection (shapes are preserved)
Cylindrical surface
Two standard meridians
Zones are 6 degrees of longitude wide
UTM is commonly used and is a good choice when the east-west width of the area does not
exceed 6 degrees
The scale factor is 0.9996 along the central meridian of a zone
There is no scale distortion along the standard meridians
Scale distortion gets to unacceptable levels beyond the edges of the zones
Scale and Scale Factor
1:100,000 is a ratio; on a map it is referred to as the map scale. The map scale is found by
dividing the distance on the map by the distance in the real world. For example, the
distance from 100E, 40N to 100E, 35N is measured on a map and found to be 2.77 cm. The
corresponding real-world distance is 55,493,612.86 cm.
The scale is then 2.77 cm ÷ 55,493,612.86 cm ≈ 1:20,000,000. For this particular map, 1:20,000,000
is the principal scale, i.e. the displayed scale. But map scale is not static; it is not the
same everywhere on the map.
Scale depends on the projection used and how the projection distorts the world. At a
new location, from 105E, 30N to 100E, 30N, the distance on the map is measured as 2.6 cm. The
real distance on the earth is 48,239,311.01 cm. The scale is then 2.6 cm ÷ 48,239,311.01 cm
= 1:18,553,581.
Because this scale was taken from a location that was NOT along the standard parallels or along
the meridians, it is referred to as the local scale. To find the scale factor, divide the local
scale by the principal scale. Using the examples above, the scale factor comes to
1:18,553,581 ÷ 1:20,000,000 = 1.077959.
So the local scale is about 107.8% of the principal scale, an exaggeration of about 7.8%.
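The arithmetic of this worked example can be reproduced with a short Python sketch. It uses only the measurements quoted above (2.6 cm on the map, 48,239,311.01 cm on the ground, and a displayed scale of 1:20,000,000); the function and variable names are assumptions made for illustration.

```python
def scale_denominator(map_distance_cm, ground_distance_cm):
    """Return N for a scale of 1:N, given matching map and ground distances."""
    return ground_distance_cm / map_distance_cm


principal_denominator = 20_000_000                          # displayed scale 1:20,000,000
local_denominator = scale_denominator(2.6, 48_239_311.01)   # measured away from standard lines

# Scale factor = local scale / principal scale = (1/local) / (1/principal).
scale_factor = principal_denominator / local_denominator

print(round(local_denominator))               # 18553581
print(round(scale_factor, 6))                 # 1.077959
print(round((scale_factor - 1) * 100, 1))     # 7.8 (percent exaggeration)
```

Dividing 48,239,311.01 by 2.6 gives a local scale of about 1:18,553,581, and a scale factor above 1 means features at that location are drawn slightly larger than the displayed scale implies.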
Map Transformation Method
Database Management
Database
A database is a collection of related, logically coherent data used by the application
programs in an organization.
Database Management System (DBMS)
A database management system (DBMS) defines, creates and maintains a database.
The DBMS also allows controlled access to data in the database.
A DBMS is a combination of five components: hardware, software, data, users and
procedures.
Attribute Data management in GIS
Attribute data are stored in tables. A table is organized by rows and columns. Each row
represents a spatial feature. Each column describes a characteristic/property. The
intersection of a column and a row shows the value of a particular characteristic for a
particular feature. A row is also called a record or tuple, and a column is also called a
field, item or attribute.
Types of Attribute Table:
1. Feature Attribute Table:
A table that has access to the spatial data. Every vector dataset must have a feature
attribute table.
2. Non-Spatial Attribute Table:
A table that does not have direct access to the geometry of features but
has a field that can link it to the feature attribute table whenever
necessary.
Database Management
The presence of feature attribute tables and non-spatial data tables means that a GIS
requires a database management system (DBMS) to manage these tables.
Database Model
1. Hierarchical
Stores data as records that are hierarchically related to each other. The records
form a tree structure.
2. Network
In the network model, the entities are organized in a way that some entities
can be accessed through several paths.
3. Relational
A relational database is a collection of tables, also called relations, which
can be connected to each other by keys.
In a relational database, a table is a collection of data elements organized in terms of rows
and columns. A table is also considered a convenient representation of a relation, but a
table can have duplicate rows while a true relation cannot.
A table is the simplest form of data storage. The example used in the definitions that follow
is a Student table with four fields (ID, Name, Roll and Add) and four records.
A Primary Key represents one or more attributes whose value can uniquely identify a
record in a table. Its counterpart in another table for the purpose of linkage is called
a Foreign Key. A key common to two tables can establish connections between
corresponding records in the tables.
Record:
A single entry in a table is called a Record or Row. A Record in a table represents a set of
related data. For example, the Student table has 4 records.
Field:
A table consists of several records (rows), and each record can be broken into
several smaller entities known as Fields. The Student table consists of four
fields: ID, Name, Roll and Add. Each field name is the heading of a column.
Column:
In a relational table, a column is a set of values of a particular type. The term Attribute is
also used to represent a column. For example, in the Student table, Name is a column that
represents the names of students.
Database Key
Keys are a very important part of a relational database. They are used to establish and
identify relations between tables.
They also ensure that each record within a table can be uniquely identified by a
combination of one or more fields within the table.
Types of Key:
1. Primary Key
A primary key is a candidate key that is most appropriate to be the main reference
key for the table. As its name suggests, it is the primary key of reference for the
table and is used throughout the database to help establish relationships with other
tables. As with any candidate key the primary key must contain unique values,
must never be null and uniquely identify each record in the table.
2. Composite Key
When a primary key is created from a combination of 2 or more columns, the
primary key is called a composite key. Each column may not be unique by itself
within the database table but when combined with the other column(s) in the
composite key, the combination is unique.
3. Foreign Key
A foreign key is generally a primary key from one table that appears as a field in
another where the first table has a relationship to the second. In other words, if we
had a table A with a primary key X that linked to a table B where X was a field in B,
then X would be a foreign key in B.
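The three key types can be tried out in a small in-memory database. The following is a minimal sketch using Python's sqlite3 module; the Result table, its columns and the sample rows are hypothetical and are not part of the Student table described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Primary key: ID uniquely identifies each student.
conn.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")

# Composite key: neither column is unique alone, but the pair is.
# Foreign key: StudentID must match an existing Student.ID.
conn.execute("""
    CREATE TABLE Result (
        StudentID INTEGER NOT NULL REFERENCES Student(ID),
        Course    TEXT    NOT NULL,
        Grade     TEXT,
        PRIMARY KEY (StudentID, Course)
    )
""")

conn.execute("INSERT INTO Student VALUES (1, 'Adam')")
conn.execute("INSERT INTO Result VALUES (1, 'GIS', 'A')")
conn.execute("INSERT INTO Result VALUES (1, 'RS', 'B')")  # same student, other course: allowed


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_rejected = rejected("INSERT INTO Result VALUES (1, 'GIS', 'C')")  # repeats the composite key
orphan_rejected = rejected("INSERT INTO Result VALUES (99, 'GIS', 'A')")    # no Student with ID 99
```

Both rejected statements fail because a key rule is violated: the first repeats an existing (StudentID, Course) pair, the second refers to a student that does not exist.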
Relation Types in Relational Database
One to one
One to Many
Many to One
Many to Many
Relational Database Design
The design of any database is a lengthy and involved task that can only be done by
following a step-by-step procedure.
The first step normally involves interviewing potential users of the database [Need
Assessment].
The second step is to build an entity-relationship model (ERM) that defines the entities,
the attributes of those entities and the relationships between those entities.
The E-R (entity-relationship) data model views the real world as a set of basic objects
(entities) and relationships among these objects.
Entity-Relationship Model (ERM)
The database designer creates an entity-relationship (E-R) diagram to show the entities
for which information needs to be stored and the relationships between those
entities. E-R diagrams use several geometric shapes:
Rectangles - represent entity sets
Ellipses - represent attributes
Diamonds - represent relationship sets
Lines - link attributes to entity sets and link entity sets to relationship sets
Database Normalization
Normalization means breaking data into its related groups and defining the
relationships between those groups. Database normalization is a technique for organizing
the data in a database. It is a systematic approach of decomposing tables to eliminate
data redundancy and undesirable characteristics such as insertion, update and deletion
anomalies. It is a multi-step process that puts data into tabular form by removing
duplicated data from the relation tables.
Normalization is the process of efficiently organizing data in a database. There are several
goals of the normalization process:
To avoid redundant data in tables that waste space in the database and may
cause data integrity problems
To ensure that attribute data in separate tables can be maintained and
updated separately and can be linked whenever necessary
To facilitate a distributed database
These are worthy goals, as they reduce the amount of space a database
consumes and ensure that data is logically stored.
There are different steps or forms of normalization, normally denoted as 1NF (First
Normal Form), 2NF (Second Normal Form) and 3NF (Third Normal Form).
Normalization can also be thought of as a trade-off between data redundancy
and performance.
Normalizing a relation reduces data redundancy but introduces the need for
joins when all of the data is required by an application such as a report query.
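The trade-off can be seen in a small sketch. It assumes a hypothetical flat table in which each student's row repeats the office of the student's department; the table names, column names and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized rows: the department's office is repeated for every student in it.
flat_rows = [
    (1, "Adam", "Geography", "Room 12"),
    (2, "Rina", "Geography", "Room 12"),
    (3, "Karim", "Geology", "Room 30"),
]

# Normalized design: department facts are stored once and referenced by name.
conn.execute("CREATE TABLE Department (Dept TEXT PRIMARY KEY, Office TEXT)")
conn.execute(
    "CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT, "
    "Dept TEXT REFERENCES Department(Dept))"
)
for sid, name, dept, office in flat_rows:
    conn.execute("INSERT OR IGNORE INTO Department VALUES (?, ?)", (dept, office))
    conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (sid, name, dept))

# An update now touches one row instead of one row per student.
conn.execute("UPDATE Department SET Office = 'Room 14' WHERE Dept = 'Geography'")

# The price of normalization: a join is needed to rebuild the full report.
report = conn.execute("""
    SELECT s.ID, s.Name, s.Dept, d.Office
    FROM Student AS s JOIN Department AS d ON s.Dept = d.Dept
    ORDER BY s.ID
""").fetchall()
dept_count = conn.execute("SELECT COUNT(*) FROM Department").fetchone()[0]
```

The redundancy is gone (two department rows instead of three repeated offices), but the report query has to join the two tables.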
Advantages
The relational database is simple and flexible
Each table can be prepared, maintained and edited separately
The tables can remain separate until a query or analysis requires attribute data from
different tables; a link, generally temporary, is then established between them
Because the links are temporary, data management and data processing are easy
Object Base
Object-oriented models define a database as a collection of objects with features and
methods. A detailed discussion of object-oriented databases follows in an advanced
module.
Operation on Database
In a relational database we can define several operations to create new relations
based on existing ones.
Some operations are
Insert
Delete
Update
Select
Structured Query Language
Structured Query Language (SQL) is the language standardized by the American
National Standards Institute (ANSI) and the International Organization for Standardization
(ISO) for use on relational databases. It is a declarative rather than procedural language,
which means that users declare what they want without having to write a step-by-step
procedure. The first commercial implementation of SQL was released in 1979 by Relational
Software, Inc. (now Oracle Corporation), and various versions of SQL have been released
since then.
Insert
The operation inserts a new row into the relation. The insert operation uses the following
format.
“INSERT INTO [RELATION_NAME] VALUES ([VALUE1], [VALUE2], …)”
Delete operation
The operation deletes row(s) from the relation based on criteria. The delete operation uses
the following format:
“DELETE FROM [RELATION_NAME] WHERE [CRITERIA]”
Update Operation
The operation changes the values of some field(s) (attributes) of row(s) in a relation. The
update operation uses the following format:
“UPDATE [RELATION_NAME] SET [FIELD1] = [VALUE1], [FIELD2] = [VALUE2], … WHERE [CRITERIA]”
Select Operation
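The operation selects row(s) from a relation based on criteria. The tuples (rows) in the
resulting relation are a subset of the tuples in the original relation. The select
operation uses the following format:
“SELECT [F1], [F2] FROM [RELATION_NAME] WHERE [CRITERIA]”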
Select Distinct Operation
The SELECT DISTINCT statement is used to return only distinct (different) values.
“SELECT DISTINCT [F1], [F2] FROM [RELATION_NAME]”
“SELECT DISTINCT [FIELD_NAME] FROM [RELATION_NAME]”
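The statement formats above can be run against a small in-memory database. This is a minimal sketch using Python's sqlite3 module; the rows are invented, and the address column is named Addr here (an assumption) because ADD is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT, Roll INTEGER, Addr TEXT)")

# INSERT INTO [RELATION_NAME] VALUES (...)
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", [
    (1, "Adam", 34, "Dhaka"),
    (2, "Rina", 35, "Khulna"),
    (3, "Karim", 36, "Dhaka"),
    (4, "Sara", 37, "Sylhet"),
])

# UPDATE [RELATION_NAME] SET field = value, field = value WHERE criteria
conn.execute("UPDATE Student SET Addr = 'Barisal', Roll = 40 WHERE ID = 2")

# DELETE FROM [RELATION_NAME] WHERE criteria
conn.execute("DELETE FROM Student WHERE ID = 4")

# SELECT fields FROM [RELATION_NAME] WHERE criteria
dhaka_names = [row[0] for row in conn.execute(
    "SELECT Name FROM Student WHERE Addr = 'Dhaka' ORDER BY ID")]

# SELECT DISTINCT returns each address once, however many students share it.
addresses = [row[0] for row in conn.execute(
    "SELECT DISTINCT Addr FROM Student ORDER BY Addr")]
```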
Join Operation
The join operation is a binary
operation that combines two relations
on common attributes.
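As an illustration, the sketch below joins two hypothetical relations, Student and Result, on the common attribute that holds the student ID. Only an inner join is shown; the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Result (StudentID INTEGER, Course TEXT, Grade TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(1, "Adam"), (2, "Rina"), (3, "Karim")])
conn.executemany("INSERT INTO Result VALUES (?, ?, ?)",
                 [(1, "GIS", "A"), (2, "GIS", "B"), (1, "RS", "B")])

# Inner join: rows are paired wherever the common attribute matches.
joined = conn.execute("""
    SELECT Student.Name, Result.Course, Result.Grade
    FROM Student JOIN Result ON Student.ID = Result.StudentID
    ORDER BY Student.ID, Result.Course
""").fetchall()
```

Karim has no row in Result, so he does not appear in the joined relation.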
Union Operation
The SQL UNION operator combines the result of two or more SELECT statements. Each
SELECT statement within the UNION must have the same number of columns. The columns
must also have similar data types. The columns in each SELECT statement must be in the
same order. The UNION operator selects only distinct values by default. To allow
duplicate values, use the ALL keyword with UNION.
“SELECT [FIELD_NAMES] FROM [RELATION1]
UNION
SELECT [FIELD_NAMES] FROM [RELATION2]”
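The difference between UNION and UNION ALL can be checked with a short sketch. The two relations and their values below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Relation1 (District TEXT)")
conn.execute("CREATE TABLE Relation2 (District TEXT)")
conn.executemany("INSERT INTO Relation1 VALUES (?)", [("Bhola",), ("Barguna",)])
conn.executemany("INSERT INTO Relation2 VALUES (?)", [("Barguna",), ("Patuakhali",)])

# UNION keeps only distinct values.
union_rows = conn.execute(
    "SELECT District FROM Relation1 UNION "
    "SELECT District FROM Relation2 ORDER BY District"
).fetchall()

# UNION ALL keeps duplicates.
union_all_rows = conn.execute(
    "SELECT District FROM Relation1 UNION ALL "
    "SELECT District FROM Relation2 ORDER BY District"
).fetchall()
```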
Suitable Land for Cyclone Shelter Construction
Mapping units of land type: Khas land, Private land, Classified Land
Mapping units of Elevation: 0 – 100m
Set A: Khas land
Set B: Elevation >= 50m
X = A AND B finds all occurrences of Khas land with elevation >= 50 m
X = A OR B finds all occurrences of Khas land and all occurrences with elevation >= 50 m
X = A NOT B finds all occurrences of Khas land where the elevation is less than 50 m
X = A XOR B finds all occurrences that are either Khas land or have elevation >= 50 m, but not both
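The four Boolean operations map directly onto set operations. The sketch below uses four invented parcels; only the land types and the 50 m threshold come from the example above.

```python
# Each parcel: (id, land type, elevation in metres). Values are invented for illustration.
parcels = [
    (1, "Khas", 60),
    (2, "Khas", 20),
    (3, "Private", 75),
    (4, "Classified", 10),
]

A = {pid for pid, land, elev in parcels if land == "Khas"}  # Set A: Khas land
B = {pid for pid, land, elev in parcels if elev >= 50}      # Set B: elevation >= 50 m

a_and_b = A & B  # Khas land with elevation >= 50 m
a_or_b = A | B   # Khas land, or elevation >= 50 m, or both
a_not_b = A - B  # Khas land below 50 m
a_xor_b = A ^ B  # exactly one of the two conditions holds
```

For cyclone shelter siting, A AND B is the set of interest: parcel 1 is the only one that is both Khas land and at least 50 m high.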
Raster Algebra
Raster algebra is an algebraic framework for performing operations on data stored in a
geographical information system (GIS). It allows the user to model different problems and
to obtain new information from the existing data set through mathematical combinations
of layers.
This framework is the basis for the map algebra language in ArcGIS Grid and Spatial
Analyst, whose functions fall into four groups:
Local functions
Focal functions
Zonal functions
Global functions
a. Local
Sometimes called layer functions. They work on every single cell in a raster layer.
Cells are processed without reference to surrounding cells. Operations can be
arithmetic, trigonometric, exponential, logical or logarithmic functions. When the new
layer is a function of two or more input layers, the output value for each cell is a
function of the values of the corresponding cells in the input layers.
• Arithmetic operations: +, -, *, /, Abs, …
• Relational operators: >, <, …
• Statistic operations: Min, Max, Mean, Majority, …
• Trigonometric operations: Sine, Cosine, Tan, Arcsine, Arccosine, …
• Exponential and logarithmic operations: Sqr, sqrt, exp, exp2, …
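A local function can be sketched in a few lines of plain Python, with each raster held as a list of rows. The helper name local_op and the two small rainfall grids are assumptions made for illustration.

```python
def local_op(func, *layers):
    """Apply func cell by cell to the corresponding cells of the input layers."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[func(*(layer[r][c] for layer in layers)) for c in range(cols)]
            for r in range(rows)]


rain_2019 = [[10, 20], [30, 40]]
rain_2020 = [[14, 18], [30, 50]]

total = local_op(lambda a, b: a + b, rain_2019, rain_2020)          # arithmetic
wetter = local_op(lambda a, b: int(b > a), rain_2019, rain_2020)    # relational
mean = local_op(lambda *v: sum(v) / len(v), rain_2019, rain_2020)   # statistical
```

Each output cell depends only on the cells at the same position in the inputs, which is what distinguishes a local function from the focal, zonal and global ones.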
b. Focal
Compute an output value for each cell as a function of the cells that are within its
neighborhood.
Widely used in image processing under different names: convolution, filtering, kernel
or moving window.
The simplest and most common neighborhood is a 3 by 3 rectangular window.
Other neighborhood shapes are a rectangle, a circle, an annulus (a donut) or a wedge.
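A focal mean over the 3 by 3 window can be sketched as follows. Clipping the window at the raster edge is an assumption of this sketch; GIS packages offer other edge treatments.

```python
def focal_mean(grid):
    """3 x 3 moving-window mean; the window is clipped at the raster edge."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(r - 1, 0), min(r + 2, rows))
                      for j in range(max(c - 1, 0), min(c + 2, cols))]
            out[r][c] = sum(window) / len(window)
    return out


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
smoothed = focal_mean(grid)
```

The center cell averages all nine values, while a corner cell averages only the four cells that fall inside the raster.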
c. Zonal
Compute a new value for each cell as a function of the cell values within the zone
containing the cell.
Zone layer
o defines zones
Value layer
o contains input cell values
Zonal Statistical Operation
• Calculate statistics for each cell by using all the cell values within a zone
o Zonal Mean, Zonal Median, Zonal Sum, Zonal Minimum, Zonal Maximum,
Zonal Range, Zonal Majority, Zonal Variety
Outputs of Zonal Operations
• Raster layer
– All the cells within a zone have the same value on the output raster layer
• Table
– Each row in the table contains the statistics for a zone.
– The first column is the value (or ID) of each zone.
– The table can be joined back to the zone layer.
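Both kinds of output can be produced by one short function. The sketch below computes a zonal mean; the function name and the two small layers are invented for illustration.

```python
def zonal_mean(zones, values):
    """Return (output raster, table) holding the mean of the value layer per zone."""
    totals, counts = {}, {}
    for zone_row, value_row in zip(zones, values):
        for z, v in zip(zone_row, value_row):
            totals[z] = totals.get(z, 0) + v
            counts[z] = counts.get(z, 0) + 1
    table = {z: totals[z] / counts[z] for z in totals}            # one entry per zone
    raster = [[table[z] for z in zone_row] for zone_row in zones]  # same value across a zone
    return raster, table


zones = [[1, 1, 2],
         [1, 2, 2]]
values = [[10, 20, 30],
          [30, 50, 70]]
raster, table = zonal_mean(zones, values)
```

The table is keyed by zone ID, which is what allows it to be joined back to the zone layer.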
d. Global
Operations that compute an output raster where the value of each output cell is
a function of all the cells in the input raster
Global statistical operations
Distance operations.
o Euclidean distance
o Cost distance
1. Distance operations
Characterize the relationships between each cell and source cells (usually
representing features)
o Distance to nearest source cell
Euclidean Distance
Calculates the shortest straight-line distance from each cell to its nearest source cell
(EucDistance)
Assigns each cell the value of its nearest source cell (EucAllocation)
Calculates the direction from each cell to its nearest source cell (EucDirection)
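The distance and allocation outputs can be sketched with a brute-force search over all source cells. This is an illustration only, not how EucDistance or EucAllocation are implemented. It measures distances between cell centers, breaks ties in favor of the smaller source value, and omits the direction output.

```python
import math


def euclidean(source, cell_size=1.0):
    """Brute-force distance and allocation rasters.

    source: grid where None means "not a source" and any other value is a source ID.
    """
    sources = [(r, c, v) for r, row in enumerate(source)
               for c, v in enumerate(row) if v is not None]
    dist, alloc = [], []
    for r, row in enumerate(source):
        dist_row, alloc_row = [], []
        for c in range(len(row)):
            # Smallest (distance, source ID) pair over all source cells.
            d, v = min((math.hypot(r - sr, c - sc) * cell_size, sv)
                       for sr, sc, sv in sources)
            dist_row.append(d)
            alloc_row.append(v)
        dist.append(dist_row)
        alloc.append(alloc_row)
    return dist, alloc


source = [[1, None, None],
          [None, None, None],
          [None, None, 2]]
dist, alloc = euclidean(source)
```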
Cost Distance
Computes the least accumulated cost from each cell to its least-cost source cell.
Source raster
o Represents features (points, lines, and polygons)
o Non-source cells are set to the NODATA value
Friction raster
o Cost encountered while moving through a cell (distance, time, dollars and effort)
o Unit is cost per unit distance
o Can have barriers (NODATA cells)
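A cost distance raster can be sketched with Dijkstra's shortest-path algorithm. The sketch assumes moves to the eight neighboring cells and charges each move the distance travelled times the mean friction of the two cells; this step-cost rule and the function name are assumptions of the example, not a description of any particular GIS tool. None plays the role of NODATA.

```python
import heapq
import math


def cost_distance(source, friction):
    """Least accumulated cost from every cell to its cheapest source (Dijkstra).

    source:   grid of booleans, True for source cells.
    friction: grid of cost per unit distance; None marks a barrier (NODATA).
    """
    rows, cols = len(friction), len(friction[0])
    cost = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r in range(rows):
        for c in range(cols):
            if source[r][c] and friction[r][c] is not None:
                cost[r][c] = 0.0
                heapq.heappush(heap, (0.0, r, c))
    steps = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    while heap:
        acc, r, c = heapq.heappop(heap)
        if acc > cost[r][c]:
            continue  # stale queue entry
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or friction[nr][nc] is None:
                continue  # outside the raster, or a barrier
            # Assumed step cost: distance moved times the mean friction of the two cells.
            step = math.hypot(dr, dc) * (friction[r][c] + friction[nr][nc]) / 2
            if acc + step < cost[nr][nc]:
                cost[nr][nc] = acc + step
                heapq.heappush(heap, (acc + step, nr, nc))
    return cost


source = [[True, False, False]]
friction = [[1, 3, 1]]
cost = cost_distance(source, friction)
```

Cells that cannot be reached, including barrier cells, keep an infinite cost in this sketch.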
Resampling
Cell Size
Different raster datasets may not have the same cell resolution, but when multiple
datasets are processed together the cell resolution should ideally be the same. When
multiple raster datasets are input into an analysis and their resolutions differ, one or
more of the input datasets will be automatically resampled to the coarsest resolution of
the input datasets.
Resampling
To find the value of each cell in the resampled output raster, the center of each
cell in the output must be mapped to the original input coordinate system. Each cell
center coordinate is transformed backward to identify the location of the point on the
original input raster.
Once the input location is identified, a value can be assigned to the output location
based on the nearby cells in the input. It is rare that an output cell center will align exactly
with any cell center of the input raster. Therefore, techniques have been developed
to determine the output value depending on where the point falls relative to the
center of cells of the input raster and the values associated with these cells.
The three techniques for determining output values are:
• Nearest neighbor assignment,
• Bilinear interpolation,
• Cubic convolution.
Each of these techniques assigns values to the output differently. Thus the values assigned
to the cells of an output raster may differ according to the technique used.
Nearest Neighbor
The nearest neighbor technique assigns the value of the cell whose center is closest to
the center of the output cell. It is the resampling technique of choice for discrete, or
categorical, raster data, such as land-use rasters, because it does not change the
values of the input cells.
In the image below, the output raster is resampled from a rotated input raster. The cell
centers of the input raster are in gray. The value for one of the cells on the output
raster (in red) is derived by identifying the nearest cell center on the input raster (the
blue spot) and assigning its value to the output cell.
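Nearest neighbor resampling can be sketched for the simple case of a north-up raster whose cell size changes. The sketch assumes the input and output share the same upper-left corner and ignores rotation; the function name and the land-use codes are invented.

```python
def resample_nearest(grid, in_size, out_size):
    """Resample a north-up raster to a new cell size with nearest neighbor.

    Both rasters share the same upper-left corner; sizes are in map units.
    """
    rows, cols = len(grid), len(grid[0])
    out_rows = int(rows * in_size / out_size)
    out_cols = int(cols * in_size / out_size)
    out = []
    for r in range(out_rows):
        y = (r + 0.5) * out_size                 # output cell center in map units
        src_r = min(int(y / in_size), rows - 1)  # input cell containing that center
        row = []
        for c in range(out_cols):
            x = (c + 0.5) * out_size
            src_c = min(int(x / in_size), cols - 1)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out


land_use = [[1, 1, 2, 2],
            [1, 1, 2, 2],
            [3, 3, 4, 4],
            [3, 3, 4, 4]]
coarse = resample_nearest(land_use, in_size=10, out_size=20)
fine = resample_nearest([[1, 2], [3, 4]], in_size=20, out_size=10)
```

Every output value is one of the input class codes; no new values are created, which is why the technique suits categorical data.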
Bilinear Interpolation
Bilinear interpolation uses the values of the four nearest input cell centers to determine
the value on the output raster. The new value for the output cell is a weighted average of
these four values, adjusted to account for their distance from the center of the output
cell. Because the output values are distance-weighted averages, bilinear interpolation is
best used for continuous data, where the distance from a known point or phenomenon
determines the value assigned to the cell, for example elevation, slope, or the magnitude
of an earthquake with distance from the epicenter.
In the image below, the output raster is resampled from a rotated input raster. The cell
centers of the input raster are in gray. The value for one of the cells on the output raster
(in red) is derived by identifying the four nearest cell centers on the input raster (the
four blue spots) and assigning the weighted average of the four values to the output cell.
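The weighted average of the four nearest cell centers can be written out directly. In this sketch positions are given in fractional row and column units, with whole numbers falling on input cell centers; the function name and the elevation values are assumptions, and the grid must have at least two rows and two columns.

```python
import math


def bilinear(grid, row, col):
    """Value at fractional (row, col), where whole numbers are input cell centers."""
    rows, cols = len(grid), len(grid[0])
    # Upper-left of the four surrounding centers, clamped to stay inside the grid.
    r0 = min(max(int(math.floor(row)), 0), rows - 2)
    c0 = min(max(int(math.floor(col)), 0), cols - 2)
    fr, fc = row - r0, col - c0  # fractional offsets from that center
    top = grid[r0][c0] * (1 - fc) + grid[r0][c0 + 1] * fc
    bottom = grid[r0 + 1][c0] * (1 - fc) + grid[r0 + 1][c0 + 1] * fc
    return top * (1 - fr) + bottom * fr


elevation = [[10, 20],
             [30, 40]]
center = bilinear(elevation, 0.5, 0.5)        # equal weight on all four centers
near_first = bilinear(elevation, 0.25, 0.25)  # closest to the cell holding 10
```

Unlike nearest neighbor, the result can be a value that occurs nowhere in the input, which is acceptable for continuous surfaces but not for class codes.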
Cubic Convolution
Cubic convolution is a resampling technique similar to bilinear interpolation except that
the weighted average is calculated from the values of the 16 nearest input cell centers.
Compared with bilinear interpolation, cubic convolution has a tendency to sharpen the
edges of the input data since more cells are involved in the calculation of the output
value.
In the image below, the output raster is resampled from a rotated input raster. The cell
centers of the input raster are in gray. The value for one of the cells on the output raster
(in red) is derived by identifying the sixteen nearest cell centers on the input raster (the
blue spots) and assigning the weighted average of the sixteen values to the output cell.
Interpolation
Interpolation is the procedure of estimating the value of properties at unsampled points
or areas using a limited number of sampled observations.
1. Point wise interpolation
o Thiessen polygon
o Weighted Average
2. Interpolation by curve fitting
2.1 Exact interpolation
o Nearest neighbor
o Linear interpolation
o Cubic interpolation
2.2 Approximate interpolation
o Moving Average
o B-spline
o Curve Fitting by Least Square Method
3. Interpolation by surface fitting
3.1 Regular grid
o Bilinear Interpolation
o Cubic Interpolation
3.2 Random points
o TIN
Point wise interpolation
Point wise interpolation is used when the sampled points are not densely
located and each has only a limited influence on, or continuity with, surrounding
observations, for example climate observations such as rainfall and temperature,
or groundwater level measurements at wells.
a. Thiessen Polygon
Thiessen polygons can be generated using a distance operator which creates the
polygon boundaries as the intersections of radial expansions from the
observation points. This method is also known as Voronoi tessellation.
Each Thiessen polygon contains only a single point input feature. Any location
within a Thiessen polygon is closer to its associated point than to any
other point input feature.
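Because every location belongs to the polygon of its closest observation point, the polygon containing a location can be found without constructing any boundaries. The sketch below uses three invented rain gauge positions.

```python
import math


def thiessen_owner(x, y, stations):
    """Return the name of the station whose Thiessen polygon contains (x, y)."""
    return min(stations,
               key=lambda name: math.hypot(x - stations[name][0], y - stations[name][1]))


stations = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (5.0, 8.0)}  # invented rain gauges
owner_1 = thiessen_owner(2.0, 1.0, stations)
owner_2 = thiessen_owner(8.0, 1.0, stations)
owner_3 = thiessen_owner(5.0, 6.0, stations)
```

Under Thiessen interpolation, the value estimated at a location is simply the value observed at the owning station.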
b. Weighted Average
A window of circular shape with radius dmax is drawn around the point to be
interpolated, sized so as to include six to eight surrounding observed points.
The value at the point is then calculated as the sum of the products of each
observed value zi and its weight wi, divided by the sum of the weights:
z = Σ(wi · zi) / Σ(wi)
The weight functions commonly used are functions of distance, as follows:
Weight Function    Properties
0 order            Average without consideration of distance
1st order          Nearest points have a little influence
2nd order          Nearest points have moderate influence
3rd order          Nearest points have very strong influence
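The calculation, z = sum(wi * zi) / sum(wi) over the observations inside the window, can be sketched as follows. The weight function used here, wi = 1 / di^n with n the order, is an assumption chosen so that order 0 reduces to the plain average; the notes do not state the exact formulas. The observation values are invented.

```python
import math


def weighted_average(x, y, observations, d_max, order=2):
    """Distance-weighted estimate at (x, y).

    observations: list of (xi, yi, zi); only points within d_max are used.
    The weight 1 / d**order is an assumed form; order 0 gives the plain average.
    """
    num = den = 0.0
    for xi, yi, zi in observations:
        d = math.hypot(x - xi, y - yi)
        if d > d_max:
            continue  # outside the circular window
        if d == 0:
            return float(zi)  # the point coincides with an observation
        w = 1.0 / d ** order
        num += w * zi
        den += w
    if den == 0:
        raise ValueError("no observations within d_max")
    return num / den


obs = [(0, 0, 10.0), (4, 0, 30.0), (100, 100, 999.0)]
plain = weighted_average(1, 0, obs, d_max=10, order=0)
idw = weighted_average(1, 0, obs, d_max=10, order=2)
```

With order 0 the two points in the window count equally; with order 2 the estimate moves toward the value of the nearer point.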
Interpolation by curve fitting
The principle of curve fitting is to interpolate the value at an unsampled point using
surrounding sampled points.
Curve fitting is an important type of interpolation in many applications of GIS.
Curve fitting is divided into two categories:
Exact interpolation: a fitted curve passes through all given points
Approximate interpolation: a fitted curve does not always pass through all given points
Exact Interpolation
1. Nearest Neighbor
The same value as that of the observation is given within the proximal
distance
2. Linear
A piecewise linear function is applied between two adjacent points
3. Cubic interpolation
A third-order polynomial is applied between two adjacent points under
the condition that the first- and second-order derivatives should be
continuous. Such a curve is called a "spline".
[Figure: curves fitted by nearest neighbor, cubic and linear interpolation]
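The first two exact methods can be sketched for observations along a line. The function names and sample values are invented, and the cubic spline is left out of this sketch.

```python
def nearest_interp(xs, zs, x):
    """Value of the observation closest to x."""
    i = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
    return zs[i]


def linear_interp(xs, zs, x):
    """Piecewise linear value at x; xs must be sorted and x inside their range."""
    points = list(zip(xs, zs))
    for (x0, z0), (x1, z1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return z0 + (z1 - z0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the sampled range")


xs = [0.0, 10.0, 20.0]
zs = [5.0, 15.0, 10.0]
at_4_nearest = nearest_interp(xs, zs, 4.0)
at_4_linear = linear_interp(xs, zs, 4.0)
```

Both functions return the observed value when x falls exactly on a sampled point, which is what makes them exact interpolators.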
Approximate Interpolation
a. Moving Average
A window with a range of -d to +d is set, and the observations within the window
are averaged.
b. B-Spline
A cubic curve is determined by using four adjacent observations.
c. Least square method
The least square method (sometimes called a regression model) is a statistical
approach to estimate an expected value or function with the highest probability from
observations with random errors. In the least square method, the highest-probability
condition is replaced by minimizing the sum of the squares of the residuals. A residual
is defined as the difference between an observation and the estimated value of a
function.
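Two of the approximate methods can be sketched in a few lines. The least square example fits a straight line z = a*x + b, which is an assumed choice of function; the sample values are invented and the B-spline is not covered.

```python
def moving_average(xs, zs, x, d):
    """Mean of the observations whose position lies within [x - d, x + d]."""
    window = [z for xi, z in zip(xs, zs) if abs(xi - x) <= d]
    if not window:
        raise ValueError("no observations inside the window")
    return sum(window) / len(window)


def least_squares_line(xs, zs):
    """Slope and intercept of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x, mean_z = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    return slope, mean_z - slope * mean_x


xs = [0.0, 1.0, 2.0, 3.0]
zs = [1.0, 3.0, 5.0, 7.0]  # exactly z = 2x + 1, so every residual is zero
slope, intercept = least_squares_line(xs, zs)
smoothed = moving_average(xs, zs, x=1.0, d=1.0)
```

With noisy observations the fitted line would no longer pass through every point, which is the defining property of approximate interpolation.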