Geodatabases_Lecture.pdf

Date post: 20-Nov-2015
Transcript
  • Hi, welcome to my lecture on geodatabases. My name is Stewart Bruce, I am the GIS Program Coordinator at Washington College, and I am going to try to enlighten you on the wonderful world of geodatabases.

    1

  • The purpose of this lesson is really to introduce you to some of the basic tools that you are going to use to establish one of the three types of geodatabases, the personal geodatabase, and to teach you some settings for creating spatial layers that can take advantage of the functions of the geodatabase. In our series of courses that are part of geoworkshops.org, this is the first of two separate lessons that we have on geodatabases. This is really the introductory level; we have a more advanced geodatabase course in our 200 series courses.

    2

  • Now in order to appreciate the geodatabase, it helps to have a little bit of understanding of some of the previous spatial data formats that work with the ESRI product. The data looks the same whichever data format you use: if you have parcels or roads, you will see parcels or roads. But how that data is handled internally by the ArcGIS product has changed dramatically over the years.

    3

  • Now the first spatial data file format that was designed by ESRI was called the ArcInfo coverage. It was originally designed to work within the UNIX operating system. You may not recall how computers worked in the 80s and the early 90s, but a lot of GIS users used Sun SPARC workstations, very expensive machines. Some of the key points of the ArcInfo coverage are that it is a proprietary vector data format and that the coverage is stored in a directory or folder structure. Whenever you see an ArcInfo coverage, there are always two folders: one folder called info, which basically has a bunch of unique information about the coverage, and another folder that would usually have the name of the layer. You need both of those folders to make a coverage.

    4

  • Now the info folder contains information about potentially many data layers. In the folder structure seen below, where it says coverages, we have a roads folder, a streams folder, and an info folder. The roads folder contains information about roads, the streams folder contains information about streams, and the info folder contains information relating to both of those layers. So it is almost impossible to copy a coverage by using regular Windows Explorer; you really have to use the ESRI tools, more specifically ArcCatalog, to copy coverages from one location to another. This makes it kind of difficult to move this data around. It is difficult to email an ArcInfo coverage to someone unless you convert it into what is known as an ArcInfo interchange file, which has a file extension of .e00.

    5

  • Now there are actually quite a few benefits to a coverage. One is that a coverage could contain multiple vector types in the same coverage, so you could have points, lines, polygons, and annotation all within one coverage. The ability to have multiple vector types within a single coverage is a very good feature, very useful to have, and a coverage also supported topology, which is extremely useful as well.

    6

  • Topology is really something that allows you to maintain spatial relationships between different features. When ESRI switched to the shapefile format, the topology was lost in the transition.

    7

  • Now this diagram here really gives you an idea of what topology is. You can see we have three polygons, A, B, and C, and then D would be the exterior of the polygons. Topology is the spatial relationship between these different spatial features: they share common borders, they share common vertices, and this information is important to capture within a GIS. If you have topology then you can validate topology rules; for example, you don't want to have things like overlapping polygons or gaps between the polygons and the different data layers.

    8
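
As a rough illustration of the shared-border idea described on this slide, the sketch below finds which polygons share an edge. It is not how ArcGIS stores or validates topology; the polygon coordinates, the `edges` and `shared_borders` helper names, and the simplification that neighbors must share identical vertex pairs are all assumptions made for the example.

```python
from itertools import combinations


def edges(ring):
    """Return the undirected edges of a polygon given as a list of (x, y) vertices."""
    return {
        frozenset((ring[i], ring[(i + 1) % len(ring)]))
        for i in range(len(ring))
    }


def shared_borders(polygons):
    """Map each pair of polygon names to the number of edges they have in common.

    Only exact vertex-to-vertex matches count, which is a simplification.
    """
    result = {}
    for (name_a, ring_a), (name_b, ring_b) in combinations(sorted(polygons.items()), 2):
        common = edges(ring_a) & edges(ring_b)
        if common:
            result[(name_a, name_b)] = len(common)
    return result


# Hypothetical unit squares: B sits to the right of A, C sits above A.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(0, 1), (1, 1), (1, 2), (0, 2)],
}

borders = shared_borders(polygons)
print(borders)
```

Here A shares one border with B and one with C, while B and C only touch at a single vertex, so they do not appear as a pair.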

  • Now another format that relates to the coverage is what's known as the ArcInfo grid. It uses the same folder format as the coverage, but instead of managing vector data, it manages raster data. The grid raster format has really not totally gone away, because there are a lot of ArcGIS tools you will use that deal with rasters, and the standard output for these tools is the grid format.

    9

  • Now, so far the coverage sounds pretty cool, so what are the downsides? Probably one of the bigger ones is that it worked very well in the days of UNIX, but with the introduction of Windows it started becoming obsolete. It's not necessarily that the coverage itself was obsolete, but that the original ESRI product was called ArcInfo, and ArcInfo was designed to work on a UNIX workstation. With the introduction of Windows, ESRI came up with a new product called ArcView that could operate on the Windows platform, and they then developed a new data format called the shapefile. Prior to the Windows NT 4.0 operating system, the coverage and ArcInfo itself would not work well in a Windows environment. Another issue is that the proprietary nature of the coverage limited sharing of GIS data with non-ESRI applications. And the other issue was that the ArcInfo software itself was very complex to use: it took a lot of training and had a lot of keystroke commands of the kind used in the UNIX operating system. This complexity hindered new users, such as yourselves, from doing GIS work. Not to mention the fact that a Sun SPARC UNIX workstation cost around 20 or 25 thousand dollars, if you can imagine that in today's market.

    10

  • ESRI created the shapefile. The shapefile was really designed to be used on a personal computer. It was still a proprietary data format, but ESRI published the specification, and that allowed other vendors of GIS software to create connectivity between the ESRI product line and their own product lines. It started with the ArcView product and advanced and developed all the way through ArcView version 3.x. Now, shapefiles have multiple files, so unlike coverages, which are organized into a folder format, shapefiles consist of a number of individual files. The diagram that I'm showing here indicates that we have a .shp, a .shx, and a .dbf; these are the three fundamental files for shapefiles. There can be others as well: for example, you can have a projection file, called .prj, you could have an ArcView legend file with a .avl extension, and there are a few other examples. So potentially with a shapefile you could have six, seven, maybe eight different files that together make up the shapefile.

    11
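
To make the multi-file nature of a shapefile concrete, here is a minimal sketch that groups a directory listing by base name and reports which of the three fundamental files (.shp, .shx, .dbf) are missing. The file names and the `group_shapefiles` helper are hypothetical, and the check works on names only; it does not open or validate the files.

```python
from collections import defaultdict
from pathlib import PurePath

# The three fundamental shapefile components named in the lecture.
REQUIRED = {".shp", ".shx", ".dbf"}


def group_shapefiles(filenames):
    """Return {base name: sorted list of missing required extensions}.

    Only base names that have a .shp file are reported.
    """
    parts = defaultdict(set)
    for name in filenames:
        path = PurePath(name)
        parts[path.stem].add(path.suffix.lower())

    report = {}
    for stem, extensions in parts.items():
        if ".shp" in extensions:
            report[stem] = sorted(REQUIRED - extensions)
    return report


# Hypothetical directory listing: "parcels" was copied without its .shx file.
listing = [
    "roads.shp", "roads.shx", "roads.dbf", "roads.prj",
    "parcels.shp", "parcels.dbf",
]

report = group_shapefiles(listing)
print(report)
```

A check like this is one way to catch the common mistake of emailing or copying only the .shp file and leaving the rest behind.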

  • Now, the geodatabase was a solution to some of the problems that developed here. We had all this functionality in a coverage, topology being one of the main things, along with the ability to have points, lines, and polygons in a single folder. A shapefile could only hold one vector type: you could have a point shapefile, a line shapefile, or a polygon shapefile. There was no such thing at all as an annotation shapefile, and in shapefiles we lost the topology. So the ability to prevent things like overlapping polygons, slivers, and other topological problems disappeared with the shapefile. ESRI invented the geodatabase, which solved all of these problems.

    12

  • Now, in the big picture here, I want to point out that this slide says ArcGIS 9. Of course we're up at ArcGIS 10 right now, but this will give you a picture of the entire ESRI product family. You are working right now with the desktop GIS, and most of you will probably have the ArcView desktop. We're actually using the ArcInfo product, and as I explained in earlier lessons, the difference between ArcView and ArcInfo is simply that ArcInfo has more tools. Back in the days when you had ArcInfo 7 and ArcView 3, these software packages were not the same, so when a user moved up to the product with more functionality they really had to relearn the entire interface. But notice in the diagram that the ArcGIS desktop product is only one of many products that are available in the ArcGIS product family. There is ArcGIS Server, there is ArcIMS, there is ArcGIS for mobile applications, and there is also some real enterprise-level functionality through ArcSDE, which stands for Spatial Database Engine and can connect to very robust database management systems, such as Oracle or Microsoft SQL Server Enterprise. So it's really a big family of products, and in our courses we're just learning the entry-level desktop product.

    13

  • So let's explore geodatabases a little more. What are they all about? When I describe a geodatabase I really like to talk about it as a container, and if you look inside ArcCatalog, for example, a geodatabase looks like a little drum. So think of a geodatabase as a giant 55 gallon oil drum; it's really a huge container. And in this container you can do a number of different things. For example, you can import data: we have the capability of importing any previous ESRI spatial data format, such as a coverage or a shapefile, into the new format, we can bring in CAD drawings from AutoCAD or MicroStation, and we can also add things like tables and rasters. In addition to bringing in data that already exists, you can create new data, so we can create new points, lines, or polygons, we can create what are known as annotation feature classes, and we can also create new tables. So into this container you can put an awful lot of stuff.

    There are three types of geodatabases that you will use. The first one is known as a personal geodatabase, and the personal geodatabase uses a Microsoft Access format. There are some benefits to the personal geodatabase. Our lesson has you use the personal geodatabase, mainly because it's really easy to upload your geodatabase so we can see what you develop. Since it follows a Microsoft Access database format you can have a lot of different things in your geodatabase, but you're only creating a single file on your computer that has multiple things inside of it, and this makes it easily portable. As for the limitations of the personal geodatabase, it does have a data storage limit, so if you exceed 2 gigabytes the personal geodatabase will, um, I've never actually tried to do that, but if you can imagine that oil drum blowing up and spilling oil everywhere, well, it's just not going to work. Another serious limitation is that you can only have a single user at a time if you're doing editing. Now, ESRI realized these limitations and they wanted to come up with a new product.

    So, with ArcGIS 9 they introduced the file geodatabase. Some of the benefits are, number one, it's not dependent on the Microsoft operating system. So in theory ESRI can now develop the ArcGIS product to work on, for example, Linux or, who knows, maybe even a Macintosh platform. It has a single folder structure, so in some respects it is actually quite similar to the ArcInfo coverage, except there's no info folder. Now it's not quite as portable because there are so many files inside the folder, but you could zip the folder and then transport the data. I've posted here that there is a 1 terabyte data limit, but some additional research, by which I mean listening to some ESRI propaganda, indicates that this data limit could be much higher. But honestly the average user is never going to come close to a terabyte; that's actually quite a lot of data. Another advantage of the file geodatabase? It handles raster better, and according to ESRI it handles data processing much faster. With the personal geodatabase, for example, it looked like you could add raster data to it, but in reality you were only managing the raster data, and the data was not actually contained inside the geodatabase. With the file geodatabase that raster data is inside the file geodatabase folder, and for organizational purposes that's a very big advantage.

    Now, enterprise geodatabases are basically used for large operations. Let's say, for example, the City of Baltimore government probably uses an enterprise geodatabase. It's really for very large operations, it is very complex, and it's not discussed in great depth in this particular course.

    14
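
The advice above about zipping a file geodatabase folder before transporting it can be sketched with Python's standard library. This is only an illustration under stated assumptions: the `zip_geodatabase` helper is hypothetical, and the demo builds a stand-in folder called `project.gdb` with a placeholder file rather than a real geodatabase.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_geodatabase(gdb_folder, output_dir):
    """Zip a file geodatabase folder, keeping the folder itself inside the archive."""
    gdb_folder = Path(gdb_folder)
    if not gdb_folder.is_dir():
        raise ValueError(f"{gdb_folder} is not a folder")
    base_name = Path(output_dir) / gdb_folder.name
    # make_archive appends ".zip" to base_name and returns the archive path.
    archive = shutil.make_archive(
        str(base_name), "zip",
        root_dir=str(gdb_folder.parent), base_dir=gdb_folder.name,
    )
    return Path(archive)


# Demo with a stand-in folder; the file inside is a placeholder, not real geodatabase content.
with tempfile.TemporaryDirectory() as workspace:
    fake_gdb = Path(workspace) / "data" / "project.gdb"
    fake_gdb.mkdir(parents=True)
    (fake_gdb / "placeholder_table.bin").write_bytes(b"\x00" * 16)

    out_dir = Path(workspace) / "outbox"
    out_dir.mkdir()

    archive = zip_geodatabase(fake_gdb, out_dir)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(archive.name)
```

Keeping the `.gdb` folder as the top-level entry in the archive means the recipient gets the whole folder back when unzipping, rather than a loose pile of files.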

  • Now for some of the requirements for enterprise geodatabases. It does require that you have ArcEditor or ArcInfo, and you would also have to have ArcSDE so you could manage your connections to relational databases. These databases could include things like Oracle, SQL Server, Informix, or DB2, and none of these are inexpensive.

    15

  • You're also going to need a large amount of money, because all of that software is very, very expensive. Microsoft SQL Server Enterprise, for example, has a list price, or street price, of about $39,000, ArcInfo costs $10,000, and ArcSDE costs several thousand dollars, so next thing you know you've spent $100,000 just on the software. Then you're going to need a vast amount of knowledge. If you've ever looked at things like Oracle or SQL Server Enterprise, this is very complex software, and it requires very skilled professionals in your organization who know how these systems operate, and that adds up to a lot of money.

    16

  • Now I do want to discuss some of the benefits of personal and file geodatabases, so that you have some understanding of some of the reasons that I think they are a big improvement and you definitely should be using them solely for all of your GIS data.

    17

  • So, portability and data management. The geodatabase allows you to neatly organize a lot of different files in a single geodatabase, so you could have all the data you need for a single project located in one geodatabase. This simplifies the backup of your data, since you really only have to back up a single personal geodatabase file or a single file geodatabase folder. This makes data easier to share, it's easier to document the data, and it definitely does have an improved database format. All of this goes towards making your life a little bit simpler.

    18

  • Another thing I want to bring up is this concept of shape area and shape length. We talked about this to some extent in our data frames lesson, but try to picture this: look at polygon A and polygon B. If we modify polygon A to look like polygon B, what has changed about the spatial attributes of the polygon? Basically, what has changed is its perimeter and area. Now in the old shapefile data format, if you edited polygon A to look like polygon B it would not update the area and perimeter fields in the shapefile. You would have had to manually recalculate area and perimeter. The bad part about that is lots of people forgot to do this, or they didn't even know they had to do it in the first place. In the geodatabase format, whenever you edit and save, it automatically recalculates area and perimeter for you in the SHAPE_area and SHAPE_length fields. The only thing that it doesn't do is update a derived field, such as acres. Let's say you have your SHAPE_area and your SHAPE_length, but you also have an acres field. If you change the polygon, and so change its area and perimeter, the geodatabase automatically updates SHAPE_area and SHAPE_length, but you would still have to manually update derived fields such as acres. So please don't forget that, or else you are going to end up with some completely bogus analysis on your data. And remember too that when you re-project data to a new projection, it changes area and perimeter.

    19
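
The stale derived field problem described on this slide can be shown with a small, self-contained sketch. It does not use ArcGIS; the row dictionary, the `refresh_geometry_fields` helper, the `ACRES` field name, and the assumption that coordinates are in feet are all hypothetical. Area is computed with the standard shoelace formula, and 1 acre is 43,560 square feet.

```python
import math

SQUARE_FEET_PER_ACRE = 43560.0


def shape_area(ring):
    """Planar polygon area from (x, y) vertices using the shoelace formula."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def shape_length(ring):
    """Perimeter of the closed ring."""
    return sum(
        math.dist(ring[i], ring[(i + 1) % len(ring)])
        for i in range(len(ring))
    )


def refresh_geometry_fields(row):
    """Mimic the automatic update of SHAPE_area and SHAPE_length after an edit."""
    row["SHAPE_area"] = shape_area(row["ring"])
    row["SHAPE_length"] = shape_length(row["ring"])


# Hypothetical parcel, 660 ft by 660 ft, in a coordinate system measured in feet.
row = {"ring": [(0, 0), (660, 0), (660, 660), (0, 660)]}
refresh_geometry_fields(row)
row["ACRES"] = row["SHAPE_area"] / SQUARE_FEET_PER_ACRE

# Edit the polygon: it is now 660 ft by 1320 ft.
row["ring"] = [(0, 0), (660, 0), (660, 1320), (0, 1320)]
refresh_geometry_fields(row)   # geometry fields follow the edit
stale_acres = row["ACRES"]     # the derived field still holds the old value

# The manual step the lecture warns about: recalculate the derived field.
row["ACRES"] = row["SHAPE_area"] / SQUARE_FEET_PER_ACRE

print(stale_acres, row["ACRES"], row["SHAPE_length"])
```

After the edit the geometry fields are current, but the acres value stays at its old figure until it is recalculated by hand, which is exactly the trap described above.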

  • Now this diagram here shows a little bit more detail of the SHAPE_length and SHAPE_area fields. When you see these numbers in the attribute tables of your feature classes, a lot of people ask, well, what are the units of measurement for these? The SHAPE_length and the SHAPE_area are always in the units of measurement of the projection. So if you want to figure that out, all you have to do is open up the layer properties and look at, I believe it's the Source tab, and that will tell you what projection the data is in, and then you can determine what the units of measurement are.

    20

  • The next thing I want to talk about, and I actually think this is a huge benefit of the geodatabase, is this concept of attribute domains. I remember many years ago, probably longer than I'd like to remember, I was working at Mifflin County, and I was given access to our property database, and I had an assignment to figure out what types of heating systems were being used in homes in Mifflin County, PA. But when I got to the data, what I found was that there was very little data entry control by the clerks who worked down in the assessment office. There were many different clerks and they all had different ways of entering data. I was trying to figure out how many homes had oil/hot water heat, and inside that database I actually found 17 different ways that you can say oil/hot water heat. Well, this is a problem if you have clerks entering data and they have the option to type in, quite frankly, anything that they want. Attribute domains can prevent that. If you know, for example, that in a particular data field there are only a certain number of values you could use, you can pre-program that by creating an attribute domain. Then when the data entry operator tries to fill it in, instead of having to type it in they get a dropdown menu, they pick the correct code, and you eliminate data entry errors. Plus it's a little bit quicker because you don't have to actually type it in. The other thing that you can do in a geodatabase that is really cool is that if you know, for example, that 70% of your feature class has a certain code, you can make that code the default. So if you're doing a lot of digitizing and you know that you have mostly agricultural land, you can make agriculture the default code. This will save you a ton of labor and allow you to be more efficient in your operations. There are different types of attribute domains that you can pick. One of them is known as the range domain. Let's say you're mapping hydrants and you know that the pressure range is between 0 and 100 PSI. You can set up a range domain that prevents a person from entering a value outside of that range. Again, this will help prevent data entry errors. Coded domains are also very valuable. In a coded domain, let's say you're mapping trees and you know there are certain kinds of trees. You would create the domain with these different types of trees in it, and they appear in that dropdown menu. It also works very well for things like numeric values for pipe size. You can set a default value, which I already mentioned, so I'm going to blow through this here. And you can also control null values, so if you want you can set the field so that a null value is not acceptable, which forces the operator to enter some kind of value in that field.

    21
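
The range domain, coded domain, default value, and null-value behavior described above can be modeled in a few lines. This is a plain Python sketch of the concept, not the geodatabase's own domain mechanism; the `RangeDomain` and `CodedDomain` classes and the land use codes are hypothetical, while the 0 to 100 PSI hydrant range and the agriculture default come from the lecture.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RangeDomain:
    """Accepts numeric values between minimum and maximum, inclusive."""
    minimum: float
    maximum: float
    nullable: bool = True
    default: Optional[float] = None

    def is_valid(self, value):
        if value is None:
            return self.nullable
        return self.minimum <= value <= self.maximum


@dataclass
class CodedDomain:
    """Accepts only the listed codes, like choices in a dropdown menu."""
    codes: Dict[str, str]
    nullable: bool = True
    default: Optional[str] = None

    def is_valid(self, value):
        if value is None:
            return self.nullable
        return value in self.codes


# Hydrant pressure must fall between 0 and 100 PSI, and null is not acceptable.
pressure = RangeDomain(minimum=0, maximum=100, nullable=False)

# Hypothetical land use codes, with agriculture as the default for new features.
land_use = CodedDomain(
    codes={"AG": "Agriculture", "RES": "Residential", "COM": "Commercial"},
    default="AG",
)

checks = {
    "pressure 65": pressure.is_valid(65),
    "pressure 140": pressure.is_valid(140),
    "pressure null": pressure.is_valid(None),
    "land use AG": land_use.is_valid("AG"),
    "land use free text": land_use.is_valid("agricultural"),
}
print(checks)
```

The free-text entry is rejected because only the listed codes are accepted, which is the mechanism that would have prevented the 17 spellings of oil/hot water heat.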

  • Now, the next super huge benefit of geodatabases is that they allow you to create an annotation feature class. If you recall, the ArcInfo coverage had an annotation feature class, and then when we started using the shapefile there was no annotation feature class. Well, the geodatabase reintroduces the functionality that previously existed with the coverage. I mention here that previous issues related to annotation with coverages and shapefiles are resolved. There was an issue, and this actually caused me a lot of grief when I was working at Mifflin County: I would do my annotation in a coverage, I would bring it into the ArcView product, and due to some really complex issues with font sizes, font types, and how the two different software packages drew the text, the text never ended up in the same spot. The geodatabase has resolved that issue.

    22

  • Since annotation is the biggest headache for any GIS user, this is a really great benefit. To put it into scale, let's say you were working in York County, Pennsylvania. There are around 160,000 parcels, each parcel could have six pieces of text, and that comes out to almost a whopping one million individual pieces of text that somebody in your organization has to be responsible for placing. It's a gigantic pain, and the geodatabase gives you the tools that you need to properly handle annotation.

    23

  • The other big benefit is raster management. You can create things like raster catalogs, and you can store pictures that are hyperlinked to individual spatial features. We have a whole lesson on hyperlinking in our 200 level courses, one of my favorite lessons. You can also access raster data through the Identify button if you have your rasters in a geodatabase. So there are huge benefits for raster management for organizations big or small.

    24

  • There are some concerns and cautions I should warn you about. One of those is that geodatabases are not backwards compatible. So if you are using ArcGIS 10 and other people are using ArcGIS 9, they will not be able to read your geodatabase. Now, using some of the ArcToolbox tools, you can convert an ArcGIS 10 geodatabase into a previous version, but quite honestly the simplest thing to do is make sure everyone in your organization is at the same level of ArcGIS software that you are using.

    25

  • So, to sum it up geodatabases are GRRREAT! So stop using those shapefiles and stop using coverages, and use the format that is the future of the ArcGIS product line.

    26

  • Thank you very much for your time today and I hope that you enjoy the practical exercise that we have for you, and have a great day!

    27