How and why to document data for long-term storage;
and
What's special about Geographical data?
Allan Reese, Cefas Weymouth
Cefas buildings
Weymouth
Burnham-on-Crouch
Fish disease
• Causes and progress of infection
• Diagnostics (viruses, bacteria, fungi, parasites)
• Vaccines and therapeutics (safety and efficacy)
• Epidemiology & risk assessment
• Surveillance and control – Fish Health Inspectors
• Emerging and exotic diseases
• Policy advice
Spring Viremia of Carp Virus
Fish farms in E&W
Who wants a database?
• I’ve got some data so I need a database
• Our demo will show you how easy it is to simultaneously search, share and retrieve information from thousands of library databases
• Project… plans to build, through networking, a database on best practices in the field
• Rapid growth in the quantity of omic data means bio-informaticians need to manage data in an efficient and reliable manner. The main focus of this course is on designing, creating and querying relational databases
Why a (relational) database?
1. large volume of data (typically gigabytes)
2. complex data structure (not matching standard application)
3. long-term use / continued accumulation or incremental update
4. total accuracy & consistency needed on micro-scale
5. frequent accesses to small subsets, ad hoc queries
6. data shared by more than one person
(University Computing 1991; Significance Dec 2007)
Extract for analysis
• Fields (variables) = columns
• Units (level of analysis) = rows
• Columns x Rows = Data table
Query -> view -> table of data -> summary or analysis
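The chain query -> view -> table of data -> summary can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended schema: the tables `site` and `sample`, the view `sample_table` and all values are hypothetical.

```python
import sqlite3

# Hypothetical schema and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site (site_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        site_id INTEGER REFERENCES site(site_id),
        species TEXT,
        length_mm REAL
    );
    INSERT INTO site VALUES (1, 'Site A'), (2, 'Site B');
    INSERT INTO sample VALUES
        (1, 1, 'carp', 310.0),
        (2, 1, 'carp', 290.0),
        (3, 2, 'carp', 350.0);

    -- The view fixes the level of analysis: one row per sample.
    CREATE VIEW sample_table AS
        SELECT s.sample_id, t.name AS site, s.species, s.length_mm
        FROM sample AS s JOIN site AS t ON t.site_id = s.site_id;
""")

# Query the view to extract a flat data table (columns x rows).
cursor = conn.execute("SELECT * FROM sample_table ORDER BY sample_id")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

# Summary: mean length per site, computed from the extracted table.
lengths_by_site = {}
for row in rows:
    record = dict(zip(columns, row))
    lengths_by_site.setdefault(record["site"], []).append(record["length_mm"])
mean_length = {site: sum(v) / len(v) for site, v in lengths_by_site.items()}
```

Once the view has flattened the relational structure, `columns` and `rows` form an ordinary data table that any analysis package could read.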
Mystery meat
• What tables form the raw data?
• What fields are in each table?
• Data dictionary?
• Documenting meanings or DB structure?
Table preferred when
• Scientific data probably SHOULD NOT be changed
– or data added in batches (incremental)
• Structure NOT complex
– replication across units allowed, but not excessive
• Levels of analysis are few (or few dominant)
• Analyses summarize whole data or samples
– often one-offs (bespoke or user-written)
• Sorting or indexing allows very rapid access
Data table needs metadata
• Metadata standards (Dublin Core)
– emphasis on discovery
– list many fields
– codebook not mentioned
• A modest suggestion
– data table of rows and columns, with column headers
– codebook: another table to explain headers
– metadata: describe background, ownership etc
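The modest suggestion above might look like this in practice. It is a sketch only: the file names `data.csv`, `codebook.csv` and `metadata.txt`, the helper `write_documented_table`, and every column and value are assumptions made for illustration. The one rule it enforces is that every column header must be explained in the codebook.

```python
import csv
import tempfile
from pathlib import Path

# All names and values below are hypothetical, for illustration only.
data_rows = [
    {"site_id": "A1", "sample_date": "2007-06-14", "temp_c": 15.2},
    {"site_id": "B2", "sample_date": "2007-06-15", "temp_c": 16.8},
]
codebook = [
    {"column": "site_id", "meaning": "Site identifier", "units": ""},
    {"column": "sample_date", "meaning": "Date sampled (ISO 8601)", "units": ""},
    {"column": "temp_c", "meaning": "Water temperature", "units": "degrees Celsius"},
]
metadata = {"title": "Example survey", "owner": "Example owner", "collected": "2007"}


def write_documented_table(directory, data_rows, codebook, metadata):
    """Write data.csv, codebook.csv and metadata.txt side by side."""
    headers = list(data_rows[0].keys())
    # Refuse to write a table whose headers are not all explained.
    documented = {entry["column"] for entry in codebook}
    missing = [h for h in headers if h not in documented]
    if missing:
        raise ValueError(f"columns missing from codebook: {missing}")

    directory = Path(directory)
    with open(directory / "data.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=headers)
        writer.writeheader()
        writer.writerows(data_rows)
    with open(directory / "codebook.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["column", "meaning", "units"])
        writer.writeheader()
        writer.writerows(codebook)
    with open(directory / "metadata.txt", "w") as f:
        for key, value in metadata.items():
            f.write(f"{key}: {value}\n")
    return headers


with tempfile.TemporaryDirectory() as tmp:
    headers = write_documented_table(tmp, data_rows, codebook, metadata)
    written = sorted(p.name for p in Path(tmp).iterdir())
```

Plain text files are used here because they remain readable without the software that created them, which matters for long-term storage.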
Geographical Databases
ESRI (ArcInfo) assumes
• The purpose of a GIS is to provide a spatial framework to support decisions …
• Most often, a GIS presents information in the form of maps and symbols …
• A map user is the end consumer of a GIS. This person looks at maps …
• When the Cassini spacecraft was launched, GIS was used to evaluate the risk of an accident with the plutonium generators on board
Nearer to me
GISs contain
• Data as points, lines, areas
• Location data
– lat/long, grid refs, postcodes, toids
• Representation instructions
– scaling, icons, label position, shading
Can you get data out?
• Point and click works for pop-up labels
– not to output a table
• Limited to the precision of the input device, including the user’s eyesight
• I want, probably, a whole layer of data, including the positions as named fields
How do my needs map into the database?
Lacking / hidden / difficult in GIS
• List fields associated with physical object
• Choose many objects and output data
– eg to make proximity matrix
• Distinguish raw from constructed data
– point-heights versus interpolated contour
• Output data values for an area
– eg sea surface temperatures
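Once positions are available as plain fields, a proximity matrix is easy to build outside the GIS. The sketch below uses the haversine great-circle formula on a spherical Earth, which is one reasonable choice among several; the function names and the point records are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/long points (spherical Earth)."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def proximity_matrix(points):
    """Return a square matrix of pairwise distances in km."""
    return [
        [haversine_km(p["lat"], p["lon"], q["lat"], q["lon"]) for q in points]
        for p in points
    ]


# Hypothetical objects selected from a layer.
points = [
    {"id": "P1", "lat": 0.0, "lon": 0.0},
    {"id": "P2", "lat": 1.0, "lon": 0.0},
    {"id": "P3", "lat": 0.0, "lon": 1.0},
]
matrix = proximity_matrix(points)
```

Each row and column of `matrix` corresponds to one selected object, so the result can be passed straight to clustering or spatial statistics in other software.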
Request
GIS suppliers may prefer to address users’ needs by adding yet more features to the interface, or pointing to the SQL interface
I would rather they re-consider the role of the GIS as a data warehouse, from which it should be easier to select and extract data that can be analysed in other software