AVH - Australia’s Virtual Herbarium
Jim Croft
Centre for Plant Biodiversity Research
Australian National Herbarium
Australia’s Virtual Herbarium:
storing and interchanging botanical data on-line
[email protected]
http://www.anbg.gov.au/jrc/
Storage Industry speak
• Role of Industry / Cost of storage infrastructure relative to project / Business competitiveness / Strategic business advantage / Justify investment / Return on Information (ROI) / Ever-changing environment / Interoperability / Linking geographically dispersed sites / Availability all the time, everywhere, forever / Storage strategy / Effective storage management / Right storage architecture / Faster storage environment / Advanced database technologies / Optimising availability, performance, placement, recovery / Flexible future-proof IT infrastructure / Data storage to assist R&D / Replication for disaster recovery / Benefits of central disaster recovery / Necessity to plan ahead / Staged implementation approach / Lead organizational change
Herbarium botany speak
• Herbarium / Plants / Regional floristics / Preserved botanical specimens / Taxonomic hierarchy and taxon ranks / Nomenclature, synonymy / Alternative phylogenies and classifications / International Code of Botanical Nomenclature / Identification keys / Original, derived and modified data / Localities, geocodes / Data accuracy and precision / Habitat, altitude, depth, substrate / Biological images / Geospatial modeling and visualization / Environmental modeling and prediction / Species distributions and occurrence / Biological descriptive frameworks / Interactive identification / Flora Information systems / Landcover species / Curatorial standards / Cryptogams / Gymnosperms / Platyzomataceae / Eucalyptus camaldulensis
AVH - The Big Questions
The 6 Ws:
Who?
What?
Where?
When?
Why?
hoW?
AVH - The Big Questions
What is the AVH?
Why should the AVH happen?
Where does the AVH happen?
Who does the AVH happen for?
When does the AVH happen?
hoW does the AVH happen?
Whence the AVH?
What is a Herbarium?
• A physically and administratively secure building
• A managed archival scientific collection of preserved plant specimens
• A research environment and resource for botanical systematic and taxonomic research
• A taxonomic, spatial and temporal information base for botanical research, environmental decision-making and public information
Platyzoma microphyllum
Herbarium Specimens
Compactus storage units
Botanical Library
Botanical literature
Specimen Data Capture
Public Reference Herbarium
What is a Virtual Herbarium?
• The physical resources and biological information of a herbarium represented digitally
• On-line access to herbaria and to botanical information managed by herbaria
• Integrated access to botanical information from various sources in a herbarium and other on-line botanical information
What is the AVH?
• A collaborative project of the Australian herbarium community, providing:
– Partnership and shared access to each other's data
– Real-time access to current working data
– Shared access to common authority files
– A shared development environment
– Opportunity for shared data hosting, archiving and off-site backup
– Co-ownership of the final product
The pilot: distribution of Acacia aneura, mulga
Acacia aneura: Distribution of specimens from each herbarium
Overlays
Geocode accuracy
Survey data
A Herbarium Database Structure
Why is there an AVH?
• Pressure on Herbaria to work more efficiently
• Demand for access to larger amounts of data
• Demand to access data more quickly
• Demand to view data in different ways
• Pressure on herbaria to be and appear more responsive to community needs
What is the Problem?
• > 18,000 species of higher plants
• > 64,000 available names
• Extensive synonymy (4 names per plant)
• 8 major government-funded herbaria
• Similar number of university herbaria
• > 6,500,000 specimens in Aust. herbaria
• 50-100 data elements per specimen
• Several Kb per specimen
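The name counts above (64,000 available names for 18,000 species, roughly 3.6 names per species) are why a shared authority file matters: specimens filed under different synonyms have to resolve to one accepted name before their records can be combined. A minimal Python sketch of that lookup follows. The `SYNONYMS` table and both function names are hypothetical stand-ins for an authority file, and the single entry is illustrative only, not AVH data.

```python
# Hypothetical stand-in for a shared authority file: synonym -> accepted name.
# The single entry is illustrative; a real file would hold tens of thousands.
SYNONYMS = {
    "Racosperma aneurum": "Acacia aneura",
}


def accepted_name(name: str) -> str:
    """Return the accepted name for a synonym, or the cleaned name if unlisted."""
    cleaned = " ".join(name.split())  # collapse stray whitespace from data entry
    return SYNONYMS.get(cleaned, cleaned)


def names_per_species(available_names: int, species: int) -> float:
    """Average number of available names per species."""
    return available_names / species


print(accepted_name("Racosperma  aneurum"))
print(round(names_per_species(64_000, 18_000), 1))
```

Keeping the table in one shared place, rather than one copy per herbarium, is the design point: every participant resolves names the same way.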
Where is the data?
• In each herbarium (largest 1.3 million specimens)
• Pooling data centrally not acceptable for operational, political and emotional reasons.
• Therefore we need a distributed data management and access solution, maintaining and ensuring custodial responsibility
Where is the data?
• Images compound the problem
• Several Kb and up for plant images (possibly 100,000 available)
• Specimen images need high resolution, up to 20 Mb or more
• Need to be sub-sampled for web display
• At least 100,000 type specimens
• Ideally all 6.5 million should be done
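A back-of-envelope calculation shows how quickly these figures add up. The Python sketch below is an estimate under stated assumptions, not a measured result: "several Kb" per specimen record is taken as 4 kilobytes, "20 Mb" per specimen image is read as 20 megabytes, and decimal units are used throughout.

```python
KB = 1_000
MB = 1_000_000
GB = 1_000_000_000
TB = 1_000_000_000_000

SPECIMENS = 6_500_000        # specimens in Australian herbaria (from the slides)
TYPE_SPECIMENS = 100_000     # minimum number of type specimens (from the slides)
RECORD_BYTES = 4 * KB        # ASSUMPTION: "several Kb" taken as 4 kilobytes
IMAGE_BYTES = 20 * MB        # ASSUMPTION: "20 Mb" read as 20 megabytes


def total_bytes(items: int, bytes_per_item: int) -> int:
    """Total storage for a number of equally sized items."""
    return items * bytes_per_item


records = total_bytes(SPECIMENS, RECORD_BYTES)
type_images = total_bytes(TYPE_SPECIMENS, IMAGE_BYTES)
all_images = total_bytes(SPECIMENS, IMAGE_BYTES)

print(f"specimen records: {records / GB:.0f} GB")
print(f"type specimen images: {type_images / TB:.0f} TB")
print(f"all specimen images: {all_images / TB:.0f} TB")
```

Under these assumptions the text records are small next to the images, which is the sense in which images compound the problem.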
Where is the AVH?
• Spread across Australian herbaria
• Data distributed; resides with custodians
• Each herbarium has a portal to receive requests to and deliver data from its database
• Each herbarium hosts a common AVH query interface that polls all herbaria and integrates and returns data as a single query
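The polling pattern described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not the AVH implementation: each herbarium portal is modelled as an in-memory function rather than a network service, and `make_portal`, `federated_query`, the record fields and the herbarium labels are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, object]
Portal = Callable[[str], List[Record]]  # takes a taxon name, returns records


def make_portal(herbarium: str, holdings: List[Record]) -> Portal:
    """Build an in-memory stand-in for one herbarium's portal."""
    def query(taxon: str) -> List[Record]:
        # Data stays with the custodian; only matching records are returned,
        # each tagged with the herbarium that supplied it.
        return [dict(r, herbarium=herbarium) for r in holdings if r["taxon"] == taxon]
    return query


def federated_query(portals: Dict[str, Portal], taxon: str) -> List[Record]:
    """Poll every portal in parallel and merge the answers into one list."""
    def safe_call(portal: Portal) -> List[Record]:
        try:
            return portal(taxon)
        except Exception:
            return []  # one unreachable herbarium should not sink the whole query

    with ThreadPoolExecutor(max_workers=max(1, len(portals))) as pool:
        batches = list(pool.map(safe_call, portals.values()))
    merged = [record for batch in batches for record in batch]
    return sorted(merged, key=lambda r: (str(r["herbarium"]), str(r["id"])))


# Placeholder holdings for two hypothetical herbaria.
portals = {
    "Herbarium A": make_portal("Herbarium A", [{"id": "A-1", "taxon": "Acacia aneura"}]),
    "Herbarium B": make_portal("Herbarium B", [{"id": "B-1", "taxon": "Acacia aneura"},
                                               {"id": "B-2", "taxon": "Eucalyptus camaldulensis"}]),
}
for record in federated_query(portals, "Acacia aneura"):
    print(record["herbarium"], record["id"])
```

Because every herbarium can host the same query interface, any site can act as the entry point, and no central copy of the data is needed.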
Major Australian Herbaria
Who are the participants?
State Herbarium of South Australia
Queensland Herbarium
Australian National Herbarium
Northern Territory Herbarium
Tasmanian Herbarium
Industry Partner: KE Software
National Herbarium of Victoria
National Herbarium of New South Wales
Western Australian Herbarium
Australian Biological Resources Study
Holdings of Aust. Herbaria
Who runs the AVH?
• The Council of Heads of Australian Herbaria (CHAH).
• The Herbarium Information Systems Committee (HISCOM)
• IT staff at each herbarium (technology)
• Botanical staff at each herbarium (content)
• Scientific staff at each herbarium (validation)
Aust. & NZ Environment & Conservation Council
• Government committee of Commonwealth and State/Territory Environment Ministers
• Accepted that the community wanted the product
• Funding options and regional support
• Working group
• Project design input - new name
“The Agreement”
• $10 million project over five years
• Capture new data and validate old
• State/Territory to contribute amount relative to specimens to be databased/validated
• $4 million Commonwealth + $4 million State/Territory + $2 million private
• Sharing data critical to cost (cf. $16 million)
Who uses the AVH?
• The participating herbaria get access to all the data at the highest precision.
• Public access filter restricts access to work in progress, sensitive locality data, etc.
• Access to conservation agencies, environmental decision makers
• Research and education
• Public general interest
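The public access filter mentioned above can be pictured as a function applied to each record before it leaves a herbarium. The Python sketch below is one plausible shape for such a filter, not the AVH's actual rules: the field names (`work_in_progress`, `sensitive`, `latitude`, `longitude`, `locality`) and the choice to coarsen coordinates by rounding are assumptions made for illustration.

```python
from typing import Dict, Optional

Record = Dict[str, object]

# Hypothetical internal flags that should never appear in public output.
INTERNAL_FIELDS = ("work_in_progress", "sensitive")


def public_view(record: Record, decimals: int = 1) -> Optional[Record]:
    """Return the publicly visible form of a record, or None if it is withheld."""
    if record.get("work_in_progress"):
        return None  # unfinished records are not released at all

    public = {k: v for k, v in record.items() if k not in INTERNAL_FIELDS}
    if record.get("sensitive"):
        # Coarsen the geocode and hide the locality text for sensitive taxa.
        public["latitude"] = round(float(record["latitude"]), decimals)
        public["longitude"] = round(float(record["longitude"]), decimals)
        public["locality"] = "withheld"
    return public


# Placeholder record with made-up values.
example = {"id": "X-1", "latitude": -23.6987, "longitude": 133.8807,
           "locality": "placeholder locality", "sensitive": True}
print(public_view(example))
```

Participating herbaria would bypass this step and see the record at full precision; everyone else receives only what `public_view` returns.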
Uses
Greening the Grainbelt
When did the AVH happen?
• Basically this year
• But we have been working towards it for over 12 years
• And there have been occasional dead ends and setbacks, waiting for technology, capacity, support, etc.
Brief History of the AVH
• 1995 - HISCOM recommends the AVH concept (a distributed database) to CHAH
• 1997 - Canvassed at Systematics meeting
• 1999 - Proof of concept with Acacia
• 2000 - Government Minister shows interest
• 2000 - Interest from industry/foundations
• 2000 - Negotiating cost & lobbying
Recent Activity
• Major item at October CHAH meeting
– Agreement on what information we provide to community
– Priority groups and ‘Who does what?’
• Trust to oversee financial arrangements
• Liaison and Advisory Committee
Evolution of the AVH
Race to database
Need for semantic standard recognized
HISPID
Exchange
Distributed query
Standard syntax
Need for common semantic schema recognized
Botanical ontology?
hoW does the AVH work?
• On a number of different levels
– Politically
– Administratively
– Technically
– Scientifically
AVH General Architecture
Standards
• URL / UML / XML / URI / XHTML / HTTP / UDDI / XSLT / XPATH / RDF / PNG / SVG / DOM / CSS / SAX / HISPID / ITF / BNF / Z39.50 / WAIS / ASN.1 / XML schema / Dublin Core / RDFS / Z39.19 / SOAP / cgi / RMI
Whence the AVH?
• A new era of integrated access to botanical information
• New ways of visualizing data from different sources
• New ways of managing and validating data across remote databases
• More automation, more speed, higher throughput
Added extras - the real AVH
• Stage 1: databasing (dots on maps)
• Plus map overlays, precision flags, spatial queries, pretty interfaces, etc.
• Conflicting taxonomies - towards a National Census
• Stage 2+: images, descriptions, identification tools
• Multiple resources and options (cf. library)
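Spatial queries and precision flags work together: a "dots on maps" query is only as trustworthy as the geocodes behind it. The Python sketch below shows one simple way to combine the two, a bounding-box search that ignores records whose geocode is too imprecise. The `Specimen` and `BoundingBox` classes, the `precision_m` field and the sample coordinates are hypothetical, and the box test does not handle regions that cross the 180° meridian.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Specimen:
    specimen_id: str
    latitude: float
    longitude: float
    precision_m: float  # hypothetical precision flag: estimated geocode error in metres


@dataclass(frozen=True)
class BoundingBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, latitude: float, longitude: float) -> bool:
        """True if the point lies inside or on the edge of the box."""
        return (self.south <= latitude <= self.north
                and self.west <= longitude <= self.east)


def spatial_query(specimens: Iterable[Specimen], box: BoundingBox,
                  max_error_m: float) -> List[Specimen]:
    """Return specimens inside the box whose geocode error is acceptable."""
    return [s for s in specimens
            if s.precision_m <= max_error_m and box.contains(s.latitude, s.longitude)]


# Made-up points for illustration only.
sample = [
    Specimen("S-1", -25.0, 133.0, 100.0),     # inside the box, precise
    Specimen("S-2", -25.5, 134.0, 50_000.0),  # inside the box, too imprecise
    Specimen("S-3", -12.0, 131.0, 100.0),     # outside the box
]
box = BoundingBox(south=-30.0, west=130.0, north=-20.0, east=140.0)
print([s.specimen_id for s in spatial_query(sample, box, max_error_m=1_000.0)])
```

Filtering on the precision flag before mapping keeps coarse, decades-old localities from being plotted as if they were exact points.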
Botanical illustrations
Plus
But...
Integrated strategies for tackling fungal biodiversity
Problem: 250,000 spp., 5% known, few herbarium collections
Solution: Fungimap
Community mapping of 100 common species by 600 volunteers
Distribution and habitat data leads to better conservation and systematics
But...
Australian eFloras and other digital products
Why it will work
• Communication - CHAH, few herbaria
• Collaboration - long-standing, data sharing, overcoming Australia’s Federal/State system
• Champions - management, public
• Lobbying and profile of herbaria
• Relevance of product
• And now…we need to maintain commitment to project (e.g. impact on research outputs and other organisational initiatives)
Future technology
• Currently very simple architecture and technology
• Increase in complexity and ‘bulk’ is inevitable
• Cannot avoid engaging computer scientists and the computer industry
– Optimize data storage
– Optimize data access and delivery
– Optimize analysis and visualization
– Optimize knowledge discovery