Post on 18-May-2015
Biodiversity Heritage Library
© 2008 Biodiversity Heritage Library www.biodiversitylibrary.org
Biodiversity Informatics
Evolving in the Biological Sciences
National Geographic News, 05/21/08 Tuatara
Libraries ?????
encyclopedists
Nomenclator Animalia “Seahorse” Conrad Gesner 1570
encyclopedists
Denis Diderot – 1751 Encyclopédie
Precursor to Semantic Web Thinking
• “So great is the power of linkage and order that even the mundane becomes important” DD
• Encyclopedia… the word signifies unity of knowledge
Serine Molecule
Biodiversity Heritage Library
Synthesis Center: Field Museum
Informatics: Marine Biological Laboratory & MOBOT
Education & Outreach: Smithsonian/Harvard
Secretariat: Smithsonian
“The launch of the Encyclopedia of Life will have a profound and creative effect in science… this effort will lay out new directions for research in every branch of biology.”
– E.O. Wilson
HOW to build this enterprise
• Recognition of the importance of all types of material in all formats
• Recognition that a single set of rules, a single mechanism, a single type of discovery tool cannot accomplish everything
• Recognition that entities other than libraries can, want to, and will contribute to the information-finding construct
• Recognition that all of us are part of the whole, and that it is an interdependent relationship, not the relationship of an all-powerful mother ship (LC) to a fleet of shuttle craft
• Recognition that the way we have made decisions in the past may no longer serve us well
Collaborative Tree of Life distributed semantic
Biodiversity Heritage Library ever evolving TED all information Synthesis Center Oh wow! SpeciesBase ClassificationBank Education and Outreach ANTS index MacArthur Foundation taxonomic intelligence modular software communal ownership user defined AvenueA | Razorfish OBIS MBL free
visualization images WorkBench sounds phylogeny web 2.0 names-based infrastructure Atlas of Living Australia February 2008 Google Marine Biological Laboratory all species Smithsonian FISHBASE Harvard Field Museum Tree of Life E. O. Wilson aggregation / mashup EDIT ScratchPad widgets
MOBOT NHM AMNH NY Botanical Sloan Foundation GBIF llison l NameBank videos National Geographic any classification TDWG/BIS
Biodiversity Heritage Library
Mission: Provide Open Access to Biodiversity Literature
Goals:
Digitize the core published literature on biodiversity and put on the Web
Agree on approaches with the global taxonomic community, rights holders and others
BHL
Internet Archive Scribe: Boston
How big is the Biodiversity domain?
• Over 5.4 million books dating back to 1469
• 800,000 monographs
• 40,000 journal titles (12,500 current)
• 50% pre-1923
Classes of Texts
• Public Domain – pre 1923
• Non-profit society journals
• Post 1923 monographs/journals
–Monographs without © renewals
–Commercial journals with permission
Where are we?
• 5,000,000 scanned pages
• 13,000 volumes
• BHL Portal
• Sloan, MacArthur, Moore Foundation funding
• 49 full run journals
Carolus Linnaeus, “father of modern taxonomy”
“Who knoweth not the name
knoweth not the subject”
Linnaeus, 1737,
Critica Botanica n 210.
Royal Science Academy of Sweden, portrait
“All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.”
~ Grimaldi & Engel, 2005, Evolution of the Insects
• Information about named groups (taxa) of organisms (taxon-related information)
• Extends back at least 1000 years
• In books, journals, surveys, museum specimens, herbaria….
• In many languages, and distributed
From T.E. Glover, The Fishes of Southwestern Japan, c.1870
The challenge for contemporary DIGITAL libraries
Goal:
Use one name to find the content for all names
Names – the only universal metadata for Biology
Names offer a logical way to search for and index content
• Names annotate data objects
• All names annotate all data objects
• A compilation of all names ever used is the foundation of a universal index for biology or for a semantic web for biology
Who is affected by these problems?
Libraries
Publishers
Museums
Federal Agencies
Serious challenges in federated environments
One organism
4 scientific names
4 maps
We want one map
Reconciliation – linking alternative names for the same organism
A query initiated with any name can be expanded to all names and will unify the data associated with each
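The query expansion described here can be sketched as a lookup over reconciliation groups. This is a minimal illustration only, not the uBio or BHL implementation: the group contents, the record tuples, and the function names are all assumptions made for the example. The names used are alternative scientific names for the rainbow trout.

```python
# Hypothetical reconciliation groups: each set lists alternative names
# that refer to the same organism (here, the rainbow trout).
RECONCILIATION_GROUPS = [
    {"Oncorhynchus mykiss", "Salmo gairdneri", "Salmo mykiss"},
]

# Hypothetical records from federated sources, each filed under one name.
RECORDS = [
    ("Salmo gairdneri", "source A: distribution map"),
    ("Oncorhynchus mykiss", "source B: distribution map"),
    ("Salmo mykiss", "source C: distribution map"),
    ("Salmo trutta", "source A: distribution map"),
]


def build_name_lookup(groups):
    """Map every name to the full set of names in its group."""
    lookup = {}
    for group in groups:
        for name in group:
            lookup[name] = group
    return lookup


def expand_query(name, lookup):
    """Expand one name to all reconciled names (just itself if unknown)."""
    return lookup.get(name, {name})


def unified_search(name, records, lookup):
    """Return the records filed under any name reconciled with the query."""
    names = expand_query(name, lookup)
    return [item for filed_under, item in records if filed_under in names]


lookup = build_name_lookup(RECONCILIATION_GROUPS)
results = unified_search("Salmo gairdneri", RECORDS, lookup)
```

Starting from any one of the three names returns the same three maps, which is the "one map" goal in miniature; a name outside every group simply searches for itself.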
• All names & all Classifications ClassificationBank
• Alternative names reconciled
• Similar names disambiguated
• Exploit hierarchies to browse and search, build a comprehensive classification
• Improve performance with federated systems
• Read documents, web sites, and databases, and taxonomically index the content
• Create a unified portal to information about organisms on the internet
Taxonomic intelligence is the inclusion of taxonomic practices, skills and knowledge within informatics services to manage information about organisms
• Data from various sources may be merged
• Red dots on the map link back to the website that provided the geographical co-ordinates
Specimen distribution data from remote sources
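Merging remote records while keeping a link back to each provider might look like the sketch below. The `Occurrence` fields, the `example.org` URLs, and the coordinates are hypothetical placeholders, not data from any real source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    name: str        # scientific name as given by the source
    lat: float
    lon: float
    source_url: str  # website that provided the co-ordinates


def merge_occurrences(*sources):
    """Combine records from several sources, dropping exact duplicates
    while keeping the source URL on every remaining point."""
    seen = set()
    merged = []
    for records in sources:
        for rec in records:
            if rec not in seen:
                seen.add(rec)
                merged.append(rec)
    return merged


# Hypothetical sample data.
source_a = [Occurrence("Oncorhynchus mykiss", 41.5, -70.7, "https://example.org/a")]
source_b = [
    Occurrence("Oncorhynchus mykiss", 41.5, -70.7, "https://example.org/a"),  # duplicate
    Occurrence("Salmo gairdneri", 38.6, -90.2, "https://example.org/b"),
]

merged = merge_occurrences(source_a, source_b)
# Each map point carries the URL it should link back to.
map_points = [(r.lat, r.lon, r.source_url) for r in merged]
```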
Taxonomic Intelligence
• Lexicon of Scientific Names
• Reconciliation and Disambiguation
• Hierarchical Inclusion
• Integration into Information Retrieval
• Linkage to Other Data Types (e.g., Molecular, Morphological, Phenotype)
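Of the items above, hierarchical inclusion is the easiest to illustrate: a search on a higher taxon should also cover every taxon beneath it. The sketch below assumes a simple child-to-parent table; the table is a tiny hand-made fragment and the function names are hypothetical, not part of any service named in these slides.

```python
# Hypothetical fragment of a classification: child taxon -> parent taxon.
PARENT = {
    "Oncorhynchus": "Salmonidae",
    "Salmo": "Salmonidae",
    "Oncorhynchus mykiss": "Oncorhynchus",
    "Salmo trutta": "Salmo",
    "Salmo salar": "Salmo",
}


def children_index(parent_map):
    """Invert the child -> parent table into parent -> list of children."""
    index = {}
    for child, parent in parent_map.items():
        index.setdefault(parent, []).append(child)
    return index


def included_taxa(taxon, parent_map):
    """Return the taxon plus every taxon below it in the hierarchy."""
    index = children_index(parent_map)
    found, stack = [], [taxon]
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(index.get(current, []))
    return found


salmo_and_below = included_taxa("Salmo", PARENT)
```

A query for "Salmonidae" would expand the same way to both genera and all three species, so content indexed only under a species name is still found from the family.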
uBio
• 10.7 Million+ Name Strings
• Reconciliation Groups
• http://www.ubio.org
• FIND IT – scientific name recognition algorithm
• Training and improving algorithm
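The slides do not describe how FIND IT works, so the following is not that algorithm. It is a deliberately naive sketch of one way scientific name recognition could be approached: propose candidates that look like a Latin binomial, then keep only those present in a lexicon of known name strings. The lexicon contents, the pattern, and the function name are assumptions for illustration.

```python
import re

# Hypothetical lexicon of known name strings; a real one would hold millions.
LEXICON = {"Oncorhynchus mykiss", "Salmo trutta", "Sphenodon punctatus"}

# Candidate pattern: capitalized genus followed by a lowercase epithet.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+) ([a-z]{2,})\b")


def find_names(text, lexicon):
    """Return lexicon names found in text, in order of first appearance."""
    found = []
    for match in CANDIDATE.finditer(text):
        candidate = f"{match.group(1)} {match.group(2)}"
        if candidate in lexicon and candidate not in found:
            found.append(candidate)
    return found


text = ("The tuatara, Sphenodon punctatus, is unrelated to trout. "
        "Rainbow trout are Oncorhynchus mykiss.")
names = find_names(text, LEXICON)
```

The lexicon check is what rejects ordinary phrases such as "The tuatara" or "Rainbow trout" that happen to match the pattern. A sketch like this misses abbreviated genera, OCR errors, and names absent from the lexicon, which is presumably where training and improving a real algorithm comes in.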
uBioRSS Taxonomically Intelligent RSS Feed Aggregator
MBL WHOI Library – Woods Hole authors’ publications
MBL WHOI Library – Woods Hole species publications
Taxonomically intelligent scientific text parsing
• Search
• Browse
Acknowledgments
Catherine Norton, Patrick Leary, David Remsen, Diane Rielinger, David Patterson, Neil Sarkar, Gerald Weissmann
A.W. Mellon Foundation, Alfred P. Sloan Foundation
John D. & Catherine T. MacArthur Foundation, Internet Archive
Christopher Freeland, Tom Garnett, Martin Kalfatovic, Graham Higley, BHL & EOL Teams