Date post: | 29-Jun-2015 |
Category: | Technology |
Upload: | katja-c-seltmann |
Building the Hymenoptera Anatomy Ontology through exploration of the Journal of Hymenoptera Research
Katja Seltmann, Matthew Bertone, Matthew J. Yoder, István Mikó, Elizabeth Macleod, Andrew Ernst, Andrew R. Deans
Volumes: 1-16
Years: 1992-2007
The opportunity…
1. Database (infrastructure)
2. Terms used in Hymenoptera morphology
3. JHR volumes 1-16 are online and processed using optical character recognition (OCR) software through the Biodiversity Heritage Library.
We were wondering…
1. Can we find new terms for the HAO by text extraction?
2. Can we look for patterns in how we, as a community, use terminology? Is it really true that terminology follows phylogeny?
How we captured terms from the Journal of Hymenoptera Research…
1. Download articles from the Biodiversity Heritage Library (http://www.biodiversitylibrary.org)
2. Put the text in a database (MX)
3. Match the article text to the words we know are terms (also cataloged in the same database)
4. Add new terms based on what is NOT matched
5. People made decisions
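The matching step above can be sketched roughly as follows. This is a minimal illustration, not the MX implementation: the tokenizer, the function names, and the sample sentence are all assumptions made for the example, and real OCR text would need more cleanup (hyphenation, multi-word terms, plural forms).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_matches(article_text, known_terms):
    """Return (matched, unmatched) token counts for one article.

    matched   -> tokens already cataloged as terms
    unmatched -> everything else; candidates for a person to review
    """
    matched, unmatched = Counter(), Counter()
    for token in tokenize(article_text):
        if token in known_terms:
            matched[token] += 1
        else:
            unmatched[token] += 1
    return matched, unmatched


# Hypothetical inputs, for illustration only.
known_terms = {"carina", "propodeum", "clypeus"}
text = "Propodeum with a median carina; notaulus absent. Carina complete."

matched, unmatched = split_matches(text, known_terms)
```

Here `unmatched` still contains ordinary English words ("with", "absent") alongside real candidates such as "notaulus", which is why the final step is a human decision rather than an automatic import.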
From 353 articles:
2121 morphological terms
643 qualitative terms
2065 terms from JHR are not defined as concepts. They are floating without definition!
As of June 1, 2010…
carina (3638, 160) wing (3297, 194) setae (3294, 171) vein (2891, 141) cell (2855, 202) seta (2545, 55) eye (2438, 186) segment (2415, 159) tergum (2381, 137) hind (2209, 172) larva (1751, 113) propodeum (1617, 184)
tooth (1604, 110) punctures (1490, 96) clypeus (1482, 175) segments (1422, 159) flagellomere (1392, 91) tergite (1371, 87) mandible (1369, 143) antenna (1365, 164) body (1359, 244) region (1289, 214) tibia (1261, 129) leg (1244, 101) ovipositor (1230, 127) ocellus (1218, 116)
larvae (1214, 161) scutellum (1201, 159) line (1166, 147) lobe (1160, 133) mesosoma (1137, 159) longitudinal (1131, 161) scape (1127, 133) legs (1072, 202) carinae (1014, 118) pronotum (1011, 162) terga (1002, 122) forewing (988, 132) antennal (966, 168) metasoma (960, 168)
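The slides do not say what the two numbers after each term are. Assuming they are (total occurrences, number of articles containing the term), a tally of that kind could be sketched as below. The function name and the toy token lists are hypothetical and only illustrate the counting.

```python
from collections import Counter


def term_stats(articles, terms):
    """Map each term to (total occurrences, number of articles with it).

    articles: list of token lists, one per article
    terms:    set of terms to count
    """
    totals = Counter()          # occurrences summed over all articles
    article_counts = Counter()  # number of articles mentioning the term
    for tokens in articles:
        counts = Counter(t for t in tokens if t in terms)
        totals.update(counts)
        article_counts.update(counts.keys())
    return {t: (totals[t], article_counts[t]) for t in terms if totals[t]}


# Toy data, not taken from JHR.
articles = [
    ["carina", "wing", "carina"],
    ["wing", "vein"],
    ["carina"],
]
stats = term_stats(articles, {"carina", "wing", "vein", "seta"})

# Rank by total occurrences, most frequent first.
ranked = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)
```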
Qualifying terms: spatial, adjectives, comparative
posterior (2694, 216) dorsal (2654, 216) anterior (2475, 221) slightly (2247, 227) small (2048, 284) short (1930, 249) apex (1894, 192) smooth (1817, 174) large (1629, 266) distinct (1487, 201)
transverse (1486, 173) similar (1476, 276) base (1471, 200) broad (1394, 178) half (1357, 207) separated (1217, 182) single (1097, 243) rounded (1037, 158) dorsally (1017, 146) nearly (990, 185) shiny (980, 83)
inner (950, 158) shorter (938, 177) few (874, 239) elongate (859, 147) lower (834, 188)
Look at the data a different way…
1. Terminals are taxa discussed in articles
• Use only articles that have the words “description of” in the title
• Holes: Ichneumonoidea (49), Chalcidoidea (38), Vespoidea (36), Apoidea (36), Symphyta (9), Cynipoidea (7), Chrysidoidea (4), Stephanidae (1), Mymarommatidae (1)
2. Characters are the presence or absence of a term
• Use only terms that occurred in more than one article
3. Created a matrix excluding spatial and qualifying words
• (1162 terms, 181 terminals)
4. TNT analysis
• xmult /level 7 replications 5 hits 5
• nelsen
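Steps 1 to 3 above amount to building a presence/absence matrix. The sketch below shows one way to do that, assuming one terminal per article. The function name, the taxon labels, and the tiny term sets are hypothetical, and the output is a plain row of 0/1 characters per terminal rather than a finished TNT input file.

```python
def build_matrix(terminal_terms, excluded=frozenset(), min_articles=2):
    """Build a presence/absence matrix of terms across terminals.

    terminal_terms: dict mapping terminal name -> set of terms found
                    in that terminal's article
    excluded:       spatial and qualifying words to leave out
    min_articles:   keep only terms found in at least this many articles
    Returns (characters, rows): the ordered term list and, for each
    terminal, a string of '1' (present) and '0' (absent).
    """
    # Count in how many articles each term occurs.
    article_counts = {}
    for terms in terminal_terms.values():
        for term in terms:
            article_counts[term] = article_counts.get(term, 0) + 1

    characters = sorted(
        term
        for term, n in article_counts.items()
        if n >= min_articles and term not in excluded
    )
    rows = {
        name: "".join("1" if c in terms else "0" for c in characters)
        for name, terms in terminal_terms.items()
    }
    return characters, rows


# Hypothetical terminals and terms, for illustration only.
terminal_terms = {
    "Taxon_A": {"carina", "petiole", "dorsal"},
    "Taxon_B": {"carina", "notaulus", "dorsal"},
    "Taxon_C": {"petiole", "notaulus", "areolet"},
}
characters, rows = build_matrix(terminal_terms, excluded={"dorsal"})
```

In this toy case "areolet" is dropped because it occurs in only one article, and "dorsal" is dropped as a spatial word, leaving three characters.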
http://tiny.cc/p0aan
Petiole: http://tiny.cc/p0aan
What does this mean to ISH…
1. Next session addresses this…moving to open access journal
2. There are things we can do in our publications (in the form of annotations) that make data synthesis easier and reduce the need to repeat work.
Acknowledgments
Funding: Advances in Biological Informatics (NSF DBI-0850223); NESCent (NSF EF-0423641); Morphbank (NSF DBI-0446224); HymAToL (NSF EF-0337220); PEET: Monographic research on parasitic Hymenoptera (NSF DEB-0328922)
Intellect and enthusiasm: Biodiversity Heritage Library; Rick Prelinger; International Society of Hymenopterists; NESCent; other ontology projects; Deans Lab (Barb Sharanowski, Trish Mullins, Bob Blinn, Rinchhuanawma, Lydia Abernethy)
http://tiny.cc/p0aan [email protected]