Integrating data with phylogenies, at scale

Date post: 16-Apr-2017
Upload: hilmar-lapp
Integrating data with phylogenies, at scale. Nico Cellinese, University of Florida & Hilmar Lapp, Duke University
Transcript
Page 1: Integrating data with phylogenies, at scale

Integrating data with phylogenies, at scale

Nico Cellinese, University of Florida

& Hilmar Lapp, Duke University

Page 2: Integrating data with phylogenies, at scale

WHAT'S IN A NAME?

Page 3: Integrating data with phylogenies, at scale

What's in a name?

Chaos!
• Names and concepts do not reconcile that easily
• Names are text strings
• Context is lacking or subjective
• Meaning is not computable

Page 4: Integrating data with phylogenies, at scale

Linnean names point to concepts

Antoine Laurent de Jussieu, Genera Plantarum, 1789

Page 5: Integrating data with phylogenies, at scale

Linnean names point to concepts

Antoine Laurent de Jussieu, Genera Plantarum, 1789

Page 6: Integrating data with phylogenies, at scale

Linnean names point to concepts

Antoine Laurent de Jussieu, Genera Plantarum, 1789

I don't understand any of those concepts, whether in Latin or English, but I can still link them to their names, as in one object to one object

Page 7: Integrating data with phylogenies, at scale

Linnean names point to concepts

Antoine Laurent de Jussieu, Genera Plantarum, 1789

… and 200+

… and 400+

Page 8: Integrating data with phylogenies, at scale

Idiosyncratic Russian dolls syndrome

Page 9: Integrating data with phylogenies, at scale

Idiosyncratic Russian dolls syndrome

Page 10: Integrating data with phylogenies, at scale

Idiosyncratic Russian dolls syndrome

Page 11: Integrating data with phylogenies, at scale

Idiosyncratic Russian dolls syndrome

Page 12: Integrating data with phylogenies, at scale

Idiosyncratic Russian dolls syndrome

Page 13: Integrating data with phylogenies, at scale

Idiosyncratic Russian dolls syndrome

Page 14: Integrating data with phylogenies, at scale

Idiosyncratic Russian dolls syndrome

Page 15: Integrating data with phylogenies, at scale

From a human perspective, we lose track of concepts. Hard to reconcile all of them. We need help! Can we compute them?

Idiosyncratic Russian dolls syndrome

Page 16: Integrating data with phylogenies, at scale

Linnean names point to concepts

Antoine Laurent de Jussieu, Genera Plantarum, 1789

… and 200+

… and 400+

Page 17: Integrating data with phylogenies, at scale
Page 18: Integrating data with phylogenies, at scale

• We can unclutter concepts, and thereby nomenclature

• How do we navigate along the Tree of Life repurposing Linnean names, which are linked to traditional concepts?

Page 19: Integrating data with phylogenies, at scale

Dark taxa!

Page 20: Integrating data with phylogenies, at scale

Dark taxa!

How do we integrate data with this tree?

Page 21: Integrating data with phylogenies, at scale

Tree-thinking
Common descent → evolution at the center of taxonomy

[Figure: tree with tips A, B, C, D, annotated with branches, synapomorphies, and clades = taxa]

Discovery

Page 22: Integrating data with phylogenies, at scale

Tree-thinking
Common descent → evolution at the center of taxonomy

Discovery

Communication: How??

[Figure: density plot; x-axis: diversification rate, y-axis: density]

Page 23: Integrating data with phylogenies, at scale

Tree-thinking

Berberidopsidaceae, Opiliones, Zingiberaceae, Hamamelidaceae, Sarcolaenaceae, Lingulidae, Hymenoptera, Mammalia, Apocynaceae, Galliformes, Rubiaceae, Anarthriaceae, Lineidae, Crocodylidae, Stylosiphonia, Andrenidae, Cracidae, Gavialis, Globba, Micrella, Rhodoleia, Phalangiidae, Tachyglossa, Lyginia, Mediusella, Chamaeclitandra

Page 24: Integrating data with phylogenies, at scale

Tree-thinking

Berberidopsidaceae, Opiliones, Zingiberaceae, Hamamelidaceae, Sarcolaenaceae, Lingulidae, Hymenoptera, Mammalia, Apocynaceae, Galliformes, Rubiaceae, Anarthriaceae, Lineidae, Crocodylidae, Stylosiphonia, Andrenidae, Cracidae, Gavialis, Globba, Micrella, Rhodoleia, Phalangiidae, Tachyglossa, Lyginia, Mediusella, Chamaeclitandra

These names are not generated in an evolutionary-based framework (groups defined by character similarity vs. common descent)

Page 25: Integrating data with phylogenies, at scale

Both the Encyclopedia of Life (EOL) and the Open Tree of Life suggest that Campanuloideae is a misspelling of Campaniloidea (marine gastropods!). GBIF does not currently have Campanuloideae in its backbone taxonomy.

Page 26: Integrating data with phylogenies, at scale

Are you kidding me?

These are the Campanuloideae!

Wang et al. 2014

Page 27: Integrating data with phylogenies, at scale

Life as a street map
How to navigate life as a machine

Page 28: Integrating data with phylogenies, at scale

Mapping data to phylogenetic knowledge space

Page 29: Integrating data with phylogenies, at scale
Page 30: Integrating data with phylogenies, at scale

Street signs serve people, not machines

Page 31: Integrating data with phylogenies, at scale

Mapping data to phylogenetic knowledge space

• How do we build a reliable GPS for phylogenies?
• How do we reproducibly find the right nodes?

Page 32: Integrating data with phylogenies, at scale

FEED

Textual Definition –

The hyoglossus is a muscle that attaches to the hyoid and tongue and is innervated by Cranial Nerve XII.

Computable Definition –

('attached to' some 'hyoid bone') and ('attached to' some tongue) and ('innervated by' some 'hypoglossal nerve') and spatially disjoint with 'intrinsic tongue muscle'

Druzinsky et al. (2015): Logic definitions of mammalian feeding muscles by means of necessary and sufficient conditions true for all mammals

Nomenclature ≠ Semantics
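
As a rough illustration of why a computable definition differs from a name, the hyoglossus definition above can be sketched as a set of conditions that a machine checks against asserted facts. This is a simplified, closed-world stand-in for OWL reasoning, not how an OWL reasoner actually works; the encoding as (relation, target) pairs and the two example muscles are assumptions made for this sketch.

```python
# Hypothetical, simplified encoding of the slide's computable definition.
# Each structure is described by (relation, target) pairs; the targets are
# plain strings rather than real ontology identifiers.
HYOGLOSSUS_DEFINITION = {
    ("attached to", "hyoid bone"),
    ("attached to", "tongue"),
    ("innervated by", "hypoglossal nerve"),
    ("spatially disjoint with", "intrinsic tongue muscle"),
}


def satisfies(definition, asserted_facts):
    """Return True if every condition in the definition is among the facts."""
    return definition <= asserted_facts


# Two made-up muscles, described only by their relations, not by a name.
muscle_a = {
    ("attached to", "hyoid bone"),
    ("attached to", "tongue"),
    ("innervated by", "hypoglossal nerve"),
    ("spatially disjoint with", "intrinsic tongue muscle"),
    ("part of", "head"),
}
muscle_b = {
    ("attached to", "mandible"),
    ("attached to", "tongue"),
    ("innervated by", "hypoglossal nerve"),
}

print(satisfies(HYOGLOSSUS_DEFINITION, muscle_a))  # True
print(satisfies(HYOGLOSSUS_DEFINITION, muscle_b))  # False
```

Neither muscle carries a label, yet the first is classified as a hyoglossus purely from its properties, which is the sense in which nomenclature and semantics come apart.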

Page 33: Integrating data with phylogenies, at scale

Phyloreference = logic definition of a clade, using the property common to all of life

Page 34: Integrating data with phylogenies, at scale

Phyloreferences
Statements formally expressing the patterns we discover (analogous to map coordinates)

[Figure: one tree with tips A, B, C for each definition type, with X marked on the apomorphy-based tree]

• Node-based: the clade originating with the last common ancestor of B and C.
• Branch-based: the clade originating with the first ancestor of B that is not an ancestor of A.
• Apomorphy-based: the clade originating with the first ancestor of C to evolve X.
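
The three definition types can be sketched as small lookups on a rooted tree. The topology (A,(B,C)), the internal node names, and the assumption that we already know which nodes carry trait X are all illustrative choices, not taken from the slides.

```python
# Toy rooted tree as a child -> parent map. The topology (A,(B,C)) and the
# internal node names "root" and "n1" are assumptions for illustration.
PARENT = {"A": "root", "n1": "root", "B": "n1", "C": "n1"}


def ancestors(node):
    """Return the path from `node` up to the root, starting with `node`."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def node_based(x, y):
    """Last common ancestor of x and y."""
    on_path_of_y = set(ancestors(y))
    return next(n for n in ancestors(x) if n in on_path_of_y)


def branch_based(included, excluded):
    """Oldest ancestor of `included` that is not an ancestor of `excluded`."""
    excluded_path = set(ancestors(excluded))
    candidates = [n for n in ancestors(included) if n not in excluded_path]
    return candidates[-1]


def apomorphy_based(tip, nodes_with_trait):
    """Oldest node on the unbroken run of trait-bearing ancestors of `tip`."""
    origin = None
    for n in ancestors(tip):
        if n not in nodes_with_trait:
            break
        origin = n
    return origin


print(node_based("B", "C"))         # n1
print(branch_based("B", "A"))       # n1
print(apomorphy_based("C", {"C"}))  # C
```

On this particular tree the node-based and branch-based references happen to resolve to the same node; on a tree with more taxa sampled between A and n1 they could diverge, which is why the definition type matters.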

Page 35: Integrating data with phylogenies, at scale

Phyloreferences yield a coordinate system for the Tree of Life

• Any node, branch, subtree is referenceable
• References are unambiguous
• References are computable
• References are portable
• Adapts to new and changing knowledge

Page 36: Integrating data with phylogenies, at scale

Many needed technologies already exist

• OWL ontologies designed for
– Phylogenetic knowledge: CDAO
– Phenotypic knowledge: Uberon, PATO, …
• Efficient and expressive reasoners: FaCT++, HermiT, Racer, ELK

Page 37: Integrating data with phylogenies, at scale

[Figure: phylogeny with numbered nodes; tip labels: Campanula_rotundifolia, Pseudonemacladus_oppositifolius, Lobelia_cardinalis, Campanula_latifolia, Cyphocarpus_rigescens, Wahlenbergia_linifolia, Nemacladus_ramosissmus, Lobelia_coronopifolia, Cyphia_elata, Pentaphragma, Crysanthemum, Sphenoclea, Platycodon_grandiflorus, Cyphia_bulbosa; clade labels: Campanula, Lobelia, Cyphia]

Class: Campanulaceae_1889_to_1980
EquivalentTo: cdao:has_Descendant value taxon:Campanula_latifolia and phyloref:excludes_lineage value taxon:Crysanthemum

Page 38: Integrating data with phylogenies, at scale

[Figure: phylogeny with numbered nodes; tip labels: Campanula_rotundifolia, Pseudonemacladus_oppositifolius, Lobelia_cardinalis, Campanula_latifolia, Cyphocarpus_rigescens, Wahlenbergia_linifolia, Nemacladus_ramosissmus, Lobelia_coronopifolia, Cyphia_elata, Pentaphragma, Crysanthemum, Sphenoclea, Platycodon_grandiflorus, Cyphia_bulbosa; clade labels: Campanula, Lobelia, Cyphia]

Class: Campanulaceae_1980
EquivalentTo: cdao:has_Descendant value taxon:Campanula_latifolia and phyloref:excludes_lineage value taxon:Lobelia

Page 39: Integrating data with phylogenies, at scale

[Figure: phylogeny with numbered nodes; tip labels: Campanula_rotundifolia, Pseudonemacladus_oppositifolius, Lobelia_cardinalis, Campanula_latifolia, Cyphocarpus_rigescens, Wahlenbergia_linifolia, Nemacladus_ramosissmus, Lobelia_coronopifolia, Cyphia_elata, Pentaphragma, Crysanthemum, Sphenoclea, Platycodon_grandiflorus, Cyphia_bulbosa; clade labels: Campanula, Lobelia, Cyphia]

Class: Campanulaceae_after_1995
EquivalentTo: cdao:has_Descendant value taxon:Campanula_latifolia and phyloref:excludes_lineage value taxon:Sphenoclea
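
To make the effect of the three class expressions concrete, the sketch below evaluates "has this descendant, excludes that lineage" on a small tree. The ladder-shaped topology is invented for illustration and is NOT the tree shown on the slides, nor a claim about real relationships. Reading the expression as "the included taxon is a descendant and the excluded taxon is not" is likewise an assumption about what the two properties mean.

```python
# Hypothetical ladder-shaped tree as a child -> parent map. This topology is
# invented for illustration; it is NOT the tree shown on the slides.
PARENT = {
    "Crysanthemum": "n0", "n1": "n0",
    "Sphenoclea": "n1", "n2": "n1",
    "Lobelia": "n2", "n3": "n2",
    "Campanula_latifolia": "n3", "Campanula_rotundifolia": "n3",
}


def descendants(node):
    """All nodes strictly below `node`."""
    below = set()
    for child, parent in PARENT.items():
        if parent == node:
            below.add(child)
            below |= descendants(child)
    return below


def matching_nodes(included, excluded):
    """Nodes that have `included` as a descendant but not `excluded`."""
    nodes = set(PARENT) | set(PARENT.values())
    return {
        n for n in nodes
        if included in descendants(n) and excluded not in descendants(n)
    }


def clade_root(included, excluded):
    """Most inclusive matching node, or None if nothing matches."""
    hits = matching_nodes(included, excluded)
    return max(hits, key=lambda n: len(descendants(n)), default=None)


for excluded in ("Crysanthemum", "Sphenoclea", "Lobelia"):
    hits = sorted(matching_nodes("Campanula_latifolia", excluded))
    print(excluded, hits, clade_root("Campanula_latifolia", excluded))
```

On this made-up tree the three exclusions resolve to n1, n2, and n3 respectively, so the same included taxon yields three differently sized clades depending on which lineage is excluded.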

Page 40: Integrating data with phylogenies, at scale

Phyloreferences as ontological expressions

Phyloreference expressions can be:
• Easily generated by anyone
• Applied to any tree
• Named and registered
– To promote reuse and consistency
– To improve usability and accessibility

Class: Campanulaceae
Annotations: rdfs:label "Campanulaceae_after_1995", dc:description "the clade that includes Campanula latifolia but not Sphenoclea"
EquivalentTo: cdao:has_Descendant value taxon:Campanula_latifolia and phyloref:excludes_lineage value taxon:Sphenoclea

vs.

Class: AGF4-SHRU-3560
EquivalentTo: cdao:has_Descendant value taxon:Campanula_latifolia and phyloref:excludes_lineage value taxon:Sphenoclea
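
Since the expressions follow a fixed pattern, generating them can be sketched as plain string templating. The function name and parameters below are hypothetical, and the output simply mirrors the layout used on the slides; it has not been validated against an OWL parser.

```python
def phyloref_class(name, included, excluded, label=None, description=None):
    """Build a branch-based phyloreference in the slides' textual layout.

    `name`, `included`, and `excluded` are inserted verbatim, so callers are
    responsible for passing valid identifiers.
    """
    lines = [f"Class: {name}"]
    annotations = []
    if label:
        annotations.append(f'rdfs:label "{label}"')
    if description:
        annotations.append(f'dc:description "{description}"')
    if annotations:
        lines.append("Annotations: " + ", ".join(annotations))
    lines.append(
        "EquivalentTo: "
        f"cdao:has_Descendant value taxon:{included} "
        f"and phyloref:excludes_lineage value taxon:{excluded}"
    )
    return "\n".join(lines)


print(
    phyloref_class(
        "Campanulaceae",
        included="Campanula_latifolia",
        excluded="Sphenoclea",
        label="Campanulaceae_after_1995",
    )
)
```

The same call with an opaque registered identifier as `name` and no label would produce the second, unannotated form, which is the contrast the slide draws.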

Page 41: Integrating data with phylogenies, at scale

Challenges

• OWL-based data model to satisfy phylogenetic taxonomy, reasoning expressivity, scalability

• Conventions for data transformation, and consequences of different choices

• Least common ancestor reasoning for OWL data

• Lack of canonical specimen identifier system

• Specifier mapping ontologies

Page 42: Integrating data with phylogenies, at scale

Tree of Life, ontologized: A universal coordinate system

• The Tree of Life is itself an aggregation and integration of our phylogenetic knowledge.

• Phyloreferencing is addressing into a knowledge universe.

• Ontologies, reasoning, and other KR techniques are powerful tools for this.

Page 43: Integrating data with phylogenies, at scale

Acknowledgements

• National Science Foundation (DBI-1458484)
• Ken and Linda McGurn
• Phenoscape
• EvoIO

