TAIR funding and database sustainability Eva Huala...

Post on 14-Feb-2018


TAIR funding and database sustainability

Eva Huala, Phoenix Bioinformatics

What is TAIR?

•  Genome database for the model plant Arabidopsis thaliana, established in 1999
•  Focus on manual curation of gene function (~33,000 genes)
•  Was NSF-funded 1999-2013 ($1.1M/yr direct costs)
•  International usage (~53K unique visitors/month)

March 14, 2015

Example data sources

•  PubMed – published articles about Arabidopsis
•  Individual researchers – functional annotations
•  UniProt – functional annotations
•  NCBI – sequence data
•  Araport – new genome release
•  PANTHER – gene families

TAIR Usage in 2016

•  2.2 million sessions
•  14 million page views
•  53,800 unique visitors/month

Typical TAIR use cases

•  Plant biology researcher needs information on an unfamiliar gene
–  If Arabidopsis gene, search may use symbol or locus ID
–  If non-Arabidopsis gene, user locates closest Arabidopsis homologs first via BLAST search
•  Seed and DNA stock ordering

Top TAIR Pages - 2016

Labor- and cost-intensive activities

•  Extraction of gene function and other data from research articles
•  Software maintenance (bug fixes, upgrades)
•  Development of new interfaces or features
•  Integration of new datasets (e.g. Araport11)
•  Helpdesk support for researchers

TAIR Curation Statistics

                                              2014                  2015
Arabidopsis articles added                    4491                  4107
Subset with curatable gene information        2094 (47% of total)   2112 (51% of total)
Number of validated gene-article matches      6972                  10,144
Number of new gene symbols added              676                   596

Efforts to reduce curation costs

•  Can we get authors to do more curation of their own work?
–  Developed TOAST community curation tool and partnerships with journals
•  Can we decrease costs and increase community participation by recruiting (and paying) postdocs as external curators?
–  Experiment using 4 postdocs at UCB/PGEC
•  Can we do more to automate the curation process?
–  Adoption of Textpresso text mining methods for cellular component curation
–  Starting collaboration with Stanford DeepDive group

Reining in software development and maintenance costs

•  Best practices to avoid introducing new bugs
–  Continuous deployment, continuous integration, unit tests
•  Agile methods and user-driven development process for increased efficiency and better outcomes

Subscription funding model

•  Advantages
–  Stable revenue stream distributed over many institutions and countries
–  Those that value the resource asked to support it
–  Financial incentives are well aligned with our goal to make the data usable and discoverable
–  Support naturally scales up as usage and value to researchers grows
•  Disadvantages
–  Potential for researchers, students and others to lose access
–  Barriers to data sharing and reuse

Nonprofit approach can mitigate disadvantages

Problem:
•  Loss of access
–  Researchers in the relevant discipline who can’t afford to pay
–  Researchers from other disciplines, public (infrequent users)
–  Students

Solutions:
•  Emphasis on large subscriptions over small
–  countries, consortia provide access to large number of researchers
•  Affordable pricing with sliding scale according to usage level
•  Metered access (some free page views each month)
–  provides access for infrequent users
•  Free access for lowest income countries
•  Free accounts for students using the database in a course
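
The slides do not say how metered access is implemented. As a minimal sketch only, a per-visitor monthly counter could look like the following; the class name, the visitor identifier, and the quota of 5 free views are all hypothetical and not taken from TAIR.

```python
from collections import defaultdict
from datetime import date

FREE_VIEWS_PER_MONTH = 5  # hypothetical quota; the slide gives no number


class MeteredAccess:
    """Track page views per visitor per calendar month (illustrative only)."""

    def __init__(self, free_views=FREE_VIEWS_PER_MONTH):
        self.free_views = free_views
        # (visitor_id, year, month) -> number of free views served so far
        self._counts = defaultdict(int)

    def allow_view(self, visitor_id, today, is_subscriber=False):
        """Return True if the page may be shown, counting it against the quota."""
        if is_subscriber:
            return True  # subscribers are never metered
        key = (visitor_id, today.year, today.month)
        if self._counts[key] >= self.free_views:
            return False  # quota used up for this month
        self._counts[key] += 1
        return True


meter = MeteredAccess(free_views=2)
day = date(2016, 3, 1)
print([meter.allow_view("visitor-1", day) for _ in range(3)])
```

Keying the counter on the calendar month means an infrequent visitor regains free views at the start of each month without any explicit reset step.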

Nonprofit approach can mitigate disadvantages

Problem:
•  Barrier to data reuse
–  Researchers need to publish subsets of data in the course of their work
–  Other repositories can’t display, reuse and further enhance the data

Solutions:
•  Adopt academic license terms that allow researchers to publish and reuse limited excerpts
•  Freely release all datasets after one year in the repository
–  TAIR does this via quarterly data release files

Sources of TAIR subscription funds

•  Who pays for TAIR subscriptions?
–  University libraries and consortia (55%)
–  Countries (China, Switzerland) (27%)
–  Companies (16%)
–  Individual researchers (2%)
•  How are they charged?
–  Invoiced for annual or multi-year subscription (similar to journal subscription)

TAIR - Monthly Subscription Revenue 1/2014-12/2016

[Chart: monthly subscription revenue, January 2014 through December 2016. Series: Actual, Projected. Y-axis: $0 to $100,000.]

TAIR Monthly Subscription Revenue 1/2014-7/2016

[Chart: monthly subscription revenue, January 2014 through July 2016. Y-axis: $0 to $100,000.]

Annual Revenue:

2014          2015          2016
$627,000      $919,000      $1,035,000
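
As a worked illustration of the trend in these annual figures, the sketch below computes year-over-year growth from the three totals stated on the slide. The function name is hypothetical, and the calculation takes the slide's numbers at face value.

```python
# Annual subscription revenue in USD, as stated on the slide.
annual_revenue = {2014: 627_000, 2015: 919_000, 2016: 1_035_000}


def year_over_year_growth(revenue_by_year):
    """Return {year: fractional growth relative to the previous year}."""
    years = sorted(revenue_by_year)
    return {
        curr: (revenue_by_year[curr] - revenue_by_year[prev]) / revenue_by_year[prev]
        for prev, curr in zip(years, years[1:])
    }


for year, growth in year_over_year_growth(annual_revenue).items():
    print(f"{year}: {growth:+.1%}")
# prints:
# 2015: +46.6%
# 2016: +12.6%
```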

New Funding Paradigm

•  Grant-based funding for initial database development
•  Subscription funding to support operations of mature database
–  Must be affordable, many subscription options
–  Access for a range of users must be considered (domain researchers, researchers from other fields, students)
–  Freely release data after minimal period required to provide subscription incentive
•  Additional grant-based funding to develop major new features