Date post: | 26-Dec-2015 |
The iPlant Collaborative: Community Cyberinfrastructure for Life Science
Roger Barthelson / Uwe Hilgert, iPlant / University of Arizona
The iPlant Collaborative: Vision
www.iPlantCollaborative.org
Enable life science researchers and educators to use and extend cyberinfrastructure
Community-driven organization builds cyberinfrastructure for biological sciences
The iPlant Collaborative: Vision
UA, TACC, CSHL
iPlant Collaborative: A virtual organization
Biological Cyberinfrastructure: Big Data in Biology
Human Genome: $2.7 Billion, 13 Years
Human Genome: $900, 6 Hours
2014: Oxford Nanopore MinION
2003: ABI 3730 Sequencer
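The scale of the change on this slide can be worked out directly. The sketch below only uses the two figures quoted above ($2.7 billion and 13 years versus $900 and 6 hours); the conversion of a year to 365.25 days and the function names are assumptions made for illustration.

```python
# Figures quoted on the slide.
COST_THEN_USD = 2.7e9      # $2.7 billion
COST_NOW_USD = 900.0       # $900
YEARS_THEN = 13            # 13 years
HOURS_NOW = 6              # 6 hours

# Assumption: one year is counted as 365.25 days.
HOURS_PER_YEAR = 365.25 * 24


def cost_fold_reduction():
    """How many times cheaper the newer genome is."""
    return COST_THEN_USD / COST_NOW_USD


def time_fold_reduction():
    """How many times faster the newer genome is."""
    return (YEARS_THEN * HOURS_PER_YEAR) / HOURS_NOW


if __name__ == "__main__":
    print(f"Cost: about {cost_fold_reduction():,.0f}x lower")
    print(f"Time: about {time_fold_reduction():,.0f}x shorter")
```

Under these assumptions the cost drops by a factor of about three million and the time by a factor of roughly nineteen thousand.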
The Egalitarian Genome: Next Generation Sequencing 2014
“BGI, based in China, is the world’s largest genomics research institute, with 167 DNA sequencers producing the equivalent of 2,000 human genomes a day.
BGI churns out so much data that it often cannot transmit its results to clients or collaborators over the Internet or other communications lines because that would take weeks. Instead, it sends computer disks containing the data, via FedEx.”
The Big Data Problem: Storage and Analysis
Biological Cyberinfrastructure: The Problem of Big Data in Biology
Biology’s Other Big Data
Phenomics
Visualization
How iPlant CI Enables Discovery
Challenge: Create an easy-to-use platform powerful enough to handle data-intensive biology
Many bioinformatics tools are “off limits” to those without specialized computational backgrounds (“command line”).
• Data Store
• Discovery Environment – 100s of tools/apps
• Atmosphere – Cloud Computing
• Bisque – Image Analysis Environment
• APIs
iPlant APIs: Resources
The Biology App Store
The iPlant Collaborative: What is cyberinfrastructure?
• Manage Data
• Share Data
• Analyze Data
Scalable, accessible computation: data storage, cloud services, and software tools
• Utilize Big Data Tech
• Facilitate Collaborations
• Connect Resources
• Manage Access
Enable science (verifiable, reproducible, tractable)
The iPlant Collaborative: What iPlant offers
• Data Management & Storage Resources
• Access to High Performance Computing Resources
• Tool Integration System
• Application Programming Interfaces (APIs)
• Cloud Computing
• Genotype To Phenotype Science Enablement
• Tree of Life Science Enablement
• Image Analysis Platform
• Support for Molecular Breeding Platform (IBP)
• Support for AgMIP
• Others to come...
How iPlant CI Enables Discovery
Challenge: Navigate biology’s “data deluge”
• HT image data – GBs per day
• HT sequence data – TBs per run
[Diagram: iPlant Data Store, replicated between Arizona and Texas; linked to grid computing, cloud computing, HPC, community, and supercomputing resources; accessed through iDrop, WebDAV, Foundation API, DE, and i-commands]
iPlant Data Store: Scalable, Reliable, Redundant, High-Performance, Connected
How iPlant CI Enables Discovery
Solution: iPlant Data Store
All data within the same platform: speed and accessibility
• Access your data from multiple iPlant services
• Automatic, redundant data backup between the University of Arizona and the University of Texas
• Multiple ways to share data with collaborators
• Multi-threaded high speed transfers
• Default 100 GB allocation. >1 TB allocations available with justification
Getting 1 GB onto my computer takes...
Source               Time (s)
CD                   320
Berkeley Server      150
External Drive       36*
USB2.0 Flash         30
iPlant Data Store    18*
My Computer          15
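The times in the table can be turned into average transfer rates. The following is a minimal sketch, assuming 1 GB is counted as 1000 MB; the dictionary and function names are illustrative and not part of any iPlant tool.

```python
# Seconds needed to move 1 GB, copied from the table on the slide.
TRANSFER_SECONDS = {
    "CD": 320,
    "Berkeley Server": 150,
    "External Drive": 36,
    "USB2.0 Flash": 30,
    "iPlant Data Store": 18,
    "My Computer": 15,
}


def throughput_mb_per_s(seconds, size_mb=1000.0):
    """Average rate in MB/s for a transfer of size_mb megabytes."""
    return size_mb / seconds


def throughput_table(times=TRANSFER_SECONDS):
    """Map each source to its average rate in MB/s."""
    return {source: throughput_mb_per_s(s) for source, s in times.items()}


if __name__ == "__main__":
    for source, rate in throughput_table().items():
        print(f"{source:<18} {rate:6.1f} MB/s")
```

Read this way, the Data Store figure of 18 s corresponds to roughly 56 MB/s, close to the local-disk figure and far above the remote-server figure.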
How iPlant CI Enables Discovery
What iPlant data solutions mean for a bovine breeder
“It's kind of like being in that COPD commercial where the weight is lifted off your chest, only with iPlant, we have access to more computational power, so we can get to projects much faster and we can do big projects that our machines may not have allowed us to do previously!
The ability to transport 2TB of data overnight using the iRODS system was particularly helpful because previously, we had been mailing hard drives which is not an optimal solution to sharing big data.”
James Koltes, Iowa State
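To put the quote in numbers, the sketch below estimates the sustained rate needed to move 2 TB overnight. "Overnight" is not defined in the quote, so the 12-hour window is an assumption, as are the decimal units (1 TB = 10^12 bytes) and the function name.

```python
def required_rate(size_tb, hours):
    """Return (MB/s, Mbit/s) needed to move size_tb terabytes in the given hours.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes.
    """
    size_bytes = size_tb * 1e12
    seconds = hours * 3600
    mb_per_s = size_bytes / seconds / 1e6
    mbit_per_s = mb_per_s * 8
    return mb_per_s, mbit_per_s


if __name__ == "__main__":
    # Assumption: "overnight" is taken to mean a 12-hour window.
    mb, mbit = required_rate(size_tb=2, hours=12)
    print(f"{mb:.1f} MB/s sustained ({mbit:.0f} Mbit/s)")
```

With a 12-hour window this comes to about 46 MB/s sustained; a shorter window raises the required rate proportionally.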
How iPlant CI Enables Discovery
Solution: Discovery Environment
An extensible platform for science
• High-powered computing
• Data sharing/collaboration
• Easy to use interface
• Virtually limitless apps
• Analysis history (provenance)
iPlant’s Discovery Environment: Web Interface for Hundreds of Applications
(Some) Apps in Discovery Environment (for sequencing studies)
• Sequence Quality Control
– FastQC
– Fastx Toolkit
– Sabre, Scythe, Sickle (paired end trimming)
– SGA cleanup (paired end quality trimming)
– Coming soon…
  Sequence induction, assessment, and trimming pipeline
  Mira contaminant detection and removal
(Some) Apps in Discovery Environment (for sequencing studies)
• Genome Assembly
– ABySS
– SOAPdenovo2
– Velvet
– Newbler
– Contig analysis tools (with or without reference sequence for comparison)
– Coming soon…
  Minimus2
  Mira
  PacBioToCA or PBJelly?
(Some) Apps in Discovery Environment (for sequencing studies)
• Transcript assembly/RNASeq
– TopHat, Cufflinks, Cuffmerge, Cuffdiff
– Oases
– Trinity
– Newbler
– Scarf
– Coming soon…
  Open pipeline for transcript expression analysis (quantitative RNASeq)
  Mira transcriptome assembly
How iPlant CI Enables Discovery
What the Discovery Environment means to bench biologists
“In one week I was able to align my RNA-Seq samples using a method that previously took me a month on my bioinformatics computers…
Being able to access my data any time and from anywhere – priceless.
The DE interface is intuitive and easy to use...[and] will allow greater continuity and comparability between different experiments from different laboratories.”
Richard Barker – Univ. Wisconsin, Madison
How iPlant CI Enables Discovery
Challenge: Collaborate and access software on demand
Frustrated bioinformaticians serving the needs of several users
+ works well / powerful
- expensive / complex
Cartoon: http://phdhumor.blogspot.com/2008/12/on-lazy-day-for-bioinformatician.html
How iPlant CI Enables Discovery
iPlant Solution: Atmosphere
On-demand computing resource built on a cloud infrastructure
• Virtual machines pre-configured with software, memory requirements, and processing power
• iPlant authentication, storage, and HPC capabilities
• Build custom images/appliances and share with community
• Cross-platform desktop access to GUI applications in the cloud (using VNC)
Atmosphere: Your Cloud, Your Way
Google Cloud
Atmosphere
Atmosphere: Select a Machine Image, Launch
How iPlant CI Enables Discovery
What Atmosphere means to bioinformaticians
“What my users used to call me for, they now do on their own through Atmosphere. Now I can scale up my user community”
Nathan Miller, Univ. Wisconsin, Madison
• BLAST 400k transcripts against NCBI nr in 36 h vs. 2 months
• Use iPlant Data Store to move 1500 high-res images per day for analysis
“iPlant is a great equalizer.” Mike Covington, UC Davis
The iPlant Collaborative: Your colleagues
Leadership Team:
Steve Goff – UA, Dan Stanzione – TACC, Matt Vaughn – TACC, Nirav Merchant – UA, Eric Lyons – UA, Doreen Ware – CSHL, Michael Schatz – CSHL, David Micklos – CSHL, Ann Stapleton – UNCW, Ron Vetter – UNCW
Staff:
Greg Abram, Sonali Aditya, Ritu Arora, Roger Barthelson, Rob Bovill, Brad Boyle, Gordon Burleigh, John Cazes, Mike Conway, Victor Cordero, Rion Dooley, Aaron Dubrow, Andy Edmonds, Dmitry Fedorov, John Fonner, Melyssa Fratkin, Michael Gatto, Utkarsh Gaur, Cornel Ghiban, Steve Gregory, Mathew Helmke, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Logan Johnson, Chris Jordan, Kathleen Kennedy, Mohammed Khalfan, David Knapp, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Sue Lauter, Tina Lee, Zhenyuan Lu, Eric Lyons, Aaron Marcuse-Kubitz, Naim Matasci, Robert McLay, Nathan Miller, Steve Mock, Martha Narro, Shannon Oliver, Benoit Parmentier, Jmatt Peterson, Dennis Roberts, Paul Sarando, Jerry Schneider, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Jonathan Strootman, Peter Van Buren, Hans Vasquez-Gross, Rebeka Villarreal, Ramona Walls, Liya Wang, Anton Westveld, Jason Williams, John Wregglesworth, Weijia Xu
Faculty Advisors & Collaborators:
Ali Akoglu, Kobus Barnard, Volker Brendel, Timothy Clausner, Sally Elgin, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, David Lowenthal, B.S. Manjunath, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Neelima Sinha, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Rick Stevens, James Taylor, Brett Tyler, Steve Welch
Postdocs:
Barbara Banbury, Christos Noutsos, Solon Pissis, Brad Ruhfel
Students:
Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, YaDi Chen, David Choi, Barbara Dobrin, John Donoghue, Yekatarina Khartianova, Chris La Rose, Amgad Madkour, Aniruddha Marathe, Andre Mercer, Kurt Michaels, Zack Pierce, Andrew Predoehl, Sathee Ravindranath, Kyle Simek, Gregory Striemer, Jason Vandeventer, Nicholas Woodward, Kuan Yang
Connect with iPlant!
Twitter: @iPlantCollab #iPlant
Facebook: facebook.com/iPlantCollab
LinkedIn: iplant.co/iPlantCollabLinkedIn
Google+: iplant.com/iPlantGooglePlus