Page 1: Community*demands*for*cloud*computing* …environmentalomics.org/.../uploads/2015/06/CloudComp… ·  · 2015-08-11Epigenomics x xx Metagenomics/barcoding x x xxxxxxxxxxx Long-read

Community demands for cloud computing & challenges: Environmental Genomics Community

• Richard Nichols (QMUL & NBAF)
• Yannick Wurm (QMUL)

What does the NERC environmental genomics community do? What do they ask NBAF for?

(real data for 2014-15)

Vertebrates Invertebrates Plants Micro-orgs

RAD seq x x x
Epigenomics x xx
Metagenomics/barcoding x x xxxxxxxxxxx
Long-read methods x x x
Sequence capture/reseq xxxx x x
Transcriptomics xx xxx xxx x
Genomic sequencing x xx

Challenges to design & delivery of rational provision

National & International
• Infrastructure: AWS, iPLANT, JASMIN, national services and facilities
• Funding: UKRO, overseas public & commercial funding

Regional and inter-institution networks
• Infrastructure: Regional HPC
• Funding: multi-institution grants & capital expenditure windfalls

Institution-level provision
• Funding: QR funding, subscription from smaller research grants

Research group level
• Infrastructure: dedicated clusters, servers and specialist architectures
• Funding: smaller grants

Strategy?

Opportunism?

Exasperation?

Expertise

• Training may not be the answer
  – Remove the need?
  – Provide expertise with other services?
  – Rebalance the community?

http://wurmlab.github.io

Huge variance of genomics compute needs

Task | Repetitiveness | “Disk” Input/Output | Memory | Duration per task
Build 10,000 trees | 10,000x | low | low | short
Trim FASTQ files | 40-400x | high | low | short
One de novo genome assembly | 1 | high | high | long
Many de novo genome assemblies | 20-1000x | high | high | long
Determine which of 10 new tools that promise X can actually do X (once): “genome hacking” | 1 | depends | depends | depends

No easy solutions

Dorylus driver ants: ants with no home

© BBC

Genomics computation is harder than other fields

• Biology/life is complex
• Field is young.
• Biologists lack computational training.
• Generally, analysis tools suck.
  – badly written
  – badly tested
  – hard to install
  – output quality… often questionable.
• Understanding/visualizing/massaging data is hard.
• Datasets continue to grow!
• Data formats keep changing.

More genomics-specific challenges

Specific challenges

• Software switching
  – Exploring approaches
  – Project-specific versions (for reproducibility)

• Installing things (yourself/sysadmin): too complicated
• Cloud VM instance creation interfaces (Amazon): too slow/complicated/unreliable

oSwitch: One-line access to other operating systems.

EOS Cloud

https://github.com/wurmlab/oswitch

    mymac:~/2015-06-01-myproject> abyss-pe k=25 reads.fastq.gz
    zsh: command not found: abyss-pe

    mymac:~/2015-06-01-myproject> oswitch -l
    yeban/biolinux:8
    ubuntu:14.04
    ontouchstart/texlive-full
    ipython/ipython

    mymac:~/2015-06-01-myproject> oswitch yeban/biolinux
    ###### You are now running: biolinux in container biolinux-7187. ######

    biolinux-7187:~/2015-06-01-myproject> abyss-pe k=25 reads.fastq.gz
    [... just works on your files where they are...]

    biolinux-7187:~/2015-06-01-myproject> exit
    ###### Back to your host OS ######

    mymac:~/2015-06-01-myproject>
    [... output is where you expect it to be ...]

Or use in one line, e.g.:

    oswitch yeban/biolinux bwa aln -t 48 genome.fna reads.fq

Things feel (largely) unchanged:

• Current working directory
• User name, uid and gid
• Login shell (bash/zsh/fish)
• Home directory (including .dotfiles config).
• read/write permissions.
• Paths (when possible): host-mounted volumes (drives, NAS, USB) available in the container at the same path.
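To make the list above concrete, here is a minimal sketch of how a wrapper could keep the working directory, uid/gid, home directory and paths the same inside a container. It is not how oswitch itself is implemented: the function names `build_docker_command` and `is_within` are hypothetical, the fallback to `/bin/bash` is an assumption, and it does not reproduce the user name or login shell inside the container. It only assembles a `docker run` argument list using standard Docker options, and relies on `os.getuid`/`os.getgid`, so it assumes a Unix-like host.

```python
import os
import shlex


def is_within(path, parent):
    """Return True if `path` is `parent` or lies somewhere below it."""
    return os.path.commonpath([path, parent]) == parent


def build_docker_command(image, command=None, extra_volumes=()):
    """Assemble (but do not run) a `docker run` call that mirrors the host session.

    Hypothetical helper for illustration only; not part of oswitch.
    """
    cwd = os.path.abspath(os.getcwd())
    home = os.path.abspath(os.path.expanduser("~"))

    args = ["docker", "run", "--rm", "--interactive", "--tty"]
    # Run as the calling user so files written in the container keep host ownership.
    args += ["--user", f"{os.getuid()}:{os.getgid()}"]

    # Mount the home directory and any extra volumes at the *same* path,
    # so absolute paths typed on the host still resolve in the container.
    mounts = [home, *extra_volumes]
    if not is_within(cwd, home):
        mounts.append(cwd)
    for path in mounts:
        args += ["--volume", f"{path}:{path}"]

    # Keep HOME and the current working directory unchanged.
    args += ["--env", f"HOME={home}", "--workdir", cwd]

    args.append(image)
    # Assumption: fall back to bash when no command is given.
    args += list(command) if command else ["/bin/bash"]
    return args


if __name__ == "__main__":
    cmd = build_docker_command("yeban/biolinux:8", ["abyss-pe", "k=25", "reads.fastq.gz"])
    print(shlex.join(cmd))
```

Mounting every directory at its own path is the design choice that makes "output is where you expect it to be" possible: the tool inside the container reads and writes the same absolute paths as the host shell.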

More genomics-specific challenges

Specific challenges

• Software switching
• Cloud instance provisioning user experience

Usage patterns: CPU/RAM

• 90% of the time: need nothing
• 5% of the time: need small resources
• 5% of the time: need huge resources
  – sometimes on fat machine (many-core many-cpu)
  – sometimes via queuing system

Provisioning strategy?
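A rough worked example shows why the 90/5/5 pattern matters for provisioning. The time fractions come from the slide; the hourly prices and the names `PRICE`, `USAGE`, `yearly_cost_always_on` and `yearly_cost_on_demand` are made up for illustration and are not real cloud tariffs.

```python
HOURS_PER_YEAR = 24 * 365

# Hypothetical hourly prices (arbitrary currency units); NOT real cloud tariffs.
PRICE = {"none": 0.0, "small": 0.10, "huge": 2.00}

# Fraction of time spent at each resource level, taken from the slide.
USAGE = {"none": 0.90, "small": 0.05, "huge": 0.05}


def yearly_cost_always_on(size):
    """Cost of leaving one machine of the given size running all year."""
    return PRICE[size] * HOURS_PER_YEAR


def yearly_cost_on_demand():
    """Cost if the machine always matched the demand of the moment."""
    return sum(USAGE[level] * PRICE[level] * HOURS_PER_YEAR for level in USAGE)


if __name__ == "__main__":
    print("always-on huge:", yearly_cost_always_on("huge"))
    print("matched to demand:", round(yearly_cost_on_demand(), 2))
```

With these invented prices, a permanently running huge machine costs 17,520 units per year, against about 920 when resources follow demand. The absolute numbers mean nothing; the roughly twentyfold gap is what the usage pattern implies.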

What users “should” do:

• Development:
  – Confusing cloud interface. Choose small number of cores + RAM. Launch.
  – Develop pipeline; seems to be working.
• Production:
  – Confusing cloud interface. Choose larger number of cores + RAM. Launch.
  – After 5 days it crashes.
  – Confusing cloud interface. Choose even larger number of cores + RAM. Launch.
  – Lucky. Analysis complete.
  – Failed to shut down: huge bill 2 months later.

They’ll be inefficient and frustrated or go elsewhere.

Mylab @ RCUK cloud

• Single place to connect to: ssh mylab.rcukcloud.co.uk
• Is *always* there.
• Instance automagically:
  – grows from small to medium or large CPU/RAM machine with increasing CPU and RAM demands (NERC EOS Boost should be transparent).
  – shrinks down to minimum
  – hibernates/sleeps when unused.
• (Also allows queue submission.)
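The wished-for behaviour can be sketched as a small sizing policy. Everything concrete here is an assumption made for illustration: the instance catalogue in `SIZES`, the 30-minute idle threshold, the `Demand` record and the rule that oversized jobs go to the queue. It describes no existing service.

```python
from dataclasses import dataclass

# Hypothetical catalogue: (name, cores, RAM in GB), smallest first.
SIZES = [("small", 2, 8), ("medium", 16, 64), ("large", 64, 512)]

# Assumed threshold; a real service would tune this.
IDLE_MINUTES_BEFORE_HIBERNATE = 30


@dataclass
class Demand:
    cores: int         # cores the user's processes currently want
    ram_gb: int        # memory they currently want, in GB
    idle_minutes: int  # minutes since the last activity


def choose_state(demand):
    """Pick the target state for the always-reachable instance."""
    # Hibernate/sleep when nothing is running and the idle threshold is passed.
    if demand.cores == 0 and demand.idle_minutes >= IDLE_MINUTES_BEFORE_HIBERNATE:
        return "hibernated"
    # Otherwise grow or shrink to the smallest size that fits the demand.
    for name, cores, ram_gb in SIZES:
        if demand.cores <= cores and demand.ram_gb <= ram_gb:
            return name
    # Too big for any single instance: hand the job to the batch queue.
    return "queue"


if __name__ == "__main__":
    for d in (Demand(0, 0, 45), Demand(1, 4, 0), Demand(8, 100, 0), Demand(128, 100, 0)):
        print(d, "->", choose_state(d))
```

Choosing the smallest size that fits both the core and the memory demand covers "grows" and "shrinks down to minimum" with one rule, so the user never has to pick a size in a cloud interface.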

More genomics-specific challenges

Specific challenges

• Software switching
• Cloud instance provisioning user experience
• Balancing storage demand: hyperfast (for working) vs large but cheap & easily accessible archival.
