5/12/16

Canadian Bioinformatics Workshops

www.bioinformatics.ca

6/14/16


Introduction to cloud computing

Malachi Griffith, Obi Griffith, Francis Ouellette

bioinformatics.ca RNA sequencing and analysis

Learning objectives of the course

•  Module 0: Introduction to cloud computing
•  Module 1: Introduction to RNA Sequencing
•  Module 2: Alignment and Visualization
•  Module 3: Expression and Differential Expression
•  Module 4: Isoform Discovery and Alternative Expression
•  Tutorials
    –  Use the AWS EC2 console to set up an EC2 instance
    –  Log into instance from command line


Learning objectives of module 0

•  Introduction to cloud computing concepts
•  Introduction to cloud computing providers
•  Use the Amazon EC2 console to create an instance for each student
    –  Will be used for many hands-on tutorials throughout the course
•  How to log into your cloud instance

[Figure: Disk Capacity vs Sequencing Capacity, 1990-2012. Axes (log scale): Disk Storage (Mbytes/$) and DNA Sequencing (bp/$). Series: Hard disk storage (MB/$), doubling time = 14 mo; Pre-next-gen sequencing (bp/$), doubling time = 19 mo; Next-gen sequencing (bp/$), doubling time = 4 mo]


About DNA and computers

•  We'll hit the $1000 genome during 2015-?, then need to think about the $100 genome.
•  The doubling time of sequencing has been ~5-6 months.
•  The doubling time of storage and network bandwidth is ~12 months.
•  The doubling time of CPU speed is ~18 months.
•  The cost of sequencing a base pair will eventually equal the cost of storing a base pair
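
As a rough worked example of the last bullet, assume (this model is not on the slide) that both quantities grow exponentially: sequencing yields $S(t)$ base pairs per dollar with doubling time $T_s$, and storage holds $D(t)$ base pairs per dollar with doubling time $T_d$. The symbols $S_0$, $D_0$, and $t^*$ are introduced here only for illustration.

```latex
\begin{align}
S(t) &= S_0 \, 2^{t/T_s}, \qquad D(t) = D_0 \, 2^{t/T_d} \\
\frac{S(t)}{D(t)} &= \frac{S_0}{D_0} \, 2^{\,t\left(\frac{1}{T_s} - \frac{1}{T_d}\right)} \\
S(t^*) = D(t^*) \;&\Longrightarrow\; t^* = \frac{\log_2 (D_0 / S_0)}{\frac{1}{T_s} - \frac{1}{T_d}}
\end{align}
```

With the slide's figures, $T_s \approx 6$ months and $T_d \approx 12$ months, the exponent is $t/12$: sequencing output per dollar doubles relative to storage capacity per dollar about every 12 months. Under this model the two per-base costs cross at a finite $t^*$ whenever storage starts out cheaper ($D_0 > S_0$).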


What is the general biomedical scientist to do?

•  Lots of data
•  Poor IT infrastructure in many labs
•  Where do they go?
•  Write more grants?
•  Get bigger hardware?


Cloud computing providers

•  Amazon AWS
    –  https://aws.amazon.com/
•  Google cloud
    –  https://cloud.google.com/
•  Digital ocean
    –  https://www.digitalocean.com/
•  Others I have not tried:
    –  Microsoft Azure (https://azure.microsoft.com/en-us/)
    –  Rackspace cloud (http://www.rackspace.com/cloud)


Amazon Web Services (AWS)

•  Infinite storage (scalable): S3 (Simple Storage Service)
•  Compute per hour: EC2 (Elastic Compute Cloud)
•  High Performance Computing, ready when you are
•  Multiple football fields of HPC throughout the world
•  HPC capacity is expanded one container at a time


Some of the challenges of cloud computing:

•  Not cheap!
•  Getting files to and from there
•  Not the best solution for everybody
•  Standardization
•  PHI: personal health information & security concerns
•  In the USA: HIPAA act, PSQIA act, HITECH act, Patriot act, CLIA and CAP programs, etc.
    –  http://www.biostars.org/p/70204/


Some of the advantages of cloud computing:

•  We received a grant from Amazon, so this course is supported by an ‘AWS in Education grant award’.
•  There are better ways of transferring large files, and now AWS makes it free to upload files.
•  A number of datasets exist on AWS (e.g. 1000 genome data).
•  Many useful bioinformatics AMIs (Amazon Machine Images) exist on AWS: e.g. cloudbiolinux & CloudMan (Galaxy) – now one for this course!
•  Many flavors of cloud available, not just AWS


In this workshop:

•  Some tools (data) are
    •  on your computer
    •  on the web
    •  on the cloud.
•  You will become efficient at traversing these various spaces, and finding resources you need, and using what is best for you.
•  There are different ways of using the cloud:
    1.  Command line (like your own very powerful Unix box)
    2.  With a web-browser (e.g. Galaxy): not in this workshop


Things we have set up:

•  Loaded data files to an ftp server
•  We brought up an Ubuntu (Linux) instance, and loaded a whole bunch of software for NGS analysis.
•  We then cloned this, and made separate instances for everybody in the class.
•  We’ve simplified the security: you basically all have the same login and file access, and opened ports. In your own world you would be more secure.


Amazon AWS documentation

https://github.com/griffithlab/rnaseq_tutorial/wiki/Intro-to-AWS-Cloud-Computing

http://aws.amazon.com/console/


Logging into Amazon AWS


Log into AWS console

https://364840684323.signin.aws.amazon.com/console


Select "EC2" service

Make sure you are in the Oregon region


Launch a new Instance


Choose an AMI – Find the CSHL SEQTEC 2015 AMI in the Community AMIs

Search for: cshl_seqtec_2015_v3 - ami-58031239 (US West - Oregon)


Choose "m4.2xlarge" instance type, then "Next: Configure Instance Details".


Select "Protect against accidental termination", then "Next: Add Storage".


You should see "snap-xxxxxxx" (32GB) and "snap-xxxxxxx" (500GB) as the two storage volumes selected. Then, "Next: Tag Instance"


Create a tag like “Name=ObiGriffith” [use your own name]. Then hit "Next: Configure Security Group".

Important: Don’t forget to name your instance


Select an Existing Security Group, choose "SSH_HTTP_8081_IN_ALL_OUT". Then hit "Review and Launch".


Review the details of your instance, note the warnings, then hit Launch


Choose an existing key pair: "CBW" and then Launch.


View Instances to see your new instance spinning up!
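
The console choices made in the launch wizard (AMI, instance type, termination protection, Name tag, security group, key pair) could also be expressed with the AWS command line interface. The slides only use the web console, so this is a sketch under assumptions: it assumes the AWS CLI is installed and configured, that the 2015 AMI is still available, and that the security group can be referenced by name. The helper name `launch_command` is hypothetical. The function only prints the command, so nothing is launched by accident.

```shell
#!/bin/sh
# Print (not run) an "aws ec2 run-instances" command mirroring the console choices.
# us-west-2 is the API name of the Oregon region.
launch_command() {
    name=$1   # value for the Name tag: your own name, without spaces or commas
    echo "aws ec2 run-instances" \
        "--region us-west-2" \
        "--image-id ami-58031239" \
        "--instance-type m4.2xlarge" \
        "--key-name CBW" \
        "--security-groups SSH_HTTP_8081_IN_ALL_OUT" \
        "--disable-api-termination" \
        "--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=$name}]'"
}

launch_command "ObiGriffith"
```

The `--disable-api-termination` flag corresponds to "Protect against accidental termination"; the two storage volumes are assumed to come from the AMI's own defaults.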


Find YOUR instance, select it, and then hit connect for instructions on how to connect


Take note of your IP address and the instructions on changing permissions for the key file (Note, we will log in as ubuntu NOT root)


Opening a ‘terminal session’ on a Mac

In a Finder window ‘Applications’ -> ‘Utilities’ -> ‘Terminal’

Or on your dock


Add the terminal App to your dock


Creating a working directory on your Mac called ‘cbw’
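
The slide shows this step as a screenshot. A minimal sketch of the equivalent terminal command, assuming the directory lives in your home folder (the later `cd cbw/` suggests this, but it is an assumption); the helper name `make_workdir` is hypothetical:

```shell
#!/bin/sh
# Create the 'cbw' working directory in the home folder.
# -p makes this safe to repeat: no error if the directory already exists.
make_workdir() {
    mkdir -p "$HOME/cbw"
}

# Usage: make_workdir && cd "$HOME/cbw"
```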


Obtain your AWS ‘key’ file from course wiki

Go to course wiki, “Presentations” page

On Mac: Control+

Save Link As

Save key file to your new ‘cbw’ directory


Viewing the ‘key’ file once downloaded


Changing file permissions of your ‘key’ file (Mac/Linux)

ls -l (long listing)

drwx------+ 67 ogriffit staff 2278 22 May 21:25 ../
-rw-r--r--@  1 ogriffit staff 1696 22 May 21:31 CBW.pem

rwx : owner
rwx : group
rwx : world

r  read (4)
w  write (2)
x  execute (1)

Whichever way you add these 3 numbers, you know which integers were used (6 is always 4+2, 5 is 4+1, 4 is by itself, 0 is none of them etc…)

So, when you have: chmod 400 <filename>

It is “r” for the file owner only
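
The read (4), write (2), execute (1) arithmetic can be checked with a short script. This is an illustrative sketch, not part of the workshop material; the helper names `perm_digit` and `perm_mode` are hypothetical. It converts the nine permission characters shown by `ls -l` into the three digits that `chmod` expects.

```shell
#!/bin/sh
# Convert one rwx triplet (e.g. "rw-") into its octal digit: r=4, w=2, x=1.
perm_digit() {
    triplet=$1
    digit=0
    case $triplet in r??) digit=$((digit + 4)) ;; esac
    case $triplet in ?w?) digit=$((digit + 2)) ;; esac
    case $triplet in ??x) digit=$((digit + 1)) ;; esac
    echo "$digit"
}

# Convert a nine-character permission string (owner, group, world) into a chmod mode.
perm_mode() {
    perms=$1
    owner=$(printf '%s' "$perms" | cut -c1-3)
    group=$(printf '%s' "$perms" | cut -c4-6)
    world=$(printf '%s' "$perms" | cut -c7-9)
    echo "$(perm_digit "$owner")$(perm_digit "$group")$(perm_digit "$world")"
}

perm_mode "rw-r--r--"   # CBW.pem as downloaded: prints 644
perm_mode "r--------"   # after chmod 400: prints 400
```

The downloaded key is readable by group and world (644); `chmod 400` removes everything except read access for the owner.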


Logging into your instance

Mac/Linux

cd cbw/
chmod 400 CBW.pem
ssh -i CBW.pem ubuntu@[YOUR INSTANCE IP ADDRESS]
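
ssh refuses a private key whose permissions are too open, which is why the `chmod 400` step comes first. A small sketch that checks the key before building the login command; the helper name `login_command` is hypothetical, and 192.0.2.10 in the usage note is a placeholder address, not a real instance.

```shell
#!/bin/sh
# Print the ssh command for an instance, but only if the key file
# is readable by its owner and nobody else (mode 400).
login_command() {
    key_file=$1
    ip_address=$2
    # The first ten characters of "ls -l" are the file type and permissions.
    perms=$(ls -l "$key_file" | cut -c1-10)
    if [ "$perms" != "-r--------" ]; then
        echo "key permissions are $perms; run: chmod 400 $key_file" >&2
        return 1
    fi
    echo "ssh -i $key_file ubuntu@$ip_address"
}
```

For example, `login_command CBW.pem 192.0.2.10` prints the ssh command once the key has mode 400, and prints a reminder otherwise.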


Copying files from AWS to your computer (using a web browser)

http://[YOUR INSTANCE IP ADDRESS]/
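
The slide uses a web browser; the same files can also be fetched from a terminal. A hedged sketch: `curl -O` and `scp -i` are standard tools, but the file name, the remote path, and the helper names `instance_url` and `scp_command` are hypothetical, and 192.0.2.10 is a placeholder address.

```shell
#!/bin/sh
# Build the URL for a file served by the instance's web server.
instance_url() {
    ip_address=$1
    file_path=$2
    echo "http://$ip_address/$file_path"
}

# Build an scp command that copies a remote file into the current directory,
# using the same key file and 'ubuntu' user as the ssh login.
scp_command() {
    ip_address=$1
    remote_path=$2
    echo "scp -i CBW.pem ubuntu@$ip_address:$remote_path ."
}

instance_url 192.0.2.10 results.txt
scp_command 192.0.2.10 /home/ubuntu/results.txt
```

A download from the terminal would then look like `curl -O "$(instance_url 192.0.2.10 results.txt)"`.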


Logging out of your instance

Mac/Linux – simply type exit

exit

Note, this disconnects the terminal session (ssh connection) to your cloud instance. But, your cloud instance is still running! See next slide for how to stop your instance.


When you are done for the day you can “Stop” your instance – Don’t Terminate!

Go to AWS EC2 Dashboard, select “Instances” tab, then find your instance. Right-click and choose ‘Instance State’ -> ‘Stop’


Next morning, you can “Start” your instance again

Go to AWS EC2 Dashboard, select “Instances” tab, then find your instance. Right-click and choose ‘Instance State’ -> ‘Start’


When you restart your instance you will need to find your new IP address. Select your instance and “Connect” or look in Description tab. Then go back to instructions for “Logging into your instance”
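
The stop, start, and IP lookup steps can also be scripted with the AWS command line interface. This is a sketch under assumptions rather than the workshop's method: it assumes the AWS CLI is installed and configured, the instance ID is a placeholder, and the helper names are hypothetical. It defaults to a dry run that only prints each command.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to really call the AWS CLI.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stopping keeps the instance for later; terminating would delete it.
stop_instance() {
    run aws ec2 stop-instances --region us-west-2 --instance-ids "$1"
}

start_instance() {
    run aws ec2 start-instances --region us-west-2 --instance-ids "$1"
}

# After a restart, ask for the instance's current public IP address.
public_ip() {
    run aws ec2 describe-instances --region us-west-2 --instance-ids "$1" \
        --query "Reservations[0].Instances[0].PublicIpAddress" --output text
}

stop_instance i-0123456789abcdef0
start_instance i-0123456789abcdef0
public_ip i-0123456789abcdef0
```

Your own instance ID starts with "i-" and is listed next to your instance in the “Instances” tab.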


So, at this point:

•  Your Mac is ready for the workshop
•  If it is not, you know where to get the information you need
•  You know how to log into AWS
•  The next step is to log into your linux machine on AWS and learn the basics of a linux command line

