EteRNA talk - v0

Date post: 08-Mar-2015
Upload: adrien-treuille
10 Reasons to Crowdsource Science Adrien Treuille Carnegie Mellon University
Transcript
Page 1: EteRNA talk - v0

10 Reasons to Crowdsource

Science

Adrien Treuille
Carnegie Mellon University

Page 2: EteRNA talk - v0

Protein Folding

http://martin-protean.com/protein-structure.html

Page 3: EteRNA talk - v0

Protein Folding

M S F Q YG H IGY I Y T R L A L SA Y V A N T R …L

Amino Acid Sequence Protein Shape

• Key to understanding life.
• Huge computational resources.

Page 4: EteRNA talk - v0

Protein Folding

Page 5: EteRNA talk - v0

RNA Nanoengineering

G C U A AG G UCA U A C G A U A CC A A C A T G …A

Nucleotide Sequence Target RNA Shape

• Next-generation Catalysts
• Drug-responsive Control Elements

Page 6: EteRNA talk - v0

RNA Nanoengineering

Game Interface Voting

Synthesis Results

Page 7: EteRNA talk - v0

RNA Nanoengineering

Crowdsourcing the Scientific Method

Page 8: EteRNA talk - v0

Crowdsourcing Science

Protein Folding: Launched 2008, 57,000 Players, Computational Chemistry

RNA Nanoengineering: Launched 2011, 25,000 Players, Experimental Chemistry

Page 9: EteRNA talk - v0

Crowdsourcing Science

Scientists Problem Game Players

Page 10: EteRNA talk - v0

Crowdsourcing Science

Scientists Problem

Game Players

Page 11: EteRNA talk - v0

10 Reasons to Crowdsource

Science

Page 12: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand

Page 14: EteRNA talk - v0

#1 Games Make Us Understand

Lock Wiggle Pull/Bands

Shake Rebuild Tweak

Page 15: EteRNA talk - v0

#1 Games Make Us Understand

Repulsive Attractive

Solvation Hydrogen Bonds

Issue Analysis

Page 16: EteRNA talk - v0

#1 Games Make Us Understand

Foldit EteRNA

Page 17: EteRNA talk - v0

#1 Games Make Us Understand

Interactive Biology

Page 18: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand
#2 Humans Solve Hard Problems

Page 19: EteRNA talk - v0

#2 Humans Solve Hard Problems

Native Conformation

Best Computer Solution

Best Player Solution

Page 20: EteRNA talk - v0

#2 Humans Solve Hard Problems

Native Conformation

Best Computer Solution

Best Player Solution

PhD

Page 21: EteRNA talk - v0

#2 Humans Solve Hard Problems

Native Conformation

Starting Position

Best Player Solution

Predicting protein structures with a multiplayer online game.

Nature Vol 466, 5 August 2010.

Page 22: EteRNA talk - v0

#2 Humans Solve Hard Problems

Target Shape

Page 23: EteRNA talk - v0

#2 Humans Solve Hard Problems

Player Designs (EteRNA Scores: 96%, 94%, 94%)
• Ding's Round 2 Bulged Star, by Ding
• Mat - Bulged star v1.1, by mat747
• Starry's Bulged Star III, by starryjess

Computer Designs (EteRNA Scores: 76%, 75%, 73%)
• ViennaRNA Design 03, by ViennaRNA Bot
• ViennaRNA Design 05, by ViennaRNA Bot
• ViennaRNA Design 02, by ViennaRNA Bot

Page 24: EteRNA talk - v0

#2 Humans Solve Hard Problems

Page 25: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!

Page 26: EteRNA talk - v0

#3 It’s a Total Rush!

Player: Wow, you sure know a lot about Foldit!

Engineer: Thank you. Actually, I was one of the programmers.

Player: Really?

Engineer: Yes.

Player: You are a god.

Page 27: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning

Page 28: EteRNA talk - v0

#4 Human Learning

Target Shape

[Figure: the gallery of player designs (EteRNA Scores 96%, 94%, 94%) and ViennaRNA Bot designs (EteRNA Scores 76%, 75%, 73%) from Page 23, repeated in panels titled “Computer Solutions” and “Player Solutions”]

Page 29: EteRNA talk - v0

#4 Human Learning

Computer Solutions

Player Solutions

Page 30: EteRNA talk - v0

#4 Human Learning

Page 31: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge

Page 32: EteRNA talk - v0

#5 Humans Create Knowledge

Page 33: EteRNA talk - v0

#5 Humans Create Knowledge

I have been painstakingly going over most, if not all, of the new Lab Design Submissions by brand new players.

I was chagrined to find...

...a dozen Christmas Trees were submitted.

Let us not waste even one of our precious few design slots.

Page 34: EteRNA talk - v0

#5 Humans Create Knowledge

Page 35: EteRNA talk - v0

#5 Humans Create Knowledge

RNAfold POSITIONAL ENTROPY

A meta-analysis of one-cross-bulge results

I: positional entropy and what it means

by alan.robot, updated on 2/9 (changed incorrect link to d9’s design analysis)
last updated 3/14 (fixed dead links to server results, thanks to Chaendryn for re-running them)

NOTE: this document is now shared as a web page here that does not require a login, please update your links!

Ok, so you are wondering how you can improve your submitted designs using the output from the ViennaRNA suite of programs, which the devs have confirmed to be the computational backend for eteRNA. In this tutorial, I’ll show you an example of how positional entropy can be used to help predict winning and losing designs, even before you submit!

*disclaimer* I’m not affiliated with eteRNA, and although I am a computational biophysicist, I’m not a specialist in RNA bioinformatics, so any inaccuracies are my fault alone and not due to eteRNA or its staff.

First things first:

The Vienna RNA servers are here: http://rna.tbi.univie.ac.at/
The Vienna source code is here: http://www.tbi.univie.ac.at/~ivo/RNA/
THIS is a link to a discussion on how to download the sequence files for submitted designs for lab 103 “one bulge cross”.

I will be referring to output from the web server version for this tutorial, but if you want to do your own analysis of more than 1 sequence at a time, you’re probably best off compiling and running on your own machine. It’s not as hard as it sounds: you do not need to know how to program, but you do need a unix/linux environment to compile in. If you are running windows, I can highly recommend ubuntu running on virtualbox (http://www.virtualbox.org/). Both are free software and very user-friendly to set up and use, and it beats the pants off of Cygwin.

Here is a link to the results for the round 4 winning design by dimension 9, input into the RNAfold web server. Note I have no idea how long that link will work, so I’ll cut and paste the relevant bits here if you want to try and reproduce it. Use default settings except where mentioned below; you have to expand “show advanced options” to see them.

sequence:GGAAGGUUCUCUGGCGUUCGUGAAAACAUGAAUGGGAGGCAUCAAGAGAUGGCUCCGCUUGUUCAAGAGAAUAGGCCCAGAGAGCAAA

advanced settings: unpaired bases can participate in at most one dangling end (MFE folding only)

(yes, for the super observant, this is the rule that lets only one side of a loop get a bonus from adding a red ‘G’)

Turner 1999 energy parameters

So now you get a pretty output page with lots of details. What should you pay attention to?

RNAfold POSITIONAL ENTROPY https://docs.google.com/document/pub?id=12h1bwjS1tHRUaG8O01b...

1 of 6 7/18/11 11:24 AM

“The optimal secondary structure in dot-bracket notation with a minimum free energy of -39.80 kcal/mol is given below.”

This line should match up with the energy in eteRNA, otherwise you have chosen the wrong energy options! From here on out, I will say MFE instead of minimum free energy.

“The frequency of the MFE structure in the ensemble is 80.77 %”

Note the very high percentage! This is good. It means that of all the possible structures the server considered likely to occur (including suboptimal folds, NOT just the MFE fold), the majority are of the correct fold. Note that when these are synthesized in a lab, you get a test tube full of these (an “ensemble”), not just one single molecule, and you need ALL OF THEM (or as close as possible) to fold correctly.

Generally, when one says “ensemble”, one means on the order of Avogadro’s number of molecules (that’s 6.022 x 10^23), which is A LOT.
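To make the "frequency of the MFE structure" idea concrete, here is a minimal sketch of how Boltzmann weighting turns free energies into ensemble fractions. It is not the RNAfold implementation: the real server sums over every possible structure, whereas this toy uses four made-up energies, and the names `boltzmann_frequencies` and `toy_energies`, plus the assumption of 37 °C, are mine for illustration.

```python
import math

# Gas constant in kcal/(mol*K) and 37 degrees C in kelvin (assumed folding temperature).
R_KCAL = 0.0019872
T_KELVIN = 310.15

def boltzmann_frequencies(energies_kcal):
    """Return the equilibrium fraction of each structure given its free energy."""
    rt = R_KCAL * T_KELVIN
    lowest = min(energies_kcal)
    # Subtract the lowest energy first so the exponentials stay well-scaled.
    weights = [math.exp(-(e - lowest) / rt) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy ensemble: one MFE structure and three suboptimal folds.
toy_energies = [-39.8, -38.6, -38.1, -37.5]
freqs = boltzmann_frequencies(toy_energies)
mfe_frequency = freqs[0]
print(f"MFE frequency in toy ensemble: {mfe_frequency:.2%}")
```

The takeaway is that a suboptimal fold only a kcal/mol or two above the MFE still takes a noticeable share of the test tube, which is why the MFE frequency matters and not just the MFE itself.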

“The ensemble diversity is 0.44”

This is the average distance, in number of base pairs, between structures in the “ensemble”. So, lower is better; here we see that the remaining 20% differ by less than one base pair, on average, from the MFE. That’s good!
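As a rough illustration of that "average distance in base pairs", the sketch below computes a probability-weighted mean base-pair distance over a tiny hand-made ensemble. The three structures and their probabilities are invented for the example, and the helper names are hypothetical; the real server works from the full partition function rather than an explicit list of structures.

```python
def parse_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs

def base_pair_distance(a, b):
    """Number of base pairs present in one structure but not the other."""
    return len(parse_pairs(a) ^ parse_pairs(b))

def ensemble_diversity(structures, probs):
    """Probability-weighted average distance between two structures drawn from the ensemble."""
    total = 0.0
    for s1, p1 in zip(structures, probs):
        for s2, p2 in zip(structures, probs):
            total += p1 * p2 * base_pair_distance(s1, s2)
    return total

# Hypothetical toy ensemble: a dominant fold plus two near-misses.
toy_structures = ["((((....))))", "(((......)))", ".(((....)))."]
toy_probs = [0.8, 0.15, 0.05]
diversity = ensemble_diversity(toy_structures, toy_probs)
print(f"toy ensemble diversity: {diversity:.2f} base pairs")
```

With one fold dominating and the alternatives differing by a single pair, the diversity stays well under one base pair, which is the same pattern as the winning design.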

Note there are TWO structures displayed below, the MFE and the “centroid”. The centroid is exactly what it sounds like: it’s the “middle-of-the-pack” structure in the ensemble (again, distance is measured in base pairs). Since the MFE is 80% of the ensemble, the centroid is identical to the MFE, but if that percentage were lower it would not be!

The structures are colored by default according to base pair probability, which is the probability that the base is in the structure that you see. They should all be close to 1 for a good structure.

• But it’s not the end of the world if one or two base pairs don’t form correctly; that’s still a win if it doesn’t happen very often.
• If it’s highly likely that a few base pairs will be off, but it only happens in a few ways that preserve the rest of the MFE structure, it could still win.
• If it’s highly likely we have wrong base pairs forming and there are many ways this can happen without preserving the MFE structure, then we are toast!

How do we measure the number of ways the fold is expected to go wrong, weighted by how likely it is? ENTROPY, which, in the words of my physics professor, is just a fancy word for “the logarithm of the number of ways”. You can also think of it as disorder, but how do you count an amount of disorder?

Click the box that says “positional entropy” to see this map:

Notice the cool colors represent values at the weakest spots of this structure, where entropy > 0. Can you see the corresponding peaks on the entropy vs position plot below? These are the most likely positions for deviations from the MFE structure. Note the scale of the graph: 0 entropy means NO deviations, and > 0 means some deviations.

How is entropy calculated? This is a Shannon entropy from information theory, which is calculated as

$$S = -\sum_i p_i \log p_i$$

where p_i is the probability of a particular outcome and log is the natural log (base e). Note that all of the probabilities added together have to sum to exactly 1.

• So if there is only 1 possibility with probability 1, -1*log(1) = 0
• Say there are 2 possibilities, one with 0.99 probability and 0.01 for the other: that’s -1*(0.99*log(0.99) + 0.01*log(0.01)) = 0.056, pretty darn close to 0
• Say there are 100 equally likely outcomes: -1*(0.01*log(0.01)) * 100 = 4.6. That’s very big compared to 0 or 0.056.

So, many equally likely outcomes means entropies much greater than 0, and in the limit that there is only a single possible way for the base to be positioned, the entropy goes to 0.
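The three worked cases above are easy to check yourself. Here is a small sketch of the Shannon entropy with the natural log, S = -sum(p * log(p)), applied to the same three probability sets; the function name `shannon_entropy` is just an illustrative choice, not something taken from the ViennaRNA tools.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy with natural log; outcomes with p == 0 contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log(p) for p in probs if p > 0)

# The three cases from the text: one certain outcome, a 0.99/0.01 split,
# and 100 equally likely outcomes.
certain = shannon_entropy([1.0])
lopsided = shannon_entropy([0.99, 0.01])
uniform = shannon_entropy([0.01] * 100)
print(round(certain, 3), round(lopsided, 3), round(uniform, 3))
```

Note that the uniform case comes out to log(100), which is where "the logarithm of the number of ways" comes from.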

Putting it all together:
So now we know the computer expects 80% of the test tube to fold perfectly, and 20% will have a defect, most likely at the green spots on the picture above. BUT we also know the average difference between structures is less than 1 bp from the target, so not all of the green spots will be wrong at the same time; they probably occur individually, in individual molecules, one at a time. So the MFE structure will be preserved. This is a win!

CONTRAST WITH A POOR-SCORING ENTRY

The following entry scored a 65 in round 2:
GGAAAGUAGGAGAUGUUAGUUUGAAAGGAUUGGCCGGUGGUUUGAAAGGGCGAUUGUCUUUAGUGAAAGUUAAAGAGUUUUUUGCAA

I’ll cut to the chase: here’s the output, and here is a summary:

The optimal secondary structure in dot-bracket notation with a minimum free energy of -19.20 kcal/mol is given below.
The frequency of the MFE structure in the ensemble is 27.45 %.
The ensemble diversity is 3.47

You can see below that this structure is not predicted to maintain the central hub, and the bottom arm probably doesn’t form correctly. Most of the ensemble is NOT represented by the MFE, and members of the ensemble differ by 3-4 bp from each other. Note the axis on the entropy map goes to 0.8 this time. 3-4 bp is quite a lot if they are right next to each other, because that means an entire arm will form wrong.

Another way to tell is to look at the “mountain plot” below, where sloped lines are base-paired positions and flat lines are unpaired positions. The fact that the green (the average of the ensemble) and blue (the centroid) DON’T overlap indicates we have a problem. And since the cool colors are all clustered together in groups of 3-4 bp, we could reasonably expect misfolds to lose an entire arm or worse!
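If you want a feel for what a mountain plot encodes, here is a minimal sketch that turns a single dot-bracket string into a height profile, where the height at each position is the number of base pairs enclosing it. This only covers one fixed structure (like the MFE or centroid curve); the ensemble-average curve on the server comes from base pair probabilities and is not reproduced here. The function name and the exact height convention at the brackets are my own assumptions.

```python
def mountain_heights(dot_bracket):
    """Height at each position = number of base pairs enclosing it."""
    heights, level = [], 0
    for ch in dot_bracket:
        if ch == "(":
            # Opening bracket: the profile climbs one step.
            level += 1
            heights.append(level)
        elif ch == ")":
            # Closing bracket: record the height, then step down.
            heights.append(level)
            level -= 1
        else:
            # Unpaired base: the profile stays flat.
            heights.append(level)
    return heights

profile = mountain_heights("((..))..")
print(profile)
```

Runs of "(" or ")" give the sloped sides of the mountain and runs of "." give the plateaus, so two structures that disagree on a whole arm show up as curves that visibly separate.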

So how good is the prediction compared to the lab result? Here’s a snapshot in target mode of the synthesis results. You can see it’s not an exact prediction, but it gives a lot of the right trends. Useful!

That’s all for today. In segment II, I will explain why Christmas trees are bad using the barriers and subopt RNAfold kinetics simulation program.

A meta-analysis of one-cross-bulge results
by Alan.Robot

Page 36: EteRNA talk - v0

#5 Humans Create Knowledge

A meta-analysis of one-cross-bulge results

by Alan.Robot

Page 37: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists

Page 38: EteRNA talk - v0

#6 Creating New Scientists

Page 39: EteRNA talk - v0

#6 Creating New Scientists

Convert Problem to Game

Study User Solutions

Create Algorithms

Page 40: EteRNA talk - v0

#6 Creating New Scientists

Inverse Crowdsourcing (but that’s not all)

Page 41: EteRNA talk - v0

#6 Creating New Scientists

Berex NZ: @mat and alan, quick question, who would you recommend would use a DIY lab more, than myself out of this community?
alan.robot: you mean in terms of players?
Berex NZ: yea
alan.robot: noone else has volunteered to do actual lab work, but you mean who might want to test out ideas outside of the normal channels?
Berex NZ: yep
alan.robot: because I think everyone would want a synthesis slot if it were possible :-)
alan.robot: didn't realize the consumables were so much, though, did you see the cost breakdown?
Berex NZ: I'm just wondering if anyone else has asked to do the actual wet work
Berex NZ: yep
alan.robot: fortunately, the bulk is DNA synthesis, and historically that's been tracking with moore's law.
Berex NZ: just wondering, what was the total you got per round?
alan.robot: wasn't it 500 something for 8 designs?
Berex NZ: yea 520
Berex NZ: exc labour
Berex NZ: Oligos arent that expensive though...

Page 42: EteRNA talk - v0

#6 Creating New Scientists

Backyard Biosynth

Page 43: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules

Page 44: EteRNA talk - v0

#7 Breaking the Rules

Page 45: EteRNA talk - v0

#7 Breaking the Rules

Page 46: EteRNA talk - v0

#7 Breaking the Rules

RNA Alphabet by clollin

Page 47: EteRNA talk - v0

#7 Breaking the Rules

Page 48: EteRNA talk - v0

#7 Breaking the Rules

by Joshua Weizmann

Page 49: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules
#8 There Are a Lot of Humans

Page 50: EteRNA talk - v0

#8 There Are a Lot of Humans

26 man-years (in 6 months)

Page 51: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules
#8 There Are a Lot of Humans
#9 And They Work for Free

Page 52: EteRNA talk - v0

#9 And They Work for Free

potentially transformative

risky

- the NSF

Page 53: EteRNA talk - v0

#9 And They Work for Free

Page 54: EteRNA talk - v0

#9 And They Work for Free

Don’t just give them points... Give them lots of points...

- Luis von Ahn

Page 55: EteRNA talk - v0

#9 And They Work for Free

Page 56: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules
#8 There Are a Lot of Humans
#9 And They Work for Free
#10 It Means a Lot to Players

Page 57: EteRNA talk - v0

#10 It Means a Lot to Players

Page 58: EteRNA talk - v0

#10 It Means a Lot to Players

Page 59: EteRNA talk - v0

10 Reasons to Crowdsource Science

#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules
#8 There Are a Lot of Humans
#9 And They Work for Free
#10 It Means a Lot to Players

Page 60: EteRNA talk - v0

Crowdsourcing Science

Scientists Problem Game Players

Page 61: EteRNA talk - v0

Crowdsourcing Science

Scientists Problem

Game Players

Page 62: EteRNA talk - v0

Crowdsourcing Science

• Is backyard biosynth possible? How can we trust it?

• Who owns these designs?

• Can we get the players to write a paper without us?

• Major progress in the next 5 years....

• Maybe we can save the world.

Page 63: EteRNA talk - v0

Crowdsourcing Science

Seth Cooper, Zoran Popović, David Baker, Jee Lee

Page 64: EteRNA talk - v0

10 Reasons to Crowdsource

Science

Adrien Treuille
Carnegie Mellon University

