10 Reasons to Crowdsource
Science
Adrien Treuille, Carnegie Mellon University
Protein Folding
http://martin-protean.com/protein-structure.html
Protein Folding
Amino Acid Sequence: MSFQYGHIGYIYTRLALSAYVANTR…L
Protein Shape
• Key to understanding life.
• Huge computational resources.
RNA Nanoengineering
Nucleotide Sequence: GCUAAGGUCAUACGAUACCAACATG…A
Target RNA Shape
• Next-generation Catalysts
• Drug-responsive Control Elements
RNA Nanoengineering
Game Interface, Voting, Synthesis, Results
Crowdsourcing the Scientific Method
Crowdsourcing Science
Foldit (launched 2008): Protein Folding, Computational Chemistry, 57,000 players
EteRNA (launched 2011): RNA Nanoengineering, Experimental Chemistry, 25,000 players
Crowdsourcing Science
Scientists Problem Game Players
10 Reasons to Crowdsource Science
#1 Games Make Us Understand
#1 Games Make Us Understand
Foldit EteRNA
BioClipse
http://chem-bla-ics.blogspot.com/2006/04/protein-support-in-bioclipse-using.html
#1 Games Make Us Understand
Lock, Wiggle, Pull/Bands, Shake, Rebuild, Tweak
#1 Games Make Us Understand
Repulsive, Attractive, Solvation, Hydrogen Bonds
Issue Analysis
#1 Games Make Us Understand
Foldit EteRNA
#1 Games Make Us Understand
Interactive Biology
10 Reasons to Crowdsource Science
#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#2 Humans Solve Hard Problems
Native Conformation
Best Computer Solution
Best Player Solution
#2 Humans Solve Hard Problems
Native Conformation
Best Computer Solution
Best Player Solution
PhD
#2 Humans Solve Hard Problems
Native Conformation
Starting Position
Best Player Solution
Predicting protein structures with a multiplayer online game.
Nature Vol 466, 5 August 2010.
#2 Humans Solve Hard Problems
Target Shape
#2 Humans Solve Hard Problems
Player Designs
• Ding's Round 2 Bulged Star, by Ding (EteRNA Score: 96%)
• Mat - Bulged star v1.1, by mat747 (EteRNA Score: 94%)
• Starry's Bulged Star III, by starryjess (EteRNA Score: 94%)
Computer Designs
• ViennaRNA Design 03, by ViennaRNA Bot (EteRNA Score: 76%)
• ViennaRNA Design 05, by ViennaRNA Bot (EteRNA Score: 75%)
• ViennaRNA Design 02, by ViennaRNA Bot (EteRNA Score: 73%)
10 Reasons to Crowdsource Science
#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#3 It’s a Total Rush!
Player: Wow, you sure know a lot about Foldit!
Engineer: Thank you. Actually, I was one of the programmers.
Player: Really?
Engineer: Yes.
Player: You are a god.
10 Reasons to Crowdsource Science
#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#4 Human Learning
Target Shape
Computer Solutions
Player Solutions
10 Reasons to Crowdsource Science
#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#5 Humans Create Knowledge
“I have been painstakingly going over most, if not all, of the new Lab Design Submissions by brand new players.
I was chagrined to find...
...a dozen Christmas Trees were submitted.
Let us not waste even one of our precious few design slots.”
#5 Humans Create Knowledge
RNAfold POSITIONAL ENTROPY
A meta-analysis of one-cross-bulge results
I: positional entropy and what it means
by alan.robot, updated on 2/9 (changed incorrect link to d9’s design analysis); last updated 3/14 (fixed dead links to server results, thanks to Chaendryn for re-running them)
NOTE: this document is now shared as a web page here that does not require a login; please update your links!
OK, so you are wondering how you can improve your submitted designs using the output from the ViennaRNA suite of programs, which the devs have confirmed is the computational backend for eteRNA. In this tutorial, I’ll show you an example of how positional entropy can be used to help predict winning and losing designs, even before you submit!
*disclaimer* I’m not affiliated with eteRNA, and although I am a computational biophysicist, I’m not a specialist in RNA bioinformatics, so any inaccuracies are my fault alone and not due to eteRNA or its staff.
First things first:
The Vienna RNA servers are here: http://rna.tbi.univie.ac.at/
The Vienna source code is here: http://www.tbi.univie.ac.at/~ivo/RNA/
THIS is a link to a discussion on how to download the sequence files for submitted designs for lab 103, “one bulge cross”.
I will be referring to output from the web server version for this tutorial, but if you want to do your own analysis of more than 1 sequence at a time, you’re probably best off compiling and running it on your own machine. It’s not as hard as it sounds: you do not need to know how to program, but you do need a Unix/Linux environment to compile in. If you are running Windows, I can highly recommend Ubuntu running on VirtualBox (http://www.virtualbox.org/). Both are free software and very user-friendly to set up and use, and they beat the pants off of Cygwin.
Here is a link to the results for the round 4 winning design by dimension 9, input into the RNAfold web server. Note I have no idea how long that link will work, so I’ll cut and paste the relevant bits here in case you want to try to reproduce it. Use the default settings except where mentioned below; you have to expand “show advanced options” to see them.
sequence: GGAAGGUUCUCUGGCGUUCGUGAAAACAUGAAUGGGAGGCAUCAAGAGAUGGCUCCGCUUGUUCAAGAGAAUAGGCCCAGAGAGCAAA
advanced settings: unpaired bases can participate in at most one dangling end (MFE folding only)
(yes, for the super observant, this is the rule that lets only one side of a loop get a bonus from adding a red ‘G’)
Turner 1999 energy parameters
So now you get a pretty output page with lots of details. What should you pay attention to?
RNAfold POSITIONAL ENTROPY https://docs.google.com/document/pub?id=12h1bwjS1tHRUaG8O01b...
1 of 6 7/18/11 11:24 AM
“The optimal secondary structure in dot-bracket notation with a minimum free energy of -39.80 kcal/mol is given below.”
This line should match up with the energy in EteRNA; otherwise, you have chosen the wrong energy options! From here on out, I will say MFE instead of minimum free energy.
“The frequency of the MFE structure in the ensemble is 80.77 %”
Note the very high percentage! This is good. It means that, of all the possible structures the server considered likely to occur (including suboptimal folds, NOT just the MFE fold), weighted by how likely each one is, the majority are the correct fold. Note that when these are synthesized in a lab, you get a test tube full of these molecules (an “ensemble”), not just one single molecule, and you need ALL OF THEM (or as close as possible) to fold correctly.
Generally, when one says “ensemble”, one means on the order of Avogadro’s number of molecules (that’s 6.022 × 10^23), which is A LOT.
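The “frequency of the MFE structure” is a Boltzmann probability: each structure gets a weight exp(-E/RT), and its frequency is that weight divided by the sum of all the weights (the partition function, Z). Below is a minimal sketch, assuming a short hand-picked list of hypothetical structure energies stands in for the full ensemble that RNAfold sums over; the gas constant is a standard value and 37 °C is RNAfold's default temperature, not a setting read from the web server.

```python
import math

# Gas constant in kcal/(mol*K); 1.98717 cal/(mol*K) is a standard value.
R_KCAL = 1.98717e-3


def _rt(temp_celsius):
    """RT in kcal/mol at the given temperature in Celsius."""
    return R_KCAL * (temp_celsius + 273.15)


def boltzmann_frequencies(energies, temp_celsius=37.0):
    """Boltzmann probability of each structure, given free energies in kcal/mol.

    Assumes the list is the whole ensemble (a toy stand-in; real tools sum
    over every secondary structure via the partition function).
    """
    rt = _rt(temp_celsius)
    e_min = min(energies)
    # Shift by the lowest energy before exponentiating to avoid overflow.
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def ensemble_free_energy(energies, temp_celsius=37.0):
    """G = -RT ln Z; never higher than the lowest single-structure energy."""
    rt = _rt(temp_celsius)
    e_min = min(energies)
    z_shifted = sum(math.exp(-(e - e_min) / rt) for e in energies)
    return e_min - rt * math.log(z_shifted)
```

For example, with the hypothetical energies [-39.80, -38.90, -38.50], most of the weight goes to the -39.80 structure: at 37 °C a gap of about 1 kcal/mol already shrinks a competitor's share roughly five-fold, which is why an MFE that sits well below its rivals gets a high frequency.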
“The ensemble diversity is 0.44”
This is the average distance, in number of base pairs, between structures in the “ensemble”. So lower is better: here, two structures picked from the ensemble differ by less than one base pair on average, and most of them are the MFE itself. That’s good!
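One way to read “ensemble diversity”: draw two molecules at random from the test tube, count the base pairs that one has and the other lacks, and average over many draws. The sketch below does exactly that over a small hypothetical ensemble of (dot-bracket structure, probability) pairs. As far as I know, RNAfold computes the same average from base-pair probabilities instead of enumerating structures, so treat the enumeration here as an illustration, not its algorithm.

```python
from itertools import product


def pairs_from_dot_bracket(structure):
    """Set of (i, j) base pairs (0-based, i < j) in a dot-bracket string."""
    stack, pairs = [], set()
    for k, ch in enumerate(structure):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {k}")
            pairs.add((stack.pop(), k))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def bp_distance(s1, s2):
    """Base-pair distance: number of pairs present in exactly one structure."""
    return len(pairs_from_dot_bracket(s1) ^ pairs_from_dot_bracket(s2))


def ensemble_diversity(ensemble):
    """Average bp distance between two structures drawn independently.

    `ensemble` is a list of (dot_bracket, probability) tuples summing to 1.
    """
    return sum(p1 * p2 * bp_distance(s1, s2)
               for (s1, p1), (s2, p2) in product(ensemble, repeat=2))
```

With the toy ensemble [("((((....))))", 0.8), ("(((......)))", 0.15), ("............", 0.05)], the diversity works out to 2 * (0.8*0.15*1 + 0.8*0.05*4 + 0.15*0.05*3) = 0.605 bp.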
Note there are TWO structures displayed below, the MFE and the “centroid”. The centroid is exactly what it sounds like: it’s the “middle-of-the-pack” structure in the ensemble (again, distance is measured in base pairs). Since the MFE is 80% of the ensemble, the centroid is identical to the MFE, but if that percentage were lower it might not be!
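Why the centroid equals the MFE here: the centroid can be built by keeping every base pair whose probability is above 0.5, so a structure that on its own makes up more than half of the ensemble (80% here) is automatically the centroid. A minimal sketch, assuming hypothetical base-pair probabilities stored as a dictionary keyed by 0-based (i, j) positions:

```python
def centroid(n, pair_probs):
    """Centroid structure of length n: keep every base pair with probability > 0.5.

    `pair_probs` maps (i, j), i < j, to the probability that base i pairs with j.
    Two pairs each above 0.5 must co-occur in some structure, so they can never
    share a base or cross each other; the result is always a valid structure.
    """
    chars = ["."] * n
    for (i, j), p in pair_probs.items():
        if p > 0.5:
            chars[i], chars[j] = "(", ")"
    return "".join(chars)


probs = {(0, 11): 0.95, (1, 10): 0.95, (2, 9): 0.95, (3, 8): 0.8}
print(centroid(12, probs))
```

Dropping the innermost pair's probability below 0.5 removes it from the centroid, which is how the centroid and the MFE can come apart when the MFE frequency is low.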
The structures are colored by default according to base pair probability, which is the probability that the base is in the structure that you see. They should all be close to 1 for a good structure.
But it’s not the end of the world if one or two base pairs don’t form correctly; that’s still a win if it doesn’t happen very often.
If it’s highly likely that a few base pairs will be off, but it only happens in a few ways that preserve the rest of the MFE structure, it could still win.
If it’s highly likely that wrong base pairs will form, and there are many ways this can happen without preserving the MFE structure, then we are toast!
How do we measure the number of ways the fold is expected to go wrong, weighted by how likely each one is? ENTROPY, which, in the words of my physics professor, is just a fancy word for “the logarithm of the number of ways”. You can also think of it as disorder, but how do you count an amount of disorder?
Click the box that says “positional entropy” to see this map:
Notice that the cool colors mark the weakest spots of this structure, where entropy > 0. Can you see the corresponding peaks on the entropy vs. position plot below? These are the most likely positions for deviations from the MFE structure. Note the scale of the graph: 0 entropy means NO deviations, and > 0 means some deviations.
How is entropy calculated? This is a Shannon entropy from information theory, which is calculated as
$S = -\sum_i p_i \log p_i$
where $p_i$ is the probability of a particular outcome $i$ and log is the natural log (base e). Note that all of the probabilities added together have to sum to exactly 1.
• If there is only 1 possibility, with probability 1: -1*log(1) = 0.
• Say there are 2 possibilities, one with probability 0.99 and the other with 0.01: -1*(0.99*log(0.99) + 0.01*log(0.01)) = 0.056, pretty darn close to 0.
• Say there are 100 equally likely outcomes: -1*(0.01*log(0.01))*100 = 4.6. That’s very big compared to 0 or 0.056.
So many equally likely outcomes means an entropy much greater than 0, and in the limit where there is only a single possible way for the base to be positioned, the entropy goes to 0.
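The three worked cases above can be checked directly, and applying the same formula one base at a time gives a positional entropy profile: for base i the outcomes are “paired with j” for each possible partner j, plus “unpaired” with whatever probability is left over. A minimal sketch; the base-pair probabilities used below are hypothetical, and I am assuming RNAfold's positional entropy plot uses this per-base form with the natural log.

```python
import math


def shannon_entropy(probs):
    """S = -sum(p * ln p); zero-probability outcomes contribute nothing."""
    total = sum(probs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return -sum(p * math.log(p) for p in probs if p > 0)


def positional_entropy(n, pair_probs):
    """Per-base entropy from base-pair probabilities {(i, j): p_ij} (0-based, i < j)."""
    partners = [[] for _ in range(n)]
    for (i, j), p in pair_probs.items():
        partners[i].append(p)
        partners[j].append(p)
    result = []
    for ps in partners:
        q = max(0.0, 1.0 - sum(ps))  # probability that this base is unpaired
        result.append(-sum(p * math.log(p) for p in ps + [q] if p > 0))
    return result


print(shannon_entropy([0.99, 0.01]))
print(positional_entropy(12, {(0, 11): 0.5, (1, 10): 1.0}))
```

A base that pairs with one partner half the time and is unpaired otherwise gets log(2), about 0.69; a base that is always paired with the same partner, or never paired, gets 0.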
Putting it all together:
So now we know the computer expects 80% of the test tube to fold perfectly, and 20% to have a defect, most likely at the green spots in the picture above. BUT we also know the average difference between structures is less than 1 bp, so not all of the green spots will be wrong at the same time; the defects probably occur one at a time in individual molecules. So the MFE structure will be preserved. This is a win!
CONTRAST WITH A POOR-SCORING ENTRY
The following entry scored a 65 in round 2:
GGAAAGUAGGAGAUGUUAGUUUGAAAGGAUUGGCCGGUGGUUUGAAAGGGCGAUUGUCUUUAGUGAAAGUUAAAGAGUUUUUUGCAA
I’ll cut to the chase: here’s the output, and here is a summary:
The optimal secondary structure in dot-bracket notation with a minimum free energy of -19.20 kcal/mol is given below.
The frequency of the MFE structure in the ensemble is 27.45 %.
The ensemble diversity is 3.47
You can see below that this structure is not predicted to maintain the central hub, and the bottom arm probably doesn’t form correctly. Most of the ensemble is NOT represented by the MFE, and members of the ensemble differ by 3-4 bp from each other. Note that the axis on the entropy map goes to 0.8 this time. 3-4 bp is quite a lot if they are right next to each other, because that means an entire arm will form wrong.
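The comparison between the two entries boils down to a quick pre-submission check on the headline numbers from the RNAfold summary. A minimal sketch using the values quoted above; the cutoffs (at least 50% MFE frequency, at most 1 bp of ensemble diversity) are hypothetical illustrations chosen for this example, not EteRNA rules or tuned thresholds.

```python
from dataclasses import dataclass


@dataclass
class FoldSummary:
    """Headline numbers from an RNAfold web server results page."""
    name: str
    mfe_kcal: float       # minimum free energy, kcal/mol
    mfe_frequency: float  # fraction of the ensemble in the MFE structure (0 to 1)
    diversity: float      # ensemble diversity, in base pairs


# HYPOTHETICAL cutoffs for illustration only; not EteRNA rules.
MIN_MFE_FREQUENCY = 0.5
MAX_DIVERSITY_BP = 1.0


def looks_promising(s: FoldSummary) -> bool:
    """True when the MFE dominates the ensemble and the ensemble is tightly clustered."""
    return s.mfe_frequency >= MIN_MFE_FREQUENCY and s.diversity <= MAX_DIVERSITY_BP


designs = [
    FoldSummary("round 4 winner by dimension 9", -39.80, 0.8077, 0.44),
    FoldSummary("round 2 entry that scored 65", -19.20, 0.2745, 3.47),
]

for d in designs:
    verdict = "promising" if looks_promising(d) else "risky"
    print(f"{d.name}: {verdict} (MFE {d.mfe_kcal} kcal/mol, "
          f"{d.mfe_frequency:.0%} in MFE, diversity {d.diversity} bp)")
```

Both numbers are checked together because, as the tutorial argues, a low MFE frequency is only fatal when the alternatives are also far from the MFE structure.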
Another way to tell is to look at the “mountain plot” below, where sloped lines are base-paired positions and flat lines are unpaired positions. The fact that the green (the average of the ensemble) and blue (the centroid) lines DON’T overlap indicates we have a problem. And since the cool colors are all clustered together in groups of 3-4 bp, we could reasonably expect misfolds to lose an entire arm or worse!
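A mountain plot turns a dot-bracket string into a height profile: each "(" steps up, each ")" steps down, and unpaired bases stay flat. Weighting the same idea by base-pair probabilities gives an ensemble curve that can be laid over the MFE or centroid curve; where the two separate, the ensemble disagrees with that structure. A minimal sketch under one common indexing convention (RNAfold's exact convention may differ), with hypothetical inputs:

```python
def mountain(structure):
    """Height after each position of a dot-bracket string.

    '(' steps up, ')' steps down, '.' stays flat: the sloped and flat
    segments of the mountain plot.
    """
    heights, h = [], 0
    for ch in structure:
        if ch == "(":
            h += 1
        elif ch == ")":
            h -= 1
        heights.append(h)
    return heights


def ensemble_mountain(n, pair_probs):
    """Probability-weighted mountain: each pair (i, j) adds p_ij to heights i..j-1."""
    heights = [0.0] * n
    for (i, j), p in pair_probs.items():
        for k in range(i, j):
            heights[k] += p
    return heights


print(mountain("((..))"))
print(ensemble_mountain(6, {(0, 5): 1.0, (1, 4): 0.5}))
```

With all probabilities equal to 1, the ensemble curve reproduces the single-structure curve exactly, so any gap between the two reflects base pairs the ensemble does not reliably form.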
So how good is the prediction compared to the lab result? Here’s a snapshot of the synthesis results in target mode. You can see it’s not an exact prediction, but it gets a lot of the trends right. Useful!
That’s all for today. In segment II, I will explain why Christmas trees are bad using the barriers and subopt RNAfold kinetics simulation programs.
A meta-analysis of one-cross-bulge results, by Alan.Robot
10 Reasons to Crowdsource Science
#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#6 Creating New Scientists
Convert Problem to Game
Study User Solutions
Create Algorithms
#6 Creating New Scientists
Inverse Crowdsourcing (but that’s not all)
#6 Creating New Scientists
Berex NZ: @mat and alan, quick question, who would you recommend would use a DIY lab more, than myself out of this community?
alan.robot: you mean in terms of players?
Berex NZ: yea
alan.robot: no one else has volunteered to do actual lab work, but you mean who might want to test out ideas outside of the normal channels?
Berex NZ: yep
alan.robot: because I think everyone would want a synthesis slot if it were possible :-)
alan.robot: didn't realize the consumables were so much, though, did you see the cost breakdown?
Berex NZ: I'm just wondering if anyone else has asked to do the actual wet work
Berex NZ: yep
alan.robot: fortunately, the bulk is DNA synthesis, and historically that's been tracking with Moore's law.
Berex NZ: just wondering, what was the total you got per round?
alan.robot: wasn't it 500 something for 8 designs?
Berex NZ: yea 520
Berex NZ: exc labour
Berex NZ: Oligos aren't that expensive though...
#6 Creating New Scientists
Backyard Biosynth
10 Reasons to Crowdsource Science
#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules
#7 Breaking the Rules
RNA Alphabet by clollin
#7 Breaking the Rules
by Joshua Weizmann
10 Reasons to Crowdsource Science
#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules
#8 There Are a Lot of Humans
#8 There Are a Lot of Humans
26 man-years (in 6 months)
10 Reasons to Crowdsource Science
#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules
#8 There Are a Lot of Humans
#9 And They Work for Free
#9 And They Work for Free
potentially transformative
risky
- the NSF
#9 And They Work for Free
Don’t just give them points... Give them lots of points...
- Luis von Ahn
#9 And They Work for Free
10 Reasons to Crowdsource Science
#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It’s a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules
#8 There Are a Lot of Humans
#9 And They Work for Free
#10 It Means a Lot to Players
#10 It Means a Lot to Players
Crowdsourcing Science
Scientists Problem Game Players
Crowdsourcing Science
• Is backyard biosynth possible? How can we trust it?
• Who owns these designs?
• Can we get the players to write a paper without us?
• Major progress in the next 5 years...
• Maybe we can save the world.
Crowdsourcing Science
Seth Cooper, Zoran Popović, David Baker, Jee Lee