Date post: | 20-Dec-2015 |
Category: |
Documents |
View: | 217 times |
Download: | 0 times |
Quizzes
At least 6 pop-quizzes– The lowest one gets dropped
20 minutes long– Not difficult if you come to class
I may announce them ahead of time– if attendance is not a problem
Projects
3 Projects– Kind of hard
Open due dates– Submit early and I’ll give you feedback– Then, submit again to get a better grade– Final submission a week before classes end
Exams
First exam (early October)– Fail it and you should probably drop the course– Things are only going to get harder
Second exam is a warm up for the final– (i.e. late November)
Third exam is cumulative– during finals week.
Lecture
My PowerPoint slides are sometimes bare– Not enough detail to study from
Textbook is a reference manual– Look things up, not a tutorial
Most of the material comes from lecture.– Miss lecture a lot and you will be lost
Grading
25% Cumulative final exam 15% Higher exam grade 10% Lower exam grade 20% Quiz average
– 5 quizzes (equally weights)
30% Project average– 3 projects (different weights)
Bioinformatics
Computer Science/Math meets Biology/Genetics Data-driven Science
– Data collection – wet lab – 5%– Analyzing the data – computer lab – 95%
Human Genome Project– best example of bioinformatics
Machine Learning & Data Mining– used heavily in bioinformatics
Science(remind me to tell you a story)
What is science? Make an observation Develop a hypothesis
– explains the observation Run some experiments
– to test your hypothesis Collect enough data to
support your hypothesis– and it becomes a Theory
Bad Science
Collect some data Look at the data exhaustively and try to find
some property/hypothesis Develop tools and run experiments to help
you discover some property/hypothesis Then develop a hypothesis that ‘makes
sense’ (scientifically speaking)
Bad Science(example)
For years, material science and engineering followed the ‘bad science’ model.
The search for better goop 1. Mix materials to make new types of goop
2. Test all the new types of goop
3. Pick the best goop
4. Then, try to explain scientifically:Why is this goop so good?
Bad Science(again)
1. Collect Data
2. Experiment
3. Design Tools
4. Analyze Data
5. Hypothesis/Theory
Some History
Researcher were shunned for conducting bad science
– Shunned by academia– Not shunned by industry
Then came the Human Genome Project Industry (Celera) kicked Academia’s (NIH)
butt. Q:What was their secret weapon? A: Bad Science
Please don’t shun me!
Analogy
Problem: You are in a cage with 500 lbs. hungry tiger
Only options:1. Reason with the tiger2. Shoot the tiger
Nobody wants to hurt a tiger, but…
Some problems can’t be reasoned with…
Analogy
The Human Genome data is like a 500 lbs. hungry tiger. A high-throughput computer is like a shotgun. Pure scientists will die reasoning with the problem Meanwhile, Industry will use computers and ‘bad
science’ to perform amazing genetic experiments.
The Genome Project(overly simplified version)
Biologists/Chemists figured out a way of reading the genetic material in a cell (sequencing)
Unfortunately, they couldn’t read it from beginning to end
They could only read it in small chunks And, the process was prone to error.
The Genome Project(overly simplified version)
Over time, Biologists/Chemists sequenced a lot of DNA.
Lots of different organisms Lots of different segments Before any ‘real’ hypotheses could be made they
had to1. Assemble the data2. Correct the errors3. Put the segments together
The Genome Project(overly simplified version)
Thus, algorithms, techniques, and methodologies were pioneered just to put together the segments.
Assembling entire Genomes is just the beginning
The next goal is to generate a complete mapping between sequence and function
Heartformation
Mapping
Which pieces do What function?
Glucosemetabolism
Liverfunction
Brain development
Vision
Cancerprob.
BlaaBlaa
I’m not good at making stuff up
Analogy
Imagine a C++ Program Imagine a really big one 4 million lines long…
int main(void) {
int x = 0;
int y = 0;
cout << "Enter two numbers: ";
cin >> x >> y;
int z = x + y;
cout >> z;
}
Analogy
Imagine compiling it into assembly language
00424D90 mov eax,dword ptr [ecx]00424D92 mov edx,7EFEFEFFh00424D97 add edx,eax00424D99 xor eax,0FFh00424D9C xor eax,edx00424D9E add ecx,400424DA1 test eax,81010100h00424DA6 je main_loop (00424d90)00424DA8 mov eax,dword ptr [ecx-4]
Analogy
Now picture the hex code as a binary sequence
– 10101010100011011010111100110001010101000110010011010101101101101010101010100011010101000110110101111001100010101010001100100101010100011011010111010100011011010111100110001010101000110010011010101101101101010101010100011101010100011011010111100110001010101000110010010101010001101101011101010001101101011110011000101010100011001001101010110110110101010101010001101010100011011010111100110001010101000110010010101010001101101011101010001101101011110011000101010100011001001101010110110111101010100011011010111100110001010101000110010010101010001101101011101010001101101011110011000101010100011001001101010110110110101010101010001101010100011011010111100110001010101000110010010101010001101101011101010001101101011110011000101010100011001001
Analogy
Now change some of the bits (2% error rate)
– 10101010100011011000111100110001010101000111010011010101101101101010101010100011000101000110110101111001100010100010001100101101010100011011010111010100011011010111100110001010101000110010011010101101101101010101010100011101010100011011010111100110001011101000110010010101010001101101011101010001101101011010011000101010100001001001001010010110110101010101010001101110100011011010111100110001010101000110010010101010001101101011101010001101101011010011000101010100011001001101010110010111101010100011011010111100110001010101000110010010101110001101101011101010001101101010010011000101010100011001001101010111110110101010101010001101010100011011010111100110001010101000110010010101010001101101011101010101101101011110011000101010100001001001
Analogy
Lets randomly sample segments of the sequences
– 10101010100011011000111100110001010101000111010011010101101101101010101010100011000101000110110101111001100010100010001100101101010100011011010111010100011011010111100110001010101000110010011010101101101101010101010100011101010100011011010111100110001011101000110010010101010001101101011101010001101101011010011000101010100001001001001010010110110101010101010001101110100011011010111100110001010101000110010010101010001101101011101010001101101011010011000101010100011001001101010110010111101010100011011010111100110001010101000110010010101110001101101011101010001101101010010011000101010100011001001101010111110110101010101010001101010100011011010111100110001010101000110010010101010001101101011101010101101101011110011000101010100001001001
1000110110001111001100010101010001
1110101000110110101001001
Analogy
Finally, lets collect 1,000,000 random segments
1010001000110011011010101000110110
101000100101010101000110110
00101010101010011010101000110110
011010111010101000110110
00101010101010011010101101010101
…
Analogy Complete
Problem:– Given 1,000,000 random samples.
Remember there are random errors
– Try to re-construct the original 4 million line program
You don’t have to really re-construct it You just have to tell me EXACTLY what it does.