
Jack Snoeyink & Matt O’Meara Dept. Computer Science UNC Chapel Hill.

Date post: 17-Dec-2015
Transcript
Page 1

Scientific Benchmarks for Structure Prediction Codes

Jack Snoeyink & Matt O’Meara

Dept. Computer Science, UNC Chapel Hill

Page 2

With thanks to:

Collaborators:
- Brian Kuhlman, UNC Biochem
- Many other members of the RosettaCommons
- Richardson lab, Duke Biochem

Funding: NIH, NSF

Page 3

Key Points… Scientific Models, esp. for Structural Molecular Biology

- Models are the lens through which we view data
- Models are predominantly geometric
- Computational models are complex
- Models evolve, so testing becomes crucial

Focus on statistical/computational models with a sample source, observable local features, a chosen functional form, fit parameters, and visualization/testing methods. Capture the assumptions and data used to build models in order to:

- Visualize them when making design decisions while building
- Fit parameters to ensure the best performance
- Record them as scientific benchmarks

Case Study: Rosetta protein structure prediction software [B]

Page 4

Science views nature through models

Page 5

Scientists view nature through models

Page 6

People view the world through models

Page 7

Geometric molecular models

Page 8

Model complexity

- Physical and conceptual models: kept simple to aid understanding
- Statistical and computational models: evolve by combining simple models; even when complex, they can still be effective at validation (MolProbity) or prediction (Rosetta)

Page 9

Model complexity

Page 10

Model complexity

Page 11

Computational model life cycle

Page 12

Computational model life cycle

Spiral development, much like software:

- Discover problematic features in some data
- Create an energy function to adjust them
- Fit parameters to improve results
- Check into the software as a new option
- Make it the default option if everyone likes it
- Occasionally refactor and rewrite, removing outdated or unused models

But there is less support for testing…
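The step "Create an energy function to adjust them" can be illustrated with a generic knowledge-based (inverse Boltzmann) term, E = -kT ln(P_obs / P_ref), built from binned feature counts. The slides do not say which functional form is used at this step, so this is only a sketch under assumptions: the name `inverse_boltzmann`, the pseudocount, and kT = 1 are hypothetical choices.

```python
import math
from typing import List, Sequence


def inverse_boltzmann(
    obs_counts: Sequence[float],
    ref_counts: Sequence[float],
    kT: float = 1.0,
    pseudocount: float = 1.0,
) -> List[float]:
    """Hypothetical knowledge-based term: E_i = -kT * ln(P_obs,i / P_ref,i).

    obs_counts: histogram of a feature in natives (observed state).
    ref_counts: histogram of the same feature in a reference state.
    The pseudocount keeps empty bins from producing log(0).
    """
    if len(obs_counts) != len(ref_counts):
        raise ValueError("histograms must have the same number of bins")
    n_bins = len(obs_counts)
    obs_total = sum(obs_counts) + pseudocount * n_bins
    ref_total = sum(ref_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(obs_counts, ref_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        # Bins over-represented in natives get a negative (favorable) energy.
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


print(inverse_boltzmann([29, 9], [19, 19]))
```

The choice of reference state largely determines what such a term rewards, which is one reason to keep the data and assumptions behind it alongside the code.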

Page 13

Computational model testing

Our goal: Capture data and assumptions from model building for use in model visualization and testing.

Page 14

Our computational models

Abstraction: a simple component of a complex computational model consists of (hydrogen bond examples from [KMB’03] in parentheses):

- One or more sample sources (PDB files from natives or decoys)
- Observable local features (hydrogen bond distances and angles)
- A chosen functional form (energy from distances and angles)
- Fitting parameters (weights for combining terms)
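The four-part abstraction can be written down as a small data structure. This is a minimal sketch, assuming Python; the class `ModelComponent`, the `ah_distance` field, and the harmonic functional form are hypothetical illustrations, not Rosetta's API and not the KMB’03 potential.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for one record parsed from a PDB file:
# feature name -> list of observed values.
Record = Dict[str, List[float]]


@dataclass
class ModelComponent:
    """One simple component of a larger model; all names are illustrative."""

    sample_sources: Dict[str, Sequence[Record]]  # e.g. "native", "decoy"
    features: Callable[[Record], List[float]]  # observable local features
    form: Callable[[float, Dict[str, float]], float]  # functional form
    params: Dict[str, float]  # fitting parameters

    def feature_values(self, source: str) -> List[float]:
        """Pool the feature over every record in one sample source."""
        return [x for rec in self.sample_sources[source] for x in self.features(rec)]

    def energy(self, record: Record) -> float:
        """Sum the functional form over the record's feature values."""
        return sum(self.form(x, self.params) for x in self.features(record))


def harmonic(d: float, p: Dict[str, float]) -> float:
    """Toy functional form, NOT the KMB'03 potential: w * (d - d0)^2."""
    return p["w"] * (d - p["d0"]) ** 2


hbond = ModelComponent(
    sample_sources={
        "native": [{"ah_distance": [1.9, 2.0]}],
        "decoy": [{"ah_distance": [2.4]}],
    },
    features=lambda rec: rec["ah_distance"],
    form=harmonic,
    params={"w": 1.0, "d0": 1.9},
)
print(hbond.feature_values("native"))  # [1.9, 2.0]
print(hbond.energy({"ah_distance": [1.9, 2.0]}))
```

Keeping the sample sources next to the functional form is what lets the same object feed both visualization (pooled feature values) and scoring (energy).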

Page 15

Tool schematic (figure): data sets A, B, …, Z; gather features; SQL query; filter, transform; ggplot2 spec; plots; statistics
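The schematic names an SQL query and a ggplot2 spec but shows no code. The sketch below mimics the gather, filter, and statistics stages with Python's built-in `sqlite3` module; the table `hbond_features`, its columns, and the distance cutoff are hypothetical, and the plotting stage (ggplot2, in R) is left out.

```python
import sqlite3
from typing import Dict, Iterable, Tuple


def gather_features(conn: sqlite3.Connection, source: str, distances: Iterable[float]) -> None:
    """'Gather features': store one sample source's A-H distances (hypothetical schema)."""
    conn.executemany(
        "INSERT INTO hbond_features (sample_source, ah_distance) VALUES (?, ?)",
        [(source, d) for d in distances],
    )


def summarize(conn: sqlite3.Connection, max_distance: float) -> Dict[str, Tuple[int, float]]:
    """'SQL query' and 'filter': keep short distances, then count and average per source."""
    rows = conn.execute(
        "SELECT sample_source, COUNT(*), AVG(ah_distance) FROM hbond_features "
        "WHERE ah_distance <= ? GROUP BY sample_source",
        (max_distance,),
    ).fetchall()
    return {source: (count, mean) for source, count, mean in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hbond_features (sample_source TEXT, ah_distance REAL)")
gather_features(conn, "native", [1.8, 1.9, 2.0, 2.9])
gather_features(conn, "decoy", [2.1, 2.3, 2.6])
stats = summarize(conn, max_distance=2.6)
print(stats)
```

Putting every sample source in one table with a `sample_source` column means a single query can filter and group all of them, which is what makes side-by-side comparison cheap.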

Page 17

Visualization

Implemented tools:
- Compare distributions from sample sources
- Tufte’s small multiples via ggplot
- Kernel density estimation
- Normalization

Opportunities for:
- Statistical analysis
- Dimension reduction
- …
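Kernel density estimation replaces a histogram with an average of smooth kernels, so distributions from different sample sources can be compared on a shared grid. The slides do not show the implementation (ggplot2 is R); this is a minimal Python sketch assuming a Gaussian kernel and the normal reference bandwidth rule h = 1.06 σ n^(-1/5), with made-up sample values and hypothetical function names.

```python
import math
import statistics
from typing import Callable, Sequence


def normal_reference_bandwidth(samples: Sequence[float]) -> float:
    """Normal reference rule for a Gaussian kernel (needs at least 2 samples)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)


def gaussian_kde(samples: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a density: the average of Gaussians centered on the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Evaluate two sample sources on one shared grid, as small multiples would.
native = [1.85, 1.9, 1.95, 2.0, 2.05, 2.1]
decoy = [2.0, 2.2, 2.4, 2.5, 2.7]
grid = [1.5 + 0.01 * i for i in range(151)]
curves = {}
for name, xs in {"native": native, "decoy": decoy}.items():
    f = gaussian_kde(xs, normal_reference_bandwidth(xs))
    curves[name] = [f(x) for x in grid]
```

Using a shared grid and the same bandwidth rule for every source keeps the panels directly comparable.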

Page 18

Normalization

[KMB’03] Histogram of H-bond A-H distances in natives (figure: counts from 0 to 1400 against A-H distance bins from 1.45 to 2.85 Å)
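Raw distance counts are skewed because the volume available at distance r grows roughly as r². A common normalization for distance histograms, assumed here rather than stated on the slide, divides each bin count by the volume of its spherical shell. A minimal sketch with a hypothetical helper name:

```python
import math
from typing import List, Sequence


def shell_normalized_histogram(
    distances: Sequence[float], lo: float, hi: float, n_bins: int
) -> List[float]:
    """Bin distances in [lo, hi) and divide each count by its shell volume."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for d in distances:
        if lo <= d < hi:
            # Clamp guards against floating-point rounding just below hi.
            counts[min(int((d - lo) / width), n_bins - 1)] += 1
    densities = []
    for i, count in enumerate(counts):
        r0 = lo + i * width
        r1 = r0 + width
        shell_volume = 4.0 / 3.0 * math.pi * (r1 ** 3 - r0 ** 3)
        densities.append(count / shell_volume)
    return densities


# Bins of width 0.1 covering roughly the slide's 1.45 to 2.85 axis range.
print(shell_normalized_histogram([1.82, 1.88, 1.95, 2.3], 1.4, 2.9, 15))
```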

Page 23

Tool uses…

- Scientific unit tests (native, HEAD, ^HEAD): run on a continuous testing server
- Knowledge-based score term creation (native, release, experimental): turn exploration into living benchmarks
- Test design hypotheses (native, protocol, designs): how strange is this geometry?
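The slides do not say which statistic these uses rely on. One plausible sketch swaps in the two-sample Kolmogorov-Smirnov distance to compare a feature's distribution at HEAD against its parent (^HEAD), and adds a crude answer to "how strange is this geometry?": the fraction of native values lying at least as far from the native median. All function names and the tolerance are hypothetical.

```python
import statistics
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    xs, ys = sorted(a), sorted(b)
    gap = 0.0
    for v in xs + ys:  # the largest gap occurs at one of the sample points
        fa = bisect_right(xs, v) / len(xs)
        fb = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fa - fb))
    return gap


def unit_test_passes(head: Sequence[float], parent: Sequence[float], tol: float) -> bool:
    """Hypothetical scientific unit test: HEAD's feature distribution must stay
    within `tol` (KS distance) of the parent commit's distribution."""
    return ks_statistic(head, parent) <= tol


def tail_fraction(native: Sequence[float], value: float) -> float:
    """Fraction of native values at least as far from the native median as
    `value` is; a small result flags an unusual designed geometry."""
    center = statistics.median(native)
    dist = abs(value - center)
    return sum(1 for x in native if abs(x - center) >= dist) / len(native)
```

A distribution-level check like this tolerates small numerical changes while still flagging shifts in what the code predicts.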

Page 25

Rotamer recovery
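The slide gives only the title. As commonly framed, a rotamer recovery benchmark repacks the side chains of native structures and counts how often the predicted side-chain conformation matches the native one. This sketch assumes each residue is a list of chi angles in degrees and a hypothetical 20° per-chi tolerance; neither detail comes from the slides.

```python
from typing import Sequence


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def rotamer_recovered(
    native_chis: Sequence[float], predicted_chis: Sequence[float], tol: float = 20.0
) -> bool:
    """A residue counts as recovered if every chi angle is within `tol` degrees."""
    return all(angle_diff(n, p) <= tol for n, p in zip(native_chis, predicted_chis))


def recovery_rate(
    natives: Sequence[Sequence[float]],
    predictions: Sequence[Sequence[float]],
    tol: float = 20.0,
) -> float:
    """Fraction of residues whose predicted rotamer matches the native one."""
    pairs = list(zip(natives, predictions))
    return sum(rotamer_recovered(n, p, tol) for n, p in pairs) / len(pairs)
```

Comparing angles with wraparound matters: 350° and 10° are only 20° apart.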

Page 27

Key Points… Scientific Models, esp. for Structural Molecular Biology

- Models are the lens through which we view data
- Models are predominantly geometric
- Computational models are complex
- Models evolve, so testing becomes crucial

Focus on statistical/computational models with a sample source, observable local features, a chosen functional form, fit parameters, and visualization/testing methods. Capture the assumptions and data used to build models in order to:

- Visualize them when making design decisions while building
- Fit parameters to ensure the best performance
- Record them as scientific benchmarks

Case Study: Rosetta protein structure prediction software [B]


Recommended