Web-Ice and Labelit: Tools for Convenient Diffraction Analysis at the Beamline

Advanced Photon Source, Users’ Week 2008

Workshop on Software for Challenging Cases in Macromolecular Crystallography

6 May 2008

Nicholas Sauter
Lawrence Berkeley National Laboratory

Collaborators: Stanford Synchrotron Radiation Laboratory; Berkeley Center for Structural Biology/ALS

Sector 5 ALS Automounter

• ALS-style puck: 112 crystal samples
• Beamline Operating System (BOS) control
• Liquid nitrogen autofill

[Figure labels: gripper, Quantum 315 X-ray detector, dewar, microscope, goniometer, CryoStream]

Present Goals:
• Screen for best crystal growth conditions
• Select the highest-quality samples from a batch
• Discovery of drug leads and protein-ligand complexes
• Enable multi-crystal dataset acquisition
• Perform initial characterization with minimal radiation dose

Eventual Goals Later…

Also:
• Single-run data collection

First task—crystal screening: preliminary characterization of X-ray diffraction quality

• Identify crystal lattice and cell dimensions

• Good fit between model and observation (r.m.s.d.)

• Diffraction to high resolution

• Minimal crystal disorder (mosaicity)

• Minimal diffraction artifacts (ice rings)

The challenge is to perform this analysis reliably in a high-throughput automated setting!

Screening results can be viewed both locally & over the Web. González et al. (2008) J Appl Cryst 41:176

DISTL: the selection of candidate Bragg spots. Zhang et al. (2006) J Appl Cryst 39:112

LABELIT: characterization of the lattice. Sauter et al. (2004) J Appl Cryst 37:399

Blu-Ice / BOS: graphical beamline interface --- or --- Web-Ice: Web-viewer

Collect 2 oscillation frames 90° apart, then process:
• MOSFLM / BEST / RADDOSE: 1 min
• LABELIT: ~25 sec
• DISTL: ~5 sec

Heuristic score: Q = 1 – 0.7·exp(–4/resolution) – 1.5·rmsResidual – 0.02·mosaicity
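The heuristic score can be written as a small function. This is only a sketch of the formula on this slide: the function and parameter names are hypothetical, and the units (resolution in Å, r.m.s. residual in mm, mosaicity in degrees) are assumptions that the slide does not state.

```python
import math

def heuristic_score(resolution, rms_residual, mosaicity):
    """Q = 1 - 0.7*exp(-4/resolution) - 1.5*rms_residual - 0.02*mosaicity.

    Units are assumed, not taken from the slide:
    resolution in Angstrom, rms_residual in mm, mosaicity in degrees.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return (1.0
            - 0.7 * math.exp(-4.0 / resolution)   # penalty grows as the resolution limit gets worse
            - 1.5 * rms_residual                  # penalty for poor model/observation fit
            - 0.02 * mosaicity)                   # penalty for crystal disorder

# Two made-up samples: a well-diffracting crystal and a poorer one.
good = heuristic_score(resolution=1.8, rms_residual=0.05, mosaicity=0.3)
poor = heuristic_score(resolution=3.5, rms_residual=0.20, mosaicity=1.2)
print(f"good crystal Q = {good:.3f}, poor crystal Q = {poor:.3f}")
```

Every term is a penalty, so a higher Q means a better sample; ranking a batch by Q is one way to pick candidates for data collection.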

Second task—selecting the best crystal and deciding on data collection strategy

BEST (Popov & Bourenkov, 2003): optimization of exposure time, Δφ, and distance so as to maximize the signal-to-noise (I/σ) in the dataset with a given radiation dose.

RADDOSE (Murray et al., 2005): prediction of the absorbed radiation dose that limits the useful lifetime of the crystal sample.

Beamline-specific and experiment-specific calibration

Details of the “View Strategy” Implementation

• Calculate strategy in the correct Laue group

• Initiate data collection

• Process data after autoindexing (at the command line)

Web-Ice goals: scalability, extendability, portability

Main site: http://smb.slac.stanford.edu/research/developments/webice

Developers’ wiki: https://smb.slac.stanford.edu/wikipub

Basic idea: the beamline crystallographer logs in to a unix account with user name & password. Command-line scripts are run to process the data:

run_mosflm
run_labelit
run_distl
run_best

The output files are in the user’s home directory, which is cross-mounted on all unix systems at the beamline.

The Web-Ice architecture offers the opportunity (through collaboration) to extend beamline efficiency and ultimately improve the science.

Autoindexing gives the reduced cell, but can only guess at the Bravais lattice

[Figure labels: Reduced cell; Triclinic; Monoclinic C-centered (three instances); Hexagonal Rhombohedral]
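To illustrate why the cell metric alone only suggests a lattice, here is a simplified sketch that lists every primitive lattice system whose length and angle constraints a cell satisfies within a tolerance. It is not LABELIT's algorithm: the function name and tolerances are hypothetical, only one axis setting is checked for the hexagonal case, and centered lattices (such as the C-centered monoclinic cells in the figure) would also need a change of basis, which is omitted.

```python
def compatible_lattice_systems(a, b, c, alpha, beta, gamma,
                               rel_tol=0.01, angle_tol=0.5):
    """List lattice systems consistent with the cell metric (primitive settings only)."""
    def same_len(x, y):
        return abs(x - y) <= rel_tol * max(x, y)

    def is_angle(x, target):
        return abs(x - target) <= angle_tol

    right = [is_angle(x, 90.0) for x in (alpha, beta, gamma)]
    systems = ["triclinic"]                      # always possible
    if sum(right) >= 2:
        systems.append("monoclinic")
    if all(right):
        systems.append("orthorhombic")
        pairs = [same_len(a, b), same_len(b, c), same_len(a, c)]
        if any(pairs):
            systems.append("tetragonal")
        if all(pairs):
            systems.append("cubic")
    # Hexagonal metric, checked only in the setting a = b, alpha = beta = 90.
    if (same_len(a, b) and right[0] and right[1]
            and (is_angle(gamma, 120.0) or is_angle(gamma, 60.0))):
        systems.append("hexagonal")
    # Rhombohedral metric: a = b = c and alpha = beta = gamma (not all 90).
    if (same_len(a, b) and same_len(b, c)
            and abs(alpha - beta) <= angle_tol
            and abs(beta - gamma) <= angle_tol
            and not all(right)):
        systems.append("rhombohedral")
    return systems

candidates = compatible_lattice_systems(78.0, 78.1, 37.0, 90.0, 90.0, 90.0)
print(candidates)
```

Every system in the returned list fits the cell equally well geometrically, so the choice among them has to come from the reflection intensities, which is what the Laue-group check addresses.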

Collaborative Goals to Extend Beamline Science

• Early detection of the Laue group with labelit.rsymop / POINTLESS
• phenix.xtriage; detection of twinning
• Real-time monitoring of radiation damage or heavy-atom signal
• Fully automated data collection with multi-wavelength protocol
• Combination of multiple crystals to form complete dataset

Web-Ice is not so much an application as it is a computing architecture on which to hang different applications.

Already-implemented features include beamline control & beamline video.

Under the hood: Systems computing on a handshake

Step 1. Getting a ticket
• User to Authentication Server, over https: “give me a ticket … here’s my password”
• Authentication Server (Java webapp running on Apache Tomcat): global server keeps track of all user login sessions
• Pluggable Authentication Modules (PAM): Unix login, LDAP. Other LDAP modules implemented by John Taylor & Scott Classen
• Authentication Server to User: “here’s your ticket”

Step 2. Using a ticket
• User to Impersonation Daemon, over https (secure sockets): “execute job … here’s my ticket”
• Impersonation Daemon (C++ application running as root): Is this ticket valid? If yes, change process ownership to user & execute job (run_mosflm, run_labelit, run_distl)
• As many servers in the cluster as needed to process the data
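The two-step ticket handshake can be sketched in a few lines. This is a toy in-memory model, not the Web-Ice code: the class and method names are hypothetical, a plain dictionary stands in for PAM, and the daemon only reports the job it would run instead of changing process ownership.

```python
import secrets
import time

class AuthenticationServer:
    """Hypothetical stand-in for the global server that tracks login sessions."""

    def __init__(self, passwords, lifetime_s=3600):
        self._passwords = passwords      # user -> password (toy store in place of PAM)
        self._sessions = {}              # ticket -> (user, expiry time)
        self._lifetime_s = lifetime_s

    def get_ticket(self, user, password):
        """Step 1: exchange a user name and password for a ticket."""
        stored = self._passwords.get(user, "")
        if not secrets.compare_digest(stored.encode(), password.encode()):
            raise PermissionError("authentication failed")
        ticket = secrets.token_hex(16)
        self._sessions[ticket] = (user, time.time() + self._lifetime_s)
        return ticket

    def validate(self, ticket):
        """Return the user name for a live ticket, or None."""
        entry = self._sessions.get(ticket)
        if entry is None or entry[1] < time.time():
            return None
        return entry[0]

class ImpersonationDaemon:
    """Hypothetical stand-in for the daemon that runs jobs on behalf of users."""

    def __init__(self, auth_server):
        self._auth = auth_server

    def execute(self, ticket, command):
        """Step 2: run a job only if the ticket is valid."""
        user = self._auth.validate(ticket)
        if user is None:
            raise PermissionError("invalid ticket")
        # A real daemon would change process ownership to `user` and launch
        # the job here; this sketch only reports what it would do.
        return f"would run {command!r} as {user}"

auth = AuthenticationServer({"alice": "s3cret"})
ticket = auth.get_ticket("alice", "s3cret")
result = ImpersonationDaemon(auth).execute(ticket, "run_labelit")
print(result)
```

The point of the design is that the password is presented once, to one server; every later request carries only the ticket, so any number of processing servers can accept jobs without handling credentials.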

High throughput automatic signaling

[Diagram. Components: Local User; Remote User; Blu-Ice or BOS (Graphical Data Collection Interface); Web-Ice Front Page (SSRL or ALS code); Web-Ice Crystal Analysis Webapp; run_mosflm, run_labelit, run_distl. Arrow labels: “Signal each time a new image is collected”; “ticket” (three arrows); “Manual data processing”.]

Managing the Sample List: Different Choices at SSRL and ALS

[Diagram. Components: Local User; Remote User; Blu-Ice or BOS; Web-Ice Front Page; Web-Ice Sample Information List Server; SSRL Excel spreadsheet or ALS beamline database. Connection label: “Standard http protocol”.]

Software demo:
• SIL server
• Image server & color markup
• AJAX client

A Historical Note on Automatic Processing

• LABELIT represented a new software approach to autoindexing
  – The initial approach of writing shell scripts to wrap existing software was changed early in development (2003), as legacy software relied too heavily on human input to make choices
  – Basic well-known algorithms had to be re-examined (cell reduction; Fourier-based autoindexing)
  – Use of the Python language to rapidly prototype new approaches was indispensable
  – A core library of C++ crystallography algorithms (cctbx; Grosse-Kunstleve et al. 2002, J Appl Cryst 35:126) was exposed at the Python scripting level with Boost.Python bindings

• Achieving automation has been an enormous challenge
  – There are additional challenges related to instrumentation, record-keeping, and communication
  – Physical properties of macromolecular diffraction patterns are very diverse; the simplest algorithms are inadequate for outlying cases

Very Large Unit Cells: Tightly Packed Diffraction Spots

• 621Å cubic cell (virus crystal) leads to barely-separated diffraction spots

• Results from the indexing algorithm are degraded when two bright spots are categorized as a single spot at the average position

• Special fix:
  – Find the brightest spots (blue)
  – Find the best-fit ellipse
  – Find each spot’s nearest neighbor (Univ. Maryland ANN)
  – Plot all nearest-neighbor vectors on top of each other
  – Vector-clusters are probable reciprocal cell vectors
  – Throw out the large “blobs” longer than the probable reciprocal cell lengths
  – Special allowance made to accept spots with very little baseline separation; balanced against need for sufficient background
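The nearest-neighbor step of this fix can be sketched as follows. This is an illustration under stated assumptions, not the LABELIT implementation: a brute-force search replaces the ANN library, spot positions are 2-D detector coordinates in arbitrary units, the function names and `bin_size` parameter are hypothetical, and the ellipse fit and blob rejection are left out.

```python
import math
from collections import Counter

def nearest_neighbor_vectors(spots):
    """For each spot, return the vector pointing to its nearest neighbor."""
    vectors = []
    for i, (x0, y0) in enumerate(spots):
        best, best_d2 = None, math.inf
        for j, (x1, y1) in enumerate(spots):
            if i == j:
                continue
            d2 = (x1 - x0) ** 2 + (y1 - y0) ** 2
            if d2 < best_d2:
                best_d2, best = d2, (x1 - x0, y1 - y0)
        if best is not None:
            vectors.append(best)
    return vectors

def cluster_vectors(vectors, bin_size=1.0):
    """Overlay all vectors on a coarse grid, treating v and -v as the same vector."""
    counts = Counter()
    for dx, dy in vectors:
        kx, ky = round(dx / bin_size), round(dy / bin_size)
        if kx < 0 or (kx == 0 and ky < 0):
            kx, ky = -kx, -ky            # fold opposite directions together
        counts[(kx, ky)] += 1
    return counts

# Synthetic, noise-free spot pattern: spacing 5 along x and 8 along y.
spots = [(5.0 * i, 8.0 * j) for i in range(4) for j in range(4)]
counts = cluster_vectors(nearest_neighbor_vectors(spots))
print(counts.most_common(3))
```

On real images the densest cluster gives the shortest spot-to-spot spacing, which is the quantity needed to decide whether two maxima are separate Bragg spots or one merged blob.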

Pseudocentering: systematically weak Bragg spots

• The true symmetry is P21 with two protein molecules per asymmetric unit, related by a non-crystallographic translation.

• The NCS translation is ½ the cell length, approximating an additional symmetry operator, giving rise to alternating weak spots (Hauptman & Karle, 1953).

• If weak spots are ignored, the symmetry is C-centered orthorhombic with one protein molecule per asymmetric unit.

• Automatic indexing relies on picking the brightest spots, so it is easy to pick the oC cell by chance.

• Lowering the spot-picking threshold to find the weak spots is counterproductive.
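The alternating weak spots follow from a standard result: two identical, identically oriented molecules related by a translation t scatter with intensity proportional to |1 + exp(2πi h·t)|² = 2 + 2cos(2π h·t). The sketch below evaluates that factor for an exact and a slightly inexact half-cell translation along a. The function name and the value 0.48 are illustrative assumptions, and the crystallographic symmetry mates of the real P21 cell are ignored.

```python
import math

def translation_modulation(hkl, t):
    """Intensity factor 2 + 2*cos(2*pi*h.t) for two copies related by translation t."""
    phase = 2.0 * math.pi * sum(h * x for h, x in zip(hkl, t))
    return 2.0 + 2.0 * math.cos(phase)

for h in range(1, 5):
    exact = translation_modulation((h, 0, 0), (0.5, 0.0, 0.0))    # true centering
    pseudo = translation_modulation((h, 0, 0), (0.48, 0.0, 0.0))  # approximate centering
    print(h, round(exact, 3), round(pseudo, 3))
```

With an exact translation of ½ the odd-h reflections vanish, which is ordinary centering; with an approximate translation they are weak but not zero, which is the pseudocentering case described on this slide.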

Construction of the Sublattice: Cell Doubling

Basis vectors      Strong reflections    Patterson peak
a, b, c            all hkl               0, 0, 0
2a, b, c           h = 2n                ½, 0, 0
a, 2b, c           k = 2n                0, ½, 0
a, b, 2c           l = 2n                0, 0, ½
2a, b+a, c         h + k = 2n            ½, ½, 0
2a, b, c+a         h + l = 2n            ½, 0, ½
a, 2b, c+b         k + l = 2n            0, ½, ½
2a, b+a, c+a       h + k + l = 2n        ½, ½, ½

[Figure: unit cell diagram with axes a, b, c]
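The parity conditions listed on this slide can be tested against measured intensities. The sketch below is one hypothetical way to do it, not the LABELIT procedure: for each candidate doubled cell it compares the mean intensity of reflections that violate the condition with the mean of those that satisfy it, assuming the reflections are already indexed in the doubled cell. All names are illustrative.

```python
# Parity condition on (h, k, l) for each candidate set of doubled basis vectors.
SUBLATTICE_CONDITIONS = {
    "2a, b, c": lambda h, k, l: h % 2 == 0,
    "a, 2b, c": lambda h, k, l: k % 2 == 0,
    "a, b, 2c": lambda h, k, l: l % 2 == 0,
    "2a, b+a, c": lambda h, k, l: (h + k) % 2 == 0,
    "2a, b, c+a": lambda h, k, l: (h + l) % 2 == 0,
    "a, 2b, c+b": lambda h, k, l: (k + l) % 2 == 0,
    "2a, b+a, c+a": lambda h, k, l: (h + k + l) % 2 == 0,
}

def weak_to_strong_ratio(reflections, condition):
    """Mean intensity of violating reflections divided by mean of satisfying ones."""
    strong = [i for h, k, l, i in reflections if condition(h, k, l)]
    weak = [i for h, k, l, i in reflections if not condition(h, k, l)]
    if not strong or not weak or sum(strong) == 0:
        return None
    return (sum(weak) / len(weak)) / (sum(strong) / len(strong))

def rank_sublattices(reflections):
    """Return (label, ratio) pairs, smallest ratio (clearest weak class) first."""
    ranked = []
    for label, condition in SUBLATTICE_CONDITIONS.items():
        ratio = weak_to_strong_ratio(reflections, condition)
        if ratio is not None:
            ranked.append((label, ratio))
    return sorted(ranked, key=lambda item: item[1])

# Synthetic data: reflections with odd h are systematically weak.
indices = range(-2, 3)
reflections = [(h, k, l, 100.0 if h % 2 == 0 else 5.0)
               for h in indices for k in indices for l in indices]
ranking = rank_sublattices(reflections)
print(ranking[0])
```

A ratio near 1 means the condition does not separate weak from strong reflections; a ratio well below 1 points to that row of the table as the likely cell doubling.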

Evidence for Cell Doubling in the Raw Data

[Figure panels: Original cell; Doubled a-axis; Doubled b-axis; Doubled c-axis; Pseudo C-face centered; Pseudo B-face centered; Pseudo A-face centered; Pseudo body-centered]

Filtering out decoy signals

Should the lattice be reindexed by imposing pseudo A-centering … or … pseudo body-centering?

Statistical outlier rejection

[Figure: two histograms, each with horizontal axis “Peak height of candidate spot”. Panel titles: “Distribution of peak heights for the pseudo body-centered coset” and “Distribution of peak heights for the pseudo A-centered coset”. Annotations: Exponential Distribution; Gaussian Distribution; Outlier.]
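One simple form of statistical outlier rejection is sketched below, assuming the peak heights follow an exponential distribution. The slide names exponential and Gaussian distributions but does not spell out the test that was used, so the criterion, the median-based estimate of the mean, the cutoff, and all names here are assumptions made for illustration.

```python
import math
import statistics

def exponential_outliers(peak_heights, expected_count_cutoff=0.05):
    """Flag peak heights that are too large to be plausible under an exponential model.

    The distribution mean is estimated from the median (median = mean * ln 2),
    so a single very large value does not inflate the estimate.
    """
    n = len(peak_heights)
    mean_est = statistics.median(peak_heights) / math.log(2.0)
    if mean_est <= 0:
        raise ValueError("peak heights must have a positive median")
    flagged = []
    for x in peak_heights:
        # Expected number of values at least this large among n exponential draws.
        expected = n * math.exp(-x / mean_est)
        if expected < expected_count_cutoff:
            flagged.append(x)
    return flagged

# Made-up peak heights with one obvious outlier.
heights = [10, 12, 8, 15, 9, 11, 20, 7, 13, 14, 500]
flagged = exponential_outliers(heights)
print(flagged)
```

A value is flagged only when fewer than `expected_count_cutoff` such values would be expected by chance in a sample of that size, so the test becomes stricter as more candidate spots are examined.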

More decoy signals to filter out

• Inadequate mosaicity model

• Mismatched or non-Bragg-like profile

In Summary

• There is still work to be done so that the most challenging cases can be processed automatically; these cases include samples with large unit cells (viruses), and crystals with pseudo-symmetry.

• While screening has been automated, the longer-term goal of automated dataset collection is only beginning to be addressed.

• Web-Ice has been successfully ported from SSRL to BCSB, and will be the focus of continued efforts at real-time data analysis, to enable better high-throughput data collection.

Acknowledgements

Nicholas Sauter, Billy Poon, Ralf Grosse-Kunstleve, Paul Adams, Peter Zwart, John Taylor, Yun Zhou, Ana González, Mike Soltis, Penjit (Boom) Moorhead, Jinhu Song, Ken Sharp, Scott McPhillips

Computational Crystallography Initiative at Lawrence Berkeley National Lab

Berkeley Center for Structural Biology at the Advanced Light Source

Stanford Synchrotron Radiation Lab