Web-Ice and Labelit: Tools for Convenient Diffraction Analysis at the Beamline
Advanced Photon Source, Users’ Week 2008
Workshop on Software for Challenging Cases in Macromolecular Crystallography
6 May 2008
Nicholas Sauter, Lawrence Berkeley National Laboratory
Collaborators: Stanford Synchrotron Radiation Laboratory; Berkeley Center for Structural Biology/ALS
Sector 5 ALS Automounter
• ALS-style puck: 112 crystal samples
• Beamline Operating System (BOS) control
• Liquid nitrogen autofill
[Figure: automounter photograph; labeled components: gripper, Quantum 315 X-ray detector, dewar, microscope, goniometer, CryoStream]
Present Goals:
• Screen for best crystal growth conditions
• Select the highest-quality samples from a batch
• Discovery of drug leads and protein-ligand complexes
• Enable multi-crystal dataset acquisition
• Perform initial characterization with minimal radiation dose
Eventual Goals Later…
Also:
• Single-run data collection
First task—crystal screening: preliminary characterization of X-ray diffraction quality
• Identify crystal lattice and cell dimensions
• Good fit between model and observation (r.m.s.d.)
• Diffraction to high resolution
• Minimal crystal disorder (mosaicity)
• Minimal diffraction artifacts (ice rings)
The challenge is to perform this analysis reliably in a high-throughput automated setting!
Screening results can be viewed both locally and over the Web. González et al. (2008) J Appl Cryst 41:176
DISTL: the selection of candidate Bragg spots. Zhang et al. (2006) J Appl Cryst 39:112
LABELIT: characterization of the lattice. Sauter et al. (2004) J Appl Cryst 37:399
Blu-Ice / BOS: graphical beamline interface --- or --- Web-Ice: Web-viewer
Collect 2 oscillation frames 90° apart
Processing times: MOSFLM / BEST / RADDOSE 1 min; LABELIT ~25 sec; DISTL ~5 sec
Heuristic score: Q = 1 - 0.7*exp(-4/resolution) - 1.5*rmsResidual - 0.02*mosaicity
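The heuristic score can be evaluated directly from the three screening statistics. The following is a minimal sketch, not the Web-Ice implementation: the function name, the units (resolution in Å, mosaicity in degrees), and the sample values are assumptions made for illustration; only the coefficients come from the slide.

```python
import math

def screening_score(resolution, rms_residual, mosaicity):
    """Heuristic score Q = 1 - 0.7*exp(-4/resolution) - 1.5*rmsResidual - 0.02*mosaicity.

    Units are assumed (resolution in Angstrom, mosaicity in degrees);
    the slide gives only the coefficients.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return (1.0
            - 0.7 * math.exp(-4.0 / resolution)
            - 1.5 * rms_residual
            - 0.02 * mosaicity)

# Two hypothetical crystals from one batch (made-up numbers).
scores = {
    "A1": screening_score(resolution=1.8, rms_residual=0.10, mosaicity=0.4),
    "A2": screening_score(resolution=3.2, rms_residual=0.25, mosaicity=1.2),
}
best = max(scores, key=scores.get)  # sample with the highest Q
```

Each term subtracts from a maximum of 1, so a crystal that diffracts to higher resolution with a lower residual and lower mosaicity ranks higher.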
Second task—selecting the best crystal and deciding on data collection strategy
BEST (Popov & Bourenkov, 2003): optimization of exposure time, Δφ, and distance so as to maximize the signal-to-noise (I/σ) in the dataset with a given radiation dose.
RADDOSE (Murray et al., 2005): predict the absorbed radiation dose that limits the useful lifetime of the crystal sample.
Beamline-specific and experiment-specific calibration
Details of the “View Strategy” Implementation
• Calculate strategy in the correct Laue group
• Initiate data collection
• Process data after autoindexing (at the command line)
Web-Ice goals: scalability, extendability, portability
Main site: http://smb.slac.stanford.edu/research/developments/webice
Developers’ wiki: https://smb.slac.stanford.edu/wikipub
Basic idea: the beamline crystallographer logs in to a unix account with user name & password. Command-line scripts are run to process the data:
run_mosflm
run_labelit
run_distl
run_best
The output files are in the user’s home directory, which is cross-mounted on all unix systems at the beamline.
The Web-Ice architecture offers the opportunity (through collaboration) to extend beamline efficiency and ultimately improve the science.
Autoindexing gives the reduced cell, but can only guess at the Bravais lattice
[Figure: the reduced cell and candidate Bravais lattices; panel labels: Hexagonal Rhombohedral; Triclinic; Monoclinic C-centered (shown three times)]
Collaborative Goals to Extend Beamline Science
• Early detection of the Laue group with labelit.rsymop / POINTLESS
• phenix.xtriage; detection of twinning
• Real-time monitoring of radiation damage or heavy-atom signal
• Fully automated data collection with multi-wavelength protocol
• Combination of multiple crystals to form complete dataset
Web-Ice is not so much an application as it is a computing architecture on which to hang different applications.
Already-implemented features include beamline control & beamline video.
Under the hood: Systems computing on a handshake
Impersonation Daemon (C++ application running as root)
• Is this ticket valid?
• If yes, change process ownership to user & execute job (run_mosflm, run_labelit, run_distl)
Step 1. Getting a ticket
• User to Authentication Server (https://): “give me a ticket … here’s my password”
• Authentication Server (Java webapp running on Apache Tomcat): global server keeps track of all user login sessions
• Pluggable Authentication Modules (PAM): Unix login; LDAP
• Authentication Server to User: “here’s your ticket”
Other LDAP modules implemented by John Taylor & Scott Classen
Step 2. Using a ticket
• User to Impersonation Daemon (https://, secure sockets): “execute job … here’s my ticket”
• As many servers in the cluster as needed to process the data
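The two-step handshake can be sketched in a few lines. This is an illustrative model only, not the Web-Ice code: the real authentication server is a Java webapp and the daemon is a C++ program. The class and method names are hypothetical, the in-memory password table stands in for PAM, and the assumption that the daemon asks the authentication server to validate each ticket is ours.

```python
import secrets
import time

class AuthenticationServer:
    """Stand-in for the global server that tracks user login sessions."""

    def __init__(self, passwords, lifetime_s=3600.0):
        self._passwords = passwords    # hypothetical user -> password table (PAM in the real system)
        self._lifetime_s = lifetime_s  # assumed ticket lifetime
        self._sessions = {}            # ticket -> (user, expiry time)

    def get_ticket(self, user, password):
        # Step 1: exchange a password for an opaque ticket.
        stored = self._passwords.get(user)
        if stored is None or not secrets.compare_digest(stored, password):
            raise PermissionError("authentication failed")
        ticket = secrets.token_hex(16)
        self._sessions[ticket] = (user, time.monotonic() + self._lifetime_s)
        return ticket

    def validate(self, ticket):
        # Return the owning user, or None if the ticket is unknown or expired.
        entry = self._sessions.get(ticket)
        if entry is None or time.monotonic() > entry[1]:
            return None
        return entry[0]


class ImpersonationDaemon:
    """Stand-in for the root daemon that runs jobs on behalf of a user."""

    def __init__(self, auth_server):
        self._auth = auth_server

    def execute(self, ticket, command):
        # Step 2: "Is this ticket valid?"
        user = self._auth.validate(ticket)
        if user is None:
            raise PermissionError("invalid ticket")
        # The real daemon would change process ownership to the user and
        # execute the job; this sketch only reports what would happen.
        return f"{command} would run as {user}"


auth = AuthenticationServer({"alice": "s3cret"})  # made-up credentials
daemon = ImpersonationDaemon(auth)
ticket = auth.get_ticket("alice", "s3cret")
result = daemon.execute(ticket, "run_labelit")
```

The password crosses the network only once, in step 1; every later request carries the ticket alone, which is what lets any number of cluster servers accept jobs without handling credentials.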
High throughput automatic signaling
[Diagram labels: Blu-Ice or BOS (graphical data collection interface); Web-Ice Crystal Analysis Webapp; Web-Ice Front Page (SSRL or ALS code); Local User; Remote User; run_mosflm, run_labelit, run_distl; three connections marked “ticket”; “Signal each time a new image is collected”; “Manual data processing”]
Managing the Sample List: Different Choices at SSRL and ALS
[Diagram labels: Local User; Remote User; Web-Ice Front Page (SSRL or ALS); Blu-Ice or BOS; Web-Ice Sample Information List Server; Excel spreadsheet; Beamline database; standard http protocol]
Software demo: SIL server; image server & color markup; AJAX client
A Historical Note on Automatic Processing
• LABELIT represented a new software approach to autoindexing
– The initial approach of writing shell scripts to wrap existing software was changed early in development (2003), as legacy software relied too heavily on human input to make choices
– Basic well-known algorithms had to be re-examined (cell reduction; Fourier-based autoindexing)
– Use of the Python language to rapidly prototype new approaches was indispensable
– A core library of C++ crystallography algorithms (cctbx; Grosse-Kunstleve et al. 2002, J Appl Cryst 35:126) was exposed at the Python scripting level with Boost.Python bindings
• Achieving automation has been an enormous challenge
– There are additional challenges related to instrumentation, record-keeping, and communication
– Physical properties of macromolecular diffraction patterns are very diverse; the simplest algorithms are inadequate for outlying cases
Very Large Unit Cells: Tightly Packed Diffraction Spots
• 621Å cubic cell (virus crystal) leads to barely-separated diffraction spots
• Results from the indexing algorithm are degraded when two bright spots are categorized as a single spot at the average position
• Special fix:
– Find the brightest spots (blue in the figure)
– Find the best-fit ellipse
– Find each spot’s nearest neighbor (Univ. Maryland ANN)
– Plot all nearest-neighbor vectors on top of each other
– Vector-clusters are probable reciprocal cell vectors
– Throw out the large “blobs” longer than the probable reciprocal cell lengths
– Special allowance made to accept spots with very little baseline separation; balanced against need for sufficient background
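The nearest-neighbor step of the special fix can be illustrated on synthetic data. This is a sketch under stated assumptions, not the LABELIT code: a brute-force search replaces the Univ. Maryland ANN library, the median of the vector lengths replaces the clustering of overlaid vectors, and the lattice, noise level, and function names are invented for the example.

```python
import math
import random
import statistics

def nearest_neighbor_vectors(spots):
    """For each spot, return the vector to its nearest neighbor (brute force)."""
    vectors = []
    for i, (xi, yi) in enumerate(spots):
        best = None
        for j, (xj, yj) in enumerate(spots):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            d2 = dx * dx + dy * dy
            if best is None or d2 < best[0]:
                best = (d2, dx, dy)
        vectors.append((best[1], best[2]))
    return vectors

def probable_spacing(vectors):
    """Median nearest-neighbor distance, used as the probable lattice spacing."""
    return statistics.median(math.hypot(dx, dy) for dx, dy in vectors)

# Synthetic detector pattern: a 10 x 10 lattice with spacings of 5 and 8 pixels
# and a small positional jitter.
rng = random.Random(0)
spots = [(5.0 * i + rng.gauss(0.0, 0.05), 8.0 * j + rng.gauss(0.0, 0.05))
         for i in range(10) for j in range(10)]

vectors = nearest_neighbor_vectors(spots)
spacing = probable_spacing(vectors)

# Hypothetical spot extents in pixels; anything longer than the probable
# spacing is treated as a merged "blob" and discarded.
extents = [1.2, 1.5, 7.5, 1.1]
kept = [e for e in extents if e <= spacing]
```

Overlaying the vectors in `vectors` would show clusters near (±5, 0), the shortest lattice repeat; an extent of 7.5 pixels exceeds that repeat and so cannot be a single Bragg spot.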
Pseudocentering: systematically weak Bragg spots
• The true symmetry is P21 with two protein molecules per asymmetric unit, related by a non-crystallographic translation.
• The NCS translation is ½ the cell length, approximating an additional symmetry operator, giving rise to alternating weak spots (Hauptman & Karle, 1953).
• If weak spots are ignored, the symmetry is C-centered orthorhombic with one protein molecule per asymmetric unit.
• Automatic indexing relies on picking the brightest spots, so it is easy to pick the oC cell by chance.
• Lowering the spot-picking threshold to find the weak spots is counterproductive.
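Why a translation of about half a cell length produces alternating weak spots follows from the structure factor: two identical molecules related by a translation t contribute F(hkl) = F_mol(hkl) · (1 + exp(2πi h·t)), so the amplitude is scaled by |1 + exp(2πi h·t)|. The sketch below evaluates that factor; the translation (0.48, 0, 0) is a hypothetical value chosen for illustration and is not taken from the crystal discussed above.

```python
import cmath
import math

def translation_factor(hkl, t):
    """Amplitude modulation |1 + exp(2*pi*i * h.t)| for two copies related by t."""
    phase = 2.0 * math.pi * sum(h * x for h, x in zip(hkl, t))
    return abs(1.0 + cmath.exp(1j * phase))

t_ncs = (0.48, 0.0, 0.0)  # hypothetical NCS translation, close to 1/2 along a
factors = {h: translation_factor((h, 0, 0), t_ncs) for h in range(1, 5)}
weak = [h for h, f in factors.items() if f < 1.0]  # reflections with odd h
```

With an exact translation of ½ the odd-h factor would be zero and the cell would truly be halved; because the translation is only approximately ½, the odd reflections are weak but present.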
Construction of the Sublattice: Cell Doubling
Basis vectors    Strong reflections    Patterson peak
a, b, c          hkl                   0, 0, 0
2a, b, c         h = 2n                ½, 0, 0
a, 2b, c         k = 2n                0, ½, 0
a, b, 2c         l = 2n                0, 0, ½
2a, b+a, c       h + k = 2n            ½, ½, 0
2a, b, c+a       h + l = 2n            ½, 0, ½
a, 2b, c+b       k + l = 2n            0, ½, ½
2a, b+a, c+a     h + k + l = 2n        ½, ½, ½
[Figure: unit cell diagrams with axes a, b, c]
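The table suggests a simple test: for each of the seven candidate sublattices, compare the mean intensity of reflections that violate the parity condition with the mean of those that obey it. The following is a sketch on synthetic data, not the LABELIT procedure; the function name and the intensity values are assumptions.

```python
# Parity conditions from the table, keyed by the doubled-cell basis vectors.
CONDITIONS = {
    "2a, b, c":     lambda h, k, l: h % 2 == 0,
    "a, 2b, c":     lambda h, k, l: k % 2 == 0,
    "a, b, 2c":     lambda h, k, l: l % 2 == 0,
    "2a, b+a, c":   lambda h, k, l: (h + k) % 2 == 0,
    "2a, b, c+a":   lambda h, k, l: (h + l) % 2 == 0,
    "a, 2b, c+b":   lambda h, k, l: (k + l) % 2 == 0,
    "2a, b+a, c+a": lambda h, k, l: (h + k + l) % 2 == 0,
}

def weak_to_strong_ratio(reflections, condition):
    """Mean intensity of reflections violating the condition over those obeying it."""
    obey = [i for (h, k, l), i in reflections if condition(h, k, l)]
    violate = [i for (h, k, l), i in reflections if not condition(h, k, l)]
    if not obey or not violate:
        return None
    return (sum(violate) / len(violate)) / (sum(obey) / len(obey))

# Synthetic data: reflections with h + k odd are systematically weak.
reflections = [((h, k, l), 100.0 if (h + k) % 2 == 0 else 5.0)
               for h in range(-3, 4) for k in range(-3, 4) for l in range(-3, 4)]

ratios = {name: weak_to_strong_ratio(reflections, cond)
          for name, cond in CONDITIONS.items()}
best = min(ratios, key=ratios.get)  # sublattice whose violators are weakest
```

Only the condition that matches the true pseudo-translation gives a ratio far below 1; the other six mix strong and weak reflections in both classes.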
Evidence for Cell Doubling in the Raw Data
[Figure: raw-data panels labeled Original Cell; Doubled a-axis; Doubled b-axis; Doubled c-axis; Pseudo A-face centered; Pseudo B-face centered; Pseudo C-face centered; Pseudo body-centered; one panel is marked with an asterisk]
Filtering out decoy signals
Should the lattice be reindexed by imposing pseudo A-centering … or … pseudo-body centering?
Statistical outlier rejection
[Figure: two histograms. Panel titles: “Distribution of peak-heights for the pseudo body-centered coset” and “Distribution of peak-heights for the pseudo A-centered coset”. x-axis of both panels: peak height of candidate spot. Annotations: Exponential Distribution; Gaussian Distribution; Outlier]
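One way to flag an outlier against an exponential background of peak heights is sketched below. The slide does not specify the test used, so everything here is an assumption: the scale is estimated from the median (robust to a few large values), and a peak is flagged when fewer than 0.01 such peaks would be expected among N candidates. The data are synthetic.

```python
import math
import statistics

def exponential_outliers(heights, max_expected=0.01):
    """Flag heights that are improbably large for an exponential distribution.

    The scale is estimated from the median (median = scale * ln 2). A height x
    is flagged when N * exp(-x / scale), the expected number of peaks at least
    that high, falls below max_expected (a hypothetical threshold).
    """
    scale = statistics.median(heights) / math.log(2.0)
    n = len(heights)
    return [x for x in heights if n * math.exp(-x / scale) < max_expected]

# Synthetic background: 50 evenly spaced quantiles of an exponential
# distribution with scale 20, plus one tall candidate peak.
background = [-20.0 * math.log(1.0 - (i + 0.5) / 50.0) for i in range(50)]
heights = background + [400.0]

outliers = exponential_outliers(heights)
```

The tallest background value (about 92) is still consistent with the exponential tail and is not flagged, whereas the peak at 400 is.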
More decoy signals to filter out
• Inadequate mosaicity model
• Mismatched or non-Bragg-like profile
In Summary
• There is still work to be done so that the most challenging cases can be processed automatically; these cases include samples with large unit cells (viruses), and crystals with pseudo-symmetry.
• While screening has been automated, the longer-term goal of automated dataset collection is only beginning to be addressed.
• Web-Ice has been successfully ported from SSRL to BCSB, and will be the focus of continued efforts at real-time data analysis, to enable better high-throughput data collection.
Acknowledgements
Nicholas Sauter
Billy Poon
Ralf Grosse-Kunstleve
Paul Adams
Peter Zwart
John Taylor
Yun Zhou
Ana González
Mike Soltis
Penjit (Boom) Moorhead
Jinhu Song
Ken Sharp
Scott McPhillips
Computational Crystallography Initiative at Lawrence Berkeley National Lab
Berkeley Center for Structural Biology at the Advanced Light Source
Stanford Synchrotron Radiation Lab