Faraday Discuss., 1994, 99, 287-310

Quantitative Analysis of Vibrational Circular Dichroism Spectra of Proteins

Problems and Perspectives

Petr Pancoska (a, b), Eduard Bitto (a), Vit Janota (a) and Timothy A. Keiderling (b)

(a) Department of Chemical Physics, Charles University Prague, Ke Karlovu 3, 121 16 Prague 2, Czech Republic

(b) Department of Chemistry, University of Illinois at Chicago, m/c 111, 845 W. Taylor St., Chicago, IL 60607-7061, USA

Experimental and computational aspects of the quantitative analysis of vibrational circular dichroism (VCD) of proteins are discussed. Experimentally, the effects of spectral resolution, sample concentration, cell selection and spectral normalization are considered. The influence of random intensity variations on the results of quantitative analysis of amide I' VCD is shown to be minor up to a 15% variation in spectral intensity. A computational algorithm, based on factor analysis of the spectra and multiple linear regression calculation of fractions of secondary structures (FC), was designed to analyse quantitatively the details of the VCD spectra-structure relationship. It also enabled the results of VCD measured independently for the amide I' and amide II regions to be combined. Our study is based primarily on the optimization of the calculation to predict FC values for proteins not included in the reference data set used for regression. The best prediction is obtained with the function using only part of the observable independent VCD spectral components. Inclusion of all components actually reduces the prediction accuracy of the analysis. Spectroscopic reasons for such behaviour and the consequences of the interdependence of the crystallographic FC values on the spectra-structure analysis are discussed. Finally, the possibility of utilizing VCD spectra to obtain quantitative structural information about the protein beyond the conventional secondary structure composition is explored. A matrix descriptor of super-secondary structure features for proteins is designed, and preliminary results for prediction of this descriptor from amide I' VCD spectra are presented. These latter calculations use a novel design of the back-propagation neural network.

Vibrational circular dichroism (VCD) spectroscopy can now be considered an established technique1-11 for studies of proteins in solution, with structural sensitivity fully comparable to, and sometimes superior to, the 'conventional' techniques of electronic circular dichroism (ECD) or FTIR spectroscopy. Combining the first-order dependence of rotational strength on the protein conformation with the structural sensitivity of IR frequencies enhances the ability of VCD to discriminate qualitatively among the spectral features corresponding to specific structural features. A new aspect of VCD spectra in comparison to ECD spectra is the possibility of combining information from more than one transition whose spectral manifestations are well resolved. Given the differences in the vibrational modes involved, these transitions might exhibit different dependences on the structure of the protein and thus offer complementary interpretations.

Published on 01 January 1994. Downloaded by University of Pittsburgh on 11/10/2014 05:30:38.

From our experience with the qualitative analysis of protein VCD, we know that the advantages described above largely outweigh the lower signal-to-noise (S/N) ratio and other experimental factors that otherwise tend to lower the quality of experimental VCD information for proteins. Naturally, it was anticipated that this rich qualitative information could be utilized to obtain a corresponding improvement for quantitative structural studies of proteins. This is an intensely exploited area of spectroscopic analysis of biomolecules for which ECD12-17 and FTIR18-21 spectra are often but not solely used. We instead found3 that, compared with the other spectral methods, quantitative analyses of VCD spectra do not show substantial improvements in terms of average error, but do have small improvements for specific features, such as β-sheet determination. In an attempt to investigate this situation further, we expanded our data set and added information from the amide II region. Further, we re-designed the algorithm used in the analysis to help determine the cause of this surprising result.

In this paper we discuss some aspects of our current experience with the quantification of this enhanced set of VCD spectra in terms of protein secondary structure. In the first part we enumerate some relevant experimental problems related to quantitative analysis of VCD. In the second part we report some of the results of our extensive analysis of protein VCD spectra in the 'classical' sense [i.e. transforming them into information about the fractional components (FC) of various secondary structures in a given protein]. In the third part of the paper we discuss some new possibilities of benefiting more fully from the intrinsic capabilities of this chiroptical technique by improving the structural descriptor to address 'super-secondary' structure or the connectivity and distribution of segments of uniform secondary structure. These latest studies used neural network models to establish a relationship between the new descriptor and the measured spectra.

Experimental

As a starting point for the quantitative structural analyses of proteins, one needs to obtain spectra with the correct intensity and bandshape (artifact free) for a defined protein molecular state. There are several areas where specific experimental aspects of VCD can affect the results. VCD is an inherently weak phenomenon for which it is moderately difficult to obtain spectra with a high S/N ratio for aqueous protein samples. In comparison to the related techniques of ECD and FTIR, this leads to longer data collection times (hours), which require high stability of the sample and instrument, and also to relatively high sample concentrations. Solvent interference significantly affects the light intensity throughput. This has two consequences: (i) frequency resolution may be limited, which can affect both spectral bandshape and intensity, and (ii) additional sample treatment such as H-D exchange is often necessary, which can lead to ambiguities in the sample states that are reflected in the VCD spectra. A summary follows of our established procedures and instrumental features that have been developed to minimize the effects of the above-mentioned experimental difficulties in protein analysis.

Spectrometers

In the UIC laboratory we use both dispersive and Fourier-transform (FT) VCD instruments, which are described in detail elsewhere. For the dispersive spectrometer, the spectral radiance of our 2500 K carbon rod source and the light intensity losses in the optical path of the instrument (monochromator, optical filters, polarizing and focusing optics) mean that a spectral resolution of 9-10 cm⁻¹ is optimal for VCD in terms of S/N. With our FTIR-based VCD instrument,22 using a BioRad FTS 60A step-scan interferometer with a standard SiC light source, 8 cm⁻¹ spectral resolution was again found to be a reasonable practical compromise for routine protein measurements.

Cells

For protein studies, we use demountable BaF2 or CaF2 cells with Teflon or Mylar spacers providing optical pathlengths of 6 μm or higher. The shortest ones are required for H2O-based solutions.11 Recently, we switched to Specac Analytical 1400 'flow-in' cells, allowing us to record baseline and sample spectra without cell demounting.22 This ensures consistency in the optical properties of the cell (pathlength, possible stress-induced birefringence upon assembly and window orientation) during the whole experiment, which is a distinct improvement over the demountable cells used previously.

Sample Preparation

Typical protein concentrations used for VCD measurement are 5 mg per 100 μl of solvent in D2O and 20 mg per 100 μl in H2O. For study of D2O solutions, the proteins are normally H-D exchanged by freeze or vacuum drying of the protein solution, redissolving in D2O and repeating three times. While it is clear that this procedure cannot completely N-deuteriate a globular protein and, in fact, will lead to different levels of deuteriation with different proteins, it was chosen as a uniform method of handling all proteins used in our systematic analyses. Recently, we have completely deuteriated some test-case proteins using thermal or chemically induced unfolding and subsequent re-folding of the protein in D2O.22 These procedures depend on there being a reversible means of unfolding the protein. These tests are important because D-H exchange induces observable spectral effects even in the amide I VCD. This might lead one to prefer H2O as a solvent for quantitative VCD studies of proteins. On the other hand, the higher protein concentrations required for H2O solution measurements lower the S/N ratio of the resultant VCD spectra. Furthermore, difficulties in the accurate water band subtraction necessary for the absorbance measurements, which are used for normalization of the VCD in H2O, may well cancel the advantage of avoiding the ambiguities that accompany deuterium exchange. At the same time, the variable deuteriation effects on the VCD may contain structural information beyond the typically sought secondary structural parameters, as suggested in the previous paper in this conference by Keiderling et al. Further studies of these effects are currently under way in the UIC laboratory.

The high sample concentration required for VCD can, for some proteins, lead to aggregation. This is a particular problem for studies carried out under specific experimental conditions (high or low pH, presence of salts etc.) that might be desirable for other purposes. For our systematic quantitative study of protein VCD, which is measured for samples in pure solvent, we must assume that any aggregation (if present) will not substantially affect the results of the analysis. This, of course, could be a limitation, but as of now little variation in VCD has been detected for studies over the small range of concentrations available to us. Furthermore, for a few globular protein test cases, we have been able to obtain qualitatively the same VCD bandshape on films as for solution samples. This lack of concentration dependence will be valid only for proteins in the same intramolecular state, which means that a change in local conformation or degree of deuteriation would lead to a change in VCD. However, proximity to another protein is not likely to be a significant perturbation on the VCD owing to its short-range length dependence.

Spectral Processing

Concentration error can be a source of substantial intensity error in VCD. Unfortunately, the information required for conversion of raw VCD spectra into molar quantities is often incomplete and is affected by sample purity and the error in the determination of the sample pathlength in the cell. To minimize the total intensity error, we therefore normalize the VCD spectra to the IR absorbance of the measured sample. This is not a perfect method. Molar absorption coefficients vary among structural units. For structured (multimaxima) absorption peaks, the intensity at the maximum may not be the proper concentration descriptor, which could be compensated for by normalization to the total area under the absorbance envelope. An area normalization, in turn, requires some method of deconvolution and subsequent band selection to identify the components of interest in the regions of spectral overlap, which involves a mathematical operation that could also affect the concentration descriptor. For H2O solutions, the absorbance of the protein is available only after water band subtraction. Since the water band is the dominant absorption by a factor of 3-10 in the amide I band region, subtraction can introduce additional error into the normalized VCD intensity. Despite all these problems, we believe that, with careful work, the inevitable intensity error is not the main source of difficulty in our quantitative application of VCD to protein structure determination.
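The two normalization choices discussed above, scaling by the absorbance maximum or by the area under the absorbance envelope, can be illustrated with a minimal Python sketch. The function name `normalize_vcd`, its arguments and the trapezoidal integration are assumptions made for illustration only; the text does not say how the normalization was coded, and no deconvolution, band selection or water subtraction is attempted here.

```python
import numpy as np

def normalize_vcd(vcd, absorbance, wavenumber, mode="peak"):
    """Scale a VCD spectrum by the IR absorbance of the same sample.

    mode="peak": divide by the absorbance maximum.
    mode="area": divide by the trapezoidal area under the absorbance band.
    """
    vcd = np.asarray(vcd, dtype=float)
    absorbance = np.asarray(absorbance, dtype=float)
    wavenumber = np.asarray(wavenumber, dtype=float)
    if mode == "peak":
        scale = absorbance.max()
    elif mode == "area":
        # Trapezoid rule written out; abs() copes with descending axes.
        steps = np.abs(np.diff(wavenumber))
        scale = np.sum(0.5 * (absorbance[1:] + absorbance[:-1]) * steps)
    else:
        raise ValueError("mode must be 'peak' or 'area'")
    if scale <= 0:
        raise ValueError("absorbance scale must be positive")
    return vcd / scale
```

For a band with several maxima the two modes generally give different scale factors, which is the ambiguity the paragraph above points out.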

To test the validity of this assumption, quantitative analyses were performed on protein data from the training set with their VCD spectra artificially altered in intensity. Multiple regression calculations with the best selection of subspectra (see next section) for helix and sheet were performed. All spectra in these calculations had their intensity multiplied by a computer-generated random number selected from an interval systematically varied from 0 to 80% of the maximal intensity difference in the set. The dependence of the standard deviations of FC values, fit to the α-helix and β-sheet fraction, on the extent of the (random) intensity perturbation is shown in Fig. 1 for analyses based on the amide I' VCD. It is seen that random fluctuations up to 15-20% do not significantly affect the error characteristics of the fit. This implies that the algorithms we use for VCD structural analysis are only dependent on overall intensity in higher order.
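A perturbation test of this kind can be sketched in Python as follows. This is one possible reading of the procedure, not the authors' code: each spectrum is multiplied by its own factor drawn uniformly from 1 ± `fraction`, whereas the interval in the text is defined relative to the maximal intensity difference in the set. The name `perturb_intensities` is hypothetical.

```python
import numpy as np

def perturb_intensities(spectra, fraction, rng):
    """Multiply each spectrum (row) by one random scale factor.

    Factors are drawn uniformly from [1 - fraction, 1 + fraction];
    this interval is an assumption made for the sketch.
    """
    spectra = np.asarray(spectra, dtype=float)
    factors = rng.uniform(1.0 - fraction, 1.0 + fraction,
                          size=spectra.shape[0])
    return spectra * factors[:, None], factors
```

Repeating the regression on the perturbed spectra for a series of `fraction` values and recording the standard deviation of the fit would give a curve of the kind shown in Fig. 1.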

Reference Structural Data

Our protein VCD spectra were collected systematically on a set of 28 proteins (see Table 1) of which 23 have known X-ray structure. This protein subset (which we refer to as the training set) is the source of the reference structural information for our quantitative analyses. Atomic coordinates from Brookhaven Protein Data Bank (PDB) files were input into the Kabsch and Sander program for determination of secondary structures in proteins (DSSP)26 to obtain the structural parameters used for spectral correlation. From the standard output of this program, the following reductions were made. Both α-helical (H) and 3₁₀-helical (G) structures were grouped into one class, denoted as helical (H). Parallel and antiparallel β-sheet structures are not distinguished in the program and, consequently, form a single class (S). For testing purposes we further selected bends (B) (code S in the DSSP output, structures with no hydrogen bonds and a Cα-Cα dihedral angle <74°) and turns (T) (code T in DSSP output, structures with a roughly helical dihedral angle for less than four amino acids but with a deficit in ideal helical H-bonding). All other amino acid residues were assigned to a class we prefer to refer to as 'other' (C) since it lacks any common descriptor. Note that this is not equivalent to the commonly referred 'random coil' in the polypeptide literature, though it may have spectral characteristics in common with that structure.27

Fig. 1 Dependence of the absolute standard deviation of fit of the FC values to amide I' spectra on a random intensity variation of increasing magnitude, shown for FC helix and FC sheet (horizontal axis: amide I' intensity variation, 0-80%).

Table 1 Proteins used in our training set

protein                     species              source             PDB file
alcohol dehydrogenase       horse liver          Fluka 05648        4ADH
carbonic anhydrase          bovine erythrocytes  Sigma C-7500       1CA2
α-chymotrypsinogen A        bovine pancreas      Sigma C-4879       2CGA
α-chymotrypsin type II      bovine pancreas      Sigma C-4129       5CHA
concanavalin A              jack bean            Sigma C-2020       3CNA
cytochrome c                tuna                 Sigma C-2011       1CYT
tosyl elastase              porcine pancreas     Sigma E-0127       3EST
glutathione reductase       wheat germ           Sigma G-6004       2GRS
haemoglobin                 human                Sigma H-7379       1HCO
A-immunoglobulin            human                Fluka 56834        1REI
lactate dehydrogenase       rabbit               Calbiochem 427217  4LDH
lysozyme                    hen egg white        Sigma L-6876       7LYZ
myoglobin                   horse heart          Sigma M-1882       1MBN
papain                      papaya latex         Sigma P-4762       8PAP
rhodanese                   bovine liver         Sigma R-4125       1RHD
ribonuclease A              bovine               Sigma R-5125       1RN3
ribonuclease S              bovine pancreas      Sigma R-6000       1RNS
subtilisin BPN'             bacterial            Sigma P-8038       1SBT
superoxide dismutase        bovine erythrocytes  Fluka 86200        2SOD
thermolysin                 bacterial            Sigma P-1512       2TLN
triose phosphate isomerase  yeast                Sigma T-2507       1TIM
trypsin inhibitor           soybean              Sigma T-9003       3PTI
trypsin                     bovine pancreas      Sigma T-8253       3PTN
albumin                     bovine serum         Sigma A-0281       -
lactoferrin                 human milk           Sigma L-5665       -
α-lactalbumin               bovine milk          Sigma L-5385       -
β-lactoglobulin A           bovine milk          Sigma L-7880       -
thaumatin                   bacterial            Sigma T-7638       -
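The class reduction described in this section can be sketched in Python. The sketch assumes the per-residue DSSP codes are already available as a string; the mapping of DSSP code E (extended strand) to the sheet class is an assumption, since the text does not name the DSSP code used for sheet, and the function name `fractional_composition` is hypothetical.

```python
from collections import Counter

# DSSP code -> class used in the text. "E" -> sheet is an assumption;
# every code not listed here falls into the 'other' class C.
CLASS_OF = {"H": "H", "G": "H", "E": "S", "S": "B", "T": "T"}

def fractional_composition(dssp_codes):
    """Return FC values (fractions of residues) for classes H, S, B, T, C."""
    if not dssp_codes:
        raise ValueError("empty DSSP string")
    counts = Counter(CLASS_OF.get(code, "C") for code in dssp_codes)
    n = len(dssp_codes)
    return {cls: counts.get(cls, 0) / n for cls in "HSBTC"}
```

Because every residue lands in exactly one class, the five FC values of a protein sum to one, which is the interdependence of the reference values that the abstract mentions.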

Mathematical Aspects

General Methodology

In a formal sense, the VCD structural information is encoded in the variability of the (continuous) spectra recorded for proteins with different structures. On the other hand, the reference information, quantifying the protein structural features, originates in some transformation of protein atomic coordinates (usually from X-ray data) and is thus quite discrete in nature. A mathematical model of use for quantitative spectral analysis should be able to convert the essentially continuous spectral data into the discrete structural representation and to find the proper mapping between them.

For VCD spectra, there is another condition to be met: the underlying band frequencies of the components of the VCD (and, by extension, IR) spectra also carry structural information.8 Moreover, VCD information derived from different IR bands (amide I' and II in the case of proteins) can be independently obtained and analysed, and the analyses combined. The mathematics used for the analysis must thus be capable of emphasizing some frequency regions of the evaluated spectrum at the expense of others when quantifying different structural features of the protein. This can be done numerically by application of an appropriate weighting scheme or by using a 'bandshape representation' of the frequency variability in the form of frequency-shifting spectral components (e.g. with a derivative-like bandshape for a single band), which can represent the frequency variability in the form of an additive basis spectrum.

The above-discussed VCD features necessitate the implementation of selectivity into the analysis algorithms. By contrast, computational schemes applied to date for quantitative analyses of ECD spectra assumed that the entire information in the protein ECD spectrum should be used to evaluate the content of any structural component. This might not be a valid assumption for VCD spectra, and better performance can, in principle, be obtained by discriminative usage of the various spectral components. Our computational scheme described below has allowed us to test this conjecture.

Formally, the tendency to use the complete ECD spectrum (even if transformed into the linear combination of its independent components) for correlation with the protein structural descriptors can be related to the early orientation of the quantitative studies, which were directed towards obtaining the best fit of the reference FC data. Inclusion of more spectral components into the fitting algorithm increases the number of adjustable parameters in the transformation function which, in turn, trivially improves the error of fit. The ultimate goal of these methods, of course, should be the prediction of structural descriptors for a protein with unknown conformation from its spectra. This aspect of quantitative ECD and other spectral-structure correlation studies, though recognized early,12,13 has been critically explored more recently.14 For empirical relationships between independent variables and the noisy reference data, it is not guaranteed that the mapping providing the best fit will, at the same time, provide the best predictions. We discuss this aspect later in relation to the results of our calculations.

There are two popular computational schemes employed to find a relationship between continuous spectral and discrete reference structural information. One uses a decomposition of the spectra into basis component spectra followed by regression-based transformation of this representation of experimental spectra (in terms of the contribution of each component) into the protein structural descriptor. An alternative possibility is to use neural network mapping, which has the inherent capability of processing relationships between continuous and discrete data. Neural networks are based on a supervised weighting scheme that, with the proper choice of network topology, as discussed below, can adapt to the dispersal of structural information as spectrally manifested. In the next section we describe how we have implemented the above-discussed requirements for structural prediction into our design of these two mathematical formalisms to assess the quantitative information in the VCD spectra of proteins.

Fit and Prediction Calculations

We used two different calculation 'strategies'. One we call 'fit' because it corresponds to the goal of finding the best projection of the spectral information onto the reference structural descriptors. For our data, the coefficients of the multiple linear regression equations relating FC values to spectral parameters or, alternatively, the weights of the neural network, are optimized to minimize the corresponding error function. The goodness of fit is based on the differences between the values found after optimization and the corresponding X-ray crystallographic structural descriptor. The 'fit' (which, in neural network terminology, is the 'training') calculation is a necessary first step in developing the prediction-based strategy described next.

In the 'prediction' calculations we tested the ability of various models, optimized on the basis of 'fit', to determine structural descriptors from the spectra of a protein that was not included in the optimization. To relate the prediction results to the actual protein structure, we generated 23 subsets of the training set by systematically leaving out one 'known' protein. The 'fit' optimization and prediction calculations were therefore repeated 23 times, based on 23 different 22-member training subsets (each differing by one protein). The prediction of the structural descriptor for each protein when left out of the training set was then compared to the crystallographic parameters. Error characteristics of the 'prediction' calculations were combined from the 23 independent calculations to give a measure of the prediction quality of the model.
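The leave-one-out scheme just described can be sketched in Python for the regression case. This is a minimal illustration, assuming an ordinary least-squares model with an intercept; the function name `leave_one_out_errors` and the array layout (one row of spectral parameters per protein) are choices made for the sketch.

```python
import numpy as np

def leave_one_out_errors(X, y):
    """Refit a linear model without protein i, then predict protein i.

    X: (n_proteins, n_params) spectral parameters, y: (n_proteins,) FC values.
    Returns predicted minus reference FC for every protein left out in turn.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Design matrix with an intercept column for the reduced set.
        A = np.column_stack([np.ones(keep.sum()), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        errors[i] = coef[0] + X[i] @ coef[1:] - y[i]
    return errors
```

With 23 proteins the loop performs 23 fits on 22-member subsets; the root mean square of the returned errors is one possible summary of prediction quality.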

Factor Analysis and Complete Selective Multiple Linear Regression Calculation

The classical approach for transforming the continuous spectral information into a discrete representation of its underlying bandshape variability is to express each spectrum as a linear combination of basis spectra. These can be either 'external', for example, the spectra derived from model polypeptide measurements, or 'internal', generated by some mathematical transformation of the training set spectra. The concept of internal basis spectra (we will call them subspectra for simplicity) has been used in the majority of recent analyses of protein ECD. They are often (but not necessarily) constructed to be orthogonal and can be combined through the coefficient matrix [C_ij] into the experimental spectra [θ_i(ν)] according to the following equation:

$$\theta_i(\nu) = \sum_{j=1}^{p} C_{ij}\,\Phi_j(\nu) \qquad (1)$$

In our study, the subspectra Φ_j(ν) (j = 1, ..., p) and coefficients [C_ij] are calculated analytically as principal components using a factor analysis algorithm.† This method combines the correlated intensity variances that are of roughly a similar order of magnitude into each subspectrum in the order of decreasing significance. In this way, the number (p) of subspectra necessary to describe the variability of the experimental spectra is reduced. Mathematically, this reduction in the number of subspectra is equivalent to finding the rank of the matrix [θ_i(ν)] or to determining the number of linearly independent column vectors (subspectra) in the experimental data. This determination is affected by the noise level in the spectral data.28 In our study, p was chosen such that, using eqn. (1), the subspectra could reproduce >98% of the total variance in the spectral intensities.‡ For the amide I' and II VCD data this criterion was met by six subspectra.§
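The decomposition of eqn. (1) and the 98% criterion can be illustrated with a short Python sketch. Note the swap: it uses a singular value decomposition of the raw spectral matrix rather than the correlation-matrix factor analysis used in the study, so the variance fractions it reports are not guaranteed to match those of the paper. The name `subspectra` and the row-per-protein layout are assumptions.

```python
import numpy as np

def subspectra(spectra, variance_fraction=0.98):
    """Decompose spectra (n_proteins x n_points) as C @ Phi, as in eqn. (1).

    Keeps the smallest number p of components whose cumulative share of
    the squared singular values exceeds variance_fraction.
    """
    theta = np.asarray(spectra, dtype=float)
    U, s, Vt = np.linalg.svd(theta, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First index whose cumulative share is strictly above the threshold.
    p = int(np.searchsorted(explained, variance_fraction, side="right")) + 1
    C = U[:, :p] * s[:p]      # coefficients C_ij, one row per protein
    Phi = Vt[:p]              # orthonormal subspectra Phi_j(nu)
    return C, Phi, explained
```

Each row of `C` is then the p-component vector that characterizes one protein, and `C @ Phi` reproduces the spectra to within the discarded variance.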

† Alternatively, the singular value decomposition computational scheme or other projecting algorithms16 can be used.

‡ In the factor analysis scheme, this percentage of total input variance is quantified by the percentage of the cumulative sum of the eigenvalues of the correlation matrix of the analysed spectra, as related to the trace of this matrix (which is the number of analysed spectra).28

§ For electronic CD spectra collected on the same set of protein samples, five subspectra were found to be sufficient to meet the same criterion, which agrees with reports of other researchers.

In eqn. (1), the coefficients [C_ij] provide discrete representations of the variability of the experimental VCD, uniquely characterizing each protein sample. This representation has the convenient form of a p-component vector, in which each component represents a different (and independent) spectral feature that can be visualized by the bandshape of the related subspectrum. This allows easy implementation of a selective algorithm for determination of FC values. For the transformation of the coefficients [C_ij] of eqn. (1) into FC values, we chose to use a series of multiple linear functions of the form:

$$(FC_i)^{n} = a^{n} + \sum_{k} b_{k}^{n}\, C_{ik} \qquad (2)$$

Here (FC_i)^n are FC values calculated from regression with n coefficients, a^n are the corresponding intercepts and b_k^n are 'weights' of individual subspectral contributions, which are optimized in a least-squares multiple linear regression scheme; the sum runs over the n selected coefficients. The selective aspect of our calculations arises by letting n, the number of coefficients selected for a particular regression, vary from 1 to p, the full range of k in C_ik. For each n, all possible combinations of C_ik values consistent with n are tested. This algorithm provides us with multiple transformations of subspectral coefficients into the FC values, in which different spectral features (subspectra), and all possible combinations of them, are tested as the source of structural information.† Based on the values of the multiple regression coefficients r, calculated for each fit, the best-fitting combination of coefficients, C_ik, was selected for each n (n = 1, 2, ..., p). In the 'prediction' calculations, only these selected best regression equations were tested for their ability to predict the FC values from the VCD spectrum for an 'unknown' protein. This substantially reduces the calculation time and can be rationalized by the assumption that the best positioned regression hyperplane will be minimally affected by the changes of the training set that are necessary for the prediction calculations.
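The exhaustive subset search described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the name `best_subsets` is hypothetical, the regression is ordinary least squares with an intercept, and r is computed as the square root of the explained fraction of the variance in the reference FC values.

```python
from itertools import combinations
import numpy as np

def best_subsets(C, fc):
    """For each n = 1..p, return the coefficient subset with the highest r."""
    C = np.asarray(C, dtype=float)      # (n_proteins, p) subspectral coefficients
    fc = np.asarray(fc, dtype=float)    # (n_proteins,) reference FC values
    n_proteins, p = C.shape
    total = np.sum((fc - fc.mean()) ** 2)
    best = {}
    for n in range(1, p + 1):
        for subset in combinations(range(p), n):
            # Design matrix: intercept column plus the selected coefficients.
            A = np.column_stack([np.ones(n_proteins), C[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(A, fc, rcond=None)
            rss = np.sum((fc - A @ coef) ** 2)
            r = np.sqrt(max(0.0, 1.0 - rss / total))
            if n not in best or r > best[n][1]:
                best[n] = (subset, r, coef)
    return best
```

For p = 6 the loop runs 2^6 - 1 = 63 regressions per structural class. Only the winning subset for each n would then be carried into the leave-one-out prediction test.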

Prediction Testing in Factor Analysis Calculations

In the next step, all the multiple regression functions [eqn. (2)], which depend on n = 1, 2, ..., p of the C_ik coefficients, were reoptimized 23 times, once for each reduced (22-member) training set. These equations were then used to predict the FC values for the protein left out of the regression step using its C_ik coefficients. These predicted FC values were then compared with the X-ray determined FC values for the protein and, from the resulting differences accumulated over all 23 predictions, the error descriptors for the prediction calculation were determined for a given n.

Combination of the Information from Amide I' and Amide I1 Regions

Selective multiple linear transformation of $C_{ik}$ values into FC descriptors has another computational advantage: we can easily combine the structural information derivable from the variability of VCD over both IR frequency regions studied (amide I' and II). First, in accordance with our 98% criterion for $p$ selection described above, we truncated the spectral coefficient vectors $[C_{ij}]$ originating from the individual factor analyses of the amide I' $[{}^{\mathrm{I}}C_{ij}]$ and amide II $[{}^{\mathrm{II}}C_{ij}]$ VCD to contain only six coefficients for each protein $i$ (representing the most significant subspectra in both regions). These vectors were then combined into a (12 x 23) coefficient matrix $[{}^{\mathrm{I}}C_{ij}, {}^{\mathrm{II}}C_{ij}]$. The truncation step effectively eliminates the contributions of the subspectra most heavily affected by experimental noise. At the same time, it substantially reduces the number of combinations necessary for complete testing of all the possible $n$-coefficient equations [eqn. (2)]. The matrix $[{}^{\mathrm{I}}C_{ij}, {}^{\mathrm{II}}C_{ij}]$ was then subjected to the complete regression scheme as described above.‡

† As an illustration, in the combined analysis of the amide I' and amide II VCD spectra, which are described by 12 (= 6 + 6) $C_{ij}$ coefficients, the complete set of 4082 combinations was tested in five regression cycles for the five different sets of FC values.

‡ The results of calculations combining the information from both IR regions with that from ECD in the matrix $[{}^{\mathrm{I}}C_{ij}, {}^{\mathrm{II}}C_{ij}, {}^{\mathrm{ECD}}C_{ij}]$, involving 6 + 6 + 5 coefficients, will be described in detail separately.
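The size of the combined search can be checked with a few lines of arithmetic. The sketch below stacks two placeholder coefficient sets and counts subsets; the array names and the helper `n_combinations` are hypothetical. Note that the full count for 12 coefficients is 4095, and that the quoted figure of 4082 equals the count for subset sizes 1 to 10; that reading is an inference from the arithmetic, not something the text states.

```python
from math import comb

import numpy as np

# Hypothetical stand-ins for the two truncated coefficient sets (23 proteins x 6).
C_amide_I = np.zeros((23, 6))
C_amide_II = np.zeros((23, 6))
C_combined = np.hstack([C_amide_I, C_amide_II])  # 23 proteins x 12 coefficients


def n_combinations(p, n_max):
    """Number of distinct coefficient subsets of size 1..n_max drawn from p."""
    return sum(comb(p, n) for n in range(1, n_max + 1))


all_subsets = n_combinations(12, 12)    # every non-empty subset of 12 coefficients
up_to_ten = n_combinations(12, 10)      # subsets with at most 10 coefficients
```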

Design of Neural Network Prediction Training Algorithm

Neural network transformation of VCD spectra into a novel matrix descriptor of protein structure (described in a later section) was performed using the NeuralWorks Professional II/Plus package (v. 5.0, NeuralWare, Inc., Pittsburgh, PA). The back-propagation paradigm for supervised training of the network was used. We used a hyperbolic tangent as the transfer function and the delta or normalized cumulative delta rule for error function updating.29 As the input information, we directly used 50 normalized amide I' VCD intensities at equidistantly selected frequency points. This corresponds to a frequency spacing of 4 cm⁻¹, which is less than the spectral resolution of the dispersive experimental setup used to generate the training set data, so we believe that no substantial experimental information is lost.

The training of the network was performed 23 times on the reduced 22-member training sets, with the left-out protein used as the basis for tests of the network's prediction abilities. After every 1000 iteration steps of the learning cycle, the weight adjustment was interrupted, the test protein spectrum was input into the network, and the predicted values of the structural descriptor were compared with the X-ray ones. If the deviation of the predicted values from the X-ray ones was smaller than in the previous tests (and only in this case), the complete network was saved to disk. The training cycle was then allowed to proceed by another 1000 steps to the next prediction test. The iteration was stopped after no improvement in prediction was observed in 10 consecutive test cycles. After that, the best predicting network was re-loaded and used to calculate the final prediction of the tested protein's structural descriptor from its amide I' VCD spectrum. In this way, the networks can be considered to have been trained according to the 'best prediction' strategy discussed above. This procedure was repeated 23 times, once for each training subset. The deviations of the predictions from the crystal structure values for the proteins left out of the training were then combined to determine the average error characteristics of the network prediction capabilities.
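The control flow of this 'best prediction' training can be sketched independently of any particular network package. The following is a minimal sketch under stated assumptions: `net`, `train_step` and `test_error` are hypothetical placeholders for the network object, one weight update and the prediction test on the left-out protein; it does not reproduce the NeuralWorks software.

```python
import copy


def train_with_best_prediction(net, train_step, test_error, block=1000, patience=10):
    """Train in blocks of `block` steps, keeping the best-predicting network.

    net        : any mutable network object (hypothetical)
    train_step : callable performing one weight update on `net` in place
    test_error : callable returning the prediction error for the left-out protein
    """
    best_err = float("inf")
    best_net = copy.deepcopy(net)
    stale = 0  # consecutive test cycles without improvement
    while stale < patience:
        for _ in range(block):
            train_step(net)
        err = test_error(net)
        if err < best_err:
            # Save the network only when the prediction has improved.
            best_err, best_net, stale = err, copy.deepcopy(net), 0
        else:
            stale += 1
    # The saved (best-predicting) network is the one that is "re-loaded".
    return best_net, best_err
```

The defaults `block=1000` and `patience=10` mirror the 1000-step interval and the 10 consecutive test cycles quoted in the text.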

Design of the Neural Network Topology

Traditionally, the back-propagation network consists of an input layer (in our case composed of 50 neurons, corresponding to the amide I' VCD intensity at 50 frequency points), an output neuron layer (in our case this could be the five FC values or elements of the matrix descriptor to be described in the last section) and one or more hidden layers. The multiplicity of the hidden layers and the number of neurons in them (together with the number of neurons in the input and output layers) is called the network topology and has a decisive role in the algorithm performance.

We tested two network topologies. The first was classical, with input and output layers as described above and with two fully connected hidden layers of 13 and 10 neurons. The second topology, which we will refer to as the A-B branch network, was designed to allow the network to evaluate the input VCD spectra in two different branches with different weight-updating strategies. The resulting two different compressions of the spectral input were then combined into the values of the protein structural descriptor by fully connecting the outputs of the two disjoint hidden layers to the output neurons.

The numbers of neurons used in these calculations were set after several test training cycles with the same network designs but with larger numbers of neurons in all hidden layers. The final design was set by eliminating those neurons that were not carrying relevant information. For this purpose we first used the Hinton diagram29 to visualize the synaptic weights, then checked the numerical values in the trained network and eliminated those with $|w_{ij}|^2 < 0.01$. A final cycle of test calculations was done with the optimized topologies. We found that the best information throughput in the A-B branch network was achieved by using the delta rule for error function updating in branch A and the normalized cumulative delta rule in branch B.† For the classical design of the network topology, the normalized cumulative delta rule was found to be optimal.
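The weight-based elimination step can be illustrated with a short sketch. It assumes one reading of the criterion, namely that a hidden neuron is dropped when every incoming weight or every outgoing weight has a squared value below 0.01; the function name and the matrix layout are hypothetical.

```python
import numpy as np


def prune_hidden_neurons(W_in, W_out, threshold=0.01):
    """Drop hidden neurons whose weights all satisfy w**2 < threshold.

    W_in  : (n_inputs, n_hidden) weights into the hidden layer
    W_out : (n_hidden, n_outputs) weights out of the hidden layer
    """
    carries_input = (W_in**2 >= threshold).any(axis=0)
    carries_output = (W_out**2 >= threshold).any(axis=1)
    keep = carries_input & carries_output
    return W_in[:, keep], W_out[keep, :], keep
```

A neuron that receives or passes on only negligible weights contributes almost nothing to the output, so removing it leaves the network response nearly unchanged while reducing the number of free parameters.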

Error Characteristics

For characterization of both fit and prediction error in the determination of the structural descriptor for individual proteins we use the average deviation, $d_i$, defined as

\[ d_i = \frac{1}{N_e} \sum_{j} \left| {}^{s}\mathrm{FC}_{ij} - {}^{x}\mathrm{FC}_{ij} \right| \tag{3} \]

Here, $N_e$ is the number of elements used in the structural descriptor (for FC values it is 5: the H, S, B, T and C fractions), $i = 1, 2, \ldots, 23$; $j = 1, 2, \ldots, N_e$; and ${}^{s}\mathrm{FC}_{ij}$ and ${}^{x}\mathrm{FC}_{ij}$ are elements of the structural descriptor obtained from VCD spectra and calculated from the Kabsch-Sander reduction of the X-ray crystal structures, respectively. For characterization of the error in the determination of a particular element of the structural descriptor (e.g. $\mathrm{FC}_{\mathrm{helix}}$ values) we compute the standard deviation, $\sigma_j$

(4)

where $N = 23$ is the number of proteins considered, $i = 1, 2, \ldots, N$, and the other symbols have the same meaning as above. For ease of comparison, we use the relative standard deviation, which is the ratio (in %) of the standard deviation [eqn. (4)] to the dynamic range of the corresponding element of the protein structural descriptor:

\[ \sigma_{\mathrm{rel},j} = 100\,\frac{\sigma_j}{\max_i {}^{x}\mathrm{FC}_{ij} - \min_i {}^{x}\mathrm{FC}_{ij}} \tag{5} \]
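These error descriptors can be sketched numerically as follows. The body of eqn. (4) is not legible in this copy, so the root-mean-square form with a 1/N normalization used below is an assumption (an N - 1 normalization is equally possible), as is taking the dynamic range from the X-ray values; the function and array names are hypothetical.

```python
import numpy as np


def error_descriptors(fc_vcd, fc_xray):
    """Average deviation per protein and (relative) standard deviation per element.

    fc_vcd, fc_xray : (n_proteins, n_elements) arrays of descriptor values
    """
    diff = fc_vcd - fc_xray
    d = np.abs(diff).mean(axis=1)                # eqn. (3): one value per protein
    sigma = np.sqrt((diff**2).mean(axis=0))      # assumed form of eqn. (4)
    dynamic_range = fc_xray.max(axis=0) - fc_xray.min(axis=0)
    sigma_rel = 100.0 * sigma / dynamic_range    # relative standard deviation in %
    return d, sigma, sigma_rel
```

Normalizing by the dynamic range makes errors comparable between structure types whose fractions span very different ranges in the training set.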

Results

Here, we summarize the results of fit and prediction calculations performed on the amide I' data, the amide II data and the combined data from both regions. The results of this part of the study then form the basis for a critical evaluation of the information content in both the FC reference data and the VCD spectra. The evaluation provides a motivation for development of a new type of protein structure descriptor, which will be outlined in the last section together with the preliminary results of the neural network determination of its parameters.

Error Characteristics of Factor Analysis and Complete Multiple Linear Regression Calculations

Fit

Increasing the number of subspectral coefficients, $C_{ik}$, included in the regression functions [eqn. (2)] resulted in a continuous decrease in the standard deviation of the spectrally derived FC values. This was observed in the analyses of both amide frequency regions and for all secondary structures, as it must be. However, note that for the combined amide I' + amide II calculation, we observed lower standard deviations with the same number of parameters than were obtained in fits to data from the individual regions alone. For example, for the amide I' fit, the relative standard deviation for $\mathrm{FC}_{\mathrm{helix}}$ decreases from 13.5% for a single-coefficient regression function [eqn. (2), $n = 1$] to only 12.5% for the function containing six $C_{ik}$ coefficients. At the same time, the relative standard deviation of the six-coefficient amide I' + II fit is improved to 9.7%. Similarly, improvements of between 4 and 9% in the relative standard deviation of six-coefficient fits were observed with the combined calculation for the other secondary structure types. In terms of precision, there are two classes of fit to secondary structure types: the first for helix, sheet and 'other', with optimal relative errors of ca. 10%, and the second for the much less well defined 'bend' and 'turn' structures, whose relative fit errors are ca. 20%. For 'turn', the best multiple-regression relationships are statistically insignificant even at a low confidence level (< 85%).

† Practically, this results in different timing of the synaptic weight updates in the two branches, which introduces a 'phase shift' in the delivery of information from the two branches into the output layer. This is very similar to the physiological transmission of the three-dimensional perception of hand movement, where the full information is transferred only if there are two fully operational neural connections from the hand, one transmitting directly to the brain, the other with a phase delay through the spinal cord. Actually, this was the original idea behind the A-B branch topology design.

Prediction

For the amide I', II and I' + II analyses, Fig. 2(a), (b) and (c), respectively, summarize the dependence of the relative standard deviations of FC predictions on the number of subspectral coefficients included in the regression functions [eqn. (2)] for the three best represented secondary structures, i.e. helix, sheet and 'other'. Not surprisingly, the errors plotted in Fig. 2 are larger than the respective fitting errors, as indicated by the more limited comparison in Table 2 for just the best predicting sets. The most striking result of these analyses is that, in all cases, the best prediction is generally achieved with a much smaller number, $n$, of subspectral coefficients than the maximal number, $p$. The forms of the best prediction equations, which yielded the minimum $\sigma_{\mathrm{rel}}$ values in Fig. 2, are indicated by Xs in Table 2. Interestingly, the minima in the dependence of $\sigma_{\mathrm{rel}}$ on $n$ for all secondary structures are more obvious in the combined analysis than they are for the individual regions. Another important result is that for the combined amide I' + II calculation we generally observe an improvement in the prediction error compared with predictions utilizing only the information from individual regions.

Comparison of Individual and Combined Analyses

The combination of VCD subspectra from the amide I' and II regions [see Fig. 2(c)] results in improvement of the prediction of the FC values in all cases except 'turn', which stays the same. The results reveal that helix and sheet fractions are predictable with a relative standard deviation between 15 and 20% from VCD spectra in individual regions. The improvement of the helix and sheet predictions with the combined amide I' + II data set is only ca. 5%. In contrast, the 'other' fraction is relatively poorly represented using individual-region spectra, but we observe the largest improvement (ca. 10% in $\sigma_{\mathrm{rel}}$) in its prediction upon combination of amide I' and II information. Bend and turn structures are characterized by the largest prediction errors; $\sigma_{\mathrm{rel}}$ in both cases is greater than 20%, with only small improvement (ca. 3-5%) upon combining amide I' and II.

Fig. 2 Dependence of the relative standard deviation of predicted FC values on the number of subspectral coefficients used in eqn. (2) for (a) amide I' analysis, (b) amide II analysis, (c) combined amide I' + II analysis: (■) helix, (▽) sheet, (●) other

Table 2 Summary of the form of the best predicting multiple-regression equations [eqn. (2)] relating VCD subspectral coefficients 1-6 to FC values for individual secondary structures

Columns: analysis (a); amide I' subspectra 1-6; amide II subspectra 1-6; r (b); $\sigma_{\mathrm{rel}}$(fit) (c); $\sigma_{\mathrm{rel}}$(pred.) (d). Rows: helix, sheet, turn, bend and other, each for the amide I', amide II and amide I' + II analyses. [Table entries not recoverable.]

(a) X marks those subspectral coefficients that are used in the best predicting form of eqn. (2) for the amide I', amide II and amide I' + II analyses. (b) The multiple-regression coefficient for fitting each optimal equation indicated. (c) Relative standard deviation of fit of FC values using the best predicting form of eqn. (2) and averaged over the 23 proteins in the training set. (d) Relative standard deviation of prediction of FC values averaged over the 23 left-out proteins in the training set.

The form of the selected prediction regressions is stable. In the combined amide I' + amide II predictions, 76% (13 of 17) of the subspectra that were found in the best equations for predictions based on individual regions are also included in the best predicting combined-region regression functions. A more detailed look at the information flow in the transformation function shows that, for the helix and sheet fractions, the dominant contribution to the FC predictions stems from the second subspectrum. The opposite signs of the weighting for the coefficient of the second amide I' subspectrum found for these two structures suggest that an anticorrelation exists between the $\mathrm{FC}_{\mathrm{helix}}$ and $\mathrm{FC}_{\mathrm{sheet}}$ values.

Checks of the Reliability of the Results

In principle, the empirical character of our algorithm can be a source of computational artifacts or inconsistencies. We should therefore check the mathematical results of the best predicting equations for consistency with external conditions valid for FC values which were not directly implemented in the algorithm. Only a few (two to four out of 115) FC values are predicted to be negative, and even these are smaller in absolute value than the average error in prediction. Another natural external condition for FC values is that they must add up to 100%. For our approach, it is even more significant to test this, as each individual secondary structure is treated independently, with no constraint concerning the prediction of the others. If we apply the summation condition to eqn. (2), it can be rearranged into the following form:

\[ \sum_{k} \Bigl( \sum_{\zeta} b_k^{\zeta} \Bigr) C_{ik} = 100 - \sum_{\zeta} a^{\zeta} \tag{6} \]

The sum of the constant terms, $a^\zeta$, in our best predicting regression equations is 104%. The term on the left-hand side of eqn. (6) is thus effectively 0 (±4%) which, in turn, shows that the five 'vectors' of $b_k^\zeta$ coefficients are linearly dependent (since the $C_{ik}$ are non-zero). This shows that the mathematical 'structure' of the best predicting transformation of subspectral coefficients into FC values does not contradict the functionality of the reference data parameters, where, indeed, there are only four independent FC values. In accordance with this, the rank of the matrix of $b_k^\zeta$ constants was found to be four from the number of non-zero eigenvalues of the matrix $[b_k^\zeta] \cdot [b_k^\zeta]^{\mathrm{T}}$. For selections of coefficients in the regressions different from the best predicting ones, the sums of the absolute terms of the regression equations were found to deviate more significantly (by 14-25%) from the ideal 100%. This shows that the best predicting selection is optimal not only in calculational performance but also in terms of consistency with external (molecular-based) conditions.
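The rank argument can be illustrated numerically. The sketch below builds a hypothetical 5 x 6 weight matrix whose five rows sum to zero for every coefficient, which is the linear dependence implied by the summation condition, and counts the non-zero eigenvalues of its product with its own transpose. The matrix values are random placeholders, not the published weights, and `rank_from_gram` is an illustrative name.

```python
import numpy as np


def rank_from_gram(B, rel_tol=1e-10):
    """Rank of B counted as the number of non-zero eigenvalues of B @ B.T."""
    eigvals = np.linalg.eigvalsh(B @ B.T)  # symmetric matrix, real eigenvalues
    return int(np.sum(eigvals > rel_tol * eigvals.max()))


# Hypothetical 5 x 6 weight matrix (five structure types, six coefficients).
rng = np.random.default_rng(0)
B = rng.normal(size=(5, 6))
# Impose the summation constraint: for each coefficient the five weights sum to zero.
B[-1] = -B[:-1].sum(axis=0)
```

With the constraint imposed, one row is a linear combination of the other four, so only four eigenvalues differ from zero within numerical tolerance.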

Discussion

Owing to its mathematical nature, the results of our quantitative analysis of VCD spectra are relatively straightforward to interpret. By including more than the optimal number of spectral features we actually degrade the ability of the analysis algorithm to predict the FC values accurately. The subspectral coefficient matrix is reduced to a very sparse form through the process of optimizing the predictability of FC values. In spectroscopist's language this means that we have to use only a limited part of the experimentally available spectral variability to form the best predicting algorithm. It is appropriate to reiterate here that we tested all possible combinations of independent spectral components for the best regression relationships. The completeness of this search should exclude the possibility that this conclusion is a computational artifact.

FC values representing the average contribution of each secondary structural type to the protein fold are traditionally considered to be the 'appropriate' reduction of detailed structural information to a level consistent with the structural sensitivity of optical spectroscopic techniques. For purposes of quantitative protein structural analyses, the spectral bandshape is normally assumed to be composed of a linear combination of invariant basis spectra multiplied by FC values. Given some deconvolution method, the FC value can be determined from a Beer's law type concept. Molecularly this means that any protein is considered to be a simple 'mixture' of non-interacting parts of polypeptide backbone folded into predefined conformations. Once this conceptual model was established, quantitative analyses were naturally directed towards the most accurate method of extraction of FC values. Our results imply that instead of the question of 'how' to extract the FC descriptor from VCD spectra, it is more appropriate to formulate the question of 'what' is the complete structural information we can really obtain.

We can start the reformulation of the problem of quantitative analysis of protein spectra by a critical analysis of our results obtained with the usual FC descriptor. First, we wish to emphasize that, despite our inability to reduce the error in the FC determination sharply, the relationship between the secondary structure composition of a protein and its VCD spectra is significant. There is no doubt that the conformation of the polypeptide backbone dominates the geometry-dependent part of the coupling of the amide vibrational modes and is thus the main source of the structurally sensitive variability of the vibrational rotational strength. From another perspective, qualitative agreement of theoretical calculations31 with typical experimental spectra for the respective regular secondary structures confirms this contention. The high qualitative discriminative power of VCD spectra, as documented in our earlier work and also in the accompanying paper by Keiderling et al.,24 leads to the same conclusion. Paradoxically, this significant qualitative advantage of VCD is not obvious when these quantitative studies are compared with those using other spectroscopic techniques. In various attempts to quantify VCD spectra, there seems to be a limiting error creating a barrier beyond which the analysis seems unable to penetrate. Formulated in terms of relative standard deviation, this barrier is at 10-15% for prediction. The same barrier was also found for our parallel ECD spectra analyses and is presumably similarly present in other studies.12-17 One reason for such behaviour could be a relationship between the FC values. We can trace this feature by spectroscopic interpretation of the mathematical results. In our method, the subspectra are the objects used to model the relevant bandshape features. They are generated in a purely mathematical way and as such they need not be directly related to the spectral characteristics of the respective secondary structures. On the other hand, some of them closely resemble the experimental spectra of proteins with a dominant fraction of one secondary structure type. For example, the amide I' subspectrum (2) resembles a typical α-helical VCD and subspectrum (1) resembles that for a protein with high β-sheet content. At the same time, the amide I' VCD coefficients, $C_{i2}$, of the helical-like subspectrum play a dominant role not only in the determination of helical fractions but also for 'sheet' and 'bend' in the sets of subspectral coefficients corresponding to the best predicting equations for these conformational types (Table 2).

In structural terms, this dominance of coefficients related to the α-helix component of the spectrum in the quantitative expressions for other structural types can be explained by the existence of an interrelationship among the fractions of various secondary structures. It is natural first to search for such relationships with the helical fraction, because in globular proteins $\mathrm{FC}_{\mathrm{helix}}$ generally has the largest dynamic range and, at the same time, is well defined both structurally and spectroscopically. We have shown recently, by analysis of a large set of globular protein crystal structure results, that such a relationship does exist.32 If $\mathrm{FC}_{\mathrm{helix}}$ is known for a protein, then there is a non-linear (quadratic) function of $\mathrm{FC}_{\mathrm{helix}}$ allowing one to calculate the most probable fractions of other secondary structures in that protein. The 'spread' of real FC values around these most probable values (or the error in this FC estimation from the helical content) has a normal (Gaussian) distribution with a width (variance) that again depends on $\mathrm{FC}_{\mathrm{helix}}$. This error in FC estimation decays exponentially with increasing fraction of the helical structure. In Fig. 3, it is shown that the $\mathrm{FC}_{\mathrm{sheet}}$ values used for our protein training set resemble the most probable relationships, with a distribution that is much tighter for the β-sheet than for the whole PDB data set.32 The most obvious difference between this relationship for the whole PDB data set and our training subset is observed for low values of $\mathrm{FC}_{\mathrm{helix}}$, where in the whole set a wider spread of the actual FC values around the most probable function is observed. (For the β-sheet this can be as large as ±25% for $\mathrm{FC}_{\mathrm{helix}}$ close to zero.) This means that an accurate knowledge of $\mathrm{FC}_{\mathrm{helix}}$ would be sufficient to make a reasonable estimation of the fractions of β-sheet secondary structure and would provide a good guess for the other secondary structures in our training set proteins.† This is suggested by comparison of the last column of Table 3, listing the average errors of the FC estimates based on the above-discussed relationships, with the FC prediction error averaged over the same secondary structures from amide I' VCD spectra. When the errors for all non-helical parts are averaged, less than half of the predictions based on the VCD from individual regions are better than the most probable estimates calculated from the X-ray determined $\mathrm{FC}_{\mathrm{helix}}$. This is only somewhat improved for the predictions based on the combination of both spectral regions. Note that these comparisons are in a sense a worst case, since there is no independent method of determining $\mathrm{FC}_{\mathrm{helix}}$ perfectly, which is effectively what was used here, and since the summation of error includes the 'turns' and 'bends' on an equal footing. Thus, while qualitatively VCD really does determine contributions from β-sheet, the quantitative determination is strongly influenced by the internal correlations of $\mathrm{FC}_{\mathrm{sheet}}$ to the α-helical component and is not carried out on a completely independent basis.

Fig. 3 Interrelationships of the crystallographic $\mathrm{FC}_{\mathrm{sheet}}$ values (--) and the α-helical fraction of the proteins in the training set. Most probable estimates of $\mathrm{FC}_{\mathrm{sheet}}$ from known $\mathrm{FC}_{\mathrm{helix}}$ using the quadratic relationship from ref. 32. (○) FC (X-ray), (▽) FC (pred.).

† The strong helix-sheet fraction anticorrelation in the structures of the training set proteins effectively reduces the number of independent FC values to three in the set and, at the same time, reduces the dynamic range available for the remaining secondary structural types to a level that is more difficult to characterize quantitatively in the spectra.

Table 3 Comparison of average errors of FC values predicted from VCD spectra and average errors in FC values calculated from FChelix using crystallographic interrelationships (a)

no.  protein                       amide I'   amide II   amide I' + II   estimated
 1   chymotrypsinogen                2.92       2.43        4.86           2.01* (b)
 2   alcohol dehydrogenase           2.19       2.11        2.31           1.62*
 3   chymotrypsin                    1.96       2.38        2.75           1.69*
 4   concanavalin A                  7.16       3.30*       3.99           3.56
 5   carbonic anhydrase              0.53       0.52*       1.56           1.75
 6   cytochrome c                    3.73*      6.70        4.15           7.60
 7   elastase                        6.08       4.11        5.29           2.85*
 8   glutathione reductase           3.97       3.72        4.24           3.51*
 9   haemoglobin                     5.64       5.80        3.13*          6.15
10   immunoglobulin                  6.59       7.93        7.22           4.26*
11   lactate dehydrogenase           5.09       3.52        3.98           2.42*
12   lysozyme                        7.57       4.66        3.60*          4.69
13   myoglobin                       6.78       8.26        4.96*          8.84
14   papain                          5.46       7.14        4.78           4.59*
15   rhodanese                       5.21       4.67        4.09           3.73*
16   ribonuclease A                  3.77       1.40*       4.36           6.24
17   ribonuclease S                  2.70*      6.45        3.05           6.39
18   subtilisin                      1.97*      3.80        3.02           2.14
19   superoxide dismutase            3.21       5.34        2.74*          4.48
20   thermolysin                     3.44       2.68        1.63*          3.12
21   triose phosphate isomerase      3.52*      4.40        3.61           5.48
22   trypsin inhibitor               6.43       6.46        3.69*          4.31
23   trypsin                         3.01       3.88        1.94*          2.91

     standard deviation              5.82       4.95        4.50           4.18
     relative standard deviation (%) 12.20      35.4        24.3           19.1

(a) Interrelationships taken from ref. 32. (b) An asterisk marks the lowest average error for each protein.
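The comparison drawn from Table 3 can be reproduced by counting, for each VCD analysis, the proteins whose average prediction error is lower than the error of the estimate from the helical fraction. The sketch below uses the per-protein values of Table 3 in the order listed there; the variable names are illustrative only.

```python
import numpy as np

# Average errors per protein, in the row order of Table 3 (proteins 1-23).
amide_I = np.array([2.92, 2.19, 1.96, 7.16, 0.53, 3.73, 6.08, 3.97, 5.64, 6.59,
                    5.09, 7.57, 6.78, 5.46, 5.21, 3.77, 2.70, 1.97, 3.21, 3.44,
                    3.52, 6.43, 3.01])
amide_II = np.array([2.43, 2.11, 2.38, 3.30, 0.52, 6.70, 4.11, 3.72, 5.80, 7.93,
                     3.52, 4.66, 8.26, 7.14, 4.67, 1.40, 6.45, 3.80, 5.34, 2.68,
                     4.40, 6.46, 3.88])
combined = np.array([4.86, 2.31, 2.75, 3.99, 1.56, 4.15, 5.29, 4.24, 3.13, 7.22,
                     3.98, 3.60, 4.96, 4.78, 4.09, 4.36, 3.05, 3.02, 2.74, 1.63,
                     3.61, 3.69, 1.94])
estimated = np.array([2.01, 1.62, 1.69, 3.56, 1.75, 7.60, 2.85, 3.51, 6.15, 4.26,
                      2.42, 4.69, 8.84, 4.59, 3.73, 6.24, 6.39, 2.14, 4.48, 3.12,
                      5.48, 4.31, 2.91])

# Number of proteins for which each VCD analysis beats the helix-based estimate.
wins = {
    "amide I'": int(np.sum(amide_I < estimated)),
    "amide II": int(np.sum(amide_II < estimated)),
    "amide I' + II": int(np.sum(combined < estimated)),
}
```

Dividing each count by 23 gives the fraction of proteins for which the spectral prediction outperforms the estimate based on the crystallographic interrelationship alone.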

This inherent internal correlation of the reference structural information is an important (and usually not explicitly stated) factor affecting not only our analysis but quantitative analyses of protein spectral data in general, and seems to be a general property of crystallizable globular proteins (based on the structures in the PDB32). The extensive overlaps of the protein training sets used for different quantitative spectral analyses serve to conserve the type of FC dependences depicted in Fig. 3 for most algorithms. With knowledge of this intrinsic relationship, we can further understand the details of the form found for the best predicting functions in our analysis.

As mentioned above, the amide I' $C_{i2}$ are highly weighted in the functions that best predict helix, sheet and bend fractions. The bandshape variability of the amide II VCD for different protein conformational types is smaller than that for the amide I'. Nevertheless, the intensity of the amide II VCD band tends to increase for proteins with very high α-helical content. In the factor analysis scheme used for subspectra generation, the major part of the intensity changes is carried by the first, most significant, subspectrum. Since $\mathrm{FC}_{\mathrm{helix}}$ is correlated to the coefficient of the first amide II subspectrum, which is in large part due to correlation to the amide II intensity, the prediction functions for the amide II can also be seen to depend on the helix content for other secondary structure types.

These facts can be combined into an internally consistent explanation of our results. The α-helical contribution to the spectra has a well defined character, but the sheet, bend, turn and 'other' types of secondary structures are more variable in their spectral manifestations. For example, in sheet structures the orientation of the interacting amide chromophores can cause the resulting VCD spectrum to be sensitive to the sense and degree of twist of the ideal 'pleated sheet' conformation, which are general characteristics of sheet-containing proteins. 'Bend', 'turn' and 'other' parts of the backbone are poorly defined structurally and encompass a variability in conformation that provides the backbone with the structural flexibility needed for proper function of the protein. A more realistic model for the characteristic spectra would be one having a distribution of basis spectra bandshapes for each secondary structural type. In such a model, the helical part of the secondary structure would be expected to be characterized by a relatively 'narrow' distribution; for other structures we can expect the bandshape distributions to broaden.

The reference structural data, traditionally expressed as FC values, are naturally tied to the single-bandshape model. Quantitative analysis schemes, including ours, do not implement any weighting of the reference FC values with regard to the secondary structure type, which would permit utilization of such distributions for the corresponding spectral features. Consequently, the reference data set employed for the regression optimization of an empirical analysis algorithm such as the one we use represents a rigid numerical template into which one seeks to transform the variability of the experimental spectra. At the same time there is a significant interrelation between any FC and the $\mathrm{FC}_{\mathrm{helix}}$ values in the reference data set. In the case of the β-sheet this means that the method is forced to look for spectral signatures correlated to $\mathrm{FC}_{\mathrm{sheet}}$ which, according to the reference data template, also have a strong functional relationship to the presence of the α-helical spectral signatures. By contrast, in a weighted scheme with a distributed basis spectra model, the sheet representation would be assumed to be more 'fuzzy' than the helical one. That means that sheet spectral features would be allowed to be less consistent with the model upon which the quantitative determination of the FC structural descriptor is based.

Returning to our fixed-pattern model, the interrelated helix secondary structure fraction will conform much better to FCsheet than does a ‘fuzzy’ β-sheet spectral type. Under these circumstances, the best solution found with a complete regression search is one that utilizes interrelations in the reference data and therefore has a dominant relation of the FCsheet fractions with the variability of a more ideally behaving helix spectral representation. This is what we see in our best predicting algorithms for the β-sheet. Despite the enhanced qualitative sensitivity of VCD to the β-sheet fraction, that information is underutilized in the quantitative algorithm owing to the very strong correlation of β-sheet to α-helix in the training set. This general problem leads to a major change in the direction of our research on the use of optical spectra to determine protein structure.
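The interrelation argument can be made concrete with a small numerical sketch. The fractions below are hypothetical values chosen only for illustration (they are not the reference data of this study), and the helper names `pearson` and `fit_line` are our own. The sketch shows that when FCsheet is strongly anticorrelated with FChelix in a reference set, a regression on the helix fraction alone already reproduces the sheet fraction closely, which is the situation a complete regression search would be expected to exploit.

```python
from math import sqrt

# Hypothetical reference fractions (NOT the paper's data set);
# each position corresponds to one imagined reference protein.
fc_helix = [0.75, 0.60, 0.45, 0.30, 0.15, 0.05]
fc_sheet = [0.02, 0.10, 0.20, 0.28, 0.40, 0.45]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(x, y))
             / sum((u - mx) ** 2 for u in x))
    return my - slope * mx, slope


r = pearson(fc_helix, fc_sheet)
a, b = fit_line(fc_helix, fc_sheet)

# Residuals of predicting the sheet fraction from the helix fraction alone.
residuals = [s - (a + b * h) for h, s in zip(fc_helix, fc_sheet)]
rms = sqrt(sum(e * e for e in residuals) / len(residuals))

print(f"r = {r:.3f}, slope = {b:.3f}, rms residual = {rms:.4f}")
```

With real reference sets the correlation would be weaker than in this constructed example, but the same diagnostic (correlation and residual of one fraction regressed on another) indicates how much of a sheet prediction can be carried by helix-related spectral variability.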

New Structural Descriptor for Quantitative Spectroscopic Studies of Proteins

The results discussed above show unambiguously that, by looking for a descriptor of protein secondary structure based only on average contributions, or the typical FC values, we do not completely utilize the information encoded in VCD spectra. A natural, constructive question is: what are the structural features behind the clearly experimentally observable spectral features which have to be neglected in the FC-based quantification of protein VCD spectra? Also, how can we quantify these new features mathematically for utilization in an improved structural analysis? We can summarize some of the requirements for the construction of a new structural descriptor as follows. (1) The new structural descriptor should not negate the concept of regular secondary structures, as they form the basis for the dominant qualitative bandshape patterns seen in VCD spectra. (2) It needs to be able to quantify some ‘super-secondary’ structural feature that goes beyond the concept of average contributions, with the goal of utilizing the spectroscopic information not currently exploited for the prediction of secondary structure fractions. (3) At the same time, the newly included structural features must have the capacity to influence the rotational or oscillatory strength of vibrational transitions, i.e. to be observable in the experimental spectra.

We are currently in the process of optimizing the design of such a descriptor. As a starting point, we developed a matrix scheme which is consistent with the above requirements and is flexible enough for further development. The basis for its formulation is the notion of defining and enumerating independent secondary structure segments. For our purposes, a segment will be considered to be a continuous sequence of amino acids in the protein polypeptide backbone that is folded into only one secondary structure conformation of interest in the analysis. The assignment of amino acids to the segments is currently defined externally by some classification scheme such as the DSSP algorithm. Segments of the respective secondary structures form the molecular basis for the characteristic spectroscopic features attributable to each secondary structure type [condition (1)]. Structural features that go beyond the average contribution of the segments but still have the capacity to influence spectroscopic properties include the following.

(a) The length of a segment. The length dependence of the VCD intensity for helical segments is well documented using data obtained with oligopeptides. Similarly, length dependences in ECD are understood well enough that correction terms have been introduced into some ECD quantitative analytical schemes of proteins25 and have even been recognized in some FTIR analyses.

(b) The perturbations of the ideal conformation at both ends of a secondary structure segment. The contact regions where a segment corresponding to one secondary structure transforms into another structure necessarily make up a part of the peptide backbone whose conformation must deviate at some level from the ideal secondary structure fold. This modified conformation, in turn, would not be expected to occur in the middle of a long regular structure segment. That these have important spectral consequences is suggested by the finding that a distinction between ordered and disordered helices was necessary for optimal quantitative analyses of protein FTIR spectra. In terms of our understanding of the spectral characterization of secondary structures through the distribution of basis spectra bandshapes, the width of this distribution will be larger for a molecule with a large number of separate segments than for a comparable (FC-based) molecule with longer, hence more uniform segments.

(c) Overall flexibility of the molecule. Any empirical quantitative structural analysis of protein spectra using crystallographic reference information is really an attempt to determine what the crystal structure of a protein would exhibit based on its solution properties. It is generally accepted, and in some cases supported by NMR solution structures, that the protein conformation does not change significantly in solution. On the other hand, we cannot completely exclude some spectroscopically observable differences occurring between the solid-state and solution conformations. Such changes should be spectroscopically observable for flexible protein molecules, which, in turn, should correlate with those proteins having a larger number of segments or having segments connected with flexible loop regions.
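The segment definition given above, and the segment lengths of feature (a), can be sketched in a few lines. This is a minimal illustration, assuming the per-residue assignment is available as a string of one-letter labels (as a DSSP-like classification would provide after reduction to the types of interest). The function name `split_into_segments` and the 20-residue example string are hypothetical.

```python
from itertools import groupby


def split_into_segments(assignment):
    """Return (type, first_residue, last_residue) for each run of identical labels.

    `assignment` is a per-residue string of one-letter secondary structure
    labels; residue numbering starts at 1.
    """
    segments = []
    position = 1
    for label, run in groupby(assignment):
        length = len(list(run))
        segments.append((label, position, position + length - 1))
        position += length
    return segments


# Hypothetical 20-residue assignment, for illustration only.
assignment = "CCHHHHHHHHCCSSSSCCCC"
segments = split_into_segments(assignment)

# Collect the segment lengths per secondary structure type (feature (a)).
lengths = {}
for label, first, last in segments:
    lengths.setdefault(label, []).append(last - first + 1)

print(segments)
print(lengths)
```

The number of segments of each type and the order in which they follow each other are the only inputs needed for the matrix descriptor introduced next; the lengths are kept here because they are one of the features the descriptor is meant to reflect.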

The simplest mathematical object that allows quantification of the above-discussed features is a matrix whose diagonal elements enumerate the number of different secondary structure segments of each type and whose off-diagonal elements enumerate the contacts of these segments. An example of the construction of this matrix descriptor for two hypothetical proteins (Fig. A1, later) is described in the Appendix. This example was constructed to demonstrate the ability of the matrix descriptor to quantify structural features that cannot be described by the FC descriptor. The two protein chains were designed to have very different folds yet to result in identical secondary structure composition on the basis of FC values. Despite this, the difference in the fold of these two proteins as evidenced in their ‘super-secondary structure’ is clearly reflected in the matrix descriptors enumerated in Table A1 (later).

There are several advantages and disadvantages of this new descriptor of the protein structure which will be discussed in detail elsewhere.34 Here we present some preliminary results of a neural network analysis of only the amide I’ spectra in terms of the relatively simple 3 × 3 [HSC] matrix descriptor which is restricted to enumeration of ‘helix’, ‘sheet’ and ‘other’ segments and their interconnections. These serve clearly to demonstrate that it is possible to determine more than just FC values from optical spectra by using VCD spectra as a test case. Table 4 summarizes the prediction results for the better predicting ‘A-B branch’ network. The A-B branch topology, designed to discriminate among information from different frequency components of the amide I’ VCD band, has a much better predictive performance than the classical back-propagation network. We therefore restrict further discussion here to the results of the A-B network analysis.†

Overall, the elements of the descriptor are predicted in this manner with fairly good accuracy. Those values that are large are generally predicted to be so, and the same holds for the very small elements. As usual, there are some exceptions. Two proteins are predicted very poorly: concanavalin A (the only protein of our training set with no helix in its structure) and lactate dehydrogenase (a protein whose amide I’ VCD spectrum resembles those of haemoglobin and myoglobin, FChelix > 70%, but which has only 41% of helix according to the X-ray data). Concanavalin provides an extreme and, perhaps more importantly given our prediction scheme, unique example of a descriptor form in our training set owing to the absence of a helix. Its inclusion or exclusion from the network training may therefore significantly affect the generalization abilities of the network for proteins that are low in helix and high in sheet fractions. Lactate dehydrogenase provides an example where there is a striking qualitative difference between the reference structural data and the VCD bandshape. If these two extreme examples are excluded from the calculation of the variance in segment number prediction, the standard deviations in the prediction of the numbers of individual segments are typically less than 2 (the average standard deviation for all nine elements is 1.55). These error characteristics can be made more easily comparable with the errors in the FC descriptor values by expressing them in terms of relative variance (i.e. variance related to the dynamic range of the corresponding matrix elements).‡ Table 4 summarizes these relative errors. It is seen that (with the exception of contact elements with a very low dynamic range, 0-1, as is seen for the contacts of helix and sheet segments) the relative prediction errors are comparable with the relative errors in FC prediction and, with the exclusion of the two outliers, the matrix errors are even better. This preliminary test demonstrates that VCD spectra can be used to predict a completely different picture of the protein that yields a higher level of structural insight than has been previously possible with optical spectroscopic data.
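The error measures used in this comparison can be written out explicitly. The sketch below is an illustration under stated assumptions: the standard deviation of a matrix element is taken as the root-mean-square difference between predicted and crystallographic values over the proteins, the relative value is that deviation divided by the dynamic range of the crystallographic values, and the per-protein average error is the mean absolute difference over the nine matrix elements. The numbers in `pairs` and `row` are hypothetical and are not taken from Table 4.

```python
from math import sqrt

# Hypothetical (crystallographic, predicted) values of ONE matrix element
# for five proteins; illustration only, not data from Table 4.
pairs = [(6, 6), (12, 11), (0, 3), (17, 15), (5, 5)]

# Hypothetical (crystallographic, predicted) values of the NINE elements
# of the [HSC] matrix for ONE protein.
row = [(6, 6), (0, 0), (6, 5), (0, 1), (7, 8), (7, 7), (5, 5), (1, 0), (13, 14)]


def rms_deviation(pairs):
    """Root-mean-square difference between predicted and reference values."""
    return sqrt(sum((pred - ref) ** 2 for ref, pred in pairs) / len(pairs))


def relative_deviation(pairs):
    """RMS deviation as a percentage of the dynamic range of the reference values."""
    reference = [ref for ref, _ in pairs]
    dynamic_range = max(reference) - min(reference)
    return 100.0 * rms_deviation(pairs) / dynamic_range


def average_error(row):
    """Mean absolute difference over all matrix elements of one protein."""
    return sum(abs(pred - ref) for ref, pred in row) / len(row)


print(f"standard deviation      = {rms_deviation(pairs):.2f}")
print(f"rel. standard deviation = {relative_deviation(pairs):.1f}%")
print(f"average error           = {average_error(row):.2f}")
```

Normalizing by the dynamic range is what makes elements with a range of only 0-1 look poor in relative terms even when the absolute error is a fraction of one segment.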

Conclusions

We have designed a mathematical method that allows us to investigate, in detail, the reasons for the discrepancy between the qualitative sensitivity of protein VCD spectra in the amide transition regions and the failure of the quantitative analysis to utilize this qualitative advantage to attain a substantial improvement in the prediction of fractional concentrations of various secondary structures from these spectra. We have a large set of protein VCD data, which is quite comparable to similar sets used for empirical studies with the related spectroscopic techniques of ECD or FTIR. Our computational scheme included a complete search over all possible combinations of independent components

† There is potential for improvement of the predictive performance of the network by further optimizing its topology.

‡ We nevertheless have developed a formalism, based on a statistical analysis of protein X-ray structures, that allows us to recalculate the FC values from the integer matrix descriptor to a good approximation.


Table 4 Comparison of the crystallographic values of elements of the [HSC] matrix descriptor of protein structure and predictions from the amide I’ VCD spectra using the A-B network (a)

Columns: protein; H; HS; HC; SH; S; SC; CH; CS; C; av. error (b).

Proteins: alcohol dehydrogenase, carbonic anhydrase, chymotrypsin, chymotrypsinogen, concanavalin, cytochrome c, elastase, glutathione reductase, haemoglobin, immunoglobulin, lactate dehydrogenase, lysozyme, myoglobin, papain, rhodanese, ribonuclease A, ribonuclease S, subtilisin, superoxide dismutase, thermolysin, triose phosph. isomerase, trypsin, trypsin inhibitor.

Summary rows: standard deviation; rel. standard deviation (%); standard deviation (red) (c); rel. standard deviation (red) (%) (c).

[Table body not reproduced.]

(a) Format: crystallographic/spectral value. The prediction results in the table are based on the 23 test runs of the trained best predicting networks using the amide I’ VCD of the protein which was left out from the input training set. (b) The average error is calculated as the arithmetic mean of the absolute values of the differences between the two values in each column. (c) Standard deviations calculated for the reduced set not including the lactate dehydrogenase and concanavalin prediction results.


observable in the experimental VCD spectra of this protein training set. The multi-transition advantage of the VCD phenomenon allowed us to combine for the first time the information from two independent measurements of two different regions (amide I’ and amide II) on the same sample. The regression functions, which transform the spectral bandshape variability into numerical values of FC descriptors, were optimized with respect to prediction performance. The mathematical structure of the independently determined optimal regressions was checked against known external conditions. No apparent contradictions were found: in the resulting regression equations the information from the more variable amide I’ spectra was given higher weight over that originating from the less variable amide II VCD spectra; and the sum of all fractions was ca. 100%, despite this not being considered by the algorithm. With all these satisfactory features, the error in the prediction of FC values was not significantly better than that in analyses utilizing experimental information from spectroscopic methods with qualitatively less sensitivity to the protein structure.

The dependence of the predictive errors on the number of spectral components included in the regression equations explains this apparent contradiction. Only a limited number of independent components of the protein VCD spectra are used for the best predicting equations. This reflects the strong interrelationship between the fractions of various secondary structures, which is a general property of globular protein structure and is independent of the properties of VCD spectra.

These results demonstrate that: (a) The picture of the protein structure as reduced to an average secondary structure contribution (FC) descriptor represents only a part of the total information available from protein VCD spectra, and by extension all optical spectral techniques.

(b) The basic assumptions underlying empirical analysis methods involving FC values are in need of modification for use with VCD spectroscopy. VCD senses variations not only in secondary, but also in selected super-secondary structural features. This structural sensitivity of VCD leads to a violation of the assumed invariance of the basis spectra, which should characterize each secondary structure type and forms the basis for spectroscopic correlations with FC values. In the case of protein VCD, the invariant basis spectra for each structural type should be more adequately represented as a distribution of component spectra having roughly similar qualitative features. Since such a distribution of basis spectra is not normally used, a barrier to improvement of the regression step in the quantitative analysis occurs, limiting the ultimate accuracy obtainable.

(c) Owing to the strong interrelation between secondary structure fractions in globular proteins, the relationship sought between the less well defined VCD signatures of the non-helical secondary structure types and their FC values is mutated by the regression step into correlations of these fractions with the more well behaved spectral components arising from the helical structure. The interrelation of structural fractions is further enhanced by the typical selection of reference proteins used for quantitative spectral analysis. This feature of the reference data must cause the same sort of problem in any analytical algorithm that depends on decomposition of various types of spectra into fractional contributions from basis spectra.

(d) The implication of the above conclusions is that we should add new dimensions to our descriptors of protein structure to optimize the utilization of the full VCD spectral information content. A preliminary example of such an approach was demonstrated by the formulation of a new super-secondary structure matrix descriptor of the interconnectivity of structural segments, and its correlation with the amide I’ VCD spectra using a novel neural network approach. Information about contacts of secondary structure segments in the protein fold, the propensity of the structure to be flexible in solution and the coherence lengths of segments is now explicitly or implicitly encompassed in the new descriptor. The challenge to the users of this new chiroptical technique is to overcome the traditional barrier of thinking in FC terms and to start to ask new, more detailed questions about the protein structure. The information is there; we must extract it.

This work was supported by a grant from the National Institutes of Health (GM 30147) for which we are most grateful. Cooperation between groups at UIC and Charles University was facilitated by a National Science Foundation International grant (INT91-07588). The work at Charles University was supported, in part, by grant no. GAUK-302. We wish to acknowledge the development of several of our sample handling methods discussed herein by Drs. Baoliang Wang, Vladimir Baumruk and Maria Urbanova.

Appendix

Scheme for Construction of the Matrix Descriptor of Protein Structure

We have developed a matrix descriptor of the protein super-secondary structure in order to utilize better the information content of optical spectra. We feel that its construction is best understood by example. This Appendix develops such a descriptor based on all five secondary structure types used in this paper, but the specific example in the text used only the ‘helix’, ‘sheet’ and ‘other’ content of the secondary structure which we designated [HSC]. Below are the steps used to obtain the larger [HSBTC] descriptor. The methodology is general for any size of the matrix.

Step 1 Using a classification scheme based on X-ray data, assign each amino acid to a given type of secondary structure (e.g. H, S, B, T and C by the DSSP program).

Step 2 Define secondary structure segments as continuous stretches of amino acid residues with identical secondary structure classification; the resulting scheme for two example proteins is as follows (subscripts denote the sequence number of the first and last amino acid residue in the segment)

protein 1

₁C₂-₃H₂₃-₂₄T₂₉-₃₀C₃₃-₃₄T₄₁-₄₂C₄₅-₄₆T₅₁-₅₂H₇₂-₇₃C₇₄

protein 2

₁C₆-₇H₁₃-₁₄T₁₇-₁₈H₂₄-₂₅T₂₈-₂₉H₃₄-₃₅T₃₈-₃₉H₄₅-₄₆T₄₉-₅₀H₅₆-₅₇T₆₀-₆₁H₆₈-₆₉C₇₄

These two test proteins have the same secondary structure composition in terms of FC values:

FC1 = [56.8% H, 0.0% S, 0.0% B, 27.0% T, 16.2% C] = FC2

They differ in the super-secondary structure features as shown in Fig. A1.
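The equality of the two FC descriptors can be checked directly from the segment boundaries. The following is a minimal sketch, assuming the segment layouts of Step 2 are as transcribed in the lists below (74 residues in each chain); the function name `fractional_composition` is our own.

```python
# Segment layouts as (type, first residue, last residue); assumed to be
# a faithful transcription of the two schemes in Step 2.
protein_1 = [
    ("C", 1, 2), ("H", 3, 23), ("T", 24, 29), ("C", 30, 33), ("T", 34, 41),
    ("C", 42, 45), ("T", 46, 51), ("H", 52, 72), ("C", 73, 74),
]
protein_2 = [
    ("C", 1, 6), ("H", 7, 13), ("T", 14, 17), ("H", 18, 24), ("T", 25, 28),
    ("H", 29, 34), ("T", 35, 38), ("H", 39, 45), ("T", 46, 49), ("H", 50, 56),
    ("T", 57, 60), ("H", 61, 68), ("C", 69, 74),
]


def fractional_composition(segments, types="HSBTC"):
    """Percentage of residues in each secondary structure type, to one decimal."""
    counts = {t: 0 for t in types}
    for label, first, last in segments:
        counts[label] += last - first + 1
    total = sum(counts.values())
    return {t: round(100.0 * counts[t] / total, 1) for t in types}


fc1 = fractional_composition(protein_1)
fc2 = fractional_composition(protein_2)
print(fc1)
print(fc2)
print("identical FC descriptors:", fc1 == fc2)
```

Protein 1 reaches this composition with two long helices, protein 2 with six short ones, which is exactly the difference the FC descriptor cannot express.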

Step 3 (a) Convert the structure into a linear scheme of segment layouts (Scheme A); (b) identify the types of contacts between the segments (for example, CH denotes the contact ‘other’ → ‘helix’ and HC denotes the contact ‘helix’ → ‘other’) (Scheme B).

Fig. A1 Schematic representation of the ‘super-secondary’ structure of the proteins (protein 1 and protein 2) used in the example for construction of the matrix descriptor. Secondary structure segments are identified by the letters (helices as H1-H6); numbers are sequence positions of the first and last amino acid residue in each helical segment.

Step 4 (a) Calculate the number of H, S, B, T and C segments and put these numbers on the diagonal of the 5 × 5 matrix [HSBTC]. (b) Calculate the number of each type of segment interconnection (see layout of segments and contacts in Table A1) and put these numbers in as the corresponding off-diagonal elements of the [HSBTC] matrix, as shown in Table A1.

Table A1 Matrix descriptors for the proteins in Fig. A1

     layout of segments and contacts     protein 1          protein 2

     H    S    B    T    C               H  S  B  T  C      H  S  B  T  C

H    H    HS   HB   HT   HC              2  0  1  0  1      6  0  5  0  1
S    SH   S    SB   ST   SC              0  0  0  0  0      0  0  0  0  0
B    BH   BS   B    BT   BC              1  0  3  0  2      5  0  5  0  0
T    TH   TS   TB   T    TC              0  0  0  0  0      0  0  0  0  0
C    CH   CS   CB   CT   C               1  0  2  0  2      1  0  0  0  2
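Steps 3 and 4 amount to counting segments and ordered neighbour pairs along the chain. The sketch below is one possible implementation, not the authors' program; it assumes the ordered segment types of protein 2 as read from Step 2, with the connecting segments carrying the label T used there, and places each count in the row and column given by the [HSBTC] ordering. The function name `matrix_descriptor` is hypothetical.

```python
TYPES = "HSBTC"


def matrix_descriptor(segment_types, types=TYPES):
    """Build the segment/contact matrix from an ordered list of segment types.

    Diagonal element [X][X] counts segments of type X; off-diagonal
    element [X][Y] counts contacts in which a segment of type X is
    immediately followed by a segment of type Y.
    """
    index = {t: i for i, t in enumerate(types)}
    n = len(types)
    matrix = [[0] * n for _ in range(n)]
    for label in segment_types:
        matrix[index[label]][index[label]] += 1
    for first, second in zip(segment_types, segment_types[1:]):
        if first != second:  # neighbouring segments differ by definition
            matrix[index[first]][index[second]] += 1
    return matrix


# Ordered segment types of protein 2, assumed from the Step 2 scheme.
protein_2 = list("CHTHTHTHTHTHC")

descriptor = matrix_descriptor(protein_2)
for label, row in zip(TYPES, descriptor):
    print(label, row)
```

For this chain the non-zero elements are H = 6, T = 5, C = 2, HT = 5, TH = 5, CH = 1 and HC = 1; a chain with the same composition but fewer, longer helices would give smaller diagonal and contact counts even though its FC descriptor is unchanged.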


References

1 T. A. Keiderling and P. Pancoska, in Advances in Spectroscopy, ed. R. E. Hester and R. J. Clark, Biomolecular Spectroscopy B, Wiley, Chichester, 1993, vol. 21, pp. 269-315.
2 T. A. Keiderling, Practical Fourier Transformation Infrared Spectroscopy, ed. K. Krishnan and J. P. Ferraro, Academic Press, San Diego, 1990, pp. 203-284.
3 P. Pancoska, S. C. Yasui and T. A. Keiderling, Biochemistry, 1991, 30, 5089.
4 P. Pancoska, S. C. Yasui and T. A. Keiderling, Biochemistry, 1989, 28, 5817.
5 P. Pancoska and T. A. Keiderling, Biochemistry, 1991, 30, 6885.
6 M. Urbanova, R. K. Dukor, P. Pancoska, V. P. Gupta and T. A. Keiderling, Biochemistry, 1991, 30, 10479.
7 S. C. Yasui, P. Pancoska, R. K. Dukor, T. A. Keiderling, V. Renugopalakrishnan, M. J. Glimcher and R. K. Clark, J. Biol. Chem., 1990, 265, 3780.
8 P. Pancoska, L. Wang and T. A. Keiderling, Protein Sci., 1993, 2, 411.
9 M. Urbanova, P. Pancoska and T. A. Keiderling, Biochem. Biophys. Acta A, 1993, 203, 290.
10 V. Baumruk and T. A. Keiderling, J. Am. Chem. Soc., 1993, 115, 6939.
11 C. C. LaBrake, L. Wang, T. A. Keiderling and L. W. M. Fung, Biochemistry, 1993, 32, 10296.
12 J. P. Hennessey and W. C. Johnson, Biochemistry, 1981, 20, 1085.
13 W. C. Johnson, Meth. Biochem. Anal., 1985, 31, 61.
14 N. Sreerama and R. W. Woody, Anal. Biochem., 1993, 209, 32.
15 A. Tuomadje, S. W. Alcorn and W. C. Johnson, Anal. Biochem., 1992, 200, 321, 331.
16 A. Perczel, K. Park and G. D. Fasman, Anal. Biochem., 1992, 203, 83.
17 S. Y. Venyaminov, I. A. Baikalov, C-S. C. Wu and J. T. Yang, Anal. Biochem., 1991, 198, 250.
18 D. C. Lee, P. I. Haris, D. Chapman and R. C. Mitchell, Biochemistry, 1990, 29, 9185.
19 F. Doussesau and M. Pezolet, Biochemistry, 1990, 29, 8771.
20 D. H. Byler and H. Susi, Biopolymers, 1986, 25, 469.
21 W. Surewicz and H. H. Matsch, Biochim. Biophys. Acta, 1988, 952, 115.
22 B. Wang and T. A. Keiderling, Appl. Spectrosc., submitted.
23 B. Wang, unpublished results.
24 T. A. Keiderling, B. Wang, M. Urbanova, P. Pancoska and R. K. Dukor, Faraday Discuss., 1994, 99, 263.
25 J. T. Yang, C. S. C. Wu and H. M. Martinez, Meth. Enzymol., 1986, 130, 208.
26 W. Kabsch and C. Sander, Biopolymers, 1983, 22, 2577.
27 R. K. Dukor and T. A. Keiderling, in Peptides 1988, Proceedings of the 20th European Peptide Symposium, ed. E. Bayer and G. Jung, de Gruyter, Berlin, 1989, pp. 519-521.
28 E. R. Malinowski and D. G. Howery, in Factor Analysis in Chemistry, Wiley, New York, 1970.
29 P. K. Simpson, A Review of Artificial Neural Systems II: Paradigms, Applications and Implementations, General Dynamics, San Diego, 1988.
30 P. Pancoska, E. Bitto, M. Urbanova, V. P. Gupta and T. A. Keiderling, Anal. Biochem., 1994, to be submitted.
31 P. Bour and T. A. Keiderling, J. Am. Chem. Soc., 1993, 115, 9602.
32 P. Pancoska, M. Blazek and T. A. Keiderling, Biochemistry, 1992, 31, 10250.
33 W. K. Surewicz, H. H. Mantsch and D. Chapman, Biochemistry, 1993, 32, 389.
34 P. Pancoska, V. Janota, J. Nesetril and T. A. Keiderling, in preparation.
35 V. Janota, Thesis, Charles University, Prague, 1992.

Paper 4/06083K; Received 5th October, 1994
