
focal point

MAX DIEM, KOSTAS PAPAMARKAKIS, JENNIFER SCHUBERT, BENJAMIN BIRD, MELISSA J. ROMEO, AND MILOŠ MILJKOVIĆ

DEPARTMENT OF CHEMISTRY AND CHEMICAL BIOLOGY

NORTHEASTERN UNIVERSITY

BOSTON, MA 02115

The Infrared Spectral Signatures of Disease: Extracting the Distinguishing Spectral Features Between Normal and Diseased States

INTRODUCTION

Over the past decade, new medical diagnostic methods have been developed by several research groups worldwide, based on infrared microspectroscopy and microscopic imaging (see, for example, the compiled references in a number of recent books1–3). These methods can be applied both to tissue sections and to individual exfoliated cells. The success of these methods in differentiating cancerous from normal tissues, as well as individual cancerous, precancerous, and normal cells, is due to two major factors. First, infrared microspectroscopy monitors, in one measurement, a snapshot of the overall biochemical composition of an individual cell. This composition varies with a number of well-understood cell-biological processes; thus, the cell's division cycle, its maturation and differentiation, as well as a transition from normal to cancerous states, can be monitored via a well-understood spectral measurement. This differs significantly from the standard cytopathological methodology, which relies on a visual inspection of cell morphology and tissue architecture and is, therefore, subjective in nature.

The second factor for the success of spectral diagnoses is the fact that data can be acquired fairly rapidly: it takes about 500 ms to collect a good infrared microspectrum from a voxel of biological material. The size of such a voxel is typically about 12 × 12 × 5 μm³ in the x, y, and z directions, where the lateral (x, y) dimension is determined by the diffraction limit and the z direction is determined by the thickness of the tissue section or the thickness of a cell. In the case of infrared microspectral imaging of human tissues, up to 100 000 individual voxel spectra are collected to create huge hyperspectral data sets, where the term "hyperspectral" implies spatially resolved data with distinct x and y coordinates, and spectral information from each x, y point. The analysis of the hyperspectral dataset is carried out by methods of chemometrics,4,5 which detect small but recurring differences, even in the presence of larger random variance. Similarly, for the analysis of cells, data sets containing over 100 000 spectra of individual cells have been collected and analyzed via chemometric methods. These efforts have shown that the two diagnostic methods discussed here, spectral cytopathology (SCP, spectral analysis of cells) and spectral histopathology (SHP, spectral analysis of tissue), are more sensitive techniques than standard cytopathology and histopathology in the sense that disease can be detected objectively, earlier, on a smaller scale, and without the use of stains and contrast agents.*

APPLIED SPECTROSCOPY 307A

In the past, we have used multivariate chemometric analysis mostly as an unsupervised technique and have discarded the spectral information available from chemometrics in favor of the tissue architectural information (in SHP) and cell classification (in SCP). However, the spectral information is related to the biochemical changes used by the chemometric methods to classify the data and can shed light on the biological processes that occur between normal and diseased states. In this paper, we present an approach to extract spectral information from the chemometric analysis, and we present the spectra that were used by the algorithms for spectral differentiation. These spectra represent the accumulated signatures of all biochemical changes that occur between normal and diseased cells and thus may be referred to as the "infrared spectral signatures of disease". In general, these spectra are complex superpositions of biochemical component spectra, but in some cases, the interpretation is straightforward. These spectral signatures may be relatable to the largest spectral changes defined from proteomic studies of diseases and will define an upper limit of the compositional change between normal and diseased cells.

This paper follows an Applied Spectroscopy Focal Point article from the authors' laboratory a decade ago6 and will highlight, in a few examples, the enormous progress achieved in the utilization of infrared microspectroscopy as a medical diagnostic tool. Using two examples each for SHP and SCP, the methodology for signature extraction will be introduced and the sensitivity of these two techniques will be demonstrated. For SCP, we demonstrate that the spectral signature along which chemometrics distinguishes classes of cells can sometimes be directly related to the presence or absence of one compound. For SHP, two examples of non-malignant change will be presented: the activation of B-lymphocytes in a lymph node, and the formation of keratin pearls in abnormal oral tissue. Hopefully, these methods will be used in the future by other groups to compare the spectral differences and pinpoint the biochemical variations observed in a number of diseases.

BACKGROUND

Instrumentation. All spectral data recorded here were collected via one of two commercial Perkin Elmer (Shelton, CT) imaging infrared microspectrometers (PE Spectrum One/Spotlight 400) at the Laboratory for Spectral Diagnosis (LSpD) at Northeastern University. These instruments utilize 16 individual, photoconductive HgCdTe detector elements arranged in the focal plane. Data acquisition rates of up to 170 pixels/s (at 16 cm⁻¹ resolution) can be realized; however, the data reported here were collected at 4 or 8 cm⁻¹ resolution with one level of zero-filling, eight co-added scans for SHP, and two co-added scans for SCP. Data were collected at a pixel resolution of 6.25 μm or 25 μm. In the PE Spotlight instruments, the pixel resolution (pixel size) is determined by the optical magnification and can be selected to be 50, 25, or 6.25 μm. The actual spatial resolution, as pointed out in the Introduction, is determined by the diffraction limit and depends on the wavelength and the numerical aperture of the objective. Using military resolution targets, the actual spatial resolution of the PE 400 was determined to be approximately 12 μm at 1250 cm⁻¹. All data were collected in transflection (reflectance/absorbance) mode on low-emissivity ("low-e") microscope slides (Kevley Technologies, Chesterland, OH).

The instruments, including the microscope stage, are continuously purged with dry air (−40 °C dew point) to reduce ambient water vapor. This is an extremely important step because, without good purging, the most common spectral distinguishing feature is that of water vapor. All data are stored in native PE format (.fsm files) and processed off-line using the software described below.

Sampling for Spectral Cytopathology and Spectral Histopathology. Samples of cells were prepared as follows. Cellular samples were provided by Tufts University Medical Center (Boston, MA) on collection devices immersed in SurePath (BD Diagnostics, Burlington, NC) fixative, or were collected in house and placed immediately into the same fixative solution. After suitable purification, cells were spin-deposited onto low-e microscope slides via a Cytospin centrifuge (Thermo Shandon, Pittsburgh, PA) at a density of about 50 cells/mm², yielding a 5 mm diameter sample spot. Given the size of the largest oral squamous cell (0.004 mm²), this cell density is sufficiently low to prevent cells from overcrowding. Spectral data were collected in imaging mode from a sample area of 16 mm², resulting in a dataset of 409 600 spectra. Subsequently, each cellular spectrum was calculated from a contiguous area occupied by one cell by co-adding approximately 20 to 100 individual pixel spectra, each covering about 4 × 10⁻⁵ mm². The co-addition process eliminates spectra from the edges of cells, which may be contaminated by dispersion artifacts or excessive noise. The procedure of reconstructing cellular spectra from imaged data is referred to as the "PapMap" algorithm and has been submitted for patent protection. After spectral data acquisition, cells were stained using the conventional Pap stain7 and imaged at 40× magnification for standard cytological diagnosis.
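The PapMap procedure itself is not described in detail here, so the following is not that algorithm. It is a minimal NumPy sketch of the general idea of co-adding the pixel spectra of one cell while discarding its edge pixels, assuming a hypothetical label image (`labels`, 0 = background) produced by some earlier segmentation step, and a simple 4-neighbour rule for deciding which pixels lie on a cell's edge.

```python
import numpy as np

def coadd_cells(cube: np.ndarray, labels: np.ndarray) -> dict:
    """Co-add the pixel spectra of each labeled cell, skipping edge pixels.

    cube   : (ny, nx, n_points) hyperspectral image
    labels : (ny, nx) integer cell labels, 0 = background (hypothetical input;
             the segmentation that produces it is not shown)
    """
    # Pad with background so that every pixel has four neighbours.
    p = np.pad(labels, 1, constant_values=0)
    centre = p[1:-1, 1:-1]
    # A pixel is "interior" if its up/down/left/right neighbours share its label.
    interior = (
        (centre > 0)
        & (centre == p[:-2, 1:-1]) & (centre == p[2:, 1:-1])
        & (centre == p[1:-1, :-2]) & (centre == p[1:-1, 2:])
    )
    cells = {}
    for cell_id in np.unique(labels[labels > 0]):
        mask = interior & (labels == cell_id)
        if mask.any():                      # cells without interior pixels are dropped
            cells[int(cell_id)] = cube[mask].sum(axis=0)
    return cells

labels = np.zeros((6, 6), dtype=int)
labels[1:5, 1:5] = 1                        # one 4 x 4-pixel "cell"
cube = np.ones((6, 6, 3))                   # flat dummy spectra, 3 points each
print(coadd_cells(cube, labels))            # → {1: array([4., 4., 4.])}
```

Only the 2 × 2 interior pixels of the 4 × 4 cell contribute, which mirrors the stated purpose of excluding edge spectra; a real implementation would also need the segmentation and any quality filters.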

Tissue samples were sectioned from paraffin-embedded tissue blocks using a microtome to a thickness of 5 μm, deparaffinized, and mounted on low-e microscope slides. Spectral data were collected in the imaging mode of the PE instruments; the acquisition of a 1 × 1 mm² tissue section at 6.25 μm pixel resolution takes about 40 minutes and yields a data set containing 25 600 spectra.8
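The spectrum counts quoted above follow from the imaged area and the 6.25 μm pixel size. The short check below assumes that the 16 mm² cell area was imaged as a 4 mm × 4 mm square (the shape is not stated); the helper name `n_pixels` is illustrative.

```python
def n_pixels(width_mm: float, height_mm: float, pixel_um: float) -> int:
    """Number of pixel spectra when a width x height area is imaged at a given pixel size."""
    nx = round(width_mm * 1000.0 / pixel_um)
    ny = round(height_mm * 1000.0 / pixel_um)
    return nx * ny

pixel_area_mm2 = (6.25e-3) ** 2      # about 3.9 x 10^-5 mm², i.e. "about 4 x 10^-5 mm²"
print(n_pixels(4, 4, 6.25))          # 16 mm² cell spot, assumed 4 mm x 4 mm → 409600
print(n_pixels(1, 1, 6.25))          # 1 x 1 mm² tissue section → 25600
```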

* The terms SCP and SHP, for the remainder of this paper, imply infrared microspectral data acquisition followed by suitable methods of chemometric analysis.

308A Volume 63, Number 11, 2009


Computational Methods. All multivariate chemometric calculations were carried out on second-derivative spectra. Although the conversion of absorbance data sets to second-derivative data sets degrades the signal-to-noise ratio (S/N) of the data, the well-established advantages of second derivatives (the removal of baseline effects and the reduction of the band half-width, both of which facilitate the automatic comparison of spectra) compensate for the loss of spectral quality.9 For the data reported here, the spectral range from 900 to 1800 cm⁻¹ was utilized, which contains the most significant spectral fingerprints of biochemical components. Second-derivative spectra in this range were vector normalized to account for the variable thickness of cells and tissue sections.
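The preprocessing chain described in this paragraph can be sketched as follows. The derivative algorithm is not specified in the text; Savitzky-Golay filtering is a common choice, but this sketch assumes a uniformly spaced wavenumber axis and simply applies NumPy's central-difference `np.gradient` twice. The synthetic band and baseline are illustrative; the usage shows that the result is insensitive to sample thickness (a scale factor) and to a linear baseline.

```python
import numpy as np

def preprocess(wavenumbers, spectra, lo=900.0, hi=1800.0):
    """Restrict to [lo, hi] cm^-1, take the second derivative, vector-normalize.

    spectra: (n_spectra, n_points) absorbance values; wavenumbers: (n_points,).
    """
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    keep = (wavenumbers >= lo) & (wavenumbers <= hi)
    wn, x = wavenumbers[keep], spectra[:, keep]
    # Two passes of central differences approximate d2A/dv2 (no smoothing here,
    # unlike a Savitzky-Golay derivative)
    d2 = np.gradient(np.gradient(x, wn, axis=1), wn, axis=1)
    # Vector normalization: scale each second-derivative spectrum to unit length
    return wn, d2 / np.linalg.norm(d2, axis=1, keepdims=True)

wn = np.arange(700.0, 1901.0, 2.0)
band = np.exp(-((wn - 1655.0) / 15.0) ** 2)     # synthetic amide-I-like band
thin = band                                     # thin sample
thick = 3.0 * band + 0.5 + 1e-3 * wn            # 3x thicker, on a sloping baseline
w, d2 = preprocess(wn, np.vstack([thin, thick]))
print(w.size, np.allclose(d2[0], d2[1]))         # → 451 True
```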

Principal Component Analysis for Spectral Cytopathology. Spectral data sets from cytological samples were processed via principal component analysis (PCA), using the PCA algorithm as contained in the PLS toolbox running under MATLAB. PCA decomposes the spectra in a data set into "principal components" (PCs) or "loading vectors", which are based on the variance in the data set. These loading vectors are related to the eigenvectors of the covariance matrix and contain (see below), in descending order, less and less of the variance: the first PC expresses the mean of all spectra, whereas the second PC expresses the most significant variations in the data set. Subsequently, PCA reconstructs the original spectra as linear combinations of the PCs. A PCA "scores plot", such as the one shown in Fig. 1A, depicts the contributions of PC2 and PC3 to the original spectra. Frequently, data from different classes of cells split along the PC2 axis. If only two classes of cells are in the data set, this indicates that PC2 contains a large fraction of the biochemical variation detected by PCA.

There are, of course, other methods of data analysis. However, we prefer PCA since it is, at this level, a completely unsupervised method to establish whether or not spectra of cells group into classes due to cell type, donor identity, or disease, among other factors. For a more detailed description of PCA, the reader is referred to books on chemometrics, for example, Adams.4

For PCA, the entire spectral data set, containing n spectra, is written as a matrix S, in which each column represents one spectrum $S(\nu)$ of m intensity data points.9 We assume that the spacing between data points is constant over the entire spectrum; therefore, we need to deal with intensity values only. The intensity correlation matrix is constructed from the spectral matrix S according to

$$C = S\,S^{T} \tag{1}$$

C is an (m × m) matrix, in which the off-diagonal terms $C_{kl}$ are the correlations between intensity values at wavelengths $\nu_k$ and $\nu_l$, summed over all spectra. Diagonalization of the correlation matrix, according to

$$P^{T} C\,P = \Lambda \tag{2}$$

yields the eigenvector matrix P, from which "principal components" Z are calculated according to

$$Z = S\,P \tag{3}$$

The eigenvalues $\Lambda$ express the variance contained in each of the principal components. Thus, from the viewpoint of linear algebra, the principal components are the original spectra expressed in a rotated coordinate system, which is based on the maximum variance of the original spectra. Subsequently, we express each of the original spectra $S(\nu)$ in terms of the principal components via

$$S = a\,Z \tag{4}$$

where the "scores", a, are given by

$$a = P^{T} \tag{5}$$
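A minimal NumPy sketch of Eqs. 1 and 2, assuming, as stated above, that S holds one spectrum per column, so that $C = SS^T$ is m × m. No mean-centering is applied, consistent with the remark that the first PC expresses the mean of all spectra. For the reconstruction, the sketch takes the loading vectors to be the columns of P and computes the scores as $P^T S$, so that $S = P(P^T S)$; this is one dimensionally consistent reading of Eqs. 3 to 5, not the PLS toolbox routine used in this work.

```python
import numpy as np

def pca_eig(S: np.ndarray):
    """Uncentered PCA of an (m x n) spectral matrix S whose columns are spectra.

    Returns the eigenvalues (descending), the loading vectors P (m x m, one per
    column) and the scores a = P^T S (m x n), so that S == P @ a.
    """
    C = S @ S.T                        # Eq. 1: (m x m) intensity correlation matrix
    evals, P = np.linalg.eigh(C)       # Eq. 2: C is symmetric, so eigh applies
    order = np.argsort(evals)[::-1]    # sort by decreasing variance
    evals, P = evals[order], P[:, order]
    a = P.T @ S                        # scores: weight of each loading in each spectrum
    return evals, P, a

rng = np.random.default_rng(0)
S = rng.normal(size=(50, 20))          # placeholder: 20 spectra of 50 points each
evals, P, a = pca_eig(S)
print(np.allclose(P @ a, S))           # → True: spectra rebuilt from loadings and scores
```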

For spectral data sets of individual cells, one finds that a large fraction of the total spectral variance is contained in the first few loading vectors. Typically, five to eight loading vectors contain more than 99% of the variance. The score matrix a determines how much each principal component contributes to each spectrum. Similar spectra exhibit similar scores, a, which may be used to discriminate, or group, spectra. This is accomplished by plotting the values $a_i$ and $a_j$ (that is, the contributions of PC$_i$ and PC$_j$ to each spectrum) against each other, where each data point represents one spectrum. If grouping is observed, there are quantifiable and significant variations in the spectra.
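Whether a few loading vectors carry more than 99% of the variance can be read off the eigenvalues $\Lambda$ of Eq. 2. The helper below (a hypothetical name) counts the leading eigenvalues needed to reach a given fraction of the total; the eigenvalue list in the usage line is made up for illustration and is not data from this study.

```python
import numpy as np

def n_components_for(evals, fraction=0.99):
    """Smallest number of leading eigenvalues whose sum reaches `fraction` of the total."""
    evals = np.sort(np.asarray(evals, dtype=float))[::-1]
    cumulative = np.cumsum(evals) / evals.sum()
    # First index at which the cumulative fraction reaches the target, as a count
    return int(np.searchsorted(cumulative, fraction) + 1)

# Illustrative eigenvalues only, not data from this study
print(n_components_for([90.0, 5.0, 2.0, 1.0, 0.6, 0.5, 0.4, 0.3, 0.1, 0.1]))  # → 6
```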

In general, the PCs are difficult to interpret spectrally since they are based on the variance in the data set and depend on the number of distinguishable classes and the number of spectra in each class. Methods have been reported in the literature to convert the PCs to component spectra,10 but this works well only in the case of spectral mixtures of few, spectrally distinct components. For biochemical samples, this criterion is not fulfilled, and different methods will be presented below to extract spectrally useful information from the PCs. The situation is further complicated by the fact that the PCs are second-derivative-based (see second footnote).

Hierarchical Cluster Analysis for Spectral Histopathology. Hierarchical cluster analysis (HCA) classifies data based on the smallest "distances" between spectra, where the term "distances" may imply Euclidean or Mahalanobis distances4 or correlation coefficients. In the latter case, the spectral correlation matrix, C′, is computed according to

$$C' = S^{T} S \tag{6}$$

This matrix is an (n × n) matrix (n is the number of spectra in the data set) in which the off-diagonal terms $C'_{ij}$ are the correlations between spectra i and j, summed over all data points of the spectral vector $S(\nu)$. The correlation matrix is then searched for the two most similar spectra, i.e., the two spectra i and j for which the correlation coefficient $C'_{ij}$ is closest to unity. Subsequently, the columns i and j of the correlation matrix are merged, and the correlation coefficient between this new object and all other spectra is recalculated. This reduces the dimensionality of the correlation matrix by 1. The process of merging is repeated, and a membership list is kept that accounts for all individual spectra that are eventually merged into a cluster. We have used Ward's algorithm11 for the process of merging the columns of the correlation matrix. HCA was carried out using the HCA functions in the CytoSpect program, which was written specifically for the multivariate analysis of spectral hypercubes from infrared and Raman microspectroscopy.12
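The following sketch illustrates agglomerative clustering with Ward's criterion; it is not the CytoSpect implementation. It assumes vector-normalized spectra stored as rows of X (so that X Xᵀ equals C′ of Eq. 6 with X = Sᵀ) and uses the fact that, for unit-length vectors, the squared Euclidean distance is 2(1 − C′ᵢⱼ): for individual spectra, merging the most correlated pair is the same as merging the closest pair. Cluster distances are then updated with the Lance-Williams formula for Ward linkage. The loop is O(n³) and only suited to small examples; data sets with tens of thousands of spectra need a more efficient implementation.

```python
import numpy as np

def ward_hca(X: np.ndarray, n_clusters: int) -> np.ndarray:
    """Agglomerative Ward clustering of unit-norm spectra (one per row of X).

    Returns an integer cluster label for every spectrum.
    """
    n = X.shape[0]
    corr = X @ X.T                        # with X = S^T this is C' of Eq. 6
    D = 2.0 * (1.0 - corr)                # squared Euclidean distance for unit-norm rows
    np.fill_diagonal(D, np.inf)
    sizes = np.ones(n)
    active = np.ones(n, dtype=bool)
    members = {i: [i] for i in range(n)}  # membership list for every cluster
    for _ in range(n - n_clusters):
        masked = np.where(np.outer(active, active), D, np.inf)
        i, j = (int(k) for k in np.unravel_index(np.argmin(masked), D.shape))
        # Lance-Williams update for Ward linkage: distance of merged cluster i+j to all k
        ni, nj, nk = sizes[i], sizes[j], sizes
        D[i, :] = ((ni + nk) * D[i, :] + (nj + nk) * D[j, :] - nk * D[i, j]) / (ni + nj + nk)
        D[:, i] = D[i, :]
        D[i, i] = np.inf
        sizes[i] = ni + nj
        active[j] = False                 # cluster j is absorbed into cluster i
        members[i] += members.pop(j)
    labels = np.empty(n, dtype=int)
    for label, idx in enumerate(members.values()):
        labels[idx] = label
    return labels

rng = np.random.default_rng(1)
centres = np.eye(4)[[0] * 5 + [1] * 5]          # two groups of 5 synthetic spectra
X = centres + 0.05 * rng.normal(size=centres.shape)
X /= np.linalg.norm(X, axis=1, keepdims=True)   # vector normalization
print(ward_hca(X, 2))                           # first five share one label, last five the other
```

For production work, an established routine such as SciPy's hierarchical clustering with Ward linkage would be the more practical choice.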

Once all spectra are merged into a few clusters, a color code is assigned to each cluster, and each pixel coordinate from which a spectrum was collected is displayed in the color of its cluster. In this way, pseudo-color images are obtained that are based strictly on spectral similarities. Mean cluster spectra may be calculated that represent the chemical composition of all spectra in a cluster.
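Turning cluster memberships into a pseudo-color image and mean cluster spectra is mostly bookkeeping. A sketch, assuming pixel spectra stored in row-major order with one cluster label per pixel; the function name and the palette are illustrative.

```python
import numpy as np

def cluster_map(labels, ny, nx, spectra, palette):
    """Pseudo-colour image and mean cluster spectra from per-pixel cluster labels.

    labels  : (ny*nx,) cluster index for each pixel, row-major pixel order (assumed)
    spectra : (ny*nx, n_points) pixel spectra in the same order
    palette : (n_clusters, 3) RGB colour per cluster (arbitrary choice)
    """
    labels, spectra = np.asarray(labels), np.asarray(spectra, dtype=float)
    palette = np.asarray(palette)
    image = palette[labels].reshape(ny, nx, 3)           # colour each pixel by its cluster
    means = np.array([spectra[labels == k].mean(axis=0)  # mean spectrum of each cluster
                      for k in range(len(palette))])
    return image, means

img, means = cluster_map([0, 1, 1, 0], 2, 2,
                         [[1, 1], [3, 3], [5, 5], [3, 3]],
                         [[255, 0, 0], [0, 0, 255]])
# Pixel (0, 1) belongs to cluster 1 (blue); the cluster means are [2, 2] and [4, 4]
print(img[0, 1], means)
```

A cluster with no members would yield an empty mean (NaN, with a warning), so the palette should contain one color per occupied cluster.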

RESULTS

Methods for "Signature Extraction": Interpreting the Differentiating Features in Principal Component Analysis and Hierarchical Cluster Analysis. We now turn to the methodology employed in this work to extract the spectral signatures that are used by the data analysis algorithms to classify spectra. We shall use two examples of SCP to demonstrate the procedure for the signature extraction and to show the spectral signatures of dysplastic and cancerous changes. Subsequently, a similar discussion of the procedures to extract spectral differences for SHP will be presented.

Signature Extraction in Spectral Cytopathology. Figure 1A presents a PCA scores plot of cervical cell spectra collected from women not using hormonal contraceptives at different points of the menstrual cycle. It is well known that in squamous cervical cells (i.e., cells from the ecto-cervix), the glycogen concentration varies under the influence of hormones and generally reaches a maximum around ovulation.13 However, glycogen accumulation in cells is subject to a number of variables and cannot be used for diagnostic purposes. Glycogen manifests itself in cellular spectra by three intense peaks at approximately 1025, 1080, and 1150 cm⁻¹ with a distinct and reproducible intensity pattern (see Fig. 1C).

Since the glycogen variations are strong and subject to uncontrollable variables, we generally omit the glycogen region and use the 1480 to 1700 cm⁻¹ ("protein") region for diagnostic purposes. However, the analysis shown in Fig. 1A utilizes the entire "fingerprint" region from 900 to 1800 cm⁻¹. This figure, in which each symbol represents the spectrum of one individual cell, depicts a typical separation of spectra according to an underlying variation in the cells' spectra. In this case, the cells were exfoliated from six subjects and grouped by the time elapsed since the last menstrual period (LMP): three women were at the beginning of the menstrual cycle (approximately 4 days since the LMP), and three were at the midpoint of the cycle (approximately 14 days since the LMP).

The separation of the two classes, shown in Fig. 1A, is by no means perfect but represents the typical variance observed in SCP. First and foremost, there are always cells that have much higher or much lower glycogen content than the majority of cells in a sample; the reasons for this variation are not understood. Second, the LMP parameter is only an approximate indicator of hormonal status, since the length of the menstrual cycle may vary between subjects. Third, there may be other, hitherto unrecognized changes in cellular spectra with maturation that manifest themselves as changes along PC3. Nevertheless, there is a distinction between early and mid-cycle cells depicted in this graph. The question then arises: which spectral differences allow PCA to group the spectra into two classes? The PCA algorithm presents, in order of decreasing variance, the loading vectors along which the differentiation of spectra occurred.

The interpretation of loading vectors is difficult since they are derived from the eigenvectors of the covariance matrix (see Eq. 2), which may change in an unpredictable manner when the data set parameters are changed. For example, the PCs will differ depending on whether a data set contains 100 spectra of class A and 1000 spectra of class B, or 1000 spectra of class A and 100 spectra of class B, since the total variance in the data set changes. Furthermore, the loading vectors change unpredictably if another class, C, of spectra is introduced into a data set, since the variance between classes A and C may be larger than the variance between classes A and B.

However, the data set to be subjected to PCA can be configured such that the loading vectors have an interpretable physical meaning. To achieve this,

(1) the input data set must be restricted to two classes of spectra,

(2) the spectra in each data set must be appropriately normalized,

(3) there should be equal numbers of spectra in each class, and

(4) the data should separate mostly along PC2 (loading vector 2).

Under these conditions, PC2 is the most differentiating feature, or the difference spectrum between the classes. This is due to the fact that, for the highly similar spectra of cells, PC1 is proportional to the mean spectrum, whereas PC2 captures the largest variation among the spectra. This is demonstrated in Fig. 1B, which shows a comparison between the second loading vector and the difference spectrum of the data set shown in Fig. 1A.† The differences between the second loading vector and the difference spectrum in the protein region (1480 to 1700 cm⁻¹) are most likely due to the fact that there is some variance along PC3 in the data set, as indicated in Fig. 1A.

FIG. 1. (A) PCA "scores" plot of cervical cells (6 subjects) early and midway in the menstrual cycle. (B) Second loading vector (red) from PCA, and (black) mean second-derivative difference spectrum of cellular spectra from the dataset shown in (A). (C) Integrated second loading vector (red), glycogen (black), and cellular protein (blue) reference spectra. (D–G) High-magnification (40×) micrographs of typical cells contained in the dataset shown in Fig. 1A.

† The difference spectrum shown in Fig. 1B was obtained by subtracting the mean second-derivative, vector-normalized spectra of the two classes. Second-derivative spectra eliminate the background features, which make the subtraction of the original spectra difficult and sometimes meaningless. This will be further discussed in the next section.
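The argument that PC2 approximates the class-difference spectrum under conditions (1) to (4) can be checked on synthetic data. In the sketch below, two hypothetical "class templates" plus Gaussian noise stand in for real spectra; every spectrum is vector-normalized, the classes are balanced, and PCA is run without mean-centering via Eqs. 1 and 2. Normalization matters here: when two class means have (nearly) equal length, their sum and difference are (nearly) orthogonal, because (u + v)·(u − v) = |u|² − |v|², so the mean spectrum and the difference spectrum can occupy separate loading vectors.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n_per_class = 200, 300
x = np.linspace(0.0, 1.0, m)

def band(centre, width):
    return np.exp(-((x - centre) / width) ** 2)

# Hypothetical class templates: class B has one extra band (synthetic data only)
template_a = band(0.3, 0.05) + 0.8 * band(0.7, 0.04)
template_b = template_a + 0.2 * band(0.5, 0.03)

def sample(template, n):
    """n noisy copies of a template, each vector-normalized (condition 2)."""
    spectra = template + 0.02 * rng.normal(size=(n, m))
    return spectra / np.linalg.norm(spectra, axis=1, keepdims=True)

A = sample(template_a, n_per_class)        # equal class sizes (condition 3)
B = sample(template_b, n_per_class)
S = np.vstack([A, B]).T                    # m x n, one spectrum per column
evals, P = np.linalg.eigh(S @ S.T)         # uncentered PCA via Eqs. 1 and 2
P = P[:, np.argsort(evals)[::-1]]          # loading vectors, decreasing variance

def abs_cos(u, v):
    """|cosine| between two vectors (loading-vector signs are arbitrary)."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

mean_spectrum = S.mean(axis=1)
difference = A.mean(axis=0) - B.mean(axis=0)
print(abs_cos(P[:, 0], mean_spectrum), abs_cos(P[:, 1], difference))
```

Both printed values should be close to 1 for this synthetic setup, i.e., PC1 follows the mean spectrum and PC2 follows the class-difference spectrum.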

The interpretation of the loading vector is complicated because it is based on second-derivative spectra. The necessity of utilizing second derivatives has been pointed out above. However, integrating the second-derivative loading vector twice yields a spectral trace that may be interpreted as the spectral component along which PCA distinguishes the spectral classes. This "signature spectrum" is shown in Fig. 1C, along with pure glycogen and protein reference spectra.2 It is interesting to note that the glycogen and protein reference spectra and the "signature spectrum" differ in frequencies and band shapes: an inspection of the 900 to 1200 cm⁻¹ region reveals that the lowest-frequency glycogen peak at approximately 1025 cm⁻¹ undergoes a shift toward lower wavenumber, indicating that the spectral features of pure glycogen and of glycogen in a cell are somewhat different. This may be attributed to a different association or hydration of glycogen in the two cases.
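The text does not state how the integration was carried out or how the two integration constants were fixed. A minimal sketch, assuming a uniformly sampled axis: integrate twice with a cumulative trapezoidal rule that starts at zero, then subtract the straight line through the two end points to remove the linear term left by the unknown constants. The usage checks, on a synthetic band, that integrating its second derivative recovers the band shape; only the shape of the result is meaningful, not its absolute scale.

```python
import numpy as np

def cumtrapz0(y, x):
    """Cumulative trapezoidal integral of y over x, starting at 0."""
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    return np.concatenate(([0.0], np.cumsum(steps)))

def integrate_twice(d2, x):
    """Integrate a second-derivative trace twice and remove the linear term
    left by the unknown integration constants (end-point line subtraction)."""
    f = cumtrapz0(cumtrapz0(d2, x), x)
    line = f[0] + (f[-1] - f[0]) * (x - x[0]) / (x[-1] - x[0])
    return f - line

x = np.linspace(900.0, 1200.0, 301)                  # 1 cm^-1 spacing
band = np.exp(-((x - 1025.0) / 10.0) ** 2)           # synthetic glycogen-like band
d2 = np.gradient(np.gradient(band, x), x)            # its second derivative
recovered = integrate_twice(d2, x)
print(recovered @ band / (np.linalg.norm(recovered) * np.linalg.norm(band)))  # close to 1
```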

The "signature spectrum" also contains some negative, protein-related features between 1480 and 1700 cm⁻¹. These features coincide with the low-wavenumber shoulders on the amide I and amide II peaks and can be associated with changes in the protein composition of the cells as a consequence of maturation. Similar behavior, to be discussed in the next section, was seen for different examples in which strong phosphate vibrations were negatively correlated with protein features. The images of stained cells, shown in Figs. 1D through 1G, are from cells in the dataset and demonstrate that the differences detected by SCP cannot be discerned in visual cytopathology.

We now turn to the extraction of the signature spectra of oral dysplasia and oral cancer in SCP. We have recently shown14 that the spectra of cells from the oral cavity depend on the region of cell exfoliation: spectra of cells from the cheeks (buccae), the gums (gingivae), and the roof of the mouth (palate) are identical, whereas cells from the tongue and the floor of the mouth (under the tongue) are slightly different in composition. Morphologically, all the aforementioned oral cell types are indistinguishable. Thus, in order to diagnose disease, cells from the same anatomical structures of the mouth should be used.

Figure 2 shows results for cancerous cells collected from the tongue. Completely analogous datasets were collected for other areas of the oral cavity as well.14 The dataset shown in Fig. 2 consists of three classes of spectra:

(1) normal cells from four volunteers, with cell morphologies exemplified by Fig. 2B. These cells are represented by the blue cluster in Fig. 2A;

(2) cancerous cells collected directly from the lesion, with morphologies represented by Fig. 2D. These cells are represented by the red cluster in Fig. 2A; and

(3) cells with normal morphology (Fig. 2C) from the tongue of the cancer patients, but adjacent to the cancerous lesion. These cells are represented by the green cluster in Fig. 2A.

(Images of cells were obtained, as described above, from cells that were stained subsequent to spectral data acquisition.)

Although the majority of cells collected from the tongues of the cancer patients showed normal morphology (Fig. 2C), SCP detects a distinct spectral change between these cells and truly normal cells. In fact, there is a progression in the location within the PC2/PC3 plot from the tongue cells of normal volunteers, to the tongue cells with normal morphology from cancer patients, to the actual cancer cells (Fig. 2A). The biochemical composition of the cells from cancer patients that still exhibit normal morphology is different from that of truly normal cells. These changes in morphologically normal cells from cancer patients have been termed "malignancy associated changes" (MACs) in the medical literature.15,16

Thus, it appears that SCP differentiates normal cells from MAC cells, and normal cells from cancerous cells of the same disease type.

In order to extract the signatures of disease, the data were split into normal versus MAC and normal versus cancer datasets, because the loading vectors can only be interpreted reliably if there are only two classes of cells in the data set (see above). Furthermore, for the extraction of the "signature of disease" spectra, equal numbers of spectra in each of the classes were utilized. The resulting signature spectra for MAC and cancer are shown in Figs. 2E and 2F. In both these plots, the red trace represents the second loading vector and the black trace represents the mean second-derivative difference spectrum. The "signature of disease" spectra are similar in both cases, indicating that the spectral changes between normal/MAC and normal/cancer are closely related.
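The text states that equal numbers of spectra per class were used but not how they were chosen. One simple assumption is random subsampling of the larger class without replacement, sketched below; the class names and counts in the usage lines are hypothetical placeholders.

```python
import numpy as np

def balanced_two_class(spectra, labels, class_a, class_b, seed=0):
    """Return a data set with equal numbers of spectra from two classes.

    The larger class is randomly subsampled to the size of the smaller one.
    spectra: (n, m) array, one spectrum per row; labels: (n,) class labels.
    """
    spectra, labels = np.asarray(spectra), np.asarray(labels)
    rng = np.random.default_rng(seed)
    idx_a = np.flatnonzero(labels == class_a)
    idx_b = np.flatnonzero(labels == class_b)
    k = min(idx_a.size, idx_b.size)
    keep = np.concatenate([rng.choice(idx_a, k, replace=False),
                           rng.choice(idx_b, k, replace=False)])
    return spectra[keep], labels[keep]

rng = np.random.default_rng(0)
labels = np.array(["normal"] * 120 + ["MAC"] * 80 + ["cancer"] * 50)  # hypothetical counts
spectra = rng.normal(size=(labels.size, 10))                         # placeholder spectra
X, y = balanced_two_class(spectra, labels, "normal", "cancer")
print(X.shape, (y == "normal").sum(), (y == "cancer").sum())         # → (100, 10) 50 50
```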

FIG. 2. (A) PCA "scores" plot of cells exfoliated from the tongues of normal volunteers (blue), of cells with normal morphology but showing MAC from cancer patients (green), and of cancerous cells (red). (B–D) High-magnification (40×) micrographs of typical cells from the three classes shown in (A). (E) Second loading vector (red) and mean second-derivative difference spectrum (black) for normal and MAC cells. (F) Second loading vector (red) and mean second-derivative difference spectrum (black) for normal and cancerous cells. (G) Integrated loading vector and difference spectrum for normal and cancerous cells.

The integrated "signature spectra" are shown in Fig. 2G (red trace: integrated loading vector; black trace: integrated difference spectrum) for the cancer cells (Fig. 2F). These "signature spectra" indicate that the most prominent differences between normal and cancerous cells are protein features, although there are significant phosphate-ester vibrational features at 1230 and 1080 cm⁻¹, presumably from DNA, RNA, or phospholipids. The magnitude of the integrated loading vector is indeterminate since the eigenvectors, and consequently the PCs, depend on the total variance in the dataset. However, the second-derivative difference spectra may be used to estimate the magnitude of the "signature spectrum" shown in Fig. 2G, which is between 5 and 10% of the overall spectral intensity.
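The measure behind the 5 to 10% estimate is not defined in the text. One plausible choice, assumed here, compares the length of the difference between the two class-mean second-derivative spectra with the length of their average; for vector-normalized spectra the denominator is close to 1. The function name is illustrative.

```python
import numpy as np

def relative_difference(class_a, class_b):
    """||mean_a - mean_b|| / ||(mean_a + mean_b) / 2|| for two sets of spectra (rows)."""
    mean_a = np.mean(class_a, axis=0)
    mean_b = np.mean(class_b, axis=0)
    return np.linalg.norm(mean_a - mean_b) / np.linalg.norm(0.5 * (mean_a + mean_b))

# Toy two-point "spectra": 0.1 / 0.95, i.e. roughly 0.105
print(relative_difference([[1.0, 0.0]], [[0.9, 0.0]]))
```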

Signature Extraction in Spectral Histopathology. The signature extraction for tissue analyzed by SHP can be carried out in a similar fashion. In the procedural example discussed next, a "signature spectrum" of lymphocyte activation will be extracted. Lymphocyte activation is a primary response of the immune system to infection or other diseases. B-lymphocytes are presented with the antigen in the lymph nodes and are trained to recognize it. Subsequently, they start to proliferate as activated lymphocytes in the secondary lymph node follicles. The primary follicles are assumed to contain mostly non-activated lymphocytes. The activation of lymphocytes is a highly complex process, and the associated pathways and molecular changes are only now being understood.17

Here, we report a method to extract the difference spectrum between non-activated and activated B-lymphocytes. To this end, 50 spectra each were selected from two regions of a lymph node section that were identified by HCA and classical histopathology as non-activated and activated B-lymphocytes in the primary and secondary follicles, respectively (Fig. 3A). These regions of interest are indicated by the two arrows in Fig. 3A and are represented by the yellow (activated) and pink (non-activated) regions.

Principal component analysis was carried out for these spectra (Fig. 3B), and the two classes of spectra were found to separate mostly along PC2. The resulting second loading vector is shown in Fig. 3C (red trace). In addition, the mean second-derivative difference spectrum was calculated (Fig. 3C, black trace). For subtraction of HCA mean cluster spectra, it is advantageous to work with normalized second-derivative spectra, since subtraction of the original absorbance spectra is plagued by background fluctuations, which are eliminated in second-derivative difference spectra. Below 1400 cm⁻¹, the second loading vector and the difference spectrum are nearly identical, but slightly different traces are observed in the protein region. This indicates that PC2 does not contain all the distinguishing features between the spectral classes, which is also evident from the splitting along PC3.
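A minimal Python sketch of this comparison is given below, assuming NumPy and SciPy. It is not the authors' processing chain: the Savitzky-Golay settings (13-point window, cubic polynomial), the unit-vector normalization, the rule used to pick the class-separating principal component, and the synthetic Gaussian "spectra" are all hypothetical choices made for illustration. The code computes normalized second-derivative spectra for two classes of 50 spectra each, runs PCA through a singular value decomposition, and checks how closely the class-separating loading vector tracks the mean second-derivative difference spectrum.

```python
import numpy as np
from scipy.signal import savgol_filter


def second_derivative_normalized(spectra, window=13, polyorder=3):
    """Savitzky-Golay second derivative of each row, then unit-vector normalization.
    Window and polynomial order are illustrative assumptions."""
    d2 = savgol_filter(spectra, window_length=window, polyorder=polyorder, deriv=2, axis=1)
    return d2 / np.linalg.norm(d2, axis=1, keepdims=True)


def separating_loading(X, labels, n_components=3):
    """PCA via SVD of mean-centred data. Returns the index of the PC whose scores best
    separate the two classes, its loading (sign aligned with the mean difference),
    and the mean second-derivative difference spectrum (class 1 minus class 0)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    a, b = scores[labels == 0], scores[labels == 1]
    separation = np.abs(a.mean(0) - b.mean(0)) / np.sqrt(0.5 * (a.var(0) + b.var(0)))
    k = int(np.argmax(separation))
    diff = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
    loading = Vt[k] * np.sign(Vt[k] @ diff)
    return k, loading, diff


# Synthetic stand-in data (hypothetical): Gaussian bands on a 1 cm-1 grid.
wn = np.arange(900.0, 1801.0)


def band(center, amp, width=10.0):
    return amp * np.exp(-0.5 * ((wn - center) / width) ** 2)


rng = np.random.default_rng(0)
spectra, labels = [], []
for cls, na_amp in [(0, 0.2), (1, 0.5)]:          # class 1 has a stronger 1085 cm-1 band
    for _ in range(50):                            # 50 spectra per class, as in the text
        s = (band(1655, 1.0) + band(1545, 0.6)
             + band(1240, rng.uniform(0.3, 0.7))   # within-class variation
             + band(1085, na_amp))
        s = s * rng.uniform(0.5, 1.5) + rng.normal(0.0, 0.001, wn.size)
        spectra.append(s)
        labels.append(cls)

X = second_derivative_normalized(np.array(spectra))
labels = np.array(labels)
k, loading, diff = separating_loading(X, labels)
r = np.corrcoef(loading, diff)[0, 1]
print(f"class-separating PC: PC{k + 1}, correlation with mean difference: {r:.3f}")
```

With measured data, the array passed to `second_derivative_normalized` would hold the exported spectra on a common wavenumber axis. In this synthetic case the separating component need not be PC2, which is why the sketch selects it with a simple class-separation score instead of hard-coding the index.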

The integrated difference spectrum, shown in Fig. 3D, represents the biochemical differences between activated and non-activated B-lymphocytes. Inspection of this trace reveals that the 900–1300 cm⁻¹ region of this spectrum is dominated by nucleic acid features, whereas the protein spectral region is dominated by a broad negative band. The implications of these signature features are addressed in the Discussion section.
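The "integrated" traces convert second-derivative features back into an absorbance-like shape. One way to sketch this step, assuming that integration means a double numerical integration of the second-derivative trace with the two unknown integration constants removed as a straight line through the end points (an assumption; the procedure is not spelled out here), is shown below. The input is a hypothetical difference spectrum with one positive and one negative Gaussian band, and the Savitzky-Golay settings are again illustrative.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.signal import savgol_filter


def integrate_second_derivative(d2, wn):
    """Integrate a second-derivative trace twice; the two unknown integration
    constants are removed as a straight line through the end points (assumption)."""
    once = cumulative_trapezoid(d2, wn, initial=0.0)
    twice = cumulative_trapezoid(once, wn, initial=0.0)
    baseline = np.interp(wn, [wn[0], wn[-1]], [twice[0], twice[-1]])
    return twice - baseline


wn = np.arange(900.0, 1801.0)  # 1 cm-1 grid (assumed)


def band(center, amp, width=10.0):
    return amp * np.exp(-0.5 * ((wn - center) / width) ** 2)


# Hypothetical difference spectrum: a gain near 1085 cm-1 and a loss near 1650 cm-1.
true_diff = band(1085.0, 0.3) - band(1650.0, 0.2)
d2 = savgol_filter(true_diff, window_length=13, polyorder=3, deriv=2)
recovered = integrate_second_derivative(d2, wn)

r = np.corrcoef(recovered, true_diff)[0, 1]
print(f"correlation with the true difference spectrum: {r:.4f}")
print("maximum at", wn[np.argmax(recovered)], "cm-1; minimum at", wn[np.argmin(recovered)], "cm-1")
```

Because the loading vectors and normalized difference spectra carry arbitrary scale, only the shape of the recovered trace is meaningful; the positive and negative lobes return at the positions of the gained and lost bands.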

Similarly, we have applied the same signature-extraction strategy to a region of an abnormal tissue section in which keratin pearls had formed. Keratin pearls, although non-malignant, occur frequently within malignant tissue changes, and their detection by spectroscopic techniques has been reported by others.18 Spectra from normal squamous oral epithelium and from an abnormal tissue section that contained keratinizing areas were selected and processed as described above. These tissue areas are indicated by the arrows in Fig. 4A. Panel a shows the typical layers of normal stratified squamous epithelium (blue, gray, and dark green) overlying the stroma (pink and light green). Panel b of Fig. 4A shows a tissue section with histopathologically diagnosed oral cancer, indicated by the disorganized epithelial structure with numerous keratin deposits (gray). The keratin deposit used for this analysis is indicated by the arrow.

Principal component analysis reveals a very clean separation of the tissue types (see Fig. 4B), and the second loading vector and the difference spectrum, both shown in Fig. 4C, are virtually identical and show prominent signals only in the protein region. This indicates that the selected tissue regions differ mostly by one spectral component, keratin. However, the integrated loading vector did not yield a pure keratin spectrum, but a spectrum that represents the difference between the normal cellular protein spectrum and keratin. Thus, the tissue regions do not differ merely by the over-expression of keratin, but also by a significant decrease in other proteins, such that a difference spectrum between the two protein contributions is observed. In fact, difference spectra between various protein secondary structures appear quite similar to the spectrum shown in Fig. 4D.
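The derivative-like shape of such protein difference spectra can be illustrated with two synthetic amide I bands of equal area. The band positions (about 1655 cm⁻¹ for an α-helical and 1630 cm⁻¹ for a β-sheet protein) are typical literature values used here as assumptions, and the Gaussian shape and 12 cm⁻¹ width are arbitrary choices; this is an illustration of the band-position effect, not a model of keratin.

```python
import numpy as np

wn = np.arange(1580.0, 1721.0)  # amide I region on a 1 cm-1 grid


def amide_i(center, width=12.0):
    """Unit-area Gaussian stand-in for an amide I band (shape and width are assumptions)."""
    return np.exp(-0.5 * ((wn - center) / width) ** 2) / (width * np.sqrt(2.0 * np.pi))


helix = amide_i(1655.0)  # typical alpha-helix amide I position (assumed)
sheet = amide_i(1630.0)  # typical beta-sheet amide I position (assumed)
difference = helix - sheet  # "helical protein minus sheet protein"

print("positive lobe at", wn[np.argmax(difference)], "cm-1")
print("negative lobe at", wn[np.argmin(difference)], "cm-1")
print("net area:", round(float(difference.sum()), 6))  # grid spacing is 1 cm-1
```

The result is a positive lobe on the helix side and a negative lobe on the sheet side with near-zero net area, so the trace resembles a derivative rather than the spectrum of any single compound.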

DISCUSSION

In this Focal Point article, we present for the first time the spectral features that are used by chemometric methods to distinguish between tissue types, cellular features, and disease. These signature spectra open the opportunity to interpret the variations of spectral features in terms of the gross molecular changes that occur inside cells. Another approach to interpreting spectral differences quantitatively has been used by the research group of N. Stone at Cranfield University in Bedfordshire, UK.19 In this approach, spectra of tissues in different disease states were decomposed into spectral contributions from pure reference compounds. It was found that the relative contributions of these reference spectra underwent significant and reproducible changes between different stages of disease and allowed reliable in vivo diagnoses of bladder and esophageal tissue. This approach, however, works best if a very comprehensive set of reference compounds is used; furthermore, it assumes that the final spectrum of a cell or tissue pixel obeys a linear mixing model. This latter point is of

FIG. 3. (A) Infrared spectral map of a lymph node tissue section, constructed via HCA. The arrow in panel (a) points to an area from which spectra of non-activated B-lymphocytes were exported, whereas the arrow in panel (b) points to an area from which activated B-lymphocyte spectra were exported. (B) PCA "scores" plot of activated vs. non-activated lymphocytes. (C) Second loading vector (red) and mean second-derivative difference spectra (black) for activated vs. non-activated B-lymphocytes. (D) Integrated difference spectra for activated vs. non-activated lymphocytes.


particular interest, since the spectral signatures derived here suggest that the observed spectra of cells and tissues are not always a linear combination of basis spectra. This is demonstrated, for example, by the comparison of the pure glycogen absorption spectrum with the integrated second loading vector shown in Fig. 1C, which suggests that the 1025 cm⁻¹ glycogen band undergoes intensity and frequency shifts in a cellular environment. Furthermore, some components, notably DNA, are never found inside a cell without being complexed to protein. Thus, DNA/histone spectra, rather than pure DNA spectra, should probably be used as reference spectra to account for the spectral changes DNA experiences upon binding to protein.
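The consequence of a band shift for a linear mixing model can be sketched with a least-squares decomposition. This is not the procedure of ref. 19: ordinary least squares (`numpy.linalg.lstsq`) stands in for a reference-spectrum fit, the three "reference spectra" are hypothetical Gaussian stand-ins for glycogen, protein, and DNA, and the 5 cm⁻¹ shift of the glycogen-like band is an arbitrary illustrative value. When the pixel spectrum is an exact linear combination of the references, the fit is perfect; shifting one band leaves a residual that no choice of coefficients can remove.

```python
import numpy as np

wn = np.arange(900.0, 1801.0)  # 1 cm-1 grid (assumed)


def band(center, amp=1.0, width=10.0):
    """Gaussian stand-in for an absorption band (illustrative only)."""
    return amp * np.exp(-0.5 * ((wn - center) / width) ** 2)


# Hypothetical "pure reference" spectra as columns of a basis matrix.
references = np.column_stack([
    band(1025.0),                          # glycogen-like
    band(1655.0) + band(1545.0, 0.6),      # protein-like (amide I and II)
    band(1085.0) + band(1240.0, 0.7),      # DNA-like
])
true_coef = np.array([0.3, 1.0, 0.2])


def fit(spectrum):
    """Ordinary least-squares decomposition under a linear mixing model."""
    coef, *_ = np.linalg.lstsq(references, spectrum, rcond=None)
    residual = np.linalg.norm(spectrum - references @ coef)
    return coef, residual


# Case 1: the pixel spectrum really is a linear mixture of the references.
mixture = references @ true_coef
coef_exact, res_exact = fit(mixture)

# Case 2: same composition, but the glycogen-like band is shifted by 5 cm-1
# (hypothetical environment-induced shift), so no exact linear solution exists.
mixture_shifted = mixture - band(1025.0, 0.3) + band(1030.0, 0.3)
coef_shifted, res_shifted = fit(mixture_shifted)

print("exact:  ", np.round(coef_exact, 3), "residual", res_exact)
print("shifted:", np.round(coef_shifted, 3), "residual", round(float(res_shifted), 3))
```

In the shifted case the fitted glycogen coefficient is also biased low, since the unshifted reference band overlaps the shifted band only partially.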

Similarly, it appears that protein spectra undergo quite significant changes that may be positively or negatively correlated with changes in other cellular components. In the example of the cervical cells shown in Fig. 1, an increase in the glycogen contribution is correlated with a decrease in a protein component that exhibits low amide I and II frequencies. Significant shifts in amide I and II frequencies, and/or the appearance of shoulders on these peaks, have been reported for a number of cellular effects or disease states.20,21 In the method of signature extraction reported here, no a priori knowledge of protein spectral changes is required, whereas in an approach that uses spectral basis vectors from reference compounds, proteins with different secondary structures need to be included for proper identification of the spectral changes.

We now turn to the discussion of the cytological results for cells exfoliated from the oral cavity. The similarity of the spectral changes observed for MACs and oral cancer seems to indicate a progression of disease in terms of the spectral changes. From a diagnostic viewpoint, this observation is extremely important. First, the similarity of the MAC and cancer spectra sheds light on the elusive subject of MACs, which are defined as "... subtle alterations in the morphology and nuclear texture of cells in the vicinity of a malignant lesion." MACs have been implicated in the high rate of tumor recurrence, particularly in the oral cavity, which is 15 to 20 fold higher15,16 than the occurrence of a primary tumor. The fact that SCP detects subtle changes in cellular composition before morphological changes occur in the majority of cells has enormous implications for screening for oral (and other epithelial) cancers. We have seen completely analogous results for exfoliated cervical cells,22 and Mahadevan-Jansen at Vanderbilt University has observed similar effects by in vivo Raman diagnostics of cervical tissue.23

Secondly, the observed changes between normal and cancerous spectra appear quite similar to the spectral signature observed for viral infections.14

Virally infected cells were collected directly from a cold sore caused by infection with the herpes simplex virus, and from the surrounding tissue. Here, too, a linear progression of disease was detectable between cells collected directly from the cold sore, which could be diagnosed as infected cells, and cells collected in the vicinity of the lesion, which still had normal morphology. The similarity of the viral disease signature to the cancer signatures shown in Fig. 2 raises the question of whether oral MAC and cancer are viral in origin, particularly since the Epstein–Barr virus and human papillomavirus (HPV) have been implicated in oral cancers.24,25 Furthermore, cervical dysplasia is nearly always associated with HPV, which suggests a link between oral cancer and viral infections. On the other hand, the observed spectral signatures of viral infection, MAC, and cancer could be the cellular response to any serious disease. The results presented here do not determine the origin of disease, but indicate the most significant change in cellular composition brought on by disease.

Finally, we turn to the discussion of the spectral changes observed in SHP for lymphocyte activation. This is a very common biological process in which antigen-presenting cells interact with B-lymphocytes, which subsequently proliferate in the secondary lymphoid follicles (germinal centers) to produce activated B-lymphocytes specifically "trained" to recognize the antigen. We first reported the observation of distinct primary and secondary germinal centers in lymph nodes in 2004,26 which was confirmed later for activated and non-activated lymphocytes in the spleen.27

Until now, the spectral signature of lymphocyte activation was not known. Figure 3D shows the signature spectrum of lymphocyte activation. Nearly identical signature spectra (activated vs. non-activated) were obtained for mouse lymphocytes and for human lymphocytes from axillary lymph nodes. In the 900 to 1300 cm⁻¹ region, the spectral trace appears quite similar to DNA/protein spectral features reported earlier.6 Thus, one aspect of lymphocyte activation can be characterized by an increase in DNA spectral features. However, it should be noted that the DNA signals in the signature spectrum are shifted by approximately 10 cm⁻¹ toward lower wavenumber, indicating a significant change in local environment upon activation. This again points to the difficulty of using fixed reference spectra to interpret complex, interacting mixtures of biological molecules. Above 1500 cm⁻¹, the signature spectrum (Fig. 3D) is superimposed on a broad negative band on which the amide I and II bands are barely recognizable, indicating that lymphocyte activation is associated with a large change in the proteome of the cells, in addition to changes in DNA/RNA abundance and structure. In contrast, the signature spectrum of keratinization, shown in Fig. 4D, exhibits sizeable features only in the amide I spectral region. This implies that the overall composition of the tissue pixels changes in favor of keratin, and at the expense of other proteins, such that a true difference spectrum is observed. This difference spectrum is not readily amenable to spectroscopic interpretation but is quite similar to the protein difference spectrum one obtains when subtracting a sheet protein from a helical protein, or from any other pair of protein spectra with strongly different amide I peak positions.

"Signature of disease" spectra have been obtained for a number of cases in both SCP and SHP, and these will be reported at a later date. We believe that


FIG. 4. (A) Infrared spectral maps of oral tissue sections, constructed via HCA. (a) Normal section, showing layers of the epithelium (dark blue, gray, and dark green) and the underlying stroma (red, green). Light blue areas are due to very sparse tissue and may be ignored. The arrow points to an area from which spectra of normal epithelium were exported. (b) Cancerous section, showing disorganized epithelial structure with numerous keratin deposits (gray). The arrow points to an area from which spectra of keratinized epithelium were exported. (B) PCA "scores" plot of normal vs. keratinized epithelium. (C) Second loading vector (red) and mean second-derivative difference spectra (black) for normal vs. keratinized epithelium (the traces are offset for clarity). (D) Integrated difference spectra for normal vs. keratinized epithelium.


these "signature spectra" contain concentrated, and therefore more interpretable, spectral information. This may be explained as follows. In an individual cell, for example, there will be biochemical components that remain more or less unchanged between a normal and a diseased cell, whereas other protein components may be up- or down-regulated by disease. The method of spectral signature extraction discards the invariant spectral information and emphasizes the spectral changes. Thus, we believe that the "signature spectra" will be even more sensitive than the overall spectra as a diagnostic tool.

CONCLUSION

A method to extract the spectral changes responsible for the classification of spectra in SCP and SHP is presented. These spectral changes can be directly related to variations in the chemical composition of the voxels from which the spectra were collected. The variations in chemical composition, in turn, may be due to one or a few compounds, such as glycogen or keratin in the examples discussed above. In general, however, these variations produce complex spectral patterns, because the spectra sample a superposition of all compositional changes. This is further indicated by the fact that the spectral signature of one compound, e.g., glycogen, appears as a positive pattern in the signature spectra, whereas the spectral patterns of other components (e.g., protein) appear as a negative contribution in the same spectrum. This indicates that glycogen and protein contributions are anti-correlated. The shifts in protein and glycogen band positions indicate that the signature spectra do not just reflect abundance (for example, a cell with more glycogen has less total protein), but also detect spectral changes that are due to interactions between the components.

ACKNOWLEDGMENTS

Partial support of this work by a grant (CA090346) from the NIH (to M.D.) is gratefully acknowledged. The authors wish to express their gratitude to the staff at Tufts University Medical Center (Ms. Kristi Bedrossian and Dr. Nora Laver) for their cytological diagnoses.

1. P. Lasch and J. Kneipp, Biomedical Vibrational Spectroscopy (Wiley-Interscience, Hoboken, NJ, 2008), pp. 121–147.

2. M. Diem, P. R. Griffiths, and J. M. Chalmers, Vibrational Spectroscopy for Medical Diagnosis (John Wiley and Sons, Chichester, UK, 2008).

3. R. Salzer and H. W. Siesler, Infrared and Raman Spectroscopic Imaging (Wiley-VCH Verlag, Weinheim, Germany, 2008).

4. M. J. Adams, Chemometrics in Analytical Spectroscopy (Royal Society of Chemistry, Cambridge, 2004).

5. A. de Juan, M. Maeder, T. Hancewicz, L. Duponchel, and R. Tauler, in Chemometric Tools for Image Analysis, R. Salzer and H. W. Siesler, Eds. (Wiley-VCH, Weinheim, 2009), pp. 65–106.

6. M. Diem, S. Boydston-White, and L. Chiriboga, Appl. Spectrosc. 53, 148A (1999).

7. G. N. Papanicolaou and H. F. Traut, Am. J. Obstet. Gynecol. 42, 193 (1941).

8. B. Bird, M. J. Romeo, N. Laver, and M. Diem, J. Biophotonics 2, 37 (2009).

9. M. J. Romeo, B. Bird, S. Boydston-White, C. Matthaus, M. Miljkovic, T. Chernenko, and M. Diem, "Infrared and Raman Micro-spectroscopic Studies of Individual Human Cells", in Vibrational Spectroscopy for Medical Diagnosis, M. Diem, P. Griffiths, and J. Chalmers, Eds. (J. Wiley-Interscience, Chichester, UK, 2008), pp. 27–70.

10. X. Liang, J. E. Andrews, and J. A. de Haseth, Anal. Chem. 68, 378 (1996).

11. J. H. Ward, J. Am. Stat. Assoc. 58, 236 (1963).

12. P. Lasch, www.Cytospec.com.

13. M. J. Romeo, B. R. Wood, and D. McNaughton, Vib. Spectrosc. 28, 167 (2002).

14. K. Papamarkakis, B. Bird, J. M. Schubert, M. Miljkovic, K. Bedrossian, N. Laver, and M. Diem, Lab. Invest., paper submitted (2009).

15. G. R. Ogden, J. G. Cowpe, and A. J. Wight, J. Oral Pathol. Med. 26, 201 (1997).

16. G. R. Ogden, J. G. Cowpe, and M. W. Green, Cancer 65, 477 (1990).

17. K. A. Frauwirth and C. B. Thompson, J. Clin. Invest. 109, 295 (2002).

18. C. Schultz, K.-Z. Liu, P. Kerr, and H. H. Mantsch, Oncol. Res. 10, 277 (1998).

19. N. Stone, C. Kendall, and H. Barr, "Raman Spectroscopy as a Potential Tool for Early Diagnosis of Malignancies in Esophageal and Bladder Tissues", in Vibrational Spectroscopy for Medical Diagnosis, M. Diem, P. Griffiths, and J. Chalmers, Eds. (J. Wiley-Interscience, Chichester, UK, 2008), pp. 203–230.

20. S. Boydston-White, M. J. Romeo, T. Chernenko, A. Regina, M. Miljkovic, and M. Diem, Biochim. Biophys. Acta 1758, 908 (2006).

21. N. Jamin, L. Miller, J. Moncuit, W. H. Fridman, P. Dumas, and J. L. Teillaud, Biopolym. (Biospectrosc.) 72, 366 (2003).

22. J. M. Schubert, K. Papamarkakis, B. Bird, M. Miljkovic, K. Bedrossian, N. Laver, and M. Diem, Analyst, paper submitted (2009).

23. U. Utzinger, D. L. Heintzelmann, A. Mahadevan-Jansen, A. Malpica, M. Follen, and R. Richards-Kortum, Appl. Spectrosc. 55, 955 (2001).

24. S. Porter and A. Waugh, Brit. Dental J. 188, 366 (2000).

25. J. Klozar, V. Kratochvil, M. Salakova, J. Shmahelova, E. Vesela, E. Hamsikova, J. Betka, and R. Tachezy, Eur. Arch. Otorhinolaryngol. 265, S75 (2008).

26. M. J. Romeo and M. Diem, Vib. Spectrosc. 38, 115 (2005).

27. C. Krafft, R. Salzer, G. Soff, and M. Meyer-Hermann, Cytometry, Part A 64A, 53 (2005).
