
Journal of Plankton Research Vol.18 no.7 pp.1225-1238, 1996

Discrimination of marine phytoplankton species through the statistical analysis of their flow cytometric signatures

M.R.Carr, G.A.Tarran and P.H.Burkill
Plymouth Marine Laboratory, Prospect Place, West Hoe, Plymouth PL1 3DH, UK

Abstract. Flow cytometry is a research technique for the rapid analysis of phytoplankton abundance and distribution in marine waters. Although the technique is inherently much faster than optical microscopy for counting phytoplankton, its capability for analysing individual taxa is restricted, due largely to the lack of suitable data analysis protocols. These protocols, which use univariate and bivariate plots, can typically differentiate a maximum of three or four phytoplankton taxa under laboratory conditions. We present here two multivariate statistical techniques, quadratic discriminant analysis (QDA) and canonical variate analysis (CVA), used to identify 32 species of phytoplankton from flow cytometric data. CVA was shown to be a useful graphical technique for analysing and displaying data, while QDA was successful at discriminating over two-thirds of the phytoplankton species, with classification rates >70%. QDA was also shown to be more than two orders of magnitude faster than conventional flow cytometric analyses for discriminating and enumerating phytoplankton species. We discuss the potential of multivariate analysis, and ways of developing these techniques for the detailed analysis of complex natural phytoplankton populations in the marine environment.

Introduction

The distribution and abundance of phytoplankton have traditionally been determined from samples, collected and preserved at sea, and subsequently analysed in the laboratory by optical microscopy (Colebrook, 1960). Recently, however, protocols have been developed to analyse plankton populations at sea by flow cytometry (Chisholm et al., 1988; Olsen et al., 1990; Li et al., 1992). These protocols can differentiate phytoplankton from other cells and detritus in seawater, using the red autofluorescence of chlorophyll that characterizes all phytoplankton.

Flow cytometric analysis of phytoplankton offers several advantages over traditional approaches. Whereas conventional microscopic analysis of phytoplankton takes a few hours per sample and requires stable laboratory facilities, flow cytometric analysis can be performed in minutes on board ship. This rapid throughput also obviates fixation, so small, delicate species are more likely to be enumerated. However, although flow cytometry offers many advantages over optical microscopy, several challenges remain to be addressed. One of these is the identification and enumeration of individual taxonomic groups of phytoplankton from their flow cytometric signatures.

Commercially available flow cytometers, designed for biomedical use, have certain limitations when used in marine science. These include the lack of suitable data analysis protocols for the discrimination of the many cell types present in complex mixtures. In the biomedical sciences there are few (typically fewer than five) cell types, and these can usually be discriminated using two or three flow cytometric parameters. In contrast, natural plankton populations have many more cell types. Although some progress has been made in differentiating groups of species using histograms and bivariate scatterplots (Olson et al., 1989), such methods are slow and underexploit the multivariate nature of the cytometric data.

© Oxford University Press 1225

Downloaded from https://academic.oup.com/plankt/article-abstract/18/7/1225/1424141 by guest on 12 April 2018


To address these limitations, the use of various pattern recognition techniques has recently been explored. This has included the use of principal component analysis for dimension reduction, and the fitting of mixture distributions to samples of cultured algae (Demers et al., 1992) and to natural samples (Li, 1990). Such techniques are examples of so-called 'unsupervised learning'. Supervised learning techniques, such as back-propagation neural networks, have been applied to cultured algae (Balfoort et al., 1992; Boddy et al., 1994) and to natural samples (Frankel et al., 1989). However, no attempt has been made to apply supervised learning techniques from the statistical field.

In this paper we show that two techniques, quadratic discriminant analysis (QDA) and canonical variate analysis (CVA) (Seber, 1984), can be used to differentiate between 32 cultured species of phytoplankton using flow cytometric signatures. The successes, limitations and future potential of this approach to the identification of phytoplankton are discussed.

Method

Cytometric analysis of phytoplankton

Phytoplankton cultures from the Plymouth Culture Collection (CCMS, Plymouth Marine Laboratory/Marine Biological Association, UK) were maintained in f/2 enriched seawater medium (Guillard, 1975) under continuous illumination at 200 µE m⁻² s⁻¹ and 17°C. Cultures were maintained under constant conditions for 1 month before analysis. Subculturing was carried out at 1-2 week intervals, and 4 days before analysis, to obtain cells in the exponential phase of growth. In total, 32 species found in Northern European shelf seas were selected from seven taxonomic groups (Table I), covering a wide range of sizes and morphologies (Figure 1).

Cultures were analysed using a Coulter EPICS 741 flow cytometer (Coulter Electronics, Luton, UK) under standardized instrument conditions. Illumination was provided by a vertically polarized Coherent 90 (Coherent, Cambridge, UK) argon ion laser exciting at 488 nm. The laser was focused to an ellipse 16.5 µm high and 130 µm across. Cells crossed the laser beam within a Coulter Biosense flow tip with a square-sided, 250 µm orifice. Log forward light scatter (10-19°) was measured in both the vertical (LVFLS) and horizontal (LHFLS) planes using a quad-scat detector (preceded by a 1.0 neutral density filter). This detector comprised two photodiodes arranged to allow the transmission of either vertically polarized (upper photodiode) or horizontally polarized (lower photodiode) light. Three other signals were measured at 90° to the laser beam using photomultiplier tubes: (i) 488 nm light scatter (LI90), which provided information on the internal granularity of the cells, and fluorescent light that was spectrally filtered to measure (ii) chlorophyll fluorescence (>660 nm) (LIRFL) and (iii) phycoerythrin fluorescence (530-590 nm) (LIOFL).

Table I. The phytoplankton species analysed, with their approximate dimensions and correct QDA classifications as a percentage

ID  Group and species name     PCC no. Length (µm)  % correct
    Dinoflagellida
A1  Gymnodinium micrum         207     8-18         84
A2  Gymnodinium simplex        368     6-12         63
A3  Gymnodinium veneficum      103     9-16         91
A4  Gyrodinium aureolum        497a    16-35        93
A5  Heterocapsa triquetra      169     15-27        94
A6  Prorocentrum micans        97a     30-40        79
A7  Scrippsiella trochoidea    104     20-30        79
    Bacillariophyta
B1  Amphora coffaeformis       547     10-20        45
B2  Chaetoceros calcitrans     537     4-6          9
B3  Skeletonema costatum       106     3-5          20
B4  Thalassiosira pseudonana   DSH     8-20         62
B5  Thalassiosira weissflogii  541     12-20        25
    Cryptomonadida
C1  Chroomonas salina          544     5-12         78
C2  Cryptomonas maculata       29      12-20        82
C3  Cryptomonas rostrella      405     16-25        93
C4  Rhodomonas sp.             530     5-10         90
    Prymnesiida
D1  Chrysochromulina camella   297     6-12         72
D2  Chrysochromulina chiton    146     5-9          87
D3  Emiliania huxleyi          92      5-7          68
D4  Emiliania huxleyi          92d     5-7          81
D5  Ochrosphaera neapolitana   162     8-10         64
D6  Pavlova lutheri            75      4-6          75
D7  Phaeocystis pouchetii      64      3-5          52
D8  Pleurochrysis carterae     156     10-18        84
D9  Prymnesium parvum          94      8-10         69
    Other flagellates
E1  Ochromonas sp.             242     3-12         71
E2  Pseudopedinella sp.        361     8-10         48
E3  Pyramimonas obovata        280     4-8          46
E4  Pyramimonas grossii        78      5-10         63
E5  Tetraselmis suecica        305     6-15         78
E6  Tetraselmis verrucosa      456     3-11         82
E7  Dunaliella tertiolecta     83      4-10         90
E8  Chlamydomonas reginae      399     11-20        80

PCC, Plymouth Culture Collection, Marine Biological Association of the United Kingdom. DSH, isolated by D.S.Harbour, Plymouth Marine Laboratory, UK.

Coulter Standard Brite 10 µm fluorospheres were used to align the flow cytometer before samples were analysed, and were run after each sample to check that the machine calibration remained constant. All samples were pre-screened through 50 µm gauze and analysed. Analysis for all parameters was carried out in log mode using a scale of 0-255 that represented three logarithmic decades. Data for the five measured parameters were stored in list mode format for up to 10 000 events per species. These data were transferred from the Coulter MDADS computer to a personal computer using an RS232 cable and Coulter CytoLogic software.
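The log-mode scale can be converted back to relative linear intensities when signals need to be compared on a linear scale. The following Python sketch assumes that channels 0-255 map linearly onto exactly three decades (channel 0 = 1 and channel 255 = 1000 relative units); the actual transfer function of the instrument's log amplifiers is not given here and may differ slightly, and the function names are illustrative.

```python
import numpy as np

# Assumption: channels 0-255 span exactly three decades, so that
# channel 0 -> 1 and channel 255 -> 1000 (arbitrary relative units).
DECADES = 3
MAX_CHANNEL = 255


def channel_to_relative(channel):
    """Convert log-mode channel numbers (0-255) to relative linear intensity."""
    ch = np.asarray(channel, dtype=float)
    if np.any((ch < 0) | (ch > MAX_CHANNEL)):
        raise ValueError("channel values must lie in the range 0-255")
    return 10.0 ** (DECADES * ch / MAX_CHANNEL)


def relative_to_channel(intensity):
    """Inverse mapping, rounded to the nearest whole channel."""
    x = np.asarray(intensity, dtype=float)
    return np.rint(MAX_CHANNEL * np.log10(x) / DECADES).astype(int)
```

Under this assumption, channels 85 and 170 correspond to 10 and 100 relative units, and each one-channel step multiplies the intensity by about 1.027.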

Analysis of data using conventional flow cytometric analysis software was achieved by displaying events from any two parameters as a bivariate scatterplot.


Fig. 1. Morphological and size variation of phytoplankton species used in this study (scale bar 10 µm). Dinoflagellida: (a) Prorocentrum micans; (b) Heterocapsa triquetra; (c) Gymnodinium micrum. Prymnesiida: (d) Emiliania huxleyi; (e) Prymnesium parvum. Prasinomonadida: (f) Pyramimonas sp.; (g) Tetraselmis sp. Cryptomonadida: (h) Chroomonas sp. Bacillariophyta: (i) Thalassiosira sp.; (j) Skeletonema costatum.

The list mode data for the chosen parameters were then run and visualized as dots within the scatterplot boundaries. For a single species the dots typically formed an elliptical cluster of points. Points within these clusters were enumerated by drawing a polygonal analysis region around the cluster, which provided a total count, a mean (sensitivity values) for each of the parameters used and a measure of variation about the mean.
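The polygonal-region count can be reproduced outside the AFC software. The sketch below is not the Coulter implementation: it applies a standard even-odd (ray-casting) point-in-polygon test to two chosen parameters, then reports the count, the mean of every parameter and the standard deviation as the measure of variation (the source does not say which measure the software reported). The function names and the event-array layout (rows = events, columns = parameters) are assumptions.

```python
import numpy as np


def points_in_polygon(x, y, vertices):
    """Even-odd (ray-casting) test of each point (x[i], y[i]) against a closed polygon."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        # Does a horizontal ray running to the right of the point cross edge k?
        crosses = (y1 > y) != (y2 > y)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (x < x_cross)
    return inside


def gate_summary(events, ix, iy, vertices):
    """Count events inside a polygon drawn on parameters ix (x-axis) and iy (y-axis),
    and summarize every parameter for the gated events."""
    events = np.asarray(events, dtype=float)
    mask = points_in_polygon(events[:, ix], events[:, iy], vertices)
    gated = events[mask]
    count = int(mask.sum())
    return {
        "count": count,
        "mean": gated.mean(axis=0) if count > 0 else None,
        "sd": gated.std(axis=0, ddof=1) if count > 1 else None,
    }
```

Unlike the manual procedure, the same polygon can be re-applied identically to every list-mode file, which removes one source of operator variation.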

For the analysis of phytoplankton mixtures containing up to five species, a combination of bivariate scatterplots was created to determine which combination of parameters best separated the species. Control list mode files of the single species in the mixture were then run to determine their parameter values. Reference regions could then be drawn around the approximate areas for each species in the mixture for enumeration.

Fig. 2. Statistical analyses for three species of phytoplankton. (a) Bivariate plot of forward light scatter (LVFLS) and red fluorescence (LIRFL); the lines represent the quadratic boundaries separating the three species. (b) Quadratic discriminant analysis showing the confusion matrix, which estimates the correct classifications (diagonal, bold italic figures) and misclassifications (other figures).


Statistical analysis

CVA (Seber, 1984) was used to obtain two-dimensional graphical representations of the five-dimensional cytometric data. In this method, the multiparametric data are reduced to the two-dimensional graphical form that maximizes the separation of the group means. The goodness of fit of the two-dimensional representation was assessed using the percentage of variation explained. The standardized coefficients were also used to assess the relative contribution of each parameter to each canonical variate.
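A minimal NumPy sketch of canonical variate analysis, assuming events in an (n × p) array with one group label per event. It is not the SAS CANDISC implementation: it solves the generalized eigenproblem B v = λ W v for the between-group (B) and pooled within-group (W) sums of squares and products, and reports each eigenvalue as a percentage of their sum, which is one common definition of the "percentage of variation explained". Standardization of the coefficients is omitted, and the function name is illustrative.

```python
import numpy as np


def canonical_variates(X, labels, n_components=2):
    """Return CV scores, raw coefficients and % of between-group variation per CV."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))  # pooled within-group sums of squares and products
    B = np.zeros((p, p))  # between-group sums of squares and products
    for g in np.unique(labels):
        Xg = X[labels == g]
        dev = Xg - Xg.mean(axis=0)
        W += dev.T @ dev
        m = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += len(Xg) * (m @ m.T)
    # Solve B v = lambda W v by whitening with the Cholesky factor W = L L^T
    # (W must be positive definite).
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    evals, evecs = np.linalg.eigh(L_inv @ B @ L_inv.T)
    order = np.argsort(evals)[::-1]            # largest separation first
    evals, evecs = evals[order], evecs[:, order]
    coef = L_inv.T @ evecs                     # back-transform to original parameters
    scores = (X - grand_mean) @ coef[:, :n_components]
    pct = 100 * evals / evals.sum()
    return scores, coef[:, :n_components], pct[:n_components]
```

The two score columns can be plotted against each other to give a display like Figure 3. W must be non-singular, so constant or perfectly collinear parameters would have to be dropped first.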

QDA (Seber, 1984) was used as an objective method for discriminating phytoplankton taxa. This method assumes that the data for each species follow a separate multivariate normal distribution. By assigning an unknown data point to the most probable distribution, quadratic boundaries (rules) between the different distributions are derived (Figure 2a). In practice, training data for each species are used to derive a set of rules, and these rules are applied to test data to derive a matrix of classification rates known as the confusion matrix (Figure 2b).
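A minimal sketch of QDA as described above, written with NumPy rather than SAS DISCRIM, so numerical details may differ. Each species gets its own mean vector and covariance matrix; an event is assigned to the class with the largest log prior plus log normal density, which implies quadratic boundaries between classes. Equal prior probabilities are assumed by default on the grounds that each training set held the same number of events per species; the source does not state the priors used, and the class and function names are illustrative.

```python
import numpy as np


class QDA:
    """Minimal quadratic discriminant analysis: one multivariate normal per class."""

    def fit(self, X, y, equal_priors=True):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            cov = np.cov(Xc, rowvar=False)
            sign, logdet = np.linalg.slogdet(cov)
            if sign <= 0:
                raise ValueError(f"covariance for class {c!r} is not positive definite")
            self.params_.append((Xc.mean(axis=0), np.linalg.inv(cov), logdet))
        k = len(self.classes_)
        if equal_priors:
            self.log_priors_ = np.full(k, -np.log(k))
        else:
            self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def scores(self, X):
        """Log prior + log normal density, up to a constant shared by all classes."""
        X = np.asarray(X, dtype=float)
        out = np.empty((len(X), len(self.classes_)))
        for j, (mu, prec, logdet) in enumerate(self.params_):
            d = X - mu
            mahal = np.einsum("ij,jk,ik->i", d, prec, d)  # squared Mahalanobis distances
            out[:, j] = self.log_priors_[j] - 0.5 * (logdet + mahal)
        return out

    def predict(self, X):
        # Assign each event to the most probable distribution.
        return self.classes_[np.argmax(self.scores(X), axis=1)]


def confusion_percent(y_true, y_pred, classes):
    """Rows = true species, columns = assigned species, entries = % of each row."""
    index = {c: i for i, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for t, p in zip(y_true, y_pred):
        counts[index[t], index[p]] += 1
    return 100 * counts / counts.sum(axis=1, keepdims=True)
```

Because each species is fitted independently, adding a new species only requires estimating one more mean vector and covariance matrix; the fits for existing species are unchanged.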

The training data were obtained by gating out noise due to cell debris, dead cells and bacteria, and then selecting a random subset of 500 events from each species data set.

Noise was gated out by plotting histograms and bivariate scatterplots and defining rectangular gates. Once the noise had been removed, most of the species exhibited unimodality, and many species distributions were approximately multivariate normal. Hence, the assumptions for QDA were approximately satisfied for many species. However, one species, Chlamydomonas reginae, was clearly bimodal. This was due to the presence of a small (~1.5 µm) phytoflagellate co-occurring with C.reginae, and for this species only the upper part of the distribution was used.

The test data were derived by first applying a common gate, excluding observations with LIRFL <35 in each species data set, and then selecting a random subset of 500 observations.
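The gating and subsampling steps above might be scripted as follows. The parameter order, the inclusive gate bounds and the per-species noise-gate limits passed in are hypothetical (the gates in the study were set interactively from histograms and scatterplots), and the source does not say whether training and test subsets were kept disjoint; this sketch draws them independently from the same list-mode data, so they may share events.

```python
import numpy as np

# Assumed column order of the five list-mode parameters (channels 0-255).
PARAMS = ("LHFLS", "LVFLS", "LIRFL", "LIOFL", "LI90")


def rectangular_gate(events, limits):
    """Keep events lying inside every (low, high) interval in `limits`,
    a dict mapping parameter name -> (low, high); bounds are inclusive."""
    events = np.asarray(events, dtype=float)
    keep = np.ones(len(events), dtype=bool)
    for name, (low, high) in limits.items():
        column = events[:, PARAMS.index(name)]
        keep &= (column >= low) & (column <= high)
    return events[keep]


def random_subset(events, n, rng):
    """Draw n events without replacement."""
    if len(events) < n:
        raise ValueError(f"only {len(events)} events available, {n} requested")
    return events[rng.choice(len(events), size=n, replace=False)]


def training_and_test(events, noise_gate, n=500, seed=0):
    """Training set: species-specific noise gate, then n random events.
    Test set: common gate excluding LIRFL < 35, then n random events."""
    rng = np.random.default_rng(seed)
    train = random_subset(rectangular_gate(events, noise_gate), n, rng)
    test = random_subset(rectangular_gate(events, {"LIRFL": (35, 255)}), n, rng)
    return train, test
```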

When analysing a sample containing unknown proportions of different species, the estimated proportions are biased due to misclassifications. However, unbiased estimates of the proportions, $\hat{\boldsymbol{\pi}}$, can be found using the confusion matrix, $\mathbf{J}$:

$$\hat{\boldsymbol{\pi}} = \mathbf{J}^{-1}\,(m_1/m, \ldots, m_g/m)^{\mathsf{T}}$$

where $m_i$ is the number assigned to group $i$ and $m = m_1 + \cdots + m_g$.

The mean values of the parameters were calculated for each species using the training data. These values were used as input to CVA based on the five taxonomic groups: dinoflagellates, diatoms, cryptomonads, prymnesiids and other flagellates. CVA was also used with a small subset of five species in order to compare a typical scatterplot of LVFLS against LIRFL with a scatterplot of the first two canonical variates.

All the analyses were carried out using the procedures CANDISC and DISCRIM in the Statistical Analysis Systems (SAS) (1989).
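The proportion correction given earlier in this section can be sketched as below. It assumes the confusion matrix J is arranged so that element (i, j) is the probability that an event from species j is assigned to group i, the orientation under which the formula holds; a percentage table with true species in rows would need to be transposed and divided by 100 first. The function name is illustrative.

```python
import numpy as np


def corrected_proportions(counts, confusion):
    """Unbiased species proportions from classified counts.

    counts[i]:       m_i, the number of events assigned to group i
    confusion[i, j]: probability that an event of species j is assigned to
                     group i (each column sums to 1)
    """
    counts = np.asarray(counts, dtype=float)
    observed = counts / counts.sum()  # (m_1/m, ..., m_g/m)
    # Solve J @ pi = observed, i.e. pi = J^{-1} observed, without forming J^{-1}.
    return np.linalg.solve(np.asarray(confusion, dtype=float), observed)


J = np.array([[0.9, 0.2],
              [0.1, 0.8]])
print(corrected_proportions([550, 450], J))
```

Here raw proportions of 0.55 and 0.45 are corrected to 0.5 and 0.5. With sampling noise in J or in the counts, corrected values can fall slightly outside the range 0-1, so in practice they may need to be truncated and renormalized.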


Fig. 3. A plot of the first canonical variate against the second canonical variate for all species (numbers) from the different taxonomic groups: A, dinoflagellates; B, diatoms; C, cryptomonads; D, prymnesiids; E, other flagellates.

Results

Statistical analysis of the 32 species

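CVA was carried out on the means for all phytoplankton species (Figure 3). The first two canonical variates (based on standardized parameters) were

$$\mathrm{CV}_1 = 0.23\,\mathrm{LHFLS} - 0.28\,\mathrm{LVFLS} - 1.93\,\mathrm{LIRFL} + 3.71\,\mathrm{LIOFL} - 0.56\,\mathrm{LI90}$$

and

$$\mathrm{CV}_2 = 0.12\,\mathrm{LHFLS} + 1.79\,\mathrm{LVFLS} - 0.04\,\mathrm{LIRFL} - 0.24\,\mathrm{LIOFL} - 0.11\,\mathrm{LI90}.$$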

These explain 98% (74 and 24%, respectively) of the variation in the species means. The cryptomonads (C) were clearly separated from other taxa on the first canonical variate because of their high orange to red fluorescence ratios, due to significant cellular phycoerythrin contents. Many of the dinoflagellates (A) were clearly separated from other taxa on the second canonical variate because of their high LVFLS values. Other orders and groups could not be so well differentiated using CVA.
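To make the reported coefficients concrete, the sketch below computes scores on the first two canonical variates from standardized parameter values. How the parameters were standardized (for example, by total or pooled within-group standard deviation) is not stated here, so the function takes already-standardized values; the input rows are hypothetical.

```python
import numpy as np

PARAMS = ("LHFLS", "LVFLS", "LIRFL", "LIOFL", "LI90")

# Standardized coefficients of the first two canonical variates, as reported above.
CV_COEF = np.array([
    [0.23, -0.28, -1.93, 3.71, -0.56],   # CV1
    [0.12, 1.79, -0.04, -0.24, -0.11],   # CV2
])


def cv_scores(z):
    """CV1 and CV2 scores for standardized values z (n x 5, columns in PARAMS order)."""
    return np.asarray(z, dtype=float) @ CV_COEF.T


# Hypothetical inputs: one unit above average in LIOFL only, and in LVFLS only.
print(cv_scores([[0, 0, 0, 1, 0],
                 [0, 1, 0, 0, 0]]))
```

The first hypothetical row scores 3.71 on CV1, illustrating why a high orange (phycoerythrin) signal relative to red moves the cryptomonads away from other taxa on that axis; the second scores 1.79 on CV2, in line with the separation of the large dinoflagellates by LVFLS.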

QDA was carried out at the species level and achieved classification rates of >70% for two-thirds of the species (Table II). Of the five groups, the dinoflagellates and cryptomonads had particularly high classification success rates (Table II). The off-diagonal values of the confusion matrix (Table II) showed that most species were well separated from the others, as indicated by zero or very low misclassification rates. In 55% of all comparisons there was no overlap with other species, and only 6% of the comparisons had misclassification rates >5%.
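The summary statistics quoted above can be computed from a percentage confusion matrix. This sketch assumes rows are true species and columns are assigned species, that "classification rate >70%" is a strict inequality, and that each off-diagonal cell counts as one "comparison"; the source does not define a comparison precisely, so these choices are assumptions.

```python
import numpy as np


def summarize_confusion(conf_pct, threshold=70.0, overlap_cut=5.0):
    """Summary rates for a square matrix of classification percentages
    (rows = true species, columns = assigned species)."""
    C = np.asarray(conf_pct, dtype=float)
    diag = np.diag(C)
    off = C[~np.eye(len(C), dtype=bool)]  # all off-diagonal cells, row by row
    return {
        "frac_species_above_threshold": float(np.mean(diag > threshold)),
        "frac_offdiag_zero": float(np.mean(off == 0)),
        "frac_offdiag_above_cut": float(np.mean(off > overlap_cut)),
    }
```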

The lowest classification rates were observed in the diatoms. Most of the species in this group typically form chains of between 2 and 40 individuals, each chain being analysed by the flow cytometer as a single event. This would lead to a wide range of values for all the parameters, overlapping those for many other species and increasing the misclassification rate of diatoms with other phytoplankton species (Table II). The most notable example was Chaetoceros calcitrans (B2), which was misclassified as several species, in particular Chrysochromulina camella (D1; 23%) and Pseudopedinella sp. (E2; 16%).


[Table II. QDA confusion matrix: percentage classification of each phytoplankton species (rows and columns A1-E8; see Table I), with correct classifications on the diagonal.]


Comparison of statistical techniques with AFC software

Flow cytometric data from five species were combined and analysed using conventional flow cytometry (AFC) software and CVA. Figure 4a illustrates a typical bivariate scatterplot of LVFLS (approximating to particle size) and LIRFL (approximating to cellular chlorophyll content). Considerable overlap between the distributions of the five species can be seen, making it impossible to separate any of the species. Figure 4b and c shows the bivariate scatterplots of LVFLS with LIOFL (approximating to phycoerythrin fluorescence) and LVFLS with LI90 (approximating to granularity). Both plots show some separation of the Chroomonas and Amphora species. When CVA was applied using all of the AFC parameters, Chroomonas was clearly separated from the other species, and Amphora and Chrysochromulina were also well defined, with a small degree of overlap (Figure 4d). Although Gymnodinium and Prymnesium overlapped with each other, they were well separated from the other three species. CVA, therefore, can represent a useful initial step for visualizing flow cytometric data.

Fig. 4. Comparison of phytoplankton analysis for five species (Gymnodinium, Amphora, Chroomonas, Chrysochromulina and Prymnesium) by flow cytometry analysis software (a-c) and canonical variate analysis (d). Panels (a), (b) and (c) plot LVFLS against LIRFL, LIOFL and LI90, respectively; panel (d) plots the first two canonical variates. In (a-c), a maximum of two variables can be compared at any one time, whereas with a multivariate statistical approach (d) all four variables are analysed simultaneously to provide the best two-dimensional representation of the data.

The speed and accuracy of the AFC software and QDA for phytoplankton identification and enumeration were compared using pre-defined data sets. Identical data sets were analysed 'blind', i.e. without knowing the abundance composition of the mixtures, by QDA and by AFC software. The results of the analyses are summarized in Table III.

QDA estimates for all four mixtures were accurate. A slight variation between Gymnodinium simplex and Prymnesium parvum in mixtures 2 and 4 was observed, in which the QDA estimate differed by 1% from the true proportion. In each case, QDA accounted for 100% of the true observations. With AFC software, although many of the estimates were close to the true proportions, there was a great deal of variation. The worst case was in mixture 4, where P.parvum was overestimated by a factor of four. In addition, none of the totals equalled 100%. This was due to mis-assignments of some observations to one of the five species in mixtures 1 and 2, and to slight overlaps in the analysis gates for mixtures 3 and 4.

The most striking aspect of the comparison between the two techniques was the difference in the analysis times. Using the AFC software, each mixture took between 35 and 49 min to analyse, whereas QDA was 2-3 orders of magnitude faster.

Table III. Comparison of phytoplankton analysis for different mixtures of five species using flow cytometry software and quadratic discriminant analysis. Mixtures containing variable proportions of the five species were analysed to test the accuracy and speed of the analysis techniques

Mixtures 1 and 2 (% composition)

Species                            Mixture 1         Mixture 2
                                True   QDA   AFC  True   QDA   AFC
Gymnodinium simplex               10    10    12    50    51    49
Amphora coffaeformis              15    15    13    15    15    12
Chroomonas salina                 20    20    20    15    15    15
Chrysochromulina camella          25    25    21    15    15    13
Prymnesium parvum                 30    30    30     5     4     8
Total %                          100   100    96   100   100    97
Analysis time (min)                -   0.1    35     -   0.1    35

Mixtures 3 and 4 (% composition)

Species                            Mixture 3         Mixture 4
                                True   QDA   AFC  True   QDA   AFC
Gymnodinium simplex               10    10     9    50    51    46
Chroomonas salina                 20    20    21    20    20    21
Chrysochromulina camella          30    30    28    10    10    10
Prymnesium parvum                 25    25    32     5     4    16
Ochromonas sp.                    15    15    12    15    15    12
Total %                          100   100   102   100   100   105
Analysis time (min)                -   0.1    45     -   0.1    49

True, true composition; QDA, QDA estimate; AFC, AFC software estimate.

Analysis of large mixtures of phytoplankton by QDA

Three data sets were produced using variable combinations of 17 species of phytoplankton from the total database of 32 species. Each data set contained a total of 10 000 events, split into either equal or variable proportions between the 17 species used. QDA was then carried out on the mixtures to test its accuracy in estimating individual phytoplankton species abundance in the mixtures.

The results for all three mixtures were very similar in terms of their overall accuracy at estimating phytoplankton abundance, regardless of the species used or the proportions of observations for each species (Table IV). In most cases, the abundance estimates were within 5% of the actual value, and in only 13 out of 51 cases did the abundance estimate for a species differ by >10% from the actual value.
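Deviations such as these can be tabulated with a short helper. The source does not state which denominator was used for the percentage deviations; this sketch divides by the actual abundance and considers only species actually present in a mixture (false positives for absent species would need a separate count). The function name and example counts are hypothetical.

```python
import numpy as np


def abundance_deviations(actual, estimated, tolerance=10.0):
    """Percentage deviation of each estimate from the actual abundance, for the
    species actually present, and the number of deviations exceeding `tolerance`."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    present = actual > 0
    dev = 100 * (estimated[present] - actual[present]) / actual[present]
    return dev, int(np.sum(np.abs(dev) > tolerance))


# Hypothetical counts for three species, the second absent from the mixture.
dev, n_large = abundance_deviations([100, 0, 200], [115, 3, 190])
```

Dividing by the estimate instead of the actual value would change the percentages and could move borderline cases across the 10% line.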

At the group level, the abundance estimates of dinoflagellates (A) were generally within 3-4% of the actual abundances. The only exception was Scrippsiella trochoidea (Table IV; A7), whose abundance estimate in mixture 3 was 17% lower than the actual abundance. Both the cryptomonads (C) and the other flagellates group (E) were enumerated accurately, with all but one of the abundance estimates being within 10% of the actual abundance. Most of the prymnesiids (D) were accurately estimated, with two exceptions. Chrysochromulina camella (Table IV; D1) was consistently overestimated, even when it was not actually in the mixture, and the naked, uncalcified morph of Emiliania huxleyi (Table IV; D3) was overestimated by 33% in mixture 3.

The abundance estimates of diatoms (B) were generally poor, except for T.pseudonana (Table IV; B4). Amphora coffaeformis and Thalassiosira weissflogii (Table IV; B1 and B5, respectively) were always overestimated, whereas Chaetoceros calcitrans and S.costatum (Table IV; B2 and B3, respectively) were always underestimated. These results contrast with the QDA results in Table II, in which T.pseudonana had the highest classification percentage of the diatoms.

Discussion

Group-level and species-level analysis using CVA

One of the original aims of the study was to investigate whether multivariate techniques offered an improvement in the differentiation of phytoplankton taxa. The first step was, therefore, to see whether it was possible to differentiate the phytoplankton into their taxonomic orders, i.e. diatoms, dinoflagellates, cryptomonads, etc. CVA was used to see whether separation by taxonomic order was possible. The CVA plot of the species means (Figure 3) showed that the cryptomonads (C) were completely separated from the others. Species of this group typically have high orange to red fluorescence ratios due to the presence of cellular phycoerythrin, and it was this property which resulted in their discrimination from other species. Most dinoflagellates (A) were also clearly separated. As the dinoflagellates tended to be the largest species analysed, their LVFLS signals were expected to be higher than those of other species, and this turned out to be true.


Table IV. Comparisons of actual and estimated phytoplankton abundance (number of events) for mixtures of 17 phytoplankton species using quadratic discriminant analysis "trained" for all species. Species identification as in Table I

               Mixture 1           Mixture 2           Mixture 3
ID     Actual  Estimated   Actual  Estimated   Actual  Estimated
A1        588        574     1030        967        0          0
A2          0          0      270        267      290        316
A3        589        571      890        880        0         24
A4          0         13      790        759        0          4
A5        588        587      320        312        0          4
A6          0          0        0         15     1340       1296
A7        589        577      670        609      190        163
B1          0         24      370        507        0         33
B2        588        293     1120        620      360         44
B3          0          0        0          0     1110        857
B4        589        577      490        467        0          0
B5          0        223      510        796        0        260
C1        588        589        0          0        0          0
C2          0          0        0          4      580        607
C3        589        569        0          0      670        652
C4          0          0        0          1     1180       1123
D1        588        672      190        322        0         88
D2          0          8        0          0     1250       1211
D3        588        594      960        993      200        299
D4          0          0      320        297        0          0
D5        588        534        0          0      640        587
D6          0         20      350        365        0          0
D7        588        586        0         16      270        255
D8          0          0      430        396       80         62
D9        588        566        0          0        0          0
E1          0          0        0          0        0         16
E2        588        671      870        930      780        860
E3          0          0        0          6        0         88
E4        588        579        0         42      160        159
E5          0          0        0          0      860        920
E6        588        566      420        412        0         17
E7          0         17        0          7       40         46
E8        588        590        0         10        0          9
Total   10000      10000    10000      10000    10000      10000

The other groups could not be discriminated, suggesting that differences in morphological characteristics between the taxonomic groups are not reflected in light scattering and fluorescence properties alone.

CVA was also applied to the analysis of data from five species (Figure 4). The results here clearly showed that CVA can be a useful preliminary graphical method to apply when analysing a small number of groups.

Species-level analysis using QDA

QDA proved to be successful at discriminating individual phytoplankton species (Table II). The results of the QDA showed many species to be well separated, with classification rates exceeding 70% for more than two-thirds of the species. QDA also had few misclassifications of one species as another, as can be seen in the rows of Table II. Only the diatoms had low classification rates, probably due to their chain-forming habit, which caused them to overlap with many other species.

Flow cytometric signatures are well suited to QDA, as they tend to exhibit multivariate normal distributions. Even for those distributions which were clearly non-normal, the effects of non-normality could, in many cases, be ignored due to the clear separation of many of the species. However, the chain-forming diatoms exhibited non-normal distributions and this, along with the large variations in their distributions, led to overestimates or underestimates of abundance (Table IV).

QDA was shown to be a great improvement over conventional AFC software, in terms of both accuracy and analysis time (Table III). QDA also performed well in the artificial mixture tests shown in Table IV, with the abundance estimates for most species being very accurate. However, in a number of cases species were identified that were not present. This is to be expected, since the training data contain more species than are likely to be present in a single mixture, and natural variation leads to some observations being misclassified where species distributions overlap.
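In the mixture tests, an abundance estimate for each species follows from the proportion of events assigned to it. The plain-Python sketch below (function names hypothetical) computes these percentages from a list of assigned labels and flags species that receive a non-zero estimate although they are absent from the mixture, which is the kind of spurious identification described above.

```python
from collections import Counter

def estimated_abundances(assigned_labels):
    """Percentage of all events assigned to each species."""
    counts = Counter(assigned_labels)
    n = sum(counts.values())
    return {sp: 100.0 * c / n for sp, c in counts.items()}

def spurious_species(estimated, actual):
    """Species given a non-zero estimate although absent (or zero) in the mixture."""
    return sorted(sp for sp, pct in estimated.items()
                  if pct > 0 and actual.get(sp, 0) == 0)
```

For example, if 60, 38 and 2 of 100 events are assigned to hypothetical species A, B and C in a mixture that actually contains 60% A and 40% B, the estimates are 60.0, 38.0 and 2.0%, and C is reported as spurious.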

The statistical approach in our study compares favourably with a study that used back-propagation neural networks to identify 42 marine phytoplankton strains (Boddy et al., 1994). Although the data sets were different, 25 species were common to both studies. For these species, QDA tended to produce higher classification rates than the neural networks. However, the neural network study used more species, which is likely to be the reason for the slightly poorer performance of the neural networks. It would be interesting to compare the performance of QDA and neural nets directly in discriminating phytoplankton using identical test data.

QDA has some advantages over more complex analysis techniques such as back-propagation neural networks. QDA takes only a few minutes to train, whereas back-propagation neural networks may take several hours. It is also easy to add new species to QDA, since it is only necessary to fit a multivariate normal distribution to the new data. In contrast, back-propagation neural networks may require further hidden layers if more species need to be identified, which would require retraining the network.
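To illustrate why adding a species is cheap with QDA, the sketch below (assuming NumPy; names are hypothetical, and equal priors are assumed for simplicity) stores each species as an independently fitted mean and covariance. Adding a species fits one new Gaussian and leaves the existing entries untouched, so nothing already in the model has to be retrained.

```python
import numpy as np

def fit_species(X):
    """Summarise one species' training events as a multivariate normal."""
    X = np.asarray(X, dtype=float)
    return {"mean": X.mean(axis=0), "cov": np.cov(X, rowvar=False)}

def add_species(model, name, X):
    """Return a new model with one more species; existing entries are reused as-is."""
    updated = dict(model)
    updated[name] = fit_species(X)
    return updated

def classify(model, x):
    """Assign one event to the species with the highest Gaussian log density
    (equal priors assumed in this sketch)."""
    best, best_score = None, -np.inf
    for name, p in model.items():
        diff = np.asarray(x, dtype=float) - p["mean"]
        _, logdet = np.linalg.slogdet(p["cov"])
        score = -0.5 * (logdet + diff @ np.linalg.solve(p["cov"], diff))
        if score > best_score:
            best, best_score = name, score
    return best
```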

Applications and future development

There are many laboratory-based studies in marine research in which QDA is likely to be of great benefit. This is particularly true of trophodynamic experiments, in which phytoplankton are grazed by other organisms, and of interspecific competition experiments between phytoplankton. At present, it is generally possible to discriminate only three or four species of phytoplankton using conventional flow cytometry software. Using QDA, it should be possible to differentiate many more species routinely in experimental mixtures, provided that the flow cytometric signatures do not vary over the course of the experiment.

A longer-term goal is to use flow cytometry to analyse phytoplankton populations in the field with the same degree of confidence as for laboratory cultures. Making this step represents a complex challenge, and a number of factors need to be considered for the analysis of field samples. We cannot be sure that the flow cytometric signatures obtained from laboratory cultures are representative of those present in the field. The signatures will vary according to environmental conditions, such as nutrient availability, temperature or light. Of these, light is likely to be the most important variable, as phytoplankton are capable of photoadaptation within a few hours, altering their pigment composition to suit the ambient light levels. To assess the importance of variation in environmental conditions, it will be necessary to carry out growth studies on a number of phytoplankton species over the range of conditions likely to be encountered in the natural environment. If the distribution changes are small, QDA may prove to be sufficiently robust, since it relies on defining boundaries rather than explicitly modelling the distributions and defining cut-off points. However, if the distribution changes are significant, it will be necessary to incorporate such data into the training data set. As an additional improvement, it may be possible to include the light level or equivalent depth of the species, as has been done by Frankel et al. (1989).

QDA has been shown to be a useful technique for identifying marine phytoplankton. In particular, it is faster and simpler to train than more complex techniques. However, there are a number of cases in which more complex techniques, such as neural networks or kernel discriminant analysis, may prove useful. For example, a low classification rate may arise because a multivariate normal distribution is a poor representation of the true distribution. By modelling the distribution more precisely (e.g. by kernel discriminant analysis), better classification rates could be achieved. Noise can also adversely affect the estimated proportions for a few species, and in these cases it will be useful to model the noise as well as the real data. This has already been achieved by Frankel et al. (1989) using a back-propagation neural network. It is also possible to classify an observation as unknown by specifying a cut-off value. This may be an improvement on QDA, which assigns every event to one of the classes in the original training data set, and may be necessary for field samples, which may contain unknown species. However, if the distributions change, an approach based on defining boundaries may prove to be more robust than explicitly modelling distributions and defining cut-off points.

Discrimination of chain-forming colonial species could be improved by increasing the number of parameters measured per event. For data sets containing such a large number of parameters, QDA is unlikely to be useful, and techniques such as neural networks are likely to prove more successful. For example, Errington and Graham (1993) used 17 parameters as input to a multilayer perceptron neural network designed to identify 24 chromosomes. In the best case, after network optimization, the classification error rate was only 6.2%. Most of the input parameters were in the form of grey-level profiles consisting of a series of peaks corresponding to the banding on the chromosomes. Particle profiles can also be measured by flow cytometry (Cunningham, 1990). Such information could be useful for identifying colonial species such as chain-forming diatoms, as each individual cell within a chain would appear as a distinct peak within the overall flow cytometric profile.
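As a rough illustration of the idea that each cell in a chain would appear as a distinct peak, the following plain-Python sketch counts local maxima above a threshold in a sampled pulse profile. The threshold rule and the function name count_peaks are assumptions made for illustration only; real profiles would probably need smoothing first, because noise ripples above the threshold would otherwise be counted as extra cells.

```python
def count_peaks(profile, threshold):
    """Count local maxima above `threshold` in a sampled pulse profile.

    A flat-topped peak (a plateau of equal values) is counted once.
    Samples at the very start or end of the profile are never counted as peaks.
    """
    peaks = 0
    n = len(profile)
    i = 1
    while i < n - 1:
        if profile[i] > threshold and profile[i] > profile[i - 1]:
            # Walk across any plateau of equal values.
            j = i
            while j + 1 < n and profile[j + 1] == profile[i]:
                j += 1
            # It is a peak only if the signal falls again after the plateau.
            if j + 1 < n and profile[j + 1] < profile[i]:
                peaks += 1
            i = j + 1
        else:
            i += 1
    return peaks
```

For example, count_peaks([0, 2, 5, 2, 1, 4, 6, 4, 1, 3, 7, 7, 3, 0], threshold=3) returns 3, which under this hypothetical rule would be read as a three-cell chain.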

In conclusion, the long-term goal of the present studies is to develop fast and efficient techniques for the identification and enumeration of phytoplankton. The combination of multiparametric data sets with multivariate data analysis techniques has been shown to be successful at discriminating many species of phytoplankton. Future studies will focus on refining these techniques and investigating the use of more complex statistical approaches to improve upon the current level of discrimination.

Acknowledgements

The authors would like to thank Dr J.C.Green of the Plymouth Culture Collection for the provision of phytoplankton cultures for this study, and Dr Lynne Boddy, University of Wales, College of Cardiff, and Dr Bob Clarke, Plymouth Marine Laboratory, for valuable comments on earlier drafts of the manuscript. This work has been funded by the CEC MAST II project EurOPA, contract number MAS2-CT91-0001, and the UK Natural Environment Research Council PRIME Special Topic GST/02/1062. It forms part of the research of Strategic Research Project 1 of the Plymouth Marine Laboratory, UK.

References

Balfoort,H.W., Snoek,J., Smits,J.R.M., Breedfeld,L.W., Hofstraat,J.W. and Ringelberg,J. (1992) Automatic identification of algae: neural network analysis of flow cytometric data. J. Plankton Res., 14, 575-589.

Boddy,L., Morris,C.W., Wilkins,M.F., Tarran,G.A. and Burkill,P.H. (1994) Neural network analysis of flow cytometric data for 40 marine phytoplankton species. Cytometry, 15, 283-293.

Chisholm,S.W., Olson,R.J., Zettler,E.R., Goericke,R., Waterbury,J.B. and Welschmeyer,N.A. (1988) A novel free-living prochlorophyte abundant in the oceanic euphotic zone. Nature, 334, 340-343.

Colebrook,J.M. (1960) Continuous plankton records: methods of analysis, 1950-1959. Bull. Mar. Ecol., 5, 51-64.

Cunningham,A. (1990) Fluorescence pulse shape as a morphological indicator in the analysis of colonial microalgae by flow cytometry. J. Microb. Methods, 11, 27-36.

Demers,S., Kim,J., Legendre,P. and Legendre,L. (1992) Analysing multivariate flow cytometric data in aquatic sciences. Cytometry, 13, 291-299.

Errington,P.A. and Graham,J. (1993) Application of artificial neural networks to chromosome classification. Cytometry, 14, 627-639.

Frankel,D.S., Olson,R.J., Frankel,S.L. and Chisholm,S.W. (1989) Use of a neural net computer system for analysis of flow cytometric data of phytoplankton populations. Cytometry, 10, 540-550.

Guillard,R.R.L. (1975) Culture of phytoplankton for feeding marine invertebrates. In Smith,W.L. and Chanley,M.H. (eds), Culture of Marine Invertebrate Animals. Plenum, New York, pp. 29-60.

Li,W.K.W. (1990) Bivariate and trivariate analysis in flow cytometry: phytoplankton size and fluorescence. Limnol. Oceanogr., 35, 1356-1368.

Li,W.K.W., Dickie,P.M., Irwin,B.D. and Wood,A.M. (1992) Biomass of bacteria, cyanobacteria, prochlorophytes and photosynthetic eukaryotes in the Sargasso Sea. Deep-Sea Res., 39, 501-519.

Olson,R.J., Chisholm,S.W., Zettler,E.R. and Armbrust,E.V. (1990) Pigments, size and distribution of Synechococcus in the North Atlantic and Pacific Oceans. Limnol. Oceanogr., 35, 45-58.

Olson,R.J., Zettler,E.R. and Anderson,O.K. (1989) Discrimination of eukaryotic phytoplankton cell types from light scatter and autofluorescence properties measured by flow cytometry. Cytometry, 10, 636-646.

Seber,G.A.F. (1984) Multivariate Observations. John Wiley and Sons, New York.

Statistical Analysis Systems Institute (1989) SAS/STAT User's Guide, Version 6, 4th edn, Volume I. SAS, Cary, NC, 943 pp.

Received on October 23, 1995; accepted on February 26, 1996
