Atmospheric Environment 38 (2004) 4667–4673
ARTICLE IN PRESS
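*Corresponding author. Fax: +1-315-268-6654.
E-mail address: hopkepk@clarkson.edu (P.K. Hopke).
1352-2310/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.atmosenv.2004.05.035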
Comparison between Conditional Probability Function and Nonparametric Regression for Fine Particle Source Directions
Eugene Kim (a), Philip K. Hopke (b,*)
(a) Department of Civil and Environmental Engineering, Clarkson University, Box 5708, Potsdam, NY 13699, USA
(b) Department of Chemical Engineering, Clarkson University, Box 5708, Potsdam, NY 13699, USA
Received 18 March 2004; accepted 26 May 2004
Abstract
The objective of this study is to examine the use of the conditional probability function (CPF) and nonparametric
regression (NPR) to identify directions of PM2.5 (particulate matter ≤2.5 µm in aerodynamic diameter) sources using
data collected from multiple monitoring sites across the US. NPR has been used on cyclohexane data from Houston, TX
and correctly showed the direction of the source. In recent source apportionment studies using positive matrix
factorization (PMF), ambient PM2.5 compositional data sets from 24-h integrated samples including eight individual
carbon fractions collected at four monitoring sites, Atlanta, GA, Washington, DC, Brigantine, NJ, and Seattle, WA,
were analyzed identifying 10–11 sources. To analyze local point source impacts from various wind directions, CPF and
NPR were calculated using the source contributions estimated from PMF coupled with wind direction measured on
site. The comparison between CPF and NPR demonstrated that both methods agreed well with the locations of known
local point sources. CPF was simpler and easier to calculate than NPR. In contrast, NPR provided PM2.5
concentrations and associated uncertainties. This study indicates that both methods can be utilized to enhance source
apportionment study of ambient PM2.5.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Conditional probability function; Nonparametric regression; Positive matrix factorization; Source apportionment; Source
direction
1. Introduction
Since the US Environmental Protection Agency promulgated
new national ambient air quality standards for
airborne particulate matter less than 2.5 µm in aerodynamic
diameter (PM2.5), advanced source apportionment
methods for PM2.5 are required for
source-specific community epidemiology as well as in the
implementation planning process for attaining the air
quality standards. Positive matrix factorization (PMF)
(Paatero, 1997) has been used to assess PM2.5 source
contributions (Ramadan et al., 2000; Polissar et al.,
2001; Song et al., 2001; Lee et al., 2002; Kim et al.,
2003).
In recent source apportionment studies using the PMF
method, ambient PM2.5 compositional data sets of daily
integrated samples including eight individual carbon
fractions collected at four monitoring sites, Atlanta,
GA, Washington, DC, Brigantine, NJ, and Seattle, WA,
were analyzed, identifying 10–11 sources. In these
studies, a method is needed to find the directions of
high values of PMF estimated source contributions that
are likely to be related to the directions of sources.
The conditional probability function (CPF) has been
successfully used to analyze point source impacts in
these four studies as well as other source apportionment
studies (Kim et al., 2003, 2004a; Zhou et al., 2004;
Begum et al., 2004). Nonparametric regression (NPR)
has been tested using cyclohexane data from Houston,
TX and correctly pointed in the direction of the source
(Henry et al., 2002).
The objective of this study is to explore the use of
CPF and NPR to identify directions of PM2.5 sources
using data collected at multiple monitoring sites across the US.
To analyze source impacts from various wind directions,
CPF and NPR are calculated using source contributions
estimated from PMF coupled with wind direction
measured on site. The CPF values for each source are
compared with the corresponding NPR results.
2. Experiment
2.1. Sample collection and chemical analysis
Atlanta PM2.5 samples were collected at the South-
Eastern Aerosol Research and CHaracterization
(SEARCH) (Kim et al., 2003; Hansen et al., 2003)
monitoring site located in Atlanta, GA. Daily integrated
PM2.5 samples were collected using the particulate
composition monitor (PCM, Atmospheric Research
and Analysis, Inc.) that permits simultaneous sampling
on a 3-stage filter pack (Teflon, Nylon, and cellulose
filter), a Nylon filter, and a quartz filter.
The PM2.5 samples from Washington, DC, Brigantine,
NJ, and Seattle, WA were collected on Wednesdays
and Saturdays at the Interagency Monitoring of
Protected Visual Environments (IMPROVE) (Malm
et al., 1994) monitoring sites. Integrated 24-h PM2.5
samples were collected on Teflon, Nylon, and quartz
filters. The quartz filters collected by both SEARCH and
IMPROVE studies were analyzed via the thermal optical
reflectance (TOR) method (Chow et al., 1993) for eight
temperature resolved carbon fractions (OC1-OC4, EC1-
EC3) (Desert Research Institute, Reno, NV).
For the Atlanta study, 529 daily samples collected
between August 1998 and August 2000 were used. A
total of 718 samples collected between August 1988 and
December 1997 and 910 samples collected between
March 1992 and May 2001 were used for the Washing-
ton, DC and Brigantine studies, respectively. For the
Seattle study, 384 samples collected between March
1996 and February 2000 were analyzed. Detailed filter
analyses and summaries of PM2.5 speciation data used in
these PMF analyses are provided in Kim et al. (2004b,c)
and Kim and Hopke (2004a,b).
2.2. Source apportionment
The general receptor modeling problem can be stated
in terms of the contribution from p independent sources
to all chemical species in a given sample as follows
(Miller et al., 1972; Hopke, 1985)
\[ x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij}, \qquad (1) \]
where $x_{ij}$ is the jth species concentration measured in the
ith sample, $g_{ik}$ is the particulate mass concentration from
the kth source contributing to the ith sample, $f_{kj}$ is the
jth species mass fraction from the kth source, $e_{ij}$ is the
residual associated with the jth species concentration
measured in the ith sample, and p is the total number of
independent sources. PMF provides a solution that
minimizes a weighted sum of squared residuals (Paatero,
1997; Polissar et al., 1998). PM2.5 compositional data
measured in Atlanta, GA, Washington, DC, and
Brigantine, NJ were analyzed using PMF2, and those
measured in Seattle, WA were analyzed using a more
flexible multivariate modeling tool, the multilinear
engine (Paatero, 1999).
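As a minimal illustration of Eq. (1), the sketch below reconstructs species concentrations from source contributions and source profiles and evaluates a weighted sum of squared residuals. The numerical values, the function names, and the uncertainties `s` are hypothetical and are not data from this study; the sketch only evaluates the model and does not perform the PMF minimization itself.

```python
def reconstruct(g, f):
    """Modeled concentrations sum_k g[i][k] * f[k][j] (Eq. 1 without the residual)."""
    n, p, m = len(g), len(f), len(f[0])
    return [[sum(g[i][k] * f[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]


def weighted_residual_sum(x, g, f, s):
    """Sum over samples i and species j of (e_ij / s_ij)^2, with e_ij from Eq. (1)."""
    model = reconstruct(g, f)
    return sum(((x[i][j] - model[i][j]) / s[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


# Hypothetical example: 2 samples, 2 sources, 3 species.
g = [[2.0, 1.0], [0.5, 3.0]]            # source contributions g_ik
f = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]  # source profiles f_kj (mass fractions)
x = reconstruct(g, f)                   # "measured" data with zero residual
s = [[0.1] * 3 for _ in range(2)]       # assumed uncertainties s_ij
q = weighted_residual_sum(x, g, f, s)   # zero here, because x was built from g and f
```

In an actual analysis, g and f are unknown and are estimated by minimizing this weighted sum subject to non-negativity constraints.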
The application of PMF depends on the estimated
uncertainties for each of the data values. The uncer-
tainty estimation provides a useful tool to decrease the
weight of missing and below detection limit data in the
solution. The procedure of Polissar et al. (1998) was
used to assign measured data and the associated
uncertainties. The final PMF solutions were determined
by experiments with different numbers of sources, with
the final choice based on the evaluation of the resulting
source profiles as well as the quality of the species fits.
The 11-, 10-, 11-, and 11-source models provided the
most physically reasonable source profiles for the
Atlanta, Washington, DC, Brigantine, and Seattle
studies, respectively. A summary of the PMF resolved
average source contributions to PM2.5 mass concentrations
at the four monitoring sites used in this study is shown
in Table 1.
2.3. Conditional probability function
CPF (Ashbaugh et al., 1985) analyzes point source
impacts from varying wind directions using the source
contribution estimates from PMF coupled with the wind
direction values measured on site (Kim et al., 2003). The
CPF estimates the probability that a given source
contribution from a given wind direction will exceed a
predetermined threshold criterion. The same daily
contribution was assigned to each hour of a given day
to match the hourly wind data. The CPF is defined as
\[ \mathrm{CPF}_{\Delta\theta} = \frac{m_{\Delta\theta}}{n_{\Delta\theta}}, \qquad (2) \]
where $m_{\Delta\theta}$ is the number of occurrences from wind sector
$\Delta\theta$ that exceeded the threshold criterion, and $n_{\Delta\theta}$ is the
total number of data from the same wind sector. In this
study, 24 sectors were used ($\Delta\theta = 15^\circ$). Calm wind
(<1 m s⁻¹) periods were excluded from this analysis due
Table 1
The PMF resolved average source contributions (µg m⁻³) to PM2.5 mass concentrations at four monitoring sites
Average source contribution (standard error)
Atlanta, GAa Washington, DCb Brigantine, NJc Seattle, WAd
Secondary sulfate aerosol 2.2 (0.10)
Secondary sulfate aerosol (carbon-low) 8.3 (0.3)
Secondary sulfate aerosol (summer-high) 7.7 (0.3) 5.4 (0.2)
Secondary sulfate aerosol (winter-high) 1.1 (0.03) 0.6 (0.02)
Secondary sulfate aerosol (carbon-rich) 1.0 (0.03) 1.9 (0.07) 0.7 (0.02)
Diesel emissions 1.8 (0.07) 0.3 (0.01) 0.4 (0.01) 1.9 (0.10)
Gasoline vehicle 1.1 (0.04) 3.8 (0.1) 1.4 (0.04) 0.8 (0.06)
Nitrate-rich secondary aerosol 1.4 (0.06) 1.6 (0.06) 0.6 (0.03) 0.4 (0.02)
Airborne soil 0.5 (0.02) 0.3 (0.02) 0.4 (0.02) 0.6 (0.03)
Incinerator 0.7 (0.02) 0.2 (0.01)
Aged sea salt 0.4 (0.01) 0.8 (0.03) 0.7 (0.03)
Oil combustion 0.3 (0.01) 0.1 (0.005) 0.2 (0.02)
Sea salt 0.5 (0.03) 0.3 (0.04)
Wood smoke 1.1 (0.04) 1.4 (0.09)
Paper mill 0.2 (0.01)
Metal processing 0.6 (0.03) 0.1 (0.005)
Railroad traffic 0.4 (0.01)
Cement kiln/carbon-rich 0.3 (0.01)
Bus maintenance facility/highway traffic 0.3 (0.02)
a: Kim et al., 2004b. b: Kim and Hopke, 2004a. c: Kim and Hopke, 2004b. d: Kim et al., 2004c.
to the isotropic behavior of the wind vane under calm
winds. From tests with several different percentiles of the
fractional contribution from each source, a threshold
criterion of the upper 25 percentile was chosen to define
the directionality of the sources. The sources are likely
to be located in the directions that have high conditional
probability values.
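The CPF calculation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the inputs are already matched hour by hour, sectors start at 0°, the threshold is taken as a nearest-rank percentile over all supplied contributions, and the function name and example data are hypothetical rather than taken from this study.

```python
import math


def cpf_by_sector(contributions, wind_dirs, wind_speeds,
                  sector_width=15.0, calm_speed=1.0, upper_fraction=0.25):
    """Return one CPF value (m / n, Eq. 2) per wind sector; None if a sector is empty."""
    # Threshold: nearest-rank percentile over all contributions (assumed convention;
    # the text only specifies "the upper 25 percentile").
    ranked = sorted(contributions)
    rank = math.ceil((1.0 - upper_fraction) * len(ranked))
    threshold = ranked[max(rank, 1) - 1]

    n_sectors = int(round(360.0 / sector_width))
    m = [0] * n_sectors  # exceedances per sector
    n = [0] * n_sectors  # all non-calm observations per sector
    for c, wd, ws in zip(contributions, wind_dirs, wind_speeds):
        if ws < calm_speed:
            continue  # calm winds are excluded
        k = int((wd % 360.0) // sector_width) % n_sectors
        n[k] += 1
        if c > threshold:
            m[k] += 1
    return [m[k] / n[k] if n[k] else None for k in range(n_sectors)]


# Hypothetical hourly data: contribution, wind direction (deg), wind speed (m/s).
contributions = [1, 2, 3, 4, 5, 6, 7, 8]
wind_dirs = [10, 12, 100, 95, 200, 205, 5, 92]
wind_speeds = [2, 2, 2, 2, 2, 0.5, 2, 2]
cpf = cpf_by_sector(contributions, wind_dirs, wind_speeds)
```

With these made-up values, the two contributions above the threshold fall in the 0–15° and 90–105° sectors, so those sectors receive the highest CPF values.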
2.4. Nonparametric regression
NPR (Härdle, 1990) estimates the expected value of
concentration given wind direction without assuming a
parametric form for the relationship. To find the directions
of peaks in the ambient concentrations, Henry et al.
(2002) suggested NPR using a Gaussian kernel as a
nonsubjective alternative to the usual bar chart method,
which is highly dependent on the location and size of $\Delta\theta$.
NPR produces statistical confidence intervals as well as
estimates of the location of peaks, and is able to separate
closely located peaks. NPR was applied to the hourly
measured cyclohexane data from two sites in Houston,
TX. The triangulation of the peak directions estimated
from the two sites correctly pointed to the source (Henry
et al., 2002).
The average concentration over a sliding window of
width $\Delta\theta$ centered at $\theta$ is generalized to use $\Delta\theta$ as a
smoothing parameter:
\[ \bar{C}(\theta, \Delta\theta) = \frac{\sum_{i=1}^{n} K\left((\theta - W_i)/\Delta\theta\right) C_i}{\sum_{i=1}^{n} K\left((\theta - W_i)/\Delta\theta\right)}, \qquad (3) \]
where $W_i$ is the measured wind direction and $C_i$ is the
measured concentration for the ith sample, and n is the
total number of samples. To give different weights to the
measurements, a Gaussian kernel function, K(x), is used
and defined as
\[ K(x) = \frac{1}{\sqrt{2\pi}} \exp(-0.5 x^2). \qquad (4) \]
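Eqs. (3) and (4) can be sketched in a few lines of code. This is an illustrative sketch, not the script used in this study: the function names are hypothetical, and wrapping the angular difference into [-180°, 180°) is an added assumption so that directions such as 350° and 10° are treated as neighbors.

```python
import math


def gaussian_kernel(x):
    """Gaussian kernel K(x) of Eq. (4)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def angular_difference(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180) (assumption)."""
    return (a - b + 180.0) % 360.0 - 180.0


def npr_estimate(theta, wind_dirs, concentrations, delta_theta):
    """Kernel-weighted average concentration at direction theta (Eq. 3)."""
    weights = [gaussian_kernel(angular_difference(theta, w) / delta_theta)
               for w in wind_dirs]
    total = sum(weights)
    if total == 0.0:
        return None  # no observation close enough to theta to carry weight
    return sum(k * c for k, c in zip(weights, concentrations)) / total


# Hypothetical example: two observations placed symmetrically around 0 degrees.
estimate = npr_estimate(0.0, [350.0, 10.0], [2.0, 4.0], delta_theta=10.0)
```

Because the two observations are equally far from 0°, they receive equal weight and the estimate is their simple mean.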
This function gives maximum weight to the measurements
near $\theta$. Henry et al. (2002) defined $\Delta\theta$ in terms of
the full width of the peak in the kernel function measured at
the point where the curve has fallen to half of its
maximum value. The optimal $\Delta\theta$ is the one that
minimizes the sum of squared differences ($V(\Delta\theta)$) between
the measured concentration and the concentration
estimated leaving out one observation,
\[ V(\Delta\theta) = \sum_{j=1}^{n} \left( C_j - \bar{C}_j(W_j, \Delta\theta) \right)^2, \qquad (5) \]
Fig. 2. Comparison between (a) CPF and (b) NPR plots for the
oil combustion in Seattle, WA.
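where
\[ \bar{C}_j(W_j, \Delta\theta) = \frac{\sum_{i \neq j} K\left((W_j - W_i)/\Delta\theta\right) C_i}{\sum_{i \neq j} K\left((W_j - W_i)/\Delta\theta\right)}. \qquad (6) \]
The confidence intervals in the estimated concentrations
are based on the asymptotic normal distribution
(Härdle, 1990) and defined as
\[ S^2(\theta) = \frac{C_k \, \bar{s}^2(\theta)}{n \Delta\theta \, \bar{f}(\theta)}, \qquad (7) \]
where
\[ C_k = \frac{1}{\sqrt{2\pi}}, \qquad (8) \]
\[ \bar{f}(\theta) = (n \Delta\theta)^{-1} \sum_{i=1}^{n} K\left( \frac{\theta - W_i}{\Delta\theta} \right), \qquad (9) \]
\[ \bar{s}^2(\theta) = (n \bar{f}(\theta))^{-1} \sum_{i=1}^{n} K\left( \frac{\theta - W_i}{\Delta\theta} \right) \left( C_i - \bar{C}(\theta, \Delta\theta) \right)^2. \qquad (10) \]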
In this study, the same daily contribution deduced by
PMF was assigned to each hour of a given day as $C_i$ in
Eq. (3) to match the hourly wind data. Calm winds
were also excluded from this analysis.
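The leave-one-out selection of the smoothing parameter (Eqs. (5) and (6)) and the variance estimate (Eqs. (7)–(10)) can be sketched as below. This is a hedged illustration and not the script used in this study: the function names are hypothetical, the angular wrap-around is an added assumption, the variance uses the squared quantity defined in Eq. (10), and a candidate smoothing parameter that leaves any point without effective neighbors is rejected by returning an infinite score (also an assumption).

```python
import math


def kernel(x):
    """Gaussian kernel of Eq. (4)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def ang_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180) (assumption)."""
    return (a - b + 180.0) % 360.0 - 180.0


def loo_score(wind_dirs, conc, delta_theta):
    """V(delta_theta) of Eq. (5), built from the leave-one-out estimates of Eq. (6)."""
    v = 0.0
    for j in range(len(conc)):
        num = den = 0.0
        for i in range(len(conc)):
            if i != j:
                k = kernel(ang_diff(wind_dirs[j], wind_dirs[i]) / delta_theta)
                num += k * conc[i]
                den += k
        if den == 0.0:
            return math.inf  # point j has no effective neighbors: reject this width
        v += (conc[j] - num / den) ** 2
    return v


def best_delta_theta(wind_dirs, conc, candidates):
    """Candidate smoothing parameter with the smallest leave-one-out score."""
    return min(candidates, key=lambda d: loo_score(wind_dirs, conc, d))


def npr_with_variance(theta, wind_dirs, conc, delta_theta):
    """Return (C_bar, S2) at theta following Eqs. (3) and (7)-(10)."""
    n = len(conc)
    ks = [kernel(ang_diff(theta, w) / delta_theta) for w in wind_dirs]
    ksum = sum(ks)
    if ksum == 0.0:
        raise ValueError("no observations carry weight at this direction")
    c_bar = sum(k * c for k, c in zip(ks, conc)) / ksum                  # Eq. (3)
    f_bar = ksum / (n * delta_theta)                                     # Eq. (9)
    s2_bar = sum(k * (c - c_bar) ** 2 for k, c in zip(ks, conc)) / (n * f_bar)  # Eq. (10)
    c_k = 1.0 / math.sqrt(2.0 * math.pi)                                 # Eq. (8)
    return c_bar, c_k * s2_bar / (n * delta_theta * f_bar)               # Eq. (7)
```

Under the asymptotic normal approximation, an approximate 95% interval at a given direction would be the estimate plus or minus 1.96 times the square root of the returned variance.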
3. Results and discussion
To determine the optimal $\Delta\theta$ for the calculation of
NPR, a variety of angular intervals were tested. The
final $\Delta\theta$ for each of the forty-three source contributions
to particle mass at the four monitoring sites ranged from
3° to 14°. These values were determined based on the
evaluation of the measured concentrations and the
estimated concentrations leaving out one observation
(Eq. (5)).
Fig. 1. Location of the monitoring site in Seattle, WA. The
Beacon Hill monitoring site in Seattle is denoted by %.
The comparison between CPF and NPR demon-
strated that both methods agreed well with the locations
of known local point sources. Fig. 1 shows the locations
of the Seattle monitoring site and the Port of Seattle,
which is located about 5 km northwest of the site. It is a
major containerized cargo port with large marine oil
combustion sources. Fig. 2 presents the CPF and NPR
Fig. 3. Location of the monitoring site in Brigantine, NJ. The
monitoring site is denoted by %.
Fig. 4. Comparison between (a) CPF and (b) NPR plots for the
sea salt in Brigantine, NJ.
Table 2
Pearson correlation coefficients between CPF values and NPR values
Pearson correlation coefficient
Atlanta, GAa Washington, DCb Brigantine, NJc Seattle, WAd
Secondary sulfate aerosol 0.96
Secondary sulfate aerosol (carbon-low) 0.47
Secondary sulfate aerosol (summer-high) 0.72 0.91
Secondary sulfate aerosol (winter-high) 0.95 0.97
Secondary sulfate aerosol (carbon-rich) 0.21 0.95 0.95
Diesel emissions 0.58 0.91 0.96 0.80
Gasoline vehicle 0.59 0.93 0.93 0.69
Nitrate-rich secondary aerosol 0.93 0.98 0.94 0.93
Airborne soil 0.86 0.82 0.87 0.95
Incinerator 0.97 0.97
Aged sea salt 0.95 0.95 0.72
Oil combustion 0.94 0.85 0.99
Sea salt 0.96 0.96
Wood smoke 0.88 0.73
Paper mill 0.78
Metal processing 0.91 0.81
Railroad traffic 0.75
Cement kiln/carbon-rich 0.82
Bus maintenance facility/highway traffic 0.72
a: Kim et al., 2004b. b: Kim and Hopke, 2004a. c: Kim and Hopke, 2004b. d: Kim et al., 2004c.
plots for the oil combustion source in which the
calculated values are displayed on a polar scale.
The radial axis has units of µg m⁻³, and the inner and outer
lines denote the 95% confidence intervals in the NPR
plot. In Fig. 2, both the CPF and NPR plots indicate oil
combustion contributions from the direction of the Port
of Seattle. Fig. 3 shows the location of the Brigantine
monitoring site and Fig. 4 shows CPF and NPR plots
for sea salt. Both plots in Fig. 4 indicate sea salt
contributions from the direction of the Atlantic Ocean, which
is located south and east of the Brigantine site.
Table 2 shows the Pearson correlation coefficients for
the values from CPF against those from NPR for the four
monitoring sites. To match the 24 angular sectors used
for the CPF, only the 24 NPR values corresponding to the
middle of each of the 24 angular sectors were used for the
correlation analysis, without averaging NPR values or
changing $\Delta\theta$. Table 2 indicates high
correlations ($r^2$ = 0.72–0.99) between CPF and NPR
estimations for local point sources. Regional secondary
sources as well as traffic sources show good correlations
($r^2$ = 0.72–0.98), except for two secondary sulfate
aerosols ($r^2$ = 0.47 and 0.21), gasoline vehicle
($r^2$ = 0.59), and diesel emissions ($r^2$ = 0.58) in Atlanta,
GA. The CPF and NPR plots of these four Atlanta
sources are compared in Fig. 5, showing similar trends
although they have poor correlations. The source
directions of north and southeast for carbon-low
secondary sulfate, southeast for gasoline vehicles, and
south and northeast for the diesel emissions are shown
in both CPF and NPR plots. The poor correlations
for the four sources may be due to the poorly defined
directionality of secondary and traffic sources.
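The sector-matching step of the correlation analysis can be sketched as follows. The helper names are hypothetical, sectors are assumed to start at 0° so that the midpoints fall at 7.5°, 22.5°, and so on, and the short value lists are made up purely to exercise the functions.

```python
import math


def sector_midpoints(sector_width=15.0):
    """Midpoint direction of each sector, assuming sectors start at 0 degrees."""
    n_sectors = int(round(360.0 / sector_width))
    return [(k + 0.5) * sector_width for k in range(n_sectors)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


midpoints = sector_midpoints()            # 24 directions at which NPR is sampled
# Hypothetical values: in practice, one CPF value per sector would be paired with
# the NPR estimate evaluated at the matching midpoint.
cpf_values = [0.10, 0.20, 0.40, 0.30]
npr_values = [0.5, 0.9, 1.8, 1.2]
r = pearson(cpf_values, npr_values)
```

Sampling NPR only at the sector midpoints keeps the NPR smoothing parameter unchanged, as described in the text, at the cost of ignoring NPR structure within each sector.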
4. Conclusion
The CPF and NPR were compared for their usefulness
in identifying directions of PM2.5 sources using source
contributions at four monitoring sites across the US:
Atlanta, GA, Washington, DC, Brigantine, NJ, and
Seattle, WA. The comparison between CPF and NPR
demonstrated that both methods agreed well with the
locations of known local point sources. CPF is simpler
and easier to calculate than NPR. In contrast, NPR
provided smoothed PM2.5 concentrations and associated
uncertainties for 360 wind sectors. In this study, NPR
was tested using a variety of PM2.5 source contributions
for the first time, and the results demonstrate that both methods
can be utilized to enhance source apportionment studies
of ambient PM2.5.
Fig. 5. The comparison between (a) CPF and (b) NPR plots for
Atlanta, GA.
Acknowledgements
We would like to thank Dr. Ronald Henry for
providing his script for the NPR calculation. This work
was supported by the United States Environmental
Protection Agency (US EPA) through the Science to Achieve
Results (STAR) program under Grant R83107801.
Although the research described in this article has been
funded by the US EPA, the views expressed herein are
solely those of the authors and do not represent the
official policies or positions of the US EPA.
References
Ashbaugh, L.L., Malm, W.C., Sadeh, W.Z., 1985. A residence
time probability analysis of sulfur concentrations at Grand
Canyon National Park. Atmospheric Environment 19 (8),
1263–1270.
Begum, A.B., Kim, E., Biswas, S.K., Hopke, P.K., 2004.
Investigation of sources of atmospheric aerosol at urban
and semi-urban areas in Bangladesh. Atmospheric Environ-
ment 38, 3025–3038.
Chow, J.C., Watson, J.G., Pritchett, L.C., Pierson, W.R.,
Frazier, C.A., Purcell, R.G., 1993. The DRI thermal/optical
reflectance carbon analysis system: description, evaluation
and applications in U.S. air quality studies. Atmospheric
Environment 27A (8), 1185–1201.
Hansen, D.A., Edgerton, E.S., Hartsell, B.E., Jansen, J.J.,
Kandasamy, N., Hidy, G.M., Blanchard, C.L., 2003. The
southeastern aerosol research and characterization study:
part 1—overview. Journal of Air and Waste Management
Association 53, 1460–1471.
Härdle, W., 1990. Applied Nonparametric Regression.
Cambridge University Press, Cambridge.
Henry, R.C., Chang, Y.-S., Spiegelman, C.H., 2002. Locating
nearby sources of air pollution by nonparametric regression
of atmospheric concentrations on wind direction.
Atmospheric Environment 36, 2237–2244.
Hopke, P.K., 1985. Receptor Modeling in Environmental
Chemistry. Wiley, New York.
Kim, E., Hopke, P.K., 2004a. Source apportionment of fine
particles at Washington, DC utilizing temperature resolved
carbon fractions. Journal of Air and Waste Management
Association 54, 773–785.
Kim, E., Hopke, P.K., 2004b. Improving source identification
of fine particles in a rural northeastern US area utilizing
temperature resolved carbon fractions. Journal of Geophy-
sical Research 109, D09204.
Kim, E., Hopke, P.K., Edgerton, E.S., 2003. Source identification
of Atlanta aerosol by positive matrix factorization.
Journal of Air and Waste Management Association 53,
731–739.
Kim, E., Hopke, P.K., Larson, T.V., Covert, D.S., 2004a.
Analysis of ambient particle size distributions using positive
matrix factorization and Unmix. Environmental Science and
Technology 38, 202–209.
Kim, E., Hopke, P.K., Edgerton, E.S., 2004b. Improving source
identification of Atlanta aerosol using temperature resolved
carbon fractions in positive matrix factorization. Atmo-
spheric Environment 38, 3349–3362.
Kim, E., Hopke, P.K., Larson, T.V., Maykut, N.N., Lewtas, J.,
2004c. Factor analysis of Seattle fine particles. Aerosol
Science and Technology, in press.
Lee, J.H., Yoshida, Y., Turpin, B.J., Hopke, P.K., Poirot, R.L.,
Lioy, P.J., Oxley, J.C., 2002. Identification of sources
contributing to mid-Atlantic regional aerosol. Journal of
Air and Waste Management Association 52, 1186–1205.
Malm, W.C., Sisler, J.F., Huffman, D., Eldred, R.A., Cahill,
T.A., 1994. Spatial and seasonal trends in particle concen-
tration and optical extinction in the United States. Journal
of Geophysical Research 99 (D1), 1347–1370.
Miller, M.S., Friedlander, S.K., Hidy, G.M., 1972. A chemical
element balance for the Pasadena aerosol. Journal of
Colloid and Interface Science 39, 165–176.
Paatero, P., 1997. Least squares formulation of robust non-
negative factor analysis. Chemometrics and Intelligent
Laboratory Systems 37, 23–35.
Paatero, P., 1999. The multilinear engine: a table-driven, least
squares program for solving multilinear problems, including
the n-way parallel factor analysis model. Journal of
Computational and Graphical Statistics 8 (4), 854–888.
Polissar, A.V., Hopke, P.K., Paatero, P., Malm, W.C., Sisler,
J.F., 1998. Atmospheric aerosol over Alaska 1. Elemental
composition and sources. Journal of Geophysical Research
103 (D15), 19045–19057.
Polissar, A.V., Hopke, P.K., Poirot, R.L., 2001. Atmospheric
aerosol over Vermont: chemical composition and sources.
Environmental Science and Technology 35, 4604–4621.
Ramadan, Z., Song, X.H., Hopke, P.K., 2000. Identification of
sources of Phoenix aerosol by positive matrix factorization.
Journal of Air and Waste Management Association 50,
1308–1320.
Song, X.H., Polissar, A.V., Hopke, P.K., 2001. Source of fine
particle composition in the northeastern US. Atmospheric
Environment 35, 5277–5286.
Zhou, L., Kim, E., Hopke, P.K., Stanier, C., Pandis, S., 2004.
Advanced factor analysis on Pittsburgh particle size
distribution data. Aerosol Science and Technology 38 (S1),
118–132.