Date post: | 05-May-2023 |
Category: |
Documents |
Upload: | khangminh22 |
View: | 0 times |
Download: | 0 times |
INTERNATIONAL JOURNAL OF INTEGRATED ENGINEERING VOL. 12 NO. 1 (2020) 270-279
© Universiti Tun Hussein Onn Malaysia Publisher’s Office
IJIE
Journal homepage: http://penerbit.uthm.edu.my/ojs/index.php/ijie
The International
Journal of
Integrated
Engineering
ISSN : 2229-838X e-ISSN : 2600-7916
*Corresponding author: [email protected] 270 2020 UTHM Publisher. All rights reserved.
penerbit.uthm.edu.my/ojs/index.php/ijie
Reduction of Response Variable Influential Outliers Using
MEstimation in the Next Day Prediction of Ground-Level
Ozone Concentration
Muqhlisah Muhamad1, Ahmad Zia Ul-Saufie1*, Sayang Mohd Deni2 1Faculty of Computer and Mathematical Sciences,
Universiti Teknologi MARA, 13500 Permatang Pauh, Pulang Pinang, MALAYSIA
2Faculty of Computer and Mathematical Sciences,
Universiti Teknologi MARA, 40450 Shah Alam, Selangor, MALAYSIA
*Corresponding Author
DOI: https://doi.org/10.30880/ijie.2020.12.01.027
Received 13 March 2019; Accepted 30 October 2019; Available online 16 February 2020
Abstract: Ground-level ozone concentration (O3) is a second significant air pollutant in Malaysia after particulate matter
concentration. It is a secondary pollutant that created by photochemical reaction of primary pollutant such as volatile
organic compound (VOCs) and nitrogen oxides (NOx) under the influence of solar radiation (UVB). O3 photochemical
reactions used solar radiation with certain wavelength as the catalyst. In statistical analysis of prediction, the concentration
level of O3 contains the influential outliers due to several factors such as offense in data recording and sampling, the error
in data acquisition or data management and the damage of monitoring instrument in data recording that can lead to
misleading result or information. The objective of this study is to predict the level of O3 concentration for next day (D+1)
by using predictors of wind speed (WS), temperature (T), relative humidity (RH), nitric oxide (NO), sulphur dioxide
(SO2), nitrogen dioxide (NO2), ozone (O3) and carbon monoxide (CO) for selected urban area of Shah Alam by the method
of minimizing influential outliers from response variable using M-estimation. The influential outliers from response
variable is minimized using tuning constant approached at 95% level of efficiency. The improvement has been proved
when Fair method has minimized 5.34% influential outliers from response variable and the average accuracy of the model
is 0.5134.
Keywords: Secondary pollutant, prediction, tuning constant, concentration
1. Introduction
Pollution is a dirty substance that pollutes water, air, and land [1]. Air pollution is poisonous gases and trapped
particles that come from primary and secondary pollutants. Air pollutants that are emitted directly from a source are
considered primary pollutants, as they can be released from natural ways or human action. Secondary pollutants are those
that are not directly emitted from a source but form when primary pollutants react chemically in the atmosphere.
[2].
Malaysia has a department in monitoring air pollution and environment. The Department of Environment (DoE) is
the main department under the Ministry of Natural Resources and Environment, responsible to ensure a healthy and safe
environment for people in Malaysia [3]. Meanwhile, Alam Sekitar Malaysia Sdn. Bhd. (ASMA) is a private agency under
DoE that provides environmental solutions to the government, industries, research institutions and individuals. The DoE
Muhamad et al., International Journal of Integrated Engineering Vol. 12 No. 1 (2020) p. 270-279
271
has 65 monitoring stations and all stations record on an hourly basis the measurement of ozone (O3), particulate matter
(PM10), sulphur dioxide (SO2), carbon monoxide (CO), nitrogen dioxide (NO2) and particulate matter (PM2.5).
The formation of secondary pollutants of O3 needs a photochemical reaction between the primary pollutants of
nitrogen oxide (NOx) and volatile organic compounds (VOCs) combined with ultraviolet sunlight (UVB) [4]. NOx refers
to the molecules of nitrogen dioxide (NO2) and nitric oxide (NO). The primary sources of NOx are emitted from motor
vehicles and combustion processes. Meanwhile, the presence of VOCs are mainly due to industries and high traffic
volume.
According to the Malaysian Department of Environment [5], the perfect combination between the conducive
atmospheric condition and the emission from motor vehicles and industrial activities will result in the formation of O3.
The burning of hydrocarbon fuels from transport, heating from homes, factories and business, manufacturing process and
power plants are the main sources of O3 [6]. O3 was also detected to be highly active from 7 am to 7 pm due to the
presence of sunlight [7].
The other contributors to the formation of O3 are CO and methane (CH4). Both air pollutants of CO and CH4 are emitted from man-made activities such as motor vehicles, landfills, power plant and industries facilities. At times, natural
sources such as, lightning, trees and soil can also contribute to the formation of O3 concentration.
In statistical analysis and air pollution, regression analysis was widely used as a tool to predict the concentrations
level of some air pollutants. This study used several independent variables to form an equation in terms to predict the
level of O3 concentrations and the equation was known as multiple linear regression. The relationship between dependent
variable and independent variables could be concluded in multiple linear regression model with several analyses [8].
However, outlier is one of common issues in developing regression model. Even a single point of outlier could distort
the regression analysis and lead to incorrect inferences [9]. Outlier is a huge different observation point from another
point of observation. In other words, the point of observation which is different to the general trend of the observation
[10]. The presence of outliers will affect the accuracy of the model prediction in the forming of regression coefficients
[11]. Every data cannot be claimed to be free from outliers [12] and most of studies did not take the assessment of outliers
into their prediction consideration. Sometimes the offense in data recording and sampling, the error in data acquisition or data management and the damage of monitoring instrument in data recording are the factors that contribute to the
formation of outliers [13].
According to Yahaya [11] and Field [14], the whole influential outlier from the data observation could be measured
using standardized residual or Cook’s distance because the influential for the overall of outliers depend on the response
variable. The aim of this study to approach M-estimation method from robust regression to minimize the number of
influential outliers from response variable of the next day prediction (D+1) of O3 concentration level in urban area of
Shah Alam.
1.1 Variable Selection
The variables used in this study were consist of ozone (O3, ppb), wind speed (WS, km/h), ambient temperature (T, oC),
relative humidity (RH, %), nitric oxide (NO: ppb), nitrogen dioxide (NO2, ppb), carbon monoxide (CO, ppb) and sulphur
dioxide (SO2, ppb) that were chose as the predictors in order to predict the level of O3 concentration level for next day (D+1). The variables are selected [15-24] by the factors that associated with high contribution to the formation of O3
concentration such as the conducive meteorological factor of temperature, wind speed and relative humidity when they
combine with the other air pollutants.
2. Methodology
The purpose of this study is to develop the next day prediction model (D+1) of O3 concentration level for Shah Alam
using robust method by M-estimation in order to minimize the influential outliers from response variable. The procedures
of this study are illustrated in Fig. 1. Before the robust regression model developed, the assessment of the outliers will be conducted to identify the influential outliers from response variable using Cook’s distance and standardized residual.
Thus, M-estimation method will be used to reduce the contamination of the influential outliers from response variable
using tuning constant in developing the prediction model. Tuning constant controls how sharp M-estimation as an outliers
detector that contaminated response variable data. Nine method from M-estimation has been introduced such as Huber,
Andrew, Bisquare, Cauchy, Fair, Talwar, Logistic, Welsch and Hampel. This method will be compared with classical
method of ordinary least square in order to determine a better model to be used in the prediction of O3 concentration level.
2.1 Site Selection
The monitoring station in Shah Alam is located at Taman Tun Dr. Ismail (TTDI) Jaya Primary School (N 3.077324o,
E 101.510323o) nearby a residential area. At the same time, this station is located at the main transportation area such as major road, highways, and airport. Besides, Shah Alam city is located between Petaling Jaya city (east) and Klang town
(west). Shah Alam station is selected due to the highest level of O3 concentration in Malaysia according to the rising
number of registered mobile vehicles in Shah Alam throughout 2003 [25] and commencing on 2003, the increased burning
of industrial waste including from hotels, commercial centres, institutions and night markets tend to produce large
emission of NOx and VOCs which are the main element of O3 formation [26].
Muhamad et al., Internatinal Journal of Integrated Engineering Vol. 12 No. 1 (2020) p. 270-279
272
Fig. 1 - Research flow
2.2 Data Acquisition
In air pollution monitoring procedures, United States Protection Agency (EPA) standardized some guidelines to
measure air pollutants and meteorological variables [27]. The air pollutant and meteorological variables were monitored
by Teledyne Ozone Analyzer Model 400A UV Absorption (O3), Teledyne Model 200A (NO and NO2), Teledyne Model
100A (SO2), Teledyne Model 300 (CO), Met One 010C Sensor (WS), Met One 062 Sensor (T) and Met One 083D Sensor (RH) [28]. The primary data was managed by Alam Sekitar Malaysia Sendirian Berhad (ASMA), which is the private
company under supervision of Department of Environmental Malaysia (DoE). The secondary data from 1st January 2002
until 31st December 2012 were obtained from Department of Environmental Malaysia (DoE).
2.3 Data Management
In this study, the hourly concentrations for each variable selected were transformed into daily 12 hours average
concentration, from 7am to 7pm because the level of O3 concentration level was suspected to be highly active during
morning and evening [29]. According to Mohammed, Ramli, and Yahaya [7], most of the areas have a large emission of
O3 formation factor from morning till evening. This study only used 12 hours average ozone concentrations from 7 am
until 7 pm to predict the next day (from 7 am until 7 pm) ozone concentrations.
Missing values is one of the problems in the process of data acquisition that may lead to the interrupted analysis. The
offence in data recording and sampling, the failure of machine and human error are the several reasons that contributed to the missing values observation [30]. One of the suitable methods in the imputation of missing values for air pollution
data is by using the mean imputation technique as suggested by [31]. The missing values will be replaced by the mean
obtained between the above value and the below value also known as mean above below method (MAB) as followed by
Noor et al. [30]. The data from 2002 until 2012 were randomized into 80% and 20% as suggested by [12] where 80% of
the data were used for training while the other 20% were used for validation.
2.4 Influential Outliers Identification
The influential point of outliers is the case where there is existing larger residual that differs substantially from the
other observations [32]. Influential outliers are any point that has a large effect on the analysis of regression. According
to Sarkar, Midi, and Rana [9], the results of the analysis will lead to incorrect inferences by the unduly influence of outliers. The change of regression coefficients after removing several data observation showed that the data before was
influenced by outliers. This study only considers the influential outliers from response variable by computing standardized
residual and Cook’s distance assessment as shown in Table 1.
Muhamad et al., International Journal of Integrated Engineering Vol. 12 No. 1 (2020) p. 270-279
273
Table 1 - The assessment of the influential outliers from response variable
2.5 M-estimation
Huber [35] introduced M-estimation as the simplest method in the detection of outliers from response variable.
Mestimation is an extension from the maximum likelihood estimation where the main principle in M-estimation is to
minimize the residual function of weighting function and the steps of M-estimation are shown as follows [36] and [37],
• Step 1: Test assumptions of ordinary least square.
• Step 2: Detect the presence of outliers in the data.
• Step 3: Calculate regression coefficient ( ) with ordinary least square.
• Step 4: Calculate initial residual value:
(1)
• Step 5: Calculate initial value of standard deviation:
(2)
• Step 6: Calculate value:
(3)
• Step 7: Calculate the weighted value of Bisquares (Tukey):
(4)
• Step 8: Calculate using weighted least square method with weighted .
At the first step, each of the regression model for O3 (D+1, D+2 and D+3) was checked to ensure the test assumptions
in ordinary least square (OLS) were satisfied. The presence of the outliers were detected using standardized residual and
Cook’s distance at step two. Then, the regression coefficient was calculated to obtain the predicted value ( ) for O3
using the method of OLS estimate. In order to obtain regression coefficients from Mestimation, the estimator for
standard deviation that we denoted as sigma is rescaled to median absolute deviation (MAD) by the factor 1.4826 at
step 5, where 1.4826 is the value when the residual is normally distributed and the sample is large.
From step 6, the proportion of the residual from the estimated scale of standard deviation ( ) was obtained to be used
in the Bisquare weighting function (step 7) where 4.685 represents the value of tuning constant of Bisquare at 95% level
of efficiency. Step 4 until 7 were repeated until the value of regression coefficient in step 8 converged. The same
steps were applied in other weighting function (Huber, Andrew, Cauchy, Fair, Talwar, Logistic, Welsch and Hampel) to
obtain the value of regression. Table 2 shows all weighting function used in M-estimation.
2.6 Performance Indicator
Performance indicators are used to evaluate the performance and the adequacy of the models. The performance
indicators (Table 3) are consists of normalized absolute error (NAE), Root Mean Square Error (RMSE), Index of
Agreement (IA) and prediction accuracy (PA). The best prediction model will be obtained by comparing the performance
indicators between the models.
Muhamad et al., Internatinal Journal of Integrated Engineering Vol. 12 No. 1 (2020) p. 270-279
274
Table 2 - Weighting function of M-estimation [38]
Note: ψ ( ) = ρ’ , = residual and a, b, c = tuning constant.
Muhamad et al., International Journal of Integrated Engineering Vol. 12 No. 1 (2020) p. 270-279
275
Table 3 - Performance indicator (PI) [39]
Note: N = Number of sample daily measurement of a selected sites, = Predicted value of one set daily data,
= Observed values of one set daily data, = Mean of the predicted values of one set daily data
= Mean of the observed values of one set daily data
3. Results and Discussion
3.1 Air Pollutants Characteristics
Shah Alam was located at a busy main transportation area and thus exposed to traffic congestion. Almost daily sees
Shah Alam polluted by 32.421 ppb of O3 concentration. The reading of O3 concentration was very high on 15th March
2003 at 97.00 ppb. This is due to the rising number of registered mobile vehicles in Shah Alam throughout 2003 [25],
and that same year saw the increased burning of industrial waste including from hotels, commercial centres, institutions
and night markets [26]. Table 4 summarizes the characteristics of O3 concentrations level for Shah Alam.
Table 4 - Summarizes the characteristics of O3 concentrations level for Shah Alam
Shah Alam O3
Mean 32.421
Median 31.250
Mode 32.421
Standard Deviation 11.441
Variance 130.907
Skewness 0.657
Kurtosis 1.030
Maximum 97.000
Percentiles 95% 52.750
Percentiles 99% 65.709
Following up to these scenes, the formation of O3 become active due to the high presence of NOx (108.330 ppb), CO
(6214.167 ppb) and VOCs from the vehicles and industrial burning respectively. The measurement of dispersion showed
that the level of O3 concentrations for Shah Alam is moderately skewed when the skewness value is 0.657) and this tended
to have a standard deviation with value 11.441. Besides, the distribution for O3 concentrations has a positive kurtosis
with value 1.030.
The percentage of outliers for O3 in Shah Alam was described using box and whisker plot. The points that were out
of the box and whisker plot indicated that the observation of the outliers. The percentage of outliers suspected for Shah
Alam and is 1.69 as shown in Fig. 2.
Muhamad et al., Internatinal Journal of Integrated Engineering Vol. 12 No. 1 (2020) p. 270-279
276
Fig. 2 - Box and Whisker plot for O3 concentration level
3.2 Ordinary Least Square
Since all of the assumption of ordinary least square estimate have been fulfilled, thus the multiple linear regression
model for next day (D+1) prediction has been developed as show in Table 4. The multiple linear regression model in
Table 5 are evaluated by performance indicators. The average error is 5.488 and the average accuracy is 0.5127.
Meanwhile, the model show that 20.56% of the total variation in O3, D+1 is explained by the regression line using the
predictors.
Table 5 - Multiple linear regression model and performance indicator
Model Description of the Model
O3, D+1 = 62.009500 + 0.258307WS - 1.100540T -
0.214758RH + 0.012961NO -0.068383SO2 + 0.069568NO2
+ 0.437523O3 + 0.001931CO
NAE = 0.244080
RMSE = 10.731951
IA = 0.571453
PA = 0.454015
R2 (100%) = 20.56
3.3 Influential Outliers Identification
The influential outliers from response variable was identifed at 172 observation after compute the assessment of
standardized residual and Cook’s distance. Furthermore, the influence of the outliers from response variable could be
illustrated using scatter plot of Cook’s distance against centered leverage value. The point of observation that was above
the line indicates the observation of influential outliers from y-direction after conducting the assessment of standardized
residual and Cook’s distance. After removing the outliers from response variable, the uncontaminated data from response
variable is remained below the horizontal line of Cook’s as in Fig. 3.
Fig. 3 - Box and Whisker plot for O3 concentration level
( a) The influential outliers from respo nse
variable ( b) After remove the influential outliers from response
variable
Muhamad et al., International Journal of Integrated Engineering Vol. 12 No. 1 (2020) p. 270-279
277
3.4 M-Estimation Models
The models developed by M-estimation use tuning constant to detect the outliers from response variable. Tuning
constant is located in the weighting function of Huber, Andrew, Bisquare, Cauchy, Fair, Logistic, Talwar, Welsch and
Hampel. The tuning constant has an important role in determining the regression coefficient in order to develop the model
in the prediction of O3 concentrations level for next day (D+1). Table 6 shows the models developed to predict the level
of O3 concentrations in Shah Alam using the nine M-estimation methods. Then, each next day (D+1) prediction model of O3 concentrations level from M-estimation was evaluated using
performance indicators. The lower error (NAE and RMSE) and the higher accuracy (IA and PA) indicated that the model
is appropriate. In order to determine a good model among these nine methods, the ranking method of performance indicators [40] was conducted where the error of NAE value and RMSE value are ranked in increasing order (1 = the
smallest value of error to 9 = the largest value of error) and the accuracy of IA value and PA value are ranked in decreasing
order (1 = the largest value of accuracy to 9 = the smallest value of accuracy). Hence, a good Mestimation method was
found by the smallest summation from ranking the performance indicators of each method as shown in Table 7. The ranking with the smallest summation (Table 8) which is the Fair method is seen as a good M-estimation model
to predict the level of O3 concentrations in Shah Alam for next day (D+1). The comparison of the result between
Mestimation (Fair) and ordinary least square method was shown in Table 9.
Table 6 - Robust regression model by m-estimation for Shah Alam next day prediction (D+1)
Method Model for Shah Alam
Huber
(1.345)
O3, D+1 = 58.34081 + 0.218359WS - 0.999551T - 0.201683RH - 0.003597NO -
0.080723SO2 + 0.071342NO2 + 0.427896O3 + 0.002304CO
Andrew
(1.3390
O3, D+1 = 59.29342 + 0.211954WS - 1.016579T - 0.206885RH - 0.008400NO -
0.080975SO2 + 0.068869NO2 + 0.424208O3 + 0.002495CO
Bisquare
(4.685)
O3, D+1 = 59.28020 + 0.211684WS - 1.01620T - 0.206799RH -0.008292NO - 0.080967SO2
+ 0.068917NO2 + 0.424289O3 + 0.002487CO
Cauchy
(2.385)
O3, D+1 = 58.50722 + 0.215841WS - 1.01235T - 0.200501RH - 0.004714NO - 0.078353SO2
+ 0.070814NO2 + 0.431530O3 + 0.002348CO
Fair
(1.400)
O3, D+1 = 58.33889 + 0.220274WS - 1.02206T - 0.197090RH -0.002412NO - 0.075385SO2
+ 0.071074NO2 + 0.438312O3 + 0.002279CO
Logistic
(1.205)
O3, D+1 = 58.35987 + 0.217172WS - 1.01374T - 0.198907RH - 0.003527NO - 0.077309SO2
+ 0.071205NO2 + 0.434015O3 + 0.002310CO
Talwar
(2.975)
O3, D+1 = 61.93210 + 0.192428WS - 1.04524T - 0.225391RH -.003655NO - 0.063148SO2 +
0.058234NO2 + 0.414953O3 + 0.002605CO
Welsch
(2.985)
O3, D+1 = 58.929651 + 0.213073WS - 1.013085T - 0.204147RH - 0.007236NO -
0.080173SO2 + 0.069647NO2 + 0.426784O3 + 0.002440CO
Hampel
(2,4 and 5)
O3, D+1 = 61.60702 + 0.235683WS -1.07250T - 0.216456RH + 0.001681NO - 0.075576SO2
+ 0.068695NO2 + 0.427674O3 + 0.002221CO
Table 7 - Performance Indicators of M-estimation for Shah Alam Next Day Prediction Model (D+1)
Method NAE RMSE IA PA
Huber 0.244177 10.755146 0.569628 0.453377
Andrew 0.244382 10.762230 0.568863 0.452901
Bisquare 0.244377 10.762060 0.568862 0.452915
Cauchy 0.244172 10.756848 0.571193 0.453215
Fair 0.244047 10.753994 0.573525 0.453260
Logistic 0.244112 10.755345 0.572023 0.453286
Talwar 0.244590 10.753717 0.566154 0.453188
Welsch 0.244302 10.760139 0.569656 0.453013
Hampel 0.244248 10.743702 0.568980 0.453509
Muhamad et al., Internatinal Journal of Integrated Engineering Vol. 12 No. 1 (2020) p. 270-279
278
Table 8 - The ranking of performance indicators of M-estimation for Shah Alam next prediction (D+1)
Method NAE RMSE IA PA Sum
Huber 4 4 5 2 15
Andrew 8 9 7 9 33
Bisquare 7 8 8 5 28
Cauchy 3 6 3 8 20
Fair 1 3 1 4 9
Logistic 2 5 2 3 12
Talwar 9 2 9 6 26
Welsch 6 7 4 7 24
Hampel 5 1 6 1 13
Table 9 - The comparison between ordinary least square method and M-estimation
Method NAE RMSE IA PA
OLS 0.244080 10.731951 0.571453 0.454015
Fair 0.244047 10.753994 0.573525 0.453260
4. Conclusion
This study has proved that robust method is better than ordinary least square method since the influential outliers in
air pollution data have been reduced using the weightage approached. The average accuracy of M-estimation by Fair
method is 0.5134 which is better than ordinary least square method where the average accuracy is 0.5129. Besides, the
improvement has been proved when Fair method has minimized 5.34% influential outliers from response variable.
Therefore, these models could be implemented among heath public health, government, citizen and the other authorities
to prepare and can take an early action to avoid the negative impact of O3 concentration.
Acknowledgement
This study was funded by 600-RMI/FRGS 5/3 (40/2014). We would also like to extend our appreciation to the
Department of Environmental Malaysia (DoE) for providing the air quality data for this research.
References
[1] H. Collins, Collin COBUILD Advanced Learner's English Dictionary, Great Britain, 2006.
[2] S. Azmi, Isu Alam Sekitar di Malaysia (Ancaman Alam dan Atmosfera), Kuala Lumpur, 2007.
[3] DoE, "Department of Environment, Malaysia. Malaysia Air Quality Report 2012," Department of Environment,
Ministry of Natural Resources and Environment, Malaysia, Kuala Lumpur, 2013.
[4] DoE, "Department of Environment, Malaysia. Malaysia Environmental Quality Report 2013," Department of
Environment, Ministry of Natural Resources and Environment, Malaysia, Kuala Lumpur, 2014.
[5] DoE, "Department of Environment, Malaysia. Malaysia Environmental Quality Report 2003," Department of
Environment, Ministry of Natural Resources and Environment, Malaysia, Kuala Lumpur, 2004.
[6] R. Mozer, "Ground Level Ozone," 2008.
[7] N. I. Mohammed, N. A. Ramli and A. S. Yahaya, "Ozone Phytotoxicity Evaluation and prediction of Crops
Production in Tropical Regions," Atmospheric Environment, pp. 343-349, 2013.
[8] S. Chatterjee and A. S. Hadi, Regression Analysis by Example, Fourth ed., New Jersey: John Wiley, 2006.
[9] S. K. Sarkar, H. Midi and R. Rana, "Detection of Outliers and Influential Observations in Binary Logistic
Regression: An Empirical Study," Applied Sciences, pp. 11 (1): 26-35, 2011.
[10] W. D. Berry and S. Feldman, Multiple Regression in Practice. Quantitative Application in Social Sciences, Newbury
Park, : Sage University Paper, 1985.
[11] A. S. Yahaya, "Applied Regression Models Using SPSS," 2014.
[12] A. Z. Ul-Saufie, Future Daily Particulate Matter Concentrations Prediction Using Regression Artificial Neural
Network and Hybrid Models in Malaysia, Pulau Pinang : Universiti Sains Malaysia, 2012.
[13] J. W. Osborne and A. Overbay, "The Power of Outliers (and why researcher should always check for them),"
Practical Assessment, Research & Evaluation, 2004.
[14] A. Field, Discovering Statistics Using SPSS, Second ed., London: Sage, 2005.
Muhamad et al., International Journal of Integrated Engineering Vol. 12 No. 1 (2020) p. 270-279
279
[15] E. Agirre-Basurko, G. I. Berastegi and I. Madariaga, "Regression and Multilayer Perceptron-Based Models to
Forecast Hourly Ozone and Nitrogen Dioxides levels in the Bilbao area," Environmental Modelling & Software,
2006.
[16] M. Musa, A. A. Jemain and W. Z. Wan Zin, "Scaling and Persistence of Ozone Concentrations in Klang Valley,"
Journal of Quality Measurement and Analysis, pp. 9(1): 9-20, 2013.
[17] K. Jaioun, K. Saithanu and J. Mekparyup, "Multiple Linear Regression Model to Estimate Ozone Concentration in
Chonburi, Thailand," International Journal of Applied Environmental Sciences, pp. 4: 1305-1308, 2014.
[18] W. Wang, W. Lu, X. Wang and A. Y. Leung, "Prediction of Maximum Daily Ozone Level Using Combined Neural
Network and Statistical Characteristics," Environmental International, pp. 29: 555-562, 2003.
[19] N. A. Ghazali, N. A. Ramli, A. s. Yahaya, N. F. F. MD Yusof, N. Sansuddin and W. A. Al Madhoun ,
"Transformation of Nitrogen Dioxide into Ozone and Prediction of Ozone Concentrations Using Multiple Linear
Regression," Environ Monit Assess, pp. 165: 475-489, 2010.
[20] J.-S. Heo, K.-H. Kim and D.-S. Kim, "Pattern Recognition of high Ozone Episodes in Forecasting Daily Maximum
Ozone Levels," TAO, vol. 15, pp. 199-220, 2004.
[21] A. W. Delcloo and H. d. Backer, "Modelling Planetary Boundary Layer Ozone, Using Meteorological Parameters
at Uccle and Payerne," Atmospheric Environment, pp. 39: 5067-5077, 2005.
[22] N. A. Ramli, N. A. Ghazali and A. S. Yahaya, "Diurnal Fluctuations of Ozone Concentrations and its Precursors
and Prediction of Ozone Using Multiple Linear Regression," Malaysian Journal of Environmental Management, pp.
11(2): 57-69, 2010.
[23] N. Banan, M. T. Latif, L. Juneng and M. F. Khan, "An Application of Artificial Neural Network for the Prediction
of Surface Ozone Concentrationns in Malaysia," Springer Science, 2014.
[24] U. Schlink, M. Richter, S. Dorling, G. Nunnari, G. Cawley and E. Pelikan, "Statistical Models to Assess the Health
Effect and to Forecast Ground-Level Ozone," Environmental Modelling & Software, pp. 21: 547-558, 2006.
[25] S. H. M. Shafie and M. Mahmud, "Analisis Pola Taburan Reruang PM10 dan O3 di Lembah Klang dengan
Mengaplikasi Teknik Geographic Information System," Malaysian Journal of Society and Space, vol. 11, no. 3, pp.
61-73, 2015.
[26] DoE, Department of Environment, Malaysia. Malaysia Air Quality Report 2011, Kuala Lumpur: Department of
Environment, Ministry of Natural Resources and Environment, Malaysia, 2012.
[27] F. Ahamad, M. T. Latif, R. Tang, L. D. Juneng and H. Juahir, "Variation of Surface Ozone Exceedane Around Klang
Valley, Malaysia," Atmospheric Research, 2014.
[28] N. R. Awang, N. A. Ramli, N. I. Mohammed and A. S. Yahaya, "Times Series Evaluation of Ozone Concentrations
in Malaysia Based on Location of Monitoring Stations," International Journal of Engineering and Technology, vol.
3, 2013.
[29] N. M. Noor, A. S. Yahaya and N. A. Ramli, "Estimation of Missing Values for Air Pollution Data Using Mean
Imputation Techniques," 2008.
[30] A. S. Yahaya, N. A. Ramli and N. Fitri, "Effects of Estimating Missing Values on Fitting Distribution: International
Conference on Quantitative Sciences and Its Applications," 2005.
[31] G. Bohrnstedt and D. Knoke, "Norusis's SPSS 11 Chapter 22 on "Analyzing Residual:" Hamilton's Chapter on
"Robust Regression"," in Statistics for Social Data Analyis, 1982.
[32] D. Blatna, "Outliers in Regression," University of Economic Prague, 2005.
[33] D. L. Stevens, Sampling Design and Statistical Analysis Methods for the Integrated Biological and Physical
Monitoring of Oregon Streams, Corvalis, Oregon, 2002.
[34] P. J. Huber, Robust Regression: Asymptotics, Conjectures and Monte Carlo, Ann. Stat., 1973, pp. 1, 799-821
[35] Y. Susanti, H. Pratiwi, S. Sulistijowati H and L. Twenty, "M Estimation, S Estimation, and MM Estimation in
Robust Regression," International Journal of Pure and Applied Mathematics, 2014.
[36] C. Stuart, Robust Regression, 2011.
[37] C. Chen and G. Yin, "Computing the Efficiency and Tuning Constant for M-Estimation," Joint Statistical Meetings-
Statistical Computing Section, 2002.
[38] O. Gervasi, "Computational Science and Its Applications, Italy. Springer," 2008.
[39] N. I. Mohammed, "Developement and Assessment of New AOTX Models for Ozone Phytotoxicity Effect on Paddy
Yield Reductions Malaysia," 2012.