Research Article

Comparison of Machine-Learning Algorithms for Near-Surface Air-Temperature Estimation from FY-4A AGRI Data

Ke Zhou,1 Hailei Liu,1,2 Xiaobo Deng,1,3 Hao Wang,1 and Shenglan Zhang1

1 Key Laboratory of Atmospheric Sounding, Chengdu University of Information Technology, Chengdu 610225, China
2 National Satellite Meteorology Center, China Meteorological Administration, Beijing 100081, China
3 Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science & Technology, Nanjing 210044, China

Correspondence should be addressed to Hailei Liu; liuhailei@cuit.edu.cn

Received 21 April 2020; Revised 11 August 2020; Accepted 22 September 2020; Published 6 October 2020

Academic Editor: Stefania Bonafoni

Copyright © 2020 Ke Zhou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Six machine-learning approaches, including multivariate linear regression (MLR), gradient boosting decision tree, k-nearest neighbors, random forest, extreme gradient boosting (XGB), and deep neural network (DNN), were compared for near-surface air-temperature (Tair) estimation from the new generation of Chinese geostationary meteorological satellite Fengyun-4A (FY-4A) observations. The brightness temperatures in split-window channels from the Advanced Geostationary Radiation Imager (AGRI) of FY-4A and numerical weather prediction data from the global forecast system were used as the predictor variables for Tair estimation. The performance of each model and the temporal and spatial distribution of the estimated Tair errors were analyzed. The results showed that the XGB model had better overall performance, with R² of 0.902, bias of −0.087°C, and root-mean-square error of 1.946°C. The spatial variation characteristics of the Tair error of the XGB method were less obvious than those of the other methods. The XGB model can provide more stable and high-precision Tair for a large-scale Tair estimation over China and can serve as a reference for Tair estimation based on machine-learning models.

1. Introduction

Air temperature (Tair) is one of the basic meteorological observation parameters [1–3] and is of great concern in scientific disciplines like hydrology, meteorology, and environmental science. Furthermore, it influences most land-surface processes, such as photosynthesis and land-surface evapotranspiration [4]. Obtaining high-resolution Tair data can reduce human health risks and promote urban heat island research, so high-resolution Tair information is quite crucial [5, 6]. The summer Tair value in China is generally above 20°C, except in the high-altitude regions (e.g., Qinghai-Tibet Plateau). Summer heat waves have a major impact on agricultural food production, as well as the use of water and electricity [7]. This study focuses on the issue of summer Tair estimation in China using Advanced Geostationary Radiation Imager (AGRI) data.

Large-scale Tair data are mainly obtained by interpolation from the data collected by surface meteorological stations. However, the distribution of meteorological stations is usually uneven due to geographical factors, and some sparsely populated areas even have no meteorological observation [8]. Therefore, the accuracy of the interpolated Tair data is limited, and researchers are unable to obtain high-spatial-resolution Tair information [9].

Meteorological satellites such as low-Earth-orbit (LEO) satellites and geostationary-Earth-orbit (GEO) satellites can provide continuous surface (i.e., land-surface temperature (LST)) and atmospheric observations with a wide spatial coverage at global and regional scales [10–12]. In the last several decades, LEO and GEO observations have been gradually applied to Tair estimation with the development of meteorological satellite technology. LEO satellites can only acquire data once or twice a day for one place. In addition, cloud contamination will reduce the effective data for Tair estimation [13–15]. Unlike LEO satellites, GEO meteorological satellites can continuously provide data every 15 or 30 min on one-third of the Earth's surface [16–20]. Therefore, GEO satellites comprise an effective method of obtaining high-spatial- and high-temporal-resolution Tair data in a fixed area and have the potential to facilitate the study of the daily change of Tair [20, 21].

Hindawi, Advances in Meteorology, Volume 2020, Article ID 8887364, 14 pages, https://doi.org/10.1155/2020/8887364

At present, the methods for Tair estimation from satellite brightness temperatures (BTs) and land-surface temperature (LST) product data can be divided into simple linear, multivariate linear, and nonlinear approaches [21, 22]. Previous studies [7, 23, 24] have shown that machine-learning algorithms can obtain higher-accuracy Tair values than other methods. For example, a machine-learning model (e.g., a neural network (NN) model) has higher accuracy, and the root-mean-square error (RMSE) is reduced by 1.29°C compared with linear models [7].

The AGRI aboard Fengyun-4A (FY-4A) has 14 spectral bands [18, 20, 25, 26] (six visible/near-infrared (VIS/NIR), six infrared (IR), and two water vapor bands), with a temporal resolution of 15 min for the full disk and a spatial resolution of 4 km at IR bands. It provides an unprecedented opportunity for obtaining high-precision Tair data over China and surrounding areas.

Machine-learning methods have been used to estimate Tair based on moderate-resolution imaging spectroradiometer (MODIS) data in several studies [27–29]. However, there is currently a lack of relevant studies on Tair estimation based on FY-4A. The use of FY-4A data to estimate high-resolution Tair is of great significance to the study of human health and of high-temporal- and high-spatial-resolution Tair in East Asia. In addition, there is a need for timely and high-resolution Tair data for the sustainable planning and management of climate-resilient cities [3].

This study aims to develop machine-learning approaches for Tair estimation using FY-4A data and to compare the performances of different machine-learning models (i.e., multivariate linear regression (MLR), gradient boosting decision tree (GBTD), k-nearest neighbors (KNN), random forest (RF), extreme gradient boosting (XGB), and deep neural network (DNN)) in Tair estimation, which, to the best of our knowledge, has never been done before. By comparing different machine-learning algorithms, a machine-learning algorithm with good applicability for estimating Tair is selected. The algorithm is widely applicable to meteorological satellites without surface-temperature products.

The remainder of this paper is organized as follows. In Section 2, the study area and data used for model development are introduced, and the construction of the above-listed six machine-learning models for Tair estimation is described. Variable importance analysis, validation results, and discussion are described in Section 3. Conclusions are presented in Section 4.

2. Materials and Methods

2.1. Study Area. The study area is located in China, and Figure 1 shows the spatial distribution of the 1812 meteorological stations used in this study. Elevation is higher in western China than in the east, and the Qinghai-Tibet Plateau has an average elevation of over 4000 m [30]. There are more stations in the eastern areas than in the western ones due to the uneven distribution of population and economic development in China (Figure 1).

2.2. Data. The data used in this study mainly include FY-4A AGRI brightness temperature (BT) and L2 cloud mask data, global forecast system (GFS) 3 h forecast data, meteorological data of 1812 stations in China, and other auxiliary data (longitude, latitude, and Julian day).

2.2.1. Satellite Data. FY-4A, the new generation of Chinese geostationary meteorological satellites, was launched on December 11, 2016. It was fixed at a position of 99.5°E above the equator. As thermal infrared split-window channels, bands 12 and 13 of AGRI (BT12 and BT13, respectively) are mainly used for studies of cloud, aerosol, and Tair estimation. Their central wavelengths are 10.8 and 12.0 μm [31].

BT12, BT13, and L2 cloud mask products during summer 2018 (i.e., June, July, and August) were used. The AGRI data were selected at 3 h intervals (i.e., 00, 03, 06, 09, 12, 15, 18, and 21 UTC) per day. The data were downloaded from the China National Satellite Meteorological Center (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx).

2.2.2. Meteorological Data. This study selected meteorological data at 3 h intervals from 1812 observation stations in China during summer 2018. The meteorological variables used in this study include Tair and the digital elevation model (DEM). Tair in summer 2018 ranged from −5°C to 40°C, and the DEM of the stations was between 0 and 5000 m. These data were obtained from the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

2.2.3. Numerical Weather Prediction Data and Auxiliary Data. Previous studies showed that the relationship between BTs (or LST) and Tair is easily affected by surface characteristics and atmospheric conditions [7, 31]. Therefore, the accuracy of Tair estimation was effectively improved by adding several auxiliary parameters [32]. In this study, GFS 3 h precipitable water vapor (GFS PWV) and relative humidity (GFS RH) forecast fields were used. The forecast length of the GFS data (GFS PWV and GFS RH) was 3 h, and there were eight periods of data per day (i.e., 00, 03, 06, 09, 12, 15, 18, and 21 UTC). The GFS data were interpolated according to the location and time information of the AGRI pixels. GFS data were obtained through the US National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (http://www.nco.ncep.noaa.gov/pmb/products/gfs). Table 1 presents the temporal and spatial resolution information of the data used in this study.

2.3. Methods

2.3.1. Preparation of Training Dataset. BT12, BT13, GFS PWV, GFS RH, and the auxiliary data were used as the input variables. Tair was used as the response variable of the machine-learning models (Table 1), and all data points (across space and time) were included in one model (i.e., the XGB model) [33]. The construction of representative training data was crucial to developing successful retrieval models using machine learning. Thus, data from June to August, except the 1st, 10th, 20th, and 30th of each month, were collected as the original dataset, and the original dataset was randomly divided into a training dataset (80%; 971,773 samples) and a test dataset (20%; 242,944 samples) with the same number of pieces of data for each bin (i.e., 1.0°C in temperature), as shown in Figure 2. For the validation, the data that were not used for training were taken from the 1st, 10th, 20th, and 30th of June, July, and August.
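
The data partition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the sample layout (dicts with the hypothetical keys `day` and `tair`) and the function name `split_samples` are invented for the example, and the phrase "same number of pieces of data for each bin" is interpreted here as applying the 80/20 split separately within each 1.0°C temperature bin.

```python
import random
from collections import defaultdict

# Days of each month reserved for independent validation (from the text).
HOLDOUT_DAYS = {1, 10, 20, 30}


def split_samples(samples, test_fraction=0.2, bin_width=1.0, seed=0):
    """Split samples into training, test, and validation lists.

    Each sample is a dict with the hypothetical keys 'day' (day of month)
    and 'tair' (station air temperature in deg C).
    """
    rng = random.Random(seed)
    validation = [s for s in samples if s["day"] in HOLDOUT_DAYS]
    remaining = [s for s in samples if s["day"] not in HOLDOUT_DAYS]

    # Group the remaining samples into temperature bins of width bin_width.
    bins = defaultdict(list)
    for s in remaining:
        bins[int(s["tair"] // bin_width)].append(s)

    # Apply the same train/test proportion inside every bin.
    train, test = [], []
    for key in sorted(bins):
        members = bins[key]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test, validation
```

Splitting within each bin keeps the temperature distribution of the test set close to that of the training set, which matches the similar histogram shapes reported for Figure 2.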

2.3.2. Machine-Learning Algorithm. Machine-learning methods have been widely used in classification and regression in the field of remote sensing [34–41]. In this study, six machine-learning approaches, that is, MLR, GBTD, KNN, RF, XGB, and DNN, were used for constructing Tair estimation models. The flowchart of Tair estimation based on machine-learning approaches is shown in Figure 3. L2 cloud mask products were used to detect cloud. If the data were cloudless, the FY-4A data were matched with both the GFS data and the meteorological station data (same space and time), and then Tair was estimated through the machine-learning models.

Figure 1: Elevation of study area and meteorological sites in China. Differently colored dots represent sites located at different elevations (legend classes: 0–999, 1000–1999, 2000–2999, 3000–3999, and 4000–5000; map extent approximately 80°E–130°E, 20°N–50°N).

Table 1: Temporal and spatial information of the primary data used in this study.

Abbreviation   Units   Spatial resolution   Temporal resolution   Source
AGRI BTs       K       4 km                 15 min                FY-4A AGRI
GFS PWV        mm      0.5°                 3 h                   GFS
GFS RH         %       0.5°                 3 h                   GFS
DEM            m       Site                 -                     FY-4A AGRI
Longitude      -       Site                 -                     CMDC/AGRI
Latitude       -       Site                 -                     CMDC/AGRI
Tair           °C      Site                 1 h                   CMDC
Julian day     -       -                    -                     -

Figure 2: Histograms of training data for machine learning: (a) all datasets, (b) training dataset, and (c) test dataset. Each panel plots the number of data against temperature (°C, from −5 to 45).

Figure 3: Flowchart of the Tair estimation model in this study. FY-4A AGRI data (BT12, BT13) are screened with the L2 clear-sky product; clear-sky data are combined with CMDC data (Tair, Julian day, longitude, latitude) and GFS data (GFS PWV, GFS RH) and passed to the machine-learning models (MLR, RF, GBTD, KNN, XGB, DNN) for the estimation of air temperature; otherwise the procedure ends.

As a simple machine-learning algorithm, MLR has usually been the basic tool for the estimation of meteorological parameters [42, 43]. KNN is a local nonlinear algorithm whose prediction process is generally divided into two steps. First, when the KNN algorithm predicts a point, it searches the training dataset for the k nearest neighbors of that point. Second, the mean of the target variable of the k nearest neighbors is computed [44, 45]. In this study, the hyperparameters of MLR and KNN were set to default values. Unlike MLR and KNN, RF is an ensemble, decision-tree-based approach for improving the prediction accuracy, in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [34, 43, 46–50]. The GridSearchCV tool from the Python Scikit-learn library was used to tune the RF hyperparameters, including the number of trees (n_estimators), the minimum number of samples at a leaf (min_samples_leaf), and the maximum depth of a tree (max_depth). The selected parameters were n_estimators = 200, min_samples_leaf = 50, and max_depth = 3.
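
The two-step KNN prediction described above can be illustrated with a short sketch. It is not the implementation used in the study: the function name `knn_predict` is hypothetical, plain Euclidean distance on unscaled features is an assumption, and k = 5 is an assumed value (it matches the scikit-learn default, but the text does not state which library's defaults were used for KNN).

```python
import math


def knn_predict(train_x, train_y, query, k=5):
    """Predict a target as the mean over the k nearest training points.

    train_x: list of feature tuples; train_y: list of target values;
    query: feature tuple of the point to predict.
    """
    # Step 1: rank all training points by Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    # Step 2: average the target variable of the k nearest neighbors.
    return sum(y for _, y in nearest) / len(nearest)
```

For example, `knn_predict([(0,), (1,), (2,), (10,)], [0.0, 1.0, 2.0, 10.0], (1,), k=3)` averages the targets of the three closest points and ignores the distant one.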

The principle of GBTD is to sequentially apply a classification algorithm to weighted versions of the training data [51, 52], descending along the gradient direction of the previously established model loss function, and then perform a weighted majority vote on the resulting classifier sequence. As an improved algorithm of GBTD, XGB uses all data in each iteration, which is similar to RF [53, 54]. Therefore, XGB reduces the complexity of the model and makes the learned model simpler [35, 54–58]. In this study, four hyperparameters in the GBTD and XGB models (i.e., n_estimators, max_depth, learning_rate (lr), and the minimum loss reduction required to make a further partition on a leaf node of the tree (gamma)) were empirically tuned based on RMSE. The optimum n_estimators, gamma, max_depth, and lr in the two models were 500, 0.2, 5, and 0.1, respectively.
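
The idea of descending along the gradient of the loss can be made concrete with a small sketch. Note the assumptions: the paragraph above describes boosting in classification terms, whereas this example is a regression variant with squared-error loss (for which the negative gradient is simply the residual); it uses single-feature decision stumps rather than depth-5 trees; and it omits the regularization, including gamma, that distinguishes XGB. The function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) for the best single split.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(xs, ys, n_estimators=100, learning_rate=0.1):
    """Fit an additive model of stumps by gradient boosting."""
    base = sum(ys) / len(ys)  # initial constant prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # Negative gradient of the squared-error loss = current residuals.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        # Take a small step (learning_rate) along the fitted direction.
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict_boosting(model, x):
    """Sum the shrunken contributions of all stumps for one input value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )
```

In this sketch, `n_estimators` and `learning_rate` play the same roles as the hyperparameters of the same names tuned in the study: more rounds and a smaller step trade training time for a smoother fit.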

An artificial neural network (ANN) is a biologically inspired machine-learning method [59]. Here, DNN, a subset of ANN with multiple hidden layers, uses a fully connected structure, which has the ability to learn time and space relationships [60, 61]. It adjusts the connection strength between neurons through back-propagation and iteratively minimizes the prediction error [62–64]. The DNN model was tested with one to five hidden layers and with 5–200 neurons per hidden layer in intervals of five. In addition, some widely used optimizers (i.e., stochastic gradient descent, RMSProp, and Adam) were tested by comparing the calculated results. In this study, the hyperparameters of the DNN were set as follows: batch_size = 128, dropout_rate = 0.1, stop_steps = 20 (if the validation-set loss function was not improved within 20 steps, training was terminated), and learning rate = 0.001. Adam was chosen as the optimizer, the number of hidden layers was three, and the number of hidden neurons was 256.
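
The stop_steps rule described above is a form of early stopping. The sketch below shows the logic in isolation, assuming that a "step" corresponds to one evaluation of the validation loss (for instance, once per epoch; the text does not specify this) and using a hypothetical function name.

```python
def early_stopping_epoch(validation_losses, patience=20):
    """Return the index of the evaluation at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive evaluations. If that never
    happens, the index of the last evaluation is returned.
    """
    best_loss = float("inf")
    evaluations_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            evaluations_without_improvement = 0
        else:
            evaluations_without_improvement += 1
            if evaluations_without_improvement >= patience:
                return epoch
    return len(validation_losses) - 1
```

In a real training loop, the same counter would be updated after each validation pass, and the weights from the best evaluation would typically be restored when training stops.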

2.4. Error Analyses. Four statistical factors, namely, the determination coefficient (R²), RMSE, MSE, and mean bias (bias), were used to evaluate the accuracy of the Tair estimation models as follows:

$$R^2 = \frac{\left[\sum \left(T_{ea} - \overline{T}_{ea}\right)\left(T_{oa} - \overline{T}_{oa}\right)\right]^2}{\sum \left(T_{ea} - \overline{T}_{ea}\right)^2 \sum \left(T_{oa} - \overline{T}_{oa}\right)^2}, \qquad (1)$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)^2}{N}}, \qquad (2)$$

$$\mathrm{bias} = \frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)}{N}, \qquad (3)$$

$$\mathrm{MSE} = \frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)^2}{N}, \qquad (4)$$

where T_ea is the estimated Tair, T_oa is the observed Tair at the meteorological stations, an overbar denotes the sample mean, and N is the sample size.
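
For reference, the four statistics in equations (1)–(4) can be computed as in the following sketch. The function names are chosen for illustration only; `estimated` and `observed` stand for the estimated and station-observed Tair values.

```python
import math


def mean(values):
    return sum(values) / len(values)


def bias(estimated, observed):
    """Mean bias, equation (3)."""
    return sum(e - o for e, o in zip(estimated, observed)) / len(observed)


def mse(estimated, observed):
    """Mean squared error, equation (4)."""
    return sum((e - o) ** 2 for e, o in zip(estimated, observed)) / len(observed)


def rmse(estimated, observed):
    """Root-mean-square error, equation (2)."""
    return math.sqrt(mse(estimated, observed))


def r_squared(estimated, observed):
    """Determination coefficient, equation (1): squared Pearson correlation."""
    e_mean, o_mean = mean(estimated), mean(observed)
    covariance = sum((e - e_mean) * (o - o_mean) for e, o in zip(estimated, observed))
    e_var = sum((e - e_mean) ** 2 for e in estimated)
    o_var = sum((o - o_mean) ** 2 for o in observed)
    return covariance ** 2 / (e_var * o_var)
```

Because equation (1) is the squared correlation between estimates and observations, it measures how well the two co-vary and is insensitive to a constant offset, which is why bias is reported alongside it.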

3. Results and Discussion

In this section, the results of variable importance are presented, and the performance of the six machine-learning models is verified. The spatial distribution characteristics of the Tair errors of each model are also analyzed.

3.1. Variable Importance Results. Correlation analysis was performed to analyze the linear relationship between Tair and BT12, BT13, GFS PWV, GFS RH, DEM, longitude (LONG), latitude (LAT), and Julian day (JD). Table 2 shows the correlation coefficient matrix of these variables.

As described in Figure 4(a), GFS PWV, DEM, BT12, and BT13 had a better correlation with Tair than the other variables, and the R values of the four variables were 0.635, −0.596, 0.459, and 0.413, respectively. This indicated that these variables played more important roles in the linear Tair estimation models. However, the Pearson correlation coefficient only describes the linear correlation between two variables; it cannot identify a nonlinear relationship between them. Therefore, the variable importance of the RF algorithm was also analyzed (Figure 4(b)). The RF algorithm modeled the nonlinear relationship well. The GFS PWV was identified as the most important variable for Tair estimation in the RF model, while the GFS RH and BT12 also played more important roles than the other predictors. Therefore, PWV and RH were used as inputs to effectively improve the accuracy of Tair estimation, which was consistent with the previous study [65].
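
The limitation of the Pearson coefficient noted above can be seen in a small, self-contained example. The data here are synthetic and purely illustrative (they are not from the study), and the function name is hypothetical: a perfectly linear relationship gives a coefficient of 1, whereas a perfectly deterministic but symmetric quadratic relationship gives a coefficient of 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_var = sum((x - x_mean) ** 2 for x in xs)
    y_var = sum((y - y_mean) ** 2 for y in ys)
    return covariance / math.sqrt(x_var * y_var)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2.0 * x + 1.0 for x in xs]   # y depends linearly on x
quadratic = [x * x for x in xs]        # y depends on x, but not linearly

r_linear = pearson_r(xs, linear)        # close to 1
r_quadratic = pearson_r(xs, quadratic)  # 0: the dependence is invisible to r
```

This is the motivation for complementing the correlation matrix with a model-based importance measure such as the RF variable importance.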

3.2. Model Performance Results. To evaluate the overall performance of each model, a 10-fold cross-validation method was used. K-fold cross-validation was used for model configuration selection. With K = 10, the dataset was randomly and equally divided into K groups. In each validation, one group was held out for testing and the remaining K − 1 groups were used for training. Across the K validations, the model performance was calculated using a different test fold each time [35]. Finally, the average validation results were used to evaluate the overall performance of each model.
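
The K-fold procedure described above can be sketched as an index generator. This is an illustrative outline with a hypothetical function name, not the code used in the study; any model and any of the error statistics from Section 2.4 could be plugged into the loop.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # K groups of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A typical use is to fit the model on `train`, compute a score such as RMSE on `test` for each of the K pairs, and then average the K scores.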

Figure 5 illustrates the performance of the six models in terms of different statistical parameters, including RMSE, bias, MSE, and R². The MLR model had the lowest performance of the six models. The variation range of RMSE, bias, MSE, and R² in the MLR model was quite wide; the RMSE alone ranged from 1.602°C to 4.487°C. In contrast, the DNN model used in this study had better overall performance and higher efficiency than the other five models. The DNN model showed the highest accuracy, with an average RMSE of 1.736°C. The RMSE of the DNN model ranged from 0.852°C to 2.584°C, showing good concentration and stability, as presented in Figure 5(a). In addition, among the remaining models, the overall performances of the XGB and GBTD models were equivalent and better than those of the MLR, KNN, and RF models.

3.3. Validation Results. Model performance was used as an indicator to internally validate each model. The model accuracy must also be evaluated with a dataset that was not used for training or testing. To validate the developed MLR, RF, KNN, GBTD, XGB, and DNN models, the observed data not used for either training or testing were utilized (validation dataset in Section 2.3.1). Figure 6 illustrates the quantitative validation results of the estimated Tair during the validation time (the 1st, 10th, 20th, and 30th of June–August 2018). Compared with the results on the test dataset, the overall accuracy of the six models on the validation dataset decreased. For example, for the DNN model, the RMSE of Tair using the test dataset was 1.736°C, while that of the validation results was 2.006°C. This difference may be caused by overfitting due to the fact that the best model was not selected based on the final validation results [35].

The biases of the MLR, RF, DNN, GBTD, and XGB models were within ±0.2°C, indicating no obvious overestimation or underestimation. In contrast, the KNN model showed a larger negative bias of −0.492°C. The larger negative bias of the KNN model may be due to its poor robustness. Robustness mainly depends on the dataset, and poor robustness makes a model difficult to apply directly to other cases, so the KNN model had a low bias on the test dataset and a high bias on the validation dataset.

The XGB model had excellent modeling performance, with R² of 0.902. The R² values of the GBTD and DNN models were 0.898 and 0.890, respectively, and the R² value of the remaining three models was less than 0.89. Moreover, compared with the other models, the XGB and GBTD models can repeatedly learn to generate a weighted average of the weak learners. Therefore, the XGB and GBTD models showed a relatively better performance on the validation dataset at most sites. In general, the XGB model showed a higher overall performance than the other five models on the validation dataset.

Table 2: Pearson correlation matrix for variables considered in the Tair estimation model. Data represent the correlation coefficient between different variables.

          Tair     BT12     BT13     GFS PWV  GFS RH   DEM      LONG     LAT      JD
Tair      1.000    0.459    0.413    0.635    −0.182   −0.596   0.303    −0.288   −0.383
BT12      0.459    1.000    0.995    0.047    −0.256   −0.287   0.199    −0.022   −0.099
BT13      0.413    0.995    1.000    0.010    −0.249   −0.268   0.190    0.002    −0.077
GFS PWV   0.635    0.047    0.010    1.000    0.383    −0.585   0.366    −0.463   −0.310
GFS RH    −0.182   −0.256   −0.249   0.383    1.000    −0.020   0.189    −0.355   −0.046
DEM       −0.596   −0.287   −0.268   −0.585   −0.020   1.000    −0.663   0.057    0.003
LONG      0.303    0.199    0.190    0.366    0.189    −0.663   1.000    0.134    −0.004
LAT       −0.288   −0.022   0.002    −0.463   −0.355   0.057    0.134    1.000    −0.006
JD        −0.383   −0.099   −0.077   −0.310   −0.046   0.003    −0.004   −0.006   1.000

Figure 4: (a) Relative variable importance identified by Pearson correlation coefficient (BT13: 0.413, BT12: 0.459, RH: −0.182, PWV: 0.635, DEM: −0.596, LAT: −0.288, LON: 0.303, JD: −0.383) and (b) RF variable importance, expressed as attribute usage (BT13: 0.012, BT12: 0.165, RH: 0.165, PWV: 0.444, DEM: 0.080, LAT: 0.048, LON: 0.021, JD: 0.065).

The Tair estimation models based on satellite and numerical forecast data are susceptible to factors such as altitude and surface roughness. To further evaluate the applicability of these models, the spatial distribution of the errors at each meteorological station was evaluated (Figures 7–9).

It can be seen that the Tair estimation errors of all models showed obvious spatial distribution characteristics (Figure 7). Generally, for each model, the RMSE is relatively low in the eastern regions (e.g., Guangdong Province) and high in the northwestern regions (e.g., Xinjiang Province). For example, the RMSE of the XGB model in Guangdong Province was approximately 1.2°C–1.8°C, while that in Xinjiang Province was about 2.0°C–3.2°C. Because the northwestern regions have relatively wide Tair changes between day and night, high altitude, and few meteorological observations, the accuracy difference between northwestern and eastern China is obvious. Moreover, the RMSE of the KNN, DNN, GBTD, and XGB models was relatively low in the eastern and southern regions. However, the MLR, RF, KNN, and DNN models had a higher RMSE in northwestern China. In contrast, the GBTD and XGB models had a relatively smaller RMSE in northwestern China because the GBTD and XGB models can generate repeated weighted averages to adjust the applicability to different regions through repeated learning of numerous data.

Furthermore, Gong's study (2015) [66] illustrated that the RMSE of GFS Tair in most eastern regions reaches 1.5°C–3.0°C and is above 3.5°C in the northwestern regions. By contrast, the results showed that the RMSE of Tair estimated by the DNN, XGB, and GBTD models was obviously lower than that of GFS data. In the present study, the RMSE of the XGB model was 1.0°C–2.0°C in most eastern regions, and it was below 3.5°C in the northwestern regions. In addition, RMSE < 2.0°C accounted for 48.2% and RMSE < 2.5°C accounted for 87.6% in the XGB model.

e six models showed the same distribution trend asshown in Figure 8 with R2 being higher in the easternregions but R2 gradually became lower as it got closer to thesouthwestern regions Compared with the central regions(eg Henan Province) the viewing zenith angle (VZA) ofARGI over the western China is larger e larger the VZA

0

1

2

3

4

5RM

SE (deg

C)

KNN GBTD RF XGB DNNMLRModel

(a)

ndash3

ndash2

ndash1

0

1

2

3

Bias

(degC)

KNN GBTD RF XGB DNNMLRModel

(b)

0

2

4

6

8

10

12

14

16

18

MSE

KNN GBTD RF XGB DNNMLRModel

(c)

00

02

04

06

08

10

12

R2

KNN GBTD RF XGB DNNMLRModel

(d)

Figure 5 Boxplots of performance evaluation for the six models (MLR KNN SVM RF XGB and DNN) using the test dataset in terms of(a) RMSE (b) Bias (c) MSE and (d) R2

Advances in Meteorology 7

4035302520

Estim

ated

Tai

r (degC

)

1510

50

ndash5ndash5 0 5 10 15

Station Tair (degC)20 25 30 35 40

400

350

300

250

200

150

100

50

0

RMSE = 2840degCBias = 0055degCR2 = 0791

(a)

Station Tair (degC)ndash5 0 5 10 15 20 25 30 35 40

4035302520

Estim

ated

Tai

r (degC

)

1510

50

ndash5

400

350

300

250

200

150

100

50

0

RMSE = 2233degCBias = ndash0185degCR2 = 0878

(b)

Station Tair (degC)ndash5 0 5 10 15 20 25 30 35 40

4035302520

Estim

ated

Tai

r (degC

)

1510

50

ndash5

400

350

300

250

200

150

100

50

0

RMSE = 2120degCBias = ndash0492degCR2 = 0884

(c)

Estim

ated

Tai

r (degC

)

40353025201510

50

ndash5

Station Tair (degC)ndash5 0 5 10 15 20 25 30 35 40

400

350

300

250

200

150

100

50

0

RMSE = 2006degCBias = 0185degCR2 = 0890

(d)

Station Tair (degC)ndash5 0 5 10 15 20 25 30 35 40

4035302520

Estim

ated

Tai

r (degC

)

1510

50

ndash5

400

350

300

250

200

150

100

50

0

RMSE = 1984degCBias = ndash0122degCR2 = 0898

(e)

Station Tair (degC)ndash5 0 5 10 15 20 25 30 35 40

4035302520

Estim

ated

Tai

r (degC

)

1510

50

ndash5

400

350

300

250

200

150

100

50

0

RMSE = 1946degCBias = ndash0087degCR2 = 0902

(f )

Figure 6 Two-dimensional histogram of predicted Tair data and meteorological observed Tair data based on six machine-learning models(a) MLR model (b) RF model (c) KNN model (d) DNN model (e) GBTD model (f ) XGB model

Figure 7: Spatial distribution of RMSE for six machine-learning models: (a) MLR-RMSE, (b) RF-RMSE, (c) KNN-RMSE, (d) DNN-RMSE, (e) GBTD-RMSE, and (f) XGB-RMSE. Each map spans 80°E to 130°E and 20°N to 50°N; the color bar shows temperature (°C) from 0.0 to 4.0.


is, the more the radiation reaching the sensor will be affected by the atmosphere, which may cause differences in R2 of the estimated Tair value between the southwestern and central regions.

For the MLR model, the bias for all of China was large. For the RF and KNN models, relatively high negative bias existed in southwestern China (e.g., Yunnan-Guizhou Plateau), as shown in Figure 9. This may be due to the relatively simple structure of the three models mentioned above, which cannot simulate the complex Tair changes in China well, resulting in underfitting. Besides, Tair estimated by the DNN model was overestimated in northwestern China, which was the reason that the RMSE of the DNN model was also high in these regions. In contrast, the GBTD and XGB models had relatively low bias in northwestern China, where the absolute bias ranges from 2.0°C to 3.0°C. In conclusion, the bias is lower in the coastal areas and higher in northwestern areas, which is mainly related to the characteristics of summer Tair change.

Figure 10 shows the time series of RMSE for the six models during the validation period. The RMSE of the MLR model was significantly higher than that of the other models, ranging from 2.5°C to 4.3°C. In contrast, the GBTD and XGB models showed a relatively lower RMSE (i.e., 1.8°C–2.2°C) than the RF, KNN, and DNN models.
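A per-date RMSE series of the kind described above can be computed by grouping the validation samples by date. The following is a minimal sketch, not the authors' code; the function names and the sample values are hypothetical and serve only to illustrate the grouping step.

```python
import math
from collections import defaultdict


def rmse(estimated, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_by_date(records):
    """Group (date, estimated, observed) records by date and return {date: RMSE}."""
    groups = defaultdict(lambda: ([], []))
    for date, est, obs in records:
        groups[date][0].append(est)
        groups[date][1].append(obs)
    return {date: rmse(est, obs) for date, (est, obs) in sorted(groups.items())}


# Hypothetical samples: (date, estimated Tair, station Tair), both in deg C.
samples = [
    ("2018-06-01", 21.0, 20.0),
    ("2018-06-01", 18.0, 19.0),
    ("2018-06-10", 25.0, 22.0),
    ("2018-06-10", 30.0, 34.0),
]
daily_rmse = rmse_by_date(samples)
```

Running the same grouping once per model would give one curve per model, which is the layout Figure 10 uses.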

Based on the above analysis, it is expected that the XGB model can provide a more reliable and accurate Tair estimation than the other models. To evaluate the contribution of the predictive factors in the XGB model to Tair estimation, BT data (BT12 and BT13) and GFS data (GFS PWV and RH) were successively introduced (Table 3). As shown in Table 3, when only DEM, longitude, latitude, and Julian day were used as input variables, the RMSE of the XGB model was 3.003°C. The accuracy of Tair estimation was obviously improved when BT12 and BT13 were included in the model. Moreover, when GFS PWV and RH were added to the input variables, the RMSE of the XGB model decreased to 2.164°C, indicating important influences of GFS PWV and RH on the Tair estimation. These results are understandable because PWV and RH are the main parameters needed for atmospheric correction and LST retrieval. When both AGRI BTs and GFS data were introduced to the input variables, the RMSE of the XGB model improved by 0.218°C (from 2.164°C to 1.946°C; Table 3) compared with introducing GFS data alone. This indicates that both GFS data and satellite observation data play an important role in improving the Tair estimation model. The RMSE of the Tair estimation model was less than 2.0°C when both satellite BTs and GFS data were introduced, which was considered to be the precision level of "accurate" [67].
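The stepwise comparison of predictor sets can be checked with simple arithmetic on the RMSE values reported in Table 3. The sketch below is only a worked calculation; the dictionary layout and the helper name rmse_reduction are hypothetical, while the four RMSE values are those listed in Table 3.

```python
# RMSE values (deg C) reported in Table 3 for the XGB model.
BASE = "DEM, longitude, latitude, Julian day"
WITH_BT = BASE + ", BT12, BT13"
WITH_GFS = BASE + ", GFS PWV, GFS RH"
FULL = BASE + ", BT12, BT13, GFS PWV, GFS RH"

table3_rmse = {BASE: 3.003, WITH_BT: 2.376, WITH_GFS: 2.164, FULL: 1.946}


def rmse_reduction(from_key, to_key, table=table3_rmse):
    """Return the RMSE decrease (deg C) when moving from one predictor set to another."""
    return round(table[from_key] - table[to_key], 3)


gain_bt_over_base = rmse_reduction(BASE, WITH_BT)      # adding BTs to the base set
gain_gfs_over_base = rmse_reduction(BASE, WITH_GFS)    # adding GFS to the base set
gain_bt_over_gfs = rmse_reduction(WITH_GFS, FULL)      # adding BTs to base + GFS
gain_gfs_over_bt = rmse_reduction(WITH_BT, FULL)       # adding GFS to base + BTs
```

With these inputs, the GFS variables give the larger reduction whether they are added to the base set or to the set that already contains the BTs.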

The relationship of XGB model errors with altitude, observed Tair, and VZA was analyzed. Figure 11 shows scatter plots of the estimated Tair error against DEM, Tair, and VZA. It can be seen that the Tair error mainly ranges from −3°C to 3°C. The results showed positive deviation in high-altitude areas, which produced a larger RMSE than in low-altitude areas. The model showed a positive deviation when Tair was low while exhibiting a

Figure 8: Spatial distribution of R2 for six machine-learning models: (a) MLR-R2, (b) RF-R2, (c) KNN-R2, (d) DNN-R2, (e) GBTD-R2, and (f) XGB-R2. Each map spans 80°E to 130°E and 20°N to 50°N; the color bar runs from 0.0 to 1.0.


Figure 9: Spatial distribution of bias for six machine-learning models: (a) MLR-Bias, (b) RF-Bias, (c) KNN-Bias, (d) DNN-Bias, (e) GBTD-Bias, and (f) XGB-Bias. Each map spans 80°E to 130°E and 20°N to 50°N; the color bar shows temperature (°C) from −3.0 to 3.0.

Figure 10: Time series (the 1st, 10th, 20th, and 30th of June to August) of RMSE of estimated Tair for six machine-learning models. The plot, titled "RMSE between estimated Tair and observed Tair", shows date (2018-06-01 to 2018-08-30) on the horizontal axis and RMSE (°C) from 1.5 to 4.5 on the vertical axis, with one line per model (MLR, RF, KNN, DNN, GBTD, and XGB).

Table 3: The contribution of AGRI BTs and GFS data to the XGB Tair estimation model.

Predictive factors | XGB model RMSE (°C)
DEM, longitude, latitude, Julian day | 3.003
DEM, longitude, latitude, Julian day, BT12, BT13 | 2.376
DEM, longitude, latitude, Julian day, GFS PWV, GFS RH | 2.164
DEM, longitude, latitude, Julian day, BT12, BT13, GFS PWV, GFS RH | 1.946


negative bias for the high-air-temperature condition. Therefore, the model showed a larger RMSE in the lower- and higher-air-temperature conditions due to underestimation and overestimation. This is similar to the results of previous studies [38]. Furthermore, the uneven distribution of stations makes the applicability of the model in high-altitude areas poor. It is worth mentioning that the effect of VZA on model performance is negligible, as shown in Figure 11(c).

4. Conclusions

In this study, six machine-learning approaches (MLR, RF, KNN, DNN, GBTD, and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed in terms of the spatial and temporal characteristics of their performance. The validation results highlighted the high potential of Tair estimation approaches using machine learning and showed that the accuracy of the XGB model was better than that of the MLR, RF, KNN, GBTD, and DNN models at most sites for Tair estimation over China. The validation was performed using spatially and temporally independent data, and hence the model performance was considered to be quite reliable.

This study improves on previous studies in the following key areas. First, Tair estimation models were constructed based on FY-4A AGRI data and other auxiliary data. The results showed that high-temporal- and high-spatial-resolution Tair values (RMSE < 2.0°C) can be obtained based on FY-4A data. According to the study of Vazquez [67], the level of precision generally accepted as "accurate" for remote-sensing-based Tair estimation is between 1°C and 2°C. Second, the accuracy and performance of the six machine-learning models (MLR, RF, KNN, XGB, GBTD, and DNN) were compared and analyzed. The results showed that the XGB model can provide more stable and higher-precision Tair estimation, which provides a reference for Tair estimation based on machine-learning models. Finally, the accuracy of Tair estimation based on satellite data can be effectively improved by adding numerical weather prediction data. The experimental results showed that when only satellite data were used for large-scale Tair estimation in China, the RMSE of the XGB model was 2.376°C, whereas the RMSE using satellite data combined with GFS data reached 1.946°C.

However, aside from the novelties of this study, a limitation of the dataset used is the restriction to clear-sky conditions. Similarly, machine-learning algorithms cannot infer beyond the range of observed Tair values. If the Tair value rises beyond the range observed within the current training period, the model must be retrained. Moreover, future research may explore whether adding other predictors, such as distance-to-coast and vegetation information (normalized difference vegetation index, etc.), can improve the accuracy of the Tair estimation models.
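The range limitation noted above is easy to see for learners that predict by averaging training targets. The sketch below uses a one-predictor KNN regressor written for illustration only; it is not the model configuration used in this study, and the training pairs are hypothetical. Because a KNN prediction is a mean of observed targets, it can never exceed the largest Tair in the training set, however far the query lies outside the training range.

```python
def knn_predict(train_x, train_y, query_x, k=3):
    """Predict by averaging the targets of the k training points nearest to query_x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query_x))[:k]
    return sum(train_y[i] for i in nearest) / k


# Hypothetical training pairs: one predictor (e.g. a brightness temperature in K)
# and the observed Tair (deg C); the relationship is exactly linear here.
train_x = [280.0, 285.0, 290.0, 295.0, 300.0]
train_y = [5.0, 10.0, 15.0, 20.0, 25.0]

inside = knn_predict(train_x, train_y, 290.0)    # query within the training range
outside = knn_predict(train_x, train_y, 320.0)   # query far beyond the training range

# The linear trend would give 45 deg C at 320 K, but the prediction stays
# bounded by the largest training target.
is_bounded = outside <= max(train_y)
```

Retraining with samples that cover the new temperature range is the remedy the paragraph above describes.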

Data Availability

FY-4A AGRI data were downloaded from the China National Satellite Meteorological Center (NSMC) (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx). The GFS data were obtained from the National Centers for Environmental Prediction (https://www.nco.ncep.noaa.gov/pmb/products/gfs). The meteorological station data were accessed at the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank the National Satellite Meteorological Center (NSMC) for providing FY-4A data, the China Meteorological Data Service Center (CMDC) for the meteorological data, and the National Centers for Environmental Prediction (NCEP) for the GFS data.

Figure 11: Scatter plot of Tair error with (a) DEM, (b) observed Tair, and (c) VZA for the XGB model. Each panel shows error (°C) from −15.0 to 15.0 on the vertical axis and a count color bar from 0 to 300; the horizontal axes are DEM (m) from 0 to 5000, observed Tair (°C) from −10 to 45, and viewing zenith angle (°) from 20 to 65.

References

[1] M. P. Cresswell, A. P. Morse, M. C. Thomson, and S. J. Connor, "Estimating surface air temperatures from Meteosat land surface temperatures using an empirical solar zenith angle model," International Journal of Remote Sensing, vol. 20, no. 6, pp. 1125–1132, 1999.

[2] L. Prihodko and S. N. Goward, "Estimation of air temperature from remotely sensed surface observations," Remote Sensing of Environment, vol. 60, no. 3, pp. 335–346, 1997.

[3] Z. S. Venter, O. Brousse, I. Esau, and F. Meier, "Hyperlocal mapping of urban air temperature using remote sensing and crowdsourced weather data," Remote Sensing of Environment, vol. 242, no. 1, Article ID 111791, 2020.

[4] H. C. Ho, A. Knudby, Y. Xu, M. Hodul, and M. Aminipouri, "A comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (Humidex), for the greater Vancouver area," Science of the Total Environment, vol. 544, pp. 929–938, 2016.

[5] T. R. Lookingbill and D. L. Urban, "Spatial estimation of air temperature differences for landscape-scale studies in montane environments," Agricultural & Forest Meteorology, vol. 114, no. 3-4, pp. 141–151, 2003.

[6] J. W. Hurrell and K. E. Trenberth, "Satellite versus surface estimates of air temperature since 1979," Journal of Climate, vol. 9, no. 9, pp. 2222–2232, 1996.

[7] H. Liu, Q. Zhou, S. Zhang, and X. Deng, "Estimation of summer air temperature over China using Himawari-8 AHI and numerical weather prediction data," Advances in Meteorology, vol. 2019, pp. 1–10, Article ID 2385310, 2019.

[8] M. Konda, N. Imasato, and A. Shibata, "A new method to determine near-sea surface air temperature by using satellite data," Journal of Geophysical Research: Oceans, vol. 101, no. C6, pp. 14349–14360, 1996.

[9] U. Marcel, "Comparison of satellite-derived land surface temperature and air temperature from meteorological stations on the pan-arctic scale," Remote Sensing, vol. 5, no. 5, pp. 2348–2367, 2013.

[10] W. Wagner, V. Naeimi, K. Scipal, R. de Jeu, and J. Martínez-Fernandez, "Soil moisture from operational meteorological satellites," Hydrogeology Journal, vol. 15, no. 1, pp. 121–131, 2007.

[11] C. Oppenheimer, "Review article: volcanological applications of meteorological satellites," International Journal of Remote Sensing, vol. 19, no. 15, pp. 2829–2864, 1998.

[12] H. W. Yates, J. D. Tarpley, S. R. Schneider, D. F. McGinnis, and R. A. Scofield, "The role of meteorological satellites in agricultural remote sensing," Remote Sensing of Environment, vol. 14, no. 1–3, pp. 219–233, 1984.

[13] C. O. Justice, "The moderate resolution imaging spectroradiometer (MODIS): land remote sensing for global change research," IEEE Transactions on Geoscience & Remote Sensing, vol. 36, no. 4, pp. 1228–1249, 2002.

[14] D. A. Chu, "Remote sensing of smoke from MODIS airborne simulator during the SCAR-B experiment," Journal of Geophysical Research: Atmospheres, vol. 103, no. D24, Article ID 31979, 1998.

[15] L. Mohammadi, N. Molanian, and A. Heidari, "Determination of the best coverage area for receiver stations of LEO remote sensing satellites," in Proceedings of the 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008.

[16] K. Yumimoto, T. M. Nagao, M. Kikuchi et al., "Aerosol data assimilation using data from Himawari-8, a next-generation geostationary meteorological satellite," Geophysical Research Letters, vol. 43, no. 11, pp. 5886–5894, 2016.

[17] K. Bessho, K. Date, M. Hayashi et al., "An introduction to Himawari-8/9: Japan's new-generation geostationary meteorological satellites," Journal of the Meteorological Society of Japan. Ser. II, vol. 94, no. 2, pp. 151–183, 2016.

[18] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, "Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4," Bulletin of the American Meteorological Society, vol. 98, no. 8, pp. 1637–1658, 2017.

[19] W. Hui, F. Huang, and R. Liu, "Characteristics of lightning signals over the Tibetan Plateau and the capability of FY-4A LMI lightning detection in the Plateau," International Journal of Remote Sensing, vol. 41, no. 12, pp. 4605–4625, 2020.

[20] S. S. Chu, L. Zhu, H. F. Sun et al., "Automated volcanic hot-spot detection based on FY-4A/AGRI infrared data," International Journal of Remote Sensing, vol. 41, no. 1, pp. 2410–2438, 2020.

[21] X. Zhang and W. Jiao, "Estimation of land surface temperature using geostationary meteorological satellite data," Remote Sensing Technology & Application, vol. 28, no. 1, 2013.

[22] J. Xu, "Estimation of near-surface air temperature from HJ-1B satellite data in Northwest China," Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, vol. 29, no. 22, pp. 145–153, 2013.

[23] X. Zhu, Q. Zhang, C.-Y. Xu, P. Sun, and P. Hu, "Reconstruction of high spatial resolution surface air temperature data across China: a new geo-intelligent multisource data-based machine learning technique," Science of the Total Environment, vol. 665, pp. 300–313, 2019.

[24] J. Hrisko, P. Ramamurthy, Y. Yu, P. Yu, and D. Melecio-Vazquez, "Urban air temperature model using GOES-16 LST and a diurnal regressive neural network algorithm," Remote Sensing of Environment, vol. 237, Article ID 111495, 2020.

[25] Q. Zhang, Y. Yu, W. Zhang, T. Luo, and X. Wang, "Cloud detection from FY-4A's geostationary interferometric infrared sounder using machine learning approaches," Remote Sensing, vol. 11, no. 24, p. 3035, 2019.

[26] Y. Chen, G. Chen, C. Cui et al., "Retrieval of the vertical evolution of the cloud effective radius from the Chinese FY-4 (Feng Yun 4) next-generation geostationary satellites," Atmospheric Chemistry and Physics, vol. 20, no. 2, pp. 1131–1145, 2020.

[27] I. Kloog, F. Nordio, B. A. Coull, and J. Schwartz, "Predicting spatiotemporal mean air temperature using MODIS satellite surface temperature measurements across the Northeastern USA," Remote Sensing of Environment, vol. 150, pp. 132–139, 2014.

[28] N. Janatian, M. Sadeghi, S. Hossein Sanaeinejad et al., "A statistical framework for estimating air temperature using MODIS land surface temperature data," International Journal of Climatology, vol. 37, no. 3, 2016.

[29] R. Huang, J.-x. Huang, C. Zhang et al., "Soil temperature estimation at different depths using remotely-sensed data," Journal of Integrative Agriculture, vol. 19, no. 1, pp. 277–290, 2020.

[30] J. Y. Fang and K. Yoda, "Climate and vegetation in China (I). Changes in the altitudinal lapse rate of temperature and distribution of sea level temperature," Ecological Research, vol. 3, no. 1, pp. 37–51, 1988.

[31] C. Du, R. Huazhong, Q. Qin, S. Zhao, and J. Meng, "Research of split-window algorithm for retrieval of land surface temperature from Landsat 8 data," Journal of Geomatics, vol. 39, pp. 73–77, 2014.

[32] F. Chen, Y. Liu, Q. Liu, and F. Qin, "A statistical method based on remote sensing for the estimation of air temperature in China," International Journal of Climatology, vol. 35, 2014.

[33] P. S. G. de Mattos Neto, G. D. C. Cavalcanti, P. R. A. Firmino, E. G. Silva, and S. R. P. Vila Nova Filho, "A temporal-window framework for modelling and forecasting time series," Knowledge-Based Systems, vol. 193, Article ID 105476, 2020.

[34] R. Bycroft, J. X. Leon, and D. Schoeman, "Comparing random forests and convoluted neural networks for mapping ghost crab burrows using imagery from an unmanned aerial vehicle," Estuarine, Coastal and Shelf Science, vol. 224, pp. 84–93, 2019.

[35] Y. Lee, D. Han, M.-H. Ahn, J. Im, and S. J. Lee, "Retrieval of total precipitable water from Himawari-8 AHI data: a comparison of random forest, extreme gradient boosting, and deep neural network," Remote Sensing, vol. 11, no. 15, p. 1741, 2019.

[36] J. García-Gutierrez, F. Martínez-Alvarez, A. Troncoso, and J. C. Riquelme, "A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables," Neurocomputing, vol. 167, no. 1, pp. 24–31, 2015.

[37] D. Upreti, W. Huang, W. Kong et al., "A comparison of hybrid machine learning algorithms for the retrieval of wheat biophysical variables from Sentinel-2," Remote Sensing, vol. 11, no. 5, p. 481, 2019.

[38] R. Li, L. Cui, H. Fu, Y. Meng, J. Li, and J. Guo, "Estimating high-resolution PM1 concentration from Himawari-8 combining extreme gradient boosting-geographically and temporally weighted regression (XGBoost-GTWR)," Atmospheric Environment, vol. 229, Article ID 117434, 2020.

[39] G. Papacharalampous and H. Tyralis, "Hydrological time series forecasting using simple combinations: big data testing and investigations on one-year ahead river flow predictability," Journal of Hydrology, vol. 590, Article ID 125205, 2020.

[40] R. Perez-Chacon, G. Asencio-Cortes, F. Martínez-Alvarez, and A. Troncoso, "Big data time series forecasting based on pattern sequence similarity and its application to the electricity demand," Information Sciences, vol. 540, pp. 160–174, 2020.

[41] H. Abdollahi, "A novel hybrid model for forecasting crude oil price based on time series decomposition," Applied Energy, vol. 267, Article ID 115035, 2020.

[42] R. S. dos Santos, "Estimating spatio-temporal air temperature in London (UK) using machine learning and earth observation satellite data," International Journal of Applied Earth Observation and Geoinformation, vol. 88, Article ID 102066, 2020.

[43] H. J. Richardson, D. J. Hill, D. R. Denesiuk, and L. H. Fraser, "A comparison of geographic datasets and field measurements to model soil carbon using random forests and stepwise regressions (British Columbia, Canada)," GIScience & Remote Sensing, vol. 54, no. 4, pp. 573–591, 2017.

[44] H. Franco-Lopez, A. R. Ek, and M. E. Bauer, "Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method," Remote Sensing of Environment, vol. 77, no. 3, pp. 251–274, 2001.

[45] R. Haapanen, A. R. Ek, M. E. Bauer, and A. O. Finley, "Delineation of forest/nonforest land use classes using nearest neighbor methods," Remote Sensing of Environment, vol. 89, no. 3, pp. 265–271, 2004.

[46] M. Belgiu and L. Dragut, "Random forest in remote sensing: a review of applications and future directions," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, no. 114, pp. 24–31, 2016.

[47] M. J. Cracknell and A. M. Reading, "The upside of uncertainty: identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines," Geophysics, vol. 78, no. 3, 2013.

[48] X. Ye, X. Yang, X. Xiong, Y. Shen, M. Hao, and R. Gu, "A quality control method based on an improved random forest algorithm for surface air temperature observations," Advances in Meteorology, vol. 2017, pp. 1–15, Article ID 8601296, 2017.

[49] B. Babar, L. T. Luppino, T. Bostrom, and S. N. Anfinsen, "Random forest regression for improved mapping of solar irradiance at high latitudes," Solar Energy, vol. 198, pp. 81–92, 2020.

[50] L. V. Utkin, M. S. Kovalev, and F. P. A. Coolen, "Imprecise weighted extensions of random forests for classification and regression," Applied Soft Computing, vol. 92, Article ID 106324, 2020.

[51] L. Liu, M. Ji, and M. Buchroithner, "Combining partial least squares and the gradient-boosting method for soil property retrieval using visible near-infrared shortwave infrared spectra," Remote Sensing, vol. 9, no. 12, p. 1299, 2017.

[52] J. Son, I. Jung, K. Park, and B. Han, "Tracking-by-segmentation with online gradient boosting decision tree," in Proceedings of the International Conference on Computer Vision, Las Condes, Chile, December 2015.

[53] H. Mo, H. Sun, J. Liu, and S. Wei, "Developing window behavior models for residential buildings using XGBoost algorithm," Energy and Buildings, vol. 205, no. 15, Article ID 109564, 2019.

[54] M. H. D. M. Ribeiro and L. dos Santos Coelho, "Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series," Applied Soft Computing, vol. 86, Article ID 105837, 2020.

[55] I. B. Mustapha and F. Saeed, "Bioactive molecule prediction using extreme gradient boosting," Molecules, vol. 21, no. 8, p. 983, 2016.

[56] C. Li, "Power load forecasting based on the combined model of LSTM and XGBoost," in Proceedings of the International Conference on Pattern Recognition, Wenzhou, China, June 2019.

[57] Z.-Y. Chen, T.-H. Zhang, R. Zhang et al., "Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China," Atmospheric Environment, vol. 202, pp. 180–189, 2019.

[58] S. Zhao, D. Zeng, W. Wang et al., "Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals," Measurement, vol. 159, Article ID 107777, 2020.

[59] S. J. Lee, M.-H. Ahn, and Y. Lee, "Application of an artificial neural network for a direct estimation of atmospheric instability from a next-generation imager," Advances in Atmospheric Sciences, vol. 33, no. 2, pp. 221–232, 2016.

[60] J. Tang, C. Deng, G.-B. Huang, and B. Zhao, "Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1174–1185, 2015.

[61] G. T. Ribeiro, V. C. Mariani, and L. Coelho, "Enhanced ensemble structures using wavelet neural networks applied to short-term load forecasting," Engineering Applications of Artificial Intelligence, vol. 82, pp. 272–281, 2019.

[62] W. Wang, X. Sun, R. Zhang, Z. Li, Z. Zhu, and H. Su, "Multi-layer perceptron neural network based algorithm for estimating precipitable water vapour from MODIS NIR data," International Journal of Remote Sensing, vol. 27, no. 3, pp. 617–621, 2006.

[63] K. I. Chronopoulos, I. X. Tsiros, I. F. Dimopoulos, and N. Alvertos, "An application of artificial neural network models to estimate air temperature data in areas with sparse network of meteorological stations," Journal of Environmental Science and Health, Part A, vol. 43, no. 14, pp. 1752–1757, 2008.

[64] S. Rodrigues Moreno, R. Gomes da Silva, V. Cocco Mariani, and L. dos Santos Coelho, "Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network," Energy Conversion and Management, vol. 213, Article ID 112869, 2020.

[65] A. Bayat and S. Mashhadizadeh Maleki, "Comparison of precipitable water vapor derived from AIRS and SPM measurements and its correlation with surface temperature of 29 synoptic stations over Iran," Journal of Atmospheric and Solar-Terrestrial Physics, vol. 178, pp. 24–31, 2018.

[66] W. W. Gong, "Evaluation of surface meteorological elements from several numerical models in China," Climatic and Environmental Research, vol. 20, no. 1, pp. 53–62, 2015.

[67] D. Pozo Vazquez, F. J. Olmo Reyes, and L. Alados Arboledas, "A comparative study of algorithms for estimating land surface temperature from AVHRR data," Remote Sensing of Environment, vol. 62, no. 3, pp. 215–222, 1997.


one-third of the Earth's surface [16–20]. Therefore, GEO satellites comprise an effective method of obtaining high-spatial- and high-temporal-resolution Tair data in a fixed area and have the potential to facilitate the study of the daily change of Tair [20, 21].

At present, the methods for Tair estimation from satellite brightness temperatures (BTs) and land-surface temperature (LST) product data can be divided into simple linear, multivariate linear, and nonlinear approaches [21, 22]. Previous studies [7, 23, 24] have shown that machine-learning algorithms can obtain higher-accuracy Tair values than other methods. For example, a machine-learning model (e.g., a neural network (NN) model) has higher accuracy, and the root-mean-square error (RMSE) is reduced by 1.29°C compared with linear models [7].

The AGRI aboard Fengyun-4A (FY-4A) has 14 spectral bands [18, 20, 25, 26]: six visible/near-infrared (VIS/NIR), six infrared (IR), and two water vapor bands, with a temporal resolution of 15 min for the full disk and a spatial resolution of 4 km at IR bands. It provides an unprecedented opportunity for obtaining high-precision Tair data over China and surrounding areas.

Machine-learning methods have been used to estimate Tair based on moderate-resolution imaging spectroradiometer (MODIS) data in several studies [27–29]. However, there is currently a lack of relevant studies on Tair estimation based on FY-4A. The use of FY-4A data to estimate high-resolution Tair is of great significance to the study of human health and of high-temporal- and high-spatial-resolution Tair in East Asia. In addition, there is a need for timely and high-resolution Tair data for the sustainable planning and management of climate-resilient cities [3].

This study aims to develop machine-learning approaches for Tair estimation using FY-4A data and compares the performances of different machine-learning models [i.e., multivariate linear regression (MLR), gradient boosting decision tree (GBTD), k-nearest neighbors (KNN), random forest (RF), extreme gradient boosting (XGB), and deep neural network (DNN)] in Tair estimation, which, to the best of our knowledge, has never been done before. By comparing different machine-learning algorithms, a machine-learning algorithm with good applicability for estimating Tair is selected. The algorithm is widely applicable to meteorological satellites without surface-temperature products.

The remainder of this paper is organized as follows. In Section 2, the study area and data used for model development are introduced, and the construction of the above-listed six machine-learning models for Tair estimation is described. Variable importance analysis, validation results, and discussion are described in Section 3. Conclusions are presented in Section 4.

2. Materials and Methods

2.1. Study Area. The study area is located in China, and Figure 1 shows the spatial distribution of the 1812 meteorological stations used in this study. The altitude is higher in western China than in the east, and the Qinghai-Tibet Plateau has an average elevation of over 4000 m [30]. There are more stations in the eastern areas than in the western ones due to the uneven distribution of population and economic development in China (Figure 1).

2.2. Data. The data used in this study mainly include FY-4A AGRI brightness temperature (BT) and L2 cloud mask data, global forecast system (GFS) 3 h forecast data, meteorological data from 1812 stations in China, and other auxiliary data (longitude, latitude, and Julian day).

2.2.1. Satellite Data. FY-4A, the new generation of Chinese geostationary meteorological satellites, was launched on December 11, 2016. It was fixed at a position of 99.5°E above the equator. As thermal infrared split-window channels, bands 12 and 13 of AGRI (BT12 and BT13, respectively) are mainly used for studies of cloud, aerosol, and Tair estimation. Their central wavelengths are 10.8 and 12.0 μm [31].

BT12, BT13, and L2 cloud mask products during summer 2018 (i.e., June, July, and August) were used. The AGRI data were selected at 3 h intervals (i.e., 00, 03, 06, 09, 12, 15, 18, and 21 UTC) each day. The data were downloaded from the China National Satellite Meteorological Center (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx).

2.2.2. Meteorological Data. This study selected meteorological data at 3 h intervals from 1812 observation stations in China during summer 2018. The meteorological variables used in this study include Tair and the digital elevation model (DEM). Tair in summer 2018 ranged from −5°C to 40°C, and the DEM of the stations was between 0 and 5000 m. These data were obtained from the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

2.2.3. Numerical Weather Prediction Data and Auxiliary Data. Previous studies showed that the relationship between BTs (or LST) and Tair is easily affected by surface characteristics and atmospheric conditions [7, 31]. Therefore, the accuracy of Tair estimation can be effectively improved by adding several auxiliary parameters [32]. In this study, GFS 3 h precipitable water vapor (GFS PWV) and relative humidity (GFS RH) forecast field data were used. The forecast length of the GFS data (GFS PWV and GFS RH) used was 3 h, and there were eight periods of data per day (i.e., 00, 03, 06, 09, 12, 15, 18, and 21 UTC). The GFS data were interpolated according to the location and time information of the AGRI pixels. GFS data were obtained through the US National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (http://www.nco.ncep.noaa.gov/pmb/products/gfs). Table 1 presents the temporal and spatial resolution information of the data used in this study.
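The interpolation of gridded GFS fields to AGRI pixel locations can be illustrated with bilinear interpolation. The text does not state which interpolation scheme was used, so bilinear interpolation is an assumption here, and the function name, the 0.5° grid spacing, and the PWV values are all hypothetical.

```python
def bilinear(grid, lats, lons, lat, lon):
    """Bilinearly interpolate grid[i][j] (value at lats[i], lons[j]) to (lat, lon).

    lats and lons must be ascending, and the target must lie inside the grid.
    """
    # Index of the lower-left corner of the grid cell that contains the target.
    i = max(k for k in range(len(lats) - 1) if lats[k] <= lat)
    j = max(k for k in range(len(lons) - 1) if lons[k] <= lon)
    # Fractional position of the target inside that cell (0 to 1).
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    south = grid[i][j] * (1 - tx) + grid[i][j + 1] * tx
    north = grid[i + 1][j] * (1 - tx) + grid[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty


# Hypothetical 2 x 2 patch of a PWV field on a 0.5-degree grid.
lats = [30.0, 30.5]
lons = [100.0, 100.5]
pwv = [[20.0, 22.0],
       [24.0, 26.0]]

pwv_at_pixel = bilinear(pwv, lats, lons, 30.25, 100.25)
```

Because the GFS fields and the selected AGRI scenes share the same eight daily times, a spatial step of this kind may be all that is needed for samples at those times; any temporal interpolation would be an additional step not shown here.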

2.3. Methods

2.3.1. Preparation of Training Dataset. The BT12, BT13, GFS PWV, GFS RH, and auxiliary data were used as the input variables. Tair was used as the response variable of the machine-learning models (Table 1), and all data points (across space and time) were included in one model (i.e., the XGB model) [33]. The construction of representative training data was crucial to developing successful retrieval models using machine learning. Thus, data from June to August, except the 1st, 10th, 20th, and 30th of each month, were collected as the original dataset, and the original dataset was randomly divided into a training dataset (80%; 971,773 samples) and a test dataset (20%; 242,944 samples), with the same number of pieces of data for each bin (i.e., 10°C in temperature), as shown in Figure 2. For the validation, the data that were not used for training were selected from the 1st, 10th, 20th, and 30th of June to August.
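The partitioning described above can be sketched as follows. This is one possible reading rather than the authors' procedure: the held-out days come from the text, but applying the 80/20 split separately within each temperature bin is an assumption, and split_samples, bin_width, and the synthetic samples are hypothetical.

```python
import math
import random
from collections import defaultdict

HOLDOUT_DAYS = {1, 10, 20, 30}   # days of each month reserved for validation


def split_samples(samples, bin_width=10.0, train_fraction=0.8, seed=0):
    """Split samples into train, test, and validation lists.

    Each sample is a dict with 'day' (day of month) and 'tair' (deg C).
    """
    validation = [s for s in samples if s["day"] in HOLDOUT_DAYS]
    remaining = [s for s in samples if s["day"] not in HOLDOUT_DAYS]

    # Group the remaining samples into temperature bins.
    bins = defaultdict(list)
    for s in remaining:
        bins[math.floor(s["tair"] / bin_width)].append(s)

    # Shuffle each bin and cut it at the training fraction.
    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(bins):
        members = bins[key]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test, validation


# Hypothetical samples: five temperatures on each of days 1 to 30.
samples = [{"day": d, "tair": float(t)} for d in range(1, 31) for t in (4, 14, 24, 34, 38)]
train, test, validation = split_samples(samples)
```

Splitting inside each bin keeps the temperature distribution of the test set close to that of the training set, which is the apparent purpose of the binning shown in Figure 2.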

2.3.2. Machine-Learning Algorithm. Machine-learning methods have been widely used in classification and regression in the field of remote sensing [34–41]. In this study, six machine-learning approaches, that is, MLR, GBTD, KNN, RF, XGB, and DNN, were used for constructing Tair estimation models. The flowchart of Tair estimation based on machine-learning approaches is shown in Figure 3. L2 cloud mask products were used to detect cloud. If the data were cloudless, the FY-4A data were matched with both the GFS data and the meteorological station data (same space and time), and then Tair was estimated through the machine-learning models.

As a simple machine-learning algorithm, MLR has usually been the basic tool for the estimation of meteorological parameters [42, 43]. Similarly, as a local nonlinear algorithm, the prediction process of KNN is generally divided into two steps. First, when the KNN algorithm predicts a point, it searches for the k-nearest neighbors closest to the point in the training dataset. Second, the mean of the target variable of the k-nearest neighbors is computed [44, 45]. In this study, the hyperparameters of MLR and KNN were set to default values. Unlike MLR and KNN, RF is an ensemble of decision trees designed to improve prediction accuracy, in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [34, 43, 46–50]. The GridSearchCV utility of the Python Scikit-learn library was used to tune the hyperparameters, including the number of

Figure 1: Elevation of the study area and meteorological sites in China. Differently colored dots represent sites located at different elevations (legend classes: 0–999, 1000–1999, 2000–2999, 3000–3999, and 4000–5000 m).

Table 1: Temporal and spatial information of the primary data used in this study.

| Abbreviation | Units | Spatial resolution | Temporal resolution | Source |
|---|---|---|---|---|
| AGRI BTs | K | 4 km | 15 min | FY-4A AGRI |
| GFS PWV | mm | 0.5° | 3 h | GFS |
| GFS RH | % | 0.5° | 3 h | GFS |
| DEM | m | Site | - | FY-4A AGRI |
| Longitude | - | Site | - | CMDC/AGRI |
| Latitude | - | Site | - | CMDC/AGRI |
| Tair | °C | Site | 1 h | CMDC |
| Julian day | - | - | - | - |


Figure 2: Histograms of the data used for machine learning (number of data against temperature in °C): (a) all datasets, (b) selected training dataset, and (c) selected test dataset.

Figure 3: Flowchart of the Tair estimation model in this study. FY-4A AGRI data (BT12, BT13) are screened with the L2 clear-sky product; clear-sky data are combined with CMDC data (Tair, Julian day, longitude, latitude) and GFS data (GFS PWV, GFS RH) as inputs to the machine-learning models (MLR, RF, GBTD, KNN, XGB, DNN), which produce the estimated air temperature.


trees (n_estimators), the minimum number of samples (min_samples_leaf), and the maximum depth of a tree (max_depth). The selected parameters were n_estimators = 200, min_samples_leaf = 50, and max_depth = 3.
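The two-step KNN prediction described earlier in this subsection (find the k nearest training points, then average their target values) can be sketched in a few lines. This is an illustrative pure-Python version, not the Scikit-learn implementation the study relied on; the default k = 5 mirrors the Scikit-learn default for KNN regression, and the two-feature toy data are invented.

```python
import math


def knn_predict(train_x, train_y, query, k=5):
    """Predict by averaging the targets of the k training points nearest to query."""
    # Step 1: rank training points by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    # Step 2: average the target variable over the selected neighbors.
    return sum(train_y[i] for i in nearest) / len(nearest)


# Toy predictors (e.g., a brightness temperature in K and PWV in mm) and Tair in degC.
train_x = [(280.0, 10.0), (285.0, 20.0), (290.0, 30.0), (300.0, 50.0)]
train_y = [12.0, 16.0, 21.0, 30.0]
estimate = knn_predict(train_x, train_y, (286.0, 22.0), k=2)
```

Because the distance mixes predictors with different units, practical applications usually standardize the inputs first; whether the study did so is not stated.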

The principle of GBTD is to sequentially apply a classification algorithm to the weighted version of the training data [51, 52], descending along the gradient direction of the previously established model loss function, and then perform a weighted majority vote on the resulting classifier sequence. As an improved algorithm of GBTD, XGB uses all data in each iteration, which is similar to RF [53, 54]. Therefore, XGB reduces the complexity of the model and makes the learned model simpler [35, 54–58]. In this study, four hyperparameters in the GBTD and XGB models (i.e., n_estimators, max_depth, learning_rate (lr), and the minimum loss reduction required to make a further partition on a leaf node of the tree (gamma)) were empirically tuned based on RMSE. The optimum n_estimators, gamma, max_depth, and lr in the two models were 500, 0.2, 5, and 0.1, respectively.
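For a regression target such as Tair, gradient boosting with a squared-error loss reduces to a simple loop: each new tree is fitted to the residuals of the current ensemble (the negative gradient of the loss), and its output is added with a learning-rate weight. The sketch below illustrates that loop with one-split trees on a single feature. It is a teaching example only; it omits the deeper trees, gamma-based pruning, and regularization that the GBTD and XGB libraries provide, and all names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit the best single-threshold split on a 1-D feature (least squares)."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def gradient_boost(xs, ys, n_estimators=50, lr=0.1):
    """Boosting for squared loss: every stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)          # initial constant prediction
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_estimators):
        # Residuals are the negative gradient of 0.5 * (y - prediction)^2.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys, n_estimators=50, lr=0.1)
```

With lr = 0.1, each round removes about 10% of the remaining residual, which is why a small learning rate is normally paired with a large n_estimators, as in the tuned values reported above (lr = 0.1 with 500 trees).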

An artificial neural network (ANN) is a biologically inspired machine-learning method [59]. Here, DNN, a subset of ANN with multiple hidden layers, uses a fully connected structure, which has the ability to learn time and space relationships [60, 61]. It adjusts the connection strengths through back-propagation and minimizes the prediction error by iterating between neurons [62–64]. DNN configurations with one to five hidden layers and 5–200 neurons (in five intervals) were tested. In addition, some widely used optimizers (i.e., stochastic gradient descent, RMSProp, and Adam) were tested by comparing the calculated results. In this study, the hyperparameters of the DNN were set as follows: batch_size = 128, dropout_rate = 0.1, stop_steps = 20 (if the validation-set loss was not improved within 20 steps, training was terminated), and learning rate = 0.001. Adam was chosen as the optimizer, the number of hidden layers was three, and the number of hidden neurons was 256.
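The stop_steps rule above is a form of early stopping. A minimal sketch of the bookkeeping is given below, assuming the loss is checked once per training round and that "not improved" means "not lower than the best value seen so far"; the function name and the toy loss sequence are hypothetical.

```python
def early_stopping_round(val_losses, patience=20):
    """Return the index of the round at which training would stop."""
    best = float("inf")
    rounds_without_improvement = 0
    for round_idx, loss in enumerate(val_losses):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                return round_idx
    # The loss kept improving often enough; training ran to the end.
    return len(val_losses) - 1


losses = [1.00, 0.80, 0.70, 0.71, 0.72, 0.73, 0.69]
stop_at = early_stopping_round(losses, patience=3)
```

In this toy sequence the rule stops at round 5, one round before the loss would have improved again, which illustrates why the patience value (20 in the study) is itself a tuning choice.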

2.4. Error Analyses. Four statistical factors, namely, the determination coefficient (R²), RMSE, MSE, and mean bias (bias), were used to evaluate the accuracy of the Tair estimation model, as follows:

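$$R^{2} = \frac{\left[\sum \left(T_{ea} - \overline{T}_{ea}\right)\left(T_{oa} - \overline{T}_{oa}\right)\right]^{2}}{\sum \left(T_{ea} - \overline{T}_{ea}\right)^{2} \sum \left(T_{oa} - \overline{T}_{oa}\right)^{2}}, \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)^{2}}{N}}, \tag{2}$$

$$\mathrm{bias} = \frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)}{N}, \tag{3}$$

$$\mathrm{MSE} = \frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)^{2}}{N}, \tag{4}$$

where Tea is the estimated Tair, Toa is the observed Tair at the meteorological stations, an overbar denotes the sample mean, and N is the sample size.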
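The four statistics in (1)–(4) translate directly into code. The following sketch uses only the Python standard library; the function names and the four toy estimate/observation pairs are illustrative.

```python
import math


def mean(values):
    return sum(values) / len(values)


def r2(est, obs):
    """Squared Pearson correlation between estimates and observations, as in (1)."""
    me, mo = mean(est), mean(obs)
    cov = sum((e - me) * (o - mo) for e, o in zip(est, obs))
    var_e = sum((e - me) ** 2 for e in est)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov ** 2 / (var_e * var_o)


def mse(est, obs):
    """Mean squared error, as in (4)."""
    return mean([(e - o) ** 2 for e, o in zip(est, obs)])


def rmse(est, obs):
    """Root-mean-square error, as in (2)."""
    return math.sqrt(mse(est, obs))


def bias(est, obs):
    """Mean of estimated minus observed, as in (3); positive means overestimation."""
    return mean([e - o for e, o in zip(est, obs)])


est = [21.0, 25.0, 30.0, 18.0]  # estimated Tair (degC)
obs = [20.0, 26.0, 29.0, 17.0]  # observed Tair (degC)
```

Note that RMSE is the square root of MSE, so the two carry the same information, and that the sign convention in (3) makes a negative bias an underestimation of Tair.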

3. Results and Discussion

In this section, the results of variable importance are presented, and the performance of the six machine-learning models is verified. The spatial distribution characteristics of the Tair errors of each model are also analyzed.

3.1. Variable Importance Results. Correlation analysis was performed to analyze the linear relationship between Tair and BT12, BT13, GFS PWV, GFS RH, DEM, longitude (LONG), latitude (LAT), and Julian day (JD). Table 2 shows the correlation coefficient matrix of these variables.

As described in Figure 4(a), GFS PWV, DEM, BT12, and BT13 had a better correlation with Tair than the other variables, and the R values of the four variables were 0.635, −0.596, 0.459, and 0.413, respectively. This indicated that these variables played more important roles in the linear Tair estimation models. However, the Pearson correlation coefficient only describes the linear correlation between two variables; it cannot identify a nonlinear relationship between them. Therefore, the variable importance of the RF algorithm was also analyzed (Figure 4(b)). The RF algorithm modeled the nonlinear relationship well. The GFS PWV was identified as the most important variable for Tair estimation in the RF model, while the GFS RH and BT12 also played more important roles than the other predictors. Therefore, PWV and RH were used as inputs to effectively improve the accuracy of Tair estimation, which was consistent with the previous study [65].
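The linear ranking discussed above can be reproduced from the first row of Table 2 by sorting the predictors by the absolute value of their correlation with Tair. The short sketch below does only that; the variable names are illustrative, and the coefficients are copied from Table 2.

```python
# Pearson correlation of each predictor with Tair (first row of Table 2).
r_with_tair = {
    "BT12": 0.459,
    "BT13": 0.413,
    "GFS PWV": 0.635,
    "GFS RH": -0.182,
    "DEM": -0.596,
    "LONG": 0.303,
    "LAT": -0.288,
    "JD": -0.383,
}

# Rank by strength of the linear relationship, ignoring its sign.
ranked = sorted(r_with_tair, key=lambda name: abs(r_with_tair[name]), reverse=True)

# Share of Tair variance that a one-variable linear fit would explain.
explained = {name: round(r ** 2, 3) for name, r in r_with_tair.items()}
```

In this linear ranking GFS RH comes last, whereas the RF analysis identifies it as one of the more important predictors, which is the contrast between linear and nonlinear importance that the paragraph draws.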

3.2. Model Performance Results. To evaluate the overall performance of each model, a 10-fold cross-validation method was used. K-fold cross-validation was used for model configuration selection. For a particular value of K (here, K = 10), the datasets were randomly and equally distributed among K groups. One group was held out for testing, and the remaining K − 1 groups were used for training. Over a total of K validations, the model performance was calculated using a different test fold for each validation [35]. Finally, the average validation results were used to evaluate the overall performance of each model.
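The K-fold procedure described above can be sketched as an index generator plus an averaging loop. This is a generic illustration with hypothetical function names; the scoring callback stands in for training a model on the training indices and computing, for example, the RMSE on the held-out fold.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal random folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validate(n_samples, score_fold, k=10):
    """Average a user-supplied score over the k held-out folds."""
    scores = [score_fold(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores), scores


# Dummy scorer that just reports the held-out fold size.
avg_size, fold_sizes = cross_validate(100, lambda train, test: len(test), k=10)
```

Keeping the per-fold scores, and not only their average, is what makes the spread shown in the boxplots of Figure 5 available.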

Figure 5 illustrates the six models with different statistical parameters, including RMSE, bias, MSE, and R². The MLR model had the lowest performance of the six models. The variation range of RMSE, bias, MSE, and R² in the MLR model was quite wide; the RMSE alone ranged from 1.602°C to 4.487°C. In contrast, the DNN model used in this study had better overall performance and higher efficiency than the other five models. The DNN model showed the highest accuracy, with an average RMSE of 1.736°C. The range of RMSE in the DNN model was 0.852°C–2.584°C, showing good concentration and stability, as presented in Figure 5(a). In addition, among the remaining models, the overall performances of the XGB and GBTD models were equivalent and were better than those of the MLR, KNN, and RF models.

3.3. Validation Results. Model performance was used as an indicator to internally validate each model. However, the model accuracy must also be evaluated with a dataset that was not used for training or testing. To validate the developed MLR, RF, KNN, GBTD, XGB, and DNN models, the observed data not used for either training or testing were utilized (the validation dataset in Section 2.3.1). Figure 6 illustrates the quantitative validation results of the estimated Tair during the validation time (the 1st, 10th, 20th, and 30th of June–August 2018). Compared with the results on the test dataset, the overall accuracy of the six models on the validation dataset decreased. For example, for the DNN model, the RMSE of Tair using the test dataset was 1.736°C, while that of the validation results was 2.006°C. This difference may be caused by overfitting, because the best model was not selected based on the final validation results [35].

The biases of the MLR, RF, DNN, GBTD, and XGB models were within ±0.2°C, indicating no obvious overestimation or underestimation. In contrast, the KNN model showed a larger negative bias of −0.492°C. The KNN model may have had a larger negative bias because of its poor robustness. Robustness mainly depends on the dataset, and poor robustness makes a model difficult to apply directly to other cases, so the KNN model had a low bias on the test dataset and a high bias on the validation dataset.

The XGB model had excellent modeling performance, with R² of 0.902. The R² values of the GBTD and DNN models were 0.898 and 0.890, respectively, and the R² values of the remaining three models were less than 0.89. Moreover, compared with the other models, the XGB and GBTD models can repeatedly learn to generate a weighted average of the weak learners. Therefore, the XGB and GBTD models showed relatively better performance on the validation dataset at most sites. In general, the XGB model showed higher overall performance than the other five models on the validation dataset.

The Tair estimation models based on satellite and numerical forecast data are susceptible to factors such as altitude and surface roughness. To further evaluate the

Table 2: Pearson correlation matrix for variables considered in the Tair estimation model.

| | Tair | BT12 | BT13 | GFS PWV | GFS RH | DEM | LONG | LAT | JD |
|---|---|---|---|---|---|---|---|---|---|
| Tair | 1.000 | 0.459 | 0.413 | 0.635 | −0.182 | −0.596 | 0.303 | −0.288 | −0.383 |
| BT12 | 0.459 | 1.000 | 0.995 | 0.047 | −0.256 | −0.287 | 0.199 | −0.022 | −0.099 |
| BT13 | 0.413 | 0.995 | 1.000 | 0.010 | −0.249 | −0.268 | 0.190 | 0.002 | −0.077 |
| GFS PWV | 0.635 | 0.047 | 0.010 | 1.000 | 0.383 | −0.585 | 0.366 | −0.463 | −0.310 |
| GFS RH | −0.182 | −0.256 | −0.249 | 0.383 | 1.000 | −0.020 | 0.189 | −0.355 | −0.046 |
| DEM | −0.596 | −0.287 | −0.268 | −0.585 | −0.020 | 1.000 | −0.663 | 0.057 | 0.003 |
| LONG | 0.303 | 0.199 | 0.190 | 0.366 | 0.189 | −0.663 | 1.000 | 0.134 | −0.004 |
| LAT | −0.288 | −0.022 | 0.002 | −0.463 | −0.355 | 0.057 | 0.134 | 1.000 | −0.006 |
| JD | −0.383 | −0.099 | −0.077 | −0.310 | −0.046 | 0.003 | −0.004 | −0.006 | 1.000 |

Data represent the correlation coefficient between different variables.

Figure 4: (a) Relative variable importance identified by the Pearson correlation coefficient (BT13 0.413, BT12 0.459, RH −0.182, PWV 0.635, DEM −0.596, LAT −0.288, LON 0.303, JD −0.383) and (b) RF variable importance expressed as attribute usage (BT13 0.012, BT12 0.165, RH 0.165, PWV 0.444, DEM 0.080, LAT 0.048, LON 0.021, JD 0.065).


applicability of these models, the spatial distribution of each meteorological observation was evaluated (Figures 7–9).

It can be seen that the Tair estimation errors of all models showed obvious spatial distribution characteristics (Figure 7). Generally, for each model, the RMSE is relatively low in the eastern regions (e.g., Guangdong Province) and high in the northwestern regions (e.g., Xinjiang Province). For example, the RMSE of the XGB model in Guangdong Province was approximately 1.2°C–1.8°C, while that in Xinjiang Province was about 2.0°C–3.2°C. Because the northwestern regions have relatively wide Tair changes between day and night, high altitude, and few meteorological observations, the accuracy difference between northwestern and eastern China is obvious. Moreover, the RMSE of the KNN, DNN, GBTD, and XGB models was relatively low in the eastern and southern regions. However, the MLR, RF, KNN, and DNN models had a higher RMSE in northwestern China. In contrast, the GBTD and XGB models had a relatively smaller RMSE in northwestern China, because the GBTD and XGB models can generate repeated weighted averages to adjust their applicability to different regions through repeated learning of numerous data.

Furthermore, Gong's study (2015) [66] illustrated that the RMSE of GFS Tair in most eastern regions reaches 1.5°C–3.0°C and is above 3.5°C in the northwestern regions. By contrast, the results showed that the RMSE of Tair estimated by the DNN, XGB, and GBTD models was obviously lower than that of the GFS data. In the present study, the RMSE of the XGB model was 1.0°C–2.0°C in most eastern regions, and it was below 3.5°C in the northwestern regions. In addition, RMSE < 2.0°C accounted for 48.2% and RMSE < 2.5°C accounted for 87.6% in the XGB model.

The six models showed the same distribution trend, as shown in Figure 8, with R² being higher in the eastern regions and gradually becoming lower toward the southwestern regions. Compared with the central regions (e.g., Henan Province), the viewing zenith angle (VZA) of AGRI over western China is larger. The larger the VZA

Figure 5: Boxplots of performance evaluation for the six models (MLR, KNN, GBTD, RF, XGB, and DNN) using the test dataset in terms of (a) RMSE (°C), (b) bias (°C), (c) MSE, and (d) R².


Figure 6: Two-dimensional histograms of predicted Tair (estimated Tair, °C) against meteorological observed Tair (station Tair, °C) for the six machine-learning models: (a) MLR model (RMSE = 2.840°C, bias = 0.055°C, R² = 0.791); (b) RF model (RMSE = 2.233°C, bias = −0.185°C, R² = 0.878); (c) KNN model (RMSE = 2.120°C, bias = −0.492°C, R² = 0.884); (d) DNN model (RMSE = 2.006°C, bias = 0.185°C, R² = 0.890); (e) GBTD model (RMSE = 1.984°C, bias = −0.122°C, R² = 0.898); (f) XGB model (RMSE = 1.946°C, bias = −0.087°C, R² = 0.902).

Figure 7: Spatial distribution of RMSE (°C, color scale 0.0–4.0) for the six machine-learning models: (a) MLR-RMSE, (b) RF-RMSE, (c) KNN-RMSE, (d) DNN-RMSE, (e) GBTD-RMSE, and (f) XGB-RMSE.


is, the more the radiation reaching the sensor is affected by the atmosphere, which may cause differences in R² of the estimated Tair between the southwestern and central regions.

For the MLR model, the bias over all of China was large. For the RF and KNN models, relatively high negative bias existed in southwestern China (e.g., the Yunnan-Guizhou Plateau), as shown in Figure 9. This may be due to the relatively simple structure of the three models mentioned above, which cannot simulate the complex Tair changes in China well, resulting in underfitting. Besides, Tair was overestimated by the DNN model in northwestern China, which was the reason that the RMSE of the DNN model was also high in these regions. In contrast, the GBTD and XGB models had relatively low bias in northwestern China, where the absolute bias ranges from 2.0°C to 3.0°C. In conclusion, the bias is lower in the coastal areas and higher in the northwestern areas, which is mainly related to the characteristics of summer Tair change.

Figure 10 shows the time series of RMSE for the six models during the validation period. The RMSE of the MLR model was significantly higher than that of the other models, ranging from 2.5°C to 4.3°C. In contrast, the GBTD and XGB models showed a relatively lower RMSE (i.e., 1.8°C–2.2°C) than the RF, KNN, and DNN models.

Based on the above analysis, it is expected that the XGB model can provide a more reliable and accurate Tair estimation than the other models. To evaluate the contribution of the predictive factors in the XGB model to Tair estimation, BT data (BT12 and BT13) and GFS data (GFS PWV and RH) were successively introduced (Table 3). As shown in Table 3, when only DEM, longitude, latitude, and Julian day were used as input variables, the RMSE of the XGB model was 3.003°C. The accuracy of Tair estimation was obviously improved when BT12 and BT13 were included in the model. Moreover, when GFS PWV and RH were added to the input variables instead, the RMSE of the XGB model decreased to 2.164°C, indicating important influences of GFS PWV and RH on the Tair estimation. These results are understandable because PWV and RH are the main parameters needed for atmospheric correction and LST retrieval. When both AGRI BTs and GFS data were introduced to the input variables, the RMSE of the XGB model improved by a further 0.218°C (from 2.164°C to 1.946°C) compared with introducing the GFS data alone. This indicates that both GFS data and satellite observation data have an important role in improving the Tair estimation model. The RMSE of the Tair estimation model was less than 2.0°C when both satellite BTs and GFS data were introduced, which is considered to be the precision level of "accurate" [67].
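The RMSE reductions discussed above follow directly from the four rows of Table 3. The short calculation below makes the arithmetic explicit; the dictionary keys are shorthand labels introduced here, and the RMSE values are copied from Table 3.

```python
# XGB validation RMSE (degC) for each predictor set in Table 3.
# "base" = DEM, longitude, latitude, and Julian day.
rmse = {
    "base": 3.003,
    "base+BT": 2.376,
    "base+GFS": 2.164,
    "base+BT+GFS": 1.946,
}

# Gain from adding one data source to the base predictors.
gain_bt = round(rmse["base"] - rmse["base+BT"], 3)
gain_gfs = round(rmse["base"] - rmse["base+GFS"], 3)

# Additional gain from the second data source once the first is present.
gain_bt_given_gfs = round(rmse["base+GFS"] - rmse["base+BT+GFS"], 3)
gain_gfs_given_bt = round(rmse["base+BT"] - rmse["base+BT+GFS"], 3)
```

Computed this way, the GFS fields reduce the RMSE more than the brightness temperatures do, both on their own (0.839°C against 0.627°C) and as the second source added (0.430°C against 0.218°C).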

The relationship of XGB model errors with altitude, observed Tair, and VZA was analyzed. Figure 11 shows scatter plots of the estimated Tair error against DEM, Tair, and VZA. It can be seen that the Tair error mainly ranges from −3°C to 3°C. The results showed positive deviation in high-altitude areas, which produced a larger RMSE than in low-altitude areas. The model showed a positive deviation when Tair was low, while exhibiting a

Figure 8: Spatial distribution of R² (color scale 0.0–1.0) for the six machine-learning models: (a) MLR-R², (b) RF-R², (c) KNN-R², (d) DNN-R², (e) GBTD-R², and (f) XGB-R².


Figure 9: Spatial distribution of bias (°C, color scale −3.0 to 3.0) for the six machine-learning models: (a) MLR-Bias, (b) RF-Bias, (c) KNN-Bias, (d) DNN-Bias, (e) GBTD-Bias, and (f) XGB-Bias.

Figure 10: Time series (the 1st, 10th, 20th, and 30th of June to August 2018) of the RMSE (°C) between estimated Tair and observed Tair for the six machine-learning models (MLR, RF, KNN, DNN, GBTD, and XGB).

Table 3: The contribution of AGRI BTs and GFS data to the XGB Tair estimation model.

| Predictive factors | XGB model RMSE (°C) |
|---|---|
| DEM, longitude, latitude, Julian day | 3.003 |
| DEM, longitude, latitude, Julian day, BT12, BT13 | 2.376 |
| DEM, longitude, latitude, Julian day, GFS PWV, GFS RH | 2.164 |
| DEM, longitude, latitude, Julian day, BT12, BT13, GFS PWV, GFS RH | 1.946 |


negative bias for the high-air-temperature condition. Therefore, the model showed a larger RMSE in the lower- and higher-air-temperature conditions due to overestimation and underestimation, respectively. This is similar to the results of previous studies [38]. Furthermore, the uneven distribution of stations makes the applicability of the model in high-altitude areas poor. It is worth mentioning that the effect of VZA on model performance is negligible, as shown in Figure 11(c).

4. Conclusions

In this study, six machine-learning approaches (MLR, RF, KNN, DNN, GBTD, and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed in terms of the spatial and temporal characteristics of their performance. The validation results highlighted the high potential of Tair estimation approaches using machine learning and showed that the accuracy of the XGB model was better than that of the MLR, RF, KNN, GBTD, and DNN models at most sites for Tair estimation over China. The validation was performed using spatially and temporally independent data, and hence the model performance was considered to be quite reliable.

This study improves on previous studies in the following key areas. First, Tair estimation models were constructed based on FY-4A AGRI data and other auxiliary data. The results showed that high-temporal- and high-spatial-resolution Tair values (RMSE < 2.0°C) can be obtained based on FY-4A data. According to the study of Vazquez [67], the level of precision generally accepted as "accurate" for remote-sensing-based Tair estimation is between 1°C and 2°C. Second, the accuracy and performance of the six machine-learning models (MLR, RF, KNN, XGB, GBTD, and DNN) were compared and analyzed. The results showed that the XGB model can provide more stable and high-precision Tair estimation, which provides a reference for Tair estimation based on machine-learning models. Finally, the accuracy of Tair estimation based on satellite data can be effectively improved by adding numerical weather prediction data (GFS PWV and RH). The experimental results showed that when only satellite data were used for large-scale Tair estimation in China, the RMSE of the XGB model was 2.376°C, whereas the RMSE using satellite data combined with the GFS data reached 1.946°C.

However, aside from the novelties of this study, one limitation of the dataset used is the restriction to clear-sky conditions. Similarly, machine-learning algorithms cannot infer beyond the range of observed Tair values. If Tair increases beyond the range observed within the current training period, the model must be retrained. Moreover, future research may explore whether adding other predictors, such as distance to coast and vegetation information (normalized difference vegetation index, etc.), can improve the accuracy of the Tair estimation models.

Data Availability

FY-4A AGRI data were downloaded from the China National Satellite Meteorological Center (NSMC) (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx). The GFS data were obtained from the National Centers for Environmental Prediction (https://www.nco.ncep.noaa.gov/pmb/products/gfs). The meteorological station data were accessed at the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank the National Satellite Meteorological Center (NSMC) for providing FY-4A data, the China Meteorological Data Service Center (CMDC) for the meteorological data, and the National Centers for Environmental Prediction (NCEP) for the GFS data.

References

[1] M. P. Cresswell, A. P. Morse, M. C. Thomson, and S. J. Connor, “Estimating surface air temperatures from Meteosat land surface temperatures using an empirical solar zenith angle model,” International Journal of Remote Sensing, vol. 20, no. 6, pp. 1125–1132, 1999.

Figure 11: Scatter plots of Tair error (°C) against (a) DEM (m), (b) observed Tair (°C), and (c) VZA (°) for the XGB model.


[2] L. Prihodko and S. N. Goward, “Estimation of air temperature from remotely sensed surface observations,” Remote Sensing of Environment, vol. 60, no. 3, pp. 335–346, 1997.

[3] Z. S. Venter, O. Brousse, I. Esau, and F. Meier, “Hyperlocal mapping of urban air temperature using remote sensing and crowdsourced weather data,” Remote Sensing of Environment, vol. 242, no. 1, Article ID 111791, 2020.

[4] H. C. Ho, A. Knudby, Y. Xu, M. Hodul, and M. Aminipouri, “A comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (Humidex), for the greater Vancouver area,” Science of the Total Environment, vol. 544, pp. 929–938, 2016.

[5] T. R. Lookingbill and D. L. Urban, “Spatial estimation of air temperature differences for landscape-scale studies in montane environments,” Agricultural & Forest Meteorology, vol. 114, no. 3-4, pp. 141–151, 2003.

[6] J. W. Hurrell and K. E. Trenberth, “Satellite versus surface estimates of air temperature since 1979,” Journal of Climate, vol. 9, no. 9, pp. 2222–2232, 1996.

[7] H. Liu, Q. Zhou, S. Zhang, and X. Deng, “Estimation of summer air temperature over China using Himawari-8 AHI and numerical weather prediction data,” Advances in Meteorology, vol. 2019, pp. 1–10, Article ID 2385310, 2019.

[8] M. Konda, N. Imasato, and A. Shibata, “A new method to determine near-sea surface air temperature by using satellite data,” Journal of Geophysical Research: Oceans, vol. 101, no. C6, pp. 14349–14360, 1996.

[9] U. Marcel, “Comparison of satellite-derived land surface temperature and air temperature from meteorological stations on the pan-arctic scale,” Remote Sensing, vol. 5, no. 5, pp. 2348–2367, 2013.

[10] W. Wagner, V. Naeimi, K. Scipal, R. de Jeu, and J. Martínez-Fernández, “Soil moisture from operational meteorological satellites,” Hydrogeology Journal, vol. 15, no. 1, pp. 121–131, 2007.

[11] C. Oppenheimer, “Review article: volcanological applications of meteorological satellites,” International Journal of Remote Sensing, vol. 19, no. 15, pp. 2829–2864, 1998.

[12] H. W. Yates, J. D. Tarpley, S. R. Schneider, D. F. McGinnis, and R. A. Scofield, “The role of meteorological satellites in agricultural remote sensing,” Remote Sensing of Environment, vol. 14, no. 1–3, pp. 219–233, 1984.

[13] C. O. Justice, “The moderate resolution imaging spectroradiometer (MODIS): land remote sensing for global change research,” IEEE Transactions on Geoscience & Remote Sensing, vol. 36, no. 4, pp. 1228–1249, 2002.

[14] D. A. Chu, “Remote sensing of smoke from MODIS airborne simulator during the SCAR-B experiment,” Journal of Geophysical Research: Atmospheres, vol. 103, no. D24, Article ID 31979, 1998.

[15] L. Mohammadi, N. Molanian, and A. Heidari, “Determination of the best coverage area for receiver stations of LEO remote sensing satellites,” in Proceedings of the 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008.

[16] K. Yumimoto, T. M. Nagao, M. Kikuchi et al., “Aerosol data assimilation using data from Himawari-8, a next-generation geostationary meteorological satellite,” Geophysical Research Letters, vol. 43, no. 11, pp. 5886–5894, 2016.

[17] K. Bessho, K. Date, M. Hayashi et al., “An introduction to Himawari-8/9 - Japan’s new-generation geostationary meteorological satellites,” Journal of the Meteorological Society of Japan. Ser. II, vol. 94, no. 2, pp. 151–183, 2016.

[18] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, “Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4,” Bulletin of the American Meteorological Society, vol. 98, no. 8, pp. 1637–1658, 2017.

[19] W. Hui, F. Huang, and R. Liu, “Characteristics of lightning signals over the Tibetan Plateau and the capability of FY-4A LMI lightning detection in the Plateau,” International Journal of Remote Sensing, vol. 41, no. 12, pp. 4605–4625, 2020.

[20] S. S. Chu, L. Zhu, H. F. Sun et al., “Automated volcanic hot-spot detection based on FY-4A/AGRI infrared data,” International Journal of Remote Sensing, vol. 41, no. 1, pp. 2410–2438, 2020.

[21] X Zhang and W Jiao ldquoEstimation of land surface temper-ature using geostationary meteorological satellite datardquo Re-mote Sensing Technology amp Application vol 28 no 1 2013

[22] J Xu ldquoEstimation of near-surface air temperature fromHJ-1Bsatellite data in Northwest Chinardquo Nongye Gongcheng Xue-baotransactions of the Chinese Society of Agricultural Engi-neering vol 29 no 22 pp 145ndash153 2013

[23] X. Zhu, Q. Zhang, C.-Y. Xu, P. Sun, and P. Hu, “Reconstruction of high spatial resolution surface air temperature data across China: a new geo-intelligent multisource data-based machine learning technique,” Science of the Total Environment, vol. 665, pp. 300–313, 2019.

[24] J. Hrisko, P. Ramamurthy, Y. Yu, P. Yu, and D. Melecio-Vazquez, “Urban air temperature model using GOES-16 LST and a diurnal regressive neural network algorithm,” Remote Sensing of Environment, vol. 237, Article ID 111495, 2020.

[25] Q. Zhang, Y. Yu, W. Zhang, T. Luo, and X. Wang, “Cloud detection from FY-4A’s geostationary interferometric infrared sounder using machine learning approaches,” Remote Sensing, vol. 11, no. 24, p. 3035, 2019.

[26] Y. Chen, G. Chen, C. Cui et al., “Retrieval of the vertical evolution of the cloud effective radius from the Chinese FY-4 (Feng Yun 4) next-generation geostationary satellites,” Atmospheric Chemistry and Physics, vol. 20, no. 2, pp. 1131–1145, 2020.

[27] I. Kloog, F. Nordio, B. A. Coull, and J. Schwartz, “Predicting spatiotemporal mean air temperature using MODIS satellite surface temperature measurements across the Northeastern USA,” Remote Sensing of Environment, vol. 150, pp. 132–139, 2014.

[28] N. Janatian, M. Sadeghi, S. Hossein Sanaeinejad et al., “A statistical framework for estimating air temperature using MODIS land surface temperature data,” International Journal of Climatology, vol. 37, no. 3, 2016.

[29] R. Huang, J.-x. Huang, C. Zhang et al., “Soil temperature estimation at different depths using remotely-sensed data,” Journal of Integrative Agriculture, vol. 19, no. 1, pp. 277–290, 2020.

[30] J. Y. Fang and K. Yoda, “Climate and vegetation in China (I). Changes in the altitudinal lapse rate of temperature and distribution of sea level temperature,” Ecological Research, vol. 3, no. 1, pp. 37–51, 1988.

[31] C. Du, R. Huazhong, Q. Qin, S. Zhao, and J. Meng, “Research of split-window algorithm for retrieval of land surface temperature from Landsat 8 data,” Journal of Geomatics, vol. 39, pp. 73–77, 2014.

[32] F. Chen, Y. Liu, Q. Liu, and F. Qin, “A statistical method based on remote sensing for the estimation of air temperature in China,” International Journal of Climatology, vol. 35, 2014.

[33] P. S. G. de Mattos Neto, G. D. C. Cavalcanti, P. R. A. Firmino, E. G. Silva, and S. R. P. Vila Nova Filho, “A temporal-window framework for modelling and forecasting time series,” Knowledge-Based Systems, vol. 193, Article ID 105476, 2020.

[34] R. Bycroft, J. X. Leon, and D. Schoeman, “Comparing random forests and convoluted neural networks for mapping ghost crab burrows using imagery from an unmanned aerial vehicle,” Estuarine, Coastal and Shelf Science, vol. 224, pp. 84–93, 2019.

[35] Y. Lee, D. Han, M.-H. Ahn, J. Im, and S. J. Lee, “Retrieval of total precipitable water from Himawari-8 AHI data: a comparison of random forest, extreme gradient boosting, and deep neural network,” Remote Sensing, vol. 11, no. 15, p. 1741, 2019.

[36] J. García-Gutierrez, F. Martínez-Alvarez, A. Troncoso, and J. C. Riquelme, “A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables,” Neurocomputing, vol. 167, no. 1, pp. 24–31, 2015.

[37] D. Upreti, W. Huang, W. Kong et al., “A comparison of hybrid machine learning algorithms for the retrieval of wheat biophysical variables from Sentinel-2,” Remote Sensing, vol. 11, no. 5, p. 481, 2019.

[38] R. Li, L. Cui, H. Fu, Y. Meng, J. Li, and J. Guo, “Estimating high-resolution PM1 concentration from Himawari-8 combining extreme gradient boosting-geographically and temporally weighted regression (XGBoost-GTWR),” Atmospheric Environment, vol. 229, Article ID 117434, 2020.

[39] G. Papacharalampous and H. Tyralis, “Hydrological time series forecasting using simple combinations: big data testing and investigations on one-year ahead river flow predictability,” Journal of Hydrology, vol. 590, Article ID 125205, 2020.

[40] R. Perez-Chacon, G. Asencio-Cortes, F. Martínez-Alvarez, and A. Troncoso, “Big data time series forecasting based on pattern sequence similarity and its application to the electricity demand,” Information Sciences, vol. 540, pp. 160–174, 2020.

[41] H. Abdollahi, “A novel hybrid model for forecasting crude oil price based on time series decomposition,” Applied Energy, vol. 267, Article ID 115035, 2020.

[42] R. S. dos Santos, “Estimating spatio-temporal air temperature in London (UK) using machine learning and earth observation satellite data,” International Journal of Applied Earth Observation and Geoinformation, vol. 88, Article ID 102066, 2020.

[43] H. J. Richardson, D. J. Hill, D. R. Denesiuk, and L. H. Fraser, “A comparison of geographic datasets and field measurements to model soil carbon using random forests and stepwise regressions (British Columbia, Canada),” GIScience & Remote Sensing, vol. 54, no. 4, pp. 573–591, 2017.

[44] H. Franco-Lopez, A. R. Ek, and M. E. Bauer, “Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method,” Remote Sensing of Environment, vol. 77, no. 3, pp. 251–274, 2001.

[45] R. Haapanen, A. R. Ek, M. E. Bauer, and A. O. Finley, “Delineation of forest/nonforest land use classes using nearest neighbor methods,” Remote Sensing of Environment, vol. 89, no. 3, pp. 265–271, 2004.

[46] M. Belgiu and L. Dragut, “Random forest in remote sensing: a review of applications and future directions,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, no. 114, pp. 24–31, 2016.

[47] M. J. Cracknell and A. M. Reading, “The upside of uncertainty: identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines,” Geophysics, vol. 78, no. 3, 2013.

[48] X. Ye, X. Yang, X. Xiong, Y. Shen, M. Hao, and R. Gu, “A quality control method based on an improved random forest algorithm for surface air temperature observations,” Advances in Meteorology, vol. 2017, pp. 1–15, Article ID 8601296, 2017.

[49] B. Babar, L. T. Luppino, T. Bostrom, and S. N. Anfinsen, “Random forest regression for improved mapping of solar irradiance at high latitudes,” Solar Energy, vol. 198, pp. 81–92, 2020.

[50] L. V. Utkin, M. S. Kovalev, and F. P. A. Coolen, “Imprecise weighted extensions of random forests for classification and regression,” Applied Soft Computing, vol. 92, Article ID 106324, 2020.

[51] L. Liu, M. Ji, and M. Buchroithner, “Combining partial least squares and the gradient-boosting method for soil property retrieval using visible near-infrared shortwave infrared spectra,” Remote Sensing, vol. 9, no. 12, p. 1299, 2017.

[52] J. Son, I. Jung, K. Park, and B. Han, “Tracking-by-segmentation with online gradient boosting decision tree,” in Proceedings of the International Conference on Computer Vision, Las Condes, Chile, December 2015.

[53] H. Mo, H. Sun, J. Liu, and S. Wei, “Developing window behavior models for residential buildings using XGBoost algorithm,” Energy and Buildings, vol. 205, no. 15, Article ID 109564, 2019.

[54] M. H. D. M. Ribeiro and L. dos Santos Coelho, “Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series,” Applied Soft Computing, vol. 86, Article ID 105837, 2020.

[55] I. B. Mustapha and F. Saeed, “Bioactive molecule prediction using extreme gradient boosting,” Molecules, vol. 21, no. 8, p. 983, 2016.

[56] C. Li, “Power load forecasting based on the combined model of LSTM and XGBoost,” in Proceedings of the International Conference on Pattern Recognition, Wenzhou, China, June 2019.

[57] Z.-Y. Chen, T.-H. Zhang, R. Zhang et al., “Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China,” Atmospheric Environment, vol. 202, pp. 180–189, 2019.

[58] S. Zhao, D. Zeng, W. Wang et al., “Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals,” Measurement, vol. 159, Article ID 107777, 2020.

[59] S. J. Lee, M.-H. Ahn, and Y. Lee, “Application of an artificial neural network for a direct estimation of atmospheric instability from a next-generation imager,” Advances in Atmospheric Sciences, vol. 33, no. 2, pp. 221–232, 2016.

[60] J. Tang, C. Deng, G.-B. Huang, and B. Zhao, “Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1174–1185, 2015.

[61] G. T. Ribeiro, V. C. Mariani, and L. Coelho, “Enhanced ensemble structures using wavelet neural networks applied to short-term load forecasting,” Engineering Applications of Artificial Intelligence, vol. 82, pp. 272–281, 2019.

[62] W. Wang, X. Sun, R. Zhang, Z. Li, Z. Zhu, and H. Su, “Multi-layer perceptron neural network based algorithm for estimating precipitable water vapour from MODIS NIR data,” International Journal of Remote Sensing, vol. 27, no. 3, pp. 617–621, 2006.

[63] K. I. Chronopoulos, I. X. Tsiros, I. F. Dimopoulos, and N. Alvertos, “An application of artificial neural network models to estimate air temperature data in areas with sparse network of meteorological stations,” Journal of Environmental Science and Health, Part A, vol. 43, no. 14, pp. 1752–1757, 2008.

[64] S. Rodrigues Moreno, R. Gomes da Silva, V. Cocco Mariani, and L. dos Santos Coelho, “Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network,” Energy Conversion and Management, vol. 213, Article ID 112869, 2020.

[65] A. Bayat and S. Mashhadizadeh Maleki, “Comparison of precipitable water vapor derived from AIRS and SPM measurements and its correlation with surface temperature of 29 synoptic stations over Iran,” Journal of Atmospheric and Solar-Terrestrial Physics, vol. 178, pp. 24–31, 2018.

[66] W. W. Gong, “Evaluation of surface meteorological elements from several numerical models in China,” Climatic and Environmental Research, vol. 20, no. 1, pp. 53–62, 2015.

[67] D. Pozo Vazquez, F. J. Olmo Reyes, and L. Alados Arboledas, “A comparative study of algorithms for estimating land surface temperature from AVHRR data,” Remote Sensing of Environment, vol. 62, no. 3, pp. 215–222, 1997.


machine-learning models (Table 1), and all data points (across space and time) were included in one model (i.e., the XGB model) [33]. The construction of representative training data was crucial for developing successful retrieval models using machine learning. Thus, data from June to August (except the 1st, 10th, 20th, and 30th of each month) were collected as the original dataset, which was randomly divided into a training dataset (80%, 971,773 samples) and a test dataset (20%, 242,944 samples), with the same number of data points for each bin (i.e., 1.0°C in temperature), as shown in Figure 2. For validation, the data not used for training were taken from the 1st, 10th, 20th, and 30th of June, July, and August.
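The binned random split described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the helper name `split_by_temperature_bin`, the fixed random seed, and the synthetic temperatures are hypothetical, and the split is applied within each 1.0°C bin so that both subsets follow the same temperature distribution.

```python
import math
import random
from collections import defaultdict


def split_by_temperature_bin(temps, train_frac=0.8, bin_width=1.0, seed=0):
    """Split sample indices into train/test sets within each temperature bin.

    Hypothetical helper: every sample is assigned to a bin of width
    `bin_width` (deg C), and `train_frac` of each bin goes to training.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for idx, t in enumerate(temps):
        bins[math.floor(t / bin_width)].append(idx)

    train, test = [], []
    for key in sorted(bins):
        members = bins[key]
        rng.shuffle(members)
        n_train = round(train_frac * len(members))
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test


# Synthetic temperatures standing in for station observations (deg C).
demo_rng = random.Random(42)
temps = [demo_rng.uniform(-5.0, 45.0) for _ in range(10_000)]
train_idx, test_idx = split_by_temperature_bin(temps)
print(len(train_idx), len(test_idx))
```

Splitting per bin rather than over the pooled data keeps rare temperature ranges represented in both subsets, which is one plausible reading of the histograms in Figure 2.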

2.3.2. Machine-Learning Algorithm. Machine-learning methods have been widely used for classification and regression in the field of remote sensing [34–41]. In this study, six machine-learning approaches, that is, MLR, GBTD, KNN, RF, XGB, and DNN, were used to construct Tair estimation models. The flowchart of Tair estimation based on the machine-learning approaches is shown in Figure 3. L2 cloud mask products were used to detect cloud. If the data were cloud-free, the FY-4A data were matched with both the GFS data and the meteorological station data (same location and time), and Tair was then estimated using the machine-learning models.

Figure 1: Elevation of the study area and meteorological sites in China. Differently colored dots represent sites located at different elevations (legend classes: 0–999, 1000–1999, 2000–2999, 3000–3999, and 4000–5000).

Table 1: Temporal and spatial information of the primary data used in this study.

| Abbreviation | Units | Spatial resolution | Temporal resolution | Source |
| AGRI BTs | K | 4 km | 15 min | FY-4A AGRI |
| GFS PWV | mm | 0.5° | 3 h | GFS |
| GFS RH | % | 0.5° | 3 h | GFS |
| DEM | m | Site | - | FY-4A AGRI |
| Longitude | - | Site | - | CMDC/AGRI |
| Latitude | - | Site | - | CMDC/AGRI |
| Tair | °C | Site | 1 h | CMDC |
| Julian day | - | - | - | - |

Figure 2: Histograms of training data for machine learning: (a) all datasets, (b) selected training dataset, and (c) selected test dataset. Horizontal axis: temperature (°C); vertical axis: number of data.

Figure 3: Flowchart of the Tair estimation model in this study. FY-4A AGRI data (BT12, BT13) are screened with the L2 clear-sky product; cloudy data end the process, while clear-sky data are combined with GFS data (GFS PWV, GFS RH) and CMDC data (Tair, Julian day, longitude, latitude) as input to the machine-learning models (MLR, RF, GBTD, KNN, XGB, DNN), which output the estimated air temperature.

As a simple machine-learning algorithm, MLR has usually been the basic tool for the estimation of meteorological parameters [42, 43]. KNN is a local nonlinear algorithm whose prediction process is generally divided into two steps. First, when the KNN algorithm predicts a point, it searches the training dataset for the k nearest neighbors of that point. Second, the mean of the target variable of the k nearest neighbors is computed [44, 45]. In this study, the hyperparameters of MLR and KNN were set to default values. Unlike MLR and KNN, RF is an ensemble, decision-tree-based approach for improving prediction accuracy, in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [34, 43, 46–50]. The GridSearchCV tool from the Python Scikit-learn library was used to tune the hyperparameters, including the number of trees (n_estimators), the minimum number of samples per leaf (min_samples_leaf), and the maximum depth of a tree (max_depth). The selected parameters were n_estimators = 200, min_samples_leaf = 50, and max_depth = 3.
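The two-step KNN prediction described in this paragraph (find the k nearest training points, then average their targets) can be written in a few lines. The sketch below is a plain-Python illustration rather than the implementation used in the study; the function name `knn_predict`, the Euclidean distance, and the toy data are assumptions.

```python
import math


def knn_predict(train_X, train_y, query, k=5):
    """Predict a target as the mean over the k nearest training points."""
    # Step 1: find the k training samples closest to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    # Step 2: average the target variable of those neighbors.
    return sum(train_y[i] for i in nearest) / len(nearest)


# Toy predictors (two features) and targets; values are illustrative only.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [10.0, 12.0, 14.0, 30.0, 32.0]
print(knn_predict(train_X, train_y, query=(0.2, 0.2), k=3))
```

Because the prediction is a local average of observed targets, KNN cannot produce values outside the range seen in training, which is worth keeping in mind when a model is applied to days or regions that differ from the training sample.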

The principle of GBTD is to sequentially apply a classification algorithm to weighted versions of the training data [51, 52], descending along the gradient direction of the previously established model loss function, and then to perform a weighted majority vote on the resulting classifier sequence. As an improved algorithm of GBTD, XGB uses all data in each iteration, which is similar to RF [53, 54]. Therefore, XGB reduces the complexity of the model and makes the learned model simpler [35, 54–58]. In this study, four hyperparameters in the GBTD and XGB models, that is, n_estimators, max_depth, learning_rate (lr), and the minimum loss reduction required to make a further partition on a leaf node of the tree (gamma), were empirically tuned based on RMSE. The optimum n_estimators, gamma, max_depth, and lr in the two models were 500, 0.2, 5, and 0.1, respectively.
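The idea of descending along the gradient of the loss can be made concrete with a small regression example. The sketch below is a simplified stand-in, not the GBTD or XGB implementation used in the study: it assumes a squared-error loss (for which the negative gradient is simply the residual), one-split regression stumps as weak learners, a single predictor, and hypothetical names `fit_stump` and `boost`.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_estimators=50, learning_rate=0.1):
    """Gradient boosting for squared error with stumps as weak learners."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # For squared error, the negative gradient is the residual y - pred.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Move the prediction a small step along the fitted gradient direction.
        pred = [
            pi + learning_rate * (left_mean if xi <= threshold else right_mean)
            for xi, pi in zip(x, pred)
        ]
    return base, stumps, pred


def rmse(y, pred):
    return (sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)) ** 0.5


# Toy one-dimensional data; values are illustrative only.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.0, 1.5, 1.2, 3.8, 4.1, 4.0, 6.9, 7.2]
base, stumps, pred = boost(x, y)
print(rmse(y, [base] * len(y)), rmse(y, pred))
```

Here `n_estimators` and `learning_rate` play the same roles as the hyperparameters of the same names tuned above: more rounds and a larger step both fit the training data more closely, at the cost of a higher risk of overfitting.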

An artificial neural network (ANN) is a biologically inspired machine-learning method [59]. Here, the DNN, a subset of ANN with multiple hidden layers, uses a fully connected structure that can learn temporal and spatial relationships [60, 61]. It adjusts the connection strengths between neurons through back-propagation and iteratively minimizes the prediction error [62–64]. The DNN model was tested with one to five hidden layers and with 5–200 neurons per hidden layer at intervals of five. In addition, some widely used optimizers (i.e., stochastic gradient descent, RMSProp, and Adam) were tested by comparing the calculated results. In this study, the hyperparameters of the DNN were set as follows: batch_size = 128, dropout_rate = 0.1, stop_steps = 20 (training is terminated if the validation-set loss does not improve within 20 steps), and learning rate = 0.001. Adam was chosen as the optimizer, the number of hidden layers was three, and the number of hidden neurons was 256.
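The stop_steps rule mentioned above is a form of early stopping. A minimal sketch of that logic follows, assuming the validation loss is checked once per step; the function name `train_with_early_stopping` and the synthetic loss curve are hypothetical, and no actual network is trained here.

```python
def train_with_early_stopping(val_losses, stop_steps=20):
    """Return (step at which training stops, best validation loss seen).

    `val_losses` is a precomputed sequence standing in for the validation
    loss after each training step; a real loop would compute it on the fly.
    """
    best_loss = float("inf")
    steps_without_improvement = 0
    for step, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
        # Terminate once the loss has failed to improve for `stop_steps` steps.
        if steps_without_improvement >= stop_steps:
            return step, best_loss
    return len(val_losses), best_loss


# Synthetic loss curve: improves for 30 steps, then plateaus at a worse value.
losses = [1.0 / s for s in range(1, 31)] + [0.05] * 100
stop_step, best = train_with_early_stopping(losses, stop_steps=20)
print(stop_step, best)
```

In this synthetic case the best loss occurs at step 30 and training halts at step 50, after 20 consecutive steps without improvement.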

2.4. Error Analyses. Four statistical measures, namely the coefficient of determination (R²), RMSE, MSE, and mean bias (bias), were used to evaluate the accuracy of the Tair estimation model, as follows:

$$R^2 = \frac{\left[\sum \left(T_{ea} - \overline{T_{ea}}\right)\left(T_{oa} - \overline{T_{oa}}\right)\right]^2}{\sum \left(T_{ea} - \overline{T_{ea}}\right)^2 \sum \left(T_{oa} - \overline{T_{oa}}\right)^2}, \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)^2}{N}}, \tag{2}$$

$$\mathrm{bias} = \frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)}{N}, \tag{3}$$

$$\mathrm{MSE} = \frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)^2}{N}, \tag{4}$$

where $T_{ea}$ is the estimated Tair, $T_{oa}$ is the observed Tair at the meteorological stations, and $N$ is the sample size.
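For reference, the four measures in equations (1)–(4) can be computed as in the following sketch. The function name `evaluate` and the sample values are illustrative assumptions; R² is taken here as the squared Pearson correlation, as in equation (1).

```python
import math


def evaluate(estimated, observed):
    """Return R^2 (squared Pearson correlation), RMSE, bias, and MSE."""
    n = len(observed)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    mse = sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n
    return {
        "R2": cov ** 2 / (var_e * var_o),
        "RMSE": math.sqrt(mse),
        "bias": sum(e - o for e, o in zip(estimated, observed)) / n,
        "MSE": mse,
    }


# Illustrative values only (deg C), not data from the study.
observed = [20.0, 22.0, 25.0, 30.0]
estimated = [21.0, 21.0, 26.0, 31.0]
metrics = evaluate(estimated, observed)
print(metrics)
```

Note that RMSE is the square root of MSE, so the two carry the same information on different scales, while the bias keeps the sign of the error and therefore shows systematic over- or underestimation.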

3. Results and Discussion

In this section, the results of variable importance are presented, and the performance of the six machine-learning models is verified. The spatial distribution characteristics of the Tair errors of each model are also analyzed.

3.1. Variable Importance Results. Correlation analysis was performed to analyze the linear relationship between Tair and BT12, BT13, GFS PWV, GFS RH, DEM, longitude (LONG), latitude (LAT), and Julian day (JD). Table 2 shows the correlation coefficient matrix of these variables.

As shown in Figure 4(a), GFS PWV, DEM, BT12, and BT13 had a stronger correlation with Tair than the other variables, and the R values of these four variables were 0.635, −0.596, 0.459, and 0.413, respectively. This indicated that these variables played more important roles in the linear Tair estimation models. However, the Pearson correlation coefficient describes only the linear correlation between two variables; it cannot identify a nonlinear relationship between them. Therefore, the variable importance of the RF algorithm was also analyzed (Figure 4(b)). The RF algorithm modeled the nonlinear relationship well. GFS PWV was identified as the most important variable for Tair estimation in the RF model, while GFS RH and BT12 also played more important roles than the other predictors. Therefore, PWV and RH were used as inputs to effectively improve the accuracy of Tair estimation, which was consistent with a previous study [65].
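The limitation of the Pearson coefficient noted above can be seen in a small worked example. The sketch below uses synthetic data and a hypothetical helper `pearson_r`; it compares an exactly linear dependence with an exactly quadratic one over a symmetric range.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
linear = [2.0 * v + 1.0 for v in x]   # exact linear dependence on x
quadratic = [v ** 2 for v in x]       # exact, but nonlinear, dependence on x

r_linear = pearson_r(x, linear)
r_quadratic = pearson_r(x, quadratic)
print(r_linear, r_quadratic)
```

The linear case gives a coefficient of 1 (up to rounding), whereas the quadratic case gives 0 even though the second variable is fully determined by the first. A predictor with a small correlation coefficient can therefore still be informative for a nonlinear model such as RF.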

3.2. Model Performance Results. To evaluate the overall performance of each model, a 10-fold cross-validation method was used. K-fold cross-validation was used for model configuration selection. With K = 10, the dataset was randomly and equally divided into K groups (folds). In each validation, one fold was held out for testing and the remaining K − 1 folds were used for training. Over a total of K validations, the model performance was calculated using a different test fold each time [35]. Finally, the average validation results were used to evaluate the overall performance of each model.
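The K-fold procedure described above can be sketched as follows. This is an illustration only: the names `k_fold_indices` and `cross_validate`, the fixed seed, and the placeholder scorer are assumptions, and a real scorer would fit a model on the training folds and return an error measure such as the RMSE on the held-out fold.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the shuffled indices as evenly as possible among k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_samples, score_fold, k=10):
    """Average a per-fold score; `score_fold` is a user-supplied callable."""
    scores = [score_fold(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Placeholder scorer: it only reports the share of samples held out per fold.
share = cross_validate(1000, lambda train, test: len(test) / (len(train) + len(test)))
print(share)
```

Every sample is used for testing exactly once and for training K − 1 times, so the averaged score is less sensitive to one particular random split than a single hold-out estimate.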

Figure 5 illustrates the six models with different statistical parameters, including RMSE, bias, MSE, and R². The MLR model had the lowest performance of the six models. The variation ranges of RMSE, bias, MSE, and R² in the MLR model were quite wide; the RMSE alone ranged from 1.602°C to 4.487°C. In contrast, the DNN model used in this study had better overall performance and higher efficiency than the other five models. The DNN model showed the highest accuracy, with an average RMSE of 1.736°C. The range of RMSE in the DNN model was 0.852°C–2.584°C, showing good concentration and stability, as presented in Figure 5(a). Among the remaining models, the overall performances of the XGB and GBTD models were equivalent and better than those of the MLR, KNN, and RF models.

3.3. Validation Results. Model performance was used as an indicator to internally validate each model. The model accuracy must also be evaluated with a dataset that was not used for training or testing. To validate the developed MLR, RF, KNN, GBTD, XGB, and DNN models, observed data not used for either training or testing were utilized (the validation dataset in Section 2.3.1). Figure 6 illustrates the quantitative validation results of the estimated Tair during the validation period (the 1st, 10th, 20th, and 30th of June–August 2018). Compared with the results on the test dataset, the overall accuracy of the six models on the validation dataset decreased. For example, for the DNN model, the RMSE of Tair on the test dataset was 1.736°C, while that on the validation dataset was 2.006°C. This difference may be caused by overfitting, because the best model was not selected based on the final validation results [35].

The biases of the MLR, RF, DNN, GBTD, and XGB models were within ±0.2°C, indicating no obvious overestimation or underestimation. In contrast, the KNN model showed a larger negative bias of −0.492°C. This larger negative bias may be due to the poor robustness of the KNN model. Robustness depends mainly on the dataset, and poor robustness makes a model difficult to apply directly to other cases, so the KNN model had a low bias on the test dataset and a high bias on the validation dataset.

The XGB model had excellent modeling performance, with an R² of 0.902. The R² values of the GBTD and DNN models were 0.898 and 0.890, respectively, and the R² values of the remaining three models were less than 0.89. Moreover, compared with the other models, the XGB and GBTD models can repeatedly learn to generate a weighted average of the weak learners. Therefore, the XGB and GBTD models showed relatively better performance on the validation dataset at most sites. In general, the XGB model showed higher overall performance than the other five models on the validation dataset.

Table 2: Pearson correlation matrix for variables considered in the Tair estimation model.

| | Tair | BT12 | BT13 | GFS PWV | GFS RH | DEM | LONG | LAT | JD |
| Tair | 1.000 | 0.459 | 0.413 | 0.635 | −0.182 | −0.596 | 0.303 | −0.288 | −0.383 |
| BT12 | 0.459 | 1.000 | 0.995 | 0.047 | −0.256 | −0.287 | 0.199 | −0.022 | −0.099 |
| BT13 | 0.413 | 0.995 | 1.000 | 0.010 | −0.249 | −0.268 | 0.190 | 0.002 | −0.077 |
| GFS PWV | 0.635 | 0.047 | 0.010 | 1.000 | 0.383 | −0.585 | 0.366 | −0.463 | −0.310 |
| GFS RH | −0.182 | −0.256 | −0.249 | 0.383 | 1.000 | −0.020 | 0.189 | −0.355 | −0.046 |
| DEM | −0.596 | −0.287 | −0.268 | −0.585 | −0.020 | 1.000 | −0.663 | 0.057 | 0.003 |
| LONG | 0.303 | 0.199 | 0.190 | 0.366 | 0.189 | −0.663 | 1.000 | 0.134 | −0.004 |
| LAT | −0.288 | −0.022 | 0.002 | −0.463 | −0.355 | 0.057 | 0.134 | 1.000 | −0.006 |
| JD | −0.383 | −0.099 | −0.077 | −0.310 | −0.046 | 0.003 | −0.004 | −0.006 | 1.000 |

Data represent the correlation coefficient between different variables.

Figure 4: (a) Relative variable importance identified by the Pearson correlation coefficient with Tair (BT13 0.413, BT12 0.459, RH −0.182, PWV 0.635, DEM −0.596, LAT −0.288, LON 0.303, JD −0.383) and (b) RF variable importance expressed as attribute usage (BT13 0.012, BT12 0.165, RH 0.165, PWV 0.444, DEM 0.080, LAT 0.048, LON 0.021, JD 0.065).

The Tair estimation models based on satellite and numerical forecast data are susceptible to factors such as altitude and surface roughness. To further evaluate the applicability of these models, the spatial distribution of the errors across the meteorological stations was evaluated (Figures 7–9).

It can be seen that the Tair estimation errors of all models showed obvious spatial distribution characteristics (Figure 7). Generally, for each model, the RMSE was relatively low in the eastern regions (e.g., Guangdong Province) and high in the northwestern regions (e.g., Xinjiang Province). For example, the RMSE of the XGB model was approximately 1.2°C–1.8°C in Guangdong Province, while that in Xinjiang Province was about 2.0°C–3.2°C. Because the northwestern regions have relatively large day-night Tair variations, high altitude, and few meteorological observations, the accuracy difference between northwestern and eastern China is obvious. Moreover, the RMSE of the KNN, DNN, GBTD, and XGB models was relatively low in the eastern and southern regions. However, the MLR, RF, KNN, and DNN models had a higher RMSE in northwestern China. In contrast, the GBTD and XGB models had a relatively smaller RMSE in northwestern China, because these two models can generate repeated weighted averages to adjust their applicability to different regions through repeated learning from numerous data.

Furthermore, Gong’s study (2015) [66] illustrated that the RMSE of GFS Tair reached 1.5°C–3.0°C in most eastern regions and was above 3.5°C in the northwestern regions. By contrast, the present results showed that the RMSE of Tair estimated by the DNN, XGB, and GBTD models was obviously lower than that of the GFS data. In the present study, the RMSE of the XGB model was 1.0°C–2.0°C in most eastern regions and below 3.5°C in the northwestern regions. In addition, RMSE values below 2.0°C accounted for 48.2%, and RMSE values below 2.5°C accounted for 87.6%, in the XGB model.

Figure 5: Boxplots of performance evaluation for the six models (MLR, KNN, GBTD, RF, XGB, and DNN) using the test dataset in terms of (a) RMSE (°C), (b) bias (°C), (c) MSE, and (d) R².

Figure 6: Two-dimensional histograms of predicted Tair and station-observed Tair (both in °C) for the six machine-learning models: (a) MLR model (RMSE = 2.840°C, bias = 0.055°C, R² = 0.791); (b) RF model (RMSE = 2.233°C, bias = −0.185°C, R² = 0.878); (c) KNN model (RMSE = 2.120°C, bias = −0.492°C, R² = 0.884); (d) DNN model (RMSE = 2.006°C, bias = 0.185°C, R² = 0.890); (e) GBTD model (RMSE = 1.984°C, bias = −0.122°C, R² = 0.898); (f) XGB model (RMSE = 1.946°C, bias = −0.087°C, R² = 0.902).

Figure 7: Spatial distribution of RMSE for the six machine-learning models (color scale from 0.0°C to 4.0°C): (a) MLR-RMSE, (b) RF-RMSE, (c) KNN-RMSE, (d) DNN-RMSE, (e) GBTD-RMSE, and (f) XGB-RMSE.

The six models showed the same distribution trend, as shown in Figure 8, with R² being higher in the eastern regions and gradually decreasing toward the southwestern regions. Compared with the central regions (e.g., Henan Province), the viewing zenith angle (VZA) of AGRI over western China is larger. The larger the VZA is, the more strongly the radiation reaching the sensor is affected by the atmosphere, which may cause differences in the R² of the estimated Tair between the southwestern and central regions.

For the MLR model, the bias over all of China was large. For the RF and KNN models, a relatively high negative bias existed in southwestern China (e.g., the Yunnan-Guizhou Plateau), as shown in Figure 9. This may be because the relatively simple structure of these three models cannot adequately simulate the complex Tair changes in China, resulting in underfitting. In addition, Tair was overestimated by the DNN model in northwestern China, which explains why the RMSE of the DNN model was also high in these regions. In contrast, the GBTD and XGB models had a relatively low bias in northwestern China, where the absolute bias ranged from 2.0°C to 3.0°C. In conclusion, the bias was lower in the coastal areas and higher in the northwestern areas, which is mainly related to the characteristics of summer Tair changes.

Figure 10 shows the time series of RMSE for the six models during the validation period. The RMSE of the MLR model was significantly higher than that of the other models, ranging from 2.5°C to 4.3°C. In contrast, the GBTD and XGB models showed a relatively lower RMSE (i.e., 1.8°C–2.2°C) than the RF, KNN, and DNN models.

Based on the above analysis, the XGB model is expected to provide a more reliable and accurate Tair estimation than the other models. To evaluate the contribution of the predictors in the XGB model to Tair estimation, BT data (BT12 and BT13) and GFS data (GFS PWV and RH) were successively introduced (Table 3). As shown in Table 3, when only DEM, longitude, latitude, and Julian day were used as input variables, the RMSE of the XGB model was 3.003°C. The accuracy of Tair estimation was obviously improved when BT12 and BT13 were included in the model. Moreover, when GFS PWV and RH were added to the input variables, the RMSE of the XGB model decreased to 2.164°C, indicating important influences of GFS PWV and RH on the Tair estimation. These results are understandable because PWV and RH are the main parameters needed for atmospheric correction and LST retrieval. When both AGRI BTs and GFS data were introduced to the input variables, the RMSE of the XGB model improved by 0.228°C compared with the case in which only GFS data were introduced. This indicates that both GFS data and satellite observation data play an important role in improving the Tair estimation model. The RMSE of the Tair estimation model was less than 2.0°C when both satellite BTs and GFS data were introduced, which is considered to be the “accurate” precision level [67].

The relationship of the XGB model errors with altitude, observed Tair, and VZA was analyzed. Figure 11 shows scatter plots of the estimated Tair error against DEM, Tair, and VZA. It can be seen that the Tair error mainly ranged from −3°C to 3°C. The results showed a positive deviation in high-altitude areas, which produced a larger RMSE than in low-altitude areas. The model showed a positive deviation when Tair was low, while exhibiting a

Figure 8: Spatial distribution of R² for the six machine-learning models (color scale from 0.0 to 1.0): (a) MLR-R², (b) RF-R², (c) KNN-R², (d) DNN-R², (e) GBTD-R², and (f) XGB-R².

Advances in Meteorology 9

Figure 9: Spatial distribution of bias for the six machine-learning models: (a) MLR-Bias, (b) RF-Bias, (c) KNN-Bias, (d) DNN-Bias, (e) GBTD-Bias, and (f) XGB-Bias. Each panel covers 80°E–130°E and 20°N–50°N, with a color scale from −3.0°C to 3.0°C.

Figure 10: Time series (the 1st, 10th, 20th, and 30th of June to August 2018) of the RMSE (°C) between estimated and observed Tair for the six machine-learning models (MLR, RF, KNN, DNN, GBTD, and XGB).

Table 3: The contribution of AGRI BTs and GFS data to the XGB Tair estimation model.

| Predictive factors | XGB model RMSE (°C) |
| --- | --- |
| DEM, longitude, latitude, Julian day | 3.003 |
| DEM, longitude, latitude, Julian day, BT12, BT13 | 2.376 |
| DEM, longitude, latitude, Julian day, GFS PWV, GFS RH | 2.164 |
| DEM, longitude, latitude, Julian day, BT12, BT13, GFS PWV, GFS RH | 1.946 |


negative bias for the high-air-temperature condition. Therefore, the model showed a larger RMSE in the lower- and higher-air-temperature conditions due to overestimation and underestimation, respectively. This is similar to the results of previous studies [38]. Furthermore, the uneven distribution of stations makes the applicability of the model in high-altitude areas poor. It is worth mentioning that the effect of VZA on model performance is negligible, as shown in Figure 11(c).

4. Conclusions

In this study, six machine-learning approaches (MLR, RF, KNN, DNN, GBTD, and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed in terms of the spatial and temporal characteristics of their performance. The validation results highlighted the high potential of Tair estimation approaches using machine learning and showed that the accuracy of the XGB model was better than that of the MLR, RF, KNN, GBTD, and DNN models at most sites for Tair estimation over China. The validation was performed using spatially and temporally independent data, and hence the model performance was considered to be quite reliable.

This study improves on previous studies in the following key areas. First, Tair estimation models were constructed based on FY-4A AGRI data and other auxiliary data. The results showed that high-temporal- and high-spatial-resolution Tair values (RMSE < 2.0°C) can be obtained based on FY-4A data. According to the study of Vazquez [67], the level of precision generally accepted as "accurate" for remote-sensing-based Tair estimation is between 1°C and 2°C. Second, the accuracy and performance of the six machine-learning models (MLR, RF, KNN, XGB, GBTD, and DNN) were compared and analyzed. The results showed that the XGB model can provide more stable and high-precision Tair estimation, which provides a reference for Tair estimation based on machine-learning models. Finally, the accuracy of Tair estimation based on satellite data can be effectively improved by adding numerical model data. The experimental results showed that when only satellite data were used for large-scale Tair estimation in China, the RMSE of the XGB model was 2.376°C, whereas the RMSE using satellite data combined with numerical model (GFS) data reached 1.946°C.

However, aside from the novelties of this study, a limitation of the dataset used is its restriction to clear-sky conditions. Similarly, machine-learning algorithms cannot infer beyond the range of observed Tair values. If Tair moves beyond the range observed within the current training period, the model must be retrained. Moreover, future research may explore whether adding other predictors, such as distance to coast and vegetation information (normalized difference vegetation index, etc.), can improve the accuracy of the Tair estimation models.
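The extrapolation limitation suggests a simple operational check: compare new temperatures against the range seen during training and flag values that fall outside it. The sketch below is an illustration under assumed names (`training_range`, `outside_training_range`, and the optional `margin` are hypothetical and not from the study), with made-up numbers.

```python
def training_range(t_air_train):
    """Return the (minimum, maximum) Tair observed in the training data."""
    return min(t_air_train), max(t_air_train)


def outside_training_range(values, t_range, margin=0.0):
    """Return the values that lie outside the training range.

    `margin` (degrees C) is an assumed tolerance that widens the accepted range.
    """
    low, high = t_range
    return [v for v in values if v < low - margin or v > high + margin]


# Made-up example: training observations and a few newly observed temperatures.
train_obs = [-4.5, 3.0, 12.8, 25.1, 41.2]
new_obs = [10.0, 43.5, -6.0, 41.2]

flagged = outside_training_range(new_obs, training_range(train_obs))
print(flagged)
```

A non-empty result indicates conditions the models have not seen, which is the situation in which retraining would be needed.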

Data Availability

FY-4A AGRI data were downloaded from the China National Satellite Meteorological Center (NSMC) (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx). The GFS data were obtained from the National Centers for Environmental Prediction (https://www.nco.ncep.noaa.gov/pmb/products/gfs/). The meteorological station data were accessed at the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank the National Satellite Meteorological Center (NSMC) for providing the FY-4A data, the China Meteorological Data Service Center (CMDC) for the meteorological data, and the National Centers for Environmental Prediction (NCEP) for the GFS data.

References

[1] M. P. Cresswell, A. P. Morse, M. C. Thomson, and S. J. Connor, "Estimating surface air temperatures from Meteosat land surface temperatures using an empirical solar zenith angle model," International Journal of Remote Sensing, vol. 20, no. 6, pp. 1125–1132, 1999.

Figure 11: Scatter plot of Tair error (°C) with (a) DEM (m), (b) observed Tair (°C), and (c) viewing zenith angle (°) for the XGB model.


[2] L. Prihodko and S. N. Goward, "Estimation of air temperature from remotely sensed surface observations," Remote Sensing of Environment, vol. 60, no. 3, pp. 335–346, 1997.

[3] Z. S. Venter, O. Brousse, I. Esau, and F. Meier, "Hyperlocal mapping of urban air temperature using remote sensing and crowdsourced weather data," Remote Sensing of Environment, vol. 242, no. 1, Article ID 111791, 2020.

[4] H. C. Ho, A. Knudby, Y. Xu, M. Hodul, and M. Aminipouri, "A comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (Humidex) for the greater Vancouver area," Science of the Total Environment, vol. 544, pp. 929–938, 2016.

[5] T. R. Lookingbill and D. L. Urban, "Spatial estimation of air temperature differences for landscape-scale studies in montane environments," Agricultural & Forest Meteorology, vol. 114, no. 3-4, pp. 141–151, 2003.

[6] J. W. Hurrell and K. E. Trenberth, "Satellite versus surface estimates of air temperature since 1979," Journal of Climate, vol. 9, no. 9, pp. 2222–2232, 1996.

[7] H. Liu, Q. Zhou, S. Zhang, and X. Deng, "Estimation of summer air temperature over China using Himawari-8 AHI and numerical weather prediction data," Advances in Meteorology, vol. 2019, pp. 1–10, Article ID 2385310, 2019.

[8] M. Konda, N. Imasato, and A. Shibata, "A new method to determine near-sea surface air temperature by using satellite data," Journal of Geophysical Research: Oceans, vol. 101, no. C6, pp. 14349–14360, 1996.

[9] U. Marcel, "Comparison of satellite-derived land surface temperature and air temperature from meteorological stations on the pan-arctic scale," Remote Sensing, vol. 5, no. 5, pp. 2348–2367, 2013.

[10] W. Wagner, V. Naeimi, K. Scipal, R. de Jeu, and J. Martínez-Fernández, "Soil moisture from operational meteorological satellites," Hydrogeology Journal, vol. 15, no. 1, pp. 121–131, 2007.

[11] C. Oppenheimer, "Review article: volcanological applications of meteorological satellites," International Journal of Remote Sensing, vol. 19, no. 15, pp. 2829–2864, 1998.

[12] H. W. Yates, J. D. Tarpley, S. R. Schneider, D. F. McGinnis, and R. A. Scofield, "The role of meteorological satellites in agricultural remote sensing," Remote Sensing of Environment, vol. 14, no. 1–3, pp. 219–233, 1984.

[13] C. O. Justice, "The moderate resolution imaging spectroradiometer (MODIS): land remote sensing for global change research," IEEE Transactions on Geoscience & Remote Sensing, vol. 36, no. 4, pp. 1228–1249, 2002.

[14] D. A. Chu, "Remote sensing of smoke from MODIS airborne simulator during the SCAR-B experiment," Journal of Geophysical Research: Atmospheres, vol. 103, no. D24, Article ID 31979, 1998.

[15] L. Mohammadi, N. Molanian, and A. Heidari, "Determination of the best coverage area for receiver stations of LEO remote sensing satellites," in Proceedings of the 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008.

[16] K. Yumimoto, T. M. Nagao, M. Kikuchi et al., "Aerosol data assimilation using data from Himawari-8, a next-generation geostationary meteorological satellite," Geophysical Research Letters, vol. 43, no. 11, pp. 5886–5894, 2016.

[17] K. Bessho, K. Date, M. Hayashi et al., "An introduction to Himawari-8/9: Japan's new-generation geostationary meteorological satellites," Journal of the Meteorological Society of Japan, Ser. II, vol. 94, no. 2, pp. 151–183, 2016.

[18] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, "Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4," Bulletin of the American Meteorological Society, vol. 98, no. 8, pp. 1637–1658, 2017.

[19] W. Hui, F. Huang, and R. Liu, "Characteristics of lightning signals over the Tibetan Plateau and the capability of FY-4A LMI lightning detection in the Plateau," International Journal of Remote Sensing, vol. 41, no. 12, pp. 4605–4625, 2020.

[20] S. S. Chu, L. Zhu, H. F. Sun et al., "Automated volcanic hot-spot detection based on FY-4A/AGRI infrared data," International Journal of Remote Sensing, vol. 41, no. 1, pp. 2410–2438, 2020.

[21] X. Zhang and W. Jiao, "Estimation of land surface temperature using geostationary meteorological satellite data," Remote Sensing Technology & Application, vol. 28, no. 1, 2013.

[22] J. Xu, "Estimation of near-surface air temperature from HJ-1B satellite data in Northwest China," Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, vol. 29, no. 22, pp. 145–153, 2013.

[23] X. Zhu, Q. Zhang, C.-Y. Xu, P. Sun, and P. Hu, "Reconstruction of high spatial resolution surface air temperature data across China: a new geo-intelligent multisource data-based machine learning technique," Science of the Total Environment, vol. 665, pp. 300–313, 2019.

[24] J. Hrisko, P. Ramamurthy, Y. Yu, P. Yu, and D. Melecio-Vazquez, "Urban air temperature model using GOES-16 LST and a diurnal regressive neural network algorithm," Remote Sensing of Environment, vol. 237, Article ID 111495, 2020.

[25] Q. Zhang, Y. Yu, W. Zhang, T. Luo, and X. Wang, "Cloud detection from FY-4A's geostationary interferometric infrared sounder using machine learning approaches," Remote Sensing, vol. 11, no. 24, p. 3035, 2019.

[26] Y. Chen, G. Chen, C. Cui et al., "Retrieval of the vertical evolution of the cloud effective radius from the Chinese FY-4 (Feng Yun 4) next-generation geostationary satellites," Atmospheric Chemistry and Physics, vol. 20, no. 2, pp. 1131–1145, 2020.

[27] I. Kloog, F. Nordio, B. A. Coull, and J. Schwartz, "Predicting spatiotemporal mean air temperature using MODIS satellite surface temperature measurements across the Northeastern USA," Remote Sensing of Environment, vol. 150, pp. 132–139, 2014.

[28] N. Janatian, M. Sadeghi, S. Hossein Sanaeinejad et al., "A statistical framework for estimating air temperature using MODIS land surface temperature data," International Journal of Climatology, vol. 37, no. 3, 2016.

[29] R. Huang, J.-x. Huang, C. Zhang et al., "Soil temperature estimation at different depths using remotely-sensed data," Journal of Integrative Agriculture, vol. 19, no. 1, pp. 277–290, 2020.

[30] J. Y. Fang and K. Yoda, "Climate and vegetation in China (I). Changes in the altitudinal lapse rate of temperature and distribution of sea level temperature," Ecological Research, vol. 3, no. 1, pp. 37–51, 1988.

[31] C. Du, R. Huazhong, Q. Qin, S. Zhao, and J. Meng, "Research of split-window algorithm for retrieval of land surface temperature from Landsat 8 data," Journal of Geomatics, vol. 39, pp. 73–77, 2014.

[32] F. Chen, Y. Liu, Q. Liu, and F. Qin, "A statistical method based on remote sensing for the estimation of air temperature in China," International Journal of Climatology, vol. 35, 2014.

[33] P. S. G. de Mattos Neto, G. D. C. Cavalcanti, P. R. A. Firmino, E. G. Silva, and S. R. P. Vila Nova Filho, "A temporal-window framework for modelling and forecasting time series," Knowledge-Based Systems, vol. 193, Article ID 105476, 2020.

[34] R. Bycroft, J. X. Leon, and D. Schoeman, "Comparing random forests and convoluted neural networks for mapping ghost crab burrows using imagery from an unmanned aerial vehicle," Estuarine, Coastal and Shelf Science, vol. 224, pp. 84–93, 2019.

[35] Y. Lee, D. Han, M.-H. Ahn, J. Im, and S. J. Lee, "Retrieval of total precipitable water from Himawari-8 AHI data: a comparison of random forest, extreme gradient boosting, and deep neural network," Remote Sensing, vol. 11, no. 15, p. 1741, 2019.

[36] J. García-Gutiérrez, F. Martínez-Álvarez, A. Troncoso, and J. C. Riquelme, "A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables," Neurocomputing, vol. 167, no. 1, pp. 24–31, 2015.

[37] D. Upreti, W. Huang, W. Kong et al., "A comparison of hybrid machine learning algorithms for the retrieval of wheat biophysical variables from Sentinel-2," Remote Sensing, vol. 11, no. 5, p. 481, 2019.

[38] R. Li, L. Cui, H. Fu, Y. Meng, J. Li, and J. Guo, "Estimating high-resolution PM1 concentration from Himawari-8 combining extreme gradient boosting-geographically and temporally weighted regression (XGBoost-GTWR)," Atmospheric Environment, vol. 229, Article ID 117434, 2020.

[39] G. Papacharalampous and H. Tyralis, "Hydrological time series forecasting using simple combinations: big data testing and investigations on one-year ahead river flow predictability," Journal of Hydrology, vol. 590, Article ID 125205, 2020.

[40] R. Pérez-Chacón, G. Asencio-Cortés, F. Martínez-Álvarez, and A. Troncoso, "Big data time series forecasting based on pattern sequence similarity and its application to the electricity demand," Information Sciences, vol. 540, pp. 160–174, 2020.

[41] H. Abdollahi, "A novel hybrid model for forecasting crude oil price based on time series decomposition," Applied Energy, vol. 267, Article ID 115035, 2020.

[42] R. S. dos Santos, "Estimating spatio-temporal air temperature in London (UK) using machine learning and earth observation satellite data," International Journal of Applied Earth Observation and Geoinformation, vol. 88, Article ID 102066, 2020.

[43] H. J. Richardson, D. J. Hill, D. R. Denesiuk, and L. H. Fraser, "A comparison of geographic datasets and field measurements to model soil carbon using random forests and stepwise regressions (British Columbia, Canada)," GIScience & Remote Sensing, vol. 54, no. 4, pp. 573–591, 2017.

[44] H. Franco-Lopez, A. R. Ek, and M. E. Bauer, "Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method," Remote Sensing of Environment, vol. 77, no. 3, pp. 251–274, 2001.

[45] R. Haapanen, A. R. Ek, M. E. Bauer, and A. O. Finley, "Delineation of forest/nonforest land use classes using nearest neighbor methods," Remote Sensing of Environment, vol. 89, no. 3, pp. 265–271, 2004.

[46] M. Belgiu and L. Dragut, "Random forest in remote sensing: a review of applications and future directions," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, no. 114, pp. 24–31, 2016.

[47] M. J. Cracknell and A. M. Reading, "The upside of uncertainty: identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines," Geophysics, vol. 78, no. 3, 2013.

[48] X. Ye, X. Yang, X. Xiong, Y. Shen, M. Hao, and R. Gu, "A quality control method based on an improved random forest algorithm for surface air temperature observations," Advances in Meteorology, vol. 2017, pp. 1–15, Article ID 8601296, 2017.

[49] B. Babar, L. T. Luppino, T. Bostrom, and S. N. Anfinsen, "Random forest regression for improved mapping of solar irradiance at high latitudes," Solar Energy, vol. 198, pp. 81–92, 2020.

[50] L. V. Utkin, M. S. Kovalev, and F. P. A. Coolen, "Imprecise weighted extensions of random forests for classification and regression," Applied Soft Computing, vol. 92, Article ID 106324, 2020.

[51] L. Liu, M. Ji, and M. Buchroithner, "Combining partial least squares and the gradient-boosting method for soil property retrieval using visible near-infrared shortwave infrared spectra," Remote Sensing, vol. 9, no. 12, p. 1299, 2017.

[52] J. Son, I. Jung, K. Park, and B. Han, "Tracking-by-segmentation with online gradient boosting decision tree," in Proceedings of the International Conference on Computer Vision, Las Condes, Chile, December 2015.

[53] H. Mo, H. Sun, J. Liu, and S. Wei, "Developing window behavior models for residential buildings using XGBoost algorithm," Energy and Buildings, vol. 205, no. 15, Article ID 109564, 2019.

[54] M. H. D. M. Ribeiro and L. dos Santos Coelho, "Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series," Applied Soft Computing, vol. 86, Article ID 105837, 2020.

[55] I. B. Mustapha and F. Saeed, "Bioactive molecule prediction using extreme gradient boosting," Molecules, vol. 21, no. 8, p. 983, 2016.

[56] C. Li, "Power load forecasting based on the combined model of LSTM and XGBoost," in Proceedings of the International Conference on Pattern Recognition, Wenzhou, China, June 2019.

[57] Z.-Y. Chen, T.-H. Zhang, R. Zhang et al., "Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China," Atmospheric Environment, vol. 202, pp. 180–189, 2019.

[58] S. Zhao, D. Zeng, W. Wang et al., "Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals," Measurement, vol. 159, Article ID 107777, 2020.

[59] S. J. Lee, M.-H. Ahn, and Y. Lee, "Application of an artificial neural network for a direct estimation of atmospheric instability from a next-generation imager," Advances in Atmospheric Sciences, vol. 33, no. 2, pp. 221–232, 2016.

[60] J. Tang, C. Deng, G.-B. Huang, and B. Zhao, "Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1174–1185, 2015.

[61] G. T. Ribeiro, V. C. Mariani, and L. Coelho, "Enhanced ensemble structures using wavelet neural networks applied to short-term load forecasting," Engineering Applications of Artificial Intelligence, vol. 82, pp. 272–281, 2019.

[62] W. Wang, X. Sun, R. Zhang, Z. Li, Z. Zhu, and H. Su, "Multi-layer perceptron neural network based algorithm for estimating precipitable water vapour from MODIS NIR data," International Journal of Remote Sensing, vol. 27, no. 3, pp. 617–621, 2006.

[63] K. I. Chronopoulos, I. X. Tsiros, I. F. Dimopoulos, and N. Alvertos, "An application of artificial neural network models to estimate air temperature data in areas with sparse network of meteorological stations," Journal of Environmental Science and Health, Part A, vol. 43, no. 14, pp. 1752–1757, 2008.

[64] S. Rodrigues Moreno, R. Gomes da Silva, V. Cocco Mariani, and L. dos Santos Coelho, "Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network," Energy Conversion and Management, vol. 213, Article ID 112869, 2020.

[65] A. Bayat and S. Mashhadizadeh Maleki, "Comparison of precipitable water vapor derived from AIRS and SPM measurements and its correlation with surface temperature of 29 synoptic stations over Iran," Journal of Atmospheric and Solar-Terrestrial Physics, vol. 178, pp. 24–31, 2018.

[66] W. W. Gong, "Evaluation of surface meteorological elements from several numerical models in China," Climatic and Environmental Research, vol. 20, no. 1, pp. 53–62, 2015.

[67] D. Pozo Vazquez, F. J. Olmo Reyes, and L. Alados Arboledas, "A comparative study of algorithms for estimating land surface temperature from AVHRR Data," Remote Sensing of Environment, vol. 62, no. 3, pp. 215–222, 1997.



Figure 2: Histograms of the data used for machine learning (number of data against temperature in °C): (a) all datasets, (b) selected training dataset, and (c) selected test dataset.

Figure 3: Flowchart of the Tair estimation model in this study. Inputs: FY-4A AGRI data (BT12, BT13) screened for clear sky with the L2 product, CMDC data (Tair, Julian day, longitude, latitude), and GFS data (GFS PWV, GFS RH). These feed the machine-learning models (MLR, RF, GBTD, KNN, XGB, DNN), which output the estimated air temperature.


trees (n_estimators), minimum number of samples (min_samples_leaf), and maximum depth of a tree (max_depth). The result of parameter selection is n_estimators = 200, min_samples_leaf = 50, and max_depth = 3.

The principle of GBTD is to sequentially apply a classification algorithm to the weighted version of the training data [51, 52], descending along the gradient direction of the previously established model loss function, and then perform a weighted majority vote on the resulting classifier sequence. As an improved algorithm of GBTD, XGB uses all data in each iteration, which is similar to RF [53, 54]. Therefore, XGB reduces the complexity of the model and makes the learned model simpler [35, 54–58]. In this study, four hyperparameters in the GBTD and XGB models, i.e., n_estimators, max_depth, learning_rate (lr), and the minimum loss reduction required to make a further partition on a leaf node of the tree (gamma), were empirically tuned based on RMSE. The optimum n_estimators, gamma, max_depth, and lr in the two models were 500, 0.2, 5, and 0.1, respectively.
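To make the idea of sequentially descending along the gradient of the loss concrete, the following is a minimal sketch of gradient boosting for regression with a squared-error loss and depth-1 trees on a single predictor. It is a teaching illustration only: it is not the GBTD or XGB implementation used in this study, it omits regularization terms such as gamma and deeper trees (max_depth), and all function names and data values are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a depth-1 regression tree (one threshold) to the residuals by least squares.

    Assumes x contains at least two distinct values.
    """
    best = None
    values = sorted(set(x))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(x, y, n_estimators=500, learning_rate=0.1):
    """Add one stump per round, each fitted to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # For the loss 0.5 * (y - p)^2, the negative gradient is the residual y - p.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if xi <= threshold else right_mean)
            for xi, p in zip(x, predictions)
        ]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, xi):
    """Sum the base value and the shrunken contribution of every stump."""
    total = model["base"]
    for threshold, left_mean, right_mean in model["stumps"]:
        total += model["learning_rate"] * (left_mean if xi <= threshold else right_mean)
    return total


def training_rmse(model, x, y):
    squared = [(predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)]
    return (sum(squared) / len(y)) ** 0.5


# Made-up data: elevation (km) against air temperature (degrees C).
elevation = [0.1, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 4.5]
t_air = [30.0, 28.0, 25.0, 22.0, 19.0, 13.0, 8.0, 5.0]

rmse_1 = training_rmse(fit_boosting(elevation, t_air, n_estimators=1), elevation, t_air)
rmse_200 = training_rmse(fit_boosting(elevation, t_air, n_estimators=200), elevation, t_air)
print(rmse_1, rmse_200)
```

The two arguments mirror n_estimators and lr in the text: a smaller learning rate shrinks each tree's contribution, so more trees are needed to reach the same training error.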

An artificial neural network (ANN) is a biologically inspired machine-learning method [59]. Here, DNN, a subset of ANN with multiple hidden layers, uses a fully connected structure, which has the ability to learn time and space relationships [60, 61]. It adjusts the connection strength through back-propagation and minimizes the prediction error by iterating between neurons [62–64]. The DNN model was tested with one to five hidden layers and with 5–200 neurons per hidden layer in five intervals. In addition, some widely used optimizers (i.e., stochastic gradient descent, RMSProp, and Adam) were tested by comparing the calculated results. In this study, the hyperparameters of the DNN were set as follows: batch_size = 128, dropout_rate = 0.1, stop_steps = 20 (training is terminated if the validation-set loss function does not improve within 20 steps), and learning rate = 0.001. Adam was chosen as the optimizer, the number of hidden layers was three, and the number of hidden neurons was 256.
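The stop_steps rule is a form of early stopping. The sketch below shows one plausible reading of it, in which training stops once the validation loss has gone 20 consecutive epochs without improving on its best value. The function name and the loss sequence are hypothetical, and the exact criterion used in the study may differ.

```python
def early_stopping_epoch(val_losses, stop_steps=20):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch that is `stop_steps` epochs past the
    best one without any improvement; otherwise it runs to the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= stop_steps:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: ten improving epochs followed by a plateau.
losses = [1.0 - 0.05 * i for i in range(10)] + [0.6] * 30
stop_epoch, best_epoch = early_stopping_epoch(losses, stop_steps=20)
print(stop_epoch, best_epoch)
```

In practice the weights from `best_epoch` would typically be restored, so that the reported model is the one with the lowest validation loss rather than the last one trained.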

2.4. Error Analyses. Four statistical factors, namely, the determination coefficient (R2), RMSE, MSE, and mean bias (bias), were used to evaluate the accuracy of the Tair estimation model as follows:

$$R^2 = \frac{\left[\sum \left(T_{ea} - \overline{T_{ea}}\right)\left(T_{oa} - \overline{T_{oa}}\right)\right]^2}{\sum \left(T_{ea} - \overline{T_{ea}}\right)^2 \sum \left(T_{oa} - \overline{T_{oa}}\right)^2}, \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)^2}{N}}, \tag{2}$$

$$\mathrm{bias} = \frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)}{N}, \tag{3}$$

$$\mathrm{MSE} = \frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)^2}{N}, \tag{4}$$

where Tea is the estimated Tair, Toa is the observed Tair at the meteorological stations, an overbar denotes the mean over all samples, and N is the sample size.
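The four statistics in equations (1)–(4) translate directly into code. The following is a minimal sketch in plain Python; the function names and the sample values are illustrative assumptions rather than anything taken from the study.

```python
import math


def mean(values):
    return sum(values) / len(values)


def r_squared(t_est, t_obs):
    """Equation (1): squared Pearson correlation of estimated and observed Tair."""
    m_est, m_obs = mean(t_est), mean(t_obs)
    cross = sum((e - m_est) * (o - m_obs) for e, o in zip(t_est, t_obs))
    ss_est = sum((e - m_est) ** 2 for e in t_est)
    ss_obs = sum((o - m_obs) ** 2 for o in t_obs)
    return cross ** 2 / (ss_est * ss_obs)


def mse(t_est, t_obs):
    """Equation (4): mean squared error."""
    return sum((e - o) ** 2 for e, o in zip(t_est, t_obs)) / len(t_obs)


def rmse(t_est, t_obs):
    """Equation (2): root-mean-square error, the square root of the MSE."""
    return math.sqrt(mse(t_est, t_obs))


def bias(t_est, t_obs):
    """Equation (3): mean of estimated minus observed Tair."""
    return sum(e - o for e, o in zip(t_est, t_obs)) / len(t_obs)


# Made-up example: every estimate is exactly 1 degree C too warm.
observed = [10.0, 20.0, 30.0]
estimated = [11.0, 21.0, 31.0]
print(r_squared(estimated, observed), rmse(estimated, observed),
      bias(estimated, observed), mse(estimated, observed))
```

The example shows why bias and R2 are reported alongside RMSE: a constant offset leaves R2 at 1 while producing a nonzero bias and RMSE.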

3. Results and Discussion

In this section, the results of variable importance are presented, and the performance of the six machine-learning models is verified. The spatial distribution characteristics of the Tair errors of each model are also analyzed.

3.1. Variable Importance Results. Correlation analysis was performed to analyze the linear relationship between Tair and BT12, BT13, GFS PWV, GFS RH, DEM, longitude (LONG), latitude (LAT), and Julian day (JD). Table 2 shows the correlation coefficient matrix of these variables.

As shown in Figure 4(a), GFS PWV, DEM, BT12, and BT13 had a stronger correlation with Tair than the other variables, with R values of 0.635, −0.596, 0.459, and 0.413, respectively. This indicates that these variables played more important roles in the linear Tair estimation models. However, the Pearson correlation coefficient describes only the linear correlation between two variables; it cannot identify a nonlinear relationship between them. Therefore, the variable importance of the RF algorithm was also analyzed (Figure 4(b)). The RF algorithm modeled the nonlinear relationship well. GFS PWV was identified as the most important variable for Tair estimation in the RF model, while GFS RH and BT12 also played more important roles than the other predictors. Therefore, PWV and RH were used as inputs to effectively improve the accuracy of Tair estimation, which is consistent with a previous study [65].
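The limitation of the Pearson coefficient noted above can be seen in a small worked example. The sketch below uses made-up numbers and a hypothetical helper `pearson_r`: a perfectly linear relationship gives a coefficient of 1, whereas an exact but symmetric quadratic relationship gives 0.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (norm_x * norm_y)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2.0 * v + 1.0 for v in x]   # exact linear dependence on x
quadratic = [v * v for v in x]        # exact, but nonlinear, dependence on x

print(pearson_r(x, linear), pearson_r(x, quadratic))
```

A predictor can therefore be fully informative and still show a correlation near zero, which is why a nonlinear importance measure such as that of the RF algorithm is a useful complement.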

3.2. Model Performance Results. To evaluate the overall performance of each model, a 10-fold cross-validation method was used. K-fold cross-validation was used for model configuration selection. For the selected value of K (here K = 10), the datasets were randomly and equally distributed among K groups. One group was held out for testing, and the remaining K − 1 groups were used for training. Over a total of K validations, the model performance was calculated using a different test fold for each validation [35]. Finally, the average validation results were used to evaluate the overall performance of each model.
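The K-fold procedure can be sketched with the standard library alone. The helper names below (`k_fold_indices`, `cross_validate`) and the placeholder scoring function are assumptions for illustration; in the study, the score for each fold would come from training and testing one of the six models.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(n_samples, score_fn, k=10, seed=0):
    """Average a per-fold score, where score_fn(train_idx, test_idx) returns a number."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# Placeholder score: the size of each test fold (a real score would be, e.g., the RMSE).
splits = list(k_fold_indices(103, k=10))
average_test_size = cross_validate(100, lambda train, test: len(test), k=10)
print(len(splits), average_test_size)
```

Each sample appears in exactly one test fold, so the averaged score uses every sample for testing once and for training K − 1 times.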

Figure 5 compares the six models in terms of different statistical parameters, including RMSE, bias, MSE, and R2. The MLR model had the lowest performance of the six models. The variation range of RMSE, bias, MSE, and R2 in the MLR model was quite wide; the RMSE alone ranged from 1.602°C to 4.487°C. In contrast, the DNN model used in this study had better overall performance and higher efficiency than the other five models. The DNN model showed the highest accuracy, with an average RMSE of 1.736°C. The range of RMSE in the DNN model was 0.852°C–2.584°C, showing good concentration and stability, as presented in Figure 5(a). In addition, among the remaining models, the overall performances of the XGB and GBTD models were equivalent and were better than those of the MLR, KNN, and RF models.

3.3. Validation Results. Model performance was used as an indicator to internally validate each model. The model accuracy must also be evaluated with a dataset that was not used for training or testing. To validate the developed MLR, RF, KNN, GBTD, XGB, and DNN models, observed data not used for either training or testing were utilized (validation dataset in Section 2.3.1). Figure 6 illustrates the quantitative validation results of the estimated Tair during the validation time (the 1st, 10th, 20th, and 30th of June–August 2018). Compared with the results on the test dataset, the overall accuracy of the six models on the validation dataset decreased. For example, for the DNN model, the RMSE of Tair using the test dataset was 1.736°C, while that of the validation results was 2.006°C. This difference may be caused by overfitting, because the best model was not selected based on the final validation results [35].
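Holding out whole days keeps the validation data temporally separate from the data used for training and testing. The sketch below assumes that the validation set consists of all records from the 1st, 10th, 20th, and 30th of June, July, and August 2018, which is one reading of the description above; the record layout and function name are hypothetical, and the actual split is defined in Section 2.3.1.

```python
from datetime import date, timedelta

VALIDATION_DAYS = {1, 10, 20, 30}   # days of month held out, per the text
SUMMER_MONTHS = {6, 7, 8}


def split_by_date(records):
    """Split (date, payload) records into development and validation lists."""
    development, validation = [], []
    for day, payload in records:
        held_out = (day.year == 2018 and day.month in SUMMER_MONTHS
                    and day.day in VALIDATION_DAYS)
        (validation if held_out else development).append((day, payload))
    return development, validation


# Made-up records: one placeholder payload per day of June-August 2018.
start = date(2018, 6, 1)
records = [(start + timedelta(days=i), None) for i in range(92)]
development, validation = split_by_date(records)
print(len(development), len(validation))
```

Splitting by date rather than by random sample prevents observations from the same day appearing on both sides of the split.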

The biases of the MLR, RF, DNN, GBTD, and XGB models were within ±0.2°C, indicating no obvious overestimation or underestimation. In contrast, the KNN model showed a larger negative bias of −0.492°C. This may be because the KNN model had poor robustness. Robustness depends mainly on the dataset, and poor robustness makes a model difficult to apply directly to other cases, so the KNN model had a low bias on the test dataset and a high bias on the validation dataset.

The XGB model had excellent modeling performance, with R2 of 0.902. The R2 values of the GBTD and DNN models were 0.898 and 0.890, respectively, and the R2 values of the remaining three models were less than 0.89. Moreover, compared with the other models, the XGB and GBTD models can repeatedly learn to generate a weighted average of the weak learners. Therefore, the XGB and GBTD models showed relatively better performance on the validation dataset at most sites. In general, the XGB model showed higher overall performance than the other five models on the validation dataset.

The Tair estimation models based on satellite and numerical forecast data are susceptible to factors such as altitude and surface roughness. To further evaluate the

Table 2: Pearson correlation matrix for variables considered in the Tair estimation model.

| | Tair | BT12 | BT13 | GFS PWV | GFS RH | DEM | LONG | LAT | JD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tair | 1.000 | 0.459 | 0.413 | 0.635 | −0.182 | −0.596 | 0.303 | −0.288 | −0.383 |
| BT12 | 0.459 | 1.000 | 0.995 | 0.047 | −0.256 | −0.287 | 0.199 | −0.022 | −0.099 |
| BT13 | 0.413 | 0.995 | 1.000 | 0.010 | −0.249 | −0.268 | 0.190 | 0.002 | −0.077 |
| GFS PWV | 0.635 | 0.047 | 0.010 | 1.000 | 0.383 | −0.585 | 0.366 | −0.463 | −0.310 |
| GFS RH | −0.182 | −0.256 | −0.249 | 0.383 | 1.000 | −0.020 | 0.189 | −0.355 | −0.046 |
| DEM | −0.596 | −0.287 | −0.268 | −0.585 | −0.020 | 1.000 | −0.663 | 0.057 | 0.003 |
| LONG | 0.303 | 0.199 | 0.190 | 0.366 | 0.189 | −0.663 | 1.000 | 0.134 | −0.004 |
| LAT | −0.288 | −0.022 | 0.002 | −0.463 | −0.355 | 0.057 | 0.134 | 1.000 | −0.006 |
| JD | −0.383 | −0.099 | −0.077 | −0.310 | −0.046 | 0.003 | −0.004 | −0.006 | 1.000 |

Data represent the correlation coefficient between different variables.

Figure 4: (a) Relative variable importance identified by the Pearson correlation coefficient and (b) RF variable importance (attribute usage) for the predictors BT13, BT12, RH, PWV, DEM, LAT, LON, and JD.


applicability of these models, the spatial distribution of the results at each meteorological observation station was evaluated (Figures 7–9).

The Tair estimation errors of all models showed obvious spatial distribution characteristics (Figure 7). Generally, for each model, the RMSE was relatively low in the eastern regions (e.g., Guangdong Province) and high in the northwestern regions (e.g., Xinjiang Province). For example, the RMSE of the XGB model was approximately 1.2°C–1.8°C in Guangdong Province, while that in Xinjiang Province was about 2.0°C–3.2°C. Because the northwestern regions have relatively wide Tair changes between day and night, high altitude, and few meteorological observations, the accuracy difference between northwestern and eastern China is obvious. Moreover, the RMSE of the KNN, DNN, GBTD, and XGB models was relatively low in the eastern and southern regions. However, the MLR, RF, KNN, and DNN models had a higher RMSE in northwestern China. In contrast, the GBTD and XGB models had a relatively smaller RMSE in northwestern China because these models can generate repeated weighted averages to adjust their applicability to different regions through repeated learning of numerous data.

Furthermore, Gong's study (2015) [66] illustrated that the RMSE of GFS Tair reaches 1.5°C–3.0°C in most eastern regions and is above 3.5°C in the northwestern regions. By contrast, the results showed that the RMSE of Tair estimated by the DNN, XGB, and GBTD models was obviously lower than that of GFS data. In the present study, the RMSE of the XGB model was 1.0°C–2.0°C in most eastern regions, and it was below 3.5°C in the northwestern regions. In addition, RMSE < 2.0°C accounted for 48.2% and RMSE < 2.5°C accounted for 87.6% in the XGB model.
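The percentages quoted above are shares of cases whose RMSE falls below a threshold. A minimal sketch of that calculation is given here; the per-station RMSE values are hypothetical placeholders, not data from this study, and the function name `share_below` is an assumption made for illustration.

```python
def share_below(values, threshold):
    """Return the percentage of values strictly below the threshold."""
    return 100.0 * sum(v < threshold for v in values) / len(values)


# Hypothetical per-station RMSE values (degC), for illustration only.
station_rmse = [1.2, 1.8, 2.2, 3.1]

share_under_2_0 = share_below(station_rmse, 2.0)
share_under_2_5 = share_below(station_rmse, 2.5)
```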

As shown in Figure 8, the six models showed the same distribution trend, with R2 being higher in the eastern regions and gradually decreasing toward the southwestern regions. Compared with the central regions (e.g., Henan Province), the viewing zenith angle (VZA) of AGRI over western China is larger. The larger the VZA

Figure 5: Boxplots of performance evaluation for the six models (MLR, KNN, GBTD, RF, XGB, and DNN) using the test dataset in terms of (a) RMSE (°C), (b) bias (°C), (c) MSE, and (d) R2.


Figure 6: Two-dimensional histogram of predicted Tair data and meteorological observed Tair data based on six machine-learning models: (a) MLR model (RMSE = 2.840°C, bias = 0.055°C, R2 = 0.791), (b) RF model (RMSE = 2.233°C, bias = −0.185°C, R2 = 0.878), (c) KNN model (RMSE = 2.120°C, bias = −0.492°C, R2 = 0.884), (d) DNN model (RMSE = 2.006°C, bias = 0.185°C, R2 = 0.890), (e) GBTD model (RMSE = 1.984°C, bias = −0.122°C, R2 = 0.898), and (f) XGB model (RMSE = 1.946°C, bias = −0.087°C, R2 = 0.902). Axes: station Tair (°C) versus estimated Tair (°C), both from −5 to 40.

Figure 7: Spatial distribution of RMSE for six machine-learning models: (a) MLR-RMSE, (b) RF-RMSE, (c) KNN-RMSE, (d) DNN-RMSE, (e) GBTD-RMSE, and (f) XGB-RMSE. Color scale: temperature, 0.0°C–4.0°C.


is, the more strongly the radiation reaching the sensor is affected by the atmosphere, which may cause the differences in R2 of the estimated Tair between the southwestern and central regions.

For the MLR model, the bias for all of China was large. For the RF and KNN models, relatively high negative bias existed in southwestern China (e.g., Yunnan-Guizhou Plateau), as shown in Figure 9. This may be due to the relatively simple structure of the three models mentioned above, which cannot adequately simulate the complex Tair changes in China, resulting in underfitting. Besides, Tair was overestimated by the DNN model in northwestern China, which was the reason that the RMSE of the DNN model was also high in these regions. In contrast, the GBTD and XGB models had relatively low bias in northwestern China, where the absolute bias ranges from 2.0°C to 3.0°C. In conclusion, the bias is lower in the coastal areas and higher in the northwestern areas, which is mainly related to the characteristics of summer Tair change.

Figure 10 shows the time series of RMSE for the six models during the validation period. The RMSE of the MLR model was significantly higher than that of the other models, ranging from 2.5°C to 4.3°C. In contrast, the GBTD and XGB models showed a relatively lower RMSE (i.e., 1.8°C–2.2°C) than the RF, KNN, and DNN models.

Based on the above analysis, it is expected that the XGB model can provide a more reliable and accurate Tair estimation than the other models. To evaluate the contribution of predictive factors in the XGB model to Tair estimation, BT data (BT12 and BT13) and GFS data (GFS PWV and RH) were successively introduced (Table 3). As shown in Table 3, when only DEM, longitude, latitude, and Julian day were used as input variables, the RMSE of the XGB model was 3.003°C. The accuracy of Tair estimation was obviously improved, to an RMSE of 2.376°C, when BT12 and BT13 were included in the model. Moreover, when GFS PWV and RH were added to the base input variables instead, the RMSE of the XGB model decreased to 2.164°C, indicating important influences of GFS PWV and RH on the Tair estimation. These results are understandable because PWV and RH are the main parameters needed for atmospheric correction and LST retrieval. When both AGRI BTs and GFS data were introduced to the input variables, the RMSE of the XGB model improved by a further 0.218°C compared with introducing the GFS data alone (from 2.164°C to 1.946°C). This indicates that both GFS data and satellite observation data have an important role in improving the Tair estimation model. The RMSE of the Tair estimation model was less than 2.0°C when both satellite BTs and GFS data were introduced, which is considered to be the precision level of “accurate” [67].
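The contribution of each predictor group can be read off Table 3 as differences between rows. The short sketch below does that arithmetic; the RMSE values are copied from Table 3, while the dictionary keys and the helper `rmse_gain` are names introduced here purely for illustration.

```python
# RMSE values (degC) for the XGB model, copied from Table 3.
rmse_by_inputs = {
    "base": 3.003,         # DEM, longitude, latitude, Julian day
    "base+BT": 2.376,      # base plus BT12 and BT13
    "base+GFS": 2.164,     # base plus GFS PWV and GFS RH
    "base+BT+GFS": 1.946,  # base plus both predictor groups
}


def rmse_gain(reference, candidate):
    """RMSE reduction (degC) when moving from `reference` to `candidate` inputs."""
    return round(rmse_by_inputs[reference] - rmse_by_inputs[candidate], 3)


gain_bt_alone = rmse_gain("base", "base+BT")
gain_gfs_alone = rmse_gain("base", "base+GFS")
gain_bt_on_top_of_gfs = rmse_gain("base+GFS", "base+BT+GFS")
gain_gfs_on_top_of_bt = rmse_gain("base+BT", "base+BT+GFS")
```

Comparing the "alone" gains with the "on top of" gains gives a rough sense of how much information the two predictor groups share, under the assumption that the four runs differ only in their inputs.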

The relationship of the XGB model errors with altitude, observed Tair, and VZA was analyzed. Figure 11 shows scatter plots of the estimated Tair error against DEM, Tair, and VZA. The Tair error mainly ranges from −3°C to 3°C. The results showed positive deviation in high-altitude areas, which produced a larger RMSE than in low-altitude areas. The model showed a positive deviation when Tair was low while exhibiting a

Figure 8: Spatial distribution of R2 for six machine-learning models: (a) MLR-R2, (b) RF-R2, (c) KNN-R2, (d) DNN-R2, (e) GBTD-R2, and (f) XGB-R2. Color scale: 0.0–1.0.


Figure 9: Spatial distribution of bias for six machine-learning models: (a) MLR-Bias, (b) RF-Bias, (c) KNN-Bias, (d) DNN-Bias, (e) GBTD-Bias, and (f) XGB-Bias. Color scale: temperature, −3.0°C to 3.0°C.

Figure 10: Time series (June to August; 1st, 10th, 20th, and 30th) of RMSE of estimated Tair for six machine-learning models (MLR, RF, KNN, DNN, GBTD, and XGB). Axes: date (2018-06-01 to 2018-08-30) versus RMSE between estimated Tair and observed Tair (°C).

Table 3: The contribution of AGRI BTs and GFS data to the XGB Tair estimation model.

Predictive factors                                                        XGB model RMSE (°C)
DEM, longitude, latitude, Julian day                                      3.003
DEM, longitude, latitude, Julian day, BT12, BT13                          2.376
DEM, longitude, latitude, Julian day, GFS PWV, GFS RH                     2.164
DEM, longitude, latitude, Julian day, BT12, BT13, GFS PWV, GFS RH         1.946


negative bias for the high-air-temperature condition. Therefore, the model showed a larger RMSE in the lower- and higher-air-temperature conditions due to overestimation and underestimation, respectively. This is similar to the results of previous studies [38]. Furthermore, the uneven distribution of stations makes the applicability of the model in high-altitude areas poor. It is worth mentioning that the effect of VZA on model performance is negligible, as shown in Figure 11(c).

4. Conclusions

In this study, six machine-learning approaches (MLR, RF, KNN, DNN, GBTD, and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed in terms of the spatial and temporal characteristics of their performance. The validation results highlighted the high potential of Tair estimation approaches using machine learning and showed that the accuracy of the XGB model was better than that of the MLR, RF, KNN, GBTD, and DNN models at most sites for Tair estimation over China. The validation was performed using spatially and temporally independent data, and hence the model performance was considered to be quite reliable.

This study improves on previous studies in the following key areas. First, Tair estimation models were constructed based on FY-4A AGRI data and other auxiliary data. The results showed that high-temporal- and high-spatial-resolution Tair values (RMSE < 2.0°C) can be obtained based on FY-4A data. According to the study of Vazquez [67], the level of precision generally accepted as “accurate” for remote-sensing-based Tair estimation is between 1°C and 2°C. Second, the accuracy and performance of the six machine-learning models (MLR, RF, KNN, XGB, GBTD, and DNN) were compared and analyzed. The results showed that the XGB model can provide more stable and high-precision Tair estimation, which provides a reference for Tair estimation based on machine-learning models. Finally, the accuracy of Tair estimation based on satellite data can be effectively improved by adding numerical model (GFS) data. The experimental results showed that, when only satellite data were used for large-scale Tair estimation in China, the RMSE of the XGB model was 2.376°C, whereas the RMSE using satellite data combined with the numerical model data reached 1.946°C.

However, this study also has limitations. The dataset used is restricted to clear-sky conditions. Similarly, machine-learning algorithms cannot infer beyond the range of the observed Tair values: if Tair moves outside the range observed during the training period, the model must be retrained. Moreover, future research may explore whether adding other predictors, such as distance-to-coast and vegetation information (normalized difference vegetation index, etc.), can improve the accuracy of the Tair estimation models.

Data Availability

FY-4A AGRI data were downloaded from the China National Satellite Meteorological Center (NSMC) (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx). The GFS data were obtained from the National Centers for Environmental Prediction (https://www.nco.ncep.noaa.gov/pmb/products/gfs/). The meteorological station data were accessed at the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank the National Satellite Meteorological Center (NSMC) for providing FY-4A data, the China Meteorological Data Service Center (CMDC) for the meteorological data, and the National Centers for Environmental Prediction (NCEP) for the GFS data.

References

[1] M. P. Cresswell, A. P. Morse, M. C. Thomson, and S. J. Connor, “Estimating surface air temperatures from Meteosat land surface temperatures using an empirical solar zenith angle model,” International Journal of Remote Sensing, vol. 20, no. 6, pp. 1125–1132, 1999.

Figure 11: Scatter plot of Tair error with (a) DEM, (b) observed Tair, and (c) VZA for the XGB model. Axes: error (°C) versus DEM (m), observed Tair (°C), and viewing zenith angle (deg), respectively.


[2] L. Prihodko and S. N. Goward, “Estimation of air temperature from remotely sensed surface observations,” Remote Sensing of Environment, vol. 60, no. 3, pp. 335–346, 1997.

[3] Z. S. Venter, O. Brousse, I. Esau, and F. Meier, “Hyperlocal mapping of urban air temperature using remote sensing and crowdsourced weather data,” Remote Sensing of Environment, vol. 242, no. 1, Article ID 111791, 2020.

[4] H. C. Ho, A. Knudby, Y. Xu, M. Hodul, and M. Aminipouri, “A comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (Humidex), for the greater Vancouver area,” Science of the Total Environment, vol. 544, pp. 929–938, 2016.

[5] T. R. Lookingbill and D. L. Urban, “Spatial estimation of air temperature differences for landscape-scale studies in montane environments,” Agricultural & Forest Meteorology, vol. 114, no. 3-4, pp. 141–151, 2003.

[6] J. W. Hurrell and K. E. Trenberth, “Satellite versus surface estimates of air temperature since 1979,” Journal of Climate, vol. 9, no. 9, pp. 2222–2232, 1996.

[7] H. Liu, Q. Zhou, S. Zhang, and X. Deng, “Estimation of summer air temperature over China using Himawari-8 AHI and numerical weather prediction data,” Advances in Meteorology, vol. 2019, pp. 1–10, Article ID 2385310, 2019.

[8] M. Konda, N. Imasato, and A. Shibata, “A new method to determine near-sea surface air temperature by using satellite data,” Journal of Geophysical Research: Oceans, vol. 101, no. C6, pp. 14349–14360, 1996.

[9] U. Marcel, “Comparison of satellite-derived land surface temperature and air temperature from meteorological stations on the pan-arctic scale,” Remote Sensing, vol. 5, no. 5, pp. 2348–2367, 2013.

[10] W. Wagner, V. Naeimi, K. Scipal, R. de Jeu, and J. Martínez-Fernández, “Soil moisture from operational meteorological satellites,” Hydrogeology Journal, vol. 15, no. 1, pp. 121–131, 2007.

[11] C. Oppenheimer, “Review article: volcanological applications of meteorological satellites,” International Journal of Remote Sensing, vol. 19, no. 15, pp. 2829–2864, 1998.

[12] H. W. Yates, J. D. Tarpley, S. R. Schneider, D. F. McGinnis, and R. A. Scofield, “The role of meteorological satellites in agricultural remote sensing,” Remote Sensing of Environment, vol. 14, no. 1–3, pp. 219–233, 1984.

[13] C. O. Justice, “The moderate resolution imaging spectroradiometer (MODIS): land remote sensing for global change research,” IEEE Transactions on Geoscience & Remote Sensing, vol. 36, no. 4, pp. 1228–1249, 2002.

[14] D. A. Chu, “Remote sensing of smoke from MODIS airborne simulator during the SCAR-B experiment,” Journal of Geophysical Research: Atmospheres, vol. 103, no. D24, Article ID 31979, 1998.

[15] L. Mohammadi, N. Molanian, and A. Heidari, “Determination of the best coverage area for receiver stations of LEO remote sensing satellites,” in Proceedings of the 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008.

[16] K. Yumimoto, T. M. Nagao, M. Kikuchi et al., “Aerosol data assimilation using data from Himawari-8, a next-generation geostationary meteorological satellite,” Geophysical Research Letters, vol. 43, no. 11, pp. 5886–5894, 2016.

[17] K. Bessho, K. Date, M. Hayashi et al., “An introduction to Himawari-8/9: Japan's new-generation geostationary meteorological satellites,” Journal of the Meteorological Society of Japan. Ser. II, vol. 94, no. 2, pp. 151–183, 2016.

[18] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, “Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4,” Bulletin of the American Meteorological Society, vol. 98, no. 8, pp. 1637–1658, 2017.

[19] W. Hui, F. Huang, and R. Liu, “Characteristics of lightning signals over the Tibetan Plateau and the capability of FY-4A LMI lightning detection in the Plateau,” International Journal of Remote Sensing, vol. 41, no. 12, pp. 4605–4625, 2020.

[20] S. S. Chu, L. Zhu, H. F. Sun et al., “Automated volcanic hot-spot detection based on FY-4A/AGRI infrared data,” International Journal of Remote Sensing, vol. 41, no. 1, pp. 2410–2438, 2020.

[21] X. Zhang and W. Jiao, “Estimation of land surface temperature using geostationary meteorological satellite data,” Remote Sensing Technology & Application, vol. 28, no. 1, 2013.

[22] J. Xu, “Estimation of near-surface air temperature from HJ-1B satellite data in Northwest China,” Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, vol. 29, no. 22, pp. 145–153, 2013.

[23] X. Zhu, Q. Zhang, C.-Y. Xu, P. Sun, and P. Hu, “Reconstruction of high spatial resolution surface air temperature data across China: A new geo-intelligent multisource data-based machine learning technique,” Science of the Total Environment, vol. 665, pp. 300–313, 2019.

[24] J. Hrisko, P. Ramamurthy, Y. Yu, P. Yu, and D. Melecio-Vazquez, “Urban air temperature model using GOES-16 LST and a diurnal regressive neural network algorithm,” Remote Sensing of Environment, vol. 237, Article ID 111495, 2020.

[25] Q. Zhang, Y. Yu, W. Zhang, T. Luo, and X. Wang, “Cloud detection from FY-4A's geostationary interferometric infrared sounder using machine learning approaches,” Remote Sensing, vol. 11, no. 24, p. 3035, 2019.

[26] Y. Chen, G. Chen, C. Cui et al., “Retrieval of the vertical evolution of the cloud effective radius from the Chinese FY-4 (Feng Yun 4) next-generation geostationary satellites,” Atmospheric Chemistry and Physics, vol. 20, no. 2, pp. 1131–1145, 2020.

[27] I. Kloog, F. Nordio, B. A. Coull, and J. Schwartz, “Predicting spatiotemporal mean air temperature using MODIS satellite surface temperature measurements across the Northeastern USA,” Remote Sensing of Environment, vol. 150, pp. 132–139, 2014.

[28] N. Janatian, M. Sadeghi, S. Hossein Sanaeinejad et al., “A statistical framework for estimating air temperature using MODIS land surface temperature data,” International Journal of Climatology, vol. 37, no. 3, 2016.

[29] R. Huang, J.-x. Huang, C. Zhang et al., “Soil temperature estimation at different depths using remotely-sensed data,” Journal of Integrative Agriculture, vol. 19, no. 1, pp. 277–290, 2020.

[30] J. Y. Fang and K. Yoda, “Climate and vegetation in China (I). Changes in the altitudinal lapse rate of temperature and distribution of sea level temperature,” Ecological Research, vol. 3, no. 1, pp. 37–51, 1988.

[31] C. Du, R. Huazhong, Q. Qin, S. Zhao, and J. Meng, “Research of split-window algorithm for retrieval of land surface temperature from Landsat 8 data,” Journal of Geomatics, vol. 39, pp. 73–77, 2014.

[32] F. Chen, Y. Liu, Q. Liu, and F. Qin, “A statistical method based on remote sensing for the estimation of air temperature in China,” International Journal of Climatology, vol. 35, 2014.

[33] P. S. G. de Mattos Neto, G. D. C. Cavalcanti, P. R. A. Firmino, E. G. Silva, and S. R. P. Vila Nova Filho, “A temporal-window framework for modelling and forecasting time series,” Knowledge-Based Systems, vol. 193, Article ID 105476, 2020.

[34] R. Bycroft, J. X. Leon, and D. Schoeman, “Comparing random forests and convoluted neural networks for mapping ghost crab burrows using imagery from an unmanned aerial vehicle,” Estuarine, Coastal and Shelf Science, vol. 224, pp. 84–93, 2019.

[35] Y. Lee, D. Han, M.-H. Ahn, J. Im, and S. J. Lee, “Retrieval of total precipitable water from Himawari-8 AHI data: A comparison of random forest, extreme gradient boosting, and deep neural network,” Remote Sensing, vol. 11, no. 15, p. 1741, 2019.

[36] J. García-Gutiérrez, F. Martínez-Álvarez, A. Troncoso, and J. C. Riquelme, “A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables,” Neurocomputing, vol. 167, no. 1, pp. 24–31, 2015.

[37] D. Upreti, W. Huang, W. Kong et al., “A comparison of hybrid machine learning algorithms for the retrieval of wheat biophysical variables from Sentinel-2,” Remote Sensing, vol. 11, no. 5, p. 481, 2019.

[38] R. Li, L. Cui, H. Fu, Y. Meng, J. Li, and J. Guo, “Estimating high-resolution PM1 concentration from Himawari-8 combining extreme gradient boosting-geographically and temporally weighted regression (XGBoost-GTWR),” Atmospheric Environment, vol. 229, Article ID 117434, 2020.

[39] G. Papacharalampous and H. Tyralis, “Hydrological time series forecasting using simple combinations: Big data testing and investigations on one-year ahead river flow predictability,” Journal of Hydrology, vol. 590, Article ID 125205, 2020.

[40] R. Pérez-Chacón, G. Asencio-Cortés, F. Martínez-Álvarez, and A. Troncoso, “Big data time series forecasting based on pattern sequence similarity and its application to the electricity demand,” Information Sciences, vol. 540, pp. 160–174, 2020.

[41] H. Abdollahi, “A novel hybrid model for forecasting crude oil price based on time series decomposition,” Applied Energy, vol. 267, Article ID 115035, 2020.

[42] R. S. dos Santos, “Estimating spatio-temporal air temperature in London (UK) using machine learning and earth observation satellite data,” International Journal of Applied Earth Observation and Geoinformation, vol. 88, Article ID 102066, 2020.

[43] H. J. Richardson, D. J. Hill, D. R. Denesiuk, and L. H. Fraser, “A comparison of geographic datasets and field measurements to model soil carbon using random forests and stepwise regressions (British Columbia, Canada),” GIScience & Remote Sensing, vol. 54, no. 4, pp. 573–591, 2017.

[44] H. Franco-Lopez, A. R. Ek, and M. E. Bauer, “Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method,” Remote Sensing of Environment, vol. 77, no. 3, pp. 251–274, 2001.

[45] R. Haapanen, A. R. Ek, M. E. Bauer, and A. O. Finley, “Delineation of forest/nonforest land use classes using nearest neighbor methods,” Remote Sensing of Environment, vol. 89, no. 3, pp. 265–271, 2004.

[46] M. Belgiu and L. Dragut, “Random forest in remote sensing: a review of applications and future directions,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, no. 114, pp. 24–31, 2016.

[47] M. J. Cracknell and A. M. Reading, “The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines,” Geophysics, vol. 78, no. 3, 2013.

[48] X. Ye, X. Yang, X. Xiong, Y. Shen, M. Hao, and R. Gu, “A quality control method based on an improved random forest algorithm for surface air temperature observations,” Advances in Meteorology, vol. 2017, pp. 1–15, Article ID 8601296, 2017.

[49] B. Babar, L. T. Luppino, T. Bostrom, and S. N. Anfinsen, “Random forest regression for improved mapping of solar irradiance at high latitudes,” Solar Energy, vol. 198, pp. 81–92, 2020.

[50] L. V. Utkin, M. S. Kovalev, and F. P. A. Coolen, “Imprecise weighted extensions of random forests for classification and regression,” Applied Soft Computing, vol. 92, Article ID 106324, 2020.

[51] L. Liu, M. Ji, and M. Buchroithner, “Combining partial least squares and the gradient-boosting method for soil property retrieval using visible near-infrared shortwave infrared spectra,” Remote Sensing, vol. 9, no. 12, p. 1299, 2017.

[52] J. Son, I. Jung, K. Park, and B. Han, “Tracking-by-Segmentation with online gradient boosting decision tree,” in Proceedings of the International Conference on Computer Vision, Las Condes, Chile, December 2015.

[53] H. Mo, H. Sun, J. Liu, and S. Wei, “Developing window behavior models for residential buildings using XGBoost algorithm,” Energy and Buildings, vol. 205, no. 15, Article ID 109564, 2019.

[54] M. H. D. M. Ribeiro and L. dos Santos Coelho, “Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series,” Applied Soft Computing, vol. 86, Article ID 105837, 2020.

[55] I. B. Mustapha and F. Saeed, “Bioactive molecule prediction using extreme gradient boosting,” Molecules, vol. 21, no. 8, p. 983, 2016.

[56] C. Li, “Power load forecasting based on the combined model of LSTM and XGBoost,” in Proceedings of the International Conference on Pattern Recognition, Wenzhou, China, June 2019.

[57] Z.-Y. Chen, T.-H. Zhang, R. Zhang et al., “Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China,” Atmospheric Environment, vol. 202, pp. 180–189, 2019.

[58] S. Zhao, D. Zeng, W. Wang et al., “Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals,” Measurement, vol. 159, Article ID 107777, 2020.

[59] S. J. Lee, M.-H. Ahn, and Y. Lee, “Application of an artificial neural network for a direct estimation of atmospheric instability from a next-generation imager,” Advances in Atmospheric Sciences, vol. 33, no. 2, pp. 221–232, 2016.

[60] J. Tang, C. Deng, G.-B. Huang, and B. Zhao, “Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1174–1185, 2015.

[61] G. T. Ribeiro, V. C. Mariani, and L. Coelho, “Enhanced ensemble structures using wavelet neural networks applied to short-term load forecasting,” Engineering Applications of Artificial Intelligence, vol. 82, pp. 272–281, 2019.

[62] W. Wang, X. Sun, R. Zhang, Z. Li, Z. Zhu, and H. Su, “Multi-layer perceptron neural network based algorithm for estimating precipitable water vapour from MODIS NIR data,” International Journal of Remote Sensing, vol. 27, no. 3, pp. 617–621, 2006.

[63] K. I. Chronopoulos, I. X. Tsiros, I. F. Dimopoulos, and N. Alvertos, “An application of artificial neural network models to estimate air temperature data in areas with sparse network of meteorological stations,” Journal of Environmental Science and Health, Part A, vol. 43, no. 14, pp. 1752–1757, 2008.

[64] S. Rodrigues Moreno, R. Gomes da Silva, V. Cocco Mariani, and L. dos Santos Coelho, “Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network,” Energy Conversion and Management, vol. 213, Article ID 112869, 2020.

[65] A. Bayat and S. Mashhadizadeh Maleki, “Comparison of precipitable water vapor derived from AIRS and SPM measurements and its correlation with surface temperature of 29 synoptic stations over Iran,” Journal of Atmospheric and Solar-Terrestrial Physics, vol. 178, pp. 24–31, 2018.

[66] W. W. Gong, “Evaluation of surface meteorological elements from several numerical models in China,” Climatic and Environmental Research, vol. 20, no. 1, pp. 53–62, 2015.

[67] D. Pozo Vazquez, F. J. Olmo Reyes, and L. Alados Arboledas, “A comparative study of algorithms for estimating land surface temperature from AVHRR Data,” Remote Sensing of Environment, vol. 62, no. 3, pp. 215–222, 1997.


trees (n_estimators), the minimum number of samples (min_samples_leaf), and the maximum depth of a tree (max_depth). The result of parameter selection is n_estimators = 200, min_samples_leaf = 50, and max_depth = 3.

The principle of GBTD is to sequentially apply a classification algorithm to the weighted version of the training data [51, 52], descending along the gradient direction of the previously established model loss function, and then perform a weighted majority vote on the resulting classifier sequence. As an improved algorithm of GBTD, XGB uses all data in each iteration, which is similar to RF [53, 54]. Therefore, XGB reduces the complexity of the model and makes the learned model simpler [35, 54–58]. In this study, four hyperparameters in the GBTD and XGB models, i.e., n_estimators, max_depth, learning_rate (lr), and the minimum loss reduction required to make a further partition on a leaf node of the tree (gamma), were empirically tuned based on RMSE. The optimum n_estimators, gamma, max_depth, and lr in the two models were 500, 0.2, 5, and 0.1, respectively.
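The idea of descending along the gradient of the loss can be illustrated with a deliberately small regression example. The sketch below is not the GBTD or XGB implementation used in this study: it assumes squared-error loss (so the negative gradient is simply the residual), a single predictor with at least two distinct values, and one-split "stumps" in place of depth-limited trees. Only the names `n_estimators` and `learning_rate` mirror the hyperparameters above; everything else, including the toy data, is hypothetical.

```python
from math import sqrt
from statistics import mean


def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keeps both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def fit_gradient_boosting(x, y, n_estimators=50, learning_rate=0.1):
    """Additive model: mean of y plus shrunken stumps fitted to residuals."""
    base = mean(y)
    stumps = []
    predictions = [base] * len(y)
    for _ in range(n_estimators):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


def rmse(predicted, observed):
    return sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))


# Hypothetical toy data: a noiseless linear relationship.
xs = [float(i) for i in range(10)]
ys = [2.0 * v for v in xs]

model = fit_gradient_boosting(xs, ys, n_estimators=200, learning_rate=0.1)
baseline_rmse = rmse([mean(ys)] * len(ys), ys)   # constant (mean) predictor
boosted_rmse = rmse([model(v) for v in xs], ys)  # after 200 boosting rounds
```

A smaller `learning_rate` shrinks each stump's contribution, which is why it is usually tuned together with `n_estimators`, as was done for the two boosting models here.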

An artificial neural network (ANN) is a biologically inspired machine-learning method [59]. Here, the DNN, a subset of ANN with multiple hidden layers, uses a fully connected structure, which has the ability to learn time and space relationships [60, 61]. It adjusts the connection strength between neurons through back-propagation and minimizes the prediction error iteratively [62–64]. The DNN model was tested with one to five hidden layers and 5–200 neurons in five intervals. In addition, some widely used optimizers (i.e., stochastic gradient descent, RMSProp, and Adam) were tested by comparing the calculated results. In this study, the hyperparameters of the DNN were set as follows: batch_size = 128, dropout_rate = 0.1, stop_steps = 20 (if the validation-set loss was not improved within 20 steps, training was terminated), and learning rate = 0.001. Adam was chosen as the optimizer, the number of hidden layers was three, and the number of hidden neurons was 256.
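The stop_steps rule described above is a form of early stopping. The following sketch shows one common way to express it, assuming the validation loss is recorded once per epoch and that "improved" means strictly lower than the best value so far; the function name and the loss sequence are hypothetical and not taken from the authors' code.

```python
def early_stopping_epoch(val_losses, patience=20):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    the index of the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation losses: three improving epochs, then a plateau.
losses = [1.0, 0.9, 0.8] + [0.85] * 30
stop_epoch = early_stopping_epoch(losses, patience=20)
```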

2.4. Error Analyses. Four statistical factors, namely, the determination coefficient (R2), RMSE, MSE, and mean bias (bias), were used to evaluate the accuracy of the Tair estimation model as follows:

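$$R^2 = \frac{\left[\sum \left(T_{ea} - \overline{T_{ea}}\right)\left(T_{oa} - \overline{T_{oa}}\right)\right]^2}{\sum \left(T_{ea} - \overline{T_{ea}}\right)^2 \sum \left(T_{oa} - \overline{T_{oa}}\right)^2}, \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)^2}{N}}, \tag{2}$$

$$\mathrm{bias} = \frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)}{N}, \tag{3}$$

$$\mathrm{MSE} = \frac{\sum_{i=1}^{N} \left(T_{ea} - T_{oa}\right)^2}{N}, \tag{4}$$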

where Tea is the estimated Tair Toa is the observed Tair at themeteorological stations and N is the sample size
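The four metrics can be computed directly from paired estimated and observed values. The sketch below follows equations (1)–(4); the function names and the sample values are hypothetical and serve only as an illustration.

```python
from math import sqrt


def _mean(values):
    return sum(values) / len(values)


def bias(est, obs):
    """Mean error, equation (3): positive values indicate overestimation."""
    return _mean([e - o for e, o in zip(est, obs)])


def mse(est, obs):
    """Mean squared error, equation (4)."""
    return _mean([(e - o) ** 2 for e, o in zip(est, obs)])


def rmse(est, obs):
    """Root-mean-square error, equation (2)."""
    return sqrt(mse(est, obs))


def r2(est, obs):
    """Squared Pearson correlation between estimates and observations, equation (1)."""
    mean_est, mean_obs = _mean(est), _mean(obs)
    cov = sum((e - mean_est) * (o - mean_obs) for e, o in zip(est, obs))
    var_est = sum((e - mean_est) ** 2 for e in est)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    return cov ** 2 / (var_est * var_obs)


# Hypothetical paired values in degrees Celsius.
estimated = [21.0, 25.0, 30.0, 18.0]
observed = [20.0, 26.0, 29.0, 19.0]
```

Note that R² as defined in equation (1) is a squared correlation, so it is insensitive to a constant offset or scale error in the estimates; this is one reason to report it together with bias and RMSE.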

3. Results and Discussion

In this section, the results of the variable importance analysis are presented, and the performance of the six machine-learning models is verified. The spatial distribution characteristics of the Tair errors of each model are also analyzed.

3.1. Variable Importance Results. Correlation analysis was performed to analyze the linear relationship between Tair and BT12, BT13, GFS PWV, GFS RH, DEM, longitude (LONG), latitude (LAT), and Julian day (JD). Table 2 shows the correlation coefficient matrix of these variables.

As shown in Figure 4(a), GFS PWV, DEM, BT12, and BT13 had a better correlation with Tair than the other variables, and the R values of the four variables were 0.635, −0.596, 0.459, and 0.413, respectively. This indicated that these variables played more important roles in the linear Tair estimation models. However, the Pearson correlation coefficient describes only the linear correlation between two variables; it cannot identify a nonlinear relationship between them. Therefore, the variable importance of the RF algorithm was also analyzed (Figure 4(b)). The RF algorithm modeled the nonlinear relationship well. The GFS PWV was identified as the most important variable for Tair estimation in the RF model, while the GFS RH and BT12 also played more important roles than the other predictors. Therefore, PWV and RH were used as inputs to effectively improve the accuracy of Tair estimation, which was consistent with the previous study [65].
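The limitation of the Pearson coefficient noted above can be made concrete with a small example. In the sketch below, a purely quadratic dependence on a symmetric predictor yields a correlation of zero even though the target is fully determined by the predictor. The data are synthetic and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y_linear = [2.0 * xi + 1.0 for xi in x]   # perfectly linear dependence
y_quadratic = [xi ** 2 for xi in x]       # perfectly nonlinear dependence

r_linear = pearson_r(x, y_linear)
r_quadratic = pearson_r(x, y_quadratic)
```

Here `r_linear` equals 1 and `r_quadratic` equals 0, which is why a model-based importance measure is a useful complement to the correlation matrix.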

3.2. Model Performance Results. To evaluate the overall performance of each model, a 10-fold cross-validation method was used. K-fold cross-validation was used for model configuration selection. With K = 10, the dataset was randomly and equally divided into K groups. One group was held out for testing, and the remaining K − 1 groups were used for training. Over a total of K validations, the model performance was calculated using a different test fold for each validation [35]. Finally, the average validation results were used to evaluate the overall performance of each model.
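The K-fold procedure can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the shuffling seed, the function names, and the placeholder "model" (which simply predicts the mean of the training targets) are all hypothetical.

```python
import random
from math import sqrt


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k random, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_rmse(y, k=10, seed=0):
    """Average test-fold RMSE of a placeholder model that predicts the training mean."""
    fold_rmse = []
    for train, test in k_fold_indices(len(y), k, seed):
        train_mean = sum(y[i] for i in train) / len(train)
        fold_rmse.append(sqrt(sum((y[i] - train_mean) ** 2 for i in test) / len(test)))
    return sum(fold_rmse) / k, fold_rmse


# Hypothetical targets in degrees Celsius.
y = [15.0 + 0.1 * i for i in range(100)]
mean_rmse, fold_rmse = cross_validated_rmse(y)
```

Each sample appears in exactly one test fold, so the spread of `fold_rmse` gives a sense of how stable a model is across different subsets of the data, which is what the boxplots in Figure 5 summarize.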

Figure 5 illustrates the performance of the six models in terms of different statistical parameters, including RMSE, bias, MSE, and R². The MLR model had the lowest performance of the six models. The variation ranges of RMSE, bias, MSE, and R² in the MLR model were quite wide; the RMSE alone ranged from 1.602°C to 4.487°C. In contrast, the DNN model used in this study had better overall performance and higher efficiency than the other five models. The DNN model showed the highest accuracy, with an average RMSE of 1.736°C. The range of RMSE in the DNN model was 0.852°C–2.584°C, showing good concentration and stability, as presented in Figure 5(a). In addition, among the remaining models, the overall performances of the XGB and GBTD models were equivalent and better than those of the MLR, KNN, and RF models.

3.3. Validation Results. Model performance was used as an indicator to internally validate each model. The model accuracy must also be evaluated with a dataset that was not used for training or testing. To validate the developed MLR, RF, KNN, GBTD, XGB, and DNN models, observed data not used for either training or testing were utilized (validation dataset in Section 2.3.1). Figure 6 illustrates the quantitative validation results of the estimated Tair during the validation time (the 1st, 10th, 20th, and 30th of June–August 2018). Compared with the results on the test dataset, the overall accuracy of the six models on the validation dataset decreased. For example, for the DNN model, the RMSE of Tair using the test dataset was 1.736°C, while that of the validation results was 2.006°C. This difference may be caused by overfitting due to the fact that the best model was not selected based on the final validation results [35].

The biases of the MLR, RF, DNN, GBTD, and XGB models were within ±0.2°C, indicating no obvious overestimation or underestimation. In contrast, the KNN model showed a larger negative bias of −0.492°C. The reason may be that the KNN model had poor robustness. Robustness mainly depends on the dataset, and poor robustness makes a model difficult to apply directly to other cases, so the KNN model had a low bias on the test dataset and a high bias on the validation dataset.

The XGB model had excellent modeling performance, with R² of 0.902. The R² values of the GBTD and DNN models were 0.898 and 0.890, respectively, and the R² values of the remaining three models were less than 0.89. Moreover, compared with the other models, the XGB and GBTD models can repeatedly learn to generate a weighted average of the weak learners. Therefore, the XGB and GBTD models showed a relatively better performance on the validation dataset at most sites. In general, the XGB model showed a higher overall performance than the other five models on the validation dataset.

Table 2: Pearson correlation matrix for variables considered in the Tair estimation model.

| | Tair | BT12 | BT13 | GFS PWV | GFS RH | DEM | LONG | LAT | JD |
|---|---|---|---|---|---|---|---|---|---|
| Tair | 1.000 | 0.459 | 0.413 | 0.635 | −0.182 | −0.596 | 0.303 | −0.288 | −0.383 |
| BT12 | 0.459 | 1.000 | 0.995 | 0.047 | −0.256 | −0.287 | 0.199 | −0.022 | −0.099 |
| BT13 | 0.413 | 0.995 | 1.000 | 0.010 | −0.249 | −0.268 | 0.190 | 0.002 | −0.077 |
| GFS PWV | 0.635 | 0.047 | 0.010 | 1.000 | 0.383 | −0.585 | 0.366 | −0.463 | −0.310 |
| GFS RH | −0.182 | −0.256 | −0.249 | 0.383 | 1.000 | −0.020 | 0.189 | −0.355 | −0.046 |
| DEM | −0.596 | −0.287 | −0.268 | −0.585 | −0.020 | 1.000 | −0.663 | 0.057 | 0.003 |
| LONG | 0.303 | 0.199 | 0.190 | 0.366 | 0.189 | −0.663 | 1.000 | 0.134 | −0.004 |
| LAT | −0.288 | −0.022 | 0.002 | −0.463 | −0.355 | 0.057 | 0.134 | 1.000 | −0.006 |
| JD | −0.383 | −0.099 | −0.077 | −0.310 | −0.046 | 0.003 | −0.004 | −0.006 | 1.000 |

Data represent the correlation coefficient between different variables.

Figure 4: (a) Relative variable importance identified by Pearson correlation coefficient and (b) RF variable importance. Panel (a) plots the correlation coefficient of each predictor with Tair (values as in Table 2); panel (b) plots attribute usage, with bar values of 0.444 for PWV, 0.165 for RH, 0.165 for BT12, 0.080 for DEM, 0.065 for JD, 0.048 for LAT, 0.021 for LON, and 0.012 for BT13.

The Tair estimation models based on satellite and numerical forecast data are susceptible to factors such as altitude and surface roughness. To further evaluate the applicability of these models, the spatial distribution of the errors at the meteorological stations was evaluated (Figures 7–9).

It can be seen that the Tair estimation errors of all models showed obvious spatial distribution characteristics (Figure 7). Generally, for each model, the RMSE is relatively low in the eastern regions (e.g., Guangdong Province) and high in the northwestern regions (e.g., Xinjiang Province). For example, the RMSE of the XGB model in Guangdong Province was approximately 1.2°C–1.8°C, while that in Xinjiang Province was about 2.0°C–3.2°C. Because the northwestern regions have relatively wide Tair changes between day and night, high altitude, and few meteorological observations, the accuracy difference between northwestern and eastern China is obvious. Moreover, the RMSE of the KNN, DNN, GBTD, and XGB models was relatively low in the eastern and southern regions. However, the MLR, RF, KNN, and DNN models had a higher RMSE in northwestern China. In contrast, the GBTD and XGB models had a relatively smaller RMSE in northwestern China because the GBTD and XGB models can generate repeated weighted averages to adjust the applicability of different regions through repeated learning of numerous data.

Furthermore, Gong's study (2015) [66] illustrated that the RMSE of GFS Tair reaches 1.5°C–3.0°C in most eastern regions and is above 3.5°C in the northwestern regions. By contrast, the results showed that the RMSE of Tair estimated by the DNN, XGB, and GBTD models was obviously lower than that of the GFS data. In the present study, the RMSE of the XGB model was 1.0°C–2.0°C in most eastern regions, and it was below 3.5°C in the northwestern regions. In addition, RMSE < 2.0°C accounted for 48.2% and RMSE < 2.5°C accounted for 87.6% in the XGB model.
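Statistics of this kind can be derived by computing one RMSE per station and then counting the share of stations below a threshold. The sketch below assumes that the reported percentages are taken over stations, which the text does not state explicitly; the station identifiers, records, and function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def station_rmse(records):
    """Compute RMSE per station from (station_id, estimated, observed) records."""
    squared_errors = defaultdict(list)
    for station, estimated, observed in records:
        squared_errors[station].append((estimated - observed) ** 2)
    return {s: sqrt(sum(v) / len(v)) for s, v in squared_errors.items()}


def share_below(rmse_by_station, threshold):
    """Fraction of stations whose RMSE is strictly below the threshold."""
    values = list(rmse_by_station.values())
    return sum(v < threshold for v in values) / len(values)


# Hypothetical records: (station_id, estimated Tair, observed Tair) in degrees Celsius.
records = [
    ("A", 21.0, 20.0), ("A", 24.0, 25.0),   # RMSE 1.0
    ("B", 13.0, 10.0), ("B", 9.0, 12.0),    # RMSE 3.0
    ("C", 32.0, 30.0), ("C", 26.0, 28.0),   # RMSE 2.0
    ("D", 18.5, 18.0),                      # RMSE 0.5
]
rmse_by_station = station_rmse(records)
share_2_0 = share_below(rmse_by_station, 2.0)
share_2_5 = share_below(rmse_by_station, 2.5)
```

Grouping by station before averaging keeps densely sampled stations from dominating the summary, which matters when station coverage is uneven, as it is between eastern and northwestern China.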

Figure 5: Boxplots of performance evaluation for the six models (MLR, KNN, GBTD, RF, XGB, and DNN) using the test dataset in terms of (a) RMSE (°C), (b) bias (°C), (c) MSE, and (d) R².

Figure 6: Two-dimensional histograms of predicted Tair data against meteorological observed Tair data for the six machine-learning models: (a) MLR model, (b) RF model, (c) KNN model, (d) DNN model, (e) GBTD model, and (f) XGB model. Axes: station Tair (°C) versus estimated Tair (°C), both from −5 to 40. Statistics shown in the panels:
(a) MLR: RMSE = 2.840°C, bias = 0.055°C, R² = 0.791
(b) RF: RMSE = 2.233°C, bias = −0.185°C, R² = 0.878
(c) KNN: RMSE = 2.120°C, bias = −0.492°C, R² = 0.884
(d) DNN: RMSE = 2.006°C, bias = 0.185°C, R² = 0.890
(e) GBTD: RMSE = 1.984°C, bias = −0.122°C, R² = 0.898
(f) XGB: RMSE = 1.946°C, bias = −0.087°C, R² = 0.902

Figure 7: Spatial distribution of RMSE for six machine-learning models: (a) MLR-RMSE, (b) RF-RMSE, (c) KNN-RMSE, (d) DNN-RMSE, (e) GBTD-RMSE, and (f) XGB-RMSE. Maps cover 80°E–130°E and 20°N–50°N; the color scale is temperature (°C) from 0.0 to 4.0.

The six models showed the same distribution trend, as shown in Figure 8, with R² being higher in the eastern regions and gradually becoming lower toward the southwestern regions. Compared with the central regions (e.g., Henan Province), the viewing zenith angle (VZA) of AGRI over western China is larger. The larger the VZA is, the more the radiation reaching the sensor is affected by the atmosphere, which may cause differences in R² of the estimated Tair between the southwestern and central regions.

For the MLR model, the bias for all of China was large. For the RF and KNN models, relatively high negative bias existed in southwestern China (e.g., Yunnan-Guizhou Plateau), as shown in Figure 9. This may be due to the relatively simple structure of the three models mentioned above, which cannot simulate the complex Tair changes in China well, resulting in underfitting. Besides, Tair estimated by the DNN model was overestimated in northwestern China, which was the reason that the RMSE of the DNN model was also high in these regions. In contrast, the GBTD and XGB models had relatively low bias in northwestern China, where the absolute bias ranges from 2.0°C to 3.0°C. In conclusion, the bias is lower in the coastal areas and higher in the northwestern areas, which is mainly related to the characteristics of summer Tair change.

Figure 10 shows the time series of RMSE for the six models during the validation period. The RMSE of the MLR model was significantly higher than that of the other models, ranging from 2.5°C to 4.3°C. In contrast, the GBTD and XGB models showed a relatively lower RMSE (i.e., 1.8°C–2.2°C) than the RF, KNN, and DNN models.

Based on the above analysis, it is expected that the XGB model can provide a more reliable and accurate Tair estimation than the other models. To evaluate the contribution of the predictive factors in the XGB model to Tair estimation, BT data (BT12 and BT13) and GFS data (GFS PWV and RH) were successively introduced (Table 3). As shown in Table 3, when only DEM, longitude, latitude, and Julian day were used as input variables, the RMSE of the XGB model was 3.003°C. The accuracy of Tair estimation was obviously improved when BT12 and BT13 were included in the model. Moreover, when GFS PWV and RH, rather than the BTs, were added to the input variables, the RMSE of the XGB model decreased to 2.164°C, indicating important influences of GFS PWV and RH on the Tair estimation. These results are understandable because PWV and RH are the main parameters needed for atmospheric correction and LST retrieval. When both AGRI BTs and GFS data were introduced to the input variables, the RMSE of the XGB model was improved by 0.228°C compared with introducing GFS data alone. This indicates that both GFS data and satellite observation data have an important role in improving the Tair estimation model. The RMSE of the Tair estimation model was less than 2.0°C when both satellite BTs and GFS data were introduced, which is considered to be the precision level of “accurate” [67].

Figure 8: Spatial distribution of R² for six machine-learning models: (a) MLR-R², (b) RF-R², (c) KNN-R², (d) DNN-R², (e) GBTD-R², and (f) XGB-R². Maps cover 80°E–130°E and 20°N–50°N; the color scale runs from 0.0 to 1.0.

Figure 9: Spatial distribution of bias for six machine-learning models: (a) MLR-Bias, (b) RF-Bias, (c) KNN-Bias, (d) DNN-Bias, (e) GBTD-Bias, and (f) XGB-Bias. Maps cover 80°E–130°E and 20°N–50°N; the color scale is temperature (°C) from −3.0 to 3.0.

Figure 10: Time series (June to August; 1st, 10th, 20th, and 30th) of RMSE of estimated Tair for six machine-learning models (MLR, RF, KNN, DNN, GBTD, and XGB). Axes: date (2018-06-01 to 2018-08-30) versus RMSE between estimated Tair and observed Tair (°C).

Table 3: The contribution of AGRI BTs and GFS data to the XGB Tair estimation model.

| Predictive factors | XGB model RMSE (°C) |
|---|---|
| DEM, longitude, latitude, Julian day | 3.003 |
| DEM, longitude, latitude, Julian day, BT12, BT13 | 2.376 |
| DEM, longitude, latitude, Julian day, GFS PWV, GFS RH | 2.164 |
| DEM, longitude, latitude, Julian day, BT12, BT13, GFS PWV, GFS RH | 1.946 |

The relationship of the XGB model errors with altitude, observed Tair, and VZA was analyzed. Figure 11 shows scatter plots of the estimated Tair error against DEM, Tair, and VZA. It can be seen that the Tair error mainly ranges from −3°C to 3°C. The results showed a positive deviation in high-altitude areas, which produced a larger RMSE than in low-altitude areas. The model showed a positive deviation when Tair was low, while exhibiting a negative bias under high-air-temperature conditions. Therefore, the model showed a larger RMSE in the lower- and higher-air-temperature conditions due to overestimation and underestimation, respectively. This is similar to the results of previous studies [38]. Furthermore, the uneven distribution of stations makes the applicability of the model in high-altitude areas poor. It is worth mentioning that the effect of VZA on model performance is negligible, as shown in Figure 11(c).
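The dependence of the error on observed Tair described above can be quantified by averaging the error within temperature bins. The sketch below is an illustration only: the bin edges, the function name, and the toy estimator (which shrinks every value toward 20°C and therefore overestimates cold cases and underestimates warm ones) are hypothetical and do not reproduce the XGB results.

```python
def binned_bias(observed, estimated, edges):
    """Mean error (estimated - observed) within each [edges[i], edges[i+1]) bin of observed."""
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for obs, est in zip(observed, estimated):
        for i in range(len(edges) - 1):
            if edges[i] <= obs < edges[i + 1]:
                sums[i] += est - obs
                counts[i] += 1
                break
    # Empty bins are reported as None rather than as a misleading zero.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical observations and a toy estimator that regresses toward 20 degrees Celsius.
observed = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
estimated = [20.0 + 0.8 * (t - 20.0) for t in observed]

bias_by_bin = binned_bias(observed, estimated, edges=[0.0, 15.0, 30.0, 45.0])
```

For this toy estimator the cold bin has a positive mean error, the middle bin is close to zero, and the warm bin has a negative mean error, which is the same qualitative pattern as the one reported for Figure 11(b).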

4. Conclusions

In this study, six machine-learning approaches (MLR, RF, KNN, DNN, GBTD, and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed in terms of the spatial and temporal characteristics of their performance. The validation results highlighted the high potential of Tair estimation approaches using machine learning and showed that the accuracy of the XGB model was better than that of the MLR, RF, KNN, GBTD, and DNN models at most sites for Tair estimation over China. The validation was performed using spatially and temporally independent data, and hence the model performance was considered to be quite reliable.

This study improves on previous studies in the following key areas. First, Tair estimation models were constructed based on FY-4A AGRI data and other auxiliary data. The results showed that high-temporal- and high-spatial-resolution Tair values (RMSE < 2.0°C) can be obtained based on FY-4A data. According to the study of Vazquez [67], the level of precision generally accepted as “accurate” for remote-sensing-based Tair estimation is between 1°C and 2°C. Second, the accuracy and performance of the six machine-learning models (MLR, RF, KNN, XGB, GBTD, and DNN) were compared and analyzed. The results showed that the XGB model can provide more stable and high-precision Tair estimation, which provides a reference for Tair estimation based on machine-learning models. Finally, the accuracy of Tair estimation based on satellite data can be effectively improved by adding numerically modeled Tair data. The experimental results showed that when only satellite data were used for large-scale Tair estimation in China, the RMSE of the XGB model was 2.376°C, whereas the RMSE using satellite data combined with numerically modeled Tair data reached 1.946°C.

However, aside from the novelties of this study, one limitation of the dataset used is the restriction to clear-sky conditions. Similarly, machine-learning algorithms cannot infer beyond the range of the observed Tair values. If the Tair value moves beyond the range observed within the current training period, the model must be retrained. Moreover, future research may explore whether adding other predictors, such as distance-to-coast and vegetation information (normalized difference vegetation index, etc.), can improve the accuracy of the Tair estimation models.
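The extrapolation limit can be illustrated with one of the model families compared here. A KNN regressor that averages the targets of its nearest neighbors can never predict a value outside the range of its training targets. The sketch below uses one synthetic predictor and a hypothetical linear relation; it is not the KNN configuration used in this study.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k training points nearest to the query."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


# Hypothetical training data: the target follows 2 * x + 5, observed only for x in [0, 10].
train_x = [float(v) for v in range(11)]
train_y = [2.0 * v + 5.0 for v in train_x]   # targets span 5 to 25

inside = knn_predict(train_x, train_y, 5.0)    # query inside the training range
outside = knn_predict(train_x, train_y, 20.0)  # query far outside the training range
true_outside = 2.0 * 20.0 + 5.0                # what the underlying relation would give
```

Inside the training range the prediction matches the underlying relation, whereas for the out-of-range query the prediction stays near the largest training targets and falls well short of `true_outside`. Tree-based ensembles behave similarly because their leaves store averages of training targets, which is why retraining is needed when conditions move beyond the training period.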

Data Availability

FY-4A AGRI data were downloaded from the China National Satellite Meteorological Center (NSMC) (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx). The GFS data were obtained from the National Centers for Environmental Prediction (https://www.nco.ncep.noaa.gov/pmb/products/gfs/). The meteorological station data were accessed at the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank the National Satellite Meteorological Center (NSMC) for providing FY-4A data, the China Meteorological Data Service Center (CMDC) for the meteorological data, and the National Centers for Environmental Prediction (NCEP) for the GFS data.

References

[1] M. P. Cresswell, A. P. Morse, M. C. Thomson, and S. J. Connor, “Estimating surface air temperatures from Meteosat land surface temperatures using an empirical solar zenith angle model,” International Journal of Remote Sensing, vol. 20, no. 6, pp. 1125–1132, 1999.

Figure 11: Scatter plot of Tair error with (a) DEM, (b) observed Tair, and (c) VZA for the XGB model. Axes: error (°C) against DEM (m), observed Tair (°C), and viewing zenith angle (deg), respectively.

[2] L. Prihodko and S. N. Goward, “Estimation of air temperature from remotely sensed surface observations,” Remote Sensing of Environment, vol. 60, no. 3, pp. 335–346, 1997.

[3] Z. S. Venter, O. Brousse, I. Esau, and F. Meier, “Hyperlocal mapping of urban air temperature using remote sensing and crowdsourced weather data,” Remote Sensing of Environment, vol. 242, no. 1, Article ID 111791, 2020.

[4] H. C. Ho, A. Knudby, Y. Xu, M. Hodul, and M. Aminipouri, “A comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (Humidex) for the greater Vancouver area,” Science of the Total Environment, vol. 544, pp. 929–938, 2016.

[5] T. R. Lookingbill and D. L. Urban, “Spatial estimation of air temperature differences for landscape-scale studies in montane environments,” Agricultural & Forest Meteorology, vol. 114, no. 3-4, pp. 141–151, 2003.

[6] J. W. Hurrell and K. E. Trenberth, “Satellite versus surface estimates of air temperature since 1979,” Journal of Climate, vol. 9, no. 9, pp. 2222–2232, 1996.

[7] H. Liu, Q. Zhou, S. Zhang, and X. Deng, “Estimation of summer air temperature over China using Himawari-8 AHI and numerical weather prediction data,” Advances in Meteorology, vol. 2019, pp. 1–10, Article ID 2385310, 2019.

[8] M. Konda, N. Imasato, and A. Shibata, “A new method to determine near-sea surface air temperature by using satellite data,” Journal of Geophysical Research: Oceans, vol. 101, no. C6, pp. 14349–14360, 1996.

[9] U. Marcel, “Comparison of satellite-derived land surface temperature and air temperature from meteorological stations on the pan-arctic scale,” Remote Sensing, vol. 5, no. 5, pp. 2348–2367, 2013.

[10] W. Wagner, V. Naeimi, K. Scipal, R. de Jeu, and J. Martínez-Fernández, “Soil moisture from operational meteorological satellites,” Hydrogeology Journal, vol. 15, no. 1, pp. 121–131, 2007.

[11] C. Oppenheimer, “Review article: volcanological applications of meteorological satellites,” International Journal of Remote Sensing, vol. 19, no. 15, pp. 2829–2864, 1998.

[12] H. W. Yates, J. D. Tarpley, S. R. Schneider, D. F. McGinnis, and R. A. Scofield, “The role of meteorological satellites in agricultural remote sensing,” Remote Sensing of Environment, vol. 14, no. 1–3, pp. 219–233, 1984.

[13] C. O. Justice, “The moderate resolution imaging spectroradiometer (MODIS): land remote sensing for global change research,” IEEE Transactions on Geoscience & Remote Sensing, vol. 36, no. 4, pp. 1228–1249, 2002.

[14] D. A. Chu, “Remote sensing of smoke from MODIS airborne simulator during the SCAR-B experiment,” Journal of Geophysical Research: Atmospheres, vol. 103, no. D24, Article ID 31979, 1998.

[15] L. Mohammadi, N. Molanian, and A. Heidari, “Determination of the best coverage area for receiver stations of LEO remote sensing satellites,” in Proceedings of the 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008.

[16] K. Yumimoto, T. M. Nagao, M. Kikuchi et al., “Aerosol data assimilation using data from Himawari-8, a next-generation geostationary meteorological satellite,” Geophysical Research Letters, vol. 43, no. 11, pp. 5886–5894, 2016.

[17] K. Bessho, K. Date, M. Hayashi et al., “An introduction to Himawari-8/9 - Japan's new-generation geostationary meteorological satellites,” Journal of the Meteorological Society of Japan, Ser. II, vol. 94, no. 2, pp. 151–183, 2016.

[18] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, “Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4,” Bulletin of the American Meteorological Society, vol. 98, no. 8, pp. 1637–1658, 2017.

[19] W. Hui, F. Huang, and R. Liu, “Characteristics of lightning signals over the Tibetan Plateau and the capability of FY-4A LMI lightning detection in the Plateau,” International Journal of Remote Sensing, vol. 41, no. 12, pp. 4605–4625, 2020.

[20] S. S. Chu, L. Zhu, H. F. Sun et al., “Automated volcanic hot-spot detection based on FY-4A/AGRI infrared data,” International Journal of Remote Sensing, vol. 41, no. 1, pp. 2410–2438, 2020.

[21] X. Zhang and W. Jiao, “Estimation of land surface temperature using geostationary meteorological satellite data,” Remote Sensing Technology & Application, vol. 28, no. 1, 2013.

[22] J. Xu, “Estimation of near-surface air temperature from HJ-1B satellite data in Northwest China,” Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, vol. 29, no. 22, pp. 145–153, 2013.

[23] X. Zhu, Q. Zhang, C.-Y. Xu, P. Sun, and P. Hu, “Reconstruction of high spatial resolution surface air temperature data across China: a new geo-intelligent multisource data-based machine learning technique,” Science of the Total Environment, vol. 665, pp. 300–313, 2019.

[24] J. Hrisko, P. Ramamurthy, Y. Yu, P. Yu, and D. Melecio-Vázquez, “Urban air temperature model using GOES-16 LST and a diurnal regressive neural network algorithm,” Remote Sensing of Environment, vol. 237, Article ID 111495, 2020.

[25] Q. Zhang, Y. Yu, W. Zhang, T. Luo, and X. Wang, “Cloud detection from FY-4A's geostationary interferometric infrared sounder using machine learning approaches,” Remote Sensing, vol. 11, no. 24, p. 3035, 2019.

[26] Y. Chen, G. Chen, C. Cui et al., “Retrieval of the vertical evolution of the cloud effective radius from the Chinese FY-4 (Feng Yun 4) next-generation geostationary satellites,” Atmospheric Chemistry and Physics, vol. 20, no. 2, pp. 1131–1145, 2020.

[27] I. Kloog, F. Nordio, B. A. Coull, and J. Schwartz, “Predicting spatiotemporal mean air temperature using MODIS satellite surface temperature measurements across the Northeastern USA,” Remote Sensing of Environment, vol. 150, pp. 132–139, 2014.

[28] N. Janatian, M. Sadeghi, S. Hossein Sanaeinejad et al., “A statistical framework for estimating air temperature using MODIS land surface temperature data,” International Journal of Climatology, vol. 37, no. 3, 2016.

[29] R. Huang, J.-X. Huang, C. Zhang et al., “Soil temperature estimation at different depths using remotely-sensed data,” Journal of Integrative Agriculture, vol. 19, no. 1, pp. 277–290, 2020.

[30] J. Y. Fang and K. Yoda, “Climate and vegetation in China (I). Changes in the altitudinal lapse rate of temperature and distribution of sea level temperature,” Ecological Research, vol. 3, no. 1, pp. 37–51, 1988.

[31] C. Du, R. Huazhong, Q. Qin, S. Zhao, and J. Meng, “Research of split-window algorithm for retrieval of land surface temperature from Landsat 8 data,” Journal of Geomatics, vol. 39, pp. 73–77, 2014.

[32] F. Chen, Y. Liu, Q. Liu, and F. Qin, “A statistical method based on remote sensing for the estimation of air temperature in China,” International Journal of Climatology, vol. 35, 2014.

[33] P. S. G. de Mattos Neto, G. D. C. Cavalcanti, P. R. A. Firmino, E. G. Silva, and S. R. P. Vila Nova Filho, “A temporal-window framework for modelling and forecasting time series,” Knowledge-Based Systems, vol. 193, Article ID 105476, 2020.

[34] R Bycroft J X Leon and D Schoeman ldquoComparing randomforests and convoluted neural networks for mapping ghostcrab burrows using imagery from an unmanned aerial ve-hiclerdquo Estuarine Coastal and Shelf Science vol 224 pp 84ndash93 2019

[35] Y Lee D Han M-H Ahn J Im and S J Lee ldquoRetrieval oftotal precipitable water from himawari-8 AHI data A com-parison of random forest extreme gradient boosting anddeep neural networkrdquo Remote Sensing vol 11 no 15 p 17412019

[36] J Garcıa-Gutierrez F Martınez-Alvarez A Troncoso andJ C Riquelme ldquoA comparison of machine learning regressiontechniques for LiDAR-derived estimation of forest variablesrdquoNeurocomputing vol 167 no 1 pp 24ndash31 2015

[37] D UpretiW HuangW Kong et al ldquoA comparison of hybridmachine learning algorithms for the retrieval of wheat bio-physical variables from sentinel-2rdquo Remote Sensing vol 11no 5 p 481 2019

[38] R Li L Cui H Fu Y Meng J Li and J Guo ldquoEstimatinghigh-resolution PM1 concentration from Himawari-8 com-bining extreme gradient boosting-geographically and tem-porally weighted regression (XGBoost-GTWR)rdquo AtmosphericEnvironment vol 229 Article ID 117434 2020

[39] G Papacharalampous and H Tyralis ldquoHydrological timeseries forecasting using simple combinations Big data testingand investigations on one-year ahead river flow predictabil-ityrdquo Journal of Hydrology vol 590 Article ID 125205 2020

[40] R Perez-Chacon G Asencio-Cortes F Martınez-Alvarezand A Troncoso ldquoBig data time series forecasting based onpattern sequence similarity and its application to the elec-tricity demandrdquo Information Sciences vol 540 pp 160ndash1742020

[41] H Abdollahi ldquoA novel hybrid model for forecasting crude oilprice based on time series decompositionrdquo Applied Energyvol 267 Article ID 115035 2020

[42] R S dos Santos ldquoEstimating spatio-temporal air temperaturein London (UK) using machine learning and earth obser-vation satellite datardquo International Journal of Applied EarthObservation and Geoinformation vol 88 Article ID 1020662020

[43] H J Richardson D J Hill D R Denesiuk and L H FraserldquoA comparison of geographic datasets and fieldmeasurementsto model soil carbon using random forests and stepwise re-gressions (British Columbia Canada)rdquo GI Science amp RemoteSensing vol 54 no 4 pp 573ndash591 2017

[44] H Franco-Lopez A R Ek and M E Bauer ldquoEstimation andmapping of forest stand density volume and cover type usingthe k-nearest neighbors methodrdquo Remote Sensing of Envi-ronment vol 77 no 3 pp 251ndash274 2001

[45] R Haapanen A R Ek M E Bauer and A O FinleyldquoDelineation of forestnonforest land use classes using nearestneighbor methodsrdquo Remote Sensing of Environment vol 89no 3 pp 265ndash271 2004

[46] M. Belgiu and L. Dragut, “Random forest in remote sensing: a review of applications and future directions,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, no. 114, pp. 24–31, 2016.

[47] M. J. Cracknell and A. M. Reading, “The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines,” Geophysics, vol. 78, no. 3, 2013.

[48] X. Ye, X. Yang, X. Xiong, Y. Shen, M. Hao, and R. Gu, “A quality control method based on an improved random forest algorithm for surface air temperature observations,” Advances in Meteorology, vol. 2017, pp. 1–15, Article ID 8601296, 2017.

[49] B. Babar, L. T. Luppino, T. Bostrom, and S. N. Anfinsen, “Random forest regression for improved mapping of solar irradiance at high latitudes,” Solar Energy, vol. 198, pp. 81–92, 2020.

[50] L. V. Utkin, M. S. Kovalev, and F. P. A. Coolen, “Imprecise weighted extensions of random forests for classification and regression,” Applied Soft Computing, vol. 92, Article ID 106324, 2020.

[51] L. Liu, M. Ji, and M. Buchroithner, “Combining partial least squares and the gradient-boosting method for soil property retrieval using visible near-infrared shortwave infrared spectra,” Remote Sensing, vol. 9, no. 12, p. 1299, 2017.

[52] J. Son, I. Jung, K. Park, and B. Han, “Tracking-by-Segmentation with online gradient boosting decision tree,” in Proceedings of the International Conference on Computer Vision, Las Condes, Chile, December 2015.

[53] H. Mo, H. Sun, J. Liu, and S. Wei, “Developing window behavior models for residential buildings using XGBoost algorithm,” Energy and Buildings, vol. 205, no. 15, Article ID 109564, 2019.

[54] M. H. D. M. Ribeiro and L. dos Santos Coelho, “Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series,” Applied Soft Computing, vol. 86, Article ID 105837, 2020.

[55] I. B. Mustapha and F. Saeed, “Bioactive molecule prediction using extreme gradient boosting,” Molecules, vol. 21, no. 8, p. 983, 2016.

[56] C. Li, “Power load forecasting based on the combined model of LSTM and XGBoost,” in Proceedings of the International Conference on Pattern Recognition, Wenzhou, China, June 2019.

[57] Z.-Y. Chen, T.-H. Zhang, R. Zhang et al., “Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China,” Atmospheric Environment, vol. 202, pp. 180–189, 2019.

[58] S. Zhao, D. Zeng, W. Wang et al., “Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals,” Measurement, vol. 159, Article ID 107777, 2020.

[59] S. J. Lee, M.-H. Ahn, and Y. Lee, “Application of an artificial neural network for a direct estimation of atmospheric instability from a next-generation imager,” Advances in Atmospheric Sciences, vol. 33, no. 2, pp. 221–232, 2016.

[60] J. Tang, C. Deng, G.-B. Huang, and B. Zhao, “Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1174–1185, 2015.

[61] G. T. Ribeiro, V. C. Mariani, and L. Coelho, “Enhanced ensemble structures using wavelet neural networks applied to short-term load forecasting,” Engineering Applications of Artificial Intelligence, vol. 82, pp. 272–281, 2019.

[62] W. Wang, X. Sun, R. Zhang, Z. Li, Z. Zhu, and H. Su, “Multi-layer perceptron neural network based algorithm for estimating precipitable water vapour from MODIS NIR data,” International Journal of Remote Sensing, vol. 27, no. 3, pp. 617–621, 2006.

[63] K. I. Chronopoulos, I. X. Tsiros, I. F. Dimopoulos, and N. Alvertos, “An application of artificial neural network models to estimate air temperature data in areas with sparse network of meteorological stations,” Journal of Environmental Science and Health, Part A, vol. 43, no. 14, pp. 1752–1757, 2008.

[64] S. Rodrigues Moreno, R. Gomes da Silva, V. Cocco Mariani, and L. dos Santos Coelho, “Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network,” Energy Conversion and Management, vol. 213, Article ID 112869, 2020.

[65] A. Bayat and S. Mashhadizadeh Maleki, “Comparison of precipitable water vapor derived from AIRS and SPM measurements and its correlation with surface temperature of 29 synoptic stations over Iran,” Journal of Atmospheric and Solar-Terrestrial Physics, vol. 178, pp. 24–31, 2018.

[66] W. W. Gong, “Evaluation of surface meteorological elements from several numerical models in China,” Climatic and Environmental Research, vol. 20, no. 1, pp. 53–62, 2015.

[67] D. Pozo Vazquez, F. J. Olmo Reyes, and L. Alados Arboledas, “A comparative study of algorithms for estimating land surface temperature from AVHRR Data,” Remote Sensing of Environment, vol. 62, no. 3, pp. 215–222, 1997.

14 Advances in Meteorology


range of RMSE in the DNN model was 0.852°C–2.584°C, showing good concentration and stability, as presented in Figure 5(a). In addition, among the remaining models, the overall performances of the XGB and GBTD models were equivalent and better than those of the MLR, KNN, and RF models.

3.3. Validation Results. Model performance was used as an indicator to internally validate each model. The model accuracy must be evaluated with a dataset that was not used for training or testing. To validate the developed MLR, RF, KNN, GBTD, XGB, and DNN models, observed data used for neither training nor testing were utilized (the validation dataset in Section 2.3.1). Figure 6 illustrates the quantitative validation results of the estimated Tair during the validation time (the 1st, 10th, 20th, and 30th of June–August 2018). Compared with the results on the test dataset, the overall accuracy of the six models on the validation dataset decreased. For example, for the DNN model, the RMSE of Tair using the test dataset was 1.736°C, while that of the validation results was 2.006°C. This difference may be caused by overfitting, because the best model was not selected based on the final validation results [35].
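The three statistics reported for each model (RMSE, bias, and R²) can be computed from paired station and estimated values, as in the minimal sketch below. It assumes that bias is the mean of estimated minus observed Tair and that R² is the coefficient of determination; this section does not spell out either definition, and the function name and sample values are illustrative only.

```python
import math


def evaluate(observed, estimated):
    """Return (rmse, bias, r2) for paired observed/estimated values.

    Assumptions (not stated in the text): bias = mean(estimated - observed),
    and r2 is the coefficient of determination 1 - SS_res / SS_tot.
    """
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    bias = sum(errors) / n
    ss_res = sum(err * err for err in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot  # undefined if all observations are equal
    return rmse, bias, r2


# Made-up station and model values in degrees Celsius.
station = [20.0, 25.0, 30.0, 35.0]
model = [21.0, 24.0, 31.0, 34.0]
rmse, bias, r2 = evaluate(station, model)
print(f"RMSE={rmse:.3f} bias={bias:.3f} R2={r2:.3f}")
```

Applying the same function separately to the test and validation datasets would reproduce the kind of comparison made above, where a higher validation RMSE hints at overfitting.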

The biases of the MLR, RF, DNN, GBTD, and XGB models were within ±0.2°C, indicating no obvious overestimation or underestimation. In contrast, the KNN model showed a larger negative bias of −0.492°C. A possible reason is poor robustness: robustness depends mainly on the dataset, and a model with poor robustness is difficult to apply directly to other cases, so the KNN model had a low bias on the test dataset and a high bias on the validation dataset.

The XGB model had excellent modeling performance, with R² of 0.902. The R² values of the GBTD and DNN models were 0.898 and 0.890, respectively, and the R² values of the remaining three models were less than 0.89. Moreover, unlike the other models, the XGB and GBTD models learn iteratively, building their prediction as a weighted combination of weak learners. Therefore, the XGB and GBTD models showed a relatively better performance on the validation dataset at most sites. In general, the XGB model showed a higher overall performance than the other five models on the validation dataset.
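The idea that boosted models build a prediction from many weak learners can be illustrated with a toy gradient-boosting loop. The sketch below is not the GBTD or XGB configuration used in the study; it assumes squared-error loss, one-feature decision stumps as weak learners, a hypothetical learning rate, and made-up numbers.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals.

    Requires at least two distinct values in x.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep the right side non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Additive model: start from the mean, then fit stumps to residuals."""
    base = sum(y) / len(y)
    stumps = []
    predictions = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


def mse(model, x, y):
    """Mean squared error of a fitted model on (x, y)."""
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


# Made-up one-feature training data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
one_round = boost(x, y, n_rounds=1)
many_rounds = boost(x, y, n_rounds=50)
print(mse(one_round, x, y), mse(many_rounds, x, y))
```

Each round fits a stump to what the current ensemble still gets wrong, so the training error shrinks as rounds are added; production libraries add regularization and multi-feature trees on top of this basic loop.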

The Tair estimation models based on satellite and numerical forecast data are susceptible to factors such as altitude and surface roughness. To further evaluate the

Table 2: Pearson correlation matrix for variables considered in the Tair estimation model.

             Tair     BT12     BT13  GFS PWV   GFS RH      DEM     LONG      LAT       JD
Tair        1.000    0.459    0.413    0.635   −0.182   −0.596    0.303   −0.288   −0.383
BT12        0.459    1.000    0.995    0.047   −0.256   −0.287    0.199   −0.022   −0.099
BT13        0.413    0.995    1.000    0.010   −0.249   −0.268    0.190    0.002   −0.077
GFS PWV     0.635    0.047    0.010    1.000    0.383   −0.585    0.366   −0.463   −0.310
GFS RH     −0.182   −0.256   −0.249    0.383    1.000   −0.020    0.189   −0.355   −0.046
DEM        −0.596   −0.287   −0.268   −0.585   −0.020    1.000   −0.663    0.057    0.003
LONG        0.303    0.199    0.190    0.366    0.189   −0.663    1.000    0.134   −0.004
LAT        −0.288   −0.022    0.002   −0.463   −0.355    0.057    0.134    1.000   −0.006
JD         −0.383   −0.099   −0.077   −0.310   −0.046    0.003   −0.004   −0.006    1.000

Data represent the correlation coefficient between different variables.
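A correlation matrix of the kind shown in Table 2 can be produced from per-sample predictor columns with the standard Pearson formula. The following sketch uses only the Python standard library; the three columns and their values are invented for illustration and are not the study's data.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Nested dict: matrix[row][col] = Pearson r between the two columns."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names}
            for r in names}


# Hypothetical samples: air temperature (degC), elevation (m), and a
# brightness temperature (K).
samples = {
    "Tair": [30.1, 27.4, 18.9, 12.3, 24.8],
    "DEM": [50.0, 400.0, 2100.0, 3600.0, 900.0],
    "BT12": [295.0, 292.5, 284.0, 280.0, 290.5],
}
matrix = correlation_matrix(samples)
for name, row in matrix.items():
    print(name, " ".join(f"{value:7.3f}" for value in row.values()))
```

As in Table 2, the result is symmetric with ones on the diagonal, and in this toy data elevation is negatively correlated with Tair.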

[Figure 4 panels: y-axis variables JD, LON, LAT, DEM, PWV, RH, BT12, BT13; (a) x-axis: correlation coefficient (−1.0 to 1.0); (b) x-axis: attribute usage (0 to 1.0), bar labels 0.012, 0.165, 0.165, 0.444, 0.080, 0.048, 0.021, 0.065]

Figure 4: (a) Relative variable importance identified by Pearson correlation coefficient and (b) RF variable importance.


applicability of these models, the spatial distribution of each meteorological observation was evaluated (Figures 7–9).

The Tair estimation errors of all models showed obvious spatial distribution characteristics (Figure 7). Generally, for each model, the RMSE is relatively low in the eastern regions (e.g., Guangdong Province) and high in the northwestern regions (e.g., Xinjiang Province). For example, the RMSE of the XGB model in Guangdong Province was approximately 1.2°C–1.8°C, while that in Xinjiang Province was about 2.0°C–3.2°C. Because the northwestern regions have relatively wide Tair changes during day and night, high altitude, and few meteorological observations, the accuracy difference between northwestern and eastern China is obvious. Moreover, the RMSE of the KNN, DNN, GBTD, and XGB models was relatively low in the eastern and southern regions. However, the MLR, RF, KNN, and DNN models had a higher RMSE in northwestern China. In contrast, the GBTD and XGB models had a relatively smaller RMSE in northwestern China, because their repeated learning over a large number of samples produces weighted combinations of weak learners that adapt to different regions.

Furthermore, Gong’s study (2015) [66] illustrated that the RMSE of GFS Tair in most eastern regions reaches 1.5°C–3.0°C and was above 3.5°C in the northwestern regions. By contrast, the results showed that the RMSE of Tair estimated by the DNN, XGB, and GBTD models was obviously lower than that of the GFS data. In the present study, the RMSE of the XGB model was 1.0°C–2.0°C in most eastern regions, and it was below 3.5°C in the northwestern regions. In addition, RMSE < 2.0°C accounted for 48.2% and RMSE < 2.5°C accounted for 87.6% in the XGB model.
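Shares such as "RMSE below a threshold" can be derived by first computing an RMSE per station and then counting the stations under each limit. The sketch below shows one way to do this, assuming the shares are counted over stations; the station identifiers and values are invented, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def rmse_by_station(records):
    """Map station id -> RMSE from (station_id, observed, estimated) rows."""
    squared = defaultdict(list)
    for station_id, observed, estimated in records:
        squared[station_id].append((estimated - observed) ** 2)
    return {sid: math.sqrt(sum(v) / len(v)) for sid, v in squared.items()}


def share_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v < threshold) / len(values)


# Invented rows: (station id, observed Tair, estimated Tair) in degC.
records = [
    ("A", 30.0, 31.0), ("A", 28.0, 27.0),   # errors +1, -1 -> RMSE 1.0
    ("B", 15.0, 18.0), ("B", 12.0, 9.0),    # errors +3, -3 -> RMSE 3.0
    ("C", 22.0, 24.0), ("C", 20.0, 18.0),   # errors +2, -2 -> RMSE 2.0
    ("D", 25.0, 25.5), ("D", 26.0, 25.5),   # errors +0.5, -0.5 -> RMSE 0.5
]
per_station = rmse_by_station(records)
for limit in (2.0, 2.5):
    share = share_below(per_station.values(), limit)
    print(f"RMSE < {limit:.1f} degC: {share:.1%}")
```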

The six models showed the same distribution trend, as shown in Figure 8: R² was higher in the eastern regions and gradually became lower toward the southwestern regions. Compared with the central regions (e.g., Henan Province), the viewing zenith angle (VZA) of AGRI over western China is larger. The larger the VZA

[Figure 5 panels: x-axis Model (KNN, GBTD, RF, XGB, DNN, MLR); y-axes (a) RMSE (°C), (b) Bias (°C), (c) MSE, (d) R²]

Figure 5: Boxplots of performance evaluation for the six models (MLR, KNN, GBTD, RF, XGB, and DNN) using the test dataset in terms of (a) RMSE, (b) Bias, (c) MSE, and (d) R².


[Figure 6 panels: x-axis Station Tair (°C), y-axis Estimated Tair (°C), both −5 to 40; colour scale 0 to 400. Statistics printed in the panels:
(a) MLR: RMSE = 2.840°C, Bias = 0.055°C, R² = 0.791
(b) RF: RMSE = 2.233°C, Bias = −0.185°C, R² = 0.878
(c) KNN: RMSE = 2.120°C, Bias = −0.492°C, R² = 0.884
(d) DNN: RMSE = 2.006°C, Bias = 0.185°C, R² = 0.890
(e) GBTD: RMSE = 1.984°C, Bias = −0.122°C, R² = 0.898
(f) XGB: RMSE = 1.946°C, Bias = −0.087°C, R² = 0.902]

Figure 6: Two-dimensional histogram of predicted Tair data and meteorological observed Tair data based on six machine-learning models: (a) MLR model, (b) RF model, (c) KNN model, (d) DNN model, (e) GBTD model, and (f) XGB model.

[Figure 7 panels (a) to (f): maps spanning 80°E–130°E and 20°N–50°N; colour scale: Temperature (°C), 0.0 to 4.0]

Figure 7: Spatial distribution of RMSE for six machine-learning models: (a) MLR-RMSE, (b) RF-RMSE, (c) KNN-RMSE, (d) DNN-RMSE, (e) GBTD-RMSE, and (f) XGB-RMSE.


is, the more the radiation reaching the sensor is affected by the atmosphere, which may cause differences in R² of the estimated Tair value between the southwestern and central regions.

For the MLR model, the bias for all of China was large. For the RF and KNN models, relatively high negative bias existed in southwestern China (e.g., Yunnan-Guizhou Plateau), as shown in Figure 9. This may be due to the relatively simple structure of the three models mentioned above, which cannot simulate the complex Tair changes in China well, resulting in underfitting. In addition, the DNN model overestimated Tair in northwestern China, which explains why its RMSE was also high in these regions. In contrast, the GBTD and XGB models had relatively low bias in northwestern China, where the absolute bias ranges from 2.0°C to 3.0°C. In conclusion, the bias is lower in the coastal areas and higher in the northwestern areas, which is mainly related to the characteristics of summer Tair change.

Figure 10 shows the time series of RMSE for the six models during the validation period. The RMSE of the MLR model was significantly higher than that of the other models, ranging from 2.5°C to 4.3°C. In contrast, the GBTD and XGB models showed a relatively lower RMSE (i.e., 1.8°C–2.2°C) than the RF, KNN, and DNN models.

Based on the above analysis, it is expected that the XGB model can provide a more reliable and accurate Tair estimation than the other models. To evaluate the contribution of predictive factors in the XGB model to Tair estimation, BTs data (BT12 and BT13) and GFS data (GFS PWV and RH) were successively introduced (Table 3). As shown in Table 3, when only DEM, longitude, latitude, and Julian day were used as input variables, the RMSE of the XGB model was 3.003°C. The accuracy of Tair estimation was obviously improved when BT12 and BT13 were included in the model. Moreover, when GFS PWV and RH were added to the input variables, the RMSE of the XGB model decreased to 2.164°C, indicating important influences of GFS PWV and RH on the Tair estimation. These results are understandable because PWV and RH are the main parameters needed for atmospheric correction and LST retrieval. When both AGRI BTs and GFS data were introduced to the input variables, the RMSE of the XGB model improved by 0.218°C (from 2.164°C to 1.946°C) compared with introducing GFS data alone. This indicates that both GFS data and satellite observation data have an important role in improving the Tair estimation model. The RMSE of the Tair estimation model was less than 2.0°C when both satellite BTs and GFS data were introduced, which was considered to be the precision level of “accurate” [67].
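The size of each improvement follows directly from the RMSE values reported in Table 3. The short sketch below recomputes the differences; the dictionary keys are shorthand labels introduced here, where "base" stands for DEM, longitude, latitude, and Julian day.

```python
# XGB RMSE values (degC) as reported in Table 3; keys are shorthand labels.
rmse_by_inputs = {
    "base": 3.003,          # DEM, longitude, latitude, Julian day
    "base+BT": 2.376,       # plus BT12 and BT13
    "base+GFS": 2.164,      # plus GFS PWV and GFS RH
    "base+BT+GFS": 1.946,   # plus both groups
}


def reduction(before, after):
    """RMSE reduction (degC) when moving from one input set to another."""
    return round(rmse_by_inputs[before] - rmse_by_inputs[after], 3)


print("adding BTs to base:      ", reduction("base", "base+BT"))
print("adding GFS to base:      ", reduction("base", "base+GFS"))
print("adding BTs to base+GFS:  ", reduction("base+GFS", "base+BT+GFS"))
print("adding GFS to base+BTs:  ", reduction("base+BT", "base+BT+GFS"))
```

Under these numbers, the GFS variables reduce the RMSE more than the brightness temperatures do, whichever group is added first.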

The relationship of XGB model errors with altitude, observed Tair, and VZA was analyzed. Figure 11 shows scatter plots of the estimated Tair error against DEM, Tair, and VZA. The Tair error mainly ranges from −3°C to 3°C. The results showed positive deviation in high-altitude areas, which produced a larger RMSE than in low-altitude areas. The model showed a positive deviation when Tair was low, while exhibiting a

[Figure 8 panels (a) to (f): maps spanning 80°E–130°E and 20°N–50°N; colour scale 0.0 to 1.0]

Figure 8: Spatial distribution of R² for six machine-learning models: (a) MLR-R², (b) RF-R², (c) KNN-R², (d) DNN-R², (e) GBTD-R², and (f) XGB-R².


[Figure 9 panels (a) to (f): maps spanning 80°E–130°E and 20°N–50°N; colour scale: Temperature (°C), −3.0 to 3.0]

Figure 9: Spatial distribution of bias for six machine-learning models: (a) MLR-Bias, (b) RF-Bias, (c) KNN-Bias, (d) DNN-Bias, (e) GBTD-Bias, and (f) XGB-Bias.

[Figure 10 plot: "RMSE between estimated Tair and observed Tair"; x-axis Date (tick labels 2018-06-01, 2018-06-10, 2018-06-20, 2018-06-30, 2018-07-10, 2018-07-20, 2018-07-30, 2018-08-10, 2018-08-20, 2018-08-30); y-axis RMSE (°C), 1.5 to 4.5; series MLR, RF, KNN, DNN, GBTD, XGB]

Figure 10: Time series (June to August, 1st, 10th, 20th, and 30th) of RMSE of estimated Tair for six machine-learning models.

Table 3: The contribution of AGRI BTs and GFS data to the XGB Tair estimation model.

Predictive factors                                                    XGB model RMSE (°C)
DEM, longitude, latitude, Julian day                                  3.003
DEM, longitude, latitude, Julian day, BT12, BT13                      2.376
DEM, longitude, latitude, Julian day, GFS PWV, GFS RH                 2.164
DEM, longitude, latitude, Julian day, BT12, BT13, GFS PWV, GFS RH     1.946


negative bias for the high-air-temperature condition. Therefore, the model showed a larger RMSE in the lower- and higher-air-temperature conditions due to overestimation and underestimation, respectively. This is similar to the results of previous studies [38]. Furthermore, the uneven distribution of stations makes the applicability of the model in high-altitude areas poor. It is worth mentioning that the effect of VZA on model performance is negligible, as shown in Figure 11(c).
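The dependence of error on a covariate such as observed Tair, DEM, or VZA can be summarized by averaging the error within bins of that covariate. The sketch below assumes that error means estimated minus observed Tair; the bin width, function name, and sample values are illustrative only.

```python
from collections import defaultdict


def mean_error_by_bin(covariate, errors, bin_width):
    """Mean error per covariate bin, keyed by the bin's lower edge."""
    bins = defaultdict(list)
    for value, err in zip(covariate, errors):
        lower_edge = (value // bin_width) * bin_width
        bins[lower_edge].append(err)
    return {edge: sum(v) / len(v) for edge, v in sorted(bins.items())}


# Invented samples: observed Tair (degC) and error = estimated - observed.
observed_tair = [2.0, 4.0, 12.0, 18.0, 27.0, 33.0, 38.0]
error = [1.5, 0.9, 0.2, -0.1, -0.3, -1.1, -1.7]
by_temperature = mean_error_by_bin(observed_tair, error, bin_width=10.0)
for edge, mean_err in by_temperature.items():
    print(f"{edge:5.1f} to {edge + 10.0:5.1f} degC: mean error {mean_err:+.2f}")
```

In this toy data the mean error is positive in the coldest bin and negative in the warmest, which is the pattern described for Figure 11(b); a flat profile across bins would correspond to the negligible VZA effect noted for Figure 11(c).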

4. Conclusions

In this study, six machine-learning approaches (MLR, RF, KNN, DNN, GBTD, and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed in terms of the spatial and temporal characteristics of their performance. The validation results highlighted the high potential of Tair estimation approaches using machine learning and showed that the accuracy of the XGB model was better than that of the MLR, RF, KNN, GBTD, and DNN models at most sites for Tair estimation over China. The validation was performed using spatially and temporally independent data, and hence the model performance was considered to be quite reliable.

This study improves on previous studies in the following key areas. First, Tair estimation models were constructed based on FY-4A AGRI data and other auxiliary data. The results showed that high-temporal- and high-spatial-resolution Tair values (RMSE < 2.0°C) can be obtained based on FY-4A data. According to the study of Vazquez et al. [67], the level of precision generally accepted as “accurate” for remote-sensing-based Tair estimation is between 1°C and 2°C. Second, the accuracy and performance of the six machine-learning models (MLR, RF, KNN, XGB, GBTD, and DNN) were compared and analyzed. The results showed that the XGB model can provide more stable and high-precision Tair estimation, which provides a reference for Tair estimation based on machine-learning models. Finally, the accuracy of Tair estimation based on satellite data can be effectively improved by adding numerical weather prediction data. The experimental results showed that, when only satellite data were used for large-scale Tair estimation in China, the RMSE of the XGB model was 2.376°C, whereas the RMSE using satellite data combined with numerical weather prediction data reached 1.946°C.

However, the study also has limitations. The dataset used is restricted to clear-sky conditions. In addition, machine-learning algorithms cannot infer beyond the range of observed Tair values: if Tair moves beyond the range observed during the current training period, the model must be retrained. Moreover, future research may explore whether adding other predictors, such as distance-to-coast and vegetation information (normalized difference vegetation index, etc.), can improve the accuracy of the Tair estimation models.
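The range limitation can be made concrete with a neighbour-averaging regressor, whose output is always an average of training targets and therefore can never exceed the largest Tair seen in training. The sketch below is a one-feature KNN written for illustration, not the study's KNN model; the brightness-temperature and Tair values are invented. Tree ensembles behave similarly because their leaves also average training targets, whereas linear models and neural networks can extrapolate numerically, though without any guarantee of accuracy.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points nearest to the query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Invented training data: brightness temperature (K) -> Tair (degC).
train_bt = [270.0, 275.0, 280.0, 285.0, 290.0, 295.0]
train_tair = [2.0, 6.0, 11.0, 15.0, 20.0, 24.0]

inside = knn_predict(train_bt, train_tair, 283.0)    # within training range
outside = knn_predict(train_bt, train_tair, 310.0)   # far above the range
print(f"inside={inside:.2f} outside={outside:.2f} "
      f"max seen in training={max(train_tair):.1f}")
```

Even for a query far warmer than anything in the training set, the prediction stays at or below the warmest training target, which is why retraining is needed once conditions leave the observed range.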

Data Availability

FY-4A AGRI data were downloaded from the China National Satellite Meteorological Center (NSMC) (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx). The GFS data were obtained from the National Centers for Environmental Prediction (https://www.nco.ncep.noaa.gov/pmb/products/gfs). The meteorological station data were accessed at the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank the National Satellite Meteorological Center (NSMC) for providing FY-4A data, the China Meteorological Data Service Center (CMDC) for the meteorological data, and the National Centers for Environmental Prediction (NCEP) for the GFS data.

References

[1] M. P. Cresswell, A. P. Morse, M. C. Thomson, and S. J. Connor, “Estimating surface air temperatures from Meteosat land surface temperatures using an empirical solar zenith angle model,” International Journal of Remote Sensing, vol. 20, no. 6, pp. 1125–1132, 1999.

[Figure 11 panels: y-axis Error (°C), −15.0 to 15.0; x-axes (a) DEM (m), 0 to 5000, (b) Observed Tair (°C), −10 to 45, (c) Viewing zenith angle (°), 20 to 65; colour scale 0 to 300]

Figure 11: Scatter plot of Tair error with (a) DEM, (b) observed Tair, and (c) VZA for the XGB model.


[2] L. Prihodko and S. N. Goward, “Estimation of air temperature from remotely sensed surface observations,” Remote Sensing of Environment, vol. 60, no. 3, pp. 335–346, 1997.

[3] Z. S. Venter, O. Brousse, I. Esau, and F. Meier, “Hyperlocal mapping of urban air temperature using remote sensing and crowdsourced weather data,” Remote Sensing of Environment, vol. 242, no. 1, Article ID 111791, 2020.

[4] H. C. Ho, A. Knudby, Y. Xu, M. Hodul, and M. Aminipouri, “A comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (Humidex) for the greater Vancouver area,” Science of the Total Environment, vol. 544, pp. 929–938, 2016.

[5] T. R. Lookingbill and D. L. Urban, “Spatial estimation of air temperature differences for landscape-scale studies in montane environments,” Agricultural & Forest Meteorology, vol. 114, no. 3-4, pp. 141–151, 2003.

[6] J. W. Hurrell and K. E. Trenberth, “Satellite versus surface estimates of air temperature since 1979,” Journal of Climate, vol. 9, no. 9, pp. 2222–2232, 1996.

[7] H. Liu, Q. Zhou, S. Zhang, and X. Deng, “Estimation of summer air temperature over China using Himawari-8 AHI and numerical weather prediction data,” Advances in Meteorology, vol. 2019, pp. 1–10, Article ID 2385310, 2019.

[8] M. Konda, N. Imasato, and A. Shibata, “A new method to determine near-sea surface air temperature by using satellite data,” Journal of Geophysical Research: Oceans, vol. 101, no. C6, pp. 14349–14360, 1996.

[9] U. Marcel, “Comparison of satellite-derived land surface temperature and air temperature from meteorological stations on the pan-arctic scale,” Remote Sensing, vol. 5, no. 5, pp. 2348–2367, 2013.

[10] W. Wagner, V. Naeimi, K. Scipal, R. de Jeu, and J. Martínez-Fernandez, “Soil moisture from operational meteorological satellites,” Hydrogeology Journal, vol. 15, no. 1, pp. 121–131, 2007.

[11] C. Oppenheimer, “Review article: volcanological applications of meteorological satellites,” International Journal of Remote Sensing, vol. 19, no. 15, pp. 2829–2864, 1998.

[12] H. W. Yates, J. D. Tarpley, S. R. Schneider, D. F. McGinnis, and R. A. Scofield, “The role of meteorological satellites in agricultural remote sensing,” Remote Sensing of Environment, vol. 14, no. 1–3, pp. 219–233, 1984.

[13] C. O. Justice, “The moderate resolution imaging spectroradiometer (MODIS): land remote sensing for global change research,” IEEE Transactions on Geoscience & Remote Sensing, vol. 36, no. 4, pp. 1228–1249, 2002.

[14] D. A. Chu, “Remote sensing of smoke from MODIS airborne simulator during the SCAR-B experiment,” Journal of Geophysical Research: Atmospheres, vol. 103, no. D24, Article ID 31979, 1998.

[15] L. Mohammadi, N. Molanian, and A. Heidari, “Determination of the best coverage area for receiver stations of LEO remote sensing satellites,” in Proceedings of the 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008.

[16] K. Yumimoto, T. M. Nagao, M. Kikuchi et al., “Aerosol data assimilation using data from Himawari-8, a next-generation geostationary meteorological satellite,” Geophysical Research Letters, vol. 43, no. 11, pp. 5886–5894, 2016.

[17] K. Bessho, K. Date, M. Hayashi et al., “An introduction to Himawari-8/9 - Japan’s new-generation geostationary meteorological satellites,” Journal of the Meteorological Society of Japan. Ser. II, vol. 94, no. 2, pp. 151–183, 2016.

[18] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, “Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4,” Bulletin of the American Meteorological Society, vol. 98, no. 8, pp. 1637–1658, 2017.

[19] W. Hui, F. Huang, and R. Liu, “Characteristics of lightning signals over the Tibetan Plateau and the capability of FY-4A LMI lightning detection in the Plateau,” International Journal of Remote Sensing, vol. 41, no. 12, pp. 4605–4625, 2020.

[20] S. S. Chu, L. Zhu, H. F. Sun et al., “Automated volcanic hot-spot detection based on FY-4A/AGRI infrared data,” International Journal of Remote Sensing, vol. 41, no. 1, pp. 2410–2438, 2020.

[21] X. Zhang and W. Jiao, “Estimation of land surface temperature using geostationary meteorological satellite data,” Remote Sensing Technology & Application, vol. 28, no. 1, 2013.

[22] J. Xu, “Estimation of near-surface air temperature from HJ-1B satellite data in Northwest China,” Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, vol. 29, no. 22, pp. 145–153, 2013.

[23] X. Zhu, Q. Zhang, C.-Y. Xu, P. Sun, and P. Hu, “Reconstruction of high spatial resolution surface air temperature data across China: A new geo-intelligent multisource data-based machine learning technique,” Science of the Total Environment, vol. 665, pp. 300–313, 2019.

[24] J. Hrisko, P. Ramamurthy, Y. Yu, P. Yu, and D. Melecio-Vazquez, “Urban air temperature model using GOES-16 LST and a diurnal regressive neural network algorithm,” Remote Sensing of Environment, vol. 237, Article ID 111495, 2020.

[25] Q. Zhang, Y. Yu, W. Zhang, T. Luo, and X. Wang, “Cloud detection from FY-4A’s geostationary interferometric infrared sounder using machine learning approaches,” Remote Sensing, vol. 11, no. 24, p. 3035, 2019.

[26] Y. Chen, G. Chen, C. Cui et al., “Retrieval of the vertical evolution of the cloud effective radius from the Chinese FY-4 (Feng Yun 4) next-generation geostationary satellites,” Atmospheric Chemistry and Physics, vol. 20, no. 2, pp. 1131–1145, 2020.

[27] I. Kloog, F. Nordio, B. A. Coull, and J. Schwartz, “Predicting spatiotemporal mean air temperature using MODIS satellite surface temperature measurements across the Northeastern USA,” Remote Sensing of Environment, vol. 150, pp. 132–139, 2014.

[28] N. Janatian, M. Sadeghi, S. Hossein Sanaeinejad et al., “A statistical framework for estimating air temperature using MODIS land surface temperature data,” International Journal of Climatology, vol. 37, no. 3, 2016.

[29] R. Huang, J.-x. Huang, C. Zhang et al., “Soil temperature estimation at different depths using remotely-sensed data,” Journal of Integrative Agriculture, vol. 19, no. 1, pp. 277–290, 2020.

[30] J. Y. Fang and K. Yoda, “Climate and vegetation in China (I). Changes in the altitudinal lapse rate of temperature and distribution of sea level temperature,” Ecological Research, vol. 3, no. 1, pp. 37–51, 1988.

[31] C. Du, R. Huazhong, Q. Qin, S. Zhao, and J. Meng, “Research of split-window algorithm for retrieval of land surface temperature from Landsat 8 data,” Journal of Geomatics, vol. 39, pp. 73–77, 2014.

[32] F. Chen, Y. Liu, Q. Liu, and F. Qin, “A statistical method based on remote sensing for the estimation of air temperature in China,” International Journal of Climatology, vol. 35, 2014.

[33] P. S. G. de Mattos Neto, G. D. C. Cavalcanti, P. R. A. Firmino, E. G. Silva, and S. R. P. Vila Nova Filho, “A temporal-window framework for modelling and forecasting time series,” Knowledge-Based Systems, vol. 193, Article ID 105476, 2020.

[34] R. Bycroft, J. X. Leon, and D. Schoeman, “Comparing random forests and convoluted neural networks for mapping ghost crab burrows using imagery from an unmanned aerial vehicle,” Estuarine, Coastal and Shelf Science, vol. 224, pp. 84–93, 2019.

[35] Y. Lee, D. Han, M.-H. Ahn, J. Im, and S. J. Lee, “Retrieval of total precipitable water from Himawari-8 AHI data: A comparison of random forest, extreme gradient boosting, and deep neural network,” Remote Sensing, vol. 11, no. 15, p. 1741, 2019.

[36] J. García-Gutierrez, F. Martínez-Alvarez, A. Troncoso, and J. C. Riquelme, “A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables,” Neurocomputing, vol. 167, no. 1, pp. 24–31, 2015.

[37] D. Upreti, W. Huang, W. Kong et al., “A comparison of hybrid machine learning algorithms for the retrieval of wheat biophysical variables from Sentinel-2,” Remote Sensing, vol. 11, no. 5, p. 481, 2019.

[38] R Li L Cui H Fu Y Meng J Li and J Guo ldquoEstimatinghigh-resolution PM1 concentration from Himawari-8 com-bining extreme gradient boosting-geographically and tem-porally weighted regression (XGBoost-GTWR)rdquo AtmosphericEnvironment vol 229 Article ID 117434 2020

[39] G Papacharalampous and H Tyralis ldquoHydrological timeseries forecasting using simple combinations Big data testingand investigations on one-year ahead river flow predictabil-ityrdquo Journal of Hydrology vol 590 Article ID 125205 2020

[40] R Perez-Chacon G Asencio-Cortes F Martınez-Alvarezand A Troncoso ldquoBig data time series forecasting based onpattern sequence similarity and its application to the elec-tricity demandrdquo Information Sciences vol 540 pp 160ndash1742020

[41] H Abdollahi ldquoA novel hybrid model for forecasting crude oilprice based on time series decompositionrdquo Applied Energyvol 267 Article ID 115035 2020

[42] R S dos Santos ldquoEstimating spatio-temporal air temperaturein London (UK) using machine learning and earth obser-vation satellite datardquo International Journal of Applied EarthObservation and Geoinformation vol 88 Article ID 1020662020

[43] H J Richardson D J Hill D R Denesiuk and L H FraserldquoA comparison of geographic datasets and fieldmeasurementsto model soil carbon using random forests and stepwise re-gressions (British Columbia Canada)rdquo GI Science amp RemoteSensing vol 54 no 4 pp 573ndash591 2017

[44] H. Franco-Lopez, A. R. Ek, and M. E. Bauer, “Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method,” Remote Sensing of Environment, vol. 77, no. 3, pp. 251–274, 2001.

[45] R. Haapanen, A. R. Ek, M. E. Bauer, and A. O. Finley, “Delineation of forest/nonforest land use classes using nearest neighbor methods,” Remote Sensing of Environment, vol. 89, no. 3, pp. 265–271, 2004.

[46] M. Belgiu and L. Drăguţ, “Random forest in remote sensing: a review of applications and future directions,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, no. 114, pp. 24–31, 2016.

[47] M. J. Cracknell and A. M. Reading, “The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines,” Geophysics, vol. 78, no. 3, 2013.

[48] X. Ye, X. Yang, X. Xiong, Y. Shen, M. Hao, and R. Gu, “A quality control method based on an improved random forest algorithm for surface air temperature observations,” Advances in Meteorology, vol. 2017, pp. 1–15, Article ID 8601296, 2017.

[49] B. Babar, L. T. Luppino, T. Boström, and S. N. Anfinsen, “Random forest regression for improved mapping of solar irradiance at high latitudes,” Solar Energy, vol. 198, pp. 81–92, 2020.

[50] L. V. Utkin, M. S. Kovalev, and F. P. A. Coolen, “Imprecise weighted extensions of random forests for classification and regression,” Applied Soft Computing, vol. 92, Article ID 106324, 2020.

[51] L. Liu, M. Ji, and M. Buchroithner, “Combining partial least squares and the gradient-boosting method for soil property retrieval using visible near-infrared shortwave infrared spectra,” Remote Sensing, vol. 9, no. 12, p. 1299, 2017.

[52] J. Son, I. Jung, K. Park, and B. Han, “Tracking-by-segmentation with online gradient boosting decision tree,” in Proceedings of the International Conference on Computer Vision, Las Condes, Chile, December 2015.

[53] H. Mo, H. Sun, J. Liu, and S. Wei, “Developing window behavior models for residential buildings using XGBoost algorithm,” Energy and Buildings, vol. 205, no. 15, Article ID 109564, 2019.

[54] M. H. D. M. Ribeiro and L. dos Santos Coelho, “Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series,” Applied Soft Computing, vol. 86, Article ID 105837, 2020.

[55] I. B. Mustapha and F. Saeed, “Bioactive molecule prediction using extreme gradient boosting,” Molecules, vol. 21, no. 8, p. 983, 2016.

[56] C. Li, “Power load forecasting based on the combined model of LSTM and XGBoost,” in Proceedings of the International Conference on Pattern Recognition, Wenzhou, China, June 2019.

[57] Z.-Y. Chen, T.-H. Zhang, R. Zhang et al., “Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China,” Atmospheric Environment, vol. 202, pp. 180–189, 2019.

[58] S. Zhao, D. Zeng, W. Wang et al., “Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals,” Measurement, vol. 159, Article ID 107777, 2020.

[59] S. J. Lee, M.-H. Ahn, and Y. Lee, “Application of an artificial neural network for a direct estimation of atmospheric instability from a next-generation imager,” Advances in Atmospheric Sciences, vol. 33, no. 2, pp. 221–232, 2016.

[60] J. Tang, C. Deng, G.-B. Huang, and B. Zhao, “Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1174–1185, 2015.

[61] G. T. Ribeiro, V. C. Mariani, and L. Coelho, “Enhanced ensemble structures using wavelet neural networks applied to short-term load forecasting,” Engineering Applications of Artificial Intelligence, vol. 82, pp. 272–281, 2019.

[62] W. Wang, X. Sun, R. Zhang, Z. Li, Z. Zhu, and H. Su, “Multi-layer perceptron neural network based algorithm for estimating precipitable water vapour from MODIS NIR data,” International Journal of Remote Sensing, vol. 27, no. 3, pp. 617–621, 2006.

[63] K. I. Chronopoulos, I. X. Tsiros, I. F. Dimopoulos, and N. Alvertos, “An application of artificial neural network models to estimate air temperature data in areas with sparse network of meteorological stations,” Journal of Environmental Science and Health, Part A, vol. 43, no. 14, pp. 1752–1757, 2008.

[64] S. Rodrigues Moreno, R. Gomes da Silva, V. Cocco Mariani, and L. dos Santos Coelho, “Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network,” Energy Conversion and Management, vol. 213, Article ID 112869, 2020.

[65] A. Bayat and S. Mashhadizadeh Maleki, “Comparison of precipitable water vapor derived from AIRS and SPM measurements and its correlation with surface temperature of 29 synoptic stations over Iran,” Journal of Atmospheric and Solar-Terrestrial Physics, vol. 178, pp. 24–31, 2018.

[66] W. W. Gong, “Evaluation of surface meteorological elements from several numerical models in China,” Climatic and Environmental Research, vol. 20, no. 1, pp. 53–62, 2015.

[67] D. Pozo Vázquez, F. J. Olmo Reyes, and L. Alados Arboledas, “A comparative study of algorithms for estimating land surface temperature from AVHRR data,” Remote Sensing of Environment, vol. 62, no. 3, pp. 215–222, 1997.


applicability of these models, the spatial distribution of each meteorological observation was evaluated (Figures 7–9).

It can be seen that the Tair estimation errors of all models showed obvious spatial distribution characteristics (Figure 7). Generally, for each model, the RMSE is relatively low in the eastern regions (e.g., Guangdong Province) and high in the northwestern regions (e.g., Xinjiang Province). For example, the RMSE of the XGB model was approximately 1.2°C–1.8°C in Guangdong Province, while that in Xinjiang Province was about 2.0°C–3.2°C. Because the northwestern regions have relatively wide Tair changes between day and night, high altitude, and few meteorological observations, the accuracy difference between northwestern and eastern China is obvious. Moreover, the RMSE of the KNN, DNN, GBTD, and XGB models was relatively low in the eastern and southern regions. However, the MLR, RF, KNN, and DNN models had a higher RMSE in northwestern China. In contrast, the GBTD and XGB models had a relatively smaller RMSE in northwestern China, because these boosting models learn repeatedly from a large amount of data and combine the successive learners as a weighted average, which adapts them to different regions.
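The repeated-learning idea behind boosted trees can be illustrated with a minimal sketch. It is not the GBTD or XGB configuration used in this study: the one-dimensional toy data, the two-leaf decision stump used as the weak learner, the learning rate, and the function names `fit_stump` and `boost` are all assumptions chosen for illustration. Each round fits a stump to the current residuals and adds a down-weighted copy of it to the prediction.

```python
from statistics import mean


def fit_stump(x, residual):
    """Find the split on x that minimizes squared error of a two-leaf tree."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residual) if xi <= thr]
        right = [r for xi, r in zip(x, residual) if xi > thr]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return thr, lv, rv


def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Fit stumps to residuals round after round and sum their weighted outputs."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        thr, lv, rv = fit_stump(x, residual)
        stumps.append((thr, lv, rv))
        pred = [p + learning_rate * (lv if xi <= thr else rv)
                for xi, p in zip(x, pred)]
    return base, stumps, pred


def rmse(est, obs):
    return (sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs)) ** 0.5


# Hypothetical toy data: elevation (km) versus air temperature (°C).
x = [0.1, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 4.5]
y = [30.0, 28.5, 25.0, 22.0, 19.5, 13.0, 7.0, 4.5]

base, stumps, pred = boost(x, y)
rmse_start = rmse([base] * len(y), y)  # error of the constant starting model
rmse_end = rmse(pred, y)               # training error after all rounds
print(f"training RMSE: {rmse_start:.3f} -> {rmse_end:.3f}")
```

The sketch only shows that the training error shrinks as rounds accumulate; production libraries add regularization, deeper trees, and multi-feature splits.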

Furthermore, Gong's study (2015) [66] illustrated that the RMSE of GFS Tair reaches 1.5°C–3.0°C in most eastern regions and was above 3.5°C in the northwestern regions. By contrast, the present results showed that the RMSE of Tair estimated by the DNN, XGB, and GBTD models was obviously lower than that of the GFS data. In the present study, the RMSE of the XGB model was 1.0°C–2.0°C in most eastern regions and below 3.5°C in the northwestern regions. In addition, RMSE < 2.0°C accounted for 48.2% and RMSE < 2.5°C accounted for 87.6% in the XGB model.
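Per-station RMSE values and the share falling below a threshold could be computed along the following lines. This is a sketch under stated assumptions: the record layout `(station_id, estimated, observed)`, the station labels, and the numbers are hypothetical, and reading the percentages above as shares of stations is itself an assumption.

```python
from collections import defaultdict
from math import sqrt


def station_rmse(records):
    """Group (station_id, estimated, observed) records and return RMSE per station."""
    by_station = defaultdict(list)
    for station_id, est, obs in records:
        by_station[station_id].append((est, obs))
    return {
        sid: sqrt(sum((e - o) ** 2 for e, o in pairs) / len(pairs))
        for sid, pairs in by_station.items()
    }


def share_below(rmse_by_station, threshold):
    """Fraction of stations whose RMSE is strictly below the threshold."""
    n_below = sum(1 for v in rmse_by_station.values() if v < threshold)
    return n_below / len(rmse_by_station)


# Hypothetical matchups: estimated and observed Tair (°C) at four stations.
records = [
    ("A", 25.0, 24.0), ("A", 30.0, 31.0),
    ("B", 10.0, 13.0), ("B", 12.0, 9.0),
    ("C", 20.0, 18.0), ("C", 22.0, 24.0),
    ("D", 15.0, 15.5), ("D", 18.0, 17.5),
]

rmse_by_station = station_rmse(records)
print(rmse_by_station)
print(share_below(rmse_by_station, 2.0), share_below(rmse_by_station, 2.5))
```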

The six models showed the same distribution trend, as shown in Figure 8, with R² being higher in the eastern regions and gradually becoming lower toward the southwestern regions. Compared with the central regions (e.g., Henan Province), the viewing zenith angle (VZA) of AGRI over western China is larger. The larger the VZA

Figure 5: Boxplots of performance evaluation for the six models (MLR, KNN, GBTD, RF, XGB, and DNN) using the test dataset in terms of (a) RMSE (°C), (b) bias (°C), (c) MSE, and (d) R².


Figure 6: Two-dimensional histograms of predicted Tair data and meteorological observed Tair data based on six machine-learning models: (a) MLR model (RMSE = 2.840°C, bias = 0.055°C, R² = 0.791); (b) RF model (RMSE = 2.233°C, bias = −0.185°C, R² = 0.878); (c) KNN model (RMSE = 2.120°C, bias = −0.492°C, R² = 0.884); (d) DNN model (RMSE = 2.006°C, bias = 0.185°C, R² = 0.890); (e) GBTD model (RMSE = 1.984°C, bias = −0.122°C, R² = 0.898); (f) XGB model (RMSE = 1.946°C, bias = −0.087°C, R² = 0.902). Axes: station Tair (°C) and estimated Tair (°C), both from −5 to 40; the color scale runs from 0 to 400.

Figure 7: Spatial distribution of RMSE for six machine-learning models: (a) MLR-RMSE; (b) RF-RMSE; (c) KNN-RMSE; (d) DNN-RMSE; (e) GBTD-RMSE; (f) XGB-RMSE. Maps cover 80°E–130°E and 20°N–50°N; color scale: temperature (°C), 0.0–4.0.


is, the more the radiation reaching the sensor will be affected by the atmosphere, which may cause the differences in R² of the estimated Tair between the southwestern and central regions.

For the MLR model, the bias for all of China was large. For the RF and KNN models, relatively high negative bias existed in southwestern China (e.g., the Yunnan-Guizhou Plateau), as shown in Figure 9. This may be due to the relatively simple structure of the three models mentioned above, which cannot simulate the complex Tair changes in China well, resulting in underfitting. In addition, Tair was overestimated by the DNN model in northwestern China, which is why the RMSE of the DNN model was also high in these regions. In contrast, the GBTD and XGB models had relatively low bias in northwestern China, where the absolute bias ranges from 2.0°C to 3.0°C. In conclusion, the bias is lower in the coastal areas and higher in the northwestern areas, which is mainly related to the characteristics of summer Tair change.

Figure 10 shows the time series of RMSE for the six models during the validation period. The RMSE of the MLR model was significantly higher than that of the other models, ranging from 2.5°C to 4.3°C. In contrast, the GBTD and XGB models showed a relatively lower RMSE (i.e., 1.8°C–2.2°C) than the RF, KNN, and DNN models.

Based on the above analysis, it is expected that the XGB model can provide a more reliable and accurate Tair estimation than the other models. To evaluate the contribution of the predictive factors in the XGB model to Tair estimation, BT data (BT12 and BT13) and GFS data (GFS PWV and RH) were successively introduced (Table 3). As shown in Table 3, when only DEM, longitude, latitude, and Julian day were used as input variables, the RMSE of the XGB model was 3.003°C. The accuracy of Tair estimation was obviously improved when BT12 and BT13 were included in the model (RMSE of 2.376°C). Moreover, when GFS PWV and RH were added to these baseline input variables instead, the RMSE of the XGB model decreased to 2.164°C, indicating important influences of GFS PWV and RH on the Tair estimation. These results are understandable because PWV and RH are the main parameters needed for atmospheric correction and LST retrieval. When both AGRI BTs and GFS data were introduced to the input variables, the RMSE of the XGB model improved by a further 0.218°C (from 2.164°C to 1.946°C) compared with introducing the GFS data alone. This indicates that both GFS data and satellite observation data have an important role in improving the Tair estimation model. The RMSE of the Tair estimation model was less than 2.0°C when both satellite BTs and GFS data were introduced, which is considered to be the precision level of “accurate” [67].
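The incremental contributions can be worked out directly from the four RMSE values in Table 3. The short sketch below does only that arithmetic; the dictionary keys and the helper name `reduction` are illustrative labels, not names from the study.

```python
# RMSE values (°C) of the XGB model, read from Table 3.
table3_rmse = {
    "base": 3.003,         # DEM, longitude, latitude, Julian day
    "base+BT": 2.376,      # plus BT12 and BT13
    "base+GFS": 2.164,     # plus GFS PWV and GFS RH
    "base+BT+GFS": 1.946,  # plus both groups of predictors
}


def reduction(before, after):
    """Return the absolute (°C) and relative (%) RMSE reduction."""
    diff = before - after
    return round(diff, 3), round(100.0 * diff / before, 1)


gain_bt = reduction(table3_rmse["base"], table3_rmse["base+BT"])
gain_gfs = reduction(table3_rmse["base"], table3_rmse["base+GFS"])
gain_bt_given_gfs = reduction(table3_rmse["base+GFS"], table3_rmse["base+BT+GFS"])
gain_all = reduction(table3_rmse["base"], table3_rmse["base+BT+GFS"])

for label, (abs_gain, rel_gain) in [
    ("BTs added to baseline", gain_bt),
    ("GFS added to baseline", gain_gfs),
    ("BTs added on top of GFS", gain_bt_given_gfs),
    ("BTs and GFS added to baseline", gain_all),
]:
    print(f"{label}: {abs_gain:.3f} °C ({rel_gain:.1f}%)")
```

Read this way, the GFS predictors alone reduce the baseline RMSE more than the brightness temperatures alone, and the two groups together give the largest reduction.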

The relationship of the XGB model errors with altitude, observed Tair, and VZA was analyzed. Figure 11 shows scatter plots of the estimated Tair error against DEM, Tair, and VZA. It can be seen that the Tair error mainly ranges from −3°C to 3°C. The results showed positive deviation in high-altitude areas, which produced a larger RMSE than in low-altitude areas. The model showed a positive deviation when Tair was low, while exhibiting a

Figure 8: Spatial distribution of R² for six machine-learning models: (a) MLR-R²; (b) RF-R²; (c) KNN-R²; (d) DNN-R²; (e) GBTD-R²; (f) XGB-R². Maps cover 80°E–130°E and 20°N–50°N; color scale: 0.0–1.0.


Figure 9: Spatial distribution of bias for six machine-learning models: (a) MLR-Bias; (b) RF-Bias; (c) KNN-Bias; (d) DNN-Bias; (e) GBTD-Bias; (f) XGB-Bias. Maps cover 80°E–130°E and 20°N–50°N; color scale: temperature (°C), −3.0 to 3.0.

Figure 10: Time series (June to August; 1st, 10th, 20th, and 30th) of RMSE of estimated Tair for six machine-learning models (MLR, RF, KNN, DNN, GBTD, and XGB). Horizontal axis: date, from 2018-06-01 to 2018-08-30; vertical axis: RMSE between estimated Tair and observed Tair (°C), from 1.5 to 4.5.

Table 3: The contribution of AGRI BTs and GFS data to the XGB Tair estimation model.

Predictive factors                                                    XGB model RMSE (°C)
DEM, longitude, latitude, Julian day                                  3.003
DEM, longitude, latitude, Julian day, BT12, BT13                      2.376
DEM, longitude, latitude, Julian day, GFS PWV, GFS RH                 2.164
DEM, longitude, latitude, Julian day, BT12, BT13, GFS PWV, GFS RH     1.946


negative bias for high-air-temperature conditions. Therefore, the model showed a larger RMSE under both low- and high-air-temperature conditions, owing to overestimation and underestimation, respectively. This is similar to the results of previous studies [38]. Furthermore, the uneven distribution of stations makes the applicability of the model in high-altitude areas poor. It is worth mentioning that the effect of VZA on model performance is negligible, as shown in Figure 11(c).
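An error analysis of this kind is often summarized by binning the errors along the variable of interest. The sketch below groups errors by elevation and reports the count, bias, and RMSE per bin; the function name `binned_error_stats`, the 2000 m bin width, and the sample values are hypothetical and do not reproduce the data behind Figure 11. The error is taken as estimated minus observed Tair, so a positive bias means overestimation.

```python
from collections import defaultdict
from math import sqrt


def binned_error_stats(values, errors, bin_width):
    """Group errors into fixed-width bins of `values`.

    Returns {bin_start: (count, bias, rmse)} with bins in ascending order.
    """
    bins = defaultdict(list)
    for v, e in zip(values, errors):
        start = int(v // bin_width) * bin_width
        bins[start].append(e)
    stats = {}
    for start in sorted(bins):
        errs = bins[start]
        n = len(errs)
        stats[start] = (n, sum(errs) / n, sqrt(sum(e * e for e in errs) / n))
    return stats


# Hypothetical station elevations (m) and Tair errors (°C).
dem = [50, 300, 800, 1200, 1900, 2500, 3200, 4100]
err = [-0.5, 0.5, -1.0, 1.0, 1.5, 2.5, 2.0, 4.0]

stats = binned_error_stats(dem, err, bin_width=2000)
for start, (n, bias, rmse) in stats.items():
    print(f"{start}-{start + 2000} m: n={n}, bias={bias:.2f} °C, RMSE={rmse:.2f} °C")
```

The same helper can be applied to observed Tair or VZA by passing those values instead of elevation.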

4. Conclusions

In this study, six machine-learning approaches (MLR, RF, KNN, DNN, GBTD, and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed in terms of the spatial and temporal characteristics of their performance. The validation results highlighted the high potential of Tair estimation approaches using machine learning and showed that the accuracy of the XGB model was better than that of the MLR, RF, KNN, GBTD, and DNN models at most sites for Tair estimation over China. The validation was performed using spatially and temporally independent data, and hence the model performance was considered to be quite reliable.

This study improves on previous studies in the following key areas. First, Tair estimation models were constructed based on FY-4A AGRI data and other auxiliary data. The results showed that high-temporal- and high-spatial-resolution Tair values (RMSE < 2.0°C) can be obtained based on FY-4A data. According to the study of Vázquez et al. [67], the level of precision generally accepted as “accurate” for remote-sensing-based Tair estimation is between 1°C and 2°C. Second, the accuracy and performance of the six machine-learning models (MLR, RF, KNN, XGB, GBTD, and DNN) were compared and analyzed. The results showed that the XGB model can provide more stable and high-precision Tair estimation, which provides a reference for Tair estimation based on machine-learning models. Finally, the accuracy of Tair estimation based on satellite data can be effectively improved by adding a numerical model of Tair. The experimental results showed that when only satellite data were used for large-scale Tair estimation in China, the RMSE of the XGB model was 2.376°C, whereas the RMSE using satellite data combined with numerically modeled Tair data reached 1.946°C.

However, aside from the novelties of this study, a limitation of the dataset used is its restriction to clear-sky conditions. In addition, machine-learning algorithms cannot infer beyond the range of the observed Tair values. If Tair moves beyond the range observed within the current training period, the model must be retrained. Moreover, future research may explore whether adding other predictors, such as distance to coast and vegetation information (normalized difference vegetation index, etc.), can improve the accuracy of the Tair estimation models.
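The range limitation is easiest to see for neighbor- and tree-based regressors, whose predictions are averages of training targets. The following sketch uses a one-dimensional k-nearest-neighbor regressor as a stand-in; the function name `knn_predict`, the single predictor, and the toy data (which follow Tair = x − 270 exactly) are assumptions for illustration, not the models or data of this study.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points nearest to `query`."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical training set: a predictor (e.g., a brightness temperature in K)
# and Tair (°C) lying exactly on the line Tair = x - 270.
train_x = [270.0, 275.0, 280.0, 285.0, 290.0, 295.0, 300.0]
train_y = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]

inside = knn_predict(train_x, train_y, 285.0)   # query within the training range
outside = knn_predict(train_x, train_y, 320.0)  # query far beyond the training range

print(inside, outside, max(train_y))
```

For the query at 320, the underlying line would give 50°C, but the prediction cannot exceed the largest Tair seen in training, which is why retraining is needed once conditions leave the observed range.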

Data Availability

FY-4A AGRI data were downloaded from the China National Satellite Meteorological Center (NSMC) (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx). The GFS data were obtained from the National Centers for Environmental Prediction (https://www.nco.ncep.noaa.gov/pmb/products/gfs). The meteorological station data were accessed at the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank the National Satellite Meteorological Center (NSMC) for providing FY-4A data, the China Meteorological Data Service Center (CMDC) for the meteorological data, and the National Centers for Environmental Prediction (NCEP) for the GFS data.

References

[1] M. P. Cresswell, A. P. Morse, M. C. Thomson, and S. J. Connor, “Estimating surface air temperatures from Meteosat land surface temperatures using an empirical solar zenith angle model,” International Journal of Remote Sensing, vol. 20, no. 6, pp. 1125–1132, 1999.

Figure 11: Scatter plot of Tair error with (a) DEM, (b) observed Tair, and (c) VZA for the XGB model. Vertical axis: error (°C), from −15.0 to 15.0; horizontal axes: DEM (m), 0–5000; observed Tair (°C), −10 to 45; viewing zenith angle (°), 20–65; the color scale runs from 0 to 300.


[2] L. Prihodko and S. N. Goward, “Estimation of air temperature from remotely sensed surface observations,” Remote Sensing of Environment, vol. 60, no. 3, pp. 335–346, 1997.

[3] Z. S. Venter, O. Brousse, I. Esau, and F. Meier, “Hyperlocal mapping of urban air temperature using remote sensing and crowdsourced weather data,” Remote Sensing of Environment, vol. 242, no. 1, Article ID 111791, 2020.

[4] H. C. Ho, A. Knudby, Y. Xu, M. Hodul, and M. Aminipouri, “A comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (Humidex), for the greater Vancouver area,” Science of the Total Environment, vol. 544, pp. 929–938, 2016.

[5] T. R. Lookingbill and D. L. Urban, “Spatial estimation of air temperature differences for landscape-scale studies in montane environments,” Agricultural & Forest Meteorology, vol. 114, no. 3-4, pp. 141–151, 2003.

[6] J. W. Hurrell and K. E. Trenberth, “Satellite versus surface estimates of air temperature since 1979,” Journal of Climate, vol. 9, no. 9, pp. 2222–2232, 1996.

[7] H. Liu, Q. Zhou, S. Zhang, and X. Deng, “Estimation of summer air temperature over China using Himawari-8 AHI and numerical weather prediction data,” Advances in Meteorology, vol. 2019, pp. 1–10, Article ID 2385310, 2019.

[8] M. Konda, N. Imasato, and A. Shibata, “A new method to determine near-sea surface air temperature by using satellite data,” Journal of Geophysical Research: Oceans, vol. 101, no. C6, pp. 14349–14360, 1996.

[9] U. Marcel, “Comparison of satellite-derived land surface temperature and air temperature from meteorological stations on the pan-arctic scale,” Remote Sensing, vol. 5, no. 5, pp. 2348–2367, 2013.

[10] W. Wagner, V. Naeimi, K. Scipal, R. de Jeu, and J. Martínez-Fernández, “Soil moisture from operational meteorological satellites,” Hydrogeology Journal, vol. 15, no. 1, pp. 121–131, 2007.

[11] C. Oppenheimer, “Review article: volcanological applications of meteorological satellites,” International Journal of Remote Sensing, vol. 19, no. 15, pp. 2829–2864, 1998.

[12] H. W. Yates, J. D. Tarpley, S. R. Schneider, D. F. McGinnis, and R. A. Scofield, “The role of meteorological satellites in agricultural remote sensing,” Remote Sensing of Environment, vol. 14, no. 1–3, pp. 219–233, 1984.

[13] C. O. Justice, “The moderate resolution imaging spectroradiometer (MODIS): land remote sensing for global change research,” IEEE Transactions on Geoscience & Remote Sensing, vol. 36, no. 4, pp. 1228–1249, 2002.

[14] D. A. Chu, “Remote sensing of smoke from MODIS airborne simulator during the SCAR-B experiment,” Journal of Geophysical Research: Atmospheres, vol. 103, no. D24, Article ID 31979, 1998.

[15] L. Mohammadi, N. Molanian, and A. Heidari, “Determination of the best coverage area for receiver stations of LEO remote sensing satellites,” in Proceedings of the 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008.

[16] K. Yumimoto, T. M. Nagao, M. Kikuchi et al., “Aerosol data assimilation using data from Himawari-8, a next-generation geostationary meteorological satellite,” Geophysical Research Letters, vol. 43, no. 11, pp. 5886–5894, 2016.

[17] K. Bessho, K. Date, M. Hayashi et al., “An introduction to Himawari-8/9 – Japan's new-generation geostationary meteorological satellites,” Journal of the Meteorological Society of Japan. Ser. II, vol. 94, no. 2, pp. 151–183, 2016.

[18] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, “Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4,” Bulletin of the American Meteorological Society, vol. 98, no. 8, pp. 1637–1658, 2017.

[19] W. Hui, F. Huang, and R. Liu, “Characteristics of lightning signals over the Tibetan Plateau and the capability of FY-4A LMI lightning detection in the Plateau,” International Journal of Remote Sensing, vol. 41, no. 12, pp. 4605–4625, 2020.

[20] S. S. Chu, L. Zhu, H. F. Sun et al., “Automated volcanic hot-spot detection based on FY-4A/AGRI infrared data,” International Journal of Remote Sensing, vol. 41, no. 1, pp. 2410–2438, 2020.

[21] X. Zhang and W. Jiao, “Estimation of land surface temperature using geostationary meteorological satellite data,” Remote Sensing Technology & Application, vol. 28, no. 1, 2013.

[22] J. Xu, “Estimation of near-surface air temperature from HJ-1B satellite data in Northwest China,” Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, vol. 29, no. 22, pp. 145–153, 2013.

[23] X. Zhu, Q. Zhang, C.-Y. Xu, P. Sun, and P. Hu, “Reconstruction of high spatial resolution surface air temperature data across China: A new geo-intelligent multisource data-based machine learning technique,” Science of the Total Environment, vol. 665, pp. 300–313, 2019.

[24] J. Hrisko, P. Ramamurthy, Y. Yu, P. Yu, and D. Melecio-Vázquez, “Urban air temperature model using GOES-16 LST and a diurnal regressive neural network algorithm,” Remote Sensing of Environment, vol. 237, Article ID 111495, 2020.

[25] Q. Zhang, Y. Yu, W. Zhang, T. Luo, and X. Wang, “Cloud detection from FY-4A's geostationary interferometric infrared sounder using machine learning approaches,” Remote Sensing, vol. 11, no. 24, p. 3035, 2019.

[26] Y. Chen, G. Chen, C. Cui et al., “Retrieval of the vertical evolution of the cloud effective radius from the Chinese FY-4 (Feng Yun 4) next-generation geostationary satellites,” Atmospheric Chemistry and Physics, vol. 20, no. 2, pp. 1131–1145, 2020.


[54] M H D M Ribeiro and L dos Santos Coelho ldquoEnsembleapproach based on bagging boosting and stacking for short-term prediction in agribusiness time seriesrdquo Applied SoftComputing vol 86 Article ID 105837 2020

[55] I B Mustapha and F Saeed ldquoBioactive molecule predictionusing extreme gradient boostingrdquo Molecules vol 21 no 8p 983 2016

[56] C Li ldquoPower load forecasting based on the combined modelof LSTM and XGBoostrdquo in Proceedings of the InternationalConference on Pattern Recognition Wenzhou China June2019

[57] Z-Y Chen T-H Zhang R Zhang et al ldquoExtreme gradientboosting model to estimate PM25 concentrations withmissing-filled satellite data in Chinardquo Atmospheric Environ-ment vol 202 pp 180ndash189 2019

[58] S Zhao D Zeng W Wang et al ldquoMutation grey wolf elitePSO balanced XGBoost for radar emitter individual identi-fication based on measured signalsrdquo Measurement vol 159Article ID 107777 2020

[59] S J Lee M-H Ahn and Y Lee ldquoApplication of an artificialneural network for a direct estimation of atmospheric in-stability from a next-generation imagerrdquo Advances in At-mospheric Sciences vol 33 no 2 pp 221ndash232 2016

[60] J Tang C Deng G-B Huang and B Zhao ldquoCompressed-domain ship detection on spaceborne optical image usingdeep neural network and extreme learning machinerdquo IEEETransactions on Geoscience and Remote Sensing vol 53 no 3pp 1174ndash1185 2015

[61] G T Ribeiro V C Mariani and L Coelho ldquoEnhancedensemble structures using wavelet neural networks applied toshort-term load forecastingrdquo Engineering Applications ofArtificial Intelligence vol 82 pp 272ndash281 2019

[62] W Wang X Sun R Zhang Z Li Z Zhu and H Su ldquoMulti-layer perceptron neural network based algorithm for esti-mating precipitable water vapour from MODIS NIR datardquoInternational Journal of Remote Sensing vol 27 no 3pp 617ndash621 2006

[63] K I Chronopoulos I X Tsiros I F Dimopoulos andN Alvertos ldquoAn application of artificial neural networkmodels to estimate air temperature data in areas with sparse

Advances in Meteorology 13

network of meteorological stationsrdquo Journal of EnvironmentalScience and Health Part A vol 43 no 14 pp 1752ndash17572008

[64] S Rodrigues Moreno R Gomes da Silva V Cocco Marianiand L dos Santos Coelho ldquoMulti-step wind speed forecastingbased on hybrid multi-stage decomposition model and longshort-term memory neural networkrdquo Energy Conversion andManagement vol 213 Article ID 112869 2020

[65] A Bayat and S Mashhadizadeh Maleki ldquoComparison ofprecipitable water vapor derived from AIRS and SPM mea-surements and its correlation with surface temperature of 29synoptic stations over Iranrdquo Journal of Atmospheric and Solar-Terrestrial Physics vol 178 pp 24ndash31 2018

[66] W W Gong ldquoEvaluation of surface meteorological elementsfrom several numerical models in Chinardquo Climatic and En-vironmental Research vol 20 no 1 pp 53ndash62 2015

[67] D Pozo Vazquez F J Olmo Reyes and L Alados ArboledasldquoA comparative study of algorithms for estimating landsurface temperature from AVHRR Datardquo Remote Sensing ofEnvironment vol 62 no 3 pp 215ndash222 1997

14 Advances in Meteorology

Page 8: ComparisonofMachine-LearningAlgorithmsforNear-Surface ...downloads.hindawi.com/journals/amete/2020/8887364.pdfair) estimation from the new generation of Chinese geostationary meteorological

Figure 6: Two-dimensional histograms of estimated Tair versus station-observed Tair for the six machine-learning models (both axes run from −5°C to 40°C; the color scale runs from 0 to 400). (a) MLR model: RMSE = 2.840°C, bias = 0.055°C, R² = 0.791. (b) RF model: RMSE = 2.233°C, bias = −0.185°C, R² = 0.878. (c) KNN model: RMSE = 2.120°C, bias = −0.492°C, R² = 0.884. (d) DNN model: RMSE = 2.006°C, bias = 0.185°C, R² = 0.890. (e) GBTD model: RMSE = 1.984°C, bias = −0.122°C, R² = 0.898. (f) XGB model: RMSE = 1.946°C, bias = −0.087°C, R² = 0.902.
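The three statistics reported in each panel can be computed from paired estimates and station observations. The following is a minimal sketch, not the authors' code: the function name is hypothetical, bias is assumed to be the mean of (estimated − observed), and R² is assumed to be the coefficient of determination (the paper may instead use the squared correlation coefficient).

```python
import math


def regression_metrics(estimated, observed):
    """Return (rmse, bias, r2) for paired estimated/observed values."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Assumption: error is defined as estimated minus observed.
    errors = [e - o for e, o in zip(estimated, observed)]
    bias = sum(errors) / n
    ss_res = sum(err * err for err in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Assumption: R2 is the coefficient of determination.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return rmse, bias, r2


# Toy values in degrees Celsius, invented for illustration only.
observed = [10.0, 20.0, 30.0]
estimated = [11.0, 19.0, 32.0]
rmse, bias, r2 = regression_metrics(estimated, observed)
```

With this sign convention, a positive bias indicates that the model overestimates Tair on average.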

Figure 7: Spatial distribution of RMSE for six machine-learning models (maps covering 80°E–130°E and 20°N–50°N; color scale from 0.0°C to 4.0°C). (a) MLR-RMSE. (b) RF-RMSE. (c) KNN-RMSE. (d) DNN-RMSE. (e) GBTD-RMSE. (f) XGB-RMSE.

is, the more the radiation reaching the sensor is affected by the atmosphere, which may cause the differences in R² of the estimated Tair between the southwestern and central regions.

For the MLR model, the bias for all of China was large. For the RF and KNN models, a relatively high negative bias existed in southwestern China (e.g., the Yunnan-Guizhou Plateau), as shown in Figure 9. This may be because the relatively simple structure of the three models mentioned above cannot simulate the complex Tair changes in China well, resulting in underfitting. In addition, Tair was overestimated by the DNN model in northwestern China, which explains why the RMSE of the DNN model was also high in these regions. In contrast, the GBTD and XGB models had relatively low bias in northwestern China, where the absolute bias ranges from 2.0°C to 3.0°C. In conclusion, the bias is lower in the coastal areas and higher in the northwestern areas, which is mainly related to the characteristics of summer Tair change.

Figure 10 shows the time series of RMSE for the six models during the validation period. The RMSE of the MLR model was significantly higher than that of the other models, ranging from 2.5°C to 4.3°C. In contrast, the GBTD and XGB models showed a lower RMSE (i.e., 1.8°C–2.2°C) than the RF, KNN, and DNN models.
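A time series of RMSE like the one described here can be obtained by grouping the validation samples by date before computing the error statistic. The sketch below is illustrative only; the record layout (date string, estimated Tair, observed Tair) and the function name are assumptions, and the toy values are invented.

```python
import math
from collections import defaultdict


def rmse_by_date(records):
    """records: iterable of (date_string, estimated, observed) tuples."""
    squared_errors = defaultdict(list)
    for date, estimated, observed in records:
        squared_errors[date].append((estimated - observed) ** 2)
    # ISO-formatted date strings sort chronologically.
    return {
        date: math.sqrt(sum(values) / len(values))
        for date, values in sorted(squared_errors.items())
    }


# Toy records in degrees Celsius, invented for illustration only.
records = [
    ("2018-06-01", 21.0, 20.0),
    ("2018-06-01", 18.0, 19.0),
    ("2018-06-10", 25.0, 22.0),
    ("2018-06-10", 30.0, 26.0),
]
series = rmse_by_date(records)
```

Running the same grouping once per model gives one curve per model, which is the form of comparison the paragraph above describes.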

Based on the above analysis, the XGB model is expected to provide a more reliable and accurate Tair estimation than the other models. To evaluate the contribution of the predictive factors in the XGB model to Tair estimation, BT data (BT12 and BT13) and GFS data (GFS PWV and RH) were successively introduced (Table 3). As shown in Table 3, when only DEM, longitude, latitude, and Julian day were used as input variables, the RMSE of the XGB model was 3.003°C. The accuracy of Tair estimation was clearly improved when BT12 and BT13 were included in the model (RMSE of 2.376°C). Moreover, when GFS PWV and RH were added to the input variables instead, the RMSE of the XGB model decreased to 2.164°C, indicating the important influence of GFS PWV and RH on the Tair estimation. These results are understandable because PWV and RH are the main parameters needed for atmospheric correction and LST retrieval. When both AGRI BTs and GFS data were introduced as input variables, the RMSE of the XGB model improved by 0.218°C (from 2.164°C to 1.946°C) compared with the case in which only GFS data were introduced. This indicates that both GFS data and satellite observation data play an important role in improving the Tair estimation model. The RMSE of the Tair estimation model was less than 2.0°C when both satellite BTs and GFS data were introduced, which is considered to be the precision level of “accurate” [67].
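The size of each contribution follows directly from the RMSE values listed in Table 3. The short calculation below is a worked check of that arithmetic; the dictionary keys are shorthand labels invented here, and only the four RMSE values come from the table.

```python
# RMSE values (degrees Celsius) copied from Table 3.
# "base" stands for DEM, longitude, latitude, and Julian day (label invented here).
rmse = {
    "base": 3.003,
    "base+BT": 2.376,      # adds BT12 and BT13
    "base+GFS": 2.164,     # adds GFS PWV and GFS RH
    "base+BT+GFS": 1.946,  # adds both groups
}


def reduction(before, after):
    """RMSE reduction when moving from one predictor set to another."""
    return round(rmse[before] - rmse[after], 3)


gain_bt_alone = reduction("base", "base+BT")            # 0.627
gain_gfs_alone = reduction("base", "base+GFS")          # 0.839
gain_bt_given_gfs = reduction("base+GFS", "base+BT+GFS")  # 0.218
gain_gfs_given_bt = reduction("base+BT", "base+BT+GFS")   # 0.430
```

Each group of predictors lowers the RMSE less when the other group is already present, which suggests that the satellite and GFS predictors carry partly overlapping information.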

Figure 8: Spatial distribution of R² for six machine-learning models (maps covering 80°E–130°E and 20°N–50°N; color scale from 0.0 to 1.0). (a) MLR-R². (b) RF-R². (c) KNN-R². (d) DNN-R². (e) GBTD-R². (f) XGB-R².

Figure 9: Spatial distribution of bias for six machine-learning models (maps covering 80°E–130°E and 20°N–50°N; color scale from −3.0°C to 3.0°C). (a) MLR-Bias. (b) RF-Bias. (c) KNN-Bias. (d) DNN-Bias. (e) GBTD-Bias. (f) XGB-Bias.

Figure 10: Time series (June to August; 1st, 10th, 20th, and 30th of each month, 2018) of RMSE of estimated Tair for six machine-learning models (MLR, RF, KNN, DNN, GBTD, and XGB; RMSE axis from 1.5°C to 4.5°C).

Table 3: The contribution of AGRI BTs and GFS data to the XGB Tair estimation model.

Predictive factors | XGB model RMSE (°C)
DEM, longitude, latitude, Julian day | 3.003
DEM, longitude, latitude, Julian day, BT12, BT13 | 2.376
DEM, longitude, latitude, Julian day, GFS PWV, GFS RH | 2.164
DEM, longitude, latitude, Julian day, BT12, BT13, GFS PWV, GFS RH | 1.946

The relationship of the XGB model errors with altitude, observed Tair, and VZA was analyzed. Figure 11 shows scatter plots of the estimated Tair error against DEM, Tair, and VZA. The Tair error mainly ranges from −3°C to 3°C. The results showed a positive deviation in high-altitude areas, which produced a larger RMSE than in low-altitude areas. The model showed a positive deviation when Tair was low and a negative bias under high-air-temperature conditions. Therefore, the model showed a larger RMSE at both low and high air temperatures, because Tair was overestimated at low temperatures and underestimated at high temperatures. This is similar to the results of previous studies [38]. Furthermore, the uneven distribution of stations makes the model less applicable in high-altitude areas. It is worth mentioning that the effect of VZA on model performance is negligible, as shown in Figure 11(c).
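The dependence of the error on altitude, observed Tair, or VZA can be summarized by binning the samples on that variable and computing the bias and RMSE in each bin. The sketch below is a hypothetical illustration rather than the procedure used in the study; the function name, the 1000 m bin width, and the toy values are all assumptions.

```python
import math
from collections import defaultdict


def binned_error_stats(covariate, errors, bin_width):
    """Group errors by covariate bin; return {bin_start: (count, bias, rmse)}."""
    groups = defaultdict(list)
    for x, err in zip(covariate, errors):
        bin_start = math.floor(x / bin_width) * bin_width
        groups[bin_start].append(err)
    stats = {}
    for bin_start in sorted(groups):
        errs = groups[bin_start]
        n = len(errs)
        bias = sum(errs) / n
        rmse = math.sqrt(sum(e * e for e in errs) / n)
        stats[bin_start] = (n, bias, rmse)
    return stats


# Toy station elevations (m) and errors (estimated minus observed, degrees Celsius).
elevation_m = [120.0, 480.0, 2600.0, 2900.0, 4300.0]
error_c = [-0.5, 0.5, 1.0, 2.0, 3.0]
stats = binned_error_stats(elevation_m, error_c, bin_width=1000.0)
```

Reporting the sample count alongside each bin matters here, because bins with few stations (for example at high altitude) give unstable statistics.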

4. Conclusions

In this study, six machine-learning approaches (MLR, RF, KNN, DNN, GBTD, and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed in terms of the spatial and temporal characteristics of their performance. The validation results highlighted the high potential of machine-learning approaches for Tair estimation and showed that the XGB model was more accurate than the MLR, RF, KNN, GBTD, and DNN models at most sites over China. The validation was performed using spatially and temporally independent data, and hence the model performance was considered to be quite reliable.
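One way to obtain validation data that are independent in both space and time is to hold out a set of stations and a set of dates, and to test only on held-out stations at held-out dates. The sketch below illustrates that idea under stated assumptions; it is not the split used in the study, and the function name, the sample layout, the 20% station fraction, and the example dates are all hypothetical.

```python
import random


def independent_split(samples, test_station_fraction=0.2, test_dates=(), seed=0):
    """Split samples (dicts with 'station' and 'date' keys) into (train, test).

    Test samples come only from held-out stations on held-out dates.
    Training samples come only from the remaining stations and dates.
    """
    stations = sorted({s["station"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(stations)
    n_test = max(1, round(len(stations) * test_station_fraction))
    test_stations = set(stations[:n_test])
    held_out_dates = set(test_dates)

    train, test = [], []
    for sample in samples:
        station_held_out = sample["station"] in test_stations
        date_held_out = sample["date"] in held_out_dates
        if station_held_out and date_held_out:
            test.append(sample)
        elif not station_held_out and not date_held_out:
            train.append(sample)
        # Samples sharing only a station or only a date with the test set
        # are dropped so that neither dimension leaks into training.
    return train, test


# Toy samples: five stations, four dates (invented for illustration).
dates = ("2018-06-01", "2018-06-10", "2018-06-20", "2018-06-30")
samples = [{"station": s, "date": d} for s in "ABCDE" for d in dates]
train, test = independent_split(samples, test_dates=["2018-06-30"])
```

The cost of this strict separation is that some samples are discarded; a looser design would keep them at the risk of leaking station- or date-specific information.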

This study improves on previous studies in the following key areas. First, Tair estimation models were constructed based on FY-4A AGRI data and other auxiliary data. The results showed that high-temporal- and high-spatial-resolution Tair values (RMSE < 2.0°C) can be obtained based on FY-4A data. According to the study of Vázquez et al. [67], the level of precision generally accepted as “accurate” for remote-sensing-based Tair estimation is between 1°C and 2°C. Second, the accuracy and performance of the six machine-learning models (MLR, RF, KNN, XGB, GBTD, and DNN) were compared and analyzed. The results showed that the XGB model can provide more stable and more precise Tair estimation, which provides a reference for Tair estimation based on machine-learning models. Finally, the accuracy of Tair estimation based on satellite data can be effectively improved by adding numerical model data. The experimental results showed that when only satellite data were used for large-scale Tair estimation in China, the RMSE of the XGB model was 2.376°C, whereas the RMSE using satellite data combined with numerical model data (GFS PWV and RH; Table 3) reached 1.946°C.

However, aside from the novelties of this study, the dataset used is limited to clear-sky conditions. In addition, machine-learning algorithms cannot infer beyond the range of the observed Tair values. If Tair moves outside the range observed during the current training period, the model must be retrained. Moreover, future research may explore whether adding other predictors, such as distance to the coast and vegetation information (normalized difference vegetation index, etc.), can improve the accuracy of the Tair estimation models.
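The range limitation is easiest to see for tree-based models, whose predictions are piecewise constant. The sketch below fits a one-split regression tree (a "stump") in plain Python as a toy stand-in; it is not one of the six models used in the study, and the function name and data are invented for illustration.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing the squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


# Toy training data following y = 2x on the interval [0, 5].
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x for x in xs]
predict = fit_stump(xs, ys)

inside = predict(4.0)         # within the training range
far_outside = predict(100.0)  # the true value would be 200.0
```

For an input far outside the training interval, the stump returns the same leaf mean as for the largest training inputs, so its output never leaves the range of the training targets. Ensembles of such trees combine many piecewise-constant pieces and therefore cannot follow a trend beyond the data they were trained on, which is why retraining is needed when conditions move outside the observed range.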

Data Availability

FY-4A AGRI data were downloaded from the China National Satellite Meteorological Center (NSMC) (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx). The GFS data were obtained from the National Centers for Environmental Prediction (https://www.nco.ncep.noaa.gov/pmb/products/gfs/). The meteorological station data were accessed at the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank the National Satellite Meteorological Center (NSMC) for providing FY-4A data, the China Meteorological Data Service Center (CMDC) for the meteorological data, and the National Centers for Environmental Prediction (NCEP) for the GFS data.

Figure 11: Scatter plot of Tair error with (a) DEM (0–5000 m), (b) observed Tair (−10°C to 45°C), and (c) VZA (20°–65°) for the XGB model (error axis from −15.0°C to 15.0°C; color scale from 0 to 300).

References

[1] M. P. Cresswell, A. P. Morse, M. C. Thomson, and S. J. Connor, “Estimating surface air temperatures from Meteosat land surface temperatures using an empirical solar zenith angle model,” International Journal of Remote Sensing, vol. 20, no. 6, pp. 1125–1132, 1999.

[2] L. Prihodko and S. N. Goward, “Estimation of air temperature from remotely sensed surface observations,” Remote Sensing of Environment, vol. 60, no. 3, pp. 335–346, 1997.

[3] Z. S. Venter, O. Brousse, I. Esau, and F. Meier, “Hyperlocal mapping of urban air temperature using remote sensing and crowdsourced weather data,” Remote Sensing of Environment, vol. 242, no. 1, Article ID 111791, 2020.

[4] H. C. Ho, A. Knudby, Y. Xu, M. Hodul, and M. Aminipouri, “A comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (Humidex) for the greater Vancouver area,” Science of the Total Environment, vol. 544, pp. 929–938, 2016.

[5] T. R. Lookingbill and D. L. Urban, “Spatial estimation of air temperature differences for landscape-scale studies in montane environments,” Agricultural & Forest Meteorology, vol. 114, no. 3-4, pp. 141–151, 2003.

[6] J. W. Hurrell and K. E. Trenberth, “Satellite versus surface estimates of air temperature since 1979,” Journal of Climate, vol. 9, no. 9, pp. 2222–2232, 1996.

[7] H. Liu, Q. Zhou, S. Zhang, and X. Deng, “Estimation of summer air temperature over China using Himawari-8 AHI and numerical weather prediction data,” Advances in Meteorology, vol. 2019, pp. 1–10, Article ID 2385310, 2019.

[8] M. Konda, N. Imasato, and A. Shibata, “A new method to determine near-sea surface air temperature by using satellite data,” Journal of Geophysical Research: Oceans, vol. 101, no. C6, pp. 14349–14360, 1996.

[9] U. Marcel, “Comparison of satellite-derived land surface temperature and air temperature from meteorological stations on the pan-arctic scale,” Remote Sensing, vol. 5, no. 5, pp. 2348–2367, 2013.

[10] W. Wagner, V. Naeimi, K. Scipal, R. de Jeu, and J. Martínez-Fernández, “Soil moisture from operational meteorological satellites,” Hydrogeology Journal, vol. 15, no. 1, pp. 121–131, 2007.

[11] C. Oppenheimer, “Review article: volcanological applications of meteorological satellites,” International Journal of Remote Sensing, vol. 19, no. 15, pp. 2829–2864, 1998.

[12] H. W. Yates, J. D. Tarpley, S. R. Schneider, D. F. McGinnis, and R. A. Scofield, “The role of meteorological satellites in agricultural remote sensing,” Remote Sensing of Environment, vol. 14, no. 1–3, pp. 219–233, 1984.

[13] C. O. Justice, “The moderate resolution imaging spectroradiometer (MODIS): land remote sensing for global change research,” IEEE Transactions on Geoscience & Remote Sensing, vol. 36, no. 4, pp. 1228–1249, 2002.

[14] D. A. Chu, “Remote sensing of smoke from MODIS airborne simulator during the SCAR-B experiment,” Journal of Geophysical Research: Atmospheres, vol. 103, no. D24, Article ID 31979, 1998.

[15] L. Mohammadi, N. Molanian, and A. Heidari, “Determination of the best coverage area for receiver stations of LEO remote sensing satellites,” in Proceedings of the 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008.

[16] K. Yumimoto, T. M. Nagao, M. Kikuchi et al., “Aerosol data assimilation using data from Himawari-8, a next-generation geostationary meteorological satellite,” Geophysical Research Letters, vol. 43, no. 11, pp. 5886–5894, 2016.

[17] K. Bessho, K. Date, M. Hayashi et al., “An introduction to Himawari-8/9 – Japan’s new-generation geostationary meteorological satellites,” Journal of the Meteorological Society of Japan. Ser. II, vol. 94, no. 2, pp. 151–183, 2016.

[18] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, “Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4,” Bulletin of the American Meteorological Society, vol. 98, no. 8, pp. 1637–1658, 2017.

[19] W. Hui, F. Huang, and R. Liu, “Characteristics of lightning signals over the Tibetan Plateau and the capability of FY-4A LMI lightning detection in the Plateau,” International Journal of Remote Sensing, vol. 41, no. 12, pp. 4605–4625, 2020.

[20] S. S. Chu, L. Zhu, H. F. Sun et al., “Automated volcanic hot-spot detection based on FY-4A/AGRI infrared data,” International Journal of Remote Sensing, vol. 41, no. 1, pp. 2410–2438, 2020.

[21] X. Zhang and W. Jiao, “Estimation of land surface temperature using geostationary meteorological satellite data,” Remote Sensing Technology & Application, vol. 28, no. 1, 2013.

[22] J. Xu, “Estimation of near-surface air temperature from HJ-1B satellite data in Northwest China,” Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, vol. 29, no. 22, pp. 145–153, 2013.

[23] X. Zhu, Q. Zhang, C.-Y. Xu, P. Sun, and P. Hu, “Reconstruction of high spatial resolution surface air temperature data across China: A new geo-intelligent multisource data-based machine learning technique,” Science of the Total Environment, vol. 665, pp. 300–313, 2019.

[24] J. Hrisko, P. Ramamurthy, Y. Yu, P. Yu, and D. Melecio-Vázquez, “Urban air temperature model using GOES-16 LST and a diurnal regressive neural network algorithm,” Remote Sensing of Environment, vol. 237, Article ID 111495, 2020.

[25] Q. Zhang, Y. Yu, W. Zhang, T. Luo, and X. Wang, “Cloud detection from FY-4A’s geostationary interferometric infrared sounder using machine learning approaches,” Remote Sensing, vol. 11, no. 24, p. 3035, 2019.

[26] Y. Chen, G. Chen, C. Cui et al., “Retrieval of the vertical evolution of the cloud effective radius from the Chinese FY-4 (Feng Yun 4) next-generation geostationary satellites,” Atmospheric Chemistry and Physics, vol. 20, no. 2, pp. 1131–1145, 2020.

[27] I. Kloog, F. Nordio, B. A. Coull, and J. Schwartz, “Predicting spatiotemporal mean air temperature using MODIS satellite surface temperature measurements across the Northeastern USA,” Remote Sensing of Environment, vol. 150, pp. 132–139, 2014.

[28] N. Janatian, M. Sadeghi, S. Hossein Sanaeinejad et al., “A statistical framework for estimating air temperature using MODIS land surface temperature data,” International Journal of Climatology, vol. 37, no. 3, 2016.

[29] R. Huang, J.-x. Huang, C. Zhang et al., “Soil temperature estimation at different depths using remotely-sensed data,” Journal of Integrative Agriculture, vol. 19, no. 1, pp. 277–290, 2020.

[30] J. Y. Fang and K. Yoda, “Climate and vegetation in China (I). Changes in the altitudinal lapse rate of temperature and distribution of sea level temperature,” Ecological Research, vol. 3, no. 1, pp. 37–51, 1988.

[31] C. Du, R. Huazhong, Q. Qin, S. Zhao, and J. Meng, “Research of split-window algorithm for retrieval of land surface temperature from Landsat 8 data,” Journal of Geomatics, vol. 39, pp. 73–77, 2014.

[32] F. Chen, Y. Liu, Q. Liu, and F. Qin, “A statistical method based on remote sensing for the estimation of air temperature in China,” International Journal of Climatology, vol. 35, 2014.

[33] P. S. G. de Mattos Neto, G. D. C. Cavalcanti, P. R. A. Firmino, E. G. Silva, and S. R. P. Vila Nova Filho, “A temporal-window framework for modelling and forecasting time series,” Knowledge-Based Systems, vol. 193, Article ID 105476, 2020.

[34] R. Bycroft, J. X. Leon, and D. Schoeman, “Comparing random forests and convoluted neural networks for mapping ghost crab burrows using imagery from an unmanned aerial vehicle,” Estuarine, Coastal and Shelf Science, vol. 224, pp. 84–93, 2019.

[35] Y. Lee, D. Han, M.-H. Ahn, J. Im, and S. J. Lee, “Retrieval of total precipitable water from Himawari-8 AHI data: A comparison of random forest, extreme gradient boosting, and deep neural network,” Remote Sensing, vol. 11, no. 15, p. 1741, 2019.

[36] J. García-Gutiérrez, F. Martínez-Álvarez, A. Troncoso, and J. C. Riquelme, “A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables,” Neurocomputing, vol. 167, no. 1, pp. 24–31, 2015.

[37] D. Upreti, W. Huang, W. Kong et al., “A comparison of hybrid machine learning algorithms for the retrieval of wheat biophysical variables from Sentinel-2,” Remote Sensing, vol. 11, no. 5, p. 481, 2019.

[38] R. Li, L. Cui, H. Fu, Y. Meng, J. Li, and J. Guo, “Estimating high-resolution PM1 concentration from Himawari-8 combining extreme gradient boosting-geographically and temporally weighted regression (XGBoost-GTWR),” Atmospheric Environment, vol. 229, Article ID 117434, 2020.

[39] G. Papacharalampous and H. Tyralis, “Hydrological time series forecasting using simple combinations: Big data testing and investigations on one-year ahead river flow predictability,” Journal of Hydrology, vol. 590, Article ID 125205, 2020.

[40] R. Pérez-Chacón, G. Asencio-Cortés, F. Martínez-Álvarez, and A. Troncoso, “Big data time series forecasting based on pattern sequence similarity and its application to the electricity demand,” Information Sciences, vol. 540, pp. 160–174, 2020.

[41] H. Abdollahi, “A novel hybrid model for forecasting crude oil price based on time series decomposition,” Applied Energy, vol. 267, Article ID 115035, 2020.

[42] R. S. dos Santos, “Estimating spatio-temporal air temperature in London (UK) using machine learning and earth observation satellite data,” International Journal of Applied Earth Observation and Geoinformation, vol. 88, Article ID 102066, 2020.

[43] H. J. Richardson, D. J. Hill, D. R. Denesiuk, and L. H. Fraser, “A comparison of geographic datasets and field measurements to model soil carbon using random forests and stepwise regressions (British Columbia, Canada),” GIScience & Remote Sensing, vol. 54, no. 4, pp. 573–591, 2017.

[44] H. Franco-Lopez, A. R. Ek, and M. E. Bauer, “Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method,” Remote Sensing of Environment, vol. 77, no. 3, pp. 251–274, 2001.

[45] R. Haapanen, A. R. Ek, M. E. Bauer, and A. O. Finley, “Delineation of forest/nonforest land use classes using nearest neighbor methods,” Remote Sensing of Environment, vol. 89, no. 3, pp. 265–271, 2004.

[46] M. Belgiu and L. Drăguţ, “Random forest in remote sensing: a review of applications and future directions,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, no. 114, pp. 24–31, 2016.

[47] M. J. Cracknell and A. M. Reading, “The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines,” Geophysics, vol. 78, no. 3, 2013.

[48] X. Ye, X. Yang, X. Xiong, Y. Shen, M. Hao, and R. Gu, “A quality control method based on an improved random forest algorithm for surface air temperature observations,” Advances in Meteorology, vol. 2017, pp. 1–15, Article ID 8601296, 2017.

[49] B. Babar, L. T. Luppino, T. Boström, and S. N. Anfinsen, “Random forest regression for improved mapping of solar irradiance at high latitudes,” Solar Energy, vol. 198, pp. 81–92, 2020.

[50] L. V. Utkin, M. S. Kovalev, and F. P. A. Coolen, “Imprecise weighted extensions of random forests for classification and regression,” Applied Soft Computing, vol. 92, Article ID 106324, 2020.

[51] L. Liu, M. Ji, and M. Buchroithner, “Combining partial least squares and the gradient-boosting method for soil property retrieval using visible near-infrared shortwave infrared spectra,” Remote Sensing, vol. 9, no. 12, p. 1299, 2017.

[52] J. Son, I. Jung, K. Park, and B. Han, “Tracking-by-Segmentation with online gradient boosting decision tree,” in Proceedings of the International Conference on Computer Vision, Las Condes, Chile, December 2015.

[53] H. Mo, H. Sun, J. Liu, and S. Wei, “Developing window behavior models for residential buildings using XGBoost algorithm,” Energy and Buildings, vol. 205, no. 15, Article ID 109564, 2019.

[54] M. H. D. M. Ribeiro and L. dos Santos Coelho, “Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series,” Applied Soft Computing, vol. 86, Article ID 105837, 2020.

[55] I. B. Mustapha and F. Saeed, “Bioactive molecule prediction using extreme gradient boosting,” Molecules, vol. 21, no. 8, p. 983, 2016.

[56] C. Li, “Power load forecasting based on the combined model of LSTM and XGBoost,” in Proceedings of the International Conference on Pattern Recognition, Wenzhou, China, June 2019.

[57] Z.-Y. Chen, T.-H. Zhang, R. Zhang et al., “Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China,” Atmospheric Environment, vol. 202, pp. 180–189, 2019.

[58] S. Zhao, D. Zeng, W. Wang et al., “Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals,” Measurement, vol. 159, Article ID 107777, 2020.

[59] S. J. Lee, M.-H. Ahn, and Y. Lee, “Application of an artificial neural network for a direct estimation of atmospheric instability from a next-generation imager,” Advances in Atmospheric Sciences, vol. 33, no. 2, pp. 221–232, 2016.

[60] J. Tang, C. Deng, G.-B. Huang, and B. Zhao, “Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1174–1185, 2015.

[61] G. T. Ribeiro, V. C. Mariani, and L. Coelho, “Enhanced ensemble structures using wavelet neural networks applied to short-term load forecasting,” Engineering Applications of Artificial Intelligence, vol. 82, pp. 272–281, 2019.

[62] W. Wang, X. Sun, R. Zhang, Z. Li, Z. Zhu, and H. Su, “Multi-layer perceptron neural network based algorithm for estimating precipitable water vapour from MODIS NIR data,” International Journal of Remote Sensing, vol. 27, no. 3, pp. 617–621, 2006.

[63] K. I. Chronopoulos, I. X. Tsiros, I. F. Dimopoulos, and N. Alvertos, “An application of artificial neural network models to estimate air temperature data in areas with sparse network of meteorological stations,” Journal of Environmental Science and Health, Part A, vol. 43, no. 14, pp. 1752–1757, 2008.

[64] S Rodrigues Moreno R Gomes da Silva V Cocco Marianiand L dos Santos Coelho ldquoMulti-step wind speed forecastingbased on hybrid multi-stage decomposition model and longshort-term memory neural networkrdquo Energy Conversion andManagement vol 213 Article ID 112869 2020

[65] A Bayat and S Mashhadizadeh Maleki ldquoComparison ofprecipitable water vapor derived from AIRS and SPM mea-surements and its correlation with surface temperature of 29synoptic stations over Iranrdquo Journal of Atmospheric and Solar-Terrestrial Physics vol 178 pp 24ndash31 2018

[66] W W Gong ldquoEvaluation of surface meteorological elementsfrom several numerical models in Chinardquo Climatic and En-vironmental Research vol 20 no 1 pp 53ndash62 2015

[67] D Pozo Vazquez F J Olmo Reyes and L Alados ArboledasldquoA comparative study of algorithms for estimating landsurface temperature from AVHRR Datardquo Remote Sensing ofEnvironment vol 62 no 3 pp 215ndash222 1997

14 Advances in Meteorology

Page 9: ComparisonofMachine-LearningAlgorithmsforNear-Surface ...downloads.hindawi.com/journals/amete/2020/8887364.pdfair) estimation from the new generation of Chinese geostationary meteorological

is, the more the radiation reaching the sensor will be affected by the atmosphere, which may cause the differences in R² of the estimated Tair between the southwestern and central regions.

For the MLR model, the bias over all of China was large. For the RF and KNN models, a relatively high negative bias existed in southwestern China (e.g., the Yunnan-Guizhou Plateau), as shown in Figure 9. This may be because the relatively simple structure of these three models cannot adequately simulate the complex Tair changes in China, resulting in underfitting. In addition, the DNN model overestimated Tair in northwestern China, which explains why the RMSE of the DNN model was also high in these regions. In contrast, the GBTD and XGB models had relatively low bias in northwestern China, where the absolute bias ranged from 2.0°C to 3.0°C. In conclusion, the bias is lower in the coastal areas and higher in the northwestern areas, which is mainly related to the characteristics of summer Tair changes.

Figure 10 shows the time series of RMSE for the six models during the validation period. The RMSE of the MLR model was significantly higher than that of the other models, ranging from 2.5°C to 4.3°C. In contrast, the GBTD and XGB models showed a lower RMSE (i.e., 1.8°C–2.2°C) than the RF, KNN, and DNN models.
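The spatial maps (Figures 8 and 9) and the time series (Figure 10) rest on the same per-group statistics. A minimal Python sketch of that computation follows. It assumes that bias is the mean of estimated minus observed Tair and that R² is the squared Pearson correlation; the record layout and the name `grouped_metrics` are illustrative and not taken from the study.

```python
import math
from collections import defaultdict

def grouped_metrics(records, key_index):
    """Bias, RMSE, and R2 of estimated vs. observed Tair per group.

    records: iterable of (station_id, date, observed, estimated) tuples
             (hypothetical layout). key_index=0 groups by station,
             key_index=1 groups by date.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append((rec[2], rec[3]))

    results = {}
    for key, pairs in groups.items():
        n = len(pairs)
        errors = [est - obs for obs, est in pairs]  # assumed sign convention
        bias = sum(errors) / n
        rmse = math.sqrt(sum(e * e for e in errors) / n)
        mean_obs = sum(o for o, _ in pairs) / n
        mean_est = sum(e for _, e in pairs) / n
        cov = sum((o - mean_obs) * (e - mean_est) for o, e in pairs)
        var_obs = sum((o - mean_obs) ** 2 for o, _ in pairs)
        var_est = sum((e - mean_est) ** 2 for _, e in pairs)
        # R2 is undefined when either series is constant within the group.
        if var_obs > 0 and var_est > 0:
            r2 = cov * cov / (var_obs * var_est)
        else:
            r2 = float("nan")
        results[key] = {"n": n, "bias": bias, "rmse": rmse, "r2": r2}
    return results

# Toy records for illustration only, not data from the study.
toy = [
    ("S1", "2018-06-01", 20.0, 21.0),
    ("S1", "2018-06-10", 25.0, 26.0),
    ("S2", "2018-06-01", 10.0, 8.0),
    ("S2", "2018-06-10", 14.0, 12.0),
]
per_station = grouped_metrics(toy, key_index=0)  # one entry per station (maps)
per_date = grouped_metrics(toy, key_index=1)     # one entry per date (time series)
```

Grouping by station gives the values that would be plotted on a map, while grouping by date gives one point per model on a time series.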

Based on the above analysis, the XGB model is expected to provide a more reliable and accurate Tair estimation than the other models. To evaluate the contribution of the predictive factors in the XGB model to Tair estimation, BT data (BT12 and BT13) and GFS data (GFS PWV and RH) were successively introduced (Table 3). As shown in Table 3, when only DEM, longitude, latitude, and Julian day were used as input variables, the RMSE of the XGB model was 3.003°C. The accuracy of Tair estimation improved markedly (RMSE of 2.376°C) when BT12 and BT13 were included in the model. Moreover, when GFS PWV and RH (rather than the BTs) were added to the base input variables, the RMSE of the XGB model decreased to 2.164°C, indicating an important influence of GFS PWV and RH on the Tair estimation. These results are understandable because PWV and RH are the main parameters needed for atmospheric correction and LST retrieval. When both AGRI BTs and GFS data were introduced to the input variables, the RMSE of the XGB model improved by 0.218°C (from 2.164°C to 1.946°C) compared with using only GFS data. This indicates that both GFS data and satellite observations play an important role in improving the Tair estimation model. The RMSE of the Tair estimation model was less than 2.0°C when both satellite BTs and GFS data were introduced, which is considered to be the precision level of “accurate” [67].
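The contribution of each predictor group can be read directly off the RMSE values in Table 3. The short Python sketch below only performs that arithmetic; the dictionary keys (`base`, `base+BT`, and so on) are illustrative labels, not names used in the study.

```python
# RMSE values (degC) for the XGB model, as reported in Table 3.
RMSE = {
    "base": 3.003,         # DEM, longitude, latitude, Julian day
    "base+BT": 2.376,      # base inputs plus BT12, BT13
    "base+GFS": 2.164,     # base inputs plus GFS PWV, GFS RH
    "base+BT+GFS": 1.946,  # base inputs plus both groups
}

def rmse_drop(before, after):
    """Absolute RMSE reduction (degC) when moving from one input set to another."""
    return round(RMSE[before] - RMSE[after], 3)

def relative_drop(before, after):
    """RMSE reduction as a percentage of the starting RMSE."""
    return round(100.0 * (RMSE[before] - RMSE[after]) / RMSE[before], 1)

for before, after in [
    ("base", "base+BT"),
    ("base", "base+GFS"),
    ("base+GFS", "base+BT+GFS"),
    ("base+BT", "base+BT+GFS"),
    ("base", "base+BT+GFS"),
]:
    print(f"{before} -> {after}: {rmse_drop(before, after)} degC "
          f"({relative_drop(before, after)} %)")
```

On these numbers, adding the GFS variables to the base inputs lowers the RMSE more than adding the BTs does, and combining both groups gives the lowest RMSE.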

Figure 8: Spatial distribution of R² for the six machine-learning models: (a) MLR, (b) RF, (c) KNN, (d) DNN, (e) GBTD, and (f) XGB. [Maps cover 80°E–130°E and 20°N–50°N; color scale from 0.0 to 1.0.]

Figure 9: Spatial distribution of bias for the six machine-learning models: (a) MLR, (b) RF, (c) KNN, (d) DNN, (e) GBTD, and (f) XGB. [Maps cover 80°E–130°E and 20°N–50°N; color scale: temperature from −3.0°C to 3.0°C.]

Figure 10: Time series (1st, 10th, 20th, and 30th of June to August 2018) of the RMSE between estimated and observed Tair for the six machine-learning models. [x-axis: date; y-axis: RMSE (°C), 1.5 to 4.5.]

Table 3: The contribution of AGRI BTs and GFS data to the XGB Tair estimation model.

Predictive factors | XGB model RMSE (°C)
DEM, longitude, latitude, Julian day | 3.003
DEM, longitude, latitude, Julian day, BT12, BT13 | 2.376
DEM, longitude, latitude, Julian day, GFS PWV, GFS RH | 2.164
DEM, longitude, latitude, Julian day, BT12, BT13, GFS PWV, GFS RH | 1.946

The relationship of the XGB model errors with altitude, observed Tair, and VZA was analyzed. Figure 11 shows scatter plots of the estimated Tair error against DEM, observed Tair, and VZA. The Tair error mainly ranges from −3°C to 3°C. The results showed a positive deviation in high-altitude areas, which produced a larger RMSE than in low-altitude areas. The model showed a positive deviation when Tair was low, while exhibiting a negative bias under high-air-temperature conditions. Therefore, the model showed a larger RMSE under the lower- and higher-air-temperature conditions due to overestimation and underestimation, respectively. This is similar to the results of previous studies [38]. Furthermore, the uneven distribution of stations makes the applicability of the model in high-altitude areas poor. It is worth mentioning that the effect of VZA on model performance is negligible, as shown in Figure 11(c).
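The dependence of the error on altitude, observed Tair, or VZA can also be summarized numerically by binning the error along one of these variables. The Python sketch below is one way to do this, assuming the error is defined as estimated minus observed Tair; the bin edges and toy values are hypothetical.

```python
import math

def binned_error_stats(values, errors, edges):
    """Count, bias, and RMSE of `errors` in half-open bins [edges[i], edges[i+1]).

    values: covariate per sample (e.g. DEM in m, observed Tair in degC, VZA in deg)
    errors: estimated minus observed Tair per sample (assumed convention)
    """
    stats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [e for v, e in zip(values, errors) if lo <= v < hi]
        if not in_bin:
            # Empty bins are reported explicitly rather than skipped.
            stats.append({"lo": lo, "hi": hi, "n": 0, "bias": None, "rmse": None})
            continue
        n = len(in_bin)
        bias = sum(in_bin) / n
        rmse = math.sqrt(sum(e * e for e in in_bin) / n)
        stats.append({"lo": lo, "hi": hi, "n": n, "bias": bias, "rmse": rmse})
    return stats

# Toy altitude (m) and error (degC) values for illustration only.
dem = [100.0, 200.0, 3000.0, 4000.0]
err = [-1.0, 1.0, 2.0, 4.0]
by_altitude = binned_error_stats(dem, err, edges=[0.0, 1000.0, 5000.0])
```

A positive per-bin bias at high altitude or low Tair would correspond to the positive deviations described above, and bins with very few samples reflect the uneven station distribution.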

4. Conclusions

In this study, six machine-learning approaches (MLR, RF, KNN, DNN, GBTD, and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed in terms of the spatial and temporal characteristics of their performance. The validation results highlighted the high potential of machine-learning approaches for Tair estimation and showed that the XGB model was more accurate than the MLR, RF, KNN, GBTD, and DNN models at most sites over China. The validation was performed using spatially and temporally independent data; hence, the model performance was considered to be quite reliable.
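A validation set that is independent in both space and time can be built by holding out whole stations and whole dates. The Python sketch below shows one such split. It is an assumption about how this could be done, not the procedure used in the study; the record fields, fractions, and function name are hypothetical.

```python
import random

def independent_split(records, station_frac=0.2, date_frac=0.2, seed=0):
    """Split records so that validation shares no station and no date with training.

    records: list of dicts with at least "station" and "date" keys (hypothetical).
    Records that match only one of the two held-out sets are discarded.
    """
    rng = random.Random(seed)
    stations = sorted({r["station"] for r in records})
    dates = sorted({r["date"] for r in records})
    n_stations = max(1, round(len(stations) * station_frac))
    n_dates = max(1, round(len(dates) * date_frac))
    held_stations = set(rng.sample(stations, n_stations))
    held_dates = set(rng.sample(dates, n_dates))

    train = [r for r in records
             if r["station"] not in held_stations and r["date"] not in held_dates]
    valid = [r for r in records
             if r["station"] in held_stations and r["date"] in held_dates]
    return train, valid

# Toy grid of 5 stations x 5 dates for illustration only.
toy_records = [{"station": f"S{i}", "date": f"2018-06-{d:02d}"}
               for i in range(5) for d in (1, 10, 20, 30, 31)]
train, valid = independent_split(toy_records)
```

The design trades sample size for independence: records from a held-out station on a training date (or the reverse) are dropped so that neither dimension leaks into the validation set.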

This study improves on previous studies in the following key areas. First, Tair estimation models were constructed based on FY-4A AGRI data and other auxiliary data. The results showed that Tair values with high temporal and spatial resolution (RMSE < 2.0°C) can be obtained from FY-4A data. According to the study of Vazquez et al. [67], the level of precision generally accepted as “accurate” for remote-sensing-based Tair estimation is between 1°C and 2°C. Second, the accuracy and performance of the six machine-learning models (MLR, RF, KNN, XGB, GBTD, and DNN) were compared and analyzed. The results showed that the XGB model can provide more stable and higher-precision Tair estimation, which provides a reference for Tair estimation based on machine-learning models. Finally, the accuracy of Tair estimation based on satellite data can be effectively improved by adding numerical model data. The experimental results showed that when only satellite data were used for large-scale Tair estimation in China, the RMSE of the XGB model was 2.376°C, whereas the RMSE using satellite data combined with numerical model (GFS) data reached 1.946°C.

However, aside from the novelties of this study, one limitation is that the dataset used is restricted to clear-sky conditions. In addition, machine-learning algorithms cannot extrapolate beyond the range of the observed Tair values. If Tair moves beyond the range observed within the current training period, the model must be retrained. Moreover, future research may explore whether adding other predictors, such as distance to the coast and vegetation information (normalized difference vegetation index, etc.), can improve the accuracy of the Tair estimation models.
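The extrapolation limit is easy to see for tree-based models, whose predictions are piecewise constant. The Python sketch below fits a single one-split regression tree (a stump) to toy data as an illustration. It is not the XGB model of the study, and the function names are hypothetical. Each leaf predicts the mean of its training targets, so the prediction stays inside the training target range however far the input moves; ensembles of such trees are likewise constant outside the training feature range.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing the squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def outside_training_range(value, training_values):
    """True if `value` lies outside the range seen during training."""
    return value < min(training_values) or value > max(training_values)

# Toy linear relationship y = 10 * x, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 10.0, 20.0, 30.0]
predict = fit_stump(xs, ys)
far_prediction = predict(100.0)  # the true relationship would give 1000.0
needs_retraining = outside_training_range(100.0, xs)
```

A range check of this kind on incoming predictors or observed Tair is one simple way to flag when retraining is needed.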

Data Availability

FY-4A AGRI data were downloaded from the China National Satellite Meteorological Center (NSMC) (http://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx). The GFS data were obtained from the National Centers for Environmental Prediction (https://www.nco.ncep.noaa.gov/pmb/products/gfs/). The meteorological station data were accessed at the China Meteorological Data Service Center (CMDC) (http://data.cma.cn).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank the National Satellite Meteorological Center (NSMC) for providing FY-4A data, the China Meteorological Data Service Center (CMDC) for the meteorological data, and the National Centers for Environmental Prediction (NCEP) for the GFS data.

Figure 11: Scatter plots of Tair error versus (a) DEM, (b) observed Tair, and (c) VZA for the XGB model. [y-axis: error (°C), −15.0 to 15.0; x-axes: DEM (m), 0 to 5000; observed Tair (°C), −10 to 45; viewing zenith angle (°), 20 to 65; color scale from 0 to 300.]

References

[1] M. P. Cresswell, A. P. Morse, M. C. Thomson, and S. J. Connor, “Estimating surface air temperatures from Meteosat land surface temperatures using an empirical solar zenith angle model,” International Journal of Remote Sensing, vol. 20, no. 6, pp. 1125–1132, 1999.

[2] L. Prihodko and S. N. Goward, “Estimation of air temperature from remotely sensed surface observations,” Remote Sensing of Environment, vol. 60, no. 3, pp. 335–346, 1997.

[3] Z. S. Venter, O. Brousse, I. Esau, and F. Meier, “Hyperlocal mapping of urban air temperature using remote sensing and crowdsourced weather data,” Remote Sensing of Environment, vol. 242, no. 1, Article ID 111791, 2020.

[4] H. C. Ho, A. Knudby, Y. Xu, M. Hodul, and M. Aminipouri, “A comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (Humidex), for the greater Vancouver area,” Science of the Total Environment, vol. 544, pp. 929–938, 2016.

[5] T. R. Lookingbill and D. L. Urban, “Spatial estimation of air temperature differences for landscape-scale studies in montane environments,” Agricultural & Forest Meteorology, vol. 114, no. 3-4, pp. 141–151, 2003.

[6] J. W. Hurrell and K. E. Trenberth, “Satellite versus surface estimates of air temperature since 1979,” Journal of Climate, vol. 9, no. 9, pp. 2222–2232, 1996.

[7] H. Liu, Q. Zhou, S. Zhang, and X. Deng, “Estimation of summer air temperature over China using Himawari-8 AHI and numerical weather prediction data,” Advances in Meteorology, vol. 2019, pp. 1–10, Article ID 2385310, 2019.

[8] M. Konda, N. Imasato, and A. Shibata, “A new method to determine near-sea surface air temperature by using satellite data,” Journal of Geophysical Research: Oceans, vol. 101, no. C6, pp. 14349–14360, 1996.

[9] U. Marcel, “Comparison of satellite-derived land surface temperature and air temperature from meteorological stations on the pan-arctic scale,” Remote Sensing, vol. 5, no. 5, pp. 2348–2367, 2013.

[10] W. Wagner, V. Naeimi, K. Scipal, R. de Jeu, and J. Martínez-Fernández, “Soil moisture from operational meteorological satellites,” Hydrogeology Journal, vol. 15, no. 1, pp. 121–131, 2007.

[11] C. Oppenheimer, “Review article: volcanological applications of meteorological satellites,” International Journal of Remote Sensing, vol. 19, no. 15, pp. 2829–2864, 1998.

[12] H. W. Yates, J. D. Tarpley, S. R. Schneider, D. F. McGinnis, and R. A. Scofield, “The role of meteorological satellites in agricultural remote sensing,” Remote Sensing of Environment, vol. 14, no. 1–3, pp. 219–233, 1984.

[13] C. O. Justice, “The moderate resolution imaging spectroradiometer (MODIS): land remote sensing for global change research,” IEEE Transactions on Geoscience & Remote Sensing, vol. 36, no. 4, pp. 1228–1249, 2002.

[14] D. A. Chu, “Remote sensing of smoke from MODIS airborne simulator during the SCAR-B experiment,” Journal of Geophysical Research: Atmospheres, vol. 103, no. D24, Article ID 31979, 1998.

[15] L. Mohammadi, N. Molanian, and A. Heidari, “Determination of the best coverage area for receiver stations of LEO remote sensing satellites,” in Proceedings of the 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008.

[16] K. Yumimoto, T. M. Nagao, M. Kikuchi et al., “Aerosol data assimilation using data from Himawari-8, a next-generation geostationary meteorological satellite,” Geophysical Research Letters, vol. 43, no. 11, pp. 5886–5894, 2016.

[17] K. Bessho, K. Date, M. Hayashi et al., “An introduction to Himawari-8/9 – Japan’s new-generation geostationary meteorological satellites,” Journal of the Meteorological Society of Japan, Ser. II, vol. 94, no. 2, pp. 151–183, 2016.

[18] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, “Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4,” Bulletin of the American Meteorological Society, vol. 98, no. 8, pp. 1637–1658, 2017.

[19] W. Hui, F. Huang, and R. Liu, “Characteristics of lightning signals over the Tibetan Plateau and the capability of FY-4A LMI lightning detection in the Plateau,” International Journal of Remote Sensing, vol. 41, no. 12, pp. 4605–4625, 2020.

[20] S. S. Chu, L. Zhu, H. F. Sun et al., “Automated volcanic hot-spot detection based on FY-4A/AGRI infrared data,” International Journal of Remote Sensing, vol. 41, no. 1, pp. 2410–2438, 2020.

[21] X. Zhang and W. Jiao, “Estimation of land surface temperature using geostationary meteorological satellite data,” Remote Sensing Technology & Application, vol. 28, no. 1, 2013.

[22] J. Xu, “Estimation of near-surface air temperature from HJ-1B satellite data in Northwest China,” Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, vol. 29, no. 22, pp. 145–153, 2013.

[23] X. Zhu, Q. Zhang, C.-Y. Xu, P. Sun, and P. Hu, “Reconstruction of high spatial resolution surface air temperature data across China: A new geo-intelligent multisource data-based machine learning technique,” Science of the Total Environment, vol. 665, pp. 300–313, 2019.

[24] J. Hrisko, P. Ramamurthy, Y. Yu, P. Yu, and D. Melecio-Vazquez, “Urban air temperature model using GOES-16 LST and a diurnal regressive neural network algorithm,” Remote Sensing of Environment, vol. 237, Article ID 111495, 2020.

[25] Q. Zhang, Y. Yu, W. Zhang, T. Luo, and X. Wang, “Cloud detection from FY-4A’s geostationary interferometric infrared sounder using machine learning approaches,” Remote Sensing, vol. 11, no. 24, p. 3035, 2019.

[26] Y. Chen, G. Chen, C. Cui et al., “Retrieval of the vertical evolution of the cloud effective radius from the Chinese FY-4 (Feng Yun 4) next-generation geostationary satellites,” Atmospheric Chemistry and Physics, vol. 20, no. 2, pp. 1131–1145, 2020.

[27] I. Kloog, F. Nordio, B. A. Coull, and J. Schwartz, “Predicting spatiotemporal mean air temperature using MODIS satellite surface temperature measurements across the Northeastern USA,” Remote Sensing of Environment, vol. 150, pp. 132–139, 2014.

[28] N. Janatian, M. Sadeghi, S. Hossein Sanaeinejad et al., “A statistical framework for estimating air temperature using MODIS land surface temperature data,” International Journal of Climatology, vol. 37, no. 3, 2016.

[29] R. Huang, J.-x. Huang, C. Zhang et al., “Soil temperature estimation at different depths using remotely-sensed data,” Journal of Integrative Agriculture, vol. 19, no. 1, pp. 277–290, 2020.

[30] J. Y. Fang and K. Yoda, “Climate and vegetation in China (I). Changes in the altitudinal lapse rate of temperature and distribution of sea level temperature,” Ecological Research, vol. 3, no. 1, pp. 37–51, 1988.

[31] C. Du, R. Huazhong, Q. Qin, S. Zhao, and J. Meng, “Research of split-window algorithm for retrieval of land surface temperature from Landsat 8 data,” Journal of Geomatics, vol. 39, pp. 73–77, 2014.

[32] F. Chen, Y. Liu, Q. Liu, and F. Qin, “A statistical method based on remote sensing for the estimation of air temperature in China,” International Journal of Climatology, vol. 35, 2014.

[33] P. S. G. de Mattos Neto, G. D. C. Cavalcanti, P. R. A. Firmino, E. G. Silva, and S. R. P. Vila Nova Filho, “A temporal-window framework for modelling and forecasting time series,” Knowledge-Based Systems, vol. 193, Article ID 105476, 2020.

[34] R. Bycroft, J. X. Leon, and D. Schoeman, “Comparing random forests and convoluted neural networks for mapping ghost crab burrows using imagery from an unmanned aerial vehicle,” Estuarine, Coastal and Shelf Science, vol. 224, pp. 84–93, 2019.

[35] Y. Lee, D. Han, M.-H. Ahn, J. Im, and S. J. Lee, “Retrieval of total precipitable water from Himawari-8 AHI data: A comparison of random forest, extreme gradient boosting, and deep neural network,” Remote Sensing, vol. 11, no. 15, p. 1741, 2019.

[36] J. García-Gutiérrez, F. Martínez-Álvarez, A. Troncoso, and J. C. Riquelme, “A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables,” Neurocomputing, vol. 167, no. 1, pp. 24–31, 2015.

[37] D. Upreti, W. Huang, W. Kong et al., “A comparison of hybrid machine learning algorithms for the retrieval of wheat biophysical variables from Sentinel-2,” Remote Sensing, vol. 11, no. 5, p. 481, 2019.

[38] R. Li, L. Cui, H. Fu, Y. Meng, J. Li, and J. Guo, “Estimating high-resolution PM1 concentration from Himawari-8 combining extreme gradient boosting-geographically and temporally weighted regression (XGBoost-GTWR),” Atmospheric Environment, vol. 229, Article ID 117434, 2020.

[39] G. Papacharalampous and H. Tyralis, “Hydrological time series forecasting using simple combinations: Big data testing and investigations on one-year ahead river flow predictability,” Journal of Hydrology, vol. 590, Article ID 125205, 2020.

[40] R. Pérez-Chacón, G. Asencio-Cortés, F. Martínez-Álvarez, and A. Troncoso, “Big data time series forecasting based on pattern sequence similarity and its application to the electricity demand,” Information Sciences, vol. 540, pp. 160–174, 2020.

[41] H. Abdollahi, “A novel hybrid model for forecasting crude oil price based on time series decomposition,” Applied Energy, vol. 267, Article ID 115035, 2020.

[42] R. S. dos Santos, “Estimating spatio-temporal air temperature in London (UK) using machine learning and earth observation satellite data,” International Journal of Applied Earth Observation and Geoinformation, vol. 88, Article ID 102066, 2020.

[43] H. J. Richardson, D. J. Hill, D. R. Denesiuk, and L. H. Fraser, “A comparison of geographic datasets and field measurements to model soil carbon using random forests and stepwise regressions (British Columbia, Canada),” GIScience & Remote Sensing, vol. 54, no. 4, pp. 573–591, 2017.

[44] H. Franco-Lopez, A. R. Ek, and M. E. Bauer, “Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method,” Remote Sensing of Environment, vol. 77, no. 3, pp. 251–274, 2001.

[45] R. Haapanen, A. R. Ek, M. E. Bauer, and A. O. Finley, “Delineation of forest/nonforest land use classes using nearest neighbor methods,” Remote Sensing of Environment, vol. 89, no. 3, pp. 265–271, 2004.

[46] M. Belgiu and L. Drăguţ, “Random forest in remote sensing: a review of applications and future directions,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, no. 114, pp. 24–31, 2016.

[47] M. J. Cracknell and A. M. Reading, “The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines,” Geophysics, vol. 78, no. 3, 2013.

[48] X. Ye, X. Yang, X. Xiong, Y. Shen, M. Hao, and R. Gu, “A quality control method based on an improved random forest algorithm for surface air temperature observations,” Advances in Meteorology, vol. 2017, pp. 1–15, Article ID 8601296, 2017.

[49] B. Babar, L. T. Luppino, T. Boström, and S. N. Anfinsen, “Random forest regression for improved mapping of solar irradiance at high latitudes,” Solar Energy, vol. 198, pp. 81–92, 2020.

[50] L. V. Utkin, M. S. Kovalev, and F. P. A. Coolen, “Imprecise weighted extensions of random forests for classification and regression,” Applied Soft Computing, vol. 92, Article ID 106324, 2020.

[51] L. Liu, M. Ji, and M. Buchroithner, “Combining partial least squares and the gradient-boosting method for soil property retrieval using visible near-infrared shortwave infrared spectra,” Remote Sensing, vol. 9, no. 12, p. 1299, 2017.

[52] J. Son, I. Jung, K. Park, and B. Han, “Tracking-by-segmentation with online gradient boosting decision tree,” in Proceedings of the International Conference on Computer Vision, Las Condes, Chile, December 2015.

[53] H. Mo, H. Sun, J. Liu, and S. Wei, “Developing window behavior models for residential buildings using XGBoost algorithm,” Energy and Buildings, vol. 205, no. 15, Article ID 109564, 2019.

[54] M. H. D. M. Ribeiro and L. dos Santos Coelho, “Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series,” Applied Soft Computing, vol. 86, Article ID 105837, 2020.

[55] I. B. Mustapha and F. Saeed, “Bioactive molecule prediction using extreme gradient boosting,” Molecules, vol. 21, no. 8, p. 983, 2016.

[56] C. Li, “Power load forecasting based on the combined model of LSTM and XGBoost,” in Proceedings of the International Conference on Pattern Recognition, Wenzhou, China, June 2019.

[57] Z.-Y. Chen, T.-H. Zhang, R. Zhang et al., “Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China,” Atmospheric Environment, vol. 202, pp. 180–189, 2019.

[58] S. Zhao, D. Zeng, W. Wang et al., “Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals,” Measurement, vol. 159, Article ID 107777, 2020.

[59] S. J. Lee, M.-H. Ahn, and Y. Lee, “Application of an artificial neural network for a direct estimation of atmospheric instability from a next-generation imager,” Advances in Atmospheric Sciences, vol. 33, no. 2, pp. 221–232, 2016.

[60] J. Tang, C. Deng, G.-B. Huang, and B. Zhao, “Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1174–1185, 2015.

[61] G. T. Ribeiro, V. C. Mariani, and L. Coelho, “Enhanced ensemble structures using wavelet neural networks applied to short-term load forecasting,” Engineering Applications of Artificial Intelligence, vol. 82, pp. 272–281, 2019.

[62] W. Wang, X. Sun, R. Zhang, Z. Li, Z. Zhu, and H. Su, “Multi-layer perceptron neural network based algorithm for estimating precipitable water vapour from MODIS NIR data,” International Journal of Remote Sensing, vol. 27, no. 3, pp. 617–621, 2006.

[63] K. I. Chronopoulos, I. X. Tsiros, I. F. Dimopoulos, and N. Alvertos, “An application of artificial neural network models to estimate air temperature data in areas with sparse network of meteorological stations,” Journal of Environmental Science and Health, Part A, vol. 43, no. 14, pp. 1752–1757, 2008.

[64] S. Rodrigues Moreno, R. Gomes da Silva, V. Cocco Mariani, and L. dos Santos Coelho, “Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network,” Energy Conversion and Management, vol. 213, Article ID 112869, 2020.

[65] A. Bayat and S. Mashhadizadeh Maleki, “Comparison of precipitable water vapor derived from AIRS and SPM measurements and its correlation with surface temperature of 29 synoptic stations over Iran,” Journal of Atmospheric and Solar-Terrestrial Physics, vol. 178, pp. 24–31, 2018.

[66] W. W. Gong, “Evaluation of surface meteorological elements from several numerical models in China,” Climatic and Environmental Research, vol. 20, no. 1, pp. 53–62, 2015.

[67] D. Pozo Vazquez, F. J. Olmo Reyes, and L. Alados Arboledas, “A comparative study of algorithms for estimating land surface temperature from AVHRR data,” Remote Sensing of Environment, vol. 62, no. 3, pp. 215–222, 1997.

14 Advances in Meteorology

Page 10: ComparisonofMachine-LearningAlgorithmsforNear-Surface ...downloads.hindawi.com/journals/amete/2020/8887364.pdfair) estimation from the new generation of Chinese geostationary meteorological

80degE 90degE 100degE 110degELatitude

120degE 130degE

50degN

40degN

30degN

20degN

Long

itude

ndash25 ndash20 ndash15 ndash10 ndash05 00 05 10 15 20 25 30ndash30Temperature (degC)

(a)

Latitude80degE 90degE 100degE 110degE 120degE 130degE

50degN

40degN

30degN

20degN

Long

itude

ndash25 ndash20 ndash15 ndash10 ndash05 00 05 10 15 20 25 30ndash30Temperature (degC)

(b)

Latitude80degE 90degE 100degE 110degE 120degE 130degE

50degN

40degN

30degN

20degN

Long

itude

ndash25 ndash20 ndash15 ndash10 ndash05 00 05 10 15 20 25 30ndash30Temperature (degC)

(c)

80degE 90degE 100degE 110degELatitude

120degE 130degE

50degN

40degN

30degN

20degN

Long

itude

ndash25 ndash20 ndash15 ndash10 ndash05 00 05 10 15 20 25 30ndash30Temperature (degC)

(d)

Latitude80degE 90degE 100degE 110degE 120degE 130degE

50degN

40degN

30degN

20degN

Long

itude

ndash25 ndash20 ndash15 ndash10 ndash05 00 05 10 15 20 25 30ndash30Temperature (degC)

(e)

Latitude80degE 90degE 100degE 110degE 120degE 130degE

50degN

40degN

30degN

20degN

Long

itude

ndash25 ndash20 ndash15 ndash10 ndash05 00 05 10 15 20 25 30ndash30Temperature (degC)

(f )

Figure 9 Spatial distribution of bias for six machine-learning models (a) MLR-Bias (b) RF-Bias (c) KNN-Bias (d) DNN-Bias (e) GBTD-Bias (f ) XGB-Bias

2018

06

01

2018

06

10

2018

06

20

2018

06

30

2018

07

10

Date

RMSE between estimated Tair and observed Tair

2018

07

20

2018

07

30

2018

08

10

2018

08

20

2018

08

30

MLRRFKNN

DNNGBTDXGB

15

20

25

30

35

40

45

RMSE

(degC)

Figure 10 Time series (June to August 1st 10th 20th and 30th) of RMSE of estimated Tair for six machine-learning models

Table 3 e contribution of AGRI BTs and GFS data to the XGB Tair estimation model

Predictive factors XGB modelRMSE (degC)

DEM longitude latitude Julian day 3003degCDEM longitude latitude Julian day BT12 BT13 2376degCDEM longitude latitude Julian day GFS PWV GFS RH 2164degCDEM longitude latitude Julian day BT12 BT13 GFS PWV GFS RH 1946degC

10 Advances in Meteorology

negative bias for the high-air-temperature conditionerefore the model showed a larger RMSE in the lower-and higher-air-temperature conditions due to underes-timation and overestimation is is similar to the resultsof previous studies [38] Furthermore the uneven dis-tribution of stations makes the applicability of the modelin high-altitude areas poor It is worth mentioning thatthe effect of VZA on model performance is negligible asshown in Figure 11(c)

4 Conclusions

In this study six machine-learning approaches (MLR RFKNN DNN GBTD and XGB) for Tair estimation from FY-4A AGRI data in China were compared and analyzed interms of the spatial and temporal characteristics of theirperformance e validation results highlighted the highpotential of Tair estimation approaches using machinelearning and showed that the accuracy of the XGB modelwas better than that of theMLR RF KNN GBTD and DNNmodels at most sites for Tair estimation over China evalidation was performed using spatially and temporallyindependent data and hence the model performance wasconsidered to be quite reliable

is study improves on previous studies in the followingkey areas FirstTair estimationmodels were constructed basedon FY-4A AGRI data and other auxiliary data e resultsshowed that high-temporal- and high-spatial-resolution Tairvalues (RMSE lt20degC) can be obtained based on FY-4A dataAccording to the study of Vazquez [67] the level of precisiongenerally accepted as ldquoaccuraterdquo for remote-sensing-based Tairestimation is between 1degC and 2degC Second the accuracy andperformance of the six machine-learning models (MLR RFKNN XGB GBTD and DNN) were compared and analyzede results showed that the XGB model can provide morestable and high-precision Tair estimation which provides areference for Tair estimation based on machine-learningmodels Finally the accuracy of Tair estimation based onsatellite data can be effectively improved by adding a nu-merical model of Tair e experimental results showed thatonly satellite data were used for large-scale Tair estimation inChina and the RMSE of the XGB model was 2376degC but theRMSE using satellite data combined with numericallymodeled Tair data reached 1946degC

However aside from the novelties of this study thelimitation of the dataset used is the restriction to clear-skyconditions Similarly machine-learning algorithms cannotinfer beyond the range of observed Tair value If the Tair valueincreases beyond the range that cannot be observed withinthe current training period the model must be retrainedMoreover future research may explore whether addingother predictors such as distance-to-coast and vegetationinformation (normalized difference vegetation index etc)can improve the accuracy of the Tair estimation models

Data Availability

FY-4A AGRI data were downloaded from the China Na-tional Satellite Meteorological Center (NSMC) (httpsatellitensmcorgcnPortalSiteDataSatelliteaspx) eGFS data were obtained from the National Centers forEnvironmental Prediction (httpswwwnconcepnoaagovpmbproductsgfs) e meteorological station data wereaccessed at the China Meteorological Data Service Center(CMDC) (httpdatacmacn)

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41527806) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank the National Satellite Meteorological Center (NSMC) for providing FY-4A data, the China Meteorological Data Service Center (CMDC) for the meteorological data, and the National Centers for Environmental Prediction (NCEP) for the GFS data.

References

[1] M. P. Cresswell, A. P. Morse, M. C. Thomson, and S. J. Connor, “Estimating surface air temperatures from Meteosat land surface temperatures using an empirical solar zenith angle model,” International Journal of Remote Sensing, vol. 20, no. 6, pp. 1125–1132, 1999.

Figure 11: Scatter plot of Tair error (°C) with (a) DEM (m), (b) observed Tair (°C), and (c) viewing zenith angle (VZA, deg) for the XGB model.


[2] L. Prihodko and S. N. Goward, “Estimation of air temperature from remotely sensed surface observations,” Remote Sensing of Environment, vol. 60, no. 3, pp. 335–346, 1997.

[3] Z. S. Venter, O. Brousse, I. Esau, and F. Meier, “Hyperlocal mapping of urban air temperature using remote sensing and crowdsourced weather data,” Remote Sensing of Environment, vol. 242, no. 1, Article ID 111791, 2020.

[4] H. C. Ho, A. Knudby, Y. Xu, M. Hodul, and M. Aminipouri, “A comparison of urban heat islands mapped using skin temperature, air temperature, and apparent temperature (Humidex), for the greater Vancouver area,” Science of the Total Environment, vol. 544, pp. 929–938, 2016.

[5] T. R. Lookingbill and D. L. Urban, “Spatial estimation of air temperature differences for landscape-scale studies in montane environments,” Agricultural & Forest Meteorology, vol. 114, no. 3-4, pp. 141–151, 2003.

[6] J. W. Hurrell and K. E. Trenberth, “Satellite versus surface estimates of air temperature since 1979,” Journal of Climate, vol. 9, no. 9, pp. 2222–2232, 1996.

[7] H. Liu, Q. Zhou, S. Zhang, and X. Deng, “Estimation of summer air temperature over China using Himawari-8 AHI and numerical weather prediction data,” Advances in Meteorology, vol. 2019, pp. 1–10, Article ID 2385310, 2019.

[8] M. Konda, N. Imasato, and A. Shibata, “A new method to determine near-sea surface air temperature by using satellite data,” Journal of Geophysical Research: Oceans, vol. 101, no. C6, pp. 14349–14360, 1996.

[9] U. Marcel, “Comparison of satellite-derived land surface temperature and air temperature from meteorological stations on the pan-arctic scale,” Remote Sensing, vol. 5, no. 5, pp. 2348–2367, 2013.

[10] W. Wagner, V. Naeimi, K. Scipal, R. de Jeu, and J. Martínez-Fernández, “Soil moisture from operational meteorological satellites,” Hydrogeology Journal, vol. 15, no. 1, pp. 121–131, 2007.

[11] C. Oppenheimer, “Review article: volcanological applications of meteorological satellites,” International Journal of Remote Sensing, vol. 19, no. 15, pp. 2829–2864, 1998.

[12] H. W. Yates, J. D. Tarpley, S. R. Schneider, D. F. McGinnis, and R. A. Scofield, “The role of meteorological satellites in agricultural remote sensing,” Remote Sensing of Environment, vol. 14, no. 1–3, pp. 219–233, 1984.

[13] C. O. Justice, “The moderate resolution imaging spectroradiometer (MODIS): land remote sensing for global change research,” IEEE Transactions on Geoscience & Remote Sensing, vol. 36, no. 4, pp. 1228–1249, 2002.

[14] D. A. Chu, “Remote sensing of smoke from MODIS airborne simulator during the SCAR-B experiment,” Journal of Geophysical Research: Atmospheres, vol. 103, no. D24, Article ID 31979, 1998.

[15] L. Mohammadi, N. Molanian, and A. Heidari, “Determination of the best coverage area for receiver stations of LEO remote sensing satellites,” in Proceedings of the 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 2008.

[16] K. Yumimoto, T. M. Nagao, M. Kikuchi et al., “Aerosol data assimilation using data from Himawari-8, a next-generation geostationary meteorological satellite,” Geophysical Research Letters, vol. 43, no. 11, pp. 5886–5894, 2016.

[17] K. Bessho, K. Date, M. Hayashi et al., “An introduction to Himawari-8/9 – Japan’s new-generation geostationary meteorological satellites,” Journal of the Meteorological Society of Japan. Ser. II, vol. 94, no. 2, pp. 151–183, 2016.

[18] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, “Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4,” Bulletin of the American Meteorological Society, vol. 98, no. 8, pp. 1637–1658, 2017.

[19] W. Hui, F. Huang, and R. Liu, “Characteristics of lightning signals over the Tibetan Plateau and the capability of FY-4A LMI lightning detection in the Plateau,” International Journal of Remote Sensing, vol. 41, no. 12, pp. 4605–4625, 2020.

[20] S. S. Chu, L. Zhu, H. F. Sun et al., “Automated volcanic hot-spot detection based on FY-4A/AGRI infrared data,” International Journal of Remote Sensing, vol. 41, no. 1, pp. 2410–2438, 2020.

[21] X. Zhang and W. Jiao, “Estimation of land surface temperature using geostationary meteorological satellite data,” Remote Sensing Technology & Application, vol. 28, no. 1, 2013.

[22] J. Xu, “Estimation of near-surface air temperature from HJ-1B satellite data in Northwest China,” Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, vol. 29, no. 22, pp. 145–153, 2013.

[23] X. Zhu, Q. Zhang, C.-Y. Xu, P. Sun, and P. Hu, “Reconstruction of high spatial resolution surface air temperature data across China: A new geo-intelligent multisource data-based machine learning technique,” Science of the Total Environment, vol. 665, pp. 300–313, 2019.

[24] J. Hrisko, P. Ramamurthy, Y. Yu, P. Yu, and D. Melecio-Vazquez, “Urban air temperature model using GOES-16 LST and a diurnal regressive neural network algorithm,” Remote Sensing of Environment, vol. 237, Article ID 111495, 2020.

[25] Q. Zhang, Y. Yu, W. Zhang, T. Luo, and X. Wang, “Cloud detection from FY-4A’s geostationary interferometric infrared sounder using machine learning approaches,” Remote Sensing, vol. 11, no. 24, p. 3035, 2019.

[26] Y. Chen, G. Chen, C. Cui et al., “Retrieval of the vertical evolution of the cloud effective radius from the Chinese FY-4 (Feng Yun 4) next-generation geostationary satellites,” Atmospheric Chemistry and Physics, vol. 20, no. 2, pp. 1131–1145, 2020.

[27] I. Kloog, F. Nordio, B. A. Coull, and J. Schwartz, “Predicting spatiotemporal mean air temperature using MODIS satellite surface temperature measurements across the Northeastern USA,” Remote Sensing of Environment, vol. 150, pp. 132–139, 2014.

[28] N. Janatian, M. Sadeghi, S. Hossein Sanaeinejad et al., “A statistical framework for estimating air temperature using MODIS land surface temperature data,” International Journal of Climatology, vol. 37, no. 3, 2016.

[29] R. Huang, J.-x. Huang, C. Zhang et al., “Soil temperature estimation at different depths using remotely-sensed data,” Journal of Integrative Agriculture, vol. 19, no. 1, pp. 277–290, 2020.

[30] J. Y. Fang and K. Yoda, “Climate and vegetation in China (I). Changes in the altitudinal lapse rate of temperature and distribution of sea level temperature,” Ecological Research, vol. 3, no. 1, pp. 37–51, 1988.

[31] C. Du, R. Huazhong, Q. Qin, S. Zhao, and J. Meng, “Research of split-window algorithm for retrieval of land surface temperature from Landsat 8 data,” Journal of Geomatics, vol. 39, pp. 73–77, 2014.

[32] F. Chen, Y. Liu, Q. Liu, and F. Qin, “A statistical method based on remote sensing for the estimation of air temperature in China,” International Journal of Climatology, vol. 35, 2014.

[33] P. S. G. de Mattos Neto, G. D. C. Cavalcanti, P. R. A. Firmino, E. G. Silva, and S. R. P. Vila Nova Filho, “A temporal-window framework for modelling and forecasting time series,” Knowledge-Based Systems, vol. 193, Article ID 105476, 2020.

[34] R. Bycroft, J. X. Leon, and D. Schoeman, “Comparing random forests and convoluted neural networks for mapping ghost crab burrows using imagery from an unmanned aerial vehicle,” Estuarine, Coastal and Shelf Science, vol. 224, pp. 84–93, 2019.

[35] Y. Lee, D. Han, M.-H. Ahn, J. Im, and S. J. Lee, “Retrieval of total precipitable water from Himawari-8 AHI data: A comparison of random forest, extreme gradient boosting, and deep neural network,” Remote Sensing, vol. 11, no. 15, p. 1741, 2019.

[36] J. García-Gutiérrez, F. Martínez-Álvarez, A. Troncoso, and J. C. Riquelme, “A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables,” Neurocomputing, vol. 167, no. 1, pp. 24–31, 2015.

[37] D. Upreti, W. Huang, W. Kong et al., “A comparison of hybrid machine learning algorithms for the retrieval of wheat biophysical variables from Sentinel-2,” Remote Sensing, vol. 11, no. 5, p. 481, 2019.

[38] R. Li, L. Cui, H. Fu, Y. Meng, J. Li, and J. Guo, “Estimating high-resolution PM1 concentration from Himawari-8 combining extreme gradient boosting-geographically and temporally weighted regression (XGBoost-GTWR),” Atmospheric Environment, vol. 229, Article ID 117434, 2020.

[39] G. Papacharalampous and H. Tyralis, “Hydrological time series forecasting using simple combinations: Big data testing and investigations on one-year ahead river flow predictability,” Journal of Hydrology, vol. 590, Article ID 125205, 2020.

[40] R. Pérez-Chacón, G. Asencio-Cortés, F. Martínez-Álvarez, and A. Troncoso, “Big data time series forecasting based on pattern sequence similarity and its application to the electricity demand,” Information Sciences, vol. 540, pp. 160–174, 2020.

[41] H. Abdollahi, “A novel hybrid model for forecasting crude oil price based on time series decomposition,” Applied Energy, vol. 267, Article ID 115035, 2020.

[42] R. S. dos Santos, “Estimating spatio-temporal air temperature in London (UK) using machine learning and earth observation satellite data,” International Journal of Applied Earth Observation and Geoinformation, vol. 88, Article ID 102066, 2020.

[43] H. J. Richardson, D. J. Hill, D. R. Denesiuk, and L. H. Fraser, “A comparison of geographic datasets and field measurements to model soil carbon using random forests and stepwise regressions (British Columbia, Canada),” GIScience & Remote Sensing, vol. 54, no. 4, pp. 573–591, 2017.

[44] H. Franco-Lopez, A. R. Ek, and M. E. Bauer, “Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method,” Remote Sensing of Environment, vol. 77, no. 3, pp. 251–274, 2001.

[45] R. Haapanen, A. R. Ek, M. E. Bauer, and A. O. Finley, “Delineation of forest/nonforest land use classes using nearest neighbor methods,” Remote Sensing of Environment, vol. 89, no. 3, pp. 265–271, 2004.

[46] M. Belgiu and L. Dragut, “Random forest in remote sensing: a review of applications and future directions,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, no. 114, pp. 24–31, 2016.

[47] M. J. Cracknell and A. M. Reading, “The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines,” Geophysics, vol. 78, no. 3, 2013.

[48] X. Ye, X. Yang, X. Xiong, Y. Shen, M. Hao, and R. Gu, “A quality control method based on an improved random forest algorithm for surface air temperature observations,” Advances in Meteorology, vol. 2017, pp. 1–15, Article ID 8601296, 2017.

[49] B. Babar, L. T. Luppino, T. Bostrom, and S. N. Anfinsen, “Random forest regression for improved mapping of solar irradiance at high latitudes,” Solar Energy, vol. 198, pp. 81–92, 2020.

[50] L. V. Utkin, M. S. Kovalev, and F. P. A. Coolen, “Imprecise weighted extensions of random forests for classification and regression,” Applied Soft Computing, vol. 92, Article ID 106324, 2020.

[51] L. Liu, M. Ji, and M. Buchroithner, “Combining partial least squares and the gradient-boosting method for soil property retrieval using visible near-infrared shortwave infrared spectra,” Remote Sensing, vol. 9, no. 12, p. 1299, 2017.

[52] J. Son, I. Jung, K. Park, and B. Han, “Tracking-by-Segmentation with online gradient boosting decision tree,” in Proceedings of the International Conference on Computer Vision, Las Condes, Chile, December 2015.

[53] H. Mo, H. Sun, J. Liu, and S. Wei, “Developing window behavior models for residential buildings using XGBoost algorithm,” Energy and Buildings, vol. 205, no. 15, Article ID 109564, 2019.

[54] M. H. D. M. Ribeiro and L. dos Santos Coelho, “Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series,” Applied Soft Computing, vol. 86, Article ID 105837, 2020.

[55] I. B. Mustapha and F. Saeed, “Bioactive molecule prediction using extreme gradient boosting,” Molecules, vol. 21, no. 8, p. 983, 2016.

[56] C. Li, “Power load forecasting based on the combined model of LSTM and XGBoost,” in Proceedings of the International Conference on Pattern Recognition, Wenzhou, China, June 2019.

[57] Z.-Y. Chen, T.-H. Zhang, R. Zhang et al., “Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China,” Atmospheric Environment, vol. 202, pp. 180–189, 2019.

[58] S. Zhao, D. Zeng, W. Wang et al., “Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals,” Measurement, vol. 159, Article ID 107777, 2020.

[59] S. J. Lee, M.-H. Ahn, and Y. Lee, “Application of an artificial neural network for a direct estimation of atmospheric instability from a next-generation imager,” Advances in Atmospheric Sciences, vol. 33, no. 2, pp. 221–232, 2016.

[60] J. Tang, C. Deng, G.-B. Huang, and B. Zhao, “Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1174–1185, 2015.

[61] G. T. Ribeiro, V. C. Mariani, and L. Coelho, “Enhanced ensemble structures using wavelet neural networks applied to short-term load forecasting,” Engineering Applications of Artificial Intelligence, vol. 82, pp. 272–281, 2019.

[62] W. Wang, X. Sun, R. Zhang, Z. Li, Z. Zhu, and H. Su, “Multi-layer perceptron neural network based algorithm for estimating precipitable water vapour from MODIS NIR data,” International Journal of Remote Sensing, vol. 27, no. 3, pp. 617–621, 2006.

[63] K. I. Chronopoulos, I. X. Tsiros, I. F. Dimopoulos, and N. Alvertos, “An application of artificial neural network models to estimate air temperature data in areas with sparse network of meteorological stations,” Journal of Environmental Science and Health, Part A, vol. 43, no. 14, pp. 1752–1757, 2008.

[64] S. Rodrigues Moreno, R. Gomes da Silva, V. Cocco Mariani, and L. dos Santos Coelho, “Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network,” Energy Conversion and Management, vol. 213, Article ID 112869, 2020.

[65] A. Bayat and S. Mashhadizadeh Maleki, “Comparison of precipitable water vapor derived from AIRS and SPM measurements and its correlation with surface temperature of 29 synoptic stations over Iran,” Journal of Atmospheric and Solar-Terrestrial Physics, vol. 178, pp. 24–31, 2018.

[66] W. W. Gong, “Evaluation of surface meteorological elements from several numerical models in China,” Climatic and Environmental Research, vol. 20, no. 1, pp. 53–62, 2015.

[67] D. Pozo Vazquez, F. J. Olmo Reyes, and L. Alados Arboledas, “A comparative study of algorithms for estimating land surface temperature from AVHRR Data,” Remote Sensing of Environment, vol. 62, no. 3, pp. 215–222, 1997.



[63] K I Chronopoulos I X Tsiros I F Dimopoulos andN Alvertos ldquoAn application of artificial neural networkmodels to estimate air temperature data in areas with sparse

Advances in Meteorology 13

network of meteorological stationsrdquo Journal of EnvironmentalScience and Health Part A vol 43 no 14 pp 1752ndash17572008

[64] S Rodrigues Moreno R Gomes da Silva V Cocco Marianiand L dos Santos Coelho ldquoMulti-step wind speed forecastingbased on hybrid multi-stage decomposition model and longshort-term memory neural networkrdquo Energy Conversion andManagement vol 213 Article ID 112869 2020

[65] A Bayat and S Mashhadizadeh Maleki ldquoComparison ofprecipitable water vapor derived from AIRS and SPM mea-surements and its correlation with surface temperature of 29synoptic stations over Iranrdquo Journal of Atmospheric and Solar-Terrestrial Physics vol 178 pp 24ndash31 2018

[66] W W Gong ldquoEvaluation of surface meteorological elementsfrom several numerical models in Chinardquo Climatic and En-vironmental Research vol 20 no 1 pp 53ndash62 2015

[67] D Pozo Vazquez F J Olmo Reyes and L Alados ArboledasldquoA comparative study of algorithms for estimating landsurface temperature from AVHRR Datardquo Remote Sensing ofEnvironment vol 62 no 3 pp 215ndash222 1997

14 Advances in Meteorology

Page 12: ComparisonofMachine-LearningAlgorithmsforNear-Surface ...downloads.hindawi.com/journals/amete/2020/8887364.pdfair) estimation from the new generation of Chinese geostationary meteorological
