What’s the Matter with Nightlights?kheilman/pdfs/landsat.pdf · 2016-12-31 · What’s the...

What’s the Matter with Nightlights?

Using Landsat Imagery to Improve Nightlight-Based Measures of Local Economic Activity with an Application to India

Ran Goldblatt†, Gordon Hanson‡, Kilian Heilmann§, Amit Khandelwal¶

December 30, 2016

Very preliminary and incomplete draft

Abstract

We propose a new remote-sensing-based method to quantify the spatial extent of urbanization. We show that augmenting the commonly used night time light data with high-resolution Landsat data can provide better predictions of population counts at different levels of aggregation and for different dwelling types. We demonstrate this advantage using geo-coded population data from the Census of India.

JEL classification: E01, R1, O11

Keywords: Remote sensing; Night time lights; Urbanization

1 Introduction

The analysis of satellite imagery is now a key methodology in economics and other applied scientific research. Coming straight from impartial satellites, remotely sensed data has the advantage of not being filtered through national data agencies that are potentially inefficient or biased. As Donaldson and Storeygard (2016) lay out, remote sensing has several main benefits: it gives researchers access to information that would otherwise be difficult to obtain, it often provides high spatial resolution, and it offers wide (if not global) geographic coverage. Since the marginal cost of collecting more data is low, repeated and consistent samples are often available to researchers. Lastly, satellite imagery ignores administrative boundaries and can therefore be flexibly combined with other data at any geographical unit. In the future, new commercial satellite projects with continually improving spatial and temporal coverage will only reinforce the importance of remote sensing for academic studies.

† UCSD, School of Global Policy and Strategy.
‡ UCSD, Department of Economics and School of Global Policy and Strategy.
§ UCSD, Department of Economics.
¶ Columbia University, Department of Economics.

The use of night time lights as a proxy for economic activity, pioneered by Henderson et al. (2012), has become especially important to scholars in economics and other social sciences. Night light intensity has been used to approximate economic activity because light is believed to be a normal good that is consumed more at higher incomes. For example, Bleakley and Lin (2012) use night time light intensity to measure economic activity in their study of the economic persistence of defunct portage sites. Night lights have also enabled development economists to study geographic entities that were previously inaccessible because of insufficient data coverage. They have been used to measure the economic development of regions that are either too small to have disaggregated data or that do not provide any data at all due to low state capacity. For example, Storeygard (2016) studies intercity transport costs and their impact on the incomes of sub-Saharan African cities, while Harari (2016) employs night lights to measure the shapes of urban areas in India.

The most prominent night time light products, for example the DMSP-OLS data produced by the National Oceanic and Atmospheric Administration (NOAA), however, have several drawbacks that complicate economic analysis, especially in urban areas. First, night time light data is available only at a very coarse geospatial resolution. Second, night time light data saturates at a certain threshold of light intensity. This threshold is often exceeded in very bright urban cores, so the data cannot capture any growth in the city center. Third, night time lights tend to extend into neighboring regions, the so-called blooming effect (Small et al., 2005; Abrahams et al., 2016). All these factors make the analysis of night time light data at the fringes of urban areas difficult. This is problematic because most growth in urban areas is expected to occur in the periphery rather than in the urban cores.

In this paper, we propose a new methodology, based on Goldblatt et al. (2016), that measures the geospatial extent of built-up areas to approximate urban areas. This measure operates at a finer resolution and can therefore avoid some of the problems of the night time light data. It is based on Landsat imagery, which is available at 30x30m resolution. Besides the finer resolution, a further advantage of our method is its longer temporal extent. Unlike night time light data, our Landsat imagery extends back to the early 1980s and thus allows researchers to study earlier time periods.

In the correlational regressions below, we show that our measure correlates highly with the night time light data but captures different development patterns. Compared with the traditionally used sum of lights approach, our measure performs better at predicting urban population differences between highly disaggregated spatial units in India and achieves a higher fit.

2 Methodology

Our methodology is based on combining different sources of remotely sensed satellite imagery. In this section, we describe the data sources and the algorithm to calculate our measure of urbanization.

2.1 Remote Sensing Data

To implement our methodology, we use remote sensing imagery from the DMSP-OLS night light dataset and from the Landsat program maintained by the United States Geological Survey (USGS). The Landsat program dates back to the early 1970s and consists of a series of satellites that capture global imagery at frequent intervals. Its current satellites (Landsat 7 and Landsat 8) provide a spatial resolution of 30x30 meters. For this paper, we use Landsat 7, which launched in 1999 and is still operating. Landsat 7 records eight different spectral bands and has a temporal resolution (the time until the satellite revisits a given position on earth) of 16 days.

In contrast, the DMSP-OLS night time light dataset has a spatial resolution of about 1km. The night time lights require significant ex-post processing and are typically released annually. In this paper, we use the stable lights product, which removes unstable light sources such as moonlight, clouds, and fires (Baugh et al., 2010).

We use Google Earth Engine (GEE) to access and manage the Landsat data. GEE is a cloud-based computational platform that integrates data storage and data manipulation within a single framework. It comes with a JavaScript-based library of geospatial tools similar to the environment of ArcGIS but, because it is not restricted to a single computer system, it makes it easy to scale geospatial analysis across space and time.

2.2 Algorithm to Determine Built-up Areas

Our method is based on cleaning the night light data by removing non-urban areas identified in high-resolution Landsat data, which yields a measure of urban extents. The basic idea behind this approach is that the pure night light data is too coarse to delineate the boundaries of cities, which are often drivers of economic development. The “blooming effect” of night light tends to exaggerate urban areas even if they consist largely of undeveloped land. By delineating urban extents, we can generate another remotely sensed measure of economic activity and compare it with ground-truth economic data.

In a first step, we use the night time light data to extract brightly lit areas. The assumption is that bright areas are signs of human activity, so these areas are classified as urban. However, due to the blooming effect, some of these areas are actually undeveloped and merely pick up light from nearby areas. We therefore “clean” the coarse night time light pixels by using Landsat-based indicators to remove non-built-up areas. We identify these non-built-up areas with indices that are commonly used transformations of the Landsat bands to detect water (normalized difference water index, NDWI) and vegetation (normalized difference vegetation index, NDVI). We choose thresholds of 0.3 for NDVI and -0.01 for NDWI to denote areas that consist of either water (lakes, rivers, reservoirs) or vegetation (public parks, agricultural fields, wasteland).
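A minimal sketch of the water/vegetation mask in pure Python, under stated assumptions: NDVI = (NIR - Red)/(NIR + Red), NDWI in the McFeeters form (Green - NIR)/(Green + NIR), and a pixel treated as non-built-up when NDVI exceeds 0.3 or NDWI exceeds -0.01. The text gives the two thresholds but not the index variants or the direction of the comparisons, so those are assumptions, and the helper names are hypothetical.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0.0 when both inputs are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


# Thresholds from the text; comparing with ">" is an assumption.
NDVI_VEGETATION_MIN = 0.3
NDWI_WATER_MIN = -0.01


def is_non_built_up(green: float, red: float, nir: float) -> bool:
    """True if a pixel looks like vegetation or water (hypothetical helper)."""
    ndvi = normalized_difference(nir, red)    # vegetation index
    ndwi = normalized_difference(green, nir)  # water index, McFeeters form (assumed)
    return ndvi > NDVI_VEGETATION_MIN or ndwi > NDWI_WATER_MIN


# Made-up surface reflectances, not values from the paper.
print(is_non_built_up(green=0.08, red=0.05, nir=0.40))  # True  (vegetation)
print(is_non_built_up(green=0.10, red=0.06, nir=0.02))  # True  (water)
print(is_non_built_up(green=0.15, red=0.18, nir=0.22))  # False (bright built surface)
```

In a full pipeline the same test would be applied to every 30x30m pixel before the night-light-based labels are formed.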

For computational purposes, we then split the data into smaller hexagons (see Figure 3) and perform the machine learning on each hexagon separately. This reduces the computational burden and takes into account regional differences across India’s agro-climatic zones. Specifically, we classify each pixel in the hexagon as “urban” if it falls above the 99th percentile of the light distribution and satisfies the NDVI and NDWI cutoffs noted above.
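The labeling rule for one hexagon can be sketched as follows. The paper does not say which percentile definition it uses or how each 30x30m pixel inherits a night-light value; this sketch assumes a nearest-rank 99th percentile and that every Landsat pixel carries the DN of the night-light cell that contains it. The `non_built_up` flag is assumed to come from a water/vegetation test like the NDVI/NDWI cutoffs above. All names are hypothetical.

```python
import math


def percentile_nearest_rank(values, q):
    """Nearest-rank q-th percentile (0 < q <= 100), one of several definitions."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def label_hexagon(pixels, q=99.0):
    """Return 1 (urban) or 0 for each pixel of one hexagon.

    Each pixel is a dict with 'light' (DN of the containing night-light cell,
    an assumption) and 'non_built_up' (True for water or vegetation).
    """
    cutoff = percentile_nearest_rank([p["light"] for p in pixels], q)
    # ">=" instead of ">" so that saturated cores (DN = 63) can still qualify.
    return [int(p["light"] >= cutoff and not p["non_built_up"]) for p in pixels]


# Toy hexagon: 197 dimmer pixels plus 3 saturated ones (made-up values).
lights = [v % 50 for v in range(197)] + [63, 63, 63]
pixels = [{"light": dn, "non_built_up": i == 198} for i, dn in enumerate(lights)]
labels = label_hexagon(pixels)
print(sum(labels))  # 2: two saturated pixels, the third is masked as water/vegetation
```

The comparison uses `>=` rather than a strict `>`: in saturated hexagons the 99th percentile can itself be DN = 63, and a strict inequality would then label no pixel as urban. That choice is an assumption, not something the text specifies.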

We then take a 1% sample of all pixels within the hexagon as our training dataset and use it to train a random forest classifier with 20 decision trees. As predictors, we use only the eight bands of Landsat 7, the two indices mentioned above, and two further simple indices that are commonly used to predict land cover: the normalized difference built-up index (NDBI) and a simple urban index based on the normalized difference of the short-wave infrared and near-infrared bands. This allows us to rely solely on the Landsat imagery and to perform all of our analysis at the finer 30x30m resolution. Using the trained classifier, we then predict the probability of being urban for each pixel within the hexagon.
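The training and prediction steps can be sketched with a small, self-contained random forest in pure Python. This is not the authors' implementation (the paper does not describe one); it only illustrates the technique: each of the 20 trees is grown on a bootstrap sample of the 1% training sample, each split searches a random subset of predictors by Gini impurity, and the predicted probability of being urban is the average leaf share across trees. Assumptions: Landsat 7 ETM+ band order B1 to B8 (blue, green, red, NIR, SWIR1, thermal, SWIR2, panchromatic), the McFeeters NDWI, NDBI = (SWIR1 - NIR)/(SWIR1 + NIR), an urban index built from SWIR2 and NIR, a maximum depth of 8, a square-root rule for the predictor subset, and all function names.

```python
import math
import random


def nd(a, b):
    """Normalized difference (a - b) / (a + b); 0.0 if both are zero."""
    return 0.0 if a + b == 0 else (a - b) / (a + b)


def predictors(bands):
    """Eight Landsat 7 ETM+ bands (B1..B8, assumed order) plus four indices."""
    blue, green, red, nir, swir1, thermal, swir2, pan = bands
    return list(bands) + [
        nd(nir, red),    # NDVI
        nd(green, nir),  # NDWI (McFeeters form, assumed)
        nd(swir1, nir),  # NDBI
        nd(swir2, nir),  # urban index (SWIR2 vs NIR, assumed band choice)
    ]


def sample_training_set(rows, labels, fraction=0.01, seed=0):
    """Draw the 1% training sample from one hexagon, without replacement."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(rows)), max(1, int(len(rows) * fraction)))
    return [rows[i] for i in idx], [labels[i] for i in idx]


def gini(labels):
    p = sum(labels) / len(labels) if labels else 0.0
    return 2.0 * p * (1.0 - p)


def best_split(X, y, features):
    """Lowest weighted-Gini (feature, threshold) among the given features."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Grow one tree; a leaf stores the share of urban training pixels."""
    share = sum(y) / len(y)
    if depth >= max_depth or share in (0.0, 1.0):
        return share
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:  # sampled features are constant in this node
        return share
    _, f, t = split
    go_left = [row[f] <= t for row in X]
    kids = []
    for side in (True, False):
        Xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [yi for yi, g in zip(y, go_left) if g == side]
        kids.append(build_tree(Xs, ys, depth + 1, max_depth, n_sub, rng))
    return (f, t, kids[0], kids[1])


def train_forest(X, y, n_trees=20, max_depth=8, seed=0):
    """Random forest: bootstrap rows per tree, random feature subset per split."""
    rng = random.Random(seed)
    n_sub = max(1, int(math.sqrt(len(X[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_sub, rng))
    return forest


def predict_proba(forest, row):
    """Probability of 'urban': mean leaf share across all trees."""
    total = 0.0
    for node in forest:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] <= t else right
        total += node
    return total / len(forest)
```

In a full pipeline, `rows` would be `predictors(...)` vectors for all pixels of a hexagon and `labels` the output of the labeling rule; `predict_proba` would then be applied to every pixel in that hexagon.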

Figure 1 shows an example of our method and compares it with data from the DMSP-OLS night light dataset. It shows the vicinity of the Indian city of Hyderabad, Telangana. In the lower panel, we depict the night light data for 2011. The city appears as a white blob with very little variation in its center, where all night light values hit the maximum value of DN = 63. This saturation, however, masks strong heterogeneity in the city center, which our measure captures in the top panel. Depicting the probability of being built up, the upper panel clearly distinguishes large non-built-up areas within the city, such as the Musi River and several larger bodies of water. In the night lights, the city appears much larger than it actually is.
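The effect of top-coding can be illustrated with a toy example. DMSP-OLS digital numbers range from 0 to 63, so any brightness beyond the cap is recorded as 63. The sketch assumes a hypothetical linear mapping from an underlying brightness to DN (the real sensor calibration is more involved) and shows that a city core that grows brighter between two years leaves its sum of lights unchanged.

```python
DN_MAX = 63  # DMSP-OLS digital numbers are capped at 63


def to_dn(brightness: float) -> int:
    """Hypothetical linear brightness-to-DN mapping with top-coding."""
    return min(DN_MAX, max(0, round(brightness)))


# Made-up brightness values for three pixels of one city core.
core_2001 = [70.0, 95.0, 120.0]
core_2011 = [90.0, 140.0, 210.0]  # the same core after growth

dn_2001 = [to_dn(v) for v in core_2001]
dn_2011 = [to_dn(v) for v in core_2011]
print(dn_2001, dn_2011)              # [63, 63, 63] [63, 63, 63]
print(sum(dn_2001) == sum(dn_2011))  # True: growth is invisible in the sum of lights
```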

Figure 2 depicts another example of the differences between the built-up area methodology and the night lights, this time for the city of Ahmedabad, Gujarat. The red areas depict the urban areas obtained with a probability cutoff of 0.1 and are compared directly with the night light data, shown in gray shades. The blooming effect of the night light data is clearly visible, as the satellite sensors record high light intensity outside the urban area. The figure also demonstrates how much finer the urban extent data is compared with the coarse night time lights.
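Turning the probability surface into an urban extent is a thresholding step. The sketch assumes a pixel is urban when its probability is at least the 0.1 cutoff used for Figure 2 (the text does not say whether the comparison is strict), converts the count of urban pixels into square kilometers using the 30x30m pixel size, and computes how many Landsat pixels fit into one night-light cell, taking that cell to be 1km wide. Function and constant names are illustrative.

```python
LANDSAT_PIXEL_M = 30.0       # Landsat 7 multispectral pixel edge, in meters
NIGHTLIGHT_CELL_M = 1000.0   # approximate DMSP-OLS cell edge, in meters


def urban_extent_km2(probabilities, cutoff=0.1):
    """Area (km^2) of pixels whose built-up probability is at least `cutoff`."""
    n_urban = sum(1 for p in probabilities if p >= cutoff)
    pixel_area_km2 = (LANDSAT_PIXEL_M / 1000.0) ** 2  # 0.0009 km^2 per pixel
    return n_urban * pixel_area_km2


# Number of Landsat pixels inside one ~1 km night-light cell.
pixels_per_cell = (NIGHTLIGHT_CELL_M / LANDSAT_PIXEL_M) ** 2

probs = [0.95, 0.40, 0.08, 0.12, 0.02]     # made-up classifier output
print(round(urban_extent_km2(probs), 4))  # 0.0027 (three urban pixels)
print(round(pixels_per_cell))             # 1111
```

The second number is one way to read “how much finer”: a single night-light cell spans roughly a thousand Landsat pixels.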

3 Validation

In this section, we bring our new methodology to the data and compare the performance of different remote sensing measures of economic activity in the context of India.

3.1 Study Area

We test the performance of our methodology in the context of India. India is an ideal setting: a large and geographically diverse country that is undergoing rapid urbanization. Driven by natural population growth and pronounced rural-to-urban migration, urban growth has outpaced rural growth and led to sprawl-like expansion of Indian cities. At the same time, data collection cannot keep pace with the growth of cities. While the Indian statistical authorities are considered to be of high quality, the size of the country allows only infrequent data collection, and the population and industry censuses are conducted only every ten years.

Figure 1: Built-up Areas versus Night Time Lights

Note: Comparison of built-up area (top) and night time lights (bottom) in the city of Hyderabad.

Figure 2: Comparison of Urban Classification versus Night Light Data

Note: Comparison of the urban classification (in red) against the DMSP-OLS night light data (black-and-white scale) in the city of Ahmedabad, Gujarat. Note the much coarser resolution of the night light data.

Figure 3: Study Area Division

Note: Hexagons used to train the classifier separately, to account for regional differences and to ease the computational burden.

We study several administrative units at different geographical resolutions. In a first step, we test our methodology by predicting population data at the state and district level. We then move to the micro scale and analyze village-level population counts for the large state of Andhra Pradesh in its 2011 boundaries, before part of it was split off into the new state of Telangana in 2014. Andhra Pradesh is a state on the southeastern coast of the Indian peninsula. It is the 8th largest state of the Indian federation by area, with 160,000 square kilometers. According to data from 2016, 49.4 million inhabitants reside in the state, making it the 10th largest state by population, on par with countries like South Korea, Colombia, and Spain. Telangana adds another 110,000 square kilometers and 35.2 million inhabitants. Combined, the two states cover three distinct agro-climatic zones. Andhra Pradesh is one of the leading producers of agricultural goods in India but also a major center for manufacturing and information technology.

The area of Andhra Pradesh and Telangana has several desirable properties that make it a good candidate for validating our method. First, it features large variation in the type of dwelling. As of 2011, the area was about two-thirds rural and one-third urban, allowing us to validate our method for different dwelling types. Second, it experienced rapid population growth and added more than ten million people between 2001 and 2011. The overall population growth of almost 14% was driven largely by growth in the urban population, from 19.4 to 28.2 million (a growth rate of 45%). Third, Andhra Pradesh and Telangana combined have a city size distribution that covers the same support as that of India. The largest city in the two states is Hyderabad, which is the fifth largest city in India behind Delhi, Mumbai, Bangalore, and Chennai. Yet the two states also feature many small agricultural villages that allow us to test our method at the micro scale.

3.2 Census Data

Our main data source is the population counts collected by the Indian census bureau. The Indian census is conducted every decade, and we use the two most recent rounds, which took place in 2001 and 2011. The census provides population counts by age, gender, caste, and employment status for various administrative divisions. Most importantly, it distinguishes between urban and rural dwelling types.

Figure 4: Graphical Representation of Study Area

Legend: State and Union Territory Boundaries; Subdistrict Boundaries; Andhra Pradesh Study Area

The largest subdivisions in India are the 35 states and union territories. We use data for 33 of them (excluding the outlying island territories of the Andaman and Nicobar Islands and Lakshadweep). Each state is divided into districts. There were a total of 640 districts in India in 2011, of which we use 648. The smallest geographical divisions in India are towns and villages. The state of Andhra Pradesh in its 2011 boundaries consists of 28,406 municipalities, of which 26,748 have consistent data.

The data is kindly provided by the World Bank’s Spatial Database for South Asia (Li et al., 2015) and comes with consistent geographical boundaries for the years 2001 and 2011. Figure 4 provides a graphical overview of the extent of the study area and its political subdivisions.

3.3 Comparing Built-up Areas with Night Lights

To validate our methodology against socio-economic data, we first compare it with the traditionally used night light data products and run simple correlational regressions at different administrative levels. First, we calculate the correlation with the sum of lights (SOL) from the DMSP-OLS product. The sum of lights approach is the most commonly used method to approximate economic data and simply adds up the light intensity of each pixel within the entire administrative boundary. Similarly, for our measure, we simply add up the per-pixel probability of being urban.
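Both aggregates are plain zonal sums. The sketch assumes each pixel has already been assigned to exactly one administrative unit (for example by its center point; the text does not specify how boundary pixels are handled) and uses hypothetical unit identifiers. The same function serves for night-light DNs on the roughly 1km grid and for built-up probabilities on the 30m grid; only the inputs differ.

```python
from collections import defaultdict


def zonal_sum(unit_ids, values):
    """Sum per-pixel values within each administrative unit.

    unit_ids[i] is the (hypothetical) identifier of the unit containing pixel i.
    """
    totals = defaultdict(float)
    for unit, value in zip(unit_ids, values):
        totals[unit] += value
    return dict(totals)


# Sum of lights: DN values of night-light cells, tagged by district.
sol = zonal_sum(["D1", "D1", "D2"], [63, 40, 5])
# Built-up measure: probabilities of 30 m pixels, tagged by district.
built_up = zonal_sum(["D1", "D1", "D1", "D2"], [0.9, 0.6, 0.1, 0.3])
print(sol)  # {'D1': 103.0, 'D2': 5.0}
```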

Table 1 shows the results of correlational regressions between the SOL measure and our urban built-up area count at the state and district level for the whole of India, and for the villages of Andhra Pradesh. The results show that the two measures are highly correlated and capture similar patterns of urbanization. This is not surprising, since the night light data was used as an input feature to create the training data for the machine learning algorithm. The correlation, however, is not monotonic in the granularity of the data. The two measures are highly correlated at the state level (R² = 0.758) but less so at the district level (R² = 0.414). The correlation then increases again at the village level in Andhra Pradesh to R² = 0.531. Figure 5 displays the scatterplot of the two measures at the district level, where the correlation is weakest.
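The correlational regressions are simple OLS fits of one aggregate on the other. With a single regressor and a constant, R² equals the squared Pearson correlation, which is why the R² values can be read as measures of correlation. Below is a minimal sketch on toy numbers (not the paper's data); the standard error it returns is the classical homoskedastic one, an assumption, since the text does not state which standard errors the tables report.

```python
import math


def simple_ols(x, y):
    """Regress y on x with a constant.

    Returns (slope, intercept, r_squared, slope_se); slope_se is the classical
    (homoskedastic) standard error, an assumption about the paper's tables.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    slope_se = math.sqrt(ss_res / (n - 2) / sxx)
    return slope, intercept, r_squared, slope_se


# Toy data (not from the paper): summed built-up probability vs. sum of lights.
sum_of_lights = [1.0, 2.0, 3.0, 4.0]
built_up_sum = [1.0, 3.0, 2.0, 4.0]
print(simple_ols(sum_of_lights, built_up_sum))
```

With these toy values the slope is 0.8 and R² is about 0.64, the square of their correlation of 0.8.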


Table 1: Dependent Variable: Probability of Built-up Area

                  (1)              (2)              (3)
                  State            District         Village
Sum of Lights     1.385∗∗∗         1.280∗∗∗         2.435∗∗∗
                  (0.141)          (0.0599)         (0.0136)
Constant          102520473.5      8118413.5∗∗∗     -98696.5∗∗∗
                  (107103713.1)    (2084348.1)      (2042.9)
Observations      33               647              28264
R²                0.758            0.414            0.531

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Figure 5: Correlation between Measures

[Scatterplot of district observations with fitted line; x-axis: Sum of Lights; y-axis: Probability of Built-up Area]

3.4 Predicting Population Figures

3.4.1 State and District-Level Analysis

We begin by using our measure to predict population figures for large administrative units. In this subsection, we regress the population counts of Indian states and districts on the aggregated per-pixel probabilities of being a built-up area. Tables 2-5 show the results of the correlational regressions between population and both measures. In all regressions, the summed probabilities from the machine learning algorithm are a statistically significant predictor of population counts.

At the state level, our built-up measure outperforms the pure night light approach when considering total population. The R² increases from 0.45 to 0.6, showing that the urbanized area is a better predictor of the number of people residing in the different entities. Distinguishing between urban and rural areas in columns (2) and (3) of each table shows that this advantage is driven entirely by the rural population, where the urbanized area measure (R² = 0.508) outperforms the sum of lights approach (R² = 0.316) by a wide margin. For urban areas at the state level, the two measures perform similarly well, with only minor differences.

At the district level, the pattern reverses and the simple sum of lights approach outperforms our measure for every dwelling type. The correlation of our measure with population becomes weaker, dropping from R² = 0.641 to R² = 0.240 for urban population. Most noteworthy is the poor fit of our measure with rural population counts, where the R² falls to a mere 0.075.

Table 2: Dependent Variable: State Population

                               (1)             (2)             (3)
                               Total           Urban           Rural
Probability of Built-up Area   0.0360∗∗∗       0.0114∗∗∗       0.0246∗∗∗
                               (0.00525)       (0.00153)       (0.00435)
Constant                       9521287.0       2842947.5       6678242.0
                               (6388925.4)     (1860710.7)     (5291175.2)
Observations                   33              33              33
R²                             0.602           0.641           0.508

Table 3: Dependent Variable: State Population

                               (1)             (2)             (3)
                               Total           Urban           Rural
Sum of Lights                  0.0495∗∗∗       0.0186∗∗∗       0.0309∗∗∗
                               (0.00983)       (0.00230)       (0.00816)
Constant                       13381417.8      2657271.2       10724061.5
                               (7491400.9)     (1753115.8)     (6223551.5)
Observations                   33              33              33
R²                             0.450           0.679           0.316


Table 4: Dependent Variable: District Population

                               (1)             (2)             (3)
                               Total           Urban           Rural
Probability of Built-up Area
                               (0.00103)       (0.000638)      (0.000746)
Constant                       1327272.3∗∗∗    240365.0∗∗∗     1086908.6∗∗∗
                               (65688.3)       (40887.3)       (47755.0)
Observations                   647             647             647
R²                             0.236           0.240           0.075


Table 5: Dependent Variable: District Population

                               (1)             (2)             (3)
                               Total           Urban           Rural
Sum of Lights                  0.0357∗∗∗       0.0189∗∗∗       0.0168∗∗∗
                               (0.00186)       (0.00125)       (0.00139)
Constant                       1050602.0∗∗∗    148112.1∗∗∗     902510.2∗∗∗
                               (64832.8)       (43557.1)       (48435.1)
Observations                   647             647             647
R²                             0.362           0.261           0.184

3.4.2 Village-Level Analysis

Next, we move to the micro level and regress population figures on the sum of lights measure and on our measure for the 26,748 villages and towns in the state of Andhra Pradesh. At this very disaggregated level, the advantage of the fine built-up area approach reappears, and it slightly outperforms the pure night light approach (R² = 0.504 versus R² = 0.454) for total population. This time, the advantage is realized solely through predictive power for urban population counts, where the R² of 0.501 is much higher than that of the sum of lights method (R² = 0.392).

For predicting rural population counts at the village level, both methods perform poorly. While the R² in the regression on the sum of lights measure drops to 0.098, the R² for our new methodology is a mere 0.001, so its predictive power for rural population at this disaggregated level is very weak. This suggests that the built-up area in rural villages is not a good predictor of the number of people who live there, most likely due to vastly different population densities in rural areas compared with dense cities.

Table 6: Dependent Variable: Village Population

                               (1)             (2)             (3)
                               Total           Urban           Rural
Probability of Built-up Area
                               (0.000152)      (0.000152)      (0.0000354)
Constant                       1548.1∗∗∗       -530.8∗∗∗       2078.9∗∗∗
                               (71.02)         (70.87)         (16.53)
Observations                   26748           26748           26748
R²                             0.504           0.501           0.001

Table 7: Dependent Variable: Village Population

                               (1)             (2)             (3)
                               Total           Urban           Rural
Sum of Lights                  0.0798∗∗∗       0.0737∗∗∗       0.00607∗∗∗
                               (0.000535)      (0.000562)      (0.000113)
Constant                       -2307.0∗∗∗      -3984.7∗∗∗      1677.7∗∗∗
                               (82.39)         (86.51)         (17.37)
Observations                   26748           26748           26748
R²                             0.454           0.392           0.098

4 Conclusion

In this paper, we present a simple method to “clean” the commonly used night light data and to detect built-up urban environments through remotely sensed satellite imagery. By subtracting areas of water and vegetation, identified at a finer resolution, from the coarse night light pixels, we are able to delineate urban boundaries at 30x30m resolution. Using a machine learning algorithm, we show that sampling a small number of pixels and predicting the built-up area from Landsat data yields a reasonable fit.

In cross-sectional regressions using census data for different administrative units in India, we show that this measure can improve on the predictive power of night light satellite imagery as a proxy for population differences at certain administrative levels. However, the advantage is not uniform and has severe limits. Our measure is a better predictor of overall population at the highest (Indian states and union territories) and the lowest (villages in the state of Andhra Pradesh) levels of spatial aggregation, and performs worse at the intermediate level of Indian districts.

When distinguishing between urban and rural population, the two measures reverse their advantages when going from high to low aggregation. In absolute terms, both methods perform worse at predicting rural population counts than urban ones. At the state level, our measure has a better fit for rural population. This advantage is lost at the district level and disappears completely for villages and towns. Conversely, our measure starts out underperforming the sum of lights approach for urban areas at higher levels of aggregation, but has a clear advantage at the town and village level, indicating that it performs best in areas of higher density.

The analysis has shown that the traditionally used night time light approach can be improved upon with higher-resolution satellite imagery from other sources, but that the level of aggregation and the dwelling type matter. Heterogeneity in light usage and residential density translates into heterogeneity in the advantage of different remote sensing techniques.

In future work, we plan to explore the predictive power of our measure for other socio-economic data, such as incomes and industrial production, whose detectability from the sky may differ greatly from that of residential patterns.


References

Abrahams, Alexei, Nancy Lozano-Gracia, and Christopher Oram, “Deblurring DMSP Nighttime Lights,” Working Paper, 2016.

Baugh, Kimberly, Christopher D. Elvidge, Tilottama Ghosh, and Daniel Ziskin, “Development of a 2009 Stable Lights Product using DMSP-OLS data,” Proceedings of the Asia-Pacific Advanced Network, 2010, 30, 114–130.

Bleakley, Hoyt and Jeffrey Lin, “Portage and Path Dependence,” The Quarterly Journal of Economics, 2012, 127 (2), 587–644.

Donaldson, Dave and Adam Storeygard, “The View from Above: Applications of Satellite Data in Economics,” Journal of Economic Perspectives, November 2016, 30 (4), 171–98.

Goldblatt, Ran, Wei You, Gordon Hanson, and Amit K. Khandelwal, “Detecting the Boundaries of Urban Areas in India: A Dataset for Pixel-Based Image Classification in Google Earth Engine,” Remote Sensing, 2016, 8.

Harari, Mariaflavia, “Cities in Bad Shape: Urban Geometry in India,” Working Paper, 2016.

Henderson, J. Vernon, Adam Storeygard, and David N. Weil, “Measuring economic growth from outer space,” The American Economic Review, 2012, 102 (2), 994–1028.

Li, Yue, Martin Rama, Virgilio Galdo, and Maria Florencia Pinto, “A Spatial Database for South Asia,” Working Paper, 2015.

Small, Christopher, Francesca Pozzi, and Christopher D. Elvidge, “Spatial analysis of global urban extent from DMSP-OLS night lights,” Remote Sensing of Environment, 2005, 96 (3), 277–291.

Storeygard, Adam, “Farther on down the Road: Transport Costs, Trade and Urban Growth in Sub-Saharan Africa,” The Review of Economic Studies, 2016, 83 (3), 1263–1295.
