What’s the Matter with Nightlights?
Using Landsat Imagery to Improve Nightlight-Based Measures of
Local Economic Activity with an Application to India
Ran Goldblatt†, Gordon Hanson‡, Kilian Heilmann§, Amit Khandelwal¶
December 30, 2016
Very preliminary and incomplete draft
Abstract
We propose a new remote sensing based method to quantify the spatial extent of urbanization. We show that augmenting the commonly used night time light data with high resolution Landsat data can provide a better prediction for population counts at different levels of aggregation and for different dwelling types. We demonstrate this advantage using geo-coded population data from the Census of India.
JEL classification: E01, R1, O11
Keywords: Remote sensing; Night time lights; Urbanization
1 Introduction
The analysis of satellite imagery is now a key methodology in economics and other applied
scientific research. Coming straight from impartial satellites, remotely sensed data has the
advantage of not being filtered through national data agencies that are potentially inefficient
or biased. As Donaldson and Storeygard (2016) lay out, remote sensing
allows researchers to access information that would otherwise be difficult to obtain, often
provides high spatial resolution, and offers wide (if not global) geographic coverage. Since the
marginal cost of collecting more data is low, repeated consistent samples are often available
to researchers to learn about the world. Lastly, satellite imagery ignores administrative
boundaries and can therefore be flexibly combined with other data at any geographical unit.
In the future, new commercial satellite projects with continually improving spatial and temporal
coverage will only reinforce the importance of remote sensing for academic studies.
†UCSD, School of Global Policy and Strategy.
‡UCSD, Department of Economics and School of Global Policy and Strategy.
§UCSD, Department of Economics.
¶Columbia University, Department of Economics.
The use of night time lights as a proxy for economic activity, pioneered by
Henderson et al. (2012), is especially important to scholars in economics and other social
sciences. Night light intensity has been used to approximate economic activity because light
is believed to be a normal good that is consumed more at higher incomes. For example,
Bleakley and Lin (2012) use night time light intensity to measure economic activity in their
study on the economic persistence of defunct portage sites. Night lights have also enabled
development economists to study geographic entities that were previously inaccessible because
of insufficient data coverage. They have been used to measure the economic development of
regions that are either too small to provide disaggregated data or that did not provide any
data at all due to low state capacity. For example, Storeygard (2016) studies intercity
transport costs and their impact on the income of sub-Saharan African cities, while Harari
(2016) employs night lights to measure the shapes of urban areas in India.
The most prominent night time light products, for example the DMSP-OLS produced
by the National Oceanic and Atmospheric Administration (NOAA), however have several
drawbacks that complicate economic analysis, especially in urban areas. Firstly, night time
light data is available only at a very coarse geospatial resolution. Secondly, night time
light data is saturated at a certain threshold of light intensity. This threshold is often easily
exceeded in very bright urban cores, so the night time light data cannot capture
any growth in the city center. Thirdly, night time lights have a tendency to extend into
neighboring regions, the so-called blooming effect (Small et al., 2005; Abrahams et al., 2016).
All these factors make analysis of night time light data in the fringes of urban areas difficult.
This is problematic as most growth in urban areas is expected to be in the periphery rather
than the urban cores.
In this paper, we propose a new methodology based on Goldblatt et al. (2016) that
measures the geospatial extent of built-up areas as an approximation of urban areas. This
measure operates at a finer resolution and can therefore avoid some of the problems of the
night time light data. It is based on Landsat imagery that is available at the 30x30m
resolution. Besides the finer resolution, a further advantage of our method is the longer
temporal extent. Unlike night time light data, our Landsat imagery extends to the early
1980s and thus allows researchers to study earlier time periods.
In correlational regressions below, we show that our measure correlates highly with the
night time light data, but captures different development patterns. Compared with the
traditionally used sum of lights approach, it performs better at predicting urban
population differences between highly disaggregated spatial units in India and has a higher
fit.
2 Methodology
Our methodology is based on combining different sources of remotely sensed satellite
imagery. In this section, we describe the data sources and the algorithm to calculate our
measure of urbanization.
2.1 Remote Sensing Data
To implement our methodology, we use remote sensing imagery from the DMSP-OLS
night light dataset and the Landsat program maintained by the United States Geological
Survey (USGS). The Landsat program dates back to the early 1970s and consists of a set of
several satellites that capture global imagery at frequent intervals. In its current satellites
(Landsat 7 and Landsat 8), it provides a spatial resolution of 30x30 meters. For this paper,
we use Landsat 7 which launched in 1999 and is still operating. Landsat 7 records eight
different spectral bands and has a temporal resolution (the time until the satellite revisits a
certain position on earth) of 16 days.
In contrast, the DMSP-OLS night time light dataset has a spatial resolution of about
1km. The night time lights require significant ex-post processing and are typically released
every year. In this paper, we use the stable lights product that removes unstable light sources
such as moonlight, clouds, and fires (Baugh et al., 2010).
We use Google Earth Engine (GEE) to access and manage the Landsat data. GEE is a
cloud-based computational platform that integrates data storage and data manipulation
within a single framework. It comes with a JavaScript-based library of geospatial tools
similar to the environment of ArcGIS but is not restricted to a single computer system,
which makes it easy to scale the geospatial analysis across space and time.
2.2 Algorithm to Determine Built-up Areas
Our method is based on cleaning the night light data with non-urban areas derived from
high-resolution Landsat data to arrive at a measure of urban extents. The basic idea behind
this approach is that the pure night light data is too coarse to delineate boundaries of cities
that are often drivers of economic development. The “blooming effect” of night light tends
to exaggerate urban areas even if they consist of largely undeveloped land. By delineating
urban extents, we can easily generate another remotely sensed measure of economic activity
and compare this with ground truth economic data.
In a first step, we use the night time light data and extract highly lit areas. The
assumption is that bright areas are signs of human activity, and these areas are then classified
as urban. However, due to the blooming effect, some of these areas are actually undeveloped
and merely pick up light from nearby areas. We therefore “clean” the coarse night time
light pixels by using Landsat-based indicators to remove non built-up areas. We define
these non built-up areas by employing commonly used transformations of the
Landsat bands that detect water (normalized difference water index, NDWI) and vegetation
(normalized difference vegetation index, NDVI). We choose thresholds of 0.3 for NDVI and
-0.01 for NDWI to denote areas that consist of either water (lakes, rivers, reservoirs) or
vegetation (public parks, agricultural fields, wasteland).
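The cleaning step can be illustrated with a minimal sketch. The text does not state the exact band formulas or the direction of the cutoffs, so the sketch assumes the common definitions NDVI = (NIR - Red) / (NIR + Red) and NDWI = (Green - NIR) / (Green + NIR), and assumes that values above a cutoff mark vegetation or water. The function names and reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


NDVI_CUTOFF = 0.3    # threshold quoted in the text
NDWI_CUTOFF = -0.01  # threshold quoted in the text


def is_non_built_up(green, red, nir):
    """Flag a pixel as vegetation or water from its band reflectances.

    Assumption: a pixel counts as vegetation if NDVI exceeds its cutoff
    and as water if NDWI exceeds its cutoff.
    """
    ndvi = normalized_difference(nir, red)
    ndwi = normalized_difference(green, nir)
    return ndvi > NDVI_CUTOFF or ndwi > NDWI_CUTOFF


# Hypothetical (green, red, nir) reflectances for three pixels.
pixels = {
    "cropland": (0.08, 0.05, 0.40),
    "lake": (0.10, 0.06, 0.03),
    "rooftop": (0.20, 0.22, 0.25),
}
flags = {name: is_non_built_up(*bands) for name, bands in pixels.items()}
```

Under these assumptions the cropland and lake pixels are removed as non built-up, while the rooftop pixel is kept as a candidate urban pixel.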
For computational purposes, we then split the data into smaller hexagons (see Figure
3) and perform machine learning on each hexagon separately. This reduces the computational
load and takes into account regional differences across India’s agro-climatic zones.
Specifically, we classify each pixel in the hexagon as “urban” if it falls above the 99th percentile
of the light distribution and satisfies the NDVI and NDWI cutoffs noted above.
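As a rough illustration of this labeling rule, the sketch below marks a pixel as urban when its light value lies strictly above the 99th percentile within its hexagon and it has not been flagged as water or vegetation. The nearest-rank percentile definition and the helper names are assumptions made for the example; the text does not specify how the percentile is computed.

```python
import math


def percentile_cutoff(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def label_urban(lights, non_built_up, pct=99):
    """Label pixels urban if brighter than the pct-th percentile and not masked out."""
    cutoff = percentile_cutoff(lights, pct)
    return [dn > cutoff and not masked for dn, masked in zip(lights, non_built_up)]


# Hypothetical hexagon with 200 pixels; the brightest pixel is flagged as water.
lights = list(range(200))
non_built_up = [False] * 199 + [True]
labels = label_urban(lights, non_built_up)
```

Because the percentile is taken within each hexagon, the same absolute light value can be labeled urban in a dark region and non-urban in a bright one.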
We then take a 1% sample of all pixels within the hexagon for our training dataset and
use it to train a random forest classifier with 20 decision trees. As predictors, we only use the
eight bands of Landsat 7, the indices mentioned above, and two more simple indices
that are commonly used to predict land cover: the normalized difference built-up index (NDBI)
and a simple urban index based on the normalized difference of the short-wave infrared and
near infrared bands. This allows us to rely solely on the Landsat imagery and perform all
of our analysis at the finer 30x30m resolution. We then predict, using the trained classifier,
the probability of being urban for each pixel within the hexagon.
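The training and prediction step for a single hexagon can be sketched as follows. This is not the Google Earth Engine implementation used in the paper: it swaps in scikit-learn's random forest, and the pixel features and labels are synthetic placeholders generated only so that the example runs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for one hexagon (hypothetical data): 20,000 pixels with
# 12 predictors (8 bands, NDVI, NDWI, NDBI, and a SWIR/NIR urban index).
n_pixels, n_features = 20_000, 12
features = rng.normal(size=(n_pixels, n_features))
# Hypothetical "urban" labels, loosely tied to the first predictor.
labels = (features[:, 0] + 0.5 * rng.normal(size=n_pixels) > 1.0).astype(int)

# Draw a 1% training sample of pixels without replacement.
train_idx = rng.choice(n_pixels, size=n_pixels // 100, replace=False)

# Random forest with 20 trees, as described in the text.
clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(features[train_idx], labels[train_idx])

# Probability of the "urban" class for every pixel in the hexagon.
urban_col = list(clf.classes_).index(1)
prob_urban = clf.predict_proba(features)[:, urban_col]

# Summing the per-pixel probabilities gives an aggregate built-up measure.
built_up_measure = float(prob_urban.sum())
```

Repeating this per hexagon yields one classifier per region, which is how the approach accommodates regional differences in land cover.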
Figure 1 shows an example of our method and compares it with data from the DMSP-OLS
night light dataset. It shows the vicinity of the Indian city of Hyderabad, Telangana.
In the lower panel, we depict the night light data of 2011. The city appears as a white blob
with very little variation within its center, where all night light values hit the maximum value
of DN = 63. This saturation however masks a strong heterogeneity in the city center as
captured by our measure in the top panel. Depicting the probability of being built up, the
upper picture clearly picks out large non built-up areas within the city, such as the
Musi River and several larger bodies of water. Using the night lights, the city appears much
larger than it actually is.
Figure 2 depicts another example of the differences between the built-up area methodology
and the night lights, this time for the city of Ahmedabad, Gujarat. The red areas depict
the urban areas at a probability cutoff of 0.1 and can be compared directly with the night light
data visualized by the gray shades. The blooming effect of the night light data is clearly
visible as the satellite sensors record high light intensity outside the urban area. The
figure also demonstrates how much finer the urban extent data is compared to the coarse
night time lights.
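To make the cutoff concrete, the sketch below converts per-pixel probabilities into an urban extent by counting 30x30m pixels at or above the 0.1 cutoff. Treating the cutoff as inclusive and reporting the extent in square kilometers are assumptions made for the example, and the probabilities are hypothetical.

```python
PIXEL_AREA_KM2 = 30 * 30 / 1_000_000  # one 30x30 m Landsat pixel in square kilometers


def urban_extent_km2(probabilities, cutoff=0.1):
    """Area of pixels whose built-up probability is at least `cutoff`."""
    n_urban = sum(1 for p in probabilities if p >= cutoff)
    return n_urban * PIXEL_AREA_KM2


# Hypothetical per-pixel probabilities from the classifier.
probs = [0.02, 0.10, 0.35, 0.80, 0.05]
area = urban_extent_km2(probs)
```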
3 Validation
In this section, we bring our new methodology to the data and compare the performance
of different remote sensing measures of economic activity in the context of India.
3.1 Study Area
We test the performance of our methodology in the context of India. India is the ideal
setting of a large and geographically diverse country that is undergoing rapid urbanization.
Driven by natural population growth and a pronounced rural-to-urban migration,
urban growth has outpaced rural growth and led to sprawl-like expansion of Indian cities.
At the same time, data collection cannot keep pace with the growth of cities. While the
Indian statistical authorities are thought of as being of high quality, the size of the country
allows for only infrequent data collection, and population and industry censuses are collected
only every ten years.
Figure 1: Built-up areas versus Night Time Lights
Note: Comparison of built up area (top) and night time lights (bottom) in the city of Hyderabad.
Figure 2: Comparison of Urban Classification versus Night Light Data
Note: Comparison of the urban classification (in red) against the DMSP-OLS night light data (black-white scale) in the city of Ahmedabad, Gujarat. Note the much coarser resolution of the night light data.
Figure 3: Study Area Division
Note: Hexagons used to train classifier separately to account for regional differences and to ease computational burden.
We study several different administrative units at different geographical resolutions. In
a first step, we test our methodology to predict population data at the state and district
level. We then go to a micro-scale and analyze village-level population counts for the large
state of Andhra Pradesh in its boundaries of 2011, before parts of it were split into the
new state of Telangana in 2014. Andhra Pradesh is a state on the southeastern coast of
the Indian peninsula. It is the 8th largest state of the Indian federation by area with
160,000 square kilometers. According to data from 2016, 49.4 million inhabitants reside in the
state, making it the 10th largest state by population, on par with countries like South Korea,
Colombia, and Spain. Telangana adds another 110,000 square kilometers and 35.2 million inhabitants.
Combined, the two states cover three distinct agro-climatic zones. Andhra Pradesh is one of
the leading producers of agricultural goods in India but also a major center for manufacturing
and information technology.
The area of Andhra Pradesh and Telangana has several desirable properties that make
it a good candidate for validating our method. Firstly, it features large variation in the type
of dwelling. As of 2011, the area was about two-thirds rural and one-third urban, allowing us
to validate our method for different dwelling types. Secondly, it experienced rapid population
growth and added more than ten million people between 2001 and 2011. The overall
population growth of almost 14% was driven in large part by growth in
the urban population from 19.4 to 28.2 million (a growth rate of 45%). Thirdly, Andhra
Pradesh and Telangana combined have a city size distribution that covers the same support
as that of India. The largest city in the two states is Hyderabad, which is the fifth largest
city in India behind Delhi, Mumbai, Bangalore, and Chennai. Yet the two states also feature
many small agricultural villages that allow us to test our method on the micro scale.
3.2 Census Data
Our main data source is the set of population counts collected by the Indian census bureau.
The Indian census is conducted every decade and we use the two newest rounds that took
place in 2001 and 2011. The census provides counts of populations by age, gender, caste, and
employment status for various administrative divisions. Most importantly, it distinguishes
between urban and rural dwelling types.
Figure 4: Graphical Representation of Study Area
Legend: State and Union Territory Boundaries; Subdistricts Boundaries; Andhra Pradesh; Study Area
The largest subdivisions in India are the 35 states and union territories. We use data for
33 of them (excluding the outlying island territories of the Andaman and Nicobar Islands
as well as Lakshadweep). Each state is divided into districts. There are a total of 640
districts in India in 2011, of which we use 648. The smallest geographical divisions in India are
towns and villages. The state of Andhra Pradesh in its 2011 boundaries consists of 28,406
municipalities, of which 26,748 have consistent data.
The data is kindly provided by the World Bank’s Spatial Database for South Asia (Li
et al., 2015) and comes with consistent geographical boundaries for the years of 2001 and
2011. Figure 4 provides a graphical overview of the extent of the study area and its political
subdivisions.
3.3 Comparing Built-up Areas with Night Lights
To validate our methodology with socio-economic data, we first compare it to the traditionally
used night light data products and run simple correlational regressions at different
administrative levels. First, we calculate the correlation with the sum of lights (SOL) from
the DMSP-OLS product. The sum of lights approach is the most commonly used method
to approximate economic data and simply adds up the measure of light intensity per pixel
for the entire administrative boundary. Similarly, for our measure we simply add up the
probability of being urban per pixel.
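Both aggregates are plain sums over the pixels that fall inside an administrative unit, as the following minimal sketch shows. The unit identifiers and pixel values are hypothetical, and the assignment of pixels to units (a spatial join in practice) is assumed to have been done already. Note that the two measures are summed over pixels of different size, roughly 1km for night lights and 30m for Landsat.

```python
from collections import defaultdict


def sum_by_unit(pixels):
    """Sum pixel values per administrative unit.

    `pixels` is an iterable of (unit_id, value) pairs.
    """
    totals = defaultdict(float)
    for unit_id, value in pixels:
        totals[unit_id] += value
    return dict(totals)


# Hypothetical night light pixels (DN values between 0 and 63).
light_pixels = [("A", 63), ("A", 40), ("B", 5), ("B", 0)]
# Hypothetical Landsat pixels with predicted probability of being urban.
prob_pixels = [("A", 0.75), ("A", 0.5), ("A", 0.25), ("B", 0.25)]

sum_of_lights = sum_by_unit(light_pixels)
built_up = sum_by_unit(prob_pixels)
```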
Table 1 shows the results for correlational regressions between the SOL measure and our
urban built-up area count at the state and district level for the whole of India,
and for the villages of Andhra Pradesh. The results show that the two measures are highly
correlated and capture similar patterns of urbanization. This is not surprising since the
night light data was used as an input feature to create the training data for the machine
learning algorithm. The pattern of correlation however is not monotonic in the granularity of the
data. The two measures are highly correlated at the state level (R² = 0.758), but less so at
the district level (R² = 0.414). The correlation then increases again at the village level in
Andhra Pradesh to R² = 0.531. Figure 5 displays the scatterplot between the two measures
for the weakest correlation, the district level.
Table 1: Dependent Variable: Probability of Built up Area
                 (1)              (2)              (3)
                 State            District         Village
Sum of Lights    1.385∗∗∗         1.280∗∗∗         2.435∗∗∗
                 (0.141)          (0.0599)         (0.0136)
Constant         102520473.5      8118413.5∗∗∗     -98696.5∗∗∗
                 (107103713.1)    (2084348.1)      (2042.9)
Observations     33               647              28264
R²               0.758            0.414            0.531
Standard errors in parentheses
∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Figure 5: Correlation between Measures
Note: Scatterplot of the probability of built-up area (vertical axis) against the sum of lights (horizontal axis), showing district observations and a fitted line.
3.4 Predicting Population Figures
3.4.1 State and District-Level Analysis
We begin by analyzing how well our measure predicts population figures for large administrative
units. In this subsection, we regress the population counts of Indian states
and districts on aggregate per-pixel probabilities of being a built-up area. Tables 2-5 show
the results for the correlational regressions between population and both measures. In all
regressions, the summed probabilities of the machine learning algorithm are a statistically
significant predictor of population counts.
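The regressions reported in the tables each have a constant and a single regressor, so they can be read as bivariate least squares fits of the form population_i = a + b * measure_i + e_i. The sketch below computes the slope, intercept, and R² for such a fit in closed form. It is an illustration on made-up numbers, not a reproduction of the estimates, and the function name is hypothetical.

```python
def ols_fit(x, y):
    """Bivariate OLS of y on x with a constant; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical remote sensing measure (x) and population counts (y) for four units.
measure = [1.0, 2.0, 3.0, 4.0]
population = [2.0, 4.0, 5.0, 8.0]
slope, intercept, r_squared = ols_fit(measure, population)
```

Comparing the R² obtained with the built-up measure against the R² obtained with the sum of lights, for the same population variable and the same set of units, is the comparison made in the tables.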
At the state level, our built-up measure outperforms the pure night light approach when
considering total population. The R² increases from 0.45 to 0.6, showing that the urbanized
area is a better predictor for the number of people residing in the different entities.
Distinguishing between urban and rural areas in columns (2) and (3) of each table highlights that
this advantage is purely driven by the rural population, where the urbanized area measure
(R² = 0.508) outperforms the sum of lights approach (R² = 0.316) by a wide margin. For urban
areas at the state level, the two measures perform similarly well with only minor differences.
At the district level, the pattern reverses and the simple sum of lights approach outperforms
our measure for every type of dwelling. The correlation of our measure with urban
population becomes weaker and drops from R² = 0.641 at the state level to R² = 0.240 at the
district level. Most noteworthy is the poor fit of our measure with rural population counts,
where the R² is reduced to a mere 0.075.
Table 2: Dependent Variable: State Population
                                (1)            (2)            (3)
                                Total          Urban          Rural
Probability of Built Up Area    0.0360∗∗∗      0.0114∗∗∗      0.0246∗∗∗
                                (0.00525)      (0.00153)      (0.00435)
Constant                        9521287.0      2842947.5      6678242.0
                                (6388925.4)    (1860710.7)    (5291175.2)
Observations                    33             33             33
R²                              0.602          0.641          0.508
Standard errors in parentheses
∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table 3: Dependent Variable: State Population
                 (1)            (2)            (3)
                 Total          Urban          Rural
Sum of Lights    0.0495∗∗∗      0.0186∗∗∗      0.0309∗∗∗
                 (0.00983)      (0.00230)      (0.00816)
Constant         13381417.8     2657271.2      10724061.5
                 (7491400.9)    (1753115.8)    (6223551.5)
Observations     33             33             33
R²               0.450          0.679          0.316
Standard errors in parentheses
∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table 4: Dependent Variable: District Population
                                (1)             (2)            (3)
                                Total           Urban          Rural
Probability of Built Up Area    0.0145∗∗∗       0.00912∗∗∗     0.00538∗∗∗
                                (0.00103)       (0.000638)     (0.000746)
Constant                        1327272.3∗∗∗    240365.0∗∗∗    1086908.6∗∗∗
                                (65688.3)       (40887.3)      (47755.0)
Observations                    647             647            647
R²                              0.236           0.240          0.075
Standard errors in parentheses
∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table 5: Dependent Variable: District Population
                 (1)             (2)            (3)
                 Total           Urban          Rural
Sum of Lights    0.0357∗∗∗       0.0189∗∗∗      0.0168∗∗∗
                 (0.00186)       (0.00125)      (0.00139)
Constant         1050602.0∗∗∗    148112.1∗∗∗    902510.2∗∗∗
                 (64832.8)       (43557.1)      (48435.1)
Observations     647             647            647
R²               0.362           0.261          0.184
Standard errors in parentheses
∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
3.4.2 Village-Level Analysis
In a next step, we move to the micro level and regress population figures on the sum
of lights measure and our measure for the 26,748 villages and towns in the state of
Andhra Pradesh. At this very disaggregated level, the advantage of the fine built-up area
approach reappears and it outperforms the pure night light approach slightly (R² = 0.504
versus R² = 0.454) when looking at total population. Here, the advantage is realized solely
through predictive power for urban population counts, where the R² of 0.501 is much higher
than that of the sum of lights method (R² = 0.392).
For predicting rural population counts at the village level, both methods perform poorly.
While the R² in the regression on the sum of lights measure drops to 0.098, the R² for our new
methodology reaches a mere 0.001, so its predictive power at the rural disaggregated level
is very weak. This suggests that the built-up area in rural villages is not a good predictor
of the number of people who live there, most likely due to vastly different population density in
rural areas compared to dense cities.
Table 6: Dependent Variable: Village Population
                                (1)            (2)            (3)
                                Total          Urban          Rural
Probability of Built Up Area    0.0251∗∗∗      0.0248∗∗∗      0.000210∗∗∗
                                (0.000152)     (0.000152)     (0.0000354)
Constant                        1548.1∗∗∗      -530.8∗∗∗      2078.9∗∗∗
                                (71.02)        (70.87)        (16.53)
Observations                    26748          26748          26748
R²                              0.504          0.501          0.001
Standard errors in parentheses
∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table 7: Dependent Variable: Village Population
                 (1)            (2)            (3)
                 Total          Urban          Rural
Sum of Lights    0.0798∗∗∗      0.0737∗∗∗      0.00607∗∗∗
                 (0.000535)     (0.000562)     (0.000113)
Constant         -2307.0∗∗∗     -3984.7∗∗∗     1677.7∗∗∗
                 (82.39)        (86.51)        (17.37)
Observations     26748          26748          26748
R²               0.454          0.392          0.098
Standard errors in parentheses
∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
4 Conclusion
In this paper, we present a simple method to “clean” the commonly used night light data
and to detect built-up urban environments through remotely sensed satellite imagery. By
subtracting areas of water and vegetation identified at a finer resolution from the coarse night
light pixels, we are able to delineate urban boundaries at the 30x30m resolution. Using a
machine learning algorithm, we show that sampling a small number of pixels and predicting
the built-up area using Landsat data yields a reasonable fit.
In cross-sectional regressions using census data for different administrative units in India,
we show that this measure can improve the predictive power of night light satellite imagery
as a proxy for population differences at certain administrative levels. However, the advantage
is not uniform and has severe limits. Our measure is a better predictor of overall
population at the highest (Indian states and union territories) and the lowest (villages in
the state of Andhra Pradesh) level of spatial aggregation, and performs worse at
the intermediate level of Indian districts.
When distinguishing between urban and rural population, the two measures reverse their
advantages when going from high to low aggregation. In absolute terms, both methods
perform worse at predicting rural population counts than urban ones. At the state
level, our measure has a better fit for rural population. This advantage
is lost at the district level and disappears entirely at the level of villages and
towns. Conversely, our measure starts out underperforming the sum of lights approach for
urban areas at higher aggregation, but then has a clear advantage at the town and village
level, indicating that it performs best in areas of higher density.
The analysis has shown that the traditionally used night time light approach can be
improved on with higher resolution satellite imagery from other sources, but that the level
of aggregation and the dwelling type matter. Heterogeneity of light usage and residential
density translates into heterogeneity in the advantage of different remote sensing techniques.
In future work, we plan to explore the predictive power of our measure for other socio-economic
data such as incomes and industrial production, which might differ from residence patterns
in how detectable they are from the sky.
References
Abrahams, Alexei, Nancy Lozano-Gracia, and Christopher Oram, “Deblurring
DMSP Nighttime Lights,” Working Paper, 2016.
Baugh, Kimberly, Christopher D. Elvidge, Tilottama Ghosh, and Daniel Ziskin,
“Development of a 2009 Stable Lights Product using DMSP-OLS data,” Proceedings of the
Asia-Pacific Advanced Network, 2010, 30, 114–130.
Bleakley, Hoyt and Jeffrey Lin, “Portage and Path Dependence,” The Quarterly Journal
of Economics, 2012, 127 (2), 587–644.
Donaldson, Dave and Adam Storeygard, “The View from Above: Applications of
Satellite Data in Economics,” Journal of Economic Perspectives, November 2016, 30 (4),
171–98.
Goldblatt, Ran, Wei You, Gordon Hanson, and Amit K. Khandelwal, “Detecting
the Boundaries of Urban Areas in India: A Dataset for Pixel-Based Image Classification
in Google Earth Engine,” Remote Sensing, 2016, 8.
Harari, Mariaflavia, “Cities in Bad Shape. Urban Geometry in India,” Working Paper,
2016.
Henderson, J Vernon, Adam Storeygard, and David N Weil, “Measuring economic
growth from outer space,” The American Economic Review, 2012, 102 (2), 994–1028.
Li, Yue, Martin Rama, Virgilio Galdo, and Maria Florencia Pinto, “A Spatial
Database for South Asia,” Working Paper, 2015.
Small, Christopher, Francesca Pozzi, and Christopher D Elvidge, “Spatial analysis
of global urban extent from DMSP-OLS night lights,” Remote Sensing of Environment,
2005, 96 (3), 277–291.
Storeygard, Adam, “Farther on down the Road: Transport Costs, Trade and Urban
Growth in Sub-Saharan Africa,” The Review of Economic Studies, 2016, 83 (3), 1263–
1295.