Spatial regression modelling for large datasets: A precompression approach
Daisuke Murakami
The Institute of Statistical Mathematics
Joint work with Daniel A. Griffith
Outline
• Objective
‒ Development of a fast regression approach for large spatial data
• Outline
‒ Introduction to fast additive modeling (AM) for large samples, as implemented in the R package mgcv
‒ Development of another fast additive model for large spatial data
‒ Comparison through Monte Carlo simulations
‒ Application to crime data
mgcv: developed by Dr Wood (University of Bristol)
Spatial data are getting bigger and bigger
• Participatory sensing: people flow, health, tweets, energy consumption, ...
• Ground temperature
• Air pollution
• Remote sensing: climate, temperature, land cover, ...
• Mobile GPS
• Google Earth Engine (https://earthengine.google.com/)
• Estimated data: population, productivity, birth/death counts, ...
• Example: WorldPop (http://www.worldpop.org.uk/), global socioeconomic statistics on 1 km grids
• Increase of open data
• Traffic counts
• OpenStreetMap: roads, buildings, ...
Regression for large samples
Regression problems containing from tens of thousands to millions of observations are now commonplace (Wood et al., 2015).
• The additive (mixed) model (AM) is useful
✓ Regression accounting for linear, non-linear, group, and other effects
✓ Fast estimation methods have been developed
• Review
✓ AM in applied statistics
✓ AM-related models in geostatistics
Additive model (AM)
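• Linear AM
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \sum_{k=1}^{K} \mathbf{f}(\mathbf{z}_k) + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$
$$\mathbf{f}(\mathbf{z}_k) = \mathbf{E}_k\boldsymbol{\gamma}_k, \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\mathbf{S}_k)$$
$\mathbf{f}(\mathbf{z}_k)$: Unknown smooth function representing the effect of a covariate $\mathbf{z}_k$
$\mathbf{E}_k$: Known matrix consisting of L (<< N) basis functions
$\boldsymbol{\gamma}_k$: Random coefficients with known covariance matrix $\mathbf{S}_k$
✓ One variance parameter $\tau_k^2$ for each $\mathbf{f}(\mathbf{z}_k)$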
Additive model (AM) is useful
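• To estimate a wide variety of effects behind data.
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \sum_{k=1}^{K} \mathbf{f}(\mathbf{z}_k) + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$
$$\mathbf{f}(\mathbf{z}_k) = \mathbf{E}_k\boldsymbol{\gamma}_k, \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\mathbf{S}_k)$$
• Time-varying effects ($\mathbf{z}_k$: time): electricity use prediction
• Space-varying effects ($\mathbf{z}_k$: location): local impact of racial diversity on crime risks
• Nonlinear effects ($\mathbf{z}_k$: view): effects of openness of view on housing price
[Figures: estimated $\mathbf{f}(\mathbf{z}_k)$ for each of the three examples]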
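Estimation of the variance parameters $\{\tau_1^2, \dots, \tau_K^2\}$
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \sum_{k=1}^{K} \mathbf{E}_k\boldsymbol{\gamma}_k + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}), \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\mathbf{S}_k)$$
$$\mathbf{y} = \tilde{\mathbf{X}}\tilde{\boldsymbol{\beta}} + \boldsymbol{\varepsilon}, \qquad \tilde{\mathbf{X}} = [\mathbf{X}, \mathbf{E}_1, \dots, \mathbf{E}_K], \qquad \tilde{\boldsymbol{\beta}} = [\boldsymbol{\beta}', \boldsymbol{\gamma}_1', \dots, \boldsymbol{\gamma}_K']'$$
Restricted log-likelihood (REML)
$$\frac{\|\mathbf{y} - \tilde{\mathbf{X}}\tilde{\boldsymbol{\beta}}\|^2 + \tilde{\boldsymbol{\beta}}'\tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}\tilde{\boldsymbol{\beta}}}{2\sigma^2} + \frac{N - M}{2}\log(2\pi\sigma^2) + \frac{\log|\tilde{\mathbf{X}}'\tilde{\mathbf{X}} + \tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}| - \log|\tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}|}{2}$$
$$\tilde{\boldsymbol{\beta}} = (\tilde{\mathbf{X}}'\tilde{\mathbf{X}} + \tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1})^{-1}\tilde{\mathbf{X}}'\mathbf{y}$$
$\tilde{\mathbf{S}}_{\boldsymbol{\tau}}$: A block diagonal matrix whose k-th block equals $\tau_k^2\mathbf{S}_k$
Red (on the slide): Matrices and vectors whose size depends on N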
Fast REML (Wood et al., 2011; 2015; 2017)
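• Applicable to gigadata (Wood et al., 2015)
‒ Fast: Linear-time estimation of $\{\tau_1^2, \dots, \tau_K^2\}$
‒ Small memory: There are memory-efficient procedures
• Fast REML
‒ Once $\tilde{\mathbf{X}}$ is decomposed as $\tilde{\mathbf{X}} = \mathbf{Q}\mathbf{R}$, the computational (CP) cost to estimate the parameters $\{\tau_1^2, \dots, \tau_K^2\}$ is independent of N.
‒ The large Q matrix can be discarded before the estimation.
$$\frac{\|\mathbf{f} - \mathbf{R}\tilde{\boldsymbol{\beta}}\|^2 + \|\mathbf{r}\|^2 + \tilde{\boldsymbol{\beta}}'\tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}\tilde{\boldsymbol{\beta}}}{2\sigma^2} + \frac{N - KL}{2}\log(2\pi\sigma^2) + \frac{\log|\mathbf{R}'\mathbf{R} + \tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}| - \log|\tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}|}{2}$$
$$\tilde{\boldsymbol{\beta}} = (\mathbf{R}'\mathbf{R} + \tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1})^{-1}\mathbf{R}'\mathbf{f}$$
$$\mathbf{f} = \mathbf{Q}'\mathbf{y}, \qquad \|\mathbf{r}\|^2 = \|\mathbf{y}\|^2 - \|\mathbf{f}\|^2$$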
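The QR idea above can be checked numerically. The following is a minimal sketch, not mgcv's implementation: it uses NumPy, a random matrix as a stand-in for the stacked design matrix, and a hypothetical penalty `S_inv`. It shows only that the penalized estimate and the residual sum of squares can be recovered from R, f and ||r||² after Q is discarded.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 5000, 6                       # N observations, P columns (illustrative sizes)
X_tilde = rng.normal(size=(N, P))    # stand-in for [X, E_1, ..., E_K]
y = X_tilde @ rng.normal(size=P) + rng.normal(size=N)
S_inv = 0.5 * np.eye(P)              # hypothetical stand-in for the inverse of S_tau

# One pass over the N rows: reduced QR, then keep only the small objects.
Q, R = np.linalg.qr(X_tilde)         # Q is N x P, R is P x P
f = Q.T @ y                          # P-vector
r2 = y @ y - f @ f                   # ||r||^2 = ||y||^2 - ||f||^2
del Q                                # the large factor is no longer needed

def penalized_fit(R, f, r2, S_inv):
    """Penalized estimate and residual sum of squares from the compressed objects."""
    beta = np.linalg.solve(R.T @ R + S_inv, R.T @ f)
    rss = np.sum((f - R @ beta) ** 2) + r2
    return beta, rss

beta_fast, rss_fast = penalized_fit(R, f, r2, S_inv)

# Reference computation that touches all N rows.
beta_full = np.linalg.solve(X_tilde.T @ X_tilde + S_inv, X_tilde.T @ y)
rss_full = np.sum((y - X_tilde @ beta_full) ** 2)
```

Each new trial value of the variance parameters then costs only operations on P × P matrices.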
An R package mgcv
• Implements (generalized) AM accounting for a wide variety of effects
✓ Linear, non-linear, varying coefficients, group, ...
• Computationally very efficient
✓ bam function: fast and memory-efficient estimation for big data (fast REML is the default)
✓ Among the R packages for fast regression modeling that I reviewed (e.g., INLA, R2BayesX, RStan, ...), mgcv was the fastest.
Later, I will compare my algorithm with the fast REML.
Regression for large samples
Regression problems containing from tens of thousands to millions of observations are now commonplace (Wood et al., 2015, Appl. Stat.).
• The additive mixed model (AM) is useful
✓ Regression accounting for linear, non-linear, group, and other effects
✓ Fast estimation methods have been developed
• Review
✓ AM in applied statistics
✓ AM-related models in geostatistics
Spatial correlation
The most basic property of spatial data
- The first law of geography (Tobler 1970)
Nearby things are strongly related to each other
R. A. Fisher (1935)
Modeling spatial correlation
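• Gaussian process (GP) is widely used.
$$\boldsymbol{\beta} \sim N(b\mathbf{1}, \tau^2\mathbf{C}_r), \qquad \mathbf{C}_r = \begin{bmatrix} c_r(d_{1,1}) & \cdots & c_r(d_{1,N}) \\ \vdots & \ddots & \vdots \\ c_r(d_{N,1}) & \cdots & c_r(d_{N,N}) \end{bmatrix}$$
$c_r(d_{i,j})$ is a distance-decay function, for example
$$c_r(d_{i,j}) = \exp\left(-\frac{d_{i,j}}{r}\right)$$
[Figure: correlation plotted against distance $d_{i,j}$, with range $r$]
Spatially correlated process β behind data
[Figure: process behind air pollution]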
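The correlation matrix above is easy to build for a small N. The sketch below is illustrative only: it uses NumPy, random coordinates, and hypothetical values for b, τ² and r. It constructs C_r from the exponential distance-decay function and draws one realization of β.

```python
import numpy as np

def exp_correlation(coords, r):
    """C_r with entries exp(-d_ij / r), where d_ij is the Euclidean distance."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-d / r)

rng = np.random.default_rng(1)
coords = rng.uniform(size=(200, 2))          # 200 random sites in the unit square
C_small = exp_correlation(coords, r=0.05)    # fast decay: local pattern
C_large = exp_correlation(coords, r=0.5)     # slow decay: smoother, larger-scale pattern

# One draw of beta ~ N(b 1, tau^2 C_r); b and tau2 are hypothetical values.
b, tau2 = 1.0, 0.5
chol = np.linalg.cholesky(tau2 * C_large + 1e-10 * np.eye(len(coords)))  # small jitter
beta = b + chol @ rng.normal(size=len(coords))
```

C_r is N × N, so storing and factorizing it is what becomes infeasible for large N.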
Spatially varying coefficient (SVC) model
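– A particular type of AM in geostatistics that estimates the spatially correlated processes behind regression coefficients
$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{x}_k \circ \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$
$$\boldsymbol{\beta}_k \sim N(b_k\mathbf{1}, \tau_k^2\mathbf{C}_{r_k})$$
[Figure: processes behind the coefficients. Spatial pattern of β1 (small $r_1$) and spatial pattern of β2 (large $r_2$)]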
Geostatistical approach is too slow
• Geostatistical approaches accurately estimate SVCs (i.e., $\boldsymbol{\beta}_k$), but they are too slow and not suitable for large samples.
[Figure: CP time of SVC models. x-axis: sample size (20,000 to 100,000); y-axis: CP time (seconds)]
The fast REML in mgcv is applicable to SVC modeling
‒ Because the SVC model is a particular type of AM.
Geo-additive model (GeoAM; Kammann and Wand, 2003)
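$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{x}_k \circ \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$
$$\boldsymbol{\beta}_k = b_k\mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k, \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\boldsymbol{\Lambda}_k)$$
Equivalently,
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_K, (\mathbf{x}_1 \circ \mathbf{E}), \dots, (\mathbf{x}_K \circ \mathbf{E})], \qquad \boldsymbol{\beta} = [b_1, \dots, b_K, \boldsymbol{\gamma}_1', \dots, \boldsymbol{\gamma}_K']'$$
Rank-reduced GP (scale r is given a priori)
$$\boldsymbol{\beta}_k \sim N(b_k\mathbf{1}, \tau_k^2\mathbf{C}), \qquad \mathbf{C} = \underbrace{\mathbf{E}}_{N \times L}\,\underbrace{\boldsymbol{\Lambda}_k}_{L \times L}\,\underbrace{\mathbf{E}'}_{L \times N}$$
$\mathbf{E}$: Matrix composed of L basis functions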
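The stacked design matrix of the GeoAM can be assembled as follows. This is a minimal NumPy sketch under stated assumptions: the basis matrix E is built from Gaussian radial basis functions at random knots, which is an illustrative choice and not necessarily the basis used by Kammann and Wand or by mgcv. The final lines check that the stacked form reproduces the sum of x_k ∘ β_k.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, L = 300, 3, 10
coords = rng.uniform(size=(N, 2))
knots = rng.uniform(size=(L, 2))                   # hypothetical knot locations
d = np.linalg.norm(coords[:, None, :] - knots[None, :, :], axis=-1)
E = np.exp(-(d / 0.3) ** 2)                        # N x L basis matrix (illustrative)

x = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # x_1 is the intercept

# (x_k ∘ E): scale every column of E elementwise by x_k.
interaction_blocks = [x[:, [k]] * E for k in range(K)]
X_design = np.hstack([x] + interaction_blocks)     # N x (K + K*L)

# Check: X_design @ [b; gamma_1; ...; gamma_K] equals sum_k x_k ∘ (b_k 1 + E gamma_k).
b = rng.normal(size=K)
gamma = rng.normal(size=(K, L))
beta_k = b[None, :] + E @ gamma.T                  # N x K, column k is beta_k
y_mean_direct = (x * beta_k).sum(axis=1)
y_mean_design = X_design @ np.concatenate([b, gamma.ravel()])
```

The number of columns is K + KL, so the design matrix stays narrow even when N is large.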
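Remaining computational issue
• The SVC model has more parameters than a typical AM
– AM: $\boldsymbol{\Theta} \in \{\tau_1^2, \dots, \tau_K^2\}$ (mgcv and other AM studies)
– SVC model: $\boldsymbol{\Theta} \in \{\tau_1^2, \dots, \tau_K^2, r_1, \dots, r_K\}$ (ours)
• So, the red parts (highlighted on the slide) in the restricted likelihood might be slow to evaluate
$$\frac{\|\mathbf{f} - \mathbf{R}\boldsymbol{\beta}\|^2 + \|\mathbf{r}\|^2 + \boldsymbol{\beta}'\mathbf{S}_{\boldsymbol{\Theta}}\boldsymbol{\beta}}{2\sigma^2} + \frac{N - M}{2}\log(2\pi\sigma^2) + \frac{\log|\mathbf{R}'\mathbf{R} + \mathbf{S}_{\boldsymbol{\Theta}}| - \log|\mathbf{S}_{\boldsymbol{\Theta}}|}{2}$$
$$\boldsymbol{\beta} = (\mathbf{R}'\mathbf{R} + \mathbf{S}_{\boldsymbol{\Theta}})^{-1}\mathbf{R}'\mathbf{f}$$
[Figure: spatial covariance $\tau_k^2\,c(-d_{i,j}; r_k)$ plotted against distance $d_{i,j}$; $\tau_k^2$ is the value at zero distance and $r_k$ controls the decay speed]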
Objective
• Summary
‒ Geostatistical methods are too slow.
‒ AM (mgcv) in applied statistics is much faster, but might be slow for SVC modeling because of the many variance parameters.
• I developed another fast REML for SVC modeling and other AM
‒ Applicable to AM with SVCs even if
✓ N (sample size) is large (e.g., millions)
✓ K (number of SVCs and other effects) is large
‒ Note: my development was done independently of Wood's fast REML.
✓ Simply because I didn't know his study ...
Our SVC model
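– Develops a fast approach to estimate $\boldsymbol{\Theta} \in \{\tau_1^2, \dots, \tau_K^2, r_1, \dots, r_K\}$.
$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{x}_k \circ \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$
$$\boldsymbol{\beta}_k = b_k\mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k, \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\boldsymbol{\Lambda}_{r_k})$$
Rank-reduced GP (rank: L)
[Figure: patterns behind the coefficients. Spatial pattern of β1 (small $r_1$) and spatial pattern of β2 (large $r_2$)]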
Type II restricted log-likelihood (see Bates, 2010)
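• Our SVC model
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{u} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}), \qquad \mathbf{u} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$
$\tilde{\mathbf{E}} = [\mathbf{x}_1 \circ \mathbf{E}, \dots, \mathbf{x}_K \circ \mathbf{E}]$: Matrix of all the basis functions (N×KL)
$\tilde{\mathbf{V}}(\boldsymbol{\Theta})$: Diagonal matrix determining the variance structure
• Our restricted log-likelihood
$$l_R(\boldsymbol{\Theta}) = -\frac{1}{2}\ln\begin{vmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\tilde{\mathbf{E}}'\mathbf{X} & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\tilde{\mathbf{E}}'\tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) + \mathbf{I} \end{vmatrix} - \frac{N-K}{2}\left(1 + \ln\frac{2\pi d(\boldsymbol{\Theta})}{N-K}\right)$$
$$\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\tilde{\mathbf{E}}'\mathbf{X} & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\tilde{\mathbf{E}}'\tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) + \mathbf{I} \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\tilde{\mathbf{E}}'\mathbf{y} \end{bmatrix}$$
$$d(\boldsymbol{\Theta}) = \underbrace{\|\mathbf{y} - \mathbf{X}\hat{\mathbf{b}} - \tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta})\hat{\mathbf{u}}\|^2}_{\text{Accuracy}} + \underbrace{\|\hat{\mathbf{u}}\|^2}_{\text{Variance}}$$
$$\hat{\sigma}^2 = \frac{\|\mathbf{y} - \mathbf{X}\hat{\mathbf{b}} - \tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta})\hat{\mathbf{u}}\|^2}{N-K}$$
Red (on the slide): Matrices/vectors whose size depends on N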
Eliminating N from 𝑙𝑅(𝚯) (Similar to Wood et al., 2015)
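(1) Evaluate $\mathbf{M}_{XX} = \mathbf{X}'\mathbf{X}$, $\mathbf{M}_{XE} = \mathbf{X}'\tilde{\mathbf{E}}$, $\mathbf{M}_{EE} = \tilde{\mathbf{E}}'\tilde{\mathbf{E}}$, $\mathbf{m}_{Xy} = \mathbf{X}'\mathbf{y}$, $\mathbf{m}_{Ey} = \tilde{\mathbf{E}}'\mathbf{y}$, $m_{yy} = \mathbf{y}'\mathbf{y}$
(2) Rewrite $l_R(\boldsymbol{\Theta})$ as below
$$l_R(\boldsymbol{\Theta}) = -\frac{1}{2}\ln\begin{vmatrix} \mathbf{M}_{XX} & \mathbf{M}_{XE}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{XE}' & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{EE}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) + \mathbf{I} \end{vmatrix} - \frac{N-K}{2}\left(1 + \ln\frac{2\pi d(\boldsymbol{\Theta})}{N-K}\right)$$
$$d(\boldsymbol{\Theta}) = \|\hat{\boldsymbol{\varepsilon}}\|^2 + \|\hat{\mathbf{u}}\|^2$$
$$\|\hat{\boldsymbol{\varepsilon}}\|^2 = m_{yy} - 2[\hat{\mathbf{b}}', \hat{\mathbf{u}}']\begin{bmatrix} \mathbf{m}_{Xy} \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{m}_{Ey} \end{bmatrix} + [\hat{\mathbf{b}}', \hat{\mathbf{u}}']\begin{bmatrix} \mathbf{M}_{XX} & \mathbf{M}_{XE}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{XE}' & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{EE}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \end{bmatrix}\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{bmatrix}$$
$$\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{bmatrix} = \begin{bmatrix} \mathbf{M}_{XX} & \mathbf{M}_{XE}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{XE}' & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{EE}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) + \mathbf{I} \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{m}_{Xy} \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{m}_{Ey} \end{bmatrix}$$
→ Large matrices/vectors are eliminated
→ Complexity: $O((K+KL)^3) \ll O(N^3)$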
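Steps (1) and (2) can be sketched in a few lines. The code below is an illustration in NumPy, not the spmoran implementation: the basis matrix is a random stand-in, the diagonal of V(Θ) is a hypothetical vector `v`, and the data arrive in row chunks to mimic a single pass over a large file. It compares the restricted log-likelihood computed from the compressed inner products with the one computed from the full N-row matrices.

```python
import numpy as np

def compress(chunks):
    """One pass over row chunks (X, E, y): accumulate the small inner products."""
    acc = None
    for X, E, y in chunks:
        parts = [X.T @ X, X.T @ E, E.T @ E, X.T @ y, E.T @ y, y @ y, len(y)]
        acc = parts if acc is None else [a + p for a, p in zip(acc, parts)]
    return acc  # M_XX, M_XE, M_EE, m_Xy, m_Ey, m_yy, N

def loglik_compressed(v, M_XX, M_XE, M_EE, m_Xy, m_Ey, m_yy, n):
    """l_R for V(Theta) = diag(v), using only the compressed quantities."""
    K = M_XX.shape[0]
    M_XE_V = M_XE * v                         # M_XE V
    V_M_EE_V = v[:, None] * M_EE * v          # V M_EE V
    P0 = np.block([[M_XX, M_XE_V], [M_XE_V.T, V_M_EE_V]])
    P = P0.copy()
    P[K:, K:] += np.eye(len(v))
    m = np.concatenate([m_Xy, v * m_Ey])
    bu = np.linalg.solve(P, m)                # stacked [b_hat; u_hat]
    eps2 = m_yy - 2 * bu @ m + bu @ P0 @ bu   # squared residual norm
    d = eps2 + bu[K:] @ bu[K:]
    _, logdet = np.linalg.slogdet(P)
    return -0.5 * logdet - 0.5 * (n - K) * (1 + np.log(2 * np.pi * d / (n - K)))

def loglik_full(v, X, E, y):
    """Same quantity computed from the N-row matrices, for comparison only."""
    n, K = X.shape
    Z = np.hstack([X, E * v])
    P = Z.T @ Z
    P[K:, K:] += np.eye(len(v))
    bu = np.linalg.solve(P, Z.T @ y)
    d = np.sum((y - Z @ bu) ** 2) + bu[K:] @ bu[K:]
    _, logdet = np.linalg.slogdet(P)
    return -0.5 * logdet - 0.5 * (n - K) * (1 + np.log(2 * np.pi * d / (n - K)))

rng = np.random.default_rng(4)
N, K, L = 2000, 2, 5
X = np.column_stack([np.ones(N), rng.normal(size=N)])
B = rng.normal(size=(N, L))                                # stand-in for the basis matrix E
E_tilde = np.hstack([X[:, [k]] * B for k in range(K)])     # N x KL
y = X @ np.array([1.0, -0.5]) + E_tilde @ rng.normal(scale=0.3, size=K * L) + rng.normal(size=N)
v = rng.uniform(0.1, 1.0, size=K * L)                      # hypothetical diagonal of V(Theta)

chunks = [(X[i:i + 500], E_tilde[i:i + 500], y[i:i + 500]) for i in range(0, N, 500)]
stats = compress(chunks)
ll_fast = loglik_compressed(v, *stats)
ll_full = loglik_full(v, X, E_tilde, y)
```

After `compress` has run, every further evaluation of the likelihood works on (K+KL) × (K+KL) matrices only.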
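Fast maximization of $l_R(\boldsymbol{\Theta})$
Still, the maximization of $l_R(\boldsymbol{\Theta})$ is slow if K is large.
- $\boldsymbol{\Theta} \in \{\tau_1^2, \dots, \tau_K^2, r_1, \dots, r_K\}$
- It involves $\mathbf{P}(\boldsymbol{\Theta})^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$, where $\mathbf{P}(\boldsymbol{\Theta})$ is a KL×KL matrix
$$l_R(\boldsymbol{\Theta}) = -\frac{1}{2}\ln|\mathbf{P}(\boldsymbol{\Theta})| - \frac{N-K}{2}\left(1 + \ln\frac{2\pi d(\boldsymbol{\Theta})}{N-K}\right), \qquad \mathbf{P}(\boldsymbol{\Theta}) = \begin{bmatrix} \mathbf{M}_{XX} & \mathbf{M}_{XE}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{XE}' & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{EE}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) + \mathbf{I} \end{bmatrix}$$
$d(\boldsymbol{\Theta})$ includes $\mathbf{P}(\boldsymbol{\Theta})^{-1}$.
We apply a sequential update ($\boldsymbol{\theta}_k \in \{\tau_k^2, r_k\}$):
max $l_R(\boldsymbol{\theta}_1 \mid \boldsymbol{\theta}_2, \dots, \boldsymbol{\theta}_K)$ → Partial update of $\mathbf{P}(\boldsymbol{\Theta})^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$
max $l_R(\boldsymbol{\theta}_2 \mid \boldsymbol{\theta}_1, \boldsymbol{\theta}_3, \dots, \boldsymbol{\theta}_K)$ → Partial update of $\mathbf{P}(\boldsymbol{\Theta})^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$
...
max $l_R(\boldsymbol{\theta}_K \mid \boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_{K-1})$ → Partial update of $\mathbf{P}(\boldsymbol{\Theta})^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$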
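The control flow of a sequential update can be illustrated on a toy problem. The sketch below is pure Python and makes strong simplifying assumptions: each block θ_k is a single scalar chosen from a grid, and `toy_objective` is a hypothetical concave stand-in for l_R. It does not implement the partial updates of P(Θ)⁻¹ and |P(Θ)|; it only shows one block being maximized at a time while the others are held fixed.

```python
def sequential_maximize(objective, theta, candidates, n_sweeps=5):
    """Cyclic block-wise maximization: update theta[k] with all other blocks fixed."""
    theta = list(theta)
    for _ in range(n_sweeps):
        for k in range(len(theta)):
            best_val, best_c = objective(theta), theta[k]
            for c in candidates:
                trial = theta[:k] + [c] + theta[k + 1:]
                val = objective(trial)
                if val > best_val:
                    best_val, best_c = val, c
            theta[k] = best_c
    return theta

def toy_objective(theta):
    """Hypothetical concave stand-in for l_R with mild coupling between two blocks."""
    t1, t2, t3 = theta
    return -((t1 - 1.0) ** 2 + (t2 - 2.0) ** 2 + (t3 - 0.5) ** 2
             + 0.2 * (t1 - 1.0) * (t2 - 2.0))

grid = [i * 0.05 for i in range(61)]          # candidate values 0.00, 0.05, ..., 3.00
theta_hat = sequential_maximize(toy_objective, [0.0, 0.0, 0.0], grid)
```

In this toy case the sweeps settle near the joint maximizer (1, 2, 0.5). In general a sequential scheme is not guaranteed to match a simultaneous optimization.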
K-th step of the sequential updates
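• $l_R(\boldsymbol{\theta}_K)$ is maximized with respect to $\boldsymbol{\theta}_K \in \{\tau_K^2, r_K\}$.
‒ We keep $\boldsymbol{\theta}_K$ outside of the large matrices to enable partial updates.
$$l_R(\boldsymbol{\theta}_K) = -\frac{1}{2}\ln|\mathbf{P}(\boldsymbol{\theta}_K)| - \frac{N-K}{2}\left(1 + \ln\frac{2\pi d(\boldsymbol{\theta}_K)}{N-K}\right)$$
$$d(\boldsymbol{\theta}_K) = \|\hat{\boldsymbol{\varepsilon}}(\boldsymbol{\theta}_K)\|^2 + \sum_{k=1}^{K}\|\hat{\mathbf{u}}_k\|^2$$
$$\|\hat{\boldsymbol{\varepsilon}}(\boldsymbol{\theta}_K)\|^2 = m_{y,y} - 2[\hat{\mathbf{b}}', \hat{\mathbf{u}}_1', \cdots, \hat{\mathbf{u}}_K']\begin{bmatrix} \mathbf{m}_0 \\ \mathbf{V}_1\mathbf{m}_1 \\ \vdots \\ \mathbf{V}(\boldsymbol{\theta}_K)\mathbf{m}_K \end{bmatrix} + [\hat{\mathbf{b}}', \hat{\mathbf{u}}_1', \cdots, \hat{\mathbf{u}}_K']\,\mathbf{P}_0\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}}_1 \\ \vdots \\ \hat{\mathbf{u}}_K \end{bmatrix}$$
$$\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}}_1 \\ \vdots \\ \hat{\mathbf{u}}_K \end{bmatrix} = \mathbf{P}(\boldsymbol{\theta}_K)^{-1}\begin{bmatrix} \mathbf{m}_0 \\ \mathbf{V}_1\mathbf{m}_1 \\ \vdots \\ \mathbf{V}(\boldsymbol{\theta}_K)\mathbf{m}_K \end{bmatrix}$$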
K-th step of the sequential updates
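• The terms including $\mathbf{P}(\boldsymbol{\theta}_K)^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$ are expanded as
$$\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}}_1 \\ \vdots \\ \hat{\mathbf{u}}_K \end{bmatrix} = \begin{bmatrix} \tilde{\mathbf{V}}_{-K}^{-1} & \mathbf{O} \\ \mathbf{O} & \mathbf{V}(\boldsymbol{\theta}_K)^{-1} \end{bmatrix}\mathbf{Q}^{-1}\begin{bmatrix} \mathbf{m}_{-K} \\ \mathbf{m}_K \end{bmatrix} - \begin{bmatrix} \tilde{\mathbf{V}}_{-K}^{-1}\mathbf{Q}^{*}_{-K,K} \\ \mathbf{V}(\boldsymbol{\theta}_K)^{-1}\mathbf{Q}^{*}_{K,K} \end{bmatrix}\left(\mathbf{V}(\boldsymbol{\theta}_K)^{2} + \mathbf{Q}^{*}_{K,K}\right)^{-1}\left(\mathbf{Q}^{*}_{K,-K}\mathbf{m}_{-K} + \mathbf{Q}^{*}_{K,K}\mathbf{m}_K\right)$$
$$|\mathbf{P}(\boldsymbol{\Theta})| = |\tilde{\mathbf{V}}_{-K}|^{2}\,|\mathbf{V}(\boldsymbol{\theta}_K)|^{2}\,\left|\tilde{\mathbf{M}}_{-K,-K} + \tilde{\mathbf{V}}_{-K}^{-2}\right|\,\left|\mathbf{V}(\boldsymbol{\theta}_K)^{-2} + \mathbf{M}_{K,K} - \tilde{\mathbf{M}}_{K,-K}\left(\tilde{\mathbf{M}}_{-K,-K} + \tilde{\mathbf{V}}_{-K}^{-2}\right)^{-1}\tilde{\mathbf{M}}_{-K,K}\right|$$
• Large matrices (shown in red on the slide) are processed only once, before the iteration to maximize $l_R(\boldsymbol{\theta}_K)$.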
Does the sequential estimation work appropriately?
• [Figure: the lowest correlation coefficients between SVCs estimated by the sequential and simultaneous optimizations across the 200 simulations]
✓ The sequential estimation returns almost the same SVCs as the simultaneous one.
Summary: fast estimation approach
• Usual: N samples are processed iteratively in the estimation
‒ Not suitable for large N.
• Ours: N samples are compressed a priori.
‒ The CP cost for the estimation is independent of N.
→ Iteration of $O(N^3)$ → $O(N)$ + Iteration of $O(K^3L^3)$
‒ The estimation of Θ is split into K steps.
→ Iteration of $O(K^3L^3)$ → $O(K^3L^3)$ + Iteration of $O(L^3)$
[Diagram, usual approach: Data (N) → Model, with iteration for estimation (e.g., MCMC); iteration of $O(N^3)$ in the case of GP]
[Diagram, proposed approach: Data (N) → Rank reduction + Precompression → $(K+L)^2 + K + L$ → Model; the estimation is split into K steps of $O(L^3)$]
Summary of the proposed approach
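• An estimation approach for the SVC model is proposed
✓ Pre-compression: Iteration of $O(N^3)$ → $O(N)$ + Iteration of $O(K^3L^3)$
✓ Sequential est.: Iteration of $O(K^3L^3)$ → $O(K^3L^3)$ + Iteration of $O(L^3)$
• Although I assumed the SVC model below, this approach is applicable to other AMs (and additive mixed models)
✓ Useful to estimate regression models with SVCs, group effects, non-linear effects, time-varying effects, ...
$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{x}_k \circ \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$
$$\boldsymbol{\beta}_k = b_k\mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k, \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\boldsymbol{\Lambda}_{\alpha_k})$$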
Simulation
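• In each case, the true data is generated from (when K = 4)
$$\mathbf{y} = \underbrace{\sum_{k=1}^{2} \mathbf{x}_k \circ \boldsymbol{\beta}_k}_{\text{Global SVC}} + \underbrace{\sum_{k=3}^{4} \mathbf{x}_k \circ \boldsymbol{\beta}_k}_{\text{Local SVC}} + \underbrace{\boldsymbol{\varepsilon}_g}_{\text{Group effect}} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$
Global SVC: $\boldsymbol{\beta}_k = \mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k$, $\boldsymbol{\gamma}_k \sim N(\mathbf{0}, \boldsymbol{\Lambda}_k^{3})$
Local SVC: $\boldsymbol{\beta}_k = \mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k$, $\boldsymbol{\gamma}_k \sim N(\mathbf{0}, \boldsymbol{\Lambda}_k^{0.5})$
Group effect: $\boldsymbol{\varepsilon}_g \sim N(\mathbf{0}, \sigma_g^2\mathbf{G})$; coefficients by randomly generated groups of 20 samples
• Models are estimated 200 times in each case, where
✓ $N \in \{500, 1{,}000, 3{,}000, 8{,}000, 20{,}000\}$
✓ $K \in \{4, 6, 8\}$
✓ $\sigma_g^2 = \mathrm{Var}\left[\sum_{k=1}^{K} \mathbf{x}_k \circ \boldsymbol{\beta}_k\right]$
[Figure: examples of simulated coefficients $\boldsymbol{\beta}_1$, $\boldsymbol{\beta}_2$, $\boldsymbol{\beta}_5$, $\boldsymbol{\beta}_6$]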
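The group effect can be simulated as follows. This is a minimal NumPy sketch under an assumption the slides do not spell out: G is taken to be the block covariance induced by group membership, so that all samples in a group share one normally distributed effect. The value of `sigma_g2` is a hypothetical placeholder for Var[Σ x_k ∘ β_k].

```python
import numpy as np

rng = np.random.default_rng(3)
N, group_size = 8000, 20
n_groups = N // group_size                     # 400 groups when N = 8,000

# Randomly assign each sample to a group, with exactly 20 samples per group.
group_id = rng.permutation(np.repeat(np.arange(n_groups), group_size))

sigma_g2 = 1.5                                 # hypothetical placeholder value
g = rng.normal(scale=np.sqrt(sigma_g2), size=n_groups)   # one effect per group
eps_g = g[group_id]                            # N-vector added to the linear predictor
```

Under this assumption, samples in the same group have covariance sigma_g2 and samples in different groups are uncorrelated.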
Specifications
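$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{x}_k \circ \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}_g + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}), \qquad f(\mathbf{z}_j) = \mathbf{E}\boldsymbol{\gamma}$$
Approach | SVCs βk: Model | SVCs βk: Scale estimation | Group effects εg | R function
AM | 2D cubic spline | | × | bam (mgcv)
GeoAM | Low rank GP | Spatial range is fixed following studies in applied statistics | × | bam (mgcv)
Propose1 | Low rank GP | × | | resf_vc (spmoran)
Propose2 | Low rank GP | × | × | resf_vc (spmoran)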
RMSE of the estimated parameters
[Figure: RMSE of β (local), β (global), and β (group) for K = 4, K = 6, and K = 8; approaches compared: AM, GeoAM, Propose 1, Propose 2]
Estimated SVCs in the 1st iteration (N=8,000; K=6)
[Figure: maps of β4 (local) and β2 (global); panels: True, AM, GeoAM, Proposed2]
• AM: the bivariate spline is too simple
• GeoAM: the GP with a pre-specified scale parameter is not acceptable
Estimated group effects in the 1st iteration (K=6)
[Figure: True versus Estimate for AM, GeoAM, and Proposed2; rows: N=8,000 (400 groups) and N=20,000 (1,000 groups)]
Too much shrinkage? SVC estimation error might blur the group effect.
Computation time (𝑁 ≤ 200,000)
[Figure: CP time (seconds) of GeoAM (mgcv) and Propose 2. One panel plots CP time against sample size (K=8); the other plots it against the number of variance parameters (N=200,000)]
• If N and the number of SVCs are the same
- Propose 2 is slower than GeoAM. But when N > 10,000, CP time increases with N as slowly as for GeoAM.
• If N and the number of variance parameters are the same
- Propose 2 is faster than GeoAM (when N is large).
CP time comparison with other geostatistical approaches
• Conventional
✓ The CP time rapidly increases as N grows
✓ Did not work when N > 15,000
• Proposed
✓ CP time is very short even if N and K are large.
[Figure: CP time (seconds) against sample size (20,000 to 100,000). Labels: ← Practical method; Proposed; Bayesian SVC model (our model approximates it)]
Application to crime data
Joint work with Mami Kajita and Seiji Kajita
Crime analysis is a hot topic
PredPol (https://www.predpol.com/)
• Predict when and where crimes happen
‒ Considering streetlight locations, bar opening hours, ...
• Optimization of patrols
‒ Crimes decreased by 20% in Santa Cruz, CA, and by 47% in LA
[Figure: crime prediction and patrol control]
Number of crimes in Tokyo
[Figure: maps of counts per area in 2009, 2012, 2015, and 2017 for four crime types: Brutal, Violent, Burglary, Non-burglary]
Determinants of crimes
• Repeat-ness
‒ Crimes tend to repeat in the same area
[Figure: near-repeated crimes in Tokyo]
• Risk factors
‒ Education, income, population, number of pedestrians, ...
• Regional properties
‒ Slums, image of the neighborhood, ...
Our applied additive mixed model with SVCs
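$$y_c(\mathbf{s}, t) = \underbrace{\sum_{c'=1}^{4} y_{c'}(\mathbf{s}, t-1)\,b_{c'}(\mathbf{s})}_{\text{Repeat-ness}} + \underbrace{\sum_{k=1}^{K} x_k(\mathbf{s}, t)\,\beta_k(\mathbf{s})}_{\text{Risk factors}} + \underbrace{u_{sp}(\mathbf{s})}_{\text{Spatial}} + \underbrace{u_{ns}(\mathbf{s})}_{\text{District}} + \varepsilon(\mathbf{s}, t)$$
Spatially varying coefficients (GP): $\mathbf{b}_{c'} \sim N(\mathbf{0}, \mathbf{C}(\boldsymbol{\theta}_{c'}))$, $\boldsymbol{\beta}_k \sim N(\mathbf{0}, \mathbf{C}(\boldsymbol{\theta}_k))$, $\mathbf{u}_{sp} \sim N(\mathbf{0}, \mathbf{C}(\boldsymbol{\theta}_{sp}))$
Independent normal (group effect): $\mathbf{u}_{ns} \sim N(\mathbf{0}, \sigma_{ns}^2\mathbf{I})$
Local effects: $u_{sp}$ (spatial) and $u_{ns}$ (district)
$y_c(\mathbf{s}, t)$: Log(crime density)
- c: Crime: Brutal, Violent, Burglary, Non-burglary
- s: District: 3,128
- t: Year: 2009 – 2016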
Application to crime analysis
Explained variables
- Number of crimes per area (log scale): Brutal, Violent, Burglary, Non-burglary
Explanatory variables
- Repeat-ness: crime density in the previous year
- Risk factors: daytime pedestrian density, nighttime pedestrian density, unemployment rate (5 other variables)
- Local effects: district-level effect, spatial effect
Estimated District and Spatial effects
[Figure: maps for Brutal, Violent, Burglary, and Non-burglary; rows: district-level (group) effect and spatial effects]
Annotations on the maps:
- Representative crowded district (Kabuki-cho)
- High risk in crowded districts
- High risks in the north area
- High risk in the center
- High risk in the center and north-west area
- High risk in the north area
- High risk in the center and north area
Estimated Repeat-ness effects
[Figure: maps for Brutal, Violent, Burglary, and Non-burglary]
Annotations on the maps:
- Tend to repeat near the center
- Tend to repeat near sub-centers (Shinjuku, Shibuya)
Estimated effects of Daytime pedestrian density
[Figure: maps for Brutal, Violent, Burglary, and Non-burglary; rows: estimated effects and statistical significance]
Annotations on the maps:
- Increase crimes in suburban areas (two panels)
- Increase in sub-centers
- No impact
Estimated effects of Nighttime pedestrian density
[Figure: maps for Brutal, Violent, Burglary, and Non-burglary; rows: estimated effects and statistical significance]
Annotations on the maps:
- Increase in crowded areas
- No impact
- Low density increases crimes in a bayside area
- High density increases crimes in local centers in the north
Concluding remarks
• This study develops a fast SVC modeling approach for large samples
✓ It estimates SVCs accurately
✓ It is confirmed that it is as fast as the fast REML in mgcv
• The SVC model is implemented in the R package spmoran
• This approach is applicable to other additive (mixed) models
✓ Non-linear, time-varying, group, ...
✓ It might be possible to extend it to fast Bayesian sampling whose CP cost is independent of the sample size (after pre-compression)