Light field imaging: modelling,
parameterization and sparsification
Atanas Gotchev, Tampere University
11.6.2019
Tampere Universities
Tampere University of Technology, University of Tampere and Tampere University of Applied Sciences
• 35,000 students
• 5,000 employees
Tampere
• The most popular city to live and study in
• Third largest city in Finland, 220,000 inhabitants
• One of the fastest growing urban centres in Finland
Methods for capture, representation and processing real world 3D visual data
Knowledge about perception of depth and visual cues
Optimal visualization on emerging 3D displays
This project has received funding from the European Union’s
Horizon 2020 research and innovation programme under the
Marie Sklodowska-Curie grant agreement No 764951.
The science of more exciting tomorrow
Presentation Outline
• Introduction to plenoptic function, 4D light field and light field displays
• Epipolar plane image representation and densely sampled light fields (DSLF)
• DSLF reconstruction
• Angular super-resolution
• Spatial super-resolution
• DSLF compression
• DSLF applications
Plenoptic function, 4D light field and light field displays
Plenoptic function (PF)
• Introduced by Adelson and Bergen (1991)
• Plenus (complete) + Optic = Plenoptic
• 7-D continuous function that describes the light field P(θ, φ, λ, t, Vx, Vy, Vz)
• (Vx, Vy, Vz) – location in 3D space
• (θ, φ) – angles determining the direction
• λ – wavelength
• t – time
[Figure: a ray through the point (Vx, Vy, Vz) in the (x, y, z) coordinate system, with direction angles θ and φ]
Two-plane parameterization
• A 4-D approximation of PF, parameterized through two parallel planes L(u,v,s,t)
[Figure: two parallel planes, (u, v) and (s, t), with sampling steps Δu and Δs]
Levoy and Hanrahan (1996) – light field
Gortler et al. (1996) – Lumigraph
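A minimal sketch of how one two-plane sample L(u,v,s,t) corresponds to a ray in 3D. It assumes the (s,t) plane lies at z = 0 and the (u,v) plane at z = d, with (u,v) given in absolute coordinates; the separation `d` and the function name `ray_from_two_plane` are illustrative and not taken from the slides.

```python
import numpy as np

def ray_from_two_plane(u, v, s, t, d=1.0):
    """Return (origin, unit direction) of the ray L(u, v, s, t).

    Assumption: the (s, t) plane is z = 0 and the (u, v) plane is z = d,
    with (u, v) given in absolute coordinates on that plane.
    """
    origin = np.array([s, t, 0.0])
    through = np.array([u, v, d])
    direction = through - origin
    return origin, direction / np.linalg.norm(direction)

# A ray leaving the (s, t) plane at the origin and crossing (u, v) = (0.3, 0)
origin, direction = ray_from_two_plane(u=0.3, v=0.0, s=0.0, t=0.0, d=0.4)
```

Other conventions (for example (u,v) measured relative to (s,t)) only change how `through` is computed.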
Light field displays
• Perceptual light field (PLF): how the human eyes sample
the light field
• Light field displays aimed at reconstructing:
• Stereo
• Focus (accommodation and retinal blur)
• Continuous parallax
A. Stern, Y. Yitzhaky, B. Javidi, “Perceivable light fields:
Matching the requirements between the human visual system
and autostereoscopic 3-D displays,” Proc. IEEE, Oct. 2014.
M. Banks, D. Hoffman, J. Kim, G. Wetzstein, “3D Displays“,
Annual Review of Vision Science 2016
A non-exhaustive LF display nomenclature
• Integral Imaging displays
• Super-multiview displays
• Tensor displays
H. Huang and H. Hua, “Systematic
characterization and optimization of 3d light
field displays,” Opt. Express, 2017
G. Wetzstein et al., “Tensor displays: Compressive
light field synthesis using multilayer displays with
directional backlighting,” ACM Trans. Graph., July 2012
Y. Takaki, “Development of super multi-
view displays,” ITE Transactions on Media
Technology and Applications, 2014.
Projection Based Light Field Displays
• Ray generators
• Discrete to continuous conversion
• LF reconstruction instead of views
T. Balogh, “The HoloVizio system,” Proc. SPIE 6055, 2006
Epipolar plane images, their Fourier domain characteristics, and the densely-sampled light field
Forming epipolar plane image (EPI) from a 3D scene
(s,t) – camera plane
(u,v) – image plane
[Figure: two-plane parameterization; cameras at positions tA, tB, tC, tD, tE along the t axis observing a scene at depth z]
Forming epipolar plane image (EPI) from a 3D scene
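$v = \dfrac{v_2 - v_1}{t_2 - t_1}(t - t_1) + v_1 = \dfrac{f}{z_0}(t - t_1) + v_1$
$\Delta t = t_2 - t_1$, $\Delta v = v_2 - v_1$
$\Delta v = \dfrac{f}{z_0}\,\Delta t$
Chai et al., SIGGRAPH 2000
[Figure: a line in the (t, v) plane through the points (t1, v1) and (t2, v2)]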
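The EPI relation Δv = (f / z0) Δt says that a scene point at depth z0 traces a straight line in the (t, v) plane, and that the slope of the line encodes the depth. A small sketch of both directions of this relation follows; the numeric values and the function names are hypothetical.

```python
import numpy as np

def epi_trace(t, t1, v1, f, z0):
    """Image coordinate v of a scene point at depth z0 seen from camera position t."""
    return (f / z0) * (t - t1) + v1

def depth_from_slope(dv, dt, f):
    """Invert dv = (f / z0) * dt to recover the depth z0."""
    return f * dt / dv

f = 50.0      # focal length (hypothetical value, same units as v)
z0 = 2000.0   # depth of the scene point (hypothetical value)
t = np.array([0.0, 10.0, 20.0, 30.0])        # camera positions
v = epi_trace(t, t1=0.0, v1=5.0, f=f, z0=z0)  # positions along the EPI line
z_est = depth_from_slope(v[1] - v[0], t[1] - t[0], f)
```

Nearer points (smaller z0) give steeper lines, which is why the minimal scene depth drives the sampling requirements discussed later.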
EPIs
[Figure: captured views labelled A to F and the corresponding EPIs, with image axes (u, v) and EPI axes (t, v)]
Full parallax
• 4D EPI hyper-cube
[Figure: 4D EPI hyper-cube over the axes (s, t, u, v)]
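With full parallax, EPIs are 2D slices of the 4D data. The sketch below assumes the light field is stored as an array `lf[s, t, v, u]`; that memory layout, the array sizes and the function names are illustrative choices, not something the slides prescribe.

```python
import numpy as np

# Hypothetical 4D light field stored as lf[s, t, v, u]:
# (s, t) index the camera plane, (v, u) index image rows and columns.
n_s, n_t, n_v, n_u = 3, 9, 32, 48
rng = np.random.default_rng(0)
lf = rng.random((n_s, n_t, n_v, n_u))

def epi_tv(lf, s, u):
    """EPI over (t, v): fix the camera coordinate s and the image coordinate u."""
    return lf[s, :, :, u]

def epi_su(lf, t, v):
    """EPI over (s, u): fix the camera coordinate t and the image coordinate v."""
    return lf[:, t, v, :]

epi_a = epi_tv(lf, s=1, u=10)   # shape (n_t, n_v)
epi_b = epi_su(lf, t=4, v=16)   # shape (n_s, n_u)
```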
EPI in continuous Fourier domain
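[Figure: an EPI in the (t, v) plane for a scene between depths zmin and zmax, and its spectrum in the (Ωt, Ωv) plane; the spectral support is bounded by lines labelled ~zmax and ~zmin, with further lines labelled ~inf, ~0 and ~f]
Chai et al., SIGGRAPH 2000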
Discretization in spatial and angular domains
[Figure: sampling with camera density Δt and image resolution Δv replicates the EPI spectrum in the (Ωt, Ωv) plane with periods 2π/Δt and 2π/Δv]
Alias free sampling
• Scene with Lambertian properties and without occlusions
• Practical estimation for scene sensing / rendering
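[Figure: minimum sampling curves relating #images, #layers and resolution]
J.-X. Chai, X. Tong, S.-C. Chan, H.-Y. Shum, “Plenoptic
sampling,” SIGGRAPH (Computer Graphics), July 2000.
$\Delta t = \dfrac{1}{K_{f_v}\, f\, h_d / N_d}, \quad N_d \ge 1$
$h_d = \dfrac{1}{z_{min}} - \dfrac{1}{z_{max}}$
$K_{f_v} = \min\left(B_v^s,\ \dfrac{1}{2\Delta v},\ \dfrac{1}{2\delta v}\right)$
$B_v^s$ – highest (texture) frequency
$\Delta v$ – sampling camera resolution
$\delta v$ – rendering camera resolution
$N_d$ – number of layers
$\Delta t$ – sampling interval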
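As a worked illustration, the sketch below evaluates the camera spacing as Δt = N_d / (K f h_d), with h_d = 1/z_min − 1/z_max and K the minimum of the texture bandwidth and the two resolution limits 1/(2Δv) and 1/(2δv). This is a reading of the plenoptic sampling bound under the assumption that all quantities use consistent units; the function name and the numbers are hypothetical.

```python
def max_camera_spacing(f, z_min, z_max, B_vs, dv_capture, dv_render, n_layers=1):
    """Camera spacing dt = n_layers / (K * f * h_d) for an occlusion-free Lambertian scene."""
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    h_d = 1.0 / z_min - 1.0 / z_max                      # depth range term
    K = min(B_vs, 1.0 / (2.0 * dv_capture), 1.0 / (2.0 * dv_render))
    return n_layers / (K * f * h_d)

# Hypothetical numbers in consistent units
dt_single = max_camera_spacing(f=1.0, z_min=2.0, z_max=4.0, B_vs=100.0,
                               dv_capture=0.01, dv_render=0.02)
dt_layered = max_camera_spacing(f=1.0, z_min=2.0, z_max=4.0, B_vs=100.0,
                                dv_capture=0.01, dv_render=0.02, n_layers=4)
```

Here the rendering resolution is the limiting factor (K = 25), and using four depth layers allows four times sparser cameras.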
Densely sampled light field (DSLF)
• Sampling that allows the disparity space to be treated as a continuous space
• Less than 1px disparity between adjacent views
• Lines in EPI become unambiguous
• Influenced by
• Sampling density on the t and v plane
• (Minimal) depth and (smallest) details in the scene
• Bilinear interpolation can be used for finding finer
details
• Without introducing any major aliasing errors
[Figure: camera positions along the t axis with spacing Δt]
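The DSLF condition can be turned into a simple count of views. Assuming the maximum disparity between adjacent captured views is known in pixels, the sketch below computes the angular upsampling factor that brings it down to at most 1 px and the resulting number of views; the function name is illustrative.

```python
import math

def dslf_upsampling(n_views, d_max_px):
    """Upsampling factor and total view count so that adjacent views differ by at most 1 px."""
    factor = max(1, math.ceil(d_max_px))
    n_dense = (n_views - 1) * factor + 1
    return factor, n_dense

# 16 captured views with up to 32 px disparity between neighbours
factor, n_dense = dslf_upsampling(n_views=16, d_max_px=32.0)
```

In this example each pair of neighbouring views needs 31 intermediate views, giving 481 views in total.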
Densely sampled light field (DSLF)
[Figure: EPI patches for dmax = 1 and dmax = 10 compared with the continuous plenoptic function; panel labels: linear interpolation, advanced interpolation]
Densely sampled light field reconstruction (aka angular super-resolution) by sparsification in shearlet transform domain
Reconstruction by processing EPIs
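[Figure: a set of captured views gives a coarsely sampled EPI in the (t, v) plane; reconstruction yields a densely sampled EPI with ≤ 1 px disparity between adjacent views]
Structured data: lines in EPI domain, cones in spectral domain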
Reconstruction by processing EPIs
• Inpainting: fill in holes (missing pixels) with visually acceptable values
• $\arg\min_{\alpha} \frac{1}{2}\|HD\alpha - Hy\|_2^2 + \lambda\|\alpha\|_1$, where H is the masking operator that separates the given values from the missing ones
• D is a proper dictionary / transform domain where the light field gets sparsified
[Figure: the densely sampled EPI y and its masked version Hy]
Shearlet elements in Fourier and spatial domain
• Dictionary is formed by shearlet atoms and the coefficients are found by Shearlet transform
$y = D\alpha$; $\alpha = S(y)$, $y = S^{*}(\alpha)$
Vagharshakyan, Bregovic, Gotchev, Light
Field Reconstruction using Shearlet
Transform, IEEE Trans. PAMI, 2017
The algorithm
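• Reconstruction formula: $\hat{y} = \arg\min_{y} \|S(y)\|_1$, subject to $x = Hy$
[Figure: densely sampled EPI y and its sparsely sampled version x = Hy, both in the (t, v) plane]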
The algorithm
• How to solve this?
$\hat{y} = \arg\min_{y} \|S(y)\|_1$, subject to $x = Hy$
• One needs a regularizer, which will minimize the $\ell_1$ norm
• Regularizer applied in the form of hard thresholding in the shearlet domain, in the fashion of denoising:
$(T_\lambda s)(k) = \begin{cases} s(k), & |s(k)| \ge \lambda \\ 0, & |s(k)| < \lambda \end{cases}$
The algorithm
• Iterative procedure
$y_{n+1} = S^{*}\big(T_{\lambda_n}\big(S(y_n + \alpha_n (x - H y_n))\big)\big)$,
where
$(T_\lambda s)(k) = \begin{cases} s(k), & |s(k)| \ge \lambda \\ 0, & |s(k)| < \lambda \end{cases}$
is a hard thresholding operator and $\alpha_n$ is an
acceleration parameter controlling the convergence
Vagharshakyan, Bregovic, Gotchev, Light
Field Reconstruction using Shearlet
Transform, IEEE Trans. PAMI, 2017
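The iteration y_{n+1} = S*(T_λn(S(y_n + α_n (x − H y_n)))) can be sketched in a few lines. The sketch below is not the shearlet implementation from the cited paper: it swaps in an orthonormal 2D FFT as a stand-in sparsifying transform, uses a random pixel mask for H, a constant α_n = 1 and a linearly decreasing threshold. All of these choices, and the toy image, are assumptions made so the example is self-contained.

```python
import numpy as np

def S(y):
    """Analysis transform. Stand-in: orthonormal 2D FFT, NOT a shearlet transform."""
    return np.fft.fft2(y, norm="ortho")

def S_adj(c):
    """Synthesis transform S* (inverse orthonormal FFT, real part kept)."""
    return np.fft.ifft2(c, norm="ortho").real

def hard_threshold(c, lam):
    """(T_lam c)(k) = c(k) if |c(k)| >= lam, else 0."""
    return np.where(np.abs(c) >= lam, c, 0.0)

def reconstruct(x, mask, lam_max, lam_min, n_iter=100, alpha=1.0):
    """Iterate y <- S*(T_lam(S(y + alpha * (x - H y)))) with H y = mask * y."""
    y = np.zeros_like(x)
    for lam in np.linspace(lam_max, lam_min, n_iter):  # decreasing threshold
        y = S_adj(hard_threshold(S(y + alpha * (x - mask * y)), lam))
    return y

# Toy image that is sparse in the stand-in transform
N = 32
i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
y_true = np.cos(2 * np.pi * 3 * i / N) + 0.5 * np.cos(2 * np.pi * (2 * i + 5 * j) / N)

rng = np.random.default_rng(0)
mask = (rng.random((N, N)) < 0.5).astype(float)  # 1 = given sample, 0 = missing
x = mask * y_true                                 # measurements x = H y

lam_max = np.abs(S(x)).max()
y_hat = reconstruct(x, mask, lam_max=lam_max, lam_min=1e-3 * lam_max)
rel_err = np.linalg.norm(y_hat - y_true) / np.linalg.norm(y_true)
```

A Fourier dictionary cannot resolve the regular view decimation of a real EPI, because the aliased replicas are just as sparse as the true spectrum; that is the motivation for a directional dictionary such as shearlets, and it is why the toy mask here is random.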
Epipolar-plane image reconstruction (x32)
[Figure: input (16 views), ground truth and reconstructed EPIs]
How to handle full parallax
• Hierarchical reconstruction allows a lower number of layers to be used
Full parallax
Semi-transparent scenes
Non-Lambertian Scene
[Figure panels: Ground Truth, Shearlet Reconstruction, SFFT [2]]
[2] L. Shi, H. Hassanieh, A. Davis, D. Katabi, and F.
Durand, “Light field reconstruction using sparsity in
the continuous fourier domain,” ACM Trans. on
Graphics (TOG), vol. 34, no. 1, p. 12, 2014
Sergio Moreschini, Robert Bregovic, Atanas
Gotchev, Shearlet-Based Light Field
Reconstruction of Scenes with non-Lambertian
properties, 3DTV-CON 2018
Non-Lambertian Scene
[Figure panels: Ground Truth, Shearlet Reconstruction, SFFT [2]]
Non-Lambertian Scene
[Figure panels: Shearlet Reconstruction, SFFT [2]]
Joint spatial-angular super-resolution
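In angular direction…
[Figure: horizontal parallax light field and the corresponding densely sampled light field]
In spatial direction…
High resolution and densely sampled light field
[Figure: low spatial resolution multi-perspective images and the required high resolution images; signals labelled $\mathbf{y}$, $\mathbf{x}_{sr}$ and $\mathbf{x}_{ds}$]
$\mathbf{y} = \mathbf{H}_{spt}\,\mathbf{x}_{sr}$, $\quad \mathbf{x}_{sr} = \mathbf{H}_{an}\,\mathbf{x}_{ds}$
$\mathbf{H}_{spt}$ – given decimation matrix in spatial domain
$\mathbf{H}_{an}$ – decimation matrix in angular dimension
$\arg\min_{\mathbf{x}_{sr},\,\mathbf{x}_{ds}} \|\mathbf{y} - \mathbf{H}_{spt}\mathbf{x}_{sr}\|_2^2 + \gamma\|\mathbf{x}_{sr} - \mathbf{H}_{an}\mathbf{x}_{ds}\|_2^2 + \lambda\|\mathbf{S}\mathbf{x}_{ds}\|_0$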
Spatial and angular super-resolution formulated as a variational optimization problem
Formulating the problem….
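$\|\mathbf{y} - \mathbf{H}_{spt}\mathbf{x}_{sr}\|_2^2 + \gamma\|\mathbf{x}_{sr} - \mathbf{H}_{an}\mathbf{x}_{ds}\|_2^2 + \lambda\|\mathbf{S}\mathbf{x}_{ds}\|_0$
$\mathbf{x}_{sr}^{k} = \mathbf{x}_{sr}^{k-1} + \tau\left[\mathbf{A}\left(\mathbf{y} - \mathbf{H}_{spt}\mathbf{x}_{sr}^{k-1}\right) + \gamma\,\mathbf{z}^{k}\right]$
$\mathbf{A} \approx \mathbf{H}_{spt}^{-1}$: almost inverse, interpolation filter + guided filtering
$\|\mathbf{y} - \mathbf{H}_{spt}\mathbf{x}_{sr}\|_2^2 + \gamma\|\mathbf{x}_{sr} - \mathbf{z}^{k}\|_2^2$, for fixed $\mathbf{z}^{k} = \mathbf{H}_{an}\mathbf{x}_{ds}$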
Gradient descent
Spatial super-resolution
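A 1D toy sketch of the spatial step for a fixed angular estimate z. It assumes block-average decimation for H_spt, plain sample replication for the approximate inverse A (the slides mention an interpolation filter with guided filtering, which is not reproduced here), and a coupling term γ(z − x), which is what differentiating γ‖x_sr − z‖² gives. Step size, weights and signal are hypothetical.

```python
import numpy as np

def H_spt(x, r):
    """Block-average decimation by factor r (len(x) must be divisible by r)."""
    return x.reshape(-1, r).mean(axis=1)

def A(y, r):
    """Approximate inverse of H_spt: sample replication (stand-in interpolator)."""
    return np.repeat(y, r)

def spatial_sr(y, z, r, tau=0.5, gamma=0.5, n_iter=50):
    """Descent-type iterations on ||y - H x||^2 + gamma * ||x - z||^2 for fixed z."""
    x = A(y, r)  # start from the interpolated observation
    for _ in range(n_iter):
        x = x + tau * (A(y - H_spt(x, r), r) + gamma * (z - x))
    return x

rng = np.random.default_rng(1)
x_true = rng.random(16)    # hypothetical high resolution signal
y = H_spt(x_true, 2)       # low resolution observation
z = x_true.copy()          # idealised output of the angular step
x_sr = spatial_sr(y, z, r=2)
```

With these choices H_spt(A(y)) = y, so the data term is satisfied from the first iterate and the coupling term pulls the remaining detail from z.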
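$\|\mathbf{y} - \mathbf{H}_{spt}\mathbf{x}_{sr}\|_2^2 + \gamma\|\mathbf{x}_{sr} - \mathbf{H}_{an}\mathbf{x}_{ds}\|_2^2 + \lambda\|\mathbf{S}\mathbf{x}_{ds}\|_0$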
S. Vagharshakyan, R. Bregovic and A. Gotchev, "Accelerated Shearlet-Domain Light Field Reconstruction,"
in IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 7, pp. 1082-1091, Oct. 2017
$\gamma\|\mathbf{x}_{sr}^{k} - \mathbf{H}_{an}\mathbf{x}_{ds}\|_2^2 + \lambda\|\mathbf{S}\mathbf{x}_{ds}\|_0$, for fixed $\mathbf{x}_{sr}^{k}$
Iterative thresholding in Shearlet transform domain
Angular super-resolution
Results: block-average x2
Mattia Rossi and Pascal Frossard, Geometry-Consistent Light Field
Super-Resolution Via Graph-Based Regularization, IEEE Trans. on
Image Processing, vol. 27, no. 9, pp. 4207-4218, Sep. 2018
Results: block-average x3
Results: block-average x4
Results: Gaussian average x2
Martin Alain, Aljosha Smolic, "Light Field Super-Resolution
via LFBM5D Sparse Coding", IEEE International
Conference on Image Processing (ICIP 2018), 2018
Results: Gaussian average x3
Results: Gaussian average x4
Compression
[Figure: views POC 0, POC 4, POC 8, POC 12 and POC 16]
• POC 0,4,8,12,16 are encoded with MV-HEVC.
• Predict & encode intermediate views with MV-HEVC.
• Predict & encode intermediate views with Shearlet transform.
• Anchor: POC 0 to POC 16 encoded with HEVC.
Research Methodology (Single Layer Example)
Compression
Proposed Compression Scheme: Encoder
[Block diagram: input 17x17 views → sub-sampling of views → 5x5 views → MV-HEVC encoder → encoded stream. The 5x5 views are also passed through an MV-HEVC decoder → decoded 5x5 views → shearlet transform prediction → predicted [17x17 - 5x5] views. Residual estimation against the reference [17x17 - 5x5] views → MV-HEVC encoder → encoded stream.]
Proposed Compression Scheme: Decoder
[Block diagram: encoded stream → preprocessing → encoded 5x5 views → MV-HEVC decoder → decoded 5x5 views → shearlet transform prediction → predicted [17x17 - 5x5] views. Encoded residual [17x17 - 5x5] views → MV-HEVC decoder → residual [17x17 - 5x5] views. Residual compensation → 17x17 output views.]
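The data flow of the scheme (key views, prediction of the intermediate views, residual coding, residual compensation) can be sketched on a single row of 17 views. The sketch makes two loud substitutions: uniform quantisation stands in for MV-HEVC encoding and decoding, and linear interpolation stands in for the shearlet-based prediction. Function names, step sizes and data are hypothetical.

```python
import numpy as np

def codec(views, step):
    """Stand-in for MV-HEVC encode + decode: uniform quantisation only."""
    return np.round(views / step) * step

def predict(decoded_keys, key_idx, n_views):
    """Stand-in for shearlet prediction: linear interpolation between key views."""
    n_pix = decoded_keys.shape[1]
    out = np.empty((n_views, n_pix))
    positions = np.arange(n_views)
    for p in range(n_pix):
        out[:, p] = np.interp(positions, key_idx, decoded_keys[:, p])
    return out

n_views, n_pix = 17, 8
rng = np.random.default_rng(0)
views = rng.random((n_views, n_pix))                 # hypothetical row of views
key_idx = np.arange(0, n_views, 4)                   # POC 0, 4, 8, 12, 16
mid_idx = np.setdiff1d(np.arange(n_views), key_idx)  # intermediate views

# Encoder side: code key views, predict the rest, code the residual
decoded_keys = codec(views[key_idx], step=0.02)
predicted = predict(decoded_keys, key_idx, n_views)
residual = views[mid_idx] - predicted[mid_idx]
decoded_residual = codec(residual, step=0.05)

# Decoder side: same prediction, then residual compensation
output = predict(decoded_keys, key_idx, n_views)
output[key_idx] = decoded_keys
output[mid_idx] += decoded_residual
max_err = np.max(np.abs(output - views))
```

Because the encoder predicts from the decoded key views rather than the originals, encoder and decoder form identical predictions and the final error is bounded by the residual quantisation alone.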
Compression
Delta SNR=-2.21
Truck Image
Rate Distortion curves between 17x17 Grid Encoded by HEVC & X265 and
17x17 Grid reconstructed by Shearlet (using 5x5 HEVC decoded grid)
Delta SNR=-3.88
X265 Anchor, HEVC Anchor
HEVC, 35 dB
Shearlet, 39 dB
Compression
Delta SNR=-0.17
Bunny Image
Rate curves between 17x17 Grid Encoded by HEVC and 17x17
Grid reconstructed by Shearlet (using 5x5 HEVC decoded grid)
Delta SNR= -1.85
X265 Anchor, HEVC Anchor
Ground Truth
HEVC with PSNR 37 dB
Shearlet with PSNR 41 dB
Applications
Continuous refocusing for
integral microscopy with
Fourier plane recording
• Conversion of light fields to several different types of holographic representations (e.g. holographic stereograms, Fresnel holograms) is studied.
• For example, hogels of holographic stereograms consist of several windowed plane waves (propagating in different directions) whose intensities are defined by the captured light field:
$O_{HS}(x) = \sum_{m} \mathrm{rect}\left(\dfrac{x - m\Delta x}{\Delta x}\right) \sum_{i} L(m, i)\, \exp\left(j 2\pi f_x^{mi} x\right)$
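A 1D sketch of the holographic stereogram sum: each hogel m occupies one rect window of width Δx and contains a weighted sum of plane waves with spatial frequencies f_x^{mi}. The light field samples are used directly as weights, following the formula as written; the sampling of x inside each hogel, the function name and the test data are assumptions.

```python
import numpy as np

def holographic_stereogram(L, dx, fx, samples_per_hogel):
    """O_HS(x) = sum_m rect((x - m*dx)/dx) * sum_i L[m, i] * exp(j*2*pi*fx[m, i]*x).

    L  : (M, I) light field samples for hogel m and direction i
    fx : (M, I) spatial frequency of plane wave i in hogel m
    """
    M, _ = L.shape
    field = np.zeros(M * samples_per_hogel, dtype=complex)
    for m in range(M):
        # rect window: hogel m covers x in [(m - 1/2) dx, (m + 1/2) dx)
        offsets = (np.arange(samples_per_hogel) + 0.5) * dx / samples_per_hogel
        x = (m - 0.5) * dx + offsets
        waves = np.exp(2j * np.pi * np.outer(fx[m], x))  # shape (I, samples)
        field[m * samples_per_hogel:(m + 1) * samples_per_hogel] = L[m] @ waves
    return field

# Two hogels, one on-axis plane wave each (zero spatial frequency)
L = np.array([[2.0], [3.0]])
fx = np.zeros((2, 1))
field = holographic_stereogram(L, dx=1.0, fx=fx, samples_per_hogel=4)
```

With zero spatial frequency every hogel is simply constant, which makes the windowing easy to check by eye.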
Hologram generation from light fields
• Holograms usually impose dense capture of light fields, which requires tedious work e.g. by camera rigs.
• We demonstrated that the capture constraints can be significantly relieved by utilizing the shearlet decomposition
based light field reconstruction. Thus, it becomes possible to use camera arrays.
[Figure: images perceived by the human eye at different viewpoints, corresponding to LFs with (a) 1 mm and (b) 8 mm baselines; hologram reconstructions are simulated via wave field propagation]
Sahin E., Vagharshakyan S., Mäkinen J., Bregovic R., Gotchev A. “Shearlet-domain light field reconstruction for
holographic stereogram generation”, 2016 IEEE Int. Conf. Image Processing (ICIP). IEEE, 2016. p. 1479-1483.
Hologram generation from light fields
Light field displays
Fourier analysis of Light Field displays
• Projection-based LF displays
• Optical modules
• Ray generators
• Holographic screen
• Discrete to continuous conversion
• LF reconstruction instead of views
T. Balogh, “The HoloVizio system,” Proc. SPIE 6055, 2006
Ray Propagation in LF displays
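[Figure: ray r travelling from the ray generator (RG) plane to the screen plane(s) in (x, z) coordinates, with the screen at (x, z) = (0, 0); labels: ray generators 1 to Np, dp, ds, FOVp; sampling patterns are evaluated at the depths zp1, zp2, zp3 and zp4]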
LF sampling topologies in ray space
[Figure: ray-space sampling patterns plotted against x at the planes zp1, zp2, zp3, zp4 and at the screen plane z = 0]
Display bandwidth
• Angular-spatial bandwidth at the screen level
determined by the size of the Voronoi cells
calculated for the sampling grid at the screen
plane
• Determines the display passband (throughput)
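One way to make the Voronoi cell statement concrete is standard lattice sampling theory. Assuming the rays at the screen plane form a regular lattice with basis matrix V in (position, angle) coordinates, the Voronoi cell of the lattice has area |det V|, the spectral replicas sit on the reciprocal lattice 2π V^{-T}, and an alias-free passband can be taken as the Voronoi cell of that reciprocal lattice. The basis values and the function name below are hypothetical, and this is a sketch of the general principle rather than the exact procedure of the cited paper.

```python
import numpy as np

def lattice_cells(V):
    """Voronoi cell area of a sampling lattice, its reciprocal basis, and the passband area."""
    V = np.asarray(V, dtype=float)
    cell_area = abs(np.linalg.det(V))            # area per sample in ray space
    V_recip = 2.0 * np.pi * np.linalg.inv(V).T   # basis of the reciprocal lattice
    passband_area = abs(np.linalg.det(V_recip))  # area of one spectral cell
    return cell_area, V_recip, passband_area

# Hypothetical sheared lattice: columns are the basis vectors (position, angle)
V = np.array([[1.0, 0.5],
              [0.0, 0.25]])
cell_area, V_recip, passband_area = lattice_cells(V)
```

The product of the two areas is always (2π)², so a denser ray lattice directly buys a larger passband.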
R. Bregović, P. T. Kovács, A. Gotchev, “Optimization of light field display-camera
configuration based on display properties in spectral domain,” Opt. Express, Feb. 2016.
Display-camera setup
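[Figure: display-camera setup in (x, z) coordinates with origin (0, 0) at the screen plane: ray generators plane at zp (generators 1 to Np at positions xp, field of view FOVproj), screen plane (coordinate s), and viewing / camera plane at zc (cameras 1 to Nc at positions xc, field of view FOVcam)]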
Finite number of rays
generated by the display
Limited display
bandwidth
Enough ‘correct’ rays
captured by cameras
Display bandwidth at camera plane
[Figure: display bandwidth at the camera plane, drawn in the spectral domain (frequency axes Ω)]
Camera setup for optimal capture
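[Figure: camera setup for optimal capture, plotted against x]
Blue: $S_{\bar{z}_c} P^{*} V(\bar{x}_p, \bar{\alpha}_p, \bar{z}_p)$
Green: $P^{*} V(\bar{x}_c, \bar{\alpha}_c)$
Red: $P^{*} V(x_c^{BIG}, \alpha_c^{BIG})$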
• Optimal with respect to a given display
• Desired visualization quality
• Determine the optimal display setup
• Given a display setup
• Determine the optimal capture (data) setup
• Will never have matching data
• LF interpolation / reconstruction needed
Conclusions
• Light Field technologies capable of recreating 3D visual cues beyond binocularity
• Densely Sampled Light Field as an LF representation capable of delivering the desired density
of rays for recreating focus and continuous parallax visual cues
• Computational imaging tools for DSLF reconstruction from sparse cameras
• Research challenges related to the computational complexity of LF reconstruction
techniques
• Research challenges related to the LF display technologies
Thank you for your attention!