arXiv:2003.02386v3 [cs.LG] 28 Mar 2020
PREPRINT VERSION
mmFall: Fall Detection using 4D MmWave Radar and Variational Recurrent Autoencoder

Feng Jin, Student Member, IEEE, Arindam Sengupta, Student Member, IEEE, and Siyang Cao, Member, IEEE
Abstract—Elderly fall prevention and detection becomes extremely crucial with the fast-aging global population. In this paper we propose mmFall, a novel fall detection system that comprises (i) an emerging millimeter-wave (mmWave) radar sensor to collect the human body's point cloud along with the body centroid, and (ii) a variational recurrent autoencoder (VRAE) to compute the anomaly level of the body motion based on the acquired point cloud. A fall is claimed to have occurred when a spike in the anomaly level and a drop in the centroid height occur simultaneously. The mmWave radar sensor provides several advantages over traditional sensing modalities, such as privacy compliance and high sensitivity to motion. However, (i) randomness in the radar point cloud data and (ii) difficulties in fall data collection/labeling in traditional supervised fall detection approaches are the two main challenges. To overcome the randomness in the radar data, the proposed VRAE uses variational inference, a probabilistic approach rather than the traditional deterministic approach, to infer the posterior probability of the body's latent motion state at each frame, followed by a recurrent neural network (RNN) to learn the temporal features of the motion over multiple frames. Moreover, to circumvent the difficulties in fall data collection/labeling, the VRAE is built upon an autoencoder architecture in a semi-supervised approach and trained on only normal activities of daily living (ADL), such that in the inference stage the VRAE will generate a spike in the anomaly level once an abnormal motion, such as a fall, occurs. During the experiment1, we implemented the VRAE along with two other baselines, and tested them on a dataset collected in an apartment. The receiver operating characteristic (ROC) curve indicates that our proposed model outperforms the other two baselines, achieving 98% detection out of 50 falls at the expense of just 2 false alarms.
Note to Practitioners—In traditional non-wearable fall detection approaches, researchers typically use a vision-based sensor, such as a camera, to monitor the elderly and build a classifier to identify a fall based on features extracted from collected fall data. However, several problems make those conventional methods impractical. Firstly, a camera may violate the elderly's privacy. Secondly, fall data collection is difficult and costly, not to mention the impossible ask of having the elderly repeat falls for data collection. In this paper, we propose a new fall detection approach to overcome these problems. Firstly, we use the mmWave radar sensor to monitor the elderly. The mmWave radar sensor is highly sensitive to motion, which makes it very suitable for fall detection while protecting the elderly's privacy. Secondly, we use a semi-supervised anomaly detection approach to circumvent fall data collection. More specifically, we only collect non-fall data using the mmWave radar sensor, which is very convenient
Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].
Feng Jin, Arindam Sengupta and Siyang Cao are with the Department of Electrical and Computer Engineering, The University of Arizona, Tucson, AZ 85719 USA (e-mail: {fengjin, sengupta, caos}@email.arizona.edu).
1 All the code and datasets are shared on GitHub (https://github.com/radar-lab/mmfall).
and practical to obtain. We then studied a method to remember only these non-fall data so that the system will be surprised when a fall occurs. Therefore, our proposed fall detection solution is privacy-compliant and does not rely on fall data, which is difficult to collect. Moreover, the proposed neural network is very light, so it can operate in real time. Thus, our proposed solution can be practically deployed in the living spaces of the elderly. In the future, we could make the necessary hardware engineering efforts to obtain a better radar resolution so that the proposed fall detection solution becomes more reliable.
Index Terms—Fall detection, millimeter wave radar, variational autoencoder, recurrent autoencoder, semi-supervised learning, anomaly detection.
I. INTRODUCTION
GLOBALLY, the elderly aged 65 or over make up the
fastest-growing age group. In fact, the proportion of
the elderly in the world is projected to reach nearly 12%
by 2030 from 9% in 2019 [1]. Approximately 28-35%
of the elderly fall every year [2], and suffer the greatest
number of fatal falls, making it the second leading cause
of unintentional injury death after road traffic injuries [3].
Moreover, elderly falls are costly. In the United States
alone in 2015, the total direct cost of falls among the elderly,
adjusted for inflation, was 31.9 billion USD [4]. As a
result, measures to protect the elderly from fall injuries
become increasingly urgent from both a social and economic
perspective.
Physical training programs, such as Simplified Tai Chi,
can assist in reducing fall risks [5], but they are beyond the
research scope of information technology (IT). In the IT
community, researchers seek to (i) either predict the fall right
before it occurs so that a mechanism, such as a wearable
airbag [6] or a walking-aid cane robot [7], can be deployed
to reduce the fall injuries; or (ii) detect the fall right after it
occurs along with an alert sent to the caregiver immediately
so a timely treatment can be implemented [8]. Based on the
type of sensor to capture data for body motion analysis, the
fall-related research can be divided into three categories, i.e.,
wearable, non-wearable and fusion solutions [8].
Wearable solutions usually require the elderly to carry
single or several sensors on their body. These sensors
can measure the motion parameters, such as acceleration
(accelerometer), angular velocity (gyroscope), orientation
(magnetometer), and tilt angle (inclinometer), among others.
Thus, a fall can be predicted or detected by analyzing the
sudden change of body motion or posture [9]–[11]. On
the other hand, non-wearable solutions place sensors in
the environment to monitor the elderly motion. One of the
primary non-wearable sensors is the camera, which provides rich
motion details visually such that the silhouette change can
be perceived over time in order to predict or detect the
fall [12]. Some ambient sensors, which register acoustic
[13], vibration [14], thermal [15], or radio frequency (RF)
[16] changes in the environment when a fall occurs, have
also been used in the past. Furthermore, depth sensors, which
generate depth information per pixel, have also been
used to track joint motion to detect falls [17]. Radar
sensors have also been used to capture the distinctive micro-
Doppler pattern for fall detection [18]. Besides wearable and
non-wearable solutions, the fusion solution involves multiple
heterogeneous sensors, which can potentially improve the
reliability and specificity of fall-related systems [19].
In this paper, we are focusing on non-wearable fall
detection using the emerging millimeter-wave (mmWave)
radar sensor [20]. In short, mmWave radar sensor measures
the point cloud coming from moving objects in a scene.
Each point in the radar point cloud contains a 3D position
and a 1-D Doppler (radial velocity component) information,
thereby resulting in a 4D mmWave radar as referred to in
the paper title. In the fall detection application, mmWave
radar sensor can offer several advantages over the other
traditional sensing technologies, viz. (i) non-intrusive and
convenient over the wearable solutions that require the
elderly cooperation and compliance to be worn, and also
need frequent battery recharging; (ii) privacy-compliant over
camera, as video monitoring violates the privacy of the
elderly daily life; (iii) sensitive to motion and operationally
robust to occlusions compared to depth sensor, especially in
a complex living environment; (iv) more informative over
typical ambient sensors which suffer interference from the
external environment [21]; and (v) low-cost, compact and
high resolution over the traditional radars.
Now, we restrict our attention to radar-related fall detec-
tion research for detection methodology review. Tradition-
ally, a radar-related fall detection system collects fall data,
applies time-frequency analysis to extract the micro-Doppler
features [22], and then detects the fall with a classifier, as
a fall has a micro-Doppler signature distinct from other mo-
tions. In [23], the authors pre-screened simulated falls from
stunt actors collected by a Doppler radar, proposed to use
Wavelet transform (WT) to extract the features and estimated
the wavelet coefficients from the simulated falls, and then
detected fall using the nearest neighbor (NN) approach based
on the feature distance between real-life motion sample
and the simulated fall samples. In [24], the authors used
a range-Doppler radar to capture the spectrogram of human
motion as a micro-Doppler signature, and gathered the range
map at the same time in order to distinguish fall from
sitting/bending, by training a logistic regression model with
labeled data on simulated falls and non-falls.
However, these traditional approaches have several major
drawbacks. First, these supervised approaches need manually
preprocessed fall data samples for feature extraction and
classifier training, but fall data are very difficult to collect, as
fall events are rare and non-continuous, not to mention the
impossible ask of having the elderly repeat falls for data
collection. Moreover, a simulated fall by a stunt actor may
differ from a real-life elderly fall. Meanwhile, the fall data
labeling procedure is very expensive, as it requires manual
extraction of short portions of a fall event from a
long-duration recording. Secondly, no target
separation is considered in these traditional approaches. This
may cause problems when there are interfering sources in
motion, for example ceiling fans, as in this case the most
commonly used time-frequency extractor would take in a
mix of motion from target human body and interference.
To overcome these drawbacks, we use mmWave radar
sensor which clusters and tracks multiple moving targets in
a scene so that interference can be excluded from the target
human’s body, and leverage the semi-supervised anomaly
detection approach to circumvent difficulties in the real-
life elderly fall collection. Anomaly detection refers to the
problem of finding patterns in data that do not conform to
expected behavior [25]. In our case, the expected behaviors
would be the normal activities of daily living (ADL), such as
walking/sitting/crouching, etc., while a fall does not conform
to these normal behaviors. The semi-supervised anomaly
detection method, such as autoencoder, trains a model with
only normal ADL without data labeling. Thus, the model
will ‘recognize’ the normal ADL, while a fall will ‘surprise’
the model.
Particularly, we propose a variational recurrent autoen-
coder (VRAE) to learn the radar point cloud in a proba-
bilistic way rather than a deterministic way, on account of the
radar data randomness. In the encoding part, VRAE learns
the posterior probability of the target human body’s latent
motion state at each frame through variational inference,
followed by a recurrent neural network for temporal feature
compression over multiple frames. In the decoding part, the
VRAE tries to reconstruct the likelihood of the input radar
data, given a latent motion state sequence recovered from the
encoding part. This model will be trained only with normal
ADL data, and will output a low loss, also denoted as the
anomaly level, during normal ADL occurrence. We then claim a fall
detection when there is a spike in the anomaly level together with
a sudden drop of the body's centroid height, which the
radar sensor monitors in parallel.
The rest of this paper is organized as follows. Section
II introduces all the components that constitute our pro-
posed mmFall system, including the principles of mmWave
radar sensor, variational inference, variational autoencoder,
and recurrent autoencoder. Section III presents the details
of proposed mmFall system, including the overall system
architecture, a novel data oversampling method and a custom
loss function for model training. Section IV shows the ex-
perimental results conducted in an apartment, and compares
the performance of our model with two baselines. Finally,
section V concludes the paper.
II. PRELIMINARIES
In this section, we introduce the background of all the
components that constitute the proposed mmFall system
detailed in the next section.
A. 4D mmWave FMCW Radar Sensor
The carrier frequency of the mmWave frequency-modulated
continuous-wave (FMCW) radar sensor, or mmWave radar
sensor for short, ranges from 57 GHz to 85 GHz depending
on the application. For example, 76-81 GHz is primarily
used for automotive applications such as objects’ dynamics
measurement [26], and 57-64 GHz can be used for short-
range interactive motion sensing such as in Google’s Soli
project [27]. Along with the high carrier frequency,
a bandwidth of up to 4 GHz is available, and the physical
size of hardware components, including antennas, shrinks.
This eventually makes the mmWave radar sensor more
compact and of higher resolution than traditional low-
frequency-band radars.
The signal modulation and processing of the mmWave
radar sensor do not differ significantly from those of the
conventional FMCW radars described in [28]. Generally,
the mmWave radar sensor transmits multiple linear FMCW
signals over multiple antenna channels in both azimuth and
elevation. After stretch processing and digitization, a
raw multidimensional radar data cube is obtained. Following
a series of fast Fourier transforms (FFTs), the parameters of
each reflection point in a scene, i.e., range r, azimuth angle
θAZ , elevation angle θEL, and Doppler D, are estimated.
In addition, during this process the constant false alarm
rate (CFAR) is incorporated to detect the points above a
given signal-to-noise ratio (SNR) against the surrounding
noise, and the moving target indication (MTI) is applied to
distinguish the moving points from the static background.
Eventually, a set of moving points, also called radar point
cloud, is obtained.
Fig. 1: MmWave radar sensor and radar point cloud. (a) The mmWave radar sensor is set up in an apartment, the camera provides a view for reference, and the laptop is used for data acquisition. The same setup is also used in the experiment in Section IV. (b) Radar point cloud in a two-person scenario. One person is lying down on the floor, and the other is walking. For the points, different colors indicate different persons, while the yellow point indicates the centroid. For the coordinates, red is the cross-radar direction, green is the forward direction, and blue is the height direction. The original radar measurement of each point is a vector of (r, θAZ, θEL, D), along with the estimated centroid (xc, yc, zc).
If multiple moving targets are present in a scene, the
obtained point cloud is a collection of such points from all
targets. Thus, a clustering method, such as the density-based
spatial clustering of applications with noise (DBSCAN), has
to be applied to group a subset of points into a single target
such that multiple targets can be separated. Meanwhile,
the centroid of each target can be estimated from the point
subset associated with it. Following a tracking algorithm,
such as Kalman filtering, the trajectory of each target will
be recorded and associated with a unique target ID.
In particular, a joint clustering/tracking algorithm called Group
Tracking [29] can be used as well. Fig. 1 shows the mmWave
radar sensor and the radar point cloud we can get from it.
With the target ID, the motion history of each target can be
gathered separately in a multiple-target scenario, such that
we are able to analyze each target's motion individually.
For simplicity, but without loss of generality, we will only
discuss the single-person scenario hereafter.
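As an illustration of the clustering step above, the following is a minimal sketch. It uses a simplified distance-threshold grouping as a stand-in for DBSCAN, not the radar's built-in implementation; the point layout and function name are our own assumptions.

```python
import numpy as np

def cluster_targets(points, eps=0.5):
    """Group points whose 3D positions lie within eps of a cluster member,
    then estimate each cluster's centroid. Rows are assumed (x, y, z, D)."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:  # flood-fill over the eps-neighborhood graph
            j = stack.pop()
            d = np.linalg.norm(points[:, :3] - points[j, :3], axis=1)
            for k in np.where((d < eps) & (labels == -1))[0]:
                labels[k] = current
                stack.append(k)
        current += 1
    return {c: (points[labels == c], points[labels == c][:, :3].mean(axis=0))
            for c in range(current)}
```

A real deployment would use DBSCAN proper (which additionally rejects sparse noise points) followed by Kalman-filter tracking, as described above.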
B. Radar Point Cloud Distribution for Human Body Motion
From Fig. 1 (b), a straightforward fall detection approach
is to analyze the height of the body centroid. For instance, if
the body centroid suddenly drops to the ground level, then
a fall event is detected. However, this approach may easily
cause a false alarm when crouching or sitting.
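The naive centroid-height rule just described can be sketched as follows; the threshold value is an arbitrary illustration, not a tuned parameter.

```python
def naive_fall_detector(centroid_heights, ground_level=0.3):
    """Flag the frame indices where the body centroid is near ground level."""
    return [i for i, z in enumerate(centroid_heights) if z < ground_level]
```

As noted above, this rule fires equally when the person crouches or sits near the floor, which motivates combining the height drop with a motion-anomaly measure.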
Considering the randomness of radar measurements, we
now view the radar point cloud of the human body
as a probability distribution. Based on the observation in Fig.
1 (b), we make the following assumption:
Assumption 1. At each frame, the radar point cloud of the
human body, denoted as X, follows a specific multivariate
Gaussian distribution whose mean is relevant to the body
centroid, and covariance is relevant to the body shape. The
body motion state, such as walking/fall, is a latent variable
denoted as z. Given a z, the change of distribution X over
multiple frames has a unique pattern, which we name the
motion pattern.
For example, the radar point cloud of a lying-down (on
the floor) person may have a large variance in x/y-axis but
small variance in z-axis. In contrast, the radar point cloud
of a walking person may have small variance in x/y-axis but
large variance in z-axis. A depiction of such motion pattern
is shown in Fig. 2.
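Under Assumption 1, a motion pattern can be summarized by the per-frame sample statistics of the point cloud. A minimal sketch follows; the frame layout and function name are our own assumptions.

```python
import numpy as np

def motion_pattern(frames):
    """Per-frame (centroid, per-axis variance) pairs for a list of (N, 4)
    point-cloud frames whose rows are (x, y, z, D)."""
    return [(f[:, :3].mean(axis=0), f[:, :3].var(axis=0)) for f in frames]
```

For example, a lying-down frame shows larger x/y variance than z variance, while an upright walking frame shows the opposite.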
Fig. 2: A depiction of motion pattern, i.e., the change of the radar point cloud distribution over multiple frames, for different motions. The ellipse represents the distribution, and the yellow point indicates the centroid. Color indicates the frames in time order: red, green, and then blue. (a) Walking; (b) Crouching; (c) Fall.
Although Assumption 1 might not be strictly true, as we never
know the true physical generation process of the radar data from
a human body, we believe that this assumption is
sufficient for our purpose, i.e., distinguishing different motions of
the human body. Therefore, we intuitively propose to detect
falls by 'learning' the uniqueness of such a motion
pattern.
To preview the following subsections: we propose to
(i) learn the distribution at each frame through variational
inference, and (ii) learn the distribution change over multiple
frames through a recurrent neural network; (iii) both
are discussed in the framework of an autoencoder for a semi-
supervised learning approach.
C. Variational Inference
More formally, at each frame we obtain an N-point
radar point cloud X = \{x_n\}_{n=1}^{N}. The original radar mea-
surement of each point x_n is a four-dimensional vector
(r, \theta_{AZ}, \theta_{EL}, D). After transformation to Cartesian
coordinates, x_n becomes (x, y, z, D). We view
the points in X as independently drawn from the likelihood
p(X|z), given a latent motion state z, which is a D-
dimensional continuous vector. According to Assumption
1, p(X|z) follows a multivariate Gaussian distribution.
Bayes' theorem shows
p(z|X) = \frac{p(X|z)\, p(z)}{p(X)} = \frac{p(X|z)\, p(z)}{\int p(X|z)\, p(z)\, dz}, (1)

where p(z|X) is the posterior, p(X|z) the likelihood, p(z) the prior, and p(X) the evidence.
We expect to infer the motion state z based on the
observation X, which is equivalent to inferring the posterior
p(z|X). Because solving for p(z|X) analytically is difficult,
as the evidence p(X) is usually intractable, two
major approximation approaches, i.e., Markov chain Monte
Carlo (MCMC) and variational inference (VI), are mostly
used.
Generally, the MCMC approach [30] uses a sampling
method to draw enough samples from a tractable proposal
distribution such that they eventually approximate the target
distribution p(z|X). The most commonly used MCMC
algorithm iteratively samples a candidate z_t from an arbitrary
tractable proposal distribution q(z_t|z_{t-1}) at step t, and
then accepts it with probability

\min\left\{1, \frac{p(z_t|X)\, q(z_{t-1}|z_t)}{p(z_{t-1}|X)\, q(z_t|z_{t-1})}\right\} = \min\left\{1, \frac{p(X|z_t)\, p(z_t)\, q(z_{t-1}|z_t)}{p(X|z_{t-1})\, p(z_{t-1})\, q(z_t|z_{t-1})}\right\}, (2)

where the difficult calculation of p(X) has been circumvented.
It has been proven that this approach constructs a Markov chain
whose equilibrium distribution equals p(z|X) and is independent
of the initial choice of q(z_0). One disadvantage of the MCMC
approach is that the chain needs a long and indeterminable burn-in
period to approximately reach the equilibrium distribution. This
makes MCMC unsuitable for learning on large-scale datasets.
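For concreteness, the following is a minimal Metropolis-Hastings sketch of the sampler behind Equ. (2), using a symmetric Gaussian proposal (for which the q-ratio in Equ. (2) cancels). Only the unnormalized target p(X|z)p(z) is needed, so p(X) is never computed; the function names are our own.

```python
import math
import random

def metropolis_hastings(log_unnorm, z0, steps=5000, scale=1.0):
    """Draw samples from a 1-D target density known only up to a constant."""
    z, samples = z0, []
    for _ in range(steps):
        z_new = z + random.gauss(0.0, scale)       # symmetric proposal q(z'|z)
        delta = log_unnorm(z_new) - log_unnorm(z)  # log acceptance ratio
        if random.random() < math.exp(min(0.0, delta)):
            z = z_new                              # accept the candidate
        samples.append(z)
    return samples
```

Discarding an initial portion of `samples` is exactly the indeterminable burn-in cost discussed above.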
On the other hand, the VI approach [31] uses a family
of tractable probability distributions Q = \{q(z)\} to approximate
the true p(z|X) instead of solving for it analytically. The VI
approach turns the inference problem into an optimization
problem:

q^*(z) = \arg\min_{q(z) \in Q} \mathrm{KLD}\{q(z) \,\|\, p(z|X)\}, (3)

where KLD is the Kullback–Leibler divergence, which measures
the distance between two probability distributions. By
definition we have
\mathrm{KLD}\{q(z) \,\|\, p(z|X)\}
:= \int q(z) \log \frac{q(z)}{p(z|X)}\, dz
= \int q(z) \log q(z)\, dz - \int q(z) \log p(z|X)\, dz
= E_q[\log q(z)] - E_q[\log p(z|X)]
= E_q[\log q(z)] - E_q[\log p(X|z) p(z)] + E_q[\log p(X)]
= \underbrace{E_q[\log q(z)] - E_q[\log p(X|z) p(z)]}_{L(q)} + \log p(X), (4)
where E_q[\cdot] is the statistical expectation operator with respect
to q(z), and L(q) is related to the evidence lower bound (ELBO)
by ELBO = -L(q). As the term \log p(X) is constant with
respect to z, the optimization in Equ. (3) simplifies to

q^*(z) = \arg\min_{q(z) \in Q} L(q). (5)
Here, the difficult computation of p(X) is also circum-
vented. This optimization approach leads to one of the
advantages of VI, that it can be integrated into a neu-
ral network framework and optimized through the back-
propagation algorithm.
It is critical to choose the variational family Q = \{q(z)\} such
that it is not only flexible enough to closely approximate
p(z|X), but also simple enough for efficient optimization.
The most commonly used option is the factorized Gaussian family
q(z) = \prod_{d=1}^{D} q(z[d]) = \prod_{d=1}^{D} \mathcal{N}(z[d] \,|\, \mu_q[d], \sigma_q[d]), (6)
where \mu_q and \sigma_q are the mean and standard deviation of the
distribution of the latent variable z, whose length D is predetermined,
and the components of z are mutually independent.
D. Variational Autoencoder
As briefly stated previously, we adopt the semi-
supervised anomaly detection approach to train the model only
on normal ADL, such that the model will be surprised by
'unseen' fall data. The common approach is the autoencoder,
whose basic architecture is shown in Fig. 3 (a). The autoen-
coder consists of two parts, i.e., encoder and decoder. In
most cases, the decoder is simply a mirror of the encoder.
The encoder compresses the input data X to a latent feature
vector z with less dimensions, and reversely the decoder
reconstructs X′ to be as close to X as possible, based on the
latent feature vector z. Generally, the multilayer perceptrons
(MLP) are used to model the non-linear mapping function
between X and z, as the MLP is a powerful universal
function approximator [32]. Besides a predetermined non-
linear activation function, such as sigmoid/tanh, the MLP
is characterized by its weights and biases. The training
objective is to minimize the loss function between X and
X′ with respect to the weights and biases of encoder MLP
and decoder MLP. The loss function could be cross-entropy
for a categorical classification problem or mean square error
(MSE) for a regression problem.
Fig. 3: Autoencoder architecture. (a) Vanilla autoencoder architecture: the input data X is compressed by an encoder MLP to a latent feature vector z, and a decoder MLP decompresses z into the reconstructed data X′. (b) Variational autoencoder architecture with a factorized Gaussian parametrized by (µq, σq): the encoder outputs (µq, σq), the latent variable z is sampled through the reparameterization trick with noise drawn from N(0, I), and the decoder outputs the reconstructed likelihood parameters (µp, σp).
In this way, the autoencoder squeezes the dimensionality
to reduce the redundancy of the input data, learning a com-
pressed yet informative latent feature vector from X. Therefore,
the autoencoder produces a close reconstruction X' of
the input data X, with a low reconstruction loss.
However, whenever 'unseen' data passes through, the
autoencoder will erroneously squeeze it and be unable to
reconstruct it well. This leads to a loss spike from which
an anomaly can be detected.
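The reconstruction-loss idea above can be sketched in one line; `reconstruct` stands in for a trained encoder-decoder pair and is hypothetical here.

```python
import numpy as np

def anomaly_level(x, reconstruct):
    """Mean squared reconstruction error as an anomaly score."""
    return float(np.mean((x - reconstruct(x)) ** 2))
```

Thresholding this score over time yields the loss-spike anomaly detector described above.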
Similarly, in the variational autoencoder (VAE) [33], [34]
shown in Fig. 3 (b), the encoder learns q(z), which aims to approxi-
mate p(z|X) from the input data X using the VI approach, and
the decoder reconstructs p(X|z) based on z sampled
from the learned q(z). The training objective is as in Equ.
(5). In the VAE case, the loss function can be evaluated further:

L_{VAE} = L(q) = E_q[\log q(z)] - E_q[\log p(z)] - E_q[\log p(X|z)]
= \mathrm{KLD}\{q(z) \,\|\, p(z)\} - E_q[\log p(X|z)]. (7)
For the variational distribution q(z), the factorized Gaus-
sian in Equ. (6) is used, and for the prior p(z), the common
choice of the Gaussian \mathcal{N}(z|0, I) is used, as we do not have a
strong assumption on it. Therefore, the first term of L_{VAE} in
Equ. (7) reduces to

\mathrm{KLD}\{q(z) \,\|\, p(z)\} = -\frac{1}{2} \sum_{d=1}^{D} \left\{1 + \log \sigma_q[d]^2 - \mu_q[d]^2 - \sigma_q[d]^2\right\}, (8)
where \mu_q and \sigma_q are the mean and standard deviation of the
factorized Gaussian q(z) with the D-dimensional latent vector z. See
Appendix A for the detailed derivation.
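The closed form in Equ. (8) can be checked numerically, per dimension, against a direct quadrature of the KLD integral; the grid bounds and resolution below are our own choices.

```python
import numpy as np

def kld_closed_form(mu, s):
    """Equ. (8) for one dimension: KLD{N(mu, s^2) || N(0, 1)}."""
    return -0.5 * (1.0 + np.log(s ** 2) - mu ** 2 - s ** 2)

def kld_quadrature(mu, s, lo=-12.0, hi=12.0, n=200000):
    """Direct Riemann-sum evaluation of int q(z) log[q(z)/p(z)] dz."""
    z = np.linspace(lo, hi, n)
    dz = z[1] - z[0]
    log_q = -0.5 * ((z - mu) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi)
    log_p = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)
    return float(np.sum(np.exp(log_q) * (log_q - log_p)) * dz)
```

Note that the KLD vanishes when q(z) equals the standard-normal prior, as expected.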
The second term of L_{VAE} in Equ. (7) reduces to

E_q[\log p(X|z)] = \int q(z) \log p(X|z)\, dz
\approx \log p(X|z) = \log \mathcal{N}(X \,|\, \mu_p, \sigma_p)
= \log \prod_{n=1}^{N} \mathcal{N}(x_n \,|\, \mu_p, \sigma_p)
= \sum_{n=1}^{N} \sum_{k=1}^{K} \log \mathcal{N}(x_n[k] \,|\, \mu_p[k], \sigma_p[k])
\approx -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} \left\{ \frac{(x_n[k] - \mu_p[k])^2}{\sigma_p[k]^2} + \log \sigma_p[k]^2 \right\}, (9)

where the first approximation is a single-sample Monte Carlo
estimate with z drawn from q(z); the Gaussian form of the
likelihood, parametrized by (\mu_p, \sigma_p), comes from Assumption 1;
X = \{x_n\}_{n=1}^{N} is the input point cloud, and each point x_n is a
K-dimensional vector; the constant \log\sqrt{2\pi} is ignored in the
last line.
To obtain the single-sample Monte Carlo estimate in Equ. (9), a
sample of z is needed. Instead of drawing from q(z) directly,
the reparameterization trick [33], [35] is used:

z = \mu_q + \sigma_q \odot \epsilon, (10)

where \epsilon \sim \mathcal{N}(0, I) and \odot denotes the element-wise product.
The trick is to draw a D-dimensional sample \epsilon from \mathcal{N}(0, I), and
then obtain z through Equ. (10).
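A one-line numpy sketch of Equ. (10); the function name and the use of numpy's random Generator are our own assumptions.

```python
import numpy as np

def reparameterize(mu_q, sigma_q, rng):
    """Sample z ~ N(mu_q, diag(sigma_q^2)) via Equ. (10)."""
    eps = rng.standard_normal(mu_q.shape)  # eps ~ N(0, I)
    return mu_q + sigma_q * eps            # element-wise product
```

Because the randomness enters only through the fixed-distribution noise eps, gradients can flow through (mu_q, sigma_q) during training.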
In view of Equ. (8), the VAE encoder is

(\mu_q, \log\sigma_q^2) = \mathrm{EncoderMLP}_\phi\{X\}, (11)
where the weights and biases of encoder MLP are denoted
as φ. In other words, the EncoderMLPφ estimates the
parameters of q(z) from the input X.
Similarly, in view of Equ. (9), the VAE decoder is

(\mu_p, \log\sigma_p^2) = \mathrm{DecoderMLP}_\theta\{z\}, (12)
where the weights and biases of decoder MLP are denoted
as θ. In other words, the DecoderMLPθ estimates the
parameters of p(X|z) from the z sample obtained from Equ.
(10).
The VAE architecture shown in Fig. 3 (b) then becomes
clear by combining EncoderMLP_\phi and DecoderMLP_\theta,
where the two parts are bridged through the sampling
of z. The VAE training objective is to minimize the
loss function, which is Equ. (8) minus Equ. (9) according to
Equ. (7), with respect to (\phi, \theta).
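Putting Equ. (8) and Equ. (9) together, the training loss of Equ. (7) can be sketched in numpy; the parameter names follow the text, and in practice the arrays would come from the two MLPs.

```python
import numpy as np

def vae_loss(mu_q, sigma_q, x, mu_p, sigma_p):
    """Equ. (7): the KLD term of Equ. (8) minus the likelihood term of Equ. (9)."""
    kld = -0.5 * np.sum(1.0 + np.log(sigma_q ** 2) - mu_q ** 2 - sigma_q ** 2)
    loglik = -0.5 * np.sum((x - mu_p) ** 2 / sigma_p ** 2 + np.log(sigma_p ** 2))
    return kld - loglik
```

The loss is zero when q(z) matches the prior and the decoder reconstructs the input exactly, and grows as the reconstruction degrades.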
E. Recurrent Autoencoder
While we use the VI approach to learn the radar point
cloud distribution at each frame, we also need a sequence-
to-sequence modeling approach to learn distribution changes
over multiple frames, as we stated in Section II-B previously.
The recurrent neural network (RNN) is such a basic
sequence-to-sequence model for temporal applications. At
every frame l, an RNN accepts two inputs, the sequence input
at the l-th frame, x_l, and its previous hidden state,
h_{l-1}, and outputs a new hidden state h_l, calculated as

h_l = \tanh(W h_{l-1} + U x_l), \quad \forall l = 1, 2, \ldots, L, (13)

where W and U are learnable weights (including the bias
term, omitted for brevity), and L is the length of the
sequence. Note that at l = 1, h_0 is defined as the initial
RNN state, which is either initialized as zeros or randomly
initialized. Also, note that the hidden state h_l acts as an
accumulated memory state, as it is continuously computed and
updated with new information from the sequence. However, in
[36], the primary shortcoming of RNNs in modeling long-
term dependencies, due to vanishing/exploding gradients,
was thoroughly explored and identified. Long short-term
memory (LSTM) and gated recurrent units (GRUs) were
developed as a result to overcome this challenge [37], [38].
In our case, since a fall motion lasts for about one
second, i.e., ten frames at the radar data rate of ten
frames per second, long-term dependency is not an issue
here. Only the basic RNN is used, to keep the computational
load light.
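The basic RNN update of Equ. (13) can be sketched in numpy; the bias is omitted, as in the text.

```python
import numpy as np

def rnn_forward(xs, W, U, h0):
    """Iterate h_l = tanh(W h_{l-1} + U x_l) over a sequence and return the
    final hidden state, i.e., the accumulated memory of the sequence."""
    h = h0
    for x in xs:
        h = np.tanh(W @ h + U @ x)
    return h
```

In the RAE described next, this final hidden state is exactly what the RNN encoder hands to the RNN decoder.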
Fig. 4: A depiction of a recurrent autoencoder (RAE). The input sequence {X_l}_{l=1}^{L} is first compressed to an embedded feature sequence {x_l}_{l=1}^{L} on a per-frame basis through the EncoderMLP. An RNN encoder iteratively processes the data over L frames, and the final hidden state h_e is passed on to the RNN decoder, which outputs the reconstructed embedded feature sequence {x'_l}_{l=1}^{L} in reverse. Finally, {x'_l}_{l=1}^{L} is decompressed through the DecoderMLP to reconstruct the sequence {X'_l}_{l=1}^{L}. The output sequence {X'_l}_{l=1}^{L} is compared with the input sequence {X_l}_{l=1}^{L} to compute the reconstruction loss, which is desired to be low for an autoencoder.
The RNN-based autoencoder [39], [40], or RAE, shown
in Fig. 4, is built upon the vanilla autoencoder architecture
in Fig. 3 (a). As the input of the RAE is a time sequence of
feature vectors, it has two dimensions, i.e., a feature dimension
and a time dimension. Therefore, the RAE consists of two
autoencoder substructures, responsible for each dimension,
respectively. In Fig. 4, the EncoderMLP/DecoderMLP
compresses and reconstructs the feature vector on a
per-frame basis, and the RNN encoder/decoder compresses
and reconstructs the time sequence over multiple
frames. Overall, the RAE reduces redundancy in both the feature
and time dimensions.
III. PROPOSED SYSTEM
To effectively learn the motion pattern of the human body,
which is formed by a sequence of radar point clouds, for
fall detection in a semi-supervised approach, we propose the
variational recurrent autoencoder (VRAE), which has two
autoencoder substructures: a VAE for learning the radar point
cloud distribution on a per-frame basis, and an RAE for learning
the sequence over multiple frames. The VRAE is trained
only on normal ADL, such that an 'unseen' fall will cause
a spike in the loss, also called the anomaly level. If the height
of the body centroid, which the mmWave radar sensor estimates
in parallel, drops suddenly at the same time, we claim a fall
detection. The proposed system, called mmFall, including
both hardware and software, is presented in Fig. 5.
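The mmFall decision rule just described can be sketched as follows; the threshold values are placeholders for illustration, not the paper's tuned settings.

```python
def detect_fall(anomaly_level, z_first, z_last,
                anomaly_threshold=1.0, drop_threshold=0.6):
    """Claim a fall only when an anomaly-level spike and a sudden
    centroid-height drop (z_first - z_last) coincide."""
    return (anomaly_level > anomaly_threshold and
            (z_first - z_last) > drop_threshold)
```

Requiring both conditions suppresses false alarms from normal ADL that trigger only one of them, e.g., crouching, which produces a height drop without an anomaly spike.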
A. Data Preprocessing
With a proper mmWave radar sensor, we are able to
collect the radar point cloud, as shown in Fig. 1 (b). In
Fig. 5, the radar sensor can be mounted on a wall at a
height h above people's heads, and can also be rotated by
an angle θtilt for better coverage of the room. The radar
sensor can detect multiple moving persons simultaneously;
each person receives a unique target ID as a result of the
clustering/tracking algorithms. With multiple frames of data
bearing the same target ID, we can analyze the motion of the
person associated with that target ID. In other words, each
person's motion can be analyzed separately based on the
target ID. Hereafter, we will only discuss the single-person
scenario for brevity.
We then propose the data preprocessing flow depicted in
Fig. 5, for the following reasons.
The original measurement for each point in the radar point cloud
is in the radar polar coordinates. We need to transform it into
the radar Cartesian coordinates, and then into the ground
Cartesian coordinates on the basis of the tilt angle and height.
The transformation is

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 \\
0 & \cos\theta_{\mathrm{tilt}} & -\sin\theta_{\mathrm{tilt}} \\
0 & \sin\theta_{\mathrm{tilt}} & \cos\theta_{\mathrm{tilt}}
\end{bmatrix}
\begin{bmatrix}
r\cos\theta_{\mathrm{EL}}\sin\theta_{\mathrm{AZ}} \\
r\cos\theta_{\mathrm{EL}}\cos\theta_{\mathrm{AZ}} \\
r\sin\theta_{\mathrm{EL}}
\end{bmatrix}
+
\begin{bmatrix} 0 \\ 0 \\ h \end{bmatrix}, \tag{14}
\]
where (r, θ_AZ, θ_EL) are the range, azimuth angle, and elevation
angle in the radar polar coordinates; θ_tilt is the radar tilt
angle; h is the radar platform height; and [x, y, z]^T is the
result in the ground Cartesian coordinates.
After the coordinate transformation, at each frame we obtain a
radar point cloud in which each point is a vector (x, y, z, D),
where D is the Doppler from the original radar measurement. We
also have the centroid (x_c, y_c, z_c) as a result of the
clustering/tracking algorithms in the radar.
We accumulate the L most recent frames, including the current one,
as a motion pattern. The value of L equals the radar frame rate in
frames per second (fps) multiplied
Fig. 5: An overview of the proposed mmFall system. At each frame,
we obtain the point cloud of a human body along with its centroid
from the mmWave radar sensor. After the preprocessing stage, we
get a motion pattern X = {X^l}_{l=1}^L in the reference
coordinates. For the l-th frame, the VAE encoder models the mean
µ_q^l and variance σ_q^l of the factorized Gaussian family
q(z^l) = ∏_{j=1}^D N(z_j^l | µ_j^l, σ_j^l) that aims to
approximate the true posterior p(z^l | X^l) of the latent motion
state z^l, where D is the predetermined length of z. We then use
the reparameterization trick in Equ. (10) to sample z^l from
q(z^l). Given the sequence of latent motion states
Z = {z^1, ..., z^L}, the RAE compresses and then reconstructs it
as Z_r = {z_r^1, ..., z_r^L}. Based on Z_r, the VAE decoder models
the mean µ_p^l and variance σ_p^l of the likelihood p(X^l | z^l).
With (µ_q, σ_q), (µ_p, σ_p), and X, we compute the VRAE loss
defined in Equ. (16) as an indication of the anomaly level. In the
fall detection logic, if a sudden drop of centroid height is
detected at the same time the VRAE outputs an anomaly spike, we
claim a fall detection.
by the predetermined detection window in seconds. For each motion
pattern with L frames, we subtract the centroid's x_c and y_c
values in the first frame from the x and y values of each point in
each frame, respectively. In this way, we shift the motion pattern
to the origin of a reference coordinate system.
Algorithm 1: Data Oversampling Method
Input: dataset X = {x_i}_{i=1}^M with a length of M, where M is a
random number and each data sample x_i is a vector; target length
N after oversampling, with N always > M.
Output: X' = {x'_i}_{i=1}^N with a length of N.
1  µ = (1/M) Σ_{i=1}^M x_i              // get the estimated mean
2  for i = 1 to N do
3    if i ≤ M then                      // rescale and shift
4      x'_i = √(N/M) x_i + µ − √(N/M) µ
5    else                               // pad with µ
6      x'_i = µ
7    end
8  end
At each frame, the number of points is random due to the nature of
radar measurement. To meet the fixed number of input nodes of the
VRAE model, we need a data oversampling method. Traditional
oversampling methods in deep learning, such as zero-padding or
random padding, disrupt the distribution of the original input and
are not suitable here, since our purpose is to learn the
distribution of the radar point cloud. Therefore, we propose the
novel data oversampling method in Algorithm 1, which extends the
original point cloud to a fixed number of points while keeping its
mean and covariance the same. The proof of this algorithm is in
Appendix B.
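As a sketch, Algorithm 1 takes only a few lines of NumPy; the function name oversample is a hypothetical convenience:

```python
import numpy as np

def oversample(points, n):
    """Sketch of Algorithm 1: extend M points to n >= M points while
    preserving the ML estimates of the mean and covariance (Appendix B)."""
    points = np.asarray(points, dtype=float)
    m = len(points)
    assert n >= m
    mu = points.mean(axis=0)                      # estimated mean
    scale = np.sqrt(n / m)
    rescaled = scale * points + mu - scale * mu   # rescale and shift
    padding = np.tile(mu, (n - m, 1))             # pad with the mean
    return np.vstack([rescaled, padding])
```

Oversampling 7 random 4D points to 64 leaves the sample mean and the biased (1/N) sample covariance unchanged, which is exactly the property proved in Appendix B.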
Finally, we obtain a motion pattern X in the reference
coordinates,

\[
X=\{X^l\}_{l=1}^{L}=\{\{x_n^l\}_{n=1}^{N}\}_{l=1}^{L}
 =\{\{(x_n^l,\,y_n^l,\,z_n^l,\,D_n^l)\}_{n=1}^{N}\}_{l=1}^{L}, \tag{15}
\]

where L is the number of frames in the motion pattern; N is the
number of points at each frame; X^l is the l-th frame; and x_n^l
is the n-th point in the l-th frame, a 4D vector
(x_n^l, y_n^l, z_n^l, D_n^l). We also have the centroid
{(x_c^l, y_c^l, z_c^l)}_{l=1}^L over the L frames. Hereafter, we
use the superscript l to denote the frame index.
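The windowing and centroid-shift steps above can be sketched as follows; the function name motion_pattern and the list-based inputs are conveniences of this sketch, assuming each frame has already been oversampled to N points:

```python
import numpy as np

def motion_pattern(frames, centroids, L=10):
    """Sketch of the windowing and centroid-shift step: take the latest
    L frames (each already oversampled to N points of (x, y, z, D)) and
    shift x and y by the first frame's centroid, so the motion pattern
    starts at the origin of the reference coordinates."""
    X = np.asarray(frames[-L:], dtype=float).copy()   # shape (L, N, 4)
    xc, yc = centroids[-L][0], centroids[-L][1]       # first frame centroid
    X[..., 0] -= xc                                   # shift x
    X[..., 1] -= yc                                   # shift y
    return X                                          # z and Doppler unchanged
```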
B. VRAE Model
We propose the variational recurrent autoencoder (VRAE)
architecture, as shown in Fig. 5 and detailed in the caption.
The VRAE model is a combination of VAE and RAE,
discussed in the previous section. The VRAE loss LVRAE
is the VAE loss LVAE in Equ. (7) over all the L frames.
With the substitution of Equ. (8) and (9), we have
\[
\begin{aligned}
\mathcal{L}_{\mathrm{VRAE}}
&= \sum_{l=1}^{L}\Big\{\mathrm{KLD}\{q(z^l)\,\|\,p(z^l)\}
   - \mathbb{E}_q[\log p(X^l|z^l)]\Big\} \\
&= \sum_{l=1}^{L}\bigg\{\frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}
   \Big\{\frac{(x_n^l[k]-\mu_p^l[k])^2}{\sigma_p^l[k]^2}
   + \log \sigma_p^l[k]^2\Big\} \\
&\quad - \frac{1}{2}\sum_{d=1}^{D}
   \Big\{1 + \log \sigma_q^l[d]^2 - \mu_q^l[d]^2
   - \sigma_q^l[d]^2\Big\}\bigg\},
\end{aligned} \tag{16}
\]
where L, N, and x_n^l are from the motion pattern in Equ. (15); K
is the length of the point vector, in our case K = 4 as each point
is a 4D vector; D is the length of the latent motion state z; and
(µ_q, σ_q) and (µ_p, σ_p) are the parameters of the factorized
Gaussian q(z) and the likelihood p(X|z), respectively, both
modeled through the architecture in Fig. 5.
For VRAE training, the objective is to minimize L_VRAE with
respect to the network parameters, using the Adam stochastic
gradient descent algorithm [34]. Note that the VAE
encoder/decoder in the VRAE is implemented with only dense
(fully-connected) layers, as the model should be invariant to the
order of the points in the point cloud at each frame.
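As a numeric sketch of Equ. (16) (not the paper's TensorFlow implementation), assuming the encoder/decoder outputs are given as NumPy arrays of variances rather than standard deviations:

```python
import numpy as np

def vrae_loss(x, mu_p, var_p, mu_q, var_q):
    """Numeric sketch of the VRAE loss in Equ. (16).

    x, mu_p, var_p: shape (L, N, K) -- points and likelihood parameters.
    mu_q, var_q:    shape (L, D)    -- posterior parameters per frame.
    Variances (sigma^2) are passed directly; this is an assumption of
    this sketch, not the paper's implementation detail."""
    # Negative Gaussian log-likelihood term (constants dropped).
    recon = 0.5 * np.sum((x - mu_p) ** 2 / var_p + np.log(var_p))
    # Closed-form KLD of the factorized posterior against N(0, I).
    kld = -0.5 * np.sum(1.0 + np.log(var_q) - mu_q ** 2 - var_q)
    return recon + kld
```

A perfect reconstruction with unit variances and a standard-normal posterior gives a loss of zero; any reconstruction error drives the loss up.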
C. Fall Detection Logic
In a semi-supervised learning approach, we train the VRAE model
only on normal ADL, which are easy to collect compared to falls.
For normal ADL, the VRAE will output a low L_VRAE, as this is the
training objective. In the inference stage, the model will
generate a high loss L_VRAE when an 'unseen' motion, such as a
fall, occurs. We therefore use the VRAE loss L_VRAE as an anomaly
level measure of human body motion.
Along with the body centroid height {z_c^l}_{l=1}^L over the L
frames, we can calculate the drop in centroid height during this
motion as z_c^1 − z_c^L. We then propose the fall detection logic
in Fig. 5: if the centroid height drop exceeds a threshold at the
same time the anomaly level exceeds a threshold, we claim a fall
detection.
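The detection rule can be sketched as a simple predicate; the function name is hypothetical, and the 0.6 m default mirrors the drop threshold later fixed in Section IV:

```python
def detect_fall(anomaly, z_first, z_last,
                anomaly_threshold, drop_threshold=0.6):
    """Fall detection logic of Fig. 5: claim a fall only when the anomaly
    level spikes AND the centroid height drops (z_c^1 - z_c^L) over the
    same motion pattern."""
    return (anomaly > anomaly_threshold) and \
           (z_first - z_last > drop_threshold)
```

The conjunction is what separates a fall from a jump (anomaly spike without a height drop) and from sitting down (height drop without an anomaly spike).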
According to the World Health Organization (WHO) [2], a fall is
defined as "inadvertently coming to rest on the ground, floor or
other lower level, excluding intentional change in position to
rest in furniture, wall or other objects." In the proposed mmFall
system, the VRAE measures the inadvertence, or anomaly level, of
the motion, while the centroid height drop indicates the motion of
coming to rest on a lower level.
IV. EXPERIMENTAL RESULTS AND DISCUSSION
To verify the effectiveness of the proposed system, we
used a mmWave radar sensor to collect experimental data
and implemented the proposed mmFall system along with
two other baselines for performance evaluation and compar-
ison.
A. Hardware Configuration and Experiment Setup
We adopt the Texas Instrument (TI) AWR1843BOOST
mmWave FMCW radar evaluation board [41] for radar point
cloud acquisition. This radar sensor has three transmitting
antenna channels and four receiving antenna channels, as
shown in Fig. 1 (a). The middle transmitting channel is
displaced above the other two by a distance of half a wave-
length. Through the direction-of-angle (DOA) algorithm
using multiple-input and multiple-output (MIMO), it can
achieve 2x4 MIMO in azimuth and 2x1 MIMO in elevation.
Thus, we have 3D positional measurement of each point.
Plus the 1D Doppler, we finally have a 4D radar point cloud.
Based on a demo project from TI, we configure the radar
sensor with the parameters listed in Table I.
TABLE I: mmWave FMCW radar parameter configuration. Refer to [42]
for waveform details. fs, FMCW starting frequency. BW, FMCW
bandwidth. rChirp, FMCW chirp rate. fADC, ADC sampling rate.
NFast, ADC samples per chirp. CPI, coherent processing interval.
NSlow, chirps per CPI per transmitting channel. TFrame, duration
of one frame. ΔR, range resolution. Rmax, maximum unambiguous
range. ΔD, Doppler resolution. Dmax, maximum unambiguous Doppler.
ΔθAZ, azimuth angle resolution. ΔθEL, elevation angle resolution.
rFrame, frame rate in frames per second.

Parameter  Value   Unit     |  Parameter  Value    Unit
fs         77      GHz      |  ΔR         0.078    m
BW         1.92    GHz      |  Rmax       9.99     m
rChirp     30      MHz/us   |  ΔD         0.079    m/s
fADC       2       MHz      |  Dmax       ±2.542   m/s
NFast      128     --       |  MIMO       2x4/2x1  AZ/EL
CPI        24.2    ms       |  ΔθAZ       15       deg
NSlow      64      --       |  ΔθEL       57       deg
TFrame     100     ms       |  rFrame     10       fps
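Two of the derived values in Table I can be roughly cross-checked from the standard FMCW relations; this is a sketch only, and TI's exact chirp timing from [42] is not reproduced here:

```python
# Rough cross-check of derived radar parameters from standard FMCW
# relations (approximate; the center-frequency shift over the sweep is
# ignored, so small discrepancies against Table I are expected).
C = 3e8          # speed of light, m/s
BW = 1.92e9      # FMCW bandwidth, Hz
FS = 77e9        # FMCW starting frequency, Hz
CPI = 24.2e-3    # coherent processing interval, s

range_res = C / (2 * BW)              # range resolution, ~0.078 m
wavelength = C / FS                   # ~3.9 mm
doppler_res = wavelength / (2 * CPI)  # Doppler resolution, ~0.08 m/s
```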
Based on the Robot Operating System (ROS) on an Ubuntu laptop, we
developed an interface program to connect to the TI AWR1843BOOST
and collect the radar point cloud data over the USB port. We then
set up the equipment in the living room (2.7m*8.2m*2.7m) of an
apartment, as shown in Fig. 1 (a). The radar sensor was mounted on
a tripod at a height of 2 meters and rotated with a tilt angle of
10 degrees for better area coverage.
B. Data Collection
During the experiment, we collected three datasets as in
Table II. Firstly, we collected the D0 dataset which contains
about two hours of normal ADL without any labels, and it
is for training purposes. Secondly, in the D1 dataset, we
collected randomly walking along with one sample of every
other motion, including fall, etc. We showed the motion
pattern for every motion in Fig. 6 for visualization purposes.
Lastly, we collected a comprehensive inference dataset D2
and manually labeled the frame index when a fall happens
as the ground truth, and it is used for overall performance
evaluation. It is noted that in D1 and D2, both the fall and
jump are anomalies that can not be found in D0. We expect
that VRAE will output an anomaly level spike for both fall
and jump, but the fall detection logic also involving the
centroid height drop will guarantee the correct fall detection.
C. Model Implementation and Two Baselines
Based on TensorFlow and Keras, we first implemented the proposed
mmFall system in Fig. 5 with the loss function in Equ. (16). In
this implementation, we set the number of frames L equal to 10,
for a one-second detection window at the 10 fps radar data rate,
and the number of points per frame N equal to 64 for data
oversampling. Thus the motion pattern X, i.e., the model input,
has a shape of 10*64*4. We set the length of the latent motion
state z, D, equal to 16. For performance comparison, we also
implemented two other baselines. All three models are listed in
Table III.

Fig. 6: Motion patterns in dataset D1 along with the associated
camera view. Only the ellipse was manually added, to depict the
distribution of the point cloud. For the points, the color
indicates the frame in time order: red, green, and then blue,
while the yellow point indicates the centroid estimated by the
mmWave radar sensor. For simplicity, we show the frames in
increments of five frames. Each frame is 0.1 seconds. Please
compare this figure with Fig. 2. For the coordinates, red is the
cross-radar direction, green is the forward direction, and blue is
the height direction. (a) Randomly walking; (b) Forward fall; (c)
Backward fall; (d) Left fall; (e) Right fall; (f) Sitting down on
the floor; (g) Crouching; (h) Bending; (i) Jump.

TABLE II: Collected Datasets.

Name  Description
D0    Two hours of normal ADL, including randomly walking,
      sitting on the floor, crouching, bending, etc. No labeling.
D1    Randomly walking with one forward fall, one backward fall,
      one left fall, one right fall, one sitting on the floor,
      one crouching, one bending, and one jump.
D2    Randomly walking with 15 forward falls, 15 backward falls,
      10 left falls, 10 right falls, 50 sitting on the floor, 50
      crouching, 50 bending, and 50 jumps. Falls labeled as
      ground truth.
The baseline VRAE SL is the same as the proposed mmFall system
except for using the simplified loss function in Equ. (17). The
simplified loss is based on a weaker assumption on the likelihood,
namely that p(X|z) follows a Gaussian with identity covariance,
i.e., N(µ_p, I). As a result, the σ_p term in Equ. (16) is
dropped, or
\[
\mathcal{L}_{\mathrm{VRAE\_SL}}
= \sum_{l=1}^{L}\bigg\{\frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}
\big(x_n^l[k]-\mu_p^l[k]\big)^2
- \frac{1}{2}\sum_{d=1}^{D}\big\{1 + \log\sigma_q^l[d]^2
- \mu_q^l[d]^2 - \sigma_q^l[d]^2\big\}\bigg\}. \tag{17}
\]
By comparing our proposed system with the baseline VRAE SL, we
verify our assumption that the variance change of the radar point
cloud distribution of the human body carries a better
representation of motion, as discussed in Section II-B.
The other baseline is the RAE with MSE loss in Fig. 4, which is
similar to the proposed system except for using a vanilla MLP in
the feature dimension instead of the VI approach at each frame. By
comparing our proposed system with this baseline, we show that the
VI approach for motion state inference based on the distribution
of the radar point cloud makes more sense than the vanilla feature
compression in the RAE.
TABLE III: Implemented Models.

Name     Description
VRAE     The proposed variational recurrent autoencoder and fall
         detection logic in Fig. 5 with the loss function in
         Equ. (16).
VRAE SL  The proposed variational recurrent autoencoder and fall
         detection logic in Fig. 5 with the simplified loss in
         Equ. (17).
RAE      The vanilla recurrent autoencoder in Fig. 4 with the MSE
         loss function, and the fall detection logic in Fig. 5.
D. Training and Inference
First, we trained the three models on the normal dataset D0, and
then tested them on dataset D1, which contains normal motions as
in D0 and two different 'unseen' motions, i.e., fall and jump,
that do not appear in D0. The anomaly level output by the three
models on D1 is shown in Fig. 7. The proposed VRAE model generates
a significant anomaly level for fall and jump while keeping it low
for normal motions. Combined with the fall detection logic
involving the body centroid drop, the jump can be ignored and only
falls will be detected. In comparison, the VRAE SL model suffers
from significant noise during normal motions, which easily leads
to false alarms, and the vanilla RAE model cannot learn the
anomaly level effectively.
Fig. 7: Inference results of the models listed in Table III on the
dataset D1 described in Table II. In each figure, the blue line
represents the body's centroid height, and the orange line
represents the model's loss output, or anomaly level. Only the
black text and arrows were manually added, as the ground truth of
when a motion happens. Except for the motions indicated by the
black text, the rest of the time is always random walking. (a)
VRAE inference results: the VRAE model clearly generates a spike
in anomaly level when a fall/jump happens while keeping a low
anomaly level for normal motions. Jump is another abnormal motion
that does not appear in the training dataset D0, but the fall
detection logic, which also requires a body centroid drop at the
same time, will ignore jumps. On the other hand, without the help
of the anomaly level it is difficult to distinguish falls from
other motions if only the change of centroid height is considered.
(b) VRAE SL inference results: the VRAE SL model can also generate
anomaly-level spikes for fall/jump but suffers from significant
noise during normal motions. For example, 'Sitting Down' and
'Right Fall' have almost the same anomaly level output; as a
result, either 'Sitting Down' causes a false alarm or 'Right Fall'
causes a missed detection, depending on the threshold. (c) Vanilla
RAE inference results: the vanilla RAE model cannot effectively
learn the anomaly level for 'unseen' motions.
Finally, we tested the three trained models on dataset D2. In D2,
there are 50 falls with a manually labeled 'ground truth fall
frame index' for each fall, along with many other motions without
labels. The fall detection logic detects the frame index at which
a fall happens. We allow flexible detection: if the 'detected fall
frame index' falls within the 1-second detection window centered
at a 'ground truth fall frame index', we treat it as a true
positive. In this experiment, we fixed the centroid height drop
threshold of the fall detection logic at 0.6 meters. By varying
the anomaly level threshold, we obtained the receiver operating
characteristic (ROC) curves shown in Fig. 8. From this result, we
clearly see that our proposed approach outperforms the other two
baselines. Specifically, at the expense of two false alarms, our
VRAE model achieves a 98% fall detection rate out of 50 falls,
while the VRAE SL achieves only around 60% and the vanilla RAE
only around 38%.
Fig. 8: ROC curves for all the three models.
V. CONCLUSION
In this study, we used a mmWave radar sensor for fall detection on
the basis of its advantages, such as privacy compliance,
non-wearable sensing, and high sensitivity to motion. We made the
assumption that the radar point cloud of the human body can be
viewed as a multivariate Gaussian distribution, and that the
change of this distribution over multiple frames has a unique
pattern for different motions. We then proposed a variational
recurrent autoencoder to effectively learn the anomaly level of an
'unseen' motion, such as a fall, that does not appear in the
normal training dataset. We also introduced a fall detection logic
that checks the body centroid drop to further confirm that the
anomalous motion is a fall. In this way, we detect falls in a
semi-supervised learning approach that does not require the
difficult collection and labeling of fall data. The experimental
results showed that our proposed system achieves a 98% detection
rate out of 50 falls at the expense of just two false alarms, and
outperforms the other two baselines.
APPENDIX A
PROOF OF EQU. 8
\[
\begin{aligned}
\mathrm{KLD}\{q(z)\,\|\,p(z)\}
&= \mathrm{KLD}\Big\{\prod_{d=1}^{D}\mathcal{N}(z[d]\,|\,\mu_q[d],\sigma_q[d])
   \,\Big\|\,\mathcal{N}(z\,|\,0,I)\Big\} \\
&:= \int\!\cdots\!\int \prod_{d=1}^{D}\mathcal{N}(z[d]\,|\,\mu_q[d],\sigma_q[d])\,
   \log\frac{\prod_{d=1}^{D}\mathcal{N}(z[d]\,|\,\mu_q[d],\sigma_q[d])}
            {\prod_{d=1}^{D}\mathcal{N}(z[d]\,|\,0,1)}\,dz[1]\cdots dz[D] \\
&= \int\!\cdots\!\int \prod_{d=1}^{D}\mathcal{N}(z[d]\,|\,\mu_q[d],\sigma_q[d])\,
   \sum_{d=1}^{D}\log\frac{\mathcal{N}(z[d]\,|\,\mu_q[d],\sigma_q[d])}
                          {\mathcal{N}(z[d]\,|\,0,1)}\,dz[1]\cdots dz[D] \\
&= \sum_{d=1}^{D}\int \mathcal{N}(z[d]\,|\,\mu_q[d],\sigma_q[d])\,
   \log\frac{\mathcal{N}(z[d]\,|\,\mu_q[d],\sigma_q[d])}
            {\mathcal{N}(z[d]\,|\,0,1)}\,dz[d] \\
&= \sum_{d=1}^{D}\int \mathcal{N}(z[d]\,|\,\mu_q[d],\sigma_q[d])
   \Big\{\log\frac{1}{\sigma_q[d]}
   - \frac{(z[d]-\mu_q[d])^2}{2\sigma_q[d]^2}
   + \frac{z[d]^2}{2}\Big\}\,dz[d] \\
&= -\frac{1}{2}\sum_{d=1}^{D}\big\{1 + \log\sigma_q[d]^2
   - \mu_q[d]^2 - \sigma_q[d]^2\big\},
\end{aligned} \tag{18}
\]
where (µ_q, σ_q) are the mean and variance of the factorized
Gaussian q(z) with the D-dimensional latent vector z.
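The closed form in Equ. (18) can be cross-checked numerically for a single dimension; the example mean and standard deviation below are arbitrary values chosen for this sketch:

```python
import numpy as np

# Numerical cross-check of Equ. (18) for one dimension: integrate
# q*log(q/p) on a dense grid and compare with the closed form
# -0.5*(1 + log s^2 - m^2 - s^2), where p = N(0, 1) and q = N(m, s^2).
m, s = 0.7, 1.3                       # example mean and std of q
z = np.linspace(-12.0, 12.0, 200001)
q = np.exp(-(z - m) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))
p = np.exp(-(z ** 2) / 2) / np.sqrt(2 * np.pi)

numeric = float(np.sum(q * np.log(q / p)) * (z[1] - z[0]))
closed = -0.5 * (1 + np.log(s ** 2) - m ** 2 - s ** 2)
```

The two values agree to within the grid resolution, and the KLD is positive whenever q differs from the standard normal prior.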
APPENDIX B
PROOF OF PROPOSED ALGORITHM
Given a set of independent and identically distributed (i.i.d.)
data X = {x_i}_{i=1}^M drawn from a multivariate Gaussian
N(µ, σ), the maximum likelihood (ML) estimators of its mean µ and
covariance σ are

\[
\mu = \frac{1}{M}\sum_{i=1}^{M} x_i, \qquad
\sigma = \frac{1}{M}\sum_{i=1}^{M}\{x_i-\mu\}^2. \tag{19}
\]
For the output dataset X' = {x'_i}_{i=1}^N, the first M elements
are modified from the input dataset according to Step 4 of
Algorithm 1, and the last (N − M) elements are simply the mean of
the input dataset according to Step 6 of Algorithm 1. Thus, the ML
estimators of its mean µ' and covariance σ' are
\[
\begin{aligned}
\mu' &= \frac{1}{N}\sum_{i=1}^{N} x'_i
= \frac{1}{N}\Big\{\sum_{i=1}^{M}\Big(\sqrt{\tfrac{N}{M}}\,x_i
  + \mu - \sqrt{\tfrac{N}{M}}\,\mu\Big)
  + \sum_{i=M+1}^{N}\mu\Big\} \\
&= \frac{1}{N}\Big\{\sqrt{\tfrac{N}{M}}\sum_{i=1}^{M}x_i
  + M\mu - M\sqrt{\tfrac{N}{M}}\,\mu + (N-M)\mu\Big\} \\
&= \frac{1}{N}\Big\{\sqrt{\tfrac{N}{M}}\,M\mu
  + M\mu - M\sqrt{\tfrac{N}{M}}\,\mu + (N-M)\mu\Big\} \\
&= \frac{1}{N}\{N\mu\} = \mu,
\end{aligned} \tag{20}
\]
and

\[
\begin{aligned}
\sigma' &= \frac{1}{N}\sum_{i=1}^{N}\{x'_i-\mu'\}^2
= \frac{1}{N}\sum_{i=1}^{N}\{x'_i-\mu\}^2 \\
&= \frac{1}{N}\sum_{i=1}^{M}\Big\{\sqrt{\tfrac{N}{M}}\,x_i
  + \mu - \sqrt{\tfrac{N}{M}}\,\mu - \mu\Big\}^2
  + \frac{1}{N}\sum_{i=M+1}^{N}\{\mu-\mu\}^2 \\
&= \frac{1}{N}\sum_{i=1}^{M}\Big\{\sqrt{\tfrac{N}{M}}\,x_i
  - \sqrt{\tfrac{N}{M}}\,\mu\Big\}^2 \\
&= \frac{1}{M}\sum_{i=1}^{M}\{x_i-\mu\}^2 = \sigma.
\end{aligned} \tag{21}
\]
Therefore, the proposed algorithm oversamples the original input
dataset to a fixed number of samples while keeping the ML
estimates of the mean and covariance the same.
REFERENCES
[1] World Population Prospects 2019: Highlights (ST/ESA/SER.A/423), Department of Economic and Social Affairs, Population Division, United Nations, 2019. [Online]. Available: https://population.un.org/wpp/Publications/Files/WPP2019 Highlights.pdf
[2] WHO Global Report on Falls Prevention in Older Age, World Health Organization, 2008. [Online]. Available: https://extranet.who.int/agefriendlyworld/wp-content/uploads/2014/06/WHo-Global-report-on-falls-prevention-in-older-age.pdf
[3] (2018, Jan.) Falls. World Health Organization. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/falls
[4] E. R. Burns, J. A. Stevens, and R. Lee, “The direct costs of fatal and non-fatal falls among older adults – United States,” J. Safety Res., vol. 58, pp. 99–103, 2016.
[5] F. Li et al., “Exercise and fall prevention: narrowing the research-to-practice gap and enhancing integration of clinical and communitypractice,” J. Am. Geriatr. Soc., vol. 64, no. 2, pp. 425–431, 2016.
[6] T. Tamura et al., “A wearable airbag to prevent fall injuries,” IEEETrans. Inf. Technol. Biomed., vol. 13, no. 6, pp. 910–914, Nov 2009.
[7] P. Di et al., “Fall detection and prevention control using walking-aid cane robot,” IEEE/ASME Trans. Mechatronics, vol. 21, no. 2, pp.625–637, April 2016.
[8] K. Chaccour et al., “From fall detection to fall prevention: A genericclassification of fall-related systems,” IEEE Sens. J., vol. 17, no. 3,pp. 812–822, Feb 2017.
[9] J. K. Lee, S. N. Robinovitch, and E. J. Park, “Inertial sensing-basedpre-impact detection of falls involving near-fall scenarios,” IEEE
Trans. Neural Syst. Rehabil. Eng., vol. 23, no. 2, pp. 258–266, March2015.
[10] J. Liu and T. E. Lockhart, “Development and evaluation of a prior-to-impact fall event detection algorithm,” IEEE Trans. Biomed. Eng.,vol. 61, no. 7, pp. 2135–2140, July 2014.
[11] J. Sun et al., “A plantar inclinometer based approach to fall detectionin open environments,” in Emerging Trends and Advanced Technolo-gies for Computational Intelligence. Springer, 2016, pp. 1–13.
[12] B. Mirmahboub et al., “Automatic monocular system for human falldetection based on variations in silhouette area,” IEEE Trans. Biomed.Eng., vol. 60, no. 2, pp. 427–436, Feb 2013.
[13] Y. Li, K. Ho, and M. Popescu, “A microphone array system forautomatic fall detection,” IEEE Trans. Biomed. Eng., vol. 59, no. 5,pp. 1291–1301, 2012.
[14] K. Chaccour et al., “Smart carpet using differential piezoresistivepressure sensors for elderly fall detection,” in Proc. IEEE 11th Int.
Conf. Wireless and Mobile Computing, Networking and Communica-tions (WiMob), 2015, pp. 225–229.
[15] X. Fan et al., “Robust unobtrusive fall detection using infrared array sensors,” in Proc. IEEE Int. Conf. Multisensor Fusion and Integration for Intelligent Systems (MFI), 2017, pp. 194–199.
[16] Y. Wang, K. Wu, and L. M. Ni, “WiFall: Device-free fall detection by wireless networks,” IEEE Trans. Mobile Comput., vol. 16, no. 2, pp. 581–594, Feb 2017.
[17] Z.-P. Bian et al., “Fall detection based on body part tracking using adepth camera,” IEEE J. Biomed. Health Informat., vol. 19, no. 2, pp.430–439, 2014.
[18] M. G. Amin et al., “Radar signal processing for elderly fall detection:The future for in-home monitoring,” IEEE Signal Process. Mag.,vol. 33, no. 2, pp. 71–80, March 2016.
[19] G. Koshmak, A. Loutfi, and M. Linden, “Challenges and issues inmultisensor fusion approach for fall detection: Review paper,” J.Sens., vol. 2016, no. 6931789, 2016.
[20] mmWave radar sensors in robotics applications, Texas Instruments,2017. [Online]. Available: http://www.ti.com/lit/wp/spry311/spry311.pdf
[21] L. Ren and Y. Peng, “Research of fall detection and fall preventiontechnologies: A systematic review,” IEEE Access, vol. 7, pp. 77 702–77 722, 2019.
[22] S. Z. Gurbuz and M. G. Amin, “Radar-based human-motion recog-nition with deep learning: Promising applications for indoor moni-toring,” IEEE Signal Process. Mag., vol. 36, no. 4, pp. 16–28, July2019.
[23] B. Y. Su et al., “Doppler radar fall activity detection using the wavelettransform,” IEEE Trans. Biomed. Eng., vol. 62, no. 3, pp. 865–875,2014.
[24] B. Jokanovic and M. Amin, “Fall detection using deep learning inrange-doppler radars,” IEEE Trans. Aerosp. Electron. Syst., vol. 54,no. 1, pp. 180–189, 2017.
[25] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: Asurvey,” ACM Comput. Surv., vol. 41, no. 3, Jul. 2009.
[26] Operation of Radar Services in the 76-81 GHz Band, FederalCommunications Commission, Washington, D.C. [Online]. Available:https://docs.fcc.gov/public/attachments/FCC-15-16A1.pdf
[27] Google LLC Request for Waiver of Part 15 for Project Soli, FederalCommunications Commission, Washington, D.C. [Online]. Available:https://docs.fcc.gov/public/attachments/DA-18-1308A1.pdf
[28] G. Hakobyan and B. Yang, “High-performance automotive radar: Areview of signal processing algorithms and modulation schemes,”IEEE Signal Process. Mag., vol. 36, no. 5, pp. 32–44, 2019.
[29] S. Blackman, Multiple-target Tracking with Radar Applications, ser.Radar Library. Dedham, MA: Artech House, 1986, ch. 11.
[30] C. M. Bishop, Pattern recognition and machine learning, ser. Infor-mation science and statistics. New York, NY: Springer, 2006, ch. 11.
[31] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, “Variational infer-ence: A review for statisticians,” J. Am. Stat. Assoc., vol. 112, no.518, pp. 859–877, 2017.
[32] S. Haykin, Neural Networks and Learning Machines, 3rd ed. UpperSaddle River, NJ: Pearson Higher Ed, 2011, ch. 4.
[33] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” inProc. Int. Conf. Learning Representations (ICLR), 2014.
[34] D. P. Kingma and M. Welling, “An introduction to variationalautoencoders,” Foundations and Trends in Machine Learning, vol. 12,no. 4, pp. 307–392, 2019.
[35] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropa-gation and approximate inference in deep generative models,” in Proc.
31st Int. Conf. Machine Learning (ICML), 2014, pp. II–1278–II–1286.
[36] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term depen-dencies with gradient descent is difficult,” IEEE Trans. Neural Netw.,vol. 5, no. 2, pp. 157–166, 1994.
[37] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[38] K. Cho et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1724–1734.
[39] A. M. Dai and Q. V. Le, “Semi-supervised sequence learning,” inAdvances in Neural Information Processing Systems (NIPS), 2015,pp. 3079–3087.
[40] T. Kieu et al., “Outlier detection for time series with recurrent autoencoder ensembles,” in Proc. 28th Int. Joint Conf. Artificial Intelligence (IJCAI), 2019, pp. 2725–2732.
[41] xWR1843 Evaluation Module (xWR1843BOOST) Single-Chip mmWave Sensing Solution, Texas Instruments, 2019. [Online]. Available: http://www.ti.com/lit/ug/spruim4a/spruim4a.pdf
[42] Programming Chirp Parameters in TI Radar Devices, TexasInstruments, 2020. [Online]. Available: http://www.ti.com/lit/an/swra553a/swra553a.pdf