Radio Fingerprinting Using Convolutional Neural Networks
A Thesis Presented
by
Shamnaz Mohammed Riyaz
to
The Department of Electrical and Computer Engineering
in partial fulfillment of the requirements
for the degree of
Master of Science
in
Electrical and Computer Engineering
Northeastern University
Boston, Massachusetts
July 2018
To my family
Contents
List of Figures
List of Tables
Acknowledgments
Abstract of the Thesis
1 Introduction
2 Related work
  2.0.1 Supervised learning
  2.0.2 Unsupervised learning
3 Causes of hardware impairments
  3.1 Hardware impairments
    3.1.1 I/Q imbalance
    3.1.2 Phase noise
    3.1.3 Carrier frequency and phase offset
    3.1.4 Harmonic distortions
    3.1.5 Power amplifier distortions
  3.2 Data Collection
    3.2.1 Protocols of operation
    3.2.2 Storage and processing
4 Deep learning for RF fingerprinting
  4.1 Initial studies on ML techniques
    4.1.1 Support vector machines
    4.1.2 Logistic regression
  4.2 Convolutional neural networks
    4.2.1 CNN architecture
5 Results and performance evaluation
  5.1 Network setup
  5.2 Evaluation
    5.2.1 CNN vs. conventional algorithms
    5.2.2 Receiver operating characteristics for radio fingerprinting
    5.2.3 Impact of distance on radio fingerprinting
6 Conclusion
  6.1 Research challenges
Bibliography
List of Figures
2.1 RF fingerprinting classification.
3.1 Typical transceiver chain with various sources of RF impairments.
3.2 Amplitude imbalance.
3.3 Phase imbalance.
3.4 Phase noise.
3.5 Phase offset.
3.6 AM/AM distortion.
3.7 AM/PM distortion.
3.8 Data collection using SDR.
3.9 Experimental setup demonstrating data capture.
3.10 Discovery cluster partitioning.
4.1 Device classification using Logistic Regression and Linear SVM for WiFi and LTE.
4.2 CNN architecture for RF fingerprinting.
4.3 Convolution operation: filters strided over input sequences.
4.4 Rectified Linear Unit (ReLU) operation performed on feature maps.
4.5 An illustration of max pooling operation.
4.6 An illustration of sliding operation using a window of length 128.
5.1 Software stack.
5.2 The accuracy comparison of SVM, logistic regression and CNN for 2-5 devices.
5.3 ROC curve, fold 1.
5.4 ROC curve, fold 2.
5.5 ROC curve, fold 3.
5.6 ROC curve, fold 4.
5.7 ROC curve, fold 5.
5.8 Computational load.
5.9 The plot of accuracy obtained using CNN for 4 devices over different distances between transmitter and receiver.
List of Tables
4.1 CNN architecture.
Acknowledgments
Foremost, I would like to thank my advisor Prof. Kaushik Chowdhury for his constant guidance and encouragement in all my endeavors. His vision and ideas have always been a source of inspiration for me. I thoroughly enjoyed my learning experience in his course 'Mobile and Wireless Networking' and also in the research associated with his Genesys lab. He has been extremely supportive and patient throughout this research. I would also like to thank Prof. Stratis Ioannidis and Prof. Jennifer Dy for their positive feedback and continuous association since the inception of the project.

I thank my husband Rameez for his support, inspiration and confidence in me. I am grateful to my parents, Mohammed and Sajida, and my in-laws Rasheed and Rabia for being supportive and always motivating me to excel in everything I do. I would also like to thank my labmates in the Genesys lab, specifically Kunal and Mauro, for helping me with various experiments. Their company provided a positive energy in the workplace.
Abstract of the Thesis
Radio Fingerprinting Using Convolutional Neural Networks
by
Shamnaz Mohammed Riyaz
Master of Science in Electrical and Computer Engineering
Northeastern University, July 2018
Dr. Kaushik Chowdhury, Advisor
In this thesis, we describe a method for uniquely identifying a specific radio among nominally similar devices using a combination of software defined radio (SDR) sensing capability and machine learning (ML) techniques. Our approach to radio fingerprinting applies ML over raw I/Q samples without specifically selecting features of interest. It distinguishes devices using only the transmitter hardware-induced signal modifications that serve as a unique signature for a particular device. No higher-level decoding, feature engineering, or protocol knowledge is needed, further mitigating challenges of ID spoofing and coexistence of multiple protocols in a shared spectrum. Advances in SDR technology allow unprecedented control over the entire processing chain, permitting modification of each functional block as well as sampling of the changes in the input waveform. We first demonstrate RF impairments by modifying the operational blocks in a typical wireless communications processing chain in a simulation study. We then generate an over-the-air dataset compiled from an experimental testbed of SDRs such as the B210 and X310, and train an optimized deep convolutional neural network (CNN) architecture on the data, achieving good classification accuracy. We describe the parallel processing needs and the choice of several hyperparameters that enable efficient training of the CNN model. We then compare the performance quantitatively with alternate techniques such as support vector machines and logistic regression. Overall, our results show that we can achieve up to 90-99% experimental accuracy at transmitter-receiver distances varying between 2-50 feet over a noisy, multipath wireless channel.
Chapter 1
Introduction
Emerging applications in the context of smart cities, autonomous vehicles, the Internet of Things (IoT), and complex military missions, among others, require reconfigurability at both the system and protocol levels of their communications architectures. These advances rely on a critical enabling component, namely, software defined radio (SDR), which allows cross-layer programmability of the transceiver hardware using high-level directives [1]. The promise of intelligent
or so called cognitive radios builds on the SDR concept, where the radio is capable of gathering
contextual information and adapting its own operation by changing the settings on the SDR based on
what it perceives in its surroundings.
In the last few decades, there has been incredible growth in the use of the internet and connected devices. However, the privacy and security of these billions of devices are a paramount concern in IoT networks. Any device with network connectivity is vulnerable, and data gathered by IoT devices are susceptible to attacks such as ID spoofing by an intruder. Most IoT devices have limited computing power and memory capacity, which makes it difficult to use complex cryptographic algorithms that require more resources than the devices can provide; as a result, authentication and authorization are often insufficient. Additionally, in many mission-critical scenarios,
problems in authenticating devices, ID spoofing and unauthorized transmissions are major concerns.
Moreover, high bandwidth applications are causing a spectrum crunch, leading network providers
to explore innovative spectrum sharing regimes in the TV whitespace and the sub-6GHz bands. In
all of the above, identifying (i) the type of the protocol in use, and (ii) the specific radio transmitter
(among many other nominally similar radios) become important. Our work on SDR-enabled radio
fingerprinting tackles these two scenarios by learning characteristic features of the transmitters in
a pre-deployment training phase, which is then exploited during actual network operation. We
recognize that SDRs come in diverse form factors with varying on-board computational resources.
Thus, for general purpose use, any device fingerprinting approach must be computationally simple
once deployed in the field. For this reason, we propose machine learning (ML) techniques, specifically,
Deep Convolutional Neural Networks (CNNs), and experimentally demonstrate near-perfect radio
identification performance in many practical scenarios.
ML techniques have been remarkably successful in image and speech recognition; however, their utility for device-level fingerprinting by feature learning has yet to be conclusively demonstrated. True autonomous behavior of SDRs, not only in terms of detecting spectrum usage, but also in terms of self-tuning a multitude of parameters and reacting to environmental stimuli, is now a distinct
possibility. We collect over 20 × 10^6 RF I/Q samples over multiple transmission rounds for each transmitter-receiver pair composed of off-the-shelf Universal Software Radio Peripheral (USRP)
SDRs. The approach of providing the raw time-series radio signal to the CNN, treating each complex sample as a pair of real-valued I/Q inputs, is motivated by modulation classification [2].
It has been found to be a promising technique for feature learning on large time series data. Our
technique of RF fingerprinting using the I/Q samples that carry embedded signatures characteristic
of different active transmitter hardware is a first in this field to the best of our knowledge. My
contributions in this project are:
• Generated a large real time-series dataset composed of 802.11ac signals using SDRs

• Conducted a simulation study on the causes of transmitter hardware impairments

• Developed a CNN architecture composed of multiple convolutional and max-pooling layers, optimized for the task of radio fingerprinting

• Partitioned the collected samples into separate instances for data pre-processing

• Implemented CNN training in Keras running on top of TensorFlow on the Northeastern Discovery cluster environment

• Evaluated the performance of the CNN along with support vector machines and logistic regression

The thesis is organized as follows. We briefly survey and classify existing approaches in Chapter 2.
In Chapter 3, we design a simulation model of a typical wireless communications processing chain
in MATLAB, and then modify the ideal operational blocks to demonstrate the RF impairments that
we wish to learn. This is followed by the generation of real data and its preprocessing for training the
classifier. In Chapter 4, we architect and experimentally validate an optimized deep convolutional
neural network for radio fingerprinting. Experimental results and quantitative comparison of our
approach with support vector machines and logistic regression are provided in Chapter 5. Finally,
research challenges associated with our approach and conclusions are summarized in Chapter 6.
Chapter 2
Related work
There has been a significant amount of research on applying deep neural networks to cognitive radio tasks in the wireless communications field, with the focus mainly on modulation classification, which has shown impressive results [3]. Our interest is in
radio fingerprinting using deep learning architectures. The key idea behind radio fingerprinting is
to extract unique patterns (or features) and use them as signatures to identify devices. A variety of
features at the physical (PHY) layer, medium access control (MAC) layer, and upper layers have been
utilized for radio fingerprinting [4] in the literature. Simple unique identifiers such as IP addresses,
MAC addresses, mobile identification number (MIN), international mobile station equipment identity
(IMEI) numbers can easily be spoofed. Location-based features such as radio signal strength (RSS)
and channel state information (CSI) are susceptible to mobility and environmental changes. We
are interested in studying those features that are inherent to a device’s hardware, which are also
unchanging and not easily replicated by malicious agents. We classify existing approaches in Fig. 2.1.
2.0.1 Supervised learning
This type of learning requires a large collection of labeled samples prior to network
deployment for training the ML algorithm. It takes thousands of input samples from the devices, with labels corresponding to each device. The algorithm then learns the relationship between the samples and their associated labels, and applies that learned relationship to classify completely new (unlabeled) samples that the machine hasn't seen before. We study three types of learning, namely similarity-based, classification-based and deep-learning-based mechanisms.
Figure 2.1: RF fingerprinting classification.
2.0.1.1 Similarity-based
Similarity measurements involve comparing the observed signature of the given device
with the references present in a master database. In [5], a passive fingerprinting technique is proposed
that identifies the wireless device driver running on an IEEE 802.11 compliant node by collecting
traces of probe request frames from the devices. They used a binning approach on the time differences between probes as features. These bins are iterated to compute similarity by summing the differences of the percentages and the mean differences scaled by percentage. They obtained an identification
accuracy varying from 77% to 97% depending on the bin size. [6] describes a passive blackbox-based
technique, that uses transmission control protocol (TCP) or user datagram protocol (UDP) packet
inter-arrival time (ITA) from access points (APs) as signatures to identify AP types. APs exhibit
different characteristics due to the manufacturing effects, because of which each AP will act upon
the packet ITA differently. In this case, an AP is treated as a blackbox, since there is no a priori information about its architecture. They collected multiple packet traces for each AP to compute the ITAs. A unique pattern is then extracted by applying wavelet analysis to these ITAs. The time intervals are sampled using bin sizes between 1 and 10 µs; the optimal bin size is the one that maximizes the difference in ITAs among different APs. Cross-correlation is
used to compute the similarity between the unknown signals and the signatures extracted from the
wavelet analysis for pattern matching.
2.0.1.2 Classification-based
There are several studies on supervised learning that exploit RF features such as I/Q
imbalance, phase imbalance, frequency error, and received signal strength, to name a few. These
imperfections are transmitter-specific and manifest themselves as artifacts of the emitted signals.
There are two types of algorithms:

• Conventional. This form of classification examines a match with pre-selected features using domain knowledge of the system, i.e., the dominant feature(s) must be known a priori. This requires expertise in the RF domain for feature engineering. [7] proposes classification by extracting
the known preamble within a packet. The preamble signals are subjected to spectral analysis using the fast Fourier transform (FFT) to obtain the spectral components from the time-domain steady-state part of the signal. These log spectral energy features are fed as input to a k-nearest neighbors (k-NN) discriminatory classifier, which uses Euclidean distance to compute the
distance. The training preambles are mapped into a multidimensional feature space, which is divided into sections depending on the class labels. A given preamble is categorized based on the most frequently occurring label among its k nearest training preambles.
This approach provides promising results with 97% accuracy to distinguish between eight
identical transmitters at 30 dB signal-to-noise ratio (SNR). PARADIS [8] fingerprints 802.11 devices based on modulation-specific errors introduced by the network interface card (NIC) into a wireless frame. PARADIS demonstrated its effectiveness with an accuracy of 99% in distinguishing between more than 130 similar 802.11 NICs. It is also shown to be
robust against alterations and noise in the wireless channel. In [9], a technique for physical
device and device-type classification called GTID is proposed. This method exploits variations
in clock skews as well as hardware compositions (such as processor, DMA controller, memory)
of the devices and applies artificial neural networks (ANNs) for classification. Unique device
specific signatures are created from the time-variant behavior of the traffic using statistical
techniques. GTID performs classification across various device classes, such as iPhones and Google phones, supporting a variety of traffic types such as the internet control message protocol (ICMP) and Skype, and achieves high accuracy and recall on identification. In general, as multiple different features are used, selecting the right set of features is a major challenge. Additionally, RF domain knowledge plays a significant role in extracting features, which is by itself a time-consuming task. This also causes scalability problems when a large number of devices are present, leading to increased computational complexity in training.
• Deep learning. Deep learning offers a powerful framework for supervised learning. It can learn functions of increasing complexity, leverage large datasets, and greatly increase the number of layers, in addition to neurons within a layer. [2] and [10] apply deep learning at
the physical layer, specifically focusing on modulation recognition using convolutional neural
networks. It involves identifying and differentiating broadcast radio, local and wide area
data and voice radios, radar users, and other sources of radio interference in the surroundings
which each have different behaviors and requirements. Modulation recognition is the task
of classifying the modulation type of a received radio signal with an aim to determine the
communication scheme. They classify 8 digital and 3 analog modulation schemes, 11 in total, that are used in wireless systems. These consist of BPSK, QPSK, 8PSK, 16QAM,
64QAM, BFSK, CPFSK, and PAM4 for digital modulations, and WB-FM, AM-SSB, and
AM-DSB for analog modulations. Overall, 87.4% classification accuracy is obtained on the
test dataset. However, this approach does not identify a device, as we do here, but only the
modulation type used by the transmitter.
2.0.2 Unsupervised learning
Unsupervised learning is effective when there is no prior label information about devices.
In [11], an infinite Hidden Markov Random field (iHMRF)-based online classification algorithm is
proposed for wireless fingerprinting using unsupervised clustering techniques and batch updates.
This approach can model both time-dependent features, such as received signal strength (RSS), time-of-arrival (TOA) and angle-of-arrival (AOA), using the Markov property, and time-independent features, such as I/Q offset, carrier frequency offset (CFO) and phase shift difference (PSD), using an embedded Gaussian mixture model (GMM). A combination of these features is used to identify the number
of devices in a simulation testbed. However, this approach is yet to be demonstrated on real set of
devices. Transmitter characteristics are used in [12], where a non-parametric Bayesian approach (namely, an infinite Gaussian mixture model) classifies multiple devices in an unsupervised, passive manner. A multivariate Gaussian distribution with unknown parameters is used to model the feature space of each device; similarly, an infinite Gaussian mixture model is used across multiple devices.
The features chosen by this approach are invariant to the channel, resistant to mobility, unaffected by transmitter/receiver antenna gain, and independent of distance. Unlike supervised approaches, it does not need a database of legitimate devices. This approach specifically
aims to detect identity spoofing by comparing the cluster labels with the device IDs, and identifies masquerading attacks when it encounters multiple devices sharing the same device ID.
Our choice of algorithm is deep-learning based, built from deep neural networks in which several hidden layers are present between the input and output nodes. These hidden layers extract features from the input data and perform much more complicated classification tasks over the learned features. Unlike conventional algorithms, this approach does not require feature engineering, thus reducing human intervention in identifying features. In recent years, deep
learning has been found to be successful in object recognition, image classification and powering
vision in robots. Voice-activated personal assistants such as Alexa, Cortana, Google Assistant and Siri, as well as high-bandwidth applications such as YouTube and Netflix, are powered by artificial-intelligence (AI) engines that provide information and recommendations tailored to users' interests. However, building such intelligent systems is not an easy task. Training these
deep learning algorithms requires copious amounts of data, often terabytes, so that they perform well. It also involves careful selection of hyperparameters and efficient tuning of these parameters to solve complex functions. The number of such parameters can run into the millions, and hence careful consideration of the training platform and resources is necessary. Generally, multi-core high-performance GPUs are preferred to ensure efficient data processing. Complex functions can take weeks to train on large amounts of data even with hundreds of GPU machines, so it is necessary to balance the trade-off between training time and classification accuracy. Transmitter identification using deep learning architectures is still in a nascent stage. Our work focuses on the generation and processing of a large number of RF I/Q samples to train the classifiers and eventually identify the devices uniquely. The data collection procedure, data pre-processing, choice of parameters, and implementation details are explained in subsequent chapters.
Chapter 3
Causes of hardware impairments
Radio fingerprinting is a mechanism through which wireless devices can be identified based
on the unique characteristics in their analog components. Even though there has been an immense
growth in electronic design, RF transmitters are inherently imperfect devices due to tolerances in the manufacturing of their analog electronics. These imperfections arise from variations in device-specific parameters such as channel doping and oxide thickness. Importantly, these
imperfections are too small to compromise the specifications of communication standards [13]. Such
imperfections are specifically found in the transmitter front end such as frequency mixers, digital to
analog converters, band-pass filters and power amplifiers. The RF fingerprint of a transmitter cannot be easily cloned, and hence it provides an extra layer of security on top of cryptographic mechanisms.
These fingerprints are unique to each device and cannot be replicated by any other device, since each
device adds its own impairments on the transmitted signal.
3.1 Hardware impairments
MATLAB Communications System Toolbox provides applications for the design and
analysis of communication systems. Using this, we design a simulation model of a typical wireless
communications processing chain, and then modify the ideal operational blocks to introduce RF
impairments, typically seen in actual hardware implementations. This allows us to individually study
the I/Q imbalance, phase noise, carrier frequency and phase offset, nonlinearity of power amplifier
and harmonic distortions in isolation of each other.
A block diagram of transceiver pair is shown in Fig. 3.1, with various sources of RF
impairments highlighted. We first study the effect of the hardware-induced causes of I/Q deviation
[Fig. 3.1 block diagram: (a) transmitter — digital baseband (DSP), DACs, anti-aliasing filters, quadrature LO mixer and PA, with I/Q imbalance, phase noise and nonlinear distortion marked; (b) receiver — LNA, quadrature LO mixer, anti-aliasing filters, ADCs and digital baseband (DSP), with carrier frequency offset, sampling frequency offset and harmonic distortion marked.]
Figure 3.1: Typical transceiver chain with various sources of RF impairments.
Figure 3.2: Amplitude imbalance.
Figure 3.3: Phase imbalance.
from the ideal values.
3.1.1 I/Q imbalance
Quadrature mixers that convert baseband to RF and vice versa are often impaired by gain
and phase mismatches between the parallel sections of the RF chain dealing with the in-phase (I) and
quadrature (Q) signal paths. The analog gain is never the same for each signal path, and the difference between their amplitudes causes amplitude imbalance, i.e., one modulator produces a larger signal than the other. In addition, the phase difference between the two paths is never exactly 90°, which causes phase imbalance, meaning that the cosine and sine local oscillator (LO) signals are not perfectly orthogonal.
Figs. 3.2 and 3.3 illustrate the effect of amplitude imbalance and phase imbalance on a 16-QAM
constellation. In practice, I/Q amplitude imbalance is expressed in the range [-5, 5] dB, whereas
phase imbalance in the range [-30, 30] degrees.
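The combined effect of these two mismatches can be illustrated in a few lines. Below is a minimal NumPy sketch (not the MATLAB toolbox model used in our simulation study), in which the hypothetical parameters `amp_db` and `phase_deg` apply the gain mismatch and phase deviation to the Q branch of a 16-QAM constellation:

```python
import numpy as np

def apply_iq_imbalance(x, amp_db=1.0, phase_deg=5.0):
    """Apply amplitude and phase imbalance to complex baseband samples.

    Simplified model: the I branch keeps unit gain, while the Q branch gain
    differs by amp_db and its LO deviates from perfect quadrature by phase_deg.
    """
    g = 10 ** (amp_db / 20)        # linear gain mismatch on the Q branch
    phi = np.deg2rad(phase_deg)    # deviation from the ideal 90-degree LO phase
    i, q = x.real, x.imag
    # the imperfect Q path also mixes in a fraction of the I signal
    q_imb = g * (q * np.cos(phi) + i * np.sin(phi))
    return i + 1j * q_imb

# 16-QAM constellation on a +/-1, +/-1/3 grid
levels = np.array([-1, -1/3, 1/3, 1])
ideal = np.array([a + 1j * b for a in levels for b in levels])
received = apply_iq_imbalance(ideal, amp_db=2.0, phase_deg=10.0)
```

With zero imbalance the constellation is returned unchanged; increasing either parameter skews it as in Figs. 3.2 and 3.3.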
3.1.2 Phase noise
The up-conversion of a baseband signal to a carrier frequency fc is performed at the
transmitter by mixing the baseband signal with the carrier signal. Instead of generating a pure tone at frequency f_c, i.e., e^(j2πf_c·t), the generated tone is actually e^(j(2πf_c·t + φ(t))), where φ(t) is a random phase noise. The phase noise introduces a rotational jitter as shown in Fig. 3.4. Phase noise is expressed
in units of dBc/Hz, which represents the noise power relative to the carrier contained in a 1 Hz
Figure 3.4: Phase noise.
Figure 3.5: Phase offset.
bandwidth centered at a certain offset from the carrier. Typical phase noise levels are in the range [−100, −48] dBc/Hz, with frequency offsets in the range [20, 200] Hz.
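A simple way to visualize this rotational jitter is to modulate ideal samples by a random-walk (Wiener) phase, a common simplified stand-in for oscillator phase noise; the per-sample standard deviation below is an illustrative value rather than a calibrated dBc/Hz level:

```python
import numpy as np

def apply_phase_noise(x, std_per_sample=0.01, seed=0):
    """Multiply samples by e^{j*phi(t)}, where phi(t) is a random walk:
    a simplified (Wiener-process) model of oscillator phase noise."""
    rng = np.random.default_rng(seed)
    phi = np.cumsum(rng.normal(0.0, std_per_sample, size=len(x)))
    return x * np.exp(1j * phi)

# unit-magnitude symbols acquire a slowly drifting rotation
symbols = np.exp(1j * np.pi / 4) * np.ones(1000)
noisy = apply_phase_noise(symbols)
```

Note that phase noise rotates but never scales the samples, so the magnitudes are preserved exactly.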
3.1.3 Carrier frequency and phase offset
The performance of crystal oscillators used to generate the carrier frequency is specified
with a certain accuracy in parts per million (ppm). The difference in transmitter and receiver carrier
frequencies is referred to as carrier frequency offset (CFO). Due to CFO, the received signal spectrum
shifts by a frequency offset:
y(t) = x(t) e^(j2π(f_Tx − f_Rx)t) = x(t) e^(j2π·∆_CFO·t)    (3.1)
where ∆_CFO is the frequency shift introduced by the CFO between the transmitter and receiver.
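In discrete time, with t = n/f_s, Eq. (3.1) can be applied directly to a sampled signal. The following is a straightforward NumPy sketch; the sample rate and offset values are illustrative:

```python
import numpy as np

def apply_cfo(x, delta_cfo_hz, fs):
    """Apply y[n] = x[n] * exp(j*2*pi*delta_CFO*n/fs), the discrete-time
    form of the CFO model in Eq. (3.1)."""
    n = np.arange(len(x))
    return x * np.exp(1j * 2 * np.pi * delta_cfo_hz * n / fs)

# a 1 kHz baseband tone sampled at 1 MHz, shifted by a 200 Hz offset
fs = 1e6
n = np.arange(4096)
tone = np.exp(1j * 2 * np.pi * 1e3 * n / fs)
shifted = apply_cfo(tone, 200.0, fs)
```

The result is exactly the same tone relocated to 1.2 kHz, i.e., the whole spectrum shifts by ∆_CFO.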
Phase shift difference is defined as the phase shift from one constellation point to a neighboring one. The uniqueness of CFO and phase offset in each transceiver pair makes them excellent features for the classification of devices. Although orthogonal frequency division multiplexing (OFDM) uses different modulation techniques and each technique produces a specific constellation, most constellations share some commonalities. For example, the phase shifts from one symbol to the next are created in a similar way in hardware and are transmitter dependent. Thus, for the sake of simplicity, we use quadrature phase shift keying (QPSK) as an example and consider features extracted from the QPSK constellation as shown in Fig. 3.5. In QPSK, four symbols with different
Figure 3.6: AM/AM distortion.
Figure 3.7: AM/PM distortion.
phases are transmitted and each symbol is encoded with two bits. The phase difference between two consecutive symbols is ideally 90°. However, the transmitter amplifiers for the I-phase and Q-phase might differ; consequently, the phase shift can exhibit some variance. The constellation may deviate from its original position due to hardware variability, and different devices may have different constellations. Therefore, phase shift can be considered a main feature.
3.1.4 Harmonic distortions
The harmonics in a transmitted signal are caused by nonlinearities in the transmitter-side
amplifiers. These harmonics are unique to the transmitting device. Harmonic distortion is measured
in terms of total harmonic distortion, which is a ratio of the sum of the powers of all harmonic
components to the power of the fundamental frequency of the signal. This distortion is usually
expressed in either percent or in dB relative to the fundamental component of the signal.
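As a concrete example of this ratio, the sketch below estimates it from an FFT under the simplifying assumption that the fundamental and its harmonics fall on exact FFT bins; the test signal and frequencies are illustrative:

```python
import numpy as np

def thd(x, fs, f0, n_harmonics=5):
    """Ratio of the summed harmonic powers to the fundamental power.
    Assumes f0 and its harmonics land on exact FFT bins (no leakage)."""
    spec = np.abs(np.fft.rfft(x)) / len(x)      # single-sided amplitude spectrum
    k0 = int(round(f0 * len(x) / fs))           # bin index of the fundamental
    fund_power = spec[k0] ** 2
    harm_power = sum(spec[k0 * m] ** 2
                     for m in range(2, n_harmonics + 2) if k0 * m < len(spec))
    return harm_power / fund_power

# 10 Hz fundamental plus a third harmonic at 10% amplitude (1% of the power)
fs, n = 1000, 1000
t = np.arange(n) / fs
sig = np.sin(2 * np.pi * 10 * t) + 0.1 * np.sin(2 * np.pi * 30 * t)
```

The returned fraction can then be reported in percent, or in dB as 10·log10 of the ratio, matching the conventions above.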
3.1.5 Power amplifier distortions
Power amplifier (PA) non-linearities mainly appear when the amplifier is operated in its
non-linear region, i.e., close to its maximum output power, where significant compression of the
output signal occurs. The distortions of the power amplifiers (PA) are generally modeled using
AM/AM (amplitude to amplitude) and AM/PM (amplitude to phase) curves. If we consider a complex
baseband signal x(t) = a(t) e^(jφ(t)), the output of the PA can be written as

y_PA(t) = AM(a(t)) e^(j[φ(t) + PM(a(t))])    (3.2)
where AM(a(t)) is the AM/AM function describing the PA output amplitude as a function of the
input signal amplitude, and PM(a(t)) is the AM/PM function describing the PA output phase as a
function of the input signal amplitude.
The AM/AM conversion causes amplitude distortion, whereas the AM/PM conversion introduces a phase shift. As
shown in Fig. 3.6, the corner points of the constellation move toward the origin due to amplifier
gain compression, while in Fig. 3.7 the constellation is rotated due to the AM/PM conversion. The
nonlinearity of the amplifier is modeled using cubic-polynomial and hyperbolic-tangent methods,
parameterized by the third-order input intercept point (IIP3), a scalar expressed in dBm that
specifies the third-order intercept.
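The memoryless PA model of Eq. (3.2) can be sketched in a few lines of Python. The cubic AM/AM and quadratic AM/PM curves below are illustrative assumptions, not the exact MATLAB models used in this work, and the coefficients g, c3, and k are hypothetical:

```python
import numpy as np

def pa_output(x, g=1.0, c3=0.1, k=0.2):
    """Apply a memoryless PA model y = AM(a) * exp(j*(phi + PM(a))).

    x  : complex baseband samples x(t) = a(t) * exp(j*phi(t))
    g  : linear gain (assumed value)
    c3 : cubic compression coefficient (assumed, models AM/AM)
    k  : AM/PM coefficient in rad per squared amplitude (assumed)
    """
    a = np.abs(x)            # instantaneous amplitude a(t)
    phi = np.angle(x)        # instantaneous phase phi(t)
    am = g * a - c3 * a**3   # AM/AM: gain compression at large amplitudes
    pm = k * a**2            # AM/PM: amplitude-dependent phase rotation
    return am * np.exp(1j * (phi + pm))

# QPSK symbols pushed through the model: corner points compress and rotate,
# as in Figs. 3.6 and 3.7
qpsk = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
y = pa_output(qpsk)
```

With these toy coefficients, every unit-amplitude symbol is compressed to amplitude 0.9 and rotated by 0.2 rad, mirroring the qualitative behavior shown in the figures.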
3.2 Data Collection
Figure 3.8: Data collection using SDR.
Data collection is the first and foremost step in machine learning. The performance of
our predictive model depends on the quality and quantity of the gathered data, and hence data
collection is a critical step. Our deep learning approach to RF fingerprinting requires
ample data for the training to be effective. In our case, we generate raw I/Q samples,
transmit them over the air, and collect them at the receiver. We collect millions of samples
from each of the devices and generate corresponding class labels. At the receiver end, we use a
fixed USRP B210 for data collection. For the transmitters, we use 4 different devices, i.e.,
USRP B210s and X310s. Fig. 3.8 shows raw I/Q data collection using the SDR.
3.2.1 Protocols of operation
We transmit different physical layer frames defined by the IEEE 802.11ac and LTE
standards (with parameters as defined in technical specification 36.141) on each transmitter SDR. These
frames are generated using the MATLAB WLAN System and LTE System toolboxes,
which provide standard-compliant functions for waveform generation. The payloads of the generated
frames are random, since we do not intend to transmit any particular data stream. Once the waveforms are
generated, these protocol frames are streamed to the selected SDR for transmission, considering
separately the cases of over-the-air wireless propagation and transmission through an RF cable. The latter approach
eliminates wireless channel effects and captures the signals as they are modified by the transmitter alone.
The receiving SDR samples the incoming signals at a 1.92 MS/s sampling rate at a center frequency
of 2.45 GHz for WiFi and 900 MHz for LTE. Ultimately, we study the performance of different
learning algorithms, including linear support vector machines (SVM), logistic regression, and CNNs,
using I/Q samples collected from an experimental setup of USRP SDRs.
Figure 3.9: Experimental setup demonstrating data capture.
As shown in Fig. 3.9, the host computer, equipped with the MATLAB WLAN and LTE toolboxes,
generates waveforms and transmits them through an X310 USRP. These waveforms are
received by another USRP, a B210, which is connected through a high-speed link to another
computer that has all the required MATLAB packages to receive and store the raw I/Q samples.
The workstations have typical configurations of a Core i7 processor, 8 GB RAM, and
flash-based 512 GB storage. Data is collected using different B210/X310 USRPs at the transmitter
end, but the receiver is kept fixed. These experiments are repeated over distances from 2 ft to
50 ft at intervals of 4 ft. Overall, we collect approximately 20 million samples for each of the five
SDRs at each distance.
3.2.2 Storage and processing
The samples are further analyzed offline on Northeastern's Discovery cluster, located
at the Massachusetts Green High Performance Computing Center (MGHPCC). It provides high-end
research computing resources such as centralized high-performance computing (HPC) clusters,
storage, visualization, and software, with 30,352 compute cores shared across all users. The
cluster is accessed via ssh (secure shell). The partitioning of the Discovery cluster into dedicated
CPU and GPU nodes is shown in Fig. 3.10.
Figure 3.10: Discovery cluster partitioning.
The configuration details of the nodes we use are as follows. Each CPU node has 2x Intel(R)
Xeon(R) CPU E5-2680 v4 @ 2.40 GHz with 28 physical / 56 logical cores and 500 GB RAM, whereas
each GPU node has 4x NVIDIA Tesla K80 boards, with 4992 CUDA cores @ 560 MHz and 24 GB
of GDDR5 memory per board. These GPU servers are on a 10 Gb/s TCP/IP backplane.
The collected complex I/Q samples are partitioned into subsequences in the cluster environment
before passing onto the classifiers. For our experimental study, we set a fixed subsequence length of
128; additional details of the data preprocessing are provided in Chapter 4.
3.2.2.1 Signal metadata format
The Signal Metadata Format (SigMF) is a standard way to store signal data [14]. Deep
learning works best when large amounts of data are available. Since deep learning in the RF domain is
at a nascent stage, sharing these datasets is important in order to reproduce experimental results and
to provide access to users who do not have direct access to the tools/equipment required to
generate the datasets. SigMF is a method of sharing metadata descriptions of captured signal
data, written in JSON. It stores signal data using two files:
• A JSON-format text file, which is made up of:
  • Core namespace: gives general file information
    • global: information applicable to the whole recording, such as a description of the
      SigMF recording, the hardware used to make the recording, the sample rate, and the
      data file format
    • capture: parameters of the signal capture, such as the center frequency of the signal
      and the sample index at which the segment takes effect
    • annotations: information about the signal data that is not part of the captures and
      global objects, such as the number of samples that each segment applies to and the
      frequency of the lower/upper edge of the feature
  • Extension namespace: used to define fields that are not in the core namespace, i.e.,
    capture details such as:
    • signal reference number: the sequential label for signals in a data file
    • the type of RF transmitter
    • the manufacturer of the transmitter
    • the source of the RF signal
• A binary file, where I/Q samples are stored as defined in the 'datatype' field in the metadata
  file; for example, ci16 denotes complex 16-bit integer data
As a standard practice, it is encouraged to store all signal datasets in widely accepted formats
such as SigMF.
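As an illustrative sketch, the metadata file for one of our captures might look as follows, written from Python with the standard json module. Only a subset of the SigMF core fields is shown, and the concrete values (description, hardware string, file name) are hypothetical:

```python
import json

# Hypothetical SigMF metadata for one capture; field names follow the
# SigMF core namespace, but the values here are illustrative only.
meta = {
    "global": {
        "core:datatype": "ci16",          # complex 16-bit integer I/Q data
        "core:sample_rate": 1920000,      # 1.92 MS/s, as in our captures
        "core:description": "USRP B210 capture of 802.11ac frames",
        "core:hw": "Ettus USRP B210",
    },
    "captures": [
        {
            "core:sample_start": 0,        # index where this segment begins
            "core:frequency": 2450000000,  # 2.45 GHz center frequency (WiFi)
        }
    ],
    "annotations": [
        {
            "core:sample_start": 0,
            "core:sample_count": 128,      # samples this segment applies to
        }
    ],
}

# The companion binary file would hold the raw I/Q samples themselves.
with open("capture.sigmf-meta", "w") as f:
    json.dump(meta, f, indent=2)
```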
Chapter 4
Deep learning for RF fingerprinting
Assume a set of multiple wireless devices placed in a room, where the task for one of the
devices is to uniquely identify the rest of the devices. The identification is based purely on the inherent
hardware characteristics of the devices, which can be used as their unique signatures. To enable
the task of RF fingerprinting, we collect raw data from all the devices and build a model that can
effectively perform the classification. Different learning algorithms such as SVM, logistic regression,
and CNNs are used to fit the data. Based on the preliminary results, we find the CNN to be the
best-performing model compared to the other conventional ML algorithms. CNNs shine when it
comes to complex problems such as image classification, natural language processing, and speech
recognition, and from our analysis we can say that they are the most suitable choice for RF
fingerprinting as well. RF fingerprinting using a CNN removes one of the major hurdles of this
problem, namely feature engineering: deep learning offers algorithms that learn features, so we do
not have to specifically select features of interest. The major challenge we faced with this
approach is finding the right model that fits our data nearly perfectly. The different components of the
CNN architecture and parameter selection, followed by hyperparameter tuning, are presented in this
chapter.
4.1 Initial studies on ML techniques
As part of our preliminary experiments, we started with shallow (single-layer) supervised
learning classifiers such as the linear support vector machine (SVM) and logistic regression [15]. Several
features, such as amplitude, phase, and FFT values, along with the mean, standard deviation, normalized
phase, and absolute normalized frequency components, are extracted from the I/Q samples to build a rich
set of features to train the classifiers. The frequency components of the samples are computed using
the FFT function in MATLAB.
4.1.1 Support vector machines
The SVM classifier is a supervised ML approach used for classification problems. It
is based on finding the hyperplane that segregates two classes. The best hyperplane is the
one with the largest margin between the hyperplane and the closest data points. Selecting the best
hyperplane is necessary to ensure robustness in the classification.
hyperplane is necessary to ensure robustness in the classification. For a dataset of points xj ∈ Rd
and corresponding labels yj ∈ {−1, 1}, j = 1, . . . , N , the hyperplane is given by
f(x) = x′β + b = 0 (4.1)
The optimal hyperplane is obtained by finding β ∈ Rd and b ∈ R that minimize

‖β‖₂² + C Σ_{j=1}^{N} ζj (4.2)

subject to the constraints

yj f(xj) ≥ 1 − ζj, ζj ≥ 0 (4.3)

for all data points (xj, yj), where the ζj are slack variables; in our experiments C is set to 1.
We use standard libraries/packages to implement support vector machines. To fit the data with a
separating hyperplane, we use a linear SVC (support vector classifier). Python offers a huge set of
libraries, and for our training we use the sklearn library, which provides classifiers such as
LinearSVC (linear kernel). The precomputed features are fed into this classifier along with the
label information to evaluate the performance of the SVM. We chose LinearSVC since it offers
flexibility in choosing parameters. We also used the squared hinge loss function and an ℓ2 regularizer to
prevent overfitting. These parameters help in scaling to larger datasets and in faster convergence.
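A minimal sketch of this training step is shown below. Synthetic Gaussian feature vectors stand in for our extracted I/Q features, purely for illustration:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in for our extracted features: two device classes whose feature
# vectors are drawn from slightly shifted Gaussian clouds.
X = np.vstack([rng.normal(0.0, 1.0, (200, 8)),
               rng.normal(1.5, 1.0, (200, 8))])
y = np.array([0] * 200 + [1] * 200)

# Squared hinge loss and l2 penalty, as in our experiments (C = 1).
clf = LinearSVC(loss="squared_hinge", penalty="l2", C=1.0)
clf.fit(X, y)
acc = clf.score(X, y)  # training accuracy on the synthetic data
```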
4.1.2 Logistic regression
Logistic regression is another supervised learning algorithm, which transforms its output
using the logistic sigmoid function. This is the core of logistic regression: it squashes any real
value into a value between 0 and 1. Each returned probability value can then be mapped to one of
two or more discrete classes. Logistic regression can be thought of as a single-neuron dense neural
network. In logistic regression, yj is again binary in {−1, +1} and

P(yj = +1) = σ(β′xj + b) = 1 / (1 + e^{−(β′xj + b)}) (4.4)
We use the scikit-learn library to train the model in Python. The algorithm learns the regression
variables β and b by minimizing the cross-entropy loss between each yj and ŷj. Overfitting is
handled using an ℓ2 regularizer. New data points x are classified based on σ(β′x + b).
Classification is performed on three different datasets: the first task is to classify devices that
operate on WiFi, the second is to identify devices that use LTE, and the last combines the data
from devices that use both WiFi and LTE. For each of these cases, the data is divided
into three parts, namely training, validation, and testing sets. Ultimately, the performance of the
aforementioned classifiers is measured on the testing data, which is not seen by the trained model.
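The sigmoid squashing at the heart of Eq. (4.4) can be sketched directly. The parameters β and b below are hypothetical stand-ins for learned values, and the 0.5 decision threshold is the usual convention:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, beta, b, threshold=0.5):
    """Map P(y = +1) = sigmoid(beta'x + b) to a class in {-1, +1}."""
    p = sigmoid(x @ beta + b)
    return np.where(p >= threshold, 1, -1), p

# Toy example with hypothetical learned parameters beta and b
beta = np.array([2.0, -1.0])
b = 0.5
labels, probs = predict(np.array([[1.0, 0.0], [-1.0, 1.0]]), beta, b)
```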
Fig. 4.1 provides the accuracy obtained through cross-validation on the validation data using both
SVM and logistic regression. Results were obtained for various combinations of devices over the air
for both WiFi and LTE. In Fig. 4.1, we also report the accuracy in
[Bar chart: % accuracy (40–100) for WiFi, LTE, and combined WiFi-LTE; series: Logistic Regression B210-B210, Linear SVM B210-B210, Logistic Regression B210-X310, Linear SVM B210-X310.]
Figure 4.1: Device classification using Logistic Regression and Linear SVM for WiFi and LTE.
identifying different protocols. Being able to detect a protocol considerably reduces the number of
feasible constellations supported by the protocol, which in turn constrains the constellation type and
structure. One important thing to note is that SVM and logistic regression are both able to achieve
high accuracies (≈ 90%) for the simpler task of protocol detection, compared to a device recognition
accuracy of less than ≈ 60%.
4.2 Convolutional neural networks
Convolutional neural networks (ConvNets or CNNs) are a category of neural networks
that have been found to be very effective in areas such as image recognition and classification [16]. The
success of CNNs in recognizing faces, objects, and speech, as well as empowering vision in robots,
motivates our investigation into using these networks for radio fingerprinting. Our first challenge was
to understand what these neural networks are made of and how they can be used to achieve our
task. An artificial neural network (ANN) is a model inspired by the neurons in the human brain.
The computation of an ANN is similar to that of the brain, with the neuron being the basic unit
of computation. Each neuron receives input from either an external source or from other neurons.
Each input to a neuron is associated with a weight, assigned based on that input's relative
importance compared with the other inputs. The neuron applies a nonlinear function, namely an
activation function, to the weighted sum of its inputs and ultimately computes an output. It is
important to use an activation function since most data is nonlinear, and the activation function
introduces nonlinearity into the neuron's output [17]. The three most important activation functions are:
• Sigmoid: takes the input and maps it to a value between 0 and 1
• ReLU: the rectified linear unit takes the input and replaces negative values with zero, i.e., it
  returns the maximum of the input and zero
• tanh: takes the input and maps it to a value in the range [−1, 1]
A neural network is made up of an input layer, multiple interconnected neurons in the middle
layers called hidden layers, and an output layer. CNNs are similar to ordinary neural networks
but are made up of multiple hidden layers and fully connected layers. Additionally, a CNN slides
a filter across the input dimensions, with the filter's weights being shared across all the slides in
that particular layer. This results in far fewer parameters compared to regular
neural networks.
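The three activation functions above can be written directly in Python; a NumPy sketch for illustration:

```python
import numpy as np

def sigmoid(x):
    """Maps the input to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectified linear unit: replaces negative values with zero."""
    return np.maximum(x, 0.0)

def tanh(x):
    """Maps the input to a value in the range [-1, 1]."""
    return np.tanh(x)

x = np.array([-2.0, 0.0, 3.0])
```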
4.2.1 CNN architecture
The proposed method consists of two stages, i.e., a training stage and an identification
stage. In the former, the CNN is trained using raw I/Q samples collected from each SDR transmitter
to solve a multi-class classification problem. In the identification stage, raw I/Q samples of the
unknown transmitter are fed to the trained neural network, and the transmitter is identified based
on the observed values at the output layer. In this section, we first describe the CNN architecture and
then present the preprocessing of the input data necessary to improve the performance. There exist several
CNN architectures, namely LeNet, ResNet, AlexNet, GoogLeNet, VGGNet, ZFNet, and DenseNet. Our
CNN architecture is inspired in part by AlexNet [18], which showed remarkable performance in
image recognition. As shown in Fig. 4.2, our network has four layers, consisting of two
convolutional layers and two fully connected (dense) layers. Our goal is to first understand how the
Figure 4.2: CNN architecture for RF fingerprinting.
layers are stacked and the functional operation of each layer component. The most difficult challenge
in building the CNN is deciding how many layers to use, how many filters/kernels to use in each
layer, and what the filter sizes and the values for padding and stride should be. None of these are standard, and
the complexity of the network depends on the type of data and its processing. A lot of effort was
spent on experimenting with different parameters and ultimately finding the right combination of
these hyperparameters that generalizes our data well.
We describe the various CNN components and hyperparameters in detail in this chapter. The
input to the CNN is a windowed sequence of raw I/Q samples of length 128. Each complex value is
represented as two real values, so the dimension of our input data grows
to 2 × 128. This is then fed to the first convolution layer.
4.2.1.1 Convolution layer
The convolution layer is the core building block of the CNN, whose primary purpose is
to extract features from the input data. It consists of a set of spatial filters (also called kernels, or
simply filters) that perform a convolution operation over input data. The operation of the convolution
filter is shown with an example in Fig. 4.3 for intuitive understanding. A filter of size 2 × 2 is
Figure 4.3: Convolution operation: filters strided over input sequences.
convolved with input data of size 4 × 4 by sliding across its dimensions. Here, convolution means
computing the element-wise multiplication between the input matrix and the filter matrix and then
summing all the products to produce a single value in the output matrix. Such a convolution is
performed over the entire input to produce a two-dimensional feature map (activation map). The next
hyperparameter is the stride, which controls how the filter convolves around the input data. In
Fig. 4.3, we set the stride to 1, i.e., the filter convolves around the entire input matrix by shifting one
value at a time. In general, the stride is the sliding interval of the filter and determines the dimension
of the feature map. Our example produces a feature map of dimension 3 × 3 at the end of the
convolution. In our architecture, each convolution layer consists of a set of such filters, which
operate independently to produce a set of two-dimensional feature maps.
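The convolution of Fig. 4.3 can be reproduced in a few lines of NumPy. The input and filter values below are arbitrary, but the mechanics (element-wise multiply, sum, stride 1, no padding) match the example:

```python
import numpy as np

def conv2d_valid(x, w, stride=1):
    """Slide filter w over input x (no padding) and sum the element-wise
    products at each position, producing a 2-D feature map."""
    h = (x.shape[0] - w.shape[0]) // stride + 1
    v = (x.shape[1] - w.shape[1]) // stride + 1
    out = np.empty((h, v))
    for i in range(h):
        for j in range(v):
            patch = x[i*stride:i*stride + w.shape[0],
                      j*stride:j*stride + w.shape[1]]
            out[i, j] = np.sum(patch * w)  # element-wise multiply, then sum
    return out

x = np.arange(16, dtype=float).reshape(4, 4)  # 4x4 input
w = np.array([[1.0, 0.0],
              [0.0, 1.0]])                    # 2x2 filter
fmap = conv2d_valid(x, w)                     # 3x3 feature map, as in Fig. 4.3
```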
4.2.1.2 ReLU activation
Convolution is a linear operation involving element-wise multiplications and
additions. Therefore, to introduce nonlinearity into the system, ReLU (rectified linear unit) layers
are used after each of the convolution layers. Their main function is to perform a predetermined
nonlinear transformation on each element of the feature map. There are many possible activation
functions, such as sigmoid and tanh; we use the ReLU function, as CNNs with ReLU train faster
than the alternatives, with greater computational efficiency. ReLU also reduces the vanishing gradient
problem, where network training becomes slower because the gradients shrink exponentially toward
values close to zero. Mathematically, it is expressed as:
f(x) = max(0, x) (4.5)
Figure 4.4: Rectified Linear Unit (ReLU) operation performed on feature maps.
As shown in Fig. 4.4, ReLU outputs max(x, 0) for an input x, replacing all negative
activations in the feature map by zero.
4.2.1.3 Pooling layers
The convolution layer is generally followed by a pooling layer. Its function is to (a)
introduce shift invariance and (b) reduce the dimensionality of the rectified feature maps of the
preceding convolution layer, while retaining the most important information. We choose a pooling
layer with filters of size 2 × 2 and stride 2, which downsamples the feature maps by 2 along both
dimensions. Among the different filter operations (such as average and sum), max pooling gives the best
performance. As shown in Fig. 4.5, max pooling of size 2 × 2 with stride 2 selects the maximum
element in each non-overlapping region (shown with different colors). We apply the pooling
operation separately to each of the feature maps. Thus, it reduces the dimensionality of the feature maps,
Figure 4.5: An illustration of max pooling operation.
which in turn reduces the number of parameters and computations in the network and controls overfitting.
Additionally, it makes the network invariant to small translations of the input data.
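The 2 × 2, stride-2 max pooling of Fig. 4.5 amounts to a reshape-and-reduce in NumPy; a sketch, assuming even input dimensions:

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: keep the maximum of each
    non-overlapping 2x2 region, halving both dimensions."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1.0, 3.0, 2.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0],
                 [3.0, 2.0, 1.0, 0.0],
                 [1.0, 2.0, 3.0, 4.0]])
pooled = max_pool_2x2(fmap)  # 2x2 output, one max per colored region
```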
4.2.1.4 Fully connected layers
A fully connected (dense) layer is a traditional multilayer perceptron (MLP), in which
the neurons have full connections to all activations in the previous layer, as in regular
neural networks. The output of the second pooling layer is provided as input to the fully connected
layer. Its primary purpose is to perform the classification task on the high-level features extracted by
the preceding convolution layers. At the output layer, a softmax activation function is used. The
classifier with a softmax activation function gives probabilities (e.g., [0.9, 0.09, 0.01] for three class
labels), i.e., it ensures that the outputs of the fully connected layer sum to 1. To sum
up, the convolution and pooling layers act as feature extractors on the input data, while the fully
connected (dense) layers perform the classification based on these features. The network
architecture for our RF fingerprinting is shown in Table 4.1.
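The softmax output stage can be sketched as follows; the logits fed in are arbitrary illustration values:

```python
import numpy as np

def softmax(logits):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    z = logits - np.max(logits)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([4.0, 1.7, -0.5]))  # three-class example
```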
Next, we discuss the selection of the CNN hyperparameters to optimize the performance,
followed by the preprocessing of the input data necessary for proper operation of the CNN, and
finally the shift-invariance property of our classifier.

Table 4.1: CNN architecture.

Layer | Output dimensions
Input | 2 × 128
Conv1 | 50 × 128
Conv2 | 50 × 128
FC/ReLU | 256
FC/ReLU | 80
FC/Softmax | 4
4.2.1.5 Model selection
We start with a baseline architecture consisting of two convolution layers and two dense
layers, then progressively vary the hyperparameters to analyze their effect on the performance. The
first parameter is the number of filters in the convolutional layers. We observed that filter counts
within a range of 30 − 256 provide reasonably similar performance. However, since the
number of computations increases with the number of filters, we set 50 filters in both
convolution layers to balance performance and computational cost. Similarly, we set 1 × 7 and
2 × 7 as the filter sizes in the first and second convolution layers respectively, since larger filter
sizes do not offer significant performance improvement. Furthermore, increasing the number of
convolution layers from 2 to 4 shows no improvement in the performance, which justifies continuing
with two convolution layers. We then analyze the effect of the number of neurons in the first dense
layer by varying it between 64 and 1024. Interestingly, we find that increasing the number of neurons
beyond 256 does not improve the performance. Therefore, we set 256 neurons in the first dense
layer. Throughout this parameter selection, we observe that using a single fully connected layer or
increasing the number of neurons to as many as 1024 increases the model complexity and makes the
training slower. Overfitting is one of the major problems during network training: the
network weights get tuned so well to the training examples that the network fails to perform well
on unseen data. Thus, we need to take measures to alleviate overfitting.
We use a dropout layer, whose main function is to drop a set of activations in a specific layer by
setting them to zero. By doing this, we make our network robust and ensure that it does not
overfit the training data. After finalizing the architecture and parameters of the CNN, we carefully
select the regularization parameters as follows: we use a dropout rate of 50% at the dense layers. In
addition, we use an ℓ2 regularization parameter λ = 0.0001 to avoid overfitting.
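Under the hyperparameters above, the network of Table 4.1 can be sketched in Keras roughly as follows. This is a sketch, not the exact training script of this work; in particular, the "same" padding and the (1, 2) pool shape are assumptions made to keep the 128-sample dimension consistent with the table:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

l2 = regularizers.l2(1e-4)  # lambda = 0.0001, as in our setup

model = models.Sequential([
    layers.Input(shape=(2, 128, 1)),           # 2 x 128 I/Q window
    layers.Conv2D(50, (1, 7), padding="same",  # Conv1: 50 filters of size 1x7
                  activation="relu", kernel_regularizer=l2),
    layers.MaxPooling2D(pool_size=(1, 2)),     # pool along the time axis
    layers.Conv2D(50, (2, 7), padding="same",  # Conv2: 50 filters of size 2x7
                  activation="relu", kernel_regularizer=l2),
    layers.MaxPooling2D(pool_size=(1, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu", kernel_regularizer=l2),
    layers.Dropout(0.5),                       # 50% dropout at dense layers
    layers.Dense(80, activation="relu", kernel_regularizer=l2),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),     # one output per device
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```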
4.2.1.6 Preprocessing data
Our experimental studies conducted on different representative classes of ML algorithms
demonstrate a significant performance improvement from choosing a deep CNN. However, to ensure
scalable performance over a large number of devices, our CNN architecture needs to be modified. In
addition, our input I/Q sequences, which represent a time-trace of collected samples, need to be
suitably partitioned and augmented beyond a stream of raw I/Q samples. Our classifiers operate
on sequences of I/Q samples of a fixed length. In general, given a sequence of length L, we can
create N = L/ℓ subsequences of length ℓ by partitioning the input stream, or L − ℓ + 1
subsequences by sliding a window of length ℓ over the larger sequence (or stream) of I/Q samples.
Training classifiers over small subsequences leads to more training data points, which in turn yields
low variance but potentially high bias in the classification result. Conversely, large sequences may
lead to high variance and low bias. We set the sequence length to 128. From a wireless communications
viewpoint, the channel remains invariant over small durations of time. Hence, the ability to operate
on smaller subsequences carved out of in-order received samples allows us to estimate the complex
coefficients representing the wireless channel. Thus, we train our classifiers over the input I/Q
sequences by treating the real and imaginary parts of each sample as two inputs, leading to a training
vector of 2 × ℓ samples for a sequence of length ℓ.
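The two windowing strategies can be sketched as follows; a NumPy sketch in which the split of each complex sample into its real and imaginary parts yields the 2 × ℓ training vectors described above:

```python
import numpy as np

def partitioned_windows(iq, ell=128):
    """Split a stream of complex I/Q samples into N = L // ell
    non-overlapping subsequences, each a 2 x ell real-valued array."""
    n = len(iq) // ell
    chunks = iq[:n * ell].reshape(n, ell)
    return np.stack([chunks.real, chunks.imag], axis=1)  # shape (n, 2, ell)

def sliding_windows(iq, ell=128, step=1):
    """Slide a window of length ell over the stream, yielding
    L - ell + 1 subsequences for step = 1 (used for augmentation)."""
    starts = range(0, len(iq) - ell + 1, step)
    out = np.array([iq[s:s + ell] for s in starts])
    return np.stack([out.real, out.imag], axis=1)        # shape (n, 2, ell)

iq = np.exp(1j * 0.1 * np.arange(1000))  # stand-in for received samples
Xp = partitioned_windows(iq)  # non-overlapping windows
Xs = sliding_windows(iq)      # many more overlapping windows
```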
4.2.1.7 Shift invariance
Another prominent characteristic of our CNN classifier, both with respect to our final goal
of identifying the transmitting device and in terms of feature extraction, is shift invariance. In
short, all effects such as I/Q imbalance, phase noise, carrier frequency and phase offset, power
amplifier nonlinearity, and harmonic distortion can occur at an arbitrary position in a given I/Q
sequence. A classifier should be able to detect a device-specific impairment irrespective of whether it occurs
at, e.g., the 1st or 15th position of an I/Q sequence. The convolution weights in each layer detect signals
at arbitrary positions in the sequence, and a max-pooling layer passes the presence of a signal to a higher
layer irrespective of where it occurs.
To enhance the shift-invariance property of our classifier during training, we train it over sliding
windows of length ℓ, as shown in Fig. 4.6, rather than partitioned windows: this further biases the
trained classifiers to shift-invariant configurations. In our initial experiments, we verified the efficacy
Figure 4.6: An illustration of sliding operation using a window of length 128.
of using a sliding window by comparing the performance of our CNN against data preprocessed
using partitioned windows. We observed improved performance with the sliding window.
Finally, since deep learning performs well with large amounts of data, it was evident from our
analysis that the sliding window is an efficient method of data augmentation.
Chapter 5
Results and performance evaluation
5.1 Network setup
The performance of the CNN architecture for RF fingerprinting is analyzed on the raw
I/Q samples collected from the USRPs. We use MATLAB as the host software to interact with
the USRP radios. Once the data is collected at the receiver end, the samples are first partitioned
into subsequences on Northeastern's Discovery cluster. The details of the software packages and
their structure are shown in Fig. 5.1. The core software implementation is in Python, which
is easier to read and write than many other programming languages and offers a wide variety of
standard libraries and built-in functions. In addition, there are many third-party open-source
libraries offering high-end modules for a wide range of applications. The compute nodes in the
Discovery cluster are equipped with CUDA, a parallel computing platform and programming model
by NVIDIA for computing on graphics processing units (GPUs). We implement our CNN
training and classifier in Keras, a model-level library that provides building blocks for
deep learning [19]. In the backend, we use the TensorFlow library, a specialized, well-optimized tensor
manipulation library for high-dimensional matrix operations. We install these packages in the open-source
distribution of Python, namely Anaconda, which is widely used for machine learning
applications and eases package and environment management and deployment. All of these
packages are installed on an NVIDIA CUDA-enabled Tesla K80m GPU, which is our platform for
training and evaluation.
Figure 5.1: Software stack.
5.2 Evaluation
Our CNN implementation has a network depth of 5 layers, with 50 filters in layers 1
and 2, 256 neurons in layer 3, 80 neurons in layer 4, and a final classifier with 4 neurons. Each
convolution layer is followed by a max pooling layer with pool size 2. We calculate the total error at
the output neurons and propagate this error back through the network using backpropagation to
calculate the gradients. Thus, the key task during training is finding the set of weights that fits our
data well and classifies devices correctly by reducing the error at the output layer. This is
done using optimizers, whose basic purpose is to update the weights using the gradients. In our
network we use Adam, an optimization method well suited to problems that are large in terms of
data and parameters. Here, we should also consider another parameter called the learning rate, which
decides by how much we update the network weights using their gradients. It is important to note
that the learning rate should be chosen carefully. If it is too high, the network learns faster but at
the risk of diverging and never reaching a good minimum. On the other hand, if the learning rate is
too low, then the network learns too slowly and may take days to converge. The optimizer goes
hand in hand with the learning rate: it decides how to use the current weight gradients, along with
previous weight gradients, in the updates. The Adam optimizer uses the gradients to find an adaptive
learning rate for each individual weight (parameter), unlike stochastic gradient descent, where a single
learning rate is set for all weight updates, i.e., the learning rate does not change during training. The
next parameter is the batch size, which defines the number of examples taken from the dataset
at once to perform an optimization step. These parameters are chosen by progressively varying
their values and analyzing the effect on the performance. The following steps summarize the network
training process:
• The first step is the initialization of the filters and weights. This is done using the Glorot
  uniform initializer, also called the Xavier uniform initializer. It draws samples from a uniform
  distribution within [−limit, limit], where

  limit = √(6/(fan_in + fan_out)) (5.1)

  and fan_in is the number of input units in the weight tensor and fan_out is the number of
  output units in the weight tensor.
• The training data is passed as input to the network and goes through the forward propagation
  step (convolution, ReLU, and pooling operations along with the fully connected layers) to find
  the output probabilities for each of the classes.
• The total error is calculated at the output layer using the categorical cross-entropy loss
  function, which internally uses the softmax function.
• The gradients of the error are calculated with respect to all the weights in the network, and the
  Adam optimizer updates all filter weights and parameters to minimize the output error.
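The Glorot uniform initialization of Eq. (5.1) can be sketched directly; a NumPy version for illustration (Keras provides this as its `glorot_uniform` initializer):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, seed=0):
    """Draw a (fan_in, fan_out) weight matrix from U[-limit, limit],
    with limit = sqrt(6 / (fan_in + fan_out)) as in Eq. (5.1)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# e.g., the weights between our 256-neuron and 80-neuron dense layers
W = glorot_uniform(256, 80)
```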
Finally, we evaluate the performance of our CNN using the k-fold cross-validation technique, with
k set to 5 in our case. The training dataset is split into 5 folds, and we take turns training models
on all the folds except one, which is held out. Model performance is then evaluated on the held-out
fold. The process is repeated until each fold has served as the hold-out set. We can thus measure
the trained model's performance on unseen data and avoid overfitting, obtaining a less biased
estimate of the model's performance. We have used the StratifiedKFold class from the scikit-learn Python
machine learning library to split the training dataset into 5 folds. Our training set consists of
≈ 720K training examples and ≈ 80K examples for validation. We use another 200K examples for
testing the performance of the trained model. We also represent the class labels associated with the
devices as binary (one-hot) vectors, since classification works better when categorical variables are
mapped into binary values; this ensures equal importance is given to all devices. Training our
model took ≈ 23 min; performance evaluation on the hold-out dataset of 200K examples took only
≈ 2 min. Several metrics exist to evaluate model performance. Accuracy, the proportion of correct
classifications among all classifications, is not a good measure here, because on imbalanced data a
model that predicts the majority class for every instance can still attain high accuracy (e.g., 99%).
Hence we do not rely solely on accuracy but use better metrics such as Area
Under the Curve (AUC), which is evaluated on the Receiver Operating Characteristic (ROC) curve
comprising true positive rate on the Y-axis and false positive rate on the X-axis.
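As a concrete illustration of the AUC metric, a minimal NumPy sketch in its rank-statistic form, which is equivalent to the area under the ROC curve (this toy function assumes no tied scores; ties would require average ranks):

```python
import numpy as np

def roc_auc(y_true, scores):
    # AUC equals the probability that a randomly chosen positive
    # example scores higher than a randomly chosen negative one
    # (the normalized Mann-Whitney U statistic).
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(np.sum(y_true))
    n_neg = len(y_true) - n_pos
    u = ranks[np.asarray(y_true) == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

y = np.array([0, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc(y, s))  # 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, which is why it is robust to class imbalance in a way raw accuracy is not.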
5.2.1 CNN vs. conventional algorithms
We first measure the performance on our WiFi dataset using SVM and logistic regression
for the classification of nominally similar devices. We extract several features, such as amplitude,
phase, and FFT values, along with the mean, standard deviation, normalized phase, and absolute
normalized frequency components from the raw I/Q samples, and build a rich feature set to train
the classifiers. We obtain the classification accuracy for identification among 2, 3, 4, and 5 devices.

Figure 5.2: The accuracy comparison of SVM, logistic regression and CNN for 2-5 devices.

As seen in
Fig. 5.2, the accuracy achieved with the SVM and logistic regression algorithms for 2 devices is
≈ 55%, and it decreases further as the number of devices increases. This performance deterioration
can be clearly seen in Fig. 5.2.
We then train our CNN classifier on raw data to classify the same set of devices. With
our deep CNN, we are able to achieve an accuracy of 98% for five devices, as opposed to less
than ≈ 33% for the shallow-learning SVM and logistic regression algorithms.
5.2.2 Receiver operating characteristics for radio fingerprinting
We obtained the false positive rate and true positive rate to measure AUC. Figs. 5.3, 5.4, 5.5, 5.6
and 5.7 show the ROC curves for the classification of four similar WiFi devices, for each fold of
cross-validation. We can see that the CNN model works extremely well, as AUC ranges between
0.93 and 1. The AUC attained for each device is 0.964, 0.936, 1, and 0.994, respectively, as shown in
Fig. 5.3. This demonstrates that the CNN is an effective model for radio fingerprinting. Additionally,
training our CNN over a large dataset with Keras takes significantly less time than any of the
aforementioned algorithms. To demonstrate this, Fig. 5.8 shows the computational load for
training, scaled as a function of the number of training examples, and the estimated time per epoch
on average. Clearly, performance with a GPU is faster than with a CPU.
Figure 5.3: ROC curve, fold 1. B210 #1 (area = 0.96402), B210 #2 (area = 0.93601), B210 #3 (area = 1.00000), X310 #1 (area = 0.99461).
Figure 5.4: ROC curve, fold 2. B210 #1 (area = 0.96194), B210 #2 (area = 0.93165), B210 #3 (area = 1.00000), X310 #1 (area = 0.99391).
Figure 5.5: ROC curve, fold 3. B210 #1 (area = 0.96378), B210 #2 (area = 0.93489), B210 #3 (area = 1.00000), X310 #1 (area = 0.99431).
Figure 5.6: ROC curve, fold 4. B210 #1 (area = 0.96502), B210 #2 (area = 0.93441), B210 #3 (area = 1.00000), X310 #1 (area = 0.99504).
Figure 5.7: ROC curve, fold 5. B210 #1 (area = 0.96418), B210 #2 (area = 0.93030), B210 #3 (area = 1.00000), X310 #1 (area = 0.99475).
5.2.3 Impact of distance on radio fingerprinting
We run experiments to collect data at distances ranging from 2 to 50 ft in steps of 4 ft,
to evaluate the impact of distance (and possible multipath effects owing to reflections) on
Figure 5.8: Computational load.
Figure 5.9: The plot of accuracy obtained using CNN for 4 devices over different distances between
transmitter and receiver, together with the observed and analytical SNR (dB).
classification accuracy. Fig. 5.9 shows the classification accuracy for 4 devices using the CNN,
which achieves classification accuracy greater than 95% up to a distance of 34 ft. In addition,
the observed SNR and the analytical SNR (calculated using the free-space path-loss model) are
shown in the same plot to elucidate the effect of received SNR on classification accuracy. It is
evident that the classification is robust against SNR fluctuations caused by path loss and multipath fading
up to a distance of 34 ft.
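The analytical SNR above follows from the free-space path-loss model; a sketch of the calculation (the carrier frequency, transmit power, and noise floor below are illustrative placeholders, not the values used in the experiment):

```python
import math

def free_space_path_loss_db(d_m, f_hz):
    # Friis free-space path loss in dB:
    # FSPL = 20*log10(d) + 20*log10(f) + 20*log10(4*pi/c)
    #      = 20*log10(d) + 20*log10(f) - 147.55   (d in m, f in Hz)
    return 20 * math.log10(d_m) + 20 * math.log10(f_hz) - 147.55

def analytical_snr_db(d_m, f_hz=2.45e9, tx_dbm=10.0, noise_dbm=-90.0):
    # Received SNR = transmit power - path loss - noise floor
    # (antenna gains omitted for simplicity).
    return tx_dbm - free_space_path_loss_db(d_m, f_hz) - noise_dbm
```

Every doubling of distance adds 20*log10(2) ≈ 6 dB of path loss, which produces the smooth decay of the analytical SNR curve; the observed SNR deviates from it because of multipath fading.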
Chapter 6
Conclusion
With the increase in demand for high-data-rate applications and advances in the IoT space
enabling millions of devices to stay interconnected, wireless security has become a crucial
functionality. In addition, the available spectrum is limited relative to the enormous number of
mobile devices it must support. Novel techniques for identifying devices, and thereby detecting
malicious activity and gaining spectrum awareness, are therefore of great importance. Existing
device fingerprinting approaches require feature engineering and do not scale efficiently to large
datasets. We propose a radio fingerprinting approach based on a deep CNN architecture trained
on I/Q sequence examples. Our design enables learning the features embedded in the signal
transformations of wireless transmitters and identifying specific devices. Furthermore, we have
shown that our approach to device identification with a CNN outperforms alternative ML techniques,
such as SVM and logistic regression, for the identification of four nominally similar devices. Finally,
we experimentally validate the performance of our design on a dataset collected over a range of
distances from 2 ft to 50 ft, and observe that detection accuracy decreases as the distance between
transmitter and receiver increases. We also show how running Keras with GPU support speeds up
training. Our future work involves increasing the robustness of the CNN architecture to allow
scaling up to the correct identification of thousands of similar radios.
6.1 Research challenges
We now summarize the challenges associated with the implementation of CNNs for radio
fingerprinting. In our experiments, we set the partition length to 128 through a rectangular windowing
process. However, identifying the optimal length is a critical research objective and should depend
on the channel coherence time. Varied CNN architectures may lead to significantly different
results. Finding an optimal architecture which enhances device classification is an open research
issue. A related challenge is obtaining the right balance between training time and the classification
accuracy. Increasing the depth of the CNN beyond a point may not help the classification; in fact
there are risks of overfitting the training set, as we found in some of our early experiments. Our
work focuses on training the model with actual experimental data, whereas a large body of earlier
work attempts to solve a similar problem using synthetic data. There exists no standard dataset to
benchmark the performance of our classifier, and releasing all datasets in widely accepted formats
such as SigMF is essential for correct replication of experiments. Our classifier performs very well
on a limited set of devices; however, identifying a large number of devices (in the thousands), and
at wider distances of 100-200 ft, may require major changes to the architecture and new optimal
parameters. Additionally, the effects of wireless channel conditions on classification accuracy are
yet to be studied. It is important to note that our technique relies on the fact that devices
can be identified uniquely based on their hardware imperfections. This leaves a wide scope for
determining the kinds of features that can be learnt in the wireless domain.
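The rectangular windowing discussed above (partition length 128) amounts to slicing the complex I/Q stream into fixed-length, non-overlapping segments; a minimal sketch (the function name and the 2 x 128 example layout are assumptions for illustration):

```python
import numpy as np

def partition_iq(iq, length=128):
    # Rectangular windowing: truncate the tail, reshape into
    # non-overlapping segments of `length` complex samples, and
    # stack I (real) and Q (imag) into a 2 x length example.
    n = (len(iq) // length) * length
    segments = iq[:n].reshape(-1, length)
    return np.stack([segments.real, segments.imag], axis=1)

examples = partition_iq(np.arange(300) + 1j * np.arange(300))
# 300 samples -> 2 examples, each 2 x 128 (I and Q rows)
```

Longer partitions capture more of a device's impairment signature per example but should stay within the channel coherence time, which is why the optimal length remains an open question.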
Bibliography
[1] J. Mitola, “Software radio architecture: a mathematical perspective,” IEEE Journal on Selected
Areas in Communications, vol. 17, no. 4, pp. 514–538, Apr 1999.
[2] T. J. O’Shea and J. Corgan, “Convolutional radio modulation recognition networks,” CoRR, vol.
abs/1602.04105, 2016. [Online]. Available: http://arxiv.org/abs/1602.04105
[3] N. E. West and T. O’Shea, “Deep architectures for modulation recognition,” in 2017 IEEE
International Symposium on Dynamic Spectrum Access Networks (DySPAN), March 2017, pp.
1–6.
[4] Q. Xu, R. Zheng, W. Saad, and Z. Han, “Device fingerprinting in wireless networks: Challenges
and opportunities,” IEEE Communications Surveys Tutorials, vol. 18, no. 1, pp. 94–104,
Firstquarter 2016.
[5] J. Franklin, D. McCoy, P. Tabriz, V. Neagoe, J. Van Randwyk, and D. Sicker, “Passive data link
layer 802.11 wireless device driver fingerprinting,” in Proceedings of the 15th Conference on
USENIX Security Symposium - Volume 15, ser. USENIX-SS’06. Berkeley, CA, USA: USENIX
Association, 2006. [Online]. Available: http://dl.acm.org/citation.cfm?id=1267336.1267348
[6] K. Gao, C. Corbett, and R. Beyah, “A passive approach to wireless device fingerprinting,” in
2010 IEEE/IFIP International Conference on Dependable Systems Networks (DSN), June 2010,
pp. 383–392.
[7] I. O. Kennedy, P. Scanlon, F. J. Mullany, M. M. Buddhikot, K. E. Nolan, and T. W. Rondeau,
“Radio transmitter fingerprinting: A steady state frequency domain approach,” in 2008 IEEE
68th Vehicular Technology Conference, Sept 2008, pp. 1–5.
[8] V. Brik, S. Banerjee, M. Gruteser, and S. Oh, “Wireless device identification with radiometric
signatures,” in Proceedings of the 14th ACM International Conference on Mobile Computing
and Networking, ser. MobiCom ’08. New York, NY, USA: ACM, 2008, pp. 116–127.
[Online]. Available: http://doi.acm.org/10.1145/1409944.1409959
[9] S. V. Radhakrishnan, A. S. Uluagac, and R. Beyah, “Gtid: A technique for physical device and
device type fingerprinting,” IEEE Transactions on Dependable and Secure Computing, vol. 12,
no. 5, pp. 519–532, Sept 2015.
[10] T. J. O’Shea and J. Hoydis, “An introduction to machine learning communications systems,”
CoRR, vol. abs/1702.00832, 2017. [Online]. Available: http://arxiv.org/abs/1702.00832
[11] F. Chen, Q. Yan, C. Shahriar, C. Lu, W. Lou, and T. C. Clancy, “On passive wireless device
fingerprinting using infinite hidden markov random field,” submitted for publication.
[12] N. T. Nguyen, G. Zheng, Z. Han, and R. Zheng, “Device fingerprinting to enhance wireless
security using nonparametric bayesian method,” in 2011 Proceedings IEEE INFOCOM, April
2011, pp. 1404–1412.
[13] S. U. Rehman, K. Sowerby, and C. Coghill, “Analysis of receiver front end on the performance
of rf fingerprinting,” in 2012 IEEE 23rd International Symposium on Personal, Indoor and
Mobile Radio Communications - (PIMRC), Sept 2012, pp. 2494–2499.
[14] The Signal Metadata Format specification. [Online]. Available: https://github.com/gnuradio/SigMF
[15] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics).
Springer, 2006.
[16] Cs231n convolutional neural networks for visual recognition. [Online]. Available:
http://cs231n.github.io/convolutional-networks/
[17] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, ser. Springer
Series in Statistics. New York, NY, USA: Springer New York Inc., 2001.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional
neural networks,” in Proceedings of the 25th International Conference on Neural Information
Processing Systems - Volume 1, ser. NIPS’12. USA: Curran Associates Inc., 2012, pp.
1097–1105. [Online]. Available: http://dl.acm.org/citation.cfm?id=2999134.2999257
[19] Keras: The python deep learning library. [Online]. Available: https://keras.io/