
Received January 10, 2018, accepted March 5, 2018, date of publication March 12, 2018, date of current version April 18, 2018.

Digital Object Identifier 10.1109/ACCESS.2018.2814575

WiCatch: A Wi-Fi Based Hand Gesture Recognition System

ZENGSHAN TIAN, JIACHENG WANG, XIAOLONG YANG, AND MU ZHOU, (Senior Member, IEEE)

Chongqing Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Corresponding author: Jiacheng Wang ([email protected])

This work was supported in part by the National Natural Science Foundation of China under Grant 61471077, in part by the Program for the Changjiang Scholars and Innovative Research Team in University under Grant IRT1299, in part by the Special Fund of the Chongqing Key Laboratory (CSTC), in part by the Fundamental and Frontier Research Project of Chongqing under Grant cstc2017jcyjAX0380 and Grant cstc2015jcyjBX0065, and in part by the University Outstanding Achievement Transformation Project of Chongqing under Grant KJZH17117.

ABSTRACT In recent years, a large number of researchers have been endeavoring to develop wireless sensing and related applications as Wi-Fi devices become ubiquitous. As a significant research branch, gesture recognition has become one of the research hotspots. In this paper, we propose WiCatch, a novel device-free gesture recognition system which utilizes the channel state information to recognize the motion of hands. First, with the aim of catching the weak signals reflected from hands, a novel data fusion-based interference elimination algorithm is proposed to diminish the interference caused by signals reflected from stationary objects and by the direct signal from transmitter to receiver. Second, the system catches the signals reflected from moving hands and rebuilds the motion locus of the gesture by constructing a virtual antenna array based on signal samples in the time domain. Finally, we adopt support vector machines to complete the classification. Extensive experimental results demonstrate that WiCatch can achieve a recognition accuracy over 0.96. Furthermore, WiCatch can be applied to two-hand gesture recognition and reach a recognition accuracy of 0.95.

INDEX TERMS Wi-Fi, channel state information, interference elimination, virtual antenna array, gesture recognition.

I. INTRODUCTION
As one of the core technologies in human-computer interaction, gesture recognition has become a research hotspot with the development of virtual reality (VR) and the smart home. In particular, device-free gesture recognition technology has a huge application market, which attracts a large number of researchers. Device-free gesture recognition systems complete the recognition without any additional equipment such as cameras or sensors, which will trigger great changes in human-computer interaction.

Previous works are mainly based on cameras [1], [2], wearable sensors [3], [4] or special equipment such as RF-Capture [5], which uses a chirp generator to transmit a frequency modulation continuous wave (FMCW) signal and T-type array antennas to complete the signal reception. Compared with these systems, wireless sensing based gesture recognition systems have greater potential because they complete the recognition without any additional equipment.

For example, some systems establish the relationship between the fluctuation of received signal strength (RSS) and human motion to complete indoor passive localization [6], activity recognition [7]–[9] and gesture recognition [10].

At present, researchers use the fine-grained channel state information (CSI) obtained from the physical layer to replace RSS. Combined with the correlated model, the quality of wireless sensing has been improved greatly. Benefitting from the Orthogonal Frequency Division Multiplexing (OFDM) modulation system, the CSI extracted from the pilot signals depicts not only the amplitude attenuation but also the propagation delay through the phase accumulation. This characteristic makes it more robust against complex indoor environments. Based on CSI, we can not only use the amplitude information on each subcarrier to construct a more precise model for achieving comprehensive perception of the channel, but also combine it with the phase to obtain more in-depth information such as angle of arrival (AoA) [11] or time of flight (ToF) [12], [13]. This information can be used for indoor localization [14]–[16], activity detection [17]–[19] and gesture recognition [20], [21].

VOLUME 6, 2018. 2169-3536 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

In this paper, we present WiCatch, a gesture recognition system based on CSI collected from commercial wireless devices. The system completes the recognition without any additional equipment or wearable sensors, which reduces the cost of the system greatly. Specifically, we propose a novel algorithm to eliminate the interference caused by signals reflected from stationary objects and by the direct signal that travels from transmitter to receiver while the gesture is performed. Furthermore, the system fully exploits the phase information provided by CSI. It virtualizes the moving hand as an antenna array to extract the spectrum which describes the gesture, and finally completes the recognition with an SVM classifier.

Different from previous systems, which struggle to establish the mapping relationship between the fluctuation of CSI and the gestures, WiCatch combines the phase of the signal with a virtual antenna array to rebuild the trajectory of the moving hand. The main contributions of this paper are summarized as follows.

1. A novel algorithm is proposed to diminish the interference caused by signals reflected from static objects and by the direct signal from the transmitter to the receiver, which enables WiCatch to detect weak signals reflected from the hand in a complex electromagnetic environment.

2. Unlike previous works, WiCatch treats the received consecutive time samples as spatial samples to emulate an antenna array which is utilized to track the moving hand. By constructing the virtual array, the effective aperture of the array is increased, which enhances the estimation resolution significantly.

3. To the best of our knowledge, WiCatch is the first system which can rebuild the moving trajectories of two hands and complete two-hand gesture recognition. A large number of experiments show that WiCatch can obtain a recognition accuracy of 0.96 for two-hand gestures with less impact on communication links.

The rest of this paper is organized as follows. We survey some related works about device based and device free gesture recognition systems in Section II. The details of the proposed steps of interference elimination and motion trajectory reconstruction are presented in Section III. The extensive experiments and evaluation are shown in Section IV. Finally, in Section V we conclude the paper and provide some future research directions.

II. RELATED WORK
In this section, we introduce some existing gesture recognition systems and briefly summarize their strengths and weaknesses.

A. DEVICE BASED SYSTEMS
Most recognition systems employ additional equipment to complete the recognition of different gestures. Usually, these systems need to be equipped with sensors, a camera, a smartphone or more expensive equipment such as USRP [22].

Combining the Kinect with the proposed MLBP (modified local binary pattern) algorithm [23], the system designed by Kwangtaek et al. can achieve a recognition accuracy of 0.99 for 3D gestures. Another camera based system completes the recognition through a trained random forests (RFs) classifier and the image caught by a single depth imaging sensor [24]. Then, based on the built mapping relationship between gestures and control commands, users can control smart home appliances such as a TV, an air conditioner and a radio. This system can achieve an average recognition accuracy of 0.98 between 4 single-hand gestures. However, camera based systems may not be acceptable in every scenario, such as a restroom, where privacy is the first priority. Furthermore, such systems perform poorly when lighting is weak.

The system FDSVM [25] employs a 3-dimensional accelerometer to complete recognition. In the user-independent case, this system obtains a recognition accuracy of 0.98 for 4 gestures and 0.89 for 12 gestures. Another wearable device based system, uWave [26], performs the recognition with the data from a single three-axis accelerometer. uWave achieves average accuracies of 0.98 and 0.93 between 8 gestures with and without template adaptation respectively. Harrison et al. [27] design Skinput by leveraging the mechanical vibrations which propagate through the body. Based on a novel array of sensors, Skinput can reach an average accuracy of 0.88 in different conditions. Systems of this kind avoid the problems of privacy and light dependence, but they require the user to wear sensors such as accelerometers and gyroscopes to complete recognition, which increases the cost of the system greatly.

Systems based on Software Defined Radio (SDR) devices do not require users to be equipped with any sensors while providing a high accuracy. For example, WISEE [28] can achieve a recognition accuracy of 0.94 between 9 gestures by leveraging the Doppler effects caused by different gestures. Systems of this kind, built on special hardware, can reach a satisfactory recognition accuracy, but result in a bulky device and high equipment costs.

All of these systems have weaknesses which make them hard to popularize further, though most of them can achieve an impressive estimation accuracy.

B. DEVICE FREE SYSTEMS
Compared with the above systems, device free systems work without any special equipment. Such systems are mainly based on RSS [10] and CSI [20], [21], [29].

Compared with RSS, CSI measured from multiple subcarriers contains more fine-grained information, which makes it more suitable for gesture recognition. The system WiGeR [20] leverages the fluctuation of the CSI amplitude caused by the gesture to complete the recognition. WiGeR achieves a recognition accuracy of 0.92 between 7 gestures in 6 scenarios. Another CSI based system, WiG [21], can achieve a recognition accuracy of 0.92 and 0.88 in line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios respectively. By leveraging the fact that peaks and troughs in the amplitude of the received signal caused by the gesture are unique, Nandakumar designs the system Allsee [29]. This system can reach a recognition accuracy of 0.91 with the amplitude of CSI when distinguishing 4 gestures from 6 individuals.

FIGURE 1. Framework of the proposed system WiCatch.

In general, systems with additional equipment suffer from problems such as cost and privacy. The existing device free systems overcome these problems, but they do not consider two-hand gesture recognition. In contrast, WiCatch completes recognition by utilizing the CSI obtained from an off-the-shelf Wi-Fi device [31], which avoids the problems of cost and privacy. Furthermore, our system utilizes the received consecutive time samples to build the virtual antenna array, which enables it to reconstruct the moving trajectories of two hands and complete the gesture recognition.

III. SYSTEM DESIGN
A. OVERVIEW OF WICATCH
WiCatch is a device-free system that can recognize hand gestures using ubiquitous off-the-shelf Wi-Fi devices. The system consists of a transmitter with one antenna and a receiver equipped with two antennas. By processing the signal caught by the receiver, the system can rebuild the moving trajectories of multiple hands by estimating their relative movement and finally accomplish the hand gesture recognition.

The main challenges for WiCatch are interference elimination and the tracking of moving hands in a complex indoor environment. In order to address these challenges, we fuse the received data from two antennas to remove the interference. After that, signals reflected from moving hands are captured to estimate moving trajectories. Finally, a trained SVM classifier is applied to complete recognition. The structure of the system is shown in Fig. 1.

B. THE CHANNEL STATE INFORMATION
The impact of the propagation medium on the signal includes two aspects: the power attenuation and the propagation delay.

If there are $I$ propagation paths, each of them experiences the attenuation $\alpha_i$ and the propagation delay $\tau_i$ to reach the receiver. Then, the channel impulse response (CIR) $f(t)$ can be denoted as

$$f(t) = \sum_{i=0}^{I} \alpha_i \cdot \delta(t - \tau_i) \tag{1}$$

Through the Fast Fourier Transform (FFT), the CIR can be converted into the channel frequency response (CFR), and CSI is the sampled version of the CFR [30].

$$h(f) = \sum_{i=0}^{I} \alpha_i \cdot e^{-j 2\pi f \tau_i} \tag{2}$$

Therefore, the signal power attenuation and phase shift caused by the channel are recorded in CSI, which is employed to eliminate the interference.
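As an illustration of Eq. (2), the following minimal sketch evaluates the CFR of a small multipath channel at a few subcarrier frequencies. The path gains, delays, carrier frequency and subcarrier spacing are hypothetical values chosen only for the example; they are not taken from the paper.

```python
import cmath
import math


def cfr(freq_hz, paths):
    """Evaluate h(f) = sum_i alpha_i * exp(-j*2*pi*f*tau_i) for (alpha, tau) pairs."""
    return sum(alpha * cmath.exp(-2j * math.pi * freq_hz * tau) for alpha, tau in paths)


# Hypothetical two-path channel: a direct path and one weaker, later reflection.
paths = [(1.0, 10e-9), (0.3, 35e-9)]

# Hypothetical subcarriers: five tones spaced 312.5 kHz apart around 5.32 GHz.
subcarriers = [5.32e9 + k * 312.5e3 for k in range(-2, 3)]
csi = [cfr(f, paths) for f in subcarriers]

for f, h in zip(subcarriers, csi):
    print(f"{f / 1e9:.6f} GHz  |h|={abs(h):.4f}  phase={cmath.phase(h):+.4f} rad")
```

Each CSI value is a complex number whose magnitude reflects the combined attenuation and whose phase accumulates the path delays, which is why both quantities can be exploited later.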

C. INTERFERENCE ELIMINATION
In a complex indoor environment, the signals that cause interference are divided into two types. The first is the signal that travels from the transmitter to the receiver directly, whose power is strong because the transmitter and the receiver are placed very close to each other. The other is the set of signals reflected from different static objects such as walls, doors and even furniture. There is a large number of such reflected signals mixed with coherent signals. Unfortunately, the signals reflected from hands are much weaker due to the smaller reflection area and reflection coefficient. Therefore, the interference caused by these two types of signals needs to be diminished before trajectory reconstruction. In this paper, we propose a novel algorithm that fuses the data from two receiving antennas to achieve elimination.

The Multiple-Input Multiple-Output (MIMO) technology has developed rapidly, which not only boosts the quality of communication greatly but also improves the channel capacity. Past works employ pre-coding to eliminate the interference between different transmit and receive pairs working concurrently. In this paper, we adapt this idea to our system to complete the interference elimination. First, the transmitter sends a known preamble $x$ and the first receiving antenna receives $c_1 x$, in which $c_1$ denotes the channel between the transmitter and the first receiving antenna. Second, the receiver uses $c_1 x$ to obtain $\hat{c}_1$, which is an estimate of $c_1$ and is reported by the Wi-Fi card as a CSI value [31].


FIGURE 2. Signal processing procedure of OFDM receiver.

Then, we apply the same operation to the second receiving antenna to get $\hat{c}_2$. Next, these two values are used to compute the ratio $R = \hat{c}_1/\hat{c}_2$. Finally, the two receiving antennas work concurrently and we fuse the data from them through

$$c_{res} = c_1 - \left(\frac{\hat{c}_1}{\hat{c}_2}\right) \times c_2 \approx 0 \tag{3}$$

In this way, the interference elimination is achieved. In an ideal case, when the environment remains stable, $c_{res}$ should equal zero. If a new target appears or an object begins to move in the detection area, the impact on the channel will be recorded in $c_{res}$. However, the actual indoor environment is quite complex, so the obtained $c_{res}$ would contain residual errors which seriously affect the subsequent detection. Therefore, the data from multiple packets is used to correct $\hat{c}_1$ and $\hat{c}_2$ iteratively to obtain a more accurate estimation of $c_{res}$. Specifically, by assuming that $\hat{c}_2 = c_2$, a better estimation of $c_1$ can be obtained

$$\hat{c}'_1 = \hat{c}_1 + c_{res} \tag{4}$$

Similarly, assume $\hat{c}_1 = c_1$ and a more trustworthy estimation of $c_2$ can be obtained

$$\hat{c}'_2 = \left(1 - \frac{c_{res}}{\hat{c}_1}\right) \times \hat{c}_2 \tag{5}$$

Based on this, the multiple sets of data which characterize the channel between transmitter and receiver can be used to get a better estimation of $c_1$ and $c_2$. After iterative correction, a better ratio $R' = \hat{c}'_1/\hat{c}'_2$ can be attained. Thus, the $c'_{res}$ that details the changes in the channel can be obtained

$$c'_{res} = c_1 - \left(\frac{\hat{c}'_1}{\hat{c}'_2}\right) \times c_2 \tag{6}$$
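The fusion and correction steps can be sketched with plain complex arithmetic. The snippet below is an illustration only: the channel values and the hand reflections are hypothetical, the function names are introduced here for the example, and each correction is applied once under its stated assumption rather than iterated over many packets as the system does.

```python
def residual(c1, c2, c1_hat, c2_hat):
    """Residual channel after fusing both antennas: c1 - (c1_hat / c2_hat) * c2."""
    return c1 - (c1_hat / c2_hat) * c2


def correct_c1(c1_hat, c_res):
    """Refined estimate of c1, valid under the assumption that c2_hat equals c2."""
    return c1_hat + c_res


def correct_c2(c1_hat, c2_hat, c_res):
    """Refined estimate of c2, valid under the assumption that c1_hat equals c1."""
    return (1 - c_res / c1_hat) * c2_hat


# Hypothetical static channels seen by the two receiving antennas.
c1_static = 0.9 + 0.2j
c2_static = 0.7 - 0.4j

# With perfect estimates, a static environment cancels (up to rounding).
r_static = residual(c1_static, c2_static, c1_static, c2_static)

# A moving hand adds a weak reflection that differs between the two antennas.
hand1, hand2 = 0.01 + 0.02j, -0.015 + 0.005j
r_moving = residual(c1_static + hand1, c2_static + hand2, c1_static, c2_static)

# An imperfect estimate of c1 is repaired when c2_hat is exact.
c1_bad = c1_static * (1.05 + 0.03j)
r_bad = residual(c1_static, c2_static, c1_bad, c2_static)
c1_fixed = correct_c1(c1_bad, r_bad)

print(abs(r_static), abs(r_moving), abs(c1_fixed - c1_static))
```

The static residual is negligible while the residual with a moving hand is not, which is the property the detection stage relies on.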

D. ERROR ANALYSIS AND ELIMINATION
Commercial Wi-Fi equipment is designed to meet the requirements of communication instead of gesture recognition. Therefore, the data collected from commercial Wi-Fi devices contains errors which seriously affect the interference elimination and gesture trajectory reconstruction. In order to obtain the desired estimation accuracy, the errors need to be analyzed and eliminated.

1) THE SOURCE OF ERRORS IN CSI VALUE
In a wireless communication system based on the 802.11n protocol, the channel estimation is accomplished in the OFDM receiver as Fig. 2 illustrates. As shown in the figure, the signal received from the RF front end is first down converted into the baseband signal $s(t)$. Second, the cyclic prefix (CP) is removed after AD sampling. Then, the receiver computes the channel estimation after serial to parallel conversion and FFT. During this processing stage, the non-synchronization of the transceiver pair and the imperfection of hardware result in carrier frequency offset, sampling frequency offset, etc. Within a certain range, these errors do not reduce the quality of communication [32], but they are fatal for the phase of the CSI, which is utilized to complete the detection.

a: SAMPLE FREQUENCY OFFSET
SFO is caused by the non-synchronization of the transceiver pair. This causes a time shift in the received signal relative to the transmitted signal, which leads to an error in the phase of CSI. For different subcarriers, this error is related to the subcarrier ID, which is the sequence number of the subcarrier [33]. As the local oscillator remains stable over a short period of time, the phase error in CSI caused by SFO is treated as a constant in our system.

b: SYMBOL TIME OFFSET
The receiver detects the packet through a correlation operation and signal power judgment [34]. Due to the imperfection of hardware, this process introduces a random time shift called STO, which results in another offset in the phase of CSI. Similar to the error caused by SFO, the phase offset caused by STO is also related to the ID of the subcarrier [33].

c: CENTRAL FREQUENCY OFFSET
CFO is caused by the fact that the central frequencies of the transceiver pair are not synchronized [35]. At the receiver, the system completes the CFO estimation and compensation by using the cyclic prefix (CP) and pilot signals. However, due to the instability of hardware, the frequency offset cannot be corrected fully and the residual shift causes a non-negligible error in the phase of CSI.

2) ERROR ELIMINATION
Besides the errors discussed above, some other errors emerge during the signal processing stage. They either affect only the amplitude of the CSI or affect the phase of the CSI so slightly that they can be ignored compared with STO and CFO. Thus, the construction of the error model mainly considers SFO, CFO and STO.

Based on the discussion above, the phase of the received signal can be expressed as

$$\hat{\theta}_k = \theta_k + \frac{2\pi}{N} k \phi + \varepsilon \tag{7}$$


where $\hat{\theta}_k$ and $\theta_k$ denote the measured phase and the real phase of the $k$th subcarrier respectively. Meanwhile, $N$ represents the number of subcarriers,¹ $\phi$ is the phase shift caused by STO and SFO, and $\varepsilon$ refers to the error due to CFO.

As can be seen from (7), the phase error $\frac{2\pi}{N} k \phi + \varepsilon$ is a linear function of the ID of the subcarrier. Thereby, the error can be eliminated by estimating the slope and intercept of the phase across subcarriers if the IDs of the measured subcarriers are fixed. Suppose the phase values of $n$ ($n \le N$) subcarriers are measured² and the IDs of the subcarriers form an increasing sequence $\{k_j\}_{j=1}^{n}$; then the estimated slope of the measured phase can be expressed as

$$a = \frac{\hat{\theta}_{k_n} - \hat{\theta}_{k_1}}{k_n - k_1} = \frac{\theta_{k_n} - \theta_{k_1}}{k_n - k_1} + \frac{2\pi}{N}\phi \tag{8}$$

and the estimated intercept is

$$b = \frac{1}{n}\sum_{j=1}^{n}\hat{\theta}_{k_j} = \frac{1}{n}\sum_{j=1}^{n}\theta_{k_j} + \frac{2\pi}{nN}\phi\sum_{j=1}^{n}k_j + \varepsilon \tag{9}$$

As shown in the result, the estimated $a$ and $b$ are not the real slope and intercept of the error. However, the estimated values can be used for error elimination because they contain the real phase errors $\phi$ and $\varepsilon$. Thereby, after removing the linear error from the measurements, the processed phase value can be obtained

$$\tilde{\theta}_{k_j} = \hat{\theta}_{k_j} - (a k_j + b) = \theta_{k_j} - \frac{k_j}{k_n - k_1}\left(\theta_{k_n} - \theta_{k_1}\right) - \frac{1}{n}\sum_{i=1}^{n}\theta_{k_i} - \frac{2\pi\phi}{nN}\sum_{i=1}^{n}k_i \tag{10}$$

As shown in the above formula, the error $\varepsilon$ has been eliminated, but $\phi$ still exists. According to the 802.11n protocol and the toolkit [31], the IDs of the measured subcarriers are symmetric about zero, that is, $\sum_{i=1}^{n} k_i = 0$. Thus, the formula (10) can be transformed into

$$\tilde{\theta}_{k_j} = \theta_{k_j} - \frac{k_j}{k_n - k_1}\left(\theta_{k_n} - \theta_{k_1}\right) - \frac{1}{n}\sum_{i=1}^{n}\theta_{k_i} \tag{11}$$

It can be seen that the final phases of CSI no longer contain the errors and depend only on a linear combination of the real phases. Therefore, the processed CSI can be used to distinguish different channels and monitor the channel changes.
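The linear error removal can be checked numerically. In the sketch below, the subcarrier IDs follow the symmetric set −58, −54, ..., 54, 58 mentioned in the footnotes, while the "real" phase profile and the values of the two error terms are hypothetical and chosen only for the demonstration. The function name is likewise introduced here.

```python
import math


def sanitize_phase(measured, ids):
    """Subtract the fitted line a*k + b from already-unwrapped measured phases.

    measured: one phase per subcarrier, ids: increasing subcarrier IDs k_1..k_n.
    """
    n = len(ids)
    a = (measured[-1] - measured[0]) / (ids[-1] - ids[0])  # slope from the end points
    b = sum(measured) / n                                   # intercept as the mean phase
    return [m - (a * k + b) for m, k in zip(measured, ids)]


N = 114
ids = list(range(-58, 0, 4)) + list(range(2, 59, 4))  # 30 IDs that sum to zero

# Hypothetical real phases and hypothetical error terms phi and eps.
true_phase = [0.3 * math.sin(k / 10.0) for k in ids]
phi, eps = 3.7, 1.2
measured = [t + 2 * math.pi / N * k * phi + eps for t, k in zip(true_phase, ids)]

clean = sanitize_phase(measured, ids)
reference = sanitize_phase(true_phase, ids)
print(max(abs(c - r) for c, r in zip(clean, reference)))
```

Applying the same operation to the error-free phases gives the same output, which illustrates that the result depends only on the real phases once the subcarrier IDs sum to zero.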

Before the elimination, the system needs to preprocess the data because the collected phase of the CSI is folded due to the cyclic characteristic of the signal; Fig. 3 shows the phases from multiple packets. Specifically, when the phase of a subcarrier decreases below $-\pi$, the value jumps to $\pi$ and begins to decrease again. So, the phase of the jumped subcarrier is significantly different from the phase of the previous subcarrier, and the shift value is far greater than $2\pi\phi/N$.

¹ There are in total 114 subcarriers with the bandwidth of 40 MHz in the IEEE 802.11n/g protocol.

² The CSI toolkit [31] measures the value of 30 subcarriers out of 114, and the IDs of the subcarriers are −58, −54, ..., 54, 58.

FIGURE 3. The measured phases from multiple data packets before unwrap.

FIGURE 4. The measured phases from multiple data packets after error elimination.

Based on this observation, the folded phases from different subcarriers can be extended (unwrapped). At last, by combining the extended phases, error elimination is accomplished and the results are shown in Fig. 4.
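The phase extension step corresponds to ordinary phase unwrapping across subcarriers. A minimal sketch follows, assuming that the true phase difference between neighbouring measured subcarriers is smaller than π in magnitude; the linear test phase is hypothetical.

```python
import math


def unwrap(phases):
    """Undo 2*pi jumps between neighbouring subcarriers."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        step = cur - prev
        if step > math.pi:        # jumped up by about 2*pi: bring it back down
            offset -= 2 * math.pi
        elif step < -math.pi:     # jumped down by about 2*pi: bring it back up
            offset += 2 * math.pi
        out.append(cur + offset)
    return out


def wrap(phase):
    """Fold a phase into (-pi, pi], as a receiver reports it."""
    return math.atan2(math.sin(phase), math.cos(phase))


# Hypothetical phase that decreases steadily across 30 subcarriers.
true_phase = [-0.4 * i for i in range(30)]
folded = [wrap(p) for p in true_phase]
extended = unwrap(folded)
print(max(abs(e - t) for e, t in zip(extended, true_phase)))
```

The same function handles upward and downward jumps, so it can be applied before the linear error removal regardless of the sign of the phase slope.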

E. MOTION TRAJECTORY RECONSTRUCTION
In this part, we introduce how WiCatch utilizes the consecutive time samples to build the virtual antenna array and rebuild the motion trajectory.

In previous systems, in order to obtain a spectrum with higher resolution, the receiver usually employs a large antenna array to capture the signal. However, if our system used an antenna array to capture the signal, more receive antennas would be required to complete the error elimination. Thereby, in order to avoid using a bulky antenna array while achieving a high accuracy, we adopt a new idea which contrasts with the traditional way. The key idea of the proposed algorithm is to treat the moving hand as an antenna array, as shown in Fig. 5. According to the channel reciprocity, the consecutive time samples caught by the receiver correspond to the successive spatial locations of the moving hand. Therefore, the signals collected by the system in time are the same as the signals caught by an antenna array in space. By leveraging this observation, WiCatch employs these samples to build a virtual antenna array and accomplishes the estimation. In order to obtain the movement information, the received signal power $P(n, \theta)$ in the $\theta$ direction at time $n$ needs to be computed.

FIGURE 5. Virtual antenna array. In the traditional system shown in the first picture, in order to detect the direction of the incoming signal accurately, the receiver needs to be equipped with an antenna array to complete the spatial sampling of the signal. In our system, WiCatch samples the signals reflected from moving hands in time, and employs these series to replace the spatial samples to build the virtual antenna array.

Therefore, the smooth MUSIC algorithm [36], [37] is used to process the received signal. For MUSIC, assume the received signal matrix is $X$ and its conjugate transpose is $X^H$; the key idea is that if the eigenvectors corresponding to the zero eigenvalue of $XX^H$ exist, they are orthogonal to the constructed steering matrix [36]. Previous work has proved that this is true if the matrix is full rank and the number of rows in the steering matrix is larger than the number of columns. In other words, the number of sensors needs to be greater than the number of signal propagation paths (for example, if there are five propagation paths while the array is equipped with four antennas, then MUSIC cannot find the right $P(n, \theta)$). Therefore, in our system the number of sensors in the virtual array is required to be larger than the number of propagation paths. Suppose the $k$ consecutive channel measurements³ are $X = [c_{f_1}, c_{f_2}, \cdots, c_{f_k}]$; MUSIC first computes the correlation matrix of the received signals

$$R[n] = E\left[XX^H\right] \tag{12}$$

Next, the matrix is decomposed to obtain the eigenvaluesand eigenvectors. Then, MUSIC divides the acquired eigen-vector into two parts. The set ES [n] contains larger eigen-vectors represents the signal subspace, which corresponds tothe moving hands, as well as DC component. The set EN [n]contains smaller eigenvectors stands for the noise subspacecorresponding to the noise component in the received signal.For instance, when two hands appear in the detection area,

3To ensure that the algorithm can run efficiently, the number of sensorsin the virtual array has to be greater than the number of propagation paths.There are 6-8 propagation paths in an indoor environment [38]. Consideringthat we use successive time sampling values instead of spatial sample values,only in a short period of time, the movement of the hands can be assumed tobe uniform. Thus, in our system we set the k to 15.

FIGURE 6. Two-dimensional smoothing.

it would produce two main eigenvalues in addition to the DC component. Based on this, the algorithm multiplies the constructed steering matrix with the noise subspace. When the steering matrix is orthogonal to the noise subspace, the product is close to zero (due to the imperfection of measurements in practice, they are not perfectly orthogonal). Finally, a clear spectrum with sharp peaks is obtained by taking the inverse. The angle that corresponds to the maximum peak denotes the spatial angle of the signal. In more detail, MUSIC uses the following formula to detect the signal strength from each direction in space

$$P(\theta, n) = \frac{1}{a^{H}(\theta, n)\,E_N E_N^{H}\,a(\theta, n)} \tag{13}$$

where a(θ, n) denotes the steering matrix and $E_N$ stands for the eigenvectors corresponding to the noise subspace. To combat coherent signals, the system performs two-dimensional smoothing on $XX^H$ before decomposition, as shown in Fig. 6.
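As a rough illustration of formula (13), the following NumPy sketch computes a MUSIC pseudo-spectrum for a uniform linear array. It is not the WiCatch implementation: the function name `music_spectrum`, the half-wavelength element spacing, the simulated single source at 20 degrees, and the noise level are all assumptions chosen for the demonstration.

```python
import numpy as np

def music_spectrum(R, n_sources, angles_deg, d_over_lambda=0.5):
    """MUSIC pseudo-spectrum 1 / (a^H E_N E_N^H a) for a uniform linear array."""
    M = R.shape[0]
    # eigh returns eigenvalues in ascending order, so the first
    # M - n_sources eigenvectors span the noise subspace E_N.
    _, eigvecs = np.linalg.eigh(R)
    E_n = eigvecs[:, :M - n_sources]
    m = np.arange(M)[:, None]
    sin_theta = np.sin(np.deg2rad(angles_deg))[None, :]
    A = np.exp(-2j * np.pi * d_over_lambda * m * sin_theta)  # one steering vector per column
    denom = np.sum(np.abs(E_n.conj().T @ A) ** 2, axis=0)
    return 1.0 / denom

# Hypothetical test scene: one source at 20 degrees, 8 sensors, 200 snapshots.
rng = np.random.default_rng(0)
M, N, true_angle = 8, 200, 20.0
a = np.exp(-2j * np.pi * 0.5 * np.arange(M) * np.sin(np.deg2rad(true_angle)))
s = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
X = np.outer(a, s) + noise
R = X @ X.conj().T / N  # sample estimate of E[X X^H]

angles = np.arange(-90, 91)
P = music_spectrum(R, n_sources=1, angles_deg=angles)
estimated_angle = angles[np.argmax(P)]
```

The peak of `P` marks the direction whose steering vector is closest to orthogonal to the noise subspace, which is the property the formula exploits.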

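After smoothing, the matrix is divided into multiple overlapping sub-matrices of size n < k, and the first sub-matrix is shown below

$$T_1 = \begin{bmatrix} c_{f_1}c_{f_1} & \cdots & c_{f_1}c_{f_n} \\ \vdots & \ddots & \vdots \\ c_{f_n}c_{f_1} & \cdots & c_{f_n}c_{f_n} \end{bmatrix} \tag{14}$$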

Then, these matrices are summed up, so that $XX^H$ is transformed into

$$R_{\text{smoothed}} = \sum_{i=1}^{k-n+1} T_i \tag{15}$$

Finally, $R_{\text{smoothed}}$ is decomposed to obtain the signal subspace and noise subspace.
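The summation in formula (15) can be sketched in a few lines of NumPy. The example below is an illustration under assumptions, not the authors' code: it takes the sub-matrices to be the overlapping n x n diagonal blocks of the k x k matrix, and it uses a made-up scene with two fully coherent paths to show why smoothing helps (the rank rises from 1 to 2, so both paths become separable).

```python
import numpy as np

def smooth_correlation(R, n):
    """Sum the k - n + 1 overlapping n x n diagonal blocks of a k x k matrix."""
    k = R.shape[0]
    return sum(R[i:i + n, i:i + n] for i in range(k - n + 1))

# Two fully coherent paths (identical waveform) give a rank-1 correlation matrix.
k, n = 8, 5
m = np.arange(k)
a1 = np.exp(-1j * np.pi * m * np.sin(np.deg2rad(-20.0)))
a2 = np.exp(-1j * np.pi * m * np.sin(np.deg2rad(30.0)))
v = a1 + a2
R = np.outer(v, v.conj())
R_smoothed = smooth_correlation(R, n)

rank_before = np.linalg.matrix_rank(R, tol=1e-8)
rank_after = np.linalg.matrix_rank(R_smoothed, tol=1e-8)
```

The price of smoothing is a smaller effective array (n instead of k elements), which is why n has to stay larger than the number of propagation paths.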

In fact, the direction of motion can be estimated accurately only when the antenna spacing of the virtual array is acquired. Based on the construction principle of the virtual array discussed above, the antenna spacing is d = T × v, where T⁴ is the sample period and v denotes the velocity of the hand. The sample period is a fixed value, but the speed of the hand is usually unstable, which results in an inaccurate estimate of d. For instance, when the velocity of gesture is

⁴ In our system, we measure the channel 400 times per second, which means T = 2.5 × 10⁻³ s.


FIGURE 7. The test gesture.

set to v, the true velocity of gesture may be v′ = v + Δ due to the unstable movement of hands in a short period of time. Thus, relative to the true velocity of gesture v′, the steering matrix built in formula (13) can be expressed as

$$a(\theta, n) = \Big[\,1 \;\; e^{-j2\pi\frac{vT}{\lambda}\sin\theta} \;\; e^{-j2\pi\frac{2(v'-\Delta)T}{\lambda}\sin\theta} \;\cdots\; e^{-j2\pi\frac{(n-j)(v'-\Delta)T}{\lambda}\sin\theta} \;\; e^{-j2\pi\frac{(n-j+1)vT}{\lambda}\sin\theta} \;\; e^{-j2\pi\frac{(n-2)vT}{\lambda}\sin\theta} \;\; e^{-j2\pi\frac{(n-1)vT}{\lambda}\sin\theta}\,\Big]^{T} \tag{16}$$

where λ is the signal wavelength. As can be seen from formula (16), when Δ > 0 and the angle corresponding to the moving direction is less than zero, the estimated θ will be larger than the true value. Similarly, if the angle corresponding to the moving direction is greater than zero, the estimated θ will be smaller than the true value. The opposite result appears when Δ < 0.

Fig. 8 shows a typical experimental result. In this experiment, the tester holds his hand towards the devices and slides it from the left of the device to the right, as shown in Fig. 7. Two lines can be seen in the result: one stays stable along the time axis (at 0 degrees) and represents the DC component, while the other records the movement of the hand. As mentioned above, the interference elimination enables WiCatch to detect the minute changes caused by moving objects. However, due to imperfect channel estimation during the interference elimination stage, part of the reflected signals cannot be eliminated completely, and they are recorded as the DC line. The movement of the hand is recorded in detail by the other line, whose angle changes over time. When the hand moves closer to the devices, the angle is negative and its absolute value gradually decreases. When the hand arrives at the front of the devices, the angle equals 0. While the hand moves away from the devices, the angle is positive and gradually increases. However, the imperfectly estimated velocity of the hand causes underestimation or overestimation of the angle, as marked with the red circles in Fig. 8. In other words, the true velocity of the hand cannot be obtained, so the exact position of the hand cannot be estimated. This does not prevent the system from tracking the relative movement of the hand and completing gesture recognition.
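To make the virtual-array spacing d = T × v concrete, the sketch below builds the steering vector for the idealized case in which the hand speed is constant, so every element uses the same spacing. The hand speed of 0.5 m/s, the angle of -30 degrees, the array length of 10, and the function name are illustrative assumptions; only the sample period (400 measurements per second) and the 5.745 GHz carrier come from the paper.

```python
import cmath
import math

def virtual_steering_vector(theta_deg, n, v, T, wavelength):
    """Steering vector of an n-element virtual array with uniform spacing d = v * T."""
    d = v * T
    phase_step = -2 * math.pi * d * math.sin(math.radians(theta_deg)) / wavelength
    return [cmath.exp(1j * m * phase_step) for m in range(n)]

T = 2.5e-3                           # sample period for 400 channel measurements per second
wavelength = 299_792_458 / 5.745e9   # roughly 5.2 cm at the 5.745 GHz carrier
v_assumed = 0.5                      # hypothetical hand speed in m/s
d = v_assumed * T                    # virtual antenna spacing in metres
a = virtual_steering_vector(-30.0, 10, v_assumed, T, wavelength)
```

With these numbers the spacing is 1.25 mm, far below half a wavelength, and any error in the assumed speed scales every phase term of the vector by the same factor.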

FIGURE 8. The corresponding result.

FIGURE 9. The result of estimation with GDC algorithm.

It is worth noting that, in the smooth MUSIC algorithm, the estimated number of signal sources (corresponding to the number of eigenvalues with larger values) has a significant impact on the accuracy of the direction estimation. When there is an underestimation (i.e., the estimated number is less than the actual number of signal sources), part of the signals spreads into the noise subspace, which results in the loss of some real directions corresponding to the signals. At the same time, this diffusion makes the signal subspace and the noise subspace non-orthogonal, so the estimated angle contains a large deviation. When there is an overestimation (i.e., the estimated number is greater than the actual number of signal sources), the noise spreads into the signal subspace and produces a pseudo-peak. This diffusion does not affect the orthogonality of the signal subspace and the noise subspace, so it can be ignored when the SNR is high enough [39]. To address this problem, our system employs the Gerschgorin Disk Criterion (GDC) [39] to estimate the number of signal sources, and the result in Fig. 9 illustrates its effectiveness.
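The paper cites [39] for this step without reproducing the procedure. As a hedged illustration only, the sketch below implements the Gerschgorin disk estimator in the form commonly described in the array-processing literature: the correlation matrix is transformed with the eigenvectors of its leading sub-matrix, and the resulting disk radii are compared against a scaled average radius. The adjustment factor `D` (0.5 here), the function name `gde_source_count`, and the simulated two-source scene are assumptions, and the combination with spatial smoothing used by WiCatch is not modelled.

```python
import numpy as np

def gde_source_count(R, D=0.5):
    """Estimate the number of sources from an M x M correlation matrix."""
    M = R.shape[0]
    R1 = R[:M - 1, :M - 1]          # leading (M-1) x (M-1) block
    r = R[:M - 1, M - 1]            # last column without the corner element
    eigvals, U = np.linalg.eigh(R1)
    U = U[:, np.argsort(eigvals)[::-1]]   # eigenvectors, largest eigenvalue first
    radii = np.abs(U.conj().T @ r)        # Gerschgorin disk radii
    threshold = D / (M - 1) * radii.sum()
    for k, rho in enumerate(radii):
        if rho < threshold:         # first "noise" disk reached
            return k
    return M - 1

def steering(theta_deg, M):
    return np.exp(-1j * np.pi * np.arange(M) * np.sin(np.deg2rad(theta_deg)))

# Hypothetical scene: two uncorrelated sources, 8 sensors, 1000 snapshots.
rng = np.random.default_rng(0)
M, N = 8, 1000
A = np.column_stack([steering(-20.0, M), steering(30.0, M)])
S = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
X = A @ S + noise
R = X @ X.conj().T / N
n_sources = gde_source_count(R)
```

Signal disks keep large radii while noise disks shrink towards zero, so the count does not depend on setting an eigenvalue threshold by hand.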

When the GDC algorithm is utilized to estimate the number of signal sources, the DC component and the curves corresponding to the gesture are both clear, so recognition can be accomplished easily. However, the outcome of estimation without the GDC algorithm (setting the number


FIGURE 10. The result of estimation without GDC algorithm.

of signal sources to a fixed value) is unsatisfactory, as shown in Fig. 10. As can be seen, the DC component and the curves appear with large noise, which would reduce the recognition rate.

F. SVM BASED RECOGNITION

Based on the obtained spectrum, a trained Support Vector Machine (SVM) classifier is used to complete the recognition. The key idea of SVM is to find an optimal separating hyperplane that minimizes the error in the cost function while widening the margin of the separating hyperplane [40]. Given a set of training data $D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^d,\ y_i \in \{-1, +1\}\},\ i = 1, \ldots, l$, where $x_i$ is the $i$th attribute and $y_i$ denotes the corresponding label, SVM has to find a separating hyperplane which maximizes the smallest distance to the training data

$$w^{*}x + b^{*} = 0 \tag{17}$$

In the proposed system, the attributes are extracted from the obtained spectrum. At every moment, a spectrum is stored as a vector. From the data in the vector, the number of peak points and their locations are found. Then, we keep the values at these locations and set the remaining values to zero. Next, the processed vectors are arranged in time sequence to form a matrix. Finally, this matrix is resampled and fed to the SVM. Each value in the resampled matrix is treated as an attribute. Without loss of generality, a decision function f(x) = sign(w* × x + b*) needs to be constructed, such that f(x_i) > 0 for all i with y_i = 1, and f(x_i) < 0 for all i with y_i = −1. The processed data is always linearly inseparable, so we combine SVM with a radial basis function (RBF) kernel to complete the training [41]. Meanwhile, several SVM classifiers are put together to decompose the multi-class problem [42] in our system, since SVM is a binary classifier in its simplest form.
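The attribute-extraction steps described above can be sketched as follows. This is an illustration under assumptions rather than the authors' pipeline: a "peak" is taken to be a strict interior local maximum, resampling is done by picking evenly spaced time rows, and the toy spectra, the function names, and `gamma` are made up. The RBF kernel is included only to show the similarity measure an SVM with that kernel would evaluate.

```python
import numpy as np

def keep_peaks(spectrum):
    """Keep values at strict interior local maxima and zero everything else."""
    s = np.asarray(spectrum, dtype=float)
    out = np.zeros_like(s)
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] > s[2:])
    idx = np.where(is_peak)[0] + 1
    out[idx] = s[idx]
    return out

def build_feature_vector(spectra, n_rows):
    """Stack peak-only spectra in time order, resample the rows, and flatten."""
    matrix = np.vstack([keep_peaks(s) for s in spectra])
    rows = np.linspace(0, matrix.shape[0] - 1, n_rows).round().astype(int)
    return matrix[rows].ravel()

def rbf_kernel(x, y, gamma=0.1):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

# Toy input: 6 time steps of a 7-point spectrum (made-up numbers).
spectra = [[0, 1, 0, 3, 2, 5, 1]] * 6
features = build_feature_vector(spectra, n_rows=3)
```

Each entry of `features` plays the role of one attribute; a multi-class classifier would then be assembled from several binary SVMs trained on such vectors.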

IV. IMPLEMENTATION AND EVALUATION

In this section, we describe the experimental setup and analyze the results of our experiments.

FIGURE 11. The designed 9 test gestures.

A. IMPLEMENTATION

The proposed system WiCatch contains a transmitter and a receiver, both equipped with the Intel 5300 wireless NIC and the CSI toolkit [31]; the parameter settings are shown in Table 1. As Fig. 11 shows, the transmitter has one antenna and the receiver is equipped with two antennas spaced one wavelength apart. Meanwhile, in order to reduce noise interference, we conduct the experiments with a directional antenna at the transmitter. All the experiments are conducted in a typical indoor environment with an area of 33 square meters, furnished with a sofa, a bookcase and other furniture. During the experiments, the nine gestures shown in Fig. 11 are used to verify the effectiveness of the proposed system.

B. SPECTRUM ANALYSIS FOR HAND GESTURES

In this section, we analyze the results corresponding to the nine test gestures and describe how WiCatch tracks the target. The experiments are conducted in a typical indoor environment and the tester completes the gestures in front of the devices, as Fig. 11 shows. We set the central frequency to 5.745 GHz and the channel measuring frequency to 400 Hz. During the experiment, the interference elimination algorithm and the smooth MUSIC algorithm are used to process the received data.

Fig. 12 shows the results of the nine basic test gestures. Spectra (a), (b) and (c) in Fig. 12 show the results of the two-hand gestures ‘Boxing’, ‘Open the fridge’ and ‘Open the window’. As can be seen from the results, besides the DC component, two curves record the motion of the two hands relative to the devices. The result of the gesture ‘Boxing’ is shown in Fig. 12(a). The curve with positive angles corresponds to the hand that moves away from the devices. As the hand moves further away, the angle gradually increases. The other curve, with negative angles, represents the


TABLE 1. Parameter setting.

FIGURE 12. The spectrum corresponding to 9 test gestures.

hand that moves closer to the devices. As the hand moves closer, the absolute value of the angle gradually decreases. Fig. 12(b) shows that when the tester completes the gesture ‘Open the fridge’, the corresponding angles of the curves are positive and gradually increase as the hands move away from the devices. During the first 0.2 seconds, the two curves overlap because the two hands are very close and move at almost the same speed. As the two hands move, the differences in distance and speed become larger, so the curves gradually separate and record the movement of each hand. The result of the gesture ‘Open the window’ is shown in Fig. 12(c). It can be seen clearly that two curves with negative angles represent two hands that move closer to the devices. As the hands move closer, the absolute value of the angles gradually decreases.

The remaining panels show the results of single-hand gestures. As these figures show, in addition to the DC component, there is only one curve, which records the motion trajectory of the hand. When the hand moves closer to the devices, the corresponding angle is negative. As the hand moves away, the angle becomes positive.

Fig. 13 is a spectrum of normalized data at time 0.35 s of result (b) in Fig. 12. As can be seen, 0 degrees represents the DC component and the other two peaks represent the

FIGURE 13. The spectrum at 0.35 s of (b) in Fig. 12.

corresponding angles of the two moving hands. The amplitude of the DC component is low (i.e., low energy), indicating that the residual error in the channel estimation is small. The other two sharp peaks, with different angle values, indicate that both hands are moving away from the devices at different locations.

Based on the results shown above, one can draw some conclusions. First, when the hand is getting closer to the devices, its corresponding angle is negative, while a positive angle denotes that the hand is moving away. Second, the exact value of the angle is related to the orientation of the hand and to the direction and speed of movement. At the same time, the angle value fluctuates when the hand moves at an unstable speed. Third, as the number of hands increases, it becomes harder to distinguish the two hands based on the curves. Because the residual noise becomes stronger and the signals reflected from other body parts such as the arms are captured, it is more difficult to distinguish the gestures.

C. RECOGNITION ACCURACY ANALYSIS

In order to analyze the recognition accuracy of the system, a total of ten testers are asked to participate in the experiment. Each tester performs each gesture 200 times, and the recognition accuracy is counted. For comparison, we employ the schemes in WiGeR [20] and WiG [21] to classify these nine test gestures, and the results are shown in Fig. 14. According to the results, our system achieves an average recognition accuracy of 0.97, 0.96, 0.97, 0.97, 0.97 and 0.98 for gestures ‘Pull’, ‘Push’, ‘Slide’, ‘Leftward’, ‘Rightward’ and


FIGURE 14. The recognition rate for 9 gestures of 3 systems.

‘Wave hand’, respectively. The system WiGeR achieves an average recognition accuracy of 0.96, which is slightly lower than WiCatch. The system WiG reaches an average accuracy of 0.92, which is lower than WiGeR. Overall, the proposed system has a higher accuracy for single-hand gesture recognition. For the two-hand gestures ‘Boxing’, ‘Open the fridge’ and ‘Open the window’, our system reaches an average recognition accuracy of 0.95, 0.95 and 0.96, respectively.
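As a quick arithmetic companion to the per-gesture figures quoted above, the following sketch tallies the number of trials and averages the reported WiCatch accuracies. It assumes that the value printed as "097" for ‘Rightward’ means 0.97 and that all gestures carry equal weight; the variable names are illustrative.

```python
single_hand = {"Pull": 0.97, "Push": 0.96, "Slide": 0.97,
               "Leftward": 0.97, "Rightward": 0.97, "Wave hand": 0.98}
two_hand = {"Boxing": 0.95, "Open the fridge": 0.95, "Open the window": 0.96}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

trials_per_gesture = 10 * 200   # ten testers, 200 repetitions each
total_trials = trials_per_gesture * (len(single_hand) + len(two_hand))

single_mean = mean(single_hand.values())
two_hand_mean = mean(two_hand.values())
overall_mean = mean(list(single_hand.values()) + list(two_hand.values()))
print(round(single_mean, 3), round(two_hand_mean, 3), round(overall_mean, 3))
```

Each gesture is therefore backed by 2000 trials, and the averages are unweighted means of the per-gesture values read from the text.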

The experimental results illustrate that our system can recognize single-hand gestures with high accuracy. Moreover, it is also able to recognize two-hand gestures. The confusion matrices shown in Fig. 15 are built to evaluate our system. We can see from the matrices that, for single-hand gestures, the average accuracy of our system is about 0.97, while the other two reach 0.96 and 0.92, respectively. Moreover, for two-hand gestures, the average recognition accuracy of our system reaches 0.96, while WiGeR and WiG reach 0.87 and 0.85.

D. PARAMETER STUDY

1) IMPACT OF CHANNEL ESTIMATION FREQUENCY

In order to detect the moving hands, WiCatch needs to occupy the channels to transmit packets and obtain the channel measurements, which interferes with communication links. The packet transmission rate determines the channel estimation frequency, so it is necessary to evaluate the performance of the system at different transmission rates. The default value of the channel estimation frequency is 2048 Hz, and the collected series are resampled to 1024, 512, 256 and 128 Hz. Taking the gesture ‘Pull’ as an example, the outcome is shown in Fig. 16(a). As can be seen from the result, the proposed system has a higher recognition accuracy at every channel estimation frequency tested. In particular, when the channel estimation frequency decreases to 128 Hz, WiCatch still maintains a high recognition accuracy of 0.96, compared with 0.94 and 0.93 for WiGeR and WiG. Therefore, our system works more stably at a lower channel estimation rate, which is significant for reducing the impact on the communication links.
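The resampling step can be pictured with a minimal sketch. The paper does not state how the series are resampled, so plain decimation (keeping every second, fourth, eighth or sixteenth sample without any filtering) is an assumption here, as are the function name and the placeholder data.

```python
DEFAULT_RATE_HZ = 2048
TARGET_RATES_HZ = [1024, 512, 256, 128]

def decimate(series, original_rate, target_rate):
    """Keep every (original_rate // target_rate)-th sample; no filtering is applied."""
    if original_rate % target_rate != 0:
        raise ValueError("target rate must divide the original rate")
    step = original_rate // target_rate
    return series[::step]

one_second = list(range(DEFAULT_RATE_HZ))  # placeholder for one second of CSI samples
resampled = {rate: decimate(one_second, DEFAULT_RATE_HZ, rate)
             for rate in TARGET_RATES_HZ}
```

Because each target rate divides 2048 exactly, every resampled series keeps a uniform sample period, which the virtual-array construction relies on.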

2) IMPACT OF ANTENNA SPACING

WiCatch is equipped with two receiving antennas to perform interference elimination. During this stage, the packets received from the two antennas are used to obtain the

FIGURE 15. The confusion matrices for 3 systems.

channel ratio iteratively. For the detection, we fuse the data from the antennas to complete the moving trajectory reconstruction. Thus, the antenna spacing affects these two phases and ultimately influences the recognition accuracy. In practice, when the system is running, the antenna spacing needs to be relatively small compared to the range of the action. For instance, when detecting larger-scale behavior (for example, passive individual tracking), the antenna spacing can be increased appropriately, but it still needs to be within a certain range. Therefore, the performance of the


FIGURE 16. The impact of different parameters on the accuracy.

system is evaluated with different antenna spacings. In this test, the antenna spacing is increased from half a wavelength to 6 wavelengths, and the recognition accuracy is shown in Fig. 16(b). The results show that the accuracy is highest when the spacing is 1.5λ. As the spacing increases, the overall estimation accuracy decreases slightly and reaches its lowest point when the spacing reaches 5.5λ. Nevertheless, WiCatch reaches an average accuracy of 0.95, which is quite acceptable.
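For a sense of scale, the spacings quoted in wavelengths can be converted to physical distances. The sketch below assumes the 5.745 GHz central frequency used in the experiments and a step of half a wavelength between tested spacings; the step size is a guess, since the text names only the end points and the 1.5λ and 5.5λ cases.

```python
SPEED_OF_LIGHT = 299_792_458.0   # m/s
CARRIER_HZ = 5.745e9             # central frequency used in the experiments

wavelength_m = SPEED_OF_LIGHT / CARRIER_HZ

def spacing_in_metres(multiple_of_wavelength):
    return multiple_of_wavelength * wavelength_m

# 0.5, 1.0, ..., 6.0 wavelengths; the 0.5 step is an assumption.
multiples = [0.5 * i for i in range(1, 13)]
spacings_cm = {m: round(100 * spacing_in_metres(m), 2) for m in multiples}
```

At this carrier one wavelength is a little over 5 cm, so the tested range spans roughly 2.6 cm to 31 cm between the two receiving antennas.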

3) THE IMPACT OF DETECTION RANGE

In order to evaluate the monitoring range of the system, the tester is required to complete the nine gestures at distances of 0.5 m, 1 m, 1.5 m, 2 m and 2.5 m. For comparison, WiG and WiGeR are employed to process the same data, and the results are shown in Fig. 16(c). As can be seen in Fig. 16(c), as the distance increases, the recognition accuracy of WiCatch declines slightly, but it still reaches an average accuracy of 0.96, whereas both WiG and WiGeR achieve only 0.95. Overall, WiCatch operates more stably, and due to its stronger ability to detect two-hand gestures, its average recognition accuracy is higher than that of the other two systems.

4) IMPACT OF CENTRAL FREQUENCY AND BANDWIDTH

Signals at different frequencies differ in their ability to penetrate and reflect. Furthermore, the bandwidth of the signal also affects the sensing of the wireless channel. Therefore, in order to analyze the impact of frequency and bandwidth on the system, we conduct experiments at 2.4 GHz and 5.7 GHz, combined with bandwidths of 20 MHz and 40 MHz. The test results at 2.4 GHz are shown as (a) and (b) in Fig. 17. There is too much noise in the spectrum, which makes the curve related to the movement and the DC component hard to recognize. The fundamental reason is that the wireless channels at 2.4 GHz are occupied by a large number of commercial wireless devices, resulting in the loss of packets during channel measurement. Therefore, the time interval between consecutive measured values in the sliding window is irregular, which causes random changes in antenna spacing (possibly greater than half a wavelength), and finally

FIGURE 17. The impact of central frequency and bandwidth.

the estimation fails. For comparison, similar experiments are carried out at 5.7 GHz and the results are shown as (c) and (d) in Fig. 17. With 20 MHz bandwidth, the spectrum is still blurred due to the residual phase error caused by the asymmetry of the subcarrier indices. Clear curves and a clear DC component appear in Fig. 17(d), which is obtained with 40 MHz bandwidth.

From the results, we observe that signals at 2.4 GHz and 5.7 GHz with 20 MHz bandwidth are not suitable for recognition because of the residual phase error. In addition, the 2.4 GHz band is seriously affected by other Wi-Fi signals, which makes the spectrum hard to recognize. However, the system works stably when the estimation is performed at 5.7 GHz with a bandwidth of 40 MHz.
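The effect of packet loss on the virtual array can be illustrated with a small calculation: each gap between consecutive measurements translates into a virtual antenna spacing of speed times interval, and a long enough gap pushes that spacing past half a wavelength. The timestamps, the hand speed of 1 m/s, and the function names below are made up for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def virtual_spacings(timestamps_s, hand_speed_mps):
    """Spacing between consecutive virtual antennas: d_i = v * (t_{i+1} - t_i)."""
    return [hand_speed_mps * (t1 - t0)
            for t0, t1 in zip(timestamps_s, timestamps_s[1:])]

def exceeds_half_wavelength(timestamps_s, hand_speed_mps, carrier_hz):
    half_wavelength = SPEED_OF_LIGHT / carrier_hz / 2
    return [d > half_wavelength
            for d in virtual_spacings(timestamps_s, hand_speed_mps)]

# Made-up arrival times: nominal 2.5 ms period with a 75 ms gap from lost packets.
timestamps = [0.0, 0.0025, 0.005, 0.080, 0.0825]
flags = exceeds_half_wavelength(timestamps, hand_speed_mps=1.0, carrier_hz=2.4e9)
```

Only the interval that spans the lost packets is flagged, which shows how a burst of losses in a congested band can break the uniform-array assumption for that window.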

V. CONCLUSION AND FURTHER WORK

In this paper, we propose WiCatch, a device-free gesture recognition system based on channel state information. First, the system fuses the data from different antennas to eliminate the interference. Then, our system captures the signals reflected from the hands and treats the moving hands as virtual antenna arrays to complete the relative


motion trajectory estimation. Finally, gesture recognition is completed by classifying the motion trajectory with SVM. Experimental results show that WiCatch can achieve an overall recognition accuracy of 0.95. In particular, for two-hand gestures, the accuracy can reach 0.94, which makes our system more universal and robust than the existing systems. In future work, we will take full advantage of two-hand gesture recognition and expand our system to classify more sophisticated gestures. Furthermore, to explore the potential of the system in the smart home, we will extend it to other fields, for example, indoor high-precision passive tracking of individuals and detection of human behaviors such as walking, running and jumping.

ACKNOWLEDGEMENTS

The authors wish to thank the reviewers for the careful review and valuable suggestions.

REFERENCES

[1] Microsoft Kinect. Accessed: Nov. 2012. [Online]. Available: http://www.microsoft.com/en-us/kinectforwindows

[2] S. Gaglio, G. L. Re, and M. Morana, “Human activity recognition process using 3-D posture data,” IEEE Trans. Human-Mach. Syst., vol. 45, no. 5, pp. 586–597, Oct. 2015.

[3] M. Field, D. Stirling, Z. Pan, M. Ros, and F. Naghdy, “Recognizing human motions through mixture modeling of inertial data,” Pattern Recognit., vol. 48, no. 8, pp. 2394–2406, 2015.

[4] L. Jing, Y. Zhou, Z. Cheng, and T. Huang, “Magic ring: A finger-worn device for multiple appliances control using static finger gestures,” Sensors, vol. 12, no. 5, pp. 5775–5790, 2012.

[5] F. Adib, C.-Y. Hsu, H. Mao, D. Katabi, and F. Durand, “Capturing the human figure through a wall,” ACM Trans. Graph., vol. 34, no. 6, 2015, Art. no. 219.

[6] M. Youssef, M. Mah, and A. Agrawala, “Challenges: Device-free passive localization for wireless environments,” in Proc. 13th Annu. ACM Int. Conf. Mobile Comput. Netw., 2007, pp. 222–229.

[7] Y. Gu, L. Quan, and F. Ren, “WiFi-assisted human activity recognition,” in Proc. IEEE Asia–Pacific Conf. Wireless Mobile, Aug. 2014, pp. 60–65.

[8] S. Sigg, S. Shi, F. Buesching, Y. Ji, and L. Wolf, “Leveraging RF-channel fluctuation for activity recognition: Active and passive systems, continuous and RSSI-based signal features,” in Proc. 11th Int. Conf. Adv. Mobile Comput. Multimedia, 2013, p. 43.

[9] W. Wang, A. X. Liu, M. Shahzad, K. Ling, and S. Lu, “Understanding and modeling of WiFi signal based human activity recognition,” in Proc. 21st Annu. Int. Conf. Mobile Comput. Netw., 2015, pp. 65–76.

[10] H. Abdelnasser, M. Youssef, and K. A. Harras, “WiGest: A ubiquitous WiFi-based gesture recognition system,” in Proc. IEEE Conf. Comput. Commun., Apr./May 2015, pp. 1472–1480.

[11] Z. Tian, Z. Li, M. Zhou, Y. Jin, and Z. Wu, “PILA: Sub-meter localization using CSI from commodity Wi-Fi devices,” Sensors, vol. 16, no. 10, p. 1664, 2016.

[12] Y. Xie, Z. Li, and M. Li, “Precise power delay profiling with commodity WiFi,” in Proc. Int. Conf. Mobile Comput. Network., 2015, pp. 53–64.

[13] D. Vasisht, S. Kumar, and D. Katabi, “Decimeter-level localization with a single WiFi access point,” in Proc. USENIX Conf. Netw. Syst. Design Implement., 2016, pp. 165–178.

[14] Y. Chapre, A. Ignjatovic, A. Seneviratne, and S. Jha, “CSI-MIMO: Indoor Wi-Fi fingerprinting system,” in Proc. IEEE 39th Conf. Local Comput. Netw., Sep. 2014, pp. 202–209.

[15] Z. Yang, Z. Zhou, and Y. Liu, “From RSSI to CSI: Indoor localization via channel response,” ACM Comput. Surv., vol. 46, no. 2, 2013, Art. no. 25.

[16] X. Wang, L. Gao, and S. Mao, “CSI phase fingerprinting for indoor localization with a deep learning approach,” IEEE Internet Things J., vol. 3, no. 6, pp. 1113–1123, Dec. 2016.

[17] Y. Wang, J. Liu, Y. Chen, M. Gruteser, J. Yang, and H. Liu, “E-eyes: Device-free location-oriented activity identification using fine-grained WiFi signatures,” in Proc. 20th Annu. Int. Conf. Mobile Comput. Netw., 2014, pp. 617–628.

[18] Y. Wang, X. Jiang, R. Cao, and X. Wang, “Robust indoor human activity recognition using wireless signals,” Sensors, vol. 15, no. 7, pp. 17195–17208, 2015.

[19] M. A. A. Al-Qaness, F. Li, X. Ma, Y. Zhang, and G. Liu, “Device-free indoor activity recognition system,” Appl. Sci., vol. 6, no. 11, p. 329, 2016.

[20] M. A. A. Al-Qaness and F. Li, “WiGeR: WiFi-based gesture recognition system,” Int. J. Geo-Inf., vol. 5, no. 6, p. 92, 2016.

[21] W. He, K. Wu, Y. Zou, and Z. Ming, “WiG: WiFi-based gesture recognition system,” in Proc. 24th Int. Conf. Comput. Commun. Netw., Aug. 2015, pp. 1–7.

[22] M. Ettus, “USRP users and developers guide,” Ettus Res. LLC, Santa Clara, CA, USA, Tech. Rep., 2005.

[23] K. Kim, J. Kim, J. Choi, J. Kim, and S. Lee, “Depth camera-based 3D hand gesture controls with immersive tactile feedback for natural mid-air gesture interactions,” Sensors, vol. 15, no. 1, pp. 1022–1046, 2015.

[24] D.-L. Dinh and T.-S. Kim, “Smart home appliance control via hand gesture recognition using a depth camera,” Smart Innov., Syst. Technol., vol. 67, pp. 159–172, May 2017.

[25] J. Wu, G. Pan, D. Zhang, G. Qi, and S. Li, “Gesture recognition with a 3-D accelerometer,” in Proc. Int. Conf. Ubiquitous Intell. Comput., 2009, pp. 25–38.

[26] J. Liu, L. Zhong, J. Wickramasuriya, and V. Vasudevan, “uWave: Accelerometer-based personalized gesture recognition and its applications,” Pervasive Mobile Comput., vol. 5, no. 6, pp. 657–675, 2009.

[27] C. Harrison, D. Tan, and D. Morris, “Skinput: Appropriating the body as an input surface,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2010, pp. 453–462.

[28] Q. Pu, S. Gupta, S. Gollakota, and S. Patel, “Whole-home gesture recognition using wireless signals,” in Proc. 19th Annu. Int. Conf. Mobile Comput. Netw., 2013, pp. 27–38.

[29] R. Nandakumar, B. Kellogg, and S. Gollakota, “Wi-Fi gesture recognition on existing devices,” Comput. Sci., vol. 3, no. 2, p. 17, 2014. [Online]. Available: https://arxiv.org/abs/1411.5394

[30] D. Halperin, W. Hu, A. Sheth, and D. Wetherall, “Predictable 802.11 packet delivery from wireless channel measurements,” in Proc. ACM SIGCOMM Conf., 2010, pp. 159–170.

[31] D. Halperin, W. Hu, A. Sheth, and D. Wetherall, “Tool release: Gathering 802.11n traces with channel state information,” ACM SIGCOMM Comput. Commun. Rev., vol. 41, no. 1, p. 53, Jan. 2011.

[32] Y. Zhuo, H. Zhu, and H. Xue, “Identifying a new non-linear CSI phase measurement error with commodity WiFi devices,” in Proc. IEEE Int. Conf. Parallel Distrib. Syst., Dec. 2016, pp. 72–79.

[33] M. Speth, D. Daecke, and H. Meyr, “Minimum overhead burst synchronization for OFDM based broadband transmission,” in Proc. Global Telecommun. Conf., Nov. 1998, pp. 2777–2782.

[34] T. M. Schmidl and D. C. Cox, “Robust frequency and timing synchronization for OFDM,” IEEE Trans. Commun., vol. 45, no. 12, pp. 1613–1621, Dec. 1997.

[35] P. H. Moose, “A technique for orthogonal frequency division multiplexing frequency offset correction,” IEEE Trans. Commun., vol. 42, no. 10, pp. 2908–2914, Oct. 1994.

[36] T.-J. Shan, M. Wax, and T. Kailath, “On spatial smoothing for direction-of-arrival estimation of coherent signals,” IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 4, pp. 806–811, Apr. 1985.

[37] J. Xiong and K. Jamieson, “ArrayTrack: A fine-grained indoor location system,” in Proc. USENIX Conf. Netw. Syst. Design Implement., 2013, pp. 71–84.

[38] J. Gjengset, J. Xiong, G. McPhillips, and K. Jamieson, “Phaser: Enabling phased array signal processing on commodity WiFi access points,” in Proc. Int. Conf. Mobile Comput. Network., 2014, pp. 153–164.

[39] J. Liu, G. S. Liao, and Q. H. Guo, “A method for estimation of source number based on spatial smoothing and Gerschgorin disk criterion,” Radar Sci. Technol., no. 3, p. 002, 2003.

[40] A. M. Andrew, “An introduction to support vector machines and other kernel-based learning methods,” Kybernetes, vol. 32, no. 1, pp. 1–28, 2001.

[41] M. H. Rahman and J. Afrin, “Hand gesture recognition using multiclass support vector machine,” Int. J. Comput. Appl., vol. 74, no. 1, pp. 39–43, 2014.

[42] S. Abe, Support Vector Machines for Pattern Classification, vol. 36. London, U.K.: Springer, 2005, pp. 7535–7543.


ZENGSHAN TIAN was born in 1968. He received the bachelor’s degree from the Department of Mechanical Engineering, Zhengzhou University, China, in 1990, the master’s degree from the Department of Mechanical Engineering, University of Electronic Science and Technology of China, in 1998, and the Ph.D. degree from the Department of Communication and Information Engineering, University of Electronic Science and Technology of China, in 2002. He is currently a Professor with the Chongqing University of Posts and Telecommunications. His research interests include wireless positioning, target detection, and machine learning.

JIACHENG WANG was born in 1992. He received the bachelor’s degree from the Department of Science, Kunming University of Science and Technology. He is currently pursuing the master’s degree with the Department of Communication and Information Technology, Chongqing University of Posts and Telecommunications. His research interests include indoor localization and target detection.

XIAOLONG YANG was born in 1987. He received the M.Sc. and Ph.D. degrees in communication engineering from the Harbin Institute of Technology in 2012 and 2017, respectively. From 2015 to 2016, he was a Visiting Scholar with Nanyang Technological University, Singapore. He is currently a Lecturer with the Chongqing University of Posts and Telecommunications. His current research interests include cognitive radio networks, energy efficiency optimization, and manifold learning.

MU ZHOU was born in 1984. He received the bachelor’s degree from the Department of Electronic and Information, Harbin Institute of Technology, the master’s degree from the Department of Astronautics, Harbin Institute of Technology, and the Ph.D. degree from the Department of Science, Harbin Institute of Technology. He was a Visiting Scholar with the University of Pittsburgh. He is currently a Professor with the Chongqing University of Posts and Telecommunications. His research interests include wireless localization, machine learning, artificial intelligence, and convex optimization.
