Lab 5: Audio analysis system based on Zedboard - inaudible voice attack and defense project

Yanlin Liu 515030910461

Zening Li 515030910310

May 27, 2018

Abstract

This project aims to realize inaudible voice command attack and defense. Our main work is to calculate the frequency of the audio signal in the inaudible voice command scenario, and then to build an audio analysis system based on the Zedboard Zynq-7000 that acquires audio input, performs FFT spectrum analysis, and displays the spectrum. This system paves the way for further work on inaudible voice attack and defense. Zening Li is responsible for the calculation part; Yanlin Liu is responsible for the system design and realization part.

Keywords: Inaudible voice commands, Inaudible voice attack and defense, Zedboard, Audio analysis, FFT spectrum analysis

1 Introduction

The so-called inaudible voice attack is based on the principle that it is possible to create sounds (40 kHz) that humans (20 kHz) cannot hear but microphones (24 kHz) can record. This is not because the sound is too soft or just at the periphery of the human frequency range. The sounds we create are actually at 40 kHz and above, completely outside both the human and the microphone range of operation. However, because microphones possess inherent non-linearities in their diaphragms and power amplifiers, it is possible to design sounds that exploit this property. To elaborate, we shape the frequency and phase of sound signals and play them through ultrasound speakers; when these sounds pass through the non-linear amplifier at the receiver, the high-frequency sounds are expected to create a low-frequency “shadow”. The “shadow” is within the filtering range of the microphone and thereby gets recorded as normal sound. Figure 1 illustrates the effect. Importantly, the microphone does not require any modification, enabling billions of phones, laptops, and IoT devices to leverage the capability.

Figure 1.

The upper bound frequency of human voices and human hearing is 20 kHz, with the lower bound at 20 Hz. A completely inaudible attack mode is designed against most voice recognition systems on smartphones (such as Apple's Siri, Google Now, Huawei Hi Voice, and so on). Briefly, non-linearity is a hardware property that shifts high-frequency signals arriving at a microphone down to lower frequencies. The voice command signal for the speech recognizer is amplitude modulated (AM) onto a carrier, so its spectrum is moved above the range of the human ear. Because of the non-linearity, the receiver circuit produces a shadow signal at the corresponding frequencies in the 20 Hz-20 kHz band, so the audio command carried by the modulation is successfully demodulated and recovered. Through this loophole of circuit non-linearity, we can manipulate the speech recognition system of a mobile device without the owner noticing.

2 Background

Microphones and speakers are in general designed to be linear systems, meaning that the output signals are linear combinations of the input. In the case of power amplifiers inside microphones and speakers, if the input sound signal is $s(t)$, then the output should ideally be

$$s_{out}(t) = A_1 s(t)$$

In practice, however, acoustic components in microphones and speakers are linear only in the audible frequency range (< 20 kHz). In ultrasound bands (> 25 kHz), the responses exhibit non-linearity. Thus, for ultrasound signals, the output of the amplifier becomes

$$s_{out}(t) = A_1 s(t) + A_2 s^2(t) + A_3 s^3(t) + \dots \approx A_1 s(t) + A_2 s^2(t)$$

Let the input be the sum of two tones, $s_{in}(t) = s_1(t) + s_2(t)$ with $s_1(t) = \cos(2\pi f_1 t)$ and $s_2(t) = \cos(2\pi f_2 t)$. Then

$$s_{out}(t) = A_1 s_{in}(t) + A_2 s_{in}^2(t)$$

Before digitizing and recording the signal, the microphone applies a low-pass filter to remove the high-frequency components. What remains is

$$s_{out}(t) = A_2 + A_2 \cos(2\pi (f_2 - f_1) t)$$
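The difference-frequency term can be checked numerically. The following is a minimal sketch, not part of the original experiment: the sample rate, tone frequencies, and the coefficients A1 and A2 are assumed illustration values, and `tone_amplitude` is a hypothetical helper that measures one frequency component with a single-bin DFT.

```python
import cmath
import math


def tone_amplitude(samples, freq_hz, sample_rate_hz):
    """Amplitude of the freq_hz component, measured with a single-bin DFT."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
              for k, x in enumerate(samples))
    return 2 * abs(acc) / n


# Assumed illustration values (none of these come from the report).
FS = 200_000             # simulation sample rate in Hz
N = 2_000                # 10 ms of signal, i.e. 100 Hz bin spacing
F1, F2 = 40_000, 45_000  # two ultrasonic tones
A1, A2 = 1.0, 0.1        # linear and quadratic gain of the amplifier model

# s_in(t) = cos(2*pi*f1*t) + cos(2*pi*f2*t)
s_in = [math.cos(2 * math.pi * F1 * k / FS) + math.cos(2 * math.pi * F2 * k / FS)
        for k in range(N)]
# s_out(t) = A1*s_in(t) + A2*s_in(t)^2
s_out = [A1 * s + A2 * s * s for s in s_in]

linear_only = tone_amplitude(s_in, F2 - F1, FS)
shadow = tone_amplitude(s_out, F2 - F1, FS)
print(f"5 kHz component before the non-linearity: {linear_only:.4f}")
print(f"5 kHz component after the non-linearity:  {shadow:.4f}")
```

Under these assumptions the input has essentially no energy at f2 - f1 = 5 kHz, while the output shows a component whose amplitude is close to A2, matching the last equation above.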

3 Attack and Defense

In this part, we summarize previous attack and defense methods and point out the shortcomings of the defense methods.

3.1 Inaudible voice attack

Let $v(t)$ be a baseband voice signal that, once decoded, translates to the command “Hi, Siri”. An attacker moves this baseband signal to a high frequency $f_{in} = 40$ kHz (by modulating a carrier signal) and plays it through an ultrasound speaker. The attacker also plays a tone at $f_{in} = 40$ kHz. The played signal is

$$s_{in}(t) = \cos(2\pi f_{in} t) + v(t)\cos(2\pi f_{in} t)$$

After this signal passes through the non-linear hardware and the low-pass filter of the microphone, the microphone records

$$s_{low}(t) = \frac{A_2}{2}\left(1 + v^2(t) + 2v(t)\right)$$

This shifted signal contains a strong component of $v(t)$ (due to more power in the speech components), and hence gets decoded correctly by almost all microphones.
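The recorded signal can be derived in a few steps. This is a worked sketch that assumes $v(t)$ is band-limited well below $f_{in}$, so that the low-pass filter removes everything centered at $f_{in}$ and $2f_{in}$, and that terms above second order are neglected as in Section 2.

$$s_{in}(t) = \left(1 + v(t)\right)\cos(2\pi f_{in} t)$$

$$A_2 s_{in}^2(t) = A_2 \left(1 + v(t)\right)^2 \cos^2(2\pi f_{in} t) = \frac{A_2}{2}\left(1 + v(t)\right)^2 \left(1 + \cos(4\pi f_{in} t)\right)$$

The linear term $A_1 s_{in}(t)$ sits at $f_{in}$ and the $\cos(4\pi f_{in} t)$ term sits at $2f_{in}$, so both are filtered out, leaving

$$s_{low}(t) = \frac{A_2}{2}\left(1 + v(t)\right)^2 = \frac{A_2}{2}\left(1 + v^2(t) + 2v(t)\right)$$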

3.2 Defending inaudible voice commands

The final defense is to search for traces of $v^2(t)$ in the sub-50 Hz band. Here, however, we focus on exploiting the structure of the human voice. The core observation is simple: voice signals exhibit well-understood patterns of fundamental frequencies, added to multiple higher-order harmonics. This structure is partly reflected in the sub-50 Hz band of $s(t)$ (which contains $v^2(t)$), and hence correlates with the carefully extracted spectrum above 50 Hz (which contains the dominant $v(t)$). With appropriate signal scrubbing, we expect the correlation to emerge reliably. If the attacker attempts to disrupt the correlation by injecting sub-50 Hz noise, the stronger energy in this low band should give away the attack.

Squaring a signal in the time domain corresponds to self-convolution of its spectrum in the frequency domain, so we can calculate the correlation between the self-convolution of the voice spectrum and the squared term itself.
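The frequency-domain relation behind this defense can be illustrated with a small check. The sketch below is not the defense itself: it only verifies, on an assumed toy signal, that the DFT of a squared signal equals the circular self-convolution of the signal's DFT divided by the length. The helper names `dft` and `circular_self_convolution` and the toy "voice" are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def circular_self_convolution(spectrum):
    """(1/N) * circular convolution of a spectrum with itself."""
    n = len(spectrum)
    return [sum(spectrum[j] * spectrum[(m - j) % n] for j in range(n)) / n
            for m in range(n)]


N = 64
# Assumed toy "voice": a fundamental (bin 3) plus one harmonic (bin 6).
v = [math.cos(2 * math.pi * 3 * k / N) + 0.5 * math.cos(2 * math.pi * 6 * k / N)
     for k in range(N)]

spectrum_of_square = dft([x * x for x in v])     # spectrum of v^2(t)
predicted = circular_self_convolution(dft(v))    # self-convolved spectrum of v(t)

max_error = max(abs(a - b) for a, b in zip(spectrum_of_square, predicted))
print(f"largest mismatch between the two spectra: {max_error:.2e}")
```

The two spectra agree up to floating-point error, which is why a detector can predict what the $v^2(t)$ trace should look like from the spectrum of $v(t)$ and correlate the prediction with what is actually observed.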

3.3 Disadvantages of the defense

1. If the attacker changes the signal function, the squared term may be cancelled together with the cross term, which sharply reduces the similarity; alternatively, some terms may be negative, so the time-domain amplitude may not increase.

2. The band around the 20 Hz fundamental frequency suffers from stronger environmental noise interference.

4 Audio analysis system based on Zedboard

Following the calculation and principle of these inaudible voice commands, our main work is to construct an audio analysis system. The system must acquire an analog audio input, perform analog-to-digital conversion and then digital-to-analog conversion, and output the FFT of the audio signal; that is, it performs real-time spectrum analysis of an arbitrary audio signal. The audio signal is obtained from a PC, processed by the FFT IP in the PL, and sent to the OLED for real-time display of the audio spectrum and for data recording. Considering the sampling rate and other requirements, we choose the Zedboard to build the whole system. ZedBoard is a low-cost development board based on the Xilinx Zynq™-7000 Extensible Processing Platform (EPP). The board can run Linux, Android, Windows®, or other OS/RTOS-based designs. In addition, expandable interfaces make it easy for users to access the processing system and the programmable logic. The Zynq-7000 EPP tightly integrates an ARM® processing system with Xilinx 7 series programmable logic, which makes it possible to create unique and powerful designs. We choose the Zedboard because we need a sampling rate high enough to complete audio acquisition and analysis, paving the way for future attack and defense work.

Figure 2. Zedboard Zynq-7000

4.1 Zedboard programming and configuration

The development environment is Windows 10 (64-bit), with Vivado 2015.2 as the development software and SecureCRT as the serial port software.

The whole programming and configuration process is as follows. Our first attempt uses the OLED IP module and Zed_audio_ctrl to acquire and analyze the audio.

4.1.1 Zedboard programming and configuration

First of all, we use Vivado to build a project, adding the OLED, audio control, and other IP cores to complete the hardware design. Then we generate the bitstream, which is finally imported into the SDK.

1) Create the myaudiosys directory and a Vivado project under it. Select the Zedboard development board.
2) Copy ip_repo to myaudiosys/. Open the IP catalog, select IP settings, and add /myaudiosys/ip_repo.
3) Copy system.tcl to the myaudiosys directory.
4) Open Vivado's Tcl console.
Input: cd (dir)/project/myaudiosys
Input: source system.tcl
A system.bd is then created automatically in Vivado.

Figure 3. Hardware design

5) Generate the output products.
6) Create the HDL wrapper.
7) Add the constraint files oled.xdc and zed_audio_constraints.xdc.
8) Generate the bitstream.
9) Open the implemented design and the block design.
10) Export for the SDK.
11) Create a new project, myaudiosys, in the SDK and select the empty template.
12) Add the source files, as shown in /SRC/SDK.
13) Check whether the automatic build succeeds. If an error is reported, it indicates that sine and cosine are missing: add the parameter m to the linker libraries in the project properties to use the math library. Once added, the SDK builds the program automatically. The message 'Finished building: test_audio.elf.size' indicates success.
14) Connect the audio port of the PC to the LINE IN interface of the Zedboard with an audio cable (the kind that connects a speaker to a computer), and connect the audio plug of the headset to the LINE OUT interface. Connect the J14 and J17 interfaces with mini-USB cables for serial communication and for programming.
15) Open SecureCRT and connect. Select Xilinx Tools -> Program FPGA to program the board; the blue DONE LED turning on indicates success.
16) Open the PC audio player and play any audio. Choose Run > Run As and select GDB.
17) The system can then be controlled by toggling the switches.

Figure 4. Function realization

The basic function is to use switch SW0:0 to play the original audio and switch SW1 to perform FFT analysis of the audio. For this implementation, and especially for the test function, 128 points is an ideal choice; a shorter sampling length causes problems. The default sampling time is 1 s, so the frequency resolution is 1 Hz.

4.2 Analysis and improvement

At this stage, we have implemented the function originally envisaged, but the number of sampling points is only 128 and the frequency resolution is only 1 Hz (128 points over the default 1 s sampling time). This example mainly adds the OLED module and the FFT algorithm, which was verified under C-Free. The glue code is mainly in the fft_audio.h and fft_audio.c files, which contain the test code and the functions for acquiring audio data (get_audio_fft), audio processing (audio_process), OLED display (oled_show), showing the maximum frequency (show_max_fre), and so on. The spectrum analysis function is implemented in a simple way: several algorithms were found on the Internet, a test function was written to test them, and the one whose results matched the MATLAB calculation was selected. The algorithm was originally supposed to be implemented in hardware as an HLS IP. The IP was generated successfully, but it could not be synthesized when used in Vivado: the 8 GB of memory filled up or Vivado crashed, evidently because the data handling at the interface was not done well. The audio data collection and sampling frequency are determined by the system size, and processing returns only after a full fft_points samples have been collected. Audio playback in the example simply reads the data from the data register and sends it straight back. FFT processing reads its data from the same register, so the two cannot run simultaneously: music cannot be played while the spectrum is displayed, because the program is single-threaded.
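The software path described above (collect a full block of samples, run an FFT, report the strongest frequency) can be sketched as follows. This is an illustrative Python sketch, not the project's C code: `fft` and `peak_frequency` are hypothetical names standing in for the kind of work done by audio_process and show_max_fre, and the 128 Hz sample rate is an assumption taken from 128 points over the default 1 s sampling time.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def peak_frequency(samples, sample_rate_hz):
    """Frequency in Hz of the strongest non-DC bin in the lower half spectrum."""
    spectrum = fft(samples)
    half = len(samples) // 2
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate_hz / len(samples)


FFT_POINTS = 128      # block size of the first version of the system
SAMPLE_RATE = 128.0   # assumed: 128 points over the default 1 s sampling time

# Synthetic test input: a 10 Hz sine, standing in for one block of audio data.
tone = [math.sin(2 * math.pi * 10 * k / SAMPLE_RATE) for k in range(FFT_POINTS)]
print(peak_frequency(tone, SAMPLE_RATE))
```

Because the whole block must be collected before the transform runs, acquisition and display alternate rather than overlap, which is the single-threaded limitation noted above.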

Therefore, to increase the sampling rate, the first thing we do is improve the FFT function code and increase the FFT size. The FFT IP is implemented with reference to UG871: its core is the Vivado FFT core, and HLS is used to produce the pre-processing and post-processing IP cores, which together form the RealFFT core (we originally intended to generate it as a separate IP, but this has not yet been validated successfully). Earlier, we tried to turn the previous software FFT into an IP with HLS; although IP generation succeeded, Vivado synthesis failed, which we suspect is due to the data type and the large amount of data to be transferred. The FFT_SIZE for this FFT IP is 1024, so the amount of data is much larger. In streaming mode, a DMA IP connects the RealFFT IP and the CPU, so the CPU only needs to operate the DMA, and data input and output only have to deal with the DMA. The simpleTransfer function used in this report can only handle data of 32-bit length. Therefore, the input and output data are all of type short, and more complex types need further study. The code and our comments are posted here for reference.

Figure 5. Code and comments

Compared with the initial hardware (Figure 3), we also have to add two IP modules (RealFFT and Direct Memory Access) to the hardware design to support the RealFFT algorithm.

Figure 6. Hardware design

At this point, the number of sampling points has increased to 1024, and we set the sampling time to about 30 ms. Dividing the number of points by the sampling time gives a sampling frequency of about 34 kHz, which is still lower than 100 kHz. However, if we shorten the sampling time further to raise the sampling rate, an error still occurs. We suspect that the 8 GB of memory is full.
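The arithmetic behind these numbers is short enough to write out. The sketch below assumes the sampling rate is simply the number of points divided by the sampling time, as in the paragraph above; the function names are hypothetical, and the Nyquist comparison against the 40 kHz carrier from Section 3.1 is added here only to illustrate why a rate of about 34 kHz is not enough.

```python
def effective_sampling_rate(fft_points, sampling_time_s):
    """Samples per second when fft_points are collected in sampling_time_s."""
    return fft_points / sampling_time_s


def frequency_resolution(sampling_time_s):
    """Spacing between FFT bins in Hz for a window of sampling_time_s."""
    return 1.0 / sampling_time_s


def nyquist_limit(sampling_rate_hz):
    """Highest frequency that can be represented without aliasing."""
    return sampling_rate_hz / 2.0


CARRIER_HZ = 40_000  # ultrasonic carrier used in Section 3.1

configs = [
    ("first version", 128, 1.0),      # 128 points over the default 1 s
    ("RealFFT + DMA", 1024, 0.030),   # 1024 points over about 30 ms
]
for label, points, window_s in configs:
    fs = effective_sampling_rate(points, window_s)
    print(f"{label}: fs = {fs:.0f} Hz, "
          f"resolution = {frequency_resolution(window_s):.1f} Hz, "
          f"covers 40 kHz carrier: {nyquist_limit(fs) > CARRIER_HZ}")

print(f"100 kHz target covers 40 kHz carrier: {nyquist_limit(100_000) > CARRIER_HZ}")
```

With 1024 points in 30 ms the rate is roughly 34 kHz, so only components below about 17 kHz can be represented, whereas a 100 kHz rate reaches 50 kHz and therefore includes the 40 kHz carrier.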

4.3 XADC module configuration

Finally, we learned to use the XADC module on the Zedboard to meet the 100 kHz sampling rate requirement. Xilinx 7 series chips contain an XADC module, which is a dual 12-bit A/D converter. The XADC has its own IP core, which can be used directly: it connects to the ARM over the data bus, and the minimal Linux system provides the XADC driver.

Figure 7. XADC Block Diagram

Then we add the XADC IP core to the hardware diagram, compile, generate a new bitstream file, and build the SDK project.

Figure 8. Add XADC IP

After using the XADC module IP, we can finally increase the sampling rate to 100 kHz and realize this audio analysis system.

Figure 9. Final System

The OLED displays the maximum values in the center, with the remaining values distributed evenly on both sides.

5 Conclusion

The main work of this project is crucial for inaudible voice command attack and defense. Based on the calculated frequency of inaudible voice commands, the audio analysis system acquires the audio, performs A/D conversion, real FFT processing, and D/A conversion, and finally records and displays the FFT spectrum; in particular, the sampling rate can reach 100 kHz. The pre-processing provided by this system paves the way for further work on inaudible voice attack and defense.

6 Division of labor

Zening Li is responsible for the calculation part; Yanlin Liu is responsible for the system design and realization part.

7 References

1. Guoming Zhang, Zhejiang University: DolphinAttack: Inaudible Voice Commands. CCS'17, October 30-November 3, 2017, Dallas, TX, USA
2. Nirupam Roy, Sheng Shen, Haitham Hassanieh, Romit Roy Choudhury, University of Illinois at Urbana-Champaign: Inaudible Voice Commands: The Long-Range Attack and Defense
3. Nirupam Roy, Haitham Hassanieh, Romit Roy Choudhury, University of Illinois at Urbana-Champaign: BackDoor: Making Microphones Hear Inaudible Sounds
