SENIOR DESIGN PROJECT 2016, TEAM01, FINAL DESIGN REVIEW 1

Sauron Security Final Design Review Report

Jose LaSalle, Omid Meh, Walter Brown, Zachary Goodman

Abstract—Sauron is a security system that can be deployed in crowded areas to eavesdrop on individuals of interest. Sauron is an acoustic beamformer with a camera so that the operator can visually select targets. The beamformer is composed of a microphone array that records sound at different points. When the operator clicks on a target in the video, Sauron calculates the angle to the target and uses enhanced delay sum beamforming to extract what the target is saying.

Index Terms—Acoustic, Source Isolation, Microphone Array, Delay Sum Beamforming, Compound Array.

I. INTRODUCTION

SECURITY is a significant concern in public places, resulting in an increased interest in surveillance. Crowded places such as museums, markets, and airports are swarming with cameras. Sauron is a tool to further improve safety. Sauron allows security personnel to eavesdrop on individuals through the power of acoustic beamforming by simply identifying them in a video feed.

Sauron consists of a microphone array and camera that interface with a computer. An operator is able to hover their cursor over an individual in a crowded environment and the system plays what that individual is saying. This system can be adapted to almost any situation where a voice needs to be isolated. For example, an operator might record a lecture and click on students in the audience who are asking questions. Another use case would be video editing. A cameraman might record something and then want to eliminate a distraction in the background. Although Sauron is meant to improve safety, it has other applications as well.

J. LaSalle majors in Electrical Engineering and is a member of Commonwealth Honors College.

O. Meh majors in Electrical Engineering and in Computer Systems Engineering and is a member of Commonwealth Honors College.

W. Brown majors in Computer Systems Engineering and in Computer Science and is a member of Commonwealth Honors College.

Z. Goodman majors in Electrical Engineering.

Sauron is a threat to privacy. If enhanced, it could be deployed in a neighborhood to eavesdrop on conversations inside households and other private locations. The major obstacle in this task would be that potential targets would be at different distances from the array. Closer targets will be louder, meaning that delay sum beamforming would fail unless there were a large number of microphones, all of which would be sensitive enough to hear at a long range.


A. Established Solution

Squarehead Technology's new AudioScope is a device designed to listen in on players, coaches, and the like at sports events. This device performs acoustic beamforming with an array of around 300 microphones mounted on a disk on the ceiling to isolate locations selected by the operator [2].

Currently, airports have some of the most advanced surveillance. Video feeds are analyzed to identify individuals on watch lists, bags being left behind by their owners, people going the wrong way through checkpoints, and cars spending an abnormal amount of time in the parking lots [1]. However, audio is not as prevalent in airport security.

B. Use Case

A security guard with no knowledge of acoustic beamforming and very little training beyond the norm sits in a video surveillance room. One of the cameras is aimed at a line of people waiting to be screened at a checkpoint. Two individuals with suitcases are chatting near the back of the line. To be on the safe side, the guard hovers a cursor over the head of one of the speakers. The conversation can be heard through the guard's headphones.

Fig. 1. Image of the physical array.

Fig. 2. Visual depiction of specifications (spanning angle, beam width, and range to the suspect).

C. Specifications

Table I lists the specifications of Sauron. The specifications for the target's distance, angle from the array's center-line, and maximum beamwidth are about the same as the SDP 14 beamforming group had [3]. The SDP 16 group added context around the array, only increasing the performance of the array where absolutely needed. These old specifications are reasonable for a bottleneck like a corridor.

The beamwidth specification is for a -10dB beamwidth because -10dB will make a sound seem half as loud to a listener [4]. Tests done within the SDP 16 group showed that when one of two superimposed voices is amplified to 10dB above the other, the amplified voice is easy to understand.

TABLE I
TABLE OF SPECIFICATIONS

Specification                 Promised         Achieved
Range                         1 to 3 meters    1 to 3 meters
Angle of Operation            -65° to 65°      -65° to 65°
Maximum -10dB beam width      40°              30°
Frequency Range               1kHz to 3.5kHz   500Hz to 5kHz
Real-time Delay               10s              5s
Error in angle selection      20°              10°

Experiments within the SDP 16 group found that higher frequencies were more important for determining what a person is saying than lower frequencies. These experiments involved taking sound clips of group members speaking and running them through a digital bandpass filter, expanding the passband until the message was clear. The specifications were changed to include this useful frequency range, as is reflected in Table I.

As security may need to quickly respond to a conversation, the operator must hear what the target said no longer than 10 seconds after they have said it. Reducing this delay is preferable even over hearing all that the target is saying. When the operator selects a target, the actual angle that the system focuses on must be within 20° of the intended target. With more error than this, the beam will miss the target.

Figure 2 provides a visual depiction of these specifications.

II. DESIGN

Figure 3 shows the layout of Sauron. Sauron uses a fisheye camera, a 16-microphone array, and a computer. The video information is sent to the user interface so the operator can pick a target. The user interface maps the target location to an angle, which is used by the audio processing portion of the program to perform beamforming on the microphone data. This yields an isolated sound that the user interface can play.

Fig. 3. System diagram for Sauron.

A. Microphone Array

The purpose of the microphone array is to record sound from different locations. It sends this information to the audio processing software described in section II-D.

Our array needs to produce high-quality sound across our desired frequency range with a relatively constant beamwidth.

Beamforming involves processing multiple microphone outputs to create a directional pickup pattern. It is important that the microphone only picks up sound from one direction and attenuates the sound that is off the main axis. Beamforming capabilities are determined by the geometry of the microphone array, the polar pattern of the microphones, and the speed of sound (which could be more accurately determined using a temperature sensor).

Information about the geometry of the microphone array and the speed of sound are used to determine the time delays used in the beamforming algorithm. The array geometry also influences what frequencies the array operates at. Smaller microphone spacing is optimal for high frequencies while larger spacing is superior at lower frequencies.
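The delay computation described above can be sketched as follows. This is a minimal illustration, not the project's Simulink implementation; `steering_delays` is a hypothetical helper name, and a uniform linear array is assumed with 0° meaning broadside (perpendicular to the array).

```python
import numpy as np

def steering_delays(mic_positions_m, angle_deg, c=343.0):
    """Time delays (seconds) to apply to each microphone so that a
    plane wave arriving from angle_deg (0 = broadside) sums coherently.
    mic_positions_m: positions of the microphones along the array axis."""
    mic = np.asarray(mic_positions_m, dtype=float)
    # Extra path length the wavefront travels to reach each microphone.
    delays = mic * np.sin(np.deg2rad(angle_deg)) / c
    # Shift so every delay is non-negative (pure delays are realizable).
    return delays - delays.min()

# Example: 4 microphones at 7 cm spacing, target at 30 degrees.
positions = np.arange(4) * 0.07
print(steering_delays(positions, 30.0))
```

The speed of sound `c` is the quantity a temperature sensor could refine, since `c` varies by roughly 0.6 m/s per degree Celsius.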

1) Microphones: Microphone (or microphone array) directionality describes the pattern in which the microphone's sensitivity changes with respect to changes in the position of the sound source. An omnidirectional pattern is equally sensitive to sound coming from all directions regardless of the orientation of the microphone. A cardioid polar pattern means that there is minimum signal attenuation when the signal arrives from the front of the microphone (0° azimuth), and maximum signal attenuation when signals arrive from the back of the microphone (180° azimuth), referred to as the null. Figure 4b shows a 2-axis polar plot of the omnidirectional and cardioid microphone responses. This plot looks the same regardless of whether the microphone's port is oriented in the x-y, x-z, or y-z plane [5].

The cardioid polar pattern offers beamforming capabilities by creating a beam where the signal is attenuated except for where the beam is steered, while an omnidirectional polar pattern has no attenuation in any direction relative to the microphone. A cardioid polar pattern with a wide angle of operation and narrow beamwidth is desired from our beamforming array in order to focus our beam on a single individual and operate in the largest area possible. We use omnidirectional MEMS microphones for our array to create cardioid polar patterns across our operating frequency range. MEMS stands for Micro-Electro-Mechanical Systems, which include microsensors and microactuators that act as transducer elements that convert acoustic pressure waves into electrical signals [6]. MEMS microphones enable improvements in sound quality for multiple-microphone applications. Microphone arrays can take advantage of the small form factor, sensitivity matching, and frequency response of a MEMS design for beamforming to help isolate a sound in a specific location [7].

Fig. 4. Polar plots that depict directivity and beamwidth for (a) omnidirectional microphones, (b) cardioid microphones, (c) our 8-microphone array, and (d) our 16-microphone array. As one can see, there are significant improvements in both directivity and beamwidth when the number of omnidirectional microphones is increased.

High input sound quality is the result of high sensitivity microphones, a uniform output level across our operating frequency range, and low noise.

Microphone sensitivity is defined as the ratio of the analog output voltage to the input pressure. The standard reference input signal for microphone sensitivity measurements is a 1 kHz sine wave at 94 dB sound pressure level (SPL), or 1 pascal (Pa) of pressure. Microphone sensitivity is determined using this reference input signal. As microphone sensitivity increases, the output level for a fixed acoustic input increases. Microphone sensitivity measured in decibels (dB) is a negative value, meaning that higher sensitivity corresponds to a smaller absolute value [8]. The sensitivity of the microphone array is higher than that of each individual microphone because their outputs are summed.
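As a worked example of what a dBV sensitivity figure means in volts at the 94 dB SPL reference: sensitivity in dBV is 20·log10 of the output voltage relative to 1 V, so converting back is a single power of ten. `dbv_to_volts` is a hypothetical helper name.

```python
def dbv_to_volts(sensitivity_dbv):
    """Output voltage (V rms) at the 94 dB SPL (1 Pa) reference input.
    Sensitivity in dBV is 20*log10(V_out / 1 V)."""
    return 10 ** (sensitivity_dbv / 20.0)

# -38 dBV (a typical MEMS figure) vs -54 dBV (a cardioid figure):
print(dbv_to_volts(-38))  # ~0.0126 V, about 12.6 mV/Pa
print(dbv_to_volts(-54))  # ~0.0020 V, about 2.0 mV/Pa
```

This shows concretely why a smaller absolute dBV value means higher sensitivity: -38 dBV yields roughly six times the output voltage of -54 dBV for the same acoustic input.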

• Cardioid
  – -54dBV sensitivity
  – 50Hz to 15kHz frequency range
• Electret
  – -44dBV sensitivity
  – 20Hz to 20kHz frequency range
• MEMS
  – -38dBV sensitivity
  – 100Hz to 15kHz frequency range

The frequency response of a microphone describes its output level across the frequency spectrum. The high and low frequency limits are the points at which the microphone response is 3 dB below the reference output level (normalized to 0 dB) at 1 kHz. Figure 5 shows the frequency response of the ADMP510 omnidirectional MEMS microphone [5].

Fig. 5. Frequency response of ADMP510 MEMS microphone.

When building the microphone array, knowing the microphones' frequency response enables us to choose microphones based on what frequency range we want to cover. In our desired operating range (1kHz - 3.5kHz), we can see that MEMS microphones have a flat, linear frequency response, meaning we do not have to attenuate or amplify our signals differently at different frequencies to achieve a uniform output across the frequency spectrum.

Figure 6 shows the design of the MEMS microphone modules. These are commercial modules from SparkFun that we purchased as a way to test our methods. These modules proved to be of excellent quality and allowed us to meet our specifications, so we decided to use them in our project.

Low noise is essential for high quality audio. Following the microphones, op amps are available with significantly lower noise than the microphones themselves, making the microphones the limiting factor regarding the noise of the overall design. The cable connections must be shielded and/or filtered to prevent the wires from picking up electromagnetic interference (EMI) or RF noise.

Fig. 6. Circuit schematic for MEMS microphone board.

By using an array of high sensitivity microphones, low noise preamplifier circuitry, and shielded transmission wires, we achieve high quality audio input into our computer interface for frequencies based on the array geometry.

2) Array Organization: The geometry and the number of elements in the array directly affect the performance. In general, given a fixed linear array, as the frequency increases, the beam width decreases. To understand this, look at Figure 7. The signal is parallel to the mic array, which gives us the maximum delay. The phases for the 500Hz signal arrive at the microphones in the array at [0 36 73 110] degrees, which are close together and difficult to distinguish in terms of coherency. However, as the frequency increases, the phases for the 1500Hz signal arrive at [0 110 220 330] degrees. For higher frequencies, the maximum phase difference becomes larger, and during the analysis it is easier to distinguish incoherent signals due to the large phase differences in the received signal.

In other words, the larger phase differences allow us (as long as we are in the same cycle) to determine the direction of the source more clearly, thus giving the microphone array higher directivity. Notice that the directivity is different for different frequencies, as different frequencies have a different range of arrival phase differences.
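The phase values above can be reproduced numerically. This sketch assumes a 7cm spacing (the high-frequency subarray spacing) and an endfire arrival, i.e. the wave traveling parallel to the array, which maximizes the inter-microphone delay; `arrival_phases_deg` is a hypothetical helper name.

```python
import numpy as np

def arrival_phases_deg(freq_hz, spacing_m, n_mics, angle_deg=90.0, c=343.0):
    """Phase (degrees, mod 360) of a plane wave at each microphone of a
    uniform linear array. angle_deg = 90 is endfire (wave parallel to
    the array axis), giving the maximum inter-microphone delay."""
    delays = np.arange(n_mics) * spacing_m * np.sin(np.deg2rad(angle_deg)) / c
    return (360.0 * freq_hz * delays) % 360.0

print(arrival_phases_deg(500.0, 0.07, 4))   # small phase steps (~37 deg)
print(arrival_phases_deg(1500.0, 0.07, 4))  # steps roughly 3x larger
```

Tripling the frequency triples the phase step between adjacent microphones, which is exactly the effect that narrows the beam at higher frequencies.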

Fig. 7. Wave phases over time for different frequencies.

Smaller microphone spacing is better for high frequencies and larger microphone spacing is desirable for lower frequencies. To achieve the best result for all frequency bands, we use a compound microphone array, which is the superposition of multiple arrays with different microphone spacings. We bandpass the signal to the proper frequency range for each array and subarray, perform the delay-sum for the specific band, and finally sum the results of the different bands to obtain a result with maximum beam precision across multiple frequency bands. Using equal microphone spacing would prevent the array from creating a precise beam for a wider range of frequencies. Figure 8 depicts the layout of our compound array.
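The band-split processing described above (bandpass each subarray's band, delay-sum it, then sum the bands) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's Simulink implementation: the helper names are hypothetical, the bandpass is a crude FFT brick-wall rather than a proper filter design, and integer-sample delays are used.

```python
import numpy as np

def bandpass_fft(x, fs, f_lo, f_hi):
    """Crude brick-wall bandpass via the FFT; a real implementation
    would use a proper FIR/IIR filter design."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def delay_sum(channels, delays_s, fs):
    """Advance each channel by its steering delay (rounded to whole
    samples for simplicity) and average."""
    out = np.zeros(channels.shape[1])
    for ch, d in zip(channels, delays_s):
        out += np.roll(ch, -int(round(d * fs)))
    return out / len(channels)

def compound_beamform(subarrays, fs):
    """subarrays: list of (channels, delays_s, (f_lo, f_hi)) tuples,
    one per subarray. Each subarray is delay-summed, band-limited to
    its own frequency band, and the bands are summed."""
    return sum(
        bandpass_fft(delay_sum(ch, d, fs), fs, lo, hi)
        for ch, d, (lo, hi) in subarrays
    )
```

With this structure, the band edges would follow the tuning in Figure 9: roughly [600Hz, 1kHz] for the low subarray, [1kHz, 1.7kHz] for the middle, and [1.7kHz, 3.5kHz] for the high subarray.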

3) Analog to Digital Converter: National Instruments' USB-6210 [9] is the A/D used for this project to handle the microphones. This A/D can sample above the needed Nyquist rate of 7 kHz.

This A/D supports 16 microphones. [10] describes how to physically attach the A/D to its inputs. The SENSE line was left floating.

Before connecting the A/D to the laptop, the CD-ROM that came with the A/D was used to install the appropriate drivers. During the installation of the A/D drivers, daqlib was installed in Simulink. A block can be added called "daqlib/Analog Input" that allows access to the readings of the A/D. For our setup, we needed to set the Analog Input block to use a "referenced" signal as there was a common ground across all devices connected to the A/D.

To demonstrate the real-time functionality of the A/D with the rest of the array, a "Sinks/Scope" was attached to the end of each output line of the Analog Input block. This allowed us to watch what the A/D detects.

B. Camera

The purpose of this block is to produce a video that the operator can reference to choose a target to listen to. This produces visual data that is displayed by the user interface described in section II-C.

We interfaced a USB fisheye camera with Simulink to give a wide field of view on which to beamform. The camera's functionality was tested by attaching it to a computer, running the Simulink script, and observing video streaming.

C. User Interface

The purpose of this block is to let the user easilyinteract with the system.

This is a graphical user interface that takes video information from the camera described in section II-B and displays it. The user is able to hover his or her cursor on a target in the video to listen to it. A curve is drawn on the display to show the user the region he or she is listening to. This is necessary because Sauron listens to an angle, not a precise spot. This block then calculates the angle from the center-line of the array described in section II-A to the target. This value is sent as an input to the audio processing software described in section II-D. The audio processing software calculates and provides the audio coming from the selected point so that the user interface can play it to the user.

The interface was written in Simulink.
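The report does not give the exact cursor-position-to-angle mapping, so the following is only a plausible sketch. It assumes an equidistant fisheye projection, in which the angle is linear in the horizontal pixel offset from the image center, and clamps to the -65° to 65° angle-of-operation spec; `pixel_to_angle` and its parameters are hypothetical.

```python
def pixel_to_angle(x_px, frame_width_px, fov_deg=130.0):
    """Map a horizontal pixel coordinate to an angle from the array's
    center-line, assuming an equidistant fisheye projection (angle is
    linear in pixel offset). fov_deg matches the -65 to 65 degree
    angle-of-operation specification."""
    half_fov = fov_deg / 2.0
    # Normalize to [-1, 1] with 0 at the image center.
    u = (x_px - frame_width_px / 2.0) / (frame_width_px / 2.0)
    return max(-half_fov, min(half_fov, u * half_fov))

print(pixel_to_angle(320, 640))  # 0.0 (center of a 640-px-wide frame)
print(pixel_to_angle(640, 640))  # 65.0 (right edge)
```

A real fisheye lens deviates from this ideal projection, which is one plausible source of the angle-selection error budgeted in Table I.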


Fig. 8. Drawing of our compound array design (low frequency, mid frequency, and high frequency subarrays superimposed). The low frequency array has a spacing of 21cm. The array for middle frequencies has a spacing of 14cm, except for the middle two microphones, which are 7cm apart. The highest frequency array has a spacing of 7cm.

Fig. 9. Simulation results for sub-arrays within the system: (a) low frequency subarray, (b) mid frequency subarray, (c) high frequency subarray. 9a is the low frequency array in Fig. 8 with four elements at 21cm spacing. 9b is the middle frequency array in Fig. 8 with six elements at [14cm 14cm 7cm 14cm 14cm] spacing. 9c is the high frequency array in Fig. 8 with six elements at 7cm spacing. 9a is tuned for [600Hz, 1kHz], 9b is tuned for [1kHz, 1.7kHz], and 9c is tuned for [1.7kHz, 3.5kHz].

This block was tested by having a human user observe the system respond to selecting an individual on the video feed and hearing the audio.

D. Audio Processing Software

The purpose of this block is to isolate the target's voice. It is given the angle to the target by the user interface described in section II-C. It gets the necessary audio data from the microphone array described in section II-A. This block gives the isolated voice of the target to the user interface.

1) Delay-Sum Beamforming: Beamforming, also known as spatial filtering, is a signal processing method used with sensor arrays to allow directional reception or transmission of a signal. For this project we are interested in directional reception of the human voice. Since speech is a broadband signal, we decided to use delay-sum beamforming with a linear array, which allows us to process a wideband signal with relatively low computational complexity.

Figure 10 is an illustration of a simple microphone array composed of three microphones and a summing module. As shown, when the signal is produced at -45° it arrives at the left, middle, then right microphones in order, and when the signal is produced at the +45° angle it arrives at the right, middle, then left microphones in order. In both cases, when all three signals are summed, the signals will be off by some time delay and will not constructively add up. However, if the signal is produced perpendicular to the array, it arrives at the three microphones at the same time, resulting in a constructive signal sum. This microphone array is called a non-steered (focused on 0° azimuth) 3-element linear microphone array.

Fig. 10. Simple microphone array with sounds coming from the direction of -45°, 0°, and 45°. Reprinted with permission from [14].

As illustrated in Figure 11, this concept can be further expanded to steer the array beam to an arbitrary direction. A delay block is added to each signal before the summer which further delays the signal. The added delay is to reverse the expected time delays for the signal coming from the desired direction. For instance, in Figure 11, when we desire to listen to the target wavefront (top speaker), we mathematically calculate the expected time delay for the signal to arrive at each microphone. Next, the received signals are shifted back in time (in the steering unit) to look as if they were all received at the same time by the microphones. At the summing stage, this results in constructive interference for the signals coming from the target direction and destructive, or incoherent, interference for the signals coming from other directions.
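The steer-and-sum procedure described above can be sketched end to end. This is an illustrative simulation under stated assumptions, not the project's Simulink implementation: `steer_and_sum` is a hypothetical helper, delays are rounded to whole samples, and a synthetic tone stands in for speech.

```python
import numpy as np

def steer_and_sum(channels, fs, spacing_m, angle_deg, c=343.0):
    """Delay-sum beamformer for a uniform linear array: shift each
    channel back by its expected arrival delay for a source at
    angle_deg (0 = broadside), then average. Integer-sample delays
    are used for simplicity."""
    n_mics = channels.shape[0]
    delays = np.arange(n_mics) * spacing_m * np.sin(np.deg2rad(angle_deg)) / c
    delays -= delays.min()  # make all delays non-negative
    out = np.zeros(channels.shape[1])
    for ch, d in zip(channels, delays):
        out += np.roll(ch, -int(round(d * fs)))  # shift back in time
    return out / n_mics

# Simulate a 1 kHz tone arriving from 40 degrees at a 4-mic, 7 cm array.
fs, n = 48000, 4800
t = np.arange(n) / fs
src_delays = np.arange(4) * 0.07 * np.sin(np.deg2rad(40.0)) / 343.0
channels = np.stack([np.sin(2 * np.pi * 1000 * (t - d)) for d in src_delays])

on_beam = steer_and_sum(channels, fs, 0.07, 40.0)
off_beam = steer_and_sum(channels, fs, 0.07, -40.0)
# Compare RMS away from the np.roll wrap-around at the edges.
on_rms = np.sqrt(np.mean(on_beam[100:-100] ** 2))
off_rms = np.sqrt(np.mean(off_beam[100:-100] ** 2))
print(on_rms, off_rms)  # the steered direction comes through much louder
```

Steering toward the true source aligns the channels so they add constructively, while steering toward the mirror direction leaves them out of phase, so the same sound sums to a much smaller output.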

Fig. 11. Illustration of delay-sum beamforming. Reprinted withpermission from [14].

III. PROJECT MANAGEMENT

Our team has shown a lot of vitality and perseverance since the beginning of this project, and through that we continue to learn how to work together efficiently and effectively. With communication and personal accountability as our mode of operation, coupled with frequent meetings and clearly delegated tasks, we were able to accomplish all of our MDR goals despite a late start. We achieved our goal of demonstrating voice isolation between two speakers by establishing four specific sub-goals that were tailored to each team member's area of strength. Analysis of the hardware for the mic array was headed by Zach, as amplifier design and use of electronic elements are in his field of study as an electrical engineer. Walter was responsible for interfacing the hardware into Simulink and building a software block for calibrating the array. Omid took on the beamforming algorithm given his CSE background, and Jose was responsible for noise reduction as this was involved in his REU. As the execution of our plans unfolded, an overlap of our knowledge bases led to a very integrated experience of one helping the other, resulting in a rewarding experience so far.

IV. CONCLUSION

Project Sauron proceeded as planned after MDR. Table I details our desired and accomplished specifications. These deliverables demonstrated that our group could interface with an array and that our group could isolate voices. Our final product has overcome the physical limitations that the group feared would stop it.

For CDR, our group was able to demonstrate that a user can hover over a point in a fisheye video feed and Sauron will isolate the audio at that point. A new 16-microphone array was built to support the tight beamwidth called for by the specifications. A fisheye camera was implemented as promised to provide a visual of the environment for aiding in security applications. There is a successful mapping between the video and the target angle.

The major challenge of this project was implementing a realtime system. This challenge arises from the 5 second sampling buffer required to acquire enough samples from all 16 microphones. Another consideration that presented a challenge for this project was an elegant arrangement of the microphones that provides ease of use for the operator. Figure 1 displays the setup, which fits a form factor that would be easily deployable on an airport terminal wall.

A. Future Work

There are extensive options for improving this project:

• Post Processing: Implementing post processing would allow the user to perform beamforming on media previously recorded through the beamformer, a valuable application for video forensics.

• Multi-Dimensional Array: Developing a multi-dimensional microphone array to improve the beamformer's directivity. There is potential for a distributed mesh array that can span a large space.

• Detection of Arrival and Tracking: Create detection of arrival and tracking algorithms to control the beamformer. These functionalities would enable the beamformer to operate without a user and could isolate audio in areas making noise.

• Temperature Sensor: Add a temperature sensor to more accurately determine the speed of sound in the beamforming environment.

• Wireless Array: Creating a wireless array that interacts with other wireless arrays and/or the main computer. This could be useful for implementing a system that is easy to set up and fit to any environment.

• Stand Alone System: Implementing a stand alone system, where the beamforming occurs on an FPGA instead of a laptop, would reduce the delay in our beamforming system by performing signal processing in parallel.

• HoloLens, Virtual Reality, Speech-to-Text: Integrating beamforming with Microsoft HoloLens and/or virtual reality. This could be applied toward applications for hearing-impaired individuals with spatial hearing difficulties that make it hard for them to locate the source of the audio they are hearing; it could help them focus their hearing where they want and not be distracted by other noises. By implementing speech-to-text functionality, one could provide transcripts of targeted individuals even in a crowded environment. This could be used to address language barriers by showing the transcript of a person's speech in the user's native language.

B. Acknowledgments

We would like to thank Professor Hollot and Professor Moritz for their feedback and guidance in establishing realistic goals. We would also like to send a big thanks to Professor Wolf, who took the time to meet with us each week and helped us stay on track and organized. An additional thanks to alumnus John Shattuck for coming back to UMass to meet with us as we evolve his old project.

REFERENCES

[1] Airports [Online]. Available: https://www.videosurveillance.com/airports.asp [Accessed 18 Jan. 2016]

[2] C. de Lange, Audio zoom picks out lone voice in the crowd [Online]. Available: https://www.newscientist.com/article/dn19541-audio-zoom-picks-out-lone-voice-in-the-crowd/ [Accessed 21 Jan. 2016]

[3] J. A. Danis, et al., The Acoustic Beamformer [Online]. Available: http://www.ecs.umass.edu/ece/sdp/sdp14/team15/assets/Team15FinalMDRReport.pdf [Accessed 18 Jan. 2016]

[4] University of Wisconsin-Madison, About Decibels (dB) [Online]. Available: http://trace.wisc.edu/docs/2004-About-dB/ [Accessed 24 Jan. 2016]

[5] InvenSense, Microphone Specifications Explained [Online]. Available: http://43zrtwysvxb2gf29r5o0athu.wpengine.netdna-cdn.com/wp-content/uploads/2015/02/MICROPHONE-SPECIFICATIONS-EXPLAINED.pdf [Accessed 2 Dec. 2015]

[6] InvenSense, Analog and Digital MEMS Microphone Design Considerations [Online]. Available: http://43zrtwysvxb2gf29r5o0athu.wpengine.netdna-cdn.com/wp-content/uploads/2015/02/Analog-and-Digital-MEMS-Microphone-Design-Considerations.pdf [Accessed 2 Dec. 2015]

[7] Digi-Key, MEMS Technology for Microphones in Audio Applications [Online]. Available: http://www.digikey.com/en/articles/techzone/2012/aug/mems-technology-for-microphones-in-audio-applications [Accessed 2 Dec. 2015]

[8] Analog Devices, Understanding Microphone Sensitivity [Online]. Available: http://www.analog.com/library/analogDialogue/archives/46-05/understanding microphone sensitivity.pdf [Accessed 2 Dec. 2015]

[9] National Instruments, "NI USB-6210 - National Instruments" [Online]. Available: http://sine.ni.com/nips/cds/view/p/lang/en/nid/203223 [Accessed 3 May 2013]

[10] National Instruments, "Bus-Powered M Series Multifunction DAQ for USB - 16-Bit, up to 400 kS/s, up to 32 Analog Inputs, Isolation" [Online]. Available: http://www.ni.com/datasheet/pdf/en/ds-9 [Accessed 3 May 2013]

[11] MathWorks, Database Toolbox [Online]. Available: http://www.mathworks.com/products/database/ [Accessed 23 Jan. 2016]

[12] MathWorks, Acquire Images from Webcams [Online]. Available: http://www.mathworks.com/help/supportpkg/usbwebcams/ug/acquire-images-from-webcams.html [Accessed 24 Jan. 2016]

[13] National Instruments, Least Mean Square (LMS) Adaptive Filter [Online]. Available: http://www.ni.com/example/31220/en/ [Accessed 18 Jan. 2016]

[14] A. Greensted, Delay Sum Beamforming, The Lab Book Pages: An online collection of electronics information, 01-Oct-2012 [Online]. Available: http://www.labbookpages.co.uk/audio/beamforming/delaySum.html [Accessed 24 Jan. 2016]

