Public Restroom Detection on Mobile Phone via Active Probing
Mingming Fan, Alexander Travis Adams, Khai N. Truong
Department of Software and Information Systems
University of North Carolina, Charlotte
{mfan, aadams85, ktruong8}@uncc.edu
ABSTRACT
Although there are clear benefits to automatic image capture services by wearable devices, image capture sometimes happens in sensitive spaces where camera use is not appropriate. In this paper, we tackle this problem by focusing on detecting when the user of a wearable device is located in a specific type of private space, the public restroom, so that image capture can be disabled. We present an infrastructure-independent method that uses just the microphone and the speaker on a commodity mobile phone. Our method actively probes the environment by playing a 0.1-second sine wave sweep and then analyzes the impulse response (IR) by extracting MFCC features. These features are then used to train an SVM model. Our evaluation results show that we can train a general restroom model which is able to recognize new restrooms. We demonstrate that this approach works on different phone hardware. Furthermore, the volume level, occupancy, and presence of other sounds do not affect recognition in significant ways. We discuss three types of errors that the prediction model makes and evaluate two proposed smoothing algorithms for improving recognition.
Author Keywords
Restroom detection, room impulse response, active probing, pattern recognition

ACM Classification Keywords
H.5.5. [Sound and music computing]: Signal analysis, synthesis, and processing
INTRODUCTION
Wearable cameras produce personal image-based records which can be used in a variety of ways. For example, researchers have used such records to investigate health behaviors (such as exercise and diet [9, 10]), help people with memory loss recall past events [8], increase parental understanding of the needs of children with autism [15, 18], and improve everyday memory and social skills for children with disabilities [1]. Although there are demonstrated benefits to wearing an always-on and automatically recording personal camera, there are also documented concerns of recording others, particularly in sensitive spaces [1, 3, 4, 9, 10, 11]. As a result, many researchers and users have expressed a need for a mechanism to temporarily disable capture. However, even when manual “privacy buttons exist and wearable cameras can be removed, it is not uncommon for participants to report that they forgot they were wearing the unit. Therefore, the participant inadvertently might collect inappropriate images, such as going to the bathroom” [11]. “With thousands of images automatically recorded every day, … [the user] only deletes unwanted images if he comes across them, as searching for them would take too much time” [4]. Therefore, how to turn off wearable cameras automatically in sensitive or private spaces is an important research problem.
We tackle this problem by exploring how to detect a specific type of private space where image recording is socially inappropriate: the public restroom. Many researchers have identified the restroom as a specific type of space where they want to suspend capture (e.g., [1, 3, 4, 11]). We focus on public restrooms, in particular, because of the potential for others to be recorded in the captured images there. This problem is challenging for two reasons. First, infrastructure-dependent indoor localization approaches (e.g., cellular, WiFi, and visible light) depend on the infrastructure coverage and floor maps to identify a restroom’s location. Infrastructure-independent indoor localization approaches (e.g., inertial sensors on the phone) would still require floor maps in order to determine whether the user’s location is inside a restroom. However, such localization methods fail when the user is outside of an infrastructure’s coverage or at a location where floor maps have not yet been developed. Alternatively, video or image based approaches can be employed to detect restrooms [17, 19, 20, 25, 26]. Unfortunately, vision based techniques can sometimes miss signage located immediately outside the space [26]. These methods can still be used inside the space to detect the presence of objects, construction material, and fixtures commonly found in restrooms and infer that this must be where the user is located (e.g., [19]); however, this violates the original motivation of not wanting recording to happen there in the first place. Furthermore, recording must still be on to determine when the user has left the space in order to resume the archiving of captured images.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. ISWC'14, September 13 - 17 2014, Seattle, WA, USA Copyright 2014 ACM 978-1-4503-2969-9/14/09...$15.00. http://dx.doi.org/10.1145/2634317.2634320
ISWC '14, SEPTEMBER 13 - 17, 2014, SEATTLE, WA, USA
In this paper, we present an infrastructure-independent approach, which uses the hardware already found on commodity mobile phones to emit a probing sound and analyze the impulse response (IR) of a space. Our work uses active probing to detect restrooms (a type of space) rather than a specific space amongst a defined set of locations from which a room type classifier has been trained. We demonstrate that our model can predict new spaces and that the approach works on different phones. We discuss how the volume level does not affect the model in significant ways. Therefore we can minimize the obtrusiveness of the sweep by outputting it at a lower volume. We also show that our model maintains its prediction performance despite of the occupancy and the presence of other sounds in restrooms. We discuss three types of errors that our SVM model has and propose two smoothing algorithms to improve the prediction accuracy.

Figure 1. (Left) Wearable phone attachment for performing and analyzing impulse response and image capture; (Right) 8 sample restrooms used in our study.

THEORY OF OPERATION
An important and commonly used measurement to analyze and characterize the acoustic properties of different environments and materials is the Impulse Response (IR) [… 24]. IRs contain time-domain acoustic properties, one of which is the reverberation and its components (i.e., direct sound, decay time, early reflection, and echoes). A common method for capturing the IR of a room is to use a sine wave sweep over a predetermined frequency range to excite the acoustics of a room [14]. The sine wave sweep and a specified amount of time afterwards is recorded and then analyzed to understand the behavior of each audible frequency over time in an environment. IR analysis is commonly used by audio engineers to help them design and tune their systems in order to enhance the sound and avoid any undesired feedback or noise. Acousticians use this technique as an analysis tool to aid them when designing a space. IRs are also used in convolution reverb processing to create a digital representation of the acoustics of different environments.

The acoustic characteristics of an environment are contingent on its dimensions and ability to absorb sound (absorption coefficient). This is a function of several constant factors including the shape, size, construction materials, and objects inside of the construct (e.g., chairs, desks). There are also many variables, such as humidity, and percentage of occupancy, that can change the acoustics continuously over time. These characteristics not only affect reverberation time, but they also affect other parameters of sound such as diffraction, refraction, and reflection [23].

Because no two environments are exactly the same, they all have unique acoustic characteristics or fingerprints. However, many constructs have very similar fingerprints due to its purpose. One particular environment that has unique qualities from other types of environments, yet also has strong correlations between similar environments that serve the same purpose, is the restroom.

In both the public and private space, restrooms have similar affordances that greatly impact the acoustic fingerprint. These affordances include water resistant floors and walls, toilets, and sinks. While public restrooms have stalls and private restrooms have showers/tubs, they both demonstrate similar acoustic responses and can be identified as restrooms from that response (even to the human ear over the phone). This is partially due to the common layout of restrooms but can also be attributed to the materials used on the surfaces and the items found in a restroom (which all have similar absorption coefficients [14]).

Our system leverages the natural acoustic traits of different spaces (in particular the restroom) by exciting the natural acoustics via IR. The IR of different spaces can then be processed to extract acoustic features which are then used to train a classifier to identify the type of space where the user is located. This active probing approach differs from existing computational auditory scene recognition (CASR) methods which extract features and trains classifiers on environment or human generated sounds (such as traffic noise or sink usage) collected from the target scenes [2, 16, 21]. An active probing approach allows for the classification of restrooms to happen even before the user begins to use the space (e.g., urinating, defecating, flushing, and hand washing).
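As an illustration of the probing signal discussed above, the following is a minimal sketch of how a sine wave sweep could be synthesized. The 0.1 s duration, the 20 Hz to 20 kHz range, and the 44.1 kHz / 16-bit format are the values stated in this paper; the linear frequency trajectory and the function name `sine_sweep` are assumptions made for illustration, since the paper does not specify the sweep shape or the code used.

```python
import math

def sine_sweep(f_start=20.0, f_end=20000.0, duration=0.1, rate=44100):
    """Return a sine sweep as a list of 16-bit integer samples.

    Assumption: the frequency rises linearly from f_start to f_end.
    """
    n = int(round(duration * rate))
    k = (f_end - f_start) / duration  # sweep rate in Hz per second
    samples = []
    for i in range(n):
        t = i / rate
        # Phase of a linear chirp: 2*pi*(f_start*t + k*t^2/2)
        phase = 2.0 * math.pi * (f_start * t + 0.5 * k * t * t)
        samples.append(int(round(32767 * math.sin(phase))))
    return samples

sweep = sine_sweep()
print(len(sweep), "samples")
```

The resulting samples could be written to a WAV file or handed to an audio output API; a shorter or quieter sweep only requires changing `duration` or scaling the amplitude.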
Active probing has been explored as a localization approach [13,22]. Kunze and Lukowicz [13] showed how to detect a mobile phone’s symbolic location from a set of pre-defined ones by using vibration and short narrow frequency ‘beeps’. By emitting a sound and analyzing the impulse response, Rossi et al. [22] were able to classify a user’s room-level location among 20 rooms at 98% accuracy. In our work, we use a Sine Wave Sweep instead of Maximum Length Sequence (MLS) for active probing, because a sine wave sweep exhibits better tolerance to nonlinearity and time-variance of the probed spaces [6]. This allows us to tackle the goal of recognizing space type.

SYSTEM IMPLEMENTATION

Impulse Response Measurement
To perform the IR measurement, we use the microphone and the speaker already available on commodity mobile phones (see Figure 1). We developed a measurement application that first starts recording with the built-in microphone, and then it outputs a sine wave sweep from 20Hz – 20kHz from the built-in speaker. After the sine wave sweep stops, the application continues to record for one second, allowing it to capture fully the reverberation of the measured space. The measurement application records the IR at 44.1 KHz with 16 bit depth. Figure 2 shows how spectrograms of IRs collected on a Nexus 4 phone differ for a restroom and a non-restroom space.

Figure 2. (a) Impulse response of a restroom; (b) Impulse response of an office.

Stan et al. previously compared different impulse response measurements (MLS, Inverse Repeated Sequence (IRS), Time-Stretched Pulses, and Sine Wave Sweep). In this comparison, they found that the sine wave sweep is an ideal choice for non-occupied spaces. It also has the benefit of not requiring any tedious calibrations (which is important in creating a robust mobile system) [24]. Restrooms generally have very few people coming and going, and are often unoccupied. This enables the sine wave sweep to perform optimally, making it the best option to use for IR analysis. Furthermore, a commodity mobile phone’s hardware is typically optimized for the range of human hearing, which matches the ideal frequency range for using a sine wave sweep to capture IRs. Because the sweep is audible to humans, we want it to be as short as possible yet still be able to produce a noticeable probing effect. We tested sweeps of 0.01, 0.1 and 1s in duration and found that the 0.1s duration provides the best balance for capturing the acoustics of a space and minimizing the intrusiveness.

Feature Extraction
The measurement application intentionally starts recording before it outputs the sweep and stops recording one second after the sweep has completed to fully capture the effect of an IR. The IR is present in the recording after the measurement application begins to output the sweep. There is a variable latency between when the measurement application requests the OS to output the sine wave sweep and when it actually plays. Due to this delay, the sweep starts at an unknown time point. To estimate the start of the sweep: 1) Divide the IR into frames using a sliding window size W = 1024 samples (23 milliseconds). Each adjacent two windows have 50% overlap with each other. 2) Smooth each window using the Hanning function. 3) Calculate the FFT magnitudes for each window, giving the K-th (K = 0,…,W-1) FFT magnitude of each window j (j = 1…N, N: # of frames in a sweep record). The number of frames S inside the sweep itself is calculated as: [equation not recoverable from the extracted text]. 4) Estimate the index to the window at the start of the sweep using the following optimization: [equation not recoverable from the extracted text].

Processing the rest of the recording after the start of the sweep guarantees that the complete IR is analyzed. However, it may also process extraneous information because the IRs captured in different types of spaces decay at different rates. We have found experimentally that the optimal amount of the recording after the sweep starts to perform feature extraction on is 0.4 s (0.1 s of the sweep itself + 0.3 s after the sweep stops). In the evaluation section, we describe the process for determining this value through tests of different durations after the start of sweep.

For the identified optimal amount of the recording to use for feature extraction, we extract the mel-frequency cepstral coefficients (MFCCs) from each window. MFCCs are commonly used as features in the speech recognition and computational auditory scene recognition [2, 16, 21].
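The sweep-start estimation steps in the Feature Extraction section can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the window size W = 1024, the 50% overlap, and the Hanning smoothing come from the text, while the selection criterion (pick the frame index whose next S frames carry the most energy), the way S is derived from the sweep length, and the use of time-domain frame energy in place of summed FFT magnitudes are assumptions. All function names are hypothetical.

```python
import math

W = 1024          # window size in samples (from the text)
HOP = W // 2      # 50% overlap between adjacent windows (from the text)

def hann(n, size):
    """Hanning window coefficient for sample n of a window of the given size."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (size - 1))

def frame_energies(signal):
    """Energy of each Hanning-smoothed frame (stand-in for summed FFT magnitudes)."""
    window = [hann(n, W) for n in range(W)]
    energies = []
    for start in range(0, len(signal) - W + 1, HOP):
        energies.append(sum((signal[start + n] * window[n]) ** 2 for n in range(W)))
    return energies

def estimate_sweep_start(signal, sweep_samples):
    """Frame index where the energy summed over S consecutive frames is largest.

    Assumption: S is the number of hops spanned by the sweep.
    """
    energies = frame_energies(signal)
    s = max(1, sweep_samples // HOP)
    best_index, best_sum = 0, -1.0
    for i in range(len(energies) - s + 1):
        total = sum(energies[i:i + s])
        if total > best_sum:
            best_index, best_sum = i, total
    return best_index

# Toy recording: silence, a 4410-sample tone burst starting at sample 5120, then silence.
recording = [0.0] * 5120 + [math.sin(0.3 * n) for n in range(4410)] + [0.0] * 8000
start_frame = estimate_sweep_start(recording, 4410)
start_sample = start_frame * HOP
print(start_frame, start_sample)
```

Feature extraction would then operate on the 0.4 s of audio following `start_sample`, as described in the text.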
SESSION: CONTEXTUAL AWARENESS ON MOBILE DEVICES
In our implementation, we calculate the first 13 MFCCs by applying 23 Mel filters, remove the first MFCC, and then use the rest to form a feature vector for each frame j: $(m_{j,1}, m_{j,2}, \ldots, m_{j,12})$. In the final step, we aggregate all the MFCCs of all the frames by calculating the mean values of each one of the MFCCs using the following equation: $\bar{m}_k = \frac{1}{N}\sum_{j=1}^{N} m_{j,k}$ (k=1…12, N: # of frames in the optimal amount of the recording).

Classification
Classifying restrooms vs. non-restrooms falls under what is generally referred in machine learning as One Class Classification, because restroom is the target class that we are interested in identifying among all the possible spaces. However, one-class classifiers tend to be conservative in their predictions. We have tried two one-class classification algorithms (Hempstalk et al. [7] and the one-class classification in LibSVM [5]), and found that they often predict a new sample as “unknown” and therefore yield low recalls for the restroom class.

Because restrooms are built to serve the same functionality, conform to building codes, use similar construction materials, and have similar materials inside of the construct (e.g., toilets, wash basins), there should be high inner consistency in the restroom class, which at the same time might be highly distinguishable from the “non-restroom” class. Furthermore, because people spend much more time outside of restrooms, we can easily collect a variety of non-restroom data to help the classifier learn the classification boundary. Thus, we decided to treat restroom detection as a binary-class classification, which predicts current room type as either restroom or non-restroom. We leverage LibSVM [5] for classification in our evaluation.

PERFORMANCE EVALUATION
Using a Galaxy Nexus, we collected IR data for 103 public restrooms from 49 different buildings (built between the 1960s – 2006). To help the model learn the classification boundary, we also collected non-restroom data. We attempted to sample as diverse a set of non-restroom spaces as possible (e.g., hallway, elevator, locker room, outdoor, classroom, shop, and bus station). However, we note that it is hardly possible to cover all different types of non-restroom spaces that a user can potentially visit in our evaluation. Information about our collected dataset is shown in the first row of Table 1. Figure 3 shows the types of places where we collected the restroom and non-restroom data.

Figure 3. Different types of places where data was collected

For each space, we collected 30 samples at different spots perceived to support circulation. Circulation is a term in building architecture to refer to the way that people move through and interact with the space. For instance, in a men’s restroom, the typical circulation would be from the door to urinals / toilet stalls, the washstand, the paper towel racks, and then back to the door. While collecting the data, we held the phone still in one hand in front of the chest to simulate wearing the device around the neck. The phone remained stationary while recording the impulse response. We took care to hold the phone such that both the speaker and the microphone were not covered or blocked in anyway.

Optimal Amount of Recording for Feature Extraction
As previously described, the application begins recording slightly before the sweep and continues for a full second. This guarantees that the full IR is captured within the recording. We can remove extraneous information between the start of the recording and the start of the sweep by extracting features from the start of the sweep instead of the start of the recording. To identify the optimal amount of recording after the start of the sweep to use for feature extraction, we selected 10 different durations at 0.1 s interval after the sweep stops (0.1~1 s). For each duration (D = 0.1,…,1.0), we extracted features based on all 7474 recordings by the Galaxy Nexus, and performed a 10-fold cross validation with SVM as classifier. The results are shown in Figure 4. All the five performance measurements indicate that 0.3 seconds duration after the sweep has stopped is the optimal amount of the recording to use for feature extraction. It also shows that models trained with longer durations than 0.3 seconds are less accurate. Therefore, in all following evaluations, we used this finding and performed feature extraction only on the 0.4 seconds portion of the recording after the start of the sweep (0.1 s sweep itself + 0.3 s duration after the sweep has stopped).

Figure 4. Five measurements of the model using 10 different durations from the start of the sweep for the feature extraction.

Efficacy of Model in Classifying New Restroom Spaces
To assess the model’s efficacy in classifying new restroom spaces, we evaluated how the model converged as the number of restrooms used for training (training set size). Using the Galaxy Nexus dataset (103 restrooms), we
gradually increased the number of restrooms (N=1,…,103) used for training. For each number N, the evaluation procedure worked in this manner. First, we randomly chose N restrooms from the 103 restrooms. Second, we randomly chose the same percentage of non-restroom IRs from all non-restroom IRs. Thus, if the total number of restroom and non-restroom IRs are $T_R$ and $T_N$, then the number of IRs in these N restrooms is $n_R$. Therefore, the number of non-restroom IRs chosen was: $n_N = \frac{n_R}{T_R} \ast T_N$. Third, we used SVM as the classifier and performed a 10-fold cross validation on these $n_R + n_N$ IRs. Fourth, to remove variations caused by the strategy of choosing N spaces for training at random, we repeated the procedure for 10 rounds by choosing N different restrooms each round. Finally, we averaged the performances from the 10 rounds for each N training set size. The final results are shown in Figure 5.

Figure 5. Measurements of the model’s convergence with different number of restrooms used for training

As the number of training set size increases, the performance became more stable and gradually converged. The weighted F-Measure, which incorporates the precision and recall of both the restroom and non-restroom classes, converged at ~0.93. The weighted F-Measure fluctuated between 0.91 and 0.93 while the model was trained on less than 40 restrooms. This suggests that the model had not seen enough variations of restrooms at that point yet.

Generalizability of the Approach across Phones
Different phones may use different hardware. Furthermore, the microphone and the speaker can be placed at different distances and in different positions from one another. Thus, we must also validate that our approach generalizes to work on different phones. In addition to the Galaxy Nexus, we also collected additional restroom and non-restroom samples on a Nexus 4, an HTC One, and a Galaxy S. The data collected on these devices were not as extensive as the one collected on the Galaxy Nexus, but they helped to confirm the results obtained from analyzing the Galaxy Nexus data. Table 1 summarizes the amount of data collected on the four phones.

Table 1. Sample data collected on the four phones
Phone Name          # of restrooms   # of restroom data samples   # of non-restroom data samples
Galaxy Nexus (GN)   103              4258                         3216
Nexus 4             52               2230                         1296
HTC One             20               600                          523
Galaxy S            20               600                          573

We performed a 10-fold cross validation using SVM as the classifier for each phone’s data separately. The results are shown in Figure 6. It highlights that the classification models for the four different phones perform similarly well at between 0.92 and 0.98 (weighted F-Measure).

Figure 6. Model Generalization across phones: 10-fold Cross-validation Results on four phones separately (R: Restroom; N: non-restroom)

Unfortunately, different hardware settings along with software optimizations for the microphone and the speaker on different phones mean that the impulse responses captured by each phone are drastically different from each other. We applied the model trained on the Galaxy Nexus dataset and tested on the other three phones’ dataset. The 10-fold cross validation results, shown in Figure 7, reveal that the extracted MFCCs features do not generalize across phones. Phone-independent classification does not perform as well as phone-dependent classification (weighted F-Measure between 0.43 and 0.63).

Figure 7. Model trained on Galaxy Nexus dataset and tested on the other three phones’ dataset (R: restroom; N: non-restroom).

Effect of Occupancy & Sounds on the Model
We tested the model’s robustness against the occupancy of a restroom and sound generated by the occupants in a restroom (e.g. urinating, flushing, and hand washing). Using the Galaxy Nexus, we collected new data from a restroom
with two urinals, two toilet stalls, and three basins (seven functional spots in total) for the evaluation. We defined the occupancy rate as the percentage of functional spots occupied by people. Therefore, we had 14%, 29%, 43%, 57%, 71%, 86% and 100% occupancy rates. We collected 30 additional IR samples using the same method described earlier in the paper for each of the seven occupancy rates. In all seven cases, people simulated using the restroom in normal manners. We used the SVM classifier trained on the large Galaxy Nexus data corpus (described in the first row of Table 1) to test the new collected data. For the 14%, 29% and 86% occupancy rates, the model misclassified one sample each (accuracy: 97%), and correctly classified all samples for the remaining four occupancy rates.

Effect of Sweep Volume on the Model
We evaluated the influence of the sweep volume on the model to explore if the obtrusiveness of the sweep can be minimized by outputting the sweep at lower volumes. We collected 30 samples at 4 volume levels (0.1, 0.2, 0.5 and 0.75 of the max volume) from 13 additional restrooms using the Galaxy Nexus.

For comparison, we also extracted MFCCs using only the environment sound without outputting a sweep; this acted as the IR for the sweep at volume 0. We used the portion of the recording between the start of the recording and the start of the sweep (see Figure 2) as each space’s environment sound. We note that a limitation and potential threat to validity in using this approach exists because the extracted data is ~0.15s in length each, in comparison to the 0.4 s recordings at other volumes. In this instance, because there is no sine wave sweep, we did not need to include an additional 0.3 s normally used to capture the IR after the sweep plays.

We performed a 10-fold cross validation on the data collected at each volume level. The results shown in Figure 8 demonstrates that models trained on recordings of IRs after an output sweep outputted at any of the four volume levels performed better than the model trained on recordings without a sweep; there was a 10% or more improvement in accuracy. Additionally, models trained on data captured when the application outputted sweeps at higher volumes generally performed better than those at lower volumes. Although models trained on IRs captured after lower volume sweeps lose some performances, they are still comparable in performance to those after higher volume sweeps. This suggested that it is possible to output the sweep at a lower volume in real practice so that the active probing is less perceivable and thus less obtrusive.

Figure 8. Measurements of the model using 5 different volume levels. Volume 0 uses only the portion without a sweep.

Continuous In-the-Wild Sampling and Evaluation
In this section, we performed continuous in-the-wild data collection to evaluate how well our classification approach works for realistic scenarios, such as when the user wears a Galaxy Nexus around the neck (see Figure 1). The data collection application plays a 0.1 second sine wave sweep sound at 0.3 of the max volume. The application automatically repeated this procedure in 5 seconds interval.

We collected data for about 2 hours per day on 5 different days. We collected data in different places each day and as a result the restrooms were all different. This approach enabled us to test the model’s ability to classify new spaces. The temporal continuity of each day’s data allowed us to apply the temporal optimization to the SVM classifier’s predictions later. We followed the procedure described in the Feature Extraction section and again used SVM as the classifier. Table 2 summarizes the collected data (4169 non-restroom data and 1503 restroom data).

Table 2. Sample data collected in-the-wild using Galaxy Nexus.
Day   # of restrooms   # of restroom data samples   # of non-restroom data samples
1     9                378                          788
2     4                171                          485
3     3                141                          390
4     4                109                          413
5     11               704                          2093

Classifying Spaces
We evaluated the model’s ability to classify new restroom spaces in-the-wild. Because we collected data over 5 different days, the model could be trained on 1, 2, 3, 4, or 5 days’ worth of data. For each number of days, we evaluated all the possible combinations of different days that could be grouped together. We had 5, 10, 10, 5 and 1 possible combinations for 1, 2, 3, 4 and 5 days’ worth of data respectively. We used a 10-fold cross validation method to

Figure 9. Measurements of the model trained on different number of days’ data.
etooliacg9b
ImIthththiswpth“p
Sebinina
Fd
evaluate each co accurately cl
ones, taking intikely to revis
averaged the combinations fget the mean a9, illustrate thabalanced precis
mproving PredIn this section, he model’s pehe-wild data thhem. The firssolated predic
second type ofwhen the userprediction doeshird type of
“sporadic” andperiod of time.
Spark and spoeliminated to because it can bn or out of thenterval). Howe
a smoothing alg
Figure 10. Evaluadifferent window
combination. Tlassify previouto consideratiosit some plac
10-fold crosfor a given numaverage valuesat our model sion and recall
diction Errors we describe th
erformance in hat we collectst type of errotion of one cl
f error is a “bor enters or ls not reflect th
error is whed multiple wro
oradic predictisome extent
be assumed thae restroom for ever, boundarygorithm.
ation results of tw size (top: alg. 1
This tested theusly seen spaceon the fact thates and restross validation mber of days’ . The results, scan predict n(weighted F-M
he prediction eclassifying the
ted and discussor is a “sparklass instead ofoundary” errorleaves the reshat transition imen the SVM ng results are
ion errors potby a smoot
at people normonly 5 second
y errors cannot
the two smoothin1; bottom: alg. 2)
e model’s abilies as well as net a person wouoms. Then, wresults of a
worth of data shown in Figu
new spaces wiMeasure > 0.92
errors that affee continuous is how to correk,” which is f the other. Th, which happestroom, but thmmediately. Th
predictions areturned over
tentially can bthing algorithm
mally do not jumds (the samplinbe addressed b
ng algorithms w)
ity ew uld we all to
ure ith
2).
ect in-ect an he ns he he
are r a
be m, mp ng by
Smoothing Algorithm I. The first algorithm keeps track of the current space type predicted by the SVM model. When a different space type is predicted by the model, the algorithm holds this new prediction and keeps receiving further predictions from the model for a pre-set window size buffer N. If the majority of the predictions in the buffer match this different space type, then our algorithm corrects all buffered predictions to the new space type and also changes the current space type that the user is in to this new one. If the majority of the buffered predictions still match the current space type, then all the buffered predictions are classified as the current space type and the current space type remains unchanged.

Smoothing Algorithm II. Smoothing algorithm II is almost the same as the first one, except that it treats the transitions from restroom (non-restroom) to non-restroom (restroom) differently. People typically spend a small amount of their time in a restroom (people have reported spending approximately only 5 minutes in public restrooms [12]). Thus, to minimize potential misclassifications of actual transitions into a restroom space, algorithm II reduces the number of restroom predictions (> 1/3 * N instead of > ½ * N) that must be in the buffer window to correct an error.

We tested the two smoothing algorithms using window sizes from 3 to 30. Smoothing algorithms require temporal continuity in the test data. Therefore, we used a leave-one-day’s-data-out strategy for evaluation. For each one of the five combinations, we first trained the SVM model using four days’ worth of data, and tested on the remaining day’s data. Then, we applied our smoothing algorithms to the SVM prediction results. Finally, we averaged the evaluation results of the five combinations for each window size. The performance results are reported in Figure 10. Window size zero means no smoothing algorithm was used. The difference in the performance compared to the one shown in Figure 9 is due to the fact that the cross-validation strategy allows the classifier to “see” a portion of each day’s data during the training phase while “leaving one day’s data out” does not. We expect that performance will increase when trained with more days to increase the variations of restrooms. As the window size increases, the overall performance improves at first. Large window sizes, however, hurt the performance. One possible reason is that larger window sizes might cause more boundary errors during room type transitions due to the majority voting strategy used in the smoothing algorithms. Compared to algorithm I, algorithm II improves the recall of restroom but sacrifices the precision of restroom.

CONCLUSION
In this paper, we described an infrastructure-independent method of detecting restrooms by actively probing the acoustics of an environment with the built-in speaker and microphone on mobile phones. Our evaluations demonstrate that IRs captured after a sine wave sweep is outputted improve the accuracy of prediction compared
against only using the environment sound without a sweep. Models can be developed on different phones to classify new restrooms with a weighted F-Measure of 0.92~0.98. Occupancy, the presence of sounds, and the volume levels of the sweep do not affect the model’s performance in significant ways. We discuss three types of errors that affect the prediction model and propose temporal smoothing algorithms to improve the prediction accuracy.
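The two temporal smoothing algorithms described in the Smoothing Algorithm I and II paragraphs can be sketched roughly as follows. This is an illustrative reading of the prose, not the authors' implementation. The names `smooth`, `_vote`, `RESTROOM`, `OTHER`, and the `favor_restroom` flag are hypothetical. The sketch assumes that the buffer of size N includes the prediction that triggered it, that a buffer cut short by the end of the data is resolved on its actual length, and that under algorithm II the restroom label wins in either transition direction whenever it holds more than 1/3 of the buffer.

```python
from typing import List, Sequence

RESTROOM = "restroom"      # hypothetical label names
OTHER = "non-restroom"


def _vote(buffer: List[str], current: str, candidate: str,
          favor_restroom: bool) -> str:
    """Decide which space type the whole buffer is relabeled to."""
    n = len(buffer)
    if favor_restroom:
        # Algorithm II (as read here): restroom needs only > 1/3 of the buffer.
        return RESTROOM if 3 * buffer.count(RESTROOM) > n else OTHER
    # Algorithm I: the new type needs a strict majority, else keep the current one.
    return candidate if 2 * buffer.count(candidate) > n else current


def smooth(predictions: Sequence[str], window: int,
           favor_restroom: bool = False) -> List[str]:
    """Smooth a time-ordered sequence of per-sample SVM predictions."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed: List[str] = []
    if len(predictions) == 0:
        return smoothed
    current = predictions[0]            # current space type
    i = 0
    while i < len(predictions):
        candidate = predictions[i]
        if candidate == current:
            smoothed.append(current)
            i += 1
            continue
        # A different type was predicted: buffer up to `window` predictions
        # (assumption: the triggering prediction is part of the buffer).
        buffer = list(predictions[i:i + window])
        current = _vote(buffer, current, candidate, favor_restroom)
        smoothed.extend([current] * len(buffer))
        i += len(buffer)
    return smoothed
```

Calling `smooth(preds, window=N)` corresponds to the majority rule of algorithm I, and `smooth(preds, window=N, favor_restroom=True)` to the relaxed 1/3 rule of algorithm II. Because the whole buffer is relabeled in one step, a larger window relabels more samples around a real transition, which is consistent with the observation that large windows can increase boundary errors.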
REFERENCES
1. Agnihotri, S., Rovet, J., Cameron, D., Rasmussen, C., Ryan, J. and Keightley, M. SenseCam as an everyday memory rehabilitation tool for youth with fetal alcohol spectrum disorder. In Proc. SenseCam 2013, ACM (2013), 86-87.
2. Beritelli, F. and Grasso, R. A Pattern Recognition System for Environmental Sound Classification based on MFCCs and Neural Networks. In Proc. ICSPCS 2008, IEEE (2008), 1-4.
3. Byrne, D., Doherty, A., Jones, G.F., Smeaton, A., Kumpulainen, S. and Järvelin, K. The SenseCam as a tool for task observation. In Proc. BCS-HCI '08, Vol. 2 (2008), 19-22.
4. Caprani, N., O'Connor, N., Gurrin, C. Experiencing SenseCam: a case study interview exploring seven years living with a wearable camera. In Proc. SenseCam 2013, ACM (2013), 52-59.
5. Chang, C., Lin, C. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, vol. 2(3), 2011, 188-205.
6. Farina, A. Simultaneous Measurement of Impulse Response and Distortion with a Swept-Sine Technique. 108th Convention of the Audio Engineering Society, Paris, France. February 2000, 19-22.
7. Hempstalk, K., Frank, E., Witten, I. One-Class Classification by Combining Density and Class Probability Estimation. In Proc. ECML PKDD 2008, 505-519.
8. Hodges, S., Williams, L., Berry, E., Izadi, S., Srinivasan, J., Butler, A., Smyth, G., Kapur, N., Wood, K. SenseCam: A Retrospective Memory Aid. In Proc. Ubicomp 2006, ACM (2006), 177-193.
9. Keadle, S., Lyden, J., Hickey, A., Ray, E., Fowke, J., Freedson, P., Matthews, C. Validation of a previous day recall for measuring the location and purpose of active and sedentary behaviors compared to direct observation. Int J Behav Nutr Phys Act, 2014, 11:12.
10. Kerr, J., Marshall, S., Godbole, S., Chen, J., Legge, A., Doherty, A., Kelly, P., Oliver, M., Badland, H., Foster, C. Using the SenseCam to improve classifications of sedentary behavior in free-living settings. American Journal of Preventive Medicine, 44(3), 2013, 290–296.
11. Kelly, P., Marshall, S., Badland, H., Kerr, J., Oliver, M., Doherty, A., Foster, C. An ethical framework for automated, wearable cameras in health behavior research. American Journal of Preventive Medicine, 44(3), 2013, 314-319.
12. Kientz, J.A., Choe, E.K., Truong, K.N. Texting from the Toilet: Mobile Computing Use and Acceptance in Private and Public Restrooms. Knowledge Media Design Institute, University of Toronto. Technical Report KMD-13-1. April 2013.
13. Kunze, K., Lukowicz, P. Symbolic object localization through active sampling of acceleration and sound signatures. In Proc. Ubicomp’07, ACM (2007), 163-180.
14. Kuttruff, H. Room acoustics. CRC Press, 2009, 160-292.
15. Marcu, G., Dey, A., Kiesler, S. Parent-driven use of wearable cameras for autism support: a field study with families. In Proc. Ubicomp’12. ACM (2012), 401-410.
16. Peltonen, V., Tuomi, J., Klapuri, A., Huopaniemi, J., Sorsa, T. Computational auditory scene recognition. In Proc. ICASSP 2001, IEEE (2001), 1941-1944.
17. Perina, A., Jojic, N. In the sight of my wearable camera: Classifying my visual experience. Eprint arXiv:1304.7236, 04/2013.
18. Piccardi, L., Noris, B., Barbey, O., Billard, A., Schiavone, G., Keller, F., von Hofsten, C. WearCam: A head mounted wireless camera for monitoring gaze attention and for the diagnosis of developmental disorders in young children. In Proc. RO-MAN 2007, IEEE (2007), 594-598.
19. Pirsiavash, H. and Ramanan, D. Detecting activities of daily living in first-person camera views. In Proc. CVPR 2012, IEEE (2012), 2847-2854.
20. Quattoni, A. and Torralba, A. Recognizing indoor scenes. In Proc. CVPR’09, IEEE (2009), 413–420.
21. Rossi, M., Feese, S., Amft, O., Braune, N., Martis, S., Tröster, G. AmbientSense: A real-time ambient sound recognition system for smartphones. In Proc. PerCom Workshop, IEEE (2013), 230-235.
22. Rossi, M., Seiter, J., Amft, O., Buchmeier, S. and Tröster, G. RoomSense: an indoor positioning system for smartphones using active sound probing. In Proc. AH 2013, ACM (2013), 89-95.
23. Rossing, T., Moore, R., Wheeler, P. The Science of Sound, 3rd Edition. Addison-Wesley. 2001.
24. Stan, G., Embrechts, J., and Archambeau, D. Comparison of different impulse response measurement techniques. Journal of the Audio Engineering Society, 50(4), 2002, 249-262.
25. Templeman, R., Korayem, M., Crandall, D., Kapadia, A. PlaceAvoider: Steering First-Person Cameras away from Sensitive Spaces. In Proc. NDSS 2014.
26. Tian, Y., Yang, X., Yi, C. and Arditi, A. Toward a computer vision-based wayfinding aid for blind persons to access unfamiliar indoor environments. Machine Vision and Applications, 24(3), 2013, 521-535.
ISWC '14, SEPTEMBER 13 - 17, 2014, SEATTLE, WA, USA