
Mitigating the Effects of Delayed Virtual Agent Response Time Using Conversational Fillers

Halim-Antoine Boukaram
American University of Science and Technology
Beirut, Lebanon
[email protected]

Micheline Ziadee
American University of Science and Technology
Beirut, Lebanon
[email protected]

Majd Sakr
Carnegie Mellon University
Pittsburgh, Pennsylvania, USA
[email protected]

ABSTRACT

Virtual agents increasingly rely on cloud-based services, which makes them vulnerable to unpredictable network latency and service response times. In this work, we evaluate the use of conversational fillers to mitigate the impact of delay in system response time on users' perception of a virtual agent. These fillers are uttered by the agent to keep the user engaged until the response is ready. We present the findings of a study run on the Mechanical Turk platform with 360 participants who interacted with a virtual agent. We tested two types of conversational fillers. The first type were generic utterances that simply asked the user to hold. The second adopted contextualized fillers that assume some semantic knowledge of the input and contain some of its elements. To test the generalizability of the different fillers, we ran two task-based experiments. The first task was to get a recipe and the second was to find a restaurant. Contextualized fillers positively affected participants' rating of the agent's response time but did not impact the agent's likeability.

CCS CONCEPTS

• Human-centered computing → Empirical studies in HCI; User studies.

KEYWORDS

conversational fillers, context, system response time, conversational agents, human-agent interaction, user studies, response time delays

ACM Reference Format:
Halim-Antoine Boukaram, Micheline Ziadee, and Majd Sakr. 2021. Mitigating the Effects of Delayed Virtual Agent Response Time Using Conversational Fillers. In Proceedings of the 9th Int'l Conference on Human-Agent Interaction (HAI '21), November 9–11, 2021, Nagoya, Japan. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3472307.3484181

1 INTRODUCTION

Virtual conversational agents have become more dependent on cloud-based operations for speech recognition, natural language understanding, information lookup, natural language generation, and speech generation, all of which are susceptible to variations in latency. The reasons for using cloud-based operations include low local processing performance and high external provider accuracy. While such operations enrich agents' functionality and usability, they introduce a disadvantage: added variability in system response time (SRT), the time elapsed between a user's query and an agent's response. When a user queries a virtual agent, system modules that function sequentially may experience latencies that accumulate and result in a delayed response to the user's query.

In a typical interaction between a user and a conversational agent, the user speaks to the agent, which performs automatic speech recognition (ASR) to produce a text input for the natural language understanding (NLU) module. The resulting structured output is used to look up the required information (Info). From the information, a text phrase is generated using a natural language generation (NLG) module. The text is input to a text-to-speech (TTS) engine to generate audio (possibly including word timing information). In agents that rely on cloud-based operations for improved accuracy, a possible source of uncertain delay in SRT is the cloud processing time and network latency (Figure 1). Figures 2, 3, and 4 show the cumulative distribution function of response times for various cloud-based services. Several thousand requests were made over a period of one week from two different geographic locations. The cloud service provider endpoints are automatically selected by the service's software development kit. The figures show that there is significant variation in response time depending on the source of the requests, and that there are significant differences between the mean response time and the 99.9th percentile response time, a well-known issue [19].
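As an illustration of how these sequential stage latencies compound, consider the following minimal sketch (ours, not the measurement code behind the figures; the per-stage mean latencies and the exponential latency model are assumptions chosen only to show the effect):

```python
# Minimal sketch: SRT as the sum of sequential cloud-stage latencies.
# Stage names follow the pipeline above; the mean latencies and the
# exponential model are illustrative assumptions, not measured values.
import random
import statistics

STAGES = [("ASR", 0.40), ("NLU", 0.05), ("Info", 0.60), ("NLG", 0.05), ("TTS", 0.45)]

def simulate_srt() -> float:
    """Total system response time for one query, in seconds."""
    return sum(random.expovariate(1.0 / mean) for _, mean in STAGES)

samples = sorted(simulate_srt() for _ in range(100_000))
mean_srt = statistics.fmean(samples)
p999 = samples[int(0.999 * len(samples))]
print(f"mean SRT: {mean_srt:.2f} s, 99.9th percentile: {p999:.2f} s")
```

Even in this toy model, the 99.9th percentile of the summed latencies sits well above the mean, which is the same mean-versus-tail gap visible in the figures.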

Figure 1: Typical user/agent interaction. The cloud-based ASR, Info, and TTS services are vulnerable to unpredictable network latency and service response times.

Delays in module performance and communication have a negative impact on user experience and perception of an agent.


Figure 2: Example of delays and variability in response time for Automatic Speech Recognition (ASR). Requests were made from two different geographic locations (Geo Location) between the first and eighth of September, 2021.

Figure 3: Example of delays and variability in response time for weather information retrieval (Info). Requests were made from two different geographic locations (Geo Location) between the first and eighth of September, 2021. The 99.9th percentile of requests from Geo Location 2 (36.93 sec) is not shown.

Figure 4: Example of delays and variability in response time for Text-to-Speech (TTS). Requests were made from two different geographic locations (Geo Location) between the first and eighth of September, 2021.

Any deployed agent that utilizes services such as those shown in Figures 2, 3, and 4 should be able to mitigate these delays when they occur.


Recent studies propose the use of conversational fillers to keep the user engaged while the response is being prepared [6, 18]. In this study, we evaluate the effectiveness of two types of conversational fillers, uncontextualized and contextualized, in mitigating the effects of delays that exceed the acceptable threshold in an agent's response time. Uncontextualized fillers are generic utterances that simply ask the user to hold until the response is ready. Contextualized fillers assume some semantic knowledge of the input query and contain some elements of the input. In addition, we ran two task-based experiments to test whether the effects of delays and conversational fillers differ depending on the task.

2 RELATED WORK

Delays and silences occur naturally in human-human verbal communication [4, 8], and they are deemed useful in conversational systems as they render the conversation more natural [1, 7, 13]. Long delays, however, can have a negative effect on users' experience and their perception of a robot or an agent. In a study done with a communication robot, Shiwa et al. [14] found that users' preferred response time is one second. Although users can become habituated to delays, they still prefer a one-second system response time [15]. In fact, studies show that delays negatively affect users' experience and their evaluation of an agent or a robot and of the interaction experience. For example, Yang and Dorneich [20] found that delays in system response in a remote-controlled robot vehicle resulted in increased anger and decreased satisfaction for users. A study done with a robot receptionist deployed in an open environment found that the robot's ability to respond quickly was the main factor affecting users' perception of the interaction's quality, and that errors made by the robot were tolerated more than long response latencies [11].

Given the negative effects of delays on human-robot interaction (HRI) and human-agent interaction (HAI), some studies have investigated the use of incremental speech processing and generation as a way to minimize delays in SRT. For example, Tsai et al. [17] were able to save 1.7 seconds on average per user-agent interaction, and Skantze and Hjalmarsson [16] were able to reduce their conversational system's response time by 0.6 seconds. While this method is successful in reducing SRT, it does not address network-related delays.

Other studies have suggested a mitigation technique that consists of using conversational fillers [5, 6]. These fillers are sounds, like uh and uhm, that are used in everyday speech and fulfill functions like expressing approval or managing turn taking [2]. A recent study [9] suggests that simple time-buying fillers negatively impact the perceived intelligence and likeability of a robot. More complex fillers were used by Wigdor et al. [18] in an experiment where children interacted with a robot that used "acknowledgement fillers" and "pensive fillers". Acknowledgement fillers consisted of non-lexical utterances (like aha) accompanied by iconic gestures (such as nodding); pensive fillers used lexical utterances (like let me think) instead of non-lexical utterances. These conversational fillers were found to be effective in mitigating the negative effects of robot response delays without decreasing perceptions of the robot's intelligence. Gambino et al. [6] ran a study in which two information systems were rated. The first asked users to wait and remained silent until the response was ready; the second used several conversational fillers, including generic utterances (like uh) and utterances that assumed that the semantics of the user query were understood (like I've found a flight for you). Users rated the second system higher in terms of appropriate waiting time even though the time to response was similar in both systems.

3 EXPERIMENT

The aim of this study is to test different strategies for mitigating the negative effects of delays in a virtual agent's response time to user queries. For this purpose, we designed an experiment in which participants interact with a virtual agent by asking about the steps required to perform a task. We introduced two types of conversational fillers uttered by the agent to test whether they could mitigate the effects of long delays in response time. The study included two experiments consisting of two task-based conversations with an agent. The first task was to complete a recipe and the second was to find a restaurant. Results of the two experiments were compared to see if the perception of response time and the effects of conversational fillers were consistent across different tasks.

3.1 Design

We used a web-based female virtual agent that features a 3D face (Figure 5). The agent's lips are synchronized with her speech, which is produced using the Festival text-to-speech engine. The speech output is accompanied by a speech bubble showing the text of her utterances. While idle, the agent has random gaze and neck movements along with other details like blinking, breathing, and brow movements.

Figure 5: Our agent

Our study consisted of two between-subjects experiments in which participants interacted with a virtual agent online. In order to exercise control over the flow of the interaction, in terms of content and timing, we restricted the participants' queries to the agent by using a predefined set of inputs that could be cached in the browser at load time. The interaction scenario was using the virtual agent as an assistant that helps users perform a task. The tasks we chose were making pancakes and finding a restaurant. We chose queries that reflect the different steps required for completing these tasks and that, in a real-world setting, would require network-dependent lookups. Figure 5 shows an example of a prompt where the user can choose from multiple options (for the recipe task). The interaction starts with the user selecting what they want to do and getting a response without any delay. After that, there were eight question-answer turns where the participant selects a question and waits for a response from the agent. By choosing eight turns, we wanted to make sure that the interaction would not be too long, so as to avoid boredom, but at the same time long enough to allow participants to form an impression of the agent. For each of the eight turns, the user was presented with one to four query options (Figure 5).

The experiments aimed at evaluating the effectiveness of conversational fillers in mitigating delays in the agent's response time and consisted of three filler conditions: No Conversational Filler (NCF), Uncontextualized Conversational Filler (UCF), and Contextualized Conversational Filler (CCF). In the NCF condition, there was only a long silence between the user's query and the agent's response. In the UCF condition, we used generic fillers (for example, "Hold on a minute please") employed by the agent to fill the silence between the participant's query and the corresponding response. The fillers used in the CCF condition perform the same function, but they also indicate that the agent understands what the participant is asking. This is conveyed by echoing some content of the question in the filler, such as "Let's see how much you need" for the query "I'd like to add some blueberries, how much do I need?" (for the recipe task) and "Checking the opening hours" for the query "What are their opening hours?" (for the restaurant task). The two types of fillers (UCF and CCF) were used to test whether expressing an understanding of the participant's question, as opposed to just indicating that the participant should wait, affects the perception of long delays in SRT and the perception of the agent (for example, the agent's competence). In a prestudy, we found that if the agent used a filler at every turn, users rated it more negatively, with some even commenting that they would prefer silence to a filler at every turn. As a result, fillers (whether UCF or CCF) were only used in half of the turns, alternating between a turn with a delay and a filler and a turn with no delay and no filler. To minimize the effect of order, participants within the same condition were equally distributed between interactions that start with a delay and interactions that start with no delay. To maintain consistency across the three conditions, the NCF condition was also designed to have turns alternate between a turn with delay and a turn with no delay (only 0.5 sec between user query and agent response).

Contextualized fillers allow longer, more meaningful utterances than uncontextualized fillers that simply ask the user to hold. However, in order to maintain the same SRT for all turns and to make direct comparisons between the types of fillers, we made sure that all fillers (UCF and CCF) were 1.8 seconds long (±0.1 seconds). This is in contrast to [6], which featured fillers of different lengths, resulting in one filler condition having longer silences than the other. In our study, we wanted to eliminate any effects that might occur due to differences in the length of silences. This was done by having the same length of silence in the UCF and CCF conditions. In both of these conditions, there was a 2.5-second delay before the filler and a 1.5-second delay between the filler and the response. We chose 2.5 seconds for the first delay so that the user is made aware that they are waiting, in accordance with the "two second rule" [10]. We chose a 1.5-second delay after the filler as it was seen to be the most appropriate in a prestudy. Figure 6 shows the different experimental conditions.
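The per-turn timing can be summarized in a short scheduling sketch (ours, with hypothetical function and variable names; we also assume that delayed NCF turns stayed silent for the same total duration as the filler conditions):

```python
# Sketch of the per-turn timing described above: filler turns use a 2.5 s
# pre-filler delay, a ~1.8 s filler, and a 1.5 s post-filler delay; non-delayed
# turns answer after 0.5 s. Names and the NCF total are our assumptions.
import time
from typing import Callable, Optional

def run_turn(speak: Callable[[str], None], response: str,
             filler: Optional[str], delayed: bool) -> None:
    """Play one question-answer turn with the condition's timing."""
    if delayed and filler is not None:   # UCF/CCF delayed turn
        time.sleep(2.5)                  # user notices the wait ("two second rule")
        speak(filler)                    # filler utterance, ~1.8 s of speech
        time.sleep(1.5)                  # pause between filler and response
    elif delayed:                        # NCF delayed turn: silence only
        time.sleep(2.5 + 1.8 + 1.5)      # assumed to match the filler conditions' SRT
    else:
        time.sleep(0.5)                  # non-delayed turn
    speak(response)

# Example: turns alternate between delayed and non-delayed, per the design.
turns = [("They open at 10am.", "Checking the opening hours."),
         ("Yes, they deliver.", "Let me check that for you.")]
for i, (answer, filler) in enumerate(turns):
    run_turn(print, answer, filler, delayed=(i % 2 == 0))
```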

Figure 6: Experimental conditions

3.2 Procedure

We used the Mechanical Turk platform (https://www.mturk.com/) to run our experiment. After accepting the Mechanical Turk task, participants were redirected to the study's landing page, which showed the consent form. Then they were asked to fill out demographic information including age, gender, nationality, and education level. The participants then filled out the Negative Attitude towards Robots Survey (NARS) [12], which measures people's attitudes towards robots that may prevent them from interacting with conversational robots (a higher NARS score means a more negative attitude towards robots and, hence, less willingness to interact with them). Participants then filled out a questionnaire to gauge how experienced they are with robots and technology. Before proceeding to the interaction, participants were presented with instructions to test their volume settings to ensure that they could hear the agent clearly. After that, they were told that they would see an agent on the next page along with one or more questions. Participants were instructed to pick one question to ask the agent when presented with more than one question. Participants were randomly assigned to a conversational filler condition.

Once they were done with the interaction, participants were asked to fill out a post-interaction questionnaire. The first question (Q1: To what extent was the response time of the agent acceptable?) asked the participants to evaluate the agent's response time on a 5-point Likert scale. The second, third, and fourth questions were 5-point Likert scales evaluating the agent's likeability (Q2: How much do you agree with the following statement: I found the agent to be likeable), whether the participants would be willing to spend time with the agent in the future (Q3: How much do you agree with the following statement: I would be willing to interact with/spend time with the agent), and how willing they would be to use the agent given its response time (Q4: Given the response time of the agent, how willing would you be to use it in the future?). We also measured participants' perception of the agent's competence using the RoSAS [3] competence subscale (Q5, 5-point Likert), anticipating that delays in response time would have a negative effect on users' perception of the agent's competence and that the use of conversational fillers could mitigate this effect. Finally, there were two open-ended questions where the participants could give their feedback on the study and on the agent.

Participants were then given a Mechanical Turk code to redeem their compensation. On average, the whole procedure took 5.5 minutes. Participants were not allowed to use mobile phones or tablets, to avoid potential problems with rendering the 3D face. Also, they could not use the Safari browser due to an issue in how it handles audio playback permission.

3.3 Participants

We used the Mechanical Turk platform to recruit participants, who were paid $1.25 for participation. In order to avoid any cross-cultural differences, we restricted participation to users from the United States. We also used the Mechanical Turk "master" filter.

In all, we had 384 participants, of which 24 were discarded because the agent reported running at a very low frame rate (< 10 frames per second), which could indicate that it did not run properly. We were left with 360 participants: 185 males, 175 females, and 1 "other". The mean age was 42. There were no significant differences in the age or gender distributions across conditions. Participants were equally divided among filler conditions within each experiment: 60 (NCF), 60 (UCF), and 60 (CCF). For each filler condition, the 60 participants were equally divided between those who started with a delay and those who started with no delay, to ensure there was no effect of order.

The median time taken to complete the whole process was 5 minutes 40 seconds for the recipe experiment (of which 1 minute 40 seconds was spent interacting with the agent) and 5 minutes 2 seconds for the restaurant experiment (of which 1 minute 17 seconds was spent interacting with the agent).

3.4 Hypotheses

We had four hypotheses:

H1: Compared to NCF, UCF will have a positive effect on participants' rating of the agent's response time and on participants' evaluation of the agent's likeability, competence, and usability.

H2: Compared to NCF, CCF will have a positive effect on participants' rating of the agent's response time and on participants' evaluation of the agent's likeability, competence, and usability.

H3: Compared to UCF, CCF will have a more positive effect on participants' rating of the agent's response time and their evaluation of the agent's likeability, competence, and usability.

H4: The results will be generalizable across both tasks.

4 RESULTS

We now present the results from our experiments, in which we investigate how perception of the agent (timing of response, likeability, willingness to spend time, competence, usability) was affected by the experimental conditions and participants' attitudes towards robots (NARS). We also compare the results of the two tasks/experiments. All statistical analyses were performed within the framework of a generalised linear regression model (GLM). When testing for the impact of the conversational filler on the measured variables, Kruskal–Wallis and Wilcoxon rank sum tests were used instead of ANOVA and Tukey tests, since the residuals only passed the heteroscedasticity test and not the normality test. The Wilcoxon tests use BH-adjusted p-values. There was no statistically significant impact of age, gender, education, or the order of the long and short delays on the measured variables.
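As a sketch of this analysis pipeline (illustrative only, not the study's scripts; the ratings below are toy values, not study data), a Kruskal–Wallis omnibus test across the three filler conditions can be followed by pairwise Wilcoxon rank sum (Mann–Whitney) tests with BH-adjusted p-values:

```python
# Kruskal-Wallis omnibus test across conditions, then pairwise Wilcoxon rank
# sum tests with Benjamini-Hochberg (BH) adjustment. Toy Likert ratings.
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu
from statsmodels.stats.multitest import multipletests

ratings = {  # Q1 response-time ratings on a 1-5 Likert scale (toy values)
    "NCF": [2, 3, 2, 1, 3, 2, 2, 3],
    "UCF": [3, 4, 3, 4, 3, 3, 4, 2],
    "CCF": [4, 4, 5, 3, 4, 4, 3, 5],
}

h, p = kruskal(*ratings.values())
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.4f}")

pairs = list(combinations(ratings, 2))
raw_p = [mannwhitneyu(ratings[a], ratings[b]).pvalue for a, b in pairs]
_, adj_p, _, _ = multipletests(raw_p, method="fdr_bh")  # BH adjustment
for (a, b), p_adj in zip(pairs, adj_p):
    print(f"{a} vs {b}: BH-adjusted p = {p_adj:.4f}")
```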

4.1 Impact of Conversational Filler on Perception of Agent's Response Time

A significant effect of conversational filler was found on the participants' perception of the agent's response time (Q1) in both experiments. The agent's response time was perceived to be more acceptable when uncontextualized or contextualized fillers were used compared to when no fillers were used. These results can be seen in Figures 7 and 8.

4.2 Impact of Conversational Filler on Agent's Likeability and on Participants' Willingness to Interact with Agent

We found no significant effect of the type of conversational filler on ratings of the agent's likeability (Q2) or on the participants' willingness to interact with the agent (Q3) in either of the tasks. Instead, we found a strong effect of NARS on the likeability and willingness-to-interact measures regardless of filler condition. Participants with a higher NARS score (more negative attitudes towards robots) found the agent to be less likeable and were less willing to interact with it (Figures 9 and 10 and Table 1).

Table 1: Statistical impact per point of NARS on agent likeability and willingness to interact with agent

Task        Measure                              NARS impact   t(178)   p-value
Recipe      Agent likeability                    -0.367        -3.52    0.0005488
Recipe      Willingness to interact with agent   -0.537        -4.775   3.75e-06
Restaurant  Agent likeability                    -0.27936      -2.882   0.00444
Restaurant  Willingness to interact with agent   -0.6294       -5.731   4.18e-08
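To show the shape of the per-point NARS estimates in Table 1, the hedged sketch below fits an ordinary linear regression (one member of the GLM family used here) of likeability on NARS; the data are synthetic, generated with a slope near the reported recipe-task coefficient, and the column names are ours:

```python
# Synthetic illustration of a per-point NARS effect estimate; not study data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
nars = rng.uniform(1, 5, size=180)                       # NARS score per participant
likeability = 4.5 - 0.37 * nars + rng.normal(0, 0.8, 180)  # slope near Table 1's -0.367
df = pd.DataFrame({"nars": nars, "likeability": likeability})

fit = smf.ols("likeability ~ nars", data=df).fit()
print(f"NARS impact: {fit.params['nars']:.3f}, "
      f"t = {fit.tvalues['nars']:.2f}, p = {fit.pvalues['nars']:.2e}")
```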

4.3 Impact of Conversational Filler on Agent's Perceived Usability

There was a strong effect of conversational filler on the agent's perceived usability given its response time (Q4). Participants in the CCF condition were more willing to use the agent in the future given its response time than those in the NCF and UCF conditions (Figures 11 and 12).


Figure 7: Rating of agent's response time across filler conditions for the recipe task.

Figure 8: Rating of agent's response time across filler conditions for the restaurant task.

Figure 9: Impact of NARS on agent likeability and willingness to interact with the agent for the recipe task.

Figure 10: Impact of NARS on agent likeability and willingness to interact with the agent for the restaurant task.

Figure 11: Effect of conversational filler on willingness to interact with the agent given its response time for the recipe task.

Figure 12: Effect of conversational filler on willingness to interact with the agent given its response time for the restaurant task.


Figure 13: Effect of NARS on willingness to use the agent given its response time for the recipe task.

Figure 14: Effect of NARS on willingness to use the agent given its response time for the restaurant task.

Figure 15: Rating of agent's competence across filler conditions for the recipe task.

Figure 16: Rating of agent's competence across filler conditions for the restaurant task.

We also found a significant effect of NARS on this measure. Participants with higher NARS scores (more negative attitudes towards robots) were less willing to use the agent in the future (Figures 13 and 14) (-0.431 per point of NARS, t(176) = -3.591, p = 0.000426 for the recipe task; -0.265 per point of NARS, t(176) = -2.495, p = 0.0135 for the restaurant task).

4.4 Impact of Conversational Filler on Perceived Agent's Competence

For the perception of the agent's competence (Q5), we found that only CCF improved over NCF, and only in the restaurant task. The improvement in the recipe task was not statistically significant (Figures 15 and 16).

4.5 Impact of Task on Measures

No significant impact of the task on the measured metrics was detected (Table 2).

Table 2: Results of tests for differences in measures between tasks

Measure                   p-value
Agent response time       0.64
Likeability               0.5
Willingness to interact   0.96
Competence                0.46
Willingness to use        0.48


5 DISCUSSION

Our results partially support Hypothesis 1. Using uncontextualized fillers significantly improved the participants' perception of the agent's response time. However, there was no statistically significant improvement in their perception of the agent's likeability and competence, or in their willingness to use the agent.

Hypothesis 2 was partially supported by the results. When the agent used fillers that implied an understanding of the question, participants evaluated the agent's response time as more acceptable than in the condition with no fillers. We also expected that conversational fillers would positively affect participants' evaluation of the agent's likeability, competence, and usability. This was only true for the question on willingness to use the agent in the future (Q4).

Hypothesis 3 was also partially supported by our results. We expected that contextualized conversational fillers would have a more positive effect than uncontextualized fillers on the rating of the agent's response time, likeability, usability, and competence. Only the question on participants' willingness to use the agent in the future (Q4) gave a statistically significant result.

Hypothesis 4 was supported. Our results show that the task does not have any significant impact on the metrics.

The advantage of uncontextualized conversational fillers over contextualized ones is that they do not require any semantic processing of the input, such as intent detection (for example, flight reservation requests) or entity extraction (for example, dates and times), which are typically part of the NLU module. They can also be cached ahead of time and triggered in a timely manner. However, when considering a user's willingness to use an agent, we found that contextualized fillers were better able to mitigate the effects of long delays.
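This trade-off can be illustrated with a small sketch (ours, not the authors' implementation; the intent names and filler templates are hypothetical): generic fillers come from a fixed cache and need no semantic processing, while contextualized fillers are available only once the NLU module has produced an intent.

```python
# Cached generic fillers vs. template-based contextualized fillers keyed by
# a (hypothetical) NLU intent. Names and templates are illustrative.
import random
from typing import Optional

UNCONTEXTUALIZED = ["Hold on a minute please.", "One moment please."]

CCF_TEMPLATES = {  # keyed by NLU intent (hypothetical intent names)
    "ingredient_amount": "Let's see how much you need.",
    "opening_hours": "Checking the opening hours.",
}

def pick_filler(intent: Optional[str]) -> str:
    """Prefer a contextualized filler when the intent is known; fall back
    to a cached generic filler, which needs no semantic processing."""
    if intent is not None and intent in CCF_TEMPLATES:
        return CCF_TEMPLATES[intent]
    return random.choice(UNCONTEXTUALIZED)

print(pick_filler("opening_hours"))  # -> "Checking the opening hours."
print(pick_filler(None))             # -> a generic hold filler
```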

6 LIMITATIONS

In order to exclusively detect the impact of timing delays on the participants' perceptions of the agent, we exercised maximum possible control over the flow of the study by restricting the participants' queries to a predefined set from which the participants could choose. In this study, we only had one invocation of the delay mitigation tactic, where we use a conversational filler to mitigate the impact of long delays between the participant's queries and the agent's response. Also, all conversational fillers were generated to be the same duration. This was an intentional limitation of scope so as to simplify the experiment, but in reality, contextualized conversational fillers would tend to be longer than uncontextualized ones, since there are elements of the input query that could be worded into contextualized fillers.

7 CONCLUSIONS AND FUTURE WORK

As conversational agents become more complex through, for example, added functionality, they also become more prone to network and module latencies. Mitigating the effects of such latencies can be achieved through the use of conversational fillers. Specifically, we found that when the agent employed either type of conversational filler, participants found the agent's response time to be more acceptable when long delays were introduced between the participants' queries and the agent's response. Furthermore, we found that participants were more willing to use an agent that employed contextualized fillers than an agent that employed uncontextualized fillers.

In this study, we only had one invocation of the delay mitigation tactic: for each of the participants' queries that had a long delay before the agent's response, we used a single conversational filler instead of multiple interspaced fillers. We will run a study with multiple invocations to investigate the impact of longer delays in system response. This might allow us to detect the point at which user frustration sets in, and to check whether the use of conversational fillers can mitigate it.

The experimental setup allowed us to study the impact of conversational fillers over one eight-turn interaction spanning less than two minutes. Further research is needed to explore the impact of using conversational fillers in interactions over a longer period of time (days or weeks), and the effects such interactions may have on users' perception of the response and the agent, and on their willingness to use the agent.

In order to exercise control over the experiment, we had users select from a set of predefined queries. However, a more open interaction, where the system can answer more diverse questions and use speech recognition for the users' queries, would yield better insight into users' expectations and perceptions of conversational systems. Moreover, cultural differences in preferred delay in agent response and in the choice of conversational fillers should be considered, since turn-taking conventions and the use of pauses in speech are culture dependent.

Finally, further work is needed to identify techniques for combining different types of fillers and to assess which types of fillers are more efficient under which conditions. A sophisticated hybrid model can potentially be more effective than a model that uses one type of conversational filler.

REFERENCES

[1] Jana Appel, Astrid von der Pütten, Nicole C. Krämer, and Jonathan Gratch. 2012. Does humanity matter? Analyzing the importance of social cues and perceived agency of a computer system for the emergence of social reactions during human-computer interaction. Advances in Human-Computer Interaction 2012 (2012).
[2] Štefan Beňuš and Marián Trnka. 2014. Prosody, voice assimilation and conversational fillers. Manuscript. Institute of Informatics, Slovak Academy of Sciences (2014).
[3] Colleen M. Carpinella, Alisa B. Wyman, Michael A. Perez, and Steven J. Stroessner. 2017. The Robotic Social Attributes Scale (RoSAS): Development and validation. In Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction. ACM, 254–262.
[4] Herbert H. Clark. 1996. Using Language. Cambridge University Press.
[5] M. Soledad López Gambino, Sina Zarrieß, and David Schlangen. 2017. Beyond on-hold messages: Conversational time-buying in task-oriented dialogue. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue. 241–246.
[6] Soledad López Gambino, Sina Zarrieß, and David Schlangen. 2019. Testing strategies for bridging time-to-content in spoken dialogue systems. In 9th International Workshop on Spoken Dialogue System Technology. Springer, 103–109.
[7] Ulrich Gnewuch, Stefan Morana, Marc Thomas Philipp Adam, and Alexander Maedche. 2018. Faster is not always better: Understanding the effect of dynamic response delays in human-chatbot interaction. In ECIS.
[8] Joseph Jaffe and Stanley Feldstein. 1970. Rhythms of Dialogue. Vol. 8. Academic Press.
[9] Yuin Jeong, Juho Lee, and Younah Kang. 2019. Exploring effects of conversational fillers on user perception of conversational agents. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, LBW2715.
[10] Robert B. Miller. 1968. Response time in man-computer conversational transactions. In AFIPS Fall Joint Computing Conference (1). 267–277.
[11] Andreea Niculescu, Betsy van Dijk, Anton Nijholt, Dilip Kumar Limbu, Swee Lan See, and Alvin Hong Yee Wong. 2010. Socializing with Olivia, the youngest robot receptionist outside the lab. In International Conference on Social Robotics. Springer, 50–62.
[12] Tatsuya Nomura, Tomohiro Suzuki, Takayuki Kanda, and Kensuke Kato. 2006. Measurement of negative attitudes toward robots. Interaction Studies 7, 3 (2006), 437–454.
[13] Nicole Shechtman and Leonard M. Horowitz. 2003. Media inequality in conversation: How people behave differently when interacting with computers and people. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 281–288.
[14] Toshiyuki Shiwa, Takayuki Kanda, Michita Imai, Hiroshi Ishiguro, and Norihiro Hagita. 2008. How quickly should communication robots respond? In Proceedings of the 3rd ACM/IEEE International Conference on Human-Robot Interaction. ACM, 153–160.
[15] Toshiyuki Shiwa, Takayuki Kanda, Michita Imai, Hiroshi Ishiguro, and Norihiro Hagita. 2009. How quickly should a communication robot respond? Delaying strategies and habituation effects. International Journal of Social Robotics 1, 2 (2009), 141–155.
[16] Gabriel Skantze and Anna Hjalmarsson. 2013. Towards incremental speech generation in conversational systems. Computer Speech & Language 27, 1 (2013), 243–262.
[17] Vivian Tsai, Timo Baumann, Florian Pecune, and Justine Cassell. 2019. Faster responses are better responses: Introducing incrementality into sociable virtual personal assistants. In 9th International Workshop on Spoken Dialogue System Technology. Springer, 111–118.
[18] Noel Wigdor, Joachim de Greeff, Rosemarijn Looije, and Mark A. Neerincx. 2016. How to improve human-robot interaction with conversational fillers. In 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 219–224.
[19] Yunjing Xu, Zachary Musgrave, Brian Noble, and Michael Bailey. 2013. Bobtail: Avoiding long tails in the cloud. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13). 329–341.
[20] Euijung Yang and Michael C. Dorneich. 2015. The effect of time delay on emotion, arousal, and satisfaction in human-robot interaction. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Vol. 59. SAGE Publications, Los Angeles, CA, 443–447.

