
Active Semantic Mapping for a Domestic Service Robot

Miguel Oliveira da Silva

Thesis to obtain the Master of Science Degree in

Electrical and Computer Engineering

Supervisors: Prof. Rodrigo Martins de Matos Ventura, Prof. Pedro Manuel Urbano de Almeida Lima

Examination Committee

Chairperson: Prof. Joao Fernando Cardoso Silva Sequeira

Supervisor: Prof. Rodrigo Martins de Matos Ventura

Members of the Committee: Prof. Francisco Antonio Chaves Saraiva de Melo

October 2018

I declare that this document is an original work of my own authorship and that it fulfills all the

requirements of the Code of Conduct and Good Practices of the Universidade de Lisboa.


Acknowledgments

I would like to thank my parents for their friendship, encouragement and caring over all these years,

for always being there for me and for teaching me that success is a result of hard work.

I would like to thank Cristiana for the support and for being always available to help me in this work

with her writing and communication skills.

I would also like to acknowledge my dissertation supervisors Prof. Pedro Lima and Prof. Rodrigo

Ventura and Tiago Veiga for their insight, support and sharing of knowledge that has made this Thesis

possible.

I would also like to thank the SocRob team, for sharing a lot of knowledge about several subjects

related to robotics.

Last but not least, to all my friends and colleagues that I have met in the last 5 years in the University

that helped me to arrive at this point.

Thank you all.


Abstract

Title: Active Semantic Mapping for a Domestic Service Robot

Abstract: Domestic service robots need to deal with complex and dynamic environments. To interact with them, a robot must keep an up-to-date representation of the relevant information. This work presents an architecture that addresses this problem while accounting for the uncertainty associated with that representation, given the stochastic and partially observable nature of a domestic environment. The architecture must generate a semantic map of the domestic environment, keep it up to date and make use of it. It offers a solution to the agent's problem of driving its behavior so as to maintain an updated probabilistic representation of the world state and to use that information to carry out tasks. The architecture is composed of two parts: a Knowledge Representation Engine, which keeps a global belief about the world state and is responsible for generating and controlling the second part, and the Decision Maker, which is responsible for the agent's behavior. The Knowledge Representation Engine uses ProbLog to obtain a probabilistic world representation and takes advantage of its inference process to generate the Decision Maker model and the world state. The Decision Maker is composed of a set of POMDPs, each of which holds a partial representation of the global knowledge of the world and makes decisions, when required, to reduce the uncertainty about the world state and eventually reach a specific goal. The decision-making problem is divided into several subproblems to reduce the state space of each POMDP and to avoid having to find the optimal policy of a single large POMDP, given the poor scalability of existing solution algorithms.

Keywords

ProbLog; Semantic Mapping; POMDP; Decision Making; Knowledge Representation.


Resumo

Title: Active Semantic Mapping for a Domestic Service Robot

Abstract: Domestic service robots need to deal with complex and dynamic environments. To be able to interact with them, they need to keep an up-to-date representation of the information about those environments that is relevant to them. This work presents an architecture to solve this problem, considering the uncertainty associated with that representation, given the characteristics of a domestic environment. The architecture needs to generate a semantic map, keep it up to date and make use of it. To that end, a solution is presented to the agent's problem of deciding its behavior so as to keep an updated probabilistic representation of the world state. The architecture is composed of two parts: a knowledge representation engine, which keeps a global belief about the world state, and a decision maker, which is responsible for the agent's behavior. The knowledge representation engine uses ProbLog to obtain a probabilistic representation of the world and takes advantage of its inference process to generate the decision maker model and the world state itself. In turn, the decision maker is composed of a set of POMDPs, each of which is responsible for a partial representation of the global knowledge of the world and for making decisions, if necessary, in order to achieve the goal of the system. The decision-making problem is divided into several subproblems to reduce the state space of each one and to work around the problem of finding the optimal policy in large POMDPs, given the poor scalability of existing algorithms.

Keywords

ProbLog; Semantic Mapping; POMDP; Decision Making; Knowledge Representation.


Contents

1 Introduction
1.1 Motivation
1.2 Problem Formulation
1.3 Proposed Approach
1.4 Contributions
1.5 Document outline
2 Theoretical Background
2.1 Semantic mapping
2.2 Logic programming
2.3 Probabilistic Logic Programming and ProbLog
2.4 Decision Making Under Uncertainty
2.4.1 Markov Decision Processes (MDPs)
2.4.2 Partially Observable Markov Decision Processes (POMDPs)
2.4.2.A Value Function
2.4.2.B Point-Based Value Iteration (PBVI)
2.4.3 POMDP with Information Rewards (POMDP-IR)
3 Related Work
4 Proposed method
4.1 Architecture Description
4.1.1 Knowledge Representation Engine
4.1.2 Decision Maker
4.1.2.A POMDP selection
4.1.2.B Global knowledge representation update
4.2 Semantic Mapping Application
5 Experimental Results
5.1 Implementation
5.2 World Model
5.3 POMDP-IR Model
5.4 Simulated Experiments
5.4.1 Scenario 1
5.4.1.A Static Model
5.4.1.B Dynamic Model
5.4.1.C Carrying out a task
5.4.2 Scenario 2
5.4.2.A Objects changing position
5.4.2.B Incorrect observations robustness
5.4.3 Performance Analysis
5.5 Real scenario experiments
5.6 Scalability Analysis
5.7 Discussion
6 Conclusion
6.1 Achievements
6.2 Future Work
A P(Xk|Z) derivation
B World Model Files Examples
B.1 Furniture Model example
B.2 Objects Model example
C Experiment tables and figures
C.1 Scenario 1 - Static Environment Model
C.2 Scenario 1 - Dynamic Environment Model
C.3 Scenario 2 - Objects Changing Position
C.4 Scenario 2 - Wrong Observations Robustness


List of Figures

2.1 Example of a 2D map of an indoor environment. Figure adapted from [1]
2.2 MDP diagram
2.3 POMDP diagram
2.4 Example of a Value Function with two states. Figure adapted from [2]
3.1 The layered structure of the spatial representation in [3], showing the different levels of abstraction of the spatial knowledge. Figure adapted from [3]
4.1 Scheme of the architecture operation
4.2 POMDP model example
5.1 ISRoboNet@Home Testbed layout
5.2 Progression of objects distributions entropy in a static model
5.3 Progression of objects distributions entropy in a dynamic model
5.4 Progression of Hellinger distance until the robot reaches the goal for two different configurations
5.5 Progression of Hellinger distance, with modifications in the location of the objects
5.6 Testbed and robot used for the real scenario experiments
5.7 Hellinger distance over the time for each object, in the real behavior experiment
C.1 Hellinger distance considering wrong observations for objects in configuration 1
C.2 Hellinger distance considering wrong observations for objects in configuration 2


List of Tables

5.1 POMDP reward values and observation probabilities ranges for the different actions type
5.2 Testbed scenarios used in the experiments
5.3 Location of the objects in the static model
5.4 Expected rewards for POMDP selection steps in a static model example
5.5 Location of the objects in the dynamic model
5.6 Expected rewards for POMDP selection steps in a dynamic model example
5.7 Location of the objects for experiments with the goal of moving the cocacola to close to the pringles
5.8 Expected rewards for POMDP selection steps to carrying out a task
5.9 Location of the objects for each step
5.10 Comparison between living room actions, before (left) and after (right) pringles changing location
5.11 Location of the objects for experiments with wrong observations
5.12 Mean Hellinger distance for different multiples of the expected false and negatives rates
5.13 Mean Hellinger distance for 100 steps for different object configurations in Scenario 1. KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, DT - Dining Table
5.14 Mean Hellinger distance for 100 steps for different object configurations in Scenario 2. KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, BS - Bookshelf, DT - Dining Table, B - Bed, NS - Night Stand
5.15 Mean Hellinger distance for 200 steps, changing the objects configuration in Scenario 1. KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, DT - Dining Table
5.16 Mean Hellinger distance for 200 steps, changing the objects configuration in Scenario 2. KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, BS - Bookshelf, DT - Dining Table, B - Bed, NS - Night Stand
5.17 Location of the objects in the real scenario experiment
5.18 Scalability analysis of different POMDP models used in the Decision Maker
C.1 State variables distributions and actions for a static model of the environment
C.2 State variables distributions in POMDP selection steps, for a dynamic model of the environment
C.3 State variables distributions in POMDP selection steps, with changes in the location of the objects
C.4 Expected rewards for POMDP selection steps in a experiment with objects changing location
C.5 Probabilities of observing each object in Scenario 2


List of Algorithms

2.1 Value iteration


Acronyms

COARSE Cognitive lAyered Representation of Spatial knowledgE

DTPDDL Decision-Theoretic Planning Domain Definition Language

ERL European Robotics League

LHM Least Herbrand Model

LP Logic Program

MDP Markov Decision Process

OWL-DL Web Ontology Language - Description Logic

PBVI Point-Based Value Iteration

POMDP Partially Observable Markov Decision Process

POMDP-IR Partially Observable Markov Decision Process with Information Rewards

PTLplan Temporal-Logic Progressive Planner

PWLC Piecewise Linear and Convex

ROS Robot Operating System

SLAM Simultaneous Localization and Mapping

TBM Task Benchmark


1 Introduction

1.1 Motivation

Over the last decades, research on mobile and service-oriented robots has grown and many algorithms have been developed that allow a robot to operate in realistic environments. Several mapping and navigation algorithms perform well and are almost ready to be deployed on domestic service robots. For example, the problem of Simultaneous Localization and Mapping (SLAM) has been studied for the last 30 years and progress has been made on it [4]. Nowadays, the SLAM problem can be considered solved, but most solutions do not allow the robot to understand and interpret its environment: they provide only a floor plan or a geometric map of the environment and localize the robot in it. When the goal is to have autonomous and intelligent robots, this kind of information is important but not enough. When humans plan a task in an environment, their first step is to identify different regions, objects or the presence of other agents. In a house, for example, this could mean identifying the different rooms, furniture, objects and residents. Likewise, a domestic service robot that is permanently in contact with a domestic environment and whose goal is to serve a human should perceive the environment the way a human does, because it has to interact and communicate in human-compatible ways. For that purpose, the robot needs a skill that no geometrical map can provide. As motivation, consider the Task Benchmark (TBM) Getting to know my home of the European Robotics League (ERL)¹. In this task, also known as TBM1, the main goal of the robot is to acquire knowledge about the environment, organize it and use it for task planning. The robot needs to identify changes in the environment, such as the location of objects and furniture, detect whether a door is open or closed, or even detect the presence of an unknown object on the floor, which is considered trash. Afterward, it needs to move the objects that changed location back to their default position, using the acquired knowledge.

¹ https://www.eu-robotics.net/robotics_league/upload/documents-2018/ERL_Consumer_10092018.pdf. Accessed 1 Oct 2018

1.2 Problem Formulation

The main goal of this thesis is to explore an efficient way for a domestic service robot to accurately represent its knowledge about the environment in a semantic map and keep it updated, so as to reduce its uncertainty about the world state. Eventually, the robot can also use that information to drive its behavior in order to carry out tasks. For this purpose, it is important that the robot interacts with its surroundings and updates its knowledge about them, which also helps the agent to perform such tasks. Domestic environments are non-deterministic and may have other agents interacting with them. Consequently, the robot cannot obtain information about the full state of the environment at each point in time, and to behave in such environments it needs to make decisions under uncertainty. In this work it is assumed that:

• The domestic environment is composed of a number of rooms, objects and possible locations for the objects (placements)

• Each object is always placed in one of the placements

• The robot has a perception capability to detect and recognize the objects of interest
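The assumptions above can be sketched in a few lines of Python. Since each object is always in exactly one placement, the belief about its location can be held as a categorical distribution over placements. This is only a minimal illustration: the placement-to-room mapping, the weights and the helper names (`normalize`, `room_marginal`, `entropy`) are assumptions made for this sketch and are not taken from the implementation described in this thesis.

```python
import math

# Hypothetical layout: placement -> room (illustrative names only).
PLACEMENT_ROOM = {
    "kitchen_table": "kitchen",
    "kitchen_cabinet": "kitchen",
    "coffee_table": "living_room",
}


def normalize(weights):
    """Rescale a placement -> weight mapping so that it sums to one."""
    total = sum(weights.values())
    return {placement: w / total for placement, w in weights.items()}


def room_marginal(belief):
    """Probability that the object is in each room."""
    rooms = {}
    for placement, prob in belief.items():
        room = PLACEMENT_ROOM[placement]
        rooms[room] = rooms.get(room, 0.0) + prob
    return rooms


def entropy(belief):
    """Shannon entropy in bits; zero means the location is known."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


# Illustrative belief about one object (weights are made up).
cocacola_belief = normalize(
    {"kitchen_table": 2, "kitchen_cabinet": 1, "coffee_table": 1}
)
```

With this representation, reducing the uncertainty about the world state amounts to driving the entropy of each object's distribution towards zero.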

Most of the time, this kind of environment is complex, given the large number of rooms and places and its dynamism. Building a system that is able to make decisions in a large environment that keeps changing its layout and state can therefore be a difficult problem.

In summary, this work intends to answer the following questions: How to create and keep updated a knowledge representation of complex and dynamic environments? How to use that information to make decisions and eventually carry out tasks?

1.3 Proposed Approach

To solve the problem presented in Section 1.2, it is important to consider a probabilistic representation of the world, so that uncertainty can be represented. The goal of this approach is to create an architecture that keeps a global probabilistic representation of the world state but, in order to make decisions, splits the decision-making problem into multiple subproblems, which are part of the Decision Maker structure. Each subproblem is responsible for a partial representation of the global world and for making decisions inside that subworld. In the semantic mapping application, each subworld corresponds to a room. The mechanism used to make decisions in each subworld is a Partially Observable Markov Decision Process (POMDP).

To coordinate and generate this Decision Maker structure, a Knowledge Representation Engine is necessary. It keeps an updated belief about the global world state, using a probabilistic programming language, and uses that information to coordinate the Decision Maker structure. Using an inference process under uncertainty, the Knowledge Representation Engine is also able to fully generate each POMDP model of the Decision Maker, given the world model that it receives as input.

1.4 Contributions

This work presents an architecture for knowledge representation and decision making for systems that need to deal with uncertainty and complex environments. The architecture is applied to the semantic mapping problem, where it keeps a probabilistic representation of object locations in a dynamic environment and decides the agent's behavior.

This work also contributes to the SocRob@Home project of the Institute for Systems and Robotics of Instituto Superior Tecnico, using the developed framework to solve some of the problems posed by TBM1 of the ERL, explained in Section 1.1.

1.5 Document outline

This thesis is organized as follows: Chapter 2 introduces the concepts and subjects used in this work, as well as the nomenclature and notation used throughout this thesis, and explains the principles of semantic mapping, probabilistic logic programming and decision making under uncertainty. Chapter 3 presents a short description of related work on semantic mapping and decision-making problems. Chapter 4 presents the proposed approach to solve the problem formulated in Section 1.2, explaining the architecture of the solution as a whole and how to adapt it to the semantic mapping application. Chapter 5 demonstrates the performance of the architecture in simulated experiments and in a real scenario. Chapter 6 wraps up this work, discusses its achievements and presents suggestions for future work.


2 Theoretical Background

2.1 Semantic mapping

There are many different ways and characteristics that can be used to describe the world and construct a map. An important kind of map is a geometrical representation of the environment. Geometrical mapping methods focus on representing the spatial structure of the environment; they have been an active area of research in robotics [5] and many solutions exist for this problem. According to the way each geometrical mapping method perceives the environment, maps can be classified into metric maps, topological maps and topometric maps [1]. Metric maps describe geometric characteristics of the environment, using a coordinate system to represent the shape of objects and rooms, but without interpreting or classifying those shapes (see Figure 2.1(a)). Topological maps represent the environment as a graph in which the vertices represent places, points or regions and the edges are the connections or relations between them (see Figure 2.1(b)). Topometric maps combine metric and topological maps, merging the advantages of both: the accuracy of metric maps and the scalability of topological maps (see Figure 2.1(c)). This kind of map is also known as a hybrid metric-topological map [6].
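To make the graph view of a topological map concrete, the following Python sketch stores places as vertices and connections as adjacency lists, and finds a route with a breadth-first search. The room names and the `shortest_route` helper are hypothetical and serve only as an illustration; they do not correspond to any map used later in this thesis.

```python
from collections import deque

# Hypothetical topological map: vertices are places, edges are connections.
TOPOLOGICAL_MAP = {
    "kitchen": ["hallway"],
    "hallway": ["kitchen", "living_room", "bedroom"],
    "living_room": ["hallway"],
    "bedroom": ["hallway"],
}


def shortest_route(graph, start, goal):
    """Breadth-first search returning the route with the fewest edges."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            return route
        for neighbour in graph[route[-1]]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(route + [neighbour])
    return None  # the goal is not reachable from the start
```

A metric map would instead store coordinates or occupancy values, and a topometric map would attach such metric data to each vertex of the graph.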

For a robot, these geometrical maps are important for navigation. They retain the geometrical features that the robot should be aware of in order to avoid obstacles and find possible paths. However, they do not provide qualitative information, such as the kind of obstacles and whether they are fixed or not. This type of qualitative information is important to perform even simple tasks that require knowledge about more complex features of the environment. Geometrical maps are navigation-oriented, which means that they are useful only in the navigation context and fail to encode the semantics of the environment, which is equally important. For example, in a domestic environment, a robot needs to understand the different functionality of each room, the difference between objects and walls, and the relation between certain objects and certain rooms. This kind of information is not provided by a geometrical map; providing it is the purpose of semantic mapping.

(a) Metric map (b) Topological map (c) Topometric map

Figure 2.1: Example of a 2D map of an indoor environment. Figure adapted from [1]

A semantic map holds a qualitative description of the robot's environment, allowing the robot to obtain an augmented representation of what surrounds it by complementing the geometrical knowledge with semantic knowledge from different sources. The dictionary¹ defines the word semantic as "relating to meaning in language or logic"; accordingly, a semantic map is expected to represent the meaning given by a qualitative description of what is mapped.

A semantic map contains assignments of mapped features to classes that represent their meaning and characteristics. Furthermore, it is possible to define relations between these classes and to use the knowledge about them to give reasoning skills to robots. In that way, the agent has a qualitative description of the environment that is closer to the human conception of the world, together with a knowledge base used for reasoning. For example, a semantic map built from a metric map augmented with labels of the objects and rooms that are of interest to the robot allows it to accomplish tasks in a domestic environment such as: "Robot, bring me a cup". Provided that the robot has a reasoning engine, a semantic map gives it a knowledge base to reason about the characteristics of a cup, where it can be and how to get there. In short, a semantic map augments the navigation and task-planning skills of the robot and helps human-robot interaction, since it provides a conception of the world close to that of a human being.
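As a rough illustration of the "bring me a cup" example, the Python sketch below augments metric positions with labels and uses a small table of prior knowledge to turn an object class into candidate navigation goals. Everything in it (the `Label` type, the coordinates, the `USUAL_PLACEMENTS` table and `candidate_goals`) is a hypothetical simplification, not the representation adopted in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    name: str        # e.g. "kitchen_table"
    category: str    # e.g. "room" or "furniture"
    position: tuple  # (x, y) in the metric map frame, in metres


# Hypothetical labelled metric map (coordinates are made up).
SEMANTIC_MAP = [
    Label("kitchen", "room", (1.0, 4.0)),
    Label("kitchen_table", "furniture", (1.5, 3.5)),
    Label("coffee_table", "furniture", (5.0, 1.0)),
]

# Hypothetical prior knowledge: where an object class is usually found.
USUAL_PLACEMENTS = {"cup": ["kitchen_table", "coffee_table"]}


def candidate_goals(object_class):
    """Metric positions worth visiting when looking for the object class."""
    by_name = {label.name: label for label in SEMANTIC_MAP}
    names = USUAL_PLACEMENTS.get(object_class, [])
    return [by_name[n].position for n in names if n in by_name]
```

The labels supply the semantics, while the positions remain available to a geometrical planner that computes how to get there.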

All the semantic information that a robot can get from the environment gives it the ability to represent and reason about its surroundings in a semantic map, and this information can be organized and divided into different modalities. The inference method used to reason about what is observed is crucial, and information from many different sources can be used, such as the geometry, general appearance and shape of places, recognized objects, the topology of the environment and human input. Many methods use a single source to infer semantic information about a place, while others exploit multiple sources. Another important feature of semantic mapping techniques is temporal coherence. It is useful because the information acquired at a single point in time is not enough to provide evidence for a reliable categorization of places or objects. The degree of confidence depends on when the information was acquired, because most environments are stochastic and dynamic.

¹ Online Oxford English Dictionary, en.oxforddictionaries.com. Accessed 14 Oct 2018

Most of the time, a semantic map also incorporates a topological map, so that it can retain both geometrical information about the arrangement of places and conceptual information about them.

2.2 Logic programming

Logic programming is a method of expressing knowledge in a formal language and solving problems by running inference processes on that knowledge. The basic objects in logic programs are variables, constants, functors and predicates [7, p. 40-41]. Variables are denoted by strings that start with an uppercase letter; the others are also denoted by strings but start with a lowercase letter. A term is a variable, a constant or a functor of arity n applied to n terms, i.e. f(t1, ..., tn). An atom, or atomic sentence, is formed from a predicate of arity n applied to n terms, i.e. p(t1, ..., tn). A ground term is a term with no variables. A literal is an atom (positive literal) or a negated atom (negative literal). A clause is a disjunction of literals and a unit clause is a clause with a single literal. A definite clause is a disjunction of literals of which exactly one is positive, and has the form h :- a1, ..., an, where h and the ai are atoms. A rule, also called a normal clause, has the form h :- l1, ..., ln and is a universally quantified expression that means l1 ∧ ... ∧ ln ⇒ h, where l1, ..., ln (the body of the rule) are literals and h (the head of the rule) is an atom. A rule that does not have a body is a fact and represents an unconditional truth. Another important concept in logic programming is the Herbrand base [8, p. 351], which is the set of ground atoms that can be constructed using the predicates, functors and constants in the theory. Herbrand interpretations are subsets of the Herbrand base.

A Herbrand interpretation is a model of a clause (that is, it corresponds to a world that satisfies the clause) if, for every substitution θ that grounds the clause and makes its body true in the interpretation, the resulting head is in the interpretation as well. A substitution θ is a finite set of pairs V1/t1, V2/t2, ..., Vn/tn, where the Vi are distinct variables and ti is the term that replaces the respective variable. A Herbrand interpretation is a model of a logic program if it is a model of all clauses in the theory.

For negation-free Logic Programs (LPs), or definite clause programs, the model-theoretic semantics is given by the smallest Herbrand model, also known as the Least Herbrand Model (LHM), which is guaranteed to exist and to be unique. The main goal of an LP system is to check whether a given atom is true in the LHM.
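For a ground definite program, the LHM can be computed by forward chaining: start from the facts and keep adding the head of every rule whose body is already satisfied, until nothing changes. The Python sketch below illustrates this under simplifying assumptions (atoms are plain strings, all clauses are already ground); the program and the name `least_herbrand_model` are hypothetical and chosen only for illustration.

```python
def least_herbrand_model(facts, rules):
    """Forward chaining over a ground definite program.

    facts: iterable of ground atoms (strings).
    rules: iterable of (head, body) pairs, where body is a list of atoms.
    """
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # Add the head once every atom of the body is in the model.
            if head not in model and all(atom in model for atom in body):
                model.add(head)
                changed = True
    return model


# Illustrative ground program (not taken from the thesis).
facts = ["vertebrate(v1)", "bird(v1)"]
rules = [
    ("animal(v1)", ["vertebrate(v1)"]),
    ("flies(v1)", ["bird(v1)", "animal(v1)"]),
    ("swims(v1)", ["fish(v1)"]),
]
model = least_herbrand_model(facts, rules)
```

Checking whether an atom is true in the LHM then reduces to a set membership test, for example `"flies(v1)" in model`.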


2.3 Probabilistic Logic Programming and ProbLog

The introduction of probabilities in logic programming makes it possible to encode the inherent uncertainty that is present in real-life situations. Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities, supporting probabilistic inference and learning. This section presents a probabilistic logic programming language called ProbLog.

A ProbLog program consists of a set of ground probabilistic facts and a set of rules and non-probabilistic facts [9]. The latter are the same as in logic programming. A ground probabilistic fact is a fact f with no variables and probability p, and is written as p::f. It is also possible to write an intensional probabilistic fact, which is syntactic sugar for compactly specifying an entire set of ground probabilistic facts. In Example 2.1, the statement 0.5::male(V) :- vertebrate(V) is an intensional probabilistic fact and is a compact way to write the ground probabilistic facts 0.5::male(v1) :- vertebrate(v1) and 0.5::male(v2) :- vertebrate(v2). ProbLog also allows the use of annotated disjunctions [10], such as the sentence 0.15::bird(V); 0.09::mammal(V); 0.5::fish(V) :- vertebrate(V), with the structure p1::h1; ...; pn::hn :- body.

Example 2.1.

vertebrate(v1).

vertebrate(v2).

0.5::male(V) :- vertebrate(V).

The atoms in a ProbLog program can be divided into probabilistic atoms and derived atoms. The former are the atoms that appear in a ground probabilistic fact and the latter are the atoms that appear in the head of some rule in the logic program. It is also important to note that all the variables in the head of a rule should also appear in a positive literal in the body of the rule.

ProbLog supports inference in probabilistic logic programs, and different inference tasks can be considered [7] [9]:

• SUCC(q), where q is a ground query. The task is to compute the success probability of the query q.

• MARG(Q|e), where Q is the set of ground atoms of interest (query atoms). The task is to compute the marginal probability distribution of each query atom q ∈ Q given the evidence e.

• MAP(Q|e), where the task is to find the most likely truth-assignment q to the atoms in Q given the evidence e.

• MPE(U|e), where U is the set of all atoms in the Herbrand base that do not occur in e (unobserved atoms). The task is to find the most likely world of all the unobserved atoms given the evidence.
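To make the SUCC and MARG tasks concrete, the following Python sketch enumerates the possible worlds of the grounded Example 2.1. It is only an illustration under stated assumptions: the derived atom any_male is hypothetical (it is not part of the thesis), and brute-force enumeration is used for clarity and is not meant to describe how ProbLog computes these probabilities internally.

```python
from itertools import product

# Grounded probabilistic facts of Example 2.1 (both vertebrates exist).
prob_facts = {"male(v1)": 0.5, "male(v2)": 0.5}


def worlds(facts):
    """Yield (world, probability) for every truth assignment to the facts."""
    names = list(facts)
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        p = 1.0
        for name, value in world.items():
            p *= facts[name] if value else 1.0 - facts[name]
        yield world, p


def any_male(world):
    # Hypothetical derived atom: any_male :- male(V).
    return world["male(v1)"] or world["male(v2)"]


def succ(query):
    """SUCC(q): total probability of the worlds in which the query holds."""
    return sum(p for w, p in worlds(prob_facts) if query(w))


def marg(query, evidence):
    """MARG(q | e) for a single query atom: P(q and e) / P(e)."""
    p_e = succ(evidence)
    p_qe = succ(lambda w: query(w) and evidence(w))
    return p_qe / p_e


print(succ(any_male))                            # success probability of any_male
print(marg(lambda w: w["male(v1)"], any_male))   # P(male(v1) | any_male)
```

Enumeration grows exponentially with the number of probabilistic facts, so this sketch is only suitable for very small programs.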


2.4 Decision Making Under Uncertainty

An agent, such as a robot or a human, acts based on observations taken from the environment, and there is a cycle between the agent and the world. Over time, the agent receives an observation of the world, chooses an action through some decision-making process and applies that action to the world, which in turn affects the world and closes the cycle. For an intelligent agent, the decision-making process of choosing an action aims at achieving some objectives over time, given the set of observations and knowledge about the environment.

Most agents, and robots in particular, need to deal with uncertainty during this cycle because the environment is uncertain, in other words, it is not fully observable, non-deterministic or both [8, p. 42-45]. An environment is fully observable if the agent’s sensors provide it with information about the full state at each point in time, that is, if the agent has access to all the aspects of the environment that are relevant to decide which action to take. Conversely, an environment is partially observable if part of the state is missing from the observation data, for example due to the occlusion of a small object by a bigger one, or if observations are noisy and inaccurate due to the sensors used. In a nondeterministic environment, the next state is not fully determined by the current state and by the agent’s action. The actions are then characterized by their possible outcomes, and if these outcomes are quantified using probabilities, the environment is called stochastic.

Thus, an agent dealing with an uncertain environment may never know for certain which state it is in, given uncertainty in perception, or where it will end up after performing a given action, given uncertainty in action effects. The former is related to environments that are not fully observable and the latter to stochastic environments.

When an agent is dealing with uncertainty, it should be able to compare the plausibility of different statements, even if it is not sure about them. For example, even if a robot is not sure about an object’s color, it should be able to represent that the belief in one color is stronger than, weaker than or equal to the belief in another color. For that reason, the agent needs a tool to represent its degree of belief in a statement, and the main one is probability theory.

For an agent such as a domestic service robot, deciding which action to take at each time is a sequential decision problem: the agent is not interested in a single decision but in a series of decisions that solve a problem, as in search and planning problems. A model for sequential decision making in stochastic environments, under the assumption that the model is known and the environment is fully observable, is presented in Chapter 2.4.1: the Markov Decision Process (MDP). Chapter 2.4.2 then presents a model in which both types of uncertainty, in action effects and in perception, are considered. This model is called Partially Observable Markov Decision Process (POMDP).


2.4.1 Markov Decision Processes (MDPs)

Assuming that the agent has perfect perception of the environment, which means that the state of the world is fully observable at any point in time, a Markov Decision Process (MDP) models the uncertainty about the effects of the agent’s actions. In an MDP, at each time t, the agent chooses the action a_t based on observing state s_t and receives a reward r_t for taking that action in that state.

An MDP can be described as a tuple 〈S, A, T, R〉 [2], where S is a set of states of the world, A is a set of actions, T is the probabilistic state transition function and R is the reward function. These sets can be finite or infinite, but only the finite case is discussed in this Chapter.

The state transition function T(s, a, s′) represents the probability of ending up in state s′, given that the agent starts in state s and executes action a. It can also be written as Pr(s′|a, s). R(s, a) represents the expected reward received for taking action a in state s. The reward function depends only on the current state and action. In this model, it is also assumed that the transition depends only on the previous state and on the action taken, and not on the earlier history of states and actions. This is the Markov assumption: the state at time t depends only on the state and action taken at time t − 1. An MDP can be represented as in Figure 2.2.

Figure 2.2: MDP diagram.

It is also important to define what a solution to this problem looks like, because a fixed action sequence will not solve it: the uncertainty about action effects can make the agent end up in a state different from the goal. For that reason, the solution is defined as a policy, denoted by π, where π(s) is the action specified by the policy π for state s. A policy is a description of the behavior of an agent, specifying what action the agent should take in any state that it might reach. Two kinds of policies are considered: stationary and nonstationary [2]. A stationary policy considers that the choice of an action depends only on the state, independently of the time step. A nonstationary policy takes the time into account, and it is represented with a subscript t.


The aim of a sequential decision process is for the agent to act so as to achieve the best performance. For an MDP, this performance is represented by an additive utility function of the long-term rewards. The quality of a policy is therefore measured by the corresponding expected utility, which for MDPs is often referred to as the value function Vπ. An optimal policy is a policy that yields the highest value function and it is denoted by π∗. To find the optimal policy, it is first necessary to define whether decision making has a finite horizon or an infinite horizon.

When dealing with a finite horizon, the agent should act to maximize the expected sum of rewards over the next K steps, presented in Equation (2.1).

$$E\left[\sum_{t=0}^{K-1} r_t\right] \tag{2.1}$$

In an infinite horizon the number of steps is unbounded and the sum of the rewards can become

infinite. One way to solve the problem of defining the value function in the infinite horizon case is using

a discounted model, with a discount factor γ between 0 and 1 and the value function of Equation (2.2).

$$E\left[\sum_{t=0}^{\infty} \gamma^t r_t\right] \tag{2.2}$$

With the value function of Equation (2.2), rewards received at the current time are worth more to the agent than rewards received in the future, since each future reward is scaled by a power of γ. If γ is close to 0, rewards in the future are considered insignificant, and the closer the discount factor is to 1, the more effect future rewards have on current decision making. For γ < 1, the discount factor ensures that the value function is finite if the rewards are bounded.
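As a small numerical illustration of Equation (2.2), the sketch below computes a discounted return for a constant reward sequence and compares it with the geometric-series limit 1/(1 − γ). The reward sequence, the truncation at 200 steps and the function name are assumptions chosen for illustration only.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


rewards = [1.0] * 200  # hypothetical constant reward, truncated at 200 steps
for gamma in (0.1, 0.9):
    limit = 1.0 / (1.0 - gamma)  # limit of the infinite sum when r_t = 1
    print(gamma, discounted_return(rewards, gamma), limit)
```

With γ = 0.1 almost all of the return comes from the first reward, whereas with γ = 0.9 rewards many steps ahead still contribute noticeably, and in both cases the sum stays below the finite limit.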

In the finite horizon model, the optimal policy is typically nonstationary because the optimal action for a given state depends on the time remaining. For example, if the agent has a goal and a short horizon, it must head directly for it, whereas with a longer horizon the agent may choose actions whose results are less uncertain. The way that the agent chooses its actions when it has a long journey ahead is generally different from the way it decides which action to take in the last step. One can use dynamic programming to evaluate the utility of a policy π for t steps. Thus, in the finite horizon model the value function Vπ,t(s) is the expected utility of starting in state s and executing the policy π for t steps, given by the recursive Equation (2.3).

$$V_{\pi,t}(s) = R(s, \pi_t(s)) + \gamma \sum_{s' \in S} T(s, \pi_t(s), s')\, V_{\pi,t-1}(s') \tag{2.3}$$

The step t = 1 is the last step and the respective value Vπ,1(s) = R(s, π1(s)) is just the expected reward for taking the action given by policy π1. For a generic step t, the discounted value of the remaining t − 1 steps is added, considering all the possible next states s′ under the policy π and their respective likelihood T(s, πt(s), s′).

In the infinite horizon model, the agent always has the same time remaining. For that reason, it makes no sense to change the action strategy depending on time, which is why the optimal policy is stationary and the value function Vπ(s) is given by the unique simultaneous solution of the set of Equations (2.4).

$$V_\pi(s) = R(s, \pi(s)) + \gamma \sum_{s' \in S} T(s, \pi(s), s')\, V_\pi(s') \quad \text{for all } s \in S \tag{2.4}$$

This process of computing the value function from executing a policy is known as policy evaluation.
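Policy evaluation for the infinite horizon case can be sketched by repeatedly applying the right-hand side of Equation (2.4) until the values stop changing. The two-state MDP below (states, actions, transition probabilities and rewards) is entirely hypothetical and only serves to make the sketch runnable; solving the linear system directly would be an equally valid alternative.

```python
GAMMA = 0.9
STATES = [0, 1]

# Hypothetical model: T[s][a][s2] = Pr(s2 | s, a), R[s][a] = expected reward.
T = {0: {"stay": [1.0, 0.0], "move": [0.2, 0.8]},
     1: {"stay": [0.0, 1.0], "move": [0.8, 0.2]}}
R = {0: {"stay": 0.0, "move": -1.0},
     1: {"stay": 1.0, "move": -1.0}}


def evaluate_policy(policy, tol=1e-10):
    """Iterate Equation (2.4) for a stationary policy until convergence."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {}
        for s in STATES:
            a = policy[s]
            expected_next = sum(T[s][a][s2] * v[s2] for s2 in STATES)
            new_v[s] = R[s][a] + GAMMA * expected_next
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v
        v = new_v


values = evaluate_policy({0: "move", 1: "stay"})
print(values)
```

For this policy the fixed point can be checked by hand: V(1) = 1 + 0.9 V(1) gives V(1) = 10, and substituting into the equation for state 0 gives V(0) = 6.2 / 0.82.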

Several methods can be used to find optimal policies for MDPs. This Chapter presents the value iteration method, because it also serves as the basis for finding policies in POMDPs in Chapter 2.4.2.

To get the optimal policy π∗ for the finite horizon case, only a complete sequence of optimal value functions is needed, and π∗ is defined by Equation (2.5).

$$\pi^*_t(s) = \arg\max_a \left[ R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V_{\pi^*_{t-1},t-1}(s') \right] \tag{2.5}$$

Considering that $V_{\pi^*_{t-1},t-1}$ is the optimal value function for the step t − 1 and that it is derived from policy $\pi^*_{t-1}$ and value function $V_{\pi^*_{t-2},t-2}$, this is a recursive definition that ends at the last step t = 1, when the optimal policy $\pi^*_1$ is given by Equation (2.6).

$$\pi^*_1(s) = \arg\max_a R(s, a) \tag{2.6}$$

In infinite horizon discounted models, the optimal stationary policy is independent of the starting state. It can be proven [8, p. 654-656] that the value of an optimal policy satisfies the Bellman Equation (2.7), and that the value iteration Algorithm 2.1 eventually converges to the unique solution of the Bellman equations for all s ∈ S.

$$V_{\pi^*}(s) = \max_a \left[ R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V_{\pi^*}(s') \right] \tag{2.7}$$

V0(s) does not have to be initialized to 0 if a guess of the optimal value function is available. In that case, the guessed value is used in an attempt to speed up the convergence. Independently of the initialization, if |V0(s)| < ∞, value iteration can be proven to converge [8, p. 654-656]. The algorithm terminates when the maximum difference between two successive value functions is less than some ε, which can be chosen in order to bound the policy loss. The policy loss is the most the agent can lose by executing the near-optimal policy extracted from V′π∗ instead of the optimal policy.


Algorithm 2.1: Value iteration

t ← 0
V0(s) ← 0 for all s ∈ S
repeat
    t ← t + 1
    forall s ∈ S do
        Vt(s) ← max_a [ R(s, a) + γ ∑_{s′∈S} T(s, a, s′) Vt−1(s′) ]
until |Vt(s) − Vt−1(s)| < ε for all s ∈ S
V′π∗(s) ← Vt(s)

Once V′π∗ is obtained, the near-optimal policy can be easily extracted using Equation (2.8).

$$\pi^*(s) = \arg\max_a \left[ R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V_{\pi^*}(s') \right] \tag{2.8}$$
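The steps of Algorithm 2.1 and the greedy extraction of Equation (2.8) can be sketched in a few lines of Python. The two-state MDP used here is hypothetical and chosen only so that the result can be checked by hand; the function names are likewise illustrative.

```python
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = ["stay", "move"]

# Hypothetical model: T[s][a][s2] = Pr(s2 | s, a), R[s][a] = expected reward.
T = {0: {"stay": [1.0, 0.0], "move": [0.2, 0.8]},
     1: {"stay": [0.0, 1.0], "move": [0.8, 0.2]}}
R = {0: {"stay": 0.0, "move": -1.0},
     1: {"stay": 1.0, "move": -1.0}}


def q_value(s, a, v):
    """Bracketed term of Equations (2.7) and (2.8)."""
    return R[s][a] + GAMMA * sum(T[s][a][s2] * v[s2] for s2 in STATES)


def value_iteration(eps=1e-8):
    """Algorithm 2.1 with V0 = 0 and stopping threshold eps."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {s: max(q_value(s, a, v) for a in ACTIONS) for s in STATES}
        if max(abs(new_v[s] - v[s]) for s in STATES) < eps:
            return new_v
        v = new_v


def greedy_policy(v):
    """Equation (2.8): pick the maximizing action in every state."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, v)) for s in STATES}


v_star = value_iteration()
print(v_star, greedy_policy(v_star))
```

In this toy model the agent is rewarded for staying in state 1, so the extracted policy moves from state 0 towards state 1 and then stays there.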

2.4.2 Partially Observable Markov Decision Processes (POMDPs)

In Chapter 2.4.1 the environment was considered fully observable, and under that assumption the agent always knows which state it is in. Most of the time, however, because of sensor limitations or noise, the state might not be perfectly observable, and for that reason Partially Observable Markov Decision Processes (POMDPs) take the state uncertainty into account. In POMDPs there is also a probabilistic model of the chance of making a particular observation given the current state.

A POMDP can be described as a tuple 〈S, A, T, R, Ω, O〉 [2], where S, A, T and R are the same as described for MDPs in Chapter 2.4.1, Ω is a finite set of observations that the agent can experience and O is the observation function that gives a probability distribution over possible observations, given an action and the resulting state. So, O(s′, a, o) is the probability of making an observation o, given that the agent took an action a and ended up in state s′, that is, Pr(o|s′, a). A POMDP can be represented in a diagram, as presented in Figure 2.3.

Figure 2.3: POMDP diagram

When considering optimal decision making in a POMDP, a direct mapping of observations to actions is not sufficient. The agent should have a memory of its past history, so that it can choose actions successfully in partially observable environments. For that reason, the agent can keep an internal belief state b that summarizes all information about its past. The belief b that will be used is a probability distribution over all the states of the set S because it is a sufficient statistic of the history, which means that extra data about its past actions or observations would not supply any further information about the current state [11, p. 392]. The agent is responsible for updating this belief based on the previous belief state, the last action and the current observation. Let b(s′) denote the probability that the belief state b assigns to the state s′. It is then possible to compute $b_a^o(s')$, which represents the new degree of belief in state s′ after taking action a and receiving observation o, by Equation (2.9).

$$b_a^o(s') = \Pr(s' \mid o, a, b) = \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{\Pr(o \mid a, b)} \tag{2.9}$$

The complete derivation of Equation (2.9) can be found in [2, p. 107]. After computing Equation (2.9) for all s′ ∈ S, it is possible to obtain the new belief state $b_a^o$. This process can be labeled as the update belief function UB(b, a, o) and has the new belief state $b_a^o$ as its output.
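The belief update of Equation (2.9) can be sketched as follows. The model is a hypothetical two-state problem with a single information-gathering action that does not change the state; all names and probabilities are assumptions made for illustration. The normalizer Pr(o|a, b) is obtained by summing the unnormalized terms over s′.

```python
STATES = ["left", "right"]

# Hypothetical model: T[a][s][s2] = Pr(s2 | s, a), O[a][s2][o] = Pr(o | s2, a).
T = {"look": {"left": {"left": 1.0, "right": 0.0},
              "right": {"left": 0.0, "right": 1.0}}}
O = {"look": {"left": {"see_left": 0.85, "see_right": 0.15},
              "right": {"see_left": 0.15, "see_right": 0.85}}}


def update_belief(b, a, o):
    """UB(b, a, o): return the new belief b_a^o of Equation (2.9)."""
    unnormalized = {}
    for s2 in STATES:
        predicted = sum(T[a][s][s2] * b[s] for s in STATES)
        unnormalized[s2] = O[a][s2][o] * predicted
    pr_o = sum(unnormalized.values())  # Pr(o | a, b)
    if pr_o == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: p / pr_o for s2, p in unnormalized.items()}


b0 = {"left": 0.5, "right": 0.5}
b1 = update_belief(b0, "look", "see_left")
b2 = update_belief(b1, "look", "see_left")
print(b1, b2)
```

Starting from a uniform belief, two consistent observations progressively concentrate the belief on one state, which is the behavior that active perception exploits.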

A POMDP can be considered as an MDP in which the states are belief states, called a belief-state MDP. The set of belief states B comprises the state space of this MDP. The set of actions A remains the same and the state transition function τ(b, a, b′) is now defined as in Equation (2.10).

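$$\begin{aligned}
\tau(b, a, b') &= \Pr(b' \mid a, b) \\
&= \sum_{o \in \Omega} \Pr(b' \mid a, b, o) \Pr(o \mid a, b) \\
&= \sum_{o \in \Omega} \Pr(b' \mid a, b, o) \sum_{s' \in S} \Pr(o \mid s', a, b) \Pr(s' \mid a, b) \\
&= \sum_{o \in \Omega} \Pr(b' \mid a, b, o) \sum_{s' \in S} \Pr(o \mid s', a, b) \sum_{s \in S} \Pr(s' \mid a, b, s) \Pr(s \mid a, b) \\
&= \sum_{o \in \Omega} \Pr(b' \mid a, b, o) \sum_{s' \in S} O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)
\end{aligned} \tag{2.10}$$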

where Pr(b′|a, b, o) is equal to 1 if b′ = $b_a^o$ and 0 otherwise. The reward function for belief states can be written as ρ(b, a) and is given by Equation (2.11).

$$\rho(b, a) = \sum_{s \in S} b(s)\, R(s, a) \tag{2.11}$$

The belief-state MDP has a continuous belief space, since it is the space of all distributions over the finite state space, and for that reason solving a belief-state MDP is challenging. However, if it is possible to get the optimal policy π∗(b) for it, it can be shown that this policy is also optimal for the original POMDP. The problem is that the method to solve MDPs presented in Chapter 2.4.1 is not directly applicable to this belief-state MDP, given that its state space is continuous. A possible solution to the problem is presented in Chapter 2.4.2.B.

2.4.2.A Value Function

The quality of a policy π(b) in belief-state MDP is measured by the value function V π(b), similarly to

what is done for MDPs. The main goal is to maximize the expected rewards for each belief, following the

optimal policy π∗ that is defined by the optimal value function V ∗. The optimal value function satisfies

the Bellman equation V ∗ = HV ∗:

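$$\begin{aligned}
V^*(b) &= \max_a \left[ \rho(b, a) + \gamma \sum_{b' \in B} \tau(b, a, b')\, V^*(b') \right] \\
&= \max_a \left[ \rho(b, a) + \gamma \sum_{o \in \Omega} p(o \mid a, b)\, V^*(b_a^o) \right].
\end{aligned} \tag{2.12}$$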

It has been proved that the value function V(b) has a particular geometric structure. The value function for finite-horizon POMDPs is Piecewise Linear and Convex (PWLC) and it can be represented by a set of linear functions over the belief space:

$$V_t = \{\alpha_t^i\}, \quad \text{with } i = 1, \dots, |V_t|, \tag{2.13}$$

where $\alpha_t^i$ is a vector with dimension equal to the number of states. It represents a hyperplane and it defines the value function over a bounded region of the belief space. Each α-vector is associated with an action. Then, $V_t(b)$ can be defined using the inner product presented in Equation (2.14).

$$V_t(b) = \max_{\alpha_t^i \in V_t} \alpha_t^i \cdot b \tag{2.14}$$

Given these characteristics of the value function Vt, the belief space can be divided into regions. The regions are defined by the upper surface of the α-vectors, because in each region the maximizing vector dominates the rest of the set of vectors, given the goal of maximizing the value function.
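Evaluating a PWLC value function as in Equation (2.14) amounts to taking the maximum inner product between the belief and the α-vectors. The sketch below uses three hypothetical α-vectors for a two-state problem, each tagged with an illustrative action label, and also returns the action of the maximizing vector.

```python
# Hypothetical alpha-vectors for a two-state problem: (vector, associated action).
ALPHAS = [([1.0, 0.0], "a1"),
          ([0.0, 1.0], "a2"),
          ([0.6, 0.6], "a3")]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def value_and_action(b, alphas=ALPHAS):
    """Equation (2.14): maximize alpha . b and report the maximizing action."""
    vector, action = max(alphas, key=lambda pair: dot(pair[0], b))
    return dot(vector, b), action


for belief in ([0.9, 0.1], [0.5, 0.5], [0.1, 0.9]):
    print(belief, value_and_action(belief))
```

Each of the three beliefs falls in a different region of the belief space, so a different α-vector, and therefore a different action, is selected for each of them.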


Figure 2.4 is an example of a value function for a two-state problem, represented as a set of α-

vectors.

Figure 2.4: Example of a Value Function with two states. Figure adapted from [2]

2.4.2.B Point-Based Value Iteration (PBVI)

The limited scalability of value iteration algorithms for solving POMDPs is caused by the dimension of the problem and has led to several approximate approaches to POMDP solving. In a problem with n states, POMDP planners must reason about belief states in a continuous space of dimension n − 1. For that purpose, the Point-Based Value Iteration (PBVI) algorithm, presented in [12], discretizes the belief space by selecting a small set of representative belief points B.

Point-based methods, using the approximations presented, can derive Equation (2.15) to compute

the value function at each particular belief b.

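$$V_{t+1}(b) = \max_a \left[ b \cdot \alpha_0^a + \gamma\, b \cdot \sum_{o \in \Omega} \arg\max_{\{g_{a,o}^i\}_i} b \cdot g_{a,o}^i \right] = \max_{\{g_a^b\}_{a \in A}} b \cdot g_a^b, \tag{2.15}$$

where

$$g_{a,o}^i(s) = \sum_{s'} \Pr(o \mid s', a) \Pr(s' \mid s, a)\, \alpha_t^i(s') \tag{2.16}$$

and

$$g_a^b = \alpha_0^a + \gamma \sum_{o \in \Omega} \arg\max_{\{g_{a,o}^i\}_i} b \cdot g_{a,o}^i \tag{2.17}$$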

The backup operator which selects the maximizing vector for the belief b becomes:

$$\text{backup}(b) = \arg\max_{\{g_a^b\}_{a \in A}} b \cdot g_a^b \tag{2.18}$$


The value function, at each step, is the union of all the vectors resulting from previous backup of all

the belief points in the set B.
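A single point-based backup, following Equations (2.16) to (2.18), can be sketched as below. Two assumptions are made that are not stated in the text: α_0^a is taken to be the immediate reward vector, α_0^a(s) = R(s, a), and the small two-state model (static state, noisy observations) is hypothetical.

```python
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = ["a0", "a1"]
OBS = ["o0", "o1"]

# Hypothetical model: R[s][a], T[a][s][s2] = Pr(s2 | s, a), O[a][s2][o] = Pr(o | s2, a).
R = [{"a0": 1.0, "a1": 0.0}, {"a0": 0.0, "a1": 1.0}]
T = {a: [[1.0, 0.0], [0.0, 1.0]] for a in ACTIONS}
O = {a: [{"o0": 0.8, "o1": 0.2}, {"o0": 0.2, "o1": 0.8}] for a in ACTIONS}


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def g_vector(alpha, a, o):
    """Equation (2.16): project one alpha-vector through action a and observation o."""
    return [sum(O[a][s2][o] * T[a][s][s2] * alpha[s2] for s2 in STATES)
            for s in STATES]


def backup(b, alphas):
    """Equations (2.17) and (2.18): return (new alpha-vector, its action) for belief b."""
    best = None
    for a in ACTIONS:
        g_ab = [R[s][a] for s in STATES]  # assumed alpha_0^a(s) = R(s, a)
        for o in OBS:
            candidates = (g_vector(alpha, a, o) for alpha in alphas)
            g_best = max(candidates, key=lambda g: dot(b, g))
            g_ab = [x + GAMMA * y for x, y in zip(g_ab, g_best)]
        if best is None or dot(b, g_ab) > dot(b, best[0]):
            best = (g_ab, a)
    return best


print(backup([0.6, 0.4], [[1.0, 0.0], [0.0, 1.0]]))
```

Repeating this backup for every belief point in B and collecting the resulting vectors gives one stage of point-based value iteration.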

There are several PBVI algorithms in the literature. One of them, called Perseus, is presented in [13]. This randomized PBVI algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved, with the important characteristic that a single backup may improve the value of more than just the respective belief point. Perseus backs up only a (randomly selected) subset of points in the belief set that is sufficient for improving the value of every belief point in B.

2.4.3 POMDP with Information Rewards (POMDP-IR)

In an active perception task, the goal is typically to increase the available information by reducing the uncertainty regarding the state. This means that the agent, considering the effects of its actions, must decide which actions to take to efficiently reduce the uncertainty about the state variables. A typical POMDP is a possible decision-theoretic model for active perception. However, reducing the uncertainty about the state is usually not expressed as the goal itself but arises as a means to achieve it. For example, if the goal is to pick up an object, the agent may take actions that reduce its uncertainty about the object’s location. Rewarding an agent for reaching a certain level of belief is not easy to do in these typical POMDP models. For that purpose, a Partially Observable Markov Decision Process with Information Rewards (POMDP-IR) is presented in [14]. In POMDP-IR, a reward for information gain is given while keeping a key characteristic of a classical POMDP: PWLC value functions.

POMDP-IR adds a new set of “information-reward” actions (prediction actions) to the problem definition, considering that the state space can be factored as presented in Equation (2.19).

$$S = X_1 \times X_2 \times \dots \times X_k \times \dots \times X_K \tag{2.19}$$

At each time step, the agent simultaneously chooses a normal action a_n and a prediction action a_k for each particular state variable X_k about which the agent wants to have low uncertainty. Prediction actions have an action space A_k = {commit, null} and they have no effect on states or observations, but may affect rewards. The reward function in the POMDP-IR is equal to the sum of the original reward function R of the POMDP and a reward R_k for each X_k, given by Equation (2.20).

$$R_k(b, a_k) = \begin{cases} P(X_k = x_k)\, r_i^{correct} - \big(1 - P(X_k = x_k)\big)\, r_i^{incorrect} & \text{if } a_k = \text{commit} \\ 0 & \text{if } a_k = \text{null} \end{cases} \tag{2.20}$$

At every time step, the agent can choose to either execute only a normal action, choosing a_k = null, or in addition also receive a reward for its belief over X_k, choosing a_k = commit. Thus, the expected reward of choosing commit is higher than that of the null action only when R_k(b, a_k) > 0, which implies

$$P(X_k = x_k) > \frac{r_i^{incorrect}}{r_i^{correct} + r_i^{incorrect}}. \tag{2.21}$$

If rewarding the agent for having a degree of belief P(X_k = x_k) of at least β is desired, then the relation between $r_i^{correct}$ and $r_i^{incorrect}$ must be set so that the expected reward of choosing commit is higher than that of the null action when P(X_k = x_k) > β. The precise values of $r_i^{correct}$ and $r_i^{incorrect}$ depend on the model and the original reward function R.
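The relation between the two reward parameters and the belief threshold β follows from setting the right-hand side of Equation (2.21) equal to β and solving for the incorrect-prediction reward. The sketch below illustrates this; the function names, the choice of β = 0.75 and the unit correct-prediction reward are assumptions for illustration, since the actual values depend on the model.

```python
def information_reward(p, r_correct, r_incorrect, action):
    """Equation (2.20) for one state variable, with p = P(X_k = x_k)."""
    if action == "null":
        return 0.0
    return p * r_correct - (1.0 - p) * r_incorrect


def commit_threshold(r_correct, r_incorrect):
    """Right-hand side of Equation (2.21)."""
    return r_incorrect / (r_correct + r_incorrect)


def r_incorrect_for(beta, r_correct):
    """Solve beta = r_incorrect / (r_correct + r_incorrect) for r_incorrect."""
    return beta * r_correct / (1.0 - beta)


beta = 0.75        # hypothetical desired belief level
r_correct = 1.0    # hypothetical reward for a correct commit
r_incorrect = r_incorrect_for(beta, r_correct)
print(r_incorrect, commit_threshold(r_correct, r_incorrect))
print(information_reward(0.9, r_correct, r_incorrect, "commit"))
print(information_reward(0.6, r_correct, r_incorrect, "commit"))
```

With these values, committing only pays off once the belief exceeds 0.75: the reward is positive at a belief of 0.9 and negative at 0.6.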


3 Related Work

Some work has been developed in this research area in order to find a way for a robot to get a better conception of the environment that surrounds it. This conception is related not only to the geometrical characteristics of the environment, but also to semantic information. The semantic information is related to cognitive interpretation capacities that humans have and that semantic mapping methods have brought to robots. The work in [1] presents an overview of what has been done in semantic mapping, for different types of environments and different types of applications, as explained in more detail in Chapter 2.1. This work will focus on semantic mapping for domestic indoor environments. In [15], the authors present a layered model of the world at different levels of abstraction: metric line map, navigation graph, topological map and conceptual map. The lower levels are derived from sensor input and are used for robot localization and navigation, and the higher

levels provide a human-like categorization of the world. The metric map is obtained by SLAM. The nav-

igation graph establishes a model of free space and its connectivity, adding some semantic information

on this level, storing the objects detected and using label history, assigning the navigation nodes to one

of the classes: room, corridor, or doorway. The topological map divides the nodes in the navigation

graph into groups that are separated by a doorway node. In the last level, there is a conceptual map

and conceptual knowledge is encoded in Web Ontology Language - Description Logic (OWL-DL). With a

description-logic based reasoning software, based on the knowledge representation it is possible to infer

new knowledge about the world that is neither perceived nor given verbally. However, this work does

not provide decision making capabilities needed when performing tasks, given the knowledge acquired

about the environment.

In [16] another approach for semantic mapping representation and the way to use that information

in the performance of navigation tasks is introduced. This approach uses two parallel hierarchical rep-

resentations of the space, a spatial representation and a semantic one. The first one is related to the


sensor-based representations of the environment and the second one has the symbolic representation

of the space. The link between both uses the concept of anchoring. In each of the representations the

hierarchy is related to the level of detail of the information and the level of abstraction is bigger in higher

levels. Making use of the anchoring connection between both representations, two kinds of inference

were developed. Based on recognized objects, the inference system is able to semantically classify the room where the object was recognized, and based on semantic information about rooms, the inference process deduces the probable location of a previously unseen object. The authors validate their approach by testing the learned model on the execution of navigation commands. In [6], some of the authors of [16] use the semantic map representation explained above in a task planning process based on a Temporal-Logic Progressive Planner (PTLplan), which is able to deal with partial observability and uncertainty. However, the knowledge representation system is Loom, which only supports declarative knowledge and does not allow probabilistic annotations of facts or probabilistic inference.

In [17], the authors propose a formalization and a standardization of the representation of semantic maps, together with a proposal for evaluating and benchmarking semantic mapping methods. A "formalization of a minimal general structure of the representation that should be implemented in a semantic map" is proposed, where the representation is defined by a global reference system, a set of

geometrical elements obtained as raw sensor data and a set of predicates that provide an abstraction

of geometrical elements. Based on the idea that a ground truth for semantic maps exists, building a

dataset to be shared by the scientific community is proposed and that allows a fair comparison between

different semantic mapping methods.

The approach in [18] presents a 3D semantic mapping technique that uses the point cloud consisting of multiple 3D scans, obtained by 6D SLAM, to do scene interpretation and labeling of the basic elements in the scene, such as walls, ground and doors. Afterwards, the data is transformed into 2D images that are used to detect and localize objects, and the object localization is then transformed back into the 3D data. For interpreting planes in the scene, a constraint solver in Prolog was used, but no other inference methods were used.

In [3] a spatial knowledge representation is presented, called by the authors Cognitive lAyered Representation of Spatial knowledgE (COARSE). It is based on a layered representation with different levels of abstraction and is designed for representing complex, cross-modal, spatial knowledge, considering the uncertainty and dynamics of the space, as presented in Figure 3.1. This representation is the main

principle of the work presented in [19], where it is assumed that knowledge should be abstracted to

keep the representations compact, allowing the robot to infer additional knowledge about the environ-

ment based on combining background knowledge with observations. In order to characterize the space

in a higher level of abstraction, the system assigns properties to places, such as objects, shape, size

and appearance.


Figure 3.1: The layered structure of the spatial representation in [3], showing the different levels of abstraction of the spatial knowledge. Figure adapted from [3]

To represent the conceptual map, a probabilistic chain graph model is used and the structure is adapted at runtime according to the state of the topological map. To perform inference, this model is first converted into a factor graph representation and an approximate inference engine, Loopy Belief Propagation, is then applied in order to meet time constraints. However, this work only supports inference about unexplored concepts, such as objects or rooms, and lacks inference about explored concepts, whose characterization can change given that the environment is stochastic. For goal-oriented exploration, the inference process also allows the use of a distribution of possible extensions to the known world.

In [20] the authors propose a representation of the semantic map, which they refer to as SOM+ (semantic object maps), using symbolic knowledge in description logic with a spatiotemporal representation of object poses. Prolog predicates are also associated with it for the inference process. The SOM+ is

an abstract representation of the environment that contains facts about objects and links objects to data

structures such as appearance models or other features used by the perception system to recognize

the objects. The work was developed with the objective of making the robot able to interact with a small

environment, more specifically, a kitchen.

The authors in [21] present a system that allows acquiring new objects in the representation through

a continuous human-robot interaction. At the beginning, the robot is guided by a user in a recognition

tour that allows an initial construction of the semantic map but the robot is also able to acquire additional


knowledge about the environment after the initial set-up, through a multi-modal human-robot interaction.

The behavior of the robot to interact with the humans and to collect information to update the semantic

map is implemented using Petri Net Plans. Prolog is also used to store information about the topological graph of the environment, and for each object predicates are created with information about the object’s type, localization, position and properties in order to perform inference on it.

In [22], probabilistic conceptual maps and probabilistic planning have been also combined in object

search tasks, where the conceptual map is represented as the higher layer of the hierarchical knowl-

edge representation in [3]. In order to do planning, a switching continual planner was presented which

switches between Decision-Theoretic Planning Domain Definition Language (DTPDDL) and classical

modes of planning at different levels of abstraction.

The work most similar to what is proposed in this master thesis is [23], where the probabilistic representation of the semantic map is based on the probabilistic programming language ProbLog. However, probabilistic inference tasks were used to infer a query given evidence, namely the probability of an object being in a given place, given a statement that expresses the probability of observing an object in that place and evidence (an observation) confirming it. This work [23] not only presents a

probabilistic knowledge representation, but also a framework for planning under uncertainty, that was

a POMDP, computing approximate solutions in order to manage the scalability problems of POMDPs.

The decision maker also takes into account phenomena that may affect the perception algorithm, as an

error in vision algorithms and possible occlusions. In this work, a POMDP with Information Rewards (POMDP-IR) [14] is used. This framework intends to reward the agent for reaching a certain level of belief regarding a state feature because, if more certain information about the state improves task performance, it is important to increase the available information by reducing the uncertainty regarding the state. In that paper, the framework was developed for active cooperative perception, fusing sensory information with the goal of maximizing the amount and quality of perceptual information available to the system.

In [24], a solution is also presented for POMDPs in which an explicit measure of the agent’s knowledge about the system, based on beliefs instead of states, is incorporated in the performance criterion. For that reason, the rewards are defined over the acquired knowledge, represented by belief states. This framework is called ρPOMDP. If the belief-based reward function ρ is convex, the corresponding belief-based value function is proved to be convex as well. If ρ is piecewise linear and convex (PWLC) and the initial value function is equal to 0, then the belief-based value function is also PWLC and it is easy to adapt POMDP algorithms to solve ρPOMDPs.


4 Proposed method

4.1 Architecture Description

As presented in Chapter 1.3, the proposed approach is to create a probabilistic knowledge representation of the world that provides enough information for the agent to make decisions, and to keep that representation updated. The architecture developed for this purpose can be divided into two main parts and has the structure presented in Figure 4.1. The first part, designated as Knowledge Representation Engine, receives the world model as an input and is responsible for the operation of the whole architecture, as explained in detail in Chapter 4.1.1. This part is also responsible for the full generation of the second part, the Decision Maker. The Decision Maker is composed of a set of POMDPs, each one holding a partial representation of the global knowledge of the world. When selected, a POMDP takes the role of deciding which actions the agent should take. For semantic mapping in a domestic environment, as proposed, a sensible model is a Decision Maker with one POMDP for each room: if the world model is a house with N rooms, the Decision Maker will have N POMDPs. The Decision Maker is explained in more detail in Chapter 4.1.2.

Given that the Decision Maker has multiple POMDPs, it is also necessary to choose which one drives the behavior of the agent at each moment. The Knowledge Representation Engine is responsible for that decision: before making it, it analyses the Value function of each POMDP, given the current belief state. How this choice is made is explained in more detail in Chapter 4.1.2.A.

Summing up, the architecture first needs to be initialized: given the world model provided, the Knowledge Representation Engine generates a global belief about the world state in ProbLog, as well as the different POMDPs. Then, using that initial global belief, it analyses the Value function of each POMDP created, to choose which one should drive the agent’s behavior. The chosen POMDP keeps driving the agent’s behavior, updating its internal belief given the sequence of action-observation pairs. This internal belief also keeps updating the global belief in the Knowledge Representation Engine, as explained in Chapter 4.1.2.B. When the POMDP starts to select the action of doing nothing, it means that the agent has already accomplished the goal, and that POMDP stops driving the agent’s behavior. At this point, it returns its final internal belief to the Knowledge Representation Engine, updating the global world representation. Given the new updates in the global world representation, the Knowledge Representation Engine decides again which POMDP should be chosen, repeating the cycle.

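[Figure 4.1 diagram: the Knowledge Representation Engine (ProbLog, global state space S) receives the World Model and exchanges the beliefs b1, b2, ..., bn with POMDP 1 (S1), POMDP 2 (S2), ..., POMDP N (SN) of the Decision Maker; each POMDP issues actions and receives observations.]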

Figure 4.1: Scheme of the architecture operation
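The cycle described above (select a POMDP, let it act until it chooses to do nothing, merge its belief back into the global belief, repeat) can be sketched as a small loop. This is a minimal illustration only: `run_cycles` and the three callables it receives are hypothetical names introduced here, and the toy stand-ins at the bottom replace the real ProbLog and POMDP machinery with one confidence value per room.

```python
def run_cycles(global_belief, pomdp_ids, select, run_until_stop, merge, n_cycles):
    """Repeat the select / act / merge cycle a fixed number of times."""
    history = []
    for _ in range(n_cycles):
        # 1. Choose the POMDP that should drive the agent for this cycle.
        chosen = select(pomdp_ids, global_belief)
        # 2. Let it act until its policy picks the "do nothing" action.
        local_belief = run_until_stop(chosen, global_belief)
        # 3. Merge its final internal belief into the global belief.
        global_belief = merge(global_belief, chosen, local_belief)
        history.append(chosen)
    return global_belief, history


# Toy stand-ins (assumptions): each "belief" is just a confidence value per room.
confidence = {"kitchen": 0.2, "bedroom": 0.5}
final_belief, history = run_cycles(
    confidence,
    ["kitchen", "bedroom"],
    select=lambda ids, belief: min(ids, key=lambda i: belief[i]),
    run_until_stop=lambda room, belief: 1.0,
    merge=lambda belief, room, local: {**belief, room: local},
    n_cycles=2,
)
```

Passing the selection rule as a parameter keeps the loop independent of the agent’s goal, since the criterion used to pick a POMDP depends on that goal.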

4.1.1 Knowledge Representation Engine

The Knowledge Representation Engine, as explained before, is mainly responsible for the operation of the architecture. It holds the global world representation and chooses which POMDP should drive the agent’s behavior, based on the current global belief. For that purpose, it starts by receiving an initial world model, which in a semantic mapping context can be considered as a list of objects, furniture and rooms with their characteristics, such as position, volume, size and others. That information is used to create a set of facts in ProbLog. The Knowledge Representation Engine is also responsible for representing the interactions and relations between the world model components, considering the uncertainties that are present in real-life models. In the semantic mapping context, it is necessary to define the relationships between different objects, between objects and furniture, between furniture and rooms, etc. This can be done by defining a set of rules and probabilistic facts in ProbLog, which specify the behavior guidelines, and then taking advantage of the inference process of ProbLog. That information is used to make inferences about the world state and to generate the POMDPs. The global world representation held in the Knowledge Representation Engine can be called the global belief $b$, representing the probability distribution over the set of possible world states $\mathcal{S}$.

As mentioned before, the Knowledge Representation Engine is initially responsible for the full generation of the different POMDPs, dividing the global world representation into subworlds, based on a division criterion defined a priori by the model. Each tuple $\langle \mathcal{S}_n, A_n, T_n, R_n, \Omega_n, O_n \rangle$ that defines POMDP n, as explained in Chapter 2.4.2, is completely defined by the Knowledge Representation Engine. The goal of this division is to simplify the global world representation into a set of smaller worlds, so that it is easier to make decisions in each one. For that reason, the state set $\mathcal{S}_n$ of each POMDP is smaller than $\mathcal{S}$, which contains all the possible world states.

The Knowledge Representation Engine considers that, at each time, a state $S$ can be defined as the joint discrete probability distribution of a set $\mathcal{X} = \{X_1, X_2, \ldots, X_K\}$ of independent discrete random variables. Each state $S$ in $\mathcal{S}$ can be defined as:

\[ S = X_1 \times X_2 \times \ldots \times X_k \times \ldots \times X_K \tag{4.1} \]

Each variable $X_k$ is called a state variable and has a set of possible outcomes $D_k$, which corresponds to its domain. The dimension of the world state space $|\mathcal{S}|$ is then equal to the number of combinations of the domains $D_k$ of the state variables $X_k$,

\[ |\mathcal{S}| = \prod_{k=1}^{K} |D_k|. \tag{4.2} \]

Each POMDP n of the Decision Maker has a set of states $\mathcal{S}_n$. Each state $S'$ in $\mathcal{S}_n$ is defined as the joint discrete probability distribution of a set $\mathcal{X}_n$ of independent discrete random variables. It is important to notice that

\[ |\mathcal{X}_n| \leq |\mathcal{X}| \tag{4.3} \]

and that each variable $X'_k \in \mathcal{X}_n$ has a match with a variable $X_k \in \mathcal{X}$, because they represent the same feature in the world model. However, they are not the same variable, because they have different domains. The domain of $X'_k$ is $D'_k$ and it is adapted to the subworld of the respective POMDP. Because the subworld in each POMDP is restricted compared with the global world, the domains of $X_k$ and $X'_k$ satisfy

\[ |D'_k| \leq |D_k|. \tag{4.4} \]

The conditions presented in Equations (4.3) and (4.4) are the reason why the dimension of $\mathcal{S}_n$ is smaller than that of $\mathcal{S}$.
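To give a feeling for how much the division reduces the state space, the short sketch below evaluates Equation (4.2) for a global world and for one subworld. The variable names, the number of objects and the domains are illustrative assumptions, not the configuration used in the experiments; the value `none` stands for a single aggregated value covering every location outside the room.

```python
from math import prod


def state_space_size(domains):
    """Number of joint states: product of the domain sizes |D_k| (Equation 4.2)."""
    return prod(len(values) for values in domains.values())


# Hypothetical global world: a robot and two objects, five possible locations each.
places = ["kitchen table", "kitchen cabinet", "coffee table", "sideboard", "dining table"]
global_domains = {"robot": places, "object_1": places, "object_2": places}

# Hypothetical kitchen subworld: two local placements plus one aggregated value.
kitchen = ["kitchen table", "kitchen cabinet", "none"]
kitchen_domains = {"robot": kitchen, "object_1": kitchen, "object_2": kitchen}

global_size = state_space_size(global_domains)    # 5 * 5 * 5
kitchen_size = state_space_size(kitchen_domains)  # 3 * 3 * 3
```

Because the size is a product over the state variables, shrinking each domain reduces the number of states multiplicatively, which is what makes the per-room POMDPs easier to solve.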

To construct the domain $D'_k$ of a variable in a POMDP, the domain values that are not available in that specific subworld still need to be represented, because those values remain valid in the global world representation. For that reason, all those values can be aggregated in a single value that can be called, for example, none. It keeps representing those values without discriminating each one individually, minimizing the number of POMDP states, as desired. Every time the Knowledge Representation Engine needs to calculate the belief $b_n$ of a POMDP n, it needs to generate the new probability distribution for each state variable $X'_k$ of that POMDP. For that purpose, a function $f_{k,n}$ is considered for each state variable $X'_k$ of each POMDP, which associates each element of the domain $D_k$ with a single element of the domain $D'_k$,

\[ f_{k,n} : D_k \to D'_k. \tag{4.5} \]

If $D_k = D'_k$, $f_{k,n}$ is an endofunction. However, the most common case is to have $D'_k \subseteq D_k \cup \{\text{none}\}$, where none represents the set of elements $D_k \setminus D'_k$, and then

\[ P(X'_k = x') = \begin{cases} \sum_{x \in D_k \setminus D'_k} P(X_k = x), & \text{if } x' = \text{none} \\ P(X_k = x'), & \text{otherwise} \end{cases} \tag{4.6} \]
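Equation (4.6) amounts to copying the probabilities of the values that exist in the subworld and summing all the others into none. A minimal sketch, assuming that a marginal distribution is stored as a dictionary from domain value to probability (the function name and the example numbers are hypothetical):

```python
def project_marginal(global_marginal, local_values, none="none"):
    """Project P(X_k) onto the subworld domain D'_k as in Equation (4.6)."""
    local = {}
    for value, p in global_marginal.items():
        # f_{k,n}: values known to the subworld map to themselves,
        # every other value is aggregated into `none`.
        key = value if value in local_values else none
        local[key] = local.get(key, 0.0) + p
    return local


# Illustrative prior for one object over five placements.
prior = {
    "kitchen table": 0.4,
    "kitchen cabinet": 0.1,
    "coffee table": 0.2,
    "sideboard": 0.2,
    "dining table": 0.1,
}
kitchen_marginal = project_marginal(prior, {"kitchen table", "kitchen cabinet"})
```

In this example the three placements outside the kitchen are merged into a single none entry that carries their total probability.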

4.1.2 Decision Maker

The Decision Maker needs to deal with uncertainty in different aspects, such as observations and action results. As explained in Chapter 2.4.2, a POMDP is able to consider that uncertainty and make decisions under those conditions. However, finding the optimal policy for large POMDPs is limited by the poor scalability of existing solution algorithms, and large state spaces are one important source of intractability. This problem can be mitigated by dividing the decision-making task among several POMDPs, where each one is responsible for taking limited decisions, given the restricted set of possible states, actions and observations of the subworld that it represents. Nevertheless, all of them, together with the Knowledge Representation Engine, can produce an agent behavior close to the one given by the optimal policy obtained from a single POMDP representing the global world model. For that purpose, at each moment, it is important to have an engine able to select the POMDP that makes the most sense to guide the agent’s behavior, given the current global belief b and the agent’s goal, as explained in Chapter 4.1.2.A.

When initialized, each POMDP n of the Decision Maker also needs to be solved, using a POMDP solver to compute a policy that maps every possible belief $b_n$ in the belief space B to an action a in the set $A_n$ of possible actions that the robot can perform in that subworld. The solver computes an approximation of the optimal policy $\pi^*$, which is the policy that maximizes the agent’s expected total reward, given by the value function $V(b, \pi)$.


4.1.2.A POMDP selection

The POMDP selection is done by the Knowledge Representation Engine, taking into account the current global belief b, which provides the distribution over the possible world states. It starts by computing the belief $b_n$ of each POMDP, as explained in Chapter 4.1.1. Those beliefs can then be used to calculate the expected total reward of each POMDP n, using the value function $V_n(b_n, \pi^*)$ that was calculated previously. By comparing the expected total reward of each POMDP, it is possible to use different selection criteria to choose the POMDP that should drive the agent’s behavior. Those criteria are related to the main goal of the agent.

The value functions of the different POMDPs can be compared because the model used by the Knowledge Representation Engine to generate them is the same. In other words, the reward values and the observation and transition probabilities are similar, and the differences are only related to the specific characteristics of the subworld that each POMDP represents, which are supposed to be reflected in the Value function. The fact that the POMDP states are not the same is also a differentiating factor and influences the value functions as desired, in order to characterize each POMDP.

4.1.2.B Global knowledge representation update

Each time that a POMDP is selected, it guides the agent’s behavior, updating the internal belief bn,

with the collected information. At the same time, the updated internal belief of the POMDP also updates

the global belief b of the Knowledge Representation Engine.

Both beliefs $b$ and $b_n$ are defined as the joint probability distributions of the sets $\mathcal{X}$ and $\mathcal{X}_n$ of independent discrete random variables, respectively. Given the independence between the state variables in $\mathcal{X}_n$ and in $\mathcal{X}$, updating the belief $b$ with the new belief $b_n$ is the same as updating the probability distribution of each variable $X_k$ given the probability distribution of the respective $X'_k$, and then calculating the joint probability distribution of the updated variables in the set $\mathcal{X}$. The probability distribution of a state variable $X_k$ remains the same when there is no corresponding $X'_k$ in the set $\mathcal{X}_n$. The probability distributions of the remaining variables $X_k$ are updated individually, taking into account the probability distribution $P(X'_k | Z)$ that results from POMDP n. This probability represents the distribution of the variable $X'_k$ given the observations $Z$ that the agent collected. Considering $P(X_k)$ as the prior probability distribution of $X_k$, the posterior probability $P(X_k | Z)$ is given by Equation (4.7), where $f_{k,n}$ is the function that associates each element in $D_k$ with an element in $D'_k$, for POMDP n. The complete derivation of Equation (4.7) is presented in Appendix A.

\[ P(X_k | Z) = \frac{P(X_k)}{\sum_{x \in D_k} P(X'_k | X_k = x)\, P(X_k = x)}\, P(X'_k | Z), \quad \text{with } X'_k = f_{k,n}(X_k) \tag{4.7} \]

Considering that $P(X'_k | X_k)$ is given by Equation (4.8), Equation (4.7) can also be written as in (4.9).

\[ P(X'_k | X_k) = \begin{cases} 1, & \text{if } X'_k = f_{k,n}(X_k) \\ 0, & \text{otherwise} \end{cases} \tag{4.8} \]

\[ P(X_k | Z) = \begin{cases} P(X'_k | Z), & \text{if } X_k \in D_k \cap D'_k \\ \dfrac{P(X_k)}{\sum_{x \in D_k \setminus D'_k} P(X_k = x)}\, P(X'_k | Z), & \text{otherwise} \end{cases} \tag{4.9} \]
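Equation (4.9) can be read as follows: values that exist in the subworld take the POMDP posterior directly, while the posterior mass of none is shared among the remaining values in proportion to their prior. The sketch below is a minimal illustration under the assumption that marginals are dictionaries from value to probability; the function name and the numbers are hypothetical.

```python
def update_global_marginal(prior, local_posterior, none="none"):
    """Update P(X_k) with the POMDP posterior P(X'_k | Z), following Equation (4.9)."""
    # Prior mass of the global values that the POMDP aggregates into `none`.
    outside_mass = sum(p for value, p in prior.items() if value not in local_posterior)
    posterior = {}
    for value, p in prior.items():
        if value in local_posterior:
            # Value is also in D'_k: copy the POMDP posterior.
            posterior[value] = local_posterior[value]
        elif outside_mass > 0.0:
            # Value was merged into `none`: share its posterior by prior weight.
            posterior[value] = p / outside_mass * local_posterior.get(none, 0.0)
        else:
            posterior[value] = 0.0
    return posterior


# Illustrative numbers: uniform prior, then the kitchen POMDP reports its belief.
prior = {place: 0.2 for place in
         ["kitchen table", "kitchen cabinet", "coffee table", "sideboard", "dining table"]}
kitchen_posterior = {"kitchen table": 0.9, "kitchen cabinet": 0.05, "none": 0.05}
updated = update_global_marginal(prior, kitchen_posterior)
```

The result remains a valid distribution because the shares given to the values outside the room add up to the posterior probability of none.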

4.2 Semantic Mapping Application

The architecture designed can be applied in different contexts. Semantic mapping in a domestic environment is the main motivation of the work presented. In semantic mapping, it is possible to consider a house configuration as the world model, with all its rooms, furniture, objects and their respective characteristics and relations. In a domestic environment, this architecture makes it possible to have a semantic map representing the probability distribution of each object over the considered placements (furniture where an object can be placed) or over the possible rooms. Given that, at each moment, the position of each object does not depend on the position of the remaining objects or of the robot, the state variables $X_k$ can be the location of the robot and the location of each object, and the domains $D_k$ can be the possible robot locations and the possible places where the objects can be located, respectively. Using this world model of the house, the Knowledge Representation Engine can generate a POMDP for each room, representing that subworld. In each POMDP, the possible states correspond to the different possible combinations of the object locations and the robot location inside that room. The state variables $X'_k$ are the robot and object locations, with a domain $D'_k$ that is a smaller set than $D_k$ because of the restrictions on the placements and robot locations. $D'_k$ can also include the value none, in order to represent the possibility of the robot or an object being located in another room, where applicable.

The Knowledge Representation Engine also needs to define the possible actions and observations of each POMDP. In an architecture where the goal is to generate and keep updated a semantic map of the environment, it makes sense to have actions for moving the robot and searching for objects, and an extra action for doing nothing. The latter is chosen by the optimal policy when there is little value left in exploring that room. Then, the robot’s behavior should stop being guided by that POMDP, and the POMDP evaluation and selection process should be repeated using the new information collected. The observation function of each POMDP can be represented as the probability of observing an object or not, based on the object characteristics. In the semantic mapping context, the main goal of the agent is to reduce the uncertainty about the objects’ locations. In order to represent this goal in the POMDP model, the POMDP-IR framework presented in Chapter 2.4.3 is used, rewarding the agent for reaching a state with lower uncertainty about the location of the objects. For that reason, a reward $R_k$ is used for each state variable $X_k$ of an object.

Summarizing, each POMDP model of the Decision Maker, for the semantic mapping application is

defined by:

1. States and Transitions: The model considers one state variable for the robot and a state variable

for each object that can be located in the room. The robot and the object state variables repre-

sent the location of the robot and objects, respectively. The state transition model for the robot

represents the probabilities of it being located in a certain location, given the previous one and the

action taken.

2. Observations: The model has an observation binary variable for each object variable considered

to indicate the probability of the object being observed by the perception module or not.

3. Domain Actions: There is one action for searching for objects, triggering the perception model,

one action for moving the robot to each placement and an action just to stop the robot, indicating

the end of a searching process in that room.

4. Prediction Actions: A prediction action variable for each object is considered, indicating whether

an object is believed to be in some location in the room, not found in this room, or null if there is

not enough information.

5. Rewards: Each reward value of taking an action, given the robot and object location, depends

on the environment and the desired agent’s behavior. However, given the usage of information

rewards, in general, it makes sense to give higher rewards to the stop action than to the search

object action and higher rewards to the search object than the move actions, in order to represent

the action effort.

6. Information Rewards: The information rewards considered depend on the desired degree of

belief about the location of the objects.

An example of a POMDP model with two objects is presented in Figure 4.2, where the arrows repre-

sent the dependencies.

Figure 4.2: POMDP model example

The fact that objects can change location over time, given the naturally dynamic character of a domestic environment, can also be represented in the model. The Knowledge Representation Engine updates the global belief considering an exponential decay in the probability distribution of each object state variable $X_k$. For that purpose, the probability distribution of each variable $X_k$ is given by Equation (4.10), where $P_{\text{previous}}(X_k = x)$ is the value of the probability distribution of $X_k$ at the previous time step, $P_{\text{uniform}}(X_k = x)$ is the probability value for a uniform distribution, $\lambda$ is the decay rate and $t$ is the time elapsed since the previous time step.

\[ P(X_k = x) = P_{\text{uniform}}(X_k = x) + \left[ P_{\text{previous}}(X_k = x) - P_{\text{uniform}}(X_k = x) \right] e^{-\lambda t} \tag{4.10} \]

Then, when the Knowledge Representation Engine receives new information about the probability distribution of the state variables in a specific room, the update is done as presented in Equation (4.7), where $P(X_k)$, the previous probability distribution of the state variable $X_k$, has already been updated by the exponential decay presented in Equation (4.10).
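The decay of Equation (4.10) pulls each marginal towards the uniform distribution as time passes. A minimal sketch, assuming that a marginal is a dictionary from value to probability and that `elapsed` is measured in the same time unit as the inverse of the decay rate (names and numbers are illustrative):

```python
import math


def decay_marginal(previous, decay_rate, elapsed):
    """Relax a distribution towards uniform, as in Equation (4.10)."""
    uniform = 1.0 / len(previous)
    factor = math.exp(-decay_rate * elapsed)
    return {value: uniform + (p - uniform) * factor for value, p in previous.items()}


# Illustrative belief: the object is almost certainly on the kitchen table.
belief = {"kitchen table": 0.9, "kitchen cabinet": 0.05, "none": 0.05}
unchanged = decay_marginal(belief, decay_rate=0.2, elapsed=0.0)
later = decay_marginal(belief, decay_rate=0.2, elapsed=5.0)
much_later = decay_marginal(belief, decay_rate=0.2, elapsed=1000.0)
```

With zero elapsed time the distribution is unchanged, and for a very long elapsed time it approaches the uniform distribution, so confidence in an old observation fades unless the room is visited again.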

In the semantic mapping application, the agent has two different kinds of goals, which are related to two different kinds of POMDP selection criteria:

1. Generate and keep updated a semantic map of the environment

If the goal of the architecture is just to generate a semantic map of the environment and keep it updated, reducing the uncertainty about the location of the objects, it only needs to deal with an active perception task: the agent aims to select actions that reduce its uncertainty about the world state. This goal can be represented as maximizing the sum of the POMDP value functions, because they are PWLC functions whose expected reward is lower towards the center of the belief space. The higher the entropy (5.1) of the belief state, the closer the system is to the middle of the belief space and the lower the subsequent expected reward. For that reason, maximizing the value function corresponds to minimizing the entropy of the belief state, which translates into a lower uncertainty about the objects’ arrangement, as desired. Maximizing the sum of the value functions of all POMDPs means that the goal is to have an updated and confident semantic map of the global world, not just a very high confidence about the objects’ arrangement in one room. In order to maximize the sum of the value functions, the criterion of the Knowledge Representation Engine is to choose the POMDP with the lowest expected reward value for the belief b at that point. This can be explained by the convexity of the value functions: the POMDP implicitly has the goal of minimizing the entropy of the belief state, so it will drive the value function to a value close to its maximum. This means that the Value function with the lowest initial value has the highest potential increase in cumulative expected reward. The belief $b_n$ that results from the chosen POMDP will have a lower entropy than initially and, when it updates the global belief b, it will also decrease the entropy of b. In turn, this will decrease the entropy of the beliefs $b_n$ of each POMDP, increasing the cumulative expected rewards of each POMDP.

2. Carry out a task

In an architecture where the main goal is to carry out some task, such as finding a specific object or moving an object to a place, the agent needs to have a low uncertainty about the world state to be able to reach the goal. For that purpose, the agent can take an action that is not directly related to task accomplishment but is intended to reduce the uncertainty about the environment. In this case, the criterion is to choose the POMDP with the highest expected reward value for the belief b at that point. A POMDP model for carrying out a task is based on the structure of the model presented before; however, it is necessary to add some actions that help with the task accomplishment. Depending on the task, it may be necessary to add new state variables to the model, or just new possible values to the existing state variables, as well as new observation variables and new rewards for the new states, which must be large enough to make the agent give priority to the task accomplishment.
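The two selection criteria above can be sketched in a few lines. The sketch assumes that each value function is available as a list of alpha-vectors, the usual representation of a PWLC value function, so that its value at a belief is the maximum dot product; this is an illustrative assumption rather than the format produced by the solver used in this work, and all names and numbers are hypothetical.

```python
def value_at(belief, alpha_vectors):
    """Value of a PWLC value function at a belief: max over alpha-vectors of alpha . b."""
    return max(sum(a * b for a, b in zip(alpha, belief)) for alpha in alpha_vectors)


def select_pomdp(beliefs, value_functions, goal="mapping"):
    """Pick the POMDP with the lowest value (mapping) or the highest value (task)."""
    values = {name: value_at(beliefs[name], value_functions[name]) for name in beliefs}
    pick = min if goal == "mapping" else max
    return pick(values, key=values.get)


# Toy example with two-state beliefs and the same convex value function per room.
alphas = [[1.0, 0.0], [0.0, 1.0]]
value_functions = {"kitchen": alphas, "bedroom": alphas}
beliefs = {"kitchen": [0.5, 0.5], "bedroom": [0.9, 0.1]}

for_mapping = select_pomdp(beliefs, value_functions, goal="mapping")
for_task = select_pomdp(beliefs, value_functions, goal="task")
```

In the toy example the kitchen belief is the most uncertain one, so it has the lowest value and is chosen for mapping, while the more certain bedroom belief is chosen when the goal is to carry out a task.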


5 Experimental Results

5.1 Implementation

In order to analyze, test and validate the architecture presented in Chapter 4, it was necessary to implement it. The implementation is based on the Robot Operating System (ROS) framework, because the architecture is designed for a robotic application and ROS is a flexible framework for writing robot software. The Knowledge Representation Engine is implemented as a ROS node in Python, because this allows importing ProbLog¹ as a package and interacting directly with it. The Decision Maker can be represented as multiple instances of a node, each one responsible for one POMDP. For this, a Matlab implementation of the Symbolic Perseus² algorithm, able to solve POMDP-IRs, is used. Symbolic Perseus is a point-based value iteration algorithm that is able to tackle large factored POMDPs. The Knowledge Representation Engine generates the POMDP files with all the information needed. These files can be opened in the software OpenMarkov³, which provides a graphical representation of the POMDP model. These files are also used by the Matlab solver, in order to compute offline an approximation to the optimal Value function.

In the real environment experiments, it is necessary to use a perception model to detect and recognize objects in real time. The experiments therefore use a ROS wrapper of YOLOv3⁴, trained with a dataset for the objects that need to be recognized.

¹ https://dtai.cs.kuleuven.be/problog/
² https://cs.uwaterloo.ca/~ppoupart/software.html
³ http://www.openmarkov.org/
⁴ https://github.com/pjreddie/darknet


5.2 World Model

In order to test the operation of the architecture in the semantic mapping context, it is important to start by defining the domestic environment used in the experiments. The environment is an apartment based on the ISRoboNet@Home Testbed, a testbed certified by ERL Consumer Service Robots that is used to benchmark domestic robot features and tasks. The testbed layout is presented in Figure 5.1.

Figure 5.1: ISRoboNet@Home Testbed layout

For each placement considered in the testbed, it is important to define different characteristics. The furniture characteristics considered in the presented experiments are the position in the 2D plane and the furniture area. Given the position of each placement, it is possible to get the Euclidean distance between each pair of placements. The ProbLog model in the Knowledge Representation Engine uses this distance to define, for each POMDP, the rewards of taking the action of moving from one placement to another. Each POMDP only considers the placements inside the respective room, so the distance between the outside and a placement inside the room is the average of the distances to all the placements outside. The reward of moving from one place to another is actually a negative reward (penalty), representing the effort of moving the robot: the bigger the distance, the more negative the reward. The value range used for these rewards is [−0.7, −0.5]. Besides that, in the model presented, the distance between furniture is also used to define the low probability of observing an object in a place where it is not, which is higher when the object is placed on a piece of furniture close to the one where it was observed. The value range considered for this is [0.05, 0.1].

In the model presented, the furniture area is the area available to place objects and is used to define the POMDP rewards of searching for objects in that place. This reward is in the range [−0.35, −0.25]. It is negative for the same reason as for the moving actions and, the smaller the area, the more negative the reward.

In addition to furniture, the world model also needs to define the objects that the robot needs to consider for semantic mapping and their characteristics. For each object, the volume is given, which is used to define the probability of observing the object where it really is. This represents the fact that objects with a smaller volume have a lower probability of being observed, because they are small and can be easily occluded. The value range of the observation probabilities for these cases is [0.8, 0.9], which corresponds to a probability of false negative observations in the range [0.1, 0.2], depending on the object. The distance between furniture is also used to define the probability of false positive observations, that is, the probability of observing an object where it is not located. If the object is not even in that room, the probability of a false positive observation is equal to 0.01. If the object is located in the same room where the robot is searching, but not in the same placement, the probability of a false positive observation is in the range [0.05, 0.1], depending on the distance between the furniture where the robot is searching and the furniture where the object is placed. The reward values and the observation probability model are summarized in Table 5.1.

                                        Do Nothing   Search Object     Move Robot
Reward value                            0            [−0.35 ; −0.25]   [−0.7 ; −0.5]
Observation probabilities:
  object location = robot location      0            [0.8 ; 0.9]       0
  object location ≠ robot location      0            [0.05 ; 0.1]      0
  object location = none                0            0.01              0

Table 5.1: POMDP reward values and observation probability ranges for the different action types
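The text states the reward ranges and the direction in which they vary, but not the exact mapping from distance or area to a reward. The sketch below assumes a simple linear interpolation between the extremes, purely for illustration; the function names and the interpolation itself are assumptions, not the mapping used in the ProbLog model.

```python
def interpolate(x, x_min, x_max, y_at_min, y_at_max):
    """Linearly map x in [x_min, x_max] to a value between y_at_min and y_at_max."""
    if x_max == x_min:
        return y_at_min
    fraction = (x - x_min) / (x_max - x_min)
    return y_at_min + fraction * (y_at_max - y_at_min)


def move_reward(distance, d_min, d_max):
    # Longer moves cost more: -0.5 at the shortest distance, -0.7 at the longest.
    return interpolate(distance, d_min, d_max, -0.5, -0.7)


def search_reward(area, a_min, a_max):
    # Smaller furniture areas cost more: -0.35 at the smallest, -0.25 at the largest.
    return interpolate(area, a_min, a_max, -0.35, -0.25)


# Hypothetical distances (m) and areas (m^2).
short_move = move_reward(1.0, d_min=1.0, d_max=4.0)
long_move = move_reward(4.0, d_min=1.0, d_max=4.0)
small_area = search_reward(0.5, a_min=0.5, a_max=2.0)
large_area = search_reward(2.0, a_min=0.5, a_max=2.0)
```

Any monotonic mapping that respects the ranges of Table 5.1 would preserve the intended ordering of the action costs.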

Appendix B presents an example of the world model files that the architecture receives as input.

For the different experiments, two different scenarios for the testbed are considered, as presented

in Table 5.2. Scenario 1 considers 5 placements spread over 3 rooms and Scenario 2 considers 8


placements spread over 4 rooms.

Rooms: Kitchen, Living Room, Dining Room, Bedroom
Scenario 1 placements: kitchen table, coffee table, dining table, kitchen cabinet, sideboard
Scenario 2 placements: kitchen table, coffee table, dining table, bed, kitchen cabinet, sideboard, night stand, bookshelf

Table 5.2: Testbed scenarios used in the experiments

5.3 POMDP-IR Model

As mentioned in Chapter 4.2, in the semantic mapping application the model has a POMDP for each room, where the set of states contains all the combinations of possible positions of each object and of the robot inside that room. The set of possible actions has, for each placement, an action to make the robot go there (e.g. goSideboard). It also has one action to search for objects (searchObject) and another one to do nothing (doNothing). The set of observations is composed of a binary variable for each object, representing whether it is observed or not.

The tool used to solve the POMDPs supports factored POMDPs, where the transition, observation and reward functions are defined in terms of the state variables $X_k$, action variables and observation variables, allowing compact factored representations.

In the experiments, the POMDP-IR information rewards used are $r_{correct} = 0.53$ and $r_{incorrect} = -4.78$, as proposed in [14], in order to get $\beta = 0.9$ and reward the robot for having a degree of belief in the object’s location higher than 0.9.
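The relation between these two reward values and the threshold β can be checked with a short calculation. It assumes, following the POMDP-IR formulation of [14], that a prediction action yields $r_{correct}$ when the prediction is right and $r_{incorrect}$ when it is wrong, and that the null prediction yields zero reward; $p$ denotes the belief in the predicted location and is a symbol introduced here only for illustration. Committing to a prediction is then worthwhile only when its expected reward is positive:

```latex
\[
p\, r_{correct} + (1 - p)\, r_{incorrect} > 0
\;\Longleftrightarrow\;
p > \frac{-r_{incorrect}}{r_{correct} - r_{incorrect}}
  = \frac{4.78}{0.53 + 4.78} \approx 0.90 = \beta
\]
```

Under these assumptions, the agent only gains from a prediction action once its belief in an object’s location exceeds approximately 0.9, which is what drives it to keep collecting observations below that level.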

Another assumption made in the experimental results presented, to simplify the model, is that the environment is static while the agent is making decisions inside a room. Basically, the state variable transitions T of each POMDP considered are deterministic: the model assumes that, while the robot is taking actions inside the room, the objects do not change position and the robot changes its position according to the move actions that it takes. Those are realistic assumptions, given that the objects are not expected to change location very often. In the time span in which the robot is exploring a room, one can consider that the objects do not change position and, even if that happens without being modeled, the robot may detect those changes in the following searching episodes. On the other hand, there are nowadays reliable and accurate navigation algorithms that work well in this kind of domestic environment, so the problem of navigation can be separated from the decision-making task.

However, the dynamic character of a domestic environment is still represented in the model, as explained in Chapter 4.2, using an exponential decay in the object state variables X in the Knowledge Representation Engine.

5.4 Simulated Experiments

To analyze the behavior and performance of the designed architecture, experiments are first carried out in a simulated environment. What needs to be simulated is the perception model and the actions of moving the robot. In Scenario 1, three simple cases are presented: one considering a static environment and two considering a dynamic environment in which the objects nevertheless do not change location. Scenario 2 shows how the architecture deals with environment changes and with errors in the perception model. For the experiments with a dynamic model, as proposed in Chapter 4.2, an exponential decay is applied to the probability distributions of the location of the objects, increasing the entropy with time. In these simulated experiments a mean life $\lambda^{-1}$ of 5 time slots is considered, where the action of moving the robot takes 1 time slot, the action of searching for objects takes 0.3 time slots and the action of doing nothing takes 0.1 time slots.
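To make the time-slot accounting concrete, the following sketch applies an exponential decay to a belief after each action. It is only an illustration: the exact form of Equation (4.10) is not reproduced in this chapter, so the code assumes that the belief relaxes towards the uniform distribution, and the action key `move` and the function `decay` are hypothetical names.

```python
import math

MEAN_LIFE = 5.0  # time slots, as stated in the text (lambda = 1 / MEAN_LIFE)

# Duration of each action in time slots, as stated in the text.
# "move" stands for any robot motion action (illustrative name).
ACTION_DURATION = {"move": 1.0, "searchObject": 0.3, "doNothing": 0.1}


def decay(belief, elapsed, mean_life=MEAN_LIFE):
    """Relax a belief towards the uniform distribution (assumed decay form)."""
    n = len(belief)
    keep = math.exp(-elapsed / mean_life)  # fraction of information retained
    return [1.0 / n + (p - 1.0 / n) * keep for p in belief]


# Start fully certain about one of 5 placements, then let time pass.
belief = [1.0, 0.0, 0.0, 0.0, 0.0]
for action in ["move", "searchObject", "searchObject", "doNothing"]:
    belief = decay(belief, ACTION_DURATION[action])
print([round(p, 3) for p in belief])
```

With this form, the decay over several actions composes into a single decay over the total elapsed time, and the belief always remains a valid probability distribution.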

5.4.1 Scenario 1

5.4.1.A Static Model

To analyze the behavior of the architecture in a simple scenario, the environment is considered static, so the position of the objects does not change with time. At each time step, the state variable distributions are updated using t = 0 in Equation (4.10), which disables the exponential decay.

In this first experiment, the objects and their locations are those presented in Table 5.3. Each object state variable starts with a uniform distribution over the 5 possible placements in Scenario 1. The probability distributions, the robot position and the resulting action at each time step are presented in Chapter C.1.

object      location
cocacola    dining table
pringles    kitchen table
mug gray    coffee table
mug black   kitchen cabinet

Table 5.3: Location of the objects in the static model

Figure 5.2 shows the progression of the entropy of the object distributions. The degree of uncertainty about the location of the objects can be quantified using the entropy E. For a discrete probability distribution $P = (p_1, \dots, p_n)$, the entropy E is defined by Equation (5.1). For each state variable $X_k$, it decreases with time, as desired.

$$E(P) = -\sum_{i=1}^{n} p_i \log_n (p_i) \tag{5.1}$$

At step 0, the robot has no information about the location of the objects, so each object distribution has maximum entropy, $E(X_k) = 1$. Then, for each time step in which the robot searches for an object, the entropy decreases. When an object is seen, its entropy drops abruptly, because the uncertainty about its location is greatly reduced. The entropy of the cocacola state variable stays higher than the others because the cocacola is never observed by the robot. After 14 steps, the uncertainty about the world state is low enough that the agent always opts for the action doNothing thereafter.
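The entropy of Equation (5.1) uses a logarithm in base n, which is what makes the uniform distribution score exactly 1 regardless of the number of placements. A minimal sketch follows (the function name `normalized_entropy` is illustrative), using the convention that terms with zero probability contribute nothing.

```python
import math


def normalized_entropy(p):
    """Entropy with logarithm base n = len(p), so the result lies in [0, 1]."""
    n = len(p)
    if n < 2:
        raise ValueError("need at least two outcomes")
    # Terms with p_i = 0 are skipped (0 * log 0 is taken as 0).
    return -sum(pi * math.log(pi, n) for pi in p if pi > 0.0)


uniform = [0.2] * 5              # initial belief over the 5 placements
certain = [0.0, 1.0, 0.0, 0.0, 0.0]
print(normalized_entropy(uniform), normalized_entropy(certain))
```

The uniform belief gives the maximum value of 1 mentioned in the text, and a belief concentrated on a single placement gives 0.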

[Figure 5.2 plot: entropy (0 to 1) versus step (0 to 18) for cocacola, mug black, mug gray and pringles, with the POMDP selection steps marked.]

Figure 5.2: Progression of objects distributions entropy in a static model

Table 5.4 shows the expected rewards of each POMDP at each time step in which the Knowledge Representation Engine needs to select which POMDP should control the robot behavior, and therefore which room the robot should explore. At step 0, the expected rewards of the kitchen and living room POMDPs are roughly the same, because the number of placements and the initial belief are the same, given the initial global uniform distribution of the location of the objects. The small difference is related to the characteristics of the placements in each room, and it makes the agent start by exploring the living room. The dining room has just one placement, which is why its expected rewards are higher: it can easily reach a low uncertainty about the room state. At step 8, the living room already has a higher value because the uncertainty about its state was reduced by the previous exploration. The same happens at step 15 with the increase in the expected rewards of the kitchen POMDP.

step          0      8      15     16     17     18
kitchen       22.53  21.24  38.71  38.65  38.65  38.65
living room   21.18  40.21  40.93  40.93  40.93  40.93
dining room   34.46  34.43  38.13  38.13  38.13  38.13

Table 5.4: Expected rewards for POMDP selection steps in a static model example
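The selection step can be sketched from the values in Table 5.4. The rule below is inferred from the behavior described in this chapter, not transcribed from the Knowledge Representation Engine: when the goal is to gather information the room with the lowest expected reward is explored (the agent starts in the living room), whereas for a task-driven goal the highest expected reward is chosen. The function name and the `task_driven` flag are illustrative.

```python
# Expected rewards at three selection steps, copied from Table 5.4.
EXPECTED_REWARDS = {
    0: {"kitchen": 22.53, "living room": 21.18, "dining room": 34.46},
    8: {"kitchen": 21.24, "living room": 40.21, "dining room": 34.43},
    15: {"kitchen": 38.71, "living room": 40.93, "dining room": 38.13},
}


def select_pomdp(expected_rewards, task_driven=False):
    """Pick the room whose POMDP should control the robot (assumed rule)."""
    chooser = max if task_driven else min
    return chooser(expected_rewards, key=expected_rewards.get)


for step, rewards in EXPECTED_REWARDS.items():
    print(step, select_pomdp(rewards))
```

Applied to step 0 this picks the living room, and applied to step 8 it picks the kitchen, which matches the exploration order described in the text.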

5.4.1.B Dynamic Model

In realistic domestic environments the static model is not appropriate, because these environments are typically dynamic. It is therefore necessary to consider that the objects can change location, using the exponential decay with the parameters given in Chapter 5.4. For this experiment, a different configuration of the object locations is used, as presented in Table 5.5. This configuration remains the same throughout the experiment. However, the agent must account for the possibility that the location of the objects has changed, so a dynamic model is used.

object      location
cocacola    kitchen table
pringles    coffee table
mug gray    dining table
mug black   sideboard

Table 5.5: Location of the objects in the dynamic model

The initial distribution of each object state variable is uniform, as in the previous experiment, so the agent starts with maximum entropy. Chapter C.2 shows that the agent keeps moving between the living room and the kitchen and never visits the dining room. This can be explained by the fact that there is just one placement in that room: since the object must be in one of the considered placements, the agent can infer whether the object is there by ruling out the other placements. However, the entropy of the mug gray state variable remains higher than that of the remaining objects, as can be seen in Figure 5.3. After the system stabilizes, $P(X_{\text{mug gray}} = \text{dining table})$ keeps oscillating in the range [0.758, 0.797], giving good confidence about the mug gray position and a stable entropy, even without observing it.

The expected rewards of the dining room are never the lowest among the rooms, as presented in Table 5.6. After the agent explores a room and decreases the uncertainty about the objects located there, the dynamic model increases the entropy of those objects' state variables while the agent is exploring another room. This increase is enough to make the expected rewards of the first room lower than the others. This explains the alternation between exploring the kitchen and the living room, which can be observed in Figure 5.3. After some time, the system stabilizes into a periodic entropy oscillation. From that point on, and until some object changes place, the robot keeps making the same decisions, alternating between exploring the kitchen and the living room and decreasing the entropy of the objects located in each of them.

[Figure 5.3 plot: entropy (0 to 1) versus step (0 to 110) for cocacola, mug black, mug gray and pringles, with the POMDP selection steps marked.]

Figure 5.3: Progression of objects distributions entropy in a dynamic model

step          0      10     20     28     36     43     50     58
kitchen       22.53  20.61  41.67  25.28  41.82  25.40  41.56  25.13
living room   21.18  41.85  22.00  41.38  23.39  41.46  23.53  41.56
dining room   34.46  34.57  34.50  34.56  34.52  34.56  34.52  34.56

step          66     73     80     88     96     103    110
kitchen       41.80  25.40  41.56  25.13  41.80  25.40  41.56
living room   23.35  41.46  23.53  41.56  23.35  41.46  23.53
dining room   34.52  34.56  34.52  34.56  34.52  34.56  34.52

Table 5.6: Expected rewards for POMDP selection steps in a dynamic model example

On the contrary, if the architecture decided to explore the dining room, the entropy of the mug gray would decrease. However, the entropy of the remaining objects would increase, and so would the entropy of the global belief, reaching higher values than under the behavior actually chosen by the architecture.

5.4.1.C Carrying out a task

As mentioned in Chapter 4.2, the architecture can also be used to model situations where the agent has the goal of carrying out a specific task that requires good knowledge about the world state. In that case, reducing uncertainty is a means of reaching the goal rather than the goal itself, as it was before. Instead of just reducing uncertainty about the position of the objects in general, in the experiments presented here the main goal is to move the cocacola close to the pringles. It is therefore necessary to add to each POMDP model the actions graspCocacola and releaseCocacola, so that the agent is able to change the cocacola location. It is also necessary to allow for the cocacola being in the robot gripper, so the domain of the cocacola state variable gains the new value gripper. Finally, large rewards are added for grasping the cocacola when it is not in the same placement as the pringles, and for releasing the cocacola when it is in the gripper and the robot is at the same placement as the pringles. In this experiment it is also assumed that the actions of grasping and releasing the object always succeed, and that the exponential decay has no effect on the probability of the object being in the gripper.
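The task rewards described above can be sketched as a small function. This is a hypothetical illustration: the text only says that the rewards are "big", so the magnitude `TASK_BONUS` is invented, and the extra condition that the robot must be at the cocacola's placement to grasp it is an assumption of this sketch.

```python
GRIPPER = "gripper"   # extra domain value of the cocacola state variable
TASK_BONUS = 100.0    # hypothetical magnitude; the thesis gives no value


def task_reward(action, robot_at, cocacola_at, pringles_at, bonus=TASK_BONUS):
    """Reward for the task of moving the cocacola close to the pringles."""
    if action == "graspCocacola":
        # Grasping is rewarded only when the cocacola is not already next to
        # the pringles (and, by assumption, is within reach of the robot).
        if cocacola_at == robot_at and cocacola_at != pringles_at:
            return bonus
    elif action == "releaseCocacola":
        # Releasing is rewarded when the cocacola is in the gripper and the
        # robot stands at the placement of the pringles.
        if cocacola_at == GRIPPER and robot_at == pringles_at:
            return bonus
    return 0.0


print(task_reward("graspCocacola", "sideboard", "sideboard", "dining table"))
print(task_reward("releaseCocacola", "dining table", GRIPPER, "dining table"))
```

In a POMDP model this reward would be added on top of the information rewards, so the agent still has an incentive to locate both objects before acting.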

To test the architecture in this application, two different configurations of the location of the objects are considered, as presented in Table 5.7.

configuration  object    location
1              cocacola  sideboard
               pringles  dining table
2              cocacola  kitchen table
               pringles  coffee table

Table 5.7: Location of the objects for experiments with the goal of moving the cocacola close to the pringles

When the main goal of the agent is carrying out a specific task, such as moving the cocacola close to the pringles, the POMDP selection criterion is to choose the POMDP with the highest expected rewards, as explained before. Table 5.8 presents the expected rewards at the POMDP selection steps for both configurations.

              Configuration 1                  Configuration 2
step          0      9      20     25/26/27    0      8      15     20/21/22
kitchen       36.63  17.72  16.65  19.19       36.63  63.87  17.65  19.19
dining room   28.49  11.98  17.79  21.17       24.49  43.27  8.47   19.20
living room   36.82  13.83  16.79  19.19       36.82  18.32  24.52  21.12

Table 5.8: Expected rewards for POMDP selection steps when carrying out a task

To compare the probability distribution of each state variable with the deterministic distribution that corresponds to the true location of the object, the Hellinger distance between the two is used. For two discrete probability distributions $U = (u_1, \dots, u_n)$ and $V = (v_1, \dots, v_n)$, the Hellinger distance is defined as

$$H(U, V) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{n} \left(\sqrt{u_i} - \sqrt{v_i}\right)^2}, \tag{5.2}$$

which is the $\ell_2$-distance between the element-wise square roots of the probability vectors U and V, scaled by $1/\sqrt{2}$, and allows us to quantify the similarity between two probability distributions. There are several methods to measure the difference between two probability distributions. This one is chosen to quantify the architecture performance for two reasons. First, it is intuitive, being related to the Euclidean norm of the difference of the square-root vectors. Second, it suits the distribution of the true location of the object, which is degenerate: most of the domain values have probability zero, and in most other methods, such as the KL-divergence or the cross entropy, this makes some terms undefined because of the logarithm.
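Equation (5.2) is straightforward to evaluate against a degenerate ground-truth distribution, since no logarithm is involved. The sketch below is a direct transcription of the formula; the helper names `hellinger` and `degenerate` are illustrative.

```python
import math


def hellinger(u, v):
    """Hellinger distance between two discrete distributions, Equation (5.2)."""
    if len(u) != len(v):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(u, v))
    return math.sqrt(squared) / math.sqrt(2.0)


def degenerate(n, true_index):
    """Ground-truth distribution: probability 1 on the true placement."""
    return [1.0 if i == true_index else 0.0 for i in range(n)]


truth = degenerate(5, 2)                # object at the third of 5 placements
print(hellinger([0.2] * 5, truth))      # uniform belief, no information
print(hellinger(truth, truth))          # perfect knowledge
```

The distance is 0 when the belief matches the truth and 1 when the belief puts all its mass on a wrong placement, so zero-probability entries never cause undefined terms.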

Figure 5.4 shows the Hellinger distance at each step for both configurations. In Configuration 1 the robot finds the cocacola in the first room that it visits, grasps it and keeps it in the gripper until it finds the pringles, which are in the dining room. After grasping the cocacola, the robot visits the kitchen because there are two placements there, so intuitively the probability of finding the pringles there is higher, which implies that the expected rewards are also higher, as presented in Table 5.8. In Configuration 2 the robot finds the pringles first and then, as soon as it finds the cocacola, goes back to the placement where the pringles are, in order to reach the goal.

[Figure 5.4 plots: Hellinger distance (0 to 1) versus step for cocacola and pringles, with the POMDP selection, graspCocacola and releaseCocacola steps marked. Panel (a) Configuration 1 spans steps 0 to 27; panel (b) Configuration 2 spans steps 0 to 22.]

Figure 5.4: Progression of Hellinger distance until the robot reaches the goal for two different configurations

5.4.2 Scenario 2

5.4.2.A Objects changing position

To analyze how the architecture reacts to modifications in the object locations, an experiment in Scenario 2 is presented, with three objects and three modifications of their locations, as presented in Table 5.9. After step 75, the mug gray is moved from the dining table to the night stand, in a different room. After step 120, the pringles are moved to the bookshelf, remaining in the same room. After step 175, all the objects are moved to the kitchen.

steps     object    location
0-75      mug gray  dining table
          cocacola  kitchen table
          pringles  coffee table
76-120    mug gray  night stand
          cocacola  kitchen table
          pringles  coffee table
121-175   mug gray  night stand
          cocacola  kitchen table
          pringles  bookshelf
176-260   mug gray  kitchen cabinet
          cocacola  kitchen table
          pringles  kitchen table

Table 5.9: Location of the objects for each step

In this experiment, the parameters of the exponential decay are the same as in Scenario 1. However, the scenario is bigger, including a new room, so the robot has more to explore, which implies a higher entropy in the state variable distributions. Figure 5.5 presents, for each step, the Hellinger distance between the distribution of each object state variable obtained by the architecture and the distribution that corresponds to reality, a degenerate distribution with $P(X_k = x') = 1$, where $x'$ is the real position of the object at that instant. As before, the Hellinger distance quantifies the similarity between the two probability distributions. The robot decides to go to the living room at every second POMDP selection, as presented in Table C.4, because that room has three placements and a strong exponential decay is used in the architecture.

[Figure 5.5 plot: Hellinger distance (0 to 1) versus step (0 to 260) for cocacola, mug gray and pringles, with the POMDP selection steps and the modifications in object location marked.]

Figure 5.5: Progression of Hellinger distance, with modifications in the location of the objects

Analyzing the robot's reaction to the first change in the location of the objects, after step 75, the robot is able to detect it and update the state variable distributions without changing its behavior. This happens because it is the mug gray that changes location, and the robot finds its new location by chance, without first noticing that it is no longer in the previous one. After step 120, the pringles change location inside the same room, and the modification is detected when the robot returns to that room and notices that the pringles are not in the previous place. Then, given the large uncertainty about the pringles location, the agent keeps exploring the remaining placements in the room until it finds them, reducing the uncertainty. For example, in the two previous explorations of the living room, the robot did not search for objects in all the placements. However, once it figured out that the pringles were not in the previous place, the increase in uncertainty made the agent explore all the placements, as presented in Table 5.10.

step  robot location   action          |  step  robot location  action
110   kitchen cabinet  gocoffee table  |  125   bed             gocoffee table
111   coffee table     searchObject    |  126   coffee table    searchObject
112   coffee table     searchObject    |  127   coffee table    searchObject
113   coffee table     searchObject    |  127   coffee table    goSideboard
114   coffee table     goBookshelf     |  128   sideboard       searchObject
115   bookshelf        searchObject    |  129   sideboard       goBookshelf
116   bookshelf        searchObject    |  131   bookshelf       searchObject
117   bookshelf        doNothing       |  132   bookshelf       doNothing

Table 5.10: Comparison between living room actions, before (left) and after (right) pringles changing location

After step 175 the last modification in the location of the objects takes place, with all objects being placed somewhere in the kitchen. This modification is made after the robot leaves the kitchen, so it takes some steps until the robot is able to discover it. The robot keeps exploring the bedroom and the living room, while the uncertainty about the location of the objects increases, until the expected rewards of the kitchen POMDP become the lowest. Once the robot is in the kitchen and finds all the objects, the uncertainty decreases. From then on it is able to keep the Hellinger distance at very low values, because it only needs to keep searching for the objects in that room, which prevents the uncertainty about their location from growing. It is also possible to verify that, when the robot has found the objects and decides to do nothing because the entropy is already approximately zero, the lowest expected reward values in the next few POMDP selection steps are those of the living room. However, given the high confidence about the location of the objects, the first action taken by that POMDP is to do nothing, which means the robot does not leave the kitchen. This process is repeated until the uncertainty increases enough for the kitchen POMDP to be selected, which then chooses actions that make the robot check whether the location of the objects remains the same.

5.4.2.B Incorrect observations robustness

The architecture needs a perception model to construct the POMDP observations about the world state, so it is important to analyze how the system models perception. When the Knowledge Representation Engine generates each POMDP, it also needs to define its observation function. In this case, it defines an observation model for each object state variable, that is, a probability distribution over possible observations for each state variable, given the previous state and the action. The observation model used in the experiments, as explained in Chapter 5.2, allows these observation models to be defined for each state variable depending on the object that it represents, the robot location at that time and the room furniture configuration. For Scenario 2, the observation model can be summarized as presented in Table C.5.

To verify whether the system is able to deal with incorrect observations, the observations in these simulated experiments are generated following the model implemented for each POMDP and presented in Table C.5. For two different configurations of the location of the objects, presented in Table 5.11, the Hellinger distance is obtained over 400 steps, as presented in Figure C.1 and Figure C.2. The architecture is able to easily remove the effect of the erroneous observations when they are generated at the same false negative and false positive rates that the observation function of each POMDP considers. These are referred to as the expected false negative and false positive rates.

configuration  object    location
1              mug gray  bed
               cocacola  coffee table
               pringles  kitchen cabinet
2              mug gray  night stand
               cocacola  bookshelf
               pringles  sideboard

Table 5.11: Location of the objects for experiments with wrong observations

To verify whether the architecture is able to deal with incorrect observations at a higher rate than the one expected by each POMDP, Table 5.12 presents the mean Hellinger distance over 400 steps, starting from a uniform distribution, for different false positive and false negative rates. The architecture remains robust to incorrect observations even when false positive and false negative observations are generated at twice the rate expected by the POMDP model. In that case the false negative rate is 40% for the mug gray and, even under those conditions, the mean Hellinger distance remains low.
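A simulated detector with scaled error rates could look like the sketch below. It is an assumption-laden illustration, not the thesis simulator: the function names are invented, the 20% expected false negative rate is inferred from the 40% quoted at twice the expected rate, and the 5% false positive rate is purely hypothetical (the real values are in Table C.5).

```python
import random


def scaled_rate(expected_rate, multiple):
    """Error rate used by the simulator: a multiple of the expected rate."""
    return min(1.0, expected_rate * multiple)


def observe(object_present, fp_rate, fn_rate, rng):
    """Return True if the simulated detector reports the object."""
    if object_present:
        return rng.random() >= fn_rate   # missed with probability fn_rate
    return rng.random() < fp_rate        # hallucinated with probability fp_rate


rng = random.Random(0)                   # fixed seed for repeatability
fn = scaled_rate(0.20, 2.0)              # inferred expected rate, doubled
fp = scaled_rate(0.05, 2.0)              # hypothetical expected rate, doubled
trials = 10_000
detections = sum(observe(True, fp, fn, rng) for _ in range(trials))
print(detections / trials)               # detection frequency, near 1 - fn
```

The POMDP observation function keeps the expected rates while the simulator uses the scaled ones, which is what makes the multiples above 1 a test of model mismatch.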

                        configuration 1                 configuration 2
                        cocacola  mug gray  pringles    cocacola  mug gray  pringles
no wrong observations   0.213     0.348     0.394       0.269     0.369     0.179
expected false positive and negative rates multiplied by x:
x = 1                   0.239     0.382     0.409       0.258     0.347     0.181
x = 1.25                0.229     0.373     0.405       0.248     0.378     0.140
x = 1.5                 0.253     0.370     0.407       0.260     0.365     0.232
x = 1.75                0.246     0.357     0.404       0.281     0.396     0.202
x = 2                   0.287     0.399     0.430       0.268     0.344     0.206
x = 2.5                 0.330     0.551     0.405       0.304     0.478     0.216
x = 3                   0.486     0.535     0.542       0.353     0.687     0.285

Table 5.12: Mean Hellinger distance for different multiples of the expected false positive and false negative rates

5.4.3 Performance Analysis

To analyze whether the architecture is able to reduce the uncertainty about the environment in different scenarios and object configurations, the mean value of the Hellinger distance is computed for several of them. Table 5.13 and Table 5.14 present the mean Hellinger distance of three objects over 100 steps for Scenario 1 and Scenario 2, respectively, considering 10 different random object configurations and an initial uniform distribution on the location of the objects.

experiment  1         2         3         4         5
cocacola    KT 0.213  DT 0.403  CT 0.159  KC 0.239  CT 0.234
mug gray    KT 0.214  KC 0.243  KT 0.256  KT 0.241  S 0.274
pringles    S 0.178   KT 0.212  CT 0.156  CT 0.169  KT 0.234

experiment  6         7         8         9         10
cocacola    CT 0.161  S 0.215   DT 0.403  KT 0.204  DT 0.396
mug gray    KT 0.322  DT 0.410  KT 0.215  CT 0.164  DT 0.403
pringles    S 0.198   DT 0.399  KC 0.240  CT 0.161  CT 0.174

Table 5.13: Mean Hellinger distance for 100 steps for different object configurations in Scenario 1.
KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, DT - Dining Table

experiment  1         2         3         4         5
cocacola    S 0.233   DT 0.630  S 0.229   KC 0.433  BS 0.208
mug gray    DT 0.658  KT 0.478  DT 0.659  KC 0.435  B 0.436
pringles    NS 0.474  CT 0.243  DT 0.651  S 0.237   KT 0.388

experiment  6         7         8         9         10
cocacola    BS 0.215  B 0.446   NS 0.219  S 0.233   KT 0.342
mug gray    B 0.443   CT 0.449  B 0.234   DT 0.644  KC 0.353
pringles    KC 0.389  KC 0.208  BS 0.253  S 0.232   CT 0.284

Table 5.14: Mean Hellinger distance for 100 steps for different object configurations in Scenario 2.
KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, BS - Bookshelf, DT - Dining Table, B - Bed, NS - Night Stand

Considering that a uniform distribution corresponds to a Hellinger distance of approximately 0.74 for Scenario 1 and 0.8 for Scenario 2, it is possible to verify that, for different configurations in both scenarios, the architecture is able to keep the uncertainty about the environment low. However, the mean Hellinger distance is larger in Scenario 2 than in Scenario 1, because Scenario 2 is more complex.
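The two baseline values can be recovered from Equation (5.2) by comparing a uniform belief over n placements with the degenerate distribution of the true location. The worked derivation below assumes n = 5 for Scenario 1, as stated earlier, and n = 8 for Scenario 2, which is the number of placement abbreviations listed under Table 5.14.

```latex
% U: uniform belief over n placements, V: probability 1 on the true placement
\begin{aligned}
H(U,V)^2 &= \frac{1}{2}\left[\left(\frac{1}{\sqrt{n}} - 1\right)^2
            + (n-1)\,\frac{1}{n}\right]
          = \frac{1}{2}\left[2 - \frac{2}{\sqrt{n}}\right]
          = 1 - \frac{1}{\sqrt{n}}, \\
n = 5:&\quad H = \sqrt{1 - 1/\sqrt{5}} \approx 0.743, \\
n = 8:&\quad H = \sqrt{1 - 1/\sqrt{8}} \approx 0.804.
\end{aligned}
```

Any mean Hellinger distance well below these values therefore indicates that the architecture knows more about the object locations than an uninformed belief would.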

To also analyze whether the architecture is able to keep the uncertainty about the location of the objects low when the object configuration changes, Table 5.15 and Table 5.16 present the mean Hellinger distance over 200 steps for Scenario 1 and Scenario 2, respectively, under those conditions. Each experiment starts from a random configuration of the location of the objects and an initial uniform distribution. Approximately halfway through, the location of at least two of the objects is modified.

experiment  1            2            3            4
cocacola    CT KC 0.178  KC KT 0.215  DT CT 0.333  S S 0.199
mug gray    KT KT 0.198  DT KC 0.285  DT KT 0.354  S CT 0.220
pringles    DT S 0.303   DT S 0.344   DT S 0.331   KT DT 0.292

experiment  5            6            7
cocacola    KT DT 0.325  DT S 0.310   S DT 0.321
mug gray    S DT 0.341   DT CT 0.349  S KC 0.188
pringles    S KT 0.186   S KC 0.242   DT S 0.284

experiment  8            9            10
cocacola    CT CT 0.162  KC KT 0.274  S CT 0.235
mug gray    CT KT 0.257  CT KC 0.261  S DT 0.345
pringles    DT S 0.315   CT CT 0.172  KC KT 0.208

Table 5.15: Mean Hellinger distance for 200 steps, changing the objects configuration in Scenario 1.
KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, DT - Dining Table

experiment  1            2            3            4
cocacola    BS NS 0.320  B CT 0.381   KT B 0.344   BS DT 0.449
mug gray    KC KC 0.298  DT KC 0.464  S BS 0.283   NS KC 0.447
pringles    KC S 0.286   CT KT 0.314  CT NS 0.260  KT CT 0.333

experiment  5            6            7
cocacola    NS NS 0.332  CT B 0.269   KC NS 0.396
mug gray    CT KT 0.374  DT NS 0.469  S S 0.206
pringles    NS S 0.326   KT CT 0.354  KC KT 0.363

experiment  8            9            10
cocacola    KT CT 0.316  KC CT 0.429  B CT 0.277
mug gray    S B 0.414    DT DT 0.613  B NS 0.382
pringles    KC KC 0.310  KT B 0.416   S KC 0.289

Table 5.16: Mean Hellinger distance for 200 steps, changing the objects configuration in Scenario 2.
KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, BS - Bookshelf, DT - Dining Table, B - Bed, NS - Night Stand

The experiments presented in Table 5.15 and Table 5.16 show that the architecture is able to keep the uncertainty low. The objects located on the dining table have higher uncertainty because there is just one placement in the dining room and the agent prefers exploring the remaining rooms. By doing so it reduces the uncertainty about those objects indirectly, without ever observing them, while also reducing the uncertainty about the remaining objects.

5.5 Real scenario experiments

To test the architecture performance in a real scenario, the mbot robot, presented in Figure 5.6(b), was used. It is used by the SocRob@Home team of the Institute for Systems and Robotics as a research tool to test and implement the work developed by the research community. The real scenario used in the experiments is the ISRoboNet@Home testbed, presented in Figure 5.6(a).

(a) Testbed (b) Robot

Figure 5.6: Testbed and robot used for the real scenario experiments

In the real experiments, the exponential decay uses real time, with a mean life $\lambda^{-1}$ of 5 minutes. No assumptions are made about the duration of the actions; the real time that the agent spends in each action is used. For the perception model a ROS wrapper of YOLOv3 is used, as mentioned at the beginning of Chapter 5.

The location of the objects is presented in Table 5.17, which shows that, approximately halfway through the experiment, the cocacola is moved from the kitchen table to the sideboard.

time(s)     object    location
0-1150      mug gray  dining table
            cocacola  bed
            pringles  coffee table
1151-2145   mug gray  dining table
            cocacola  sideboard
            pringles  coffee table

Table 5.17: Location of the objects in the real scenario experiment

The Hellinger distance for each object during the 35 minutes of the experiment is presented in Figure 5.7, which shows that the architecture behaves similarly in the real scenario and in the simulated experiments. When the cocacola changes location, the robot manages to figure it out and updates the cocacola distribution in just about 2 minutes. The architecture is also able to reduce the uncertainty about the pringles location after receiving 2 consecutive false negatives: before leaving the room, the robot moves back to the coffee table and collects new observations, given the higher uncertainty about the pringles location.

To complement the experiments presented in this Chapter, some videos of the real scenario experiments are available on the SocRob@Home YouTube channel (footnote 5).

[Figure 5.7 plots: Hellinger distance (0 to 1) versus time (0 to 2000 s) in three panels, (a) cocacola, (b) pringles and (c) mug gray, each marking false positives, false negatives and POMDP selections. Panel (a) labels the selected rooms (K: Kitchen, L: Living room, B: Bedroom, D: Dining room) with the sequence L B L K repeated six times.]

Figure 5.7: Hellinger distance over the time for each object, in the real behavior experiment

5 https://www.youtube.com/playlist?list=PL8fxtCUfhUR1HcqrGb8WHZ-F0bRrhtk47. Accessed 14 Oct 2018

5.6 Scalability Analysis

When the Knowledge Representation Engine creates the POMDP models, it is necessary to find the optimal policy for each one so that the Decision Maker can use it. The use of POMDPs in real-world problems, such as the one presented in this work, has been limited by the poor scalability of existing algorithms for finding the optimal policy of a finite-horizon discrete POMDP. Large policy spaces and large state spaces are two important sources of the scalability problem, and this is the main motivation for having a Decision Maker with several POMDPs. In the experiments presented, a Matlab implementation of Symbolic Perseus, a point-based value iteration algorithm, is used to obtain the optimal policy of each POMDP. This implementation is not the most efficient one. However, it makes it possible to compare runtimes, to understand how poorly the search for the optimal policy of a POMDP scales, and to understand how the presented architecture tries to address the problem.

The number of states of a POMDP depends on the number of state variables $X'_k$ and on their respective domains $D'_k$, which in the semantic mapping application correspond to the number of objects and to the placements where each object can be located, respectively. The architecture presented reduces the number of state variables of each POMDP, by considering that some of the objects cannot be located in some of the rooms, and reduces the domain of each state variable, by considering just the placements present in that room plus the possibility of the object not being there.

Consider an alternative architecture in which the Decision Maker is composed of just one POMDP representing the whole environment. The set of state variables of that POMDP would be equal to the set of all state variables $\mathcal{X}$, and the number of POMDP states would be equal to $\prod_{k=1}^{K} |D_k|$, where K is the number of objects plus one, because of the Robot state variable, and $|D_k|$ is the number of placements where the object or robot k can be located, considered here to be the same for all k. For the Decision Maker proposed in this architecture, the set of state variables $\mathcal{X}_n$ of each POMDP$_n$ is a subset of $\mathcal{X}$, containing just the objects that can be located in that room. The size of the domain $D'_k$ of each state variable $X'_k \in \mathcal{X}_n$ is equal to the number of placements in that room plus one, given the possibility of the object not being there. The relations between the POMDP number of states, observations, normal actions and prediction actions and the number of placements and objects are presented in Equation (5.3).

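$$\begin{aligned}
\#\,\text{states} &= (\#\,\text{placements} + 1)^{\#\,\text{objects} + 1} \\
\#\,\text{observations} &= 2^{\#\,\text{objects}} \\
\#\,\text{normal actions} &= \#\,\text{placements} + 2 \\
\#\,\text{prediction actions} &= (\#\,\text{objects})^{\#\,\text{placements} + 2}
\end{aligned} \tag{5.3}$$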
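The relations in Equation (5.3) can be evaluated directly to see how quickly a single room's POMDP grows. The sketch below simply transcribes the four formulas; the function name `pomdp_size` and the dictionary keys are illustrative.

```python
def pomdp_size(placements: int, objects: int) -> dict:
    """Sizes of one room's POMDP according to Equation (5.3)."""
    return {
        "states": (placements + 1) ** (objects + 1),
        "observations": 2 ** objects,
        "normal_actions": placements + 2,
        "prediction_actions": objects ** (placements + 2),
    }


# Same grid of room sizes as the scalability table in this chapter.
for placements in (1, 2, 3):
    for objects in (2, 3, 4):
        print(placements, objects, pomdp_size(placements, objects))
```

For a room with 3 placements and 4 objects this gives 1024 states and 1024 prediction actions, the largest case reported in Table 5.18.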

Table 5.18 presents, for POMDPs with different numbers of placements and objects, the runtime for finding an optimal policy and the number of states, observations, normal actions and prediction actions. The problem becomes intractable as the number of placements and objects increases. The key advantage of this architecture is that the runtime, instead of depending on the total number of objects and placements of the world model, depends only on the number of objects and placements of the most complex room.
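The benefit of splitting the Decision Maker by room can be illustrated with the state counts alone. The sketch below assumes the Scenario 1 layout described earlier (two placements in the kitchen, two in the living room, one in the dining room), four objects that may be in any room, and the same domain for every variable in the single-POMDP case; the function names are illustrative.

```python
from math import prod


def monolithic_states(total_placements: int, objects: int) -> int:
    """States of a single POMDP: product of |D_k| over objects plus robot."""
    return prod([total_placements] * (objects + 1))


def per_room_states(room_placements: int, objects: int) -> int:
    """States of one room's POMDP: each domain gains a 'not here' value."""
    return (room_placements + 1) ** (objects + 1)


rooms = {"kitchen": 2, "living room": 2, "dining room": 1}  # Scenario 1
objects = 4

single = monolithic_states(sum(rooms.values()), objects)
largest = max(per_room_states(p, objects) for p in rooms.values())
print(single, largest)
```

Under these assumptions the single POMDP has 5^5 = 3125 states, whereas the largest per-room POMDP has 3^5 = 243, and only the latter determines the solver runtime.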

# placements  # objects  # states  # observations  # normal actions  # prediction actions  runtime
1             2          8         4               3                 8                     33s
1             3          16        8               3                 27                    4m12s
1             4          32        16              3                 64                    31m
2             2          27        4               4                 16                    02h16m
2             3          81        8               4                 81                    08h30m
2             4          243       16              4                 256                   26h50m
3             2          64        4               5                 32                    03h30m
3             3          256       8               5                 243                   98h20m
3             4          1024      16              5                 1024                  526h46m

Table 5.18: Scalability analysis of different POMDP models used in the Decision Maker
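The difference between a single POMDP for the whole environment and one POMDP per room can be illustrated with a small calculation. The sketch below assumes a world model with the three rooms and five placements of the furniture example in Appendix B and four objects, and assumes that every object can be located in every room; the function and variable names are hypothetical.

```python
from math import prod

# Assumed world model: number of placements in each room.
ROOM_PLACEMENTS = {"kitchen": 2, "dining_room": 1, "living_room": 2}
N_OBJECTS = 4  # assumed number of objects


def monolithic_states(room_placements, n_objects):
    """States of a single POMDP: product of |D_k| over the objects plus the robot."""
    total_placements = sum(room_placements.values())
    # Every variable is assumed to share the same domain (all placements).
    domain_sizes = [total_placements] * (n_objects + 1)
    return prod(domain_sizes)


def per_room_states(room_placements, n_objects):
    """States of each room POMDP, following Equation (5.3)."""
    return {room: (p + 1) ** (n_objects + 1) for room, p in room_placements.items()}


single = monolithic_states(ROOM_PLACEMENTS, N_OBJECTS)
rooms = per_room_states(ROOM_PLACEMENTS, N_OBJECTS)
print("single POMDP:", single)
print("per room:", rooms, "largest:", max(rooms.values()))
```

Under these assumptions the largest room POMDP, not the whole world model, determines the size of the hardest planning problem.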

The architecture uses offline POMDPs, which means that the optimal policy only needs to be computed once, at the beginning of the architecture operation. This could mislead one into thinking that the complexity and the runtime needed to find that optimal policy are not a significant problem. However, as explained before, a domestic environment is dynamic, so even the world model changes over time: the objects and placements considered can change. Given a new world model, this architecture is able to generate the new POMDPs and then find the new optimal policies.

5.7 Discussion

In the simulated experiments as well as in the real scenario experiments, the agent actively keeps the level of uncertainty low, as desired. The architecture is able to deal with the dynamism of the environment, as verified in Chapter 5.4.1.B. In the results presented, whenever the state of the environment was modified, the architecture was able to minimize the uncertainty about the world state in a short amount of time or steps and under different conditions: moving an object to a placement inside the same room or to a different room, moving one or multiple objects simultaneously, and having the robot either first observe the object in its new location or first observe that it is no longer in the previous one.

The architecture performs well when dealing with incorrect observations, as verified in Table 5.12. Even when the rate of incorrect observations is twice the rate expected by the model, the architecture still represents the location of the objects well.

It is also possible to verify that the approach for selecting the POMDP, given the global belief, can efficiently minimize the uncertainty in the location of all the objects. The architecture presents some interesting behaviors, such as exploring the rooms with more placements more often when the goal is finding the location of all the objects, which is consistent with the robot's purpose. Most of the time, the robot does not even visit the dining room, because it has just one placement and the probability of an object being there can be inferred from knowing whether it is somewhere else.

The results obtained in the real scenario reinforce the simulated experiment results and fulfill the proposed objective of having a real robot able to keep an updated probabilistic representation of the environment and use that information for decision making.

The scalability analysis presented in Chapter 5.6 shows that a Decision Maker composed of several POMDPs, each one representing a room, can significantly reduce the complexity of finding the optimal behavior in a domestic environment. In the architecture presented, the complexity of the decision making no longer depends on the number of placements in the global world; it depends instead on the number of objects and the maximum number of placements in a room.


6 Conclusion

6.1 Achievements

This work presents an efficient architecture that keeps a global representation of complex world states and takes decisions to reduce the uncertainty and even accomplish a goal. Applying the architecture in the semantic mapping context allows us to create a system that keeps an updated probabilistic representation of the location of the objects and eventually uses that information to carry out a task. The architecture presented is also robust to incorrect observations, as shown in the experiment results.

A method for bypassing the problem of finding the optimal policy of a large POMDP is presented: the problem is split across multiple POMDPs, reducing the number of states of each one. However, this also requires a system to control the different POMDPs and keep a representation of the global world state, presented in this work as the Knowledge Representation Engine.

Another important achievement of this work is an architecture that is responsible for the full generation of the POMDP models for the semantic mapping application, using just the world model to infer the states and the observation, transition and reward functions of each POMDP, avoiding the explicit declaration of the model.

The results obtained are another important achievement: they support the purpose of the architecture and show the desired performance. So is the implementation of the architecture in a real scenario with a real robot, which was able to keep moving autonomously inside the testbed, changing its behavior according to the location of the objects and its internal belief, and to maintain a low uncertainty about the state of the world, as intended.


6.2 Future Work

As a follow-up to this work, there are several interesting studies and experiments that could be done, such as:

• Using multiple robots, and adapting the architecture so that multiple robots explore multiple rooms.

• Exploring the ProbLog inference process to add more inference to the belief update in the Knowledge Representation Engine.

• Using Inverse Reinforcement Learning techniques to define the transition, observation and reward functions of each POMDP.



A $P(X_k \mid Z)$ derivation

The probability distribution $P(X_k \mid Z)$ depends on the function $f_{k,n} : D_k \to D'_k$ of POMDP$_n$, which associates each element of $D_k$ with a single element of $D'_k$, on the prior $P(X_k)$ and on the distribution $P(X'_k \mid Z)$, provided by one of the POMDPs in the Decision Maker. $D_k$ and $D'_k$ are the sets of values of the variables $X_k$ and $X'_k$, respectively. Taking into account the characteristics of the function $f_{k,n}$, the probability $P(X'_k \mid X_k)$ is given by Equation (A.1).

$$
P(X'_k \mid X_k) =
\begin{cases}
1 & \text{if } X'_k = f_{k,n}(X_k) \\
0 & \text{otherwise}
\end{cases}
\tag{A.1}
$$

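Then, the probability $P(X_k \mid Z)$ can be derived as presented in (A.2).

$$
\begin{aligned}
P(X_k \mid Z) &= \sum_{X'_k} P(X_k, X'_k \mid Z) \\
&= \sum_{X'_k} P(X_k \mid X'_k, Z)\, P(X'_k \mid Z) \\
&= \sum_{X'_k} P(X_k \mid X'_k)\, P(X'_k \mid Z) && (X_k \perp\!\!\!\perp Z \mid X'_k) \\
&= \sum_{X'_k} \frac{P(X'_k \mid X_k)\, P(X_k)}{P(X'_k)}\, P(X'_k \mid Z) && \text{(Bayes Rule)} \\
&= \frac{P(X_k)}{\sum_{x \in D_k} P(X'_k \mid X_k = x)\, P(X_k = x)}\, P(X'_k \mid Z), && \text{with } X'_k = f_{k,n}(X_k)
\end{aligned}
\tag{A.2}
$$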
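The last line of (A.2) can be sketched in a few lines of code. The example below is a minimal illustration, not the implementation used in the Knowledge Representation Engine: the function name `lift_belief`, the dictionary-based representation and the numeric values of the local belief are assumptions. It uses the five placements of the furniture example in Appendix B as $D_k$ and a kitchen POMDP whose domain $D'_k$ contains the two kitchen placements plus `none`.

```python
def lift_belief(prior, f, local_belief):
    """Compute P(X_k | Z) over D_k from the last line of Equation (A.2).

    prior        -- dict: value in D_k  -> P(X_k = value)
    f            -- dict: value in D_k  -> value in D'_k (the mapping f_{k,n})
    local_belief -- dict: value in D'_k -> P(X'_k = value | Z)
    """
    # Denominator of (A.2): P(X'_k = x') is the prior mass of all x mapped to x'.
    marginal = {}
    for x, p in prior.items():
        marginal[f[x]] = marginal.get(f[x], 0.0) + p

    posterior = {}
    for x, p in prior.items():
        x_local = f[x]
        # A value with zero prior keeps zero probability.
        posterior[x] = p / marginal[x_local] * local_belief[x_local] if p > 0 else 0.0
    return posterior


# Assumed example: uniform prior over five placements, kitchen POMDP belief.
prior = {name: 0.2 for name in
         ("kitchen_table", "kitchen_cabinet", "dining_table", "coffee_table", "sideboard")}
f_kitchen = {
    "kitchen_table": "kitchen_table",
    "kitchen_cabinet": "kitchen_cabinet",
    "dining_table": "none",
    "coffee_table": "none",
    "sideboard": "none",
}
local_belief = {"kitchen_table": 0.1, "kitchen_cabinet": 0.3, "none": 0.6}

posterior = lift_belief(prior, f_kitchen, local_belief)
print(posterior)
```

Placements inside the room take the probability given by the room POMDP, while the probability of `none` is shared among the placements outside the room in proportion to their prior.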


B World Model Files Examples

B.1 Furniture Model example

name,room,x,y,area

"kitchen_table","kitchen","5.648537","-1.205729","2.2"

"kitchen_cabinet","kitchen","4.665875","-0.266732","2.5"

"dining_table","dining_room","7.473658","-1.135156","2.0"

"coffee_table","living_room","6.496987","-3.889012","0.5"

"sideboard","living_room","7.631555","-3.9","0.5"

B.2 Objects Model example

name,category,distribution,volume

"cocacola","drink","uniform",0.355

"pringles","snack","uniform",0.375

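The two files above are plain CSV, so they can be read with a standard CSV parser. The sketch below only illustrates how the placements of each room could be extracted from the furniture model; the helper names `load_rows` and `placements_by_room` are hypothetical and the architecture's own loader may differ.

```python
import csv
import io
from collections import defaultdict

FURNITURE_CSV = '''name,room,x,y,area
"kitchen_table","kitchen","5.648537","-1.205729","2.2"
"kitchen_cabinet","kitchen","4.665875","-0.266732","2.5"
"dining_table","dining_room","7.473658","-1.135156","2.0"
"coffee_table","living_room","6.496987","-3.889012","0.5"
"sideboard","living_room","7.631555","-3.9","0.5"
'''

OBJECTS_CSV = '''name,category,distribution,volume
"cocacola","drink","uniform",0.355
"pringles","snack","uniform",0.375
'''


def load_rows(text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def placements_by_room(furniture_rows):
    """Group furniture names (the placements) by the room they belong to."""
    rooms = defaultdict(list)
    for row in furniture_rows:
        rooms[row["room"]].append(row["name"])
    return dict(rooms)


furniture = load_rows(FURNITURE_CSV)
objects = load_rows(OBJECTS_CSV)
print(placements_by_room(furniture))
print([(o["name"], float(o["volume"])) for o in objects])
```

All fields are read as strings, so numeric columns such as `x`, `y`, `area` and `volume` need an explicit conversion.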

C Experiment tables and figures

C.1 Scenario 1 - Static Environment Model

step  object     coffee table  dining table  kitchen cabinet  kitchen table  sideboard  robot location   action
0     mug gray   0.200         0.200         0.200            0.200          0.200      out              goCoffee table
      mug black  0.200         0.200         0.200            0.200          0.200
      cocacola   0.200         0.200         0.200            0.200          0.200
      pringles   0.200         0.200         0.200            0.200          0.200
1     mug gray   0.200         0.200         0.200            0.200          0.200      coffee table     searchObject
      mug black  0.200         0.200         0.200            0.200          0.200
      cocacola   0.200         0.200         0.200            0.200          0.200
      pringles   0.200         0.200         0.200            0.200          0.200
2     mug gray   0.860         0.011         0.011            0.011          0.108      coffee table     searchObject
      mug black  0.049         0.243         0.243            0.243          0.221
3     mug gray   0.984         0.000         0.000            0.000          0.015      coffee table     goSideboard
      mug black  0.011         0.259         0.259            0.259          0.214
4     mug gray   0.984         0.000         0.000            0.000          0.015      sideboard        searchObject
      mug black  0.011         0.259         0.259            0.259          0.214
      cocacola   0.004         0.260         0.260            0.260          0.215
      pringles   0.003         0.261         0.261            0.261          0.215
5     mug gray   0.996         0.000         0.000            0.000          0.003
6     mug gray   0.999         0.000         0.000            0.000          0.001
7     mug gray   0.999         0.000         0.000            0.000          0.000      sideboard        doNothing
      mug black  0.010         0.329         0.329            0.329          0.002
      cocacola   0.004         0.332         0.332            0.332          0.001
      pringles   0.003         0.332         0.332            0.332          0.000
8     mug gray   0.999         0.000         0.000            0.000          0.000      sideboard        goKitchen table
      mug black  0.010         0.329         0.329            0.329          0.002
      cocacola   0.004         0.332         0.332            0.332          0.001
      pringles   0.003         0.332         0.332            0.332          0.000
9     mug gray   0.999         0.000         0.000            0.000          0.000      kitchen table    searchObject
      mug black  0.010         0.329         0.329            0.329          0.002
10    mug gray   0.999         0.000         0.000            0.000          0.000      kitchen table    searchObject
      mug black  0.014         0.465         0.424            0.094          0.003
11    mug gray   0.999         0.000         0.000            0.000          0.000      kitchen table    goKitchen cabinet
      mug black  0.016         0.523         0.436            0.021          0.004
12    mug gray   0.999         0.000         0.000            0.000          0.000      kitchen cabinet  searchObject
      mug black  0.016         0.523         0.436            0.021          0.004
13    mug gray   1.000         0.000         0.000            0.000          0.000      kitchen cabinet  searchObject
      mug black  0.000         0.015         0.979            0.006          0.000
14    mug gray   1.000         0.000         0.000            0.000          0.000      kitchen cabinet  doNothing
      mug black  0.000         0.000         0.999            0.001          0.000
15
16
17
18

Table C.1: State variables distributions and actions for a static model of the environment


C.2 Scenario 1 - Dynamic Environment Model


step  object     coffee table  dining table  kitchen cabinet  kitchen table  sideboard  robot location   action
0     mug gray   0.200         0.200         0.200            0.200          0.200      out              goCoffee table
      mug black  0.200         0.200         0.200            0.200          0.200
      cocacola   0.200         0.200         0.200            0.200          0.200
      pringles   0.200         0.200         0.200            0.200          0.200
10    mug gray   0.005         0.327         0.332            0.332          0.005      dining room      goKitchen table
      mug black  0.003         0.003         0.000            0.000          0.994
20    mug gray   0.170         0.648         0.004            0.008          0.170      kitchen          goCoffee table
      mug black  0.100         0.095         0.003            0.003          0.801
      cocacola   0.000         0.003         0.003            0.994          0.000
      pringles   0.801         0.095         0.003            0.003          0.100
28
36    mug gray   0.106         0.782         0.004            0.007          0.101
43
50    mug gray   0.095         0.797         0.007            0.007          0.095
58
66    mug gray   0.103         0.783         0.004            0.007          0.103
73
80    mug gray   0.095         0.797         0.007            0.007          0.095
88
96    mug gray   0.103         0.783         0.004            0.007          0.103
103
110   mug gray   0.095         0.797         0.007            0.007          0.095

Table C.2: State variables distributions in POMDP selection steps, for a dynamic model of the environment


C.3 Scenario 2 - Objects Changing Position

step  object    bed    bookshelf  coffee table  dining table  kitchen cabinet  kitchen table  night stand  sideboard  robot location
0     mug gray  0.125  0.125      0.125         0.125         0.125            0.125          0.125        0.125      out
      cocacola  0.125  0.125      0.125         0.125         0.125            0.125          0.125        0.125
      pringles  0.125  0.125      0.125         0.125         0.125            0.125          0.125        0.125
10    mug gray  0.197  0.007      0.007         0.195         0.195            0.195          0.197        0.007      living room
      cocacola  0.199  0.004      0.004         0.197         0.197            0.197          0.199        0.004
      pringles  0.000  0.002      0.990         0.002         0.002            0.002          0.000        0.002
20    mug gray  0.004  0.085      0.085         0.246         0.246            0.246          0.004        0.085      bedroom
      cocacola  0.002  0.082      0.082         0.250         0.250            0.250          0.002        0.082
      pringles  0.002  0.067      0.664         0.066         0.066            0.066          0.002        0.067
31    mug gray  0.097  0.005      0.005         0.257         0.261            0.261          0.097        0.017      living room
40    mug gray  0.184  0.083      0.083         0.360         0.005            0.005          0.184        0.096      kitchen
      cocacola  0.002  0.000      0.000         0.002         0.002            0.993          0.002        0.000
      pringles  0.055  0.056      0.720         0.055         0.002            0.002          0.055        0.056
50    mug gray  0.225  0.005      0.002         0.360         0.082            0.082          0.225        0.018      living room
59    mug gray  0.002  0.076      0.074         0.435         0.155            0.155          0.013        0.090
70    mug gray  0.096  0.005      0.005         0.379         0.197            0.197          0.103        0.018      living room
79    mug gray  0.161  0.074      0.073         0.430         0.004            0.004          0.168        0.086
89    mug gray  0.202  0.004      0.002         0.407         0.080            0.080          0.208        0.016      living room
96    mug gray  0.002  0.000      0.000         0.002         0.002            0.002          0.992        0.000
103   mug gray  0.052  0.002      0.002         0.052         0.051            0.051          0.790        0.002      living room
110   mug gray  0.091  0.052      0.052         0.091         0.004            0.004          0.656        0.052
118   mug gray  0.130  0.003      0.002         0.126         0.059            0.059          0.583        0.038      living room
125   mug gray  0.002  0.000      0.000         0.002         0.002            0.002          0.993        0.000
133   mug gray  0.064  0.002      0.002         0.064         0.063            0.063          0.740        0.002      living room
140   mug gray  0.102  0.052      0.052         0.102         0.004            0.004          0.630        0.052
149   mug gray  0.148  0.010      0.004         0.144         0.073            0.073          0.545        0.004      living room
156   mug gray  0.002  0.000      0.000         0.002         0.002            0.002          0.993        0.000
164   mug gray  0.064  0.002      0.002         0.064         0.063            0.063          0.740        0.002      living room
171   mug gray  0.102  0.052      0.052         0.102         0.004            0.004          0.630        0.052
180   mug gray  0.148  0.010      0.004         0.144         0.073            0.073          0.545        0.004      living room
190   mug gray  0.002  0.105      0.096         0.298         0.196            0.196          0.012        0.096
201   mug gray  0.091  0.003      0.018         0.313         0.236            0.236          0.098        0.006      living room
211   mug gray  0.002  0.073      0.084         0.290         0.237            0.237          0.002        0.075
220   mug gray  0.075  0.005      0.016         0.303         0.261            0.261          0.075        0.005      living room
228   mug gray  0.002  0.059      0.068         0.291         0.258            0.258          0.004        0.059
237   mug gray  0.073  0.004      0.013         0.291         0.270            0.270          0.075        0.004      living room
245   mug gray  0.002  0.000      0.000         0.002         0.993            0.002          0.002        0.000
246   mug gray  0.004  0.000      0.000         0.004         0.986            0.004          0.004        0.000      living room
247   mug gray  0.003  0.001      0.001         0.003         0.986            0.004          0.003        0.001      living room
252   mug gray  0.002  0.000      0.000         0.002         0.993            0.002          0.002        0.000
253   mug gray  0.003  0.000      0.000         0.003         0.986            0.004          0.003        0.000      living room
254   mug gray  0.003  0.001      0.001         0.003         0.986            0.004          0.003        0.001      living room
259   mug gray  0.002  0.000      0.000         0.002         0.993            0.002          0.002        0.000
260   mug gray  0.003  0.000      0.000         0.003         0.986            0.004          0.003        0.000      living room

Table C.3: State variables distributions in POMDP selection steps, with changes in the location of the objects

step         0       10      20      31      40      50      59      70      79
bedroom      19.506  19.733  31.252  22.358  20.553  20.641  30.401  22.437  20.815
kitchen      19.848  20.516  20.052  20.076  31.077  22.579  20.911  21.001  31.367
dining room  25.506  25.616  25.631  25.669  25.574  25.628  25.650  25.674  25.603
living room  13.217  29.050  14.121  28.005  14.515  29.530  14.379  28.448  14.533

Table C.4: Expected rewards for POMDP selection steps in an experiment with objects changing location


C.4 Scenario 2 - Wrong Observations Robustness

room         robot location   object location  cocacola  pringles  mug gray
living room  coffee table     coffee table     0.873     0.9       0.8
                              sideboard        0.099     0.099     0.099
                              bookshelf        0.1       0.1       0.1
                              none             0.01      0.01      0.01
             sideboard        none             0.01      0.01      0.01
             bookshelf        none             0.01      0.01      0.01
             out              -                0         0         0
kitchen      kitchen cabinet  kitchen cabinet  0.873     0.9       0.8
                              kitchen table    0.096     0.096     0.096
                              none             0.01      0.01      0.01
             kitchen table    kitchen cabinet  0.096     0.096     0.096
                              kitchen table    0.873     0.9       0.8
                              none             0.01      0.01      0.01
             out              -                0         0         0
bedroom      bed              night stand      0.093     0.093     0.093
                              bed              0.873     0.9       0.8
                              none             0.01      0.01      0.01
             night stand      night stand      0.873     0.9       0.8
                              bed              0.093     0.093     0.093
                              none             0.01      0.01      0.01
             out              -                0         0         0
dining room  dining table     dining table     0.873     0.9       0.8
                              none             0.01      0.01      0.01
             out              -                0         0         0

Table C.5: Probabilities of observing each object in Scenario 2
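To illustrate how observation probabilities such as those in Table C.5 affect a belief, the sketch below applies a single Bayes correction step for one object. It is a simplified illustration under stated assumptions: the values are the `mug gray` entries of Table C.5 with the robot at the coffee table, the prior is assumed uniform, the transition model is ignored, and `bayes_update` is a hypothetical helper rather than the POMDP belief update used in the architecture.

```python
# Probability of detecting "mug gray" with the robot at the coffee table,
# for each possible object location in the living room (values from Table C.5).
P_DETECT = {"coffee table": 0.8, "sideboard": 0.099, "bookshelf": 0.1, "none": 0.01}


def bayes_update(belief, p_detect, detected):
    """One Bayes correction step for a single object (no transition model)."""
    unnormalized = {
        loc: belief[loc] * (p_detect[loc] if detected else 1.0 - p_detect[loc])
        for loc in belief
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the belief")
    return {loc: value / total for loc, value in unnormalized.items()}


uniform = {loc: 0.25 for loc in P_DETECT}  # assumed prior
after_hit = bayes_update(uniform, P_DETECT, detected=True)
after_miss = bayes_update(uniform, P_DETECT, detected=False)
print(after_hit)
print(after_miss)
```

A detection concentrates the belief on the coffee table, while a missed detection moves probability towards the other locations without discarding the coffee table completely, which is what allows the belief to recover from incorrect observations.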


[Figure C.1, panels (a) to (c): Hellinger distance (vertical axis, 0 to 1) versus step (horizontal axis, 0 to 400) for each object, with false positive and false negative observations marked.]

(a) cocacola (8 false positives, 3 false negatives)

(b) pringles (0 false positives, 2 false negatives)

(c) mug gray (3 false positives, 12 false negatives)

Figure C.1: Hellinger distance considering wrong observations for objects in configuration 1


[Figure C.2, panels (a) to (c): Hellinger distance (vertical axis, 0 to 1) versus step (horizontal axis, 0 to 350) for each object, with false positive and false negative observations marked.]

(a) cocacola (8 false positives, 2 false negatives)

(b) pringles (3 false positives, 11 false negatives)

(c) mug gray (0 false positives, 8 false negatives)

Figure C.2: Hellinger distance considering wrong observations for objects in configuration 2
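The Hellinger distance used in Figures C.1 and C.2 can be computed as in the sketch below, which implements the standard definition for discrete distributions. The example compares a belief taken from Table C.1 (`mug gray`, step 2) with a reference distribution concentrated on the coffee table; using such a reference as the ground truth is an assumption made for this illustration.

```python
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))) / sqrt(2)


# Belief over (coffee table, dining table, kitchen cabinet, kitchen table, sideboard).
belief = [0.860, 0.011, 0.011, 0.011, 0.108]
reference = [1.0, 0.0, 0.0, 0.0, 0.0]  # assumed true location: coffee table
print(hellinger(belief, reference))
```

The distance is 0 when the belief matches the reference and 1 when the two distributions have no common support, which matches the vertical axis range of the figures.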
