
FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

A computer vision-based proposal for seat occupancy monitoring applied to FEUP's library

José Miguel Seruca Veloso

Mestrado Integrado em Engenharia Eletrotécnica e de Computadores

Supervisor: Paulo José Lopes Machado Portugal

July 30, 2021


© José Veloso, 2021


Abstract

The surveillance of occupancy has been an area of great interest, both for resource management and for behavioral analysis. The use of infrared-based technology is already well documented and its limits are known. Consequently, it is necessary to explore new methods that allow the extraction of information to determine occupation.

This dissertation focuses on the analysis, design and implementation of a seat occupancy monitoring system based on computer vision. Convolutional neural networks are explored to provide the ability to create a status map of all seating options available in FEUP's library. The thesis is focused on the study and conceptual validation of the proposed detection system, which includes the implementation of a cloud-hosted dashboard and database.

The feasibility of implementing this system is confirmed by results obtained at a selected test site. From observational evidence, it is possible to prove the general concept and show the applicability of the developed work in a real, functional context.


Acknowledgments

First of all, I wish to thank Professor Doutor Paulo Portugal for the availability, patience and dedication he has shown ever since I began the process of completing this dissertation.

To the Library Services, in particular Doutora Cristina Lopes, and to the Technical Services, for their indispensable assistance.

To the community of the Faculdade de Engenharia da Universidade do Porto, which welcomed and guided me over these last 5 years.

To my parents, Helena Seruca and José Veloso, for their support and understanding, for all the sacrifices they made and for everything they taught me.

To my sister, Joana Veloso, for accompanying me and keeping my spirits up. To my family, who always gave me a sense of pride and belonging. To my course companions, André Reis, Tomás Fonseca and Tiago Sousa, for their solidarity and influence at different moments of this journey. To my lifelong friends, Alex Himmel, Joana Morais and Matias Schöner, for all the experiences we have shared and for a companionship that has never ceased to exist.

To all of them, and to so many others who have left their mark on me in some way, thank you very much.

José Veloso


“Take time to smell the roses”

Sir Robert William Robson


Contents

1 Introduction
  1.1 Context
  1.2 Motivation
  1.3 Goals
  1.4 Subjects and Structure

2 Literature Review
  2.1 Seat Occupancy Monitoring
    2.1.1 Based on PIR sensors
    2.1.2 Based on computer vision
  2.2 Summary

3 System Requirements
  3.1 Functional Requirements
  3.2 Non-Functional Requirements
    3.2.1 Occupancy Detection
    3.2.2 Dashboard
  3.3 Summary

4 System Architecture
  4.1 Vision-based Solution
    4.1.1 Processing Unit
    4.1.2 Camera Module
    4.1.3 Definition of detection algorithm
  4.2 Dashboard
  4.3 Summary

5 System Implementation
  5.1 Module Composition
    5.1.1 Processing Unit
    5.1.2 Camera Module
  5.2 Image Segmentation and Analysis
    5.2.1 General Concept
    5.2.2 Algorithm Execution
  5.3 Dashboard and Data Storage
  5.4 Summary

6 System Testing
  6.1 Methods and Results
    6.1.1 Model Behavior
    6.1.2 Dashboard
  6.2 Summary

7 Conclusion and Future Development
  7.1 Conclusion
  7.2 Future Development

References


List of Figures

2.1 Wireless desk sensor with polyethylene lens (altered) [1]
2.2 Fresnel Design [2]
2.3 Concept of IR-detection [3]
2.4 Commercial architecture for a sensor network [1]
2.5 Use case of installation under a desk [1]
2.6 Overview of a CNN's architecture and training process [4]
2.7 Convolution operation [4]
2.8 Max Pooling, downsampling of an input tensor by a factor of 2 [4]
2.9 Prototype example: Raspberry Pi 3 equipped with Intel Neural Compute Stick 2 and wide-angle camera [5]
2.10 Prototype from Figure 2.9: image capture and classification [5]
2.11 Summary of models in the R-CNN family [6]
2.12 YOLOv1 Architecture [7]
2.13 Bounding Box Structure (altered) [7]
2.14 SSD Architecture, featuring VGG16 as feature extractor [7]
4.1 Common study area (FEUP Library: 2nd-4th floor) [8]
4.2 Comparison of several detection frameworks [9], COCO dataset [10]
4.3 GPU time (milliseconds) for each model, for an image resolution of 300x300 [9]
4.4 Data structure: each library floor contains a number of active devices responsible for monitoring several seating areas
4.5 Model representing major system components and communications
5.1 Raspberry Pi 4 Model B (8 GB RAM) [11]
5.2 Raspberry Pi Camera Module v2 [12]
5.3 2nd floor plan, camera position (dotted circle) and covered area (black)
5.4 Still from the equivalent area depicted in Figure 5.3
5.5 Seating areas
5.6 Contention overlapping areas, final status determined by the dashboard
5.7 Image processing loop in application
5.8 Frame selection cycle; each transition occurs after sampling rate × processing units
5.9 Use of OpenCV modules for image resizing, normalization and network forwarding
5.10 A visual representation of mean subtraction, where the RGB mean (center) has been calculated and subtracted from the original image (left), resulting in the output image (right) [13]
5.11 Thingsboard monolithic architecture [14]
5.12 cURL requests: URL composed of host, access token and telemetry specification
6.1 Module's CPU and memory usage while running the application
6.2 Original (green) and provisional (red) cameras, targeting Area 1
6.3 Seating areas tested, targeting Area 1
6.4 Area 0 on-off status chart, 08:00-19:30, 24-05-21
6.5 Area 1 on-off status chart, 08:00-19:30
6.6 Area 1 on-off status chart (with provisional camera), 08:00-19:30, 24-05-21
6.7 Area 2 on-off status chart, 08:00-19:30, 24-05-21
6.8 Area 3 on-off status chart, 08:00-19:30, 24-05-21
6.9 Area 0 on-off status chart, 08:00-17:00
6.10 Volume of transport messaging
6.11 Capacity for telemetry data storage
6.12 Transport hourly activity, 14-day period
6.13 Telemetry persistence hourly activity, 14-day period
6.14 Hourly average of state changes over a 2-week period, 4-colored stacked bars
6.15 Hourly average of state changes over a 2-week period, 4-colored lines
6.16 4 area status charts over a 2-week period
6.17 Total hourly average over a 2-week period
6.18 Current total calculated upon clicking Update
6.19 View of the widget's (Figure 6.18) HTML editor
6.20 Image map with zero occupied areas (all green) and thermometer (blue)
6.21 Image map with two occupied (red) and vacant (green) areas, and thermometer (green)
6.22 Remaining versions of thermometer marker [15]


List of Tables

3.1 Client Needs
3.2 Requirement Analysis: Detection System
3.3 Requirement Analysis: Dashboard
4.1 Main specifications of testing system
4.2 Comparison of detection frameworks (from Table 3 [16]), PASCAL VOC dataset [17]


Abbreviations

API      Application Programming Interface
CNN      Convolutional Neural Network
CoAP     Constrained Application Protocol
CPU      Central Processing Unit
CSI      Camera Serial Interface
cURL     Client URL
FC       Fully Connected
FEUP     Faculdade de Engenharia da Universidade do Porto
FOV      Field of View
FPS      Frames per Second
GDPR     General Data Protection Regulation
GPIO     General-Purpose Input/Output
GPU      Graphics Processing Unit
HTML     HyperText Markup Language
HTTP     Hypertext Transfer Protocol
IR       Infrared
JSON     JavaScript Object Notation
JVM      Java Virtual Machine
mAP      mean Average Precision
MQTT     Message Queuing Telemetry Transport
MS COCO  Microsoft Common Objects in Context
OS       Operating System
PC       Personal Computer
PIR      Pyroelectric Infra-Red
QR       Quick Response
RAM      Random Access Memory
R-CNN    Region-Based Convolutional Neural Networks
RGB      Red Green Blue
RPN      Region Proposal Network
ReLU     Rectified Linear Unit
REST     Representational State Transfer
SPI      Serial Peripheral Interface
SQL      Structured Query Language
SSD      Single Shot Detector
UART     Universal Asynchronous Receiver-Transmitter
UI       User Interface
UML      Unified Modeling Language
URL      Uniform Resource Locator
USB      Universal Serial Bus
VOC      Visual Object Classes
YOLO     You Only Look Once


Chapter 1

Introduction

1.1 Context

The library belonging to the Faculdade de Engenharia da Universidade do Porto (FEUP) is a public space, highly sought-after by the student community. There is therefore a need for control and for efficient use of the available seating areas. Occupation spikes and congestion are a daily occurrence, while some periods of the year, especially examination season, prove particularly challenging to manage.

As a result of these concerns, the Library's Services have identified the need to modernize the monitoring systems currently in place. This need has been aggravated by the heightened alert and restrictions caused by the ongoing COVID-19 pandemic.

Though there are commercial solutions available for occupancy detection, they rely on highly priced components and proprietary software. An opportunity arises for the internal development of a system that is economically more viable, and simultaneously supportive of integration, modification and expansion.

1.2 Motivation

The library managed to set up a temporary solution [18] for determining seat occupancy. The system, however, relies on the cooperation of students, requiring each one to signal their presence by reading a Quick Response (QR) code. This poses a significant data-integrity issue, as the information cannot be used to determine an individual seat's current status. At this time, the system can only display total counts for each floor, and even these cannot be interpreted literally: they only indicate a trend in overall occupation. This represents a further problem, not only for real-time monitoring, but also for compiling the historical and statistical figures that could assist future management methods and initiatives.

To address this, the library's administration is seeking a form of seat status monitoring able to assign an occupation state to every seating option. This involves the means to actively detect and process occupancy across the building. The problem demands a distributed system with near real-time operation. Widely available applications generally use modules based on infrared sensing technology which, beyond economic and development concerns, is relatively limited in detection range and reliability.

To this end, other commercial alternatives and ongoing research in the field of computer vision have emerged to tackle the problem of building monitoring. The promise shown in early applications of convolutional neural networks reveals the potential of developing and implementing a vision-based solution, built around a camera and on-board processing.

A solution similar to these applications has the power to accurately assess the status of seating areas. It also minimises the hardware required for the amount of seating analysed, potentially driving down costs. If the system proves reliable and flexible enough for use in related applications, it can constitute a significant advancement in research of this nature. The perfecting of such a system opens possibilities for diversifying and expanding the features of the Library's current management methods.

1.3 Goals

The primary goal of this dissertation comprises the development of a system capable of detecting people and mapping out floor occupancy in a reliable and efficient manner. The proposal must be capable of status mapping entire floors, indicating the location and occupation of available seating areas through a cloud-hosted platform accessible to administrators and users alike. To support this pursuit, the dissertation encompasses the following goals:

• Research and develop a system concept centered on people detection, including the physical alignment of module and camera, so as to optimise precision and data quality. Part of this thesis focuses on the interaction between physical factors and the efficacy of certain convolutional neural networks; studying how these are affected becomes vital for defining an equilibrium, which will determine the choice of a detector.

• Develop and implement a working dashboard and database, hosted and available online. Once occupancy status is determined at the low level, data must be transmitted and stored appropriately to secure integrity. A web interface manages and manipulates the data to display historical statistics or real-time status information in a digestible form for the user.

• Implement and test the prototype, as the dissertation's completion relies on the verification and validation of the system for the desired use case. The installation on a verifiable test site is therefore essential to observe and confirm model behavior. Confirming the concept implies the possibility of replication across the entire building, and opens up the possibility of expansion to other immediate applications, namely the detection of common objects (books, handbags, coats, etc.).


1.4 Subjects and Structure

The dissertation comprises 7 main chapters, including this Introduction. In the second chapter, a study and review of current occupancy monitoring technologies and applications is completed. Chapter 3 explains the process of defining the system requirements and enumerates them. In Chapter 4, a general system architecture is proposed, including specifications for the object detecting module and the cloud-hosted platform. Chapter 5 details the ensuing component selection and implementation of the entire system, looking to prove the general concept for future, widespread replication. The sixth chapter defines the working conditions of the module and application, detailing the respective results; the efficacy of the system is evaluated, and the display options of the dashboard with the collected data are presented. The thesis concludes with the final chapter, which proposes areas for future development and improvement.


Chapter 2

Literature Review

The aim of the dissertation is primarily the design of a general concept and the implementation of a system capable of seat occupancy monitoring, in accordance with the environment's specificity. To this effect, prior relevant research and developments are detailed. The existing body of work serves as a starting point for possible development and improvement.

The state of the art regarding situations similar to the problem at hand, namely the occupancy monitoring of seating areas, rooms or buildings, is generally composed of two main approaches. In Section 2.1, techniques largely based on Pyroelectric Infra-Red (PIR) sensors (2.1.1) and computer vision (2.1.2) are presented. Section 2.2 addresses possible limitations and the preferred groundwork for the thesis.

2.1 Seat Occupancy Monitoring

2.1.1 Based on PIR sensors

Commercially the most common solution [1, 19, 20, 21, 22, 23, 24] for deploying an intelligent monitoring network, the PIR sensor is considered an effective, low-powered alternative for applications centered on mobile computing [25].

These sensors detect and interpret infrared radiation [26, 27], invisible to the human eye, as its wavelength is longer than that of visible light. Every concept hinging on a network of PIR sensors for occupancy monitoring targets human-emanated heat, which contains infrared radiation, to determine presence. However, infrared radiation on the human scale (1.33 µm - 16.67 µm) [28] is subject to fluctuations provoked by changing conditions and, most importantly, is regularly blocked by materials that do not transmit it, which include most glass and plastic. Therefore, typical versions of an IR sensor employ lenses made of polyethylene (Figure 2.1), which is adept at limiting radiation outside of the human range. To concentrate the received radiation further, lenses usually adopt the shape of the Fresnel Lens.


Figure 2.1: Wireless desk sensor with polyethylene lens (altered) [1]

The Fresnel design [29] has its grooves facing the IR sensing element, presenting a smooth surface to the subject side of the lens (Figure 2.2).

Figure 2.2: Fresnel Design [2]

The pyroelectric sensor is made of a crystalline material that generates a surface electric charge when exposed to heat. Once the level of radiation changes, the charge is altered and can then be measured (Figure 2.3). The varying signal is typically fed to an amplifier with signal conditioning circuits. The next stage involves a window comparator, responding to positive and negative transitions of the sensor output signal.


Figure 2.3: Concept of IR-detection [3]

Occupancy detection applications of this physical phenomenon incorporate IR sensors in battery-powered modules, capable of connecting wirelessly to an access point or gateway [1, 22] and relaying relevant data to a local or cloud platform for post-processing (Figure 2.4 stands as an example).

Figure 2.4: Commercial architecture for a sensor network [1]

A common alignment of the modules involves installation beneath a desk, directly covering the seated person (Figure 2.5). Installations taking advantage of a room's ceiling [19, 20, 24] are also considered for full-area control. However, unlike the first option, this situation is more predisposed to occurrences of radiation blocking and should mainly be thought of as a complement to existing data sources. Furthermore, the technology cannot detect inanimate objects, given there is no infrared radiation to be measured. One final factor to consider is the dimensions of the targeted space [18]. Implementing a system based on one module per seating option multiplies costs, and thwarts the pursuit of flexibility in floor layout by implying constant vigilance and maintenance of every module. Changes in the location, condition and battery status of multiple devices can result in loss of data integrity and temporary system failure.

Figure 2.5: Use case of installation under a desk [1]

2.1.2 Based on computer vision

Though not as widespread as other techniques, the use of vision to accomplish occupancy monitoring has appeared more frequently in recent years [4, 5, 30, 31, 32, 33].

The primary goal of this field is to enable machines to perform tasks such as image and video recognition, image analysis and classification, media recreation, etc. Advancements in computer vision with deep learning have evolved primarily around one particular algorithm: the Convolutional Neural Network (CNN) [34].

In traditional computer vision, most of the work consists of hand-engineering filters which, when applied to an image, extract its features [35]. The more features that can be extracted, the more accurate a prediction is. A major setback of this approach is that each feature must be manually engineered in the design process, which makes scaling these types of algorithms challenging. Convolutional Neural Networks work in the opposite direction: the designer only chooses how many features the CNN will extract, and the extraction itself is learned during training [36].


Figure 2.6: Overview of a CNN’s architecture and training process [4]

A CNN consists of an input layer, hidden layers and an output layer (Figure 2.6). In any feed-forward neural network, all middle layers are considered hidden, as their inputs and outputs are masked by the activation function and final convolution.

Typically, this includes a layer that performs a dot product of the convolution kernel with the layer's input matrix. As the convolution kernel slides along the input matrix for the layer, the convolution operation generates a feature map (Figure 2.7), which in turn contributes to the input of the next layer. Before serving as input for the next layer, however, outputs from convolution operations are subjected to an activation function. The most common nonlinear activation function used presently is the rectified linear unit (ReLU) [37], defined as f(x) = max(0, x). This process is followed by other layers such as pooling layers, fully connected layers, and normalization layers (Figure 2.6).

Figure 2.7: Convolution operation. [4]
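To make these two operations concrete, the following minimal NumPy sketch (illustrative only, not taken from the thesis) applies a single 3×3 convolution to a toy grayscale image and passes the result through ReLU:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a 2D kernel over a 2D image (no padding, stride 1),
    producing one feature map, as in Figure 2.7."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(0, x)

image = np.random.rand(6, 6)      # toy grayscale input
kernel = np.random.randn(3, 3)    # one 3x3 filter (learned during training)
feature_map = relu(conv2d(image, kernel))
print(feature_map.shape)          # (4, 4)
```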


Similar to the convolutional layer, the pooling layer is responsible for reducing the spatial size of the convolution feature output. This process decreases the computational power required by reducing dimensionality. Furthermore, it is useful for extracting dominant features which are invariant to positional and rotational alterations, thus avoiding inefficiencies in the training process. There are two main sorts of pooling, namely Max Pooling and Average Pooling. Max Pooling is the most popular method, downsampling by returning only the maximum value from the portion of the image covered by the kernel (exemplified by Figure 2.8). Global average pooling, on the other hand, performs an extreme type of downsampling, where a feature map of size height × width is reduced to a simple 1×1 array. This is done by averaging out all the elements in each feature map, while the depth of the feature maps is retained. This operation is typically applied only once, before the fully connected layers.

Figure 2.8: Max Pooling, downsampling of an input tensor by a factor of 2 [4]
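A compact sketch of the factor-of-2 Max Pooling from Figure 2.8 (illustrative NumPy code, not from the thesis):

```python
import numpy as np

def max_pool2d(x, size=2):
    """Downsample a 2D feature map by keeping the maximum of each
    non-overlapping size x size window (stride = size)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]   # trim edges that do not fill a window
    return x.reshape(h, size, w, size).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(fmap))           # 2x2 output: factor-of-2 downsampling
```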

Fully connected (FC) layers map out the acquired features, resulting from convolution and pooling operations, to the final outputs of the network. FC layers operate as a set by connecting inputs to outputs through the use of trained weights. In the context of object detection, the outputs at the end of such networks usually result in sets of probabilities for each class in classification tasks. The final fully connected layer typically has the same number of output nodes as the number of classes.

A normalization layer is typically defined as the activation function applied to the last FC layer in the network, as it usually differs from the activation functions utilised for previous layers. Normalization layers vary according to the specific task at hand. For multiclass classification purposes, a softmax function is adopted, which normalizes the real-valued outputs from the last fully connected layer to target class probabilities. Each value ranges between 0 and 1 and all values sum to 1.
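As a short worked example of this normalization (a sketch, not thesis code):

```python
import numpy as np

def softmax(logits):
    """Map raw outputs of the last FC layer to class probabilities:
    each value lies in (0, 1) and all values sum to 1. Subtracting the
    maximum first keeps the exponentials numerically stable."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.659 0.242 0.099]
```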


Figure 2.9: Prototype example. Raspberry Pi 3 equipped with Intel Neural Compute Stick 2 and wide-angle camera. [5]

Affordable options for embedded computing power, including camera use, are rapidly becoming available for both industrial applications and research (example in Figure 2.9). These devices are essential for hardware projects reliant on image-based analysis, discarding the use of an external computer for processing. This type of modular function allows for the possibility of implementing computer vision on-site for applications that would otherwise not be feasible due to cost, mobility or size constraints. Running applications on the on-board camera's imaging feed proves highly beneficial in another respect, namely permitting the collection of detailed information while keeping data private. Ensuring that data collection does not involve storage on external systems is key to complying with current privacy regulations and standards [38].

Figure 2.10: Prototype from Figure 2.9. Image capture and classification [5]


Most implementations related to room occupancy resort to an architecture similar to that of Figure 2.4. Portable camera modules are placed with a clear view of the targeted area, and process the image to harness relevant data. This information is then transmitted to a central platform for adequate handling.

By training object-recognition networks on occupants, objects and architectural context, it is possible to adapt and improve existing solutions for the dissertation's purpose. That is, however, not the only avenue for adaptation to a problem's own set of characteristics. A wide variety of pre-trained object detection algorithms have been developed in recent years, capable of running, and remaining relatively accurate, on devices with less computing power. These present their own trade-offs [9] depending on the situation, and can be a building block for small-scale applications [39, 40, 41].

Among this set of algorithms, three main subgroups stand out, both by frequency of use in similar applications [42, 43] and by fit to the specified needs, considering the limited capability of whatever processing unit is chosen [7]. The ability to distinguish people as well as common objects, and the prioritisation of ease of processing while maximizing detection precision, stand at the forefront.

Region-based Convolutional Neural Networks (R-CNN)

In the R-CNN setting (and its many variants, Figure 2.11), detection happens in two stages. During the first stage, called the region proposal network (RPN), images are processed by a feature extractor. The extraction [44] is a necessary step for the automatic identification of objects, which are associated with certain attributes that characterize and differentiate them. The similarity between images can be determined through these features, which are represented as a vector. The various contents of an image, such as color, texture and shape, are used to represent and index an image or an object, and to predict bounding boxes.

Figure 2.11: Summary of Models in the R-CNN family [6]

In the second stage, these box proposals are used to crop features from the same intermediate feature map, which are subsequently fed to the remainder of the feature extractor in order to predict a specific class for each proposal. The loss function [6] is identical for both stages, the second stage using results from the RPN as anchors. During this process, part of the computation must be run once per region, so the running time depends on the number of regions proposed by the RPN. This normally translates into substantially longer processing times compared to the other options evaluated [42, 43].

You Only Look Once (YOLO)

Distinct from the previous detectors, this algorithm employs a single convolutional network for its predictions, framing detection as a regression problem from full images to spatially separated bounding boxes and associated class probabilities, in one evaluation. The complete YOLOv1 network architecture (Figure 2.12) features 24 convolutional layers and 2 fully connected layers.

Figure 2.12: YOLOv1 Architecture [7]

The algorithm [16, 45, 46, 47] divides any given image into an S×S grid. Each grid cell on the input image predicts a fixed number of boundary (anchor) boxes for an object. For each boundary box, the network outputs 4 offset values (bx, by, bh, bw), one confidence score pc and C conditional class probabilities. The coordinates (bx, by) represent the bounding box's center relative to the bounds of the grid cell in the input image. The bw and bh parameters are the box's width and height, respectively. The confidence pc is equivalent to the probability that the box contains an object. The C conditional class probabilities indicate the likelihood that a detected object belongs to a given class i (Figure 2.13).


Figure 2.13: Bounding Box Structure (altered) [7]
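The following sketch shows how one such cell prediction could be decoded into absolute image coordinates; the tensor layout and the 448×448 input size are assumptions based on the YOLOv1 description above, and the random tensor merely stands in for a real network output:

```python
import numpy as np

S, B, C = 7, 2, 20                  # grid size, boxes per cell, classes (YOLOv1 values)
W, H = 448, 448                     # assumed network input resolution

# stand-in for a real prediction: S x S cells, each with B boxes of 5 values + C probs
pred = np.random.rand(S, S, B * 5 + C)

def decode_cell(pred, row, col, b):
    """Turn one cell/box prediction into absolute coordinates.
    (bx, by) are offsets within the cell; (bw, bh) are relative to the image."""
    bx, by, bh, bw, pc = pred[row, col, b * 5: b * 5 + 5]
    cx = (col + bx) * W / S         # absolute box center, x
    cy = (row + by) * H / S         # absolute box center, y
    class_probs = pred[row, col, B * 5:]
    scores = pc * class_probs       # class-specific confidence
    return (cx, cy, bw * W, bh * H), scores

box, scores = decode_cell(pred, row=3, col=4, b=0)
print(box, int(scores.argmax()))    # box geometry and most likely class index
```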

Single Shot Detector (SSD)

A single convolutional neural network, the SSD is less complex than the methods it intends to surpass. Its architecture is separated into two parts: a base network, most commonly MobileNet [48] or VGG16 [49] (Figure 2.14), contributing high-quality image classification at the front, and several convolutional feature layers added afterwards to predict object detections.

There are several features in the SSD model, such as multi-scale feature maps for detection, where the sizes of the added convolutional feature layers decrease gradually. This allows for predictions at different scales. Each feature layer uses a different convolutional model to predict detections.

Each added feature layer, or any existing feature layer from the base network, can generate a fixed set of detection predictions by using a set of convolutional filters [50], displayed on top of the SSD architecture.

For a feature layer of size m × n with p channels, SSD applies small convolution filters, 3 × 3 in size, to compute the location and class scores for each cell. Predictions are then made for a fixed set of default bounding boxes. Each prediction contains its own boundary, with offsets relative to its default box, and scores for all classes. The class set includes a class 0, reserved for outputs signaling no object detection. Whereas the YOLO model adopts an intermediate fully connected layer, SSD discards it in favor of convolutional filters.


Figure 2.14: SSD Architecture, featuring VGG16 as feature extractor [7]
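To make the size of these per-layer predictions concrete, a small illustrative calculation (the k default boxes per cell and c classes, including the background class 0, are assumed example values):

```python
def ssd_head_outputs(m, n, k, c):
    """Each of the m*n cells predicts, for each of its k default boxes,
    4 box offsets plus c class scores, via the 3x3 convolutional filters."""
    return m * n * k * (c + 4)

# e.g. a 19x19 feature layer with 6 default boxes and 21 classes (20 + background)
print(ssd_head_outputs(19, 19, 6, 21))  # 54150 values for this layer alone
```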

2.2 Summary

Both technologies have been proven, through commercial applications and research projects, to be of use in the context of large-scale, real-time occupancy detection. PIR technology has featured far more regularly in seat occupancy monitoring, through the installation of sensors beneath desks and tables. Computer vision, and more concretely CNNs, on the other hand, has been relied upon primarily for overall room occupancy and movement tracking.

PIR sensors do show some limitations, however, namely the confinement of a module's efficacy to a single seating option. The technology lacks range and is subject to blockage, and so is limited in its potential for sweeping larger areas. This represents a great cost in securing coverage for a full building, as a large quantity of components becomes necessary to build and maintain a network. Due to the form of installation, this ad hoc nature also prevents greater flexibility, both in management of the layout and in feature expansion.

In contrast, the concept of networked modules running CNNs is far more open-ended in its capabilities. The potential area coverage per module far exceeds that of PIR sensing, and the wide array of tools accessible with limited computing power provides a pathway for further improvement and diversification of the application, namely the option of detecting common objects occupying seating areas.

Algorithm families such as You Only Look Once (YOLO) [16] and the Single Shot Detector (SSD) [51] present a more realistic option for video stream and real-time processing as one-stage detectors, while Region-Based Convolutional Neural Networks (R-CNN) [52, 53, 54, 55] provide more reliable results at the cost of speed with their two-stage approach [7, 6].

Existing examples from this promising field are an incentive for further research, and for the pursuit of a solution to the problem at hand using this technology.


Chapter 3

System Requirements

The following analysis aims to define and explore the client and system requirements. Additionally, it provides information about the needs met by the product, its capabilities, its operating environment, properties and user experience.

Firstly, the functional requirements are set (Section 3.1). These client needs were determined after meeting with the Library's Services, confirming and elaborating on the main objectives of the solution.

Once the needs are established, the non-functional requirements, whose main purpose is to support and guide the resolution of the customer's needs, are listed. They are split between two main subgroups, namely the one dealing with occupancy detection itself (Subsection 3.2.1) and the corresponding interface for data visualisation and statistical upkeep (Subsection 3.2.2).

3.1 Functional Requirements

These requirements are a direct result of ongoing discussions during client and orientation meetings. The information provided covers the needs met by the product, its capabilities, operating environment, properties and user experience.

CN1  The system must be able to track and map each seating area's current status.
CN2  The system should be devoid of proprietary components.
CN3  The system must include a user interface and cloud-hosted dashboard, including occupancy history and statistics for administrators.
CN4  The system should work modularly and allow for easy expansion.
CN5  Any system setup should guarantee a low or competitive cost, relative to similar solutions.
CN6  The modules should be compact and non-invasive.

Table 3.1: Client Needs


3.2 Non-Functional Requirements

How the product will be designed, in order to satisfy the previously mentioned needs, is essential for securing a coherent system design and architecture. The following requirements are divided into two subsections, each corresponding to a different index (Occupancy Detection: OD; Dashboard: DB). Every requirement is classified according to its priority (Mandatory or Desirable) and satisfies at least one Client Need (CN).

3.2.1 Occupancy Detection

OD1 (CN1, Mandatory): The system shall reliably detect, count and signal the presence of people within a certain area.
OD2 (CN1, Desirable): The system should reliably detect the presence of common objects within a certain area.
OD3 (CN1, CN3, Mandatory): The system must be able to support usual data operations (including transmission, processing and storage) for all modules over the expected lifetime of the system.
OD4 (CN4, CN5, CN6, Mandatory): The system's technical design (hardware, databases, etc.) must be able to scale, and support projected use across floors.
OD5 (CN1, Mandatory): The system must provide a near real-time response to detections and submit them to the central dashboard.
OD6 (CN2, CN4, CN5, Desirable): The system must support the repair or upgrade of a component in a running system, or with minimised downtime.
OD7 (CN1, Mandatory): The system must not record any type of personal data, as established per GDPR [38] rulings (image, video, body temperature, etc.).

Table 3.2: Requirement Analysis: Detection System

The defined needs are fulfilled by this set of requirements at the hardware level. In terms of targeted detection, the priority remains detecting people (OD1) occupying certain areas. However, it became apparent that the possibility of signaling the presence of common objects (OD2), such as books, handbags or folders, is of interest to the Library's management. OD3 and OD5 relate to the reliability and transmission of the data necessary for a proper integration with the dashboard. OD4 and OD6 demand an open-source, modular system, so as to secure ease of maintenance and expandability. Privacy concerns (OD7) dictate that the system must be capable of processing relevant inputs without resorting to storage of sensitive data.


3.2.2 Dashboard

DB1 (CN1, CN3, Mandatory): The user must have access to seat status mapping, and administrators to relevant information, such as timestamps of state changes and statistics related to the time each seat is occupied.
DB2 (CN2, CN4, Desirable): The system must be extensible and/or have the ability to accept new features or functionalities.
DB3 (CN3, Desirable): The system must include access to a modifiable, cloud-hosted database.

Table 3.3: Requirement Analysis: Dashboard

The constitution of the dashboard is of a more undefined nature, as its main feature is to display a mapping of all relevant areas, as well as a complete history of the collected data. DB1 alludes to a more concrete structure of the required parameters, including timestamps and a status designation for each seating area.

3.3 Summary

The set of requirements listed creates guidelines for the design of a functional system. For this purpose, an overall analysis of the client needs was followed by a more detailed approach, according to a preliminary division of the system into two sections. After an initial stage of defining the product's main goals, as well as the necessary market research, the blueprint is laid out for an adequate definition of the system's architecture.


Chapter 4

System Architecture

Given the definition of the problem, the context of commercially available solutions, and the requirements set by the previous chapter's analysis, it is reasonable to consider that the most fitting approach is one based on vision and image processing.

This chapter addresses the main blocks needed for the implementation of the proposed application, as well as their interactions. Sections 4.1 and 4.2 follow a division similar to Chapter 3, detailing the components related to image processing and analysis, followed by the system visible to the user, namely the dashboard, supported by an accessible web interface and database.

4.1 Vision-based Solution

In comparison to an architecture relying on an individual device per seat, such as an infrared sensor (IR sensor) [1, 19, 20, 21, 22, 23, 24], this proposal intends to multiply the detection potential per module. Regardless of other diverging parameters, every module consists of three key blocks: a processing unit, an individual camera attached to the unit, and an application centered around a detection algorithm, selected and modified according to the specificity of the targeted space.

4.1.1 Processing Unit

The processing unit in the presented situation is tasked with running a detection algorithm, while controlling and processing the video stream captured by the camera. The dimensions of the device should allow for greater flexibility and ease of installation, especially regarding access to power and network sources. Additionally, it must guarantee the transmission of data to be accessed by the developed interface, meaning the inclusion of an Ethernet port is necessary for added reliability and stability.


4.1.2 Camera Module

Functioning in integrated form with the computing unit, the camera captures and provides direct access to its video feed. The device preferably includes a Camera Serial Interface (CSI) [56] so as to secure higher bit rates (over 2 Gbit/s). Mounting the camera directly on the respective processing unit also reduces power cabling, as the camera module is supplied through the CSI port. The lens should cover the necessary field of view (FOV) for the targeted areas, which are mostly replicated across the library's floors and clearly delimited by the shelves populating the space (Figure 4.1). Resolution must be maximized and takes priority over frame rate, since image quality greatly impacts the success of object detection [9], and movement is largely non-existent or infrequent in a seated studying environment.

Figure 4.1: Common study area (FEUP Library: 2nd-4th floor) [8]

4.1.3 Definition of detection algorithm

During the initial phase of research, applications similar to the one proposed [39, 40, 57, 58, 59, 60] used well-known models with a clear set of characteristics. YOLOv2 and SSD compare very favorably in speed of processing [42, 43], clocking in at around 10x greater FPS than the more accurate YOLOv3 or the analyzed R-CNNs. At the same time, SSD demonstrated lower accuracy, measured as mean Average Precision (mAP [61, 62]), in its predictions and boundary box positioning compared to its counterparts. In addition, there was an issue with lower-resolution imagery and smaller objects, where the SSD regularly failed to detect an object while other models succeeded [48].


However, these traits are relevant for considerably powerful processing units. Even the system where these findings were confirmed (Table 4.1) possesses far greater computing power than any potential unit fitting the defined system architecture. Bearing these factors in mind, it becomes clear that two-stage detectors such as the R-CNNs would perform poorly on lesser hardware. This remains true even though they show high-level precision and a sufficient frame rate for the proposed application, where activity is reduced and movement detection is not required.

Description  Component
CPU          AMD Ryzen 5 3600 6-Core Processor @ 3.60 GHz
GPU          NVIDIA GeForce GTX 1650 SUPER (1740 MHz, 4 GB GDDR6)
RAM          16.0 GB

Table 4.1: Main specifications of testing system

This conclusion is further exemplified by comparisons of detection frameworks, such as the one presented in the research paper introducing the second iteration of the YOLO algorithm [16]. In that instance, trials were run using the PASCAL VOC dataset and a GeForce GTX Titan X (1000 MHz, 12 GB GDDR5) as the graphics processing unit (GPU). The comparison (Table 4.2) shows the faster iterations of the R-CNN framework clocking in at around 7 FPS, rather insufficient given the predicted downgrade in processing capabilities.

Detection framework    Train      mAP   FPS
Fast R-CNN             2007+2012  70.0  0.5
Faster R-CNN VGG-16    2007+2012  73.2  7
Faster R-CNN ResNet    2007+2012  76.4  5
YOLO                   2007+2012  63.4  45
SSD300                 2007+2012  74.3  46
SSD500                 2007+2012  76.8  19
YOLOv2 288x288         2007+2012  69.0  91
YOLOv2 352x352         2007+2012  73.7  81
YOLOv2 416x416         2007+2012  76.8  67
YOLOv2 480x480         2007+2012  77.8  59
YOLOv2 544x544         2007+2012  78.6  40

Table 4.2: Comparison of detection frameworks (from Table 3 [16]), PASCAL VOC dataset [17]

Regarding the two remaining one-stage detectors, which are indeed the most common option in similar applications, SSD300 and YOLOv2 stand out as the better compromise between quality and speed. Given their comparable level, both qualify as reasonable possibilities needing further testing in the final environment, though previous analysis [9, 43] does show performance distinctions across different datasets and input resolutions. In that regard, while smaller objects tend to be detected less frequently by SSDs, SSDs do present better outcomes in the fastest-detector category, especially on the Common Objects in Context dataset (Figure 4.2). This factor constitutes a considerable advantage, as the COCO [10] dataset is preferred for this type of solution, providing a large-scale, open-source and particularly effective tool for people recognition. It is also in constant development, sponsored by some of the biggest entities in the field, such as Microsoft, the Common Visual Data Foundation and Facebook.

Figure 4.2: Comparison of several detection frameworks [9], COCO dataset [10]

The model of reference going forward is therefore the SSD300, coupled with MobileNet as feature extractor (lowest GPU time per Figure 4.3) and utilizing the COCO dataset.

Figure 4.3: GPU time (milliseconds) for each model, for image resolution of 300x300 [9]
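A minimal sketch of how such a model can be driven from OpenCV's DNN module, in line with the resizing, normalization and network forwarding shown later in Figure 5.9; the model file names and the 0.5 threshold are assumptions for a publicly available COCO-trained MobileNet-SSD, not the exact files used in this work:

```python
import cv2

# assumed file names for a COCO-trained MobileNet-SSD (TensorFlow export)
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "ssd_mobilenet_coco.pbtxt")

frame = cv2.imread("library_area.jpg")       # one still from the camera feed
h, w = frame.shape[:2]

# resize to the 300x300 input expected by the SSD300 and forward through the net
blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True)
net.setInput(blob)
detections = net.forward()                   # shape (1, 1, N, 7)

PERSON_CLASS_ID = 1                          # 'person' in the COCO label map
for det in detections[0, 0]:
    _, class_id, confidence, x1, y1, x2, y2 = det
    if int(class_id) == PERSON_CLASS_ID and confidence > 0.5:
        # coordinates are returned normalized to [0, 1]
        box = (int(x1 * w), int(y1 * h), int(x2 * w), int(y2 * h))
        print("person", round(float(confidence), 2), box)
```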


4.2 Dashboard

Working in parallel with this array of modules is a web interface displaying a dashboard with relevant data, which is in turn accessed, stored and manipulated in a database. These two elements could be fully integrated or work in tandem; the essential aspect is that they represent a separate system from the vision-based design.

The interface is hosted through a web service, receiving data entries in JSON [63] format from each individual processing unit. The data is of low complexity, consisting of arrays of integers containing a seating area designation and its respective binary status: occupied or vacant. These entries can be executed by each module, either by posting directly to an HTML-based interface through HTTP requests using cURL [64], or by populating an independent database (Figure 4.4) with SQL commands.
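A sketch of what one such status entry could look like from a module, assuming a ThingsBoard-style telemetry endpoint addressed by host and device access token (matching the URL composition shown later in Figure 5.12); the host and token below are placeholders:

```python
import requests

HOST = "http://dashboard.example.org"    # placeholder dashboard host
ACCESS_TOKEN = "DEVICE_TOKEN"            # placeholder per-device access token

def post_status(area_id: int, occupied: int) -> None:
    """Send one seating area's binary status (0 = vacant, 1 = occupied)
    as a JSON telemetry entry."""
    url = f"{HOST}/api/v1/{ACCESS_TOKEN}/telemetry"
    payload = {f"area{area_id}": occupied}
    requests.post(url, json=payload, timeout=5).raise_for_status()

post_status(area_id=1, occupied=1)       # Area 1 has just become occupied
```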

The database would consist of three main classes for structuring the data. A Floor is identified by a unique id integer, corresponding to the layout of the building. Every instance of this class has one or more detection modules, or Devices, associated with it. The main supported method is updateTotalCount(), which pulls the total number of occupied seats indicated by the pool of associated modules.

A Device is equally identified by a unique id, defined by the administrator, and is responsible for monitoring at least one designated seating area. The main method getStatus(in id:integer) indicates the current status of the identified Area. The Area class holds the status, which alternates between 0 and 1, and the timestamp of each update.

Figure 4.4: Data structure; each library floor contains a number of active devices responsible for monitoring several seating areas
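A minimal Python rendering of this structure (a sketch assuming in-memory objects; the concrete storage is left open above, between HTTP posts and an SQL database):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Area:
    id: int
    status: int = 0                      # 0 = vacant, 1 = occupied
    updated_at: datetime | None = None   # timestamp of the latest change

@dataclass
class Device:
    id: int
    areas: dict[int, Area] = field(default_factory=dict)

    def get_status(self, area_id: int) -> int:
        """Current status of the identified Area."""
        return self.areas[area_id].status

@dataclass
class Floor:
    id: int
    devices: list[Device] = field(default_factory=list)

    def update_total_count(self) -> int:
        """Total occupied seats reported by this floor's pool of devices."""
        return sum(a.status for d in self.devices for a in d.areas.values())
```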

The simple, singular transmission of data for each change of state allows for reliability in communication between systems, and flexibility in building the different tools for displaying information.


4.3 Summary

The complete architecture of the vision-based system rounds out the specifications of each major component, according to the product's objectives and adapted to the final environment. Communication between subsystems will consist of status updates sent by all modules to the web interface. Variances throughout the targeted space may determine changes to the overall design, primarily with regard to the camera module's lens, though the building remains fairly consistent and uniform in layout. The implementation of the presented concept is to be thoroughly tested in a limited area, aiming for replication across major areas of the building. The object detection algorithm, MobileNet SSD300, is selected for deployment based on past experiences and the characteristics of available models.

Figure 4.5: Model representing major system components and communications


Chapter 5

System Implementation

Following the definition of the solution's overall architecture, the next step consists of selecting fitting components for the implementation of the system. As the system revolves around the concept of signaling the presence of people (and possibly a certain set of objects), the preliminary stage of development began with researching suitable object detection algorithms within the scope of the concept. Analysing these initial findings provided a framework for targeting the remainder of the system, which includes a processing unit and corresponding camera module. Isolated from these elements is the platform used for the construction of a dashboard and data storage, which is reviewed as well.

5.1 Module Composition

5.1.1 Processing Unit

Considering the wide range of possibilities regarding the choice of processing unit, there are solutions designed directly for the purpose of running neural networks, such as the Jetson Nano Developer Kit [65]. While it includes specialised software libraries for deep learning, computer vision, GPU computing and multimedia processing, which facilitate the development process, it represents higher costs per unit and proprietary challenges. The Raspberry Pi models, however, are a popular, versatile and community-supported option for this type of application [66, 67, 68].

As such, the recent Raspberry Pi 4 Model B [69] (Figure 5.1) became a natural solution, providing more flexibility regarding algorithm adjustments, given its maximization of computing capability, as well as familiarity of use. It presents a significant leap in processing (Broadcom BCM2711, quad-core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5 GHz) and connectivity (wireless 2.4 GHz and 5.0 GHz IEEE 802.11b/g/n/ac) compared to previous devices. It also remains silent and portable, while requiring similarly low energy consumption.

In terms of communications protocols, much like its counterpart the Jetson Nano, it supports the GPIO, I2C, I2S, SPI and UART standards. More vitally, the inclusion of Gigabit Ethernet and a CSI port secures the transmission of data to the dashboard and the connection of the on-board camera module, respectively, as per the system architecture. The performance resembles that of a basic x86 PC at a reduced cost. Should system demands rise in the future, either through substitution of the running detection algorithm or another sort of feature expansion, the Raspberry Pi enables the inclusion of computer vision accelerators; the leading options currently available are the Coral USB Accelerator [70] and the Intel Neural Compute Stick 2 [71].

Figure 5.1: Raspberry Pi 4 Model B (8GB RAM) [11]

5.1.2 Camera Module

Given the characteristics of the algorithm and the respective processing unit, the resolution of the captured image is fundamental to the precision of the results. As for physical features, the overall size of the module must be supported by the mounted unit, and it should include the CSI interface so as to maximise processing speed.

The Raspberry Pi Camera Module v2 [72] (Figure 5.2) supports 1080p30 and 720p60 video streams, providing the imagery quality necessary for a normal execution of SSD, which, as previously mentioned, can function abnormally when confronted with lower resolutions. A 15cm ribbon cable connects to the CSI port on the Raspberry Pi and allows for many options in positioning and angling of the lens.

The combination of sensor image area (3.68 x 2.76 mm - 4.6 mm diagonal), optical size (1/4"), focal length (3.04 mm), and horizontal (62.2 degrees) and vertical (48.8 degrees) fields of view (FOV) creates an image capable of covering and detecting objects around 5m to either side and 10m deep, as determined by tests in select areas of the library. This configuration also allows for minimal distortion compared to other cameras with greater FOV. This distortion, in turn, can be practically eliminated by proper calibration. Compatibility for attaching other types of lenses is extensive, should different areas require another approach. Finally, numerous third-party libraries are available and referenced, including the Picamera Python library [73].
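As a minimal sketch of how frames can be captured from the module in Python, assuming the Picamera library just mentioned (the resolution shown is one of the supported modes, and variable names are illustrative):

```python
from picamera import PiCamera
from picamera.array import PiRGBArray

# Configure the on-board camera (resolution chosen for illustration)
camera = PiCamera()
camera.resolution = (1280, 720)
camera.framerate = 30

# Capture one frame directly into a NumPy array, in BGR order for OpenCV
raw = PiRGBArray(camera, size=camera.resolution)
camera.capture(raw, format="bgr")
frame = raw.array  # shape: (720, 1280, 3)
```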


Figure 5.2: Raspberry Pi Camera Module v2 [12]

5.2 Image Segmentation and Analysis

Figure 5.3: 2nd Floor Plan, camera position (dotted circle) and covered area (black)

The complete module can be installed to cover practically any given location of the library, as there are continuous wiring and communications ports distributed across the false ceilings of the main floors. Since these floors are generally identical, once the concept is proven and tested in a selected area, it can be replicated to others with minor adaptations.

The test site selected is the area depicted on the 2nd floor plan (Figure 5.3). Represented are the camera module placement, as well as the covered area, resulting in the image of Figure 5.4. On this site, 4 seating areas (Figure 5.5) are adequately covered and were the basis for the diagnosis and correction of the system functions.


Figure 5.4: Still from equivalent area depicted on Figure 5.3

Figure 5.5: Seating areas


5.2.1 General Concept

With the module in place and capable of capturing image frames, it is possible to outline and identify the planned seating areas, as broadly illustrated in Figure 5.5. These zones are designated either as "occupied" or "vacant", largely independent of the number of people that may find themselves near these spots, as the goal remains to count and identify used seats, not the total number of people present on the floor. Therefore, one detection or multiple detections within these areas indicate the same outcome.

A detection in a targeted area is defined by the formation of a bounding box indicating the presence of an object labeled as "person", classified according to the COCO dataset. Once that occurs, the status of the area can be considered "occupied". The inverse status, "vacant", is confirmed after a defined limit of non-detections. This procedure accounts for shortcomings of the module in detecting people on every frame presented, even when they are indeed present, and for the fact that a seated library environment implies fewer occasions of movement or state changes, allowing for longer periods of analysis. Higher limits naturally result in lower reactivity of the system.

Figure 5.6: Contention Overlapping Areas, final status determined by the dashboard

Another relevant aspect is the idea of overlapping areas, one prime example being "Area 1" (Figure 5.5). As the architecture presupposes the formation of a network of cameras across an entire floor, certain predetermined zones are to be covered simultaneously by a pair of modules. Assuming another unit is installed on the other side of the shelving, two devices could provide feedback on the status of "Area 1", which minimizes the occurrence of errors. In these cases, status updates relating to a common identified area are transmitted by all devices involved and arbitration is done at the higher level of the dashboard, demanding full agreement to consider the final status "vacant". If any single device sends an "occupied" status as its most recent update, the area is considered "occupied" (Figure 5.6).
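A minimal sketch of this arbitration rule, assuming the dashboard holds the most recent update (0 = vacant, 1 = occupied) from each module covering the area:

```python
def arbitrate_overlap(latest_updates):
    """Dashboard-side arbitration for an overlapping area (illustrative).

    latest_updates: most recent status reported by each camera module
    covering the area. Full agreement is required to declare the area
    vacant; any single "occupied" report keeps it occupied.
    """
    return 0 if all(status == 0 for status in latest_updates) else 1

# Example: one module misses the occupant, the other still sees them
print(arbitrate_overlap([0, 1]))  # -> 1 ("occupied")
```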

5.2.2 Algorithm Execution

The SSD300 is deployed on the processing unit using the Caffe framework [74], through a script running on Python 3.7. The script relies on OpenCV [75] as its main library, as it is fully open-source and possesses extensive documentation on its modules [76], including functionalities relating to image processing, object detection, neural networks, and camera calibration [77]. The core of the application (Figure 5.7) is responsible for processing the captured frames and classifying the status of each area.

Figure 5.7: Image processing loop in application


5.2.2.1 Frame Selection

Firstly, a video capture object is created, from which frames are periodically collected. The period depends on overall processing time, combined with a counter. This counter (area_cycle in Figure 5.7) acts as a sampling rate, hard-coded to define how many consecutive frame updates belonging to the same area are passed through the network, thereby minimizing the error rate. The image passed through the neural network on each phase of the round-robin cycle (Figure 5.8) is a 70x70 cropping of the resized frame, equivalent to the bounding boxes present in Figure 5.5. A full cycle's duration, where every area is analyzed once, depends on the total number of areas associated to a device, the defined sampling rate and the usual processing time per frame. Therefore, it lasts:

areas_total × sampling_rate × processing_time_average (5.1)

Figure 5.8: Frame selection cycle; each transition occurs after sampling_rate processed frames.
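The loop below is a minimal sketch of this frame selection logic; the actual application follows Figure 5.7, and names such as AREAS and SAMPLING_RATE, as well as the crop coordinates, are illustrative placeholders (the preliminary frame resize is omitted for brevity):

```python
import cv2

AREAS = [(120, 80), (400, 90), (650, 200), (900, 210)]  # hypothetical top-left corners of 70x70 crops
SAMPLING_RATE = 6  # frames sampled per area before moving on

cap = cv2.VideoCapture(0)
frame_count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        continue
    # Round-robin: advance to the next area every SAMPLING_RATE frames
    area_cycle = (frame_count // SAMPLING_RATE) % len(AREAS)
    x, y = AREAS[area_cycle]
    crop = frame[y:y + 70, x:x + 70]
    # `crop` is then forwarded through the detector (see Section 5.2.2.2)
    frame_count += 1
```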

5.2.2.2 Status Analysis

Once the area for analysis is defined, the corresponding 70x70 input is resized to 300x300, the standard for MobileNet, and normalized by subtracting the mean (127.5, 127.5, 127.5) from the RGB channels. The resulting input from using blobFromImage() [78] is then forwarded through the neural network for detection (Figure 5.9).

Figure 5.9: Use of OpenCV modules for image resizing, normalization and network forwarding


Mean subtraction is used to counter illumination changes present in input images, and is a common aiding technique for convolutional neural networks. Each image contains a certain average pixel intensity for each of the Red, Green, and Blue channels. The mean values differ for each training set, as is the case for ImageNet (example in Figure 5.10), for which the RGB figures are R=103.93, G=116.77, and B=123.68. In the case of CaffeNet, the values all stand at 127.5.

Figure 5.10: A visual representation of mean subtraction where the RGB mean (center) has been calculated and subtracted from the original image (left), resulting in the output image (right) [13].
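A minimal sketch of this preprocessing and forwarding step with OpenCV's dnn module; the model file names and the 1/127.5 scale factor paired with the mean above are assumptions based on the commonly distributed Caffe MobileNet SSD, while the thesis code itself follows Figure 5.9:

```python
import cv2

# Model files carry the commonly distributed names for this detector (assumption)
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

# Resize the 70x70 crop to 300x300, subtract the mean and scale the channels
blob = cv2.dnn.blobFromImage(cv2.resize(crop, (300, 300)),
                             scalefactor=1 / 127.5,
                             size=(300, 300),
                             mean=(127.5, 127.5, 127.5))
net.setInput(blob)
detections = net.forward()
# detections has shape (1, 1, N, 7):
# [batch_id, class_id, confidence, x1, y1, x2, y2] per detection
```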

Upon obtaining the output from the network, there are two alternative courses, according to the definition of the general concept.

In case there is a detection belonging to the "person" classification of the dataset, a detection is immediately signaled by attributing a "1" value to the corresponding position of the detect array (Figure 5.7). Every position in the mentioned array is a placeholder for the equivalent seating area, i.e. the status of Area 1 would be changed by accessing detect[1].

On the other hand, should no detection of the "person" class emerge, a no_hit counter is incremented; once it hits a predetermined limit, the status changes to "0". On completion, in both cases, the process returns to the phase of frame updating. The area_cycle counter defines which area is to be analysed next.
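This decision logic can be sketched as follows; the "person" class index depends on the dataset the model was trained on, and the exact bookkeeping of the no_hit counter (per analysed sample versus per elapsed frame) follows Figure 5.7, so both are marked as assumptions:

```python
PERSON_CLASS_ID = 15   # assumption: index of "person" in this model's label map
CONF_THRESHOLD = 0.2   # confidence score threshold used in Chapter 6
NO_HIT_LIMIT = 100     # non-detection limit before declaring "vacant"

person_found = any(
    int(detections[0, 0, i, 1]) == PERSON_CLASS_ID
    and detections[0, 0, i, 2] > CONF_THRESHOLD
    for i in range(detections.shape[2])
)

if person_found:
    detect[area_cycle] = 1        # immediately "occupied"
    no_hit[area_cycle] = 0
else:
    no_hit[area_cycle] += 1       # assumption: one increment per non-detection
    if no_hit[area_cycle] >= NO_HIT_LIMIT:
        detect[area_cycle] = 0    # "vacant" after sustained non-detection
```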

5.3 Dashboard and Data Storage

The system architecture described in Chapter 4 anticipated the necessity of not only a dashboard, but also the creation and maintenance of an available database to support it, even considering the possibility of having both components fully integrated. In that regard, Thingsboard [15] provides a sound alternative for implementation, as it includes both aspects in its architecture (Figure 5.11).

Thingsboard is a free, open-source, thoroughly documented [14], and customisable platform, including an HTTP-based (along with MQTT and CoAP) API for connectivity and REST APIs for the server side, based on Java and Python. All ThingsBoard components can be launched in a single Java Virtual Machine (JVM) and share the same OS resources. Since ThingsBoard is written in Java, the memory required to run it is also greatly minimised, allowing for launches with 256 or 512 MB of RAM in constrained environments, such as a Raspberry Pi's operating system. The same OS is fully supported by a native installation of the platform.


Figure 5.11: Thingsboard Monolithic Architecture [14]

Dashboards can be created and hosted by a Web UI on the server side, collecting data from the Thingsboard Core (Figure 5.11). The Core is, in turn, supported by a database, either PostgreSQL or Cassandra as the NoSQL option. The content on the dashboard is presented through a built-in widget library [79], editable with tools supporting advanced HTML and JavaScript commands. The administrator can also create their own applications based on the appropriate JSON format.

In the context of the dissertation, Thingsboard functionalities are harnessed in two major aspects, namely the organization and post-processing of data. Device profiles are created for every existing "Area" controlled by the module. These profiles support the management and storage of telemetry data, which essentially consists of area designation and status variables. These integer values are packaged using the JSON format.

Each profile contains its own unique access token, used by the HTTP protocol to reference the destination of each cURL request sent by the application running on the Raspberry Pi, either to a local server or the cloud. A request is sent to the respective device telemetry upon each status change, from "0" to "1" and vice-versa.


Figure 5.12 details the manner in which the process is executed. Once the condition for a state change of a particular area is triggered, the previously saved status (previous_detect) is updated. There follows the building of a data bundle, in JSON format, containing the Area designation and the corresponding detection condition. Depending on the area, the destination URL is constructed, including references (in order) to the host, API, access token, and telemetry category.

Figure 5.12: cURL requests: URL composed by host, access token and telemetry specification
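As a sketch of this update path, the snippet below posts a status change to ThingsBoard's documented device HTTP API (POST {host}/api/v1/{access token}/telemetry); the host, port, token, and the use of the requests library in place of cURL are placeholders and assumptions:

```python
import requests

THINGSBOARD_HOST = "http://thingsboard.example.org:8080"  # placeholder host and port
ACCESS_TOKEN = "AREA1_ACCESS_TOKEN"                       # placeholder per-area device token

def send_status_update(area, status):
    """Post an area's new status (0 = vacant, 1 = occupied) as device telemetry."""
    url = f"{THINGSBOARD_HOST}/api/v1/{ACCESS_TOKEN}/telemetry"
    payload = {"area": area, "status": status}  # integer telemetry keys, as in Section 5.3
    requests.post(url, json=payload, timeout=5)

# Called only when a state change is triggered, e.g.:
send_status_update(1, 1)  # Area 1 became "occupied"
```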

The concept of telemetry facilitates the organization and use of data relevant to the dashboard, including processing capabilities such as accessing both the current and the previous values. This feature is relevant for the arbitration of overlapping zones, referenced in Section 5.2.1 (Figure 5.6). Applying the concept, contending modules send out their corresponding status updates using an identical access token, identifying the same seating area. When presenting information on the dashboard, such zones are only considered "vacant" upon the reception of two consecutive, agreeing updates. Since data is only sent upon a change of status, this logic guarantees coherence with the other, typical areas.

5.4 Summary

In accordance with the specifications of the system architecture, the Raspberry Pi 4 Model B was selected as the on-board processing unit. The Raspberry Pi Camera Module v2 was the indicated mounted camera for carrying out image capture, thus completing the modular prototype.

The image processing and analysis are outlined, including the definition and description of the test site. The general concept, which is to be proven for replication to other areas, is presented. Execution of the algorithm implementing the concept is chronicled, along with the in-depth aspects of frame selection and status analysis.

Dashboard and data storage access are combined through the use of the Thingsboard platform. Compatibility with the processing unit, architecture, and communication processes with each module are addressed.


Chapter 6

System Testing

This chapter provides insight into the methodology of system testing and evaluates the presented results. Subsection 6.1.1 delves into the working condition of the module and analyses each seating area's individual results. Subsection 6.1.2 presents the usage data provided by the tools present in the dashboard's API and explores some options of data presentation for the user. Section 6.2 concludes the chapter with a summary and analysis of the content.

6.1 Methods and Results

6.1.1 Model Behavior

The test configuration of the application is set to sample 6 frames per area in the round-robin cycle, requiring a value of at least 100 on the corresponding no_hit counter to change a seating area's status from "occupied" to "vacant". Each phase's processing time varies slightly around 3 seconds, equating to 2 FPS. Therefore, each full turn of the round-robin lasts in the range of 12-13 seconds, in accordance with the estimate in Subsection 5.2.2.1. The conjunction of the defined no_hit limit and the duration of a full cycle equates to a delay of around 50-55 seconds for each state transition to "vacant".
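These figures are consistent with Equation 5.1 and the configured limits, taking 2 FPS as 0.5 s per processed frame and assuming the no_hit counter advances once per processed frame:

$$4\ \text{areas} \times 6\ \text{frames} \times 0.5\ \text{s} \approx 12\ \text{s per cycle}, \qquad \frac{100\ \text{frames}}{2\ \text{FPS}} = 50\ \text{s}$$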

Figure 6.1: Module’s CPU and Memory Usage while running the application

The delay parameters, as previously stated, can be altered so as to eliminate unwanted intervals in the context of library management. Short breaks (under 5-10 minutes) by students, who will quickly return to their place, are effectively irrelevant for determining seat availability. However, it remains useful at this stage to maintain a minimal temporal window, so as to more effectively analyse the responsiveness of the model across differing areas.


Privacy concerns play a role in the evaluation of the system, since footage including the likeness of other people must not be recorded and stored. Excluding this possibility closes off some avenues for large-scale, systematic calculations, such as confidently rating the mAP. In the context of this application, however, it is of greater importance to gather observational evidence, and to qualitatively evaluate the model in response to the desired outcomes.

Figure 6.2: Original (green) and provisional (red) cameras, targeting Area 1

For this purpose, state transitions were supervised during certain periods, while monitoring the test site live. State transitions are depicted as ON to OFF, and vice-versa; these are equivalent to declaring a certain seating area occupied (ON) or vacant (OFF). Theoretically, there can be no false ON designations, and none were observed. This is related to the need for any detection and subsequent classification to clear the defined confidence score threshold (0.2 out of 1.0). Unjustified downtime, however, remains a concern. These are cases where a student remains in place, yet an OFF-status update still occurs due to a lack of detections during several round-robin cycles, thus reaching the no_hit limit. The most relevant window of observation was the 24th of May, when the temporary installation of a second camera (Figure 6.2) on the other side of the shelving made it possible to test the concept of overlapping areas, introduced previously (Subsection 5.2.1, Figure 5.6).


Figure 6.3: Seating areas tested, targeting Area 1

Figure 6.4: Area 0 On-Off Status Chart, 08:00-19:30, 24-05-21

Area 0 (Figure 6.4) finds itself in essentially ideal conditions, including lighting, distance and image resolution. Status updates are consistent and coherent, apart from select downtimes lasting less than a minute. These downtimes correspond to a missed detection momentarily translating into "vacant" status, followed by the immediate inversion after completing the next cycle.

These minor errors, however, would be eliminated in a fully working system tolerant of small intervals. These intervals, as previously defined, include short absences by each occupant (totaling under 5 minutes), which are irrelevant for declaring a seating option vacant for other users. So while there are shortcomings in responsiveness, the individual 1-minute OFF-miscategorisations do not represent a failure of the system.

Figure 6.5: Area 1 On-Off Status Chart, 08:00-19:30

Area 1 (Figure 6.5) shows the effects of its location, at an extreme distance in relation to its counterparts. Minimal intervals, common to the Area 0 example, persist and become more frequent. More importantly, wider, unjustified gaps (see arrow) over 5 minutes long begin to emerge, which are damaging to data integrity. Figure 6.6 shows how the overlap logic, a second monitoring point enabled by another camera, can be of aid. The dual input for status compensates for low levels of detection and achieves results similar to those of sufficiently covered zones, such as Area 0.

Figure 6.6: Area 1 On-Off Status Chart (w/ provisional camera), 08:00-19:30, 24-05-21


Figure 6.7: Area 2 On-Off Status Chart, 08:00-19:30, 24-05-21

Sharing the same common table space, Area 2 (Figure 6.7) and Area 3 (Figure 6.8) help illustrate the limitations of the model. Area 2 resembles the efficacy of Area 0, as it sits centrally in the main camera's FOV. Area 3, however, evidences a degradation in performance due to its location at the edge of the frame, experiencing negative effects caused by lower resolution and minor distortion. This zone also experiences minor downtimes more frequently, yet there is no record of wider gaps capable of affecting a more tolerant configuration of the system, unlike the unsupported version of Area 1's monitoring.

Figure 6.8: Area 3 On-Off Status Chart, 08:00-19:30, 24-05-21


With the introduction of a greater permanence of the ON-status (exemplified by Figure 6.9), the functionality of the concept is proven successful. Another testing occasion, where no_hit is raised from 100 to 500, shows how shorter absences, not meaningful for asserting a seat's availability, are ignored, along with the minor interruptions evidenced in the previous testing round; at 2 FPS, this limit corresponds to roughly 250 seconds of sustained non-detection. Using this configuration, there is no practical discrepancy between the number of people observed and the number reported through detection.

Figure 6.9: Area 0 On-Off Status Chart, 08:00-17:00

The more demanding trial run, however, showed the influence environmental factors can have on each area's results. When adopting the concept to form a building-wide system, adaptations have to be carefully undertaken to ensure precision and efficacy. These adaptations can range from alternative camera positions and differing lens options, for greater resolution or less distortion, to assigning certain areas to more appropriate modules.


6.1.2 Dashboard

The dashboard supported by Thingsboard includes several pre-existing options for the display of data, including diagnostics. The main categories relate to API usage, namely the volume of transport messages (i.e. HTTP requests, Figure 6.10) and the telemetry data points stored in the database (Figure 6.11). It becomes clear that a single host is capable of supporting an expansion from the test site to the entirety of the library's public space.

Figure 6.10: Volume of transport messaging

Figure 6.11: Capacity for telemetry data storage


Looking further into the statistical data over a 2-week period (Figures 6.12, 6.13), it is apparent that both graphs, predictably, mirror each other.

Figure 6.12: Transport hourly activity, 14-day period

Figure 6.13: Telemetry persistence hourly activity, 14-day period


The equivalent information is conveyed through a Time-series [80] Stacked Bar Chart on the public dashboard, programmed to display the hourly average of state changes over the same time period. Figure 6.14 shows the variance and spikes of activity during multiple days, and how, for example, entrance (8:00-9:00) and lunch (around 13:00) periods are usually the busiest in terms of movement.

Figure 6.14: Hourly average of state changes over a 2-week period, 4-colored stacked bars

By applying the same time-series data to an equivalent Line Chart (Figure 6.15), it is demonstrable that all areas, though displaying some differing traits, follow common trends over longer samples of time. This is to be expected, given that all areas are located in a specific zone, with shared behavioral patterns.

Figure 6.15: Hourly average of state changes over a 2-week period, 4-colored lines


Separate area Status Charts (Figure 6.16), similar to the ones viewed in the previous Subsection 6.1.1, can be included with their time windows altered to display changes over longer periods of time as well. Combining the four graphs into a single one in stacking mode offers a historical review of the hourly average count (Figure 6.17).

Figure 6.16: 4 Area Status Charts over a 2-week period

Figure 6.17: Total hourly average over a 2-week period


For display of the real-time total count (Figure 6.18), an HTML card from the widget library is customised, including a script that pulls data from the required entities and adds up the status data (Figure 6.19).

Figure 6.18: Current total calculated upon clicking Update

Figure 6.19: View of widget’s (Figure 6.18) HTML editor


Status mapping is intended to display the location and current occupation of each individual seating area. This is accomplished by creating an Image Map [81] widget, utilizing the layout of the selected floor as a background (Figures 6.20 and 6.21).

Figure 6.20: Image Map with zero occupied areas (all green) and thermometer (blue)

Permanent markers are placed at their areas' respective positions, in accordance with the represented layout. The status of each area is pictured by changing the marker's color dynamically: green for "vacant", and red for "occupied". A marker is also placed to indicate the status of the entire zone, grouping the statuses of all areas belonging to it. Monitoring the total count, the makeshift thermometer evolves in level and color, rising from blue to green, orange and finally red (Figure 6.22), which indicates full capacity.


Figure 6.21: Image Map with two occupied (red), vacant (green), areas and thermometer (green)

Figure 6.22: Remaining versions of thermometer marker [15]

6.2 Summary

The usage, parameters, and configuration of the module's running application are presented and contextualized before analysing the scanned areas' distinct traits. Comparison and contrast target the limitations defining some of the trends, influenced generally by factors such as resolution, distortion, or lighting. The working version of the system is proven to function correctly, utilizing a more lenient no_hit limit for the elimination of minor errors and irrelevant short absences.

Usage and diagnostics data provided by the dashboard's own API are discussed and related to the activity information presented on the platform. Further widgets, customised from the Thingsboard library, are depicted. Their creation, as well as their variations according to received data, are described in detail.


Chapter 7

Conclusion and Future Development

7.1 Conclusion

Current widespread methods for seat occupancy monitoring rely on infrared sensing technology to determine the presence of humans. This technique often proves limiting in terms of reliability, being applied mostly in the context of individual seating to avoid inaccuracies caused by blocking or lack of range. Generally, commercial systems show a tremendous dependency on large-scale hardware installations and an inability to detect other forms of occupation, namely by common objects.

As such, in an effort to explore systems capable of greater efficiency and expanded functionality, the work of the dissertation focused on the conception, development and validation of a computer vision-based solution, utilizing the capabilities of convolutional neural networks.

The beginning of the thesis involved developing a deeper understanding of the current situation in building occupancy detection. Researching the state of the art on this subject allowed for the unearthing of the main technologies and systems being applied and developed. From this starting point, it is possible to determine that the dissertation's proposal for a working system in the context of a library is valuable for its potential, and worthy of exploration.

Future system requirements were defined in accordance with input and demands from the Library's services, resulting in the elaboration and analysis of a general architecture. The architecture of the vision-based system rounds out the specifications of each major component, according to the defined objectives and adapted to the final environment. This includes the definition of communication protocols and the data structure for the dashboard. Options regarding object detection algorithms are explored, resulting in a narrower selection based on past experiences and the characteristics of the models.

Subsequently, a working prototype was implemented on a select testing area, aiming to prove the concept and replicate it over the entire building. The module consists of a Raspberry Pi 4 Model B for on-board processing, and a mounted Raspberry Pi Camera Module v2 for image capture. The running application is supported by the MobileNet SSD300 object detector, deployed through the Caffe deep learning framework. To enable the use of image processing and analysis functions, an open-source, real-time optimized computer vision library (OpenCV) was used. The behaviour of the model was analysed through live observation of the targeted seating areas and the corresponding state transitions.

The results obtained during stringent testing conditions point to some limitations in certain areas experiencing non-ideal conditions, which suffer from occasional drop-offs between determined status and reality. However, the working system, accounting for short breaks and harnessing the concept of overlapping redundancy, eliminates such errors. Small gaps and real periods of absence below the five-minute mark are considered irrelevant for determining the true availability of a seating area for interested parties.

In summation, the primary goals of the dissertation were achieved, with the exception of the desirable option of detecting occupation by common objects, as the task was deemed to be of higher complexity than the scope of this work allowed. The necessary stages of analysis, design, and implementation of a computer vision system idealised for seat occupancy monitoring were accomplished.

7.2 Future Development

Though the principal objectives were achieved, it is still possible to ensure greater real-time accuracy and to allow for a more complete mapping of occupancy. Considering these aspects, future development should focus on some key areas for system improvement.

The work of this dissertation determined the functionality of the general concept and ascertained the differing aspects between analysed areas. Verifying the error in total people counting would obviously demand a wider testing setup to guarantee a large enough sample, as has been done in previously reviewed research. Francesco Paci et al. (2014) characterized their system through full-day observations, but also by calculating the Mean Absolute Error and Root Mean Square Error [32]. Single modules were proven to function correctly, yet assessing a wider network, and the complications introduced by time synchronisation, involves the use of these equations.
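For reference, these standard metrics over N observation windows, with y_i the ground-truth count and ŷ_i the count reported by the system, are:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$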

Working with the current prototype, the main bottleneck in the model's efficiency is tied to the limitations of available datasets, such as COCO or PASCAL VOC. Though these are constructed to deal with data-hungry neural networks, they can be expanded by applying the technique of Data Augmentation. The method involves progressive resizing, random image rotations, shifting, as well as vertical and horizontal flipping of existing images, creating and multiplying elements of the dataset, as sketched below.
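A minimal sketch of these operations with OpenCV; the rotation and shift ranges are illustrative choices, not values from this work:

```python
import cv2
import numpy as np

def augment(image):
    """Return simple augmented variants of an image: flips, a random
    rotation and a random shift, multiplying the dataset elements."""
    h, w = image.shape[:2]
    variants = [cv2.flip(image, 1),   # horizontal flip
                cv2.flip(image, 0)]   # vertical flip

    # Random rotation around the image centre (range is illustrative)
    angle = np.random.uniform(-15, 15)
    M_rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    variants.append(cv2.warpAffine(image, M_rot, (w, h)))

    # Random shift of up to 10 pixels in each direction (illustrative)
    tx, ty = np.random.randint(-10, 11, size=2)
    M_shift = np.float32([[1, 0, tx], [0, 1, ty]])
    variants.append(cv2.warpAffine(image, M_shift, (w, h)))
    return variants
```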

Further experimentation and fine-tuning with the same prototype in other areas of the building should also be considered. Hyper-parameter tuning can include optimisation of the learning rate, and alterations in batch size to maximize the capacity of the processing unit. Increased model capacity, either by adding layers to deepen the network or by extra filtering in each convolutional stage, is worthy of examination.


Departing from the current prototype, more powerful combinations of hardware components, neural networks, feature extractors, and datasets can be targets of experimentation. This pathway might enable a system capable of better overall precision and of non-human occupancy detection.


References

[1] Pressac Communications Limited. Wireless desk occupancy sensors, 2021. https://www.pressac.com/desk-occupancy-sensors/, accessed: 27.06.2021.

[2] David Van Ess. Pyroelectric Infrared Motion Detector, PSoC Style, 2009. https://www.cypress.com/file/90886/download, accessed: 27.06.2021.

[3] Tony DiCola and Lady Ada. PIR Motion Sensor, 2014. https://learn.adafruit.com/pir-passive-infrared-proximity-motion-sensor/how-pirs-work, accessed: 27.06.2021.

[4] Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do, and Kaori Togashi. Convolutional neural networks: an overview and application in radiology. Insights into Imaging, 9(4):611–629, Aug 2018.

[5] Jens Jørgensen, Martin Tamke, and Kåre Poulsgaard. Occupancy-informed: Introducing a method for flexible behavioural mapping in architecture using machine vision. In Proceedings of the 2020 eCAADe Conference, September 2020.

[6] Lilian Weng. Object Detection for Dummies Part 3: R-CNN Family. lilianweng.github.io/lil-log, 2017. http://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html, accessed: 27.06.2021.

[7] G.S. Peng. Performance and Accuracy Analysis in Object Detection. California State University San Marcos, 2019.

[8] Biblioteca Serviço de Documentação e Informação. GUIA DA BIBLIOTECA PARA NOVOS ESTUDANTES: Espaços para estar, 2021. https://feup.libguides.com/novosestudantes/espacos, accessed: 27.06.2021.

[9] Jonathan Huang, V. Rathod, Chen Sun, Menglong Zhu, A. Balan, A. Fathi, Ian S. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy. Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3296–3297, 2017.

[10] Tsung-Yi Lin, M. Maire, Serge J. Belongie, James Hays, P. Perona, D. Ramanan, Piotr Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014.

[11] PC Componentes. Raspberry Pi 4 Modelo B 8GB. https://www.pccomponentes.pt/raspberry-pi-4-modelo-b-8gb, accessed: 27.06.2021.

[12] sparkfun. Raspberry Pi Camera Module V2. https://www.sparkfun.com/products/14028, accessed: 27.06.2021.


[13] Adrian Rosebrock. Deep learning: How OpenCV’s blobFromImage works, Nov 2017. https://www.pyimagesearch.com/2017/11/06/deep-learning-opencvs-blobfromimage-works/, accessed: 27.06.2021.

[14] Thingsboard. ThingsBoard Documentation, 2021. https://thingsboard.io/docs/, accessed: 27.06.2021.

[15] Thingsboard. ThingsBoard: Open-source IoT Platform, 2021. https://thingsboard.io/, accessed: 27.06.2021.

[16] Joseph Redmon and Ali Farhadi. YOLO9000: Better, Faster, Stronger. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6517–6525, 2017.

[17] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision, 111(1):98–136, January 2015.

[18] Serviços Biblioteca FEUP. Ocupação dos Pisos, 2021. https://sites.google.com/g.uporto.pt/ocupa-pisos-bibfeup/home, accessed: 27.06.2021.

[19] Abintra Consulting. Occupancy, 2021. https://abintra-consulting.co.uk/products/occupancy/, accessed: 27.06.2021.

[20] Pressac Communications Limited. Wireless room occupancy sensors, 2021. https://www.pressac.com/room-occupancy-sensors/, accessed: 27.06.2021.

[21] floorsense. make sense of the modern workplace, 2021. https://floorsen.se/, accessed: 27.06.2021.

[22] Workplace Occupancy. Workplace Efficiency Monitoring Systems, 2021. https://workplaceoccupancy.com/#occupancy, accessed: 27.06.2021.

[23] infsoft. infsoft Occupancy, 2021. https://www.infsoft.com/solutions/products/infsoft-occupancy, accessed: 27.06.2021.

[24] fm:systems. Occupancy Sensors, 2021. https://fmsystems.com/our-solutions/employee-experience/workplace-occupancy-utilization-sensors/, accessed: 27.06.2021.

[25] Noureddine Lasla, Messaoud Doudou, Djamel Djenouri, Abdelraouf Ouadjaout, and Cherif Zizoua. Wireless energy efficient occupancy-monitoring system for smart buildings. Pervasive and Mobile Computing, 59:101037, 2019.

[26] Khirod Chandra Sahoo and Umesh Chandra Pati. IoT based intrusion detection system using PIR sensor. In 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information Communication Technology (RTEICT), pages 1641–1645, 2017.

[27] Shengjun Xiao, Linwang Yuan, Wen Luo, Dongshuang Li, Chunye Zhou, and Zhaoyuan Yu. Recovering Human Motion Patterns from Passive Infrared Sensors: A Geometric-Algebra Based Generation-Template-Matching Approach. ISPRS International Journal of Geo-Information, 8(12), 2019.

[28] W.L. Yu, Zhen Wang, and Lei Jin. The experiment study on infrared radiation spectrum of human body. In Proceedings of 2012 IEEE-EMBS International Conference on Biomedical and Health Informatics, pages 752–754, 2012.


[29] Marine City. Fresnel Lenses, 2007. https://web.archive.org/web/20070927021951/http://www.marinecitymich.org/Blank%20Page.htm, accessed: 27.06.2021.

[30] TrueOccupancy. Workplace Occupancy Sensors: True Occupancy technology, 2021. https://www.trueoccupancy.com/technology, accessed: 27.06.2021.

[31] Analog Devices. ADI Vision-Based Occupancy Sensing Solutions, 2021. https://www.analog.com/en/design-center/landing-pages/002/apm/vision-based-occupancy-sensing-solutions.html#, accessed: 27.06.2021.

[32] Francesco Paci, Davide Brunelli, and Luca Benini. 0, 1, 2, many — A classroom occupancy monitoring system for smart public buildings. In Proceedings of the 2014 Conference on Design and Architectures for Signal and Image Processing, pages 1–6, 2014.

[33] Zhi Liu, Jie Zhang, and Li Geng. An Intelligent Building Occupancy Detection System Based On Sparse Auto-encoder. 2017 IEEE Winter Conference on Applications of Computer Vision Workshops, 2017.

[34] Yosefa Gilon, Fei-Fei Li, Ranjay Krishna, and Danfei Xu. Convolutional Neural Networks (CNNs / ConvNets), 2021. https://cs231n.github.io/convolutional-networks/, accessed: 27.06.2021.

[35] Isabelle Guyon, Steve Gunn, Masoud Nikravesh, and Lotfi A. Zadeh. Feature Extraction. Studies in Fuzziness and Soft Computing. Springer Berlin Heidelberg, 2006.

[36] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[37] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. Searching for Activation Functions. arXiv preprint arXiv:1710.05941, 7:1, 2017.

[38] Intersoft Consulting. General Data Protection Regulation (GDPR), 2021. https://gdpr-info.eu/, accessed: 27.06.2021.

[39] Shashank Karthik M., Rohit Poduri, and Sachchidanand Deo. Seat Occupancy Detection, 2016. http://icsl.ee.columbia.edu/iot-class/2016fall/group11/#system, accessed: 27.06.2021.

[40] Aditya Kunar. Object Detection with SSD and MobileNet, Jul 2020. https://aditya-kunar-52859.medium.com/object-detection-with-ssd-and-mobilenet-aeedc5917ad0, accessed: 27.06.2021.

[41] Adrian Rosebrock. YOLO object detection with OpenCV. https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv, accessed: 27.06.2021.

[42] Rafael Padilla, Wesley L. Passos, Thadeu L. B. Dias, Sergio L. Netto, and Eduardo A. B. da Silva. A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics, 10(3), 2021.

[43] Jonathan Hui. Object detection: speed and accuracy comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3), March 2018. https://jonathan-hui.medium.com/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359, accessed: 27.06.2021.


[44] Aastha Tiwari, Anil Kumar Goswami, and Mansi Saraswat. Feature Extraction for Object Recognition and Image Classification. International Journal of Engineering Research and Technology (IJERT), 02(10), October 2013.

[45] Joseph Redmon, S. Divvala, Ross B. Girshick, and A. Farhadi. You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016.

[46] Joseph Redmon and Ali Farhadi. YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767, 2018.

[47] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv e-prints, pages arXiv–2004, 2020.

[48] Sik-Ho Tsang. Review: SSD — Single Shot Detector (Object Detection), 2018. https://towardsdatascience.com/review-ssd-single-shot-detector-object-detection-851a94607d11, accessed: 27.06.2021.

[49] Xiangyu Zhang, Jianhua Zou, Kaiming He, and Jian Sun. Accelerating Very Deep Convolutional Networks for Classification and Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38:1943–1955, 2016.

[50] Yawei Li, Shuhang Gu, Luc Van Gool, and Radu Timofte. Learning Filter Basis for Convolutional Neural Network Compression. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 5622–5631, 2019.

[51] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single Shot MultiBox Detector. In ECCV, 2016.

[52] Ross B. Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.

[53] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:1137–1149, 2015.

[54] Ross B. Girshick. Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV), pages 1440–1448, 2015.

[55] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Advances in Neural Information Processing Systems, volume 29, pages 379–387, 2016.

[56] MIPI Alliance. MIPI Camera Serial Interface 2 (MIPI CSI-2), 2021. https://www.mipi.org/specifications/csi-2, accessed: 27.06.2021.

[57] Praveen Pavithran. How to run object detection on CCTV feed, 2020. https://cloudxlab.com/blog/how-to-run-yolo-on-cctv-feed/, accessed: 27.06.2021.

[58] Faizan Shaikh. Using Deep Learning, 2017. https://www.analyticsvidhya.com/blog/2017/08/finding-chairs-deep-learning-part-i/, accessed: 27.06.2021.


[59] eMaster Class Academy. Python: Real Time Object Detection (Image, Webcam, Video files) with Yolov3 and OpenCV, 2020. https://www.youtube.com/watch?v=1LCb1PVqzeY, accessed: 27.06.2021.

[60] Igor Panteleyev. How To Implement Object Recognition on Live Stream, 2017. https://www.iotforall.com/objects-recognition-live-stream-yolo-model, accessed: 27.06.2021.

[61] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[62] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[63] Tim Bray. The JavaScript Object Notation (JSON) Data Interchange Format. RFC 8259, December 2017.

[64] Daniel Stenberg. Everything cURL. 2018.

[65] NVIDIA. Jetson Nano Developer Kit, 2021. https://developer.nvidia.com/embedded/jetson-nano-developer-kit, accessed: 27.06.2021.

[66] Leigh Johnson. Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT, 2019. https://towardsdatascience.com/real-time-object-tracking-with-tensorflow-raspberry-pi-and-pan-tilt-hat-2aeaef47e134, accessed: 27.06.2021.

[67] Shawn Hymel. How to Perform Object Detection with TensorFlow Lite on Raspberry Pi. https://www.digikey.com/en/maker/projects/how-to-perform-object-detection-with-tensorflow-lite-on-raspberry-pi/b929e1519c7c43d5b2c6f89984883588, accessed: 27.06.2021.

[68] Klym Yamkovyi. Object detection with Raspberry Pi and Python, 2018. https://medium.datadriveninvestor.com/object-detection-with-raspberry-pi-and-python-bc6b3a1d4972, accessed: 27.06.2021.

[69] Raspberry Pi Foundation. Raspberry Pi 4 Computer Model B, 2021. https://datasheets.raspberrypi.org/rpi4/raspberry-pi-4-product-brief.pdf, accessed: 27.06.2021.

[70] Coral AI. USB Accelerator, 2019. https://coral.ai/docs/accelerator/datasheet/, accessed: 27.06.2021.

[71] Intel. Intel® Neural Compute Stick 2, 2018. https://www.intel.com/content/dam/support/us/en/documents/boardsandkits/neural-compute-sticks/NCS2_Datasheet-English.pdf, accessed: 27.06.2021.

[72] Raspberry Pi Foundation. Camera Module. https://www.raspberrypi.org/documentation/hardware/camera/, accessed: 27.06.2021.


[73] Dave Jones. Picamera, 2016. https://picamera.readthedocs.io/en/release-1.13/, accessed: 27.06.2021.

[74] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B. Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding. Proceedings of the 22nd ACM international conference on Multimedia, 2014.

[75] G. Bradski. The OpenCV library. Dr Dobb’s J. Software Tools, 25:120–125, 2000.

[76] OpenCV. Modules. https://docs.opencv.org/4.5.2/modules.html, accessed: 27.06.2021.

[77] Alexander Mordvintsev. Camera Calibration. https://docs.opencv.org/master/dc/dbb/tutorial_py_calibration.html, accessed: 27.06.2021.

[78] OpenCV. Deep Neural Network module. https://docs.opencv.org/3.4/d6/d0f/group__dnn.html#ga29f34df9376379a603acd8df581ac8d7, accessed: 27.06.2021.

[79] Thingsboard. Dashboards. https://thingsboard.io/docs/user-guide/dashboards/#widgets, accessed: 27.06.2021.

[80] Thingsboard. Widgets library: Time-series. https://thingsboard.io/docs/user-guide/ui/widget-library/#time-series, accessed: 27.06.2021.

[81] Thingsboard. Widgets library: Maps widgets. https://thingsboard.io/docs/user-guide/ui/widget-library/#maps-widgets, accessed: 27.06.2021.

