Development of a Flask server
for a video-streaming intercom
Alex Costa Sanchez
Escola Tècnica Superior d’Enginyeria de Telecomunicació de Barcelona
Universitat Politècnica de Catalunya
Supervisor: Prof. Jose A. Lazaro
In partial fulfilment of the requirements for the degree in
Telecommunications Technologies and Services Engineering
Major in Telematics Systems
ETSETB · UPC Barcelona, June 2019
“Any sufficiently advanced technology is indistinguishable from magic.”
– Arthur C. Clarke –
Acknowledgements
I would first like to acknowledge the help and support received from my teaching
assistant, Aaron Barnes, and my fellow teammate at Purdue University, Kalyan Mada.
I would also like to thank both Purdue University and Universitat Politècnica de
Catalunya for giving me the opportunity of this incredible exchange program.
I would like to show my gratitude to my advisor at UPC, Jose A. Lazaro, for the
time devoted to this project and the interest he has shown.
I cannot thank enough all the teachers and professors I have ever had who not only
taught their subject but also showed passion for what they were doing; it is priceless.
I am extremely fortunate for the friends I have made during these four months at
Purdue and even more fortunate for my friends in Barcelona. I wish to thank them
for helping me have the time of my life these last four years.
Last but not least, I would like to sincerely thank my family and the ones who
love me for their advice and support in any decision I make.
Abstract
The definitive arrival of the Internet of Things in our daily lives is already a fact. This
project intends to build a smart intercom from scratch, capable of delivering live
video from your door to your phone, empowering users to remotely control access to
their homes.
This document covers all the steps in this process, from the proposal and design of
the intercom to an overview of the protocols used to implement it. First, it explains
how and why this project was developed; following this introduction, there is a short
discussion on the Internet of Things, tackling how this project fits into this concept.
The central part of the project gives a brief overview of the communication principles,
from the basics to the most interesting details of the HTTP protocol. It
also discusses the implementation of the Model-View-Controller model in a software
project. Then, technical details, difficulties and limitations of each subsystem of the
project are addressed.
Finally, possible future improvements are discussed, as well as some personal conclusions.
Resum
La irrupció definitiva de l’Internet de les coses en el nostre dia a dia és ja una realitat.
Aquest projecte vol desenvolupar des de zero un intercomunicador intel·ligent, capaç
de transmetre vídeo en directe des de la porta de casa al telèfon mòbil, donant als
usuaris la possibilitat de controlar l’accés a llurs cases remotament.
Aquest document cobreix des de la proposta i disseny de l’intercomunicador fins a
un repàs dels protocols usats per a implementar-ho. Primerament, s’explica el com
i el perquè d’aquest projecte; després d’aquesta introducció, hi ha un breu apartat
sobre l’Internet de les coses, fent èmfasi en l’encaix del projecte en aquest concepte.
La part central del projecte fa un breu resum dels principis de la comunicació,
des dels conceptes més bàsics fins als detalls més interessants del protocol HTTP.
També es debat la implementació del model Model-Vista-Controlador en un projecte
de software. A continuació, es parla dels detalls més tècnics, dificultats i limitacions
de cada part del projecte.
Finalment, s’exploren possibles futures millores, així com les conclusions personals.
Resumen
La irrupción definitiva de Internet de las cosas en nuestro día a día ya es una realidad.
Este proyecto quiere desarrollar desde cero un interfono inteligente, capaz de transmitir
vídeo en directo desde la puerta de casa al teléfono móvil, dando al usuario la
posibilidad de controlar el acceso a su casa remotamente.
Este documento cubre desde la propuesta y diseño del interfono hasta un repaso
de los protocolos usados para implementarlo. Primeramente, se explica la razón de
este proyecto; después de esta introducción, hay un breve apartado sobre Internet de
las cosas, poniendo énfasis en el encaje de este proyecto en este concepto.
La parte central del proyecto es un pequeño resumen de los principios de la
comunicación, desde los conceptos más básicos hasta los detalles más interesantes del
protocolo HTTP. También se debate la implementación del modelo Modelo-Vista-Controlador
en un proyecto de software. A continuación, se habla de los detalles más
técnicos, las dificultades y las limitaciones de cada parte del proyecto.
Finalmente, se exploran posibles mejoras futuras, así como las conclusiones personales.
Revision history and approval record

Revision  Date        Purpose
0         28/05/2019  Document creation
1         14/06/2019  Document revision
2         18/06/2019  Document revision
3         21/06/2019  Document revision
4         25/06/2019  Document approval

DOCUMENT DISTRIBUTION LIST

Name                e-mail
Alex Costa Sanchez  costa.sanchez@estudiant.upc.edu
Jose A. Lazaro      jose.lazaro@tsc.upc.edu

          Written by:         Reviewed and approved by:
Date      25/06/2019          25/06/2019
Name      Alex Costa Sanchez  Jose A. Lazaro
Position  Project Author      Project Supervisor
Contents
1 Introduction
1.1 Statement of purpose
1.2 Product requirements
1.3 Methods and procedures
1.4 Work plan
1.5 Incidents and modifications
2 State of the art
3 Relevant previous concepts
3.1 The TCP/IP Reference Model
3.2 Understanding the Internet
3.3 HTTP streaming
3.4 Server-sent Events
4 Project development
4.1 Former project
4.2 Programming and the Model-View-Controller model
4.3 Server
4.4 Android app
4.5 Hardware device
5 Results
5.1 Requirements
5.2 Final product
5.3 Final code
6 Budget
7 Conclusions and future development
7.1 Technical conclusions
7.2 Personal conclusions and learning experience
A Graphical User Interface
List of Figures
1.1 Project’s Gantt diagram
3.1 Layers in the TCP/IP reference model
4.1 Model-View-Controller schema
4.2 RaspberryPi GPIO pins layout
A.1 Phone index page
A.2 Phone calls page
A.3 Phone camera page
A.4 Phone register page
A.5 Phone login page
A.6 Laptop index page
A.7 Laptop calls page
A.8 Laptop camera page
A.9 Laptop register page
A.10 Laptop login page
List of Tables
6.1 Material costs
6.2 Labour costs
Chapter 1
Introduction
1.1 Statement of purpose
The purpose of this project is to build a connected intercom, which allows users to
control their door lock from a phone app, wherever they are. Once a visitor knocks
on the door, the user will be notified and a live video of the visitor shown on their phone
screen. The user may unlock the door directly from the phone app.
This product is doubtlessly linked with the idea of the Internet of Things (IoT), “an extension
of Internet connectivity into physical devices and everyday objects, which can
be remotely monitored and controlled.”1 IoT, like every technological breakthrough,
aims to improve our everyday life, making mundane activities easier and more convenient.
The result of this project is a functional prototype, a mere proof of concept not
ready for the market, which was never the goal. This first approach proves the feasibility
of the proposed idea and is meant to be the base from which to explore endless
improvements and possibilities, such as face recognition or automatic door opening.
These ideas are further explored in the Conclusions and future development chapter.
1 https://en.wikipedia.org/wiki/Internet_of_things
1.2 Product requirements
The final product shall meet the following requirements:
The intercom shall recognize a knock and act accordingly.
The intercom shall connect to the local network (LAN).
The intercom shall stream video over HyperText Transfer Protocol (HTTP).
The intercom shall unlock the door via a smartphone command.
The intercom shall act as an HTTP server, accepting authenticated clients.
The intercom and the phone app shall communicate over the internet.
The phone app shall display the received video.
The phone app shall allow a user to remotely control their door.
1.3 Methods and procedures
This project was proposed at Purdue University, Indiana, United States. Initially,
it was meant to fulfill the requirements of the Senior Design Project of the
Electrical and Computer Engineering major at that university. The
description of this project on the Electrical and Computer Engineering department
website2 states that:
Lecture sessions provide the student with background information on
the design and management of projects. (. . . ) During the laboratory
sessions, the students work in teams on a challenging open-ended electrical
engineering project that draws on previous course work.
The overall goal was to go through the whole process of building a functional
prototype.
I proposed a primitive smart intercom idea within the scope of this Senior Design
Project and developed it with the help of my teammates at Purdue. It was enlightening
to carry out the design phase of the project (solution proposal and idea refinement)
there, getting relevant feedback from both teaching staff and fellow students.
2 https://engineering.purdue.edu/ECE/Academics/Undergraduates/UGO/AboutUs/CourseInfo/courseInfo?courseid=661&show=true&type=undergrad
For the reasons explained later in the Incidents and modifications section of this
chapter, I was not satisfied with the resulting prototype at Purdue and, after pre-
senting it there, I developed the prototype again from scratch on my own, keeping in
mind this goal of building a functional prototype from zero.
1.4 Work plan
In the Project Proposal document, also handed in for this project, there is a detailed
plan of the American project. Since that part is no longer relevant,
the Gantt diagram below shows only the redefined product in detail.
Figure 1.1: Project’s Gantt diagram. Work packages shown: Project proposal and work plan;
Research; American prototype; HTTP server; Hardware device; Android app; Final product;
Documentation and final report.
1.5 Incidents and modifications
To put it simply, I was quite unlucky with my team assignment at Purdue
University, which was proposed by the teaching staff based on personal interest in the
different proposed projects.
Neither of my teammates showed any interest in or initiative for our project.
This led to a lack of motivation within the group and we clearly did not
perform our best as a team.
Some self-criticism is also necessary. I took on a directive leadership role in
this project, as we were completely stuck. I assumed that taking the lead merely
on the technical side (designing the whole system and splitting tasks among the
members of the team) was enough, but it clearly was not. I understood it later, and
adapted my role in the team accordingly. Had I done it earlier, things might have
been different, but I will never know.
Despite these issues, I am proud of the outcome of the project in the United States
as I gave my best, working hard to compensate for what the others did not accomplish
while taking special care of non-technological aspects, such as product definition, idea
refinement, presentations and leadership.
The result of this situation, which I had foreseen but could not prevent, was a low-quality
prototype I was not satisfied with. Since I expected this outcome in advance,
my thesis advisor and I agreed I would improve it once I got home.
In conclusion, the conceptual idea for this project comes from Purdue but it has
been developed in Barcelona.
The thesis I am presenting is the result of these improvements and modifications.
Chapter 2
State of the art
As stated in the first section of the Introduction chapter, Internet of Things (IoT) aims
to improve our everyday life, making mundane activities easier and more convenient.
Generally, this is achieved by monitoring large amounts of data, and using this data to make
smart decisions. We could say that, by now, IoT is all about collecting data. It
is expected that with the artificial intelligence (AI) explosion all this data will be
converted into smart decision-making.
At present, society is still wondering whether this technology
is a passing fashion or the future. I am convinced IoT is the future, but it raises both
technological and ethical issues we need to handle before it is too late.
The two main areas which the IoT is trying hardest to penetrate are vehicles and
smart homes. Some may argue that interconnecting vehicles is not actually Internet
of Things; nevertheless, it is for me the best example of how IoT can affect (and
hopefully improve) our lives. What is known as Vehicle to Everything (V2X) means
connecting cars with all sorts of vehicles as well as traffic lights and other signals.
This is used to monitor their trajectories and thus improve efficiency and safety.
The biggest commercial effort towards the smart home is virtual assistants, such
as Amazon Alexa, Google Assistant, Siri or Microsoft Cortana, to name a few. I dare
to presume that the future of the Internet of Things involves much less interaction with the
user than this, automatically collecting data and deciding accordingly. The project
detailed in these pages relies heavily on user interaction. In the Conclusions section, this
interaction is discussed, as some improvements can be made to increase the product’s
autonomy.
The idea of this project comes from an intercom used in American universities, which
is controlled from a phone, but over GSM as a regular phone call. Having it connected
to the internet opens a universe of new possibilities, also discussed in the last chapter
of this thesis.
Further research showed that the product implemented in this thesis is already
available for purchase, offered by several brands, such as Gira1, Doorbird2 and
Ring3. This last one is meant to be integrated with Amazon’s Alexa, to advance
towards a fully interconnected home.
The goal of this thesis is to build a smart intercom from scratch, surely lacking
some functionalities and polish but hopefully proposing a more affordable
prototype.
1 https://www.gira.com/en/tuerkommunikation/tuerkommunikation-mobil.html
2 https://www.doorbird.com
3 https://eu.ring.com/products/video-doorbell
Chapter 3
Relevant previous concepts
3.1 The TCP/IP Reference Model
The most important issue to handle when two different devices communicate is
the fact that they have to share a common language: they need to implement the
same protocol. One of the main lessons learned from ARPANET, the predecessor
of the Internet, is that standardizing such protocols is not enough; a protocol framework
is essential. Such a framework needs to be general enough to adapt to any
circumstance but at the same time concrete enough to be useful as a common reference
for all protocols. This is achieved by clearly dividing functionalities between
layers or levels, each of which implements different protocols.
The Open Systems Interconnection (OSI) model suggests a seven-layer architecture,
as shown in the figure. For the Internet, however, a simpler four-layer
model (the TCP/IP Reference Model) is used. It is this latter and
simpler model that I explain in detail.
Figure 3.1: Layers in the TCP/IP reference model.
As shown in figure 3.1, the four layers in the TCP/IP Reference Model are, from
top to bottom:
Application layer: this layer contains the communication protocols used in
process-to-process communications. Many protocols are used in this layer; HTTP
is introduced later in this chapter.
Transport layer: the Internet mostly uses the TCP protocol in this layer; UDP is
also fairly common. This layer provides multiplexing and, in the case of TCP,
reliability. Ports are numbers used to distinguish between processes (or services)
on the same device. They range from 0 to 65535, and the first 1024 (0 to 1023) are
well-known ports, reserved for standard services. For instance, web servers listen
on port 80 (HTTP) or 443 (HTTPS) by default, but this can be changed.
Internet layer: the Internet layer handles the communications between devices
across networks. The Internet uses the IP protocol in this layer. An IP address is
needed to identify machines on the Internet; in IPv4, it consists of four numbers
from 0 to 255 separated by dots.
Link layer: this layer covers the technologies used within a local network, such
as WiFi or Ethernet. It is responsible for accepting IP datagrams
and transmitting them over the local network link.
A last important concept to understand before we dig into the protocols used
in this project is the client-server model. It is a distributed application structure
which distinguishes between two different sorts of machines: a client and a server.
Typically, a client wants to access information stored in the server. This model
assumes that a server is always listening for new incoming connections, so it is the
client that actually needs to start such connections. In the TCP/IP Reference Model
that has just been explained, the client needs two important pieces of information
to start such a connection: the IP address of the server and the port number. This
client-server model is slowly becoming obsolete, or at least needs some improvements,
as we will shortly see.
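The client-server exchange described above can be sketched with Python's standard socket module. This is only an illustration under assumptions: the echo behaviour, the function names and the use of the loopback address are invented for the example and are not taken from the project's code.

```python
import socket
import threading


def serve_one_client(server_socket: socket.socket) -> None:
    """Server side: wait for one client, echo its message back, then close."""
    connection, _client_address = server_socket.accept()
    with connection:
        data = connection.recv(1024)
        connection.sendall(b"echo: " + data)


def ask_server(ip_address: str, port: int, message: bytes) -> bytes:
    """Client side: start the connection using the server's IP address and port."""
    with socket.create_connection((ip_address, port), timeout=5) as client:
        client.sendall(message)
        chunks = []
        while True:  # read until the server closes the connection
            chunk = client.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)


def demo_client_server() -> bytes:
    # Port 0 asks the operating system for any free port; a real service
    # would listen on a fixed port that its clients know in advance.
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.bind(("127.0.0.1", 0))
    server_socket.listen(1)
    ip_address, port = server_socket.getsockname()

    # The server listens in the background; the client starts the connection.
    server_thread = threading.Thread(target=serve_one_client, args=(server_socket,))
    server_thread.start()
    reply = ask_server(ip_address, port, b"knock")
    server_thread.join()
    server_socket.close()
    return reply
```

Note that the server never contacts the client on its own initiative: it can only answer on a connection the client has opened, which is the limitation the later sections work around.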
3.2 Understanding the Internet
Most of the time, as in the implementation detailed in the following chapter, those
protocols are already implemented below the application layer, so the programmer only
has to worry about the top layer, the one ‘closer’ to the user. For the proper functioning
of the Internet, many application-level protocols are needed, each one useful for
a particular purpose.
HyperText Transfer Protocol
There is a particularly relevant protocol in this project, HTTP, the language used by
browsers and servers to communicate information over the Internet. In combination
with the HyperText Markup Language (HTML), it is responsible for delivering
nice-looking web pages. HTTP functions in a client-server fashion, where clients send
requests asking for a particular resource and the server answers accordingly.
HTTP is text-oriented, but it also supports multimedia. An HTTP request has a
resource identifier (URL or path) and a method, and can include extra parameters.
There are several methods, but the most common are GET and POST. The former is used to
access a resource without modifying the state of the server; the latter
is used to send data that may modify it. While URLs or paths originally referred to
different files in a server, they can now be used to indicate different actions to be done by the
server. This approach is called RESTful (REpresentational State Transfer), and it is
used in this project.
An HTTP response always contains a status code indicating the result of the
operation (e.g. 200 indicates success, 302 redirection and 404 not found), followed by
optional headers, a blank line and a possible body. Such headers may indicate the
length of the body or, equally importantly, the type of the response body: plain text,
HTML, different video formats, mixed... Headers also handle cookies, essential for
maintaining a session in an otherwise stateless protocol.
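To make the message layout concrete, the following sketch builds a minimal request and splits a raw response into the parts just described. It is a simplified illustration, not a complete HTTP implementation; the function names and the host name `intercom.local` are hypothetical.

```python
def build_request(method: str, path: str, host: str) -> bytes:
    """Build a minimal HTTP/1.1 request: request line, headers, blank line."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "",  # the blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw HTTP response into status code, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK"; the code is the second field.
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


request = build_request("GET", "/index", "intercom.local")
status, headers, body = parse_response(
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
```

In practice these details are handled by the browser and by the web framework; the sketch only shows what travels over the connection.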
The discussion about the different versions of HTTP is outside the scope of the
project. It is important to mention, however, that the most recent version is
HTTP/2, but HTTP/1.1 is still by far the most popular one.
HyperText Markup Language
Last but not least, a brief mention of HTML, thought by many to be another Internet
protocol but actually a markup language. In fact, it is the standard markup language
for documents designed to be displayed in a web browser. A markup language is a
way to add formatting to documents using only plain text. Its latest version is HTML5.
3.3 HTTP streaming
Among the many particularities HTTP has, two of them are crucial for this project:
HTTP streaming and long-polling. In fact, the idea behind both concepts is the same:
how to get the server to push information to a client, when the typical client-server
relation establishes that the client always requests the desired information and the
server responds accordingly.
Among the different alternatives for server pushing, I will explore two of them,
distinguished by their purpose. The first approach is to stream video content,
hence the title of this subsection. Although it is not considered actual streaming, the
final objective is the same: to serve images from a webcam to a browser in real
time. The second approach is to know when there is any update in the server, for
instance, notifications in a social network or, in our case, a knock on the door. For
this purpose, there are different possible approaches too, which will be explored in
the following subsection.
This method for server pushing is fairly simple. The resource (in this project, a
video) is accessed through an HTML video object, specifying the resource in the src
attribute. The response given by the server includes a multipart mixed-replace content
type header:
Content-Type: multipart/x-mixed-replace; boundary=frame
Note that the boundary value is completely arbitrary. The server never closes
the connection, pretending it has not finished sending the content to the browser.
Whenever the server has a new image to send to the browser, it sends the boundary
special word, preceded by two dashes, with a blank line before it. It then sends the
correct content-type header, followed by a blank line, followed by the image for that
frame. In our case, since the video format is Motion JPEG (MJPEG), each frame is a
JPEG image. Since the server never actually finishes sending content, there is a
persistent connection, meaning that the browser remains connected, waiting for new
data (actually, waiting for the response to end). The browser updates the displayed
content as soon as a new frame is received.
It technically goes against HTTP convention, but it is the most efficient method
to send streamable data over HTTP without reinventing it.
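The framing described in this section can be sketched as a Python generator that wraps each JPEG frame in its own multipart part. This is a minimal illustration, not the project's actual code: the function name and the way frames are obtained are assumptions. In Flask, a generator like this is commonly passed to `Response` together with the `multipart/x-mixed-replace; boundary=frame` content type, although the text does not state how the project does it.

```python
from typing import Iterable, Iterator

# Arbitrary value; it only has to match the boundary in the Content-Type header.
BOUNDARY = b"frame"


def multipart_stream(jpeg_frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield one multipart part per JPEG frame, for as long as frames arrive."""
    for jpeg in jpeg_frames:
        yield b"".join([
            b"--" + BOUNDARY + b"\r\n",        # boundary, preceded by two dashes
            b"Content-Type: image/jpeg\r\n",   # type of this part
            b"\r\n",                           # blank line ends the part headers
            jpeg,                              # the image for this frame
            b"\r\n",                           # line break before the next boundary
        ])


# Stand-in byte strings; a real source would yield JPEG-encoded camera frames.
parts = list(multipart_stream([b"AAA", b"BBB"]))
```

Because the generator only ends when the frame source does, the response stays open and the browser keeps replacing the displayed image with each new part.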
3.4 Server-sent Events
As suggested in the previous subsection, HTTP was not meant for server pushing of
any kind, but it has evolved to allow it with few modifications to the protocol. For
complex applications (most applications, nowadays), communication from server
to client without a previous request is essential. While we are aiming for real server
push, two of the four proposed alternatives are actually client pull, polling for new
information.
The first option is to directly poll the server, which will respond with the newest
information available. It is efficient neither for the server nor for the client, as
latency is tied to the frequency of requests: lower latency requires more
frequent requests, most of which return nothing new.
A direct upgrade to this solution is called long-polling. In long-polling, the server
holds the request until new data is available and then answers back. After every
response received, the client immediately polls again. This method is much better for
the client as it drastically reduces the frequency of requests. However, it can cause
concurrency problems in the server, as requests are held for long periods of time.
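The idea of holding a request until data is available can be sketched with a queue that blocks the waiting side. This is a simplified, single-process illustration under assumptions: the queue stands in for the server's pending updates, and the function names and timings are invented for the example.

```python
import queue
import threading
from typing import Optional


def long_poll(updates: queue.Queue, timeout: float) -> Optional[str]:
    """Hold the 'request' until an update arrives or the timeout expires."""
    try:
        return updates.get(timeout=timeout)
    except queue.Empty:
        return None  # nothing new: the client would simply poll again


def demo_long_poll() -> list:
    updates = queue.Queue()
    # Simulate a knock that happens 0.1 s after the client started waiting.
    threading.Timer(0.1, updates.put, args=("knock",)).start()
    first = long_poll(updates, timeout=2.0)   # returns as soon as the knock arrives
    second = long_poll(updates, timeout=0.1)  # no further update: times out
    return [first, second]
```

Each held request occupies a worker on the server for as long as it waits, which is the concurrency cost mentioned above.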
Another approach is to use WebSockets, a persistent and bidirectional communi-
cation between server and client over a TCP connection. I will not go into more detail
because we are talking of a whole new protocol and we are trying to stick to regular
HTTP.
Finally, let’s see the most modern approach: Server-sent Events (SSE), a unidirectional
server push. SSE is an HTML5 feature that allows the server to keep an
HTTP connection open and push data changes to the client. Basically, the client requests a
connection to a given URL and adds a handler (a function that listens to events)
to do something whenever data is received over this connection. The server will
send a message every time an update is available, keeping the connection open.
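On the server side, each pushed update is a small block of text in the `text/event-stream` format, ending with a blank line. The sketch below shows that encoding; the function names and the `knock` event name are hypothetical and only serve the example.

```python
from typing import Iterable, Iterator


def format_sse(data: str, event: str = "") -> str:
    """Encode one message in the text/event-stream format used by SSE."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own "data:" field.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"


def event_stream(updates: Iterable[str]) -> Iterator[str]:
    """Yield one SSE message per update while the connection stays open."""
    for update in updates:
        yield format_sse(update, event="knock")


messages = list(event_stream(["front door"]))
```

A response that streams these messages with the `text/event-stream` content type is what the browser-side event handler consumes.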
To fully explain Server-sent Events, an extension to HTML needs to
be explained first. Originally, HTML was only a markup language, a way to express
and send text formatting over plain text. However, the Internet evolved towards interactive
web pages, and some kind of code was needed on the client side too. This was
the reason for JavaScript, an object-oriented, event-driven programming language
normally embedded in HTML pages using <script> tags.
SSE uses a new JavaScript object, EventSource, supported in HTML5. This fact
could raise compatibility issues, but we have to take into account that HTML5’s initial
release was over four years ago and all major web browsers except Internet Explorer
support it.
Despite the aforementioned compatibility issues that could potentially arise from
using an HTML5 technology in old browsers, I considered Server-sent Events to be
the best and most convenient option for this project.
Chapter 4
Project development
The main goal of this chapter is to justify all the design decisions taken while developing
this project, a step of paramount importance given it was built from scratch.
The Programming and the Model-View-Controller model section introduces a programming
model for structuring large applications. Then a large and in-depth Server section
explains all the decisions taken in the central part of the project, including the
programming language, the web framework used and other interesting technical details.
Finally, there is a brief explanation of the other subsystems of the project: the
phone app and the intercom which hosts the server and interacts with the user and
the door.
4.1 Former project
Since this project is a direct evolution of an older one, proposed and presented at
Purdue, I feel the need to give some insight into it, although its technical details are
no longer relevant.
The conceptual idea of the product was the same: a device to stream video from a
door to a phone app. The project there was proposed giving much more importance
to the hardware part. We were told to use an ESP32 as a microcontroller (lacking the
computational power a RaspberryPi has, but much more efficient when it comes to
power consumption), the hardware part was large (analog knock detection, step motor
to unlock the door...) and little importance was given to software. For instance, the
communication between the microcontroller and the app was a simple TCP socket
connection lacking any security, authentication or flexibility. This is why, when given
the chance, I redefined the project to a more interesting one from a software perspective.
4.2 Programming and the Model-View-Controller
model
One of the most popular architectural patterns for applications with a user
interface (UI), and incredibly useful when dealing with dynamic servers as in this
project, is called Model-View-Controller (MVC). This pattern separates the code into
three interconnected parts. The model handles the application data structure,
independently of how it is shown to the user. The view takes care of representing the
information; there could even be more than one view for a given application. Finally,
the controller accepts inputs from the user and modifies the model with them.
Figure 4.1: Model-View-Controller schema.
In the following section, I discuss how the server fits into this model.
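As a minimal sketch of this separation, the three parts can be written as independent pieces of Python code. The class and function names below (`CallLog`, `render_calls`, `CallController`) are hypothetical and chosen only for illustration; they are not the names used in the project.

```python
class CallLog:
    """Model: stores the data, unaware of how it is displayed."""

    def __init__(self):
        self._calls = []

    def add(self, visitor: str, accepted: bool) -> None:
        self._calls.append({"visitor": visitor, "accepted": accepted})

    def all(self) -> list:
        return list(self._calls)


def render_calls(calls: list) -> str:
    """View: turns model data into something the user can read."""
    lines = [
        f"{call['visitor']}: {'accepted' if call['accepted'] else 'rejected'}"
        for call in calls
    ]
    return "\n".join(lines) or "No calls yet"


class CallController:
    """Controller: takes user input, updates the model and selects the view."""

    def __init__(self, model: CallLog):
        self.model = model

    def handle_action(self, visitor: str, action: str) -> str:
        self.model.add(visitor, accepted=(action == "accept"))
        return render_calls(self.model.all())


controller = CallController(CallLog())
```

Since the view only receives plain data, a second view (for example one producing HTML instead of text) could be added without touching the model or the controller.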
4.3 Server
Description
The server is the main actor of this project. It is an HTTP server, listening for
connections from HTTP clients (typically browsers). The server, hosted in the
RaspberryPi, also interfaces with the camera and the door lock (simulated with a
couple of LEDs). It serves web pages, which makes it flexible enough to be accessed
from any browser as well as from the phone app explained later.
Python and Flask
This HTTP server has been developed in Python3, using a popular framework called
Flask. Python is an object-oriented, high-level, general purpose interpreted program-
ming language.
Among the main advantages Python offers, I would like to highlight its popularity.
Such popularity means support almost everywhere, forums, libraries and frameworks.
The decision was whether to use Python2, still fully supported in its 2.7 version or
the newest Python3, already 3.7. This was quite an important decision since Python2
and Python3 are not fully compatible. Python3 was the final decision as it is newer
but still has been around for many years.
Among the well-known web frameworks developed for Python, I considered two: aiohttp and
Flask. I chose the latter because it included more interesting login support, but the
project could have been developed with either of them. Actually, Flask is a micro
framework, but it supports extensions for extra functionality. Its use of templates,
which will be explained later, is also interesting for this project.
Flask was first released in 2010 by a group of Python enthusiasts.1 Flask implements
a RESTful server (explained in the Understanding the Internet section of the previous
chapter of this document) using Python decorators, a very powerful tool which
makes it possible to modify the behavior of a function. In this case, we do not have to worry
about the whole handling of the request and response but only the most relevant part:
what and how to respond.
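To show how a decorator can link a URL to a function, here is a toy router written in plain Python. It only imitates the style of a decorator-based framework: `MiniApp` is a hypothetical class made up for this sketch and is not Flask's implementation.

```python
class MiniApp:
    """Toy stand-in for a web framework: maps (method, path) to functions."""

    def __init__(self):
        self.routes = {}

    def route(self, path: str, methods=("GET",)):
        """Return a decorator that registers the decorated function for a path."""
        def decorator(function):
            for method in methods:
                self.routes[(method, path)] = function
            return function  # the function itself is left unchanged
        return decorator

    def handle(self, method: str, path: str):
        """Dispatch a request to the registered function, if there is one."""
        handler = self.routes.get((method, path))
        if handler is None:
            return 404, "Not found"
        return 200, handler()


app = MiniApp()


@app.route("/index")
def index():
    # Only the interesting part is written here: what to respond.
    return "<h1>Intercom</h1>"
```

The line above `index` is all that is needed to bind the function to a URL; request parsing and response handling stay inside the framework-like class.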
The responses are delivered in HTML format. The act of combining the HTTP
protocol and HTML makes it convenient for users to access it from any web browser,
whenever they cannot access the app.
1 As a fun side note, the idea for Flask was originally an April Fools’ joke, but it was so
popular that it was turned into a proper framework.
Model-View-Controller
The code of the application is structured following the MVC model, which is illustrated
in figure 4.1. In fact, it is widely accepted that web servers have
four components: routes, controller, model and views. The process of delivering a web
page goes as follows. First, a user requests to view a page by entering a given URL.
This route triggers a particular controller function. The controller interacts with the
model or models, which retrieve all the necessary information from a database when
needed. Finally, the controller loads a view, providing the data from the model. It is
the view that is sent back to the user.
In this Flask application (the full code is available through a link in the Results
chapter), the controller is split into two files: __init__.py and
routes.py. The former initializes the application (this is the standard name for the
initialization file, as the application is organized as a Python package); the latter
holds the functions that control the application, linked to different URLs using
decorators. The model is written in the models.py file and the views are templates.
One could say the controller is the most important part of any application, as
it is the central part and communicates with the rest of the components. With the Flask
framework, the routes component is integrated into the controller, simply by adding a
line on top of each function (a decorator) indicating which URL to serve. From the
flask_login package, an extension to the Flask framework that helps handle user logins,
the login_required decorator is imported to enforce user authentication. In
this application, there are nine different endpoints, which I will detail below:
GET /index → Answers by rendering the index template. Requires login.
GET /camera → Answers by rendering the camera template. This template
includes a video tag with a source attribute that points to /video; the browser
automatically requests that resource. Requires login.
POST /action → Accepts only POST requests to prevent caching. Sends
the appropriate notification to the door (accept or reject), updates the
database through the model and redirects to the index. Requires login.
GET /calls → Queries the model for database information and renders the
calls template with such information. Requires login.
GET /poll → Handles Server-sent Events requests and responses whenever a
knock is detected. Requires login.
GET, POST /login → If a user is already authenticated, it redirects to .
Otherwise, renders the login template if the method is GET or handles the
login if the method is a POST.
GET /logout → Endpoint for logging out. An imported function handles the
request and the user is redirected to login page, since nothing can be done in
this application unless authenticated.
GET, POST /register → Renders the register template if the method is GET
or handles the registration if the method is POST. Only existing users can register
new users. Requires login.
GET /video → Streams live video from the device camera as MIME type
multipart/x-mixed-replace for the camera template. Requires login.
Note that all pages that require login automatically redirect unauthenticated users
to the login page. The different templates are explained a couple of paragraphs below.
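To make the /video endpoint listed above more concrete, the following sketch shows what a multipart/x-mixed-replace body looks like: every frame is sent as a new part that replaces the previous one in the browser. The boundary name and the placeholder frame bytes are assumptions made for illustration; the project's actual camera code may differ.

```python
BOUNDARY = b"frame"  # assumed boundary name; any token not found in the data works


def mjpeg_parts(frames):
    """Yield each JPEG frame wrapped as one part of a multipart/x-mixed-replace body."""
    for jpeg in frames:
        yield (b"--" + BOUNDARY + b"\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + jpeg + b"\r\n")


# Placeholder bytes stand in for real camera frames.
fake_frames = [b"<jpeg 1>", b"<jpeg 2>"]
body = b"".join(mjpeg_parts(fake_frames))
print(body.decode())
```

In a Flask view, a generator of this kind is typically wrapped in a Response object whose mimetype is set to "multipart/x-mixed-replace; boundary=frame", so that frames are sent as they are produced instead of being joined in memory as done here.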
In this application, the model is quite simple. It only includes two classes: User
and Call. User inherits from UserMixin, a class defined in the flask_login package.
Each user has a unique id, a username and the hash of their password (for security
reasons, plain passwords are never stored). Call is used to keep a log of all calls.
It includes a unique id, a timestamp, a record of the response (open or reject) and
the id of the user who responded, as a foreign key. All this information is stored in
an SQL database.
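The two entities described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table and column names, the in-memory database and the PBKDF2 hashing parameters are hypothetical and are not taken from the project's models.py.

```python
import hashlib
import hmac
import os
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE users (
    id            INTEGER PRIMARY KEY,
    username      TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL
);
CREATE TABLE calls (
    id        INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    response  TEXT NOT NULL CHECK (response IN ('open', 'reject')),
    user_id   INTEGER NOT NULL REFERENCES users(id)
);
""")


def hash_password(password, salt=None):
    """Return 'salt$digest' using PBKDF2-HMAC-SHA256 (parameters are assumptions)."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt.hex() + "$" + digest.hex()


def check_password(stored, password):
    """Re-hash `password` with the stored salt and compare in constant time."""
    salt_hex, _ = stored.split("$")
    candidate = hash_password(password, bytes.fromhex(salt_hex))
    return hmac.compare_digest(candidate, stored)


# Only the hash is stored, never the plain password.
conn.execute("INSERT INTO users (username, password_hash) VALUES (?, ?)",
             ("alice", hash_password("s3cret")))
user_id, stored = conn.execute(
    "SELECT id, password_hash FROM users WHERE username = ?", ("alice",)).fetchone()
conn.execute("INSERT INTO calls (response, user_id) VALUES (?, ?)", ("open", user_id))

print(check_password(stored, "s3cret"))  # True
print(check_password(stored, "wrong"))   # False
```

The foreign key on calls.user_id is what ties every logged response to the user who gave it.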
Views are served through templates. A template is an HTML file which includes
some code, such as conditionals and loops, that is evaluated on the server when the
controller renders it. The response consists only of HTML code, but it can vary
slightly between requests; it normally depends on the information retrieved from
the database.
This application uses the Jinja2 templating language. All templates inherit from a
base template, base.html, which includes the header and other elements common
to every view in the application, such as styling. Using a base template is certainly
useful to give a similar appearance to all pages of the same application. The base
template also includes the JavaScript code to long-poll for knocks at the door.
This application can serve five different views, directly related to the functionalities
explained when talking about the controller:
Index: welcome page of the application; displays a menu to an authenticated
user who can choose whether to register a new user, to watch the camera at
their door or see a calls log.
Login: page to log in.
Register: page for authenticated users to register new users.
Camera: shows to an authenticated user the camera at their door as well as two
buttons to decide whether to open the door or reject the call.
Calls: shows the log of the calls and the actions taken.
Limitations
Flask is run in single-thread mode. This could cause issues when handling long
polling for multiple clients. This is one of the reasons for hosting the server on the
RaspberryPi, since it would never have to handle more than five clients at a time.
For a future project upgrade, a multi-threaded server could be considered.
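The effect of this limitation can be simulated with the standard library alone. The sketch below does not use Flask: a thread pool with one worker stands in for a single-threaded server, a short sleep stands in for a long poll, and the constants are assumptions chosen to keep the run short.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POLL_SECONDS = 0.2  # stand-in for a long poll that blocks until a knock or a timeout
CLIENTS = 5         # the upper bound on simultaneous clients mentioned above


def long_poll(client_id):
    time.sleep(POLL_SECONDS)  # the worker is busy for the whole wait
    return client_id


def serve_all(workers):
    """Serve CLIENTS blocking requests with `workers` threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(long_poll, range(CLIENTS)))
    return time.perf_counter() - start


single = serve_all(workers=1)
multi = serve_all(workers=CLIENTS)
print(f"1 worker: {single:.2f} s, {CLIENTS} workers: {multi:.2f} s")
```

With a single worker the waits add up, so the last client is served only after all the others have finished; with one worker per client they overlap.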
Interfacing
By default, a Flask server listens on port 5000 and only on localhost. This makes sense
from a security point of view. Given that this server will be hosted inside a private
network, and that strict firewall rules can be set up in the router, it is safe enough
(and necessary) to listen on all interfaces.
4.4 Android app
Description
The second interesting subsystem is the app. As mentioned before, this project has
been designed to be as flexible as possible and is thus delivered as a dynamic web
page. The app is pretty simple; please note that it only serves to demo the concept, as
the main workload was in designing the server. The app is merely an HTTP client.
Technical details
The app has been programmed in Java, the main programming language for Android
applications. Android Studio is the official development environment for
Android applications.
There are a few aspects to explain. As mentioned in the description, the app acts
as an HTTP client. A WebView widget is used to display the content.
As explained in the Server section, Server-sent Events are handled by a code
snippet embedded in the HTML code, which is executed by the phone app when the
WebView object renders the page. This only works while the user is using the
app, which we can presume happens only seldom. In order to notify the user at other
times, there are two useful components: notifications and services.
A Service is an application component that can perform long-running operations
in the background and does not provide a user interface, which is exactly what we
need to poll the server for door knocks. Given that there is a special endpoint in the
server to handle SSE (/poll), the phone application service makes an HTTP request to
this URL and, whenever an answer is received, pushes a notification to the lock screen
and/or brings the application to the foreground.
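The app itself is written in Java, but the exchange the service performs can be sketched in Python for consistency with the server code. The function below parses lines in the text/event-stream format used by Server-sent Events; the event name knock and its payload are hypothetical examples, not the messages actually sent by /poll.

```python
def parse_sse(lines):
    """Yield (event, data) for each event in an iterable of text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                  # blank line: dispatch the pending event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):      # comment line, often used as a keep-alive
            continue
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream as it could arrive from the server.
stream = [
    ": keep-alive\n",
    "event: knock\n",
    "data: front door\n",
    "\n",
]
for event, data in parse_sse(stream):
    print(f"notify user: {event} ({data})")
```

Each yielded event is the point at which the service would push a notification to the user.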
These notifications need a notification channel to be pushed into. This
channel is created the first time the app is run and is used to group notifications
of the same kind together. Android notifications are highly customizable.
Limitations
Its main limitation is at the same time its biggest advantage. On the one hand, as
it only consists of a WebView, it can implement few features apart from displaying
the received content. On the other hand, it allows the service to be reshaped from top
to bottom without changing a single line of code in the phone app, as changes to
the server automatically affect what is shown in the app.
Interfacing
The app interfaces exclusively with the host phone. It needs to connect to the internet,
which is achieved by adding the internet permission to the Android manifest file,
AndroidManifest.xml.
4.5 Hardware device
Description
Physically, the intercom is a RaspberryPi, which hosts the server, has a camera pe-
ripheral, has a button for knocks and signals door opening/rejection with LEDs.
Technical details
The RaspberryPi runs Raspbian, a Debian-based distribution for
this kind of device. Debian is a free-software, Unix-like operating system. This
makes it easy and convenient to develop all sorts of services and applications on it.
The interface with the GPIO pins of the Raspberry, used to control the LEDs and the
button, is implemented in a Python script which uses the GPIO Zero library,2 installed
by default in the Raspbian image.
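A possible shape for such a script is sketched below. It is an assumption-laden illustration rather than the project's code: the pin numbers, the use of a red LED for rejection and the three-second signalling time are hypothetical. The signalling logic only needs objects with on() and off() methods, so it can be tried without hardware; run_on_pi wires it to GPIO Zero's LED and Button classes and is only meaningful on the device.

```python
import time


def signal_response(action, green, red, seconds=3):
    """Light the green LED for 'open' or the red LED for 'reject', then switch it off."""
    if action not in ("open", "reject"):
        raise ValueError(f"unknown action: {action!r}")
    led = green if action == "open" else red
    led.on()
    time.sleep(seconds)
    led.off()


def run_on_pi():
    """Wire the logic to real pins; only meaningful on a Raspberry Pi."""
    from gpiozero import LED, Button  # imported here so the sketch loads anywhere
    from signal import pause

    green, red, button = LED(17), LED(27), Button(2)  # assumed BCM pin numbers
    button.when_pressed = lambda: print("knock detected")
    signal_response("open", green, red)  # e.g. after the user accepts a call
    pause()  # keep the script alive so button events are handled
```

Keeping the decision logic separate from the pin wiring also makes it easier to replace the LEDs with an actual lock driver later on.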
Once the device is booted, it automatically connects to a WLAN and starts the
HTTP server using a short customized bash script.
Limitations
In a real-world prototype, the LED output should be replaced by an actual door
lock.
Software-wise, this product can be made as secure as desired; however, the hardware
security a RaspberryPi can provide remains open to discussion.
Interfacing
A RaspberryPi has a 40-pin GPIO header, as detailed in figure 4.2, and integrated WiFi
capability.
2https://gpiozero.readthedocs.io/en/stable/recipes.html
Figure 4.2: RaspberryPi GPIO pins layout.
Chapter 5
Results
This chapter discusses the result of the project. All project requirements are reviewed
and justified one by one. There is a section discussing the quality of the resulting
product and a link to a GitHub repository with the final server code. Screenshots
of how a user sees the application can be found in Appendix A.
5.1 Requirements
The intercom shall recognize a knock and act accordingly.
  → This requirement is fulfilled. A knock at the door triggers a Server-sent
  Event which notifies the user.
The intercom shall connect to local LAN.
  → This requirement is fulfilled. A script has been added to the RaspberryPi
  to automatically connect to a local WLAN. WiFi capability was
  already built in.
The intercom shall stream video over HTTP.
  → This requirement is fulfilled. The RaspberryPi accesses the camera
  through a Python script and serves the video stream to authenticated
  users over HTTP.
The intercom shall unlock the door via a smartphone command.
  → This requirement is fulfilled. A Python script indicates door opening
  with a green LED whenever the server receives a request to do so.
The intercom shall act as an HTTP server, accepting authenticated clients.
  → This requirement is fulfilled. The RaspberryPi hosts the server, which
  incorporates client authentication.
The intercom and the phone app shall communicate over the internet.
  → This requirement is fulfilled. As already stated, the RaspberryPi has a
  built-in WiFi connection and the app has internet connection permission.
The phone app shall display the received video.
  → This requirement is partially fulfilled. The app displays a whole web
  page, including the video from the camera.
The phone app shall allow a user to remotely control their door.
  → This requirement is fulfilled. Buttons are displayed to the user to
  monitor the camera and open/reject calls.
5.2 Final product
Commenting on the requirements is only a small part of the discussion of the result of
a project. This project would certainly have been proposed differently had it not
been for the initial approach at Purdue. The project has nevertheless evolved
a lot since then. Even so, there is an unavoidable inheritance from that approach, which
has influenced some parts of the project that would otherwise have been done differently.
Despite some flaws in the design and development of the product, I dare say the
final product meets the expected level of quality.
5.3 Final code
The code of the project is available here:
https://github.com/alexcosta13/etsetb-tfg
Chapter 6
Budget
This chapter includes the costs involved in the development of the project. Since
the project presented in this thesis is the RaspberryPi-based server developed in
Barcelona, material costs from Purdue are not included but labour hours are, since a
significant design effort was made there.
Material
Concept                  Cost
Electrical components    7.10 €
Raspberry Pi 3 B+        49.00 €
Raspberry Pi Charger     10.00 €
RPi NoIR Camera V2       33.34 €
SD card                  7.71 €
TOTAL                    107.15 €
Table 6.1: Material costs
Labour
As highlighted more than once in this document, dedication to the predecessor of this
project was not equally split between all team members. However, they did
valuable work which needs to be taken into account. The total number of hours dedicated
to this project is a rough approximation, as it is truly difficult to quantify the total
hours devoted to a project of this kind. The total hours dedicated by each team
member are calculated by multiplying the approximate number of hours dedicated per
week by the number of weeks spent working on the project.
Role                      Wage/hour   Total hours   Total cost
Undergraduate engineer    8 €         200 h         1600 €
Undergraduate engineer    8 €         100 h         800 €
Undergraduate engineer    8 €         500 h         4000 €
TOTAL                                               6400 €
Table 6.2: Labour costs
It has to be taken into account that, since this project is entirely software-based,
these labour costs are associated with the development of the product. Once it has been
designed and implemented, the cost of replicating it is negligible.
Equipment
While the Raspberry may seem to be equipment, it is actually part of the final prototype.
It has been programmed from a computer, which will be considered the only
equipment needed for developing this project. Given that I have used my personal
computer and that any computer would have led to the same project output, I
deliberately consider its depreciation during the project development to be negligible.
Total
According to the calculations explained above, the total cost of the project amounts
to roughly 6500 €.
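The figures in this chapter can be checked with a few lines of Python. The amounts are those of tables 6.1 and 6.2; the variable names are only illustrative.

```python
from decimal import Decimal

material = {
    "Electrical components": Decimal("7.10"),
    "Raspberry Pi 3 B+": Decimal("49.00"),
    "Raspberry Pi charger": Decimal("10.00"),
    "RPi NoIR Camera V2": Decimal("33.34"),
    "SD card": Decimal("7.71"),
}
wage = Decimal("8")      # euros per hour
hours = [200, 100, 500]  # one entry per team member

material_total = sum(material.values())
labour_total = wage * sum(hours)
total = material_total + labour_total

print(material_total)  # 107.15
print(labour_total)    # 6400
print(total)           # 6507.15
```

The exact sum is 6507.15 €, which rounds to the approximate total given above.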
Chapter 7
Conclusions and future
development
7.1 Technical conclusions
In the Results chapter, achieved items are highlighted and explained. As stated there,
this project was proposed in a rather particular way to meet requirements at
Purdue University. This means that the main goal was to build something from
scratch, showing what we have learnt during the Bachelor’s degree, rather than
proposing a novel approach to a problem or developing anything not seen before.
For these reasons, these technical conclusions are more about the process
of developing a product than the culmination of a piece of research.
If this project were prolonged and the product developed further, there
are some proposals worth mentioning. They come from a combination of personal
reflection and external feedback. These proposals are divided into implementation
improvements for the current idea and additions and modifications to its functionality
that would give the product extra value.
Improved implementation
For a product to be on the market, robustness is needed. The first improvement to
be made in terms of implementation is adding support for devices that do not support
HTML5, since they would not receive knock notifications. Different server push
techniques have been discussed in the Server-sent Events section. Adding
long-polling support for such devices would possibly be the best idea, since it is simple
enough not to raise compatibility issues anywhere but at the same time is more efficient
than regular polling. The only aspect we would have to take special care of would be
concurrency in the server.
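As an illustration of this proposal, the core of a long-polling handler can be sketched with a threading.Event. This is not part of the current server: the function name, the timeout values and the returned dictionary are assumptions. The request blocks until a knock is signalled or the timeout expires, which is also why concurrency needs care, since every waiting client occupies one thread for the whole wait.

```python
import threading

knock = threading.Event()


def long_poll(timeout=25.0):
    """Block until a knock is signalled or `timeout` seconds pass."""
    if knock.wait(timeout):
        knock.clear()  # a real server would track delivery per client instead
        return {"knock": True}
    return {"knock": False}  # the client simply issues a new request


# Simulate the door button being pressed shortly after a client starts polling.
threading.Timer(0.1, knock.set).start()
print(long_poll(timeout=2.0))  # {'knock': True}
print(long_poll(timeout=0.1))  # {'knock': False}
```

Compared with regular polling, the client receives the knock as soon as it happens while issuing far fewer requests.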
Another improvement in terms of product robustness and quality would be to
properly deploy the server on the RaspberryPi. Currently, what we are testing is a
development server, handy for testing but certainly improvable in terms of efficiency
and security.
Since this project was heavily software-based, little importance has been given to
the hardware part, in particular the choice of a microcontroller. This topic would
require some interesting discussion, outside the scope of this project, on how to balance
power consumption and efficiency against capabilities.
Last but not least, there is room for major improvements in the phone app. It
was conceived merely for testing purposes but both display and notifications can be
heavily upgraded.
Extra functionalities for future development
The first and most obvious improvement to be made to the product in terms of
functionality is adding voice communication. This would mean adding a microphone
to the RaspberryPi and taking special care in audio-image synchronization, otherwise
the user experience would have a significant drawback.
The multiuser capability of the project is not fully explored: some sort of schedule
could be proposed so that every user gets notifications by default at some particular
time of the day, or users could have different privileges (image viewing, door opening,
new user registration, etc.).
In a more futuristic approach, face recognition could be used to recognize frequent
visitors. This would not be meant to open the door automatically, for security
reasons, but the user experience would benefit as the visitors’ names could show
up directly in the notifications.
Taking advantage of the fact that the phone is already connected to the intercom,
the door could be automatically unlocked by geofencing when an authorized phone
approaches the house. Its implementation would not be so straightforward, as it raises
major security concerns, for instance in case of phone theft or loss.
Marketing potential
It is imperative in a project of this kind to conclude with a broader reflection on
commercial potential of the designed product or impact on the development of new
technologies.
The first question to be answered is whether there is any market for this kind of
product. As seen in the State of the art chapter, there seems to be a market niche.
However, it is already covered by a few specialized companies, some of them with
powerful partners such as Amazon. Their products are genuinely complete and there is
little room for improvement.
We can assume that this product has no commercial future. It does not contribute
to advancing any particular technology either, since it uses well-known protocols and
does not propose any novel approach.
Still, we have to bear in mind that this project was born as a way to demonstrate, to
others but also to ourselves, what we have learnt during this Bachelor’s degree, and
I would say this objective has been amply accomplished.
7.2 Personal conclusions and learning experience
I would like to bring this document to an end with a personal conclusion on this
project, both from a technical and a team perspective, as well as on the Bachelor’s
degree this project puts an end to.
After struggling for four intense years in this wonderful university, this final project
has not been particularly difficult. Do not get me wrong: what I mean is that this
degree has given me tools to cope with concepts I am not familiar with. Since
there is a ton of material on the internet, it is mostly about knowing which sources of
information to trust and, most importantly, having the judgement to choose the right
solution. On top of that, it is crucial to give the project a sensible structure
when writing code, as well as to be efficient and organized; otherwise, you will only
waste your time.
As I briefly mentioned in the Incidents and modifications section in chapter 1, it
was not easy to cope with my team. A lack of motivation from my teammates led to
a motivational low of my own, which I was fortunately able to overcome.
I guess this is the way life is, and we need to learn how to cope with it. I do not
regret any decisions I took while doing this project; I always thought they were the
correct ones. However, I like to think I would handle things differently now, since the
experience I have acquired is incredibly valuable. All in all, I have learnt a lot in this
project, and this is what it is all about. I am personally very proud of the result of this
project, since I was nonconformist enough to start again because I was not satisfied
with the first result.
Appendix A
Graphical User Interface
Figure A.1: Phone index page
Figure A.2: Phone calls page
Figure A.3: Phone camera page
Figure A.4: Phone register page
Figure A.5: Phone login page
Figure A.6: Laptop index page
Figure A.7: Laptop calls page
Figure A.8: Laptop camera page
Figure A.9: Laptop register page
Figure A.10: Laptop login page
Barcelona, 2019