Development of a Flask server

for a video-streaming intercom

Alex Costa Sanchez

Escola Tècnica Superior d’Enginyeria de Telecomunicació de Barcelona

Universitat Politècnica de Catalunya

Supervisor: Prof. Jose A. Lazaro

In partial fulfilment of the requirements for the degree in

Telecommunications Technologies and Services Engineering

Major in Telematics Systems

ETSETB · UPC Barcelona, June 2019

"Any sufficiently advanced technology is indistinguishable from magic."

– Arthur C. Clarke –

Acknowledgements

I would first like to thank my teaching assistant, Aaron Barnes, and my teammate at Purdue University, Kalyan Mada, for their help and support.

I would also like to thank both Purdue University and Universitat Politècnica de Catalunya for giving me the opportunity to take part in this incredible exchange program.

I would like to show my gratitude to my advisor at UPC, Jose A. Lazaro, for the

time devoted to this project and the interest he has shown.

I cannot thank enough all the teachers and professors I have ever had who not only

taught their subject but also showed passion for what they were doing; it is priceless.

I am extremely fortunate for the friends I have made during these four months at Purdue, and even more fortunate for my friends in Barcelona. I wish to thank them for helping me have the time of my life these last four years.

Last but not least, I would like to sincerely thank my family and the ones who

love me for their advice and support in any decision I make.

Abstract

The Internet of Things has definitively become part of our everyday lives. This project aims to build a smart intercom from scratch, capable of delivering live video from your door to your phone and empowering users to remotely control access to their homes.

This document covers all the steps in this process, from the proposal and design of the intercom to an overview of the protocols used to implement it. First, it explains how and why this project was developed; following this introduction, there is a short discussion of the Internet of Things and of how this project fits into this concept.

The central part of the project gives a brief overview of communication principles, from the basics to the most interesting details of the HTTP protocol. It also discusses the implementation of the Model-View-Controller model in a software project. Then, the technical details, difficulties and limitations of each subsystem of the project are addressed.

Finally, possible future improvements are discussed, as well as some personal conclusions.


Revision history and approval record

Revision   Date         Purpose
0          28/05/2019   Document creation
1          14/06/2019   Document revision
4          25/06/2019   Document approval

DOCUMENT DISTRIBUTION LIST

Name                 E-mail
Alex Costa Sanchez   costa.sanchez@estudiant.upc.edu
Jose A. Lazaro       jose.lazaro@tsc.upc.edu

           Written by:          Reviewed and approved by:
Date       25/06/2019           25/06/2019
Name       Alex Costa Sanchez   Jose A. Lazaro
Position   Project Author       Project Supervisor

Contents

1 Introduction
  1.1 Statement of purpose
  1.2 Product requirements
  1.3 Methods and procedures
  1.4 Work plan
  1.5 Incidents and modifications
2 State of the art
3 Relevant previous concepts
  3.1 The TCP/IP Reference Model
  3.2 Understanding the Internet
  3.3 HTTP streaming
  3.4 Server-sent Events
4 Project development
  4.1 Former project
  4.2 Programming and the Model-View-Controller model
  4.3 Server
  4.4 Android app
  4.5 Hardware device
5 Results
  5.1 Requirements
  5.2 Final product
  5.3 Final code
6 Budget
7 Conclusions and future development
  7.1 Technical conclusions
  7.2 Personal conclusions and learning experience
A Graphical User Interface

List of Figures

1.1 Project’s Gantt diagram
3.1 Layers in the TCP/IP reference model
4.1 Model-View-Controller schema
4.2 RaspberryPi GPIO pins layout
A.1 Phone index page
A.2 Phone calls page
A.3 Phone camera page
A.4 Phone register page
A.5 Phone login page
A.6 Laptop index page
A.7 Laptop calls page
A.8 Laptop camera page
A.9 Laptop register page
A.10 Laptop login page

List of Tables

6.1 Material costs
6.2 Labour costs

Chapter 1

Introduction

1.1 Statement of purpose

The purpose of this project is to build a connected intercom that allows users to control their door lock from a phone app, wherever they are. Once a visitor knocks on the door, the user is notified and a live video of the visitor is shown on their phone screen. The user may then unlock the door directly from the phone app.

This product is undoubtedly linked to the idea of the Internet of Things (IoT), "an extension of Internet connectivity into physical devices and everyday objects, which can be remotely monitored and controlled." [1] IoT, like every technological breakthrough, aims to improve our everyday life, making mundane activities easier and more convenient.

The result of this project is a functional prototype: a proof of concept that is not ready for the market, which was never the goal. This first approach proves the feasibility of the proposed idea and is meant to be the base from which to explore further improvements and possibilities, such as face recognition or automatic door opening. These ideas are further explored in the Conclusions and future development chapter.

[1] https://en.wikipedia.org/wiki/Internet_of_things

1.2 Product requirements

The final product shall meet the following requirements:

- The intercom shall recognize a knock and act accordingly.
- The intercom shall connect to the local network (LAN).
- The intercom shall stream video over HyperText Transfer Protocol (HTTP).
- The intercom shall unlock the door via a smartphone command.
- The intercom shall act as an HTTP server, accepting authenticated clients.
- The intercom and the phone app shall communicate over the Internet.
- The phone app shall display the received video.
- The phone app shall allow a user to remotely control their door.

1.3 Methods and procedures

This project was proposed at Purdue University, Indiana, United States. Initially, it was meant to fulfill the requirements of the Senior Design Project of the Electrical and Computer Engineering major at that university. The description of this course on the Electrical and Computer Engineering department website [2] states that:

Lecture sessions provide the student with background information on

the design and management of projects. (. . . ) During the laboratory

sessions, the students work in teams on a challenging open-ended electrical

engineering project that draws on previous course work.

The overall goal was to go through the whole process of building a functional

prototype.

I proposed a primitive smart intercom idea within the scope of this Senior Design Project and developed it with the help of my teammates at Purdue. It was enlightening to carry out the design phase of the project (solution proposal and idea refinement) there, getting relevant feedback from both teaching staff and fellow students.

[2] https://engineering.purdue.edu/ECE/Academics/Undergraduates/UGO/AboutUs/CourseInfo/courseInfo?courseid=661&show=true&type=undergrad

For the reasons explained in the Incidents and modifications section of this chapter, I was not satisfied with the prototype built at Purdue. After presenting it there, I developed the prototype again from scratch on my own, keeping in mind the same goal of building a functional prototype.

1.4 Work plan

The Project Proposal document, also handed in for this project, contains a detailed plan of the project carried out in the United States. Since that part is no longer as relevant, the Gantt diagram below shows only the redefined product in detail.

[Figure: Gantt diagram of the work packages: Project proposal and work plan; Research; American prototype; HTTP server; Hardware device; Android app; Final product; Documentation and final report.]

Figure 1.1: Project’s Gantt diagram

1.5 Incidents and modifications

Put simply, I was quite unlucky with my team assignment at Purdue University, which was made by the teaching staff based on each student's interest in the different proposed projects.

Neither of my teammates showed any interest or initiative in our project. This led to a lack of motivation within the group, and we clearly did not perform at our best as a team.

Some self-criticism is also necessary. I took on a leadership role in this project out of necessity, as we were completely stuck. I assumed that taking the lead merely on the technical side (designing the whole system and splitting tasks among the members of the team) was enough, but it clearly was not. I understood this later and adapted my role in the team accordingly. Had I done so earlier, things might have been different, but I will never know.

Despite these issues, I am proud of the outcome of the project in the United States, as I gave my best, working hard to compensate for what the others did not accomplish while taking special care of non-technological aspects, such as product definition, idea refinement, presentations and leadership.

The result of this situation, which I had foreseen but could not prevent, was a low-quality prototype I was not satisfied with. Since I expected this outcome in advance, my thesis advisor and I agreed I would improve it once I got home.

In conclusion, the conceptual idea for this project comes from Purdue but it has

been developed in Barcelona.

The thesis I am presenting is the result of these improvements and modifications.

Chapter 2

State of the art

As stated in the first section of the Introduction chapter, the Internet of Things (IoT) aims to improve our everyday life, making mundane activities easier and more convenient. Generally, this is achieved by monitoring large amounts of data and using this data to make smart decisions. We could say that, for now, IoT is all about collecting data. It is expected that, with the rise of artificial intelligence (AI), all this data will be converted into smart decision-making.

At present, society is still wondering whether this technology is a passing fashion or the future. I am convinced IoT is the future, but it raises both technological and ethical issues we need to handle before it is too late.

The two main areas in which IoT is trying hardest to penetrate are vehicles and smart homes. Some may argue that interconnecting vehicles is not actually Internet of Things; nevertheless, it is for me the best example of how IoT can affect (and hopefully improve) our lives. What is known as Vehicle to Everything (V2X) means connecting cars with all sorts of vehicles as well as traffic lights and other signals. This is used to monitor their trajectories and thus improve efficiency and safety.

The biggest commercial effort towards the smart home is virtual assistants, such as Amazon Alexa, Google Assistant, Siri or Microsoft Cortana, to name a few. I dare to presume that the future of the Internet of Things involves much less interaction with the user than this, automatically collecting data and deciding accordingly. The project detailed in these pages relies heavily on user interaction. This interaction is discussed in the Conclusions chapter, as some improvements can be made to increase the product’s autonomy.

The idea for this project comes from an intercom used at American universities, which is controlled from a phone but over GSM, as a regular phone call. Having it connected to the Internet opens a universe of new possibilities, also discussed in the last chapter of this thesis.

Further research showed that the product implemented in this thesis is already available for purchase from several brands, such as Gira [1], Doorbird [2] and Ring [3]. The last one is meant to be integrated with Amazon’s Alexa, to advance towards a fully interconnected home.

The goal of this thesis is to build a smart intercom from scratch, surely lacking some functionality and polish, but hopefully resulting in a more affordable prototype.

[1] https://www.gira.com/en/tuerkommunikation/tuerkommunikation-mobil.html
[2] https://www.doorbird.com
[3] https://eu.ring.com/products/video-doorbell

Chapter 3

Relevant previous concepts

3.1 The TCP/IP Reference Model

The most important issue when two different devices communicate is that they have to share a common language: they need to implement the same protocol. One of the main lessons learned from ARPANET, the predecessor of the Internet, is that standardizing such protocols is not enough; a protocol framework is essential. Such a framework needs to be general enough to adapt to any circumstance but at the same time concrete enough to be useful as a common reference for all protocols. This is achieved by clearly delimiting functionality between layers or levels, each one of them implementing different protocols.

The Open Systems Interconnection (OSI) model suggests a seven-layer protocol stack, as shown in the figure. In practice, however, the Internet uses a simpler four-layer model (the TCP/IP Reference Model). It is this latter, simpler model that I explain in detail.

As shown in figure 3.1, the four layers in the TCP/IP Reference Model are, from top to bottom:

- Application layer: this layer contains the communication protocols used in process-to-process communications. Many protocols are used in this layer; HTTP is covered later in this chapter.
- Transport layer: the Internet mostly uses the TCP protocol in this layer; UDP is also fairly common. This layer provides reliability (in the case of TCP) and multiplexing: ports are numbers used to distinguish between processes (or services) on the same device. They range from 0 to 65535, and the first 1024 (0 to 1023) are the well-known ports, reserved for standard services. For instance, web servers listen on port 80 (HTTP) or 443 (HTTPS) by default, but this can be changed.
- Internet layer: the Internet layer handles communication between devices across networks. The Internet uses the IP protocol in this layer. An IP address is needed to identify machines on the Internet; in IPv4, it consists of four numbers from 0 to 255 separated by dots.
- Link layer: this layer handles communication over a particular local network technology, such as WiFi or Ethernet. It is responsible for accepting IP datagrams and transmitting them on the local network link.

Figure 3.1: Layers in the TCP/IP reference model.

One last important concept to understand before we dig into the protocols used in this project is the client-server model. It is a distributed application structure which distinguishes between two different sorts of machines: a client and a server. Typically, a client wants to access information stored on the server. This model assumes that a server is always listening for new incoming connections, so it is the client that actually needs to start them. In the TCP/IP Reference Model just explained, the client needs two important pieces of information to start such a connection: the IP address of the server and the port number. This client-server model is slowly becoming obsolete, or at least needs some improvements, as we will shortly see.
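The client-server exchange described above can be sketched with Python's standard socket module. This is only an illustration under stated assumptions: the loopback address, the operating-system-assigned port and the helper name run_echo_server are hypothetical choices made for the example, not part of the project's code.

```python
import socket
import threading


def run_echo_server(server_sock: socket.socket) -> None:
    """Accept a single client and send back whatever it sends."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


# The server binds to an IP address and a port, then listens.
# Port 0 asks the operating system for any free port (an assumption
# made here so the example does not collide with a running service).
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
server_ip, server_port = server_sock.getsockname()

server_thread = threading.Thread(target=run_echo_server, args=(server_sock,))
server_thread.start()

# The client starts the connection; it needs the server's IP and port.
with socket.create_connection((server_ip, server_port), timeout=5) as client:
    client.sendall(b"knock")
    reply = client.recv(1024)

server_thread.join()
server_sock.close()
print(reply)
```

Note that the server only waits, while the client is the side that opens the connection, which is exactly the limitation the server-push techniques discussed later try to work around.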

3.2 Understanding the Internet

Most of the time, as in the implementation detailed in the following chapter, the protocols below the application layer are already fully implemented, so the programmer only has to worry about the top layer, the one closest to the user. For the proper functioning of the Internet, many application-level protocols are needed, each one useful for a particular purpose.

HyperText Transfer Protocol

One protocol is particularly relevant in this project: HTTP, the language used by browsers and servers to communicate information over the Internet. In combination with the HyperText Markup Language (HTML), it is responsible for delivering nice-looking web pages. HTTP works in a client-server fashion, where clients send requests asking for a particular resource and the server answers accordingly.

HTTP is text-oriented, but it also supports multimedia. An HTTP request has a resource identifier (URL or path) and a method, and can include extra parameters. There are several methods, but the most common are GET and POST. The former is used to access a resource without modifying the state of the server; the latter is typically used to submit data that does modify it. While URLs or paths originally referred to different files on a server, they can now be used to indicate different actions to be performed by the server. This approach is called RESTful (REpresentational State Transfer), and it is used in this project.

An HTTP response always contains a status code indicating the result of the operation (e.g. 200 indicates success, 302 a redirection and 404 not found), followed by optional headers, a blank line and an optional body. Headers may indicate the length of the body or, equally importantly, the type of the response body: plain text, HTML, different video formats, mixed content, and so on. Headers also handle cookies, which are essential for maintaining a session in an otherwise stateless protocol.
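To make the structure of a response more concrete, the following sketch splits a raw HTTP/1.1 response into its status code, headers and body. The function name parse_http_response and the sample response (including the cookie value) are assumptions made for illustration; a real client would rely on an HTTP library rather than on this simplified parser.

```python
def parse_http_response(raw: bytes):
    """Split a raw HTTP/1.1 response into status code, headers and body."""
    # Headers and body are separated by a blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # The first line is the status line, e.g. "HTTP/1.1 200 OK".
    status_parts = lines[0].split(" ", 2)
    status_code = int(status_parts[1])

    # Every other line is a "Name: value" header.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# Hypothetical response, written by hand for this example.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"Set-Cookie: session=abc123\r\n"
    b"\r\n"
    b"<p>Hello!</p>"
)

status, headers, body = parse_http_response(raw_response)
print(status, headers["content-type"], body)
```

Header names are lowercased here because HTTP header names are case-insensitive; the Set-Cookie header is the mechanism that lets the server keep a session on top of a stateless protocol.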

A discussion of the different versions of HTTP is outside the scope of this project. It is important to mention, however, that the most recent version is HTTP/2, but HTTP/1.1 is still by far the most popular one.

HyperText Markup Language

Last but not least, a brief mention of HTML, thought by many to be another Internet protocol but actually a markup language. In fact, it is the standard markup language for documents designed to be displayed in a web browser. A markup language is a way to add formatting to documents using only plain text. Its latest version is HTML5.

3.3 HTTP streaming

Among the many particularities of HTTP, two are crucial for this project: HTTP streaming and long-polling. The idea behind both concepts is the same: how to get the server to push information to a client, when the typical client-server relation establishes that the client always requests the desired information and the server responds accordingly.

Among the different alternatives for server pushing, I will explore two of them, distinguished by their purpose. The first is to stream video content, hence the title of this section. Although it is not considered actual streaming, the final objective is the same: to serve images from a webcam to a browser in real time. The second is to know when there is an update on the server: for instance, notifications in a social network or, in our case, a knock on the door. For this purpose there are several possible approaches too, which are explored in the following section.

This method for server pushing is fairly simple. The resource (in this project, a video) is accessed through an HTML video element, specifying the resource in its src attribute. The response given by the server includes a multipart mixed-replace content type header:

Content-Type: multipart/x-mixed-replace; boundary=frame

Note that the boundary value is completely arbitrary. The server never closes the connection, pretending it has not finished sending the content to the browser. Whenever the server has a new image to send to the browser, it sends the boundary word, preceded by two dashes, with a blank line before it. It then sends the corresponding Content-Type header, followed by a blank line, followed by the image for that frame. In our case, since the video format is Motion JPEG (MJPEG), each frame is a JPEG image. Since the server never actually finishes sending content, the connection is persistent, meaning that the browser remains connected, waiting for new data (actually, waiting for the response to end). The browser updates the displayed content as soon as a new frame is received.

This technique is a rather unconventional use of HTTP, but it is a simple and efficient way to send streamable data over HTTP without reinventing it.
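The framing described in this section can be sketched as a Python generator that wraps each frame in its own part. This is a minimal sketch, not the project's actual code: the name mjpeg_parts is hypothetical, and the "frames" are placeholder byte strings rather than real JPEG images captured from a camera.

```python
BOUNDARY = b"frame"  # arbitrary, but must match the Content-Type header


def mjpeg_parts(frames):
    """Yield one multipart chunk per frame, as described in the text."""
    for jpeg_bytes in frames:
        yield b"".join([
            b"--" + BOUNDARY + b"\r\n",       # boundary preceded by two dashes
            b"Content-Type: image/jpeg\r\n",  # type of this part
            b"\r\n",                          # blank line before the image
            jpeg_bytes,
            b"\r\n",                          # blank line before next boundary
        ])


# Placeholder frames; a real server would read them from the camera.
fake_frames = [b"\xff\xd8frame-one\xff\xd9", b"\xff\xd8frame-two\xff\xd9"]
chunks = list(mjpeg_parts(fake_frames))
print(chunks[0])
```

In a Flask application, a generator of this kind is typically passed to a Response object whose mimetype is set to the multipart/x-mixed-replace value shown above, so that the framework keeps the connection open and sends each chunk as it is produced.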

3.4 Server-sent Events

As suggested in the previous section, HTTP was not meant for server pushing of any kind, but it has evolved to allow it with few modifications to the protocol. For complex applications (most applications, nowadays) communication from server to client without a previous request is essential. While we are aiming for real server push, two of the four alternatives presented here are actually client pull, which works by polling for new information.

The first option is to poll the server directly at regular intervals; the server responds with the newest information available. It is efficient neither for the server nor for the client, as latency is directly tied to the frequency of requests: lower latency requires more frequent requests.

A direct upgrade to this solution is called long-polling. In long-polling, the server

holds the request until new data is available and then answers back. After every

response received, the client immediately polls again. This method is much better for

the client as it drastically reduces the frequency of requests. However, it can cause

concurrency problems in the server, as requests are held for long periods of time.
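The idea of holding a request until data is available can be illustrated without any network code. In this sketch, a queue stands in for the server's pending events and long_poll plays the role of a held request; both names, the timeouts and the simulated knock are assumptions made for the example.

```python
import queue
import threading


def long_poll(events: queue.Queue, timeout: float = 30.0):
    """Block until an event is available or the timeout expires."""
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None  # nothing happened; the client would simply poll again


events = queue.Queue()

# Simulate a knock arriving shortly after the client started waiting.
threading.Timer(0.1, events.put, args=("knock",)).start()

result = long_poll(events, timeout=2.0)      # held until the knock arrives
timed_out = long_poll(events, timeout=0.05)  # no new event: times out
print(result, timed_out)
```

Each waiting client occupies a thread or connection on the server for the whole waiting time, which is the concurrency cost mentioned above.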

Another approach is to use WebSockets, a persistent, bidirectional communication channel between server and client over a TCP connection. I will not go into more detail because it is a whole new protocol and we are trying to stick to regular HTTP.

Finally, let’s see the most modern approach: Server-sent Events (SSE), a unidirectional server push. SSE is an HTML5 feature that allows the server to keep an HTTP connection open and push data changes to the client. Basically, the client requests a connection to a given URL and adds a handler (a function that listens to events) to do something whenever data is received on this connection. The server sends a message every time an update is available, keeping the connection open.
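On the wire, each message pushed by the server is a small block of text made of "field: value" lines and terminated by a blank line, sent in a response of type text/event-stream. The helper below is a sketch of that formatting; the function name format_sse and the "knock" event name are assumptions chosen for this example, not taken from the project's code.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Format one Server-sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")  # optional event name
    # Multi-line payloads are sent as several consecutive data fields.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"


message = format_sse("Someone is at the door", event="knock")
print(message)
```

The server would write one such message to the open connection for every update, and the handler registered on the client side would be called once per message.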

To fully explain Server-sent Events, an extension to HTML needs to be explained first. Originally, HTML was only a markup language, a way to express and send text formatting over plain text. However, the Internet evolved towards interactive web pages, and some kind of code was needed on the client side too. This was the reason for JavaScript, an object-oriented, event-driven programming language normally embedded in HTML pages using <script> tags.

SSE uses a JavaScript object, EventSource, introduced with HTML5. This could raise compatibility issues, but we have to take into account that HTML5’s initial release was already four years ago and all major web browsers except Internet Explorer support it.

Despite the compatibility issues that could potentially arise from using an HTML5 technology in old browsers, I considered Server-sent Events to be the best and most convenient option for this project.

Chapter 4

Project development

The main goal of this chapter is to justify all the design decisions taken while developing this project, a step of paramount importance given that it was built from scratch. The Programming and the Model-View-Controller model section introduces a programming model for structuring large applications. Then, a long and in-depth Server section explains all the decisions taken in the central part of the project, including the programming language, the web framework used and other interesting technical details. Finally, there is a brief explanation of the other subsystems of the project: the phone app and the intercom, which hosts the server and interacts with the user and the door.

4.1 Former project

Since this project is a direct evolution of an older one, proposed and presented at Purdue, I feel the need to give some insight into it, although its technical details are no longer relevant.

The conceptual idea of the product was the same: a device to stream video from a door to a phone app. The project there was proposed giving much more importance to the hardware part. We were told to use an ESP32 as a microcontroller (lacking the computational power of a RaspberryPi, but much more efficient when it comes to power consumption), the hardware part was large (analog knock detection, a stepper motor to unlock the door...) and little importance was given to software. For instance, the communication between the microcontroller and the app was a simple TCP socket connection lacking any security, authentication or flexibility. This is why, when given the chance, I redefined the project as a more interesting one from a software perspective.

4.2 Programming and the Model-View-Controller model

One of the most popular architectural patterns for applications with a user interface (UI), and incredibly useful when dealing with dynamic servers as in this project, is called Model-View-Controller (MVC). This pattern separates the code into three interconnected parts. The model handles the application data structure, independently of how it is shown to the user. The view takes care of representing the information; there can even be more than one view for a given application. Finally, the controller accepts inputs from the user and modifies the model with them.

Figure 4.1: Model-View-Controller schema.

In the following section, I discuss how the server fits into this model.
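Before moving on to the server, the separation between the three parts can be sketched in a few lines of plain Python. The class and function names below (DoorModel, text_view, DoorController) are hypothetical and only illustrate the pattern; they are not the names used in the project.

```python
class DoorModel:
    """Model: holds the application state, knows nothing about display."""

    def __init__(self):
        self.locked = True
        self.calls = []

    def register_knock(self, when: str) -> None:
        self.calls.append(when)

    def unlock(self) -> None:
        self.locked = False


def text_view(model: DoorModel) -> str:
    """View: one possible representation of the model."""
    state = "locked" if model.locked else "unlocked"
    return f"Door is {state}; {len(model.calls)} call(s) received"


class DoorController:
    """Controller: turns user input into changes on the model."""

    def __init__(self, model: DoorModel):
        self.model = model

    def handle(self, command: str) -> str:
        if command == "unlock":
            self.model.unlock()
        # Whatever the command, answer with the current view.
        return text_view(self.model)


model = DoorModel()
model.register_knock("10:00")
controller = DoorController(model)
print(controller.handle("status"))
print(controller.handle("unlock"))
```

Because the view is a separate function, a second representation (for instance an HTML page instead of a line of text) could be added without touching the model or the controller logic.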

4.3 Server

Description

The server is the main actor of this project. It is an HTTP server, listening for connections from HTTP clients (typically browsers). The server, hosted on the RaspberryPi, also interfaces with the camera and the door lock (simulated with a couple of LEDs). It serves web pages, which makes it flexible enough to be accessed from any browser as well as from the phone app explained later.

Python and Flask

This HTTP server has been developed in Python 3, using a popular framework called Flask. Python is an object-oriented, high-level, general-purpose interpreted programming language.

Among the main advantages Python offers, I would like to highlight its popularity. Such popularity means support almost everywhere: forums, libraries and frameworks. The decision was whether to use Python 2, still fully supported in its 2.7 version, or the newer Python 3, already at version 3.7. This was quite an important decision, since Python 2 and Python 3 are not fully compatible. Python 3 was the final decision, as it is newer but has still been around for many years.

Two well-known web frameworks for Python are aiohttp and Flask. I chose the latter because it offered more interesting login support, but the project could have been developed with either of them. Flask is actually a microframework, but it supports extensions for extra functionality. Also of interest for this project is the fact that it uses templates, which will be explained later.

Flask was first released in 2010 by a group of Python enthusiasts. [1] Flask makes it possible to implement a RESTful server (explained in the Understanding the Internet section of the previous chapter) using Python decorators, a very powerful tool which allows one to modify the behavior of a function. In this case, we do not have to worry about the whole handling of the request and response, but only about the most relevant part: what and how to respond.
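The decorator mechanism mentioned above can be illustrated with a toy route registry written in plain Python. This is not Flask's implementation: the names ROUTES, route and dispatch are hypothetical and only show how a decorator can associate a URL path with the function that answers it. In Flask itself, the equivalent line on top of a function is @app.route("/index").

```python
ROUTES = {}  # maps a URL path to the function that answers it


def route(path):
    """Decorator factory: register the decorated function under `path`."""
    def decorator(func):
        ROUTES[path] = func
        return func  # the function itself is left unchanged
    return decorator


@route("/index")
def index():
    return "<h1>Intercom</h1>"


@route("/camera")
def camera():
    return "<h1>Live camera</h1>"


def dispatch(path):
    """Look up the handler for a path, as a framework would on each request."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, "Not found"
    return 200, handler()


print(dispatch("/index"))
print(dispatch("/missing"))
```

The functions only decide what to respond; the lookup, and in a real framework the parsing of the request and the construction of the response, happen elsewhere.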

The responses are delivered in HTML format. Combining the HTTP protocol with HTML makes it convenient for users to access the server from any web browser whenever they cannot access the app.

[1] As a fun side note, the idea for Flask was originally an April Fools’ joke, but it was so popular that it was turned into a proper application.

Model-View-Controller

The code of the application is structured following the MVC model, which is explained

in figure 4.1. It is widely accepted, however, that in web servers there are actually

four components: routes, controller, model and views. The process of delivering a web

page goes as follows. First, a user requests to view a page by entering a given URL.

This route triggers a particular controller function. The controller interacts with the

model or models, which retrieve all the necessary information from a database when

needed. Finally, the controller loads a view, providing the data from the model. It is

the view that is sent back to the user.

In this Flask application (the full code is available through a link in the Results

chapter), the controller is separated into two different files: __init__.py and

routes.py. The former initializes the application (this is the standard name for the

initialization file, as the application is structured as a Python package); the latter has the different

functions that control the application, linked to different URLs using decorators. The

model is written in the models.py file and the views are templates.

One could say the controller is the most important part of any application, as

it is the central part and communicates with the rest of the components. With the Flask

framework, the routes component is integrated into the controller, simply by adding a

line on top of each function (a decorator) indicating which URL to serve. From the

flask_login package, an extension to the Flask framework that helps to handle user logins,

the login_required decorator is imported to help with user authentication. In

this application, there are nine different endpoints, which I will detail below:

GET /index → Answers by rendering the index template. Requires login.

GET /camera → Answers by rendering the camera template. This template

includes a video tag, with a source attribute that points to /video; a request to

that resource is automatically made by the browser. Requires login.

POST /action → Accepts only POST requests to prevent caching. Sends

the appropriate notification to the door (accepting or rejecting), updates the

database through the model and redirects to the index. Requires login.

GET /calls → Queries the model for database information and renders the

calls template with such information. Requires login.

GET /poll → Handles Server-sent Events requests and responses whenever a

knock is detected. Requires login.

GET, POST /login → If a user is already authenticated, it redirects to .

Otherwise, renders the login template if the method is GET or handles the

login if the method is a POST.

GET /logout → Endpoint for logging out. An imported function handles the

request and the user is redirected to the login page, since nothing can be done in

this application unless authenticated.

GET, POST /register → Renders the register template if the method is GET

or handles the registration if the method is POST. Only existing users can register

new users. Requires login.

GET /video → Streams live video from the device camera as a MIME multipart/x-mixed-replace

response for the camera template. Requires login.

Note that all pages that require login redirect automatically to the login page.

The different templates are explained in a couple of paragraphs.
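As an illustration of the multipart/x-mixed-replace format used by the /video endpoint, the sketch below frames a sequence of JPEG-encoded images as the parts of such a response body. It is a minimal example under stated assumptions, not the code of this project: the boundary name "frame" and the function name mjpeg_parts are hypothetical, and the frames are assumed to be already encoded as JPEG bytes.

```python
from typing import Iterable, Iterator

# Assumed boundary name; it must match the one announced in the Content-Type header.
BOUNDARY = b"frame"

# Header sent once for the whole response.
CONTENT_TYPE = "multipart/x-mixed-replace; boundary=" + BOUNDARY.decode()


def mjpeg_parts(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each JPEG-encoded frame as one part of the multipart body."""
    for jpeg in frames:
        yield (
            b"--" + BOUNDARY + b"\r\n"
            b"Content-Type: image/jpeg\r\n"
            b"Content-Length: " + str(len(jpeg)).encode() + b"\r\n"
            b"\r\n" + jpeg + b"\r\n"
        )


if __name__ == "__main__":
    # Placeholder bytes stand in for real camera frames.
    for part in mjpeg_parts([b"first-frame", b"second-frame"]):
        print(part)
```

Each new part replaces the previous one in the browser, so an endless generator of frames is displayed as a live stream.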

In this application, the model is quite simple. It only includes two classes: User

and Call. User inherits from UserMixin, a class defined in the flask_login package,

an extension to the Flask framework that helps to handle user logins. Such users contain

a unique id, a username and the hash of their password (for security reasons, plain

passwords are never stored). Call is used to keep a log of all calls. It includes a

unique id, a timestamp, a record of the response (open or reject) and the id of

the user who has responded as a foreign key. All this information is stored in a SQL

database.
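The structure of the model can be sketched with the sqlite3 and hashlib modules of the Python standard library. This is only an illustration of the two entities and of salted password hashing, not the model code of this project: the table and column names, the PBKDF2 parameters and the helper names hash_password and build_database are assumptions.

```python
import hashlib
import os
import sqlite3

# Assumed schema: one table per model class, with a foreign key from call to user.
SCHEMA = """
CREATE TABLE user (
    id            INTEGER PRIMARY KEY,
    username      TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL
);
CREATE TABLE call (
    id        INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    response  TEXT NOT NULL CHECK (response IN ('open', 'reject')),
    user_id   INTEGER NOT NULL REFERENCES user(id)
);
"""


def hash_password(password: str, salt: bytes) -> str:
    """Derive a salted hash so that the plain password is never stored."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt.hex() + "$" + digest.hex()


def build_database() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_database()
    conn.execute(
        "INSERT INTO user (username, password_hash) VALUES (?, ?)",
        ("alice", hash_password("secret", os.urandom(16))),
    )
    conn.execute("INSERT INTO call (response, user_id) VALUES (?, ?)", ("open", 1))
    query = "SELECT username, response FROM call JOIN user ON user.id = call.user_id"
    print(conn.execute(query).fetchall())
```

With foreign keys enabled, a call that refers to a non-existent user is rejected by the database itself.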

Views are served through templates. A template is an HTML file which includes

some code that is interpreted on the server side, such as conditionals and loops.

The response consists only of HTML code, but it can vary slightly between requests;

it normally depends on the information retrieved from the database.

This application uses the Jinja2 templating language. All templates inherit from a

base template, base.html, which includes the header and other common elements

for every view in the application, such as styling. Using a base template is certainly

useful to give a similar appearance to all pages in the same application. The base

template also includes the JavaScript code that listens for knocks at the door.

This application can serve five different views, directly related to the functionalities

explained when talking about the controller:

Index: welcome page of the application; displays a menu to an authenticated

user who can choose whether to register a new user, to watch the camera at

their door or see a calls log.

Login: page to log in.

Register: page for authenticated users to register new users.

Camera: shows to an authenticated user the camera at their door as well as two

buttons to decide whether to open the door or reject the call.

Calls: shows the log of the calls and the actions taken.

Limitations

Flask is run in single-thread mode. This could cause issues when handling long

polling for multiple clients. This is one of the reasons for hosting the server on the

RaspberryPi, since it would never have to handle more than five clients at a time.

For a future project upgrade, a multi-threaded framework could be considered.

Interfacing

By default, a Flask server listens on port 5000 and only on localhost. This makes sense

from a security point of view. Given that this server will be hosted inside a private

network and that strict firewall rules can be set up in the router, it is safe enough (and

necessary) to listen on all interfaces.

4.4 Android app

Description

The second interesting subsystem is the app. As mentioned before, this project has

been designed to be as flexible as possible and is thus delivered as a dynamic web

page. The app is pretty simple; please note that it only serves to demo the concept, as

the main workload was in designing the server. The app is merely an HTTP client.

Technical details

The app has been programmed in Java, the main programming language for Android

applications. Android Studio is the official development environment for Android

applications.

There are a few aspects to explain. As mentioned in the description, the app acts

as an HTTP client. A WebView widget is used to deliver the content.

As explained in the Server section, Server-sent Events are handled by a code

snippet embedded in the HTML code, which is executed by the phone app when the

WebView object renders the page. This only makes sense while the user is using the

app, which we can presume happens only seldom. In order to notify the user at other times, there

are two useful components: notifications and services.

A Service is an application component that can perform long-running operations

in the background and does not provide a user interface, which is exactly what we need to

poll the server for door knocks. Given that there is a special endpoint in the server to

handle SSE (/poll), the phone application service makes an HTTP request to this

URL and, whenever an answer is received, pushes a notification to the lock screen

and/or brings the application to the foreground.
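The exchange on /poll can be illustrated with a short sketch of the Server-sent Events text format: the server writes events as "field: value" lines terminated by a blank line, and the client reads the response line by line. The sketch is written in Python for consistency with the server code and is not the Java code of the app; the function names, the event name "door" and the payload "knock" are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Serialize one Server-sent Event as the server would write it."""
    lines = []
    if event is not None:
        lines.append("event: " + event)
    for chunk in data.split("\n"):
        lines.append("data: " + chunk)
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per event from an iterable of response lines."""
    event: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if event:  # a blank line dispatches the accumulated event
                yield event
                event = {}
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data" and "data" in event:
            event["data"] += "\n" + value
        else:
            event[field] = value


if __name__ == "__main__":
    stream = format_sse("knock", event="door") + ": keep-alive\n\n"
    for received in parse_sse(stream.splitlines(keepends=True)):
        print(received)
```

In the app, each event yielded by such a parser would be the trigger for pushing a notification.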

These notifications need a notification channel to be pushed into. Such a notification

channel is created the first time the app is run and is used to group notifications

of the same kind together. Android notifications are highly customizable.

Limitations

Its main limitation is at the same time its biggest advantage. On the one hand, as

it only consists of a WebView, it can implement few features apart from displaying

the received content. On the other hand, it allows the service to be reshaped from top to

bottom without having to change a single line of code in the phone app, as changing

the server automatically affects what is shown in the app.

Interfacing

The app interfaces exclusively with the host phone. It needs to connect to the internet,

which is achieved by adding the internet permission to the Android manifest file

AndroidManifest.xml.

4.5 Hardware device

Description

Physically, the intercom is a RaspberryPi, which hosts the server, has a camera peripheral,

has a button for knocks and signals door opening/rejection with LEDs.

Technical details

The RaspberryPi runs Raspbian, a Debian-based distribution for

this kind of device. Debian is a free-software, Unix-like operating system. This fact

makes it easy and convenient to develop all sorts of services and applications on it.

The interface with the GPIO pins of the RaspberryPi to control the LEDs and the button

is made through a Python script which uses the GPIO Zero library,2 installed by

default in the Raspbian image.

Once the device is booted, it automatically connects to a WLAN and starts the

HTTP server using a short customized bash script.

Limitations

In a real-world prototype, the LED output should be replaced by an actual door lock.

Software-wise, this product can be made as secure as desired; however, the hardware

security a RaspberryPi can provide is open to discussion.

Interfacing

The RaspberryPi 3 B+ has a 40-pin GPIO header, as detailed in figure 4.2, and integrated WiFi capability.

2https://gpiozero.readthedocs.io/en/stable/recipes.html

Figure 4.2: RaspberryPi GPIO pins layout.

Chapter 5

Results

This chapter discusses the result of the project. All project requirements are commented on

and justified one by one. There is a section discussing the quality of the resulting

product and a link to a GitHub repository with the final server code. Screenshots

of how a user sees the application can be found in Appendix A.

5.1 Requirements

The intercom shall recognize a knock and act accordingly.

This requirement is fulfilled. A knock on the door triggers a Server-sent

Event which notifies the user.

The intercom shall connect to the local LAN.

This requirement is fulfilled. A script has been included in the RaspberryPi

to automatically connect to a local WLAN. WiFi capability was

already built in.

The intercom shall stream video over HTTP.

This requirement is fulfilled. The RaspberryPi accesses the camera

through a Python script and serves the video stream to authenticated

users over HTTP.

The intercom shall unlock the door via a smartphone command.

This requirement is fulfilled. A Python script indicates door opening

with a green LED whenever the server receives a request to do so.

The intercom shall act as an HTTP server, accepting authenticated clients.

This requirement is fulfilled. The RaspberryPi hosts the server, which

incorporates client authentication.

The intercom and the phone app shall communicate over the internet.

This requirement is fulfilled. As already stated, the RaspberryPi has a

built-in WiFi connection and the app has internet connection permission.

The phone app shall display the received video.

This requirement is partially fulfilled. The app displays a whole web

page, including the video from the camera.

The phone app shall allow a user to remotely control their door.

This requirement is fulfilled. Buttons are displayed to the user to

monitor the camera and to open the door or reject calls.

5.2 Final product

Commenting on the requirements is only a small part of the discussion of the result of

a project. It is a fact that this project would have been proposed differently had it not

been for the initial approach at Purdue. The project has evolved

a lot since then. Even so, there is an unavoidable inheritance from that approach which has

influenced some parts of the project that would otherwise have been done differently.

Despite some flaws in the design and development of the product, I dare say the

final product meets the expected level of proficiency.

5.3 Final code

The code of the project is available here:

https://github.com/alexcosta13/etsetb-tfg

Chapter 6

Budget

This chapter includes the costs involved in the development of the project. Since

the project presented in this thesis is the RaspberryPi based server developed in

Barcelona, material costs from Purdue are not included but labour hours are, since a

big designing effort was made there.

Material

Concept                  Cost
Electrical components    7.10 €
Raspberry Pi 3 B+        49.00 €
Raspberry Pi Charger     10.00 €
RPi NoIR Camera V2       33.34 €
SD card                  7.71 €
TOTAL                    107.15 €

Table 6.1: Material costs

Labour

As highlighted more than once in this document, dedication to the predecessor of this

project was not equally split between all team members. However, they carried out

valuable work which needs to be taken into account. The total hours dedicated to

this project is a rough approximation, as it is truly difficult to quantify the total number

of hours devoted to a project of this kind. The total hours dedicated by each team

member are calculated by multiplying the approximate number of hours dedicated per week

by the number of weeks spent working on the project.

                          Wage/hour   Total hours   Total cost
Undergraduate engineer    8 €         200 h         1600 €
TOTAL                                               6400 €

Table 6.2: Labour costs

It has to be taken into account that, since this project is almost entirely software-based,

these labour costs are associated with the development of the product. Once it has been

designed and implemented, the cost of replicating it is negligible.

Equipment

While the RaspberryPi may seem to be equipment, it is actually part of the final prototype.

It has been programmed from a computer, which will be considered as the only

equipment needed for developing this project. Given that I have used my personal

computer and that any computer would have led to the same project output, I

purposely consider its depreciation during the project development to be negligible.

According to the calculations explained above, the total cost of the project amounts

to roughly 6500 €.
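The arithmetic behind this figure can be checked with a few lines of Python. The amounts are taken from Table 6.1 and from the TOTAL row of Table 6.2; the variable names are only illustrative.

```python
from decimal import Decimal

# Material costs in euros, as listed in Table 6.1.
material = {
    "Electrical components": Decimal("7.10"),
    "Raspberry Pi 3 B+": Decimal("49.00"),
    "Raspberry Pi charger": Decimal("10.00"),
    "RPi NoIR Camera V2": Decimal("33.34"),
    "SD card": Decimal("7.71"),
}

material_total = sum(material.values(), Decimal("0"))
labour_total = Decimal("6400")  # TOTAL row of Table 6.2
project_total = material_total + labour_total

print(material_total)  # 107.15
print(project_total)   # 6507.15
```

The exact sum is 6507.15 €, which is rounded to 6500 € in the text.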

Chapter 7

Conclusions and future

development

7.1 Technical conclusions

In the Results chapter, achieved items are highlighted and explained. As stated there,

this project has been proposed in a rather particular way to meet requirements at

Purdue University. This means that the main goal was to build something from

scratch, showing off what we have learnt during the Bachelor’s degree, and not actually

to propose a novel approach to a problem or to develop anything not seen before.

For the reasons stated above, these technical conclusions are more about the process

of developing a product than about the culmination of a piece of research.

If this project were prolonged and the product were to be developed further, there

are some proposals to be mentioned. They come from a combination of personal

reflection and external feedback. These proposals are divided between implementation

improvements for the current idea and additions and modifications of functionality to

give the product extra value.

Improved implementation

For a product to be on the market, robustness is needed. The first improvement to

be made in terms of implementation is adding support for devices that do not support

HTML5, since they would not receive knock notifications. Different server push

techniques have been discussed in the Server-sent Events section. Adding

long-polling support for such devices would possibly be the best idea, since it is simple enough

not to have compatibility issues anywhere but at the same time is more efficient than

regular polling. The only aspect we would have to take special care of would be

concurrency in the server.

Another improvement in terms of product robustness and proficiency would be to

properly deploy the server on the RaspberryPi. Currently, what we are testing is a

development server, which is handy for testing but could certainly be improved in terms of efficiency

and security.

Since this project was heavily software based, little importance has been given to

the hardware part, in particular the choice of a microcontroller. This topic would

require some interesting discussion, not in the scope of this project, trying to balance

power consumption and efficiency versus capabilities.

Last but not least, there is room for major improvements in the phone app. It

was conceived merely for testing purposes but both display and notifications can be

heavily upgraded.

Extra functionalities for future development

The first and most obvious improvement to be made to the product in terms of

functionality is adding voice communication. This would mean adding a microphone

to the RaspberryPi and taking special care with audio-image synchronization; otherwise,

the user experience would suffer significantly.

The multiuser capability of the project is not fully explored: some sort of schedule

could be proposed so that every user gets notifications by default at some particular time

of the day, or users could have different privileges (image viewing, door opening, new

user registration, etc.).

In a more futuristic approach, face recognition could be used to recognize frequent

visitors. For security reasons, this would not be meant to open the door automatically,

but the user experience would benefit, as the visitors’ names could show

up directly in the notifications.

Taking advantage of the fact that the phone is already connected to the intercom,

the door could be automatically unlocked by geofencing when an authorized phone

approaches the house. Its implementation would not be so straightforward, as it raises

major security concerns, for instance in case of phone theft or loss.

Marketing potential

It is imperative in a project of this kind to conclude with a broader reflection on

the commercial potential of the designed product or its impact on the development of new

technologies.

The first question to be answered is whether there is any market for this kind of

product. As seen in the State of the art chapter, there seems to be a market niche.

However, it is already covered by a few specialized companies, some of which have

powerful partners such as Amazon. Their apps are genuinely complete and there is

little room for improvement.

We can assume that this product has no commercial future. Nor does it contribute

to advancing any particular technology, since it uses well-known protocols and

does not propose any novel approach.

Still, we have to bear in mind that this project was born as a way to demonstrate, to

others but also to ourselves, what we have learnt during this Bachelor’s degree, and

I would say this objective has been amply accomplished.

7.2 Personal conclusions and learning experience

I would like to bring this document to an end with a personal conclusion on this

project, both from a technical and a team perspective, as well as on the Bachelor’s

degree this project puts an end to.

After struggling for four intense years in this wonderful university, this final project

has not been particularly difficult. Do not get me wrong: what I mean is that this

degree has given me tools to cope with concepts I am not familiar with. Since

there is a ton of material on the internet, it is mostly about knowing which sources of

information to trust and, most importantly, having the criteria to choose the right

solution. On top of that, it is crucial to have a sensible project structure

when writing code, as well as to be efficient and organized; otherwise, you will only

waste your time.

As I briefly mentioned in the Incidents and modifications section in chapter 1, it

was not easy to cope with my team. A lack of motivation from my teammates led to

a motivational low in myself, which I was fortunately able to overcome.

I guess this is the way life is, and we need to learn how to cope with it. I do not

regret any decisions I took while doing this project; I always thought they were the

correct ones. However, I like to think I would handle things differently now, since the

experience I have acquired is incredibly valuable. All in all, I have learnt a lot in this

project, and this is what it is all about. I am personally very proud of the result of this

project, since I was nonconformist enough to start again because I was not satisfied

with the first result.

Appendix A

Graphical User Interface

Figure A.1: Phone index page

Figure A.2: Phone calls page

Figure A.3: Phone camera page

Figure A.4: Phone register page

Figure A.5: Phone login page

Figure A.6: Laptop index page

Figure A.7: Laptop calls page

Figure A.8: Laptop camera page

Figure A.9: Laptop register page

Figure A.10: Laptop login page

Barcelona, 2019
