Richard Tano
November 25, 2011
Master's Thesis in Engineering Physics, 30 credits
Supervisor at Ericsson: David Lindegren
Examiner: Jerry Eriksson
Umeå University, Department of Physics
SE-901 87 Umeå, Sweden
Abstract
This Master's Thesis report was written by Umeå University Engineering Physics student Richard Tano during his thesis work at Ericsson Luleå.
Monitoring network quality is of utmost importance to network providers. This can be done with models that evaluate QoS (Quality of Service) and conform to ITU-T Recommendations. When determining video stream quality it is more important to evaluate the QoE (Quality of Experience), to understand how the user perceives the quality. This is ranked in MOS (Mean Opinion Score) values. An important aspect of determining the QoE is the video content type, which is correlated to the coding complexity and MOS values of the video. In this work the possibilities to improve quality estimation models complying with ITU-T Study Group 12 (Q.14) were investigated. Methods were evaluated and an algorithm was developed that applies time series analysis of packet statistics to determine the MOS scores of video streams. The methods used in the algorithm include a novel combination of frequent pattern analysis and regression analysis. A model which incorporates the algorithm for usage from low to high bitrates was defined. The new model resulted in around 20% improved precision in MOS score estimation compared to the existing reference model. Furthermore, an algorithm using only regression statistics and modeling of related statistical parameters was developed. Its improvement in coding estimation was comparable with the earlier algorithm, but its efficiency increased considerably.
Determination of the content of multimedia streams
Summary (translated from Swedish)
This thesis work was written by Richard Tano, a student at Umeå University, at Ericsson Luleå.
Monitoring network performance is of utmost importance to network providers. This is done with models for evaluating QoS (Quality of Service) that conform to the ITU-T Recommendations. When determining the quality of video streams it is more meaningful to evaluate QoE (Quality of Experience), to gain insight into how the user perceives the quality. This is graded in MOS (Mean Opinion Score) values. An important aspect of determining QoE is the type of video content, which is correlated to the video's coding complexity and MOS values. In this work the possibilities of improving the quality estimation models while complying with ITU-T Study Group 12 (Q.14) were investigated. Methods were examined and an algorithm was developed that uses time series analysis of packet statistics to estimate the MOS values of video streams. The methods included in the algorithm are a newly developed frequent pattern method together with regression analysis. A model which uses the algorithm from low to high bitrates was defined. The new model gave around 20% improved precision in the estimation of MOS values compared with the existing reference model. An algorithm using only regression statistics and modeling of statistical parameters was also developed. This algorithm delivered results comparable with the previous algorithm, but also gave greatly improved efficiency.
Acronyms
QoS Quality of Service
QoE Quality of Experience
Contents (excerpt)
2.4.3 Tools utilized
2.5 Related Work
3.7 Regression analysis
3.7.2 Logistic regression
4.1.3 Regression analysis
4.1.5 Algorithm stepwise
4.2 Algorithm numerical results
4.2.1 Mathematical approach
5 Conclusions
5.2.2 Time series matching
5.2.3 FPD
5.2.4 Regression
5.3 Discussion of results
5.4 Restrictions and limitations
5.5 Future work
B Videostreams ordered in quality
List of Figures
3.1 Video stream as frame sizes versus time (measured in frame number)
3.2 RTP header scheme [7]
3.3 Intra coding
3.4 Inter coding
3.5 Quality (PEVQ) vs bit rate for three different contents
3.6 Basic structure of PEVQ measure algorithm [14]
4.1 Compressed frame sequence
4.2 Flow of algorithm
4.3 Error reduction versus bitrates for selected algorithms
4.4 Residuals of old regression model
4.5 Residuals of linear regression model
4.6 Residuals of logistic regression model
4.7 Residuals of combined regression model
4.8 Residual comparison at 300 kb/s bitrate between existing model and new model using algorithm V2
4.9 RMSE reference vs RMSE modeled regression for various bitrates
A.1 LR regression parameter
A.2 Standard deviation of frame difference sequences
A.3 Median of frame sequence
A.4 Longest calm period of frame sequences
A.5 Longest small period of frame difference sequences
A.6 Longest large period of frame sequences
A.7 Number of passes through median of frame sequences
A.8 LOR category intercept1
A.9 LOR category intercept2
A.10 LOR category intercept3
A.11 LOR category intercept4
A.12 Mean of frame difference sequences
A.13 Variance of frame sequences
A.15 Median of frame sequences
A.16 Longest calm period of frame sequences
A.17 Longest small period of frame difference sequences
A.18 Longest small period of frame sequences
B.1 A number of frame sequences with PEVQ values of around 0-2.8
B.2 A number of frame sequences with PEVQ values of around 2.9-3.2
B.3 A number of frame sequences with PEVQ values of around 3.2-5
List of Tables
4.3 LR modeled
4.4 LOR modeled
4.7 RMSE for algorithms versus bitrate
Chapter 1
Introduction
Multimedia streaming has never been more used than now, with Youtube and mobile TV exploding in popularity. By monitoring these services, mobile operators and internet providers can see how well their networks are working and possibly how happy their customers are, without having to arrange polls and surveys. An important service is video streaming, which is one of the most bandwidth-demanding services. One way to monitor service quality is to use all user traffic in a live network, where hundreds of thousands of video clients report back to a measurement collection server. Many different contents will be used, and there is no way for the client to obtain the original, uncoded file. In this case a parametric video quality model is used, which only takes parameters from the client and sends a quality score back to the measurement server. Parametric models have so far no good way to separate different contents, meaning that the same performance indicators (packet loss, bit rate etc.) will lead to the same score regardless of how easy the clip is to encode.
The task for the thesis is to use the information available in multimedia clients to predict the content type, codec and other parameters that could help to enhance parametric models.
ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union) creates standards for infocommunications. Currently it is evolving new standards in QoS (Quality of Service) and QoE (Quality of Experience). One of its Questions handles the development of parametric models aimed at quality measurement purposes. Specifically, models that discard the use of the DPI (deep packet inspection) method are being worked on. DPI models can be used to separate contents; however, because of limitations in areas like security and technology, the DPI method may not be applicable, and thus the need for alternative models is high in the industry and at Ericsson.
1.1 Ericsson
Ericsson is one of the largest companies in Sweden. The company was founded in 1876 by Lars Magnus Ericsson and is now headquartered in Kista, Stockholm. It is a provider of telecommunication and data communication systems, but also provides services for a wide range of technologies, specifically mobile networks. Ericsson is currently one of the world's largest mobile telecommunications equipment vendors. More than 40% of the world's mobile traffic passes through its networks, and it has customers in more than 180 countries. The company's vision is "To be the Prime Driver in an all-communicating world." [1]
Chapter 2
Problem Description
2.1 Problem Statement
The aim of the thesis work is to find a mathematical way to classify video parameters from multimedia streams (video streams); specifically, to find a mathematical model that can order a video stream in levels of coding complexity based on those parameters. The focus is on video streams coded in H.264 with a bit rate of 300 kb/s. The transport protocol used is RTP and the resolution is QVGA. The thesis work was extended to include investigation of a wide range of bitrates.
2.2 Goals
The goal is to create and describe a mathematical model that complies with the ITU-T Study Group 12 (Q.14) standard P.NAMS, accessing only basic protocol information in the data packets of video streams to classify video parameters.
2.3 Purposes
The purpose is to reduce the error that comes from uncertainty in the correct classification of the coding complexity of video streams in existing parametric QoS models.
2.4 Methods
The work starts with a literature study and a search for potential mathematical methods to use. Testing the mathematical model requires analysis of video streams. The video streams are created using encoding software. Matlab is used to build the mathematical models, run simulations and analyze the results.
2.4.1 Planning
The work schedule was planned as follows:
1. Write specification: Specify the object, limitations and time
planning of the thesis
work.
2. Literature study: Get knowledge about the background to the
problem. Search for
existing work in the area. Look for potential mathematical methods
and possibilities
to use in model building.
3. Model decision: Determine which mathematical methods to
use.
4. Data creation: Find media stream content to use in model testing
and format it to
data (packet streams).
5. Model building: Creation of the different model
algorithms.
6. Model simulation: Try models on the data (packet streams and
PEVQ scores).
7. Data analysis: Analyze the results of the model
simulation.
8. Model analysis: Evaluate if changes to models can be made. If so
start over at 5.
9. Report writing: If results are sufficient, write report.
Besides this, two presentations were given to coworkers at Ericsson during the thesis work.
Table 2.1 shows the tasks planned during the work.
2.4.3 Tools utilized
Here follows a list of the tools that were used during the
work.
– Model building and simulation: MatLab 7.10.0 (R2010a)
– Video streams: Contents on Ericsson server
– Simulation of packet streams: Oqtopus (Ericsson proprietary
encoding script system)
• X.264 encoder version r1867
– Document editor and work planner: Emacs with Org-mode
– Presentation: MS PowerPoint
Table 2.1: Tasks performed
2.5 Related Work
An Ericsson article, published in November 2011, is described as: "... a tutorial overview of current approaches for monitoring the quality perceived by users of IP-based audiovisual media services." [2]
Chapter 3
Background
New media services and networks experience time-varying performance. Monitoring systems are used to measure how users experience the quality of the services. For video streams the performance is often assessed in terms of quality of service (QoS), which for example includes information about lost, dropped and resent data packets. As a result of new video standards, greater quality degradation is accepted. To know if the end user receives the promised quality, it is necessary to assess the quality perceived by the user, referred to as quality of experience (QoE). The most correct way to do this assessment is with perception tests on human subjects. These tests demand large resources and cannot be run continuously to monitor the quality of a running service. Because of this, other methods have been developed that use quality models to map QoS performance indicators to the user-perceived quality obtained from perceptual quality tests. [2] These techniques fall under the ITU-T Recommendations (standards) for objective video quality assessment.
When building models following ITU-T Recommendations Study Group 12 Question 14, there are restrictions on the usage of data packet information (P.NAMS). When analyzing video streams to determine video parameters, only packet statistics can be used as input to the models. Video streams can be viewed as time series of frame sizes (figure 3.1). Possible mathematical methods for analyzing the packet statistics include time series analysis and regression analysis. Time series analysis focuses on patterns in the data streams, while regression analysis uses statistical parameters of the data streams.
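As an illustration of viewing a stream as a time series of frame sizes, the following Python sketch sums packet payload sizes that share an RTP timestamp (the thesis' own tooling was Matlab; the function and input format here are hypothetical):

```python
def frame_sizes(packets):
    """Group RTP packet payload sizes by timestamp: packets sharing a
    timestamp belong to the same video frame, so summing their payload
    sizes recovers the frame-size time series.  `packets` is an iterable
    of (rtp_timestamp, payload_bytes) pairs in arrival order."""
    sizes = {}
    for ts, nbytes in packets:
        sizes[ts] = sizes.get(ts, 0) + nbytes
    return list(sizes.values())

# Three frames, the first split over two packets.
stream = [(0, 900), (0, 400), (3000, 250), (6000, 310)]
print(frame_sizes(stream))  # [1300, 250, 310]
```

The resulting list is the kind of sequence plotted in figure 3.1.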
3.1 ITU
ITU (International Telecommunication Union) has been creating standards in infocommunications since 1865. It became a United Nations specialized agency in 1947 and its standards are used worldwide. The ITU Telecommunication Standardization Sector (ITU-T) produces the standards called ITU-T Recommendations. The Recommendations become mandatory only when adopted as part of a national law. These Recommendations define how telecommunication networks operate and interwork. "Over 3000 Recommendations are in worldwide use
Figure 3.1: Video stream as frame sizes versus time (measured in frame number)
for various topics ranging from network architecture and security to transmission systems and next-generation networks." [3]
3.1.1 ITU-T Recommendations
ITU-T has a study group (Study Group 12) for evolving new standards in QoS (Quality of Service) and QoE (Quality of Experience). This study group is assigned the period 2009-2012 to determine new standards in this field. [4]
Question 14 (Q.14/12) handles the development of parametric models for media quality measurement purposes (Development of parametric models and tools for audiovisual and multimedia quality measurement purposes). These measurement models are used to estimate the user experience, as written by ITU-T: "Measures that predict user-experience are useful in monitoring and managing time-varying performance and help to facilitate the rollout, efficient operation and effective service management of such networks." [5]
To date the most researched models use DPI (deep packet inspection), which extracts information by going into packets and analyzing packet data. However, this requires access to the information in the data packets. For packetized systems this may not be possible, since the information in the RTP packets is often encrypted (SRTP [6]). Then only the content receiver and RTP provider may have the possibility to decrypt the information.
So far the ITU-T Recommendations do not cover these systems, even though the need in the industry is high. In ITU-T Study Group 12 Question 14 (Q.14/12) the work on such model standards has started. Two types of areas were specified, with the names P.NAMS and P.NBAMS. In P.NAMS the models only have access to basic protocol information. In P.NBAMS the model area is extended to also cover access to and analysis of information from the bit stream. Since the bit stream is then available, there is no need for the results in this report when using models following P.NBAMS.
3.2 RTP
When sending data over networks, specific transport protocols are used. One of the most common is the real-time transport protocol (RTP). This protocol is used in applications for transmitting audio, video or simulation data over multicast or unicast networks (transmitting data with real-time properties). RTP uses the RTP control protocol (RTCP) to monitor quality of service and does not by itself provide any quality of service guarantees. RTP usually runs on top of another network protocol (typically UDP). Both protocols contribute to the transport protocol functionality. [7]
Information over networks is delivered in data packets. Limitations in network performance affect how fast and reliably packets can travel from sender to receiver. Packets can be of various sizes, and upper limits exist. Besides containing content information, the packets also have headers which incorporate important parameters. These parameters are used for delivery and specification of the information in the packets. Parameters can for example be IP addresses, content type parameters, compression formats, encryption etc.
3.2.1 RTP header
When sending data packets over a network protocol, the packets are encapsulated with new (extra) header information. The RTP protocol uses a number of header parameters [7]. These are: version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC and CSRC. The sequence numbers on the packets give the receiver the possibility to reconstruct the sender's packet sequence and, in the video decoding, to determine the proper location of packets. This removes the necessity to decode packets in the correct sequence. It can also be used to detect packet loss. The marker can be used to signal significant events in the packet stream, for example frame boundaries. The timestamp determines the sampling instant of the first octet in the RTP data packet. The clock increments monotonically and linearly, and its format is specified statically in the profile or payload format. The frequency depends on the data carried in the payload. Figure 3.2 shows a schematic view of the RTP header.
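The packet-loss use of sequence numbers can be sketched in Python as follows (a hypothetical helper for illustration; a real RTP stack also handles reordering and other state):

```python
def count_lost(seq_numbers):
    """Estimate the number of lost RTP packets from the 16-bit sequence
    numbers of the packets that did arrive, in order, allowing for
    wraparound at 65536.  A gap of 1 between consecutive packets means
    no loss; any larger gap counts the missing packets."""
    lost = 0
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        gap = (cur - prev) % 65536  # modular arithmetic handles wraparound
        lost += gap - 1
    return lost

print(count_lost([10, 11, 14]))       # 2 (packets 12 and 13 missing)
print(count_lost([65533, 65534, 1]))  # 2 (65535 and 0 lost across the wrap)
```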
3.3 Video streams
Pictures (frames in video streams) are built up from large amounts of data bits, and the information may exceed the capacity of one data packet. Thus frame information may be spread over multiple data packets. The timestamp (see 3.2) is based on the playback time, and packets belonging to the same video frame thus also have the same timestamp. Video streams are pictures (frames) with moving content sent in a rapid sequence. They put greater stress on the network than normal pictures, requiring numerous data packets to be sent in short intervals. As is the case with pictures, different compression techniques exist. Some examples of video standards are H.261, H.263, MPEG-2, MPEG-4 and H.264/MPEG-4
Figure 3.2: RTP header scheme [7]
AVC. In recent years the most common technique used in streaming high quality video is H.264.
3.3.1 Compression
Compression is used when sending or streaming data to reduce the data size (i.e. coding of images and video). There are two basic types of compression: lossless and lossy compression. [8]
1. Lossless compression: All information is kept and only knowledge about the source is needed. Size is reduced by increasing the information contained in every data bit sent. Various techniques exist, but limits exist depending on the entropy of the source.
2. Lossy compression: Gives better compression, but information is lost and knowledge about both the receiver and the source is required. Size reduction comes mainly from removing and distorting details and not requiring exact reconstruction in the receiver.
3.3.2 Image coding
Images are built up from pixels, which normally have three color values (RGB) dictating which color each pixel represents. For compression it is better to convert these values to another color space (YUV) and thus reduce the correlation. In this color space the pixels are represented by a luminance component (brightness) and two color values.
Most coding standards use the Discrete Cosine Transform (DCT). When doing DCT, a Fourier-related transform is used to turn the pixel values into frequency space and represent them with DCT coefficients. DCT-based coding is a lossy compression technique.
3.3.3 Video coding
In video coding the pictures are commonly divided into macro blocks, usually of size 16x16 pixels. There are two basic types of coding for these macro blocks: intra coding and inter coding. [9]
1. Intra coding (I): The macro block is coded like a still image block, with similar techniques such as DCT transformations. Figure 3.3 shows an intra coding scheme.
2. Inter coding (P): Similar macro blocks between the current and previous images are searched for. The macro block is then coded by a motion vector and a difference block. This requires less data than intra coding. Figure 3.4 visualizes the motion vector.
Figure 3.3: Intra coding
The frames in the video coding are arranged into different categories depending on how their macroblocks are coded.
1. I-pictures: Macroblocks are intra (I) coded.
2. P-pictures: Macroblocks can be P-coded or skipped.
The first picture in every videostream has to be an I-picture. Different standards have various schemes for the use of I- and P-pictures.
The amount of motion dictates the grade of compression, with less motion giving better compression and smaller data streams.
3.3.4 H.264
H.264, also called MPEG-4 Part 10, became an international standard in 2003 and has twice as good compression as the previous H.263 and MPEG-4 standards. It was developed by the Joint Video Team (JVT), which was a collaboration between the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). It is used by many internet streaming resources and software, but also in cable/satellite television services, Blu-ray players, real-time videoconferencing and more.
Figure 3.4: Inter coding
H.264 is more complex (up to five times) than previous video standards, but also gives a higher reduction in bitrate (20-50% compared to H.263) and works with both low and high bitrates. [9] The most important changes from previous standards are:
1. Usage of parameter sets instead of picture headers
2. Intra coding now results in 4x4 difference blocks with many different directional modes
3. Inter coding also results in 4x4 blocks, with many previous pictures used as references, more motion vectors and lower resolution
4. Integer transform instead of DCT transform
5. Prediction is used in coefficient coding
6. A loop filter is used
3.4 Quality assessment
When doing quality assessment of video streams, a measurement algorithm gives quality scores, MOS (Mean Opinion Scores), depending on the picture quality.
MOS scores come from the Mean Opinion Score test, which has been used for many years to obtain a user's view of general quality. The MOS score is represented by a number from 1 to 5, with 1 being bad (lowest perceived quality) and 5 being excellent (highest perceived quality). [10]
Subjective tests with real persons are the most correct and common way to measure MOS scores. Another way to do measurements is with full reference models, which use parametric models to make comparisons between the original video and the coded video. Yet another way is to use a parametric model which only uses the streamed (coded) video. Both methods need to learn from human users (subjective tests) which parameters affect the experienced quality. The MOS values can then be estimated from these parameters.
Video quality in video streams is highly correlated to the bit rate. A high bit rate makes it possible for the encoder to use more data for each frame, and thus less compression of the picture. Figure 3.5 shows the correlation of bit rate and video quality, measured in PEVQ values (see 3.5), for three video clips containing different contents.
Figure 3.5: Quality (PEVQ) vs bit rate for three different contents
For quality tests with humans it has been found that human memory affects the quality results, and thus content length has to be considered when undertaking tests. It has been proposed that test lengths no longer than twenty seconds be used. [12] As a result, the preferred length of clips used in Ericsson models is ten seconds.
3.5 PEVQ values
A PEVQ (Perceptual Evaluation of Video Quality) value is a type of MOS score, MOS-LQO (Listening Quality Objective), which was developed by OPTICOM and has become part of ITU-T Recommendation J.247 (2008). The MOS-LQO scale goes from 1 (worst) to 4.5 (best). [11] The actual PEVQ value can however go below 1 and even reach negative values because of limitations in the algorithm used.
PEVQ gives MOS estimates of video quality degradation by comparing the undistorted reference video signal with the streamed video signal. It is a full reference, intrusive measurement algorithm. The approach of PEVQ is to model the human visual system and quantify the anomalies perceived in the video signal by a number of key performance indicators (KPIs). This estimation includes both packet level impairments (loss, jitter) and signal related impairments (blockiness, jerkiness, blur etc.) caused by the coding of the video. [13] Figure 3.6 shows an overview of the structure of the PEVQ algorithm.
Figure 3.6: Basic structure of PEVQ measure algorithm [14]
3.6 Timeseries analysis
A time series is a sequence of values or measurements over time.
Successive values are
assumed to be taken at equally spaced time intervals.
Time series analysis builds on identifying the pattern in the
observations and use it to
describe the behavior of the sequence or forecast future
values.
Similarly to other statistic analyses the data is assumed to
consist of systematic patterns
and random noise. To extract the pattern filtering various
techniques to filter out the noise
exists. [15]
3.6.1 General analysis
Most patterns in time series can be explained by two basic components, trend and seasonality.
A trend is a component that changes over time and can be either linear or non-linear. It does not repeat itself over time.
Seasonality is a component that also changes over time, but unlike the trend it repeats itself at systematic intervals over time.
For trend analysis there exist no automatic techniques proven to work. The first step in most techniques is to remove the error component by smoothing the series. One of the most common techniques is the use of moving averages, which works by taking the average of surrounding values. Either the median or the mean can be used as the average, where the median method is more robust to outliers. The next step in trend analysis is fitting a function to the series; commonly a linear function is used. For this to work the series may need a transformation to remove nonlinearity, with a logarithmic or polynomial function.
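Moving-average smoothing with either a mean or a median window can be sketched as follows (Python for illustration; the thesis models were built in Matlab, and all names here are illustrative):

```python
import statistics

def moving_average(series, window=3, use_median=False):
    """Smooth a series with a centered moving window.  The median
    variant is more robust to outliers, as noted in the text; windows
    are truncated at the ends of the series."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half):i + half + 1]
        out.append(statistics.median(chunk) if use_median
                   else sum(chunk) / len(chunk))
    return out

# A single outlier (9) dominates the mean but not the median.
print(moving_average([1, 2, 9, 2, 1], window=3))
print(moving_average([1, 2, 9, 2, 1], window=3, use_median=True))  # [1.5, 2, 2, 2, 1.5]
```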
Seasonality analysis aims to find the correlation between values in the series. The period between repeats of a pattern in the series is called the lag. The correlation dependency between two terms can be measured by autocorrelation. [15]
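The sample autocorrelation at a given lag can be computed as in this Python sketch (illustrative only):

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`: the normalized covariance
    between the series and a copy of itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# A period-2 series correlates strongly with itself at lag 2.
s = [1, 5, 1, 5, 1, 5, 1, 5]
print(autocorrelation(s, 2))  # 0.75
```

A peak in the autocorrelation at some lag reveals a seasonal component with that period.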
3.6.2 ARIMA
ARIMA is used for generating forecasts in time series analysis. The basic methodology behind it is the estimation of sets of coefficients that can describe consecutive elements of the time series based on earlier, time-lagged elements.
The method is complex and comes with the condition that it requires stationarity of the time series. To use the model one needs to make the series stationary and remove serial (seasonal) dependency. This means differencing the series until stationarity is achieved. For good results the user needs to examine plots and autocorrelograms to find a suitable level of differencing.
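The differencing step mentioned above can be sketched as follows (Python, illustrative):

```python
def difference(series, order=1):
    """Difference a series `order` times (each pass replaces the series
    by the gaps between consecutive values), moving it toward the
    stationarity that ARIMA modeling requires."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# A linear trend vanishes after one difference, a quadratic after two.
print(difference([2, 4, 6, 8, 10]))      # [2, 2, 2, 2]
print(difference([1, 4, 9, 16, 25], 2))  # [2, 2, 2]
```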
3.6.3 Timeseries matching
This method considers the geometric properties of time series. The basic concept is to measure the distance (a value used to describe similarity) between the geometric series. This is preferably done by transforming the series into a suitable basis and comparing the distance between them with various methods. Examples of bases used are the Fourier transform (DFT), the wavelet transform (DWT), principal components (PCA), piecewise quantized, symbolic and vector quantization (VQ). The simplest measurement of distance uses the Euclidean distance, which measures only between fixed time positions in both series. Other forms of measurement include dynamic time warping (DTW) and longest common subsequence (LCSS). Both of these methods measure with variable time. [16]
Piecewise constant and symbolic basis
A piecewise quantized basis starts with the series being divided into k segments. All points inside a segment are represented by their mean value.
This quantization can be done in MatLab through the functions reshape [21] and mean [22]. A code example is shown below.

Ksegments = reshape(DataFrames, SegmentLength, NumberCoefficients);
Kmeans = mean(Ksegments);

SegmentLength: length of the segments
NumberCoefficients: number of segments
These mean values can also be quantized into levels and exchanged for the symbolic counterpart representing the level (symbolic quantization).
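Symbolic quantization can be sketched as follows (Python for illustration; the thresholds and alphabet are arbitrary examples, not the thesis' actual levels):

```python
def to_symbols(means, levels):
    """Symbolic quantization: map each segment mean to a letter by
    counting how many ascending level boundaries it reaches."""
    alphabet = "abcdefgh"
    symbols = []
    for m in means:
        idx = sum(1 for boundary in levels if m >= boundary)
        symbols.append(alphabet[idx])
    return "".join(symbols)

# Two boundaries give a three-letter alphabet: a (<1.0), b, c (>=3.0).
print(to_symbols([0.2, 1.7, 3.4, 0.9], levels=[1.0, 3.0]))  # "abca"
```

The resulting string representation is what pattern-matching methods such as frequent pattern discovery can operate on.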
Euclidean distance
This is the most widely used distance measurement method. For two series X = (x_1, ..., x_n) and Y = (y_1, ..., y_n) the definition is expressed in equation (3.1):

D(X, Y) = sqrt( sum_{i=1}^{n} (x_i - y_i)^2 )    (3.1)
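In code, the Euclidean distance between two equal-length series amounts to (Python sketch):

```python
import math

def euclidean_distance(x, y):
    """Point-by-point Euclidean distance between two equal-length
    series: values are compared only at fixed time positions."""
    assert len(x) == len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance([1, 2, 3], [1, 2, 7]))  # 4.0
```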
3.6.4 Frequent pattern discovery
Specific patterns of variable length can have higher chances of occurring in a series depending on its parameters. The occurrence of patterns can be measured against database patterns. Thus classification can be done through similarity comparisons.
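A minimal Python sketch of counting pattern occurrences in a symbolically quantized series (the thesis' actual frequent pattern method is more elaborate; this only illustrates the counting step):

```python
def pattern_counts(symbolic_series, length):
    """Count every subsequence of the given length in a symbolic series.
    The resulting counts can be compared against a database of reference
    patterns for similarity-based classification."""
    counts = {}
    for i in range(len(symbolic_series) - length + 1):
        pat = symbolic_series[i:i + length]
        counts[pat] = counts.get(pat, 0) + 1
    return counts

print(pattern_counts("abcabca", 3))  # {'abc': 2, 'bca': 2, 'cab': 1}
```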
3.7 Regression analysis
Regression is the statistical method of finding a relationship between a number of variables that predicts an outcome [17]. Regression can use one or many (multiple regression) independent variables (predictor variables) to predict a dependent variable (response variable), which is the outcome.
Linear regression (LR) fits the straight line that best approximates the individual data points as a function of the independent variable. Other forms of relationship between the predictor and the response variable can also be used, e.g. quadratic or logarithmic.
The general form of an ordinary linear regression is shown in
equation (3.2).
Y = a + b×X + u (3.2)

Where Y is the dependent variable, a is the intercept, b is the slope, X is the independent variable and u is the residual.
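The line of equation (3.2) is typically fitted by ordinary least squares. The closed-form slope and intercept formulas in this Python sketch are the standard ones, not taken from the thesis.

```python
# Ordinary least squares fit of Y = a + b*X + u.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard OLS formulas: b = cov(X, Y) / var(X), a = mean(Y) - b*mean(X).
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b  # intercept and slope

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # data generated by Y = 1 + 2X
print(a, b)  # 1.0 2.0
```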
The slope b is the parameter that indicates how much a change in the predictor variable affects the response variable.
The accuracy of the prediction depends on the model's fit, i.e. how well the linear relationship corresponds to the data points. A number of values can be calculated to quantify this fit.
The residuals are an important marker of how good the regression model is. A residual is the difference between the model prediction and the actual observation at that value of the independent variable. Residual plots can reveal patterns or misbehavior in models.
The R-squared value is another important measure. It says how well the model fits the data as a whole, summarizing the residuals into a single number.
Through a statistical F-test, the hypothesis that the model explains the relationship (effect different from zero) can be evaluated. The outcome of the test is an F-value that corresponds to a p-value. Depending on whether the p-value is below or above the requested statistical significance level, the hypothesis is either kept or rejected.
There are various forms of regression, two of them are multiple
linear regression and
logistic regression.
3.7.1 Multiple linear regression
As with ordinary LR, multiple linear regression predicts a dependent variable from independent variables. In this case there can be multiple independent variables. In MLR, different relationships between the independent variables can create new predictor variables in the model.
Multiple linear regression takes a similar form to ordinary linear regression (3.3).

Y = a + b1×X1 + b2×X2 + b3×X3 + ... + bt×Xt + u (3.3)

Where the subscripts indicate the number of the independent variable and its corresponding parameter.
The same statistical parameters can be found in multiple LR as in ordinary LR. In the same way, t-test statistics can be carried out on all the b values. If the corresponding p-value is outside the sought significance level, the b value can be dropped.
Multiple linear regression can be done in MatLab through the regstats function. [19]
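A multiple linear regression fit can be sketched via the normal equations; this Python version is an illustration alongside the MatLab regstats route the thesis uses, with a tiny Gaussian elimination standing in for a linear-algebra library.

```python
# MLR by normal equations: solve (X'X) beta = X'y for beta = [a, b1, b2, ...].

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_mlr(rows, ys):
    """rows: list of predictor tuples; returns [a, b1, b2, ...] for
    Y = a + b1*X1 + b2*X2 + ... as in equation (3.3)."""
    X = [[1.0] + list(r) for r in rows]  # prepend intercept column
    p = len(X[0])
    XtX = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(p)]
           for r in range(p)]
    Xty = [sum(X[i][r] * ys[i] for i in range(len(X))) for r in range(p)]
    return solve(XtX, Xty)

# Data generated exactly by Y = 1 + 2*X1 + 3*X2, so the fit recovers it.
coeffs = fit_mlr([(1, 2), (2, 1), (3, 3), (4, 5)], [9, 8, 16, 24])
print([round(c, 6) for c in coeffs])  # [1.0, 2.0, 3.0]
```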
3.7.2 Logistic regression
Logistic regression is another type of regression which builds upon
the theory of generalized
linear models.
The difference from other linear regressions is that the response variable can be discrete. It can then only attain certain values, and the outcome of the logistic regression model is a prediction of which value fits best. The discrete values can also be viewed as categories. For logistic regression there is no equivalent to R-squared; models can instead be compared with another measurement called the deviance of the fit, which is the difference between the log-likelihood of the fitted model and the maximum possible log-likelihood.
Logistic regression can be done in MatLab through the mnrfit function. [20]
3.8 Cross-validation
Repeated random sub-sampling validation randomly splits the data into a validation set and a training set. The model is fitted with the training data and assessed with the validation data. [18] Cross-validation is done to estimate how accurately a predictive method performs in practice and to guard against type III errors. Such errors can occur if the same data is used both to build and to test the hypotheses.
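The split described above can be sketched as follows; the ten-percent validation fraction mirrors the setup used later in the thesis, while the function and parameter names are illustrative.

```python
import random

# Repeated random sub-sampling: each run holds out a random fraction of
# the items for validation and trains on the rest.

def random_split(items, validation_fraction=0.1, rng=None):
    rng = rng or random.Random()
    shuffled = items[:]
    rng.shuffle(shuffled)
    k = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[k:], shuffled[:k]  # training set, validation set

clips = list(range(300))  # around 300 clips, as in the thesis dataset
training, validation = random_split(clips, 0.1, random.Random(0))
print(len(training), len(validation))  # 270 30
```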
Chapter 4
Results
4.1 Algorithm description
4.1.1 Overview
Through the header timestamps of packets in media streams, the frames can be assembled into a sequence consisting of the frame sizes and their time order. These frame sequences consist of two interleaved series: every even or odd frame follows its own pattern, often such that the odd frames are very small while the even frames are big, or the reverse. A way to handle this behavior is to merge each pair of frames into a new frame consisting of both their sizes. These merged frames form a new frame sequence, which is subsequently analyzed with regard to the frame sizes (FS).
Also used in the analysis is the difference between two consecutive frames, which is put into a new sequence consisting of the frame difference sizes and their corresponding time order (FDS).
The sequences are analyzed and converted into predicted MOS scores by a combination of three methods: frequent pattern analysis and two types of regression analysis.
4.1.2 Frequent pattern discovery (FPD)
The frequent pattern discovery method estimates MOS scores for sequences by assigning them to categories. This categorization is done by searching for frequently occurring patterns in each sequence and comparing these with the patterns belonging to each category.
Each category consists of an upper and a lower PEVQ limit, a PEVQ mean and a number of common patterns. These patterns are taken from frame sequences whose PEVQ scores fall inside the PEVQ interval of the category. The patterns consist of symbols, each indicating an interval of values.
When categorizing a frame sequence, the PEVQ mean of the category with the highest similarity between the patterns in the sequence and the category is assigned to the sequence as a prediction of its MOS value.
For better prediction, different pattern lengths can be used, and both the FS and FDS sequences can be used.
The method therefore consists of two parts: creation of the database patterns and analysis of a frame sequence.
Creation of database
Inputs to this method for creation of database patterns are the frame sequences (both ordinary frame size sequences and frame difference size sequences), the number of categories to be used, which symbol levels to use, how big the segment lengths are going to be and the number of symbols in the patterns (the pattern length).
The first step of this method is to decide the category PEVQ intervals from the PEVQ scores of the frame sequences given as input. This is done by evenly distributing the sequences over the number of categories to be used. The frame sequences are then put into their respective categories according to their PEVQ values.
All frame sequences are reduced in dimensionality by breaking the sequence into segments, averaging the values over each segment and labeling them with a symbol. Labeling is done by comparing values to a symbol database with upper and lower limits for each symbol. If the value is inside the symbol's range, that symbol is used. The compressed sequences are then arranged into patterns, which can be of various lengths, for example: ABA, AABC, ABCDE.
The next step is to compare and count patterns in the frame sequences belonging to each category. Only the most common patterns in each category are kept. A pattern can only be used in one category, so all patterns in a category must be unique. The final step is to make all categories equal in length (equal number of patterns).
The output of the method is the categories and their corresponding patterns. These patterns are for both FS and FDS sequences and can be of various lengths.
An example of a pattern output is:
Category with 3 symbols FS patterns =
’LHF’ ’HFF’ ’FFE’ ’FEE’ ’EEE’ ’EED’ ’EDB’ ...
Figure 4.1 shows how a frame sequence gets compressed and arranged
in specific symbol
levels.
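The database-creation step above can be condensed into a Python sketch: each category's symbolized sequences are scanned with a sliding window, the most frequent patterns are kept, and a pattern may belong to only one category. The window length, category names and per-category sizes are illustrative.

```python
from collections import Counter

def extract_patterns(symbol_sequence, length):
    """All length-`length` substrings (patterns) of a symbolized sequence."""
    return [symbol_sequence[i:i + length]
            for i in range(len(symbol_sequence) - length + 1)]

def build_database(category_sequences, pattern_length, patterns_per_category):
    database, used = {}, set()
    for category, sequences in category_sequences.items():
        counts = Counter()
        for seq in sequences:
            counts.update(extract_patterns(seq, pattern_length))
        kept = []
        for pattern, _ in counts.most_common():
            if pattern not in used:  # patterns must be unique across categories
                kept.append(pattern)
                used.add(pattern)
            if len(kept) == patterns_per_category:
                break
        database[category] = kept
    return database

db = build_database({"low": ["AABBA", "ABBAB"], "high": ["EDEED", "DEEDE"]},
                    pattern_length=3, patterns_per_category=2)
print(db)
```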
FPD analysis of a frame sequence
Inputs to the method are the pattern categories, the symbol levels and segment lengths used by the database, and the frame sequence to be analyzed.
Figure 4.1: Compressed frame sequence
The method counts the number of similar patterns in each category. This is done for all the pattern types and lengths in the database. The category with the most hits is chosen as the category estimate for the specific pattern type and length.
Output from the method is a category estimate for each of the
different types of patterns
and pattern lengths used.
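The analysis step can be sketched as follows: the incoming sequence's patterns are counted against each category's database patterns and the category with the most hits wins. The `database` shape and names here are illustrative.

```python
# Categorize a symbolized frame sequence against a pattern database.

def categorize(symbol_sequence, database, pattern_length):
    patterns = [symbol_sequence[i:i + pattern_length]
                for i in range(len(symbol_sequence) - pattern_length + 1)]
    hits = {category: sum(p in category_patterns for p in patterns)
            for category, category_patterns in database.items()}
    return max(hits, key=hits.get)  # category with the most pattern hits

database = {"low": ["ABB", "BBA"], "high": ["EDE", "DEE"]}
print(categorize("AABBABB", database, 3))  # low
```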
4.1.3 Regression methods
Two regression methods are used: logistic regression and multiple linear regression. Both build upon creating statistical models from a number of statistical parameters extracted from the frame sequences.
Both methods use statistics extracted from the frame sequences by the statistic extraction function.
Statistic extraction function (ST)
This function takes a frame sequence and extracts various statistical parameters from it. The input is the frame sequence to extract statistics from; this can be either an FS or an FDS sequence.
1. Mean of sequence (M)
2. Variance of sequence (V)
3. Standard deviation of sequence (SD)
4. Modes of sequence (MO)
5. Median of sequence (ME)
6. Longest calm period of sequence (LC). Defined as the length of the longest period where the difference between two subsequent frames is smaller than 0.1 times the mean of the sequence.
7. Longest active period of sequence (LA). Defined as the length of the longest period where the difference between two subsequent frames is greater than 0.0025 times the mean of the sequence.
8. Longest small period of sequence (LS). Defined as the length of the longest period where frames are smaller than 0.9 times the mean of the sequence.
9. Longest large period of sequence (LL). Defined as the length of the longest period where frames are greater than the mean of the sequence.
10. Number of bursts in sequence (NB). Defined as the number of times the difference between two subsequent frames is greater than 0.045 times the mean of the sequence.
11. Number of passes through median of sequence (NP). Defined as the number of times the sequence goes from greater to smaller than the mean of the sequence or the reverse.
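A partial Python sketch of the statistic extraction function (ST): three of the parameters are shown, using the thresholds defined in the list above. The function names are illustrative, and "difference" is interpreted here as an absolute difference.

```python
# Three of the ST parameters: LC (longest calm period), NB (number of
# bursts) and NP (passes through the sequence mean).

def longest_calm_period(frames):
    mean = sum(frames) / len(frames)
    longest = current = 0
    for prev, cur in zip(frames, frames[1:]):
        current = current + 1 if abs(cur - prev) < 0.1 * mean else 0
        longest = max(longest, current)
    return longest

def number_of_bursts(frames):
    mean = sum(frames) / len(frames)
    return sum(abs(cur - prev) > 0.045 * mean
               for prev, cur in zip(frames, frames[1:]))

def passes_through_mean(frames):
    # The thesis names this "passes through median" but defines it
    # against the mean of the sequence; the mean is used here.
    mean = sum(frames) / len(frames)
    return sum((prev - mean) * (cur - mean) < 0
               for prev, cur in zip(frames, frames[1:]))

frames = [10, 10.2, 10.1, 30, 10, 10.4, 29, 10]
print(longest_calm_period(frames), number_of_bursts(frames),
      passes_through_mean(frames))
```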
Linear regression method (LR)
This method uses multiple linear regression analysis to obtain a statistical model for MOS estimation of a frame sequence. A dataset of frame sequences, their corresponding statistical parameters and their PEVQ scores is used to create a regression model where the statistical parameters are the predictor variables (independent variables) and the PEVQ score is the response variable (dependent variable).
Input to the method is the statistics extracted from the frame sequences in ST. An example of statistical parameters for a good LR regression model is shown in table 4.1 together with the corresponding regression statistics. The adjusted R-squared value for this LR regression is 0.3114.
The output of the method is a statistical model, which takes the statistical parameters as input and gives a MOS estimate as output.
Example output from the LR regression is shown in equation (4.1).
MOS = 1.7647 + 0.0003×ME − 0.0035×LC + 0.0022×LA − 0.0017×LS + 0.0082×LL + 0.0109×NP (4.1)
Parameter                                             T-statistic   P-value
Standard deviation of frame difference sequences      2.7514        0.0063
Median of frame sequences                             3.6444        0.0003
Longest calm period of frame sequences                -2.4016       0.0170
Longest active period of frame sequences              2.4515        0.0148
Longest small period of frame difference sequences    -3.1349       0.0019
Longest large period of frame sequences               4.3367        0.0000
Number of passes through median of frame sequences    3.5316        0.0005
Total model (F-statistic)                             18.477        2.872e-20

Table 4.1: LR regression parameters
Logistic regression (LOR)
The logistic regression method uses multinomial logistic regression to categorize frame sequences and then decides a MOS estimate in a similar way to the FPD method. The method uses an ordinal model for the fit and no interactions between categories.
As in linear regression, a number of statistical parameters are used as input to the regression (independent variables). Also used as input are the same categories used in the creation of database patterns; these are the dependent variables in the regression.
An example of statistical parameters for a good LOR regression model is shown in table 4.2 together with the corresponding regression statistics. The deviance for this LOR regression is 846.79.
Parameter                                             T-statistic   P-value
Mean of frame difference sequences                    -0.0347       0.9723
Variance of frame sequences                           -2.1505       0.0315
Standard deviation of frame sequences                 -3.5486       0.0004
Median of frame sequences                             2.7104        0.0067
Longest calm period of frame sequences                -2.3708       0.0177
Longest active period of frame sequences              1.6882        0.0914
Longest large period of frame sequences               -4.4320       0.0000
Number of passes through median of frame sequences    -4.1254       0.0000

Table 4.2: LOR regression parameters
The output is a statistical model. It takes statistical parameters as input and calculates a category intercept value (CIV), which is compared to the intercept values of the categories (also given by the method) to estimate a category for the frame sequence. The number of intercept values depends on the number of categories used. In the case of five categories, four intercept values are provided, each corresponding to the upper level of the real MOS category.
Example output from the LOR regression is shown below.
Intercept values = [1.7580, 2.9797, 4.0756, 5.3036]
Mean PEVQ of categories = [2.25431, 2.76333, 3.07321, 3.32914,
3.67939]
4.1.4 Combining the methods
There are various ways of combining the methods. The combinations all build upon taking the mean value of their estimations (categories or PEVQ values).
One way is to combine the frequent pattern discovery and the logistic regression into one category estimation. This category estimation gives a MOS value through the category's PEVQ mean value. The mean of this MOS estimate and the linear regression's MOS estimate is then used as the prediction of the MOS score of the frame sequence in question.
4.1.5 Algorithm stepwise
Here follows a flow chart description of the steps in the
algorithm.
Setup for the algorithm:
1. Build category database consisting of patterns from frame
sequence content
2. Create both linear and logistic regression models from frame
sequence content
Analysis of an incoming frame sequence:
1. Extract patterns from the frame sequence
2. Extract statistical parameters from frame sequence
3. Run FPD on the patterns to get category estimations
4. Put statistical parameters in logistic regression to get a
category estimation
5. Put statistical parameters in linear regression to get a MOS
estimation
6. Combine the category estimations from each method to get a more accurate estimation
7. Use the MOS mean from the category estimations as a MOS estimate
8. Combine the MOS estimations from 5. and 7. to achieve the best
MOS prediction
Figure 4.2 shows the flow of the algorithm.
Figure 4.2: Flow of algorithm
4.1.6 Algorithm for all methods
Assembling the methods can be done in many different ways. The following list gives some example configurations. All forms use three different pattern lengths (3, 4 and 5 symbols). Example steps of merging (algorithm version 1):
1. Mean PEVQ of every pattern categorization using only FS
series.
2. Mean of 1. and LOR MOS value.
3. Mean of 2. and LR MOS value.
Version 2 of the algorithm is to skip step two and take the mean of the FPD and LOR values directly.
Version 3 adds the usage of categorizations from FDS series in step 1.
4.1.7 Algorithm with only regressions
Another way is to combine only the regression methods into an algorithm.

MOS = mean(LR MOS, LOR MOS)
4.1.8 Algorithm with modeled regression
Since the statistical parameters in the regression methods have to be recalculated for every bit rate, either new regressions need to be carried out for every new bit rate sequence considered, or a huge database has to be built. Instead of doing this, the statistical parameters used in the regression can be modeled.
The equations (4.3) (4.4) (4.5) (4.6) (4.7) (4.8) are used to parameterize the regression parameters. Input to the equations is the bit rate in question.
C1 × X^C2 (4.3)
C1 × X^C2 + C3 × X^C4 + C5 × X (4.8)
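Evaluating the parameter models amounts to plugging the bit rate X into the fitted equation forms; this sketch implements the two forms quoted above, using as example coefficients the "Longest large period" row of table 4.3 and the intercept 1 row of table 4.4.

```python
# The two quoted parameter-model forms: each regression parameter
# becomes a function of the bit rate X with fitted coefficients.

def eq_4_3(x, c1, c2):
    return c1 * x ** c2

def eq_4_8(x, c1, c2, c3, c4, c5):
    return c1 * x ** c2 + c3 * x ** c4 + c5 * x

bitrate = 300  # kb/s
print(eq_4_3(bitrate, 1.71, -0.90))                      # table 4.3 coefficients
print(eq_4_8(bitrate, 35.25, -3.295, -0.0063, 1.17, 0.021))  # table 4.4 coefficients
```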
Table 4.3 shows the equation type and the corresponding coefficients used to parameterize the LR regression parameters. Table 4.4 shows the same for the LOR regression using five categories.
Parameter                                             Equation   Model coefficients
LR regression parameter                               (4.6)      -0.0044, 1.17, 0.018
Standard deviation of frame difference sequences      (4.6)      4.64, -1.63, 2.88e-10
Median of frame sequences                             (4.4)      2.31e-04, -1.78e-07
Longest calm period of frame sequences                (4.6)      -0.096, -0.64, 6.83e-07
Longest small period of frame difference sequences    (4.6)      -0.22, -0.60, 1.34e-07
Longest large period of frame sequences               (4.3)      1.71, -0.90
Number of passes through median of frame sequences    (4.3)      0.41, -0.68

Table 4.3: LR modeled parameters
Observe that the parameter "Longest active period of frame sequences" has been removed. This is a result of the parameter showing bad behavior with increased bit rate, and thus the difficulty of finding a suitable model.
For figures showing the correct and modeled values of the regression, see Appendix A.
4.2 Algorithm numerical results
Error reduction
To test whether implementing the new algorithm into the existing model would give a better prediction of the media streams' coding complexity, a comparison of the errors in MOS prediction between the existing and the new model was made.

Parameter                                             Equation   Model coefficients
LOR category intercept 1                              (4.8)      35.25, -3.295, -0.0063, 1.17, 0.021
LOR category intercept 2                              (4.8)      -0.023, 1.10, 0.32, -8.64, 0.049
LOR category intercept 3                              (4.8)      8.57e-11, 3.39, -0.91, 1.01, 0.98
LOR category intercept 4                              (4.8)      3.06e-11, 3.51, 0.63, 0.98, -0.56
Mean of frame difference sequences                    (4.3)      -4.08e+02, -1.99
Variance of frame sequences                           (4.3)      -0.33, -2.26
Standard deviation of frame sequences                 (4.5)      45.45, -1.69, 6.20e-04
Median of frame sequences                             (4.5)      0.0015, 0.16, -0.0047
Longest calm period of frame sequences                (4.7)      -11.92, 0.15, 11.93, 0.15
Longest small period of frame difference sequences    (4.4)      0.017, -5.39e-06
Longest large period of frame sequences               (4.5)      -1.18, -0.014, 1.053
Number of passes through median of frame sequences    (4.4)      -0.034, 1.0057e-05

Table 4.4: LOR modeled parameters
The existing model used the mean of the PEVQ values of all the
video streams at a
particular bit rate as the MOS estimate for a test sequence. The
error for the model could
thus be calculated as:
Error new model (EN) = MOS - PEVQ
One way to show the improvement is to calculate how big the error reduction was relative to the total error of the existing model (EE), giving the error percentage reduction (EPR):

EPR = (EE - EN) / EE (4.9)
(Negative EPR would instead mean an increase in error.)
With only around 300 video clips available for testing, a cross-validation technique was used to give the results more statistical significance. This was done by randomly selecting ten percent of the clips as test objects, while the rest were used as training material for the algorithm.
RMS Error
This summarises the overall error between the estimated MOS and the PEVQ value. Since the differences can be either positive or negative, the RMSE gives a good measurement of how far the error is from zero on average. [23] The equation to calculate the RMSE is shown in (4.10).

RMSE = sqrt((d1^2 + d2^2 + ... + dn^2) / n) (4.10)

Where d are the differences and n is the number of differences.
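Equation (4.10) translates directly into code; d holds the differences between estimated MOS and PEVQ.

```python
import math

def rmse(differences):
    """Root mean square error of a list of differences (equation 4.10)."""
    n = len(differences)
    return math.sqrt(sum(d ** 2 for d in differences) / n)

print(rmse([0.3, -0.4, 0.0, 0.5]))  # sqrt((0.09 + 0.16 + 0 + 0.25) / 4)
```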
Error residuals
Another way to show how well the algorithms are capable of predicting MOS scores is to calculate the residuals between the correct PEVQ scores and the estimated MOS scores.

Error residual = PEVQ - MOS (4.11)
4.2.2 Results in numbers
The error reduction (using (4.9)) achieved by implementing the various algorithms is shown in table 4.5 and figure 4.3.
Model / Bitrate           150kb/s   200kb/s   250kb/s   300kb/s   350kb/s   400kb/s
LR                        15.51%    16.70%    12.80%    17.56%    12.52%    11.83%
LOR                       14.94%    12.70%    14.60%    17.32%    15.54%    14.14%
FPD                       1.76%     3.11%     3.41%     9.35%     10.35%    5.92%
Combined methods V1       18.31%    16.84%    18.04%    20.02%    18.52%    12.92%
Combined methods V2       18.47%    16.18%    18.21%    20.15%    19.50%    13.41%
Combined methods V3       18.31%    16.35%    17.72%    19.92%    19.06%    13.40%
Regression combination    18.71%    17.61%    15.87%    19.02%    15.73%    14.50%

Table 4.5: Error reduction from methods versus bitrates
Error residuals (using (4.11)) for the regression methods are shown in figures 4.4, 4.5, 4.6 and 4.7 (with simulations using 10% of the data as validation clips and 10 cross-sampling runs).
A comparison of the spread of the error residuals (difference between correct and estimated PEVQ values) is shown in figure 4.8 for both the existing model and the new model using algorithm V2.
Table 4.6 shows the error reduction for the algorithms with combined methods and modeled regressions, including high bitrates.
Model / Bitrate        150kb/s  200kb/s  250kb/s  300kb/s  350kb/s  400kb/s  600kb/s  800kb/s  1500kb/s  Mean*
All methods V1         20.6%    17.9%    19.8%    20.5%    16.2%    15.0%    19.4%    18.4%    11.8%     18.5%
Reg combination        20.9%    17.5%    15.6%    15.1%    17.4%    13.4%    17.9%    14.5%    11.3%     16.8%
LR modeled             17.12%   14.48%   7.19%    5.53%    9.08%    10.36%   3.11%    -70.32%  -700.4%   9.6%
LOR modeled            13.5%    15.69%   13.72%   11.22%   12.6%    9.7%     14.8%    12.4%    -70.61%   13.0%
Regressions modeled    23.2%    20.1%    18.9%    16.9%    18.7%    15.8%    12.2%    -12.2%   -399.8%   18.0%

* The extreme bitrates of 800kb/s and 1500kb/s are disregarded in the mean, since they resulted in negative values for certain methods.

Table 4.6: Error reduction with algorithms versus bitrates
Figure 4.3: Error reduction versus bitrates for selected
algorithms
Table 4.7 shows the RMSE for the algorithms with combined methods and modeled regressions, including high bit rates.
In figure 4.9 the difference between the RMSE for the reference model and the modeled regression model is shown for various bitrates.
Figure 4.4: Residuals of old regression model
Model / Bitrate        150kb/s  200kb/s  250kb/s  300kb/s  350kb/s  400kb/s  600kb/s  800kb/s  1500kb/s  Mean*
LR                     0.5938   0.5412   0.4734   0.4337   0.3985   0.3545   0.2858   0.2233   0.1137    0.4130
LOR                    0.6106   0.5302   0.4970   0.4403   0.4141   0.3729   0.2998   0.2299   0.1139    0.4243
Reg combined           0.5809   0.5185   0.4710   0.4242   0.3976   0.3566   0.2842   0.2197   0.1107    0.4066
All methods V1         0.5752   0.5116   0.4617   0.4030   0.3990   0.3504   0.2753   0.2158   0.1130    0.3990
LR modeled             0.6022   0.5362   0.5028   0.4681   0.4208   0.3627   0.3367   0.4062   0.8522    0.4545
LOR modeled            0.6693   0.5835   0.5125   0.4648   0.4243   0.3781   0.3004   0.2273   0.2124    0.4450
Regressions modeled    0.5856   0.5310   0.4774   0.4340   0.3912   0.3490   0.3043   0.2810   0.5205    0.4192
Reference model        0.6777   0.6169   0.5499   0.4966   0.4670   0.4086   0.3468   0.2620   0.1275    0.4782

* The extreme bitrates of 800kb/s and 1500kb/s are disregarded in the mean.

Table 4.7: RMSE for algorithms versus bitrate
Figure 4.5: Residuals of linear regression model
Figure 4.6: Residuals of logistic regression model
Figure 4.7: Residuals of combined regression model
Figure 4.8: Residual comparison at 300kb/s bitrate between existing
model and new model using algorithm V2
Figure 4.9: RMSE reference vs RMSE modeled regression for various
bitrates
Chapter 5
Conclusions
5.1 Data analyzed
When examining frame sequences of video streams of different coding complexities (see Appendix B), one could observe that, in general, more changes (jumps) in frame size are seen in frame sequences belonging to high quality scores. Smooth curves are more apparent in groupings of lower quality scores. However, no definite conclusion could be drawn about a specific frame sequence, since all types of curves exist for all quality scores. There was no visual clue that could give a definite answer as to which quality category a frame sequence fits into, only various levels of probability. This could be a result of parameters not visible in the frame sequence statistics and only visible in the actual packet data.
5.2 Methods analyzed
A selection of different mathematical methods was examined and tested in the search for a useful algorithm. The methods evaluated were:
– Field of time series analysis
• Trend and cyclical behavior methods
• Time series matching
• Frequent pattern discovery
– Field of regression
• Multiple linear regression
5.2.1 Trends and cyclical behaviour methods
In the time series area, the trend and cyclical behavior methods were ruled out. This was a result of video streams belonging to different PEVQ groups showing little common cyclical behavior or trend, which in turn is a result of the randomly picked time intervals in the video streams. Furthermore, it is problematic to develop an algorithm that works automatically without user input with these methods.
5.2.2 Time series matching
With time series matching, three different types were tried: Euclidean, DTW and lower bounding. One issue was the big dissimilarities between time series in the same PEVQ categories, but the biggest problem was time and resource management. To measure matches between time series, a large database needs to be created. More problematic is that measuring the distance between time series takes much computational power. Algorithms incorporating for example the DTW algorithm had running times of minutes even on a modern stationary computer (dual-core 2.4 GHz, 4 GB RAM). It would thus not be possible to incorporate these methods in the parametric models used in, for example, handheld devices doing media streaming; the hardware would not be adequate with today's technology.
5.2.3 FPD
Frequent pattern discovery (FPD) works without user input and can select and match patterns regardless of where they occur, which removes some of the limitations of the other time series methods. Even though the method requires creating a pattern database for each bitrate and uses comparisons of patterns in the estimation, it takes much less time than the time series matching methods. The pattern databases are efficient and small due to the dimensionality reduction, and since only symbol comparisons are used, the measurements are done much faster than distance measurements. Running times were in the range of ten seconds on the same setup as was used with the time series matching methods.
The FPD method gives clear improvements in estimating the coding complexity compared to the old model. Error and RMSE reductions of up to and above 10% could be achieved. The results were not as good as for each of the individual regression methods, but in conjunction with those methods it improves the results.
5.2.4 Regression
The regression methods achieved the highest error and RMSE reductions and are very efficient in time and data space consumption. Multiple linear regression (LR) gives a slightly better MOS prediction of the stream's actual PEVQ value, but the logistic regression (LOR) is easier to use in combination with the FPD method (both returning categorizations). The combined regression method also reached very good results, around 90% of the error reduction achieved when using all methods (Combined methods V1).
5.2.5 Analysis of statistic parameters
Following is a discussion of the statistical parameters for each regression.
Linear (LR)
1. Median of frame sequence: Higher values indicate higher MOS scores. Seems to show better linearity than the mean of the frame sequence. The median does not account for big changes (especially such as could happen at the beginning or end of a sequence as a result of scene changes) as much as the average does. This makes the median a more stable and thus more reliable parameter.
2. Longest calm period of frame sequences: Generally, long calm regions result in lower MOS scores. This might be due to long periods of non-changing frame sizes already using the maximum available number of bits per frame, which might indicate that the encoder needs more bits per frame to encode these frames with good quality.
3. Longest active period of frame sequence: Similar to the previous one but reversed, now checking for active regions. High values indicate that the encoder gets to work a lot but can still use enough bits per frame while maintaining decent quality.
4. Longest small period of frame difference sequences: Small values in the frame difference sequence indicate few changes between frames, i.e. less activity. A long period results in easier coding and a lower MOS score. It measures the same thing as the longest calm period, but improves the results when used in the model.
5. Longest large period of frame sequences: A long period of large frame sizes indicates a higher MOS score. The relationship is not as obvious as with other parameters.
6. Number of passes through median of frame sequences: This measures whether the following frame is very similar to the preceding frame or not. It indicates when frames are so similar that very few bits (for example just after an I-frame) are needed to encode the frame.
7. Standard deviation of frame difference sequences: Indicates how much each frame differs from the next. A high standard deviation means both small and large differences; a low standard deviation means that the differences between the frame sizes are in general either small or large.
All parameters in the LR regression had p-values below the 5% significance level, which is a commonly used level. In fact they were all below 2%, and the p-value for the total model was well below 1%. The R-squared value of the model was only around 31%. This is a fairly low value and indicates that only a third of the variance is explained by the model. However, this is not strange considering the properties of the data and the impossibility of reaching perfect estimations (see the discussion about frame sequences in 5.1).
Logistic (LOR)
1. Mean of frame difference sequences: Shows good correlation with the MOS score; a greater mean value indicates a higher MOS score. This could be because a high difference indicates that the difference between I- and P-frames is large, which might ease the workload on the encoder.
2. Variance of frame sequences: Quadratic relationship; low and high MOS scores seem to have higher values of variance than medium MOS scores. Perhaps the coder is less efficient for certain events happening in the frame pictures at the extreme ends of the coding complexities.
3. Standard deviation of frame sequences: Same as for variance of
frame sequences. This
is actually the same parameter modeled twice. Nevertheless it
improves the results.
4. Median of frame sequences: See linear.
5. Longest calm period of frame sequences: See linear.
6. Longest active period of frame sequences: See linear.
7. Longest large period of frame sequences: See linear.
8. Number of passes through median of frame sequences: See
linear.
For the LOR regression, all but one statistical parameter were at the 1% significance level. However, the parameter "Mean of frame difference" had a really high p-value of 90%. This would indicate that the parameter does not belong in the regression model. However, thorough testing showed that this parameter improved the results of the model, and it was thus kept. The deviance was low compared to other configurations of statistical parameters, which indicates that a relatively good model was found.
5.3 Discussion of results
The increased precision in estimating the MOS score of a video stream after incorporating the new algorithms into the old model can thus be estimated to around 10-23%, depending on bit rate and the methods used.
The PEVQ module Oqtopus uses had problems returning proper PEVQ scores for low bitrates, which may have affected the results in these cases (around 5% of the data had to be removed for low bitrates). PEVQ values should not be allowed below one, but this happened for some clips even at bitrates of 300 kb/s. The problem was really apparent at bitrates around 150 kb/s, where even negative PEVQ values could occur. The problem was partially solved by adjusting averages of categories and estimations up to one and by removing clips that had received negative PEVQ scores.
The algorithm that used all of the FPD, MLR and LR methods in combination resulted
in the largest reduction in RMSE and error, and thus the best MOS estimation.
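RMSE, the precision measure used in this comparison, is the root of the mean squared difference between estimated and reference MOS scores; a lower value means a better estimation. A minimal sketch:

```python
import math

def rmse(estimated, reference):
    """Root-mean-square error between estimated and reference MOS
    scores (lists of equal length)."""
    assert len(estimated) == len(reference) and estimated
    return math.sqrt(
        sum((e - r) ** 2 for e, r in zip(estimated, reference))
        / len(estimated)
    )

# rmse([3.0, 4.0], [3.0, 3.0]) -> sqrt(0.5), about 0.707
```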
The most cost-efficient algorithm is the one using only the regression methods. The
increase in precision was almost as good as with the algorithm using all methods, but this
algorithm used much less time and data space, since the costly work of building and
comparing patterns was removed.
A further optimization was to model the regression parameters directly. Since the
parameters showed such clear behavior with the bit rate, the modeled parameters did not
differ much from the regressed ones (unless really extreme values of bit rate were used,
see Appendix A). Even for extreme values, however, the modeled parameters stayed within
reasonable values of the same order of magnitude. Efficiency was improved since the need
to do a new regression for each bit rate was avoided.
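Modeling a regression parameter as a function of bit rate can be sketched with an ordinary least-squares fit. A straight line and the calibration numbers below are illustrative assumptions only; the actual fitted forms are the curves shown in Appendix A.

```python
def fit_line(bitrates, values):
    """Ordinary least-squares fit of value = a + b * bitrate, so a
    regression parameter can be predicted for any bitrate instead of
    re-running the regression at each one. The linear form is an
    illustrative choice, not the thesis's actual model."""
    n = len(bitrates)
    mx = sum(bitrates) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in bitrates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(bitrates, values))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Regressed parameter values at a few calibration bitrates
# (made-up numbers, purely for illustration):
a, b = fit_line([100, 200, 300, 400], [0.9, 0.7, 0.5, 0.3])
predicted = a + b * 250  # modeled parameter at an unseen bitrate
```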
5.4 Restrictions and limitations
For really high bitrates (above 1000 kb/s) the algorithms seem to lose their
effectiveness; there may be a limit to how high bitrates they can be used at. This
is probably a result of the packet statistics showing fewer characteristics as the bitrate
increases, so that the correspondence between packet sizes and video frames becomes less
apparent. This is not a problem for the QVGA resolution used in this thesis, since bitrates
usually do not reach above 400 kb/s.
5.5 Future work
An open question is how the model works when packet loss is introduced. There
is a risk that patterns and statistics could be affected to the degree that it is no longer
possible to distinguish differences between patterns in categories or to make correct
regressions. However, more testing and understanding of the existing model is needed
before an evaluation of any addition of new algorithms can be carried out.
Acknowledgements
I would particularly like to thank my supervisor David Lindegren for all of his help
with this thesis work.
Appendix A

Modeled regression parameters

Figures of modeled and regressed values of regression parameters. The red line shows the
regressed values and the black line shows the modeled values.

Figure A.1: LR regression parameter
Figure A.2: Standard deviation of frame diff sequences
Figure A.3: Median of frame sequence
Figure A.5: Longest small period of frame difference sequences
Figure A.6: Longest large period of frame sequences
Figure A.7: Number of passes through median of frame sequences
Figure A.10: LOR category intercept3
Figure A.11: LOR category intercept4
Figure A.13: Variance of frame sequences
Figure A.14: Standard deviation of frame sequences
Figure A.15: Median of frame sequences
Figure A.17: Longest small period of frame difference sequences
Figure A.18: Longest small period of frame sequences
Appendix B

Videostreams ordered in quality

Figure B.1: A number of frame sequences with PEVQ values of around 0-2.8
Figure B.2: A number of frame sequences with PEVQ values of around 2.9-3.2
Figure B.3: A number of frame sequences with PEVQ values of around 3.2-5