Richard Tano
November 25, 2011
Master's Thesis in Engineering Physics, 30 credits
Supervisor at Ericsson: David Lindegren
Examiner: Jerry Eriksson
Umeå University, Department of Physics
SE-901 87 Umeå, Sweden
Abstract
This Master's Thesis report was written by Umeå University Engineering Physics student Richard Tano during his thesis work at Ericsson Luleå.
Monitoring network quality is of utmost importance to network providers. This can be done with models that evaluate QoS (Quality of Service) and conform to ITU-T Recommendations. When determining video stream quality it is more important to evaluate the QoE (Quality of Experience), to understand how the user perceives the quality. This is ranked in MOS (Mean Opinion Score) values. An important aspect of determining the QoE is the video content type, which is correlated to the coding complexity and MOS values of the video. In this work the possibilities to improve quality estimation models complying with ITU-T Study Group 12 (Q.14) were investigated. Methods were evaluated and an algorithm was developed that applies time series analysis of packet statistics to determine the MOS scores of video streams. The methods used in the algorithm include a novel combination of frequent pattern analysis and regression analysis. A model which incorporates the algorithm for usage from low to high bitrates was defined. The new model resulted in around 20% improved precision in MOS score estimation compared to the existing reference model. Furthermore, an algorithm using only regression statistics and modeling of related statistical parameters was developed. Its improvement in coding estimation was comparable with the earlier algorithm, but its efficiency increased considerably.
Determination of the content of multimedia streams
Summary (translated from Swedish)
This thesis work was written by Richard Tano, a student at Umeå University, at Ericsson Luleå.
Monitoring network performance is of utmost importance to network providers. This is done with models for evaluating QoS (Quality of Service) that conform to the ITU-T Recommendations. When determining the quality of video streams it is more meaningful to evaluate QoE (Quality of Experience), to gain insight into how the user perceives the quality. This is graded in MOS (Mean Opinion Score) values. An important aspect of determining QoE is the type of video content, which is correlated to the video's coding complexity and MOS values. In this work the possibilities of improving the quality estimation models while complying with ITU-T Study Group 12 (Q.14) were investigated. Methods were examined and an algorithm was developed that uses time series analysis of packet statistics to estimate the MOS values of video streams. The methods included in the algorithm are a newly developed frequent pattern method together with regression analysis. A model which uses the algorithm from low to high bitrates was defined. The new model gave around 20% improved precision in the estimation of MOS values compared with the existing reference model. An algorithm using only regression statistics and modeling of statistical parameters was also developed. This algorithm delivered results comparable with the previous algorithm, but also gave greatly improved efficiency.
Acronyms
QoS Quality of Service
QoE Quality of Experience
Contents (excerpt)
2.4.3 Tools utilized
2.5 Related Work
3.7 Regression analysis
3.7.2 Logistic regression
4.1.3 Regression analysis
4.1.5 Algorithm stepwise
4.2 Algorithm numerical results
4.2.1 Mathematical approach
5 Conclusions
5.2.2 Time series matching
5.2.3 FPD
5.2.4 Regression
5.3 Discussion of results
5.4 Restrictions and limitations
5.5 Future work
B Videostreams ordered in quality
List of Figures
3.1 Video stream as frame sizes versus time (measured in frame number)
3.2 RTP header scheme [7]
3.3 Intra coding
3.4 Inter coding
3.5 Quality (PEVQ) vs bit rate for three different contents
3.6 Basic structure of PEVQ measure algorithm [14]
4.1 Compressed frame sequence
4.2 Flow of algorithm
4.3 Error reduction versus bitrates for selected algorithms
4.4 Residuals of old regression model
4.5 Residuals of linear regression model
4.6 Residuals of logistic regression model
4.7 Residuals of combined regression model
4.8 Residual comparison at 300 kb/s bitrate between existing model and new model using algorithm V2
4.9 RMSE reference vs RMSE modeled regression for various bitrates
A.1 LR regression parameter
A.2 Standard deviation of frame difference sequences
A.3 Median of frame sequence
A.4 Longest calm period of frame sequences
A.5 Longest small period of frame difference sequences
A.6 Longest large period of frame sequences
A.7 Number of passes through median of frame sequences
A.8 LOR category intercept1
A.9 LOR category intercept2
A.10 LOR category intercept3
A.11 LOR category intercept4
A.12 Mean of frame difference sequences
A.13 Variance of frame sequences
A.15 Median of frame sequences
A.16 Longest calm period of frame sequences
A.17 Longest small period of frame difference sequences
A.18 Longest small period of frame sequences
B.1 A number of frame sequences with PEVQ values of around 0-2.8
B.2 A number of frame sequences with PEVQ values of around 2.9-3.2
B.3 A number of frame sequences with PEVQ values of around 3.2-5
List of Tables
4.3 LR modeled
4.4 LOR modeled
4.7 RMSE for algorithms versus bitrate
Chapter 1
Introduction
Multimedia streaming has never been more used than now, with Youtube and mobile TV exploding in popularity. By monitoring these services, mobile operators and internet providers can see how well their networks are working and possibly how happy their customers are, without having to arrange polls and surveys. An important service is video streaming, which is one of the most bandwidth-demanding services. One way to monitor service quality is to use all user traffic in a live network, where hundreds of thousands of video clients report back to a measurement collection server. Many different contents will be used, and there is no way for the client to obtain the original, uncoded file. In this case a parametric video quality model is used, which only takes parameters from the client and sends a quality score back to the measurement server. Parametric models have so far no good way to separate different contents, meaning that the same performance indicators (packet loss, bit rate etc.) will lead to the same score regardless of how easy the clip is to encode.
The task for the thesis is to use the information available in multimedia clients to predict the content type, codec and other parameters that could help to enhance parametric models.
ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union) creates standards for infocommunications. Currently it is evolving new standards in QoS (Quality of Service) and QoE (Quality of Experience). One of its Questions handles the development of parametric models aimed at quality measurement purposes. Specifically, models that discard the use of the DPI (deep packet inspection) method are being worked on. DPI models can be used to separate contents; however, because of limitations in areas like security and technology, the DPI method may not be applicable, and thus the need for alternative models is high in the industry and at Ericsson.
1.1 Ericsson
Ericsson is one of the largest companies in Sweden. The company was founded in 1876 by Lars Magnus Ericsson and is now headquartered in Kista, Stockholm. It is a provider of telecommunication and data communication systems, but also provides services for a wide range of technologies, specifically mobile networks. Ericsson is currently one of the world's largest mobile telecommunications equipment vendors. More than 40% of the world's mobile traffic passes through its networks, and it has customers in more than 180 countries. The company's vision is "To be the Prime Driver in an all-communicating world." [1]
Chapter 2
Problem Description
2.1 Problem Statement
The aim of the thesis work is to find a mathematical way to classify video parameters from multimedia streams (video streams); specifically, to find a mathematical model that can order a video stream in levels of coding complexity based on those parameters. The focus is on video streams coded in H.264 with a bit rate of 300 kb/s. The transport protocol used is RTP and the resolution is QVGA. The thesis work was extended to include investigation of a wide range of bitrates.
2.2 Goals
The goal is to create and describe a mathematical model that complies with the ITU-T Study Group 12 (Q.14) standard P.NAMS, accessing only basic protocol information in the data packets of video streams to classify video parameters.
2.3 Purposes
The purpose is to reduce the error that comes from uncertainty in the correct classification of the coding complexity of video streams in existing parametric QoS models.
2.4 Methods
The work starts with a literature study and a search for potential mathematical methods to use. Testing the mathematical model requires analysis of video streams. The video streams are created using encoding software. Matlab is used to build the mathematical models, run simulations and analyze the results.
2.4.1 Planning
The work schedule was planned as follows:
1. Write specification: Specify the object, limitations and time
planning of the thesis
work.
2. Literature study: Get knowledge about the background to the
problem. Search for
existing work in the area. Look for potential mathematical methods
and possibilities
to use in model building.
3. Model decision: Determine which mathematical methods to
use.
4. Data creation: Find media stream content to use in model testing
and format it to
data (packet streams).
5. Model building: Creation of the different model
algorithms.
6. Model simulation: Try models on the data (packet streams and
PEVQ scores).
7. Data analysis: Analyze the results of the model
simulation.
8. Model analysis: Evaluate if changes to models can be made. If so
start over at 5.
9. Report writing: If results are sufficient, write report.
Besides this, two presentations were given to coworkers at Ericsson during the thesis work.
Table 2.1 shows the tasks planned during the work.
2.4.3 Tools utilized
Here follows a list of the tools that were used during the
work.
– Model building and simulation: MatLab 7.10.0 (R2010a)
– Video streams: Contents on Ericsson server
– Simulation of packet streams: Oqtopus (Ericsson proprietary
encoding script system)
• X.264 encoder version r1867
– Document editor and work planner: Emacs with Org-mode
– Presentation: MS PowerPoint
Table 2.1: Tasks performed
2.5 Related Work
An Ericsson article, published in November 2011, is described as: "... a tutorial overview of current approaches for monitoring the quality perceived by users of IP-based audiovisual media services." [2]
Chapter 3
Background
New media services and networks experience time-varying performance. Monitoring systems are used to measure how users experience the quality of the services. For video streams the performance is often assessed in terms of quality of service (QoS), which for example includes information about lost, dropped and resent data packets. As a result of new video standards, greater quality degradation is accepted. To know if the end user receives the promised quality, it is necessary to assess the quality perceived by the user, referred to as quality of experience (QoE). The most correct way to do this assessment is with perception tests on human subjects. These tests demand large resources and cannot be run continuously to monitor the quality of a running service. Because of this, other methods have been developed that use quality models to map QoS performance indicators to the user-perceived quality obtained from perceptual quality tests. [2] These techniques fall under the ITU-T Recommendations (standards) for objective video quality assessment.
When building models following ITU-T Recommendations Study Group 12 Question 14, there are restrictions on the usage of data packet information (P.NAMS). When analyzing video streams to determine video parameters, only packet statistics can be used as input to the models. Video streams can be viewed as time series of frame sizes (figure 3.1). Possible mathematical methods for analyzing the packet statistics include time series analysis and regression analysis. Time series analysis focuses on patterns in the data streams, while regression analysis uses statistical parameters of the data streams.
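As an illustration of viewing a stream as a time series of frame sizes, the following Python sketch sums packet payload sizes that share an RTP timestamp (the thesis' own tooling was Matlab; the function and input format here are hypothetical):

```python
def frame_sizes(packets):
    """Group RTP packet payload sizes by timestamp: packets sharing a
    timestamp belong to the same video frame, so summing their payload
    sizes recovers the frame-size time series.  `packets` is an iterable
    of (rtp_timestamp, payload_bytes) pairs in arrival order."""
    sizes = {}
    for ts, nbytes in packets:
        sizes[ts] = sizes.get(ts, 0) + nbytes
    return list(sizes.values())

# Three frames, the first split over two packets.
stream = [(0, 900), (0, 400), (3000, 250), (6000, 310)]
print(frame_sizes(stream))  # [1300, 250, 310]
```

The resulting list is the kind of sequence plotted in figure 3.1.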
3.1 ITU
ITU (International Telecommunication Union) has been creating standards in infocommunications since 1865. It became a United Nations specialized agency in 1947 and its standards are used worldwide. The ITU Telecommunication Standardization Sector (ITU-T) produces the standards called ITU-T Recommendations. The Recommendations become mandatory only when adopted as part of a national law. These Recommendations define how telecommunication networks operate and interwork. "Over 3000 Recommendations are in worldwide use
Figure 3.1: Video stream as frame sizes versus time (measured in frame number)
for various topics ranging from network architecture and security to transmission systems and next-generation networks." [3]
3.1.1 ITU-T Recommendations
ITU-T has a study group (Study Group 12) for evolving new standards in QoS (Quality of Service) and QoE (Quality of Experience). This study group is assigned the period 2009-2012 to determine new standards in this field. [4]
Question 14 (Q.14/12) handles the development of parametric models for media quality measurement purposes (Development of parametric models and tools for audiovisual and multimedia quality measurement purposes). These measurement models are used to estimate the user experience, as written by ITU-T: "Measures that predict user-experience are useful in monitoring and managing time-varying performance and help to facilitate the rollout, efficient operation and effective service management of such networks." [5]
To date the most researched models use DPI (deep packet inspection), which extracts information by going into packets and analyzing packet data. However, this requires access to the information in the data packets. For packetized systems this may not be possible, since the information in the RTP packets is often encrypted (SRTP [6]). Then only the content receiver and RTP provider may have the possibility to decrypt the information.
So far the ITU-T Recommendations do not cover these systems, even though the need in the industry is high. In ITU-T Study Group 12 Question 14 (Q.14/12) the work on such model standards has started. Two types of areas were specified, with the names P.NAMS and P.NBAMS. In P.NAMS the models only have access to basic protocol information. In P.NBAMS the model area is extended to also cover access to and analysis of information from the bit stream. Since the bit stream is then available, there is no need for the results in this report when using models following P.NBAMS.
3.2 RTP
When sending data over networks, specific transport protocols are used. One of the most common is the real-time transport protocol (RTP). This protocol is used in applications for transmitting audio, video or simulation data over multicast or unicast networks (transmitting data with real-time properties). RTP uses the RTP control protocol (RTCP) to monitor quality of service and does not by itself provide any quality of service guarantees. RTP usually runs on top of another network protocol (typically UDP). Both protocols contribute to the transport protocol functionality. [7]
Information over networks is delivered in data packets. Limitations in network performance affect how fast and reliably packets can travel from sender to receiver. Packets can be of various sizes, and upper limits exist. Besides containing content information, the packets also have headers which incorporate important parameters. These parameters are used for delivery and specification of the information in the packets. Parameters can for example be IP addresses, content type parameters, compression formats, encryption etc.
3.2.1 RTP header
When sending data packets over a network protocol, the packets are encapsulated with new (extra) header information. The RTP protocol uses a number of header parameters [7]. These are: version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC and CSRC. The sequence numbers on the packets give the receiver the possibility to reconstruct the sender's packet sequence and, in the video decoding, to determine the proper location of packets. This removes the necessity to decode packets in the correct sequence. It can also be used to detect packet loss. The marker can be used to signal significant events in the packet stream, for example frame boundaries. The timestamp determines the sampling instant of the first octet in the RTP data packet. The clock increments monotonically and linearly, and its format is specified statically in the profile or payload format. The frequency depends on the data carried in the payload. Figure 3.2 shows a schematic view of the RTP header.
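The packet-loss use of sequence numbers can be sketched in Python as follows (a hypothetical helper for illustration; a real RTP stack also handles reordering and other state):

```python
def count_lost(seq_numbers):
    """Estimate the number of lost RTP packets from the 16-bit sequence
    numbers of the packets that did arrive, in order, allowing for
    wraparound at 65536.  A gap of 1 between consecutive packets means
    no loss; any larger gap counts the missing packets."""
    lost = 0
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        gap = (cur - prev) % 65536  # modular arithmetic handles wraparound
        lost += gap - 1
    return lost

print(count_lost([10, 11, 14]))       # 2 (packets 12 and 13 missing)
print(count_lost([65533, 65534, 1]))  # 2 (65535 and 0 lost across the wrap)
```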
3.3 Video streams
Pictures (frames in video streams) are built up from large amounts of data bits, and the information may exceed the capacity of one data packet. Thus frame information may be spread over multiple data packets. The timestamp (see 3.2) is based on the playback time, and packets belonging to the same video frame thus also have the same timestamp. Video streams are pictures (frames) with moving content sent in a rapid sequence. They put greater stress on the network than normal pictures, requiring numerous data packets to be sent in short intervals. As is the case with pictures, different compression techniques exist. Some examples of video standards are H.261, H.263, MPEG-2, MPEG-4 and H.264/MPEG-4
Figure 3.2: RTP header scheme [7]
AVC. In recent years the most common technique used in streaming high quality video is H.264.
3.3.1 Compression
Compression is used when sending or streaming data to reduce the data size (i.e. coding of images and video). There are two basic types of compression: lossless and lossy compression. [8]
1. Lossless compression: All information is kept and only knowledge about the source is needed. Size is reduced by increasing the information contained in every data bit sent. Various techniques exist, but limits exist depending on the entropy of the source.
2. Lossy compression: Gives better compression, but information is lost and knowledge about both the receiver and the source is required. Size reduction comes mainly from removing and distorting details and not requiring exact reconstruction in the receiver.
3.3.2 Image coding
Images are built up from pixels, which normally have three color values (RGB) dictating which color each pixel represents. For compression it is better to convert these values to another color space (YUV) and thus reduce the correlation. In this color space the pixels are represented by a luminance component (brightness) and two color values.
Most coding standards use the Discrete Cosine Transform (DCT). When doing DCT, a Fourier-related transform is used to turn the pixel values into frequency space and represent them with DCT coefficients. DCT-based coding is a lossy compression technique.
3.3.3 Video coding
In video coding the pictures are commonly divided into macro blocks, usually of size 16x16 pixels. There are two basic types of coding for these macro blocks: intra coding and inter coding. [9]
1. Intra coding (I): The macro block is coded like a still image block, with similar techniques such as DCT transformations. Figure 3.3 shows an intra coding scheme.
2. Inter coding (P): Similar macro blocks between the current and previous images are searched for. The macro block is then coded by a motion vector and a difference block. This requires less data than intra coding. Figure 3.4 visualizes the motion vector.
Figure 3.3: Intra coding
The frames in the video coding are arranged into different categories depending on how their macroblocks are coded.
1. I-pictures: Macroblocks are intra (I) coded.
2. P-pictures: Macroblocks can be P-coded or skipped.
The first picture in every videostream has to be an I-picture. Different standards have various schemes for the use of I- and P-pictures.
The amount of motion dictates the grade of compression, with less motion giving better compression and smaller data streams.
3.3.4 H.264
H.264, also called MPEG-4 Part 10, became an international standard in 2003 and has twice as good compression as the previous H.263 and MPEG-4 standards. It was developed by the Joint Video Team (JVT), which was a collaboration between the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). It is used by many internet streaming resources and software, but also in cable/satellite television services, Blu-ray players, real-time videoconferencing and more.
Figure 3.4: Inter coding
H.264 is more complex (up to five times) than previous video standards, but also gives a higher reduction in bitrate (20-50% compared to H.263) and works with both low and high bitrates. [9] The most important changes from previous standards are:
1. Usage of parameter sets instead of picture headers
2. Intra coding now results in 4x4 difference blocks with many different directional modes
3. Inter coding also results in 4x4 blocks, with many previous pictures used as references, more motion vectors and lower resolution
4. Integer transform instead of DCT transform
5. Prediction is used in coefficient coding
6. A loop filter is used
3.4 Quality assessment
When doing quality assessment of video streams, a measurement algorithm gives quality scores, MOS (Mean Opinion Scores), depending on the picture quality.
MOS scores come from the Mean Opinion Score test, which has been used for many years to obtain a user's view of general quality. The MOS score is represented by a number from 1 to 5, with 1 being bad (lowest perceived quality) and 5 being excellent (highest perceived quality). [10]
Subjective tests with real persons are the most correct and common way to measure MOS scores. Another way to do measurements is with full reference models, which use parametric models to make comparisons between the original video and the coded video. Yet another way is to use a parametric model which only uses the streamed (coded) video. Both methods need to learn from human users (subjective tests) which parameters affect the experienced quality. The MOS values can then be estimated from these parameters.
Video quality in video streams is highly correlated to the bit rate. A high bit rate makes it possible for the encoder to use more data for each frame, and thus less compression of the picture. Figure 3.5 shows the correlation of bit rate and video quality, measured in PEVQ values (see 3.5), for three video clips containing different contents.
Figure 3.5: Quality (PEVQ) vs bit rate for three different contents
For quality tests with humans it has been found that human memory affects the quality results, and thus content length has to be considered when undertaking tests. It has been proposed that test lengths no longer than twenty seconds be used. [12] As a result, the preferred length of clips used in Ericsson models is ten seconds.
3.5 PEVQ values
A PEVQ (Perceptual Evaluation of Video Quality) value is a type of MOS score, MOS-LQO (Listening Quality Objective), which was developed by OPTICOM and has become part of ITU-T Recommendation J.247 (2008). The MOS-LQO scale goes from 1 (worst) to 4.5 (best). [11] The actual PEVQ value can however go below 1 and even reach negative values because of limitations in the algorithm used.
PEVQ gives MOS estimates of video quality degradation by comparing the undistorted reference video signal with the streamed video signal. It is a full reference, intrusive measurement algorithm. The approach of PEVQ is to model the human visual system and quantify the anomalies perceived in the video signal by a number of key performance indicators (KPIs). This estimation includes both packet level impairments (loss, jitter) and signal related impairments (blockiness, jerkiness, blur etc.) caused by the coding of the video. [13] Figure 3.6 shows an overview of the structure of the PEVQ algorithm.
Figure 3.6: Basic structure of PEVQ measure algorithm [14]
3.6 Timeseries analysis
A time series is a sequence of values or measurements over time.
Successive values are
assumed to be taken at equally spaced time intervals.
Time series analysis builds on identifying the pattern in the
observations and use it to
describe the behavior of the sequence or forecast future
values.
Similarly to other statistic analyses the data is assumed to
consist of systematic patterns
and random noise. To extract the pattern filtering various
techniques to filter out the noise
exists. [15]
3.6.1 General analysis
Most patterns in time series can be explained by two basic components, trend and seasonality.
A trend is a component that changes over time and can be either linear or non-linear. It does not repeat itself over time.
Seasonality is a component that also changes over time, but unlike the trend it repeats itself at systematic intervals over time.
For trend analysis there exist no automatic techniques proven to work. The first step in most techniques is to remove the error component by smoothing the series. One of the most common techniques is the use of moving averages, which works by taking the average of surrounding values. Either the median or the mean can be used as the average, where the median method is more robust to outliers. The next step in trend analysis is fitting a function to the series; commonly a linear function is used. For this to work the series may need a transformation to remove nonlinearity, with a logarithmic or polynomial function.
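Moving-average smoothing with either a mean or a median window can be sketched as follows (Python for illustration; the thesis models were built in Matlab, and all names here are illustrative):

```python
import statistics

def moving_average(series, window=3, use_median=False):
    """Smooth a series with a centered moving window.  The median
    variant is more robust to outliers, as noted in the text; windows
    are truncated at the ends of the series."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half):i + half + 1]
        out.append(statistics.median(chunk) if use_median
                   else sum(chunk) / len(chunk))
    return out

# A single outlier (9) dominates the mean but not the median.
print(moving_average([1, 2, 9, 2, 1], window=3))
print(moving_average([1, 2, 9, 2, 1], window=3, use_median=True))  # [1.5, 2, 2, 2, 1.5]
```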
Seasonality analysis aims to find the correlation between values in the series. The period between repeats of a pattern in the series is called the lag. The correlation dependency between two terms can be measured by autocorrelation. [15]
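The sample autocorrelation at a given lag can be computed as in this Python sketch (illustrative only):

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`: the normalized covariance
    between the series and a copy of itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# A period-2 series correlates strongly with itself at lag 2.
s = [1, 5, 1, 5, 1, 5, 1, 5]
print(autocorrelation(s, 2))  # 0.75
```

A peak in the autocorrelation at some lag reveals a seasonal component with that period.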
3.6.2 ARIMA
ARIMA is used for generating forecasts in time series analysis. The basic methodology behind it is the estimation of sets of coefficients that can describe consecutive elements of the time series based on earlier, time-lagged elements.
The method is complex and comes with the condition that it requires stationarity of the time series. To use the model one needs to make the series stationary and remove serial (seasonal) dependency. This means differencing the series until stationarity is achieved. For good results the user needs to examine plots and autocorrelograms to find a suitable level of differencing.
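The differencing step mentioned above can be sketched as follows (Python, illustrative):

```python
def difference(series, order=1):
    """Difference a series `order` times (each pass replaces the series
    by the gaps between consecutive values), moving it toward the
    stationarity that ARIMA modeling requires."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# A linear trend vanishes after one difference, a quadratic after two.
print(difference([2, 4, 6, 8, 10]))      # [2, 2, 2, 2]
print(difference([1, 4, 9, 16, 25], 2))  # [2, 2, 2]
```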
3.6.3 Timeseries matching
This method considers the geometric properties of time series. The basic concept is to measure the distance (a value used to describe similarity) between the geometric series. This is preferably done by transforming the series into a suitable basis and comparing the distance between them with various methods. Examples of bases used are the Fourier transform (DFT), the wavelet transform (DWT), principal components (PCA), piecewise quantized, symbolic and vector quantization (VQ). The simplest measurement of distance uses the Euclidean distance, which measures only between fixed time positions in both series. Other forms of measurement include dynamic time warping (DTW) and longest common subsequence (LCSS). Both of these methods measure with variable time. [16]
Piecewise constant and symbolic basis
A piecewise quantized basis starts with the series being divided into k segments. All points inside a segment are represented by their mean value.
This quantization can be done in MatLab through the functions reshape [21] and mean [22]. A code example is shown below.

Ksegments = reshape(DataFrames, SegmentLength, NumberCoefficients);
Kmeans = mean(Ksegments);

SegmentLength: length of the segments
NumberCoefficients: number of segments
These mean values can also be quantized into levels and exchanged for the symbolic counterpart representing the level (symbolic quantization).
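Symbolic quantization can be sketched as follows (Python for illustration; the thresholds and alphabet are arbitrary examples, not the thesis' actual levels):

```python
def to_symbols(means, levels):
    """Symbolic quantization: map each segment mean to a letter by
    counting how many ascending level boundaries it reaches."""
    alphabet = "abcdefgh"
    symbols = []
    for m in means:
        idx = sum(1 for boundary in levels if m >= boundary)
        symbols.append(alphabet[idx])
    return "".join(symbols)

# Two boundaries give a three-letter alphabet: a (<1.0), b, c (>=3.0).
print(to_symbols([0.2, 1.7, 3.4, 0.9], levels=[1.0, 3.0]))  # "abca"
```

The resulting string representation is what pattern-matching methods such as frequent pattern discovery can operate on.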
Euclidean distance
This is the most widely used distance measurement method. For two series X = (x_1, ..., x_n) and Y = (y_1, ..., y_n) the definition is expressed in equation (3.1):

D(X, Y) = sqrt( sum_{i=1}^{n} (x_i - y_i)^2 )    (3.1)
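In code, the Euclidean distance between two equal-length series amounts to (Python sketch):

```python
import math

def euclidean_distance(x, y):
    """Point-by-point Euclidean distance between two equal-length
    series: values are compared only at fixed time positions."""
    assert len(x) == len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance([1, 2, 3], [1, 2, 7]))  # 4.0
```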
3.6.4 Frequent pattern discovery
Specific patterns of variable length can have higher chances of occurring in a series depending on its parameters. The occurrence of patterns can be measured against database patterns. Thus classification can be done through similarity comparisons.
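A minimal Python sketch of counting pattern occurrences in a symbolically quantized series (the thesis' actual frequent pattern method is more elaborate; this only illustrates the counting step):

```python
def pattern_counts(symbolic_series, length):
    """Count every subsequence of the given length in a symbolic series.
    The resulting counts can be compared against a database of reference
    patterns for similarity-based classification."""
    counts = {}
    for i in range(len(symbolic_series) - length + 1):
        pat = symbolic_series[i:i + length]
        counts[pat] = counts.get(pat, 0) + 1
    return counts

print(pattern_counts("abcabca", 3))  # {'abc': 2, 'bca': 2, 'cab': 1}
```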
3.7 Regression analysis
Regression is the statistical method of finding a relationship between a number of variables that predicts an outcome [17]. Regression can use one or many (multiple regression) independent variables (predictor variables) to predict a dependent variable (response variable), which is the outcome.
Linear regression (LR) fits the straight line that best approximates the individual data points as a function of the independent variable. Other forms of relationship between the predictor and the response variable can also be used, e.g. quadratic or logarithmic.
The general form of an ordinary linear regression is shown in
equation (3.2).
Y = a + b×X + u (3.2)

Where Y is the dependent variable, a is the intercept, b is the slope, X is the independent variable and u is the residual.
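The line of equation (3.2) is typically fitted by ordinary least squares. The closed-form slope and intercept formulas in this Python sketch are the standard ones, not taken from the thesis.

```python
# Ordinary least squares fit of Y = a + b*X + u.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard OLS formulas: b = cov(X, Y) / var(X), a = mean(Y) - b*mean(X).
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b  # intercept and slope

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # data generated by Y = 1 + 2X
print(a, b)  # 1.0 2.0
```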
The slope b is the parameter that indicates how much a change in the predictor variable affects the response variable.
The accuracy of the prediction depends on the model's fit, i.e. how well the linear relationship corresponds to the data points. A number of values can be calculated to quantify this fit.
The residuals are an important marker of how good the regression model is. A residual is the difference between the model prediction and the actual observation at that value of the independent variable. Residual plots can reveal patterns or misbehavior in models.
The R-squared value is another important measure. It says how well the model fits the data as a whole, summarizing the residuals into a single number.
Through a statistical F-test, the hypothesis that the model explains the relationship (effect different from zero) can be evaluated. The outcome of the test is an F-value that corresponds to a p-value. Depending on whether the p-value is below or above the requested statistical significance level, the hypothesis is either kept or rejected.
There are various forms of regression, two of them are multiple
linear regression and
logistic regression.
3.7.1 Multiple linear regression
As with ordinary LR, multiple linear regression predicts a dependent variable from independent variables. In this case there can be multiple independent variables. In MLR, different relationships between the independent variables can create new predictor variables in the model.
Multiple linear regression takes a similar form to ordinary linear regression (3.3).

Y = a + b1×X1 + b2×X2 + b3×X3 + ... + bt×Xt + u (3.3)

Where the subscripts indicate the number of the independent variable and its corresponding parameter.
The same statistical parameters can be found in multiple LR as in ordinary LR. In the same way, t-test statistics can be carried out on all the b values. If the corresponding p-value is outside the sought significance level, the b value can be dropped.
Multiple linear regression can be done in MatLab through the regstats function. [19]
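A multiple linear regression fit can be sketched via the normal equations; this Python version is an illustration alongside the MatLab regstats route the thesis uses, with a tiny Gaussian elimination standing in for a linear-algebra library.

```python
# MLR by normal equations: solve (X'X) beta = X'y for beta = [a, b1, b2, ...].

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_mlr(rows, ys):
    """rows: list of predictor tuples; returns [a, b1, b2, ...] for
    Y = a + b1*X1 + b2*X2 + ... as in equation (3.3)."""
    X = [[1.0] + list(r) for r in rows]  # prepend intercept column
    p = len(X[0])
    XtX = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(p)]
           for r in range(p)]
    Xty = [sum(X[i][r] * ys[i] for i in range(len(X))) for r in range(p)]
    return solve(XtX, Xty)

# Data generated exactly by Y = 1 + 2*X1 + 3*X2, so the fit recovers it.
coeffs = fit_mlr([(1, 2), (2, 1), (3, 3), (4, 5)], [9, 8, 16, 24])
print([round(c, 6) for c in coeffs])  # [1.0, 2.0, 3.0]
```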
3.7.2 Logistic regression
Logistic regression is another type of regression which builds upon
the theory of generalized
linear models.
The difference from other linear regressions is that the response variable can be discrete. It can then only attain certain values, and the outcome of the logistic regression model is a prediction of which value fits best. The discrete values can also be viewed as categories. For logistic regression there is no equivalent to R-squared; models can instead be compared with another measurement called the deviance of the fit, which is the difference between the log-likelihood of the fitted model and the maximum possible log-likelihood.
Logistic regression can be done in MatLab through the mnrfit function. [20]
3.8 Cross-validation
Repeated random sub-sampling validation randomly splits the data into a validation set and a training set. The model is fitted with the training data and assessed with the validation data. [18] Cross-validation is done to estimate how accurately a predictive method performs in practice and to guard against type III errors. Such errors can occur if the same data is used both to build and to test the hypotheses.
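The split described above can be sketched as follows; the ten-percent validation fraction mirrors the setup used later in the thesis, while the function and parameter names are illustrative.

```python
import random

# Repeated random sub-sampling: each run holds out a random fraction of
# the items for validation and trains on the rest.

def random_split(items, validation_fraction=0.1, rng=None):
    rng = rng or random.Random()
    shuffled = items[:]
    rng.shuffle(shuffled)
    k = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[k:], shuffled[:k]  # training set, validation set

clips = list(range(300))  # around 300 clips, as in the thesis dataset
training, validation = random_split(clips, 0.1, random.Random(0))
print(len(training), len(validation))  # 270 30
```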
Chapter 4
Results
4.1 Algorithm description
4.1.1 Overview
Through the header timestamps of packets in media streams, the frames can be assembled into a sequence consisting of the frame sizes and their time order. These frame sequences consist of two interleaved series: every even or odd frame follows its own pattern, often such that the odd frames are very small while the even frames are big, or the reverse. A way to handle this behavior is to merge each pair of frames into a new frame consisting of both their sizes. These merged frames form a new frame sequence, which is subsequently analyzed with regard to the frame sizes (FS).
Also used in the analysis is the difference between two consecutive frames, which is put into a new sequence consisting of the frame difference sizes and their corresponding time order (FDS).
The sequences are analyzed and converted into predicted MOS scores by a combination of three methods: frequent pattern analysis and two types of regression analysis.
4.1.2 Frequent pattern discovery (FPD)
The frequent pattern discovery method estimates MOS scores for sequences by assigning them to categories. This categorization is done by searching for frequently occurring patterns in each sequence and comparing these with the patterns belonging to each category.
Each category consists of an upper and a lower PEVQ limit, a PEVQ mean and a number of common patterns. These patterns are taken from frame sequences whose PEVQ scores fall inside the PEVQ interval of the category. The patterns consist of symbols, each indicating an interval of values.
When categorizing a frame sequence, the PEVQ mean of the category with the highest similarity between the patterns in the sequence and the category is assigned to the sequence as a prediction of its MOS value.
For better prediction, different pattern lengths can be used, and both the FS and FDS sequences can be used.
The method therefore consists of two parts: creation of the database patterns and analysis of a frame sequence.
Creation of database
Inputs to this method for creation of database patterns are the frame sequences (both ordinary frame size sequences and frame difference size sequences), the number of categories to be used, which symbol levels to use, how big the segment lengths are going to be and the number of symbols in the patterns (the pattern length).
The first step of this method is to decide the category PEVQ intervals from the PEVQ scores of the frame sequences given as input. This is done by evenly distributing the sequences over the number of categories to be used. The frame sequences are then put into their respective categories according to their PEVQ values.
All frame sequences are reduced in dimensionality by breaking the sequence into segments, averaging the values over each segment and labeling them with a symbol. Labeling is done by comparing values to a symbol database with upper and lower limits for each symbol. If the value is inside the symbol's range, that symbol is used. The compressed sequences are then arranged into patterns, which can be of various lengths, for example: ABA, AABC, ABCDE.
The next step is to compare and count patterns in the frame sequences belonging to each category. Only the most common patterns in each category are kept. A pattern can only be used in one category, so all patterns in a category must be unique. The final step is to make all categories equal in length (equal number of patterns).
The output of the method is the categories and their corresponding patterns. These patterns are for both FS and FDS sequences and can be of various lengths.
An example of a pattern output is:
Category with 3 symbols FS patterns =
’LHF’ ’HFF’ ’FFE’ ’FEE’ ’EEE’ ’EED’ ’EDB’ ...
Figure 4.1 shows how a frame sequence gets compressed and arranged
in specific symbol
levels.
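The database-creation step above can be condensed into a Python sketch: each category's symbolized sequences are scanned with a sliding window, the most frequent patterns are kept, and a pattern may belong to only one category. The window length, category names and per-category sizes are illustrative.

```python
from collections import Counter

def extract_patterns(symbol_sequence, length):
    """All length-`length` substrings (patterns) of a symbolized sequence."""
    return [symbol_sequence[i:i + length]
            for i in range(len(symbol_sequence) - length + 1)]

def build_database(category_sequences, pattern_length, patterns_per_category):
    database, used = {}, set()
    for category, sequences in category_sequences.items():
        counts = Counter()
        for seq in sequences:
            counts.update(extract_patterns(seq, pattern_length))
        kept = []
        for pattern, _ in counts.most_common():
            if pattern not in used:  # patterns must be unique across categories
                kept.append(pattern)
                used.add(pattern)
            if len(kept) == patterns_per_category:
                break
        database[category] = kept
    return database

db = build_database({"low": ["AABBA", "ABBAB"], "high": ["EDEED", "DEEDE"]},
                    pattern_length=3, patterns_per_category=2)
print(db)
```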
FPD analysis of a frame sequence
Inputs to the method are the pattern categories, the symbol levels and segment lengths used by the database, and the frame sequence to be analyzed.
Figure 4.1: Compressed frame sequence
The method counts the number of similar patterns in each category. This is done for all the pattern types and lengths in the database. The category with the most hits is chosen as the category estimate for the specific pattern type and length.
Output from the method is a category estimate for each of the
different types of patterns
and pattern lengths used.
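The analysis step can be sketched as follows: the incoming sequence's patterns are counted against each category's database patterns and the category with the most hits wins. The `database` shape and names here are illustrative.

```python
# Categorize a symbolized frame sequence against a pattern database.

def categorize(symbol_sequence, database, pattern_length):
    patterns = [symbol_sequence[i:i + pattern_length]
                for i in range(len(symbol_sequence) - pattern_length + 1)]
    hits = {category: sum(p in category_patterns for p in patterns)
            for category, category_patterns in database.items()}
    return max(hits, key=hits.get)  # category with the most pattern hits

database = {"low": ["ABB", "BBA"], "high": ["EDE", "DEE"]}
print(categorize("AABBABB", database, 3))  # low
```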
4.1.3 Regression methods
Two regression methods are used: logistic regression and multiple linear regression. Both build upon creating statistical models from a number of statistical parameters extracted from the frame sequences.
Both methods use statistics extracted from the frame sequences by the statistic extraction function.
Statistic extraction function (ST)
This function takes a frame sequence and extracts various statistical parameters from it. The input is the frame sequence to extract statistics from; this can be either an FS or an FDS sequence.
1. Mean of sequence (M)
2. Variance of sequence (V)
3. Standard deviation of sequence (SD)
4. Modes of sequence (MO)
5. Median of sequence (ME)
6. Longest calm period of sequence (LC). Defined as the length of the longest period where the difference between two subsequent frames is smaller than 0.1 times the mean of the sequence.
7. Longest active period of sequence (LA). Defined as the length of the longest period where the difference between two subsequent frames is greater than 0.0025 times the mean of the sequence.
8. Longest small period of sequence (LS). Defined as the length of the longest period where frames are smaller than 0.9 times the mean of the sequence.
9. Longest large period of sequence (LL). Defined as the length of the longest period where frames are greater than the mean of the sequence.
10. Number of bursts in sequence (NB). Defined as the number of times the difference between two subsequent frames is greater than 0.045 times the mean of the sequence.
11. Number of passes through median of sequence (NP). Defined as the number of times the sequence goes from greater to smaller than the mean of the sequence or the reverse.
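A partial Python sketch of the statistic extraction function (ST): three of the parameters are shown, using the thresholds defined in the list above. The function names are illustrative, and "difference" is interpreted here as an absolute difference.

```python
# Three of the ST parameters: LC (longest calm period), NB (number of
# bursts) and NP (passes through the sequence mean).

def longest_calm_period(frames):
    mean = sum(frames) / len(frames)
    longest = current = 0
    for prev, cur in zip(frames, frames[1:]):
        current = current + 1 if abs(cur - prev) < 0.1 * mean else 0
        longest = max(longest, current)
    return longest

def number_of_bursts(frames):
    mean = sum(frames) / len(frames)
    return sum(abs(cur - prev) > 0.045 * mean
               for prev, cur in zip(frames, frames[1:]))

def passes_through_mean(frames):
    # The thesis names this "passes through median" but defines it
    # against the mean of the sequence; the mean is used here.
    mean = sum(frames) / len(frames)
    return sum((prev - mean) * (cur - mean) < 0
               for prev, cur in zip(frames, frames[1:]))

frames = [10, 10.2, 10.1, 30, 10, 10.4, 29, 10]
print(longest_calm_period(frames), number_of_bursts(frames),
      passes_through_mean(frames))
```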
Linear regression method (LR)
This method uses multiple linear regression analysis to obtain a statistical model for MOS estimation of a frame sequence. A dataset of frame sequences, their corresponding statistical parameters and their PEVQ scores is used to create a regression model where the statistical parameters are the predictor variables (independent variables) and the PEVQ score is the response variable (dependent variable).
Input to the method is the statistics extracted from the frame sequences in ST. An example of statistical parameters for a good LR regression model is shown in table 4.1 together with the corresponding regression statistics. The adjusted R-squared value for this LR regression is 0.3114.
The output of the method is a statistical model, which takes the statistical parameters as input and gives a MOS estimate as output.
Example output from the LR regression is shown in equation (4.1).
MOS = 1.7647 + 0.0003×ME − 0.0035×LC + 0.0022×LA − 0.0017×LS + 0.0082×LL + 0.0109×NP (4.1)
Parameter                                             T-statistic   P-value
Standard deviation of frame difference sequences      2.7514        0.0063
Median of frame sequences                             3.6444        0.0003
Longest calm period of frame sequences                -2.4016       0.0170
Longest active period of frame sequences              2.4515        0.0148
Longest small period of frame difference sequences    -3.1349       0.0019
Longest large period of frame sequences               4.3367        0.0000
Number of passes through median of frame sequences    3.5316        0.0005
Total model (F-statistic)                             18.477        2.872e-20

Table 4.1: LR regression parameters
Logistic regression (LOR)
The logistic regression method uses multinomial logistic regression to categorize frame sequences and then decides a MOS estimate in a similar way to the FPD method. The method uses an ordinal model for the fit and no interactions between categories.
As in linear regression, a number of statistical parameters are used as input to the regression (independent variables). Also used as input are the same categories used in the creation of database patterns; these are the dependent variables in the regression.
An example of statistical parameters for a good LOR regression model is shown in table 4.2 together with the corresponding regression statistics. The deviance for this LOR regression is 846.79.
Parameter                                             T-statistic   P-value
Mean of frame difference sequences                    -0.0347       0.9723
Variance of frame sequences                           -2.1505       0.0315
Standard deviation of frame sequences                 -3.5486       0.0004
Median of frame sequences                             2.7104        0.0067
Longest calm period of frame sequences                -2.3708       0.0177
Longest active period of frame sequences              1.6882        0.0914
Longest large period of frame sequences               -4.4320       0.0000
Number of passes through median of frame sequences    -4.1254       0.0000

Table 4.2: LOR regression parameters
The output is a statistical model. It takes statistical parameters as input and calculates a category intercept value (CIV), which is compared to the intercept values of the categories (also given by the method) to estimate a category for the frame sequence. The number of intercept values depends on the number of categories used. In the case of five categories, four intercept values are provided, each corresponding to the upper level of the real MOS category.
Example output from the LOR regression is shown below.
Intercept values = [1.7580, 2.9797, 4.0756, 5.3036]
Mean PEVQ of categories = [2.25431, 2.76333, 3.07321, 3.32914,
3.67939]
4.1.4 Combining the methods
There are various ways of combining the methods. The combinations all build upon taking the mean value of their estimations (categories or PEVQ values).
One way is to combine the frequent pattern discovery and the logistic regression into one category estimation. This category estimation gives a MOS value through the category's PEVQ mean value. The mean of this MOS estimate and the linear regression's MOS estimate is then used as the prediction of the MOS score of the frame sequence in question.
4.1.5 Algorithm stepwise
Here follows a flow chart description of the steps in the
algorithm.
Setup for the algorithm:
1. Build category database consisting of patterns from frame
sequence content
2. Create both linear and logistic regression models from frame
sequence content
Analysis of an incoming frame sequence:
1. Extract patterns from the frame sequence
2. Extract statistical parameters from frame sequence
3. Run FPD on the patterns to get category estimations
4. Put statistical parameters in logistic regression to get a
category estimation
5. Put statistical parameters in linear regression to get a MOS
estimation
6. Combine the category estimations from each method to get a more accurate estimation
7. Use the MOS mean from the category estimations as a MOS estimate
8. Combine the MOS estimations from 5. and 7. to achieve the best
MOS prediction
Figure 4.2 shows the flow of the algorithm.
Figure 4.2: Flow of algorithm
4.1.6 Algorithm for all methods
Assembling the methods can be done in many different ways. The following list gives some example configurations. All forms use three different pattern lengths (3, 4 and 5 symbols). Example steps of merging (algorithm version 1):
1. Mean PEVQ of every pattern categorization using only FS
series.
2. Mean of 1. and LOR MOS value.
3. Mean of 2. and LR MOS value.
Version 2 of the algorithm is to skip step two and take the mean of the FPD and LOR values directly.
Version 3 adds the usage of categorizations from FDS series in step 1.
4.1.7 Algorithm with only regressions
Another way is to combine only the regression methods into an algorithm.

MOS = mean(LR MOS, LOR MOS)
4.1.8 Algorithm with modeled regression
Since the statistical parameters in the regression methods have to be recalculated for every bit rate, either new regressions need to be carried out for every new bit rate sequence considered, or a huge database has to be built. Instead of doing this, the statistical parameters used in the regression can be modeled.
The equations (4.3) (4.4) (4.5) (4.6) (4.7) (4.8) are used to parameterize the regression parameters. Input to the equations is the bit rate in question.
C1 × X^C2 (4.3)
C1 × X^C2 + C3 × X^C4 + C5 × X (4.8)
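Evaluating the parameter models amounts to plugging the bit rate X into the fitted equation forms; this sketch implements the two forms quoted above, using as example coefficients the "Longest large period" row of table 4.3 and the intercept 1 row of table 4.4.

```python
# The two quoted parameter-model forms: each regression parameter
# becomes a function of the bit rate X with fitted coefficients.

def eq_4_3(x, c1, c2):
    return c1 * x ** c2

def eq_4_8(x, c1, c2, c3, c4, c5):
    return c1 * x ** c2 + c3 * x ** c4 + c5 * x

bitrate = 300  # kb/s
print(eq_4_3(bitrate, 1.71, -0.90))                      # table 4.3 coefficients
print(eq_4_8(bitrate, 35.25, -3.295, -0.0063, 1.17, 0.021))  # table 4.4 coefficients
```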
Table 4.3 shows the equation type and the corresponding coefficients used to parameterize the LR regression parameters. Table 4.4 shows the same for the LOR regression using five categories.
Parameter                                             Equation   Model coefficients
LR regression parameter                               (4.6)      -0.0044, 1.17, 0.018
Standard deviation of frame difference sequences      (4.6)      4.64, -1.63, 2.88e-10
Median of frame sequences                             (4.4)      2.31e-04, -1.78e-07
Longest calm period of frame sequences                (4.6)      -0.096, -0.64, 6.83e-07
Longest small period of frame difference sequences    (4.6)      -0.22, -0.60, 1.34e-07
Longest large period of frame sequences               (4.3)      1.71, -0.90
Number of passes through median of frame sequences    (4.3)      0.41, -0.68

Table 4.3: LR modeled parameters
Observe that the parameter "Longest active period of frame sequences" has been removed. This is a result of the parameter showing bad behavior with increased bit rate, and thus the difficulty of finding a suitable model.
For figures showing the correct and modeled values of the regression, see Appendix A.
4.2 Algorithm numerical results
Error reduction
To test whether implementing the new algorithm into the existing model would give a better prediction of the media streams' coding complexity, a comparison of the errors in MOS prediction between the existing and the new model was made.

Parameter                                             Equation   Model coefficients
LOR category intercept 1                              (4.8)      35.25, -3.295, -0.0063, 1.17, 0.021
LOR category intercept 2                              (4.8)      -0.023, 1.10, 0.32, -8.64, 0.049
LOR category intercept 3                              (4.8)      8.57e-11, 3.39, -0.91, 1.01, 0.98
LOR category intercept 4                              (4.8)      3.06e-11, 3.51, 0.63, 0.98, -0.56
Mean of frame difference sequences                    (4.3)      -4.08e+02, -1.99
Variance of frame sequences                           (4.3)      -0.33, -2.26
Standard deviation of frame sequences                 (4.5)      45.45, -1.69, 6.20e-04
Median of frame sequences                             (4.5)      0.0015, 0.16, -0.0047
Longest calm period of frame sequences                (4.7)      -11.92, 0.15, 11.93, 0.15
Longest small period of frame difference sequences    (4.4)      0.017, -5.39e-06
Longest large period of frame sequences               (4.5)      -1.18, -0.014, 1.053
Number of passes through median of frame sequences    (4.4)      -0.034, 1.0057e-05

Table 4.4: LOR modeled parameters
The existing model used the mean of the PEVQ values of all the
video streams at a
particular bit rate as the MOS estimate for a test sequence. The
error for the model could
thus be calculated as:
Error new model (EN) = MOS - PEVQ
One way to show the improvement is to calculate how big the error reduction was relative to the total error of the existing model (EE), giving the error percentage reduction (EPR):

EPR = (EE - EN) / EE (4.9)
(Negative EPR would instead mean an increase in error.)
With only around 300 video clips available for testing, a cross-validation technique was used to give the results more statistical significance. This was done by randomly selecting ten percent of the clips as test objects, while the rest were used as training material for the algorithm.
RMS Error
This summarises the overall error between the estimated MOS and the PEVQ value. Since the differences can be either positive or negative, the RMSE gives a good measurement of how far the error is from zero on average. [23] The equation to calculate the RMSE is shown in (4.10).

RMSE = sqrt((d1^2 + d2^2 + ... + dn^2) / n) (4.10)

Where d are the differences and n is the number of differences.
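Equation (4.10) translates directly into code; d holds the differences between estimated MOS and PEVQ.

```python
import math

def rmse(differences):
    """Root mean square error of a list of differences (equation 4.10)."""
    n = len(differences)
    return math.sqrt(sum(d ** 2 for d in differences) / n)

print(rmse([0.3, -0.4, 0.0, 0.5]))  # sqrt((0.09 + 0.16 + 0 + 0.25) / 4)
```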
Error residuals
Another way to show how well the algorithms are capable of predicting MOS scores is to calculate the residuals between the correct PEVQ scores and the estimated MOS scores.

Error residual = PEVQ - MOS (4.11)
4.2.2 Results in numbers
The error reduction (using (4.9)) achieved by implementing the various algorithms is shown in table 4.5 and figure 4.3.
Model / Bitrate           150kb/s   200kb/s   250kb/s   300kb/s   350kb/s   400kb/s
LR                        15.51%    16.70%    12.80%    17.56%    12.52%    11.83%
LOR                       14.94%    12.70%    14.60%    17.32%    15.54%    14.14%
FPD                       1.76%     3.11%     3.41%     9.35%     10.35%    5.92%
Combined methods V1       18.31%    16.84%    18.04%    20.02%    18.52%    12.92%
Combined methods V2       18.47%    16.18%    18.21%    20.15%    19.50%    13.41%
Combined methods V3       18.31%    16.35%    17.72%    19.92%    19.06%    13.40%
Regression combination    18.71%    17.61%    15.87%    19.02%    15.73%    14.50%

Table 4.5: Error reduction from methods versus bitrates
Error residuals (using (4.11)) for the regression methods are shown in figures 4.4, 4.5, 4.6 and 4.7 (with simulations using 10% of the data as validation clips and 10 cross-sampling runs).
A comparison of the spread of the error residuals (difference between correct and estimated PEVQ values) is shown in figure 4.8 for both the existing model and the new model using algorithm V2.
Table 4.6 shows the error reduction for the algorithms with combined methods and modeled regressions, including high bitrates.
Model / Bitrate        150kb/s  200kb/s  250kb/s  300kb/s  350kb/s  400kb/s  600kb/s  800kb/s  1500kb/s  Mean*
All methods V1         20.6%    17.9%    19.8%    20.5%    16.2%    15.0%    19.4%    18.4%    11.8%     18.5%
Reg combination        20.9%    17.5%    15.6%    15.1%    17.4%    13.4%    17.9%    14.5%    11.3%     16.8%
LR modeled             17.12%   14.48%   7.19%    5.53%    9.08%    10.36%   3.11%    -70.32%  -700.4%   9.6%
LOR modeled            13.5%    15.69%   13.72%   11.22%   12.6%    9.7%     14.8%    12.4%    -70.61%   13.0%
Regressions modeled    23.2%    20.1%    18.9%    16.9%    18.7%    15.8%    12.2%    -12.2%   -399.8%   18.0%

* The extreme bitrates of 800kb/s and 1500kb/s are disregarded in the mean, since they resulted in negative values for certain methods.

Table 4.6: Error reduction with algorithms versus bitrates
Figure 4.3: Error reduction versus bitrates for selected
algorithms
Table 4.7 shows the RMSE for the algorithms with combined methods and modeled regressions, including high bit rates.
In figure 4.9 the difference between the RMSE for the reference model and the modeled regression model is shown for various bitrates.
Figure 4.4: Residuals of old regression model
Model / Bitrate        150kb/s  200kb/s  250kb/s  300kb/s  350kb/s  400kb/s  600kb/s  800kb/s  1500kb/s  Mean*
LR                     0.5938   0.5412   0.4734   0.4337   0.3985   0.3545   0.2858   0.2233   0.1137    0.4130
LOR                    0.6106   0.5302   0.4970   0.4403   0.4141   0.3729   0.2998   0.2299   0.1139    0.4243
Reg combined           0.5809   0.5185   0.4710   0.4242   0.3976   0.3566   0.2842   0.2197   0.1107    0.4066
All methods V1         0.5752   0.5116   0.4617   0.4030   0.3990   0.3504   0.2753   0.2158   0.1130    0.3990
LR modeled             0.6022   0.5362   0.5028   0.4681   0.4208   0.3627   0.3367   0.4062   0.8522    0.4545
LOR modeled            0.6693   0.5835   0.5125   0.4648   0.4243   0.3781   0.3004   0.2273   0.2124    0.4450
Regressions modeled    0.5856   0.5310   0.4774   0.4340   0.3912   0.3490   0.3043   0.2810   0.5205    0.4192
Reference model        0.6777   0.6169   0.5499   0.4966   0.4670   0.4086   0.3468   0.2620   0.1275    0.4782

* The extreme bitrates of 800kb/s and 1500kb/s are disregarded in the mean.

Table 4.7: RMSE for algorithms versus bitrate
Figure 4.5: Residuals of linear regression model
Figure 4.6: Residuals of logistic regression model
Figure 4.7: Residuals of combined regression model
Figure 4.8: Residual comparison at 300kb/s bitrate between existing
model and new model using algorithm V2
Figure 4.9: RMSE reference vs RMSE modeled regression for various
bitrates
Chapter 5
Conclusions
5.1 Data analyzed
When examining frame sequences of video streams of different coding complexities (see Appendix B), one could observe that, in general, more changes (jumps) in frame size are seen in frame sequences belonging to high quality scores. Smooth curves are more apparent in groupings of lower quality scores. However, no definite conclusion could be drawn about a specific frame sequence, since all types of curves exist for all quality scores. There was no visual clue that could give a definite answer as to which quality category a frame sequence fits into, only various levels of probability. This could be a result of parameters not visible in the frame sequence statistics and only visible in the actual packet data.
5.2 Methods analyzed
A selection of different mathematical methods was examined and tested in the search for a useful algorithm. The methods evaluated were:
– Field of time series analysis
• Trend and cyclical behavior methods
• Time series matching
• Frequent pattern discovery
– Field of regression
• Multiple linear regression
5.2.1 Trends and cyclical behaviour methods
In the time series area, the trend and cyclical behavior methods were ruled out. This was a result of video streams belonging to different PEVQ groups showing little common cyclical behavior or trend, which in turn is a result of the randomly picked time intervals in the video streams. Furthermore, it is problematic to develop an algorithm that works automatically without user input with these methods.
5.2.2 Time series matching
With time series matching, three different types were tried: Euclidean, DTW and lower bounding. One issue was the big dissimilarities between time series in the same PEVQ categories, but the biggest problem was time and resource management. To measure matches between time series, a large database needs to be created. More problematic is that measuring the distance between time series takes much computational power. Algorithms incorporating for example the DTW algorithm had running times of minutes even on a modern stationary computer (dual-core 2.4 GHz, 4 GB RAM). It would thus not be possible to incorporate these methods in the parametric models used in, for example, handheld devices doing media streaming; the hardware would not be adequate with today's technology.
5.2.3 FPD
Frequent pattern discovery (FPD) works without user input and can select and match patterns regardless of where they occur, which removes some of the limitations of the other time series methods. Even though the method requires creating a pattern database for each bitrate and uses comparisons of patterns in the estimation, it takes much less time than the time series matching methods. The pattern databases are efficient and small due to the dimensionality reduction, and since only symbol comparisons are used, the measurements are done much faster than distance measurements. Running times were in the range of ten seconds on the same setup as was used with the time series matching methods.
The FPD method gives clear improvements in estimating the coding complexity compared to the old model. Error and RMSE reductions of up to and above 10% could be achieved. The results were not as good as for each of the individual regression methods, but in conjunction with those methods it improves the results.
5.2.4 Regression
The regression methods achieved the highest error and RMSE reductions and are very efficient in time and data space consumption. Multiple linear regression (LR) gives a slightly better MOS prediction of the stream's actual PEVQ value, but the logistic regression (LOR) is easier to use in combination with the FPD method (both returning categorizations). The combined regression method also reached very good results, around 90% of the error reduction achieved when using all methods (Combined methods V1).
5.2.5 Analysis of statistic parameters
Following is a discussion of the statistical parameters for each regression.
Linear (LR)
1. Median of frame sequence: Higher values indicate higher MOS scores. Seems to show better linearity than the mean of the frame sequence. The median does not account for big changes (especially such as could happen at the beginning or end of a sequence as a result of scene changes) as much as the average does. This makes the median a more stable and thus more reliable parameter.
2. Longest calm period of frame sequences: Generally, long calm regions result in lower MOS scores. This might be due to long periods of non-changing frame sizes already using the maximum available number of bits per frame, which might indicate that the encoder needs more bits per frame to encode these frames with good quality.
3. Longest active period of frame sequence: Similar to the previous one but reversed, now checking for active regions. High values indicate that the encoder gets to work a lot but can still use enough bits per frame while maintaining decent quality.
4. Longest small period of frame difference sequences: Small values in the frame difference sequence indicate few changes between frames, i.e. less activity. A long period results in easier coding and a lower MOS score. It measures the same thing as the longest calm period, but improves the results when used in the model.
5. Longest large period of frame sequences: A long period of large frame sizes indicates a higher MOS score. The relationship is not as obvious as with other parameters.
6. Number of passes through median of frame sequences: This measures whether the following frame is very similar to the preceding frame or not. It indicates when frames are so similar that very few bits (for example just after an I-frame) are needed to encode the frame.
7. Standard deviation of frame difference sequences: Indicates how much each frame differs from the next. A high standard deviation means both small and large differences; a low standard deviation means that the differences between the frame sizes are in general either small or large.
All parameters in the LR regression had p-values below the 5% significance level, which is a commonly used level. In fact they were all below 2%, and the p-value for the total model was well below 1%. The R-squared value of the model was only around 31%. This is a fairly low value and indicates that only a third of the variance is explained by the model. However, this is not strange considering the properties of the data and the impossibility of reaching perfect estimations (see the discussion about frame sequences in 5.1).
Logistic (LOR)
1. Mean of frame difference sequences: Shows good correlation with the MOS score; a greater mean value indicates a higher MOS score. This could be because a high difference indicates that the difference between I- and P-frames is large, which might ease the workload on the encoder.
2. Variance of frame sequences: Quadratic relationship; low and high MOS scores seem to have higher values of variance than medium MOS scores. Perhaps the coder is less efficient for certain events happening in the frame pictures at the extreme ends of the coding complexities.
3. Standard deviation of frame sequences: Same as for variance of
frame sequences. This
is actually the same parameter modeled twice. Nevertheless it
improves the results.
4. Median of frame sequences: See linear.
5. Longest calm period of frame sequences: See linear.
6. Longest active period of frame sequences: See linear.
7. Longest large period of frame sequences: See linear.
8. Number of passes through median of frame sequences: See
linear.
For the LOR regression, all but one statistical parameter were at the 1% significance level. However, the parameter "Mean of frame difference" had a really high p-value of 90%. This would indicate that the parameter does not belong in the regression model. However, thorough testing showed that this parameter improved the results of the model, and it was thus kept. The deviance was low compared to other configurations of statistical parameters, which indicates that a relatively good model was found.
5.3 Discussion of results
The increased precision in estimating the MOS score of a video stream after incorporating the new algorithms into the old model can thus be estimated to around 10-23%, depending on bit rate and the methods used.
The PEVQ module Oqtopus uses had problems returning proper PEVQ scores for low bitrates, which may have affected the results in these cases (around 5% of the data had to be removed for low bitrates). PEVQ values should not be allowed below one, but this happened for some clips even at bitrates of 300 kb/s. The problem was really apparent at bitrates around 150 kb/s, where even negative PEVQ values could occur. The problem was partially solved by adjusting averages of categories and estimations up to one and by removing clips that had received negative PEVQ scores.
The algorithm that used all of the FPD, MLR and LR methods in combination resulted
in the largest reduction in RMSE and error, and thus the best MOS estimation.
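RMSE, the precision measure used in this comparison, is the root of the mean squared difference between estimated and reference MOS scores; a lower value means a better estimation. A minimal sketch:

```python
import math

def rmse(estimated, reference):
    """Root-mean-square error between estimated and reference MOS
    scores (lists of equal length)."""
    assert len(estimated) == len(reference) and estimated
    return math.sqrt(
        sum((e - r) ** 2 for e, r in zip(estimated, reference))
        / len(estimated)
    )

# rmse([3.0, 4.0], [3.0, 3.0]) -> sqrt(0.5), about 0.707
```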
The most cost-efficient algorithm is the one using only the regression methods. The
increase in precision was almost as good as with the algorithm using all methods, but this
algorithm used much less time and data space, since the costly work of building and
comparing patterns was removed.
A further optimization was to model the regression parameters directly. Since the
parameters showed such clear behavior with the bit rate, the modeled parameters did not
differ much from the regressed ones (unless really extreme values of bit rate were used,
see Appendix A). Even for extreme values, however, the modeled parameters stayed within
reasonable values of the same order of magnitude. Efficiency was improved since the need
to do a new regression for each bit rate was avoided.
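Modeling a regression parameter as a function of bit rate can be sketched with an ordinary least-squares fit. A straight line and the calibration numbers below are illustrative assumptions only; the actual fitted forms are the curves shown in Appendix A.

```python
def fit_line(bitrates, values):
    """Ordinary least-squares fit of value = a + b * bitrate, so a
    regression parameter can be predicted for any bitrate instead of
    re-running the regression at each one. The linear form is an
    illustrative choice, not the thesis's actual model."""
    n = len(bitrates)
    mx = sum(bitrates) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in bitrates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(bitrates, values))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Regressed parameter values at a few calibration bitrates
# (made-up numbers, purely for illustration):
a, b = fit_line([100, 200, 300, 400], [0.9, 0.7, 0.5, 0.3])
predicted = a + b * 250  # modeled parameter at an unseen bitrate
```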
5.4 Restrictions and limitations
For really high bitrates (above 1000 kb/s) the algorithms seem to lose their
effectiveness; there may be a limit to how high bitrates they can be used at. This
is probably a result of the packet statistics showing fewer characteristics as the bitrate
increases, so that the correspondence between packet sizes and video frames becomes less
apparent. This is not a problem for the QVGA resolution used in this thesis, since bitrates
usually do not reach above 400 kb/s.
5.5 Future work
An open question is how the model works when packet loss is introduced. There
is a risk that patterns and statistics could be affected to the degree that it is no longer
possible to distinguish differences between patterns in categories or to make correct
regressions. However, more testing and understanding of the existing model is needed
before an evaluation of any addition of new algorithms can be carried out.
Acknowledgements
I would particularly like to thank my supervisor David Lindegren for all of his help
with this thesis work.
Appendix A

Modeled regression parameters

Figures of modeled and regressed values of regression parameters. The red line shows the
regressed values and the black line shows the modeled values.

Figure A.1: LR regression parameter
Figure A.2: Standard deviation of frame diff sequences
Figure A.3: Median of frame sequence
Figure A.5: Longest small period of frame difference sequences
Figure A.6: Longest large period of frame sequences
Figure A.7: Number of passes through median of frame sequences
Figure A.10: LOR category intercept3
Figure A.11: LOR category intercept4
Figure A.13: Variance of frame sequences
Figure A.14: Standard deviation of frame sequences
Figure A.15: Median of frame sequences
Figure A.17: Longest small period of frame difference sequences
Figure A.18: Longest small period of frame sequences
Appendix B

Videostreams ordered in quality

Figure B.1: A number of frame sequences with PEVQ values of around 0-2.8
Figure B.2: A number of frame sequences with PEVQ values of around 2.9-3.2
Figure B.3: A number of frame sequences with PEVQ values of around 3.2-5