The DELTA project has received funding from the EU’s Horizon 2020 research
and innovation programme under grant agreement No 773960
Project Acronym: DELTA
Project Full Title: Future tamper-proof Demand rEsponse framework through seLf-configured, self-opTimized and collAborative virtual distributed energy nodes
Grant Agreement: 773960
Project Duration: 36 months (01/05/2018 – 30/04/2021)
DELIVERABLE D3.3
DELTA Multi-factor Clustering Engine
Work Package WP3 – DELTA Fog Enabled Smart Metering at DR consumer/prosumer nodes
Task T3.3 – Energy/Social Clustering for DELTA customers
Document Status: Final
File Name: [DELTA]_D3.3_Final
Due Date: 31.08.2020
Submission Date: September 2020
Lead Beneficiary: CERTH
Dissemination Level: Public
H2020 Grant Agreement Number: 773960 Document ID: WP3 / D3.3
Authors List
Leading Author
First Name Last Name Beneficiary Contact e-mail
Ioannis Koskinas CERTH [email protected]
Co-Author(s)
# First Name Last Name Beneficiary Contact e-mail
1 George Karagiannopoulos HIT g.karagiannopoulos@hit-innovations.com
2 Apostolos Tsolakis CERTH [email protected]
Reviewers List
Reviewers
First Name Last Name Beneficiary Contact e-mail
Andrea Cimmino UPM [email protected]
Alexis Fragkoullides UCY [email protected]
Legal Disclaimer
The DELTA project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 773960. The sole responsibility for the content of this publication lies with the authors. It does not necessarily reflect the opinion of the Innovation and Networks Executive Agency (INEA) or the European Commission (EC). INEA or the EC are not responsible for any use that may be made of the information contained therein.
Copyright © DELTA. Copies of this publication, and of extracts thereof, may only be made with reference to the publisher.
Executive Summary
Over the last decade, the penetration of Renewable Energy Sources (RES) into the electricity supply, in conjunction with the deregulation of European energy markets, has created many opportunities for aggregators/retailers to further exploit their energy assets and energy savings. However, the intermittent nature of RES and the participation of residential customers in distributed energy generation demand concrete Demand Response (DR) strategies that expand the potential profits. This report describes an approach for segmenting small and medium customers into larger groups according to their Energy/Social Profile. The goal of this implementation is to support the other aggregator tools that are responsible for designing and applying DR strategies. The Energy Profile of a customer is based on its estimated flexibility during the day, while the Social Profile is a metric that quantifies the social engagement of users. The two profiles are combined sequentially to provide meaningful insights to other tools.
Table of Contents
1. Introduction
1.1 Scope and objectives of the deliverable
1.2 Structure of the deliverable
1.3 Relation to Other Tasks and Deliverables
2. Clustering Small and Medium Customers Overview
2.1 Literature Review
2.2 DELTA Clustering Functional Overview
2.2.1 Basic functionalities
3. Energy Clustering
3.1 Energy Data
3.1.1 Data Pre-processing
3.2 Methodology
3.3 Energy Clustering Results
4. Social Clustering
4.1 Social Data
4.2 Social Engagement Definition Methodology
5. Energy/Social Clustering Interconnection
5.1 Interconnection
5.2 Assistive services towards OptiDVN
6. Conclusions
References
List of Acronyms and Abbreviations
Term Description
AP Affinity Propagation
DR Demand Response
EC Energy Clustering
SC Social Clustering
DVN DELTA Virtual Node
DNO Distribution Network Operator
FEID Fog Enabled Intelligent Device
IEEE Institute of Electrical and Electronics Engineers
SNR Signal to Noise Ratio
1. Introduction
1.1 Scope and objectives of the deliverable
This deliverable is associated with the Energy/Social Clustering of DELTA customers, as described in Task 3.3 of the DELTA project. It describes the methodology of the multi-factor clustering analysis, the pre-processing procedure applied to the Energy and Social Data of DELTA customers, who are equipped with Fog-Enabled Intelligent Devices (FEIDs), and the results of the clustering algorithms.
The Energy/Social Clustering report focuses on the algorithmic approach that has been followed and the features that are used to provide a straightforward grouping result. The features considered fundamental in this analysis are the load flexibility (positive or negative) of each FEID, combined with the reliability metric that is related to a specific DR purpose. These features, together with the social features of the customers, yield meaningful insights for the DVN Multi Agent System. The DVN, in turn, is responsible for harnessing this information and managing its assets in the most efficient way. The approach can be represented as a two-step process: Energy and Social Analysis in sequential order.
1.2 Structure of the deliverable
The work presented in this deliverable is structured as follows:
● Chapter 2 presents the literature review on the topic of Clustering Small and Medium Customers.
● Chapter 3 provides information about the Energy Data that are taken into consideration by the Energy clustering algorithm, the pre-processing of the data and the methodology that is applied to group the customers.
● Chapter 4 provides information about the Social Data that are taken into consideration by the Social clustering algorithm, the pre-processing of the data and the methodology that is applied to group the customers.
● Chapter 5 describes how the two former implementations are combined to provide meaningful results.
1.3 Relation to Other Tasks and Deliverables
The results of the Social/Energy Clustering engine are directly associated with the DELTA Virtual Node (DVN) Multi Agent System (MAS), which is described in T3.2 and documented in D3.2. The Social/Energy Clustering engine is employed by the DVN Agent to report the clusters of FEIDs that have been formed during the day on an hourly basis, together with a statistical description of some meaningful features of each cluster. These results can be exploited by the internal modules of the DVN that are responsible for selecting the FEIDs that will participate in the DR process and for distributing the DR demands among these assets.
2. Clustering Small and Medium Customers Overview
2.1 Literature Review
In recent years, the clustering of residential and medium customers through their electricity demand time-series profiles or their social characteristics has been an active area of research [1,2,5,6]. Many studies have analysed customers’ behaviour through their load profiles [1,2], identifying patterns that provide meaningful insights to the Distribution Network Operator (DNO) or the Aggregator/Retailer in order to schedule price-based or incentive-based Demand Response (DR) signals. The incorporation of small and medium assets in DR programs in recent years requires a low-level analysis of the customers’ reaction to DR signals [3,4].
A variety of approaches and pre-processing methodologies have been applied in the literature to analyse the load profile of each customer [18,19,20,21]. Many studies have focused on cluster analysis, which is the unsupervised process of partitioning unlabelled data into a set of segments with high internal similarity [5,6,7]. Because household-level electricity demand is highly stochastic, a detailed analysis of socio-demographic characteristics and behavioural effects is required to define the attributes of patterns in the households [8,9,10,11,22,23,24]. Load profile analysis can be applied to individual residential data, aggregated residential measurements or non-residential buildings.
Typically, the energy behaviour of small and medium customers depends on the intrinsic characteristics of the household, such as the number of household members, their age, their profession, their economic prosperity, the building and several other factors that can affect the energy profile [26]. Although many studies highlight this dependency between energy behaviour variation and social parameters [27], data availability and credibility issues lead researchers to model the social behaviour of clusters through assumptions and guesses [2, 28, 29].
Determining the number of segments that yields optimal coherence of points within a cluster and maximum distinction from points outside it is a major concern of clustering algorithms [30]. In [31] this number is estimated through an ant colony optimization algorithm combined with the optimal theory, while [32] proposes an ensemble clustering methodology that combines hierarchical and partitioning clustering algorithms. Furthermore, load shape variability is a substantial feature that encapsulates insightful information about customers’ behaviour. It is addressed in [33], which uses cumulative consumption rather than raw profiles and focuses on the efficient estimation of the Euclidean distance among them. Other studies [34] harness this indicator to form segments according to the entropy of the load shapes of individual households, thus estimating the frequency of repetitive daily load shape patterns. In [12], a frequency domain analysis of load consumption is proposed using the Spectral Clustering algorithm, while in [13,14] self-organizing maps and k-means algorithms are selected to identify groups with high similarity based on statistical metrics extracted from energy time series, such as monthly peak demand and daily mean consumption.
Additionally, as noted in [7], the stability of the clustering results can be affected by the temporal resolution of the load consumption data; most studies use a resolution of 15, 30 or 60 minutes, depending on the effectiveness of their experiments. Regarding the way the distance between load consumption time series is calculated, many measures have been proposed, such as Euclidean distance, Manhattan distance, Shapelets and Dynamic Time Warping (DTW) [15,16]. However, the appropriate selection is closely connected with the methodology applied in the pre-processing step [15,16,17].
In terms of evaluation, many metrics have been established to measure the efficiency of clustering algorithms, including the Davies-Bouldin Index (DBI), Cluster Dispersion, the Mean Index Adequacy (MIA), the Similarity Matrix Indicator (SMI) and the Silhouette Index. Nevertheless, evaluation remains a challenging task that is highly correlated with the nature of the problem [36]. The aforementioned metrics are based on a trade-off between the compactness and distinctness of clusters [36], whereas reliability issues arise in cases of high bias towards outlier points and when the formulated metrics fail to penalize noisy clusters [37].
Regarding social engagement research, engagement is described as an emotional connection between a company and its customers, focused on their participation in activities [22]. The key element of customer engagement is knowledge exchange, so information and communication technologies provide immense opportunities for organizations to exchange knowledge and engage with customers. According to [22], the engagement of a user is measured based on the actions the user performs inside the platform. A similar strategy is proposed in [23], where preferences, comments, shares and other clicks are taken into consideration. In [24], engagement is split into four different components. By integrating data into those components, it is possible to build engagement profiles and aggregated descriptions of the engagement that each customer exhibits.
Table 1 presents information about some of the reviewed papers that were taken into consideration in developing our methodology.
Table 1. Overview of literature findings on clustering of small and medium customers.
Study | Building | Year | Data Resolution | Customers | Domain
Motlagh, Omid, et al. [1] | Residential | 2019 | 15 minutes | 7000 | Time
Waczowicz, Simon, et al. [3] | Residential | 2018 | - | - | Time
Benítez, Ignacio, et al. [11] | Residential | 2014 | 60 minutes | 759 | Time
H. Hino, H. Shen, N. Murata, S. Wakao and Y. Hayashi [14] | Residential | 2013 | 60 minutes | 500 | Time
Zhong, Shiyin, and Kwa-Sur Tam [16] | Residential | 2015 | 60 minutes | 653 | Frequency
H. Cao, C. Beckel and T. Staake [17] | Residential | 2013 | 30 minutes | 4000 | Time
Jiang, Zigui, et al. [18] | Residential | 2019 | 15 minutes | 1168 | Frequency
Auder, Benjamin, et al. [19] | Residential | 2018 | - | - | Time
K. Mets, F. Depuydt and C. Develder [20] | Residential, Small Businesses | 2016 | 15 minutes | 244 | Frequency
R. Al-Otaibi, N. Jin, T. Wilcox and P. Flach [21] | Residential | 2016 | 30 minutes | 5000 | Time
Wang, Ning, and Chungu Lu [25] | - | 2010 | - | - | Frequency
Yao, Runming, and Koen Steemers [27] | Residential | 2005 | 30 minutes | 1300 | Time
Flath, Christoph, et al. [28] | Residential | 2012 | 15 minutes | 215 | Time
Piao, Minghao, et al. [34] | Residential | 2014 | - | - | Time
Mcloughlin, Fintan, et al. [35] | Residential | 2013 | - | - | Time
2.2 DELTA Clustering Functional Overview
2.2.1 Basic functionalities
Management and exploitation of energy assets in Demand Response (DR) programs require sufficient assistive tools. This report focuses on the design of an Energy/Social Clustering engine that acts as an ancillary service and is capable of dynamically arranging DELTA customers in groups according to their Energy and Social characteristics. The clustering results, in conjunction with the statistical analysis of each group, facilitate the development of efficient DR strategies in the direction of reduced emissions and cost-efficiency. The Energy/Social Clustering engine comprises two separate clustering implementations, Energy and Social, that are combined to serve as a single tool.
As a result, the clustering produces suitable configurations to meet the required energy needs. The techniques that this component implements are fed with the data retrieved through the sub-component Energy Portfolio Segmentation and Classification in the DELTA Aggregator, which establishes the DVN Clusters. In addition, data from the Consumer/Prosumer Flexibility Data Monitoring and Profiling and incoming DR signals are used to compute the customer clusters.
The logical and deployment views, as presented in D3.1, are shown in Figures 1 and 2.
Figure 1. Logical view of the Consumer/Prosumer Clustering module of the DELTA Virtual Node
Figure 2. Component diagram of the Consumer/Prosumer Clustering illustrating its
interconnections with the DVN’s components.
3. Energy Clustering
Social/Energy Clustering is a two-step procedure that includes both energy and social analysis. The role of Energy Clustering is to identify patterns in groups of FEIDs with common flexibility behaviour during the day, with the main objective of helping the DVN Multi Agent System to manage and select the assets that are the most valuable DR participants. The data granularity is one minute, and for each hourly period the FEIDs are reallocated to the appropriate group.
3.1 Energy Data
As T4.2 activities handle a similar aspect but at a higher level, segmentation and clustering techniques based on consumption/load profiles have been extensively explored there with both static and dynamic features. D4.2 examined the segmentation topic at Aggregator level towards creating the DVNs, using aggregated consumption values as one of the fundamental features. Applying a similar approach to this tool (energy clustering within the DVN) was assessed as inadequate for extracting further information about the potential contribution of the energy portfolio to a possible DR signal. Extracting more elaborate Energy Profiles from the consumption metric is an approach that could add value to energy tools that concern grid stability and the prevention of network imbalances. However, this report focuses on the facilitation of DR services in relevant programs and therefore on the Flexibility metric as defined in DELTA. Flexibility, combined with the reliability of the DELTA customers, composes the set of characteristics that need to be examined in order to identify the potential participation of the energy assets and to further facilitate the selection process of the Optimal Dispatch engine within each DVN (OptiDVN).
The proposed approach in the DVN’s Energy Analysis exploits the positive and negative flexibility measurements of each FEID (as depicted in Figure 3). Positive flexibility is the potential of each FEID to raise its power flow, which is related to upwards DR signals, while negative flexibility is the potential of each FEID to drop its power flow, which is related to downwards DR signals. Flexibility is estimated by an engine that is implemented in DELTA as an independent tool. An additional metric that affects the clustering analysis is the reliability metric, which is estimated by the DVN and adapts its value according to the FEID’s contribution to the DR signals that have been received. The data originate from either historical or forecasted measurements, as the approach supports both options. The methodology produces two independent Energy Profiles for the FEIDs, one per type of DR signal (upwards, downwards). As a result, for a downwards DR signal the tool exploits the downwards flexibility, whereas for an upwards DR signal the upwards flexibility is used.
Figure 3. Upwards and Downwards Flexibility in consumption measurements.
3.1.1 Data Pre-processing
Data pre-processing is the first step of the Energy data analysis and consists of independent tasks: isolation of outliers, estimation of real flexibility, data normalization, and transformation of the data to the frequency domain. In the outlier removal step, measurements that deviate from the baseline behaviour are identified and removed through a Signal to Noise Ratio (SNR) indicator, which validates that the power signal increases proportionally with the noise. The real flexibility is calculated by incorporating the reliability metric as a correction factor, according to the following equation.
$$\mathrm{RealFlex} = \mathrm{EstimatedFlex} \times \mathrm{reliability}$$
Afterwards, the calculated baseline real flexibility is divided into 24 hourly periods, and the clustering algorithm examines them independently. Figure 4 displays the data segmentation and outlier removal tasks.
Figure 4. Pre-processing step of Energy Clustering
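The reliability correction and the hourly segmentation described above can be sketched as follows. This is a minimal illustration and not the DELTA implementation: the function names, the assumption that reliability lies in [0, 1], and the constant 200 W input series are hypothetical.

```python
import numpy as np

def real_flexibility(estimated_flex, reliability):
    """Apply RealFlex = EstimatedFlex x reliability to a flexibility series."""
    # Assumption: reliability is a correction factor in [0, 1].
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability is assumed to lie in [0, 1]")
    return np.asarray(estimated_flex, dtype=float) * reliability

def split_into_hours(day_series, samples_per_hour=60):
    """Reshape one day of one-minute samples into 24 hourly rows."""
    day_series = np.asarray(day_series, dtype=float)
    if day_series.size != 24 * samples_per_hour:
        raise ValueError("expected exactly one day of samples")
    return day_series.reshape(24, samples_per_hour)

# Hypothetical FEID: constant 200 W estimated flexibility, reliability 0.8.
estimated = np.full(24 * 60, 200.0)
real = real_flexibility(estimated, reliability=0.8)
hourly = split_into_hours(real)  # one row per hourly period
```

Each row of `hourly` is then an independent input for the clustering of the corresponding hourly period.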
The next step standardizes the data to the range (-1, 1) and then transforms the time-series data to the frequency domain. This transformation is achieved by applying the Continuous Wavelet Transform (CWT) [25], with the Mexican Hat wavelet (Figure 5) as the wavelet function. The fundamental advantage of the CWT compared to the Fast Fourier Transform is its capability to construct time-frequency representations of a signal that offer good localization in both time and frequency. Finally, the CWT is efficient for transforming non-stationary signals while preserving time-dimension properties. Figure 6 gives a short overview of the aforementioned tasks.
Figure 5. Mexican Hat Wavelet [25]
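As an illustration of the scaling and wavelet steps, the sketch below uses NumPy only. It rests on assumptions that the report does not state: min-max scaling is used to reach the [-1, 1] range, the CWT is approximated by convolving the signal with sampled Mexican Hat (Ricker) wavelets, and the widths and the synthetic input signal are hypothetical.

```python
import numpy as np

def scale_to_unit_range(x):
    """Min-max scale a series to [-1, 1]; a constant series maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return 2.0 * (x - x.min()) / span - 1.0

def mexican_hat(points, width):
    """Sampled Mexican Hat (Ricker) wavelet centred in a window of `points` samples."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * width) * np.pi ** 0.25)
    return norm * (1.0 - (t / width) ** 2) * np.exp(-(t ** 2) / (2.0 * width ** 2))

def cwt_mexican_hat(signal, widths):
    """Discrete approximation of the CWT: one convolution per wavelet width."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty((len(widths), signal.size))
    for row, width in enumerate(widths):
        # Keep the wavelet no longer than the signal so the output length matches.
        points = int(min(10 * width, signal.size))
        out[row] = np.convolve(signal, mexican_hat(points, width), mode="same")
    return out

# Hypothetical one-hour flexibility series (60 one-minute samples, in W).
minutes = np.arange(60)
flex = 100.0 + 50.0 * np.sin(2 * np.pi * minutes / 30.0)
scaled = scale_to_unit_range(flex)
coeffs = cwt_mexican_hat(scaled, widths=[1, 2, 4, 8])  # rows: scales, columns: time
```

Each row of `coeffs` keeps the time axis of the input, which is the time-localization property mentioned above.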
3.2 Methodology
The Affinity Propagation (AP) algorithm has been selected to retrieve the groups of FEIDs that share common flexibility characteristics during the hourly time periods. The AP algorithm receives as input a similarity matrix that contains the similarity of each data point to all the others. The proposed implementation estimates kernel similarities in a higher-dimensional space in order to identify non-linear correlations between the real flexibility measurements. One advantage of this algorithm is its ability to determine the number of clusters autonomously. A crucial parameter that affects this behaviour is the preference, which influences the number of exemplars.
Moreover, the proposed approach adapts the algorithm’s parameters according to its efficiency. In case of reduced credibility of the latest DR responses, the parameters that affect how the number of clusters is estimated are re-configured, so that the FEIDs are reallocated to different groups and better results are achieved.
Figure 7. Energy Clustering methodology
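The report does not specify which kernel or preference value is used, so the following is only a sketch under stated assumptions: an RBF kernel stands in for the kernel similarity, and the preference is taken as a quantile of the off-diagonal similarities. The function names, the `gamma` default and the quantile rule are hypothetical.

```python
import numpy as np

def rbf_similarity(features, gamma=None):
    """Pairwise RBF kernel similarities exp(-gamma * ||x_i - x_k||^2)."""
    x = np.asarray(features, dtype=float)
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    if gamma is None:
        gamma = 1.0 / x.shape[1]  # assumed default, not taken from the report
    return np.exp(-gamma * sq_dist)

def with_preference(similarity, quantile=0.5):
    """Return a copy whose diagonal holds the preference s(k, k).

    The preference is set to a quantile of the off-diagonal similarities
    (a hypothetical rule); a lower preference yields fewer exemplars.
    """
    s = np.array(similarity, dtype=float, copy=True)
    off_diag = s[~np.eye(len(s), dtype=bool)]
    np.fill_diagonal(s, np.quantile(off_diag, quantile))
    return s

# Hypothetical feature rows, one per FEID (e.g. flattened wavelet coefficients).
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))
S = with_preference(rbf_similarity(features), quantile=0.5)
```

Under this sketch, re-configuring the algorithm after unreliable DR responses would amount to calling `with_preference` with a different quantile, which raises or lowers the number of exemplars that AP tends to select.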
3.2.1 Affinity Propagation (AP) Algorithm
In 2007, Frey and Dueck [38] introduced a clustering algorithm that detects a sample of representative examples in order to process signals and identify patterns in data. Over the last decade, several studies have examined Energy Profile Clustering using the AP algorithm [39, 40]. It is an iterative algorithm that selects a random subset of raw data, estimates similarities and finally, through the exchange of information between pairs of data points, identifies a set of “exemplars” and the corresponding clusters. The initialization of the sampling pool affects the efficiency of the algorithm. AP was initially assessed in the image, text and biology fields, where it detected clusters with remarkable efficiency and effectiveness compared to other algorithms. Similarity, Responsibility and Availability are the three metrics that are combined in the iterative process to produce discrete segments of data.
Figure 6. Transformation to frequency domain.
The similarity metric s(i,k) reflects the distance between a pair of points. For points x_i and x_k:
$$s(i,k) = -\lVert x_i - x_k \rVert^2$$
Apart from the negative squared Euclidean distance, the similarity metric can be configured according to the nature of the problem. One of the advantages of the AP algorithm is its ability to identify the number of clusters based on this pre-estimated similarity matrix and a “preference” parameter that sets the value of s(k,k).
Data points exchange two types of messages: Responsibility r(i,k) and Availability a(i,k). The responsibility message r(i,k), sent from point i to point k, reflects how appropriate point k is to be the exemplar of point i, taking into account the other potential exemplars of point i.
The availability message a(i,k), sent from point k to point i, is estimated as the sum of the “self-responsibility” r(k,k), which indicates the accumulated evidence that k is an appropriate exemplar, and the positive responsibilities that k receives from other points i'.
All this exchange of messages among the data points is reflected in the Criterion matrix c(i,k), which is estimated as the sum of the responsibility and availability matrices. The column with the highest criterion value in each row is considered the exemplar of that row, and rows that share the same exemplar belong to the same cluster.
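The message exchange described above can be sketched in NumPy as follows. The update rules are the standard ones from Frey and Dueck [38]; the damping factor, the iteration count, the median preference and the six-point example are assumptions made for illustration and are not taken from the DELTA implementation.

```python
import numpy as np

def affinity_propagation(S, damping=0.7, iterations=300):
    """Return, for every point, the index of its exemplar.

    S is an (n, n) similarity matrix whose diagonal holds the preference.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1.0 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0.0)
        np.fill_diagonal(Rp, R.diagonal())
        col_sum = Rp.sum(axis=0)
        A_new = np.minimum(0.0, col_sum[None, :] - Rp)
        np.fill_diagonal(A_new, col_sum - R.diagonal())
        A = damping * A + (1.0 - damping) * A_new
    criterion = R + A  # c(i, k)
    return criterion.argmax(axis=1)

# Hypothetical 1-D data with two well-separated groups.
points = np.array([0.0, 0.2, 0.5, 10.0, 10.3, 10.4])
S = -(points[:, None] - points[None, :]) ** 2
off_diag = S[~np.eye(len(points), dtype=bool)]
np.fill_diagonal(S, np.median(off_diag))  # preference on the diagonal
exemplar_of = affinity_propagation(S)
```

Points that share the same entry in `exemplar_of` belong to the same cluster, so the number of clusters is the number of distinct exemplar indices.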
3.3 Energy Clustering Results
The Energy clustering approach, as an independent implementation, has to be examined in terms of its efficiency in distinguishing the groups of FEIDs that share common flexibility behaviour in specified hourly time periods. Depending on the objective of the DR signal (upwards or downwards), the algorithm examines the respective data (upwards flexibility, downwards flexibility). The incorporation of social clustering in Section 5 will expand the study results, creating connections between Energy clustering and social engagement.
To validate the clustering results, the Silhouette score is estimated; it examines the cohesion between points inside a cluster and their distance from neighbouring points of different clusters.
For a point i, the equation that represents this relation is
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
where a(i) is the mean distance between point i and all other points within the same cluster, and b(i) is the mean distance between point i and all points of the neighbouring cluster.
The silhouette score ranges from -1 to 1. A value near 1 indicates that the points within a cluster have high cohesion and are far from other clusters, while negative values indicate faulty clustering. Figure 8 displays indicative silhouette scores for each hour of the day, obtained in experiments conducted through the DELTA platform with more than 10 thousand virtual FEIDs. The score can also be used as a correction factor that adapts the number of clusters in case of reduced DR efficiency.
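A per-point silhouette value can be computed as (b - a) / max(a, b), with a the mean distance to the other members of the point's own cluster and b the smallest mean distance to the members of any other cluster. The sketch below assumes Euclidean distance, at least two clusters, and a score of 0 for singleton clusters; the four-point example is hypothetical.

```python
import numpy as np

def silhouette_values(points, labels):
    """Silhouette value of every point; requires at least two clusters."""
    x = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton cluster: score left at 0 by convention
        # Mean distance to the other members of the same cluster.
        a = dist[i, own].sum() / (own.sum() - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

# Hypothetical data: two compact groups far apart.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = silhouette_values(points, [0, 0, 1, 1])  # groups matched correctly
bad = silhouette_values(points, [0, 1, 0, 1])   # groups mixed up
```

Averaging the per-point values gives one score per hourly period, which is the quantity reported for each hour of the day.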
The last step of the Energy Clustering methodology is the extraction of fundamental statistical features from each cluster, such as the mean value, the variance and the slope ratio. These statistical measurements can provide insights to the DVN about the behaviour of each group of FEIDs, in the direction of optimal asset participation in DR. The final output of the Energy clustering tool includes details (FEIDs, statistical measurements) about each cluster for a specific time period, as presented below:
{
  "clusterID": "up_2020-05-06 13:12:40.245109_0_0",
  "cluster_direction": "ClusterDirection.up",
  "end_date": "2020-05-06 01:00:00",
  "feidIDs": "['feidID101', 'feidID111']",
  "mean_value": "7.36",
  "powerFlowTimestamp": "2020-05-06T13:12:40Z",
  "start_date": "2020-05-06 00:00:00",
  "variance": "6.63",
  "slope": "0.73"
}
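A record of this kind could be assembled as in the sketch below. How the tool defines the slope is not stated in this report, so the example assumes it is the slope of a least-squares line fitted to the cluster's mean profile over the minutes of the hour; the function name and the input series are hypothetical, and only a subset of the output fields is reproduced.

```python
import json
import numpy as np

def cluster_statistics(flex_by_feid):
    """Summarise one cluster given a mapping FEID id -> one-hour flexibility series."""
    ids = sorted(flex_by_feid)
    series = np.array([flex_by_feid[f] for f in ids], dtype=float)
    mean_profile = series.mean(axis=0)
    minutes = np.arange(mean_profile.size)
    # Assumption: slope of a first-degree least-squares fit, per minute.
    slope = np.polyfit(minutes, mean_profile, 1)[0]
    return {
        "feidIDs": ids,
        "mean_value": round(float(series.mean()), 2),
        "variance": round(float(series.var()), 2),
        "slope": round(float(slope), 2),
    }

# Hypothetical cluster with two FEIDs whose flexibility rises during the hour.
cluster = {
    "feidID101": np.linspace(5.0, 10.0, 60),
    "feidID111": np.linspace(4.0, 9.0, 60),
}
stats = cluster_statistics(cluster)
print(json.dumps(stats, indent=2))
```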
Section 5.2 describes potential DR scenarios that highlight the impact of the statistical feature extraction process on the selection of participant assets. Figure 8 displays the estimated silhouette scores of the segmentation process during one day, for 24 individual hourly periods. Across the hourly periods, the score varies from 0.4 to 0.65, which remains adequate to distinguish energy profile segments. It is worth mentioning that the segmentation score depends on the nature of the data, and it is not always feasible to identify different behaviours efficiently. Adjusting the number of clusters can lead to improved scores and is applied as a corrective action in case of a low clustering score.
Figure 8. Silhouette Score of each hourly period during the day.
Figures 9 and 10 display different Energy Profiles in terms of flexibility during the period 15:00-17:00, as identified by the clustering engine through DVN2 in the DELTA platform. Upwards flexibility and Downwards flexibility are examined individually, as they describe independent metrics. DVN2 contains virtual FEIDs and these measurements are synthetic data; nevertheless, it is discernible that similar Energy Profiles are matched and that each one can potentially serve a specific DR purpose. In the time period 15:00-16:00, the algorithm identifies four clusters with regard to Upwards flexibility. The clusters illustrated in the left images can be described by a horizontal slope, while the cluster in the lower right image comprises FEIDs with negligible upwards flexibility. Finally, the upper right cluster consists of FEIDs with a declining trend.
Figure 9. Upwards Flexibility Clustering 15:00-16:00 - Different Energy Profiles.
On the other hand, in the subsequent time period 16:00-17:00, the algorithm detects six clusters, as illustrated in Figure 10. The group in the lower right image is the most inactive, while FEID110, which was part of the same group in the previous hour, is moved to a more active segment as its flexibility value surges at about 16:20. FEID106 is the only member of its cluster, as its flexibility behaviour deviates from the rest of the assets.
Figure 10. Upwards Flexibility Clustering 16:00-17:00 - Different Energy Profiles
Regarding the clustering results for Downwards Flexibility in the examined time period 19:00-20:00, four segments of FEIDs with different flexibility behaviour have been identified. The upper left image displays a flat (small slope) oscillation of the flexibility measurement near 100 W, while the lower left and lower right images show a similar fluctuation but with different magnitudes. Finally, the flexibility of the upper right segment declines after 19:35.
Figure 11. Downwards Flexibility Clustering 19:00-20:00 - Different Energy Profiles.
Accordingly, in the period between 20:00 and 21:00, the measurements in the left images rise gradually, while those in the upper right image decline at 20:30 and rise again at 20:50. The lower segment behaves differently from the others, as it reaches more than 1000 W at 20:20 and then falls progressively to 250 W.
Figure 12. Downwards Flexibility Clustering 20:00-21:00 - Different Energy Profiles.
4. Social Clustering
Social Clustering concerns the formulation of a social engagement metric and the identification of groups of FEIDs with common social activity in the DELTA forum. Participation, responsiveness, reliability and activity are some of the parameters that are taken into consideration in order to estimate the Social Engagement Metric and cluster the users into groups with common social behaviour. The following sections describe the nature of our data, the definition of the Social Engagement Metric and some indicative Social clustering results.
4.1 Social Data
The information taken into account for the social analysis of DELTA users originates from the DELTA forum and from DR participation. DELTA records data about the users’ participation and responsiveness in topics, and their intent to create topics and to provide helpful comments in other topics.
The parameters taken into consideration for the estimation of Social Engagement are the reliability metric of the users, which is estimated by the DVN Agent system, and the number of actions that a user performs within the forum. The latter can be further divided into the participation of users in the forum and their responsiveness to DR events. The actions that indicate the activity of a user are the following:
● Creating a question in the forum
● Answering to a topic in the forum
● Accepting/rejecting a DR event
The DELTA Forum is provided through a user-friendly UI that each customer can visit in the Forum section, as displayed in Figure 13. The DELTA Forum functionalities will be described in more detail in D6.2.
Figure 13. Delta Forum - Topics UI
4.2 Social Engagement Definition Methodology
To measure the extent of a user's social engagement in the DELTA forum, a new metric
has been defined on the basis of the collected data described in the previous section.
The metric is defined as:
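\[
\begin{aligned}
\text{SocialEngagement}_{user} &= 0.5 \cdot \text{Reliability}_{user} + 0.5 \cdot \text{Actions}_{user} \\
&= 0.5 \cdot \text{Reliability}_{user} + 0.5 \cdot \left[ 0.5 \cdot \text{DREvents}_{user} + 0.5 \cdot \text{SocialActions}_{user} \right] \\
&= 0.5 \cdot \text{Reliability}_{user} + 0.5 \cdot \left[ 0.5 \cdot \text{DREvents}_{user} + 0.5 \cdot \left( 0.60 \cdot \text{answers}_{user} + 0.40 \cdot \text{topics}_{user} \right) \right]
\end{aligned}
\]
Where:
\[
\text{DREvents}_{user} = \frac{\text{number of DR events the user answered}}{\text{total number of DR events of the user}}
\]
\[
\text{answers}_{user} = \frac{1}{\text{topics}} \sum_{k=0}^{\text{topics}} \frac{\text{replies of user in topic } k}{\text{total replies of topic } k}
\]
\[
\text{topics}_{user} = \frac{\text{topics of user}}{\text{total topics}}
\]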
The Social Engagement metric in the DELTA forum indicates not only the users' interest in asking questions, but
also their intention to help other members of the platform. For that reason, the above equation gives more
weight to the “ answers user ” metric, which describes the general participation of users in the topics;
its value ranges from 0 to 100.
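As an illustration, the weighted sum above can be sketched in a few lines of Python. This is a minimal sketch rather than the DELTA implementation: the `UserActivity` record, its field names and the `ratio` helper are hypothetical, and every input is assumed to be a fraction in [0, 1] (how the values are scaled to other ranges is not covered here). Ratios with a zero denominator are treated as 0, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class UserActivity:
    """Raw per-user inputs; all field names are illustrative assumptions."""
    reliability: float        # reliability metric from the DVN Agent, assumed in [0, 1]
    dr_events_answered: int   # DR events the user accepted or rejected
    dr_events_total: int      # DR events addressed to the user
    replies_per_topic: list   # (replies of user, total replies) for each forum topic
    topics_created: int       # topics opened by the user
    topics_total: int         # topics in the forum


def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def social_engagement(u):
    """Apply the 0.5 / 0.5 and 0.60 / 0.40 weights of the metric definition."""
    dr_events = ratio(u.dr_events_answered, u.dr_events_total)
    # Average, over all topics, of the user's share of replies in each topic.
    answers = ratio(
        sum(ratio(mine, total) for mine, total in u.replies_per_topic),
        len(u.replies_per_topic),
    )
    topics = ratio(u.topics_created, u.topics_total)
    social_actions = 0.60 * answers + 0.40 * topics
    actions = 0.5 * dr_events + 0.5 * social_actions
    return 0.5 * u.reliability + 0.5 * actions


# Made-up values, for illustration only.
example = UserActivity(0.8, 3, 4, [(1, 4), (0, 5)], 1, 2)
print(social_engagement(example))
```

Because reliability carries half of the total weight, a user who never posts in the forum can still obtain a non-zero score through reliable DR behaviour.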
Indicative results of the social engagement metric for the FEIDs of DVN1 are
depicted in the following image. The assessment of the social involvement of FEIDs in DVN1 reflects the
absence of FEID107 from social interaction in the DELTA Forum, whereas FEID105 is the most socially active
customer of the portfolio.
Figure 14. Delta Forum - Submit Topic UI.
Figure 15. Social Engagement assessment of FEIDs in DVN1.
5. Energy/Social Clustering Interconnection
Energy and Social Clustering are two individual implementations, each focused on its
respective field. Because it is difficult to identify a distance metric that correlates Load Profile and
Social Profile as two independent entities, and because fusing social and energy data did not yield
meaningful results, our approach connects the two tools sequentially as a two-stage clustering.
5.1 Interconnection
Energy Clustering, as an individual implementation, detects groups of customers with similar flexibility
profiles. Social Clustering can therefore focus on an energy group with specific energy
properties during a Demand Response period and identify patterns between these two categories. The
separation of the two clustering methods extends the aggregator's possibilities to manage its assets,
and at the same time Energy/Social Clustering provides more insightful and explainable results. The
following image displays the linkage between the two types of clustering: Energy Clustering
identifies N clusters with different energy profiles, while Social Clustering further
segments these clusters using information about their social engagement characteristics
coming from the DELTA forum. The final clusters are fully explainable and contain
information from both the social and the energy domain.
Figure 16. Energy and Social Clustering interconnection.
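The sequential connection can be illustrated with a short Python sketch. It is not the DELTA implementation: the energy cluster labels are taken as given (the output of the first stage), and a fixed engagement threshold stands in for the actual Social Clustering algorithm. The function name, the `threshold` parameter and the "low"/"high" labels are hypothetical.

```python
from collections import defaultdict


def two_stage_clusters(energy_cluster, social_engagement, threshold=0.5):
    """Split each energy cluster into social sub-groups.

    energy_cluster: FEID -> energy cluster label (stage one, assumed given).
    social_engagement: FEID -> social engagement score.
    threshold: hypothetical cut between "low" and "high" engagement,
               used here in place of a real clustering step.
    """
    groups = defaultdict(list)
    for feid, energy_label in energy_cluster.items():
        social_label = "high" if social_engagement[feid] >= threshold else "low"
        # The final cluster key carries both the energy and the social label,
        # so each group can be explained in terms of both stages.
        groups[(energy_label, social_label)].append(feid)
    return {key: sorted(members) for key, members in groups.items()}


# Made-up FEIDs and scores, for illustration only.
energy = {"feid_a": 0, "feid_b": 0, "feid_c": 1}
social = {"feid_a": 0.9, "feid_b": 0.2, "feid_c": 0.7}
print(two_stage_clusters(energy, social))
```

Keeping the two stages separate means the energy labels can be reused unchanged while the social segmentation is replaced or re-tuned.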
The connection between the social engagement metric and the energy clusters is reflected in the following figures,
where the OptiDVN tool has to select the assets that belong to the appropriate energy cluster for a specific
DR, incorporating the knowledge from social activity in the DELTA forum. Highly socially engaged FEIDs
combined with DR-inclined behaviour can potentially increase the likelihood of successful DR signals.
Figure 17. Example of Social and Energy Clusters assigned to FEIDs.
Figure 18. Scatter plot between social engagement value and energy cluster.
5.2 Assistive services towards OptiDVN
Apart from grouping the FEIDs into larger-scale entities, Social/Energy Clustering provides statistical results
for each established cluster to OptiDVN, a tool of the DVN Agent (the optimal dispatch component of the DVN)
that is described in D3.2. Mean value, variance and slope are indicative metrics that help
OptiDVN select the cluster of FEIDs that will participate in DR signals. For example, two simple
scenarios that can take place are:
● A cluster with a low mean value of real flexibility is not an efficient asset to serve DR signals with high
demands, whereas a cluster with a high mean value, low variance and high social engagement is the
optimal choice for the specific signal.
● Two clusters have close mean flexibility values but different slope metrics. If the DR
signal has an increasing slope, the preferred cluster is the one that matches the DR slope.
As a result, the OptiDVN tool can adjust its choices according to DR signals, making optimal decisions and
selecting the clusters that meet its requirements.
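The two scenarios can be sketched in Python as follows. This is only an illustration of how such statistics might be used, not the OptiDVN logic described in D3.2: it assumes each cluster is summarised by one flexibility time series, computes the slope by ordinary least squares against the sample index, and selects a cluster using the mean and the slope only (variance and social engagement are left out for brevity). All function names and parameters are hypothetical.

```python
from statistics import mean, pvariance


def slope(values):
    """Least-squares slope of the values against their index (0, 1, 2, ...)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den if den else 0.0


def cluster_stats(flexibility):
    """Summarise one cluster's flexibility series (assumed representation)."""
    return {
        "mean": mean(flexibility),
        "variance": pvariance(flexibility),
        "slope": slope(flexibility),
    }


def pick_cluster(stats_by_cluster, required_mean, signal_slope):
    """Keep clusters whose mean covers the request, then match the slope."""
    eligible = {
        name: s for name, s in stats_by_cluster.items()
        if s["mean"] >= required_mean
    }
    if not eligible:
        return None
    return min(eligible, key=lambda name: abs(eligible[name]["slope"] - signal_slope))


# Made-up flexibility series, for illustration only.
stats = {
    "rising": cluster_stats([1, 2, 3, 4]),
    "falling": cluster_stats([4, 3, 2, 1]),
    "small": cluster_stats([0.1, 0.2, 0.1, 0.2]),
}
print(pick_cluster(stats, required_mean=2.0, signal_slope=0.8))
```

Here the "small" cluster is excluded by its low mean, and the two remaining clusters, which share the same mean, are separated by how closely their slope follows the signal.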
6. Conclusions
This deliverable proposes a new clustering methodology based on the Energy and Social characteristics of
customers, implemented in two stages. The Energy Clustering implementation proposes a temporal analysis
of the real flexibility measurement in the frequency domain through a Continuous Wavelet Transformation,
while the Social Clustering approach defines a Social Engagement metric that exploits
information from the DELTA forum and divides customers into groups on top of the Energy segments. Although
both energy and social data contain useful insights, from the aggregator's perspective, about the customers'
contribution to DR signals, they are examined as separate entities that are interconnected.
The proposed approach permits Aggregators to explore the DR acceptance ratio with respect to each
factor individually, as well as to their combination. Experiments conducted on real and
virtual FEIDs validated the existence of different groups of energy and social clusters during the day;
however, further exploration is needed of the connection between DR acceptance and social/energy
behaviour, and of the application in real households. Furthermore, as social aspects are rather challenging to
re-create in a simulated environment, the overall framework, and specifically the social clustering aspects,
will be evaluated and validated through the DELTA pilots. The extracted evaluation metrics will be reported
in the evaluation report D7.3, which is due in M36.
References
[1] Motlagh, Omid, et al. “Clustering of Residential Electricity Customers Using Load Time Series.”
Applied Energy, vol. 237, 2019, pp. 11–24., doi: 10.1016/j.apenergy.2018.12.063.
[2] S. Haben, C. Singleton and P. Grindrod, "Analysis and Clustering of Residential Customers Energy
Behavioral Demand Using Smart Meter Data," in IEEE Transactions on Smart Grid, vol. 7, no. 1, pp.
136-144, Jan. 2016, doi: 10.1109/TSG.2015.2409786.
[3] Waczowicz, Simon, et al. “Demand Response Clustering.” Proceedings of the Ninth International
Conference on Future Energy Systems, 2018, doi:10.1145/3208903.3212049.
[4] S. Lin, F. Li, E. Tian, Y. Fu and D. Li, "Clustering Load Profiles for Demand Response Applications,"
in IEEE Transactions on Smart Grid, vol. 10, no. 2, pp. 1599-1607, March 2019, doi:
10.1109/TSG.2017.2773573.
[5] Fu, Xin, et al. “Clustering-Based Short-Term Load Forecasting for Residential Electricity under the
Increasing-Block Pricing Tariffs in China.” Energy, vol. 165, 2018, pp. 76–89.,
doi:10.1016/j.energy.2018.09.156.
[6] Gouveia, João Pedro, et al. “Daily Electricity Consumption Profiles from Smart Meters - Proxies of
Behavior for Space Heating and Cooling.” Energy, vol. 141, 2017, pp. 108–122.,
doi:10.1016/j.energy.2017.09.049.
[7] Ma, Zhenjun, et al. “A Variation Focused Cluster Analysis Strategy to Identify Typical Daily Heating
Load Profiles of Higher Education Buildings.” Energy, vol. 134, 2017, pp. 90–102.,
doi:10.1016/j.energy.2017.05.191.
[8] Swan, Lukas G., and V. Ismet Ugursal. “Modeling of End-Use Energy Consumption in the
Residential Sector: A Review of Modeling Techniques.” Renewable and Sustainable Energy
Reviews, vol. 13, no. 8, 2009, pp. 1819–1835., doi:10.1016/j.rser.2008.09.033.
[9] Torriti, Jacopo, et al. “Peak Residential Electricity Demand and Social Practices: Deriving Flexibility
and Greenhouse Gas Intensities from Time Use and Locational Data.” Indoor and Built Environment,
vol. 24, no. 7, 2015, pp. 891–912., doi:10.1177/1420326x15600776.
[10] Flett, Graeme, and Nick Kelly. “A Disaggregated, Probabilistic, High Resolution Method for
Assessment of Domestic Occupancy and Electrical Demand.” Energy and Buildings, vol. 140, 2017,
pp. 171–187., doi:10.1016/j.enbuild.2017.01.069.
[11] Benítez, Ignacio, et al. “Dynamic Clustering Segmentation Applied to Load Profiles of Energy
Consumption from Spanish Customers.” International Journal of Electrical Power & Energy Systems,
vol. 55, 2014, pp. 437–448., doi:10.1016/j.ijepes.2013.09.022.
[12] Chicco, Gianfranco. “Overview and Performance Assessment of the Clustering Methods for
Electrical Load Pattern Grouping.” Energy, vol. 42, no. 1, 2012, pp. 68–80.,
doi:10.1016/j.energy.2011.12.031.
[13] Liao, T. Warren. “Clustering of Time Series Data—a Survey.” Pattern Recognition, vol. 38, no. 11,
2005, pp. 1857–1874., doi:10.1016/j.patcog.2005.01.025.
[14] H. Hino, H. Shen, N. Murata, S. Wakao and Y. Hayashi, "A Versatile Clustering Method for
Electricity Consumption Pattern Analysis in Households," in IEEE Transactions on Smart Grid, vol.
4, no. 2, pp. 1048-1057, June 2013, doi: 10.1109/TSG.2013.2240319.
[15] Ding, Rui, et al. “Yading.” Proceedings of the VLDB Endowment, vol. 8, no. 5, 2015, pp. 473–484.,
doi:10.14778/2735479.2735481.
[16] Zhong, Shiyin, and Kwa-Sur Tam. “Hierarchical Classification of Load Profiles Based on Their
Characteristic Attributes in Frequency Domain.” IEEE Transactions on Power Systems, vol. 30, no.
5, 2015, pp. 2434–2441., doi:10.1109/tpwrs.2014.2362492.
[17] H. Cao, C. Beckel and T. Staake, "Are domestic load profiles stable over time? An attempt to identify
target households for demand side management campaigns," IECON 2013 - 39th Annual Conference
of the IEEE Industrial Electronics Society, Vienna, 2013, pp. 4733-4738, doi:
10.1109/IECON.2013.6699900.
[18] Jiang, Zigui, et al. “A Fused Load Curve Clustering Algorithm Based on Wavelet Transform.” IEEE
Transactions on Industrial Informatics, vol. 14, no. 5, 2018, pp. 1856–1865.,
doi:10.1109/tii.2017.2769450.
[19] Auder, Benjamin, et al. “Scalable Clustering of Individual Electrical Curves for Profiling and
Bottom-Up Forecasting.” Energies, vol. 11, no. 7, 2018, p. 1893., doi:10.3390/en11071893.
[20] K. Mets, F. Depuydt and C. Develder, "Two-Stage Load Pattern Clustering Using Fast Wavelet
Transformation," in IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2250-2259, Sept. 2016, doi:
10.1109/TSG.2015.2446935.
[21] R. Al-Otaibi, N. Jin, T. Wilcox and P. Flach, "Feature Construction and Calibration for Clustering
Daily Load Curves from Smart-Meter Data," in IEEE Transactions on Industrial Informatics, vol. 12,
no. 2, pp. 645-654, April 2016, doi: 10.1109/TII.2016.2528819.
[22] Muñoz-Expósito, Miriam, et al. “How to Measure Engagement in Twitter: Advancing a Metric.”
Internet Research, vol. 27, no. 5, 2017, pp. 1122–1148., doi:10.1108/intr-06-2016-0170
[23] Oviedo-García, Mª Ángeles, et al. “Metric Proposal for Customer Engagement in Facebook.” Journal
of Research in Interactive Marketing, vol. 8, no. 4, 2014, pp. 327–344.,
doi:10.1108/jrim-05-2014-0028.
[24] Rahman, Zoha, et al. “Fanpage Metrics Analysis. ‘Study on Content Engagement.’” 2016,
doi:10.1063/1.4960928.
[25] Wang, Ning, and Chungu Lu. “Two-Dimensional Continuous Wavelet Analysis and Its Application
to Meteorological Data.” Journal of Atmospheric and Oceanic Technology, vol. 27, no. 4, 2010, pp.
652–666., doi:10.1175/2009jtecha1338.1.
[26] Walker, C.f., and J.l. Pokoski. “Residential Load Shape Modelling Based on Customer Behavior.”
IEEE Transactions on Power Apparatus and Systems, PAS-104, no. 7, 1985, pp. 1703–1711.,
doi:10.1109/tpas.1985.319202.
[27] Yao, Runming, and Koen Steemers. “A Method of Formulating Energy Load Profile for Domestic
Buildings in the UK.” Energy and Buildings, vol. 37, no. 6, 2005, pp. 663–671.,
doi:10.1016/j.enbuild.2004.09.007.
[28] Flath, Christoph, et al. “Cluster Analysis of Smart Metering Data.” Business & Information Systems
Engineering, vol. 4, no. 1, 2012, pp. 31–39., doi:10.1007/s12599-011-0201-5.
[29] Kwac, Jungsuk, et al. “Lifestyle Segmentation Based on Energy Consumption Data.” IEEE
Transactions on Smart Grid, vol. 9, no. 4, 2018, pp. 2409–2418., doi:10.1109/tsg.2016.2611600.
[30] Patil, Channamma, and Ishwar Baidari. “Estimating the Optimal Number of Clusters k in a Dataset
Using Data Depth.” Data Science and Engineering, vol. 4, no. 2, 2019, pp. 132–140.,
doi:10.1007/s41019-019-0091-y.
[31] Chicco, Gianfranco. “Overview and Performance Assessment of the Clustering Methods for
Electrical Load Pattern Grouping.” Energy, vol. 42, no. 1, 2012, pp. 68–80.,
doi:10.1016/j.energy.2011.12.031
[32] B. Zhang, C. Zhuang and J. Hu, "Ensemble clustering algorithm combined with dimension reduction
techniques for power load profiles", Proc. CSEE, vol. 35, no. 15, pp. 3741-3749, Feb. 2015.
[33] Satre-Meloy, Aven, et al. “Cluster Analysis and Prediction of Residential Peak Demand Profiles
Using Occupant Activity Data.” Applied Energy, vol. 260, 2020, p. 114246.,
doi:10.1016/j.apenergy.2019.114
[34] Piao, Minghao, et al. “Subspace Projection Method Based Clustering Analysis in Load Profiling.”
IEEE Transactions on Power Systems, vol. 29, no. 6, 2014, pp. 2628–2635.,
doi:10.1109/tpwrs.2014.2309697
[35] Mcloughlin, Fintan, et al. “Evaluation of Time Series Techniques to Characterise Domestic
Electricity Demand.” Energy, vol. 50, 2013, pp. 120–130., doi:10.1016/j.energy.2012.11.048.
[36] Ling Jin et al. “Comparison of Clustering Techniques for Residential Energy Behavior Using Smart
Meter Data”. In: AAAI Work. Artif. Intell. Smart Grids Smart Build. 2017, pp. 260–266.
[37] Hassani, Marwan, and Thomas Seidl. “Using Internal Evaluation Measures to Validate the Quality
of Diverse Stream Clustering Algorithms.” Vietnam Journal of Computer Science, vol. 4, no. 3, 2016,
pp. 171–183., doi:10.1007/s40595-016-0086-9.
[38] Frey, B. J., and D. Dueck. “Clustering by Passing Messages Between Data Points.” Science, vol. 315,
no. 5814, 2007, pp. 972–976., doi:10.1126/science.1136800.
[39] Zarabie, Ahmad Khaled, et al. “Load Profile Based Electricity Consumer Clustering Using Affinity
Propagation.” 2019 IEEE International Conference on Electro Information Technology (EIT), 2019,
doi:10.1109/eit.2019.8833693.
[40] Jin, Yu, and Zhongqin Bi. “Power Load Curve Clustering Algorithm Using Fast Dynamic Time
Warping and Affinity Propagation.” 2018 5th International Conference on Systems and Informatics
(ICSAI), 2018, doi:10.1109/icsai.2018.8599336.