Clustering Internet users based on their behavior towards banner ads
Despina Stamkou
14 Feb 2011
Agenda
Introduction
Theoretical Background
Method
Results
Analysis
Conclusions
Future Work
Introduction :: Background
Marketing is an exchange process of values between companies and customers (Kotler, Armstrong, Wong and Saunders, 2010)
Online marketing: 2nd position in advertisement investment (Orbit Scripts, 2011)
Online advertisements are promoted through web sites (publishers)
The goal is to motivate internet users to click on the online advertisements
Users with similar profiles click on similar online advertisements (Giuffrida et al., 2011)
Users are more likely to click on personalised advertisements than on non-personalised ones (automatic optimisation)
Introduction :: Background
[Diagram: AdNetwork with Advertisements 1 to N, an automatic optimisation mechanism, and the advertisement placement on the publisher's site; steps numbered 1 to 7]
Introduction :: Background
Automatic optimisation mechanism: for personalised online advertisements
Publisher: web site
AdNetwork: company between publishers and clients
Advertisements: clients' advertisements
Introduction :: Problem Statement
Problem: AdNetworks need to develop an intelligent automatic optimisation logic
To keep a competitive position in the online marketing business
Goal: Evaluate well-known grouping algorithms
To use the best-performing one for the automatic optimisation logic
Purpose: To show that the performance advantage of the dominant algorithm is data-independent
Introduction :: Method & Material
Literature study: background knowledge on clustering; identify algorithms with significant clustering performance
Empirical part: compare the identified algorithms
Introduction :: Significance
Automatic optimisation can increase the revenue of an AdNetwork
The thesis topic is part of the automatic optimisation project at Tradedoubler and uses data from that AdNetwork
Each AdNetwork has different data but can benefit from the conclusions
The conclusions will reinforce the data-independence of the dominant clustering algorithm
Introduction :: Limitations
Only two clustering algorithms are examined
The number of clusters is predefined
The data set has a specific dimensionality and is not publicly available
The data set represents a snapshot of users' behaviour over a specific period
Theoretical Background :: Classification vs Clustering
Data mining is the process of discovering knowledge from data sources (Liu, 2006)
Supervised classification (classification): we know the class labels and the number of classes
1. dark blue
2. light green
3. dark orange
…
n. pink
Unsupervised classification (clustering): we do not know the class labels and may not know the number of classes
1. ???
2. ???
3. ???
…
?. ???
Groups users with similar characteristics: opportunity to predict future actions
Groups users with exactly the same characteristics: impossible to predict future actions
Theoretical Background :: Selecting the clustering method
Clustering: exclusive or non-exclusive
Exclusive: a data object belongs to only one cluster
Non-exclusive: a data object belongs to one or more clusters
Clustering methods: hierarchical or partitional
Hierarchical: divisive or agglomerative
Theoretical Background :: Related Research
The most recent related studies (2011) were selected for examination
These studies compared the clustering performance of the best-performing algorithms from earlier related studies
The K-means algorithm was used as a baseline
The algorithms were examined with a predefined number of clusters
Performance was measured with a fitness function
Theoretical Background :: Selecting the algorithms
Particle Swarm Optimisation (PSO) & K-means
K-means as a baseline
PSO because it outperformed the other clustering algorithms in the related studies
Few studies on PSO exist
Interesting to evaluate PSO's performance on the data set available from Tradedoubler and to reinforce its data-independence
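For reference, the K-means baseline can be sketched as Lloyd's algorithm. This is a minimal illustration in Python, not the Perl program used in the study; the function name `kmeans`, the random initialisation from the data, the Euclidean distance and the toy vectors are all assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd's algorithm: returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct data vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_assign = [min(range(k), key=lambda j: euclidean(centroids[j], v))
                      for v in vectors]
        if new_assign == assign:
            break  # no vector changed cluster, so the centroids are stable
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


# Hypothetical one-dimensional toy data with two obvious groups.
toy = [(0.0,), (1.0,), (2.0,), (100.0,), (101.0,), (102.0,)]
centroids, assign = kmeans(toy, 2)
print(sorted(centroids), assign)
```

The number of clusters `k` is passed in, which matches the predefined-K setting of the study.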
Method :: Data Selection
The data set consists of real transactions within Tradedoubler's AdNetwork
254,046 rows
Sampling by time period: 1 month
Information columns:
PROGRAM_ID: ID of the campaign to which the banner belongs
WEBSITE_ID: ID of the website from which the action was generated
BANNER_ID: ID of the banner with which the user interacted
EVENT_ID: ID of the event (click or sale)
USER_AGENT: visitor's web browser agent and operating system
TIMESTAMP: time the transaction was made
Column groups: advertisement campaign info; Internet user info
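A minimal sketch of how one row with the six columns above could be represented in Python. The field types, the timestamp format, the `parse_row` helper and the sample values are illustrative assumptions; the real schema is not public.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """One row of the data set; field types are assumed for illustration."""
    program_id: int      # campaign the banner belongs to
    website_id: int      # website where the action was generated
    banner_id: int       # banner the user interacted with
    event_id: str        # "click" or "sale"
    user_agent: str      # browser agent and operating system
    timestamp: datetime  # time of the transaction


def parse_row(fields):
    """Build a Transaction from six string fields in the column order above."""
    program_id, website_id, banner_id, event_id, user_agent, ts = fields
    return Transaction(int(program_id), int(website_id), int(banner_id),
                       event_id.lower(), user_agent,
                       # Assumed timestamp layout: YYYY-MM-DD HH:MM:SS
                       datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))


# Hypothetical sample row, not taken from the real data set.
row = parse_row(["12", "345", "6789", "Click", "Mozilla/5.0 (X11; Linux)",
                 "2010-11-05 14:30:00"])
print(row.event_id, row.timestamp.hour)
```

Before clustering, such records would still have to be encoded as numeric vectors; how that was done in the study is not stated here.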
Method :: Evaluation Criteria
Clustering evaluation is a complex and difficult problem (Liu, 2006)
Types of evaluation:
External: with readable and meaningful data, without numbers
Indirect: with an external application that tests the results
Internal: with a distance comparison function
Method :: Fitness Function
The fitness function used is the sum, over all clusters, of the maximum distance between the cluster's centroid and a data object in that cluster
The smaller the sum, the better the clustering algorithm performs
[Figure: hypothetical representation of clusters and vectors in two dimensions; the marked distance is the maximum distance between a centroid and a data vector]
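The fitness function described above can be sketched in Python as follows. The Euclidean distance, the nearest-centroid assignment and the names `fitness` and `nearest` are assumptions made for illustration; an empty cluster is assumed to contribute zero.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids, vector):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)),
               key=lambda j: euclidean(centroids[j], vector))


def fitness(centroids, vectors):
    """Sum over clusters of the largest centroid-to-member distance."""
    max_dist = [0.0] * len(centroids)
    for v in vectors:
        j = nearest(centroids, v)
        max_dist[j] = max(max_dist[j], euclidean(centroids[j], v))
    return sum(max_dist)


# Hypothetical two-dimensional example with two clusters.
vectors = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (13.0, 4.0)]
centroids = [(0.0, 1.0), (10.0, 0.0)]
print(fitness(centroids, vectors))  # 1.0 (first cluster) + 5.0 (second) = 6.0
```

A lower value means that even the farthest member of each cluster lies close to its centroid.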
Method :: Alternative Fitness Function
Sum of the average distances between each centroid and its data vectors
Sum of the minimum distances between data objects that belong to different clusters
The fitness function selected for this study was preferred over these alternatives because related research has used it for the same purpose and with the same algorithms
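The two alternative measures could look like this in Python. This is a sketch under assumptions: Euclidean distance, clusters given as lists of member vectors, and the second measure read as a sum over pairs of clusters; the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sum_of_average_distances(centroids, clusters):
    """Sum over clusters of the mean centroid-to-member distance."""
    total = 0.0
    for c, members in zip(centroids, clusters):
        if members:  # an empty cluster contributes nothing
            total += sum(euclidean(c, m) for m in members) / len(members)
    return total


def sum_of_min_separations(clusters):
    """Sum over cluster pairs of the smallest distance between their members."""
    total = 0.0
    for a, b in combinations(clusters, 2):
        if a and b:
            total += min(euclidean(x, y) for x in a for y in b)
    return total


# Hypothetical example: two clusters in two dimensions.
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (13.0, 4.0)]]
centroids = [(0.0, 1.0), (10.0, 0.0)]
print(sum_of_average_distances(centroids, clusters))
print(sum_of_min_separations(clusters))
```

The first measure rewards compact clusters, so smaller is better; the second measures separation between clusters, so a larger value would normally be preferred.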
Results :: Methods, Tools and Time
Programs developed in Perl and parameterised for the multidimensional data set
Both algorithms ran for 10 different values of K: 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50
Operating system: Linux Ubuntu
Hardware: 3 GB RAM, Intel Core Duo processor at 2.26 GHz
Execution time ratio between the algorithms was approximately 1:4; K-means ran for 1.5 hours in total and PSO for 7 hours
Results :: Performance Chart
Analysis :: Performance Comparison
PSO >> K-means. Why?
Both algorithms calculate the next position of the clusters and keep moving them within the search space until their position no longer changes, but…
…PSO evaluates each next position in the space using an internal fitness method
…This method remembers the previous fitness value of each cluster and compares it with the fitness of the new position
…A decision is then made whether to keep the new position or return the cluster to the previous one
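The memory-and-compare step described above resembles the personal-best memory of standard PSO. The sketch below is a basic global-best PSO for clustering in Python, not the Perl implementation used in the study. In this textbook variant a particle is not moved back to its previous position; instead its remembered best position pulls it on later moves. The particle count, the iteration count and the constants `w`, `c1`, `c2` are commonly used defaults and are assumptions here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fitness(centroids, vectors):
    """Sum over clusters of the largest centroid-to-member distance."""
    max_dist = [0.0] * len(centroids)
    for v in vectors:
        dists = [euclidean(c, v) for c in centroids]
        j = dists.index(min(dists))
        max_dist[j] = max(max_dist[j], dists[j])
    return sum(max_dist)


def pso_cluster(vectors, k, particles=10, iterations=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Return (best_centroids, best_fitness) found by a basic gbest PSO."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Each particle is a candidate set of k centroids, seeded from the data.
    pos = [[list(rng.choice(vectors)) for _ in range(k)]
           for _ in range(particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(particles)]
    best_pos = [[c[:] for c in p] for p in pos]   # personal-best memory
    best_fit = [fitness(p, vectors) for p in pos]
    g = best_fit.index(min(best_fit))
    g_pos, g_fit = [c[:] for c in best_pos[g]], best_fit[g]

    for _ in range(iterations):
        for i in range(particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (best_pos[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (g_pos[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            f = fitness(pos[i], vectors)
            # Compare the new position with the remembered best one.
            if f < best_fit[i]:
                best_fit[i], best_pos[i] = f, [c[:] for c in pos[i]]
                if f < g_fit:
                    g_fit, g_pos = f, [c[:] for c in pos[i]]
    return g_pos, g_fit


# Hypothetical toy data: two well-separated groups in two dimensions.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
        (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
print(pso_cluster(data, 2))
```

Because every particle evaluates the fitness of a full set of centroids on each move, the extra cost compared with K-means is consistent with the longer running time reported for PSO.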
Analysis :: Similarity Evaluation
A basic external evaluation of a small sample of data vectors traced similarities between them, supporting the claim that homogeneous users were grouped within the same clusters
Although external evaluation is not used as an argument for the final conclusions, it gives confidence that the clustering algorithms were developed properly
Analysis :: Limitations
The fitness function is the main evaluation method; combining it with indirect evaluation would give more accurate conclusions
Fitness was measured for a predefined set of cluster counts; hypothetically, PSO would continue to perform well for higher values of K
Yet the experiments do not prove this
The basic external evaluation should not be taken as a criterion for the performance of the algorithms; rather, it makes it more likely that the algorithms were implemented correctly
Conclusions
The experiments reinforce the superiority of PSO in terms of performance regardless of the nature and dimensionality of the data
Important fact: the data come from real-life transactions
Indication that the higher the number of clusters, the better the resulting fitness for PSO; this also implies additional processing effort and memory use
The best number of clusters can be chosen based on processing time and fitness
Future Work
Compare different PSO hybrids without a predefined number of clusters
Develop the personalised mechanism to propose relevant advertisements
Users' actions will define the performance: indirect method of evaluation
Inside a cluster:
Subgroup 1 has seen Advertisement A: show Advertisement B
Subgroup 2 has seen Advertisement B: show Advertisement A
Subgroup 3 has seen Advertisement A and Advertisement B: show an advertisement from a neighbour cluster
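The proposed selection rule inside a cluster could be sketched as below. This is a hypothetical Python helper, not an existing mechanism; the function name, the ordering of candidate advertisements and the sample advertisement labels are assumptions.

```python
def next_advertisement(seen, cluster_ads, neighbour_ads):
    """Pick the next ad for a user, given the set of ads already seen."""
    for ad in cluster_ads:
        if ad not in seen:
            return ad          # an unseen ad from the user's own cluster
    for ad in neighbour_ads:
        if ad not in seen:
            return ad          # cluster exhausted: borrow from the neighbour
    return None                # nothing new to show


# Hypothetical cluster with ads A and B and a neighbour cluster with ad C.
cluster_ads, neighbour_ads = ["A", "B"], ["C"]
print(next_advertisement({"A"}, cluster_ads, neighbour_ads))       # subgroup 1
print(next_advertisement({"B"}, cluster_ads, neighbour_ads))       # subgroup 2
print(next_advertisement({"A", "B"}, cluster_ads, neighbour_ads))  # subgroup 3
```

Whether users click on the advertisement chosen this way is what the indirect evaluation would measure.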
Questions / Comments
Thank you!
References
Kotler, P., Armstrong, G., Wong, V. and Saunders, J., 2010. Principles of Marketing, 5th edition. New Jersey: Pearson Education, p.7
Giuffrida, G., Reforgiato, D., Tribulato, G. and Zarba, C., 2011. A Banner Recommendation System Based on Web Navigation History. Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium, Paris
Liu, B., 2006. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Chicago: Springer, p.6