Application of Decision-Tree Induction Techniques to
Personalized Advertisements on Internet Storefronts
Paper By:
Jong Woo Kim, Byung Hun Lee, Michael J. Shaw, Hsin-Lu Chang, Matthew Nelson
Source:
International Journal of Electronic Commerce, Spring 2001, Vol. 5, No. 3, pp. 45-62. Copyright 2001 M.E. Sharpe, Inc. All rights reserved.
Source: International Journal of Electronic Commerce 5 no. 3, 45-62, Spring 2001
Paper Authors:
JONG WOO KIM is an assistant professor in the department of statistics at Chungnam National University in Taejon, Korea.
BYUNG HUN LEE is a researcher in marketing at MPC LTD. in Korea.
MICHAEL J. SHAW is director of the Center for Information Systems and Technology Management at the University of Illinois at Urbana-Champaign.
HSIN-LU CHANG is a Ph.D. student in information systems at the School of Commerce, University of Illinois at Urbana-Champaign.
MATTHEW NELSON currently is a second-year Ph.D. student in information systems at the University of Illinois.
Reference for the paper: http://an5qy7ag4q.scholar.serialssolutions.com/?sid=google&auinit=JW&aulast=Kim&atitle=Application+of+Decision-Tree+Induction+Techniques+to+Personalized+Advertisements+on+Internet+Storefronts&title=International+journal+of+electronic+commerce&volume=5&issue=3&date=2001&spage=45&issn=1086-4415
FOR MORE INFO...
STONY BROOK UNIVERSITY LIBRARIES
Introduction
This paper studies personalized recommendation techniques that suggest products or services to the customers of Internet storefronts based on their demographics or past purchasing behavior.
Currently available techniques
Currently available recommendation techniques are primarily based on:
Collaborative filtering
Preference scoring
Rule-based approaches
Currently available techniques (cont.)
Collaborative filtering selects advertisements for customers based on the opinions of other customers with similar past preferences.
The preference-scoring approach to personalized recommendation uses a preference-score concept to select personalized advertisements based on the initial customer profile, purchase history, and behavior in Internet stores.
In the rule-based approach, marketing rules from marketing experts are a core component in providing personalized advertisements.
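Collaborative filtering is one of the existing techniques listed here, not the paper's proposal. For reference only, here is a minimal user-based sketch. It assumes that explicit interest ratings of ads are stored as nested dicts and that customers are compared by cosine similarity. The data layout, the function names, and the similarity measure are illustrative assumptions, not the method of any particular system.

```python
from math import sqrt

# Assumed layout: ratings[customer][ad_id] -> interest score (e.g. 1-5).
# Ads a customer has not rated are simply absent.
Ratings = dict[str, dict[str, float]]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(target: str, ratings: Ratings, k: int = 2) -> list[str]:
    """Rank ads the target has not rated by the similarity-weighted scores
    of the k customers whose past ratings are most similar to the target's."""
    mine = ratings[target]
    similarity = {c: cosine(mine, r) for c, r in ratings.items() if c != target}
    neighbours = sorted(similarity, key=similarity.get, reverse=True)[:k]
    scores: dict[str, float] = {}
    for c in neighbours:
        for ad, score in ratings[c].items():
            if ad not in mine:
                scores[ad] = scores.get(ad, 0.0) + similarity[c] * score
    return sorted(scores, key=scores.get, reverse=True)
```

For example, take ratings = {"alice": {"pop_ad": 5, "rap_ad": 4}, "bob": {"pop_ad": 5, "rap_ad": 4, "dance_ad": 5}, "carol": {"symphony_ad": 5, "concert_ad": 4}}. Then recommend("alice", ratings, k=1) returns ["dance_ad"]: bob is alice's most similar customer, and dance_ad is the only ad he rated that she has not.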
Rule-based approach
Based on the marketing rule that undergraduates in Indiana whose income is greater than $80,000 prefer to own luxury cars, car advertisements are displayed whenever customers who match this rule (undergraduates living in Indiana with income above $80,000) access the Internet storefront.
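To make the rule-based approach concrete, here is a minimal sketch of an expert-written rule base. It assumes that visitor profiles arrive as dicts. The attribute names state, occupation, and income, and the rule name, are hypothetical encodings of the example rule; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable

Profile = dict[str, object]


@dataclass
class MarketingRule:
    """An expert-supplied rule: IF condition(profile) THEN advertise ad_category."""
    name: str
    condition: Callable[[Profile], bool]
    ad_category: str


# Hypothetical encoding of the example rule; attribute names are assumptions.
RULES = [
    MarketingRule(
        name="indiana-undergrad-luxury-car",
        condition=lambda p: (
            p.get("state") == "Indiana"
            and p.get("occupation") == "undergraduate"
            and p.get("income", 0) > 80_000
        ),
        ad_category="luxury cars",
    ),
]


def select_ad_categories(profile: Profile, rules: list[MarketingRule]) -> list[str]:
    """Return the ad categories of every rule whose condition holds for this visitor."""
    return [rule.ad_category for rule in rules if rule.condition(profile)]
```

An Indiana undergraduate with an income of 95,000 gets ["luxury cars"]. The same visitor with an income of 40,000 gets an empty list. Every such rule has to be written and maintained by hand, which is the weakness the paper sets out to address.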
Rule-based approach
Issues in the rule-based approach
The marketing rules for personalized recommendation usually come from marketing experts in that domain.
The acquisition of marketing rules, however, is unsystematic, time-consuming, and dependent on the intuition of marketing experts.
The marketing rules in a knowledge base need to be continuously updated and changed, but manual change management is a difficult task.
Proposed Solution
To overcome the defects of the rule-based approach in personalized advertisements, the authors propose decision-tree induction techniques that extract marketing rules from databases in Internet stores.
Rule-based personalization has two phases: marketing-rule extraction and real-time advertisement selection.
Proposed Solution
In order to obtain valuable marketing rules, a decision-tree induction technique is used to analyze purchase-transaction histories, customer profiles, and product information.
The extracted marketing rules are stored in a marketing-rule base and are used for real-time personalized-advertisement selection when customers visit the Internet store.
Proposed Solution Schematic
Marketing Rule Extraction
The extraction of useful marketing rules using decision-tree induction techniques begins by defining a hierarchy tree of product categories, so that marketing rules can be extracted at various abstraction levels of product categories.
[Figure: example product-category hierarchy tree; node labels include MP3 Files, Sporting goods, Pop Music, Classical Music, Symphony, Concert, Rap, Dance, Metal, Shakira, Britany, Kim, Rolla]
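A product-category hierarchy like the one in the figure can be held as a simple tree. The exact parent-child links in the original figure are not recoverable, so the arrangement below is an assumption for illustration: MP3 Files sits over Pop Music and Classical Music, with genre leaves beneath them. The artificial root "All products" is also hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    name: str
    children: list["Category"] = field(default_factory=list)


def levels(node: Category, depth: int = 0, out: Optional[dict[str, int]] = None) -> dict[str, int]:
    """Map each category name to its level; the artificial root is level 0."""
    if out is None:
        out = {}
    out[node.name] = depth
    for child in node.children:
        levels(child, depth + 1, out)
    return out


def leaves(node: Category) -> list[str]:
    """Leaf categories (the most specific level), in left-to-right order."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


# Assumed arrangement of part of the figure's nodes (not the original layout).
HIERARCHY = Category("All products", [
    Category("MP3 Files", [
        Category("Pop Music", [Category("Rap"), Category("Dance"), Category("Metal")]),
        Category("Classical Music", [Category("Symphony"), Category("Concert")]),
    ]),
    Category("Sporting goods"),
])
```

Under this assumed tree, levels(HIERARCHY) assigns MP3 Files level 1 and Rap level 3. The hierarchy depth L used in advertisement selection would therefore be max(levels(HIERARCHY).values()), which is 3.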
Marketing Rule Extraction
The rule-extraction phase has four steps:
(1) selecting learning data,
(2) generating target variables,
(3) constructing a decision tree, and
(4) selecting a marketing rule.
Marketing Rule Ext. Phase 1, Step 1
Training data sets and test data sets are selected from customer records, making it possible to begin with a data set of manageable size. The size of a data set is determined by such factors as the number of customer-profile records, the volume of disk storage, and the computing capacity of the inductive learning tools.
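Step 1 says only that training and test sets are selected from customer records. Below is a minimal sketch of a reproducible random split. It assumes the records fit in memory as a list. The function name and the rounding of the cut point are illustrative choices, not the authors'.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_records(records: Sequence[T], train_fraction: float, seed: int = 0) -> tuple[list[T], list[T]]:
    """Shuffle a copy of the records reproducibly and cut it into (train, test)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)       # fixed seed -> same split every run
    cut = round(len(shuffled) * train_fraction)  # rounding choice is an assumption
    return shuffled[:cut], shuffled[cut:]
```

Applied to the 330 respondents of the experiment, split_records(respondents, 0.6) yields 198 and 132 records. A second call with 0.7 on the 198 yields 139 and 59 under this rounding. (Exactly 70% of 198 is 138.6; the slides do not say how it was rounded.)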
Marketing Rule Ext. Phase 1, Step 2
In the next step, values of target variables are generated for the selected data records. In the present approach, the target variables indicate whether or not a customer prefers a specific product category in the hierarchy tree. Since the target variables do not exist in the databases, they must be generated. This is done using the purchase-transaction database.
Marketing Rule Ext. Phase 1
For example, customer-profile records can have the following attributes:
CUSTOMER =
After the target variables are generated, the CUSTOMER relation is extended as follows.
CUSTOMER' =
where t_MP3_file, etc., are binary variables that specify a customer's preference for specific product categories. That is, a binary variable is set to 1 when the customer prefers the corresponding product category.
Marketing Rule Ext. Phase 1, Step 3
The third step is decision-tree construction. Using inductive learning tools and the training data set, decision trees are induced for all the target variables.
Step 4
After the decision trees are constructed, useful marketing rules are filtered from them based on the validation results from test data sets and the accuracy of the rules.
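The slides do not name the inductive learning tool used in Step 3. As a stand-in, here is a minimal ID3-style (information-gain) tree over categorical attributes; it is not the tool the authors used. Each leaf records its majority label, its accuracy (purity), and its support. This yields the root-to-leaf rules that Step 4 filters. The attribute names age and gender and the toy rows are hypothetical.

```python
from collections import Counter
from math import log2

Row = dict[str, str]


def entropy(labels: list[int]) -> float:
    """Shannon entropy (bits) of a list of 0/1 target values."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(rows: list[Row], labels: list[int], attr: str) -> float:
    """Entropy reduction obtained by splitting on a categorical attribute."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows: list[Row], labels: list[int], attrs: list[str], min_size: int = 2):
    """ID3-style induction. A leaf is ("leaf", label, accuracy, support);
    an internal node is ("split", attribute, {value: subtree})."""
    label, count = Counter(labels).most_common(1)[0]
    leaf = ("leaf", label, count / len(labels), len(labels))
    if count == len(labels) or not attrs or len(rows) < min_size:
        return leaf  # pure node, no attributes left, or too few rows
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    if info_gain(rows, labels, best) <= 0.0:
        return leaf
    branches = {}
    for value in sorted({r[best] for r in rows}):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best], min_size,
        )
    return ("split", best, branches)


def extract_rules(tree, conditions=()):
    """Turn each root-to-leaf path into (conditions, label, accuracy, support)."""
    if tree[0] == "leaf":
        _, label, accuracy, support = tree
        return [(list(conditions), label, accuracy, support)]
    _, attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attr, value),)))
    return rules
```

Suppose six toy customers have a target such as t_Ballad = 1 only for the two young male rows. The tree splits on age and then on gender. One extracted rule reads [("age", "<30"), ("gender", "male")] -> 1, with accuracy 1.0 and support 2.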
Marketing Rule Ext. Phase 1
TARGET-VARIABLE GENERATION
There are several possible ways to generate target variables based on the purchase-transaction database:
(1) the counting-based method,
(2) the expected-value-based method,
(3) the statistics-based method, and
(4) the subcategory-based method.
Counting-based method
The counting-based method uses the number of purchases in a specific product category to decide whether or not the customer prefers that product category. That is,
$$PRE_{ij} = \begin{cases} 1 & \text{if } NP_{ij} \ge \Omega \\ 0 & \text{otherwise} \end{cases}$$
where $NP_{ij}$ is the number of purchases of customer $i$ in product category $j$, and $\Omega$ is a minimum threshold determined by analysts.
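Here is a direct transcription of the counting-based rule. It assumes $NP_{ij}$ is available as nested dicts (customer -> category -> purchase count). The function name and the data layout are illustrative.

```python
def counting_targets(purchases: dict[str, dict[str, int]], omega: int) -> dict[str, dict[str, int]]:
    """PRE[i][j] = 1 if NP[i][j] >= omega, else 0.

    purchases[i][j] is NP_ij, the number of purchases of customer i in
    product category j; omega is the analyst-chosen minimum threshold."""
    return {
        customer: {category: int(n >= omega) for category, n in counts.items()}
        for customer, counts in purchases.items()
    }
```

With $\Omega = 2$, a customer who bought 3 Rap files and 1 Symphony file gets PRE = 1 for Rap and 0 for Symphony. Categories missing from the input dict get no entry, so callers should include zero counts explicitly if they need a full matrix.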
Expected-value-based method
$N_j$ is the number of products in product category $j$.
$\alpha$ is a multiplier for the expected value; usually $\alpha \ge 1$. When $\alpha = 1$, customer $i$ is considered to prefer product category $j$ if customer $i$ purchased more items in category $j$ than the expected number of purchases in category $j$.
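The slide's own formula for this method did not survive extraction. One reading consistent with the description, offered here only as an assumption, defines the expected number of purchases of category $j$ by customer $i$ as $E_{ij} = NP_i \cdot N_j / N$. Here $NP_i$ is customer $i$'s total number of purchases and $N = \sum_j N_j$ is the total number of products. The rule then sets $PRE_{ij} = 1$ when $NP_{ij} > \alpha E_{ij}$. A sketch of that assumed rule for one customer:

```python
def expected_value_targets(
    purchases: dict[str, int],       # NP_ij for one customer i, keyed by category j
    category_sizes: dict[str, int],  # N_j: number of products in category j
    alpha: float = 1.0,
) -> dict[str, int]:
    """ASSUMED rule: PRE_ij = 1 if NP_ij > alpha * NP_i * N_j / N, else 0."""
    total_purchases = sum(purchases.values())       # NP_i
    total_products = sum(category_sizes.values())   # N (categories assumed disjoint)
    targets = {}
    for category, n_j in category_sizes.items():
        # Expected purchases if the customer bought products uniformly at random.
        expected = total_purchases * n_j / total_products
        targets[category] = int(purchases.get(category, 0) > alpha * expected)
    return targets
```

Consider categories of sizes Rap 4, Dance 4, and Symphony 8, and a customer with 3, 1, and 1 purchases in them. The expected counts are 1.25, 1.25, and 2.5, so with $\alpha = 1$ only Rap is marked as preferred. A larger $\alpha$ makes the test stricter; at $\alpha = 3$ Rap no longer qualifies.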
Statistics-based method
Statistical values such as the mean, the median, the first quartile, and the third quartile are used to generate target variables.
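The slide does not say how a statistic is turned into a 0/1 target. Here is a plausible sketch for one product category $j$. It assumes a customer is marked as preferring $j$ when their purchase count in $j$ is strictly above the chosen statistic of all customers' counts in $j$. The strict comparison and the function name are assumptions.

```python
import statistics


def statistics_targets(counts: dict[str, int], stat: str = "median") -> dict[str, int]:
    """For one category j, counts maps customer -> NP_ij. A customer gets 1 if
    their count is strictly above the chosen statistic of all customers' counts."""
    values = list(counts.values())
    if stat == "mean":
        cut = statistics.mean(values)
    elif stat == "median":
        cut = statistics.median(values)
    elif stat in ("q1", "q3"):
        # statistics.quantiles returns the three quartile cut points for n=4.
        q1, _, q3 = statistics.quantiles(values, n=4)
        cut = q1 if stat == "q1" else q3
    else:
        raise ValueError(f"unknown statistic: {stat}")
    return {customer: int(n > cut) for customer, n in counts.items()}
```

The choice of statistic controls how selective the target is. For counts 0, 1, 2, and 5, the median (1.5) marks two customers, while the mean (2) and the third quartile mark only the heaviest buyer.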
Subcategory-based method
For non-leaf-node product categories, the corresponding target variables can be generated from the subcategory target-variable values (when the target variables for the subcategories have already been generated using one of the above methods):
$$PRE_{ij} = \begin{cases} 1 & \text{if } PRE_{ik} = 1 \text{ for some subcategory } k \text{ of } j \\ 0 & \text{otherwise} \end{cases}$$
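The subcategory rule is a logical OR over child categories and can be applied bottom-up. Below is a minimal recursive sketch for one customer. It assumes the hierarchy is given as a parent -> children dict and that leaf targets were already produced by one of the other methods. The names and the example tree are hypothetical.

```python
def fill_targets(node: str, children: dict[str, list[str]],
                 known: dict[str, int], out: dict[str, int]) -> int:
    """Set out[node] to PRE for `node`: copied from `known` if already generated,
    otherwise 1 iff at least one subcategory's PRE is 1. Returns out[node]."""
    if node in known:
        out[node] = known[node]
    else:
        # Evaluate every child first so all descendants get filled in,
        # then apply the OR from the subcategory rule.
        child_values = [fill_targets(c, children, known, out) for c in children.get(node, [])]
        out[node] = int(any(child_values))
    return out[node]
```

Suppose a customer has PRE = 1 only for Rap, and Rap sits under Pop Music under MP3 Files. Then fill_targets("MP3 Files", ...) sets Pop Music and MP3 Files to 1 and Classical Music to 0.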
MARKETING-RULE SELECTION
A decision tree includes marketing rules that link customer demographics with preferences for product categories. Since marketing rules may have low predictability or may not be accurate, the valuable marketing rules have to be selected from the constructed decision trees.
Two-phase heuristics are used for marketing-rule selection:
1. Choose decision trees whose predictability (1 - misclassification rate) with respect to a training data set is greater than a certain threshold.
2. Select nodes from the filtered decision trees whose accuracy (i.e., purity) is greater than a certain threshold.
The selected nodes in the decision trees are translated into the rule structure. Here is an example of a translated rule:
If age < 30 and gender = male then Ballad, with accuracy = 0.9 and level = 3.
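The two-phase filter and the rule translation can be sketched as follows. The sketch assumes each constructed tree reports its misclassification rate, and each candidate node reports its conditions, predicted category, accuracy, and hierarchy level. The data classes and thresholds are illustrative, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    conditions: list[str]   # antecedent, e.g. ["age < 30", "gender = male"]
    category: str           # product category predicted as preferred
    accuracy: float         # purity of the node
    level: int              # level of the category in the hierarchy tree


@dataclass
class Tree:
    target: str
    misclassification_rate: float
    nodes: list[Node]       # candidate leaf nodes predicting "prefers"


def select_rules(trees: list[Tree], min_predictability: float, min_accuracy: float) -> list[str]:
    """Phase 1 keeps trees with predictability > threshold; phase 2 keeps nodes
    with accuracy > threshold and renders them in the slide's rule format."""
    rules = []
    for tree in trees:
        if 1.0 - tree.misclassification_rate <= min_predictability:
            continue  # phase 1: tree not predictive enough
        for node in tree.nodes:
            if node.accuracy > min_accuracy:  # phase 2: node pure enough
                rules.append(
                    f"If {' and '.join(node.conditions)} then {node.category}, "
                    f"with accuracy = {node.accuracy:g} and level = {node.level}."
                )
    return rules
```

Using the experiment's 65 percent thresholds, a Ballad tree with a 20% misclassification rate survives phase 1, while a tree at 50% is dropped entirely. Within the surviving tree, only nodes purer than 0.65 become rules.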
ADVERTISEMENT SELECTION
Personalized advertisements are selected on a real-time basis.
Let M be the number of advertisements to be displayed in an Internet storefront, and L the depth of the product-category hierarchy tree.
The advertisement-selection algorithm is as follows.
STEP 1 m := 0, l := L; generate the fact set using the specific customer's profile.
Advertisement Selection Algorithm (continued)
STEP 2 While m < M and l > 0:
2.1 Fire the marketing rules of level l against the customer's facts.
2.2 If there are matching rules, fire the matched rules in order of accuracy. The product categories that appear in the consequent parts of the rules are selected as preferred product categories for the specific customer. Choose advertisements for those categories from the advertisement database; the number of newly selected advertisements must not exceed M - m.
2.3 If m = M, exit.
2.4 Update m to reflect the additionally selected advertisements, and update l to l - 1.
Advertisement Selection Algorithm (continued)
STEP 3
If m < M, randomly select M - m advertisements from the advertisement database.
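Here is a minimal Python sketch of STEPs 1-3. It assumes that rule antecedents are sets of profile facts (strings such as "age<30") and that the advertisement database is a dict from category to ad IDs. It also assumes the caller supplies a random.Random for STEP 3. All names are hypothetical. In 2.2, the number of ads taken is capped so that it never exceeds M - m.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset[str]  # facts that must all hold, e.g. {"age<30", "gender=male"}
    category: str               # consequent: preferred product category
    accuracy: float
    level: int                  # level of the category in the hierarchy tree


def select_ads(facts: set[str], rules: list[Rule], ads_by_category: dict[str, list[str]],
               M: int, L: int, rng: random.Random) -> list[str]:
    """Pick up to M ads, trying the most specific rule level first (l = L, L-1, ..., 1)."""
    selected: list[str] = []    # STEP 1: m := 0 (m is len(selected))
    level = L                   #         l := L
    while len(selected) < M and level > 0:                           # STEP 2
        fired = [r for r in rules
                 if r.level == level and r.antecedent <= facts]      # 2.1 match facts
        for rule in sorted(fired, key=lambda r: r.accuracy, reverse=True):  # 2.2
            for ad in ads_by_category.get(rule.category, []):
                if len(selected) == M:  # never take more than M - m new ads
                    break
                if ad not in selected:
                    selected.append(ad)
        level -= 1              # 2.3 is the while condition; 2.4: l := l - 1
    if len(selected) < M:       # STEP 3: fill the remaining slots at random
        pool = [ad for ads in ads_by_category.values()
                for ad in ads if ad not in selected]
        selected += rng.sample(pool, min(M - len(selected), len(pool)))
    return selected
```

Consider a young male visitor, a level-3 Ballad rule, and a level-2 Pop Music rule. With M = 3, the most specific Ballad ad comes first and the two Pop ads fill the remaining slots. With M = 5, STEP 3 tops up with random ads from whatever is left in the database.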
EXPERIMENT
EXPERIMENTAL DATA COLLECTION
Data from 330 respondents were gathered. First, the respondents were asked to select preferred product categories in MP3 music files and in sporting goods or leisure equipment. Then they were asked to choose five MP3 music files among 32 displayed products, which were selected equally from all MP3 leaf nodes. The respondents were also asked to purchase five sporting goods or leisure equipment items among 24 displayed products, which were selected equally from all sporting goods or leisure equipment leaf nodes (three products from each of the eight leaf nodes). On the next two pages, respondents were asked to rank MP3 music files and sporting goods/leisure equipment items on a five-point personal-interest scale (1 = low interest to 5 = high interest). Sixteen advertisements for MP3 music files, one item selected from each leaf node, were displayed so that respondents could rate their interest in the advertisements. In addition, 16 advertisements for sporting goods or leisure equipment items, two items selected from each leaf node, were displayed and rated in the same way. Finally, the respondents were asked to specify their profiles, including age, gender, job, and so on.
MARKETING-RULE EXTRACTION
The 330 respondents were divided into two data sets, one for marketing-rule generation and the other for testing the effectiveness of the proposed approach. For marketing-rule generation, 198 responses (60% of the total data set) were used. Of the 198 data items, 70 percent were used to construct decision trees and 30 percent to validate the constructed decision trees. For target-variable generation, the expected-value-based approach with $\alpha = 1$ was used. All told, 37 decision trees were constructed (26 for MP3 music files and 11 for sporting goods or leisure equipment).
Decision trees with predictability greater than 65 percent were selected in the marketing-rule selection step. Decision rules with accuracy greater than 65 percent were selected as valuable marketing rules.
EFFECTIVENESS TEST
To test the effectiveness of the proposed decision-tree induction approach, the results were compared to the preference-scoring approach and to random selection. The data of 132 respondents (40% of the total data set) were isolated for effectiveness testing. Three advertisements were selected for each respondent, one using each of the three approaches. In the case of the decision-tree induction approach, the marketing rules generated earlier were used to select personalized advertisements. In the case of the preference-scoring approach, advertisements were selected based on the respondents' own interest-manifestation behavior (purchase history, profile, preferred product categories). In the case of random selection, advertisements were selected on a simple random basis. The means of the five-point-scale personal-interest scores for the selected advertisements were compared statistically.
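The slides say only that the means "were compared statistically" and do not name the test. A one-way ANOVA F statistic is one common way to compare three group means, but its use here is an assumption. The sketch computes it with the standard library; the group names and any scores fed to it are illustrative, not the study's data.

```python
from statistics import mean


def one_way_anova_f(groups: dict[str, list[float]]) -> float:
    """F = (between-group mean square) / (within-group mean square)."""
    scores = [x for g in groups.values() for x in g]
    grand_mean = mean(scores)
    k, n = len(groups), len(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Given a dict such as {"decision tree": [...], "preference scoring": [...], "random": [...]} of five-point scores, the function returns F with k - 1 and n - k degrees of freedom. A p-value would then come from the F distribution, for example via scipy.stats.f.sf(F, k - 1, n - k) if SciPy is available.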
RESULT
Conclusion
Statistically, there were differences among the five-point-scale mean scores of customers' interest in advertisements from the three different approaches: decision-tree induction, preference scoring, and random selection. Thus, the effectiveness of personalized-recommendation techniques is affected by the decision algorithm used in the personalized-advertisement selection process.
The proposed decision-tree induction approach is effective in generating marketing rules as a means to assist rule-based personalized recommendations.
In the case of sporting goods or leisure equipment, decision-tree induction gave the best results, but in the case of MP3 music files, preference scoring gave the best results. This indicates that the appropriateness of personalized-advertisement techniques depends on the characteristics of product categories. Thus it is necessary to study how various personalized-advertisement techniques can be used cooperatively.
Thank You
Next Presentation 5/2/2006
Topic: New Advances in Data Mining
By:
Group 14
Madhavarapu, Chidroop
Sandhuria, Deepanshu
DON'T FORGET TO COME TO CLASS!