
2__dt

Date post: 06-Apr-2018
Upload: gurdeep-singh
    Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts

    Paper by: Jong Woo Kim, Byung Hun Lee, Michael J. Shaw, Hsin-Lu Chang, Matthew Nelson

    Source: International Journal of Electronic Commerce, Spring 2001, Vol. 5, No. 3, pp. 45-62. Copyright 2001 M.E. Sharpe, Inc. All rights reserved.


    Paper Authors:

    JONG WOO KIM is an assistant professor in the department of statistics at Chungnam National University in Taejon, Korea.

    BYUNG HUN LEE is a researcher in marketing at MPC LTD. in Korea.

    MICHAEL J. SHAW is director of the Center for Information Systems and Technology Management at the University of Illinois at Urbana-Champaign.

    HSIN-LU CHANG is a Ph.D. student in information systems at the School of Commerce, University of Illinois at Urbana-Champaign.

    MATTHEW NELSON is currently a second-year Ph.D. student in information systems at the University of Illinois.




    Reference for the paper: http://an5qy7ag4q.scholar.serialssolutions.com/?sid=google&auinit=JW&aulast=Kim&atitle=Application+of+Decision-Tree+Induction+Techniques+to+Personalized+Advertisements+on+Internet+Storefronts&title=International+journal+of+electronic+commerce&volume=5&issue=3&date=2001&spage=45&issn=1086-4415

    FOR MORE INFO...

    STONY BROOK UNIVERSITY LIBRARIES


    Introduction

    This paper studies personalized recommendation techniques that suggest products or services to the customers of Internet storefronts based on their demographics or past purchasing behavior.


    Currently available techniques

    Currently available recommendation techniques are primarily based on:

    Collaborative filtering

    Preference scoring

    Rule-based approaches


    Currently available techniques (cont.)

    Collaborative filtering selects advertisements for customers based on the opinions of other customers with similar past preferences.

    The preference-scoring approach to personalized recommendation uses a preference-score concept to select personalized advertisements based on the initial customer profile, purchase history, and behavior in Internet stores.

    In the rule-based approach, marketing rules from marketing experts are a core component in providing personalized advertisements.


    Rule-based approach

    Based on the marketing rule that undergraduates in Indiana whose income is greater than $80,000 prefer to own luxury cars, car advertisements are displayed whenever customers who live in Indiana access the Internet storefront.
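    A minimal sketch of how such an expert-written rule might be encoded and matched against a customer profile. The `MarketingRule` class, the `matching_ads` helper, and the profile field names (`state`, `education`, `income`) are hypothetical and chosen only for illustration; the sketch tests every attribute named in the rule.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MarketingRule:
    description: str
    condition: Callable[[dict], bool]  # predicate over a customer profile
    ad_category: str                   # advertisement category to display


# Hypothetical encoding of the slide's example rule (field names are assumptions).
luxury_car_rule = MarketingRule(
    description="Indiana undergraduates with income > $80,000 prefer luxury cars",
    condition=lambda c: (
        c["state"] == "Indiana"
        and c["education"] == "undergraduate"
        and c["income"] > 80000
    ),
    ad_category="luxury cars",
)


def matching_ads(customer: dict, rules: list) -> list:
    """Return the ad categories of every rule whose condition holds."""
    return [rule.ad_category for rule in rules if rule.condition(customer)]


customer = {"state": "Indiana", "education": "undergraduate", "income": 90000}
print(matching_ads(customer, [luxury_car_rule]))
```

    Every rule in this style has to be written and maintained by hand, which is the weakness the following slides discuss.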


    Issues in the rule-based approach

    The marketing rules for personalized recommendation usually come from marketing experts in that domain.

    The acquisition of marketing rules, however, is unsystematic, time-consuming, and dependent on the intuition of marketing experts.

    The marketing rules in a knowledge base need to be continuously updated and changed, but manual change management is a difficult task.


    Proposed Solution

    To overcome the defects of the rule-based approach in personalized advertisements, the authors propose decision-tree induction techniques that extract marketing rules from databases in Internet stores.

    Rule-based personalization has two phases: marketing-rule extraction, and real-time advertisement selection.


    Proposed Solution

    In order to obtain valuable marketing rules, a decision-tree induction technique is used to analyze purchase-transaction histories, customer profiles, and product information.

    The extracted marketing rules are stored in a marketing-rule base and are used for real-time personalized-advertisement selection when customers visit the Internet store.


    [Figure: Proposed Solution Schematic]


    Marketing Rule Extraction

    The extraction of useful marketing rules using decision-tree induction techniques begins by defining a hierarchy tree of product categories to extract marketing rules at various abstraction levels of product categories.

    [Figure: example product-category hierarchy tree. Node labels, from top level to bottom level: MP3 Files, Sporting goods; Pop Music, Classical Music; Symphony, Concert, Rap, Dance, Metal; Shakira, Britany, Kim, Rolla]


    Marketing Rule Extraction

    The rule-extraction phase has four steps:

    (1) selecting learning data,

    (2) generating target variables,

    (3) constructing a decision tree, and

    (4) selecting a marketing rule.


    Marketing Rule Ext. Phase 1

    Step 1

    Training data sets and test data sets are selected from customer records, making it possible to begin with a data set of manageable size. The size of a data set is determined by such factors as the number of customer-profile records, the volume of disk storage, and the computing capacity of the inductive learning tools.


    Marketing Rule Ext. Phase 1

    Step 2

    In the next step, values of target variables are generated for the selected data records. In the present approach, target variables are whether or not a customer prefers a specific product category in the hierarchy tree. Since the target variables do not exist in the databases, they must be generated. This is done using the purchase-transaction database.


    Marketing Rule Ext. Phase 1

    For example, customer profile records can have the following attributes:

    CUSTOMER =

    After the target variables are generated, the CUSTOMER relation is extended as follows.

    CUSTOMER' =

    where t_MP3_file, etc. are binary variables that specify a customer's preference for the specific product categories. That is, the binary variable is set to 1 when the customer prefers the corresponding product category.


    Marketing Rule Ext. Phase 1

    Step 3

    The third step is decision-tree construction. Using inductive learning tools and the training data set, decision trees are induced for all the target variables.

    Step 4

    After the decision trees are constructed, useful marketing rules are filtered from the constructed decision trees based on the validation results from test data sets and the accuracy of the rules.


    TARGET-VARIABLE GENERATION

    There are several possible ways to generate target variables based on the purchase-transaction database:

    (1) the counting-based method,

    (2) the expected-value-based method,

    (3) the statistics-based method, and

    (4) the subcategory-based method.


    Counting-based method

    The counting-based method decides whether or not a customer prefers a product category based on the number of purchases in that category. That is,

    $$PRE_{ij} = \begin{cases} 1 & \text{if } NP_{ij} \ge \Omega \\ 0 & \text{otherwise} \end{cases}$$

    where $NP_{ij}$ is the number of purchases of customer $i$ in product category $j$, and $\Omega$ is a minimum threshold, which is determined by analysts.
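    The counting rule above can be sketched in a few lines. The function name, the `(customer, category)` input layout, and the sample data are illustrative assumptions, not part of the paper.

```python
from collections import Counter


def counting_based_targets(purchases, categories, omega):
    """Return PRE[(i, j)] = 1 if customer i bought at least `omega` items
    in category j, else 0.

    purchases: list of (customer_id, category) pairs, one per purchase.
    """
    counts = Counter(purchases)  # NP_ij for every observed (i, j) pair
    customers = sorted({customer for customer, _ in purchases})
    return {
        (i, j): 1 if counts[(i, j)] >= omega else 0
        for i in customers
        for j in categories
    }


# Hypothetical purchase log.
log = [("a", "Rap"), ("a", "Rap"), ("a", "Dance"), ("b", "Dance")]
targets = counting_based_targets(log, ["Rap", "Dance"], omega=2)
print(targets[("a", "Rap")], targets[("a", "Dance")])
```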


    Expected-value-based method

    $N_j$ is the number of products in product category $j$.

    $\alpha$ is a multiplier for the expected value. Usually $\alpha$ is equal to or greater than 1. When $\alpha$ is 1, if customer $i$ purchased more items in product category $j$ than the expected number of purchases of product category $j$, this means that customer $i$ prefers product category $j$.
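    The slide's formula did not survive in this transcript, so the following is only a sketch under a stated assumption: the expected number of purchases of customer i in category j is taken to be the customer's total purchases spread evenly over all products, that is, (total purchases of i) × N_j / (total number of products). That definition, the strict "greater than" comparison, and all names are assumptions, not the authors' stated formula.

```python
def expected_value_targets(purchase_counts, products_per_category, alpha=1.0):
    """purchase_counts: {customer: {category: number of purchases}}
    products_per_category: {category: N_j}
    """
    total_products = sum(products_per_category.values())
    targets = {}
    for customer, counts in purchase_counts.items():
        total_purchases = sum(counts.values())
        for category, n_j in products_per_category.items():
            # ASSUMED expected value: purchases spread evenly over products.
            expected = total_purchases * n_j / total_products
            bought = counts.get(category, 0)
            targets[(customer, category)] = 1 if bought > alpha * expected else 0
    return targets


# Hypothetical data: 5 purchases, 8 products in total.
counts = {"a": {"Rap": 4, "Dance": 1}}
sizes = {"Rap": 2, "Dance": 2, "Metal": 4}
print(expected_value_targets(counts, sizes, alpha=1.0))
```

    With these numbers the assumed expected value for Rap is 5 × 2 / 8 = 1.25, so four Rap purchases mark Rap as preferred, while one Dance purchase does not exceed 1.25.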


    Statistics-based method

    Statistical values like the mean, the median, the first quartile, and the third quartile are used to generate target variables.
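    The slide does not say how the statistic is applied. One plausible reading, sketched here as an assumption, is that the chosen statistic of the per-customer purchase counts in a category serves as the preference threshold, with counts strictly above it marked as preferred. The function and argument names are hypothetical.

```python
import statistics


def statistics_based_targets(counts_by_customer, statistic="median"):
    """counts_by_customer: {customer: purchases in ONE product category}.
    Returns {customer: 1 or 0}, where 1 means the count exceeds the statistic.
    """
    values = list(counts_by_customer.values())
    if statistic == "mean":
        threshold = statistics.mean(values)
    elif statistic == "median":
        threshold = statistics.median(values)
    elif statistic in ("q1", "q3"):
        # quantiles(n=4) returns the three quartile cut points.
        q1, _, q3 = statistics.quantiles(values, n=4)
        threshold = q1 if statistic == "q1" else q3
    else:
        raise ValueError(f"unknown statistic: {statistic}")
    # ASSUMPTION: strictly greater than the threshold means "prefers".
    return {c: 1 if n > threshold else 0 for c, n in counts_by_customer.items()}


# Hypothetical counts for a single category.
print(statistics_based_targets({"a": 0, "b": 1, "c": 5}, "median"))
```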


    Subcategory-based method

    In the case of non-leaf-node product categories, the corresponding target variables can be generated based on subcategory target-variable values (when the target variables for subcategories have already been generated using one of the above methods).

    $$PRE_{ij} = \begin{cases} 1 & \text{if } PRE_{ik} = 1 \text{ for some subcategory } k \text{ of } j \\ 0 & \text{otherwise} \end{cases}$$
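    A short recursive sketch of this rule for one customer. The category names come from the hierarchy figure, but the parent-child links in `children` are assumptions, since the figure's structure is not recoverable from this transcript; the function name is also hypothetical.

```python
def subcategory_based_target(category, children, leaf_targets):
    """Return 1 if the customer prefers `category`, else 0.

    children: {category: [subcategories]} describing the hierarchy tree.
    leaf_targets: {leaf category: 0 or 1}, generated by another method.
    """
    subcategories = children.get(category, [])
    if not subcategories:  # leaf node: use the already generated value
        return leaf_targets.get(category, 0)
    return 1 if any(
        subcategory_based_target(k, children, leaf_targets) for k in subcategories
    ) else 0


# Assumed fragment of the hierarchy tree.
children = {
    "MP3 Files": ["Pop Music", "Classical Music"],
    "Pop Music": ["Rap", "Dance"],
}
leaf_targets = {"Rap": 0, "Dance": 1, "Classical Music": 0}
print(subcategory_based_target("MP3 Files", children, leaf_targets))
```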


    MARKETING-RULE SELECTION

    A decision tree includes marketing rules that link customer demographics with preferences for product categories. Since marketing rules may have low predictability or may not be accurate, the valuable marketing rules have to be selected from the constructed decision trees.

    Two-phase heuristics are used for marketing-rule selection.

    1. Choose decision trees whose predictability (1 - misclassification rate), with respect to a training data set, is greater than a certain threshold.

    2. Select nodes from the filtered decision trees whose accuracy (i.e., purity) is greater than a certain threshold.

    The selected nodes in the decision trees are translated into the rule structure. Here is an example of a translated rule.

    If age < 30 and gender = male then Ballad, with accuracy = .9 and level = 3.
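    The two-phase filter can be sketched as follows. The `Tree` and `Node` classes and the dictionary rule layout are hypothetical stand-ins for whatever the inductive learning tool actually produces; only the two thresholds and the rule fields (condition, category, accuracy, level) come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    conditions: list   # e.g. ["age < 30", "gender = male"]
    accuracy: float    # purity of the node


@dataclass
class Tree:
    category: str          # product category the tree predicts
    level: int             # depth of that category in the hierarchy tree
    predictability: float  # 1 - misclassification rate
    nodes: list = field(default_factory=list)


def select_rules(trees, tree_threshold, node_threshold):
    """Phase 1 keeps trees above `tree_threshold`; phase 2 keeps nodes
    above `node_threshold` and translates them into rules."""
    rules = []
    for tree in trees:
        if tree.predictability <= tree_threshold:
            continue  # phase 1: drop trees with low predictability
        for node in tree.nodes:
            if node.accuracy > node_threshold:  # phase 2: keep pure nodes
                rules.append({
                    "if": node.conditions,
                    "then": tree.category,
                    "accuracy": node.accuracy,
                    "level": tree.level,
                })
    return rules


# Hypothetical trees; the first reproduces the slide's example rule.
trees = [
    Tree("Ballad", 3, 0.70, [Node(["age < 30", "gender = male"], 0.9),
                             Node(["age >= 30"], 0.55)]),
    Tree("Rap", 3, 0.60, [Node(["age < 20"], 0.95)]),
]
print(select_rules(trees, tree_threshold=0.65, node_threshold=0.65))
```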


    ADVERTISEMENT SELECTION

    Personalized advertisements are selected on a real-time basis.

    Let M be the number of advertisements to be displayed in an Internet storefront, and L the depth of the product-category hierarchy tree.

    The advertisement-selection algorithm is as follows.

    STEP 1: Set m := 0 and l := L, and generate a fact set using the specific customer's profile.


    Advertisement Selection Algorithm (continued)

    STEP 2: While m < M and l > 0:

    2.1 Fire marketing rules of level l with the customer's facts.

    2.2 If there are matching rules, the matched rules are fired in order of accuracy. The product categories that appear in the consequent parts of the rules are selected as preferred product categories for the specific customer. Choose advertisements for those categories from the advertisement database. The number of selected advertisements should be at most M - m.

    2.3 If m = M, exit.

    2.4 Update m to reflect the additionally selected advertisements, and update l to l - 1.


    Advertisement Selection Algorithm (continued)

    STEP 3: If m < M, randomly select M - m advertisements from the advertisement database.
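    Steps 1 to 3 can be sketched as one function. The rule and advertisement data layouts, the function name, and the choice to represent each rule's condition as a Python predicate are illustrative assumptions; the loop over levels from L down to 1, the accuracy ordering, and the random fill follow the steps above.

```python
import random


def select_advertisements(facts, rules, ads_by_category, M, L, rng=random):
    """facts: customer profile dict (the fact set of STEP 1).
    rules: list of dicts with keys "level", "accuracy", "category", "condition".
    ads_by_category: {category: [advertisement ids]}.
    """
    selected = []   # len(selected) plays the role of m
    level = L       # STEP 1: start at the deepest level of the hierarchy
    while len(selected) < M and level > 0:  # STEP 2
        matched = [r for r in rules
                   if r["level"] == level and r["condition"](facts)]
        # Fire matched rules in descending order of accuracy.
        for rule in sorted(matched, key=lambda r: r["accuracy"], reverse=True):
            for ad in ads_by_category.get(rule["category"], []):
                if len(selected) == M:
                    break
                if ad not in selected:
                    selected.append(ad)
        level -= 1
    if len(selected) < M:  # STEP 3: fill the remaining slots at random
        remaining = list(dict.fromkeys(
            ad for ads in ads_by_category.values() for ad in ads
            if ad not in selected))
        selected += rng.sample(remaining, min(M - len(selected), len(remaining)))
    return selected


# Hypothetical rule base and advertisement database.
rules = [{"level": 3, "accuracy": 0.9, "category": "Ballad",
          "condition": lambda f: f["age"] < 30 and f["gender"] == "male"}]
ads = {"Ballad": ["ballad_ad_1"], "Rap": ["rap_ad_1", "rap_ad_2"]}
print(select_advertisements({"age": 25, "gender": "male"}, rules, ads, M=2, L=3))
```

    Starting at level L means the most specific product categories are tried first, and more general categories are used only when slots remain.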


    EXPERIMENT

    EXPERIMENTAL DATA COLLECTION

    Data from 330 respondents were gathered. First, the respondents were asked to select preferred product categories in MP3 music files and sporting goods or leisure equipment. Then they were asked to choose five MP3 music files among the 32 displayed products. The displayed products were equally selected from all MP3 leaf nodes. The respondents were also asked to purchase five sporting goods or leisure equipment items among the 24 displayed products. The displayed products were equally selected from all sporting goods or leisure equipment leaf nodes (three products from each of the eight leaf nodes).

    On the next two pages, respondents were asked to rate MP3 music files and sporting goods/leisure equipment items on a five-point personal-interest scale (1 = low interest to 5 = high interest). Sixteen advertisements for MP3 music files, one selected from each leaf node, were displayed for interest rating. In addition, 16 advertisements for sporting goods or leisure equipment items, two selected from each leaf node, were displayed for interest rating. Finally, the respondents were asked to specify their profiles, including age, gender, job, and so on.


    MARKETING-RULE EXTRACTION

    The 330 respondents were divided into two data sets, one for marketing-rule generation, and the other for testing the effectiveness of the proposed approach. For marketing-rule generation, 198 responses (60% of the total data set) were used. Among the 198 data items, 70 percent were used to construct decision trees, and 30 percent to validate the constructed decision trees. For target-variable generation, the expected-value-based approach with alpha = 1 was used. All told, 37 decision trees were constructed (26 for MP3 music files and 11 for sporting goods or leisure equipment).

    Decision trees with predictability greater than 65 percent were selected in the marketing-rule selection step. Decision rules with accuracy greater than 65 percent were selected as valuable marketing rules.


    EFFECTIVENESS TEST

    To test the effectiveness of the proposed decision-tree induction approach, the results were compared to the preference-scoring approach and a random selection. The data of 132 respondents (40% of the total data set) were isolated for effectiveness testing. Three advertisements were selected for each respondent: that is, one advertisement using each of the three approaches for each respondent. In the case of the decision-tree induction approach, the marketing rules generated earlier were used to select personalized advertisements. In the case of the preference-scoring approach, advertisements were selected based on the respondents' own interest-manifestation behavior (purchase history, profile, preferred product categories). In the case of random selection, advertisements were selected on a simple random basis. The means of the five-point-scale personal-interest scores for the selected advertisements were compared statistically.


    RESULT


    Conclusion

    Statistically, there were differences among the mean five-point-scale scores of customers' interest in advertisements from the three different approaches: decision-tree induction, preference scoring, and random selection. Thus, the effectiveness of personalized-recommendation techniques is affected by the decision algorithm utilized in the personalized-advertisement selection process.

    The proposed decision-tree induction approach is effective in generating marketing rules as a means to assist rule-based personalized recommendations.

    In the case of sporting goods or leisure equipment, decision-tree induction gave the best results. But in the case of MP3 music files, preference scoring gave the best results. This indicates that the appropriateness of personalized-advertisement techniques depends on the characteristics of product categories. Thus it is necessary to study how various personalized-advertisement techniques can be used cooperatively.


    Thank You

    Next Presentation 5/2/2006

    Topic - New Advances in Data Mining

    By:

    Group 14

    Madhavarapu, Chidroop

    Sandhuria, Deepanshu

    DON'T FORGET TO COME TO CLASS!