Post on 01-Jul-2018
1/17/17
IS5126 - HowBA
Lecture 2 – Data, Databases, SQL, Behavioral Analytics; Jan 18, 2017
Dr. Tuan Q Phan, NUS IS5126
Admin
• Pick up the syllabus and schedule, also available on my website: http://www.tuanqphan.us
• Purchase the HBS case from http://hbsp.harvard.edu
  – Data.gov, #9-610-075
• Sign up as a team of 4 on IVLE by Jan. 30
  – Use the IVLE forums to find teammates
Dr. Tuan Q PHAN, NUS IS5126, (c) 2017
Learning Objectives
• Data.gov Case Discussion and Presentations
• Data Manipulation, ETL
• SQL
  – Database Design
  – Best Practices
  – Normalization Guidelines
• Marketing and Behavioral Analytics
• Mini-case
Learning Objectives
• Products
  – Product Life Cycle
  – Supply/Demand
  – Market Basket
  – Marketing Strategy
• People
  – CRM
  – Utility Modeling
• Organizations/Companies
  – Competition
  – Strategy
• Correlation and Causality
• Resources:
  – The Ten Day MBA, Steven Silbiger
  – 50 Social/Psychology books: http://www.sparringmind.com/psychology-books/
Databases and Manipulation
[Diagram: DATA FLOW. Real World → (Collection) → Raw data → (Import, Transform) → Data warehouse → (Analyze) → Report]
Data Manipulation
• Raw data is large, unstructured, and noisy
• Extract, Transform, Load (ETL): the process to "clean up" the data for processing and storage
• Extract: parsing, collection from multiple sources/formats, web scraping
• Transform: convert to an appropriate format, apply a set of rules, noise reduction, error handling, translate codes, validation
  – Python, SQL, awk, sed, ….
• Load: load into the data warehouse (database)
• Staging environment
• Resource: The Data Warehouse ETL Toolkit, Ralph Kimball & Joe Caserta
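A minimal ETL sketch using only Python's standard library, assuming a small comma-separated extract with a header row. The sample rows, the validation rules (drop rows whose numbers do not parse, drop duplicate ids), and the in-memory SQLite "warehouse" are illustrative assumptions, not the course's pipeline.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: stray whitespace, a duplicate row, an unparseable price.
RAW = """id,title,published_year,price
1, Practical SQL ,1998,14.00
2,Data Mining,2011,26.85
2,Data Mining,2011,26.85
3,Scoring Points,2008,n/a
"""

def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim strings, convert types, drop invalid and duplicate rows."""
    clean, seen = [], set()
    for r in rows:
        try:
            rec = (int(r["id"]), r["title"].strip(),
                   int(r["published_year"]), float(r["price"]))
        except ValueError:
            continue  # error handling: skip rows that fail validation
        if rec[0] in seen:
            continue  # noise reduction: skip duplicate ids
        seen.add(rec[0])
        clean.append(rec)
    return clean

def load(records, conn):
    """Load: insert the cleaned records into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS books("
                 "id INTEGER PRIMARY KEY, title TEXT, "
                 "published_year INTEGER, price REAL)")
    conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
print(conn.execute("SELECT * FROM books ORDER BY id").fetchall())
# [(1, 'Practical SQL', 1998, 14.0), (2, 'Data Mining', 2011, 26.85)]
```

In practice the transform rules are the part that grows; keeping each stage a separate function makes it easy to test them in a staging environment before loading.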
Data Storage Technology
• Data is large; we need to store, organize, and manipulate it
• Approaches:
  – File system: tape drives, hard disks, RAID, solid-state drives, NAS
SQL – Introduction
• SQL: "Structured Query Language" (aka "sequel")
  – Language for data manipulation
  – Independent of storage medium
• Many variants, standardized by ANSI
• Based on the relational model for database management
• Heavily used in BA
• The relational model was developed by Edgar Codd at the IBM Research Laboratory in the 1970s; SQL was also created at IBM in that decade
• Highly popular in the 1980s, 1990s, 2000s, …?
• Solutions:
  – Commercial products: Oracle, Microsoft Access, IBM DB2
  – Open-source: MySQL (Oracle), PostgreSQL, SQLite
  – Big Data: Hive/Hadoop, Netezza (IBM)
SQL – Introduction
• Data in tables, rows, and columns (aka relation, tuple, attributes)
• Value at a particular (row, column)
• Row as the "unit of analysis"
• Primary key: column with a unique identifier for each row
• Few commands:
  – Table manipulation: CREATE, ALTER, DROP, (GRANT)
  – Data modification: INSERT, DELETE, UPDATE
  – Query data: SELECT
• Resource: http://www.sqlite.org
SQL – CREATE
• Create a named table with named columns and types, the "schema"
CREATE TABLE books(
  id int not null primary key,
  title text,
  published_year int,
  price double
);
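To see what the CREATE statement above actually defines, this sketch runs it through Python's built-in `sqlite3` module against a throwaway in-memory database (an assumption for illustration) and lists the resulting schema with SQLite's `PRAGMA table_info`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE books(
        id int not null primary key,
        title text,
        published_year int,
        price double
    )
""")

# PRAGMA table_info returns one row per column:
# (cid, name, declared type, notnull flag, default value, primary-key position)
for cid, name, col_type, notnull, default, pk in conn.execute("PRAGMA table_info(books)"):
    print(name, col_type, "NOT NULL" if notnull else "", "PK" if pk else "")
```

SQLite is lenient about declared types and maps them to storage "affinities" (for example, `double` gets REAL affinity), so the same schema may be enforced more strictly in Oracle or MySQL.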
SQL – CREATE Data types
• Columns must be of a type
  – Fixed-width: fast access, efficient
  – Variable-width: flexible
• Numbers: fixed-width
  – int: tinyint, smallint, mediumint, bigint, unsigned
  – double
• Text: variable-width
• Date: no dedicated type in sqlite3 (use int or text); other databases offer datetime, timestamp
  – As a string, e.g. "Aug. 28, 2012"
  – As "Unix time", the number of seconds since Jan 1, 1970 UTC
  – Watch out for time zones
• Binary data (e.g. images): blob
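Because sqlite3 has no date type, dates are usually kept as ISO-8601 text or as Unix-time integers. A small sketch converting the slide's example date between the string and Unix-time forms with Python's `datetime` and SQLite's built-in `datetime()` and `strftime()` functions; it assumes English month abbreviations (the default C locale) and treats the date as midnight UTC.

```python
import sqlite3
from datetime import datetime, timezone

# A human-readable string must be parsed before it can be sorted or compared.
# %b matches month abbreviations of the current locale (assumed English here).
parsed = datetime.strptime("Aug. 28, 2012", "%b. %d, %Y")
print(parsed.date().isoformat())  # 2012-08-28

# "Unix time": seconds since Jan 1, 1970 00:00:00 UTC (time zone made explicit).
unix_seconds = int(parsed.replace(tzinfo=timezone.utc).timestamp())
print(unix_seconds)  # 1346112000

conn = sqlite3.connect(":memory:")
# SQLite's date functions accept both representations.
iso_text, as_unix = conn.execute(
    "SELECT datetime(?, 'unixepoch'), CAST(strftime('%s', ?) AS INTEGER)",
    (unix_seconds, parsed.date().isoformat()),
).fetchone()
print(iso_text, as_unix)  # 2012-08-28 00:00:00 1346112000
```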
SQL – ALTER & DROP
• Modifies an existing table schema:
alter table books add column author text;
• Removes a table schema (and its data):
drop table books;
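A sketch of ALTER and DROP in action, assuming an in-memory SQLite database; the helper names `columns` and `tables` are hypothetical, and they read the schema back from `PRAGMA table_info` and the `sqlite_master` catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int not null primary key, title text, "
             "published_year int, price double)")

def columns(conn, table):
    """Return a table's column names in order (table name is trusted here;
    never interpolate user input into SQL)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def tables(conn):
    """Return the names of all tables in the database."""
    return [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]

conn.execute("alter table books add column author text")
print(columns(conn, "books"))  # ['id', 'title', 'published_year', 'price', 'author']

conn.execute("drop table books")
print(tables(conn))  # []: the schema and its data are gone
```

In SQLite, `ADD COLUMN` always appends the new column at the end of the table.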
SQL – INSERT
• Adds data to a table (standard SQL quotes string values with single quotes; double quotes are for identifiers):
insert into books values (1, 'Practical SQL', 1998, 14.00, 'Bowman');
insert into books values (2, 'Data Mining', 2011, 26.85, 'Linoff');

id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            14.0        Bowman
2           Data Mining    2011            26.85       Linoff
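When inserts come from a program, letting the driver fill in `?` placeholders avoids quoting mistakes and SQL injection. A minimal sketch with Python's `sqlite3`, assuming the books schema after the `author` column was added; the duplicate-id insert is a hypothetical example of the primary key at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int not null primary key, title text, "
             "published_year int, price double, author text)")

rows = [
    (1, "Practical SQL", 1998, 14.00, "Bowman"),
    (2, "Data Mining", 2011, 26.85, "Linoff"),
]
# '?' placeholders let the driver quote each value; no string concatenation.
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

try:
    conn.execute("INSERT INTO books VALUES (?, ?, ?, ?, ?)",
                 (1, "Duplicate id", 2000, 1.0, "Nobody"))
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # the primary key must be unique

print(conn.execute("SELECT * FROM books ORDER BY id").fetchall())
# [(1, 'Practical SQL', 1998, 14.0, 'Bowman'), (2, 'Data Mining', 2011, 26.85, 'Linoff')]
```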
SQL – Loading data
• Load data from a csv file (software-specific; the commands below are for the sqlite3 shell)
books.csv:
3,"Scoring Points",2008,22.00,"Humby"
4,"Business Intelligence",2009,57.85,"Vercellis"

.separator ","
.import books.csv books

id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            14.0        Bowman
2           Data Mining    2011            26.85       Linoff
3           Scoring Point  2008            22.0        Humby
4           Business Inte  2009            57.85       Vercellis
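The `.separator` and `.import` commands exist only in the sqlite3 command-line shell. A shell-independent alternative, sketched under the assumption that the CSV has no header row and its fields match the books columns, reads the file with Python's `csv` module and inserts through `sqlite3`. The file contents are inlined so the example runs on its own; a real script would use `open("books.csv", newline="")`.

```python
import csv
import io
import sqlite3

# Contents of books.csv from the slide (no header row).
BOOKS_CSV = '''3,"Scoring Points",2008,22.00,"Humby"
4,"Business Intelligence",2009,57.85,"Vercellis"
'''

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int not null primary key, title text, "
             "published_year int, price double, author text)")

# csv.reader strips the quotes around fields and yields lists of strings;
# SQLite's column affinities convert '3' to 3 and '22.00' to 22.0 on insert.
reader = csv.reader(io.StringIO(BOOKS_CSV))
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", reader)
conn.commit()
print(conn.execute("SELECT id, title, price FROM books ORDER BY id").fetchall())
# [(3, 'Scoring Points', 22.0), (4, 'Business Intelligence', 57.85)]
```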
SQL – DELETE & UPDATE
• Deletes a row:
delete from books where id=4;
• Modifies value(s); without a WHERE clause, every row is changed:
update books set price=5.00;
(Output after the UPDATE; row 4 is still shown and is used in the following examples.)

id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            5.0         Bowman
2           Data Mining    2011            5.0         Linoff
3           Scoring Point  2008            5.0         Humby
4           Business Inte  2009            5.0         Vercellis
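Because an UPDATE without WHERE touches every row, it helps to check how many rows a statement changed. A sketch using `Cursor.rowcount` from Python's `sqlite3`, on a simplified three-column copy of the books table (an assumption to keep it short).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int primary key, title text, price double)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)", [
    (1, "Practical SQL", 14.00), (2, "Data Mining", 26.85),
    (3, "Scoring Points", 22.00), (4, "Business Intelligence", 57.85),
])

# DELETE with a WHERE clause removes only the matching rows.
cur = conn.execute("DELETE FROM books WHERE id = ?", (4,))
print(cur.rowcount)  # 1

# UPDATE with a WHERE clause changes only the matching rows...
conn.execute("UPDATE books SET price = 10.00 WHERE id = 1")
# ...while UPDATE without WHERE (as on the slide) changes every remaining row.
cur = conn.execute("UPDATE books SET price = 5.00")
print(cur.rowcount)  # 3
print(conn.execute("SELECT id, price FROM books ORDER BY id").fetchall())
# [(1, 5.0), (2, 5.0), (3, 5.0)]
```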
SQL – SELECT
• Query the database:
select * from books;

id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            5.0         Bowman
2           Data Mining    2011            5.0         Linoff
3           Scoring Point  2008            5.0         Humby
4           Business Inte  2009            5.0         Vercellis

• Sort results:
select * from books order by published_year desc;

id          title        published_year  price       author
----------  -----------  --------------  ----------  ----------
2           Data Mining  2011            5.0         Linoff
4           Business In  2009            5.0         Vercellis
3           Scoring Poi  2008            5.0         Humby
1           Practical S  1998            5.0         Bowman
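ORDER BY can take several keys; later keys only break ties in earlier ones. A sketch on the four books after the price update (so every price is 5.0), assuming an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int primary key, title text, "
             "published_year int, price double, author text)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", [
    (1, "Practical SQL", 1998, 5.0, "Bowman"),
    (2, "Data Mining", 2011, 5.0, "Linoff"),
    (3, "Scoring Points", 2008, 5.0, "Humby"),
    (4, "Business Intelligence", 2009, 5.0, "Vercellis"),
])

newest_first = [r[0] for r in conn.execute(
    "SELECT id FROM books ORDER BY published_year DESC")]
print(newest_first)  # [2, 4, 3, 1], same order as the slide

# All prices are equal, so the second key (title) decides the order.
by_price_then_title = [r[0] for r in conn.execute(
    "SELECT title FROM books ORDER BY price DESC, title ASC")]
print(by_price_then_title)
# ['Business Intelligence', 'Data Mining', 'Practical SQL', 'Scoring Points']
```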
SQL – SELECT…WHERE
• A WHERE clause subsets the results:
select title, author, published_year from books where published_year > 2000;

title        author      published_year
-----------  ----------  --------------
Data Mining  Linoff      2011
Scoring Poi  Humby       2008
Business In  Vercellis   2009

• Combining conditions:
select * from books where published_year > 2000 and author='Linoff';

id          title        published_year  price       author
----------  -----------  --------------  ----------  ----------
2           Data Mining  2011            5.0         Linoff
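In application code, WHERE conditions are usually assembled with placeholders rather than pasted-in values. A minimal sketch; `books_after` is a hypothetical helper, not part of any library, and the data is the books table from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int primary key, title text, "
             "published_year int, author text)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", [
    (1, "Practical SQL", 1998, "Bowman"),
    (2, "Data Mining", 2011, "Linoff"),
    (3, "Scoring Points", 2008, "Humby"),
    (4, "Business Intelligence", 2009, "Vercellis"),
])

def books_after(conn, year, author=None):
    """Titles published after `year`, optionally restricted to one author."""
    sql = "SELECT title FROM books WHERE published_year > ?"
    params = [year]
    if author is not None:
        sql += " AND author = ?"  # combine conditions with AND
        params.append(author)
    return [r[0] for r in conn.execute(sql + " ORDER BY id", params)]

print(books_after(conn, 2000))            # ['Data Mining', 'Scoring Points', 'Business Intelligence']
print(books_after(conn, 2000, "Linoff"))  # ['Data Mining']
```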
SQL – SELECT…FUZZY
• LIKE allows wildcard string matching (% matches any sequence of characters)
select * from books where title like '%ness%';

id          title                  published_year  price       author
----------  ---------------------  --------------  ----------  ----------
4           Business Intelligence  2009            5.0         Vercellis
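A quick check of LIKE's two wildcards, `%` (any run of characters) and `_` (exactly one character). This assumes SQLite's default behaviour, in which LIKE ignores case for ASCII letters; other databases differ (PostgreSQL's LIKE, for example, is case-sensitive).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int primary key, title text)")
conn.executemany("INSERT INTO books VALUES (?, ?)", [
    (1, "Practical SQL"), (2, "Data Mining"),
    (3, "Scoring Points"), (4, "Business Intelligence"),
])

def titles_like(pattern):
    """Titles matching a LIKE pattern, in id order."""
    return [r[0] for r in conn.execute(
        "SELECT title FROM books WHERE title LIKE ? ORDER BY id", (pattern,))]

print(titles_like("%ness%"))   # ['Business Intelligence']
print(titles_like("%MINING"))  # ['Data Mining'] (case ignored for ASCII)
print(titles_like("_ata%"))    # ['Data Mining'] ('_' matches the 'D')
```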
SQL – Group by
• Aggregate by a column:
insert into books values (5, '2008 book', 2008, 25.00, 'Phan');

id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            5.0         Bowman
2           Data Mining    2011            5.0         Linoff
3           Scoring Point  2008            5.0         Humby
4           Business Inte  2009            5.0         Vercellis
5           2008 book      2008            25.0        Phan

select published_year, count(*), avg(price), sum(price) from books group by published_year;

published_year  count(*)    avg(price)  sum(price)
--------------  ----------  ----------  ----------
1998            1           5.0         5.0
2008            2           15.0        30.0
2009            1           5.0         5.0
2011            1           5.0         5.0
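The same aggregation run from Python's `sqlite3`, plus a HAVING clause (not on the slide) to show how whole groups, rather than individual rows, can be filtered after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int primary key, title text, "
             "published_year int, price double, author text)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", [
    (1, "Practical SQL", 1998, 5.0, "Bowman"),
    (2, "Data Mining", 2011, 5.0, "Linoff"),
    (3, "Scoring Points", 2008, 5.0, "Humby"),
    (4, "Business Intelligence", 2009, 5.0, "Vercellis"),
    (5, "2008 book", 2008, 25.0, "Phan"),
])

per_year = conn.execute("""
    SELECT published_year, count(*), avg(price), sum(price)
    FROM books
    GROUP BY published_year
    ORDER BY published_year
""").fetchall()
print(per_year)
# [(1998, 1, 5.0, 5.0), (2008, 2, 15.0, 30.0), (2009, 1, 5.0, 5.0), (2011, 1, 5.0, 5.0)]

# HAVING filters groups after aggregation; WHERE would filter rows before it.
busy_years = conn.execute("""
    SELECT published_year, count(*)
    FROM books
    GROUP BY published_year
    HAVING count(*) > 1
""").fetchall()
print(busy_years)  # [(2008, 2)]
```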
SQL – Embedded queries
select avg(sub.num_books) from (select published_year, count(*) as num_books from books group by published_year) sub;

Inner query (sub):
published_year  num_books
--------------  ----------
1998            1
2008            2
2009            1
2011            1

Result:
avg(sub.num_books)
------------------
1.25
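The embedded (sub)query computes books per year first, and the outer query then averages those counts. A sketch reproducing the 1.25 result, with the table trimmed to the columns the query needs (an assumption for brevity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int primary key, published_year int)")
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [(1, 1998), (2, 2011), (3, 2008), (4, 2009), (5, 2008)])

# Inner query: one row per year with its book count.
inner = ("SELECT published_year, count(*) AS num_books "
         "FROM books GROUP BY published_year")
# Outer query: treat the inner result as a table named `sub` and average it.
(avg_books,) = conn.execute(
    f"SELECT avg(sub.num_books) FROM ({inner}) sub").fetchone()
print(avg_books)  # 1.25: 5 books spread over 4 distinct years
```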
SQL – JOIN
• Ability to combine rows from two or more tables by matching columns, "JOIN"

publish_year:
year        num_books
----------  ----------
2008        100
2009        120
2010        90
2011        104

books:
id          title          published_year  price       author
----------  -------------  --------------  ----------  ----------
1           Practical SQL  1998            5.0         Bowman
2           Data Mining    2011            5.0         Linoff
3           Scoring Point  2008            5.0         Humby
4           Business Inte  2009            5.0         Vercellis
5           2008 book      2008            25.0        Phan

select * from books b, publish_year p where b.published_year=p.year;

id          title        published_year  price       author      year        num_books
----------  -----------  --------------  ----------  ----------  ----------  ----------
2           Data Mining  2011            5.0         Linoff      2011        104
3           Scoring Poi  2008            5.0         Humby       2008        100
4           Business In  2009            5.0         Vercellis   2009        120
5           2008 book    2008            25.0        Phan        2008        100

Where is 1998?
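Two points from this slide, checked in a sketch: the comma-separated FROM with a WHERE condition behaves as an inner join, and 1998 disappears because publish_year has no row for it. The NOT EXISTS query used to find the dropped books is an addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int primary key, title text, published_year int)")
conn.execute("CREATE TABLE publish_year(year int primary key, num_books int)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)", [
    (1, "Practical SQL", 1998), (2, "Data Mining", 2011),
    (3, "Scoring Points", 2008), (4, "Business Intelligence", 2009),
    (5, "2008 book", 2008),
])
conn.executemany("INSERT INTO publish_year VALUES (?, ?)",
                 [(2008, 100), (2009, 120), (2010, 90), (2011, 104)])

implicit = conn.execute(
    "SELECT b.id, p.num_books FROM books b, publish_year p "
    "WHERE b.published_year = p.year ORDER BY b.id").fetchall()
explicit = conn.execute(
    "SELECT b.id, p.num_books FROM books b JOIN publish_year p "
    "ON b.published_year = p.year ORDER BY b.id").fetchall()
print(implicit == explicit)  # True: the comma join with WHERE is an inner join
print(implicit)              # [(2, 104), (3, 100), (4, 120), (5, 100)]

# Books whose year has no match in publish_year are dropped by an inner join.
dropped = conn.execute(
    "SELECT id, published_year FROM books b WHERE NOT EXISTS "
    "(SELECT 1 FROM publish_year p WHERE p.year = b.published_year)").fetchall()
print(dropped)  # [(1, 1998)]
```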
SQL – Sets
[Figure: Venn diagram of Table A and Table B; Y = rows only in A, X = rows in both A and B, Z = rows only in B]

1. Inner join: X = A ∩ B
select * from books b inner join publish_year p on b.published_year=p.year;

2. Left join: A = Y ∪ X
select * from books b left join publish_year p on b.published_year=p.year;

id          title          published_year  price       author      year        num_books
----------  -------------  --------------  ----------  ----------  ----------  ----------
1           Practical SQL  1998            5.0         Bowman
2           Data Mining    2011            5.0         Linoff      2011        104
3           Scoring Point  2008            5.0         Humby       2008        100
4           Business Inte  2009            5.0         Vercellis   2009        120
5           2008 book      2008            25.0        Phan        2008        100
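The left join keeps every book and fills publish_year's columns with NULL (`None` in Python) when there is no match; adding `WHERE p.year IS NULL` keeps only the books that have no match (an "anti-join"). A sketch on the same illustrative data as the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books(id int primary key, title text, published_year int)")
conn.execute("CREATE TABLE publish_year(year int primary key, num_books int)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)", [
    (1, "Practical SQL", 1998), (2, "Data Mining", 2011),
    (3, "Scoring Points", 2008), (4, "Business Intelligence", 2009),
    (5, "2008 book", 2008),
])
conn.executemany("INSERT INTO publish_year VALUES (?, ?)",
                 [(2008, 100), (2009, 120), (2010, 90), (2011, 104)])

left = conn.execute(
    "SELECT b.id, b.published_year, p.num_books FROM books b "
    "LEFT JOIN publish_year p ON b.published_year = p.year ORDER BY b.id").fetchall()
print(left)
# [(1, 1998, None), (2, 2011, 104), (3, 2008, 100), (4, 2009, 120), (5, 2008, 100)]

# Keep only the books with no matching year.
only_in_books = conn.execute(
    "SELECT b.id FROM books b LEFT JOIN publish_year p "
    "ON b.published_year = p.year WHERE p.year IS NULL").fetchall()
print(only_in_books)  # [(1,)]
```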
Long vs. wide
• Long tables vs. wide tables
• Pivot table, cross tabulation, report

trans_id    book_id  year  num_books
----------  -------  ----  ----------
1           1        2008  5
2           1        2008  1
3           1        2009  1
4           2        2011  3
5           3        2009  4
6           3        2009  1
7           4        2010  1
8           4        2010  5
9           4        2011  2
10          5        2010  1

book_id     y2008  y2009  y2010  y2011
----------  -----  -----  -----  ----------
1           6      1      0      0
2           0      0      0      3
3           0      5      0      0
4           0      0      6      2
5           0      0      1      0
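The pivot from the long table to the wide table can also be done outside the database. A sketch in plain Python using the long table's rows as given, summing `num_books` per (book, year) and filling absent combinations with 0.

```python
from collections import defaultdict

# Long table from the slide: (trans_id, book_id, year, num_books)
long_rows = [
    (1, 1, 2008, 5), (2, 1, 2008, 1), (3, 1, 2009, 1), (4, 2, 2011, 3),
    (5, 3, 2009, 4), (6, 3, 2009, 1), (7, 4, 2010, 1), (8, 4, 2010, 5),
    (9, 4, 2011, 2), (10, 5, 2010, 1),
]
years = sorted({year for _, _, year, _ in long_rows})

# Sum num_books for every (book_id, year) pair; missing pairs default to 0.
totals = defaultdict(int)
for _, book_id, year, n in long_rows:
    totals[(book_id, year)] += n

# Wide table: one row per book, one column per year.
wide = {book_id: [totals[(book_id, y)] for y in years]
        for book_id in sorted({b for _, b, _, _ in long_rows})}

print("book_id", *[f"y{y}" for y in years])
for book_id, counts in wide.items():
    print(book_id, *counts)
```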
Database Design
• How to design the table schema?
• Which columns go where?
• Good design characteristics:
  – Makes interactions with the database easy to understand
  – Consistency of values and database
  – High performance
• Bad design characteristics:
  – Misunderstanding of queries
  – Increased risk of inconsistencies
  – Redundant data entry
  – Difficult to change the structure of the tables
Database Design
• Normalization: reduce duplicates, protect data integrity
• Non-loss decomposition: splitting tables with redundant values into two or more tables
  – Join to "put back together"
• Clear, easy-to-read table and column names:
  – E.g. books_prices, author_firstname, books, authors
• Entity-relationship (ER) modeling
• Define relationship types: 1-1, 1-N, N-N
• No magic bullet; iterate and gain experience
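A sketch of a non-loss decomposition in SQLite, assuming a hypothetical `books_flat` table in which author details repeat on every book row: authors move to their own table (stored once each), books keep only an `author_id`, and a join puts the original rows back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical un-normalized table: author details repeat on every book row.
conn.execute("CREATE TABLE books_flat(book_id int primary key, title text, "
             "author_name text, author_country text)")
conn.executemany("INSERT INTO books_flat VALUES (?, ?, ?, ?)", [
    (1, "Book A", "Smith", "UK"),
    (2, "Book B", "Smith", "UK"),
    (3, "Book C", "Tan", "SG"),
])

# Decompose: each author stored once; books reference authors by key.
conn.executescript("""
    CREATE TABLE authors(author_id integer primary key,
                         author_name text unique, author_country text);
    INSERT INTO authors(author_name, author_country)
        SELECT DISTINCT author_name, author_country FROM books_flat;
    CREATE TABLE books(book_id int primary key, title text,
                       author_id int references authors(author_id));
    INSERT INTO books
        SELECT f.book_id, f.title, a.author_id
        FROM books_flat f JOIN authors a ON a.author_name = f.author_name;
""")

# Non-loss check: joining the pieces reproduces the original rows exactly.
rejoined = conn.execute("""
    SELECT b.book_id, b.title, a.author_name, a.author_country
    FROM books b JOIN authors a ON a.author_id = b.author_id
    ORDER BY b.book_id""").fetchall()
original = conn.execute("SELECT * FROM books_flat ORDER BY book_id").fetchall()
print(rejoined == original)  # True
print(conn.execute("SELECT count(*) FROM authors").fetchone()[0])  # 2
```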
General Guidelines
1. What kind of questions are we trying to answer?
2. What are the sources of data?
3. Which are the focal entities or subjects?
   • Row as one thing in the entity, columns as attributes
   • Independent existence
4. Group common columns; use E-R diagrams to help
5. Determine a unique identifier: the primary key
6. What are the relationships between entities: 1-1, 1-N, N-N?
7. Normalize and verify
8. Test and reiterate
Normalization Guidelines
• First normal form:
  – Each row-column intersection must hold one and only one value
  – Values must be atomic
  – No repeating groups
  – "Rectangular" tables

Bad:
Order_id  Book_id1  Transact_date1  Book_id2  Transact_date2
1         1         19/10/2010
2         1         01/10/2010      2         01/10/2010

Better:
Record_id  Order_id  Book_id  Transact_date
1          1         1        19/10/2010
2          2         1        01/10/2010
3          2         2        01/10/2010
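Moving from the "Bad" to the "Better" table is mechanical: each filled slot of the repeating group becomes its own row. A sketch assuming at most two repeating slots, as on the slide; `to_first_normal_form` is a hypothetical helper name.

```python
# "Bad" table from the slide: repeating groups Book_id1/Transact_date1, Book_id2/Transact_date2.
bad = [
    {"Order_id": 1, "Book_id1": 1, "Transact_date1": "19/10/2010",
     "Book_id2": None, "Transact_date2": None},
    {"Order_id": 2, "Book_id1": 1, "Transact_date1": "01/10/2010",
     "Book_id2": 2, "Transact_date2": "01/10/2010"},
]

def to_first_normal_form(rows, max_items=2):
    """Emit one (Record_id, Order_id, Book_id, Transact_date) row per ordered book."""
    better, record_id = [], 0
    for row in rows:
        for i in range(1, max_items + 1):
            book_id = row[f"Book_id{i}"]
            if book_id is None:
                continue  # empty slot in the repeating group: no row
            record_id += 1
            better.append((record_id, row["Order_id"], book_id, row[f"Transact_date{i}"]))
    return better

for rec in to_first_normal_form(bad):
    print(rec)
```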
Normalization Guidelines
• Second normal form
  – "Every non-key column must depend on the entire primary key"
  – Relevant when the primary key is composite
• Third normal form
  – No non-key column depends on another non-key column
• Fourth normal form
  – No independent 1-N relationships between primary key columns and non-key columns (storing them together leaves too many blanks)
• Fifth normal form
  – Break tables into the smallest possible pieces in order to eliminate all redundancy within a table
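Third normal form rules out a non-key column that depends on another non-key column. A rough, data-driven check for such a dependency is whether each value of one column always maps to a single value of another. Sample data can only refute a dependency, not prove it holds for the schema. The table and the `determines` helper are hypothetical.

```python
from collections import defaultdict

def determines(rows, lhs, rhs):
    """True if, in these rows, each value of column `lhs` maps to exactly one value of `rhs`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return all(len(values) == 1 for values in seen.values())

# Hypothetical table keyed by book_id; publisher_city depends on publisher,
# which is itself a non-key column, so the table is not in third normal form.
books = [
    {"book_id": 1, "publisher": "P1", "publisher_city": "Singapore"},
    {"book_id": 2, "publisher": "P1", "publisher_city": "Singapore"},
    {"book_id": 3, "publisher": "P2", "publisher_city": "Boston"},
]
print(determines(books, "publisher", "publisher_city"))  # True: move to a publishers table
print(determines(books, "publisher_city", "book_id"))    # False
```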
Combine SQL & Python
• Python loops to create SQL code
• Used for aggregation or "pivot tables"
• Simple scripting
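A minimal sketch of "Python loops to create SQL code": the loop writes one `sum(CASE WHEN ...)` column per year found in the data, producing the wide pivot from the long table on the "Long vs. wide" slide. The table name `sales` is a hypothetical stand-in, since the slide does not name the long table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales(trans_id int primary key, book_id int, "
             "year int, num_books int)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, 1, 2008, 5), (2, 1, 2008, 1), (3, 1, 2009, 1), (4, 2, 2011, 3),
    (5, 3, 2009, 4), (6, 3, 2009, 1), (7, 4, 2010, 1), (8, 4, 2010, 5),
    (9, 4, 2011, 2), (10, 5, 2010, 1),
])

# The Python loop writes one aggregate column per distinct year.
# int() guards the interpolated values, since they become part of the SQL text.
years = [r[0] for r in conn.execute("SELECT DISTINCT year FROM sales ORDER BY year")]
columns = ",\n       ".join(
    f"sum(CASE WHEN year = {int(y)} THEN num_books ELSE 0 END) AS y{int(y)}"
    for y in years)
pivot_sql = (f"SELECT book_id,\n       {columns}\n"
             "FROM sales\nGROUP BY book_id\nORDER BY book_id")

print(pivot_sql)
for row in conn.execute(pivot_sql):
    print(row)
```

Generating the column list keeps the query correct when a new year appears in the data, which a hand-written pivot would miss.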
When to use Python, SQL, R?
• Similar tools for all languages
• Excel: filters, sort, pivot table, …
  – Pro: easy GUI, "intuitive," easy for prototyping
  – Cons: slow, cannot handle large datasets, requires highly structured data, limited tools, $$$
• Python: dictionaries, loops, NumPy, etc…
  – Pro: flexible, fast, good for big datasets, rich/multimedia data
  – Cons: slow file systems, limited tools, complicated for simple tasks
• SQL: select, group by, …
  – Pro: many commercial and open-source solutions, fast (when structured properly)
  – Cons: requires structured data, limited binary data support, $$$
• R: indices, aggregate, ddply, data.table…
  – Pro: single language/framework, many packages for fast ETL
  – Cons: memory inefficient, slow, single processor (except Revolution R), inconsistent notation across packages
Best Practice Guidelines
• Time-space (best practices)
• Big raw data is best kept in file systems (hard drive)
  – Python for raw data collection, binary data
  – Input: raw data
  – Output: semi-structured, non-normalized (e.g. csv)
• ETL and manipulation in the data warehouse (e.g. SQL)
  – SQLite: easy to use, standard ANSI
  – MySQL: free, open source, fast reads
  – Oracle: transaction data (writes)
  – Hadoop: big and slow; Hive provides SQL-like notation
  – Input: semi-structured
  – Output: highly structured, transformed data ready for analysis, unit of analysis on each row
• Analysis in statistical tools (R, Stata, SPSS, Matlab, etc…):
  – Commercial and open source available
  – Commercial: faster, higher performance, better memory management
  – Input: highly structured
  – Output: reports, analysis, insights, visualizations
Misc.
• Other database design paradigms
• Dimensional Modeling
• Resource: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling; Ralph Kimball & Margy Ross
Break
Marketing and Behavioral Analytics
• What is the unit of analysis?
  – Country
  – Firms
  – Products
  – Consumers/individuals
• Aggregation vs. sparsity
• "Big Data" makes sparsity less of a problem
Product Life Cycle (PLC)
• Stages of product adoption and sales
• Introduction, Growth, Maturity, Decline
PLC – Bass Diffusion Model
• "A New Product Growth for Model Consumer Durables," Bass, F. M., Management Science, 1969
• Adoption model of consumer durables
• Pr(t): probability of purchase at time t by someone who has not yet bought
• m: total market size (number of people)
• Y(t): number of previous buyers
• p: innovation (probability)
• q: imitation (probability)
$$\Pr(t) = p + \frac{q}{m}\,Y(t)$$
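A discrete-time simulation sketch of the model above: in each period the (m − Y(t)) people who have not yet bought adopt with probability Pr(t) = p + (q/m)·Y(t), which splits new adopters into an innovation part, p·(m − Y(t)), and an imitation part, (q/m)·Y(t)·(m − Y(t)). The parameter values are illustrative assumptions, not estimates from Bass (1969) or from the lecture's figure.

```python
def bass_adoption(p, q, m, periods):
    """Discrete-time Bass diffusion sketch.

    Returns a list of (innovators, imitators, cumulative adopters) per period,
    where Y(t) is the cumulative number of adopters before period t.
    """
    y = 0.0  # Y(t): previous buyers
    history = []
    for _ in range(periods):
        remaining = m - y                      # people who have not bought yet
        innovators = p * remaining             # adopt through innovation, p
        imitators = (q / m) * y * remaining    # adopt by imitating previous buyers, q
        y += innovators + imitators
        history.append((innovators, imitators, y))
    return history

# Illustrative parameters only: m = 30 (e.g. millions of people), 20 years.
for year, (inn, imi, cum) in enumerate(bass_adoption(p=0.03, q=0.38, m=30.0, periods=20), start=1):
    print(f"year {year:2d}: innovators {inn:5.2f}  imitators {imi:5.2f}  cumulative {cum:5.2f}")
```

With q larger than p, imitators soon outnumber innovators, which produces the S-shaped cumulative adoption curve of the product life cycle.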
PLC – Innovators & Imitators
[Figure: two panels plotted against Year (0 to 20). Left: cumulative no. of adopters (in millions). Right: non-cumulative adopters (in millions), split into innovators and imitators.]
PLC – Crossing the Chasm
Resource: Crossing the Chasm, Geoffrey A. Moore, 1991
Products – Supply/Demand
• Laws of supply & demand
• High demand, high prices
  – Demand is not static
  – Promotion can change demand
• Surplus supply, low prices
  – Efficient stock allocation
  – Stock-out problems
Products – Supply/Demand
• Profit (margin) = Price - Cost
• Total cost = fixed cost + variable cost (marginal cost × quantity, when marginal cost is constant)
• Perfect market competition => efficiency
• Advertising and promotions can increase demand
• R.O.I.: Return on Investment = Profit / Investment
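A worked sketch of these definitions with hypothetical numbers, assuming a constant marginal cost per unit and treating the fixed cost as the investment in the ROI calculation (both are assumptions for illustration).

```python
def total_cost(fixed_cost, marginal_cost, quantity):
    """Fixed cost plus variable cost, assuming a constant marginal cost per unit."""
    return fixed_cost + marginal_cost * quantity

def roi(profit, investment):
    """Return on investment = profit / investment."""
    return profit / investment

# Hypothetical product: price, marginal cost per unit, fixed cost, units sold.
price, marginal_cost, fixed_cost, quantity = 20.0, 12.0, 5_000.0, 1_000

unit_margin = price - marginal_cost                                   # 8.0 per unit
revenue = price * quantity                                            # 20,000
profit = revenue - total_cost(fixed_cost, marginal_cost, quantity)    # 20,000 - 17,000
print(unit_margin, profit, roi(profit, investment=fixed_cost))        # 8.0 3000.0 0.6
```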
Marketing Strategy
Product – Market Basket Analysis
• Look at which products are purchased together
• Association rules: correlation between A & B
  – Prob(A|B), Prob(B|A)
  – Beer & Diapers
• Feature analysis: e.g. size, color, specifications
• Cross-sell
  – Upsell: sell a more expensive/higher-margin product
  – Substitutes
  – Recommendation engines
• Bundling: package two similar products
  – Low cost of bundling
  – (WordPerfect & Lotus) vs. Microsoft Office
  – Converged devices: (PDA & phone) vs. smartphone
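A sketch of the association-rule quantities Prob(A|B) and Prob(B|A) (a rule's "confidence"), plus support, computed from a handful of hypothetical baskets. Real market basket analysis typically runs an algorithm such as Apriori over far larger transaction data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
    {"diapers"},
]

item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(frozenset(pair) for b in baskets
                      for pair in combinations(sorted(b), 2))

def support(a, b):
    """Share of all baskets containing both A and B."""
    return pair_counts[frozenset((a, b))] / len(baskets)

def confidence(a, b):
    """Estimate of Prob(A | B) = count(A and B) / count(B)."""
    return pair_counts[frozenset((a, b))] / item_counts[b]

print(support("beer", "diapers"))     # 0.5  (3 of 6 baskets)
print(confidence("beer", "diapers"))  # 0.6  Prob(beer | diapers) = 3/5
print(confidence("diapers", "beer"))  # 0.75 Prob(diapers | beer) = 3/4
```

The two conditional probabilities differ because diapers appear in more baskets than beer, which is why both directions of a rule are usually reported.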
Products
• Isn't product-level analysis perfect?
• What is missing?
• Why should we care about individual/consumer analysis?
People (Consumers)
• 5-step buying process, the "marketing funnel":
  – Awareness: "I might need soap"
    • Triggers include advertising
  – Information search: "Dove soap sounds good, let me find out more about it"
    • Targeting and segmentation to get the best information to customers
  – Evaluate alternatives: Which is best for me? Within and outside the category
    • Influencers can play a key role
  – Purchase: distribution channel
  – Evaluate (post-purchase): "Did I make a mistake?"
    • Repeat purchase?
    • Produce positive word-of-mouth (WOM)
People – CRM
People – CRM Acquisition
• Acquisition:
  – Acquisition rate (%) = (Number of prospects acquired / Number of prospects targeted) × 100
  – Acquisition is defined as the first purchase, or purchasing in the first predefined period
  – Denotes the average probability of acquiring a customer
  – Always calculated for a group of customers
  – Usually computed on a campaign-by-campaign basis
• Acquisition cost per prospect
  – Acquisition cost ($) = Acquisition spending ($) / Number of prospects acquired
  – Measured in monetary terms
  – Precise values for companies targeting prospects through direct mail
  – Less precise for broadcast communications
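Both acquisition metrics applied to a hypothetical direct-mail campaign; the campaign figures are made up for illustration.

```python
def acquisition_rate(acquired, targeted):
    """Acquisition rate (%) = prospects acquired / prospects targeted x 100."""
    return 100 * acquired / targeted

def acquisition_cost(spending, acquired):
    """Acquisition cost ($) = acquisition spending / number of prospects acquired."""
    return spending / acquired

# Hypothetical campaign: 10,000 prospects mailed, 250 made a first purchase, $5,000 spent.
targeted, acquired, spending = 10_000, 250, 5_000.0
print(acquisition_rate(acquired, targeted))  # 2.5 (%)
print(acquisition_cost(spending, acquired))  # 20.0 ($ per acquired customer)
```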
People – CRM Activity Measurements
• Track customers through a loyalty program
• Observe transactions per customer over time
• RFM:
  – Recency: when was the last purchase
  – Frequency: how often the customer purchases in a period
  – Monetary: total value of sales
• Easy to calculate
• Helpful for segmentation
• Cons:
  – Not good for forecasting
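A sketch of computing RFM per customer from a transaction list. The transactions, the observation window, and measuring recency in days are assumptions for illustration; `rfm` is a hypothetical helper.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("c1", date(2016, 11, 2), 40.0),
    ("c1", date(2016, 12, 1), 25.0),
    ("c2", date(2016, 6, 15), 300.0),
    ("c3", date(2016, 12, 20), 10.0),
    ("c3", date(2016, 12, 27), 15.0),
    ("c3", date(2016, 12, 30), 5.0),
]

def rfm(transactions, start, as_of):
    """Per customer: (recency in days since last purchase, frequency, monetary total),
    using only purchases within [start, as_of]."""
    scores = {}
    for cust, day, amount in transactions:
        if not (start <= day <= as_of):
            continue
        last, freq, total = scores.get(cust, (None, 0, 0.0))
        last = day if last is None or day > last else last
        scores[cust] = (last, freq + 1, total + amount)
    return {c: ((as_of - last).days, f, m) for c, (last, f, m) in scores.items()}

result = rfm(transactions, start=date(2016, 1, 1), as_of=date(2016, 12, 31))
for cust, (r, f, m) in sorted(result.items()):
    print(cust, r, f, m)
# c1 30 2 65.0
# c2 199 1 300.0
# c3 1 3 30.0
```

Customer c2 has the largest monetary value but the worst recency, which is the kind of contrast RFM segmentation is meant to expose.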
People – CRM Activity Measurements
• Average inter-purchase time (AIT) = 1 / Number of purchase incidences from first purchase till current time period
  – Measured in time periods
• Evaluation of metric
  – Easy to calculate
  – Useful for industries with frequent customer purchases
  – Marketing intervention might be warranted anytime customers fall considerably below their AIT
People – CRM Retention/Defection rates
• Retention rate
  – Average likelihood that a customer purchases in period t, given that he/she has purchased in the last period t-1
  – Retention rate (%) = [(Number of customers in cohort buying in period t | buying in period t-1) / Number of customers in cohort buying in period t-1] × 100
  – As a fraction: Retention rate = 1 - (1 / Average lifetime duration)
• Defection rate
  – Average likelihood that a customer defects in period t, given that he/she has purchased in the last period t-1
  – Defection rate = 1 - Retention rate
  – Average lifetime duration = 1 / (1 - Average retention rate)
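A sketch computing the retention rate for a hypothetical cohort as in the formula above (customers buying in period t who also bought in t-1, over those who bought in t-1), then deriving the defection rate and the average lifetime duration.

```python
def retention_rate(bought_prev, bought_now):
    """Retention rate (%): share of the cohort's period t-1 buyers who buy again in t."""
    retained = len(bought_prev & bought_now)
    return 100 * retained / len(bought_prev)

# Hypothetical cohort data: ids of customers who bought in each period.
period_1 = {"a", "b", "c", "d"}
period_2 = {"a", "b", "c", "e"}  # "e" is new, so it does not count as retained

r = retention_rate(period_1, period_2)  # 75.0
defection = 100 - r                     # 25.0
avg_lifetime = 1 / (1 - r / 100)        # 4.0 periods
print(r, defection, avg_lifetime)
```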
People – CRM Retention/Defection rates
• Number of retained customers in any period (t+n) = (Number of acquired customers in period t) × (Retention rate)^n
  – Assuming a constant retention rate among acquired customers
• Example
  – Assume a constant retention rate of 0.75, or defection rate of 0.25
  – Average lifetime duration = 4 (1 / [1 - 0.75])
  – Customers starting at beginning of year 1 = 100
  – Customers remaining at end of year 1 = 75.00 (100 × 0.75^1)
  – Customers remaining at end of year 2 = 56.25 (100 × 0.75^2)
  – Customers remaining at end of year 3 = 42.19 (100 × 0.75^3)
  – Customers remaining at end of year 4 = 31.64 (100 × 0.75^4)
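The example's projection, retained = acquired × (retention rate)^n, as a short sketch that reproduces the year-by-year figures and the average lifetime duration.

```python
def retained_customers(acquired, retention, n):
    """Customers left n periods after acquisition, assuming a constant retention rate."""
    return acquired * retention ** n

for year in range(1, 5):
    print(year, round(retained_customers(100, 0.75, year), 2))
# 1 75.0
# 2 56.25
# 3 42.19
# 4 31.64

print(1 / (1 - 0.75))  # average lifetime duration: 4.0
```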
People – CRM Defection Rate vs. Customer Tenure
• Variation (or heterogeneity) around the average lifetime duration of 4 years
[Figure: # of customers defecting (0 to 30) plotted against customer tenure (periods 1 to 19)]
People – CRM Lifetime Duration
• Less precise metric
  – Average lifetime duration = 1 / (1 - Average retention rate)
• More precise metric
  – Average lifetime duration = $\frac{\sum_{t=1}^{T} t \times \text{Number of customers retained}_t}{N}$
  – where N = cohort size, t = time period
• Complete or incomplete information on customer
  – Complete: customer's time of first and last purchases are known
  – Incomplete: either only time of first purchase, or only time of last purchase, or both time of first and last purchases are unknown
People – CRM Probability(Active)
• Probability of a customer being active in time t in a non-contractual setting
  – Probability(Active) = $T^{n}$
  – where n = number of purchases in a given period, T = time of the last purchase (given as a fraction of the observation period)
• Simple approximation of probability(active)
• More advanced computation methods exist
People – CRM Probability(Active)
• Customer 1: T = (8/12) = 0.667 and n = 4
  – Probability(Active) = (0.667)^4 = 0.198
• Customer 2: T = (8/12) = 0.667 and n = 2
  – Probability(Active) = (0.667)^2 = 0.444
[Figure: monthly purchase timelines for Customer 1 and Customer 2 over an observation period (Month 1 to Month 12, with Month 8 marked) followed by a holdout period (to Month 18). X indicates that a purchase was made by a customer in that month.]
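The Probability(Active) approximation for both customers, as a sketch that assumes the last purchase in Month 8 of a 12-month observation period, as in the example; `prob_active` is a hypothetical helper.

```python
def prob_active(last_purchase_month, observation_months, n_purchases):
    """Simple approximation: Probability(Active) = T ** n, where T is the time of the
    last purchase as a fraction of the observation period and n the number of purchases."""
    T = last_purchase_month / observation_months
    return T ** n_purchases

print(round(prob_active(8, 12, 4), 3))  # Customer 1: 0.198
print(round(prob_active(8, 12, 2), 3))  # Customer 2: 0.444
```

With the same last-purchase time, the more frequent buyer gets the lower probability: four purchases followed by four silent months is more unusual than two purchases followed by the same gap.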
Behavioral Analytics
• Build an understanding of the consumer life cycle
• Segment different behaviors/motivations
• Separate types of loyalty:
  – Behavioral: observed many transactions
  – Attitudinal: emotional loyalty
• Provides guidance for different marketing efforts
• How to measure and capture data on different customer types?
Mini-case: Taobao (e-commerce)
Mini-case: Buy, Search, Browse (Taobao)
• E-commerce is increasingly popular
• How can analytics build insight and take action?
• How is online different from offline shopping?
• What additional data is available?
• What kind of behavior can we observe?
Moe, Wendy W. "Buying, Searching, or Browsing: Differentiating between Online Shoppers Using In-Store Navigational Clickstream." Journal of Consumer Psychology 13, no. 1 (2003): 29–39.
Data
• Product information/pricing
• Transactions

Product information:
ProductID  Description  Size  Attributes  Price  Date
12345      Cat T-shirt  L     Red         15.00  Winter 2016

Transactions:
Timestamp     TransactionID  ProductID  UserID  Quantity  Price  Shipping
Dec. 1, 2016  1              12345      tphan   2         30.00  SingPost
Dec. 1, 2016  1              34567      tphan   1         15.00  SingPost
Data
• Clickstream
  – Web server (Apache) logs

Timestamp               URL                           Client   IP           SessionID  UserID
Dec. 1, 2016, 00:00:01  http://qoo10.sg/              Firefox  192.168.1.1  12345ABCD  tphan
Dec. 1, 2016, 00:00:10  http://qoo10.sg/Mens_Shirts/  Firefox  192.168.1.1  12345ABCD  tphan
…                       …                             …        …            …          …
Approach
• Categorize pages:
  – Home Page
  – Category Pages
  – Brand Pages
  – Product Pages
  – Search Pages
Metrics
• Avg. time spent per page
• % search pages
• # category pages
• # product pages
• Diff # Cat (number of different categories)
• # Brand
• # Prod
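A sketch of the approach: classify each clickstream URL into one of the five page types, then compute a few of the session metrics. The URL patterns and the sample clicks are hypothetical (a real site needs rules matched to its own URL scheme), and time on a page is approximated as the gap to the next click, so a session's last page has no measured time.

```python
from collections import defaultdict
from urllib.parse import urlparse

def page_type(url):
    """Classify a URL into one of the five page types (hypothetical URL rules)."""
    path = urlparse(url).path
    if path in ("", "/"):
        return "home"
    if path.startswith("/search"):
        return "search"
    if path.startswith("/brand/"):
        return "brand"
    if path.startswith("/product/"):
        return "product"
    return "category"  # e.g. /Mens_Shirts/

# Hypothetical clickstream rows: (session_id, seconds since session start, url)
clicks = [
    ("S1", 0, "http://qoo10.sg/"),
    ("S1", 9, "http://qoo10.sg/Mens_Shirts/"),
    ("S1", 40, "http://qoo10.sg/product/12345"),
    ("S1", 95, "http://qoo10.sg/product/34567"),
    ("S2", 0, "http://qoo10.sg/search?q=cat+t-shirt"),
    ("S2", 20, "http://qoo10.sg/product/12345"),
]

def session_metrics(clicks):
    sessions = defaultdict(list)
    for sid, t, url in clicks:
        sessions[sid].append((t, page_type(url), urlparse(url).path))
    metrics = {}
    for sid, views in sessions.items():
        views.sort()  # order each session's page views by time
        types = [kind for _, kind, _ in views]
        gaps = [b[0] - a[0] for a, b in zip(views, views[1:])]
        metrics[sid] = {
            "avg_time_per_page": sum(gaps) / len(gaps) if gaps else None,
            "pct_search_pages": 100 * types.count("search") / len(types),
            "n_category_pages": types.count("category"),
            "n_product_pages": types.count("product"),
            "n_distinct_products": len({p for _, k, p in views if k == "product"}),
        }
    return metrics

for sid, m in session_metrics(clicks).items():
    print(sid, m)
```

Metrics like these, computed per session, are the inputs for separating the browsing behaviors listed next.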
Behaviors
• Knowledge Building
• Hedonic Browsing
• Directed Buying
• Search/Deliberation
• Shallow sessions