8/10/2019 Business Intelligence & Data Mining-14
1/25
Lessons & Challenges from Mining Retail E-Commerce Data
Kohavi et al. (2004)
Motivation
- Important domain of data mining
  - Massive amounts of data are collected
  - Data collection is automatic, so it is less prone to errors
  - Data is rich and has a lot of potential for discovering patterns
- Three types of data: clickstream data, transactional data and user profile data
- Combined mining of these three types of data is possible
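As a rough illustration of combined mining, the sketch below joins the three data types into one analysis row per customer. The record layouts, field names and the `combine` helper are hypothetical, and plain Python stands in for what a data warehouse would normally do.

```python
from collections import defaultdict

# Hypothetical toy records; field names are assumptions for illustration.
clicks = [
    {"customer_id": 1, "page": "/shoes"},
    {"customer_id": 1, "page": "/cart"},
    {"customer_id": 2, "page": "/home"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
]
profiles = {
    1: {"gender": "F", "region": "West"},
    2: {"gender": "M", "region": "East"},
}

def combine(clicks, orders, profiles):
    """Build one analysis row per profiled customer from the three sources."""
    page_views = defaultdict(int)          # clickstream: pages viewed
    for c in clicks:
        page_views[c["customer_id"]] += 1
    spend = defaultdict(float)             # transactions: total spend
    for o in orders:
        spend[o["customer_id"]] += o["amount"]
    rows = []
    for cid, profile in profiles.items():  # user profile: demographics
        rows.append({
            "customer_id": cid,
            **profile,
            "page_views": page_views[cid],
            "total_spend": spend[cid],
            "purchased": spend[cid] > 0,
        })
    return rows

for row in combine(clicks, orders, profiles):
    print(row)
```

Once the sources share a customer key, questions such as "do browsers who never buy differ demographically from buyers?" become simple filters on these rows.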
The E-Commerce Data Mining Suite
- E-commerce data mining suite developed by Blue Martini Software
- Purchased and used by many brand-name retailers: Debenhams, Harley-Davidson, Sainsbury's, Sprint, etc.
- System designed specifically for BI
- End-to-end solution:
  - Data collection
  - Data warehousing
  - Data transformations
  - Visualization
  - Data mining
The Business Intelligence Process
Data Sources → Data Cleaning and Data Integration → Data Warehouse → Selection and Reduction → Task-relevant Data → Data Mining → Pattern Evaluation
The Experience Shared
- Business and technical lessons are shared from data mining projects executed for more than 20 clients
- Clients from different industry verticals with varying business models
- Clients spread over the US, Europe, Asia and Africa
- Data sizes up to 100 million records
- Diverse data:
  - Clickstream
  - User profile
  - Demographic
  - Response to mail campaigns
  - Orders placed through the website / telephone / in-store
Business Lessons
Requirements Gathering is Challenging
- Clients are reluctant to list business questions
  - They may not know what questions to ask
  - They do not understand the underlying technology and how much it can do
- Clients present standard reporting-type questions, e.g.
  - What is the gender-wise distribution of customers?
  - What is the region-wise response rate of the mail campaign?
- Instead of asking questions like:
  - What are the characteristics of customers who spend more than $500?
  - What kind of people responded to the mail campaign?
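The difference between the two kinds of question can be sketched in a few lines. `gender_report` answers a reporting question by counting; `characterize` is a hypothetical, deliberately simple stand-in for a mining question, scoring each attribute value by its lift among customers who spend more than $500. The data is invented for illustration.

```python
from collections import Counter

# Hypothetical customer records; attribute names and values are illustrative only.
customers = [
    {"gender": "F", "region": "West", "spend": 620},
    {"gender": "F", "region": "West", "spend": 750},
    {"gender": "M", "region": "East", "spend": 80},
    {"gender": "M", "region": "West", "spend": 540},
    {"gender": "F", "region": "East", "spend": 45},
    {"gender": "M", "region": "East", "spend": 120},
]

def gender_report(rows):
    """Reporting-type question: gender-wise distribution of customers."""
    return Counter(r["gender"] for r in rows)

def characterize(rows, threshold=500, attrs=("gender", "region")):
    """Mining-type question: which attribute values are over-represented among
    customers spending more than `threshold`?

    Returns lift per (attribute, value): the value's share among high spenders
    divided by its share among all customers. Lift > 1 means over-represented.
    """
    high = [r for r in rows if r["spend"] > threshold]
    if not high:
        return {}
    lifts = {}
    for attr in attrs:
        overall = Counter(r[attr] for r in rows)
        among_high = Counter(r[attr] for r in high)
        for value, count in overall.items():
            lifts[(attr, value)] = (among_high[value] / len(high)) / (count / len(rows))
    return lifts

print(gender_report(customers))
for key, lift in sorted(characterize(customers).items()):
    print(key, round(lift, 2))
```

The report is balanced by gender, yet every high spender in this toy data is in the West region, which is the kind of finding a reporting question would not surface.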
Educating the Users
- Involving the users is critical for success
  - Understanding the business
  - Uncovering the real needs
- Users will have to be educated
  - What can be achieved by BI
  - Prototypes / demo systems
  - Case studies
Business Events
- The architecture records:
  - Every customer search and the number of results returned (too many rows, no rows)
  - Shopping cart events: add to cart, change quantity, delete
  - Registration, log-in, checkout, payment, order confirmation
  - Any failures / crashes
  - The user's time zone
  - Technical capabilities of the user's computer
- These details are collected specifically because they are useful for analysis
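A minimal sketch, assuming a hypothetical `SearchEvent` record, of why logging the number of search results pays off: the share of searches that return no rows or too many rows falls out directly. The 100-result cut-off for "too many" is an arbitrary assumption.

```python
from dataclasses import dataclass

@dataclass
class SearchEvent:
    """Hypothetical record of one customer search (fields are assumptions)."""
    session_id: str
    query: str
    num_results: int

def search_outcomes(events, too_many=100):
    """Share of searches returning no rows and 'too many' rows."""
    total = len(events)
    if total == 0:
        return {"no_rows": 0.0, "too_many_rows": 0.0}
    no_rows = sum(1 for e in events if e.num_results == 0)
    too_many_rows = sum(1 for e in events if e.num_results > too_many)
    return {"no_rows": no_rows / total, "too_many_rows": too_many_rows / total}

def zero_result_queries(events):
    """Queries that found nothing: candidates for synonyms or spelling fixes."""
    return [e.query for e in events if e.num_results == 0]

events = [
    SearchEvent("s1", "red shoes", 12),
    SearchEvent("s1", "redd shoes", 0),
    SearchEvent("s2", "shoes", 450),
    SearchEvent("s3", "gift card", 3),
]
print(search_outcomes(events))  # {'no_rows': 0.25, 'too_many_rows': 0.25}
print(zero_result_queries(events))
```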
Data Collection
- Usual methods of data collection:
  - Stateless HTTP requests from multiple web servers
  - Parsing the logs and loading them session-wise and user-wise
  - Difficult: web logs were designed for debugging web servers, not for providing data for BI
- The Blue Martini architecture was designed for BI:
  - Session and user data collected and linked together at the application server level
  - Transactions automatically tied to sessions
  - All data automatically recorded in a database
  - Pre-processing and data cleaning are not required
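To show what the usual weblog route involves, here is a minimal sessionization sketch. It assumes each request carries a visitor key (for example IP address plus user agent, itself a hypothetical proxy) and uses a 30-minute inactivity timeout, a common heuristic rather than anything specified in these slides. Collecting sessions at the application server avoids this guesswork.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; a common heuristic, not a standard

def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group stateless HTTP requests into sessions.

    `requests` is an iterable of (visitor_key, timestamp_seconds, url) tuples,
    where visitor_key is a hypothetical proxy such as IP + user agent.
    A new session starts when the gap since the visitor's previous request
    exceeds `timeout`. Returns a list of sessions, each a list of URLs.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, url in requests:
        by_visitor[visitor].append((ts, url))

    sessions = []
    for visitor in sorted(by_visitor):
        hits = sorted(by_visitor[visitor])  # requests may arrive out of order
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:      # inactivity gap: close the session
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions

log = [
    ("10.0.0.1|Firefox", 0, "/home"),
    ("10.0.0.2|Chrome", 30, "/home"),
    ("10.0.0.1|Firefox", 120, "/shoes"),
    ("10.0.0.1|Firefox", 4000, "/home"),
]
print(sessionize(log))
```

Both the visitor key and the timeout are guesses: two people behind one proxy merge into one "visitor", and a long pause splits one visit in two.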
Data Collection Lessons
- Collect the right data up front
  - All data that could be useful should be collected and integrated
  - Stored in a database / data warehouse
- Integrate with external events
  - Marketing events such as promotions
  - These cannot be captured by the data collection systems
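External events such as promotions can be brought in after the fact by joining on dates. A minimal sketch, assuming a hypothetical promotion calendar with start and end dates: each order is tagged with the promotions active on its order date.

```python
from datetime import date

# Hypothetical external promotion calendar (not captured by the website logs).
promotions = [
    {"name": "Spring Sale", "start": date(2004, 3, 1), "end": date(2004, 3, 14)},
    {"name": "Free Shipping", "start": date(2004, 3, 10), "end": date(2004, 3, 20)},
]

def tag_orders(orders, promotions):
    """Attach the names of promotions active (inclusive range) on each order date."""
    tagged = []
    for order in orders:
        active = [p["name"] for p in promotions
                  if p["start"] <= order["date"] <= p["end"]]
        tagged.append({**order, "promotions": active})
    return tagged

orders = [
    {"order_id": 1, "date": date(2004, 3, 5)},
    {"order_id": 2, "date": date(2004, 3, 12)},
    {"order_id": 3, "date": date(2004, 4, 1)},
]
for o in tag_orders(orders, promotions):
    print(o["order_id"], o["promotions"])
```

Overlapping promotions (order 2 here) are common, which is why the sketch keeps a list rather than a single promotion per order.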
Creating the Data Warehouse
- DW creation requires substantial data transformations
  - Can take 80% of the time taken by the complete BI exercise
- Requires integration of several data sources:
  - Website
  - Payment gateway
  - Call center
  - POS terminals / shop systems
  - External systems / inputs (e.g. promotion / campaign data)
Logical DW Architecture
Data Warehousing: Challenges
Loading and Maintaining Consistent Data
Loading and Storing Large Volumes of Data
Coping with Changes in Operational Definitions
Providing Reasonable Response Times
For an e-commerce site, the website itself sits outside the firewall, so data has to be copied across the firewall
Business Intelligence Tools
- The software provided reports, visualization and data mining
- Data mining algorithms included:
  - Rule induction
  - Anomaly (outlier) detection
  - Entropy-based statistics
  - Association rules
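For reference, the standard support and confidence measures behind association rules are sketched below for single-item rules only. This is a textbook-style illustration, not the Blue Martini implementation; full Apriori-style mining also handles larger itemsets. The baskets and thresholds are made up.

```python
from itertools import combinations

def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return single-item rules (lhs, rhs, support, confidence).

    support(a -> b)    = fraction of baskets containing both a and b
    confidence(a -> b) = support(a and b) / support(a)
    Only item pairs are considered.
    """
    if not baskets:
        return []
    n = len(baskets)
    sets = [set(b) for b in baskets]
    items = sorted(set().union(*sets))

    def support(*itemset):
        return sum(1 for s in sets if s.issuperset(itemset)) / n

    rules = []
    for a, b in combinations(items, 2):
        both = support(a, b)
        if both < min_support:        # prune infrequent pairs first
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = both / support(lhs)
            if conf >= min_confidence:
                rules.append((lhs, rhs, both, conf))
    return rules

baskets = [
    ["shoes", "socks"],
    ["shoes", "socks", "polish"],
    ["shoes"],
    ["socks"],
    ["shoes", "polish"],
]
for lhs, rhs, sup, conf in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Note the asymmetry: "polish -> shoes" passes while "shoes -> polish" does not, because confidence divides by the support of the left-hand side.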
Business Intelligence Lessons (1)
- Operational transactions have higher priority than BI
  - BI can be taken up after the system stabilizes
  - It can take several months to get started
- Users are happy with basic reports / MIS
  - Unexpectedly insightful findings capture their interest
  - This can start the BI process
Business Intelligence Lessons (2)
- Trained data analysts are required
  - Domain knowledge is important
  - Technical know-how is essential
- Terminology needs to be defined
  - Users can misinterpret results
  - Potentially useful findings may be ignored, or unrealistic expectations can arise
Business Intelligence: Challenges
Designing a user-friendly interactive interface
Automatic Feature Construction
Building models that users can interpret
Making users understand that correlation does not imply causality
Explaining insights
Linking ROI to insights
Deployment
- Insights need to be shared
  - Insights obtained by data mining need to be shared across the organization
  - Easy-to-use tools for capturing and communicating them (e.g. by e-mail) will help
- Taking action
  - Business users must see the value
  - Acting on the results may be difficult (e.g. designing a campaign for a special segment of customers)
  - A good architecture would help
Technical Lessons
Data Collection and Management Lessons
- Collect data at the right level
  - Data was collected at the application server level
  - This reduced the pre-processing of weblog data
- Design the GUI with data mining in mind
  - All useful data can be captured
  - Default values should be avoided
  - Validate data to reduce cleaning effort
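A minimal sketch of validating input at entry time, assuming a hypothetical registration form in which "Select one" and 1 January 1900 are the placeholder values a careless default would leave behind. Rejecting these at the form means they never reach the warehouse disguised as real answers.

```python
from datetime import date

# Hypothetical placeholder values a form might pre-fill (assumptions for illustration).
PLACEHOLDER_CHOICES = {"", "Select one"}
PLACEHOLDER_DATES = {date(1900, 1, 1)}
REQUIRED_CHOICES = ("gender", "country")

def validate_registration(form, today=None):
    """Return a list of validation errors for a hypothetical registration form.

    Placeholder choices and placeholder dates are rejected at entry time, so a
    pre-filled default is never stored as if the customer had chosen it.
    """
    today = today or date.today()
    errors = []
    for field in REQUIRED_CHOICES:
        value = form.get(field) or ""
        if value in PLACEHOLDER_CHOICES:
            errors.append(f"{field}: please choose a value")
    birth = form.get("birth_date")
    earliest = date(today.year - 120, 1, 1)  # plausibility window is an assumption
    if birth is None or birth in PLACEHOLDER_DATES or not (earliest <= birth <= today):
        errors.append("birth_date: missing or implausible")
    return errors

print(validate_registration(
    {"gender": "Select one", "country": "UK", "birth_date": date(1900, 1, 1)},
    today=date(2004, 1, 1),
))
```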
Data Collection and Management Challenges
- Should data be sampled?
  - E-commerce data is huge in volume
  - Is it necessary to store all the data?
  - Will rare events be missed if sampling is done?
- Slowly changing dimensions
  - Customers evolve (e.g. life-stage changes, lifestyle changes)
  - Products evolve (e.g. new lines, new technology)
- Frequency of DW uploads
  - DW uploads take time and processing power
  - Uploads should not disrupt BI analysts' work
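One standard warehouse answer to slowly changing dimensions is a "Type 2" update: instead of overwriting a changed attribute, close the current row and add a new, dated version. The sketch below assumes a hypothetical customer `segment` attribute; it is one common technique, not the approach prescribed by the slides.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class CustomerVersion:
    customer_id: int
    segment: str                     # hypothetical slowly changing attribute
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def apply_change(history, customer_id, new_segment, change_date):
    """Type 2 update: close the current row and append a new version."""
    out = []
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.segment == new_segment:
                return list(history)  # nothing changed; keep history as is
            out.append(replace(row, valid_to=change_date))
        else:
            out.append(row)
    out.append(CustomerVersion(customer_id, new_segment, change_date))
    return out

def as_of(history, customer_id, when):
    """Return the segment that was valid for the customer on `when`."""
    for row in history:
        if (row.customer_id == customer_id and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row.segment
    return None

history = [CustomerVersion(1, "student", date(2001, 9, 1))]
history = apply_change(history, 1, "young professional", date(2004, 6, 1))
print(as_of(history, 1, date(2003, 1, 1)), "/", as_of(history, 1, date(2005, 1, 1)))
```

Keeping old versions lets an analysis of a 2003 purchase use the customer's 2003 segment, at the cost of a dimension table that grows with every change.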
Data Cleaning and Pre-processing Lessons & Challenges
- Time-outs, incomplete sessions, crashes
  - Need to be detected
  - What should be done with such data?
- Duplicates
  - Same customer with more than one ID
  - Same account used by multiple customers
  - Guest log-ins
- Missing, unknown, not applicable or default values
- Hierarchical attributes
  - Most algorithms cannot handle hierarchical attributes
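A deliberately simple sketch for the "same customer with more than one ID" case: group IDs that share a normalized e-mail address. The matching key is an assumption; real record linkage usually combines several fields and fuzzy matching, and this does nothing for shared accounts or guest log-ins.

```python
from collections import defaultdict

def normalize_email(email):
    """Lower-case and trim; a deliberately simple matching key (an assumption)."""
    return email.strip().lower()

def find_duplicate_ids(accounts):
    """Group customer IDs that share a normalized e-mail address.

    `accounts` is a list of (customer_id, email) pairs; accounts with no
    e-mail (e.g. guest log-ins) are skipped because they cannot be matched.
    """
    groups = defaultdict(list)
    for cid, email in accounts:
        if email:
            groups[normalize_email(email)].append(cid)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

accounts = [
    (101, "Ann@Example.com"),
    (205, " ann@example.com"),
    (300, "bob@example.com"),
    (400, None),
]
print(find_duplicate_ids(accounts))
```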
An Attribute Hierarchy
Levels (top to bottom): all → region → country → city → office
- all
  - Europe
    - Spain ...
    - Germany
      - Frankfurt ...
  - North_America
    - Canada
      - Vancouver
        - L. Chan
        - M. Wind
        - ...
      - Toronto ...
    - Mexico ...
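Because most algorithms cannot use a hierarchical attribute directly, one common workaround is to expand it into one column per level, so that a model can be built at whichever level works best. The sketch hard-codes only the members named on this slide; the `PARENT` map and helper names are hypothetical.

```python
# Child -> parent links for the location hierarchy on this slide.
# Only the named members are included; elided ("...") members are omitted.
PARENT = {
    "L. Chan": "Vancouver", "M. Wind": "Vancouver",
    "Vancouver": "Canada", "Toronto": "Canada", "Frankfurt": "Germany",
    "Canada": "North_America", "Mexico": "North_America",
    "Germany": "Europe", "Spain": "Europe",
    "North_America": "all", "Europe": "all",
}
LEVELS = ["office", "city", "country", "region", "all"]

def flatten(member, level):
    """Expand one hierarchical value into one column per level at or above it."""
    row = {}
    node = member
    for lvl in LEVELS[LEVELS.index(level):]:
        row[lvl] = node
        node = PARENT.get(node)  # walk one level up
    return row

def roll_up(values, level, to_level):
    """Replace each value at `level` by its ancestor at `to_level`."""
    return [flatten(v, level)[to_level] for v in values]

print(flatten("M. Wind", "office"))
print(roll_up(["Frankfurt", "Toronto", "Vancouver"], "city", "region"))
```

Rolling up trades detail for support: a city with a handful of customers may say nothing on its own, while its region has enough data to model.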
Analysis Lessons & Challenges
- Enriching the data
  - Add demographic attributes
  - Create derived attributes
  - Calculate weighted averages, moving averages
- Exploration
  - Visualization
  - Domain knowledge can help in gaining insight
  - Customer propensity scoring
- Building models
  - Start with simple models (easy to explain to users)
  - Build models at the right level of the attribute hierarchy
  - Address scalability issues (to maintain users' interest and confidence)
- Test and validate the models
  - Estimate accuracy levels
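Two of the derived attributes listed under enriching the data, sketched under the assumption that each customer has a list of monthly spend values: a weighted average (here weighting recent months more heavily, an illustrative choice) and a trailing moving average.

```python
def weighted_average(values, weights):
    """Sum of value * weight divided by the sum of weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def moving_average(values, window):
    """Trailing moving average; the first window - 1 positions are None."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out

# Hypothetical monthly spend for one customer.
monthly_spend = [100, 50, 0, 150, 200, 40]
print(moving_average(monthly_spend, 3))
# Last three months, weighted 1:2:3 toward the most recent.
print(weighted_average(monthly_spend[-3:], [1, 2, 3]))
```

Each result becomes one more column per customer, which is how such derived attributes feed the simple, explainable models recommended above.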