Decision Making Using Big Data Analytics in International Business
Presenter: Bidyut Kumar Mondal, Roll – 5, MBA (IB) 2012-15
Under the supervision of Prof. Dr. P. K. Das
IIFT (Kolkata Campus)
Agenda
1 Background
2 Objectives
3 Perspective on Big Data
4 Repository of Analytical Tools
5 Repository of Big Data Techniques
6 An Application to Credit Risk Modelling
7 Binary Logistic Regression
8 Research Methodology
9 Results & Interpretation
10 Conclusion
Background
Recent trends towards a data-driven industry
A huge volume of data is being generated every day.
The issue is how to store and analyze this data to extract information.
Big data analytics emerged to address this issue.
Organizations that harness the power of big data are ahead of the competition.
Big data will change the way people live.
Objectives
To develop a repository of analytical tools appropriate for real-life problem solving in different sectors.
To study the use of big data analytics in different domains for making international business decisions.
To apply appropriate classification techniques to build a model that classifies loan defaulters using secondary big data.
Big data – a perspective
Repository of Analytical Techniques
Item | Area | Statistical Method | Data Requirement
1 | Market Segmentation | Cluster Analysis | Buy and sell data over a long period
2 | Purchase Intention | Factor Analysis | Survey ratings of each product attribute
3 | Churn Analysis | Binary Logistic Regression | Instances of customers who left and who stayed with the service/organization
4 | Credit Default Probability | Binary Logistic Regression | Data containing instances of both default and non-default
5 | Group Belongingness | Discriminant Analysis | Data containing instances of persons who belong to a group and who do not
6 | Probability of Disease in a Group | Binary Logistic Regression | Data containing instances of persons who have the disease and who do not
7 | Price Elasticity Calculation | Regression Analysis | Price of a product at different times and sales of the product at those times
8 | Employee Productivity | ANOVA | Employee output under different working conditions
9 | Brand/Product Positioning | Multidimensional Scaling | Customer similarity ratings for each pair of products or brands on a 7-point Likert scale
10 | Lost Sales Analysis | Binary Logistic Regression | Data containing instances of bids that converted into sales and bids that did not
11 | Demand Forecasting | Time Series Forecasting | Historical demand data covering more than 10 years
12 | New Product Design | Conjoint Analysis | Preference rank data from respondents for each attribute of the product
13 | Quality Control | Hypothesis Testing | A random sample drawn from the production floor, to which a Z test or t test is applied
14 | Customer Loyalty Analysis | Regression Analysis | Data from sample respondents on how satisfied they are with the product and how long they have been buying it
Repository of Big Data Techniques
Item | Data Pattern | Big Data Analysis | Business Area | Analysis Tool
1 | Customer activity-based data, such as website tracking history, purchase data, call centre data and mobile data | Predictive Analysis | Segmentation | Cluster Analysis
2 | User online profile data and their online purchase history and patterns | Predictive Analysis | Digital Marketing | Factor Analysis
3 | Customer footprints in the network: clicks, browsing, comments, reviews, etc. | Predictive Analysis | Purchase Intention | Binary Logistic Regression
4 | Customer product/service usage pattern data and customer demographic data | Predictive Analysis | Churn Analysis | Binary Logistic Regression
5 | Bank and financial institution data on loans and their current status, along with customer demographics | Predictive Analysis | Credit Default Modelling | Binary Logistic Regression
6 | Historical health parameter data of animals on a dairy farm | Predictive Analysis | Agriculture | Discriminant Analysis
7 | Historical GPS tracker data on the location and condition of consignments in shipment | Predictive Analysis | Logistics | Discriminant Analysis
8 | Customer buying and clicking patterns during different cultural festivals on an online retail website | Predictive Analysis | Retail | Classification Techniques
9 | Purchase data of customers who are provided with facilities such as a bonus card | Predictive Analysis | Retail | Regression Analysis
10 | Patient health data and their disease track record | Predictive Analysis | Healthcare | Binary Logistic Regression
11 | Historical marketing expenses and the demand in the same periods over several years | Predictive Analysis | CRM | Multiple Regression
12 | Customers' spending, usage and other behaviour exhibited in a retail shop | Predictive Analysis | Marketing (Cross-Sell) | Multiple Regression
13 | Historical demand data at store level and inventory level | Predictive Analysis | Retail (Inventory Requirement) | Time Series Forecasting
14 | Historical risk and return data of a portfolio | Predictive Analysis | Finance | Regression Analysis
15 | Historical unemployment data of a country | Predictive Analysis | Economics | Time Series Forecasting
16 | Documents and their keywords, captured while the documents are uploaded to an online website | Predictive Analysis | Web Publishing | Discriminant Analysis
Big data – Case Studies
Agriculture – Texan Dairy: Case – Cattle Health
Logistics – DHL: Case – Predictive Analysis
Online Retail – Amazon: Case – Predictive Shipment
Retail – Walmart: Case – Customer Loyalty
Healthcare – CCHHS: Case – Disease Prediction
An Application to Credit Risk Modelling
Is it possible to predict, before sanctioning a loan, whether the customer is likely to default on it?
Lowering NPA (non-performing assets)
Increasing the customer base
Binary Logistic Regression - Variables
Dependent variable – dichotomous (two categories)
Independent variables – categorical or numerical
Categorical independent variables need to be coded before modelling
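As a minimal sketch of what coding a categorical predictor can look like, the snippet below dummy-codes one variable into k-1 binary columns against a chosen reference level. The variable name home_ownership and its levels are hypothetical illustrations. The coding scheme actually used in the study is not shown here.

```python
# Hypothetical example: dummy (one-hot) coding of a categorical predictor.
# "home_ownership" and its levels are illustrative assumptions, not study variables.

def dummy_code(values, reference):
    """Return (column_names, rows) with one 0/1 column per non-reference level.

    The reference level gets no column, so k categories become k-1 dummies.
    This avoids perfect multicollinearity with the model intercept.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found")
    kept = [lvl for lvl in levels if lvl != reference]
    names = [f"is_{lvl}" for lvl in kept]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return names, rows

home_ownership = ["RENT", "OWN", "MORTGAGE", "RENT", "OWN"]
names, rows = dummy_code(home_ownership, reference="RENT")
print(names)  # ['is_MORTGAGE', 'is_OWN']
print(rows)   # [[0, 0], [0, 1], [1, 0], [0, 0], [0, 1]]
```

Each coefficient on a dummy column is then read relative to the reference level (here RENT).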
Binary Logistic Regression - Assumptions
Logistic regression does not rely on distributional assumptions in the same sense that discriminant analysis does.
However, the solution may be more stable if the predictors have a multivariate normal distribution.
Additionally, as with other forms of regression, multicollinearity among the predictors can lead to inflated standard errors.
The procedure is most effective when group membership is a truly categorical variable.
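A common check for the multicollinearity concern above is the variance inflation factor (VIF). Values above roughly 5 to 10 are often treated as a warning sign. The sketch below covers only the two-predictor case, where VIF reduces to 1 / (1 - r^2). The sample values are hypothetical and chosen only to mimic two strongly related amount variables, such as funded_amnt and funded_amnt_inv.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 equals r^2."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical values (not from the study's data set).
funded_amnt = [10000, 12000, 8000, 15000, 20000, 5000]
funded_amnt_inv = [9800, 12000, 7900, 14500, 19500, 5000]
vif = vif_two_predictors(funded_amnt, funded_amnt_inv)
print(round(vif, 1))
```

With more than two predictors, R^2 comes from regressing each predictor on all of the others.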
Binary Logistic Regression - Odds
Odds ratio
Log of odds ratio
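For reference, the standard definitions behind these terms are given below. Here p = P(is_defaulter = 1) given predictors x_1, ..., x_k. In the coefficient table of the results, B is the estimated β_j and Exp(B) is the corresponding odds ratio e^{β_j}.

```latex
\text{odds} = \frac{p}{1-p}, \qquad
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k

p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}, \qquad
\frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)} = e^{\beta_j}
```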
Research Methodology
1 Data Collection
2 Data Cleaning
3 Data Coding
4 Binary Logistic Regression
5 ROC Analysis & Model Selection
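The modelling step can be illustrated with a from-scratch sketch. It fits a binary logistic regression by batch gradient ascent on the log-likelihood and then classifies at the 0.5 cut value used in the results. This is an illustrative stand-in, not the estimation routine behind the reported models. The single predictor and the toy data are hypothetical.

```python
import math

# Illustrative stand-in only: a from-scratch binary logistic regression fitted by
# batch gradient ascent, not the estimation routine behind the reported results.

def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear(b0, w, xi):
    # Log-odds: b0 + sum_j w_j * x_j
    return b0 + sum(wj * xj for wj, xj in zip(w, xi))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Maximise the log-likelihood by batch gradient ascent; returns (b0, w)."""
    n, k = len(X), len(X[0])
    b0, w = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear(b0, w, xi))  # observed minus predicted probability
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        b0 += lr * g0 / n
        w = [wj + lr * gj / n for wj, gj in zip(w, g)]
    return b0, w

def predict(b0, w, X, cutoff=0.5):
    # Classify as 1 (defaulter) when the predicted probability exceeds the cutoff.
    return [1 if sigmoid(linear(b0, w, xi)) > cutoff else 0 for xi in X]

# Hypothetical toy data: one standardised predictor, is_defaulter as the 0/1 outcome.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1]
b0, w = fit_logistic(X, y)
print(round(b0, 3), [round(v, 3) for v in w])
print(predict(b0, w, X))
```

Production tools usually use Newton-type iterations instead. Gradient ascent is shown here only because it is short and dependency-free.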
Data
Raw Data (3,91,000)
Segments: Debt Consolidation, Credit Card, Home Loan
Results & Interpretation – Credit Card Segment
Model 1 (Step 1)
Observed is_defaulter | Selected cases: Predicted 0 | Predicted 1 | Percentage Correct | Unselected cases: Predicted 0 | Predicted 1 | Percentage Correct
0 | 46085 | 825 | 98.2 | 42247 | 796 | 98.2
1 | 2311 | 3053 | 56.9 | 2057 | 2626 | 56.1
Overall Percentage | | | 94.0 | | | 94.0
a. The cut value is .500

Model 2 (Step 1)
Observed is_defaulter | Selected cases: Predicted 0 | Predicted 1 | Percentage Correct | Unselected cases: Predicted 0 | Predicted 1 | Percentage Correct
0 | 46012 | 898 | 98.1 | 42246 | 797 | 98.1
1 | 2658 | 2706 | 50.4 | 2342 | 2341 | 50.0
Overall Percentage | | | 93.2 | | | 93.4
a. The cut value is .500
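The percentages in these classification tables follow directly from the counts. The short sketch below recomputes them for the selected cases of both models. The parameter names tn, fp, fn and tp are introduced here for illustration and treat is_defaulter = 1 as the positive class.

```python
def classification_summary(tn, fp, fn, tp):
    """Percent correct per observed class and overall, rounded to one decimal.

    tn = observed 0 predicted 0, fp = observed 0 predicted 1,
    fn = observed 1 predicted 0, tp = observed 1 predicted 1
    (is_defaulter = 1 treated as the positive class).
    """
    pct0 = 100 * tn / (tn + fp)
    pct1 = 100 * tp / (fn + tp)
    overall = 100 * (tn + tp) / (tn + fp + fn + tp)
    return round(pct0, 1), round(pct1, 1), round(overall, 1)

# Counts from the selected-case columns of the Model 1 and Model 2 tables.
model1 = classification_summary(tn=46085, fp=825, fn=2311, tp=3053)
model2 = classification_summary(tn=46012, fp=898, fn=2658, tp=2706)
print(model1)  # (98.2, 56.9, 94.0)
print(model2)  # (98.1, 50.4, 93.2)
```

The high overall accuracy is driven by the large non-defaulter class. The defaulter row (56.9% versus 50.4%) is where the two models differ most.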
ROC Curve – Model 1
Cutoff | TP | TN | FP | FN | Sensitivity | Specificity | 1 - Specificity
0.1 | 40473 | 4822 | 6437 | 542 | 0.99 | 0.43 | 0.57
0.2 | 43423 | 4370 | 3487 | 994 | 0.98 | 0.56 | 0.44
0.3 | 44843 | 3953 | 2067 | 1411 | 0.97 | 0.66 | 0.34
0.4 | 45624 | 3490 | 1286 | 1874 | 0.96 | 0.73 | 0.27
0.5 | 46085 | 3053 | 825 | 2311 | 0.95 | 0.79 | 0.21
0.6 | 46402 | 2631 | 508 | 2733 | 0.94 | 0.84 | 0.16
0.7 | 46609 | 2141 | 301 | 3223 | 0.94 | 0.88 | 0.12
0.8 | 46772 | 1651 | 138 | 3713 | 0.93 | 0.92 | 0.08
0.9 | 46864 | 1074 | 46 | 4290 | 0.92 | 0.96 | 0.04
ROC Curve – Model 2
Cutoff | TP | TN | FP | FN | Sensitivity | Specificity | 1 - Specificity
0.1 | 39834 | 4771 | 7076 | 593 | 0.99 | 0.40 | 0.60
0.2 | 43216 | 4236 | 3694 | 1128 | 0.97 | 0.53 | 0.47
0.3 | 44667 | 3658 | 2243 | 1706 | 0.96 | 0.62 | 0.38
0.4 | 45466 | 3171 | 1444 | 2193 | 0.95 | 0.69 | 0.31
0.5 | 46012 | 2706 | 898 | 2658 | 0.95 | 0.75 | 0.25
0.6 | 46362 | 2246 | 548 | 3118 | 0.94 | 0.80 | 0.20
0.7 | 46586 | 1815 | 324 | 3549 | 0.93 | 0.85 | 0.15
0.8 | 46746 | 1374 | 164 | 3990 | 0.92 | 0.89 | 0.11
0.9 | 46869 | 881 | 41 | 4483 | 0.91 | 0.96 | 0.04
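Each row of the two ROC tables applies the usual definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below reproduces the 0.5-cutoff rows, taking the TP, TN, FP and FN columns exactly as they are labelled in the tables.

```python
def sens_spec(tp, tn, fp, fn):
    """Return (sensitivity, specificity, 1 - specificity), each rounded to 2 dp."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return round(sensitivity, 2), round(specificity, 2), round(1 - specificity, 2)

# 0.5-cutoff rows of the Model 1 and Model 2 ROC tables, columns as labelled there.
m1 = sens_spec(tp=46085, tn=3053, fp=825, fn=2311)
m2 = sens_spec(tp=46012, tn=2706, fp=898, fn=2658)
print(m1)  # (0.95, 0.79, 0.21)
print(m2)  # (0.95, 0.75, 0.25)
```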
ROC Curve – Credit Card Segment
Model 1 wins
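One rough way to put a number on this comparison is a trapezoidal area under each ROC curve. This is not a figure reported in the study. The sketch builds the curve from the nine tabulated (1 - specificity, sensitivity) points and adds the endpoints (0, 0) and (1, 1), so it is only a coarse approximation.

```python
def trapezoid_auc(points):
    """Approximate ROC area from (1 - specificity, sensitivity) points."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid
    return area

# (1 - specificity, sensitivity) for cutoffs 0.1 ... 0.9, from the two ROC tables.
model1 = [(0.57, 0.99), (0.44, 0.98), (0.34, 0.97), (0.27, 0.96), (0.21, 0.95),
          (0.16, 0.94), (0.12, 0.94), (0.08, 0.93), (0.04, 0.92)]
model2 = [(0.60, 0.99), (0.47, 0.97), (0.38, 0.96), (0.31, 0.95), (0.25, 0.95),
          (0.20, 0.94), (0.15, 0.93), (0.11, 0.92), (0.04, 0.91)]
auc1, auc2 = trapezoid_auc(model1), trapezoid_auc(model2)
print(round(auc1, 3), round(auc2, 3))  # 0.956 0.949
```

Under this approximation, Model 1 has the larger area, which is consistent with the conclusion on this slide.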
Model1 – Credit Card Segment
Variables | B | S.E. | Wald | Sig. | Exp(B) | 95% C.I. for Exp(B): Lower | Upper
annual_inc | 0 | 0 | 3184.977 | 0.0 | 1 | 1 | 1
delinq_2yrs | 0.708 | 0.089 | 62.6 | 0.0 | 2.03 | 1.704 | 2.419
dti | 0.278 | 0.005 | 3643.476 | 0.0 | 1.32 | 1.15 | 1.564
emp_length_year | -0.027 | 0.007 | 14.967 | 0.0 | 0.973 | 0.96 | 0.987
funded_amnt | 0.004 | 0 | 6074.822 | 0.0 | 1.004 | 1.004 | 1.004
funded_amnt_inv | -0.004 | 0 | 6165.636 | 0.0 | 0.996 | 0.996 | 0.996
inq_last_6mths | -1.077 | 0.032 | 1128.505 | 0.0 | 0.341 | 0.32 | 0.363
int_rate | 1.597 | 0.719 | 901.239 | 0.0 | 4.9382 | 4.8201 | 5.101
mths_since_last_delinq | 0.031 | 0.001 | 492.118 | 0.0 | 1.031 | 1.029 | 1.034
term_months | -0.11 | 0.005 | 532.142 | 0.0 | 0.896 | 0.888 | 0.905
total_rec_late_fee | 5.256 | 0.104 | 2538.213 | 0.0 | 191.686 | 156.24 | 235.175
Constant | 10.511 | 0.238 | 1951.251 | 0.0 | 36735.161 | |
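The derived columns of this table follow standard formulas: Wald = (B / S.E.)^2, Exp(B) = e^B, and the 95% interval is exp(B ± 1.96 S.E.). The sketch below recomputes them for two rows. Small differences from the table are expected because B and S.E. are shown rounded to three decimals. For example, the recomputed Wald for delinq_2yrs is about 63.3 against the reported 62.6.

```python
import math

def coefficient_summary(b, se, z=1.96):
    """Recompute (Wald, Exp(B), CI lower, CI upper) from B and its standard error."""
    wald = (b / se) ** 2
    odds_ratio = math.exp(b)
    lower, upper = math.exp(b - z * se), math.exp(b + z * se)
    return round(wald, 1), round(odds_ratio, 3), round(lower, 3), round(upper, 3)

delinq = coefficient_summary(0.708, 0.089)     # delinq_2yrs row
late_fee = coefficient_summary(5.256, 0.104)   # total_rec_late_fee row
print(delinq)
print(late_fee)
```

Applying the same function to other rows is a quick consistency check on the transcribed values.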
Conclusion
Organizations that interpret big data and convert it into actionable information will outperform their competitors.
Google, Amazon, Microsoft, IBM, DHL and P&G are among the organizations leading big data analytics in the current market.
How big data analytics and its strengths are used in an organization depends on the organization's culture.
Challenges – data collection, technical issues, expertise
Threats – individual privacy, the need for government regulation and monitoring