
Event Attendee Count Prediction

Date post: 21-Apr-2017
Upload: neeraj-tiwary
Page 1: Event Attendee Count Prediction

Event – Webinar Attendee Count Prediction v1.0 pilot

Neeraj Tiwary

Data Scientist

Page 2: Event Attendee Count Prediction

Problem Statement

Marketers need to know the prospective attendee count for in-person and online events, for any product or geographic location, during the event setup and budget planning stage. Predicting attendee counts at planning time will help improve the overall success rate of the events conducted.

Business Cases

1. Predict event attendance from basic event attributes at the time the event is created.

Impact: This helps the event owner/marketer pre-plan the event.

2. Predict event attendance from registration counts and basic event attributes.

Impact: This helps the marketer/event owner improve the number of registrations in the period between event setup and the start date.

Page 3: Event Attendee Count Prediction

Data

The training dataset contains 4000 records, whereas the test dataset contains 800 records.

Page 4: Event Attendee Count Prediction

Architecture

Page 5: Event Attendee Count Prediction

Architecture - Continued

For each use case, we created two separate models and then ensembled them into a wrapper model. The reasons for creating two separate models are:

• It simplifies the problem space.
• The distribution of the response variable suggested that the data follows a gamma distribution. The gamma distribution does not support zero-inflated problems well, although the Poisson / Negative Binomial distributions do.
• The requirement is to predict the number of attendees of an event. This is a count regression problem, and we cannot directly use other regression algorithms such as linear regression or neural networks, because their predictions range from -infinity to +infinity, whereas a count variable ranges from 0 to infinity.
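The wrapper idea can be sketched as a two-part model. This is only an illustration under stated assumptions: the slides do not say how the two models are combined, so the sketch assumes the wrapper multiplies the logistic probability of a non-zero count by the gamma model's expected count, and it assumes a log link for the gamma model. All coefficients and feature values are hypothetical.

```python
import math

def linear_score(features, coefficients, intercept):
    """Linear predictor shared by both models."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

def probability_nonzero(features, coefficients, intercept):
    """Model 1 (logistic regression): P(attendee count > 0)."""
    return 1.0 / (1.0 + math.exp(-linear_score(features, coefficients, intercept)))

def positive_mean(features, coefficients, intercept):
    """Model 2 (gamma regression, log link assumed): E[count | count > 0]."""
    # exp() keeps the prediction strictly positive, unlike a plain linear model.
    return math.exp(linear_score(features, coefficients, intercept))

def predict_attendees(features, zero_model, count_model):
    """Assumed wrapper: expected count = P(count > 0) * E[count | count > 0]."""
    return probability_nonzero(features, *zero_model) * positive_mean(features, *count_model)

# Hypothetical fitted parameters: (coefficients, intercept) for each model.
zero_model = ([0.8, -0.3], 0.5)
count_model = ([0.4, 0.1], 3.0)

prediction = predict_attendees([1.0, 2.0], zero_model, count_model)
print(round(prediction, 2))
```

Because the first factor lies between 0 and 1 and the second is always positive, the combined prediction can never fall below zero, which is the range constraint described above.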

Page 6: Event Attendee Count Prediction

Data Cleansing

• Trimmed all the variables to remove white space
• Converted all the categorical variable values to lower case
• Replaced all the null values with “Not Assigned” to have uniformity in the data
• Transformed the data to have proper values for some common categorical variables
• Removed low-frequency categorical data, as those were impacting the model

Missing value imputation

• Went to the business and replaced missing values with the actual values as far as possible
• For the remaining missing values, used “Multiple Imputation” methods to impute the data, as most of the data were missing at random and belonged to categorical variables
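The cleansing steps can be sketched as follows. This is a minimal illustration, not the original R code: the field names, the sample records, and the `min_count` threshold are hypothetical, and "removed low frequency categorical data" is assumed here to mean dropping rows whose category occurs too rarely. Multiple imputation is not shown.

```python
from collections import Counter

def clean_value(value):
    """Trim white space, lower-case, and map nulls/blanks to 'Not Assigned'."""
    if value is None or str(value).strip() == "":
        return "Not Assigned"
    return str(value).strip().lower()

def clean_records(records, categorical_fields, min_count=2):
    """Clean categorical fields, then drop rows with low-frequency categories."""
    cleaned = [
        {key: clean_value(val) if key in categorical_fields else val
         for key, val in record.items()}
        for record in records
    ]
    for field in categorical_fields:
        counts = Counter(record[field] for record in cleaned)
        # min_count is a hypothetical threshold; the slides do not state one.
        cleaned = [r for r in cleaned if counts[r[field]] >= min_count]
    return cleaned

records = [
    {"Product": "  Office ", "AttendeeCount": 12},
    {"Product": "office", "AttendeeCount": 0},
    {"Product": None, "AttendeeCount": 5},
    {"Product": "", "AttendeeCount": 7},
    {"Product": "Rare Product", "AttendeeCount": 3},
]
cleaned = clean_records(records, {"Product"})
print([r["Product"] for r in cleaned])
```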

Page 7: Event Attendee Count Prediction

Feature Engineering

This is the main step of any model development activity. We need to enhance our features to get better predictability.

• Created dummy variables for categorical variables like “Product” and “TargetAudience” by using mtabulate in R
• Dropped unused levels for all categorical variables
• Created the “Hour of Day” attribute, which tells at which hour the event is going to start
• Created the “Month of Day” attribute, which tells in which month the event is going to start
• Created the “Duration” attribute, which tells the duration of the event
• Created the “DaysBetweenEventCreationAndStartDate” attribute, which tells the number of days between the event's creation date and its start date
• Initially all the data were available as text strings, so we parsed the data to fetch the relevant information
• We did this pre-cooking / text parsing of the data before landing it in R to develop the model
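The derived attributes above can be sketched as below. This is an illustrative stand-in for the original R code: the input field names (`created`, `start`, `end`, `product`), the timestamp format, and the product levels are assumptions, and the simple one-hot loop only approximates what mtabulate produces.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"  # assumed format of the raw text strings

def engineer_features(event, product_levels):
    """Parse the text timestamps and derive the attributes listed above."""
    created = datetime.strptime(event["created"], TIMESTAMP_FORMAT)
    start = datetime.strptime(event["start"], TIMESTAMP_FORMAT)
    end = datetime.strptime(event["end"], TIMESTAMP_FORMAT)
    features = {
        "HourOfDay": start.hour,
        "Month": start.month,
        "DurationMinutes": (end - start).total_seconds() / 60,
        "DaysBetweenEventCreationAndStartDate": (start.date() - created.date()).days,
    }
    # One dummy (0/1) column per known product level.
    for level in product_levels:
        features["Product_" + level] = 1 if event["product"] == level else 0
    return features

event = {
    "created": "2017-03-01 09:30",
    "start": "2017-03-15 14:00",
    "end": "2017-03-15 15:30",
    "product": "office",
}
features = engineer_features(event, ["office", "azure"])
print(features)
```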

Page 8: Event Attendee Count Prediction

Descriptive Statistics – Attendee Count

Response Variable: Attendee count of a randomly chosen in-person event for a future date

Distribution (Log-Likelihood):

[Figures: Boxplot, Density Plot, Histogram]

Statistics:
Mean: 28.65435
Standard Deviation: 32.89823
Skewness: 2.267742
Kurtosis: 5.9273
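For reference, the four summary statistics can be computed as in the sketch below. The sample counts are made up, and the slide does not say which skewness/kurtosis definition was used, so the moment-based skewness and excess kurtosis (raw kurtosis minus 3) here are assumptions.

```python
import math

def describe(values):
    """Mean, sample standard deviation, moment skewness, and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments (population form, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),  # sample SD (divides by n - 1)
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,     # excess kurtosis (assumed convention)
    }

# Hypothetical attendee counts: right-skewed, with some zeros.
counts = [0, 0, 5, 12, 20, 35, 90]
stats = describe(counts)
print(stats)
```

A positive skewness, as reported on the slide, indicates a long right tail: most events draw few attendees and a small number draw many.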

Page 9: Event Attendee Count Prediction

Response Variable - Distribution

• The response variable “AttendeeCount” follows the Gamma distribution
• Many events (~23%) had ZERO attendee counts
• Since the gamma model doesn't support a ZERO response value, we divided the problem into two parts:

1. Zero attendee count problem

2. Non-Zero attendee count problem
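The split into two problems can be sketched as preparing two training sets from the same rows. This is an assumption about how the split was implemented: the binary `HasAttendees` target name is hypothetical, and the sample counts are invented.

```python
def split_for_two_models(rows, target="AttendeeCount"):
    """Build the training sets for the zero / non-zero sub-problems."""
    # Problem 1: every row, with a binary target (did anyone attend?).
    zero_model_rows = [dict(r, HasAttendees=1 if r[target] > 0 else 0) for r in rows]
    # Problem 2: only rows with a strictly positive count, as gamma requires.
    count_model_rows = [r for r in rows if r[target] > 0]
    return zero_model_rows, count_model_rows

rows = [{"AttendeeCount": c} for c in [0, 14, 0, 52, 7]]
zero_rows, count_rows = split_for_two_models(rows)

zero_share = sum(1 for r in zero_rows if r["HasAttendees"] == 0) / len(zero_rows)
print(zero_share, [r["AttendeeCount"] for r in count_rows])
```

The logistic regression would then be fit on the first set and the gamma regression on the second.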

Page 10: Event Attendee Count Prediction

Exploratory Analysis - ProgramOwner

Page 11: Event Attendee Count Prediction

Model Output: Business Case 1

Model1: Logistic Regression

• ROC Curve
• AUC:
• Confusion Matrix

Model2: Gamma Regression

• Accuracy:
• Model Parameters

Page 12: Event Attendee Count Prediction

Model Output: Business Case 2

Model1: Logistic Regression

• ROC Curve
• AUC:
• Confusion Matrix

Model2: Gamma Regression

• Accuracy:
• Model Parameters
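The AUC and confusion matrix reported for Model1 in both business cases can be computed as in the sketch below. The labels, scores, and 0.5 threshold are invented for illustration; the actual values from the slides are not available in this transcript.

```python
def confusion_matrix(actual, scores, threshold=0.5):
    """Count true/false positives and negatives at a given score threshold."""
    cells = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(actual, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1:
            cells["tp" if label == 1 else "fp"] += 1
        else:
            cells["fn" if label == 1 else "tn"] += 1
    return cells

def roc_auc(actual, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(actual, scores) if y == 1]
    negatives = [s for y, s in zip(actual, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = event had attendees) and predicted probabilities.
actual = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

matrix = confusion_matrix(actual, scores)
auc = roc_auc(actual, scores)
print(matrix, round(auc, 3))
```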

Page 13: Event Attendee Count Prediction

Model - Actual vs Predicted + Registration

Page 14: Event Attendee Count Prediction

Model - AzureML

We developed the same model in AzureML and deployed it as a web service.

Below is a snippet of the same in Excel.

Page 15: Event Attendee Count Prediction

Model - AzureML

Page 16: Event Attendee Count Prediction

Methodology

Used -> Gamma Regression, Logistic Regression

Tried -> Poisson, Negative Binomial, Neural Network regression, etc.

Results

• Models developed with Gamma / Logistic regression gave better results.
• A marketer can change any of the attributes and then check the predicted attendee count through the AzureML model; based on that score, he/she will be in a better position to make his/her own decision.

Conclusions and Next Steps

After a thorough understanding of the problem, below are my further recommendations for proceeding:

• Need to explore Vowpal Wabbit in AzureML
• Need to embed the model with Power BI reporting

