Posted on 08-Aug-2020
Team: BIDM Project A1
Members: Deepshikha Yadav (61110463)
Shanawaz Janmohamed (61119007)
Vibha Naryan (61110350)
Udayan Dasgupta (61110544)
Santosh P N (61110163)
Agenda
1. Objective
2. Snapshot of Data
3. Model Building
4. Key Findings
5. Further Improvements
6. Appendix
Project Objective
•To predict the opening box office revenue of a movie
•To predict whether a movie will break even in its first weekend
Snapshot of the Data
Model Building
1. Regression Tree
2. Multi-Linear Regression
3. Naïve Bayes
4. Classification Tree
5. Logistic Regression
6. Ensemble Model
Key Findings: Key Revenue Drivers
•Distributor: Paramount, Warner Bros., DreamWorks, Miramax, 20th Century Fox
•Type of content: Adventure, Drama, Horror, Sequel
•Release timing: Summer
•Screens and budget
Further Improvements
Simonoff's Model
log(Domestic box office) = -0.164 + 1.09 × log(1st-weekend box office)
Additional Parameters
•Distribution of the marketing budget across various channels
•Duration (running time of the movie)
•Salaries of the lead actors and actresses
Current Model Findings
•We were able to predict breakeven with an accuracy of 70%
•We were not able to predict box office revenue
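The fitted relationship on this slide (log domestic = -0.164 + 1.09 log 1st weekend) can be evaluated directly. The sketch below is a minimal illustration that assumes natural logarithms and revenues in dollars; the slide states neither, so both are assumptions, and `predict_domestic` is a hypothetical helper name.

```python
import math

# Coefficients as printed on the slide.
INTERCEPT = -0.164
SLOPE = 1.09


def predict_domestic(first_weekend: float) -> float:
    """Predict total domestic revenue from first-weekend revenue.

    Assumes natural logarithms and the same revenue units used when the
    model was fitted; the slide states neither.
    """
    if first_weekend <= 0:
        raise ValueError("first-weekend revenue must be positive")
    log_domestic = INTERCEPT + SLOPE * math.log(first_weekend)
    return math.exp(log_domestic)


# Hypothetical opening-weekend revenues in dollars, for illustration only.
for opening in (1_000_000, 10_000_000, 50_000_000):
    print(f"opening ${opening:>12,} -> predicted domestic ${predict_domestic(opening):>14,.0f}")
```

Exponentiating both sides gives domestic = e^(-0.164) × (1st weekend)^1.09, so the predicted domestic-to-opening multiple, e^(-0.164) × (1st weekend)^0.09, rises slowly with the size of the opening and depends on the revenue units used.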
Thank You
Data Source: Hollywood Movies
Original data
•Column names: Breakeven, ROI, Box office opening week revenues, Budget, MPAA Ratings, Oscar Actor/Director/Producer, KIM (Sex/Violence/Profanity), Genre, Distributor, Opening week/month/year
•1,800 data points for training and validation; 800 data points as test data
•Data from the years 1996 to 2010
Additional data
•www.imdb.com
•http://www.imdb.com/title/tt0116191/
•http://www.imdb.com/title/tt0116191/parentalguide#certification
•http://www.1728.com/page8.htm
•www.the-numbers.com
•www.leesmovieinfo.com
•www.boxofficemojo.com
Bollywood data set
•http://www.bollywoodtrade.com/box-office/movies-domestic.htm
•http://en.wikipedia.org/wiki/Bollywood_films_of_2010
Data Exploration
ROI for a movie is negatively correlated with the presence of violence and profanity. However, the data shows that ROI trends upward with the presence of sexual content.
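Statements like these can be quantified with a Pearson correlation between ROI and each KIM content score. This is a minimal sketch: it assumes the KIM scores are numeric severity ratings, and the values below are made-up toy numbers, not the project's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data (one entry per movie), not the project's data set.
roi = [2.1, 1.4, 0.9, 3.0, 0.5]
violence = [2, 5, 7, 1, 8]  # assumed numeric KIM violence score
sex = [3, 2, 1, 6, 2]       # assumed numeric KIM sex score

print("ROI vs violence:", round(pearson(roi, violence), 3))
print("ROI vs sex:", round(pearson(roi, sex), 3))
```

A coefficient like this describes association only; it does not show that the content itself drives ROI.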
A movie's budget tends to increase with the presence of an Oscar actor, director, or producer.
Average budgets are typically high for movies in the Action, Fantasy, Animation, and Sci-Fi genres.
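Comparisons such as average budget by genre, average ROI by MPAA rating, or budget with and without an Oscar actor/director/producer are all group means. A minimal sketch follows; the helper name `mean_by_group` and the toy rows (budgets assumed in $ millions) are hypothetical, not the project's data.

```python
from collections import defaultdict


def mean_by_group(rows, group_key, value_key):
    """Average of value_key for each distinct value of group_key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += row[value_key]
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical toy rows (budgets in $ millions), not the project's data.
movies = [
    {"genre": "Action", "budget": 120.0},
    {"genre": "Action", "budget": 80.0},
    {"genre": "Drama", "budget": 20.0},
    {"genre": "Drama", "budget": 30.0},
    {"genre": "Animation", "budget": 90.0},
]

# Print genres from highest to lowest average budget.
averages = mean_by_group(movies, "genre", "budget")
for genre, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genre:10s} {avg:8.1f}")
```

Grouping by a rating field with ROI as the value, or by an Oscar-presence flag with budget as the value, reuses the same helper unchanged.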
Average ROI for R-rated movies tends to be higher than for PG and PG-13 movies.
In the Suspense genre, average ROI spikes for movies such as Open Water and Saw, which were medium-budget movies that earned huge revenues.
In December, most of the movies released belong to the Drama genre. However, in October from 2001 to 2005, the data shows a pattern of preferred genres: Drama, Comedy, and Suspense.
The Blair Witch Project stands out as an anomaly, with an extremely high ROI compared with other movies of the same genre. This anomaly is due to the movie's extremely low budget ($35,000).
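The arithmetic behind this anomaly: because ROI divides by the budget, a very small budget inflates ROI even for modest revenue. The deck does not define its ROI column, so this sketch assumes ROI = (revenue - budget) / budget, and the $10,000,000 revenue figure is hypothetical.

```python
def roi(revenue: float, budget: float) -> float:
    """Return on investment, assumed here to be (revenue - budget) / budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (revenue - budget) / budget


# Hypothetical: the same revenue earned on two very different budgets.
revenue = 10_000_000
for budget in (35_000, 10_000_000):
    print(f"budget ${budget:,}: ROI = {roi(revenue, budget):,.1f}")
```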
Model: Regression Tree (Continuous Response)
Regression Model (Continuous Response)
Naïve Bayes
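The slide gives only the model name. As an illustration of how a categorical Naïve Bayes classifier predicts breakeven, here is a minimal sketch with Laplace smoothing. The choice of features (genre, MPAA rating, release season) and all training rows are hypothetical; the deck does not list the predictors used.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Fit a categorical Naive Bayes model by counting values per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    vocab = defaultdict(set)             # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict_nb(model, row, alpha=1.0):
    """Return the class with the highest log posterior, using Laplace smoothing."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            score += math.log((count + alpha) / (n_label + alpha * len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training rows: (genre, MPAA rating, release season) -> broke even?
rows = [
    ("Horror", "R", "Summer"),
    ("Horror", "R", "Fall"),
    ("Action", "PG-13", "Summer"),
    ("Drama", "PG-13", "Winter"),
    ("Drama", "R", "Fall"),
    ("Action", "PG", "Winter"),
]
labels = ["yes", "yes", "yes", "no", "no", "no"]

model = train_nb(rows, labels)
print(predict_nb(model, ("Horror", "PG-13", "Summer")))
```

Working in log space avoids underflow when many small probabilities are multiplied, and smoothing keeps an unseen value from zeroing out a class.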
Model: Classification Tree
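A classification tree is grown by repeatedly choosing the split that best separates the classes. The sketch below shows that single step for one numeric feature, assuming a CART-style Gini impurity criterion (the deck does not state the criterion or software). A full tree would apply it recursively to each partition. The screen counts and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Find the threshold t on one numeric feature that minimises the
    weighted Gini impurity of the partitions x <= t and x > t."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal feature values
        t = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


screens = [0.5, 1.2, 2.0, 2.8, 3.1, 3.5]  # thousands of screens (hypothetical)
broke_even = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(screens, broke_even)
print(f"split at screens <= {threshold:.2f}, weighted Gini = {impurity:.3f}")
```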
Model: Logistic Regression (Categorical Response)
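Logistic regression models the probability of breakeven as a sigmoid of a linear score. Here is a minimal one-feature sketch fitted by batch gradient descent on the log loss. The single predictor (screens), the learning rate, the epoch count, and the data are all hypothetical; the deck does not list its predictors or fitting method.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=20_000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on log loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # d(log loss)/d(logit) for one example
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical data: screens (thousands) vs. broke even in first weekend (1) or not (0).
screens = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
broke_even = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(screens, broke_even)
predicted = [1 if sigmoid(b0 + b1 * x) >= 0.5 else 0 for x in screens]
accuracy = sum(p == y for p, y in zip(predicted, broke_even)) / len(broke_even)
print(f"b0={b0:.3f}, b1={b1:.3f}, training accuracy={accuracy:.2f}")
```

The toy labels deliberately overlap (2.0 breaks even, 2.5 does not) so the maximum-likelihood coefficients stay finite; on perfectly separable data, gradient descent would keep growing b1.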
Ensemble Model with Optimization
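The slide gives only the title. One plausible reading, offered as an assumption rather than the team's method, is a weighted majority vote over the classifiers (Naïve Bayes, classification tree, logistic regression) with weights tuned on validation data. The sketch brute-forces a small weight grid. All predictions and labels below are hypothetical toy values, and the weight grid is arbitrary.

```python
from itertools import product


def weighted_vote(predictions, weights):
    """Combine 0/1 predictions from several models by weighted majority vote.

    predictions: one list of 0/1 predictions per model, all the same length.
    Ties (score exactly half the total weight) are resolved as 1.
    """
    combined = []
    for votes in zip(*predictions):
        score = sum(w * v for w, v in zip(weights, votes))
        combined.append(1 if score >= sum(weights) / 2 else 0)
    return combined


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def search_weights(predictions, y_true, grid=(0, 1, 2, 3)):
    """Brute-force the weight vector that maximises validation accuracy."""
    best_w, best_acc = None, -1.0
    for w in product(grid, repeat=len(predictions)):
        if sum(w) == 0:
            continue  # all-zero weights cannot vote
        acc = accuracy(y_true, weighted_vote(predictions, w))
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Hypothetical validation labels and per-model predictions (1 = broke even).
y_val = [1, 0, 1, 1, 0, 0, 1, 0]
nb = [1, 0, 1, 0, 0, 1, 1, 0]
tree = [1, 1, 1, 1, 0, 0, 0, 0]
logit = [0, 0, 1, 1, 0, 0, 1, 1]
models = [nb, tree, logit]

weights, val_acc = search_weights(models, y_val)
print(f"weights={weights}, validation accuracy={val_acc:.2f}")
```

Because the weights are tuned on the validation set, the accuracy reported there is optimistic; a fair estimate would come from data held out from tuning, such as the 800-point test set described in the appendix.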