Josiah Shelley
VP, Digital Strategy at Jan Kelley
Data, Marketing, Technology
Dad of a 10-month-old
What I’m sharing today
1. Data is changing everything
2. What is Machine Learning?
3. Machine Learning in action
4. Where are you on the Data Spectrum?
5. Example of ML in Education
6. What are you going to do now?
Let’s talk about Data
We create data.
90% of all data today was created in the last two years.
Your contribution to this is called your Digital Footprint.
Digital FootprintYour digital footprint is data that's created through your activities and communication online. This can include more passive activities, such as if a website collects your IP address, as well as more active digital activities, such as sharing images on social media.
Let’s look at my personal digital footprint
Google knows a lot
Websites I visit
Videos I watch
What I keep or delete
Conversations I have
Facebook knows a lot too
Here’s what data Facebook has… and about 20 more data points.
Industry interests
Sports interests
Travel and places
Entertainment
Facebook even allows businesses to find me with their data.
Oh and there’s Amazon
Everything I’ve bought
Everything I’ve viewed
The conversations I have
The photos I take
The friends I have
The topics I search
The files I store
The files I share
The products I look at
The products I buy
The sports teams I like
The groups I join
The instant messages I send
The videos I watch
The places I’ve travelled
The influencers I follow
The events I attend
The stuff I buy
The stories I share
The groups I’m a part of
The things I comment on
The stuff I don’t buy
The stuff I don’t care about
The things I’ve never searched
The people I don’t respond to
The files I delete
The products I don’t buy
So here’s what 3 tech companies know.
That’s 24 GB on Josiah.
Want your data?
https://takeout.google.com/ https://www.facebook.com/settings
And that creates demand for Data Centers to store all of it.
3,501,274 sq ft
6,460,000 sq ft
Data is powerful and that’s why people want it so badly.
Data is also creating demand for jobs.
3.5 million unfilled cybersecurity jobs globally by 2021
The amount of data in the world is two parts wonderful and one part terrifying.
Throw back to 10 years ago.
Gillette has no clue who’s buying.
While Harry’s knows it all.
Harry’s uses Google and Facebook’s data to be ultra targeted. Gillette targets a “city”, Harry’s targets a “person”.
Harry’s is winning based on data.
Data is the fuel for the future
Why does all this matter?
How we do business is changing and it’s thanks to data.
Let’s take a closer look at businesses and yes, let’s look at the data.
On average 55% of data is collected and not used.
85% say it’s because there’s no tool to capture and analyze it.
And there’s a big gap in the knowledge and software that help non-technical people do data analysis.
But data should help all companies, not just Facebook and Google.
Data used in the right way can help the world.
And this is all powered by machine learning… but what is it?
Machine Learning
Machine learning is the application of AI in which machines are given access to data and learn from it, rather than being programmed by humans on what to think and do about the data.
The key is that ML performs tasks without using explicit instructions.
• Neural Networks
• Random Forest
• Logistic Regression
• Kernel Methods
• K-Means Clustering
• Gradient Boosting Algorithms
• Naive Bayes
• kNN
It performs tasks using algorithms.
This is the standard way of coding.
Netflix has 76,897 “altgenres” or unique ways to determine the type of movies and shows it should recommend to each of its users to not only personalize their experience but also make them come back for more.
Content recommendation system used by Netflix.
A few useful observations
1. Machine Learning is not new (dating back to the 1950s)
2. There’s no world where data becomes less important in 5 years
3. Large companies (like Google) are all in with AI and ML
4. You are already using machine learning today
5. Your understanding of ML and spotting when it can be useful will allow you to use resources more effectively and be more proactive instead of reactive
You’re using it today at work.
It’s being used at airports.
And at Walmart stores to spot thieves.
And all companies must go through the stages of data awareness and analytics.
Stages of analytics
• Data unaware - what do we have?
• Data aware - I know what we have
• Dashboards / visuals / Excel - I’ve organized what we have
• Real-time dashboards with insights - Use of R, Tableau or other tools
• Prediction - I’m using ML or other methods to make predictions
What stage are you at?
The primary question that needs answering before you start… what problem am I solving?
This could be:
• Who are my best students? What makes them unique?
• What marketing is working best?
• What programs need more support?
• How do we reduce…?
What I wanted to share with you today is our venture into Machine Learning with our platform called “EnrolCast”.
We have worked with:
Core problem we set out to solve
How do we allocate marketing spend effectively across 100s of programs?
Auxiliary problems
• Low enrolment definition
• Hectic time when requests fly in to support all the different programs
We also learned
Enrolment varies over time
• Low enrolment is about being off-trend based on historical data
• Very few reports are directed specifically at marketing
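One way to read "off-trend based on historical data" is to compare this year's year-to-date count with prior years measured at the same point in the cycle. The sketch below does that with a simple z-score. The function names, the cutoff of -1.0, and the sample numbers are all assumptions for illustration, not how EnrolCast defines low enrolment.

```python
from statistics import mean, stdev


def off_trend_score(current_ytd: float, historical_ytd: list[float]) -> float:
    """Z-score of this year's year-to-date count against prior years
    measured at the same point in the cycle (needs at least 2 prior years)."""
    mu = mean(historical_ytd)
    sigma = stdev(historical_ytd)
    if sigma == 0:
        # No variation in the history: treated as on-trend (a simplification).
        return 0.0
    return (current_ytd - mu) / sigma


def is_low_enrolment(current_ytd: float, historical_ytd: list[float],
                     cutoff: float = -1.0) -> bool:
    # "Low" here means well below the historical trend, not below a fixed number.
    return off_trend_score(current_ytd, historical_ytd) < cutoff


# Hypothetical applications year-to-date for the same week in five prior years.
history = [120, 130, 110, 125, 115]
```

With this history, 100 applications would be flagged as low while 118 would not, even though both are below the five-year average of 120.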
Our vision: build something to help evaluate enrolment health and inform marketing decisions
What does it do?
• We use machine learning to predict enrolment trends by program and cluster
• Develops deeper enrolment understanding and clarity on the best next step
How did we do it?
• We used Lean Startup Methodology
• Built a dashboard (web application)
• Developed an algorithm to predict enrolment and a “Cast” score
• Developed different clusters and views
• Developed reporting and visualizations
• Automated insights
We used SCRUM
Our operating system for development
• Daily stand-ups to keep momentum
• Group of experts working together
• 1-month sprints, retrospectives
• Greater collaboration
• Share uncomfortably early and often
Evaluation of processes and current structure of the company.
Suggestions for improvement and process optimization.
Application design together with the client.
Application construction and implementation.
Evaluation and monitoring.
How?
Sprint Week 1. Data Management
1. Input data
2. Data filtering
3. Structured data with JSON
4. Q/A Data Management
Sprint Week 2. Web Application (using MVC and Pattern)
1. Data views
2. Data entry
3. User management
4. Reports
5. Notifications/Alerts
6. Q/A Web Application
Sprint Week 3. Prediction Analytics Engine
1. Algorithm
2. Marketing Spend (Google Analytics/Facebook/Google Ads)
3. Enrolment Pacing
4. Trends
Sprint Week 4. User Experience & Customization
1. User Feedback Sessions
2. UX Design updates
3. Data Visualizations
4. Integrations (Branding/reports, CRM, other)
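The Sprint Week 1 data management steps (input data, data filtering, structured data with JSON) might look something like the sketch below. The CSV layout, field names, and filtering rule are hypothetical; the actual EnrolCast pipeline is not shown in these slides.

```python
import csv
import io
import json

# Hypothetical raw export; one row is missing its application count.
RAW_CSV = """program,cluster,applications,goal
Nursing,Health,140,120
Welding,Trades,,40
Accounting,Business,85,90
"""


def load_rows(text: str) -> list[dict]:
    """Step 1, input data: parse the raw export into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows: list[dict]) -> list[dict]:
    """Step 2, data filtering: drop rows with missing numeric fields."""
    return [r for r in rows if r["applications"].strip() and r["goal"].strip()]


def to_json(rows: list[dict]) -> str:
    """Step 3, structured data: typed records serialized as JSON."""
    records = [
        {
            "program": r["program"],
            "cluster": r["cluster"],
            "applications": int(r["applications"]),
            "goal": int(r["goal"]),
        }
        for r in rows
    ]
    return json.dumps(records, indent=2)


structured = to_json(filter_rows(load_rows(RAW_CSV)))
```

Step 4 (Q/A) would then check the structured output, for example that no program was dropped unexpectedly.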
Chart and List View
Cluster View
Program View
Add a status
The Cast Metric
We used XGBoost… an open source gradient boosting library for machine learning, to make predictions.
• Extremely powerful, and often outperforms Neural Nets on this type of data
• Able to leverage complex non-linear relationships between multiple features and target simultaneously
• It has also been used to win many Kaggle ML competitions in the last few years
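For intuition on how gradient boosting captures non-linear relationships, here is a toy version in plain Python. This is not XGBoost, which adds regularization, second-order gradient information, and much more. It is a minimal sketch of the core idea: each small tree (here a one-split "stump" on a single feature) is fit to the errors left by the trees before it. The data and parameter values are made up, and the code assumes at least two distinct feature values.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (squared error)."""
    best = None
    for split in sorted(set(xs))[1:]:  # skipping the minimum keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        lv, rv = mean(left), mean(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, split, lv, rv)
    _, split, lv, rv = best
    return lambda x: lv if x < split else rv


def fit_boosted(xs, ys, rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to what is still wrong."""
    base = mean(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Hypothetical non-linear data: applications (x) vs. final enrolment (y).
xs = [10, 20, 30, 40, 50, 60]
ys = [5, 9, 20, 42, 48, 50]

model = fit_boosted(xs, ys)
baseline_mae = mean(abs(y - mean(ys)) for y in ys)
boosted_mae = mean(abs(y - model(x)) for x, y in zip(xs, ys))
```

On this toy data the boosted model's training error is far lower than always guessing the average, which is the behaviour the bullets above describe.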
The Cast Metric
Features we put in:
• Applications
• Applications Year-To-Date
• Acceptances
• Acceptances Year-To-Date
• Declines
• Declines Year-To-Date
• Days Remaining
• Goal
• Cluster (Business, Engineering, Health, International, Justice, Trades)
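A model needs these features as numbers, so the text-valued Cluster has to be encoded somehow. The sketch below shows one common option, one-hot encoding. The class name, field names, and encoding choice are assumptions for illustration; the slides do not say how EnrolCast encodes its inputs.

```python
from dataclasses import dataclass

CLUSTERS = ["Business", "Engineering", "Health", "International", "Justice", "Trades"]


@dataclass
class ProgramSnapshot:
    """One program's numbers on a given day (hypothetical structure)."""
    applications: int
    applications_ytd: int
    acceptances: int
    acceptances_ytd: int
    declines: int
    declines_ytd: int
    days_remaining: int
    goal: int
    cluster: str

    def to_feature_vector(self) -> list[float]:
        if self.cluster not in CLUSTERS:
            raise ValueError(f"unknown cluster: {self.cluster}")
        numeric = [
            self.applications, self.applications_ytd,
            self.acceptances, self.acceptances_ytd,
            self.declines, self.declines_ytd,
            self.days_remaining, self.goal,
        ]
        # One-hot: a 1.0 in the slot for this program's cluster, 0.0 elsewhere.
        one_hot = [1.0 if self.cluster == c else 0.0 for c in CLUSTERS]
        return [float(v) for v in numeric] + one_hot


# Hypothetical example row.
snapshot = ProgramSnapshot(12, 140, 8, 95, 2, 20, 60, 120, "Health")
vector = snapshot.to_feature_vector()
```

Each snapshot becomes a row of 14 numbers: the 8 numeric features followed by 6 cluster slots.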
Training our machine learning model
The Cast Metric
• Predicts whether a program will achieve its target seats.
• Brings our model’s accuracy to a median absolute error of +/- 5.59 students.
Metric (students)    XGBoost
Avg Error            +0.71
Avg |Error|          +10.07
Median |Error|       +5.59
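For anyone who wants to compute the same three metrics on their own predictions, here is a minimal sketch. The sign convention (predicted minus actual) and the sample numbers are assumptions; the slides do not state which convention was used.

```python
from statistics import mean, median


def error_summary(predicted: list[float], actual: list[float]) -> dict[str, float]:
    """Average error, average absolute error, and median absolute error."""
    errors = [p - a for p, a in zip(predicted, actual)]  # assumed: predicted - actual
    abs_errors = [abs(e) for e in errors]
    return {
        "avg_error": mean(errors),            # over- and under-predictions cancel out
        "avg_abs_error": mean(abs_errors),    # typical miss, sensitive to big misses
        "median_abs_error": median(abs_errors),  # typical miss, robust to big misses
    }


# Hypothetical predicted vs. actual enrolment for four programs.
predicted = [100, 52, 30, 75]
actual = [90, 55, 30, 95]
summary = error_summary(predicted, actual)
```

An average error near zero next to a larger average absolute error, as in the table above, means the misses go in both directions rather than the model being consistently high or low.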
Making human knowledge more powerful with Data…
+ Marketing Meetings
+ Registrar’s Office Meetings
+ Strategic Enrolment Meetings
Provide focus on the right programs
Respond to inquiries with data support
Deeper intelligence into enrolment targets
What’s next for EnrolCast
• Provide our platform to a small group of higher ed institutions for use and expand thoughtfully
• Add more features (e.g. economic factors) to the Cast number to improve accuracy
• Run data discovery workshops (business problems, data sources, approach) to help higher ed use their data better
What you can learn through our project.
1. You need a clear pain point or problem to solve.
2. You need to be beyond data aware and you need a source of data (ideally lots of cumulative data).
3. You need a team (2-3 people) who have experience with Python or solving data challenges and 2-3 months to make progress.
4. You need an MVP approach to your project, so you can pull the plug early if it’s not working.
5. Use open source if you can!
• TensorFlow for managing
• Algorithms like XGBoost
• Read articles like “10 Must Try Open Source ML Tools”
In closing
Data is the fuel of your future. When are you going to get started?
Find out more about this project or just to chat over coffee:
Josiah - [email protected] - 905-220-4111
Or come see me after!