The Zen of Data Science
Eugene Dubossarsky
Chief Data Scientist – Principal Founder –
Presentation Summary - Promised
- Key concepts, dos and don'ts of Data Science
- Science and engineering: very different!
- What are Data Scientists for?
- Where should Data Science sit in the business?
- How should data science be measured, managed, planned?
- Starting, nourishing and growing a successful Data Science function in your business skills and experience
- Becoming an effective data scientist
Presentation Summary – But Actually More Like...
- Shameless self promotion
- Parables
- Metaphors
- Abstract Philosophical Stuff
- Surprises
- Challenges and Reframes
- You saying “This is relevant to my life how?”
Presentation Summary
- Tools vs Ideas – Science vs Technology
- Finding vs Building – Science and Engineering
- Engagement
- Exploration – a legitimate, vital and strategic business activity
- Intelligence – a business function
- Mastery
- Apprenticeship
The “Zen” bit
- The bare essence
- The kernel of truth
- The thing that isn't illusion
- The way (Tao) to enlightenment (Satori)
- Clarity and simplicity derived from meditation, possibly quite different to everyday experience
Parable 1: Getting Airports Wrong
Everybody thinks that this is an airplane:
Imagine your job is to build an airport. You need to take the design of airplanes into account. The only problem is:
This is what is called a “fundamental category error”. Anything done with this misconception in place will be a waste of time, money and resources.
“Working around it”, and “being realistic about the client's expectations” is a bit beside the point.
Most people probably want to focus on the aerodynamics of the “airplane” as currently conceived and on the buzz around technology to support such “airplanes”. They may see this as being “business focused”, while more fundamental discussions are seen as “negative”, “academic” or too “challenging”.
Nevertheless, getting the fundamental issue sorted out would seem to be the first order of business, no matter how abstract, controversial, politically inconvenient or offensive to some quarters, or how many people have built careers managing, selling and practicing in this paradigm.
Because... Uh... Donkey?
Data, Science, Tools and Definitions
- Data Scientist = “Hadoop Guy”?
- “Guy Who Does Stuff with Data”?
- Guy Who Does Stuff with Lots of Data?
- Guy Who Does Stuff with Big Data?
- Guy Who Does Stuff With Big Data That Sounds Cool or Businessy?
- (And what makes Data “Big” anyway?)
Science and Engineering
- Is there a difference?
- What is it?
- What is a “Data Engineer”?
- What is a “non-Data Engineer”?
Science and Engineering
- Are actually direct opposites
- Skills, positioning, personality types, appropriate management frameworks and place in the business are quite different.
- The confusion needs sorting out.
Now I've Lost You...
That's not “realistic” - most “data scientists” are actually “engineers” by this framework !
That sounds too “technical”, “academic” or not “relevant to business”
Now I've Lost You...
- That's not “realistic” - most “data scientists” are actually “engineers”
  - Yep.
- That sounds too “technical”, “academic” or “not relevant to business”
  - Maybe, Too Bad and No
Engineering
- Start with an identified idea, end with a design
- Build or maintain something to pre-defined parameters
- Uncertainty is the enemy (time, budget, resources, performance)
Engineering
- Plans, Timeframes and Specifications, vs ongoing (loosely focused) discussion
- Delivers Products and pre-determined KPIs. The Unexpected is a (usually unwelcome) exception
- Works to milestones and a specification
- Engaged with operational and technical management
Engineers
- Outcomes are Things
- An Engineer may do more or less the same thing many times
- An Engineer performs “projects” and manages “processes”
- An Engineer is managed according to tight requirements
Engineers
- Easier to identify
- Easier to manage
- Easier to understand
- Less stressful to deal with
- Easier to train
- More plentiful
- Easier to recruit
Engineers And Data
- Data is a resource to move and manipulate
- Focus is on building and maintaining processes that do that
- Data is a “commodity” that flows through the system. The focus is on the system.
Science and Scientists
- Start with reality - derive new insights
- Uncertainty is your job
- “Projects” and “processes” are anathema, and people who manage them don't help
- Explore and Interrogate Data
- No two jobs are the same
- No job can be specified too tightly
- Findings are inherently uncertain, otherwise why bother?
Scientists and Data
- Focused on The Data. Tools help but don't feature.
- Data is complex, an undiscovered country to explore.
- Data is not a commodity: it is complex, ever-changing and information rich
Scientists and The CEO
Data is “The Last Frontier”, where dangers lurk and opportunities abound. The scientist is the guide.
Objective is to Tell the Story of the Data, to someone who cares and matters (ideally CEO), preferably as part of an ongoing conversation
Science and Engineering
- Scientists help you identify new risks and opportunities; they provide transformational insights.
- Engineers make transformations tangible
- Scientists explore
- Engineers deliver and maintain
- The personality types are actually quite different
Science and Engineering
- There is a lot of crossover
- It is good to be skilled in both
- Many of the tools used are the same
- The distinction is not obvious to most outsiders
- The distinction is crucial
Why the Confusion?
- It's all “technical”, apparently
- It has the word “data” in it.
- Some vendors like it that way.
- Much of management likes it that way.
- Much of management is out of its depth
- And almost all of HR and recruiting.
Science and Engineering
- Real Business Needs Both
- Pretend Business only needs Engineering (and maybe not even that)
- Science is crucial for real competition and risk
- Science is irrelevant otherwise
- Engineering is Delivery
- Science is Intelligence
The Intelligence Function – Where Data Science Should Sit in the Business?
- Absent in most “enterprises”
- Present informally in most real businesses
- A strategic, secret asset not to be bragged about or shared
- “Data” is not just structured, electronic, concrete or even conscious
The Intelligence Function
- Strategic, secret role
- Trusted, discreet, low-key advisor, mentor, guide
- A mix of Mr Spock, James Bond and Steve Jobs
- Many guises, many names
- Well understood by militaries at war, and organisations with real challenges, risks and uncertainty
- Often next in line for CEO
The Intelligence Function – Where Data Science Should Sit in the Business
- Not IT
- Not Operations
- Right near the CEO
- Reporting directly, discreetly, interactively
- Not managed by PRINCE2, waterfall or any other “project management” or “Business Analysis” methods
- Lean Startup, real Agile (see Manifesto) and OODA loop much more like it
Data Science and Analytics Today
Insights or Process ? Tools or Outcomes ? Transformation or BAU ? Value or Compliance ? Asset or Vanity ? Engaged or Disengaged ? Measured ?
Insights vs Process
Insights CANNOT be the same each time. But much of “Analytics” can.
Deriving value from predictive targeting is a repeatable, mechanical process.
Deriving value from insights derived from the same model is not.
Insights vs Process
- Only one requires a scientist.
- Only one is valued by businesses that don't have real competitive, environmental and other change pressures.
Data Science and Analytics Today
Insights or Process ? Tools ? Transformation or BAU ? Value or Compliance ? Asset or Vanity ? Engaged or Disengaged ? Measured ?
Tools and Trinkets
- Is “Hadoop” really the most important thing on a “data scientist's” resume?
- Why or why not?
- What is missing?
Data Science and Analytics Today
Insights or Process ? Tools or Science ? Transformation or BAU ? Value or Compliance ? Asset or Vanity ? Engaged or Disengaged ? Measured ?
Data Science and Analytics Today
Insights or Process ? Tools or Science ? Transformation or BAU ? Value or Compliance ? Vital Asset or Vanity ? Engaged or Disengaged ? Measured ?
Value, Compliance or Vanity ?
- What would happen to the business if the analytics/data science/data mining function disappeared overnight?
- Who would care? Why?
- Why does the function exist in the business in the first place?
- Science does not serve vanity well, and is not necessary for compliance.
Data Science and Analytics Today
Insights or Process ? Tools or Science ? Transformation or BAU ? Value or Compliance ? Vital Asset or Vanity ? Leadership Engaged or Disengaged ? Measured ?
Engagement in Parables
- Is investing in data analytics like investing in stocks, or like investing in an education (or gym membership)?
- If analytics were a taxi, would the CEO see the analytics function as car mechanics, drivers or tour guides? Does he know? Does he care?
Engagement in Extremes
- Analytics in a hedge fund
- Analytics in a bank
- Basel II compliance analytics in a bank
- What are the KPIs?
- Does the CEO personally care about them?
- Can the organisation do without the analytics function?
- Can the organisation afford the CEO ignoring the analytics function?
Measurement
- How many predictive analytics functions in banking, telco, insurance etc. are measured explicitly on improvement in predictive accuracy, with the CEO keeping an eye on this (retention, acquisition, risk, pricing models)?
- How many know/care about the predictive accuracy of their competitors?
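One way to turn "improvement in predictive accuracy" into an explicit, reportable number is sketched below. This is only an illustration under stated assumptions: the choice of AUC as the accuracy measure, the function names `auc` and `accuracy_improvement`, and the toy labels and scores are all hypothetical and do not come from the presentation.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    For every (positive, negative) pair, count 1 if the positive case
    received the higher score and 0.5 on a tie, then average.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_improvement(labels, old_scores, new_scores):
    """Compare two models scored on the same holdout outcomes."""
    old = auc(labels, old_scores)
    new = auc(labels, new_scores)
    return {"old_auc": old, "new_auc": new, "uplift": new - old}


if __name__ == "__main__":
    # Made-up holdout data: 1 = customer churned, 0 = customer stayed.
    labels = [1, 0, 1, 0]
    last_quarter_model = [0.6, 0.7, 0.8, 0.2]  # hypothetical scores
    this_quarter_model = [0.9, 0.3, 0.8, 0.2]  # hypothetical scores
    print(accuracy_improvement(labels, last_quarter_model, this_quarter_model))
```

The same comparison could be repeated for each model the business depends on (retention, acquisition, risk, pricing) on a fixed holdout set each reporting period, which gives leadership a single uplift figure to track.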
Finding Training and Managing Data Scientists
Not Easy
Finding Data Scientists
- Data Scientists are part engineer, part entrepreneur and part hunter/gatherer: outcome-focused explorers!
- ADHD is an asset; the personality profile is not typical corporate
- Communication skills and lateral thinking are as important as technical skill
- Technical skills are DEEEEP, eclectic
Finding Data Scientists
- Most recruiters are severely out of their depth
- Ditto most HR
- The best people are un-/under-/mis-employed!
- It takes one to know one
Training Data Scientists
- Eclectic skill set
- Hard Skills
  - Stats/Machine Learning/Computing/Psychology
  - Domain expertise
- Many “soft skills”
  - Conceptual
  - Communication
  - Science!
  - Agile/Lean Startup/Cynefin/OODA
Training Data Scientists
- Experience is crucial
- Mistakes are valuable
- Apprenticeship is Key!
- Courses help, but are not a substitute. They won't teach the soft skills and conceptual outlook
Managing Data Scientists
Yes: Real Agile, Lean Startup, Cynefin, OODA loop
No: PRINCE2, Project Management, “Business Analysis”, Operational Management, the IT function.
Yes: someone who is engaged, empowered, interested.
No: Just about everyone actually doing this out there...
So Who Needs Data Scientists?
Businesses facing real competition, real threats, real uncertainty and real change.
Who Doesn't Really Need Data Scientists ?
Everyone Else.