The profile of the management (data) scientist: Potential scenarios and skills for B/SMD-based
Management research
Juan Mateos-Garcia, Nesta P&R
NEMODE PDW
BAM Conference 9-11 September, 2014
Organisational + personal context
• Nesta: The UK’s innovation foundation, with a mission to help people and organisations bring great ideas to life.
• Doing research on data skills for BIS data capability strategy in partnership with RSS and Creative Skillset
• Doing some ‘big’ data work myself
• I used to do management research (CENTRIM).
Draw on all this to reflect on the implications of big data for management research, focusing on skills.
1. Definitions
More online activity, digital processes, better hardware.
• Larger volumes of data
• More varieties of data
• Generated at faster velocities
New applications: Data-driven (automated, personalised) products, processes and services. New formats for data communication.
More complexity
New opportunities for researchers
• Coverage: Large samples
• Revelation: Make the invisible visible, reveal preferences, run experiments.
• Granularity: High level of resolution (temporal + dimensional).
• Cheap! £££
3. MOR examples
I looked at abstracts of 103 papers in the last three issues of [1] AOMJ, [2] BJM, [3] Management Science. No ‘big data’ papers in [1] and [2]. 11 in MS (8 in a ‘Business Analytics’ special issue).
• Aral + Walker. Data source: Facebook (proprietary). Topic: Use RCTs to study social influence. Large samples and high levels of granularity allow them to consider how social influence interacts with tie embeddedness and tie strength.
• Bao + Datta. Data source: SEC (open). Topic: Use unsupervised learning to identify and quantify risk types in ~14,000 annual reports, benchmark them against other methods for classification, and develop an interactive platform to explore the findings.
• Ghose + Han. Data source: App Store + Google Play (open). Topic: Scrape App Store and Google Play data to create a sales panel they use to estimate consumer demand and how it is affected by app features, including pricing model.
• Tambe. Data source: LinkedIn (proprietary). Topic: Quantify business big data capabilities and measure inter-company recruitment networks to estimate inter-company skill investment spillovers.
Technical skills required, or the profile of the management data scientist
Data pipeline: Access data -> Model data -> Present findings
Access data
• Get data: Web scraping/API programming skills
• Run experiments: Experimental designs
• Manage and process the data: Database management
• Clean the data: ‘wrangling’ (and patience).
Model data
• Initial visualisation: Exploratory data analysis
• Dimension reduction: Cluster analysis, PCA.
• Model selection, estimation, evaluation: Econometrics/statistics/machine learning
Present findings
• Display findings visually + interactively: Data visualisation
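To make the ‘wrangling’ step concrete, here is a minimal sketch in Python (standard library only). The records, field names and helper functions (`RAW`, `parse_price`, `clean`) are hypothetical and invented for illustration; they do not come from any of the papers mentioned in this deck. It shows the kind of work that typically sits between getting data and modelling it: normalising formats, dropping duplicates, and marking missing values explicitly.

```python
import csv
import io

# Hypothetical raw export (illustrative values only): inconsistent price
# formats, one duplicated row, and one missing rating.
RAW = """app_id,price,rating
a1,£0.99,4.5
a2,free,
a1,£0.99,4.5
a3, 1.49 ,3.0
"""


def parse_price(text):
    """Convert a messy price string to a float in pounds (None if missing)."""
    cleaned = text.strip().lower().replace("£", "")
    if cleaned == "free":
        return 0.0
    if not cleaned:
        return None
    return float(cleaned)


def clean(raw_csv):
    """Return de-duplicated rows with typed fields; missing ratings become None."""
    seen = set()
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        app_id = record["app_id"].strip()
        if app_id in seen:  # keep only the first observation of each app
            continue
        seen.add(app_id)
        rating_text = (record["rating"] or "").strip()
        rows.append({
            "app_id": app_id,
            "price": parse_price(record["price"]),
            "rating": float(rating_text) if rating_text else None,
        })
    return rows


rows = clean(RAW)
# Paid apps: price is known and strictly positive.
paid = [r for r in rows if r["price"] is not None and r["price"] > 0]
print(len(rows), "apps,", len(paid), "paid")
```

Real data sets need many more rules than this, which is where the patience comes in.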
Challenges (not all technical)
Data pipeline: Access data -> Model data -> Present findings (requires theory and domain knowledge)
Access data
• Obtain proprietary data
• Manage anonymity and ethical issues (including experimental research, cf. Facebook’s infamous RCT).
Model data
• Ask the right questions: “The best dimension reduction tool that there is.”
• Be careful with biases: N = All? Rarely. It is important to understand the (administrative and organisational) processes that generated the data.
Present findings
• Dealing with false positives, which are bound to happen with large samples and multiple tests.
• Encouraging consilience through reproducibility and relating findings to wider bodies of knowledge
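The false-positive problem with multiple tests can be illustrated with a short sketch in Python. The p-values below are made up for illustration, and the two corrections shown (Bonferroni and the Benjamini-Hochberg step-up procedure) are standard options chosen here as examples, not methods prescribed by this deck.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject the hypotheses with the cutoff smallest p-values.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests run on the same data set.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = [p <= 0.05 for p in p_values]  # no correction at all
print("uncorrected:", sum(naive))
print("Bonferroni:", sum(bonferroni(p_values)))
print("Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))
```

With these made-up numbers, both corrections keep fewer ‘significant’ results than the uncorrected threshold does. Which correction is appropriate depends on the research question, which is another reason theory and domain knowledge matter.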
Institutional solutions
• People with technical skills and domain knowledge are rare -> Unicorns.
• Supply push + Demand pull to increase MOR big data capabilities.
• Internal dialogue within the discipline and with other disciplines (Computer Science, Information Systems)
• Acknowledge big data limitations for looking at important issues (power, perceptions, structural change).