Objectives
The information herein is for informational purposes only and represents the opinions and views of Project
Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several
factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this
presentation.
Portions © 2013 Project Botticelli Ltd & entire material © 2012 Microsoft Corp unless noted otherwise.
Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as
already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista
and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other
countries. The information herein is for informational purposes only and represents the current view of
Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond
to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft,
and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the
date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the
information in this presentation. E&OE.
Introduction to BI & Big Data
DAX
MDX
Data Mining
Register on
projectbotticelli.com
Big data, or just complex data?
Data: velocity, variety, complexity, volume
interpreting, preparing
Today’s big data, tomorrow’s little data
Complexity vs. current capabilities
FAA International Flight Service Station, Honolulu, Hawaii, 1964 (Public Domain Image)
Domain: Common big data scenarios
Financial services: Modeling true risk; Threat analysis and fraud detection; Trade surveillance; Credit scoring and analysis
Media & Entertainment: Recommendation engines; Ad targeting; Search quality; Abuse and click fraud detection
Retail: Point of sales transaction analysis; Customer churn analysis; Sentiment analysis
Telecommunications: Customer churn prevention; Network performance optimization; Call Detail Record (CDR) analysis; Network failure prediction
Government: Cyber security (botnets, fraud); Traffic congestion and re-routing; Environmental monitoring; Antisocial monitoring via social media
Healthcare: Genomics research; Cancer research; Health pandemics early detection; Air quality monitoring
Big data + traditional BI = power & simplicity
Big, fast, or complex data
Microsoft HDInsight
SQL Server tabular, multidimensional, relational DW, or PDW
Interaction, exploration, visualisation
Microsoft HDInsight
Apache Hadoop distribution
Developed by Hortonworks & Microsoft
Integrated with Microsoft BI
Hadoop Principles
Practical method for massive parallelisation of analytical data processing
DEMO
Part 1: the job
Hadoop Principles: Data
Hadoop Principles: MapReduce
Hadoop cluster
Buster Cluster, an early research project by Miles Osborne, University of Edinburgh, School of Informatics. Picture used with permission.
http://homepages.inf.ed.ac.uk/miles/
Hadoop cluster
Cloud rent-a-Hadoop-cluster, or:
“Supercomputer for cents”
Windows Azure HDInsight
Processing logic in HDInsight
JS MapReduce Wordcount
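The slide's JavaScript listing is not reproduced in this text. As a stand-in, the word count pattern can be sketched as a single-machine simulation, written here in Python rather than JavaScript. The function names (map_words, shuffle, reduce_counts, word_count) and the sample lines are assumptions made for illustration; this is not the HDInsight API, and on a real cluster the framework performs the shuffle and runs many mappers and reducers in parallel.

```python
import re
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in re.findall(r"[a-z]+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key (done by the framework on a cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: collapse all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


# Hypothetical input, invented for illustration.
sample_lines = ["Big data, or just complex data?", "Big, fast, or complex data"]
counts = word_count(sample_lines)
```

Because each call to map_words sees only one line and each call to reduce_counts sees only one key, both steps can be spread across machines without coordination, which is the property the pattern relies on.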
Pig Latin Example — It’s All Parallel!
… [see http://pig.apache.org/docs/r0.7.0/tutorial.html]
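The Pig Latin script itself is elided above. A Pig Latin script is a dataflow of whole-relation steps such as FILTER, GROUP, FOREACH ... GENERATE and ORDER, which Pig compiles into MapReduce jobs. The shape of such a dataflow can be sketched on in-memory data as follows, in Python rather than Pig Latin. The records and field layout are hypothetical and are not taken from the linked tutorial.

```python
from collections import Counter

# Hypothetical search-log records (user, hour, query), invented for illustration.
records = [
    ("u1", 9, "hadoop"),
    ("u2", 9, ""),
    ("u3", 10, "pig latin"),
    ("u1", 10, "hadoop"),
    ("u4", 11, "hadoop"),
    ("u2", 11, "pig latin"),
]

# FILTER: keep only records with a non-empty query.
non_empty = [record for record in records if record[2]]

# GROUP ... BY query, then FOREACH ... GENERATE COUNT: one total per query.
query_counts = Counter(query for _user, _hour, query in non_empty)

# ORDER ... BY count DESC: most frequent queries first.
ranked = query_counts.most_common()
```

Each step consumes a whole relation and produces a new one, with no step depending on the order in which individual records are processed; that is what lets every stage run in parallel.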
Reusing processing logic — libraries
Collaborative filtering, recommenders, clustering, singular value decomposition, parallel frequent pattern mining, naive Bayes, decision tree
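To make one of the listed techniques concrete, here is a minimal single-machine sketch of collaborative filtering by item co-occurrence, in Python. It is not the API of any Hadoop library; the baskets, the function names (co_occurrence, recommend) and the scoring rule are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical purchase baskets per user, invented for illustration.
baskets = {
    "ann": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "cat": {"lamp", "desk"},
    "dan": {"book", "pen", "desk"},
}


def co_occurrence(baskets):
    """Count, for each ordered item pair, how many baskets contain both items."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    return counts


def recommend(user, baskets, counts, top_n=2):
    """Score items the user lacks by how often they co-occur with items the user has."""
    owned = baskets[user]
    scores = defaultdict(int)
    for item in owned:
        for other, n in counts[item].items():
            if other not in owned:
                scores[other] += n
    # Highest score first; ties broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


counts = co_occurrence(baskets)
suggestions_for_bob = recommend("bob", baskets, counts)
```

Counting co-occurrences is itself a map (emit item pairs per basket) followed by a reduce (sum per pair), which is why this family of algorithms suits a Hadoop cluster.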
DEMO
Part 2: the results
From HDInsight to attractive Microsoft BI
Operationalising Hadoop
Summary
projectbotticelli.com
video, PPTs, articles
rafal.net