Rise of Data Science in Age of Big Data

Date post: 04-Apr-2018

Transcript
  • 7/30/2019 Rise of Data Science in Age of Big Data

    1/38

    Revolution Confidential

    The Rise of Data Science in the Age of Big Data Analytics

    Why Data Distillation and Machine Learning Aren't Enough

    David M Smith

    VP Marketing and Community, Revolution Analytics

  • 2/38

    Today, we'll discuss:

    What is Data Science?

    Why machine learning isn't enough

    Why Data Science works

    The Data Scientist's Toolkit

    The Future of Big Data Analytics

    Closing thoughts and resources

  • 3/38

    Image: Dov Harrington, CC BY 2.0, http://www.flickr.com/photos/idovermani/4110546683/

  • 4/38

    Where is it safe to fish near San Francisco?

    San Francisco Estuary Institute: http://www.sfei.org/tools/wqt

  • 5/38

    Hurricane Sandy

    Bob Rudis: http://rud.is/b/2012/10/28/watch-sandy-in-r-including-forecast-cone/

  • 6/38

    Hurricane Sandy

    Ed Chen: http://blog.echen.me/hurricane-sandy-outages/

  • 7/38

    When did Michael Jackson have his biggest hits?

    New York Times, June 25 2009 (3 hours after Michael Jackson's death): http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html

  • 8/38

    Three Essential Skills of Data Scientists

    Drew Conway: http://www.dataists.com/2010/09/the-data-science-venn-diagram/

    [Venn diagram labels]

    Data Integration, Mashups, Applications

    Models, Visualization, Predictions, Uncertainty

    Problems, Data Sources, Credibility

    Effective Data Applications

  • 9/38

    Image: Abode of Chaos, CC BY 2.0, http://www.flickr.com/photos/home_of_chaos/6418989233/

  • 10/38

    Machine learning (ML) for predictions

    [Diagram, three panels]

    Building the Model: Features, Responses, ML, scoring rules

    Validating the Model: Validation set, scoring rules, Predictions, Response, Accuracy

    Scoring new data: New Data, scoring rules, Predictions (scores)
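The three panels on this slide (build, validate, score) can be sketched in a few lines of Python. This is a toy illustration only: the data, the function names, and the choice of a single-threshold "scoring rule" are all assumptions made for the example, not anything taken from the talk.

```python
# Toy sketch of the build / validate / score workflow.
# All data and names are hypothetical; the "model" is a single threshold.

def build_model(features, responses):
    """Learn a scoring rule: the midpoint between the two class means.

    Assumes class 1 tends to have larger feature values than class 0.
    """
    pos = [x for x, y in zip(features, responses) if y == 1]
    neg = [x for x, y in zip(features, responses) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def score(rule, features):
    """Apply the scoring rule to get predictions (scores)."""
    return [1 if x > rule else 0 for x in features]

def accuracy(predictions, responses):
    """Fraction of predictions that match the known responses."""
    return sum(p == y for p, y in zip(predictions, responses)) / len(responses)

# Building the model: features + responses -> scoring rule
train_x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
train_y = [0, 0, 0, 1, 1, 1]
rule = build_model(train_x, train_y)

# Validating the model: held-out data with known responses -> accuracy
valid_x = [2.5, 6.5, 4.0, 8.5]
valid_y = [0, 1, 0, 1]
valid_accuracy = accuracy(score(rule, valid_x), valid_y)

# Scoring new data: no responses available, only predictions
new_scores = score(rule, [0.5, 9.5])
```

The point of the separation is that the same scoring rule is reused unchanged in the validation and scoring steps; only the model-building step ever sees the training responses.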

  • 11/38

    Problem: A lack of perspective

    Image: 2010 David M Smith. Some rights reserved, CC BY 2.0

  • 12/38

    Problem: Lack of credibility

  • 13/38

    Problem: Complexity

  • 14/38

    Data Science to the Rescue!

  • 15/38

    Answer Unasked Questions

    Revolutions blog, The Uncanny Valley of Big Data: http://blog.revolutionanalytics.com/2012/02/the-uncanny-valley-of-big-data.html

  • 16/38

    Fill in knowledge gaps

    "More data beats better algorithms, every time" -- Google

    "Companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue." -- Tim O'Reilly

    Google Research, The Unreasonable Effectiveness of Data: http://googleresearch.blogspot.com/2009/03/unreasonable-effectiveness-of-data.html

    Tim O'Reilly on Google+: https://plus.google.com/107033731246200681024/posts/4Xa76AtxYwd

    TechnoCalifornia: http://technocalifornia.blogspot.com/2012/07/more-data-or-better-models.html

  • 17/38

    Avoid ineffective reactions

    Stupid Data Miner Tricks: http://nerdsonwallstreet.typepad.com/my_weblog/files/dataminejune_2000.pdf

    [Chart label: S&P 500]

  • 18/38

    Image: Henricks Photos, CC BY-ND 2.0, http://www.flickr.com/photos/hendricksphotos/3240667626/

  • 19/38

    0. Data (Big & Messy)

  • 20/38

    1. A language for programming with data

    Download the White Paper "R is Hot": bit.ly/r-is-hot

    http://info.revolutionanalytics.com/R-is-Hot-Whitepaper.html
  • 21/38

    Grant awards to homeless veterans FY09

    Data: Data.gov, http://explore.data.gov/National-Security-and-Veterans-Affairs/VA-Homeless-Grant-and-Per-Diem-FY09/2uzu-vjia

    Analysis: Drew Conway, http://www.drewconway.com/zia/?p=2486

    User-defined functions

    Internet API interface

    XML parsing

    Custom graphics

    Data import and pre-processing

    Iterative data processing
  • 22/38

    2. Speed. Lots and lots of speed.

    [Diagram labels: Data; Sampling; Aggregation; Variable Transformation; Feature Selection; Model Estimation; Model Refinement; Model Comparison / Benchmarking; Predictions]

  • 23/38

    Use all available computing cycles

    [Diagram labels: Multicore Processor (4, 8, 16+ cores); Core 0 (Thread 0); Core 1 (Thread 1); Core 2 (Thread 2); Core n (Thread n); Shared Memory; Data; Disk]

  • 24/38

    3. Algorithms that don't choke on Big Data

    PEMAs: Parallel External-Memory Algorithms

    [Diagram labels: Big Data; one Master Node; four Compute Nodes; four Data Partitions]
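A rough sketch of the idea behind a parallel external-memory algorithm, assuming the usual pattern: summarize each chunk of data independently, then combine the small per-chunk summaries. The function names and the in-memory list standing in for a file on disk are hypothetical, and the chunks are processed sequentially here; because each `summarize` call depends only on its own chunk, the calls could in principle be handed to separate cores or compute nodes.

```python
from functools import reduce

def read_chunks(values, chunk_size):
    """Stand-in for reading a large file from disk one block at a time."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]

def summarize(chunk):
    """Per-chunk step: only this chunk has to be in memory."""
    return (len(chunk), sum(chunk), sum(v * v for v in chunk))

def combine(left, right):
    """Merge two partial summaries; the order of merging does not matter."""
    return tuple(a + b for a, b in zip(left, right))

def chunked_mean_variance(values, chunk_size=4):
    """Mean and population variance from per-chunk summaries."""
    partials = map(summarize, read_chunks(values, chunk_size))
    n, total, total_sq = reduce(combine, partials, (0, 0, 0))
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance

mean, variance = chunked_mean_variance(list(range(1, 11)), chunk_size=4)
```

The sum-of-squares formula keeps the example short but can lose precision on real data; a production implementation would likely use a more numerically stable merge.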

  • 25/38

    Drink less coffee!

    [Chart labels: Single-Threaded, Non-optimized algorithms; Optimized, Parallelized Algorithms]

  • 26/38

    4. Move code to data (not vice versa)

    Map-Reduce

    RHadoop: http://bit.ly/RHadoop
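To make "move code to data" concrete, here is a minimal map-reduce word count in plain Python. It is not the RHadoop API, and the partitions and function names are invented for illustration: each list stands in for data held on a separate node, the map function is what would be shipped to that node, and only the small per-partition counts travel back to be reduced.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each list stands in for data stored on one node.
partitions = [
    ["big data", "data science"],
    ["data apps", "big models"],
]

def map_partition(lines):
    """Runs where the data lives; returns a small per-partition summary."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(left, right):
    """Merges two partial summaries into one."""
    return left + right

# Only the Counters move between "nodes"; the raw lines stay put.
word_counts = reduce(reduce_counts, map(map_partition, partitions), Counter())
```

The design choice being illustrated is that the summaries are far smaller than the raw data, so shipping the function to each partition costs less than shipping every partition to one machine.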

  • 27/38

    Big Data Appliances

    More info: http://bit.ly/R-Netezza
  • 28/38

    Play Nice with Others

    Presentation Layer: Business Intelligence Tools; Web-based data apps; Reporting / Spreadsheets

    Analytics Layer: R

    Data Layer: Relational datastores; Unstructured datastores

  • 29/38

    What every data scientist needs

    [Table columns: Open-Source R | Revolution R Enterprise]

    Interface with multiple data sources

    Exploratory data analysis

    Wide range of statistical methods

    High-speed computation

    Big Data support

    Data/code locality (Hadoop, etc.)

    Print-quality data visualization

    Scheduled batch production

    Works in a multi-tool ecosystem

    Integration into Data Apps


  • 30/38

    Revolution R Enterprise: Big-Data R

    [Table columns: Open-Source R | Revolution R Enterprise]

    Interface with multiple data sources

    Exploratory data analysis

    Wide range of statistical methods

    High-speed computation

    Big Data support

    Data/code locality (Hadoop, etc.)

    Print-quality data visualization

    Scheduled batch production

    Works in a multi-tool ecosystem

    Integration into Data Apps

    www.revolutionanalytics.com/products

  • 31/38

    Image: www.tinyplanetphotography.com

  • 32/38

    And the future?

    Even more data

    Cloud computing

    Demand for Data Scientists

    Diverging paradigms for data analytics

    http://www.indeed.com/jobtrends

  • 33/38

    Diverging data paradigms

    [Diagram labels: Hadoop; NoSQL; Files; Clusters; Data Appliances]

    More data, better fault tolerance

    Easier programming, better performance

    [Diagram labels: Exploration; Modeling; Storage; Preprocessing; Production]

  • 34/38

    Data Science in Production

    Real-time Big Data Analytics: From Deployment to Production

    Thursday, November 29, 2012

    10:00AM - 11:00AM Pacific Time

    www.revolutionanalytics.com/news-events/free-webinars/

  • 35/38

    Building Data Science Teams

    DJ Patil in O'Reilly Radar: http://oreil.ly/I3H5fI

    Statistics and Data Science graduates

    Kaggle and Chorus

    Revolution Analytics R Training: http://www.revolutionanalytics.com/services/training/
  • 36/38

    Closing Thoughts

    Data Science process leads to more powerful, and more useful models

    Data Scientists need a technology platform to think about, explore, and model data

    Revolution R Enterprise is R for Big Data

  • 37/38

    Resources

    Revolution R Enterprise: R for Big Data, www.revolutionanalytics.com/products

    RHadoop: Connecting R and Hadoop, bit.ly/r-hadoop

    Contact David Smith

    [email protected]

    @revodavid

    blog.revolutionanalytics.com
  • 38/38

    Thank you.

    www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR

    The leading commercial provider of software and support for the popular open source R statistics language.
