8/9/2019 Bd Day 1 1425. Bigdataequalsbigmaths
Laurence Liew
General Manager, APAC
Big Data = Big Maths
Global Industries Served
- Financial Services
- Digital Media
- Government
- Health & Life Sciences
- High Tech
- Manufacturing
- Retail
- Telco
Our Software Delivers
- Power: Distributed, scalable, high-performance advanced analytics
- Productivity: Easier to build and deploy analytic applications
- Enterprise Readiness: Multi-platform
Our Philosophy
- Customer-centric innovation
- Easy to do business with
Who we are
- Leading provider of a commercial analytics platform based on the open source R statistical computing language
- Customers: 200+ Global 2000
- Global Presence: North America / EMEA / APAC
Our Services Deliver
- Knowledge: Our experts enable you to be experts
- Time-to-Value: Our QuickStart projects give you a jumpstart
- Guidance: Our customer support team is here to help you
200 Corporate Customers and Growing
- Consumer & Information Services
- Finance & Insurance
- Healthcare & Life Sciences
- Manufacturing & Tech
- Academic & Government
[Customer logos, including Cell C, EDF Trading, Astellas, Aberdeen Asset Management, AllianceBernstein, Procter & Gamble, Aegon and ValueClick Media]
Centre of Excellence (COE)
- Partner with iLEs to create new IP in big data analytics in Singapore
- Big data analytics training/workshops
- We will have our data scientists and developers work alongside our collaboration partners.
Centre of Attachment (COA)
To accelerate the formation of a data science team within the organization:
- Analytics/statistics skills
- Big data infrastructure skills, such as Hadoop and HPC clusters
Why Big Data Now?
THE PERFECT STORM
CONVERGENCE OF …
Backdrop - Massive Data Volumes
[Figure: increasing volume, variety and velocity of data, from gigabytes through terabytes and petabytes to exabytes. Sources shown include ERP, cost records, summary operating statistics, vehicle monitoring, incidents, alarms, system logs, text volumes, instructions, work orders, reports, video and imagery, machine sensors, real-time telemetry, 3D/4D seismic, communication logs, geospatial (ESRI), logistics and daily activity reports.]
What's big data?
Volume, Variety, Velocity
Next Generation Big Data Analytics Players
INFRASTRUCTURE AND D…
ANALYTICS
? ? ?
HDD -> SSD -> In-Memory
What is R (Video)
http://www.youtube.com/watch?feature=player_embedded&v=TR2bHSJ_eck
R = Language + Analytics
- Statistical data analysis programming language
- Huge library of algorithms for data access, analysis & graphics
Data Analytics Workflow
INGEST -> DISTILL & ANALYZE -> CONSUME
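The three stages can be illustrated with a minimal Python sketch. Only the stage names come from the slide. The sample records, the field names `region` and `amount`, and the functions `ingest`, `distill_and_analyze` and `consume` are hypothetical, chosen only to show data passing from one stage to the next.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input standing in for an ingested source (file, log or feed).
RAW = "region,amount\nnorth,120.5\nsouth,80.0\nnorth,\neast,42.25\nsouth,19.5\n"

def ingest(text):
    """INGEST: parse raw text into records (dicts of strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def distill_and_analyze(records):
    """DISTILL & ANALYZE: drop incomplete rows, then total the amount per region."""
    totals = defaultdict(float)
    for rec in records:
        if not rec["amount"]:  # distill: skip rows with a missing amount
            continue
        totals[rec["region"]] += float(rec["amount"])
    return dict(totals)

def consume(totals):
    """CONSUME: render the results for a report or downstream application."""
    return [f"{region}: {total:.2f}" for region, total in sorted(totals.items())]

report = consume(distill_and_analyze(ingest(RAW)))
for line in report:
    print(line)
```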
R is open source and drives analytic innovation, but... has some limitations for enterprises
Dimension | Open source R | Revolution R Enterprise
Big Data | In-memory bound | Disk-based scalability
Speed of Analysis | Single-threaded | Parallel threading
Enterprise Readiness | Community support | Commercial support
Analytic Breadth & Depth | 4500+ innovative analytic packages | Leverage open source packages plus Big Data-ready packages
Commercial Viability | Risk of deploying open source | Commercial license
Big Data Speed @ Scale with Revolution R Enterprise
- Fast Math Libraries
- Parallelized Algorithms
- In-Database Execution
- Multi-Threaded Execution
- Multi-Core Processing
- In-Hadoop Execution
- Memory Management
- Parallelized User Code
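Disk-based scalability and memory management rest on processing data in chunks, so that only one chunk is resident in memory at a time. This contrasts with open-source R, which is in-memory bound. The Python sketch below illustrates that general external-memory pattern; it is not a description of ScaleR's internals. Two things are assumptions: the chunk size, and the in-memory text stream that stands in for a large file on disk. Each chunk is reduced to a small summary (count, mean, sum of squared deviations), and the summaries are merged with the standard pairwise update of Chan et al. Because the per-chunk summaries are independent, they could equally be computed on separate cores or nodes and merged afterwards.

```python
import io
from itertools import islice

def read_chunks(stream, chunk_size):
    """Yield lists of floats, at most chunk_size values at a time."""
    while True:
        lines = list(islice(stream, chunk_size))
        if not lines:
            return
        yield [float(line) for line in lines]

def summarize(chunk):
    """Per-chunk summary: (count, mean, sum of squared deviations)."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2

def combine(a, b):
    """Merge two chunk summaries (pairwise update of Chan et al.)."""
    na, mean_a, m2a = a
    nb, mean_b, m2b = b
    n = na + nb
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2

# Stand-in for a large file on disk: one value per line.
stream = io.StringIO("\n".join(str(v) for v in range(1, 11)))

total = None
for chunk in read_chunks(stream, chunk_size=3):  # at most 3 values in memory at once
    s = summarize(chunk)
    total = s if total is None else combine(total, s)

n, mean, m2 = total
variance = m2 / (n - 1)  # sample variance
print(n, mean, variance)
```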
Revolution R Enterprise ScaleR: Performance and Capacity
SAS HPA Benchmarking Comparison*: Logistic Regression
Metric | SAS HPA | Revolution R | Revolution R vs SAS
Rows of data | 1 billion | 1 billion |
Parameters | "just a few" | 7 | Double
Time | 80 seconds | 44 seconds | 45% less
Data location | In memory | On disk |
Nodes | 32 | 5 | 1/6th
Cores | 384 | 20 | 5%
RAM | 1,536 GB | 80 GB | 5%
Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM and a 6th as many nodes, and without pre-loading data into RAM.
*As published by SAS in HPC Wire, April 21, 2011
Revolution R Enterprise Delivers Performance at 2% of the Cost
SAS HPA: 32-node appliance, ~$2.5M
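The relative figures in this comparison follow directly from the published numbers. The check below uses only values from the SAS HPA vs Revolution R table; the dictionary and variable names are illustrative.

```python
# Figures from the SAS HPA vs Revolution R logistic regression comparison.
sas = {"time_s": 80, "nodes": 32, "cores": 384, "ram_gb": 1536}
revo = {"time_s": 44, "nodes": 5, "cores": 20, "ram_gb": 80}

# Fraction of each SAS figure that Revolution R used.
ratios = {key: revo[key] / sas[key] for key in sas}

time_saved = 1 - ratios["time_s"]  # 1 - 44/80 = 0.45, i.e. 45% less time
for key, r in ratios.items():
    print(f"{key}: {r:.3f} of SAS (about 1/{sas[key] / revo[key]:.1f})")
print(f"time saved: {time_saved:.0%}")
```

Cores and RAM both come out at 1/19.2 of the SAS figures, which the slide rounds to "5%" and "a 20th". Nodes come out at 1/6.4, rounded to "1/6th".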
Benchmarks: RevoR vs Legacy Tool
Airline data set: 123,534,969 rows and 29 columns in its original state.
All tests were run on a laptop with 16GB RAM, an SSD and an Intel Core i7-3632QM CPU @ 2.20GHz.
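A back-of-the-envelope size estimate shows why a laptop run on this data set tests on-disk rather than in-memory analytics. The sketch assumes every cell is held as an 8-byte double, as open-source R does for numeric vectors. Real footprints vary with column types and per-object overhead, and reading the laptop's "16GB" as GiB is also an assumption.

```python
ROWS = 123_534_969
COLS = 29
BYTES_PER_DOUBLE = 8  # assumption: every value stored as a 64-bit float

cells = ROWS * COLS
size_bytes = cells * BYTES_PER_DOUBLE
laptop_ram_bytes = 16 * 2**30  # the 16GB test laptop, read as GiB

print(f"{cells:,} cells -> {size_bytes / 1e9:.1f} GB as doubles")
print("fits in laptop RAM:", size_bytes < laptop_ram_bytes)
```

Even at 4 bytes per value (R's integer type), the table would need about 14.3 GB, nearly all of the laptop's memory.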
Allstate compares SAS and R for Big Data insurance models
150 million observations and 70 degrees of freedom.
"It's difficult to be productive on a tight schedule if it takes over 5 hours to fit one candidate model."
Approach | Platform | Time
1: SAS | 16-core Sun Server | 5 hours
2: R | 250 GB Server | I…
3: RRE | 5-node (4 cores/node) LSF cluster | 5…
So what have we learned? SAS works, but is slow. The data is too big for open-source R, even on a very large server. Revolution R Enterprise gets the same results as SAS, but about 50x faster.
Write Once. Deploy Anywhere.
DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
Components: DistributedR, ScaleR, ConnectR, DeployR
- In the Cloud: CloudR
- Workstations & Servers: Desktop, Server (Linux)
- Clustered Systems: Linux, HP…, Windows
- EDW: Teradata
- Hadoop: Hortonworks, Cloudera
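"Write once, deploy anywhere" means the analysis code is written against an abstract execution backend, and only the backend changes between a desktop, a cluster or Hadoop. In RevoScaleR this idea surfaces as compute contexts, set with `rxSetComputeContext`. The Python sketch below is only an analogy, not RRE's API. The same hypothetical `column_total` function runs unchanged on a hypothetical serial executor and on a thread pool from the standard `concurrent.futures` module.

```python
from concurrent.futures import Executor, ThreadPoolExecutor

class SerialExecutor(Executor):
    """Hypothetical 'laptop' backend: runs each task in the calling thread."""
    def map(self, fn, *iterables, timeout=None, chunksize=1):
        return map(fn, *iterables)

def column_total(chunks, executor):
    """Analysis written once: sum each chunk via the executor, then combine."""
    partials = executor.map(sum, chunks)
    return sum(partials)

chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

serial_result = column_total(chunks, SerialExecutor())
with ThreadPoolExecutor(max_workers=3) as pool:  # stand-in for a parallel backend
    parallel_result = column_total(chunks, pool)

print(serial_result, parallel_result)
```

Only the executor argument changes between the two runs; the analysis function itself is untouched.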
From Laptops, Workstations and Servers
To High Performance Compute Clusters
Frontend
- 2-way or 4-way …
- Cluster Ma…
- Fast HDD
- Lots of RAM
Compute Nodes
- 2-way …
- Compu…
- Fast CPU
- Fast HDD
- Lots of RAM
Storage Node
- 2-way or NAS
- External SCSI or
- External SAN
- Fast HDD
- Cluster FS
Supercomputing Network
- High bandwidth (>250MB/s)
- Low latency (1.2-8 µs)
- Cost-effective: GE
- Performance:
- Infiniband
- 1GE or 10GE
- NumaConnect
Admin Network
- Good B…
- Route a…
- Typical…
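Bandwidth and latency matter in different regimes. A common first-order model, the latency-bandwidth (alpha-beta) model, estimates the time to send n bytes as latency + n / bandwidth. The sketch plugs in the slide's figures (250 MB/s; 1.2 µs and 8 µs latency) purely as illustrative inputs. Real interconnects add protocol and software overheads that this model ignores.

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate: fixed start-up latency plus size / bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s

BANDWIDTH = 250e6  # the slide's >250 MB/s, taken as 250e6 bytes/s
for latency in (1.2e-6, 8e-6):  # the slide's 1.2 us to 8 us latency range
    small = transfer_time(1_000, latency, BANDWIDTH)          # 1 KB message
    large = transfer_time(1_000_000_000, latency, BANDWIDTH)  # 1 GB message
    share = latency / small
    print(f"latency {latency * 1e6:.1f} us: 1 KB takes {small * 1e6:.1f} us "
          f"({share:.0%} of it latency); 1 GB takes {large:.2f} s")
```

Latency dominates for small messages, and bandwidth dominates for bulk transfers, which is why a cluster network is specified on both.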
To Hadoop-scale on-disk analytics
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers.
1 node (12TB) -> 10 nodes -> 100 nodes
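Hadoop's MapReduce model has three steps:

- A map phase runs in parallel on the nodes that hold each block of data.
- A shuffle groups the intermediate results by key.
- A reduce phase aggregates each group.

The classic word-count example is sketched below in plain Python on a single machine. No Hadoop is involved; the sketch only mirrors the data flow, and the input blocks are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(block):
    """Map: emit (word, 1) for every word in one input block."""
    return [(word.lower(), 1) for word in block.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values independently."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input split into blocks, as HDFS splits a large file.
blocks = ["big data equals big maths", "R on Hadoop", "big analytics on big data"]

mapped = chain.from_iterable(map_phase(b) for b in blocks)  # one map task per block
counts = reduce_phase(shuffle(mapped))
print(sorted(counts.items()))
```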
To in-database
… clusters
"We first became interested in shared memory … the programming paradigm. I think we ar… lot more people looking at this type of en…"
- William W. Thigpen, Chief, Engineering Branch, NASA Advan…
SMP S…
- 8 nodes
- 16 CPUs
- 256 cores, 1TB RAM
- One L…
To Cloud
Write Once. Deploy Anywhere.
Hadoop + R
Hadoop
Dell PowerEdge Servers
Linear Regression with RevoR on a Hadoop Cluster!
Total: ~2 lines of R code, a 50x productivity gain
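A regression can be fit in a few lines over data spread across a cluster if each node computes small sufficient statistics for its own chunk and only those are combined. The sketch below shows this for simple linear regression, y = a + b·x, in plain Python. It illustrates the general technique under that assumption; it is not Revolution R's `rxLinMod` implementation. The chunked data are made up, generated exactly from y = 2 + 3x.

```python
def chunk_stats(xs, ys):
    """Per-chunk sufficient statistics: n, sum x, sum y, sum x^2, sum xy."""
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(x * y for x, y in zip(xs, ys)))

def combine(stats):
    """Add the statistics from all chunks element-wise."""
    return tuple(sum(parts) for parts in zip(*stats))

def fit(n, sx, sy, sxx, sxy):
    """Solve the normal equations for intercept a and slope b."""
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Made-up data split across three "nodes"; generated from y = 2 + 3x.
chunks = [([0, 1, 2], [2, 5, 8]), ([3, 4], [11, 14]), ([5, 6, 7], [17, 20, 23])]

per_node = [chunk_stats(xs, ys) for xs, ys in chunks]  # would run where each chunk lives
a, b = fit(*combine(per_node))
print(f"a = {a:.3f}, b = {b:.3f}")
```

Only five numbers per chunk travel over the network, however many rows each node holds.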
RevoR with Hadoop
Complex & B
Big Analytics on Big Data in Hadoop
100% R on Hadoop
- Full Skill Transfer - … needed.
- Use 4500+ CRAN Packages
- Blend/Combine R & Other Methods
100% Portability
- Build Once, Deploy Many
- Track Evolution of Hadoop
- Protect Against Platform Uncertainty
- Avoid Platform Lock-in
Hadoop Performance & Scale
- Leverage Hadoop Parallelism Easily
- Analyze Data Without M…
[Diagram: Data, Analytics and Applications layers over Hadoop + scalable compute and parallel storage (HDFS, HBase, Hive), labelled "100% R.", "Portability." and "Big Data Scale".]
RRE V7 inside Hadoop
[Diagram components: analytics applications; an edge node; Hadoop with MapReduce, other MapReduce jobs, HDFS and HBase; two instances of the Revolution R Enterprise stack (DistributedR framework, ScaleR algorithms, ConnectR: HBase, HDFS, ODBC & high-speed connectors); DeployR; DB/EDW; M2M; applications, analytics and data.]
So how do I start?
www.bigdatastarterkit.com
www.bigdataconsume…
Q & A
Revolution Analytics is the leading commercial provider of software and support for the popular open source R statistics language.
E: Laurence.liew@revolutionanalytics.com
W: www.revolutionanalytics.com