Date post: | 25-Jun-2015 |
Upload: | hortonworks |
© Hortonworks Inc. 2013
Quick Housekeeping Rules
• Q&A panel is available if you have any questions during the webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a copy of the slides and recording
Hadoop, R, and Google Chart Tools
Data Visualization for Application Developers
Jeff Markham
Solution Engineer
Agenda
• Introductions
• Use Case Description
• Preparation
• Demo
• Review
• Q & A
Use Case Description
• Visualizing data
• Tools vs. application development
• Choosing the technology
  – Hortonworks Data Platform
  – RHadoop
  – Google Charts
Preparation: Install HDP
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
[Diagram: Hortonworks Data Platform (HDP)]
• Hadoop Core – Distributed storage & processing: HDFS, YARN (in 2.0), WebHDFS, MapReduce
• Platform Services – Enterprise readiness: HA, DR, snapshots, security, …
• Data Services – Store, process and access data: HCatalog, Hive, Pig, HBase, Sqoop, Flume
• Operational Services – Manage & operate at scale: Oozie, Ambari
• Deployment options: OS, cloud, VM, appliance
Preparation: Install R
• Install R language
• Install appropriate packages
  – rhdfs
  – rmr2
  – googleVis
  – shiny
  – Dependencies for all above
Preparation
• rmr2 – Functions to allow for MapReduce in R apps
• rhdfs – Functions allowing HDFS access in R apps
• googleVis – Use of Google Chart Tools in R apps
• shiny – Interactive web apps for R developers
Demo Walkthrough
Using Hadoop, R, and Google Chart Tools
Visualization Use Case
• Data from CDC
  – Vital statistics publicly available data
  – 2010 US birth data file
Sample record:
    S 201001 7 2 2 30105 2 011 06 1 123 3405 1 06 01 2 2 0321 1006 314 2000 2 222 2 2 2 2 2 122222 11 3 094 1 M 04 200940 39072 3941 083 22 2 2 2 2 110 110 00 0000000 00 000000000 000000 000 000000000000000000011 101 1 111 1 0 1 1 1 111111 11 1 1 1 1
source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm
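Each line of the birth data file is a fixed-width record, so a field is read by character position rather than by splitting on a delimiter. A minimal sketch in base R, using a synthetic padded record (not real CDC data) and assuming, as the mapper later in this deck does, that the field of interest sits in columns 89 to 90:

```r
# Hypothetical record: 88 filler characters, then "27" in columns 89-90,
# then more filler. This is synthetic, not a real CDC record.
record <- paste0(strrep("x", 88), "27", strrep("x", 10))

# substr() takes 1-based, inclusive start and stop positions.
field <- substr(record, 89, 90)
value <- as.integer(field)

print(value)
```

The same `substr()` call works on a whole character vector of records, which is how it is used inside a mapper.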
Visualization Use Case
• Put data into HDFS
  – Create input directory
  – Put data into input directory
Create HDFS directory:
    > hadoop fs -mkdir /user/jeff/natality
Put data into HDFS:
    > hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/
Visualization Use Case
• Write R script
  – Specify use of RHadoop packages
  – Initialize HDFS
  – Specify data input and output location
R script:
    #!/usr/bin/env Rscript
    require('rmr2')
    require('rhdfs')
    hdfs.init()

    hdfs.data.root = 'natality'
    hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT')
    hdfs.out.root = hdfs.data.root
    hdfs.out = file.path(hdfs.out.root, 'out')
    . . .
Visualization Use Case
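• Write R script
  – Write mapper function
  – Write reducer function
R script (continued):
    . . .
    mapper = function(k, fields) {
      # key: characters 89-90 of each record (charted later as age of mother)
      # value: 1 per record, so the reducer can count births per key
      keyval(as.integer(substr(fields, 89, 90)), 1)
    }

    reducer = function(key, vv) {
      # count values for each key
      keyval(key, sum(as.numeric(vv), na.rm=TRUE))
    }
    . . .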
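The mapper and reducer above amount to a grouped count. As a rough local illustration (no Hadoop or rmr2 involved), the same logic can be sketched in base R on a handful of synthetic records; the helper `make_record` and the ages are made up for this example:

```r
# Synthetic fixed-width records: only columns 89-90 carry meaning here.
make_record <- function(age) paste0(strrep(" ", 88), sprintf("%02d", age))
records <- vapply(c(25L, 31L, 25L, 19L, 31L, 25L), make_record, character(1))

# "Map" step: emit one (key, 1) pair per record.
keys <- as.integer(substr(records, 89, 90))
ones <- rep(1, length(keys))

# "Reduce" step: group the values by key and sum each group.
counts <- tapply(ones, keys, sum)

print(counts)
```

Checking the logic this way on a small sample can catch an off-by-one column position before the job is submitted to the cluster.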
Visualization Use Case
• Write R script
  – Write job function
R script (continued):
    . . .
    job = function (input, output) {
      mapreduce(input = input,
                output = output,
                input.format = "text",
                map = mapper,
                reduce = reducer,
                combine = TRUE)  # reuse the reducer as a combiner
    }
    . . .
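Enabling `combine` in the job asks rmr2 to apply the reduce function to partial groups on the map side. That is only safe when reducing partial results gives the same answer as reducing all raw values at once, which holds for a sum. A small base R sketch of that property, with made-up values for a single key split across two hypothetical map tasks:

```r
# Values emitted for a single key, split across two hypothetical map tasks.
values_task1 <- c(1, 1, 1)
values_task2 <- c(1, 1)

# Without a combiner: the reducer sees every raw value.
total_direct <- sum(c(values_task1, values_task2))

# With a combiner: each task pre-sums its own values, and the reducer
# sums the partial sums.
partials <- c(sum(values_task1), sum(values_task2))
total_combined <- sum(partials)

print(total_direct == total_combined)
```

A reducer that computed a mean, by contrast, would not be safe to reuse as a combiner without carrying counts alongside the partial results.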
Visualization Use Case
• Write R script
  – Write result to HDFS output directory
R script (continued):
    . . .
    # run the job (output lands in hdfs.out), then read the result back into R
    out = from.dfs(job(hdfs.data, hdfs.out))
    results.df = as.data.frame(out, stringsAsFactors=F)
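Once the result is a data frame, it usually needs sorting by key before charting, since reducer output is not guaranteed to arrive in key order. A minimal sketch in base R, assuming the object returned by `from.dfs` behaves like a list with `key` and `val` components; the stand-in list and its numbers below are made up:

```r
# Stand-in for the value returned by from.dfs(): a list with `key` and `val`.
# The numbers are invented for illustration.
out <- list(key = c(31L, 19L, 25L), val = c(2, 1, 3))

results.df <- as.data.frame(out, stringsAsFactors = FALSE)

# Sort rows by key so a line chart reads left to right in key order.
results.df <- results.df[order(results.df$key), ]
rownames(results.df) <- NULL

print(results.df)
```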
Visualization Use Case
• Create Shiny application
  – Create directory
  – Create ui.R
  – Create server.R
Shiny app directory:
    > mkdir ~/my-shiny-app
Visualization Use Case
• Create Shiny application
  – Create ui.R
ui.R source:
    shinyUI(pageWithSidebar(

      # Application title
      headerPanel("2010 US Births"),

      sidebarPanel(. . .),

      mainPanel(
        tabsetPanel(
          tabPanel("Line Chart", htmlOutput("lineChart")),
          tabPanel("Column Chart", htmlOutput("columnChart"))
        )
      )
    ))
Visualization Use Case
• Create Shiny application
  – Create server.R
server.R source:
    library(googleVis)
    library(shiny)
    library(rmr2)
    library(rhdfs)

    hdfs.init()

    hdfs.data.root = 'natality'
    hdfs.data = file.path(hdfs.data.root, 'out')
    df = as.data.frame(from.dfs(hdfs.data))
    . . .
Visualization Use Case
• Create Shiny application
  – Create server.R
server.R source (continued):
    . . .
    shinyServer(function(input, output) {

      output$lineChart <- renderGvis({
        gvisLineChart(df, options=list(
          vAxis="{title:'Number of Births'}",
          hAxis="{title:'Age of Mother'}",
          legend="none"
        ))
      })
      . . .
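The chart call above passes the data frame as-is and lets googleVis pick the columns. One possible refinement is to name the columns and select them explicitly through the `xvar` and `yvar` arguments of `gvisLineChart`. The sketch below uses invented counts and hypothetical column names (`age`, `births`), and only builds the chart if googleVis happens to be installed:

```r
# Made-up counts; in the deck these come from the MapReduce output.
df <- data.frame(key = c(19L, 25L, 31L), val = c(1, 3, 2))

# Rename columns so the chart series is self-describing, and make the
# x-axis column a character label column.
chart.df <- data.frame(
  age    = as.character(df$key),
  births = df$val,
  stringsAsFactors = FALSE
)

# Build the chart only if googleVis is installed (an assumption about the
# environment); xvar and yvar pick the columns explicitly.
if (requireNamespace("googleVis", quietly = TRUE)) {
  chart <- googleVis::gvisLineChart(
    chart.df, xvar = "age", yvar = "births",
    options = list(
      vAxis  = "{title:'Number of Births'}",
      hAxis  = "{title:'Age of Mother'}",
      legend = "none"
    )
  )
}
```

Inside a Shiny server function, the `gvisLineChart` call would sit inside `renderGvis({ ... })` exactly as in the slide.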
Visualization Use Case
• Run Shiny application
Run Shiny app:
    > shiny::runApp('~/my-shiny-app')
    Loading required package: shiny

    Welcome to googleVis version 0.4.0
    . . .
    HADOOP_CMD=/usr/bin/hadoop

    Be sure to run hdfs.init()

    Listening on port 8100
Visualization Use Case
• View Shiny application
Demo Live
Using Hadoop, R, and Google Chart Tools
Visualization Use Case
• Architecture recap
  – Analyze data sets with R on Hadoop
  – Choose RHadoop packages
  – Visualize data with Google Chart Tools via googleVis package
  – Render googleVis output in Shiny applications
• Architecture next steps
  – Integrate Shiny application into existing web apps
  – Create further data models with R
HDP: Enterprise Hadoop Distribution
[Diagram: Hortonworks Data Platform (HDP) stack, repeated from the "Preparation: Install HDP" slide]
HDP Sandbox