Introduction to Quirrel & R
OSCON, July 25
John A. De Goes (@jdegoes)
overview
Quirrel is an open standard language designed for the analysis of large-scale, heterogeneous data sets.
R is an open source programming language and interactive environment for statistical computing and graphics.
quirrel versus r
Quirrel
CONS
● Young language, still evolving
● Nascent community
● Intentionally limited
PROS
● Simple, consistent core
● Fully parallel
● Purely functional
● Programmatic or interactive
R
PROS
● Mature language, "feature-complete"
● Robust community
● Turing-complete
CONS
● Complex core
● Mostly parallel
● Imperative
● Interactive
what's the right tool for the job?
[Decision flowchart: the questions "Small amount of data?" and "Simple analytics?" (YES / NO branches) lead to one of Quirrel, Hive / Pig, SQL, or R.]
sneak peek
Quirrel
pageViews := //pageViews
avg := mean(pageViews.duration)
bound := 1.5 * stdDev(pageViews.duration)
pageViews.userId where pageViews.duration > avg + bound
R
pageViews <- read.csv("pageViews.csv")
avg <- mean(pageViews$duration)
bound <- 1.5 * sd(pageViews$duration)
userIds <- subset(pageViews, duration > avg + bound, select=userId)
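To try the R side of the sneak peek without a CSV file, here is a minimal sketch that swaps `read.csv` for a small in-memory data frame. The user IDs and durations are made up for illustration; only the filtering logic comes from the slide.

```r
# Hypothetical page-view records standing in for pageViews.csv.
pageViews <- data.frame(
  userId   = c("u1", "u2", "u3", "u4", "u5", "u6"),
  duration = c(10, 12, 11, 9, 10, 60)
)

avg   <- mean(pageViews$duration)
bound <- 1.5 * sd(pageViews$duration)

# Keep only the users whose duration exceeds the mean by more than 1.5 standard deviations.
userIds <- subset(pageViews, duration > avg + bound, select = userId)
print(userIds)  # only u6 remains
```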
data models
Quirrel
Everything is a random variable.
true, false
1, 3.1415
null, undefined
"Mary Jane"
[1, 2, 3]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
{"name": "John"}
1 || 2 || 3 || 4 || 5 || 6
[1, "foo", [1, false]]
R
Everything is an ordered sequence of values.*
TRUE, FALSE
1, 3.1415
NA, NaN, Inf
"Mary Jane"
c(1, 2, 3)
array(c(1,4,7,2,5,8,3,6,9), dim=c(3,3))
data.frame(name=c("John"))
c(1, 2, 3, 4, 5, 6)
list(1, "foo", list(1, FALSE))
*Except when it's not.
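A short sketch of what "an ordered sequence of values" looks like in practice in R, and one place where it stops holding. The variable names are illustrative only.

```r
# A single number is already a vector of length 1.
scalar_length <- length(3.1415)          # 1

# c() builds an atomic vector, so mixed inputs are coerced to one type.
mixed <- c(1, "foo", FALSE)
mixed_class <- class(mixed)              # "character"

# list() keeps each element's own type, including nested lists.
kept <- list(1, "foo", list(1, FALSE))
kept_classes <- sapply(kept, class)      # "numeric" "character" "list"

# array() fills column by column, so this is the matrix with rows 1 2 3 / 4 5 6 / 7 8 9.
m <- array(c(1, 4, 7, 2, 5, 8, 3, 6, 9), dim = c(3, 3))
first_row <- m[1, ]                      # 1 2 3
```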
comments
Quirrel
-- ignore me
(- ignore me too! -)
R
# ignore me
# ignore # me # too!
basic expressions
Quirrel
2 * 4
(1 + 2) * 3 / 9 > 23
3 > 2 & (1 != 2)
2 + 2 = 4
false & true | !false
undefined = undefined
R
2 * 4
(1 + 2) * 3 / 9 > 23
3 > 2 & (1 != 2)
2 + 2 == 4
FALSE & TRUE | !FALSE
NA == NA
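The last R expression is worth running once, because comparing with `NA` does not give `TRUE` or `FALSE`. A small sketch (variable names are illustrative):

```r
# Any comparison involving NA yields NA.
na_cmp <- NA == NA
print(na_cmp)            # NA

# is.na() is the usual test for a missing value.
na_test <- is.na(NA)
print(na_test)           # TRUE

# ! binds tighter than &, and & binds tighter than |.
logic <- FALSE & TRUE | !FALSE
print(logic)             # TRUE
```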
named expressions
Quirrel
x := 2
square := x * x
R
x <- 2
square <- x * x
loading data
Quirrel
//pageViews
load("/pageViews")
//daily_snapshots/*
R
read.csv("pageViews")
read.csv("pageViews")
lapply(Sys.glob("daily_snapshots/*"), read.csv)
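In R, the glob-and-read pattern returns a list with one data frame per file. A minimal sketch of reading several snapshots and stacking them into one data frame, assuming the files share the same columns. The directory, file names, and contents below are created in a temporary folder purely for illustration.

```r
# Write two tiny hypothetical snapshot files into a temporary directory.
snap_dir <- file.path(tempdir(), "daily_snapshots")
dir.create(snap_dir, showWarnings = FALSE)
write.csv(data.frame(userId = "u1", duration = 10),
          file.path(snap_dir, "day1.csv"), row.names = FALSE)
write.csv(data.frame(userId = "u2", duration = 20),
          file.path(snap_dir, "day2.csv"), row.names = FALSE)

# Read every matching file, then bind the pieces row-wise.
snapshots <- lapply(Sys.glob(file.path(snap_dir, "day*.csv")), read.csv)
combined  <- do.call(rbind, snapshots)
print(combined)
```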
drilldown
Quirrel
pageViews := //pageViews
pageViews.userId
pageViews.keywords[2]
R
pageViews <- read.csv("pageViews")
pageViews$userId
vector[2]
list[[1]]
reductions
Quirrel
count(purchases)
sum(purchases.total)
mean(purchases.total)
stdDev(purchases.total)
R
nrow(purchases)
sum(purchases$total)
mean(purchases$total)
sd(purchases$total)
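One detail to keep in mind on the R side: for a data frame, `length()` counts columns while `nrow()` counts records. A small sketch with made-up totals:

```r
# Hypothetical purchases table with a single column.
purchases <- data.frame(total = c(10, 20, 30, 40))

n_records <- nrow(purchases)      # 4 rows
n_columns <- length(purchases)    # 1 column, not the record count

total_sum  <- sum(purchases$total)    # 100
total_mean <- mean(purchases$total)   # 25
total_sd   <- sd(purchases$total)     # sample standard deviation, about 12.91
```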
filtering
Quirrel
views.userId where views.duration > 1000
R
subset(views, duration > 1000, select=userId)
augmentation
Quirrel
clicks with {dow: dayOfWeek(clicks.ts)}
R
clicks$dow <- weekdays(clicks$ts)
libraries
Quirrel
import std::stats::rank
pageViews := //pageViews
rank(pageViews.duration)
R
library(data.table)
pageViews <- read.csv("views.csv")
rank(pageViews$duration)
user-defined functions
Quirrel
ctr(day) :=
  count(clicks where clicks.day = day) /
  count(impressions where impressions.day = day)
ctr("Monday")
R
ctr <- function(d) {
  c1 <- subset(clicks, clicks$day == d)
  c2 <- subset(impressions, impressions$day == d)
  length(c1$day) / length(c2$day)
}
ctr("Monday")
grouping - implicit constraints
Quirrel
solve 'day
  {day: 'day, ctr: count(clicks where clicks.day = 'day) / count(impressions where impressions.day = 'day)}
R
clicks$count1 <- 0
c1 <- aggregate(count1 ~ day, data = clicks, FUN=length)
impressions$count2 <- 0
c2 <- aggregate(count2 ~ day, data = impressions, FUN=length)
r <- merge(c1, c2)
ctr <- data.frame(day = r$day, ctr = r$count1 / r$count2)
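The R version is easier to follow with data in hand. Here is a sketch that runs the same aggregate-and-merge steps on two tiny, made-up tables; the day values and row counts are assumptions for illustration.

```r
# Hypothetical click and impression logs: one row per event.
clicks      <- data.frame(day = c("Mon", "Mon", "Tue"))
impressions <- data.frame(day = c("Mon", "Mon", "Mon", "Mon",
                                  "Tue", "Tue", "Tue", "Tue"))

# Count rows per day by aggregating a placeholder column with length().
clicks$count1 <- 0
c1 <- aggregate(count1 ~ day, data = clicks, FUN = length)
impressions$count2 <- 0
c2 <- aggregate(count2 ~ day, data = impressions, FUN = length)

# Join the two per-day counts on their shared "day" column and divide.
r   <- merge(c1, c2)
ctr <- data.frame(day = r$day, ctr = r$count1 / r$count2)
print(ctr)  # Mon: 2 / 4, Tue: 1 / 4
```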
grouping - explicit constraints
Quirrel
solve 'date = purchases.date
  {date: 'date, cummTotal: sum(purchases.total where purchases.date < 'date)}
R
purchases2 <- purchases[order(purchases$date), ]
data.frame(date = purchases2$date, cummTotal = cumsum(purchases2$total))
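A runnable sketch of the R side, using made-up purchases with one purchase per date. Note that `cumsum` includes the current row, whereas the Quirrel query sums strictly earlier dates; the extra `before` column below (a name introduced here for illustration) shows one way to get the strictly-earlier total under the one-purchase-per-date assumption.

```r
# Hypothetical purchases, deliberately out of date order.
purchases <- data.frame(
  date  = as.Date(c("2012-11-03", "2012-11-01", "2012-11-02")),
  total = c(30, 10, 20)
)

# Sort rows by date (the trailing comma selects all columns).
purchases2 <- purchases[order(purchases$date), ]

running <- data.frame(
  date      = purchases2$date,
  cummTotal = cumsum(purchases2$total)   # 10, 30, 60: includes the current date
)

# Total of strictly earlier dates, valid only with one purchase per date.
running$before <- running$cummTotal - purchases2$total   # 0, 10, 30
print(running)
```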
Questions?
Nov - Dec 2012
Quirrel / R Challenge Problems
challenge problem #1
■ Using the /london_medals/summer_games data, find the youngest athlete to win a medal
Download dataset at http://labcoat.precog.com
challenge problem #2
■ Using the /london_medals/summer_games data, find the oldest athlete to win a medal
challenge problem #3
■ Using the /london_medals/summer_games data, find the average age at which athletes win medals
challenge problem #4
■ Using the /london_medals/summer_games data, find the most common age to win a medal
Thank you!
Follow me on Twitter: @jdegoes
Learn more about R: r-project.org
Download R: r-project.org/mirrors.html
Sign up for a free Precog account: precog.com
Learn more about Quirrel: quirrel-lang.org