How we are using BigQuery and Apps Scripts at teowaki

Posted on 27-Jan-2015



description

I was invited to speak at the Google Startup Launch Summit in London about how we are using Google Cloud to power our startup.

transcript

javier ramirez@supercoco9

How we are using BigQuery and Apps Scripts at teowaki

Set a distance.

Set an expiration time.

Bye bye noise.

Analytics flow

Analytics flow, by segment

Automatic Alerts

javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14

REST API (Ruby on Rails) + web on top (AngularJS)


Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures.

Ed Dumbill, program chair for the O’Reilly Strata Conference


1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time


Cloud Storage: cost-efficient storage of files


tools we considered:

Hadoop, Cassandra, Amazon Redshift...

Our choice:

Google BigQuery

Data analysis as a service

http://developers.google.com/bigquery


Based on “Dremel”

Specifically designed for interactive queries over petabytes of real-time data


loading data

You just send the data in text (or JSON) format
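For JSON, BigQuery expects newline-delimited JSON (one object per line rather than a single array), and it can load gzip-compressed files. A minimal Python sketch of preparing such a payload, assuming events arrive as Python dicts; the helper names and field names are hypothetical, and the upload itself (to Cloud Storage or through a load job) is not shown.

```python
import gzip
import json


def to_ndjson_gz(rows):
    """Serialize dict rows as newline-delimited JSON, then gzip the bytes.

    Each row becomes exactly one line; keys are sorted so the output is stable.
    """
    lines = [json.dumps(row, separators=(",", ":"), sort_keys=True) for row in rows]
    payload = ("\n".join(lines) + "\n").encode("utf-8")
    return gzip.compress(payload)


def from_ndjson_gz(blob):
    """Inverse of to_ndjson_gz, useful to check what would be uploaded."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# Hypothetical event rows; the field names are illustrative only.
events = [
    {"user_id": 1, "action": "create_link", "country": "ES"},
    {"user_id": 2, "action": "login", "country": "GB"},
]
blob = to_ndjson_gz(events)
print(len(blob), "compressed bytes")
```

Writing one object per line is what lets the loader split a large file without parsing it as one huge document.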


SQL


select name from USERS order by date;

select count(*) from users;

select max(date) from USERS;

select sum(total) from ORDERS group by user;

specific extensions for analytics


within, flatten, nest
stddev
top, first, last, nth
variance
var_pop, var_samp
covar_pop, covar_samp
quantiles
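Several of these functions come in a population (`_pop`) and a sample (`_samp`) flavour. A plain-Python sketch of the textbook formulas, for illustration only: it is not BigQuery's implementation, the function names simply mirror the SQL names, and the data is made up.

```python
import math


def var_pop(xs):
    """Population variance: mean squared deviation, dividing by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def var_samp(xs):
    """Sample variance: same sum of squares, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def covar_pop(xs, ys):
    """Population covariance of paired values, dividing by n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def covar_samp(xs, ys):
    """Sample covariance of paired values, dividing by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def stddev_samp(xs):
    """A standard deviation is the square root of a variance (sample flavour here)."""
    return math.sqrt(var_samp(xs))


# Hypothetical daily signup counts.
daily_signups = [2, 4, 4, 4, 5, 5, 7, 9]
print(var_pop(daily_signups), var_samp(daily_signups), stddev_samp(daily_signups))
```

Dividing by n - 1 corrects the bias of estimating spread from a sample, which is why the sample figures come out slightly larger than the population ones.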

web console screenshot


our most active user


country segmented traffic


10 requests we should be caching


5 most created resources

new users per month

SELECT repository_name, repository_language, repository_description,
       COUNT(repository_name) AS cnt, repository_url
FROM github.timeline
WHERE type = "WatchEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
  AND repository_url IN (
    SELECT repository_url
    FROM github.timeline
    WHERE type = "CreateEvent"
      AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
      AND repository_fork = "false"
      AND payload_ref_type = "repository"
    GROUP BY repository_url
  )
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25
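The `#{yesterday}` fragments in the GitHub query are Ruby string interpolation, so the query text is a template that receives the previous day's date before it is sent. A minimal Python sketch of the same idea, assuming `yesterday` is a `YYYY-MM-DD` string (an assumption about the format the query expects); `build_query` is a hypothetical helper.

```python
from datetime import date, timedelta

# Same query, with Ruby's #{yesterday} replaced by a str.format placeholder.
QUERY_TEMPLATE = """
SELECT repository_name, repository_language, repository_description,
       COUNT(repository_name) AS cnt, repository_url
FROM github.timeline
WHERE type = "WatchEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("{yesterday} 20:00:00")
  AND repository_url IN (
    SELECT repository_url
    FROM github.timeline
    WHERE type = "CreateEvent"
      AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('{yesterday} 20:00:00')
      AND repository_fork = "false"
      AND payload_ref_type = "repository"
    GROUP BY repository_url
  )
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25
"""


def build_query(today):
    """Return the query text with yesterday's date (YYYY-MM-DD) filled in."""
    yesterday = (today - timedelta(days=1)).isoformat()
    return QUERY_TEMPLATE.format(yesterday=yesterday)


print(build_query(date(2015, 1, 27)))
```

Plain string formatting is acceptable here only because the value is a computed date; untrusted input should never be spliced into SQL this way.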

Automation with Apps Script

Read from BigQuery

Create a spreadsheet on Drive

E-mail it every day as a PDF
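Between reading from BigQuery and creating the spreadsheet, the query result has to become a grid of cells. A minimal sketch of that reshaping step, written in Python rather than the Apps Script (JavaScript) the automation actually runs; it assumes the BigQuery REST query response shape (`schema.fields[].name` for column names, `rows[].f[].v` for values, which arrive as strings). `response_to_table` and the sample response are hypothetical, and the Drive and e-mail steps are not shown.

```python
def response_to_table(response):
    """Turn a BigQuery REST query response into a header row plus data rows.

    INTEGER and FLOAT values come back as strings, so they are converted
    using the schema type; a missing "rows" key is treated as no rows.
    """
    fields = response["schema"]["fields"]
    casts = {"INTEGER": int, "FLOAT": float}
    table = [[field["name"] for field in fields]]
    for row in response.get("rows", []):
        values = []
        for field, cell in zip(fields, row["f"]):
            value = cell["v"]
            cast = casts.get(field["type"])
            # Leave NULLs and non-numeric columns untouched.
            values.append(cast(value) if cast and value is not None else value)
        table.append(values)
    return table


# Hypothetical response for a country-segmented traffic query.
sample = {
    "schema": {"fields": [{"name": "country", "type": "STRING"},
                          {"name": "visits", "type": "INTEGER"}]},
    "rows": [{"f": [{"v": "ES"}, {"v": "120"}]},
             {"f": [{"v": "GB"}, {"v": "85"}]}],
}
print(response_to_table(sample))
```

A two-dimensional list like this maps directly onto a spreadsheet range, one inner list per row.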


Cloud Storage pricing

$0.032 per GB

a gzipped 4.8 MB file stores 1MM rows

$0.000092 / month per 1MM rows
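The per-row storage cost is just file size times the per-GB price. A minimal sketch, assuming decimal units (1 GB = 1000 MB) and that the $0.032 is a monthly rate, as the per-month result implies; `gcs_monthly_cost_usd` is a hypothetical helper.

```python
def gcs_monthly_cost_usd(file_mb, usd_per_gb=0.032):
    """Monthly Cloud Storage cost for one file, using 1 GB = 1000 MB."""
    return file_mb / 1000 * usd_per_gb


# Slide inputs: a gzipped 4.8 MB file holds 1MM rows, at $0.032 per GB.
cost_per_million_rows = gcs_monthly_cost_usd(4.8)
print(f"${cost_per_million_rows:.7f} per month per 1MM rows")
```

In dollars this comes to about $0.00015. At the roughly 0.6 GBP/USD rate implied by the other conversions in the talk, that is about £0.00009, so the 0.000092 figure is most likely expressed in pounds.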


BigQuery pricing

$26 per stored TB
1,000,000 rows => $0.00416 / month

£0.00243 / month

$5 per processed TB
1 full scan = 160 MB
1 count = 0 MB
1 full scan over 1 column = 5.4 MB
100 GB => $0.05 / month (£0.03)


£0.054307 / month* per 1MM rows

*the first 100 GB every month are free of charge
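The BigQuery figures follow from two rates: storage is billed per TB stored per month, and on-demand queries per TB processed. Because storage is columnar, a query pays only for the columns it reads, which is why a count costs 0 MB and a one-column scan 5.4 MB, while a full scan reads all 160 MB. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes) and the prices quoted above; the function names are hypothetical and the free monthly allowance is not modelled.

```python
BYTES_PER_TB = 10**12  # assumption: prices quoted per decimal terabyte
MB = 10**6


def storage_cost_usd(stored_bytes, usd_per_tb=26.0):
    """Monthly cost of keeping data stored in BigQuery."""
    return stored_bytes / BYTES_PER_TB * usd_per_tb


def query_cost_usd(processed_bytes, usd_per_tb=5.0):
    """On-demand cost of a query, driven only by the bytes it reads."""
    return processed_bytes / BYTES_PER_TB * usd_per_tb


table_bytes = 160 * MB        # 1MM rows: the full-scan size quoted above
one_column_bytes = 5.4 * MB   # full scan over a single column

print(storage_cost_usd(table_bytes))     # monthly storage for 1MM rows
print(query_cost_usd(table_bytes))       # one full scan
print(query_cost_usd(one_column_bytes))  # one single-column scan
print(query_cost_usd(0))                 # a count reads 0 bytes
```

The 160 MB figure is consistent with the storage line: 160 MB at $26 per TB is the $0.00416 per month quoted for 1,000,000 rows.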


1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time



Find related links at

https://teowaki.com/teams/javier-community/link-categories/bigquery-talk

Thanks!

Javier Ramírez @supercoco9

startup launch summit london 14