Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Date post: 14-Apr-2017
Dr. Datascience Or: How I Learned to Stop Munging and Love Tests Mike Malecki ([email protected]) Neal Richardson ([email protected])
Transcript
Page 1: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Dr. Datascience

Or: How I Learned to Stop Munging and Love Tests

Mike Malecki ([email protected])

Neal Richardson ([email protected])

Page 2: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

About us

• Political scientists

• Then worked in survey research industry

• Now in data product development

• Crunch.io

Page 3: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Data “Science”

Page 4: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

vs. “Faith-based coding”

• Misplaced faith in own infallibility ✔︎

• Your code works because you believe it does

• Its output feels true

Page 5: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Tests

• Make the implicit explicit

• Turn assumptions into assertions

• Are a form of documentation

• Reduce complexity

• Are liberating

Page 6: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

What are tests?

• Assertions, written in code, that your functions do what you expect

• That if you give certain inputs, you’ll get known, expected outputs

• That giving invalid input results in an expected failure

• Tests are code: code that must be run every time you make changes
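
A minimal sketch of these three kinds of assertion, written with testthat (the framework used in the example later in the talk). The function `celsius_to_fahrenheit` is a hypothetical stand-in chosen for illustration and does not come from the slides.

```r
library(testthat)

# Hypothetical function under test (not from the talk)
celsius_to_fahrenheit <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  x * 9 / 5 + 32
}

# Known inputs should give known, expected outputs
test_that("known inputs give known outputs", {
  expect_equal(celsius_to_fahrenheit(0), 32)
  expect_equal(celsius_to_fahrenheit(c(-40, 100)), c(-40, 212))
})

# Invalid input should fail, and fail in the way we expect
test_that("invalid input results in an expected failure", {
  expect_error(celsius_to_fahrenheit("warm"), "must be numeric")
})
```

Inside a package, blocks like these would live under tests/testthat/ so that they run every time the code changes.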

Page 7: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Getting started

• Make a package

Page 8: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Getting started

• Make a package

    source("mycode.R")
    df <- read.csv("data.csv")
    doThings(df)

Page 9: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Getting started

• Make a package. Not that different.

Use a package skeleton, such as https://github.com/nealrichardson/skeletor

    library(rmycode)
    df <- read.csv("data.csv")
    doThings(df)

Page 10: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Testing flow

• Write test. Run it and see it fail.

• Write code that makes test pass.

• Run tests again. See them pass.

• Repeat
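
The loop can be sketched in a single script. This version uses only base R (`stopifnot` and `tryCatch`) rather than testthat so that the failing first run does not halt the script; `count_words` and `test_count_words` are hypothetical names invented for the illustration.

```r
# Step 1: write the test first. `count_words` does not exist yet.
test_count_words <- function() {
  stopifnot(identical(count_words("stop munging and love tests"), 5L))
}

# Helper that runs a test function and reports "pass" or "fail"
run_test <- function(test) {
  tryCatch({
    test()
    "pass"
  }, error = function(e) "fail")
}

# Run it and see it fail: the function is not defined
first_run <- run_test(test_count_words)

# Step 2: write the code that makes the test pass
count_words <- function(x) {
  lengths(strsplit(x, " +"))
}

# Step 3: run the test again and see it pass
second_run <- run_test(test_count_words)

print(c(first = first_run, second = second_run))
```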

Page 11: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Example

Read and analyze AWS Elastic Load Balancer logs

Page 12: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Example

    enpiar:c npr$ R -e 'skeletor::skeletor("elbr")'
    enpiar:c npr$ cd elbr
    enpiar:elbr npr$ atom .

Page 13: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Example

    # elbr/tests/testthat/test-read.R

    context("read.elb")

    test_that("read.elb returns a data.frame", {
        expect_true(is.data.frame(read.elb("example.log")))
    })

Page 14: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Example

    enpiar:elbr npr$ make test
    ...
    Loading required package: elbr
    read.elb: 1

    Failed -------------------------------------------------------------------------
    1. Error: read.elb returns a data.frame (@test-something.R#4) ------------------
    could not find function "read.elb"
    1: .handleSimpleError(function(e) {
           e$call <- sys.calls()[(frame + 11):(sys.nframe() - 2)]
           register_expectation(e, frame + 11, sys.nframe() - 2)
           signalCondition(e)
       }, "could not find function \"read.elb\"", quote(eval(expr, envir, enclos))) at testthat/test-something.R:4
    2: eval(expr, envir, enclos)

    DONE ===========================================================================
    Error: Test failures

Page 15: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Example

    # elbr/R/read-elb.R

    read.elb <- function(file, stringsAsFactors=FALSE, ...) {
        read.delim(file,
            sep="",
            stringsAsFactors=stringsAsFactors,
            col.names=c("timestamp", "elb", "client_port", "backend_port",
                "request_processing_time", "backend_processing_time",
                "response_processing_time", "elb_status_code",
                "backend_status_code", "received_bytes", "sent_bytes",
                "request", "user_agent", "ssl_cipher", "ssl_protocol"),
            ...)
    }

Page 16: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Example

    enpiar:elbr npr$ make test
    ...
    Loading required package: elbr
    read.elb: .

    DONE ===========================================================================

Page 17: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Example

    test_that("read.elb returns a data.frame", {
        df <- read.elb("example.log")
        expect_true(is.data.frame(df))
        expect_equal(dim(df), c(4, 15))
    })

Page 18: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Example

    enpiar:elbr npr$ make test
    ...
    Loading required package: elbr
    read.elb: .1

    Failed -------------------------------------------------------------------------
    1. Failure: read.elb returns a data.frame (@test-something.R#6) ----------------
    dim(df) not equal to c(4, 15).
    1/2 mismatches
    [1] 3 - 4 == -1

    DONE ===========================================================================
    Error: Test failures

Page 19: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Example

    read.elb <- function(file, stringsAsFactors=FALSE, ...) {
        read.delim(file,
            sep="",
            header=FALSE, # <-- Oh, right.
            stringsAsFactors=stringsAsFactors,
            col.names=c("timestamp", "elb", "client_port", "backend_port",
                "request_processing_time", "backend_processing_time",
                "response_processing_time", "elb_status_code",
                "backend_status_code", "received_bytes", "sent_bytes",
                "request", "user_agent", "ssl_cipher", "ssl_protocol"),
            ...)
    }

Page 20: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Example

    enpiar:elbr npr$ make test
    ...
    Loading required package: elbr
    read.elb: ..

    DONE ===========================================================================

Page 21: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Tests make explicit

• Tradeoffs everywhere ⚖

• Is an integer an implicit categorical?

• Don’t try to be clever.
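
One way to make the integer-versus-categorical choice explicit is to assert the storage type and treat any conversion as a separate, visible step. The sketch below uses testthat; the data frame and its values are hypothetical, with a column name borrowed from the ELB example.

```r
library(testthat)

# Hypothetical data: status codes are integers but behave like categories
df <- data.frame(elb_status_code = c(200L, 200L, 404L, 502L))

test_that("status codes are stored as integers, not factors", {
  expect_type(df$elb_status_code, "integer")
  expect_false(is.factor(df$elb_status_code))
})

# Converting to a categorical is a deliberate step, not a side effect
status <- factor(df$elb_status_code)

test_that("the categorical version has the expected levels", {
  expect_equal(levels(status), c("200", "404", "502"))
})
```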

Page 22: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Tests assert

• You can assert dumb things like row counts

• Despite lubridate, [date/time handling] is never simple

• Don’t be surprised by being wrong later
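
A small sketch of such "dumb" assertions applied to timestamps, using base R date parsing and testthat. The two timestamp strings are made-up values in the ISO 8601 style that ELB access logs use, and the format string is an assumption about how one might parse them, not the talk's own code.

```r
library(testthat)

# Hypothetical timestamps in the style found in ELB access logs
raw <- c("2015-05-13T23:39:43.945958Z", "2015-05-13T23:39:44.102000Z")

# %OS reads seconds including the fractional part; the trailing Z is literal
parsed <- as.POSIXct(raw, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

test_that("timestamps parse without silently becoming NA", {
  expect_equal(length(parsed), 2)      # the dumb row-count check
  expect_s3_class(parsed, "POSIXct")
  expect_false(anyNA(parsed))          # a wrong format string yields NA
  expect_equal(format(parsed[1], "%Y-%m-%d %H:%M"), "2015-05-13 23:39")
})
```

A format string that does not match the input produces `NA` rather than an error, which is why the `anyNA` check earns its place.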

Page 23: Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Page 24: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Tests document

• “I combined categories” aka “recode”

• The data itself doesn’t preserve this relationship

• Missingness is hard

• Did I already do it?

Page 25: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

• df$col[df$col == 1 | df$col == 2] <- 1

• expect_equal(unique(col), 1:5)
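
A sketch of how a test can record whether a recode has already been applied. The data frame is hypothetical, and `sort()` is added here only so the comparison does not depend on the order in which values happen to appear.

```r
library(testthat)

# Hypothetical column coded 1 to 5
df <- data.frame(col = c(1, 2, 3, 4, 5, 2, 1))

# Before recoding: all five original codes should still be present
test_that("col has not been recoded yet", {
  expect_equal(sort(unique(df$col)), c(1, 2, 3, 4, 5))
})

# Combine categories 1 and 2. The elementwise `|` is needed here;
# `||` does not operate on whole vectors.
df$col[df$col == 1 | df$col == 2] <- 1

# After recoding: the test documents what the data alone no longer shows
test_that("codes 1 and 2 are now combined", {
  expect_equal(sort(unique(df$col)), c(1, 3, 4, 5))
})
```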

Page 26: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Tests simplify

• Turn big, hard-to-reason-about problems into small ones

• expect_equal(dimnames(pred), dimnames(population))

     num [1:4, 1:4, 1:6, 1:51, 1:3] 0.0196 0.0414 0.038 0.0106 0.0167 ...
     - attr(*, "dimnames")=List of 5
      ..$ edu        : chr [1:4] "<HS" "HS" "Some" "Grad"
      ..$ age        : chr [1:4] "18-29" "30-44" "45-64" "≥65"
      ..$ race.female: chr [1:6] "WhiteM" "BlackM" "HispanicM" "WhiteF" ...
      ..$ state      : chr [1:51] "AK" "AL" "AR" "AZ" ...
      ..$ party      : chr [1:3] "R" "I" "D"
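
The same check can be sketched on a much smaller pair of arrays. The two-dimensional arrays and their fill values below are hypothetical; only the `edu` and `party` labels are taken from the output shown on the slide.

```r
library(testthat)

# Shared cell labels (a two-dimensional stand-in for the five-way array)
dims <- list(
  edu = c("<HS", "HS", "Some", "Grad"),
  party = c("R", "I", "D")
)

# Hypothetical population counts and predicted proportions
population <- array(1, dim = c(4, 3), dimnames = dims)
pred <- array(0.5, dim = c(4, 3), dimnames = dims)

test_that("predictions line up with population cells", {
  # If the labels agree, elementwise arithmetic combines matching cells
  expect_equal(dimnames(pred), dimnames(population))
  weighted <- pred * population
  expect_equal(dim(weighted), c(4, 3))
})
```

Checking the labels once turns "is this five-dimensional multiplication right?" into the smaller question "do the cells match?".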

Page 27: Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Page 28: Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Page 29: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Tests liberate

• Free to extend your code without worrying about breaking what it already does

• Fix bugs and handle unforeseen complications only once

Page 30: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Why not just hack?

Page 31: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Because data contracts can't be trusted

Page 32: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Because you'll have to extend your code to do something else

Page 33: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Because someone else will pick up your code in the future

Because that someone else could be your future self

Page 34: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Because you’re already testing, just not systematically

Page 35: Dr. Datascience or: How I Learned to Stop Munging and Love Tests

Dr. Datascience

Or: How I Learned to Stop Munging and Love Tests

Mike Malecki ([email protected])

Neal Richardson ([email protected])

