Date post: | 14-Apr-2017 |
Dr. Datascience
Or: How I Learned to Stop Munging and Love Tests
Mike Malecki ([email protected])
Neal Richardson ([email protected])
About us
• Political scientists
• Then worked in survey research industry
• Now in data product development
• Crunch.io
Data “Science”
vs. “Faith-based coding”
• Misplaced faith in own infallibility ✔︎
• Your code works because you believe it does
• Its output feels true
Tests
• Make the implicit explicit
• Turn assumptions into assertions
• Are a form of documentation
• Reduce complexity
• Are liberating
What are tests?
• Assertions, written in code, that your functions do what you expect
• That if you give certain inputs, you’ll get known, expected outputs
• That giving invalid input results in an expected failure
• Tests are code: code that must be run every time you make changes
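As a minimal sketch of those three ideas using testthat: the `rescale01` helper below is hypothetical and not from the talk. One test pins a known input to a known output, the other asserts that invalid input fails in the expected way.

```r
library(testthat)

# Hypothetical helper, not from the talk: rescale a numeric vector to [0, 1].
rescale01 <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Known input, known expected output.
test_that("rescale01 maps known inputs to known outputs", {
  expect_equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1))
})

# Invalid input should produce an expected failure, not a silent wrong answer.
test_that("rescale01 fails loudly on invalid input", {
  expect_error(rescale01("ten"), "must be numeric")
})
```

Rerunning this file after every change is what turns the assertions into a safety net.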
Getting started
• Make a package
source("mycode.R")
df <- read.csv("data.csv")
doThings(df)
Getting started
• Make a package. Not that different.
Use a package skeleton, such as https://github.com/nealrichardson/skeletor
library(rmycode)
df <- read.csv("data.csv")
doThings(df)
Testing flow
• Write test. Run it and see it fail.
• Write code that makes test pass.
• Run tests again. See them pass.
• Repeat
Example
Read and analyze AWS Elastic Load Balancer logs
Example
enpiar:c npr$ R -e 'skeletor::skeletor("elbr")'
enpiar:c npr$ cd elbr
enpiar:elbr npr$ atom .
Example
# elbr/tests/testthat/test-read.R
context("read.elb")
test_that("read.elb returns a data.frame", {
    expect_true(is.data.frame(read.elb("example.log")))
})
Example
enpiar:elbr npr$ make test
...
Loading required package: elbr
read.elb: 1

Failed -------------------------------------------------------------------------
1. Error: read.elb returns a data.frame (@test-something.R#4) ------------------
could not find function "read.elb"
1: .handleSimpleError(function(e) {
       e$call <- sys.calls()[(frame + 11):(sys.nframe() - 2)]
       register_expectation(e, frame + 11, sys.nframe() - 2)
       signalCondition(e)
   }, "could not find function \"read.elb\"", quote(eval(expr, envir, enclos))) at testthat/test-something.R:4
2: eval(expr, envir, enclos)

DONE ===========================================================================
Error: Test failures
Example
# elbr/R/read-elb.R
read.elb <- function(file, stringsAsFactors=FALSE, ...) {
    read.delim(file,
        sep="",
        stringsAsFactors=stringsAsFactors,
        col.names=c("timestamp", "elb", "client_port", "backend_port",
            "request_processing_time", "backend_processing_time",
            "response_processing_time", "elb_status_code",
            "backend_status_code", "received_bytes", "sent_bytes",
            "request", "user_agent", "ssl_cipher", "ssl_protocol"),
        ...)
}
Example
enpiar:elbr npr$ make test
...
Loading required package: elbr
read.elb: .

DONE ===========================================================================
Example
test_that("read.elb returns a data.frame", {
    df <- read.elb("example.log")
    expect_true(is.data.frame(df))
    expect_equal(dim(df), c(4, 15))
})
Example
enpiar:elbr npr$ make test
...
Loading required package: elbr
read.elb: .1

Failed -------------------------------------------------------------------------
1. Failure: read.elb returns a data.frame (@test-something.R#6) ----------------
dim(df) not equal to c(4, 15).
1/2 mismatches
[1] 3 - 4 == -1

DONE ===========================================================================
Error: Test failures
Example
read.elb <- function(file, stringsAsFactors=FALSE, ...) {
    read.delim(file,
        sep="",
        header=FALSE, # <-- Oh, right.
        stringsAsFactors=stringsAsFactors,
        col.names=c("timestamp", "elb", "client_port", "backend_port",
            "request_processing_time", "backend_processing_time",
            "response_processing_time", "elb_status_code",
            "backend_status_code", "received_bytes", "sent_bytes",
            "request", "user_agent", "ssl_cipher", "ssl_protocol"),
        ...)
}
Example
enpiar:elbr npr$ make test
...
Loading required package: elbr
read.elb: ..

DONE ===========================================================================
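The off-by-one row count above comes from `read.delim` defaulting to `header = TRUE`, which consumes the first log line as column names. A small base-R sketch reproduces the effect on a throwaway file. The two log lines are made up for illustration and are not the talk's `example.log`; they only follow the 15-field layout named in `col.names`.

```r
# Column names as used in read.elb above.
elb_cols <- c("timestamp", "elb", "client_port", "backend_port",
              "request_processing_time", "backend_processing_time",
              "response_processing_time", "elb_status_code",
              "backend_status_code", "received_bytes", "sent_bytes",
              "request", "user_agent", "ssl_cipher", "ssl_protocol")

# Two made-up log lines (assumption: illustrative values only).
log_lines <- c(
  '2017-04-14T12:00:00.123456Z my-elb 192.0.2.1:2817 10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -',
  '2017-04-14T12:00:01.654321Z my-elb 192.0.2.2:2818 10.0.0.1:80 0.000086 0.001048 0.001337 200 200 0 57 "GET https://www.example.com:443/ HTTP/1.1" "curl/7.38.0" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2'
)
fixture <- tempfile(fileext = ".log")
writeLines(log_lines, fixture)

# Default header = TRUE: the first log line is swallowed as a header row.
with_header <- read.delim(fixture, sep = "", col.names = elb_cols,
                          stringsAsFactors = FALSE)

# header = FALSE: every log line becomes a data row.
no_header <- read.delim(fixture, sep = "", header = FALSE,
                        col.names = elb_cols, stringsAsFactors = FALSE)

stopifnot(nrow(with_header) == 1, nrow(no_header) == 2, ncol(no_header) == 15)
unlink(fixture)
```

Without a row-count assertion, the missing line would have gone unnoticed until some later analysis came up one request short.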
Tests make explicit
• Tradeoffs everywhere ⚖
• Is an integer an implicit categorical?
• Don’t try to be clever.
Tests assert
• You can assert dumb things like row counts
• Despite lubridate, time is never simple
• Don’t be surprised by being wrong later
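A sketch of what such "dumb" assertions might look like with testthat. The `logs` data frame is a hypothetical stand-in with made-up values, and the timestamp format string is an assumption based on ISO 8601 timestamps with fractional seconds.

```r
library(testthat)

# Hypothetical stand-in for parsed log data; values are made up.
logs <- data.frame(
  timestamp = c("2017-04-14T12:00:00.123456Z", "2017-04-14T12:00:01.654321Z"),
  elb_status_code = c(200L, 200L),
  stringsAsFactors = FALSE
)

# A row count is trivial to assert and catches dropped or duplicated lines.
test_that("every log line becomes exactly one row", {
  expect_equal(nrow(logs), 2L)
})

# Parse with an explicit format and time zone, then assert nothing became NA.
test_that("timestamps parse as UTC date-times", {
  parsed <- as.POSIXct(logs$timestamp,
                       format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
  expect_false(anyNA(parsed))
  expect_equal(format(parsed, "%Y-%m-%d %H:%M:%S"),
               c("2017-04-14 12:00:00", "2017-04-14 12:00:01"))
})
```

Stating the time zone in the test records an assumption that would otherwise live only in the author's head.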
Tests document
• “I combined categories” aka “recode”
• The data itself doesn’t preserve this relationship
• Missingness is hard
• Did I already do it?
• df$col[df$col == 1 | df$col == 2] <- 1
• expect_equal(unique(col), 1:5)
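One way a recode might be documented by its tests, as a sketch: the data and the `collapse_1_2` helper are hypothetical. The tests record which categories exist before and after the recode, and that applying it a second time changes nothing, which answers "did I already do it?".

```r
library(testthat)

# Hypothetical survey column coded 1 to 6; values are made up.
df <- data.frame(col = c(1, 2, 3, 4, 5, 6, 2, 1))

# Hypothetical recode: fold category 2 into category 1.
# Note the vectorized %in% (or |), not the scalar ||.
collapse_1_2 <- function(x) {
  x[x %in% c(1, 2)] <- 1
  x
}

test_that("raw data has all six categories", {
  expect_equal(sort(unique(df$col)), c(1, 2, 3, 4, 5, 6))
})

recoded <- collapse_1_2(df$col)

test_that("recode folds 2 into 1 and is safe to run twice", {
  expect_equal(sort(unique(recoded)), c(1, 3, 4, 5, 6))
  expect_equal(collapse_1_2(recoded), recoded)
})
```

The recoded column alone cannot say that 2 used to exist; the test file can.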
Tests simplify
• Turn big, hard-to-reason-about problems into small ones
• expect_equal(dimnames(pred), dimnames(population))
num [1:4, 1:4, 1:6, 1:51, 1:3] 0.0196 0.0414 0.038 0.0106 0.0167 ...
- attr(*, "dimnames")=List of 5
 ..$ edu        : chr [1:4] "<HS" "HS" "Some" "Grad"
 ..$ age        : chr [1:4] "18-29" "30-44" "45-64" "≥65"
 ..$ race.female: chr [1:6] "WhiteM" "BlackM" "HispanicM" "WhiteF" ...
 ..$ state      : chr [1:51] "AK" "AL" "AR" "AZ" ...
 ..$ party      : chr [1:3] "R" "I" "D"
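The same check on a deliberately tiny, hypothetical pair of arrays (two dimensions instead of five, made-up numbers) shows why it helps: once the dimnames are known to match, elementwise arithmetic can be trusted to pair up the right cells.

```r
library(testthat)

# Hypothetical, much smaller stand-ins for pred and population.
dims <- list(age = c("18-29", "30-44"), party = c("R", "I", "D"))
population <- array(c(10, 20, 30, 40, 50, 60), dim = c(2, 3), dimnames = dims)
pred <- array(0.5, dim = c(2, 3), dimnames = dims)

test_that("predictions line up with population cells", {
  expect_equal(dimnames(pred), dimnames(population))
})

# With alignment asserted, cell-by-cell multiplication pairs matching cells.
weighted <- pred * population
```

If one array had its dimensions in a different order, the multiplication could still run without error on conformable shapes, so the dimnames test is what catches the mismatch.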
Tests liberate
• Free to extend your code without worrying about breaking what it already does
• Fix bugs and handle unforeseen complications only once
Why not just hack?
Because data contracts can't be trusted
Because you'll have to extend your code to do something else
Because someone else will pick up your code in the future
Because that someone else could be your future self
Because you’re already testing, just not systematically