DF1 - R - Natekin - Improving Daily Analysis with data.table

Post on 16-Jan-2017


Improving daily analysis with data.table

a [brief] tutorial

Alex Natekin

Deloitte Analytics Institute

2

Been there, done that

natekin@dmlabs.org

vk.com/natekin

linkedin.com/in/natekin

facebook.com/alex.natekin

3

data.table

4

Legend says

“the R god of number crunching”

And many others…

5

Legend says (2)

With great poweR

of fasteR & richeR data crunching …

comes great Responsibility

… to read the manual

6

Choose your side

dplyr sqldf data.table

“Hadleyverse” Way of the warrior…

…each one is way different from data.frame

7

Choose your side… wisely

from Matt Dowle’s recent meetup presentations

8

Choose your side… wisely (2)

from Matt Dowle’s recent meetup presentations

…just search for “data.table benchmarks”

9

data.table applicability

Solution

Data extraction & checks

Data processing

Feature engineering

Models

Stories

…trying to find your place under the sun

10

data.table applicability

Solution

Data extraction & checks

Data processing

Feature engineering

Models

Stories

Naïve functionality

Most awesome functionality

Is closest to production code

(if applicable to R)

11

Core functionality

More efficient:

1. Data reading & memory management

2. Data access & ordering

3. Grouping & aggregation

…feature engineering

12

Core functionality (2)

More efficient:

1. Data reading & memory management

2. Data access & ordering

3. Grouping & aggregation

…feature engineering

…as data.frame extension (~100% compatible)

1. Reduce machine time

2. Reduce human programming time
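A minimal sketch of what "data.frame extension" can look like in practice, assuming the data.table package is installed. The table `df` and its columns `id` and `value` are invented for illustration and do not come from the talk.

```r
library(data.table)

# Hypothetical toy data; names and values are assumptions for this sketch.
df <- data.frame(id = c(3L, 1L, 2L), value = c(0.3, 0.1, 0.2))

DT <- as.data.table(df)       # copy the data.frame into a data.table

is_df <- is.data.frame(DT)    # TRUE: data.table inherits from data.frame
col_mean <- mean(DT$value)    # familiar data.frame accessors still work

setorder(DT, id)              # reorder rows in place, without copying the table
print(DT)
```

Functions written for a data.frame usually accept `DT` unchanged. The main thing to watch is that `j` inside `DT[i, j]` is evaluated as an expression, so some data.frame indexing idioms behave differently.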

13

Core principle

DT[i, j, by]

1. Take DT

2. Subset rows by i

3. Calculate j

4. …grouped by by
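The DT[i, j, by] pattern can be sketched on a toy churn-style table. The columns `customer`, `plan`, `calls` and `churned`, and all values, are invented for illustration. This is not the dataset shown in the talk.

```r
library(data.table)

# Hypothetical customer table; every name and value here is an assumption.
DT <- data.table(
  customer = 1:6,
  plan     = c("basic", "basic", "pro", "pro", "pro", "basic"),
  calls    = c(10, 45, 30, 5, 60, 20),
  churned  = c(1, 0, 0, 1, 0, 1)
)

# i:  keep rows with more than 8 calls
# j:  count rows (.N) and compute the share of churned customers
# by: produce one result row per plan
res <- DT[calls > 8, .(n = .N, churn_rate = mean(churned)), by = plan]
print(res)
```

The filter, the aggregation and the grouping all sit in a single call, which is roughly how the same query would read in SQL: WHERE, SELECT, GROUP BY.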

14

Core principle (2)

from data.table tutorial

15

Example: churn

Sorry

Laptop died last evening, no interactive tutorial

Screenshots from remaining files

16

Example


19

Example (manual injection)

setkey(DT, colA, colB)

from yet another of Matt Dowle’s recent meetup presentations
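A short sketch of what setkey does, assuming a small table with the same column names as on the slide (`colA`, `colB`). The extra column `x` and all values are invented for illustration.

```r
library(data.table)

# Hypothetical data; only the names colA and colB come from the slide.
DT <- data.table(
  colA = c("b", "a", "b", "a"),
  colB = c(2L, 1L, 1L, 2L),
  x    = c(10, 20, 30, 40)
)

setkey(DT, colA, colB)   # sort DT by colA, then colB, by reference, and mark it as keyed

print(key(DT))           # the key columns, in order
hit <- DT[.("b", 1L)]    # keyed lookup: rows where colA == "b" and colB == 1
print(hit)
```

Once a key is set, lookups through `i` use binary search on the sorted columns instead of scanning every row.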

20

Example


26

Example: churn


31

Functionality: more

1. fread

2. Column updates

3. Set functions (set, setnames, …)

4. Special symbols (.SD, .I, …)

5. Joins

… next time
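Since the slide defers these features, here is a compact sketch of each one, assuming a reasonably recent data.table (the `text=` argument of fread is used to avoid needing a file). All table contents and column names are invented for illustration.

```r
library(data.table)

# 1. fread parses delimited text; text= reads from an inline string here.
DT <- fread(text = "id,grp,val\n1,a,10\n2,a,20\n3,b,30\n")

# 2. Column update by reference with :=
DT[, val2 := val * 2]

# 3. Set functions modify the table in place.
setnames(DT, "val2", "val_doubled")

# 4. Special symbols: .SD is the per-group subset, .I holds row numbers.
sums <- DT[, lapply(.SD, sum), by = grp, .SDcols = c("val", "val_doubled")]
top_rows <- DT[, .I[which.max(val)], by = grp]$V1

# 5. Join: attach a label to each row of DT by matching on grp.
labels <- data.table(grp = c("a", "b"), label = c("alpha", "beta"))
joined <- labels[DT, on = "grp"]
print(joined)
```

In `labels[DT, on = "grp"]` the result has one row per row of `DT`, which makes it a convenient form for lookup-style joins.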

32

More: resources

33

SummaRy

1. data.table is helpful & awesome

2. go forth and use it

3. RTFM

Thanks!

Alex Natekin

anatekin@deloitte.ru

+7 915 070 45 74