
Managing Messes in Computational Notebooks

Date post: 17-Jul-2020
Andrew Head · Fred Hohman · Titus Barik · Steven M. Drucker · Robert DeLine (UC Berkeley · Georgia Tech · Microsoft Research)
Page 1

Managing Messes in Computational Notebooks

Andrew Head · Fred Hohman · Titus Barik · Steven M. Drucker · Robert DeLine

UC Berkeley · Georgia Tech · Microsoft Research

Page 2

Computational Notebooks: Code, Text, and Output

Rich descriptions

Code

Output

Page 3

Notebook Programming Interfaces Abound

Page 4

Notebook Model of Exploratory Programming

1. Incremental execution

Page 5

Notebook Model of Exploratory Programming

1. Incremental execution
2. In-situ output

Page 6

Notebook Model of Exploratory Programming

1. Incremental execution
2. In-situ output
3. Incremental changes

Page 7

Notebook Model of Exploratory Programming

1. Incremental execution
2. In-situ output
3. Incremental changes
4. Control over layout

Page 8

Notebook Model of Exploratory Programming

1. Incremental execution
2. In-situ output
3. Incremental changes
4. Control over layout

1 WEEK PASSES

Page 9

Notebook Model of Exploratory Programming

1. Incremental execution
2. In-situ output
3. Incremental changes
4. Control over layout

Page 10

Notebook Model of Exploratory Programming

1. Incremental execution
2. In-situ output
3. Incremental changes
4. Control over layout

1 WEEK LATER

1. How did I produce this result?

Page 11

Notebook Model of Exploratory Programming

1. Incremental execution
2. In-situ output
3. Incremental changes
4. Control over layout

1 WEEK LATER

1. How did I produce this result?

which petal_length?

Page 12

Notebook Model of Exploratory Programming

1. Incremental execution
2. In-situ output
3. Incremental changes
4. Control over layout

1 WEEK LATER

1. How did I produce this result?
2. Didn't I have a better version of this?

Page 13

Notebook Model of Exploratory Programming

1. Incremental execution
2. In-situ output
3. Incremental changes
4. Control over layout

1 WEEK LATER

1. How did I produce this result?
2. Didn't I have a better version of this?
3. What can I get rid of?

Page 14

Messes in Computational Notebooks

Disorder: out-of-order execution (1/2 of notebooks on GitHub [Rule et al. 2018])

Dispersion: too many cells (notebooks contain ugly code and dirty tricks [Rule et al. 2018])

Disappearance: deleted / overwritten code (31 / 41 surveyed participants had trouble finding prior analyses [Kery et al. 2018])
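The disorder mess is easy to reproduce. The sketch below is a hypothetical illustration, not code from the talk. Each cell is a Python source string run with exec against one shared namespace, which loosely mimics how a notebook kernel keeps state between cells. In the scenario, the analyst edits the first cell and re-runs only that cell, so the notebook as displayed no longer matches the state that produced selected.

```python
def run(sources):
    """Execute cell sources in order against one shared namespace."""
    namespace = {}
    for source in sources:
        exec(source, namespace)
    return namespace["selected"]


# What actually ran: all three cells, then the first cell again after an edit.
history = [
    "threshold = 5",
    "data = [3, 7, 9, 2]",
    "selected = [x for x in data if x > threshold]",
    "threshold = 8",  # edited and re-run; the third cell was not re-run
]

# What the notebook shows now, top to bottom.
visible_cells = [
    "threshold = 8",
    "data = [3, 7, 9, 2]",
    "selected = [x for x in data if x > threshold]",
]

from_history = run(history)         # still computed with threshold 5
from_document = run(visible_cells)  # a fresh top-to-bottom re-run
print(from_history, from_document)
```

The live state still holds [7, 9], while a top-to-bottom re-run gives [9]. The implementation slides address this kind of mismatch by working from the execution log instead of the notebook as it is laid out.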

Page 15

Managing Messes in Computational Notebooks

How can tools help analysts find, recover, and compare code in messy notebooks?

CODE GATHERING TOOLS [*]

Implementation [ ]

Qualitative usability study [ ]

How messes happen [1]

Tools in context [ ]

Page 16

CODE GATHERING TOOLS Demo

1 WEEK PASSES

Page 17

CODE GATHERING TOOLS Demo

Task 1: Recovering Code

How did I produce this?

Page 18

CODE GATHERING TOOLS Demo

Variables

Outputs

Task 1: Recovering Code

How did I produce this?

Page 19

CODE GATHERING TOOLS Demo

Task 1: Recovering Code

How did I produce this?

Page 20

1 WEEK PASSES

CODE GATHERING TOOLS Demo

Request cell subset that produced the result.

Task 1: Recovering Code

How did I produce this?

Page 21

1 WEEK PASSES

CODE GATHERING TOOLS Demo

Request cell subset that produced the result.

Task 1: Recovering Code

How did I produce this?

Page 22

CODE GATHERING TOOLS Demo

The gathered code is:
• reduced
• ordered
• complete

Request cell subset that produced the result.

Task 1: Recovering Code

How did I produce this?
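The gathered code ends up in a new notebook. The sketch below covers only that last step and assumes the slice has already been computed as an ordered list of cell sources. The helpers are hypothetical: they write the nbformat v4 JSON layout by hand with the standard library, they are not the gather extension's code, and unlike the real tool they drop outputs.

```python
import json
from pathlib import Path


def make_code_cell(source):
    """Return a minimal nbformat v4 code cell that has not been run yet."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],  # simplification: the slides say gathered notebooks keep outputs
        "source": source,
    }


def build_notebook(gathered_sources):
    """Wrap gathered cell sources (already reduced and ordered) as a notebook.

    One gathered cell becomes one notebook cell, so cell boundaries survive.
    nbformat_minor 4 is used so that per-cell "id" fields are not required.
    """
    return {
        "cells": [make_code_cell(src) for src in gathered_sources],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def write_notebook(notebook, path):
    """Save the notebook as JSON so Jupyter can open it."""
    Path(path).write_text(json.dumps(notebook, indent=1), encoding="utf-8")


# Hypothetical slice behind a petal_length plot like the one in the demo.
gathered = [
    "import pandas as pd",
    "df = pd.read_csv('iris.csv')",
    "df.plot.scatter(x='petal_length', y='petal_width')",
]
notebook = build_notebook(gathered)
print(json.dumps(notebook, indent=1))
```

Calling write_notebook(notebook, "gathered.ipynb") would save the file. The cell sources and file names are invented for illustration.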

Page 23

CODE GATHERING TOOLS Demo

Task 2: Comparing Versions

Didn't I have a better version of this?

Request cell subset that produced the result.

Task 1: Recovering Code

Page 24

1 WEEK PASSES

CODE GATHERING TOOLS Demo

Task 2: Comparing Versions

Didn't I have a better version of this?

Request cell subset that produced the result.

Task 1: Recovering Code

Open a version browser for a result.

Page 25

CODE GATHERING TOOLS Demo

Task 2: Comparing Versions

Request cell subset that produced the result.

Task 1: Recovering Code

Open a version browser for a result.

Didn't I have a better version of this?

Page 26

CODE GATHERING TOOLS Demo

Task 2: Comparing Versions

Request cell subset that produced the result.

Task 1: Recovering Code

Open a version browser for a result.

Didn't I have a better version of this?

Page 27

CODE GATHERING TOOLS Demo

Task 2: Comparing Versions

Request cell subset that produced the result.

Task 1: Recovering Code

Open a version browser for a result.

Didn't I have a better version of this?

Page 28

1 WEEK PASSES

CODE GATHERING TOOLS Demo

Task 2: Comparing Versions

Request cell subset that produced the result.

Task 1: Recovering Code

Open a version browser for a result.

Didn't I have a better version of this?

Page 29

CODE GATHERING TOOLS Demo

Open a version browser for a result.

Task 3: Cleaning Notebook

What code can I get rid of?

Task 2: Comparing Versions

Request cell subset that produced the result.

Task 1: Recovering Code

Page 30

CODE GATHERING TOOLS Demo

Task 3: Cleaning Notebook

What code can I get rid of?

... Request cell subset that produced the result.

Open a version browser for a result.

Task 2: Comparing Versions

Request cell subset that produced the result.

Task 1: Recovering Code
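One way to read Task 3 is as gathering for several results at once: keep every cell that any chosen result depends on, and treat the rest as removable. The sketch below assumes the cell-to-cell dependencies are already known (for example, from a def-use analysis). It takes the union of everything reachable backwards from the selected results. The function name and the six-cell notebook are hypothetical.

```python
def gather_for_cleaning(dependencies, selected):
    """Return the cells needed by any selected result, in notebook order.

    `dependencies` maps each cell index to the set of earlier cells it
    reads from. Cells not returned are candidates for deletion.
    """
    keep, stack = set(), list(selected)
    while stack:  # depth-first walk backwards through the dependency graph
        cell = stack.pop()
        if cell not in keep:
            keep.add(cell)
            stack.extend(dependencies.get(cell, ()))
    return sorted(keep)


# Hypothetical six-cell notebook; cells 4 and 5 hold the results to keep.
dependencies = {
    0: set(),  # load data
    1: {0},    # clean data
    2: {1},    # exploratory plot, no longer wanted
    3: {1},    # compute ranking
    4: {3},    # result: table of top movies
    5: {0},    # result: histogram of raw ratings
}
kept = gather_for_cleaning(dependencies, selected=[4, 5])
removable = sorted(set(dependencies) - set(kept))
print("keep:", kept, "remove:", removable)
```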

Page 31

CODE GATHERING TOOLS Demo

Task 1: Recovering Code

Task 2: Comparing Versions

Request cell subset that produced the result.

Open a version browser for a result.

Task 3: Cleaning Notebook

... Request cell subset that produced the result.

How can tools help analysts manage messes in their notebooks?

Page 32

Post-Hoc Mess Management

Variolite, CHI '17

Helping analysts clean and navigate their code whether or not they adopted a strategy to version or organize their code.

Page 33

Managing Messes in Computational Notebooks

How can tools help analysts find, recover, and compare code in messy notebooks?

CODE GATHERING TOOLS [2]

Implementation [*]

Qualitative usability study [ ]

How messes happen [1]

Tools in context [3]

Page 34

Implementation: Slicing Notebooks

[Figure: (1) a notebook with cells [10], [11], [1], [2], [3], [12], with some cells missing and some cells out of order; a "?" leads to the goals: versioned results and cleaned, ordered notebooks.]

Page 35

Implementation: Slicing Notebooks

[Figure: (1) the notebook, cells [10], [11], [1], [2], [3], [12], with some cells missing and some cells out of order; (2) the notebook execution log, cells ... [1] ... [6], [7], [10], [11], [12] ... along execution time, with all cells present, in order.]
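The execution log is what makes slicing possible. Unlike the notebook, it still holds every cell in the order it actually ran, including versions that were later edited or deleted. Below is a minimal sketch of such a log, assuming each cell has a stable id. The class names and the load/plot cell contents are hypothetical, and this is not how the extension stores its history.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LoggedCell:
    execution_count: int  # the [n] label Jupyter would show for this run
    cell_id: str          # stable id of the notebook cell that ran
    source: str           # the code exactly as it was when it ran


@dataclass
class ExecutionLog:
    """Append-only history of cell executions, kept in execution order."""

    entries: list[LoggedCell] = field(default_factory=list)

    def record(self, cell_id, source):
        """Log one execution; counts start at 1, like a fresh kernel."""
        entry = LoggedCell(len(self.entries) + 1, cell_id, source)
        self.entries.append(entry)
        return entry

    def versions_of(self, cell_id):
        """Every version of one cell that was ever run, oldest first."""
        return [entry for entry in self.entries if entry.cell_id == cell_id]


# Hypothetical session with invented cell ids and sources.
log = ExecutionLog()
log.record("load", "df = load('movies.csv')")
log.record("plot", "plot(df['rating'])")
log.record("load", "df = load('movies_clean.csv')")  # same cell, edited and re-run
log.record("scratch", "print(df.head())")           # later deleted from the notebook

for entry in log.entries:
    print(entry.execution_count, entry.cell_id, entry.source)
```

The notebook would show only the latest "load" cell and no "scratch" cell, but the log keeps all four executions in order.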

Page 36

Implementation: Slicing Notebooks

[Figure: (1) the notebook, cells [10], [11], [1], [2], [3], [12], with some cells missing and some cells out of order; (2) the notebook execution log, cells ... [1] ... [6], [7], [10], [11], [12] ... along execution time, with all cells present, in order.]

Page 37

Implementation: Slicing Notebooks

[Figure: (1) the notebook, cells [10], [11], [1], [2], [3], [12], with some cells missing and some cells out of order; (2) the notebook execution log, cells ... [1] ... [6], [7], [10], [11], [12] ... along execution time, with all cells present, in order; (3) program slices [Weiser '81].]
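A program slice [Weiser '81] keeps only the code that a chosen result depends on. The sketch below is a deliberately coarse, cell-level approximation over a log of cell sources. It reads assigned and used variable names with Python's ast module, then walks the log backwards from the target cell, keeping every cell that defines a name still needed. It is an illustration under loud assumptions, not the extension's analysis. It treats a whole cell as one unit instead of slicing statement by statement, and it misses in-place mutations such as df['x'] = ... (df is only read there). The log contents are hypothetical.

```python
import ast


def defs_and_uses(source):
    """Return (names a cell assigns, names it reads), found via Python's ast."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                uses.add(node.id)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            uses.add(node.target.id)  # "x += 1" reads x as well as writing it
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defs.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defs.add(node.name)
    return defs, uses


def backward_slice(log, target):
    """Indices of the logged cells that `target` depends on, in execution order.

    A cell is kept if it defines a name that is still needed; the names that
    cell reads then become needed in turn.
    """
    needed = set(defs_and_uses(log[target])[1])
    kept = [target]
    for i in range(target - 1, -1, -1):
        defs, uses = defs_and_uses(log[i])
        if defs & needed:
            kept.append(i)
            needed = (needed - defs) | uses
    return sorted(kept)


# Hypothetical execution log: cell sources in the order they ran.
log = [
    "import pandas as pd",
    "df = pd.read_csv('iris.csv')",
    "scratch = df.describe()",
    "df = df[df.petal_length > 2]",
    "df.plot.scatter(x='petal_length', y='petal_width')",
]

print(backward_slice(log, 4))  # [0, 1, 3, 4]
```

Cell 2 is dropped because nothing downstream reads scratch. For the names this sketch tracks, the result is reduced, ordered, and complete, mirroring the three properties listed for gathered code.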

Page 38

Implementation: Slicing Notebooks

[Figure: (1) the notebook, cells [10], [11], [1], [2], [3], [12], with some cells missing and some cells out of order; (2) the notebook execution log, cells ... [1] ... [6], [7], [10], [11], [12] ... along execution time, with all cells present, in order; (3) program slices [Weiser '81] of the log, which can be used to make cleaned, ordered notebooks (preserve cell boundaries and outputs) and versioned results (slice all cell versions).]
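Versioned results follow from the same idea. Because the log holds every executed version of a cell, slicing from each version of the target yields one gathered program per version. The sketch repeats a compact cell-level slicer so it runs on its own. The versioned_results helper and the movie-ranking cells are hypothetical, and the same caveats apply: it works at cell granularity and does not track in-place mutation.

```python
import ast


def defs_and_uses(source):
    """Names a cell assigns and names it reads (coarse, via ast)."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (defs if isinstance(node.ctx, ast.Store) else uses).add(node.id)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            uses.add(node.target.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            defs.update((a.asname or a.name).split(".")[0] for a in node.names)
    return defs, uses


def backward_slice(sources, target):
    """Cell-level backward slice over sources[:target + 1]."""
    needed, kept = set(defs_and_uses(sources[target])[1]), [target]
    for i in range(target - 1, -1, -1):
        defs, uses = defs_and_uses(sources[i])
        if defs & needed:
            kept.append(i)
            needed = (needed - defs) | uses
    return sorted(kept)


def versioned_results(log, cell_id):
    """One gathered program per executed version of `cell_id`.

    `log` is a list of (cell_id, source) pairs in execution order; each
    version is sliced against only the history that preceded it.
    """
    sources = [source for _, source in log]
    return [
        [sources[i] for i in backward_slice(sources, index)]
        for index, (logged_id, _) in enumerate(log)
        if logged_id == cell_id
    ]


log = [
    ("load", "import pandas as pd\ndf = pd.read_csv('movies.csv')"),
    ("rank", "top = df.sort_values('rating').head(10)"),
    ("load", "df = pd.read_csv('movies.csv').dropna()"),
    ("rank", "top = df.sort_values('votes').head(10)"),
]

for number, program in enumerate(versioned_results(log, "rank"), start=1):
    print(f"--- version {number} ---")
    print("\n\n".join(program))
```

The second version pulls in both "load" cells. The first supplies the pandas import, which the cell-level analysis cannot separate from that cell's own definition of df.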

Page 39

Messy Notebooks: A Sample of Recent Research

Cleaning and Exploring

• Interactions for Untangling Messy History in a Computational Notebook. Kery et al., VL/HCC '18
• Towards Effective Foraging by Data Scientists to Find Past Analysis Choices. Kery et al., CHI '19
  (shown: output recipes, artifact explorer, cell version diffs, tabbed browsing of cell versions)
• Aiding Collaborative Reuse of Computational Notebooks with Annotated Cell Folding. Rule et al., CSCW '18
  (shown: cell folding)
• Design and Use of Computational Notebooks. Rule, Ph.D. Thesis, '18

Page 40

Evaluating Code Gathering Tools

Q1. What is the meaning of "cleaning"?

Q2. How do analysts use code gathering tools during exploratory data analysis?

Page 41

A Qualitative Study of Gathering

Participants: N = 12 professional data analysts

Cleaning Task × 2: Clean a computational notebook, with and without code gathering tools.

Exploration: Rank movies from a movies dataset. Use code gathering tools as you wish.

Page 42

Q1. The Meaning of "Cleaning"

"I picked a plot that looked interesting and, if you think of a dependency tree of cells, walked backwards and removed everything that wasn’t necessary."

Picking a subset of cells [P1-P12]... and removing the rest [P8, P10-P12].

... and many additional stages:
• writing documentation [P1, P5, P7, P10, P11]
• polishing visualizations [P1, P6]
• merging cells [P3, P4, P6, P12]
• restructuring code [P7]
• integrating with version control [P11]

Page 43

Q2. How do analysts use code gathering tools during exploratory data analysis?

[Figure: bar chart of the number of participants (0 to 12) who rated each feature (gathering to a notebook, highlighting dependencies, version browser) as very useful, somewhat useful, not useful, or no basis to answer.]

Participants described gathering to a notebook as "beautiful" and "amazing": it "hits the nail on the head."

Page 44

Some Observed Uses of Gathering Tools

• "Finishing moves"
• Creating personal references
• Lightweight branching
• Gathering for multiple audiences

Page 45

Takeaways from Study

Q1. Gathering covers an important yet incomplete set of notebook cleaning tasks.

Q2. Code gathering tools can be picked up quickly and readily applied to new use cases.

Page 46

$ jupyter labextension install nbgather

Contributions encouraged: github.com/Microsoft/gather

