Date post: 18-Dec-2021

PROCESS AND TECHNICAL DEBT

Christian Kaestner

Required Reading:

Sculley, David, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. "Hidden technical debt in machine learning systems." In Advances in neural information processing systems, pp. 2503-2511. 2015.

Suggested Readings:

Fowler and Highsmith. "The Agile Manifesto."

Steve McConnell. Software project survival guide. Chapter 3

Pfleeger and Atlee. Software Engineering: Theory and Practice. Chapter 2

Kruchten, Philippe, Robert L. Nord, and Ipek Ozkaya. "Technical debt: From metaphor to theory and practice." IEEE Software 29, no. 6 (2012): 18-21.

Patel, Kayur, James Fogarty, James A. Landay, and Beverly Harrison. "Investigating statistical machine learning as a tool for software development." In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 667-676. 2008.

LEARNING GOALS

- Contrast development processes of software engineers and data scientists
- Outline process conflicts between different roles and suggest ways to mitigate them
- Recognize the importance of process
- Describe common agile practices and their goals
- Understand and correctly use the metaphor of technical debt
- Describe how ML can incur reckless and inadvertent technical debt, outline common sources of technical debt

CASE STUDY: REAL-ESTATE WEBSITE

ML COMPONENT: PREDICTING REAL ESTATE VALUE

Given a large database of house sales and statistical/demographic data from public records, predict the sales price of a house.

f(size, rooms, tax, neighborhood, . . . ) → price
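The function signature above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the model used in the case study: the `House` type, the `SALES` records, the hand-picked scaling in `distance`, and the choice of a k-nearest-neighbor average are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class House:
    size_sqft: float
    rooms: int
    tax: float
    neighborhood: str


# Hypothetical past sales (features, sale price); not real data.
SALES = [
    (House(1000, 3, 2000, "north"), 200_000),
    (House(1100, 3, 2100, "north"), 220_000),
    (House(2000, 5, 4000, "south"), 400_000),
    (House(2100, 5, 4200, "south"), 420_000),
]


def distance(a: House, b: House) -> float:
    """Crude, hand-scaled dissimilarity between two houses (an assumption)."""
    return (
        abs(a.size_sqft - b.size_sqft) / 1000
        + abs(a.rooms - b.rooms)
        + abs(a.tax - b.tax) / 1000
        + (0.0 if a.neighborhood == b.neighborhood else 1.0)
    )


def predict_price(house: House, sales=SALES, k: int = 2) -> float:
    """f(size, rooms, tax, neighborhood) -> price, as the mean of the k most similar sales."""
    nearest = sorted(sales, key=lambda sale: distance(house, sale[0]))[:k]
    return mean(price for _, price in nearest)
```

A real component would learn from the full sales database; the point here is only that the ML component is a function from house features to a price estimate.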

DATA SCIENCE: ITERATION AND EXPLORATION

DATA SCIENCE IS ITERATIVE AND EXPLORATORY

(Source: Guo. "Data Science Workflow: Overview and Challenges." Blog@CACM, Oct 2013)

(Microsoft Azure Team, "What is the Team Data Science Process?" Microsoft Documentation, Jan 2020)

Source: Patel, Kayur, James Fogarty, James A. Landay, and Beverly Harrison. "Investigating statistical machine learning as a tool for software development." In Proc. CHI, 2008.

Speaker notes: This figure shows the result from a controlled experiment in which participants had 2 sessions of 2h each to build a model. Whenever the participants evaluated a model in the process, the accuracy was recorded. These plots show the accuracy improvements over time, showing how data scientists make incremental improvements through frequent iteration.

DATA SCIENCE IS ITERATIVE AND EXPLORATORY

- Science mindset: start with rough goal, no clear specification, unclear whether possible
- Heuristics and experience to guide the process
- Trial and error, refine iteratively, hypothesis testing
- Go back to data collection and cleaning if needed, revise goals

SHARE EXPERIENCE?

COMPUTATIONAL NOTEBOOKS

- Origins in "literate programming", interleaving text and code, treating programs as literature (Knuth'84)
- First notebook in Wolfram Mathematica 1.0 in 1988
- Document with text and code cells, showing execution results under cells
- Code of cells is executed, per cell, in a kernel
- Many notebook implementations and supported languages, Python + Jupyter currently most popular

Speaker notes: See also https://en.wikipedia.org/wiki/Literate_programming

Demo with public notebook, e.g., https://colab.research.google.com/notebooks/mlcc/intro_to_pandas.ipynb

NOTEBOOKS SUPPORT ITERATION AND EXPLORATION

- Quick feedback, similar to REPL
- Visual feedback including figures and tables
- Incremental computation: reexecuting individual cells
- Quick and easy: copy paste, no abstraction needed
- Easy to share: document includes text, code, and results

BRIEF DISCUSSION: NOTEBOOK LIMITATIONS AND DRAWBACKS?

SOFTWARE ENGINEERING PROCESS

INNOVATIVE VS ROUTINE PROJECTS

- Like data science tasks, most software projects are innovative
  - Google, Amazon, Ebay, Netflix
  - Vehicles and robotics
  - Language processing, Graphics, AI
- Routine (now, not 20 years ago)
  - E-commerce websites?
  - Product recommendation? Voice recognition?
  - Routine gets automated -> innovation cycle

A SIMPLE PROCESS

1. Discuss the software that needs to be written
2. Write some code
3. Test the code to identify the defects
4. Debug to find causes of defects
5. Fix the defects
6. If not done, return to step 1

SOFTWARE PROCESS

“The set of activities and associated results that produce a software product”

Examples?

Speaker notes:

- Writing down all requirements
- Require approval for all changes to requirements
- Use version control for all changes
- Track all reported bugs
- Review requirements and code
- Break down development into smaller tasks and schedule and monitor them
- Planning and conducting quality assurance
- Have daily status meetings
- Use Docker containers to push code between developers and operations

Speaker notes: Visualization following McConnell, Steve. Software project survival guide. Pearson Education, 1998.

Speaker notes: Idea: spend most of the time on coding, accept a little rework

Speaker notes: Negative view of process: pure overhead, reduces productive work, limits creativity

Speaker notes: Real experience if little attention is paid to process: increasingly complicated, increasing rework; attempts to rescue by introducing process

EXAMPLE OF PROCESS PROBLEMS?

Speaker notes: Collect examples of what could go wrong:

- Change Control: Mid-project informal agreement to changes suggested by customer or manager. Project scope expands 25-50%
- Quality Assurance: Late detection of requirements and design issues. Test-debug-reimplement cycle limits development of new features. Release with known defects.
- Defect Tracking: Bug reports collected informally, forgotten
- System Integration: Integration of independently developed components at the very end of the project. Interfaces out of sync.
- Source Code Control: Accidentally overwritten changes, lost work.
- Scheduling: When project is behind, developers are asked weekly for new estimates.

TYPICAL PROCESS STEPS (NOT NECESSARILY IN THIS ORDER)

- Understand customers, identify what to build, by when, budget
- Identify relevant qualities, plan/design system accordingly
- Test, deploy, maintain, evolve
- Plan, staff, workaround

SURVIVAL MODE

- Missed deadlines -> "solo development mode" to meet own deadlines
- Ignore integration work
- Stop interacting with testers, technical writers, managers, ...

Hypothesis: Process increases flexibility and efficiency + Upfront investment for later greater returns

Speaker notes: Ideal setting of little process investment upfront

Speaker notes: Empirically well-established rule: Bugs are increasingly expensive to fix the larger the distance between the phase where they are created and the phase where they are corrected.

Speaker notes: Complicated processes like these are often what people associate with "process". Software process is needed, but does not need to be complicated.

SOFTWARE PROCESS MODELS

AD-HOC PROCESSES

1. Discuss the software that needs to be written
2. Write some code
3. Test the code to identify the defects
4. Debug to find causes of defects
5. Fix the defects
6. If not done, return to step 1

WATERFALL MODEL

taming the chaos, understand requirements, plan before coding, remember testing

(CC-BY-SA-2.5)

Speaker notes: Although dated, the key idea is still essential -- think and plan before implementing. Not all requirements and design can be made upfront, but planning is usually helpful.

RISK FIRST: SPIRAL MODEL

[Figure: spiral model diagram with four quadrants (1. Determine objectives; 2. Identify and resolve risks; 3. Development and Test; 4. Plan the next iteration). Progress and cumulative cost grow as the spiral moves outward through Prototype 1, Prototype 2, and the operational prototype toward Release.]

incremental prototypes, starting with most risky components

CONSTANT ITERATION: AGILE

[Figure: Scrum cycle. Product Backlog -> Sprint Backlog -> Sprint (30 days, with a 24 h inner cycle) -> Working increment of the software] (CC BY-SA 4.0, Lakeworks)

working with customers, constant replanning

CONTRASTING PROCESS MODELS

Ad-hoc -- Waterfall -- Spiral -- Agile

DATA SCIENCE VS SOFTWARE ENGINEERING

DISCUSSION: ITERATION IN NOTEBOOK VS AGILE?

[Figure: Scrum cycle with Product Backlog, Sprint Backlog, Sprint (30 days, 24 h), and Working increment of the software] (CC BY-SA 4.0, Lakeworks)

Speaker notes: There is similarity in that there is an iterative process, but the idea is different and the process model seems mostly orthogonal to iteration in data science. The spiral model prioritizes risk, especially when it is not clear whether a model is feasible. One can do similar things in model development, seeing whether it is feasible with data at hand at all and build an early prototype, but it is not clear that an initial okay model can be improved incrementally into a great one later. Agile can work with vague and changing requirements, but that again seems to be a rather orthogonal concern. Requirements on the product are not so much unclear or changing (the goal is often clear), but it's not clear whether and how a model can solve it.

POOR SOFTWARE ENGINEERING PRACTICES IN NOTEBOOKS?

- Little abstraction
- Global state
- No testing
- Heavy copy and paste
- Little documentation
- Poor version control
- Out of order execution
- Poor development features (vs IDE)
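Two of the notebook problems listed above, global state and out of order execution, can be illustrated with a small simulation. This is a sketch under assumptions: the three "cells" and the `run` helper are hypothetical, and a plain dictionary stands in for the notebook kernel's namespace.

```python
# Each "cell" is a snippet executed against one shared namespace,
# loosely mimicking how a notebook kernel keeps global state.
CELLS = {
    "load": "prices = [100, 200, 300]",
    "scale": "prices = [p * 2 for p in prices]",
    "report": "total = sum(prices)",
}


def run(order):
    """Execute the named cells in the given order and return `total`."""
    kernel = {}  # stands in for the kernel's global state
    for name in order:
        exec(CELLS[name], kernel)
    return kernel["total"]


top_to_bottom = run(["load", "scale", "report"])         # intended order
scale_rerun = run(["load", "scale", "scale", "report"])  # one cell run twice
scale_skipped = run(["load", "report"])                  # one cell forgotten

print(top_to_bottom, scale_rerun, scale_skipped)
```

The same three cells yield three different totals depending on execution history, which is why a notebook that "worked" interactively may not reproduce when run top to bottom.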

UNDERSTANDING DATA SCIENTIST WORKFLOWS

- Instead of blindly recommending "SE Best Practices", understand context
- Documentation and testing not a priority in exploratory phase
- Help with transitioning into practice
  - From notebooks to pipelines
  - Support maintenance and iteration once deployed
  - Provide infrastructure and tools
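Moving "from notebooks to pipelines" often starts by turning exploratory cells into small functions with explicit inputs and outputs, so that they can be tested and rerun. The following is a minimal sketch assuming records are plain dictionaries; the stage names `clean`, `add_price_per_sqft`, and `pipeline` are hypothetical.

```python
def clean(rows):
    """Drop records with a missing or non-positive price."""
    return [r for r in rows if r.get("price") is not None and r["price"] > 0]


def add_price_per_sqft(rows):
    """Return new records with a derived feature; inputs are not mutated."""
    return [{**r, "price_per_sqft": r["price"] / r["size_sqft"]} for r in rows]


def pipeline(rows):
    """Run all stages in a fixed order, independent of any notebook state."""
    return add_price_per_sqft(clean(rows))


def test_pipeline():
    raw = [
        {"size_sqft": 1000, "price": 200_000},
        {"size_sqft": 1500, "price": None},  # should be dropped by clean()
    ]
    out = pipeline(raw)
    assert out == [{"size_sqft": 1000, "price": 200_000, "price_per_sqft": 200.0}]
    assert "price_per_sqft" not in raw[0]  # the raw input stays untouched


test_pipeline()
```

Because each stage takes its data as an argument instead of reading a global variable, the order of execution is fixed by the code rather than by the order in which cells happened to be run.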

[Figure: Data Scientists and Software Engineers]

PROCESS FOR AI-ENABLED SYSTEMS

- Integrate Software Engineering and Data Science processes
- Establish system-level requirements (e.g., user needs, safety, fairness)
- Inform data science modeling with system requirements (e.g., privacy, fairness)
- Try risky parts first (most likely include ML components; ~spiral)
- Incrementally develop prototypes, incorporate user feedback (~agile)
- Provide flexibility to iterate and improve
- Design system with characteristics of AI component (e.g., UI design, safeguards)
- Plan for testing throughout the process and in production
- Manage project understanding both software engineering and data science workflows

No existing "best practices" or workflow models

TECHNICAL DEBT

TECHNICAL DEBT METAPHOR

- Analogy to financial debt
  - Have a benefit now (e.g., progress quickly, release now)
  - accepting later cost (loss of productivity, e.g., higher maintenance/operating cost, rework)
  - debt accumulates and can suffocate project
- Ideally a deliberate decision (short term tactical or long term strategic)
- Ideally track debt and plan for paying it down

Examples?

Source: Martin Fowler 2009, https://martinfowler.com/bliki/TechnicalDebtQuadrant.html

TECHNICAL DEBT FROM ML COMPONENTS?

(see reading)

Sculley, David, et al. "Hidden technical debt in machine learning systems." Advances in Neural Information Processing Systems. 2015.

THE NOTEBOOK

"Jupyter Notebooks are a gift from God to those who work with data. They allow us to do quick experiments with Julia, Python, R, and more" -- John Paul Ada

Speaker notes: Discuss benefits and drawbacks of Jupyter style notebooks

ML AND TECHNICAL DEBT

- Often reckless and inadvertent in inexperienced teams
- ML can seem like an easy addition, but it may cause long-term costs
- Needs to be maintained, evolved, and debugged
- Goals may change, environment may change, some changes are subtle

Example problems

- Systems and models are tangled and changing one has cascading effects on the other
- Untested, brittle infrastructure; manual deployment
- Unstable data dependencies, replication crisis
- Data drift and feedback loops
- Magic constants and dead experimental code paths

Further reading: Sculley, David, et al. "Hidden technical debt in machine learning systems." Advances in Neural Information Processing Systems. 2015.
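Data drift, one of the example problems named for ML and technical debt, can be watched for with even a very crude check. The sketch below is an assumption-laden heuristic, not a statistical test: `drift_score`, `has_drifted`, the default threshold, and the sample values are all hypothetical. It also names its threshold as a parameter instead of burying a magic constant in the code.

```python
from statistics import mean, stdev


def drift_score(train_values, live_values):
    """Shift of the live mean, measured in training standard deviations."""
    return abs(mean(live_values) - mean(train_values)) / stdev(train_values)


def has_drifted(train_values, live_values, threshold=2.0):
    """Flag a feature whose live mean moved more than `threshold` deviations."""
    return drift_score(train_values, live_values) > threshold


# Hypothetical feature values seen during training and in production.
train = [100, 110, 90, 105, 95]
live_similar = [102, 98, 101, 99]
live_shifted = [150, 160, 155, 145]

print(has_drifted(train, live_similar))  # similar distribution
print(has_drifted(train, live_shifted))  # clearly shifted distribution
```

In practice such a check would run on a schedule against production inputs and raise an alert, so that drift is noticed before it silently degrades predictions.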

CONTROLLING TECHNICAL DEBT FROM ML COMPONENTS

- Avoid AI when not needed
- Understand and document requirements, design for mistakes
- Build reliable and maintainable pipelines, infrastructure, good engineering practices
- Test infrastructure, system testing, testing and monitoring in production
- Test and monitor data quality
- Understand and model data dependencies, feedback loops, ...
- Document design intent and system architecture
- Strong interdisciplinary teams with joint responsibilities
- Document and track technical debt
- ...
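"Test and monitor data quality" can begin with simple per-record checks at the boundary of the pipeline. The following is a minimal sketch; the `EXPECTED_FIELDS` schema and the `check_record` helper are hypothetical and would need to reflect the real data in any actual system.

```python
# Hypothetical schema: field name -> accepted Python type(s).
EXPECTED_FIELDS = {
    "size_sqft": (int, float),
    "rooms": int,
    "neighborhood": str,
}


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record or record[field] is None:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} has type {type(record[field]).__name__}")
    size = record.get("size_sqft")
    if isinstance(size, (int, float)) and size <= 0:
        problems.append("size_sqft must be positive")
    return problems


good = {"size_sqft": 1200, "rooms": 3, "neighborhood": "north"}
bad = {"size_sqft": -5, "rooms": "3"}

print(check_record(good))
print(check_record(bad))
```

Counting how many records fail such checks over time gives a basic data quality metric that can be monitored in production alongside model accuracy.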

17-445 Software Engineering for AI-Enabled Systems, Christian Kaestner

SUMMARY

- Data scientists and software engineers follow different processes
- ML projects need to consider process needs of both
- Iteration and upfront planning are both important, process models codify good practices
- Deliberate technical debt can be good, too much debt can suffocate a project
- Easy to accumulate (reckless) debt with machine learning

