What is Software Engineering Research Good For?

Date post: 16-Apr-2017
What Is Software Engineering Research Good For?

Andrzej Wąsowski

@AndrzejWasowski

PROCESS AND SYSTEM MODELS GROUP

pyrrhocoris apterus (firebug)

© Andrzej Wąsowski, IT University of Copenhagen

A BETTER QUESTION

What is interesting SE research according to Andrzej?

A hammer lurking behind the question:

* modeling languages, semantics, analysis

What relevant SE questions can be addressed by defining languages & analyzing models/programs?

AGENDA

Correctness of Software (bug finding)
Software Engineering is Codified Knowledge (online privacy)
Legacy Systems (Software Modernization)

What is Linux Kernel?

Incredibly versatile operating system

GNU/Linux runs supercomputers and internet servers
Android: phones, tablets, smart TVs, etc.
Routers, storage servers, entertainment systems, robots, IoT devices, ...
Cloud infrastructure
68-98% webservers run on Linux
$0.5M/Y platinum membership fee

The most popular OS kernel on the planet!

Sources: Gartner and
https://en.wikipedia.org/wiki/Usage_share_of_operating_systems
https://techcrunch.com/2016/11/16/microsoft-joins-the-linux-foundation/

Linux Kernel is very large

The source code has 700 million characters, 21 million lines of code (quick measurements on the Raspberry Pi version of Linux)

Boeing 747 has 6 million mechanical parts, half of them simple fasteners

Are humans able to understand the entire kernel?

Linux Kernel Moves Fast

• 4000 programmers from 440 companies contributed to the kernel (approximate numbers from 2015 only)
• 10,800 lines of code added, 5,300 removed, 1,875 modified. Every. Single. Day. (on average)
• Over 8 changes per hour
• Is any human able to comprehend this evolution speed?
• Incidentally, this makes it impossible to verify with the current state of the art
• Nobody has access to the hardware on which others work
• Each of them potentially breaks things for others
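As a rough back-of-the-envelope reading of the daily line counts on the slide above (a derived estimate, not a figure from the talk), adding the three counts gives the total number of lines touched per day, and dividing by 24 gives the hourly rate:

```latex
\begin{align*}
10{,}800 + 5{,}300 + 1{,}875 &= 17{,}975 \text{ lines touched per day} \\
17{,}975 / 24 &\approx 749 \text{ lines touched per hour}
\end{align*}
```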

Linus has power to say no. And not much more...

• Linus Torvalds
• Creator of a free kernel project in 1991
• Today a benevolent dictator
• Coordinates the kernel with a handful of lieutenants
• Can block developments
• Hardly has power to give the project a consistent direction
• Project is not managed in the usual sense

Success from the Outside
Software engineering challenge from the inside

Very Large
Very Complex
Moving Very FAST
Essentially NOT MANAGED

A fascinating object for software engineering studies

Jewels bound to be found ...

Problems bound to appear ...

Warning! You may get dirty

Let’s look closely at a bug

    void ath10k_htt_rx_msdu_buff_replenish(struct ath10k_htt *htt) {
        spin_lock_bh(&htt->rx_ring.lock);                // 6. DEADLOCK
        // ...
        spin_unlock_bh(&htt->rx_ring.lock);
    }

    void ath10k_htt_rx_in_ord_ind(struct ath10k *ar, struct sk_buff *skb) {
        // ...
        ath10k_htt_rx_msdu_buff_replenish(&ar->htt);     // 5. CALL
    }

    void ath10k_htt_txrx_compl_task(unsigned long ptr) { // 1. ENTRY POINT
        struct ath10k *ar = (struct ath10k *)ptr;        // 2. CAST
        // ...
        while ((skb = __skb_dequeue(&rx_ind_q))) {
            spin_lock_bh(&ar->htt.rx_ring.lock);         // 3. LOCK
            ath10k_htt_rx_in_ord_ind(ar, skb);           // 4. CALL
            spin_unlock_bh(&ar->htt.rx_ring.lock);
            dev_kfree_skb_any(skb);
        }
        // ...
    }

What makes this bug hard to find:

Domain knowledge (tasklets, bottom halves, locks)
Inter-procedural flow
Pointers nested in complex structs
casts, function pointers
no specifications, no tests

[Figure: effect abstraction of the three functions, built from the effects lock ρ and unlock ρ, with the loop abstracted to while(*)]

Iago Abal, Claus Brabrand, Andrzej Wąsowski. Effective Bug Finding in C Programs with Shape and Effect Abstractions. VMCAI’17
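To make the lock ρ / unlock ρ abstraction concrete, here is a minimal sketch of a double-lock check over a single flattened execution path. It is an illustration only and not the analysis from the cited paper: the real tool infers shapes and effects and works over control flow, whereas this is a linear scan. The names `struct effect`, `find_double_lock`, `slide_path`, and the integer region ids are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One lock-related effect on an abstract memory region (the slide's rho). */
enum effect_kind { EFFECT_LOCK, EFFECT_UNLOCK };

struct effect {
    enum effect_kind kind;
    int region;              /* hypothetical integer id standing in for rho */
};

#define MAX_REGIONS 8        /* arbitrary bound chosen for this sketch */

/*
 * Scan one flattened execution path and return the index of the first
 * EFFECT_LOCK on a region that is already held, or -1 if there is none.
 */
int find_double_lock(const struct effect *path, size_t len)
{
    int held[MAX_REGIONS] = {0};

    for (size_t i = 0; i < len; i++) {
        int r = path[i].region;
        assert(r >= 0 && r < MAX_REGIONS);

        if (path[i].kind == EFFECT_LOCK) {
            if (held[r])
                return (int)i;   /* second lock with no unlock in between */
            held[r] = 1;
        } else {
            held[r] = 0;
        }
    }
    return -1;
}

/* Effects along one loop iteration of the slide's code, with calls inlined. */
static const struct effect slide_path[] = {
    { EFFECT_LOCK,   0 },    /* 3. LOCK in ath10k_htt_txrx_compl_task */
    { EFFECT_LOCK,   0 },    /* 6. DEADLOCK in ath10k_htt_rx_msdu_buff_replenish */
    { EFFECT_UNLOCK, 0 },
    { EFFECT_UNLOCK, 0 },
};
```

Calling `find_double_lock(slide_path, 4)` reports index 1, the second acquisition of the same region. A path that alternates lock and unlock yields -1.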

Inference of Shapes & Effects

Formalized and implemented for the entire C language
Including spec. of selected kernel functions, e.g.:

Does this work?
Software engineering research method strikes back

We have proven no theorems
Nine thousand files in drivers analyzed (you do get dirty!)
Dozen reports for 9K lines, not a lot of noise
Still a lot of work to filter out false positives (you get dirty!)
You talk to devs: they want you to fix bugs! (you may get dirty!)
Dozen new bugs confirmed and 5 fixed in the Linux kernel projects (some in the main tree already)
[recall] On 26 random historical double lock bugs, EBA finds 22; much more than competing tools (≤ 12), despite negative bias

http://eba.wikit.itu.dk/
Iago Abal, Claus Brabrand, Andrzej Wąsowski. Effective Bug Finding in C Programs with Shape and Effect Abstractions. VMCAI’17
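Expressed as recall over the 26 historical double-lock bugs quoted above, the counts work out roughly as follows (the percentages are derived here from the slide's counts and are not stated in the talk):

```latex
\begin{align*}
\text{recall}_{\text{EBA}} &= \frac{22}{26} \approx 0.85 \\
\text{recall}_{\text{competing tools}} &\le \frac{12}{26} \approx 0.46
\end{align*}
```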

Ariane V (1996)

A floating point cast bug,
A decade of development,
$7B development budget,
$0.5B lost rocket & cargo,
but ...
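The "floating point cast bug" refers to a conversion of a 64-bit floating point value into a 16-bit signed integer that overflowed. The flight software was written in Ada; the C sketch below only illustrates the general class of error and one defensive pattern, a range-checked conversion. The function name `checked_to_int16` is hypothetical and is not taken from any flight code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a double to int16_t only when the truncated value fits.
 * Returns false and leaves *out untouched for out-of-range values and NaN;
 * an unchecked cast of such a value is undefined behavior in C.
 */
bool checked_to_int16(double x, int16_t *out)
{
    assert(out != NULL);

    /* Truncation toward zero maps the open interval (-32769, 32768)
       into the int16_t range. NaN fails both comparisons. */
    if (!(x > -32769.0 && x < 32768.0))
        return false;

    *out = (int16_t)x;
    return true;
}
```

A caller can then decide explicitly what to do when the conversion is rejected (saturate, report a fault, fall back to a safe value) instead of letting the overflow propagate.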

Ariane V (2013)

• 89 launches since 1996
• 3 crashes since 1996
• Only the first linked to a software bug (so is HW really more reliable?)
• Last 75 launches with no incidents
• Most recent launch: Nov 17, 2016. Have you heard about it?
• They never show you this slide ...


1.27 fatalities per 100 million miles, including human failures

0.76 fatalities per 100 million miles

0.03 fatalities per 100 million miles, including human errors

If we are doing so well, why are we still SO OBSESSED with correctness?

performance

security

diversity of domains: consumer electronics, automotive, industry automation, business software, data analytics

Let’s look into one domain: Online Privacy and Data Analytics


smartphone owner

• I cannot do much about this, as an SE researcher
• My hammer is not good enough
• Others work with education, awareness, regulation, politics, alternative business models

software developer

• Architectural principles protecting personal data
• Detect libraries used
• Detect information flow to the library vendor
• Warn the developer of bad practice (like the security scanners do for security)
• Help programmers and companies conform to GDPR

Lots of potentially interesting work
Seeking thesis students


caring parent

• 3000+ schools in Poland use the system (data from 2014)
• The database easily tracks over half a million data subjects
• Not only grades
• Communication with parents, conduct, illness, etc.
• This dataset is bound to grow fast
• Cannot help much. We need education, regulation, governance, etc.

• High value in the big (personal) data
• Companies want to extract the value
• But how to store and process these data respectfully?
• How to anonymize the data?
• Which anonymization method to use? How to configure it? How to test whether we use it correctly?
• Help programmers and companies to conform to GDPR


Differential Privacy
An example from software engineering perspective

[Diagram: two neighbouring data sets D1 and D2 are each processed by the same mechanism K; the result lands in a set of results S with probability P1 for D1 and P2 for D2]

$\frac{P_1}{P_2} \le e^{\varepsilon}$

For any such set of results S and for any neighbouring sets D1 and D2

Can we expect an average programmer to implement this?
If so, then how to test this?
Can we implement reusable differential privacy components, similar to encryption libraries?

Cynthia Dwork. Differential privacy. ICALP 2006
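As a rough idea of what a reusable component could look like, here is a minimal sketch of a counting query protected with the Laplace mechanism, a textbook construction for differential privacy. The slide does not prescribe any mechanism, so this choice, the function names, and the toy data are assumptions. The sketch samples noise naively with floating point numbers and is not meant as a production-grade implementation.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and the same scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Mechanism K for a counting query: true count plus Laplace(1/epsilon) noise.

    Adding or removing one record changes a count by at most 1,
    so the sensitivity of the query is 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
d1 = [17, 25, 31, 42]   # hypothetical ages
d2 = d1 + [19]          # neighbouring data set: one more data subject
print(private_count(d1, lambda age: age < 30, 0.5, rng))
print(private_count(d2, lambda age: age < 30, 0.5, rng))
```

Even this small component leaves open the questions raised on the slide: the caller must know the sensitivity of the query and must pick ε, and a single noisy answer says little about whether the implementation is correct.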


Anonymization is difficult, e.g. we are able to re-identify a good number of students in the "anonymized" course evaluation data
How do we select the value of epsilon?
Does this notion of privacy at all capture what data subjects would expect?
This week running an experiment trying to understand this problem at ITU
Data: collected by WiFi access points at ITU, data subjects: students
RQ1: How to relate the noise (ε) to privacy concerns of the data subjects?
RQ2: Can data subjects inform design of data protection in a system?
Ultimately a handbook/blueprint for selection and use of anonymization technology

[Scale: maximum anonymity (ε = 0) ... ? ... maximum utility (ε = +∞)]

$\Pr[K(D_1) \in S] \le e^{\varepsilon} \Pr[K(D_2) \in S]$

Joint work with Mark Berthelsen, Gediminas Kucas, Tina Cecilie Schultz, Irina Shklovski
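One way to make the inequality testable is to pick a mechanism whose output probabilities can be enumerated exactly. The sketch below uses randomized response on a single bit, where the two neighbouring data sets are the bit values 0 and 1. This mechanism is swapped in for illustration and is not taken from the talk; all function names are hypothetical.

```python
import math
from itertools import chain, combinations

def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit in randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def output_distribution(true_bit: int, epsilon: float) -> dict:
    """Exact distribution of the reported bit for a given true bit."""
    p = truth_probability(epsilon)
    return {true_bit: p, 1 - true_bit: 1.0 - p}

def satisfies_dp(epsilon: float, tol: float = 1e-12) -> bool:
    """Check Pr[K(D1) in S] <= e^eps * Pr[K(D2) in S] for every S, in both orders."""
    outputs = (0, 1)
    subsets = chain.from_iterable(
        combinations(outputs, k) for k in range(len(outputs) + 1))
    dists = (output_distribution(0, epsilon), output_distribution(1, epsilon))
    for s in subsets:
        for a, b in ((0, 1), (1, 0)):
            p1 = sum(dists[a][o] for o in s)
            p2 = sum(dists[b][o] for o in s)
            if p1 > math.exp(epsilon) * p2 + tol:
                return False
    return True

for eps in (0.0, 0.1, 1.0, 5.0):
    print(eps, round(truth_probability(eps), 3), satisfies_dp(eps))
```

At ε = 0 the reported bit is a fair coin flip (maximum anonymity, no utility), and as ε grows the truth is reported almost always. Mechanisms with continuous outputs cannot be enumerated like this and need statistical testing instead, which is part of what makes the testing question hard.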

Many more issues than correctness

Let’s look into: Aging Systems


(ancient religions and philosophies)
(Nocturne E flat major, op. 55 no. 2)
(Gustav Klimt, Adele Bloch-Bauer)
(Søren Kierkegaard)

Is LEGACY a MISNOMER for SOFTWARE?

An output of complex intellectual activity
Software is legacy like art, perhaps not a misnomer after all

In Search of Lost Time by Marcel Proust
Proust actually died before finishing
14 years of Proust’s work
9.6 million characters
Many readers suffered long after

The Linux Kernel by Thousands of Engineers
25 years since 1991
700 million characters
An important intellectual contribution, largely benefiting the entire society
Is the Linux kernel an outlier?

1977 Oldsmobile Toronado by General Motors
(Likely) the first car with control software
An ECU controls the spark timing
We guess 1-3 K lines of code
By 1981 each GM car had 50 KLOC
A modern car has about 100 MLOC
Incidentally more than a Dreamliner (7 MLOC)
Still less than our brain (10^15 synapses, not 10^6)
Software systems are not only key for our lifestyle, but also likely the most complex human creations ever

100K commits in Facebook. Every. Week.
Size of Facebook: 60 MLOC
Size of Google: 2 000 MLOC
Examples start to get boring

Software is important for us, a bit like art.
Imagine life without legacy systems (no banking, no credit cards, no railways, no airlines, no tax office)
But software is also different. Why does art age well, while software ages badly?
Software does not change a single bit
We change, our needs change, our engineers change

Complexity is the Only Constant (Jürgen Dingel)

https://www.technologyreview.com/s/508231/many-cars-have-a-hundred-million-lines-of-code | Juergen Dingel. Complexity is the Only Constant: Trends in Computing and Their Relevance to MDE. ICGT’16 | Michael Feathers. Working with Legacy Code

Legacy Code is code we’re afraid to change [James Shore]

Legacy Code is code without tests [Michael Feathers]

Legacy Code is code without a caretaker [yours truly]


Some Solutions for Software Modernization
Some methods for taming legacy code

Doing nothing
Replatforming
Virtualization
Re-architecting

An Example of a Modernization Project

Slide elements by Alexandru F. Iosif-Lazar

Can we trust a complex transformation?

Luckily the program was finite state
Verify that the input and output are functionally equivalent

[Diagram labels: legacy code; automatic and fast modernizing transformation; modernized code; symbolic execution (applied on both sides); behavior of modernized code; equivalence check using an SMT solver]

Impossible without others turning theory into engineering components
Semantics → symbolic executors
Grammar theory → transformation languages
Deductive systems → SMT solvers

Slide elements by Alexandru F. Iosif-Lazar

100% correctness not a goal
It does not pay off
Identifying errors key
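A minimal sketch of the equivalence check described above, in Python. It is not the project's tooling: the slide describes symbolic execution plus an SMT solver, whereas this sketch swaps in exhaustive enumeration over a small finite input domain, which is feasible only because the state space is finite. All names (legacy_discount, modernized_discount, buggy_discount, find_counterexample) are hypothetical.

```python
from itertools import product


def legacy_discount(age: int, is_member: bool) -> int:
    """Hypothetical legacy routine written as nested conditionals."""
    if is_member:
        if age >= 65:
            return 30
        return 10
    if age >= 65:
        return 20
    return 0


def modernized_discount(age: int, is_member: bool) -> int:
    """Hypothetical output of a modernizing transformation."""
    return (10 if is_member else 0) + (20 if age >= 65 else 0)


def buggy_discount(age: int, is_member: bool) -> int:
    """Hypothetical faulty transformation: off-by-one in the age test."""
    return (10 if is_member else 0) + (20 if age > 65 else 0)


def find_counterexample(f, g, domains):
    """Return the first input tuple on which f and g differ, or None.

    Brute-force stand-in for the SMT-based equivalence check: it
    enumerates every combination of values from the finite domains.
    """
    for args in product(*domains):
        if f(*args) != g(*args):
            return args
    return None


# Assumed finite input domain: ages 0..129 and both membership flags.
domains = [range(0, 130), [False, True]]

print(find_counterexample(legacy_discount, modernized_discount, domains))
print(find_counterexample(legacy_discount, buggy_discount, domains))
```

The first call finds no differing input; the second returns a concrete input on which the two versions disagree. That counterexample is the useful artifact when identifying errors matters more than proving full correctness.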

What is interesting in SE research according to AW?

Warning!
You may get dirty

1.27 fatalities per 100 million miles
including human failures

0.76 fatalities per 100 million miles

0.03 fatalities per 100 million miles
including human errors

If we are doing so well,
why are we still SO OBSESSED
with correctness?

(ancient religions and philosophies) (Nocturne E flat major, op. 55 no. 2)
(Gustav Klimt, Adele Bloch-Bauer) (Søren Kierkegaard)

Is LEGACY a MISNOMER for SOFTWARE?

