
A Machine Learning approach to predict Software Defects

Date post: 12-Apr-2017

Escalation Prediction on Defects Database

Guide: Dr. K. V. Subramaniam
Project author: Chetan Hireholi, 01FM14ESE006


Problem statement

Determine what leads to escalation by interpreting the defects corpus of the customer support cases. Alert on an escalation based on the nature of the defects, correlate the escalations with the defects discovered by customers, and find the trigger point that leads to such an escalation.


Data Source

Incident Database: contains the customer support cases.

CRs Database: an internally used database that details the cases which became Change Requests (CRs).


Data Cleansing

The data in the Incidents and CRs databases had many discrepancies (e.g., rows out of order, special characters in the Date field, and multiple variants of the same company name, viz. Boeing and Boeing Inc.).

Tools such as OpenRefine and Microsoft Excel helped remove these discrepancies.
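OpenRefine and Excel were the tools actually used. For readers who prefer a scripted pass, here is a minimal Python sketch of two of the fixes described: merging company-name variants and stripping stray characters from the Date field. The suffix list, the day-month-year date format and the function names are assumptions for illustration, not the project's cleaning rules.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed list of legal suffixes to strip so that "Boeing" and "Boeing Inc." merge.
_SUFFIX = re.compile(r"[\s,]+(inc|ltd|llc|corp|co)\.?$", re.IGNORECASE)


def normalize_company(name: str) -> str:
    """Collapse whitespace, drop one trailing legal suffix, and upper-case."""
    name = " ".join(name.split())
    name = _SUFFIX.sub("", name)
    return name.strip(" ,.").upper()


def clean_date(raw: str, fmt: str = "%d-%m-%Y") -> Optional[datetime]:
    """Remove special characters from a date; return None if it still will not parse.

    The day-month-year format is an assumption about the source data.
    """
    cleaned = re.sub(r"[^0-9/-]", "", raw).replace("/", "-")
    try:
        return datetime.strptime(cleaned, fmt)
    except ValueError:
        return None


# Toy rows for illustration only.
rows = [
    {"company": "Boeing Inc.", "date": "12-04-2015#"},
    {"company": "Boeing", "date": "03/05/2015"},
    {"company": "Hewlett Packard Ltd.", "date": "n/a"},
]
for row in rows:
    print(normalize_company(row["company"]), clean_date(row["date"]))
```

Rows that still return None from clean_date are left for manual review rather than guessed.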

Chart: Escalation count in the Incident Database (GREEN: 3831, RED: 125, YELLOW: 329)


Understanding the workflow


Algorithms

1. J48 Decision Tree:
Attributes selected: Escalation, Expectation, Modules, Severity.
Chart: J48 Decision Tree, correctly classified vs. incorrectly classified instances (values shown: 20.779 and 70.22)

Motivation for textual analysis: The discussion between the client and the developer is captured in the ‘Comments’ attribute of the Incidents database. Analyzing it can unearth additional information about the defects (viz. what triggered the escalation, the initial escalation of a defect, the nature of the client, etc.). This led to the use of R for text mining.

2. Naïve Bayes (RED & YELLOW corpus):
a. Attributes selected: Escalation, Expectation, Modules, Severity.
b. Probability distribution for:
i. RED Escalation: 0.242 (24.2%)
ii. YELLOW Escalation: 0.758 (75.8%)
iii. When Escalation is RED, the Severity is most likely URGENT, with probability 0.449 (44.9%).
iv. When Escalation is YELLOW, the Severity is most likely HIGH, with probability 0.634 (63.4%).

3. Simple K Means method:
a. Cluster 1: YELLOW, Investigate Issue & Hotfix required, Installation, High
b. Cluster 2: RED, Investigate Issue, Installation, High
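The Naïve Bayes figures listed above are a class prior, P(Escalation), and class-conditional probabilities, P(Severity | Escalation). The following is a minimal Python sketch of how such figures are estimated from nominal counts. It assumes add-one (Laplace) smoothing and uses made-up toy records with Severity as the only attribute; Weka's own estimator may differ in detail, so the sketch does not reproduce the slide's numbers.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, target, features, alpha=1.0):
    """Estimate P(class) and P(value | class) for nominal features with add-alpha smoothing."""
    class_counts = Counter(row[target] for row in rows)
    n, k = len(rows), len(class_counts)
    priors = {c: (cnt + alpha) / (n + alpha * k) for c, cnt in class_counts.items()}

    likelihoods = {}
    for feat in features:
        values = {row[feat] for row in rows}
        counts = defaultdict(Counter)  # class -> Counter of feature values
        for row in rows:
            counts[row[target]][row[feat]] += 1
        likelihoods[feat] = {
            c: {v: (counts[c][v] + alpha) / (class_counts[c] + alpha * len(values)) for v in values}
            for c in class_counts
        }
    return priors, likelihoods


def posterior(priors, likelihoods, record):
    """Normalised P(class | record); feature values never seen in training are skipped."""
    scores = {}
    for c, p in priors.items():
        for feat, table in likelihoods.items():
            value = record.get(feat)
            if value in table[c]:
                p *= table[c][value]
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Toy records for illustration only; not the project's incident data.
rows = [
    {"Escalation": "RED", "Severity": "URGENT"},
    {"Escalation": "RED", "Severity": "URGENT"},
    {"Escalation": "RED", "Severity": "HIGH"},
    {"Escalation": "YELLOW", "Severity": "HIGH"},
    {"Escalation": "YELLOW", "Severity": "HIGH"},
    {"Escalation": "YELLOW", "Severity": "HIGH"},
    {"Escalation": "YELLOW", "Severity": "MEDIUM"},
    {"Escalation": "YELLOW", "Severity": "URGENT"},
]
priors, likelihoods = fit_naive_bayes(rows, "Escalation", ["Severity"])
print(priors)
print(likelihoods["Severity"]["RED"])
print(posterior(priors, likelihoods, {"Severity": "URGENT"}))
```

With these toy records, P(RED) = 4/10 and P(URGENT | RED) = 3/6, so a record with Severity URGENT leans towards RED.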

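The clusters above are described purely by nominal values (escalation colour, expectation, module, severity). As an illustration of how a k-means-style method can form such clusters, here is a k-modes-style sketch: distance is the number of mismatching attributes and each centroid is the per-attribute mode. This is an assumption about how nominal data can be clustered, not a reproduction of Weka's SimpleKMeans; the seeding (first k records) and the records themselves are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Number of attributes on which two nominal records differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_centroid(records):
    """Per-attribute most frequent value (ties go to the value seen first)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(records, k, max_iter=20):
    centroids = [tuple(r) for r in records[:k]]  # assumed seeding: first k records
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: mismatch(r, centroids[j])) for r in records
        ]
        if new_assignment == assignment:
            break  # converged: no record changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [r for r, a in zip(records, assignment) if a == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = mode_centroid(members)
    return centroids, assignment


# Hypothetical records: (Escalation, Expectation, Module, Severity).
records = [
    ("YELLOW", "Investigate Issue & Hotfix required", "Installation", "High"),
    ("RED", "Investigate Issue", "Installation", "High"),
    ("YELLOW", "Investigate Issue & Hotfix required", "Installation", "High"),
    ("RED", "Investigate Issue", "Installation", "Urgent"),
    ("YELLOW", "Investigate Issue & Hotfix required", "Perf - Collector", "High"),
    ("RED", "Investigate Issue", "Installation", "High"),
]
centroids, assignment = k_modes(records, k=2)
for centroid in centroids:
    print(centroid)
```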

Text Mining using R

Why R over NLTK (Python)? R is easy to code, has abundant packages, and pre-processes the text faster.

Mining the e-mail dump:
1. Create a corpus for each escalation state (RED, YELLOW & GREEN).
2. Pre-process the text (remove punctuation, stop words, numbers and noise).
3. Apply the ‘tm’ package to text-mine the corpus.
4. Extract graphs and word clouds of the trigger points that cause escalations.
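The project used R's tm package for these steps. As a rough plain-Python analogue of steps 2 and 3 (pre-processing, then counting term frequencies for a word cloud), consider the sketch below. It is not tm's API: the tiny stop-word list, the sample e-mails and the function names are assumptions for illustration.

```python
import re
import string
from collections import Counter

# Deliberately tiny, assumed stop-word list; a real run would use a full list.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "on", "for", "in", "we", "this", "it"}


def preprocess(text):
    """Lower-case, strip punctuation and numbers, and drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)  # remove numbers
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def term_frequencies(corpus):
    """Total count of each remaining term across all documents in the corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(preprocess(doc))
    return counts


# Made-up e-mails standing in for the RED corpus.
red_corpus = [
    "Please escalate: the agent crashed again on 3 servers!",
    "Customer requests escalation, please support urgently.",
]
print(term_frequencies(red_corpus).most_common(3))
```

The most frequent terms per corpus are what a word cloud then sizes visually.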


Results from Text mining


Final escalation state = GREEN; observations made prior to RED

Callouts in the figure: most frequently used words; the affected module


Final escalation state = GREEN; observations made prior to YELLOW

Callouts in the figure: aiding words (prefix/postfix); most frequently used words; words with the highest mined frequency


Final escalation state = YELLOW; observations made prior to RED (only 4 cases)

Callout in the figure: the developer associated with the bug/incident


Final escalation state = RED; observations made prior to RED (incidents that jumped to RED from the YELLOW state)

Callouts in the figure: most frequently used words; the affected module


Observations made on the RED corpus (the whole RED escalation dump)

The term “escalation”, used together with “please” and “support”, indicates that the escalation is RED or will be converted to RED.
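A claim such as "escalation appears together with please and support" is a term-association question. The sketch below measures it roughly the way tm's findAssocs does, as a correlation between per-document term counts, written here in plain Python. The documents, the 0.8 threshold and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; 0.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def associations(docs, term, min_corr=0.5):
    """Terms whose per-document counts correlate with `term` at or above min_corr."""
    vocab = sorted({t for doc in docs for t in doc})
    counts = {t: [doc.count(t) for doc in docs] for t in vocab}
    target = counts[term]
    found = {}
    for t in vocab:
        if t != term:
            r = pearson(target, counts[t])
            if r >= min_corr:
                found[t] = round(r, 2)
    return found


# Toy pre-processed e-mails; not the project's corpus.
docs = [
    ["please", "escalation", "support"],
    ["please", "escalation", "support", "agent"],
    ["install", "agent", "patch"],
    ["install", "patch"],
]
print(associations(docs, "escalation", min_corr=0.8))
```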


Observations made on the GREEN corpus (the whole GREEN escalation dump)

“Please” is not used frequently, which indicates that few RED escalations occur in the incident history.

Chart: Escalation count on the defect dump (GREEN: 3831, RED: 125, YELLOW: 329)


Other observations made on Incidents

For RED cases, the average number of days for a case to get escalated:
Severity URGENT: 13.56 days
Severity HIGH: 25.29 days
Severity MEDIUM: 19.66 days
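Averages like these are a group-by mean over the per-case gap between opening and escalation. A minimal Python sketch follows; the field names (opened_on, escalated_on, severity) and the dates are hypothetical, since the deck does not show the underlying columns.

```python
from collections import defaultdict
from datetime import date


def mean_days_to_escalation(cases):
    """Average (escalated_on - opened_on) in days, grouped by severity."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for case in cases:
        sums[case["severity"]] += (case["escalated_on"] - case["opened_on"]).days
        counts[case["severity"]] += 1
    return {sev: sums[sev] / counts[sev] for sev in sums}


# Toy RED cases; field names and dates are hypothetical.
red_cases = [
    {"severity": "URGENT", "opened_on": date(2015, 1, 1), "escalated_on": date(2015, 1, 11)},
    {"severity": "URGENT", "opened_on": date(2015, 2, 1), "escalated_on": date(2015, 2, 18)},
    {"severity": "HIGH", "opened_on": date(2015, 3, 1), "escalated_on": date(2015, 3, 26)},
]
print(mean_days_to_escalation(red_cases))
```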


Analyzing Incidents: Customers vs Escalations

RHEINENERGIE, HEWLETT PACKARD, DEUTSCHE BUNDESBANK: Highest number of RED escalations

Chart: RED escalations per customer (y-axis: count, 0 to 4.5; labeled bars include RHEINENERGIE, Hewlett Packard Ltd. and CHOREGIE; the tallest bar is 4, two bars are 3, and the remaining customers have 2 or 1)


Analyzing Incidents: Modules vs Escalations

Total RED escalations: 125 out of 6433. The chart shows the modules with the highest number of escalations.

Ops - Action Agent (opcacta) & Installation: Highest number of RED escalations

Chart: Escalations per module (x-axis: module, e.g. Ops - Monitor Agent (opcmona), Installation, Perf - Collector, Ops - Message Agent (opcmsga), Ops - Trap Interceptor (opctrapi), Ops - Logfile Encapsulator (opcle), LCore - BBC, Ops - Action Agent (opcacta) and others; y-axis: number of escalations)


Analyzing Incidents: S/w release vs Escalations

Chart: Count of ESCALATION per software release (same data as the table below)

Software release    Count of ESCALATION
8.6                 28
11.14               12
11.02               12
11.03               11
11                  11
11.11               10
11.13               9
8.60.501            7
11.12               6
11.04               6
11.01               6
11.1                3
unknown             3
8.53 patch          1
Grand Total         125
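Pivot tables like this one ("Count of ESCALATION" per row label, with a grand total) are plain frequency counts. A minimal Python equivalent is sketched below; the field name and the rule of grouping empty values under "unknown" are assumptions, and the rows are toy data.

```python
from collections import Counter


def count_by(rows, field, missing="unknown"):
    """Count rows per value of `field`, most frequent first, plus a grand total.

    Empty or absent values are grouped under `missing` (an assumed convention).
    """
    counts = Counter(row.get(field) or missing for row in rows)
    return counts.most_common(), sum(counts.values())


# Toy RED-escalation rows; not the project's data.
red_rows = [
    {"release": "8.6"}, {"release": "8.6"}, {"release": "11.14"},
    {"release": ""}, {"release": "11.02"}, {"release": "8.6"},
]
table, grand_total = count_by(red_rows, "release")
for release, count in table:
    print(f"{release:<12}{count}")
print(f"{'Grand Total':<12}{grand_total}")
```

The same helper applies unchanged to the customer, module, OS and developer breakdowns by passing a different field name.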


Analyzing Incidents: OS vs Escalations

Chart: RED escalations by operating system (y-axis: count, 0 to 90). The largest bar, 83, is for incidents with a blank OS field; each of the remaining values (VMWare, Solaris 10, Solaris, Windows, Windows 2008 R2, Windows 2003, HP-UX 11.31, AIX 6.1, CentOS, several Linux Red Hat releases, and others) has 4 or fewer escalations.


Analyzing Incidents: Developer vs Escalations

prasad.m.k_hp.com: Handled the highest number of escalations

Chart: Escalations handled per developer (26 developers; y-axis: count, 0 to 35). The tallest bar is 29 (prasad.m.k_hp.com), followed by 15 (vachan.b-s_hp.com); the remaining developers handled between 10 and 1.


Analyzing CR data

Chart: Escalations in CR (Count of ESCALATION by escalation value)

ESCALATION     Count of ESCALATION
N              10219
Showstopper    75
Y              93
Grand Total    10387

Note: For defects or CRs (QCCR), Showstopper is marked for defects that must be fixed, or that need an immediate fix, for a release.


Analyzing CRs: Customers vs Escalations

TATA CONSULTANCY SERVICES LTD: Highest number of “Showstopper” escalations

Allegis, NORTHROP GRUMMAN, PepperWeed: Highest number of escalations

Chart: CR escalations per customer, split into Showstopper and Y (y-axis: count, 0 to 2.5; the tallest bar is 2, and all other customers have 1)


Analyzing CRs: Modules vs Escalations

Ops - Monitor Agent (opcmona) & Installation: Highest number of “Showstopper” escalations

Installation & Lcore - Other: Highest number of escalations

Chart: CR escalations per module, split into Showstopper and Y (y-axis: count, 0 to 35; modules shown include Installation, Lcore - Other, Ops - Other, Perf - Other, Ops - Message Agent (opcmsga), Ops - Monitor Agent (opcmona), Perf - Collector and others)


Analyzing CRs: S/w release vs Escalations

Release 11: Highest number of “Showstopper” and “Y” escalations

Chart: CR escalations per software release, split into Showstopper and Y (y-axis: count, 0 to 70; releases shown: 11, 8.6, 8.6_IP1, 8.53 patch, 10.5, 9, 8.5, 8.1, 11.2, 8.1x, 11.00.101, 11.11, 11.14, 11.01, 8.60.501, 11.02, 11.1)


Analyzing CRs: OS vs Escalations

Windows (version number not specified): Highest number of escalations, both “Showstopper” and “Y”

Chart: CR escalations per operating system, split into Showstopper and Y (y-axis: count, 0 to 10; OS values shown include Windows, Windows 2008 R2, Windows 2003 R2, Windows 2008, Windows 2003, Windows XP, Windows XP SP2, and multi-valued entries such as AIX;HP-UX;Linux;Solaris;Windows and Linux;Windows)

Note: CR submitters fill in the OS field as they see fit. Some choose the exact versions where the issue was seen or reported, while others choose only a high-level OS name. No strict rules were observed.
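Because the OS field mixes exact versions, family names and ';'-separated lists, comparing escalations by OS is easier after mapping each entry to a coarse family. Here is a minimal Python sketch; the prefix table and the "(blank)"/"Other" labels are assumptions for illustration, not rules taken from the CR database.

```python
# Assumed prefix table mapping free-text OS entries to a coarse OS family.
OS_FAMILIES = (
    ("windows", "Windows"),
    ("hp-ux", "HP-UX"),
    ("aix", "AIX"),
    ("solaris", "Solaris"),
    ("linux", "Linux"),
    ("centos", "Linux"),
    ("vmware", "VMware"),
)


def os_families(field):
    """Split a ';'-separated OS field and map each entry to a family (deduplicated, in order)."""
    families = []
    for part in (field or "").split(";"):
        part = part.strip().lower()
        if not part:
            continue
        family = next((name for prefix, name in OS_FAMILIES if part.startswith(prefix)), "Other")
        if family not in families:
            families.append(family)
    return families or ["(blank)"]


for raw in ["Windows 2003 R2;Windows 2008 R2", "HP-UX 11;HP-UX 11.31;Windows 2003", "", "Linux Red Hat RHEL 5.1"]:
    print(repr(raw), "->", os_families(raw))
```

Counting a CR once per family it mentions avoids splitting "Windows", "Windows 2003" and "Windows 2008 R2" into separate bars.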


Analyzing CRs: Developer vs Escalations

swati.sinha_hp.com: Handled the highest number of Showstopper escalations

umesh.sharoff_hp.com: Handled the highest number of escalations

Chart: CR escalations handled per developer, split into Showstopper and Y (y-axis: count, 0 to 10; developers shown include umesh.sharoff_hp.com, tejaswini.s2_hp.com, srinath.nadig_hp.com, balaji.sundaram_hp.com and others)


Company behavior analysis: RHEINENERGIE (had the maximum number of RED escalations)

28 incident cases

Patterns observed:
◦ 6 RED escalations: 6 of the 28 incidents (21.43%) were RED escalations, i.e., a 21.43% chance that a logged incident will be a RED escalation
◦ Most reported modules:
  ◦ Ops - Monitor Agent (opcmona) (7 nos.); 3 of them were RED escalated
  ◦ Installation (6 nos.)
  ◦ Perf - Collector (3 nos.)
◦ Average number of days to handle a single incident: 73.5 days
◦ Number of incidents that moved to CR: 15 (53.57% of the incidents)
◦ All 6 RED escalations moved to CR; 8 GREEN escalations and 1 YELLOW escalation also moved to CR
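The percentages on these company slides are simple ratios of counts. The following minimal Python sketch computes a company profile from incident records, using the RHEINENERGIE counts (28 cases, 6 RED, 15 moved to CR). The Incident fields are hypothetical, and the escalation colour of the other 13 cases and all days_open values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    escalation: str     # "RED", "YELLOW" or "GREEN"
    days_open: float    # hypothetical field: days from logging to closure
    moved_to_cr: bool   # whether the incident turned into a CR


def company_profile(incidents):
    """Share of RED escalations, share moved to CR, and mean handling time."""
    n = len(incidents)
    red = sum(i.escalation == "RED" for i in incidents)
    to_cr = sum(i.moved_to_cr for i in incidents)
    return {
        "cases": n,
        "red": red,
        "red_pct": round(100 * red / n, 2),
        "moved_to_cr": to_cr,
        "cr_pct": round(100 * to_cr / n, 2),
        "avg_days_open": sum(i.days_open for i in incidents) / n,
    }


# Counts follow the RHEINENERGIE slide; colours of the last 13 cases and days_open are placeholders.
incidents = (
    [Incident("RED", 10.0, True)] * 6
    + [Incident("GREEN", 10.0, True)] * 8
    + [Incident("YELLOW", 10.0, True)] * 1
    + [Incident("GREEN", 10.0, False)] * 13
)
print(company_profile(incidents))
```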


Company behavior analysis: APPLE INC

27 incident cases

Patterns observed:

No RED escalations ever

Mostly contains GREEN escalations (19/27); 70.37% chance that a logged incident will be a GREEN escalation

Most reported modules:
◦ Ops Monitor Agent (4 nos.)
◦ Perf Collector (3 nos.)
◦ Installation, Ops - Action Agent, Ops - Ops Agent, Perf Other (2 nos. each)

Average number of days to handle a single incident: 463.777 days

Number of incidents that moved to CR: 10 (37.03% of the incidents)


Company behavior analysis: BOEING

33 incident cases

Patterns observed:

1 RED escalation

Mostly contains GREEN escalations (31/33); 93.93% chance that a logged incident will be a GREEN escalation

Most reported modules:
◦ Installation (7 nos.)
◦ Perf Collector, Other (5 nos.)
◦ Perf GlancePlus (4 nos.)
◦ Perf ARM (the RED escalation); a 3% chance that an incident will be a RED escalation

Average number of days to handle a single incident: 399.322 days

Number of incidents that moved to CR: 22 (66.66% of the incidents)


Other observations made on Incidents

DIFFERENCE_INITIAL_CLOSED and DAYS_SUPPORT_TO_CPE do not match.

Chart: DIFFERENCE_INITIAL_CLOSED vs. DAYS_SUPPORT_TO_CPE per incident (x-axis: incident number, ticks from 1 to 121; y-axis: -400 to 500)
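To list exactly which incidents disagree rather than eyeballing the chart, one can compare the two columns row by row. A minimal Python sketch follows; the rows are made up, and the zero tolerance is an assumption that could be relaxed if small differences are expected.

```python
def mismatched_rows(rows, tolerance=0.0):
    """Return (row number, DIFFERENCE_INITIAL_CLOSED, DAYS_SUPPORT_TO_CPE, gap) where the two disagree."""
    flagged = []
    for number, row in enumerate(rows, start=1):
        initial_closed = row.get("DIFFERENCE_INITIAL_CLOSED")
        support_to_cpe = row.get("DAYS_SUPPORT_TO_CPE")
        if initial_closed is None or support_to_cpe is None:
            continue  # cannot compare incomplete rows
        gap = initial_closed - support_to_cpe
        if abs(gap) > tolerance:
            flagged.append((number, initial_closed, support_to_cpe, gap))
    return flagged


# Toy rows; the values are made up.
rows = [
    {"DIFFERENCE_INITIAL_CLOSED": 120, "DAYS_SUPPORT_TO_CPE": 120},
    {"DIFFERENCE_INITIAL_CLOSED": 45, "DAYS_SUPPORT_TO_CPE": -300},
    {"DIFFERENCE_INITIAL_CLOSED": 10},
]
print(mismatched_rows(rows))
```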

