MINING PATTERNS AND FACTORS CONTRIBUTING TO CRASH SEVERITY ON ROAD CURVES Shin Huey Chen BCompSc(Hons), MCS This report is submitted as partial fulfilment of the requirements for the PhD of the School of Psychology & Counselling, CARRS-Q, Queensland University of Technology, 2010

MINING PATTERNS

AND FACTORS

CONTRIBUTING TO

CRASH SEVERITY ON

ROAD CURVES

Shin Huey Chen

BCompSc(Hons), MCS

This report is submitted as partial fulfilment

of the requirements for the PhD of the

School of Psychology & Counselling, CARRS-Q,

Queensland University of Technology,

2010

Abstract

Road curves are an important feature of road infrastructure, and many serious crashes occur on them. In Queensland, the number of fatalities on curves is twice that on straight roads. Therefore, there is a need to reduce drivers’ exposure to crash risk on road curves. Road crashes in Australia and in the Organisation for Economic Co-operation and Development (OECD) have plateaued in the last five years (2004 to 2008), and the road safety community is desperately seeking innovative interventions to reduce the number of crashes. However, designing an innovative and effective intervention may prove difficult, as it relies on providing theoretical foundation, coherence, understanding, and structure to both the design and the validation of the efficiency of the new intervention.

Researchers from multiple disciplines have developed various models to determine the contributing factors for crashes on road curves with a view towards reducing the crash rate. However, most of the existing methods are based on statistical analysis of contributing factors described in government crash reports.

To further explore the contributing factors related to crashes on road curves, this thesis designs a novel method to analyse and validate them. It proposes analysing crash claim reports from an insurance company using data mining techniques. To the best of our knowledge, this is the first attempt to use data mining techniques to analyse crashes on road curves. A text mining technique is employed because the reports consist of thousands of textual descriptions, from which text mining can identify the contributing factors.

Beyond identifying the contributing factors, few studies to date have investigated the relationships between these factors, especially for crashes on road curves. This study therefore proposes the use of the rough set analysis technique to determine these relationships. The results of this analysis are used to assess the effect of the contributing factors on crash severity.
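
As a rough illustration of the idea behind rough set analysis, the sketch below computes the lower and upper approximations of a set of high-severity crashes from a small decision table. The records, the attribute names (`time`, `hit_tree`, `severity`) and the function names are hypothetical and invented purely for illustration; they are not data or code from this thesis.

```python
from collections import defaultdict

# Hypothetical toy decision table: each row is one made-up crash record.
# "time" and "hit_tree" are condition attributes; "severity" is the decision.
records = {
    "c1": {"time": "night", "hit_tree": "yes", "severity": "high"},
    "c2": {"time": "night", "hit_tree": "yes", "severity": "high"},
    "c3": {"time": "morning", "hit_tree": "yes", "severity": "high"},
    "c4": {"time": "morning", "hit_tree": "yes", "severity": "low"},
    "c5": {"time": "morning", "hit_tree": "no", "severity": "low"},
}


def indiscernibility_classes(table, attributes):
    """Group records that have identical values on the chosen attributes."""
    classes = defaultdict(set)
    for name, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(name)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of the target set."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


high = {name for name, row in records.items() if row["severity"] == "high"}
lower, upper = approximations(records, ["time", "hit_tree"], high)
boundary = upper - lower
```

In this toy table, c1 and c2 can be classed as high severity with certainty from the two condition attributes, while c3 and c4 are indistinguishable on those attributes yet differ in outcome, so they fall into the boundary region.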

The findings obtained through the data mining techniques presented in this thesis are consistent with existing identified contributing factors. Furthermore, this thesis has identified new contributing factors towards crashes and the relationships between them. A significant pattern related to crash severity is the time of day: severe road crashes occur more frequently in the evening or at night. Tree collision is another common pattern: crashes that occur in the morning and involve hitting a tree are likely to have a higher crash severity. Another factor that influences crash severity is the age of the driver. Most age groups face a high crash severity, except for drivers between 60 and 100 years old, who have the lowest crash severity. The significant relationship identified between contributing factors consists of the time of the crash, the year of manufacture of the vehicle, the age of the driver and hitting a tree.

With the new contributing factors and relationships identified, a validation process is carried out using a traffic simulator in order to determine their accuracy. The validation process indicates that the results are accurate. This demonstrates that data mining techniques are a powerful tool in road safety research and can be usefully applied within the Intelligent Transport System (ITS) domain.

The research presented in this thesis provides an insight into the complexity of crashes on road curves. The findings of this research have important implications for both practitioners and academics. For road safety practitioners, the results illustrate practical benefits for the design of interventions for road curves that will potentially help in decreasing related injuries and fatalities. For academics, this research opens up a new research methodology to assess the severity of road crashes on curves.

Keywords: Road curves, data mining, text mining, rough set analysis, crash risk assessment, index scale, ITS, road safety.

Contents

Abstract iii

List of Abbreviations xx

List of Publications and Presentations xxiii

Statement of Original Authorship xxv

Acknowledgements xxviii

1 Introduction 1

1.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Rationale for the research . . . . . . . . . . . . . . . . . . . . . 3

1.3 Research aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.5 Research approach . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.6 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.7 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2 Literature Review 13

2.1 Crashes on road curves and the causes . . . . . . . . . . . . . . 14

2.1.1 Crash causal chain . . . . . . . . . . . . . . . . . . . . . 15

2.1.2 The causes of crashes . . . . . . . . . . . . . . . . . . . . 16

2.1.2.1 Road and environmental factors . . . . . . . . . 16

2.1.2.2 Driver-related factors . . . . . . . . . . . . . . . 21

2.1.2.3 Vehicle-related factors . . . . . . . . . . . . . . 24

2.2 Existing crash prediction models . . . . . . . . . . . . . . . . . . 24

2.2.1 Horizontal road curves . . . . . . . . . . . . . . . . . . . 25

2.2.1.1 The basic horizontal curve geometry . . . . . . 25

2.2.1.2 The clothoide . . . . . . . . . . . . . . . . . . . 27

2.2.2 Horizontal curve prediction models . . . . . . . . . . . . 29

2.2.2.1 Glennon’s horizontal curve model . . . . . . . . 30

2.2.2.2 Zegeer’s horizontal curve model . . . . . . . . . 31

2.2.3 Data mining techniques in road safety . . . . . . . . . . 33

2.2.3.1 VEDAS . . . . . . . . . . . . . . . . . . . . . . 34

2.2.3.2 SAWUR . . . . . . . . . . . . . . . . . . . . . . 35

2.2.4 Traffic simulators . . . . . . . . . . . . . . . . . . . . . . 36

2.2.4.1 CORSIM . . . . . . . . . . . . . . . . . . . . . 38

2.2.4.2 AutoTURN . . . . . . . . . . . . . . . . . . . . 39

2.2.4.3 PARAMICS . . . . . . . . . . . . . . . . . . . . 40

2.2.4.4 VISSIM . . . . . . . . . . . . . . . . . . . . . . 41

2.2.5 Driver behaviour model . . . . . . . . . . . . . . . . . . 44

2.2.5.1 Psychology-Based Driver Behaviour Models . . 45

2.3 Intelligent Transport System applications . . . . . . . . . . . . . 47

2.3.1 Interventions for Speeding . . . . . . . . . . . . . . . . . 49

2.3.2 Intervention for Sight distance . . . . . . . . . . . . . . . 51

2.3.3 Interventions for road curvature . . . . . . . . . . . . . . 52

2.3.4 Intervention for vehicle stability . . . . . . . . . . . . . . 53

2.4 Research direction . . . . . . . . . . . . . . . . . . . . . . . . . 55

2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3 Data mining 59

3.1 Knowledge Discovery in Databases and Data mining . . . . . . . 59

3.2 Text mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.2.1 Text mining algorithm . . . . . . . . . . . . . . . . . . . 64

3.3 Rough set theory . . . . . . . . . . . . . . . . . . . . . . . . . . 66

3.3.1 Rough sets analysis software . . . . . . . . . . . . . . . . 73

3.3.2 Rough set Algorithms . . . . . . . . . . . . . . . . . . . 74

3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4 Design of Approach 77

4.1 Scope of proposed approach . . . . . . . . . . . . . . . . . . . . 78

4.2 Framework of approach . . . . . . . . . . . . . . . . . . . . . . . 79

4.3 The data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

4.3.1 The attributes . . . . . . . . . . . . . . . . . . . . . . . . 80

4.3.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.4 Identify factors from crash records . . . . . . . . . . . . . . . . . 84

4.4.1 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

4.4.2 Technique used to find contributing factors . . . . . . . . 87

4.4.3 Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . 87

4.4.4 Transformation . . . . . . . . . . . . . . . . . . . . . . . 88

4.4.5 Text mining . . . . . . . . . . . . . . . . . . . . . . . . . 88

4.4.5.1 Text mining software selection . . . . . . . . . . 89

4.4.5.2 Text mining algorithm selection . . . . . . . . . 92

4.4.6 Factors validation . . . . . . . . . . . . . . . . . . . . . . 93

4.5 Identify relationship between factors . . . . . . . . . . . . . . . 93

4.5.1 Technique used to find the relationship . . . . . . . . . . 94

4.5.2 Transformation . . . . . . . . . . . . . . . . . . . . . . . 94

4.5.2.1 Classification . . . . . . . . . . . . . . . . . . . 95

4.5.2.2 Presence indication . . . . . . . . . . . . . . . . 96

4.5.2.3 Preparing the decision table . . . . . . . . . . . 97

4.5.3 Rough set analysis . . . . . . . . . . . . . . . . . . . . . 97

4.5.3.1 Rough set software selection . . . . . . . . . . . 97

4.5.3.2 Rough set algorithm selection . . . . . . . . . . 103

4.5.4 Verification of Rules . . . . . . . . . . . . . . . . . . . . 104

4.5.4.1 Dynamic verification . . . . . . . . . . . . . . . 104

4.5.4.2 The features of the defined simulator . . . . . . 105

4.5.4.3 Performance of the simulator . . . . . . . . . . 112

4.5.4.4 Statistical verification . . . . . . . . . . . . . . 113

4.5.5 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

4.5.5.1 Statistical quality filters . . . . . . . . . . . . . 114

4.6 Identify the significant contributing factors . . . . . . . . . . . . 115

4.6.1 Selected software program . . . . . . . . . . . . . . . . . 116

4.6.2 Transformation . . . . . . . . . . . . . . . . . . . . . . . 117

4.6.3 Select attributes . . . . . . . . . . . . . . . . . . . . . . . 119

4.7 Understanding crash severity . . . . . . . . . . . . . . . . . . . . 120

4.7.1 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . 121

4.7.2 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

4.8 Novelty and limitations of approach . . . . . . . . . . . . . . . . 122

4.8.1 Novelty . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

4.8.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . 123

4.8.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . 124

4.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

5 Implementation of approach 127

5.1 Flow of implementation . . . . . . . . . . . . . . . . . . . . . . 127

5.2 Identify factors from past crash records . . . . . . . . . . . . . . 128

5.2.1 Text mining process preparation . . . . . . . . . . . . . . 128

5.2.2 Text mining analysis process . . . . . . . . . . . . . . . . 133

5.3 Identify relationships between factors . . . . . . . . . . . . . . . 134

5.3.1 Rough set analysis preparation . . . . . . . . . . . . . . 134

5.3.2 Rough set analysis Process . . . . . . . . . . . . . . . . . 137

5.3.3 Filter rules . . . . . . . . . . . . . . . . . . . . . . . . . . 137

5.3.4 Rule validation preparation . . . . . . . . . . . . . . . . 139

5.3.4.1 Dynamic validation preparation . . . . . . . . . 140

5.3.4.2 Accuracy measurement preparation . . . . . . . 142

5.3.5 Validation process . . . . . . . . . . . . . . . . . . . . . 143

5.3.5.1 Dynamic validation process . . . . . . . . . . . 143

5.3.5.2 Accuracy measurement process . . . . . . . . . 144

5.4 Identify significant factors . . . . . . . . . . . . . . . . . . . . . 144

5.4.1 Attribute evaluation preparation . . . . . . . . . . . . . 144

5.4.2 Attribute evaluation process . . . . . . . . . . . . . . . . 146

5.5 Understanding crash severity . . . . . . . . . . . . . . . . . . . . 146

5.5.1 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . 147

5.5.2 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

6 Results 149

6.1 Factors from past crash records . . . . . . . . . . . . . . . . . . 150

6.1.1 The factors . . . . . . . . . . . . . . . . . . . . . . . . . 150

6.1.2 Factors validation . . . . . . . . . . . . . . . . . . . . . . 151

6.2 Relationships of attributes . . . . . . . . . . . . . . . . . . . . . 152

6.2.1 Selected rules . . . . . . . . . . . . . . . . . . . . . . . . 153

6.3 Rule validation . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

6.3.1 Validation with a traffic simulator . . . . . . . . . . . . . 161

6.3.2 Accuracy Measurement validation . . . . . . . . . . . . . 163

6.4 Identify Significant factors . . . . . . . . . . . . . . . . . . . . . 163

6.4.1 The factors . . . . . . . . . . . . . . . . . . . . . . . . . 164

6.5 Understanding crash severity . . . . . . . . . . . . . . . . . . . . 164

6.5.1 The rules . . . . . . . . . . . . . . . . . . . . . . . . . . 164

6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

7 Analysis and Discussion 169

7.1 Analysis of results . . . . . . . . . . . . . . . . . . . . . . . . . . 169

7.1.1 Factors from past crash records . . . . . . . . . . . . . . 169

7.1.2 Relationships of contributing factors . . . . . . . . . . . 170

7.1.2.1 Overall view rule analysis . . . . . . . . . . . . 170

7.1.2.2 Lowest severity rule analysis . . . . . . . . . . . 172

7.1.2.3 Low severity rule analysis . . . . . . . . . . . . 173

7.1.2.4 Medium severity rule analysis . . . . . . . . . . 175

7.1.2.5 High severity rule analysis . . . . . . . . . . . . 176

7.1.2.6 Highest severity rule analysis . . . . . . . . . . 178

7.1.3 Rule validation . . . . . . . . . . . . . . . . . . . . . . . 180

7.1.3.1 Dynamic validation . . . . . . . . . . . . . . . . 181

7.1.3.2 Accuracy measurement validation . . . . . . . 182

7.1.4 Identify Significant factors . . . . . . . . . . . . . . . . . 183

7.1.5 Understanding crash severity . . . . . . . . . . . . . . . 183

7.1.5.1 Overall view rule analysis . . . . . . . . . . . . 184

7.1.5.2 Lowest severity rule analysis . . . . . . . . . . . 185

7.1.5.3 Low severity rule analysis . . . . . . . . . . . . 187

7.1.5.4 Medium severity rule analysis . . . . . . . . . . 188

7.1.5.5 High severity rule analysis . . . . . . . . . . . . 190

7.1.5.6 Highest severity rule analysis . . . . . . . . . . 190

7.1.6 Overall analysis of the rules . . . . . . . . . . . . . . . . 192

7.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

7.2.1 Research questions and answers . . . . . . . . . . . . . . 194

7.2.2 Application of results in road safety . . . . . . . . . . . . 197

7.2.3 Ways to reduce the crash severity . . . . . . . . . . . . . 199

7.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

8 Conclusion and Future work 205

8.1 Achievements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

8.1.1 The aim . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

8.1.2 Summary of approach . . . . . . . . . . . . . . . . . . . 206

8.1.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . 207

8.1.4 Research findings and implications . . . . . . . . . . . . 208

8.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

8.3 Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

A Literature Review: Horizontal Curves and Road Engineering interventions 221

A.1 Types of horizontal curves . . . . . . . . . . . . . . . . . . . . . 221

A.2 Road engineering and environmental interventions . . . . . . . . 223

A.3 Driver-related interventions . . . . . . . . . . . . . . . . . . . . 227

A.4 Vehicle-related interventions . . . . . . . . . . . . . . . . . . . . 228

B Data categories 231

B.1 Classification and Labels . . . . . . . . . . . . . . . . . . . . . . 231

References 235

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

List of Figures

1.1 The number of crashes on road curves over a 10-year period. . 2

1.2 An overview of the proposed framework in relation to the research questions. . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.1 The three major contributing factors of road crashes (Shinar, 2007). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.2 An illustration of the degree of curve (Highway, 2004). . . . . . 18

2.3 An illustration of the length of a curve (Highway, 2004). . . . . 19

2.4 An illustration of the sight distance in a horizontal curve (AEC-Portico, 2005). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.5 A cross-section of a super-elevated horizontal curve (Environment & Works Bureau, 1997). . . . . . . . . . . . . . . . . . . . 20

2.6 The geometry of a horizontal curve (CTRE, 2006). . . . . . . . 25

2.7 The clothoide of a road curve (Herve, 2005). . . . . . . . . . . 28

2.8 An illustration of the Task-capability for driver behaviour in psychology studies (Fuller, 2005). . . . . . . . . . . . . . . . . . 46

2.9 An illustration of the Curve Warning System (Gazill & Robe, 2003). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

2.10 An illustration of how a shift control system with navigation system can help a driver in road curves (Amemiya, 2004). . . . . 52

2.11 An illustration of how shift control system with navigation system can help a driver in road curves (Amemiya, 2004). . . . . . 53

3.1 An overview of the steps within the KDD process (Fayyad, Piatetsky-Shapiro & Smyth, 1996). . . . . . . . . . . . . . . . . 60

4.1 The framework of the proposed approach related to the research questions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

4.2 The overview of the process for the first research question. . . . 85

4.3 The traffic incident report. . . . . . . . . . . . . . . . . . . . . . 86

4.4 The overview of the processes taken to identify the relationships between the contributing factors. . . . . . . . . . . . . . . . . . 94

4.5 The lateral position results for experienced drivers (Abdourahmane, 2005). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

4.6 The lateral position results for inexperienced drivers (Abdourahmane, 2005). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

4.7 The overview of the process for the third research question. . . . 116

4.8 The outcome of the analysis processes. . . . . . . . . . . . . . . 121

5.1 The analysis process of the proposed approach relates to the research questions. . . . . . . . . . . . . . . . . . . . . . . . . . 127

5.2 The work space of the Enterprise miner. . . . . . . . . . . . . . 130

5.3 The flow of the analysis process. . . . . . . . . . . . . . . . . . . 130

5.4 An example of the parse settings. . . . . . . . . . . . . . . . . . 131

5.5 An example of the transformation tab. . . . . . . . . . . . . . . 132

5.6 An example of the clustering tab. . . . . . . . . . . . . . . . . . 133

5.7 The genetic algorithm configuration tab. . . . . . . . . . . . . . 138

5.8 The rule filter configuration tab. . . . . . . . . . . . . . . . . . . 138

5.9 The rule filter configuration tab. . . . . . . . . . . . . . . . . . . 143

5.10 The rule filter configuration tab. . . . . . . . . . . . . . . . . . . 145

5.11 The attribute evaluator configuration window. . . . . . . . . . . 146

6.1 The comparison of the factors identified from both curve and non-curve related crashes. . . . . . . . . . . . . . . . . . . . . . 151

A.1 An illustration of a simple curve. . . . . . . . . . . . . . . . . . 221

A.2 An illustration of a compound curve. . . . . . . . . . . . . . . . 222

A.3 An illustration of a reverse curve. . . . . . . . . . . . . . . . . . 222

A.4 An illustration of a spiral curve. . . . . . . . . . . . . . . . . . . 223

List of Tables

2.1 A summary of crash prediction models for horizontal curves. . . 33

2.2 A summary of the simulators based on the features. . . . . . . . 44

2.3 A summary of the crash prediction models for horizontal curves. 54

3.1 An information system example. . . . . . . . . . . . . . . . . . 67

3.2 An example format of a decision table. . . . . . . . . . . . . . . 70

4.1 The frequency count of each attribute in the data. . . . . . . . . 82

4.2 The frequency count of each attribute in the data (continued). . . 83

4.3 A summary of the text mining software programs. . . . . . . . . 92

4.4 A summary of the text mining software programs. . . . . . . . . 103

4.5 The list of six safety speeds and radius. . . . . . . . . . . . . . . 107

4.6 An example of a confusion matrix. . . . . . . . . . . . . . . . . 113

5.1 Tabulated contributing factors, age group, time of incident, age of vehicle, driving experience and outcome. . . . . . . . . . . . . 136

5.2 Test cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

6.1 The cost groups. . . . . . . . . . . . . . . . . . . . . . . . . . . 152

6.2 The top five strongest rules. . . . . . . . . . . . . . . . . . . . . 153

6.3 The strongest rules for lowest level. . . . . . . . . . . . . . . . . 155

6.4 The strongest rules for low level. . . . . . . . . . . . . . . . . . . 156

6.5 The strongest rules for medium level. . . . . . . . . . . . . . . . 157

6.6 The strongest rules for high level. . . . . . . . . . . . . . . . . . 158

6.7 The strongest rules for highest level. . . . . . . . . . . . . . . . 159

6.8 The strongest rules for simulation. . . . . . . . . . . . . . . . . . 160

6.9 Test cases results - Expected output. . . . . . . . . . . . . . . . 161

6.10 Test cases results - Actual output. . . . . . . . . . . . . . . . . 162

6.11 The statistical information from accuracy measurement. . . . . 163

6.12 The strongest rules generated based on the significant factors. . 165

6.13 The strongest rules generated based on the significant factors for the lowest severity level. . . . . . . . . . . . . . . . . . . . . 165

6.14 The strongest rules generated based on the significant factors for the low severity level. . . . . . . . . . . . . . . . . . . . . . . 166

6.15 The strongest rules generated based on the significant factors for the medium severity level. . . . . . . . . . . . . . . . . . . . 166

6.16 The strongest rules generated based on the significant factors for the high severity level. . . . . . . . . . . . . . . . . . . . . . 166

6.17 The strongest rules generated based on the significant factors for the highest severity level. . . . . . . . . . . . . . . . . . . . . 167

7.1 A summary of the research questions and answers. . . . . . . . . 196

B.1 The sub categories and labels for timeGrp. . . . . . . . . . . . 232

B.2 The sub categories and labels for the age group. . . . . . . . . . 233

B.3 The sub categories and labels for the age of the vehicle. . . . . 234

List of Abbreviations

Abbreviation/Symbol Definition

ABS Anti-lock Braking System

ACC Adaptive Cruise Control

ADAS Advanced Driving Assistance Systems

AFS Adaptive Front-lighting System

AHSRA Advanced Cruise-Assist Highway System Research Association

API Application Programming Interface

ARC Australian Research Council Linkage

ASV Advanced Safety Vehicle

ATSB The Australian Transport Safety Bureau

CARRS-Q Centre for Accident Research and Road Safety - Queensland

CASR Centre for Automotive Safety Research

CSW Curve Speed Warning

EBA Emergency Braking Assistance

EBD Electronic Brake-force Distribution

ESC Electronic Stability Control

GPS Global Positioning System

IAG Insurance Australia Group Limited

IIHS Insurance Institute for Highway Safety

ITS Intelligent Transport Systems

KDD Knowledge Discovery in Databases

KDD Knowledge Discovery and Data mining

LDWS Lane Departure Warning System

MUARC Monash University Accident Research Centre

OECD Organisation for Economic Co-operation and Development

PMD Post-Mounted Delineators

QT Queensland Transport

RHT Risk Homoeostasis Theory

ROSE Rough Set Data Explorer

RSES Rough Set Exploration System

SAS Name of data mining tool

SAWUR Situation-Awareness With Ubiquitous data mining for Road safety

TSIS Traffic Software Integrated System

UDM Ubiquitous Data Mining

Glossary

Advanced Driving Assistance Systems (ADAS) are intelligent vehicle systems that provide driving assistance, including lane-keeping, collision avoidance and pedestrian detection, parking assistance, manoeuvring of vehicle platoons and active suspension control (Sharke, 2004).

‘Afternoon lull’ is the time of day when a driver’s biological clock makes them sleepy.

Algorithm resembles a recipe with a finite set of well-defined instructions to achieve some task; it is given an initial state and terminates in a desired end-state.

Classification is a supervised learning method that finds a set of models which describe and distinguish data, so as to predict the classes of objects with unknown labels.

Clustering consists of grouping a data set into subsets (clusters) whose members have similar properties.

Crash cost is defined as the total damage cost of vehicles and any other damaged objects.

Crash risk is the statistical probability of a crash.

Crash type refers to the type of crash, such as rear-end, roll-over and run-off-road crashes.

Contributing factors are the factors that are involved in the causal chain of events that lead to a crash occurring.

Data mining, also known as knowledge-discovery in databases (KDD), is a process that extracts knowledge by analysing data to discover hidden patterns and dependencies in the database.

Global Positioning System (GPS) is a system that uses satellites to determine one’s precise location and a highly accurate time reference anywhere on Earth (Bishop, 2005).

Intelligent Transport System (ITS) is an application of modern computer and communication technologies to transport infrastructure and vehicles (ATSB, 2004).

Situation refers to the state of affairs of an entity.

Situation-awareness refers to knowing what is going on and understanding the possessed knowledge to achieve a certain goal (French & Hutchinson, 2003). The goal is important as it defines the scope of information to focus on.

Ubiquitous data mining (UDM) is the process of analysing data coming from distributed and heterogeneous sources with mobile and/or embedded devices.


List of Publications and Presentations

Conference Papers

1. Chen, Samantha and Rakotonirainy, Andry and Loke, Seng Wai and

Krishnaswamy, Shonali (2007). A crash risk assessment model for road

curves. In: 20th International Technical Conference on the Enhanced

Safety of Vehicles, 18-21 June 2007, Lyon, France.

2. Chen, Samantha and Rakotonirainy, Andry and Sheehan, Mary and Kr-

ishnaswamy, Shonali and Loke, Seng Wai (2006). Assessing Crash Risks

on Curves. In: Australian Road Safety Research, Policing and Education

Conference, 25th - 27th October 2006, Gold Coast, Queensland.

3. Chen, Samantha and Rakotonirainy, Andry and Sheehan, Mary and Kr-

ishnaswamy, Shonali and Loke, Seng Wai (2009). Applying Data Mining

to Assess Crash Risk on Curves. In: Australian Road Safety Research,

Policing and Education Conference, 10th - 12th November 2009, Sydney.


Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet

requirements for an award at this or any other higher education institution. To

the best of my knowledge and belief, the thesis contains no material previously

published or written by another person except where due reference is made.

Signature: ..............................

Date: ...............................


Acknowledgements

Firstly, I would like to thank my supervision team. This included Dr. Andry
Rakotonirainy, Associate Professor at CARRS-Q, Queensland University of
Technology, who was my principal supervisor and who showed such patience.
In addition, he supported the research with important advice and helpful
suggestions for improvement and gave constant encouragement throughout the
course of the research.

I would also like to express my appreciation to my first co-supervisor,
Professor Mary Sheehan, also from Queensland University of Technology,
who listened to my issues during the course of the research and provided
appropriate support and encouragement.

The second co-supervisor was Associate Professor Shonali Krishnaswamy
from the School of Network Computing, Monash University, to whom I am
grateful for her criticism of the design and implementation as well as her
suggestions for improvements.

The third co-supervisor was Dr. Seng Wai Loke, Associate Professor at

the Department of Computer Science and Computer Engineering at La Trobe

University. I would like to thank him for his constructive suggestions and

comments for my research.

I would like to express my gratitude to the Australian Research Council

(ARC) and the industry partner, IAG and CARRS-Q that provided me with

the opportunity and funding to conduct this research. I am grateful to the

partners from IAG who contributed their help in the research. I would like

to thank Philip Woods for his patience in guiding and explaining the design

of the data and database to me so that I could have a better understanding


of what data I could use for analysis. I would also like to thank Max Perry,
who co-operated really well and communicated with me to extract the required
data from the database at the later stage of the research.

Another group of people I would like to thank is the ITS team who helped

me in numerous ways. I would like to thank the following team mates for their

help and endurance of my occasional crazy ways of relieving stress: Dr. Justin
Lee, who helped me with the LaTeX errors, provided pointers on ways to write
the thesis and had the patience to proofread part of the thesis; Gregoire Larue,
who pointed out presentation errors in the equations in the thesis; and, last but
not least, Husnain Malik, for the amusing debates, analytical moments and
the laughter he gave me.

I also appreciate the help of Zahia Louguar, an internship student from
ENTPE-National School of Urban and Planning Design, France, who helped
me with part of the implementation of the road curve simulator. The simulator
would not have happened without the road-engineering knowledge she shared
with me.

I would also like to thank Jane Todd, Ng Meili, Katherine Teo, and Har-

minder Bhar for editing and proofreading my thesis.

Finally, special thanks to my family, who constantly give me support, espe-

cially to my grandfather who strongly supported me to begin my PhD journey,

but was not able to see the end of the journey. I also really appreciate the en-

couragement and support my best friend Lyanne Tan gave me. She has always

been there for me whenever I needed her. I am also very grateful to Samuel

Wong for his patience and tolerance with me especially during the period of

time when I was writing up the thesis. Also not forgetting a friend, Felix Tan,

for his help and guidance ever since I arrived in Brisbane and started my PhD.

Last but not least, to everyone who has helped in any way and given me moral

support and encouragement throughout my PhD journey.

Thank you all.


CHAPTER 1

Introduction

Chapter Overview

Road crashes occur every day in Australia and around the world. Statistics
show that over 3,000 people are killed in car crashes every day and over 40,000

people killed each year throughout the world (OECD, 1997). In Australia in

2007, approximately 8 deaths per 100,000 population were due to car crashes

(Australia-Govt, 2008). Road crashes cost Australia $15 billion per year (BTE,

2000) and New South Wales experiences the highest cost, followed by Victoria

and Queensland.

1.1 The problem

Road curves are an important feature of road infrastructure. However, the

consequences of crashes that occur on road curves are often more severe than

on straight roads.

In Queensland, approximately 30% of crashes occur on road curves (Shields,

Morris, Jo & Fildes, 2001) and 34% of the crashes are fatal or require
hospitalisation (QT, 2005). Reports from Queensland Transport show that the
fatality rate on road curves is 2.5 times higher than that on straight roads
(QT, 2005). Figure 1.1 illustrates the number of crashes on road curves over a
period of 10 years. The distribution indicates that the crash rate on road
curves is steadily increasing.


Figure 1.1: The number of crashes on road curves over a 10-year period.

Several researchers have examined this issue comprehensively and a range

of interventions have been suggested to improve safety on road curves based on

the major contributing factors for crashes such as road, the environment, driver

and vehicle. Examples of the interventions are road signs, warnings, and a host

of campaigns to educate drivers. Existing interventions determine the causes

of a crash by obtaining the three contributing factor groups from police road

crash statistics reports. Unfortunately, these reports do not generally provide

detailed descriptions of crashes such as the relationships of these contributing

factors.

As technology continues to advance, vehicles are equipped with the basic

technology to support the driving task. Hence, an increasing number of ve-

hicles are equipped with technologies known as Intelligent Transport Systems

(ITS). ITS refers to the application of computing, information and communi-

cations technologies in transportation. Intelligent Transport Systems are in-

creasingly being used in all modes of transport to improve safety, convenience

and productivity. With ITS, vehicles have the ability to obtain information on

their current location using navigation systems such as the Global Positioning
System (GPS), obtain real-time traffic information and receive notification of possible


collisions and speeding. The World Health Organisation stated that traffic ac-

cidents could be reduced to 40% of all injuries if all vehicles were equipped

with various ITS technologies (OECD, 2003).

Existing ITS applications for road curves are designed to reduce the occur-

rence of a crash. The applications are related to various contributing factors

and have specific functions related to each. Until recently, studies have been

carried out to determine the causes and crash rate when travelling on a road

curve using a wider range of data sources. These studies have not been com-

pleted and thus remain an area which requires further research.

1.2 Rationale for the research

The road authorities' reports present the list of the contributing factors using
statistics only. This creates a need for multidisciplinary research

that uses theories from traffic engineering, road safety and computer science,

with the aim to identify and understand possible new contributing factors of

crashes using a wider range of analysis techniques. In addition, the reports do

not list the relationships between contributing factors, so this is an opportunity

to determine those relationships and the related crash severity.

Many researchers have examined the causes or contributing factors of road

crashes. In Australia, one such centre is the Centre for Accident Research and

Road Safety-Queensland (CARRS-Q). The aim of the centre is to identify and

initiate research, education, advocacy and services leading to the reduction of

injuries and fatalities on the road.

The research areas investigated in CARRS-Q are high risk and illegal driver

behaviour, vulnerable road users, school and community road safety, work

related road safety and human behaviour and technology (CARRS-Q, 2008).

This thesis concentrates on the human behaviour and technology component

which investigates how technology can assist in reducing the number of crashes


on roads in a variety of situations.

While CARRS-Q has a range of research projects examining road curves,
this thesis is the only one to focus on understanding the relationships between
contributing factors and crash severity in relation to road curves.

1.3 Research aims

The aim of this research is to understand the major contributing factors of

crashes on road curves and the relationship between these factors, which can

be used to determine the crash severity associated with particular road curves.

In this research context, crash severity is measured using cost which represents

the damage cost of vehicles and any other damaged objects. The cost does not

include injuries sustained by the driver. Hence, this is a study to understand

and identify the contributing factors to crashes on road curves as well as the

effect of various combinations of these factors on crash severity.

This research uses insurance claim records to identify contributing factors

for crashes that have occurred on road curves. Data mining techniques are

employed as they have the ability to identify patterns and relationships.
Results are more meaningful when a list of the combinations of, or relationships
between, the contributing factors is determined. This is achieved with the
rough set analysis approach, which provides the minimal subset of contributing
factors and their combinations. The minimal subset of contributing factors
is useful for further application in vehicles, as on-board devices have limited
memory and resources. Thus, data or inputs are kept to a minimum.
The combinations of the contributing factors are used to understand
the crash severity.


1.4 Research questions

Road authorities, such as Queensland Transport, use crash data to identify

the contributing factors for crashes on the roads. Contributing factors are the

possible causes that lead to an occurrence of a crash. Since various studies

have identified the contributing factors of crashes on road curves, this leads us

to the first research question.

• What are the factors discovered from the crash descriptions that cause

crashes on road curves?

This question leads to the investigation of the contributing factors for

crashes on road curves using insurance crash records. This will help in

determining any new contributing factors that can be identified when

more data sources are analysed.

Contributing factors are often reported as individual factors, for example
hitting a pedestrian, hitting an object, rear-end collisions and vehicle
overturning. This leads to the second question:

• What are the characteristics that influence the severity of a crash?

This question investigates the characteristics which are made up of the

combination of contributing factors for the crashes. This is followed

by the process of understanding how the characteristics influence the

severity of the crash based on the cost incurred.

Depending on the number of contributing factors used for analysis, this list

can be lengthy. There is a need to identify a minimal number of significant

factors to represent the data and combinations. Subsequently, this leads to

the third question:

• Which significant factors can increase the severity of a crash?

This last question investigates the important contributing factors that

influence the severity levels of crashes.


This research aims to answer these questions using data mining techniques.
A traffic simulator is defined and used to verify the results obtained from
the data mining process; this is discussed in later sections. The concept of
data mining is introduced in the next section.

1.5 Research approach

In order to understand the causes of a crash, it is necessary to identify and
understand the contributing factors. The standard approach used by researchers
to find possible contributing factors of crashes is via statistics from police
reports. Besides using these reports, this research proposes analysing
crash insurance records to determine the contributing factors. These insurance
records are used because they contain a field that describes the crash, which
can provide more information about the crash. This type of information in
road authorities' crash databases is usually not available to the public. Thus,
having this additional information about the crash can help to further
understand the series of events that led to the crash. The analysis technique
applied to the crash records is data mining.

Data mining is a relatively new term. Companies have been using powerful

computers and database software to analyse customers’ purchase patterns or

behaviour for many decades. Data mining is also known as data or knowledge

discovery, and is a process which analyses large volumes of data from different

points of view to find hidden correlations, patterns, trends and dependencies.

Consequently, predictive and descriptive models are created and used to sup-

port decision making. In the process of analysing the input, data is converted

to information and then knowledge. Data can be in the form of facts, numbers

or text generated from a computer. After the data is processed, information


is obtained and the information consists of patterns or associations within

the data. Using the information to understand the patterns is the process of

gaining key knowledge.

As the crash description is in natural language, text mining is selected to
analyse the data. Text mining, also known as textual data mining, is a
variation of data mining. The analysis of natural language text is considered
a difficult problem for artificial intelligence, because it is a complex task
to train a learning model on the various meanings of words and sentences.
A word can have different meanings in different contexts, and not all models
are trained to differentiate between them. Therefore, analysing natural
language text has been an issue. One solution is to use information retrieval,
a technique which has the same goal as text mining. However, this does not
meet users’ needs. Text mining is also used in areas such as the analysis of
customers’ reviews and preferences for improvement. Due to the free-text
format of the data, it is difficult to recommend solutions and few software
systems can comprehend the textual data. Thus, the use of text mining is
recommended.

The text mining tool used is a text miner module within SAS which is a

software system that can be used to perform data mining (SAS, 2006). The

choice of SAS is based on its ability to transform textual data into a useful

format to facilitate the classification and clustering of the data collected.

The clustering algorithm that will be used in SAS is the Ward algorithm.
Essentially, there are two methods of clustering: hierarchical algorithms and
partitioning algorithms. Hierarchical algorithms create clusters with similar
characteristics. The Ward algorithm is a hierarchical algorithm of the
agglomerative, or bottom-up, type. Agglomerative algorithms use the distance
between clusters as the basis for clustering. The text mining process using
the Ward algorithm

creates clusters which consist of keywords from the text description and these


keywords contain words that represent the contributing factors and the out-

come.
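The agglomerative merging described above can be illustrated with a minimal Python sketch. It is not the SAS implementation: the function name ward_cluster, the four term-count vectors and the vocabulary are hypothetical and chosen only for illustration. Every record starts as its own cluster, and at each step the two clusters whose merge gives the smallest increase in within-cluster sum of squares (Ward's criterion) are joined.

```python
from itertools import combinations

def ward_cluster(points, k):
    """Agglomerative (bottom-up) clustering with Ward's merge criterion."""
    # Start with every point in its own cluster: (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Ward's criterion: increase in within-cluster sum of squares.
        (ma, ca), (mb, cb) = a, b
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist

    while len(clusters) > k:
        # Pick the pair whose merge increases the variance the least.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        (ma, ca), (mb, cb) = clusters[i], clusters[j]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (ma + mb, centroid)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(members) for members, _ in clusters)

# Hypothetical term-count vectors over the vocabulary (wet, speed, tree, brake).
docs = [
    [1, 1, 0, 0],  # "wet road, speeding"
    [1, 1, 1, 0],  # "speeding on wet road, hit tree"
    [0, 0, 0, 1],  # "braked late"
    [0, 0, 1, 1],  # "braked, hit tree"
]
groups = ward_cluster(docs, 2)
```

With these illustrative vectors the two descriptions that mention wet roads and speed fall into one cluster and the two that mention braking fall into the other.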

Text mining is performed on the textual data, which in this context is the
crash description. Prior to analysis, the records need to be cleaned to remove
errors and missing fields, ensuring that the data are valid and contain no
empty fields. The text mining process will produce a list of keywords that are related

to the crash. Next, the list of keywords is tabulated into a table suitable for

the rough set analysis.
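The tabulation step can be sketched as follows. This is only an illustration in Python, not the SAS Text Miner process: the stop-word list, the factor vocabulary, the three crash descriptions and the severity bands are all hypothetical. Each description becomes one row with a 0/1 flag per factor and a decision attribute, which is the tabular form that rough set analysis works on.

```python
import re

# Hypothetical stop-word list and factor vocabulary, chosen only for illustration.
STOP_WORDS = {"the", "a", "on", "and", "was", "into", "at", "of", "in", "to"}
FACTORS = ["wet", "speed", "night", "tree"]

def keywords(description):
    """Lower-case the text, split it into words and drop stop words."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def decision_row(description, cost_band):
    """One row of the table: a 0/1 flag per factor plus the decision attribute."""
    found = set(keywords(description))
    row = {factor: int(factor in found) for factor in FACTORS}
    row["severity"] = cost_band
    return row

# Hypothetical crash descriptions with an assumed severity band for each.
claims = [
    ("Lost control on the wet curve at night and hit a tree", "high"),
    ("Travelling at speed into the bend, hit a tree", "high"),
    ("Skidded on wet road", "low"),
]
table = [decision_row(text, band) for text, band in claims]
```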

Rough set analysis, introduced by Pawlak (1995), is an approach to classifying

incomplete, inaccurate or doubtful information. This is a process that deter-

mines the relationships between the factors involved. The rough set analysis

provides a list of the combinations or relationships between the contributing

factors. This list is then verified using a simulator created using Matlab. The

verification process is conducted to determine the validity of the results ob-

tained from the rough set analysis.
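The way rough sets handle inconsistent information can be sketched in Python. The example below is a minimal illustration under stated assumptions, not the analysis performed in this research: the function name approximations, the attribute names and the five records are hypothetical. Records that share the same factor values are indiscernible; the lower approximation holds records that certainly belong to a severity class, and the upper approximation holds records that possibly belong to it.

```python
from collections import defaultdict

def approximations(rows, attributes, decision, target):
    """Lower and upper approximation of the rows whose decision equals target."""
    # Group rows that are indiscernible on the chosen condition attributes.
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        classes[tuple(row[a] for a in attributes)].append(idx)

    lower, upper = set(), set()
    for members in classes.values():
        hits = [i for i in members if rows[i][decision] == target]
        if len(hits) == len(members):
            lower.update(members)   # certainly in the target concept
        if hits:
            upper.update(members)   # possibly in the target concept
    return lower, upper

# Hypothetical decision table; rows 2 and 3 share factors but differ in severity.
rows = [
    {"wet": 1, "speed": 1, "severity": "high"},
    {"wet": 1, "speed": 1, "severity": "high"},
    {"wet": 1, "speed": 0, "severity": "high"},
    {"wet": 1, "speed": 0, "severity": "low"},
    {"wet": 0, "speed": 0, "severity": "low"},
]
lower, upper = approximations(rows, ["wet", "speed"], "severity", "high")
```

In this illustration the records in the upper approximation but not in the lower one form the boundary region, where the factors alone cannot decide the severity.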

The software tool used is ROSETTA which is a toolkit for analysing tabular

data with rough set theory in a Graphical User Interface (GUI) environment.

In addition, ROSETTA provides an extensive library of rough set algorithms.

Examples of the algorithms available in ROSETTA are: Genetic algorithm,
Johnson’s algorithm, Holte’s Reducer, Dynamic Reducts (RSES), Exhaustive
calculation (RSES) and Genetic Algorithm (RSES). Each algorithm returns a list

of various combinations of contributing factors.

The selected rough set algorithm to be used for analysing and determining

the combinations or relationships between contributing factors is the genetic

algorithm. As defined by Vinterbo and Ohrn (2000), the genetic algorithm is based
on supervised learning where the model is trained with a set of data and is
later fine-tuned with correct data. A list of the relationships between the
contributing factors is obtained after rough set analysis with the genetic

algorithm. The list produced is further analysed to determine the significant


contributing factors.

This is followed by verification of the relationships between the contributing

factors based on the outcome. A simulator designed for road curves with

Matlab is used for the verification of the list of relationships between the

contributing factors obtained from rough set analysis. In order to simulate

a real situation, the simulator is designed based on a stochastic model which

uses and produces random results. The contributing factors are represented

as numerical value variables in the simulator and these numerical variables

are designed to be adjustable so that the effects and outcomes of different

combinations of the contributing factors can be observed.
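The idea of a stochastic model with adjustable numerical factors can be sketched in Python. This is a minimal illustration and not the Matlab simulator developed in this research: the baseline probability, the multipliers, the factor names and the function names are hypothetical values chosen only to show the mechanism. Each simulated trip draws a random number, and the active factors raise the probability that the trip ends in a crash.

```python
import random

# Hypothetical baseline probability and multipliers; not values from the thesis.
BASE_CRASH_PROBABILITY = 0.02
MULTIPLIERS = {"wet_road": 2.0, "speeding": 3.0, "night": 1.5}

def crash_probability(active_factors):
    """Combine the active factors into a single per-trip crash probability."""
    p = BASE_CRASH_PROBABILITY
    for factor in active_factors:
        p *= MULTIPLIERS[factor]
    return min(p, 1.0)

def simulate(active_factors, trips=20000, seed=1):
    """Return the observed crash rate over a number of random trips."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    p = crash_probability(active_factors)
    crashes = sum(rng.random() < p for _ in range(trips))
    return crashes / trips

baseline = simulate([])
combined = simulate(["wet_road", "speeding"])
```

Changing the entries of MULTIPLIERS, or the list of active factors passed to simulate, plays the role of adjusting the numerical variables and observing the outcome of a different combination.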

Once the verification process is complete, the next step is to identify
significant contributing factors using a search algorithm. The algorithm returns
a list of the best factors found during the search. The significant factors
are the minimal set of contributing factors that influence the
crash severity. The data can be represented with this set of significant factors.

In order to understand the contributing factors and their effect on the crash

severity, the significant factors are used to determine the relationships between

the factors. The relationships can indicate various combinations of the factors

and the possible outcome related to the crash severity.

Figure 1.2 provides an overview of the overall proposed approach.

Extracting useful information from text requires complex algorithms and
lengthy manipulation of the data. Given the complexity of the algorithms,
the large amount of data and the absence of similar research results against
which our approach and results can be validated, a process of designing, trying
and combining novel approaches is required to demonstrate that our results are accurate.

1.6 Contributions

The contributions of this thesis are:


Figure 1.2: An overview of the proposed framework in relation to the research

questions.

• Using data mining techniques to identify contributing factors to crashes

on road curves.

Due to the format of the description, the specific data mining technique

used to identify the contributing factors is text mining. The use of the text
mining technique will expand the approaches available to identify contributing
factors of crashes on road curves. Key outcomes of the research are identified

below.

• Identify the relationship between the contributing factors

This research will help identify the relationship between the contributing

factors, which will indicate which contributing factors are closely related

and how they increase crash severity.

• Identify significant contributing factors

Identifying significant contributing factors can be useful as an indication
of which factors have a higher influence on the severity of a
crash and for representing the data with a minimal number of factors.


• Design a traffic simulator for road curves

A traffic simulator is designed and implemented to simulate crashes on
road curves with contributing factors related to the road, the environment,
vehicle and driver.

• Validate rules with traffic simulator

Validating rules with a traffic simulator differs from
the usual 10-fold cross-validation technique. The traffic simulator is used
because the rules relate to the domain of road safety.

• Identify the relationships between contributing factors that influence

crash severity

This provides an understanding of the associations between a minimal number
of contributing factors and of the effects of the various relationships between
contributing factors on crash severity.
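For contrast with the simulator-based validation, the 10-fold cross-validation technique mentioned in the list above can be sketched in Python. This is a generic illustration; the function name k_fold_splits and the record count are hypothetical and no such code is part of this research. The records are divided into k folds, and each fold in turn is held out for testing while the remaining folds are used for training.

```python
def k_fold_splits(n_records, k=10):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    indices = list(range(n_records))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Hypothetical data set of 25 records split into 10 folds.
splits = list(k_fold_splits(25, k=10))
```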

A detailed discussion is presented in the subsequent chapters in this thesis

with the next section briefly outlining the structure of this research.

1.7 Thesis outline

Chapter 2 - Literature review: This chapter evaluates the background

studies which include a review of the contributing factors to crashes and the

existing interventions to reduce the number of crashes on road curves. The

contributing factors are categorised into driver, vehicle, road and the environ-

ment. It goes on to explain how each factor contributes to crashes on road

curves. In addition, the existing countermeasures for the identified problems

are reviewed. The existing countermeasures are mostly individual as little or


no research has combined all the solutions to tackle the problem. Hence, a

combination approach is proposed.

Chapter 3 - Data mining: This chapter presents the background theory of

data mining techniques.

Chapter 4 - Design of approach: This chapter explains the design

of the proposed method which employs data mining techniques to perform

analysis on the data collected. This explanation includes the techniques used

and the justification for utilising them.

Chapter 5 - Implementation of approach: This chapter explains the

process of implementing the approach.

Chapter 6 - Results: This chapter presents the results obtained from

data analysis with data mining techniques.

Chapter 7 - Analysis and Discussion: This chapter presents observations
on and discussion of the results.

Chapter 8 - Conclusion and future work: This chapter concludes
the thesis with a discussion of the findings and possible future research.

Appendix A - Additional information of the horizontal curves and ex-

isting road engineering interventions.

Appendix B - The tables of the classification and labels used during the

data transformation process in the Design of approach chapter.


CHAPTER 2

Literature Review

Chapter Overview

Road curves are essential features in road design and consist of horizontal

and vertical road curves. This research will focus on horizontal road curves

as part of its study. Existing studies have been carried out to determine the

contributing factors of crashes related to road curves and ways to reduce the

number of crashes and related crash severity. These studies have categorised

the contributing factors into three main categories: road and environment,

driver-related and vehicle-related factors.

This chapter will review the existing crash rate assessment models for
horizontal road curves. Most existing assessment models consist of horizontal curve

prediction models, application of data mining techniques, use of traffic sim-

ulators, psychology based driver behaviour models, and intelligent transport

systems.

Two horizontal curve prediction models exist; however, they are limited
to highway road curves. Existing contributing factors have been reported
based on collected statistics; hence, data mining can be used to determine the
significance of the identified contributing factors and to find other possible
factors from past claim reports that describe the crashes.

Traffic simulators are becoming an increasingly powerful tool in analysing

traffic and transportation systems. They are generally used to simulate and

monitor the traffic volume and crashes. However, there is no evidence of a


simulator that could integrate all three categories of contributing factors. Ad-

ditionally, no simulator is able to imitate a crash on a road curve without the

need to set up the variables in the simulator initially. Therefore, a brief expla-

nation for the need to define a traffic simulator is presented in the remainder

of this chapter.

Intelligent Transport Systems (ITS) are implemented on board vehicles to
aid drivers and correct errors made by the driver. However, each road curve related
ITS application is designed to solve a specific problem. It has been found

that the number of models available to assess crash rates on horizontal road

curves is limited. Existing crash rate assessment models do not assess crash

rates based on all the contributing factor groups.

The rest of this chapter will cover crash severity, the contributing factors,

interventions and issues with existing approaches.

2.1 Crashes on road curves and the causes

Road curves are an important feature of our road infrastructure. In Australia,

30% of crashes occur on road curves (Shields, Morris, Jo & Fildes, 2001). In

Queensland, road curve-related crashes contribute to 63.44% of fatalities and

25.17% require hospitalisation (QT, 2006).

The severity of a crash is largely dependent on the contributing factors

prior to the crash. Crash statistics indicate that 73% of fatal crashes that

occur on road curves involve speeding. Speeding can lead to loss of control of

the vehicle followed by run-off road or roll-over crashes. Based on the database

at Queensland Transport, run-off road crashes contribute towards 53% of all

crashes on road curves (QT, 2006). The severity of crashes on road curves motivates this study, which aims to understand the causes of crashes and reduce the crash severity on road curves.

The identification of the causes of crashes can prevent or reduce the recur-

rence of the crashes on road curves which will in turn lead to a reduction in

the crash severity on road curves. Thus, the first step is to understand the

composition of the causes of crashes on road curves and this is discussed in

the following section.

2.1.1 Crash causal chain

This section explains the causal chain of events and the analytical approach

used to determine the causes of crashes. The causal chain of events encompasses pre-crash, crash, and post-crash activities and factors (Rechnitzer,

2000). The factors and activities involved in the chain are analysed with causal

factor analysis. Crash causal factor analysis is used to understand the devel-

opment of a crash by collecting and placing the information in a logical and

chronological sequence for easier examination. This sequence allows for the vi-

sualisation of the multiple causes and relationship between direct and indirect

causes.

Direct causes are the contributing factors that primarily cause the occur-

rence of a crash (Palumbo & Rees, 2001). For example, the explosion of a

pressurised vessel is the immediate cause that leads to a crash. Contributing

factors may be events or conditions that increase the probability of the crash

occurring (Palumbo & Rees, 2001). A wet road surface is an example of a con-

tributing factor. Events can be defined as occurrences that happen in order to

complete a task with each event arranged in a chronological order. Conditions

can be defined as the state or situation of the crash. They are usually the inactive elements that increase the probability of a crash occurring, such as a wet road in this case.

Indirect causes can be events or conditions that are not sufficient on their own to cause a crash; instead, they trigger the direct causes and lead the crash to occur. Indirect causes can also be unsafe acts or conditions (Palumbo

& Rees, 2001). Using defective equipment such as tyres with bad friction is

an example of an unsafe act. Examples of unsafe conditions are poor lighting

and distractions.

Crash causal factor analysis can be a time-consuming task as it requires investigators to return to the crash site and collect information. The investigator must be knowledgeable about the analysis process in order to be efficient. The investigation consists of interviewing the driver involved in the crash and involves a minimum of two interviews. Once data are collected, they are analysed and unwanted data are eliminated. Some data may be missing or insufficient for in-depth analysis; thus, the investigator has to return to the site of the crash and interview the people involved again. This process continues until the events and causes of the crash are arranged in a step-by-step sequence. A similar approach is a rule-based approach, in which data mining is used to identify the combinations of contributing factors. The concept of data mining is discussed in detail in the next chapter.

2.1.2 The causes of crashes

Road authorities investigating the contributing factors for road crashes have

published statistical reports and implemented interventions on road curves.

Generally, crashes occur due to factors related to the vehicle, surrounding

environment and the driver. As seen from Figure 2.1 on page 17, human

factors are believed to be the major contributing factors for crashes. CTRE

(2005) states that human factors contribute to 96% of crashes, and Shinar (2007) concurs, stating that 90% of crashes are due to driver error. Figure 2.1

illustrates the composition of the contributing factors in a single crash.

2.1.2.1 Road and environmental factors

Figure 2.1: The three major contributing factors of road crashes (Shinar, 2007).

In this research, road design, including road curves, is considered a road-related factor. A road curve is defined as a change in alignment or change of direction between two straight lines (Hanger, 2003). The change of direction

is too abrupt when two straight lines intersect, thus a curve is required to

interpose between the straight lines as a safety measure. Road curves are

normally circular curves, similar to circular arcs. Two major categories of

curves exist, namely horizontal curves and vertical curves. The layout of each

curve depends on the geographical landscape and surrounding buildings to

provide a safer driving road. The scope of this study focuses on horizontal

curves and is discussed in the next section.

Degree of curves The geometric design of horizontal curves can affect

the probability of a crash occurring. The degree of curve is the central angle subtended by an arc measuring 30.48 metres (100 feet), as illustrated in

Figure 2.2 on Page 18. The degree determines the sharpness or flatness of

a curve. The curvature is represented with the radius of a curve or degree of

a curve. The smaller the radius, the sharper the curve will be (CTRE, 2005;

DOT, 2006). Sharper curves have higher potential crash occurrence as the

level of difficulty to negotiate along the curve increases. In addition, drivers

have less time to correct the drift off due to poor visibility of the upcoming

curve and the sharpness of the curve (Morena, 2003). Normally, design speed

and warning signs are imposed on the roadside to warn drivers. Unfortunately,

drivers tend to ignore them or are not aware of the warnings. As a result, drivers may be involved in curve-related crashes such as run-off road, head-on

collision, overturning or hitting other objects.

Figure 2.2: An illustration of the degree of curve (Highway, 2004).

Lane width The width of a lane can affect how drivers position their

vehicles on the road. A narrow road causes drivers to cross the centreline to

stay on the road. This can lead to head-on crashes with vehicles from the

opposite direction. Vehicles travelling on road curves tend to occupy more

road space than on straight roads.

Surface and side friction Road surface is a characteristic that has an

effect on vehicle safety. As the surface wears, the friction decreases and this

will affect the braking distance. A vehicle needs a longer braking distance

on low friction surfaces and this distance increases when the surface is wet.

Braking distance can be defined as a distance a vehicle will travel to reach a

complete stop.
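As a rough illustration of why a low-friction surface lengthens the braking distance, the sketch below uses the standard constant-deceleration approximation d = v²/(2μg). The formula, the function name and the friction values are assumptions introduced for illustration; they are not taken from the sources cited in this chapter.

```python
G = 9.81  # gravitational acceleration in m/s^2


def braking_distance(speed_kmh: float, friction: float) -> float:
    """Distance in metres needed to stop from speed_kmh.

    Assumes constant deceleration equal to friction * G and ignores
    driver reaction time.
    """
    if friction <= 0:
        raise ValueError("friction coefficient must be positive")
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return speed_ms ** 2 / (2 * friction * G)


# Illustrative (assumed) friction coefficients for a dry and a wet surface.
dry = braking_distance(80, friction=0.7)
wet = braking_distance(80, friction=0.4)
print(f"dry: {dry:.1f} m, wet: {wet:.1f} m")
```

Under these assumed values the wet-surface distance is the longer of the two, which matches the qualitative statement above.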

Length of curve The length of a curve is the distance, d, shown in

Figure 2.3 on Page 19, between the start and the end of the arc. A short

length causes sudden change and affects the sight distance of drivers.

Figure 2.3: An illustration of the length of a curve (Highway, 2004).

Sight distance Sight distance can be defined as the length of roadway

ahead that is visible to a driver at a specific height. Figure 2.4 on Page

19 illustrates sight distance in a horizontal curve. Short sight distance is

dangerous as it can lead to slow brake reaction time. Brake reaction time is

the time between detection of an object on the road ahead and the application

of brakes (AECPortico, 2005). Roadside objects such as poles or trees can affect sight distance and judgement, which increases injury severity.

Figure 2.4: An illustration of the sight distance in horizontal curve (AECPor-

tico, 2005).

Super-elevation Super-elevation is an inclined roadway that uses the

weight of a vehicle to create the required centripetal force for curve negoti-

ation. The frictional force between the vehicle’s tyres and the road surface

offsets the centrifugal force to prevent the vehicle from sliding out of the curve

(Environment & Works Bureau, 1997). Sharper curves require a steeper super-

elevation in order for vehicles to travel safely along a curve at a higher speed.

The amount of super-elevation depends on the design speed, degree of curve

and the number of lanes on the road. Figure 2.5 on Page 20 illustrates a

cross-section view of the super elevation.

Figure 2.5: A cross-section of a super-elevated horizontal curve (Environment

& Works Bureau, 1997).

where

W is the weight of the vehicle.

R is the radius of the curve in metres.

v is the speed of the vehicle in m/s.

g is the gravitational acceleration in m/s².

F is the coefficient of sideways friction.

E is the super-elevation in m/m, which is equivalent to tan θ.

N is the force normal to the road surface.

The stability while manoeuvring in the curve is defined as in Equation 2.1:

\[ E + F = \frac{V^{2}}{127R} \tag{2.1} \]

where V is the speed of the vehicle in km/h.

The maximum super-elevation for rural roads ranges from 0.06 m/m in flat terrain to 0.10 m/m in mountainous terrain. Urban roads have desirable maximum values of 0.04 to 0.05 m/m.
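The relation E + F = V²/(127R) can be rearranged to give the smallest radius that a given speed, super-elevation and side friction allow. The sketch below assumes R in metres and V in km/h; the constant 127 is then approximately 3.6² × 9.81. The function names and the example side-friction value of 0.14 are assumptions for illustration only.

```python
def min_curve_radius(speed_kmh: float, superelevation: float, side_friction: float) -> float:
    """Smallest radius in metres satisfying E + F = V^2 / (127 R)."""
    total = superelevation + side_friction
    if total <= 0:
        raise ValueError("E + F must be positive")
    return speed_kmh ** 2 / (127 * total)


def required_side_friction(speed_kmh: float, radius_m: float, superelevation: float) -> float:
    """Side friction F demanded at speed_kmh on a curve of radius_m with super-elevation E."""
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    return speed_kmh ** 2 / (127 * radius_m) - superelevation


# Example: rural road in flat terrain (E = 0.06) with an assumed F of 0.14.
radius = min_curve_radius(80, superelevation=0.06, side_friction=0.14)
print(f"minimum radius at 80 km/h: {radius:.0f} m")
print(f"friction demand at 100 km/h: {required_side_friction(100, radius, 0.06):.2f}")
```

The second call shows how the friction demand grows when the same curve is driven faster than the speed it was sized for.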

Environmental-related factors are the next major contributing factor

for a road crash. Factors such as wet or slippery road surfaces, poor light-

ing, animals and traffic conditions contribute to crashes on road curves. For

example, roads with less friction resistance and debris can cause vehicles to

skid and lose control easily. The surrounding traffic conditions can also affect

a driver’s decision making and driving attitude. In addition, weather conditions are unpredictable and can affect visibility. For example, stormy and foggy

conditions can affect a driver’s vision of the road ahead. Therefore, warning

signs are needed to guide or warn drivers of hazards ahead. Providing drivers

with incorrect warning signs is another issue and some speed limit signs are

invalid as the limits are based on the speed criteria defined 50 years ago (Tor-

bic, Harwood, Gilmore, Pfefer, Neuman, Slack & Hardy, 2004). Hence, this

can be misleading to drivers trying to drive safely on curved roads.

2.1.2.2 Driver-related factors

Driver-related factors contribute to a high percentage of crashes as drivers

commonly make errors that lead to a crash. Human error occurs in situations

where a driver fails to achieve their desired outcome based on their planned

actions and has no correction plan for it (Reason, 2003). Human errors consist

of two kinds and are listed below (Reason, 2003):

1. Slips - when one’s actions do not proceed as planned.

2. Mistakes - where the plan has a problem in achieving the outcome.

While driving, the failure to achieve intended actions can lead to crashes.

An example of a common error is misjudgement. Distraction and fatigue are major factors in misjudgement, which occurs unintentionally and unexpectedly.

Driving long hours and stressful working conditions can contribute to driver

fatigue. In addition, drivers’ misjudgement occurs when they either over-

estimate or under-estimate the sharpness of a curve and make errors when

turning the wheel. Ignoring driving rules, for example by drink driving, speeding or not wearing a seat belt, can also increase the likelihood of mistakes and lead to serious injuries. The following paragraphs discuss these factors in detail, namely speeding, drink driving, driver’s age and fatigue.

Speeding Speeding is a major contributing factor, especially in crashes on road curves (Torbic, Harwood, Gilmore, Pfefer, Neuman, Slack & Hardy,

2004; Liu, Chen, Subramanian & Utter, 2005). Excessive speed reduces a

driver’s ability to react and correct mistakes within a short time. In addition,

speeding reduces the driver’s ability to negotiate a curve and increases the

likelihood of losing control of the vehicle.

Furthermore, the severity of injuries increases with higher speed limits.

However, many drivers consider it acceptable to drive 16 km/h over the speed limit (Corkle, Marti & Montebello, 2001). Most commonly, driver age, attitude, gender, impairments, ‘running late’ and law enforcement influence a

driver’s choice of speed.

Alcohol-related Alcohol consumption can influence the ability of a driver

to control a vehicle and perform tasks such as braking and steering the wheel.

Driving under the influence impairs a driver’s decision-making process; for example, the driver may be unable to make suitable judgements on speed and steering to adjust the vehicle according to the road curve (NHTSA, 2006). This increases

the driver’s exposure to a crash while negotiating a road curve. Thus, drink

driving is seen as one of the major causes of crashes on road curves (Keall &

Frith, 2004).

Age of driver Age is a contributing factor, and young drivers aged 17 to 24 have a higher chance of being involved in a crash on road curves (QT, 2005). This group of drivers is involved in fatal crashes

twice as often as drivers aged 25 to 59. Due to inexperience,

young drivers have less ability to recognise a hazard and together with poor

judgement and decision making they are unable to respond appropriately and

in turn are more likely to be involved in a crash.

Young drivers may also face strong peer influence, which can lead them to

drive recklessly and aggressively. Many young drivers accelerate on the road in

order to experience strong sensations and excitement (Machin & Sankey, 2006;

Machin & Sankey, 2008). This increases their chances of a crash, especially if

they speed on a road curve.

Fatigue Fatigue can be defined as tiredness, weariness or exhaustion

(ALTS, 2004) and is usually influenced by time spent on driving, monotony

and time of travel (Cliff & Horberry, 2007). Dozing off behind the wheel is an

extreme form of fatigue. Drivers experiencing fatigue will have slower reactions and a reduced ability to concentrate, and may take longer to interpret traffic situations. They will also have trouble keeping the vehicle within the lane, may drift off the road or vary their speed, and may not react in time to avoid hazardous situations such as road curves.

In summary, speed, alcohol consumption, age and fatigue are among the

highest-ranking factors that contribute to road crashes. Other human factors, including emotions and mental states such as depression, sadness, aggressiveness and stress, can also affect the decision making and attention of a driver (Fuller, 2005), thus making human factors a major contributor to road crashes.

In addition, driving someone else’s vehicle can also increase the chances of a

crash due to unfamiliarity with operating the vehicle regardless of the age of

the vehicle (Haworth & Pronk, 1997).

2.1.2.3 Vehicle-related factors

Vehicle-related factors are the third group of contributing factors. They concern vehicle defects or failures, which have been found to have minimal impact on crash rates. An example of a vehicle defect is worn-out, punctured or treadless tyres, which reduce the friction with road surfaces. Another defect is poor brake condition, which increases the braking distance and time needed for a vehicle to come to a halt. In addition, a vehicle with a faulty air bag that does not inflate during an emergency can add to the severity of a driver’s injury. Many older and cheaper vehicles have fewer primary and secondary safety features compared to the latest models (ATSB, 2004). This is evident in the highly sensitive air bags and intelligent technologies installed in modern vehicles, which are absent in older vehicles. Besides these defects, vehicle size and mass can also affect the stability and control of a vehicle. In

summary, vehicle-related factors which contribute to crash rates are poor brake

conditions, vehicle stability and tyre conditions. Further information on ex-

isting non-information system related interventions can be found in Appendix

A.

Besides identifying the factors contributing to crashes on road curves, a further goal is to reduce the number of crashes by employing interventions for hazards relating to road curves. Existing interventions and models for determining crash rates on road curves are reviewed in the following sections.

2.2 Existing crash prediction models

Road authorities have studied ways to reduce the crash rates on road curves

such as using prediction models, intelligent transport system applications and

traffic simulators. This section reviews the existing interventions deployed to

reduce the number of crashes on road curves. The definition and geometry of a horizontal road curve are explained before discussing the existing interventions.

2.2.1 Horizontal road curves

The purpose of a horizontal curve is to change the direction of a road to either the right or the left where two straight lines, known as tangents, meet at an intersection point. A sudden change of alignment is dangerous for road safety. Therefore, it is necessary to introduce a curve between the tangents to reduce the abrupt change of direction. Horizontal curves exist in four variations: (1) simple curve, (2) compound curve, (3) reverse curve and (4) spiral curve. These are covered in detail in Appendix A.

Road safety on road curves is influenced by road design factors such as the degree of curve, length of curve, lane width, surface and side friction, sight distance, and super-elevation. These design factors are discussed in the following part of this chapter.

2.2.1.1 The basic horizontal curve geometry

Firstly, the basic geometry of a road curve is discussed. Figure 2.6 on Page

25 presents the basic geometry of a horizontal road curve (CTRE, 2006).

Figure 2.6: The geometry of a horizontal curve (CTRE, 2006).

Where :

R is the radius of the curve (in metres) and represents the tightness of a curve. The standard definition is given in Equation 2.2 (CTRE, 2006):

\[ R = \frac{1746.4}{D} \tag{2.2} \]

where D is the degree of the curve.
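The constant 1746.4 follows from the arc definition of the degree of curve given earlier, in which an arc of 30.48 m (100 ft) subtends an angle of D degrees at the centre of the curve. The short derivation below is offered as a consistency check and is not quoted from CTRE (2006).

```latex
\[
30.48 = R \cdot \frac{\pi D}{180}
\quad\Longrightarrow\quad
R = \frac{180 \times 30.48}{\pi D} \approx \frac{1746.4}{D}\ \text{m}
\]
```

For example, a 5-degree curve has a radius of roughly 349 m.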

In Figure 2.6 on Page 25, PI stands for the Point of Intersection and is

the point at which the two tangents to the curve intersect.

I is the Delta Angle, written as ∆ in the equations that follow. This is the angle between the tangents and is also equal to the angle at the centre of the curve.

PC stands for the point of curvature and is the beginning point of the curve.

PT is the point of tangency which is the end point of the curve.

T is the tangent distance, which is the distance from the point PC to PI or vice versa. T can be represented as in Equation 2.3:

\[ T = R \tan\frac{\Delta}{2} \tag{2.3} \]

where ∆ is the Delta Angle.

E is the external distance which is the distance from the point PI to the

middle point of the curve, M. E can be obtained with Equation 2.4.

\[ E = R\left(\frac{1}{\cos\frac{\Delta}{2}} - 1\right) \tag{2.4} \]

M is the middle ordinate and is the distance from the middle point of the

curve to the middle of the chord that joins the points PC and PT. M can be

represented as in Equation 2.5.

\[ M = R\left(1 - \cos\frac{\Delta}{2}\right) \tag{2.5} \]

LC is the long chord which is the distance along the line that joins the

points PC and PT. The length of LC can be obtained with the Equation 2.6.

\[ LC = 2R\left(\sin\frac{\Delta}{2}\right) \tag{2.6} \]

L is the length of the curve and is the arc between the points PC and PT.

L can be obtained with the Equation 2.7.

\[ L = 100\left(\frac{\Delta}{D}\right) \tag{2.7} \]

where ∆ is the Delta Angle and D is the degree of the curve.

The back tangent is the straight line that connects the points PC and PI for a curve progressing to the right. The forward tangent is the straight line that connects the points PI and PT. These two lines are discussed further in the clothoide section.

Lastly, the Deflection Angle (DA) from tangent to chord is half the central angle of the subtended arc and is hence defined as in Equation 2.8:

\[ DA = \frac{\text{arc length}}{100} \times \frac{D}{2} \tag{2.8} \]

where D is the degree of the curve.
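The curve elements defined above can be collected into one small function. The sketch below assumes the forms T = R tan(∆/2), E = R(1/cos(∆/2) − 1), M = R(1 − cos(∆/2)) and LC = 2R sin(∆/2), and computes the arc length as L = R∆ with ∆ in radians, which is equivalent to L = 100∆/D when lengths are measured in feet. The function name and the example values are illustrative and do not come from the source.

```python
import math


def curve_elements(radius: float, delta_deg: float) -> dict:
    """Return the basic elements of a simple circular curve.

    radius    -- curve radius R (any length unit; results use the same unit)
    delta_deg -- Delta Angle between the tangents, in degrees (0 < delta < 180)
    """
    if radius <= 0 or not 0 < delta_deg < 180:
        raise ValueError("need radius > 0 and 0 < delta_deg < 180")
    half = math.radians(delta_deg) / 2
    return {
        "T": radius * math.tan(half),            # tangent distance, Eq. 2.3
        "E": radius * (1 / math.cos(half) - 1),  # external distance, Eq. 2.4
        "M": radius * (1 - math.cos(half)),      # middle ordinate, Eq. 2.5
        "LC": 2 * radius * math.sin(half),       # long chord, Eq. 2.6
        "L": radius * math.radians(delta_deg),   # arc length between PC and PT
    }


# Example with assumed values: R = 300 m and a Delta Angle of 60 degrees.
for name, value in curve_elements(300.0, 60.0).items():
    print(f"{name}: {value:.2f}")
```

Working in radians keeps the function independent of the 100 ft convention, so the same code applies whether the radius is given in metres or feet.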

2.2.1.2 The clothoide

A road curve is able to join with a straight line (back and forward tangents) smoothly due to the presence of a clothoide. A clothoide is a transition curve whose curvature changes gradually along its length, which enables the driver to steer the vehicle gradually along the curve. Thus,

a road curve consists of a straight line followed by an entry clothoide and then

by an arc of the circle, an exit clothoide and another straight line. Figure 2.7

on Page 28 illustrates the position of the clothoide in a road curve (Herve,

2005).

Figure 2.7: The clothoide of a road curve (Herve, 2005).

The horizontal slope varies from 2.5% for the straight line to 7% for the linking curve arc. A straight road followed immediately by a linking curve causes the driver to turn the steering wheel abruptly in order to adjust the

trajectory of the vehicle along the curve. This sudden linkage is related to

the slope which increases suddenly from 2.5% to 7%. Hence, the clothoide is

interposed in between the straight lines and the curve arc to ensure smooth

and safe driving in the road curve.

For safety, the parameters of the clothoide linking the curve arc cannot be chosen arbitrarily. They have to be based on several criteria; for example, the length of the clothoide is based on the radius of the curve arc. The clothoide length is determined so as to enhance the sight distance for drivers, giving them improved vision of the approaching curve. For curved roads, the safe clothoide length is commonly around 67 metres. The safe clothoide length is defined in Equation 2.9:

\[ L = 6R^{0.4} \tag{2.9} \]

where R is the radius of the curve arc.
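A minimal sketch of the clothoide length rule follows, reading Equation 2.9 as L = 6R^0.4. This reading is a reconstruction of the typeset formula, and metres are assumed for both R and L. The function name and the sample radii are illustrative only.

```python
def safe_clothoid_length(radius_m: float) -> float:
    """Clothoide length in metres for a curve arc of radius_m, using L = 6 * R**0.4."""
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    return 6 * radius_m ** 0.4


# Sample radii chosen for illustration.
for radius in (100, 200, 400):
    print(f"R = {radius} m -> L = {safe_clothoid_length(radius):.1f} m")
```

Under this reading, a radius of about 400 m gives a length close to the 67 metres quoted above.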

The parameters defined in Equation 2.9 are essential for designing a safe road curve. This equation will be implemented in a traffic simulator for road curves developed in MATLAB. The details of the simulator will be discussed in later chapters.

Now that an understanding of a horizontal curve structure has been es-

tablished, the following paragraphs discuss how the road design contributes to

crashes in a road curve.

2.2.2 Horizontal curve prediction models

The crash severity for curve-related crashes is higher than those that occur on

straight roads (Glennon, Neuman & Leisch, 1985). There are different methods

used to predict the number of crashes on curves. One of the prediction methods

is utilising crash prediction models to determine the likelihood of a crash or a

potential crash.

The first model is based on a two-term relation where the crash rate decreases with increasing curve radius and the number of crashes decreases with increasing curve length. This model was originally defined in a study by Glennon et al. (1985) as a weak relation of decreasing crash rate with increasing curve length. Consider the case where a vehicle is travelling at a speed where the

lateral acceleration needed to negotiate the curve exceeds the surface friction.

From the road geometry point of view, a loss of control can happen and is due

to the presence of the curve and the radius but not the length of the curve.

Hence, this shows that the crash rate declines with increasing curve length

and is consistent with the first model (John & Gary, 2008). This detail will be

considered in the design of the traffic simulator. Details of the simulator are

explained in later sections of this chapter.

The second model is based on a single-term relation where the crash rate decreases with increasing curve radius (John & Gary, 2008). This is a simpler, linear model. Krammes et al. (1995) derived a linear model of crash rate versus curvature based on 1,126 road curve sites in the United States, and a preliminary driver workload model was also developed. Matthews

and Barnes (1988) also studied crashes on 4,666 curves on two-lane highways

in New Zealand and defined a model which is relatively consistent with the

one developed in the United States. It can be assumed that the New Zealand

experience is consistent with the Australian experience hence, the US linear

model can be applied to the Australian context (John & Gary, 2008).

The following subsections will explain the two most common horizontal

curve prediction models: Glennon’s and Zegeer’s models.

2.2.2.1 Glennon’s horizontal curve model

Glennon’s model estimates the crash reduction when the horizontal curve is

flattened while maintaining the lines of tangency or central angle (McGee,

Hughes & Daily, 1995).

a. Model definition This model uses the length of highway segment,

curvature in degree, curvature corresponding to the new alignment and the

length of the curved component. The model is presented as in Equation 2.10

(McGee et al., 1995):

\[ \Delta A = AR_{\delta}\,(\Delta L)\,V + 0.0336\,(\Delta D)\,V \tag{2.10} \]

where

∆A = the net reduction in crashes.

∆L = change in the highway length.

∆D= change in degree of curvature.

V = curvature in degrees.

ARδ = crash rate compared to straight roads.
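Equation 2.10 is a simple sum of two products and can be transcribed directly. In the sketch below, the parameters follow the symbol definitions listed above. The source does not state units, so none are assumed. The function name and the input values are arbitrary and serve only to show the arithmetic.

```python
def glennon_crash_reduction(ar: float, delta_l: float, delta_d: float, v: float) -> float:
    """Net reduction in crashes, delta_A = AR * delta_L * V + 0.0336 * delta_D * V.

    ar      -- the AR term of Equation 2.10
    delta_l -- change in the highway length
    delta_d -- change in degree of curvature
    v       -- the V term of Equation 2.10
    """
    return ar * delta_l * v + 0.0336 * delta_d * v


# Arbitrary inputs, used only to demonstrate the calculation.
print(round(glennon_crash_reduction(ar=2.0, delta_l=0.1, delta_d=5.0, v=1.0), 3))
```

Both terms scale linearly with V, so doubling V doubles the estimated reduction.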

b. Input factors

• Road and Environment

The factors taken into account for prediction are: change in degree of

curve, highway length, curvature in degrees and crash rate.

• Vehicle

None of the vehicle-related factors are considered for prediction.

• Driver

None of the driver-related factors are considered for prediction.

c. Area of application This model is applicable for horizontal curves

on highway roads.

d. Model weakness Glennon (1995) states that based on the findings

from the model, the traffic volume and lane width have little effect on the

crash prediction. This model is only applicable for highway segments.

2.2.2.2 Zegeer’s horizontal curve model

Zegeer’s model is used to estimate the number of crashes on individual hori-

zontal curves on two lane rural roads.

a. Model definition The definition of the model is as in Equation 2.11.

\[ A = \left[\,1.552\,L\,V + 0.014\,D\,V - 0.012\,S\,V\,\right]\,(0.978)^{W-30} \tag{2.11} \]

where

A = total number of crashes on the curve in a five-year period

L = length of curve in miles

V = volume of vehicles in millions

D = degree of curve

S = presence of spiral, 0 for no spiral exists and 1 for an existence of a spiral.

W = width of the road.
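Zegeer’s model can be sketched as a short function, reading Equation 2.11 as A = [1.552LV + 0.014DV − 0.012SV](0.978)^(W−30). The placement of the closing bracket and the exponent W − 30 are a reconstruction of the typeset formula. Treating W as a width in feet is an assumption, since the source gives only "width of the road". The function name and the example inputs are illustrative.

```python
def zegeer_crashes(length_mi: float, volume_millions: float, degree: float,
                   has_spiral: bool, width: float) -> float:
    """Estimated number of crashes on a curve in a five-year period (Equation 2.11).

    length_mi       -- L, length of curve in miles
    volume_millions -- V, volume of vehicles in millions
    degree          -- D, degree of curve
    has_spiral      -- S, True if a spiral transition exists
    width           -- W, width of the road (assumed here to be in feet)
    """
    s = 1 if has_spiral else 0
    base = (1.552 * length_mi + 0.014 * degree - 0.012 * s) * volume_millions
    return base * 0.978 ** (width - 30)


# Illustrative inputs: a 0.1-mile, 5-degree curve carrying 2 million vehicles.
without_spiral = zegeer_crashes(0.1, 2.0, 5.0, has_spiral=False, width=30)
with_spiral = zegeer_crashes(0.1, 2.0, 5.0, has_spiral=True, width=30)
print(f"no spiral: {without_spiral:.3f}, spiral: {with_spiral:.3f}")
```

Under this reading, adding a spiral or widening the road lowers the estimate, while a longer or sharper curve raises it.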

b. Input factors

• Road and Environment

The factors taken into account for prediction are: length of curve, degree

of curve, road width and traffic volume.

• Vehicle

None of the vehicle-related factors are considered for prediction.

• Driver

None of the driver-related factors are considered for prediction.

c. Area of application This model is applicable for curves on highways and curves with no available crash data.

d. Model weakness This model does not consider road side parameters

in the prediction. The model only evaluates individual curves, therefore it is

not able to evaluate highway sections with varying alignments.

Table 2.1 presents a summary of Glennon’s and Zegeer’s models.

The availability of crash prediction models for horizontal curves is limited. The majority of the models available are designed to predict crashes for highways, intersections or black spot areas. Although there are crash prediction models for horizontal curves, they are designed mainly for use on highways. In addition, Glennon’s and Zegeer’s models consider road and environmental factors but neglect to consider other factors such as roadside parameters, vehicle

Table 2.1: A summary of crash prediction models for horizontal curves.

Features                  Glennon’s                          Zegeer’s
------------------------  ---------------------------------  ---------------------------------
Purpose                   Estimates the crash reduction      Estimates the number of crashes
                          when the horizontal curve is       on individual horizontal curves
                          flattened while maintaining the    on two-lane rural roads.
                          lines of tangency or central
                          angle.
Input factors
  Road and environment    Yes                                Yes
  Vehicle                 No                                 No
  Driver                  No                                 No
Application
  Road segment            Highway                            Highway
Model weakness            Only applicable for highway        Does not consider roadside
                          segments.                          parameters and is unable to
                                                             evaluate highway sections with
                                                             varying alignments.

and human-related factors. Therefore, there is a need to understand the effect

of contributing factors on road curve crash severity, taking into account the

wide range of other potential contributing factors.

2.2.3 Data mining techniques in road safety

Road safety can be improved with the application of data mining techniques.

Data mining can be defined as a process that extracts knowledge by analysing

data to discover hidden patterns and dependencies in the database (Hand,

Mannila & Smyth, 2001; Berthold & Hand, 2003).

Data mining techniques can be used to predict a driver’s behaviour in

order to rectify unsafe actions (Krishnaswamy, Loke, Rakotonirainy, Horovitz

& Gaber, 2005). Researchers have performed traffic studies and investigated


a method to predict the occurrence of a crash. Pande and Abdel-Aty (2006)

applied data mining techniques to predict rear-end crashes on highways and

warn drivers about potential crashes 5 to 10 minutes prior to a crash. The

techniques are used to identify and classify different categories of crashes and

conditions that are prone to rear-end crashes. The classifications are then

used to define a prediction model. The model is used in real-time with the

use of loop detectors to assess the probability of crashing. This study shows

that data mining techniques can be used to identify the causes of a crash and

simultaneously predict and warn drivers of unsafe actions according to the

surrounding conditions.

2.2.3.1 VEDAS

The data mining techniques are applied further in vehicles. One such applica-

tion is VEDAS which is a mobile and distributed data stream mining system

for real-time vehicle monitoring. It is designed to be a data mining system

that uses an on-board data stream mining and management system. This

allows VEDAS to pre-process the incoming data stream, using Principal

Component Analysis to reduce its dimensionality. At the same

time, it allows the system to carry out analysis of data streaming from various

sensors in most modern vehicles. VEDAS monitors two aspects of driving:

1. Vehicle health

Vehicle-health monitoring involves obtaining data from different parts of

the vehicle such as the air filter and engine spark plug. Data collected are

compared to a safe operating regime in the server at the control station.

Monitoring is a continuous process in order to detect changes and to

re-compute and display latest results in real time.

2. Driver characteristics


This involves the detection of unusual driving patterns such as drowsy

driving and drink driving. Kargupta et al. (2004) used supervised

learning with known characteristics of drink driving such as speed and

steering wheel angle to discover both types of driving patterns.
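The Principal Component Analysis pre-processing step mentioned above can be illustrated with a minimal sketch. It is not the VEDAS implementation: the sensor count, the window size and the synthetic readings are assumptions made purely for illustration, and the function names fit_pca and project are hypothetical.

```python
import numpy as np


def fit_pca(window, n_components):
    """Fit PCA on a window of readings (rows = samples, columns = sensors)."""
    mean = window.mean(axis=0)
    centred = window - mean
    # SVD of the centred data: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]


def project(samples, mean, components):
    """Map readings onto the retained principal components."""
    return (samples - mean) @ components.T


# Hypothetical stream window: 200 readings from 6 sensors that are driven
# by 2 underlying factors plus a small amount of measurement noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
window = latent @ mixing + 0.01 * rng.normal(size=(200, 6))

mean, components = fit_pca(window, n_components=2)
reduced = project(window, mean, components)  # 6 columns reduced to 2
```

In this sketch the six sensor channels are summarised by two components, so later analysis steps operate on a smaller stream.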

VEDAS is also capable of reporting emerging patterns to the fleet managers

at a central control station over a low-bandwidth wireless network connection.

The drawback of VEDAS is that it does not have the situation awareness

feature to capture contextual information of on-road conditions to improve

its response accuracy. In addition, it does not support supervised learning

where data can be processed faster in real-time situations using a classification

algorithm. VEDAS is also limited to mining data from Global Positioning System

(GPS) navigation. Hence SAWUR, which stands for Situation-Awareness With

Ubiquitous data mining for Road safety, was introduced for vehicles (Salim,

Shonali, Loke & Rakotonirainy, 2005; Krishnaswamy et al., 2005).

2.2.3.2 SAWUR

SAWUR is an Advanced Driving Assistance System (ADAS) that incorporates

and uses ubiquitous data mining (UDM) to analyse contextual information

related to driver behaviour, environment, driver profile and condition of the

car in real-time. SAWUR addresses the issues of VEDAS and develops the

ability to manage user interaction and visualisation of results on a limited

screen. It also efficiently represents and communicates data mining models

via a wireless network. The general idea of SAWUR is a system which

determines the current situation the driver is facing and uses

on-board unsupervised learning techniques to analyse it and provide appropriate

actions to avoid danger.

Salim et al. (2007) use this concept and define a model to predict potential

collisions at four-leg cross intersections. The model uses data mining to


understand the cause of collision from historical data and sensor data in order

to recognise holistic situations at road intersections. This shows that the de-

tection of driver behaviour could improve with the use of historical data and

learning from the knowledge obtained.

Besides getting input on the driving context, using past information of a

crash can be helpful in understanding the pattern of the causes and conse-

quences. Hence, the application of data mining and the type of data mining

technique used depend on the format of the data. Apart from using data

mining to predict the number of crashes, traffic simulators can be used to

simulate and visualise a crash and its possible outcomes. The capabilities of traffic

simulators are discussed in the next section.

2.2.4 Traffic simulators

Simulation is a dynamic representation of part of the real world, which is

achieved with a computer model that progresses through time. Simulator

tools are usually used in traffic engineering to aid engineers in identifying

possible road designs and traffic flow issues. Traffic simulators are widely

used in research, planning, development, training and demonstration of traffic

system design.

Traffic simulators are used to achieve a better understanding of a problem

and the factors involved. Simulators are also used to determine the effects of

control measures and new traffic rules such as speed limits, and restrictions

on lane changing and overtaking for certain sections of a road. In addition,

simulators can be used to discover the effect of a new infrastructure before it is

built (Treiber, 2008). In summary, the reason for using a simulator is to test,

evaluate and determine a solution without building new infrastructure. This

is beneficial for research and training for the people involved.

For this road curve study, a simulator is needed to validate the contributing


factors obtained using data mining techniques.

Traffic simulators can be classified into macroscopic, microscopic and mesoscopic

simulators. A macroscopic simulator models traffic over a section of a road network

rather than individual vehicles. Such simulators were developed to model traffic

on highways, rural highways, surface-street grid networks and arterial roads.

Thus, a macroscopic simulator focuses on the flow, speed and density of

the traffic at specific locations in the network. Examples of macroscopic simulators

are FREFLO, AUTOS, METANET and VISSIM (Chu, Liu & Recker, 2003).

A microscopic simulator models the movement of individual vehicles de-

rived from the car-following and lane-changing theories. Examples of micro-

scopic simulators are AutoTURN, PARAMICS, CORSIM, VISSIM, AIMSUN,

and HUTSIM. The microscopic model is useful in evaluating traffic congestion,

complex geometric design and system-level impacts of proposed improvements,

where other tools are limited. Although microscopic simulators

are helpful and widely used nowadays, such models are time consuming to build,

difficult to calibrate and costly.
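To make the idea of a car-following rule concrete, the sketch below uses the Intelligent Driver Model (IDM), a well-known car-following model. It is chosen here only as an illustration: the simulators listed above use their own models, and all parameter values (desired speed, time headway, acceleration limits, initial positions) are assumptions.

```python
import math


def idm_acceleration(v, v_lead, gap, v0=30.0, T=1.5, a=1.0, b=2.0, s0=2.0):
    """Acceleration (m/s^2) of a follower under the Intelligent Driver Model.

    v, v_lead: follower and leader speeds (m/s); gap: bumper-to-bumper gap (m).
    v0: desired speed, T: time headway, a: maximum acceleration,
    b: comfortable deceleration, s0: minimum gap (all illustrative values).
    """
    dv = v - v_lead  # closing speed
    desired_gap = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a * b)))
    return a * (1.0 - (v / v0) ** 4 - (desired_gap / gap) ** 2)


def simulate_follower(steps=600, dt=0.1):
    """A fast follower approaches a slower leader travelling at constant speed."""
    x_lead, v_lead = 50.0, 15.0
    x, v = 0.0, 25.0
    min_gap = x_lead - x
    for _ in range(steps):
        acc = idm_acceleration(v, v_lead, x_lead - x)
        v = max(0.0, v + acc * dt)  # simple Euler update, no reversing
        x += v * dt
        x_lead += v_lead * dt
        min_gap = min(min_gap, x_lead - x)
    return v, x_lead - x, min_gap


final_v, final_gap, min_gap = simulate_follower()
```

Because the rule always brakes in time to keep a positive gap, a model of this kind is built to avoid collisions rather than to reproduce them.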

Lastly, the mesoscopic simulator combines properties of macroscopic

and microscopic simulators. This simulator is less reliable; however, it has

superior properties compared to the typical planning analysis techniques.

Examples of mesoscopic simulators are DYNASMART, DynaMIT, INTEGRATION

and METROPOLIS.

These simulators can be further classified into deterministic and stochastic

types of simulators. A deterministic simulator is one that will always produce

the same results for the same set of inputs. On the other hand, the stochastic

simulator incorporates random variation, so its results can differ between runs given the same set of inputs.
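The distinction can be illustrated with a minimal sketch. The travel-time functions and the speed distribution below are hypothetical and are not taken from any of the simulators discussed in this section.

```python
import random


def travel_time_deterministic(length_m, speed_ms):
    """Same inputs always give the same travel time (seconds)."""
    return length_m / speed_ms


def travel_time_stochastic(length_m, mean_speed_ms, speed_sd_ms, rng):
    """Speed is drawn from a normal distribution, so repeated runs differ."""
    speed = max(1.0, rng.gauss(mean_speed_ms, speed_sd_ms))  # avoid zero speed
    return length_m / speed


# Deterministic: two runs with identical inputs agree exactly.
d1 = travel_time_deterministic(500.0, 20.0)
d2 = travel_time_deterministic(500.0, 20.0)

# Stochastic: identical inputs, different random streams, different results.
s1 = travel_time_stochastic(500.0, 20.0, 3.0, random.Random(1))
s2 = travel_time_stochastic(500.0, 20.0, 3.0, random.Random(2))

# Fixing the seed makes a stochastic run repeatable.
s1_again = travel_time_stochastic(500.0, 20.0, 3.0, random.Random(1))
```

Fixing the random seed is the usual way to make a stochastic run repeatable when results need to be compared.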

This section reviews microscopic traffic simulators as they are widely used

and are an essential tool for traffic engineering. Traffic simulators are used

to resolve challenges in traffic control research. The review also aims to determine a

simulator that is suitable for simulating the contributing factors examined

in this research. The basic selection criteria for a suitable simulator are:

• User friendliness,

• Ability to construct a curve with ease; and

• Capability to reflect the scenario with driver, vehicle and environmental

contributing factors configured in the simulator.

Other criteria related to the infrastructure are:

• Computing capability required to run the simulator,

• Visual design; and

• Extra components required.

The following list describes the different types of microscopic traffic simu-

lators and their capabilities and limitations.

2.2.4.1 CORSIM

CORSIM is a simulator that integrates two traffic simulators: FRESIM and

NETSIM. FRESIM models freeway traffic while NETSIM models urban

street traffic. Therefore, CORSIM is able to simulate highways, urban streets

and networks.

a. User Interface The integrated, Windows-based interface is provided

when CORSIM operates in a software environment called the Traffic Software

Integrated System (TSIS). TRAFVU, the output processor of TSIS, displays

the network graphically and animates its performance.

b. Ability to construct curves CORSIM is able to simulate different

intersection controls, traffic flow control and model surface geometry such as

the number of lanes (Bloomberg & Dale, 2000).


c. Inputs type CORSIM provides tools to build the network and observe

the animation. The network design is based on images such as digital maps.

d. System requirements CORSIM only requires Microsoft Windows

and Microsoft Internet Explorer which are easier to obtain than Linux for

instance.

e. Model weakness The simulator does not take into consideration the

weather conditions as a parameter for simulation.

2.2.4.2 AutoTURN

AutoTURN, designed by Transoft Solutions, is the most popular CAD-based

simulator that imitates vehicle turns and performs swept path analysis.

AutoTURN is used to assess and evaluate vehicle manoeuvres and spatial

requirements for the designs of all road types such as intersections, roundabouts, bus

terminals, loading bays or any street projects involving access, clearance and

manoeuvrability checks.

a. User Interface The simulator has a graphic-driven user interface

that incorporates dialogue boxes and menus for road designing. The interface

allows animation of the turning manoeuvres after the design is complete.

b. Ability to construct curves The vehicle path is generated with the

SmartPath tools within AutoTURN and contains four drive modes: generate

arc path, over steer corner, corner path and steer a path. In addition, the

tools are able to simulate forward and reverse vehicle turning manoeuvres while

incorporating engineering algorithms to account for factors such

as speed, super-elevation, lateral friction and turn radii.


c. Inputs type The simulator also allows users to create all vehicle types,

which include automobiles, emergency and service vehicles, buses and trucks

from different countries such as Australia, Canada, France, New Zealand,

United Kingdom and United States.

d. System requirements The operating systems required for worksta-

tion and network are Windows 2000, XP or Vista and Windows Server 2000

or 2003. The platform requirements to work with are the latest Autodesk

AutoCAD, Bentley MicroStation platforms and MicroStation V8.1, V8 2004

(V8.5), XM (32-bit). The languages available in the simulator are English,

French, German and Spanish.

e. Weakness AutoTURN is not suitable for this study as it does not model

driver behaviour. Moreover, AutoTURN requires AutoCAD to run, which is not

readily accessible. As AutoTURN runs on the AutoCAD platform, the

simulation interface has an AutoCAD look.

2.2.4.3 PARAMICS

PARAMICS (PARAllel MICroscopic Simulation) is a powerful tool developed

by Quadstone Limited to model complete real world traffic and transportation

problems and provide information on the traffic flow. This model is able to

replicate a city traffic network and simulate 200,000 vehicles in a large traffic

network. This is a large capacity compared to other traffic simulators,

which can only simulate fewer vehicles. For example, VISSIM is able to

simulate a maximum of 1,200 vehicles at a time for a network.

Another feature of PARAMICS is that it allows further customisation and

extension of many features of the simulator with the Application Programming

Interfaces (API). An API allows a user to overwrite the default models in the

simulator and also to interface complementary modules to the simulator (Chu


et al., 2003).

a. User Interface PARAMICS has a graphical user interface to build

a network and observe the animation. The network layout is defined based

on the map images imported into the simulator. However, the interface is

neither well designed nor user friendly.

b. Ability to construct curves Due to the use of API, PARAMICS

can construct road curves.

c. Inputs type PARAMICS allows calibrating parameters such as data

related to the network geometry, vehicle parameters and the driver behaviour

such as aggressiveness and awareness levels.

d. System requirements PARAMICS was designed for a variety of

platforms including Windows and other computer operating systems, although

it was developed to run on a Unix box (Oketch & Carrick, 2005).

e. Model weakness PARAMICS is neither well designed nor user-

friendly compared to other traffic simulators.

2.2.4.4 VISSIM

VISSIM (German for Traffic in Towns Simulation) was developed at the

University of Karlsruhe, Germany during the early 1970s (Bloomberg & Dale,

2000). VISSIM is a powerful microsimulation tool which has the ability to

model complex traffic flow in urban areas and inter-urban motorways in a

graphical manner. The road and network designs are based on maps or aerial

photos imported into the simulator.


The simulator can model all modes of transportation such

as bus transit, light rail, heavy rail, rapid transit, general traffic, cyclists and

pedestrians. This model is able to analyse the traffic impacts of traffic oper-

ations before actually implementing the system. Thus, it gives an idea of the

implementation costs involved and how it can be better managed (AECOM,

2008).

VISSIM allows a number of calibration parameters to be configured close

to local conditions. The configuration can be the speed behaviour such as the

desired speed distribution, acceleration and deceleration that reflects the real

world, vehicle parameters that represent the technical abilities of the desired

vehicle type and signal control logic configured to the desired condition. These

configurations can be reproduced in the simulator. The simulation provides a

range of measures of effectiveness such as travel time, number of stops,

delay and queue lengths.

VISSIM is applied in different designs such as the capacity analysis of

bus priority schemes, analysis of toll plaza facilities, traffic impact studies

for shopping centres, impact analysis of route guidance systems and variable

message sign systems (AECOM, 2008).

a. User Interface The simulator has an intuitive and easy to use graph-

ical network editor for creating the networks, vehicles and environment based

on the maps imported into the simulator (PTV, 2009). The simulator can

provide a variety of animations, such as a 3D display of vehicle movements

from the driver’s seat, 2D and 3D views of vehicle movements within

the network, and AVI clips created in VISSIM (PTV, 2009).

b. Ability to construct curves The graphical editor is able to model

roundabouts and intersections of any kind of geometry in high detail. The

modelling of the arc of the curve is not mentioned.


c. Inputs type The inputs imported into the simulator are digital maps

for reconstructing road networks and environment of inter urban and urban

areas. Information about vehicles, driving behaviour and traffic volume closely

reflect the real world.

d. System requirements The simulator is compatible with Windows

operating systems.

e. Model weakness VISSIM does not take into consideration weather

conditions as one of the calibration parameters. In addition, the number of

vehicles that can be simulated on a traffic network is limited to 1,200. The construction

of the curve arc is complex as it takes into consideration a variety of parameters

in order to define a safe curve.

CORSIM, PARAMICS and VISSIM import digital maps or background im-

ages to create the network design. Most simulators are able to model networks

of different geometries based on the maps or images. Road curve geometries

can be constructed with most simulators except for CORSIM. Thus, CORSIM

will not be considered for this research.

All simulators provide a graphical user interface to model, edit and simulate

the network. However, PARAMICS is neither well-designed nor pleasant to

use compared to other traffic simulators. Thus, PARAMICS is not a tool that

will be considered for this study.

Traffic simulators are used to monitor and analyse the traffic flow or analyse

the traffic signal control. The possibility of simulating a crash on a road

curve is low as none of the simulators is able to replicate many crashes

simultaneously. This is due to the vehicle or driver behaviour models used in

the simulators; for example, PARAMICS and VISSIM have a speed distribution

model and a lane changing behaviour model that avoid crashes.

All simulators allow the flexibility to configure and reflect the driver or


Table 2.2: A summary of the simulators based on the features.

Simulators and features       CORSIM      AutoTURN        PARAMICS     VISSIM
Input
  GUI                         Yes         Yes             Yes          Yes
Parameters
  Driver                      No          No              Agrsv,       CF, Speed,
                                                          Awareness    LC
  Vehicle                     Yes         Yes             Yes          Yes
  Environment                 No          No              No           No
  (e.g. wet road, friction)
Road geometry
  Road curve                  No          Yes             Yes          Yes
Output
  Animation                   2D, 3D      2D              2D, 3D       2D, 3D
  Crash simulation            No          No              No           No
System requirement
  OS                          Win         Win             Win, Unix    Win
  Extra software              Internet    AutoCAD,        No           No
                              Explorer    MicroStation

Legend:
Agrsv = Aggressiveness.
Win = Windows.
CF = Car following.
LC = Lane changing.

vehicle parameters in the simulator; however, none has the flexibility to

configure the environmental factors. Therefore, a simulator is needed that can imitate

crashes on road curves.
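The comparison in Table 2.2 can also be expressed as a small data structure that is filtered against the selection criteria. This is only a sketch: the class and field names are hypothetical, and the values are transcribed from Table 2.2 rather than from the simulators' documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Simulator:
    name: str
    driver_params: tuple      # configurable driver parameters, empty if none
    vehicle_params: bool
    environment_params: bool  # e.g. wet road, friction
    road_curve: bool
    crash_simulation: bool
    extra_software: tuple


# Values transcribed from Table 2.2.
SIMULATORS = [
    Simulator("CORSIM", (), True, False, False, False, ("Internet Explorer",)),
    Simulator("AutoTURN", (), True, False, True, False,
              ("AutoCAD", "MicroStation")),
    Simulator("PARAMICS", ("aggressiveness", "awareness"),
              True, False, True, False, ()),
    Simulator("VISSIM", ("car following", "speed", "lane changing"),
              True, False, True, False, ()),
]


def candidates(simulators, need_curve=True, need_driver=True,
               need_environment=False):
    """Return the names of simulators that satisfy the requested criteria."""
    return [
        s.name for s in simulators
        if (s.road_curve or not need_curve)
        and (bool(s.driver_params) or not need_driver)
        and (s.environment_params or not need_environment)
    ]


with_curve_and_driver = candidates(SIMULATORS)
with_environment = candidates(SIMULATORS, need_environment=True)
```

Filtering on road curves and driver parameters leaves PARAMICS and VISSIM, while adding the environment criterion leaves no simulator. Usability, the reason given in the text for excluding PARAMICS, is not a column of Table 2.2 and is therefore not encoded here.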

2.2.5 Driver behaviour model

Modelling driver behaviour is an interdisciplinary study which involves fields

such as psychology, robotics, control theory and statistics (Oliver & Pentland,

2000). A comprehensive driver behaviour model requires a thorough under-


standing of the subject matter and needs to have the capability to generate

and explain differing characteristics. This section explains driver behaviour

modelling from two perspectives: the psychological and statistical approaches to

modelling driver behaviour and estimating crash risk.

2.2.5.1 Psychology-Based Driver Behaviour Models

When a driver begins to drive on the road, the probability of being involved

in a crash is unpredictable, so the focus of the driving task is to avoid crashes

and the conditions that delay the avoidance response (Vaa, 2000). The driving

task has traditionally been characterised into three different levels (Michon,

1985) namely:

1. Strategic: involves route planning to reach a particular destination.

2. Tactical: a level where drivers make manoeuvre decisions while driving

to achieve short-term objectives.

3. Operational: where the selected manoeuvre is carried out by the driver.

The driving decisions made by a driver can increase the probability of being

involved in a crash. However, this is not the only factor that delays an

avoidance response. According to Wilde’s (2000) Risk Homoeostasis Theory (RHT),

estimation of risk of crashing consists of objective risk, subjective risk and feel-

ings of risk. These estimates vary for each individual and each has an intended

level of acceptable risk. When one of the acceptable risk levels decreases, a

corresponding acceptable risk level will either increase or decrease. A renowned

research study is the Munich Taxicab study, where half of the taxi drivers had

Anti-lock Braking Systems (ABS) installed in their taxis and the other half did not.

The results obtained showed that drivers with ABS increased their acceptable

risk level as they assumed that ABS can lower the actual risk. On the other


hand, taxi drivers without ABS had a lower acceptable risk level and drove

more carefully.

Figure 2.8: An illustration of the Task-capability for driver behaviour in psy-

chology studies (Fuller, 2005).

Where:

C is control or capability.

D is the task demand.


Fuller (2005) proposes a Task-capability interface model that is able to

measure the probability of an individual failing a task. Factors taken into

account are the demand of the task and one’s capability to execute the task

with each primary factor having several contributing sub-factors. For example,

if the capability of an individual exceeds the task demand, then it is considered

as an easy task and vice versa. Figure 2.8 on Page 46 presents an illustration

of the theory of the Task-capability interface model.

2.3 Intelligent Transport System applications

Road crash injury is believed to be preventable and predictable as it is a

human-made problem amenable to rational analysis and countermeasures (World Health,

2004). Road crashes are a concern for OECD member countries, which include

Australia. As a result, Intelligent Transport System (ITS) technologies such as

collision avoidance, driver status alert, speed control and automated enforcement are emerging practices to

reduce the number or the severity of road crashes. Although new technologies

are being developed, there remain considerable challenges to be overcome in or-

der to achieve crash reduction (OECD, 2003). Billions of dollars are currently

being spent to develop new technologies that are not related to safety. This

will have a negative impact on road safety if action is not taken to improve

the current situation.

The World Report on Road Traffic Injury Prevention states that Intelligent

Transport Systems (ITS) could reduce fatalities and injuries by 40% across the

Organisation for Economic Co-operation and Development (OECD), thereby

saving over US$ 270 billion per year (OECD, 2003). The Australian Transport

Safety Bureau (ATSB) reports that ITS should bring benefits with a total of at

least $14.5 billion by 2012. Of this amount, $3.8 billion is estimated to be sav-

ings due to safety improvements (ATSB, 2004). Therefore, a better approach

is to utilise technology with existing engineering intervention to enhance road


safety. Modern vehicles are equipped with various safety features to ensure the

safety of the driver and passengers. The safety features can be divided into

two main categories – Passive safety and Active safety.

1. Passive Safety Features Passive safety features minimise

injury severity and help keep the driver and passengers alive in the

event of a crash. The following paragraphs list different passive

safety features (Bishop, 2005).

a. Seat belts Seat belts are legally required to be installed in vehicles.

Seat belts are able to prevent the driver or passengers from being thrown

forward or out of the vehicle during a crash.

b. Front air bags Air bags are safety features with the purpose of

cushioning a person’s body from impact. They are installed at driver and pas-

sengers’ seats to prevent occupants from hitting the steering wheel, dashboard

and windshield.

c. Side air bags The side air bags protect the occupant’s head and

prevent injuries during roll-over crashes. They are installed above the doors

and deploy downwards to cover the windows.

2. Active Safety Features

Active safety features are designed to prevent a crash from occurring and make

driving safer. Vehicles may have one or more of the following safety features

installed (Bishop, 2005).

a. Anti-lock braking system The Anti-lock Braking System (ABS) prevents

the brakes from locking and the wheels from losing traction while braking. It is

usually coupled with Electronic Brake-force Distribution (EBD) and is often paired

with Emergency Braking Assistance (EBA), which is a separate system. ABS can

shorten the stopping distances in almost all cases.


b. Electronic Stability Control Electronic Stability Control (ESC)

is designed to aid the handling of a vehicle, especially when the sensors of a

vehicle detect a possible loss of control.

c. Adaptive cruise control Adaptive Cruise Control (ACC) is a system

which controls the speed of a vehicle automatically. A driver can set a speed,

which the system maintains throughout the trip while adjusting it to keep a safe

distance from the vehicle ahead.

ITS can improve road safety by reducing the likelihood of an occurrence of a

crash and as a result reduce the injuries associated with crashes and the driver’s

exposure level to the road environment (ATSB, 2004). The focus in Intelligent

Transport Systems (ITS) is the issue of monitoring on-road situations and real-

time decision making in order to reduce accidents and fatalities. For example,

the smart cruise controller in modern vehicles can assist drivers to drive safely

as it can monitor the environment and adjust the vehicle speed accordingly.

The following section discusses ITS applications that are designed to resolve

an issue or contributing factors that lead to a road crash.

2.3.1 Interventions for Speeding

a. Curve Speed Warning

Curve Speed Warning (CSW) is an Intelligent Transport System (ITS) appli-

cation for curved roads (Bishop, 2005). CSW can warn drivers when they are

travelling too fast to safely negotiate an upcoming curve. Figure 2.9 on Page

50 illustrates an example of how CSW works. Bishop (2005) categorised CSW

into two groups:

• Digital map approach

This is a simple approach which uses a digital map as a navigation sys-

tem to determine the current vehicle position and the road geometry

information. CSW is then able to estimate a safe speed to negotiate a


curve in a typical road condition. When the actual vehicle speed exceeds

the recommended speed, CSW either issues a reduce speed alert to the

driver or reduces the speed automatically. BMW has designed an active

accelerator which provides slight resistance to prompt the driver to

slow down and discourages further acceleration (Bishop, 2005).

• Infrastructure-oriented approach

In Japan, the Advanced Cruise-Assist Highway System Research Association

(AHSRA) is investigating an infrastructure-oriented approach to provide

warnings to drivers in hazardous locations (Bishop, 2005). Speed detectors

and road-vehicle communications equipment are installed prior to the

curve and warnings are sent directly to drivers when they are driving

too fast. This system has been evaluated at several hazardous locations and

testing is still ongoing (Bishop, 2005). This system relates to the speed-

ing problem on road curves and will be helpful in reducing the number

of crashes due to speeding on curves.
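The speed check at the core of both approaches can be sketched with the standard point-mass relation between curve radius R (m), superelevation e, side friction factor f and speed V (km/h), namely V = sqrt(127 R (e + f)). The source does not state how CSW systems estimate the safe speed, so this relation, the default values of e and f and the function names are assumptions made for illustration only.

```python
import math


def safe_curve_speed_kmh(radius_m, superelevation=0.06, side_friction=0.15):
    """Estimated safe speed (km/h) from the point-mass curve relation.

    The default superelevation and side friction values are illustrative
    assumptions, not values taken from the systems described above.
    """
    return math.sqrt(127.0 * radius_m * (superelevation + side_friction))


def curve_speed_warning(speed_kmh, radius_m, margin_kmh=0.0, **curve_params):
    """Return True when the current speed exceeds the estimated safe speed."""
    limit = safe_curve_speed_kmh(radius_m, **curve_params)
    return speed_kmh > limit + margin_kmh


# Example: a 200 m radius curve approached at 90 km/h and at 60 km/h.
limit_200 = safe_curve_speed_kmh(200.0)          # about 73 km/h
warn_fast = curve_speed_warning(90.0, 200.0)     # warning issued
warn_slow = curve_speed_warning(60.0, 200.0)     # no warning
```

A wet road can be represented in this sketch by passing a lower side_friction value, which lowers the estimated safe speed.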

Figure 2.9: An illustration of the Curve Warning System (Gazill & Robe,

2003).

Using a digital map to detect road geometry and provide a speed estimate is

not sufficiently reliable and accurate, as the maps may contain errors regarding the

location of the vehicle. This causes the system, which relies on sensors such as GPS,

to read inaccurate information such as the road geometry. Inaccurate road geometry

information can result in erroneous curvature and safe speed estimates, hence providing false

Page 82: MINING PATTERNS AND FACTORS …MINING PATTERNS AND FACTORS CONTRIBUTING TO CRASH SEVERITY ON ROAD CURVES Shin Huey Chen BCompSc(Hons), MCS This report is submitted as partial fulfilment

2.3. INTELLIGENT TRANSPORT SYSTEM APPLICATIONS 51

alarms to drivers. Sensors can be installed to sense more contextual infor-

mation such as the street width, visibility, weather conditions, driving style,

surface quality and shoulder detection. However, the infrastructure-oriented

approach is only for particular hazardous road sections. This approach is not

effective enough as it only provides general warnings before entering a curve

and only when a driver is speeding.

2.3.2 Intervention for Sight distance

a. Adaptive Front Lighting

Adaptive Front Lighting system (AFS) illuminates the road ahead and the side

of the vehicle path in order to optimise the visibility for the driver at night.

A basic system takes into account the speed to create the desired illumination

for the driver. A more advanced system takes into account the steering angle

data and speed, along with a swivelling lamp to automatically illuminate a

wider angle of the path ahead. In addition, the next generation AFS will

utilise data from GPS and digital maps to have the ability of recognising any

upcoming road curves. This enhanced AFS can provide proper illumination

before entering and when driving through a road curve. Overall, AFS looks

into the night time visibility issue and improves 90% of the driver’s view ahead

and to the side. Other than that, the enhanced AFS can be helpful in road

curves, including sharp curves.

This is a helpful application for night time on road curves where light beams

can be adjusted to illuminate a wider angle ahead. However, the performance

of AFS depends on the speed and steering angle data and also may vary when

a driver is driving at a high speed or when the weather conditions affect

visibility.

2.3.3 Interventions for road curvature

a. Shift Control System with Navigation System

This shift control system works with a navigation system to select the appropriate gear position according to the road curvature, using information from the navigation system, the driver’s operation and the slope of the road. Toyota has such a technology called NAVI.AI-Shift (Amemiya, 2004), which uses information from the navigation system to detect upcoming road curves and estimates the road condition in a three-dimensional model. When a driver releases the accelerator pedal before entering the curve, the system automatically shifts down from 5th gear to 3rd gear. The 3rd gear is maintained throughout the manoeuvre until the vehicle leaves the curve. Figures 2.10 and 2.11 on Page 53 give a brief idea of how the system works.

Figure 2.10: An illustration of how a shift control system with navigation

system can help a driver in road curves (Amemiya, 2004).

This system can be useful as it can foresee a curved road and manage the

gear in order to prevent the driver from accelerating in a sharp curve. However,

it does not consider the traffic ahead and the driver’s behaviour at that point

of time.

b. Curve Overshooting Prevention Support System

Figure 2.11: An illustration of how a shift control system with navigation system can help a driver in road curves (Amemiya, 2004).

A Yamaha research team in Japan has equipped a research-use motorcycle, the Yamaha ASV-2, with a curve overshooting prevention support system, which is currently in its research phase. This system communicates information about the shape of the curve to the rider, especially in curves with poor visibility (Yamaha, 2000). Further work is still in progress in Japan, carried out by ITS teams such as the Advanced Cruise-Assist Highway System Research Association (AHSRA) and the Ministry of Land, Infrastructure, and Transport Advanced Safety Vehicle (ASV) study group.

This system only applies to motorcycles and not automobiles and can be

useful for riders manoeuvring on a road curve.

2.3.4 Intervention for vehicle stability

a. Electronic Stability Control

As mentioned previously, Electronic Stability Control (ESC) is a safety feature

that helps drivers to maintain control over a vehicle. ESC combines anti-lock

brake, traction control and yaw control technology to provide safety to drivers

(VicRoads, 2007). Each wheel has a speed sensor, independent braking and

additional sensors to monitor the driver’s steering, which can detect if a driver

is losing control. Loss of control normally occurs on slippery roads or when

speeding, particularly on road curves. When a driver enters a curve at a

high speed the vehicle may spin out of control. A vehicle mounted with ESC

detects the situation and brakes the individual wheels automatically to keep

the vehicle under control (VicRoads, 2007).

Studies indicate that ESC is most effective in reducing fatal single-vehicle crashes, because these crashes result from loss of control and occur, for the most part, on curves. In June 2006, the prestigious Insurance Institute for Highway Safety (IIHS) concluded that ESC can save 10,000 lives a year. Furthermore, ESC can reduce fatal single-vehicle crashes by approximately 56% and all single-vehicle crashes by approximately 41%. A summary of the systems is shown in Table 2.3.

Table 2.3: A summary of the crash prediction models for horizontal curves.

Factors       Features          System
Road & Env    Sight Distance    AFS
              Curvature         SCSN, COPSS
Human         Speeding          CSW
Vehicle       Stability         ESC

Where:

AFS = Adaptive Front Lighting System

SCSN = Shift Control System with Navigation System

COPSS = Curve Overshooting Prevention Support System

CSW = Curve Speed Warning

ESC = Electronic Stability Control

All the active safety features mentioned in this chapter aim to reduce

crashes on road curves through the warnings provided to drivers. The common

information that safety applications use includes speed, steering angle,

road geometry from the navigation system and the current vehicle location.

This information is used to determine the probability of a crash and provide

appropriate interventions to prevent the occurrence of one. However, the mentioned

applications do not consider humans as a contributing factor or whether

the interventions provided to the drivers are suitable for the situation. Only

the Lane Departure Warning System (LDWS) considers human aspects such

as fatigue and distraction when assessing the likelihood of a crash. Further

research is being conducted and researchers are considering more factors or

elements from the surrounding situation to improve the accuracy of the in-

vehicle applications. Hence, several emerging technologies are being designed

with a situation awareness capability that can reason and will provide a solu-

tion to the current situation. The situation awareness concept will be briefly

explained in the next section.

2.4 Research direction

Glennon’s and Zegeer’s crash rate prediction models consider road and environmental factors; however, they do not take into consideration factors such as roadside parameters, vehicle and human-related factors. Therefore, this is an area to explore further, to determine the contributing factors using wider data sources and techniques. Wong and Chung’s (2007) study shows that assessing more factors improves accuracy.

Data mining techniques have been used (Wong & Chung, 2007; Kuhlmann,

Ralf-Michael, Lubbing & Clemens-August, 2005; Singh, 2001a) to identify

the contributing factors and the relationships between them. The existing

approaches to identifying contributing factors involve numerical data only. Thus,

text mining is proposed for analysing crash descriptions, which

will help identify more contributing factors from those descriptions.

Existing studies (Wong & Chung, 2007; Singh, 2001b) which examine the

relationship between the contributing factors, only relate one individual factor

to another specific one. Thus, the relationship is specific to the assigned factor

and this limits the understanding of the other possible relationships. Hence,

there is a need to better understand the complex relationship between these

factors.

The existing simulators are powerful; however, most simulators do not incorporate driver-related factors and have restrictions in simulating crashes on road curves. These limitations are critical for this research, as none of the simulators meets all of the selection criteria. Thus, a traffic simulator that simulates crashes on road curves based on the results from data mining techniques is required to advance research in this area.

ITS applications are designed to aid drivers and reduce the chances of a

crash when travelling on road curves. However, the applications are not com-

plete as not all of the contextual data are considered for the crash analysis.

Existing studies such as ADAS and SAWUR strengthen the analysis with situational

contextual data and analyse the data in real time with data mining techniques.

However, no existing ITS application for road curves uses complete contextual data

and analyses it in real time with data mining techniques. This suggests that

more information should be used in the analysis to increase accuracy.

Therefore, the proposed approach aims to understand the complex relationships between the contributing factors and their effect on crash severity on road curves. Understanding the contributing factors will identify causes that may inform changes in road design or interventions, which in turn will reduce the number of crashes on road curves. Data mining techniques will be used to identify the contributing factors and their relationships. A traffic simulator will be defined specifically for this research and will be used to verify the data mining results. The details of the proposed approach are discussed in the next chapter.

2.5 Summary

Road crashes on curves usually result in at least some form of injury and are often fatal. The scope of this research focuses on crashes on horizontal curves. Horizontal curves consist of simple, compound, reverse and spiral curves. The three main categories of contributing factors to road crashes are driver, roadway and environment, and vehicle; however, human error is considered the main contributing factor to road crashes. Furthermore, the degree of curve, lane width, sight distance, length of curve and super-elevation contribute to the roadway factor. Weather conditions, roadway surface and traffic condition contribute towards environmental factors. Lastly, vehicle factors, which include safety features, vehicle type, condition and age of the vehicle, need to be considered as possible contributors. In conclusion, road crashes on curves can be fatal and the major contributing factors are driver behaviours such as speeding, drinking and fatigue, which can affect a driver’s ability to make decisions.

The existing research consists of crash prediction models, the application of data mining in vehicles, the use of traffic simulators, the study of psychological driver behaviour models and intelligent transport systems. The prediction models for horizontal curves do not consider factors such as roadside parameters and vehicle and human-related factors. Thus, there is a need to understand the causes of crashes on road curves using a wider range of contributing factors.

Existing research (Wong & Chung, 2007; Singh, 2001a) has studied the

relationships between contributing factors; however, the findings only relate

one factor to another, which provides only limited information. Thus,

there is a need to identify the complex relationships between more factors and

specifically factors involved in crashes on road curves.

Existing simulators are powerful but most of the simulators do not consider driver-related factors and are unable to simulate crashes on road curves. The ability to simulate crashes on road curves and to take driver-related factors into consideration is critical for this research, and none of the simulators meets the selection criteria. Therefore, a traffic simulator to imitate crashes on road curves based on the results from data mining techniques is proposed.

Existing ITS-related studies such as ADAS and SAWUR strengthen the analysis with situational contextual data and analyse the data in real time with data mining techniques. However, no existing ITS application for road curves uses contextual data and analyses it in real time with data mining techniques.

The details of the proposed approach which considers all the issues men-

tioned previously are discussed in the next chapter.

CHAPTER 3

Data mining

Chapter Overview

The literature review in Chapter 2 has shown the causes of crashes on road curves and the existing interventions available to reduce their number. One of the approaches covered is the use of data mining techniques. Thus, this chapter provides a background to data mining and rough set theory.

3.1 Knowledge Discovery in Databases and Data mining

Knowledge Discovery in Databases (KDD) is the process of identifying useful and understandable patterns in data (Fayyad, Piatetsky-Shapiro & Smyth, 1996). KDD is concerned with the development of methods and techniques to find these patterns, and refers to the overall process of discovering useful knowledge (Fayyad, Piatetsky-Shapiro & Smyth, 1996; Maroles, Heredia & Rodriguez, 2002). Data can be defined as a set of facts, while a pattern is a description of a subset of the data. A discovered pattern is applicable to new data with some degree of certainty (Fayyad, Piatetsky-Shapiro & Smyth, 1996).

KDD is introduced into the analysis process because traditional analytical methods are slow, expensive and highly subjective (Fayyad, Piatetsky-Shapiro & Smyth, 1996). Most databases have an increasing number of records and fields, which makes them difficult to analyse manually; using computers helps humans to identify the meaning and patterns in the data.

The KDD process starts from a database and proceeds through data selection, data pre-processing, transformation, data mining, and interpretation of the results to identify patterns and determine which of them can be considered new knowledge. Figure 3.1 shows an overview of the steps in the KDD process.

Figure 3.1: An overview of the steps within the KDD process (Fayyad,

Piatetsky-Shapiro & Smyth, 1996).
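The sequence of KDD steps can be sketched as a chain of small functions. The following is only a toy illustration under stated assumptions: the crash records, field names, the 80 km/h discretisation threshold and the function names are all hypothetical, and the "mining" step is reduced to counting how often attribute combinations occur.

```python
from collections import Counter

# Hypothetical crash records; None marks a missing value.
records = [
    {"speed_kmh": 95, "surface": "wet", "severity": "fatal"},
    {"speed_kmh": 60, "surface": "dry", "severity": "minor"},
    {"speed_kmh": None, "surface": "wet", "severity": "minor"},
    {"speed_kmh": 110, "surface": "wet", "severity": "fatal"},
    {"speed_kmh": 55, "surface": "dry", "severity": "minor"},
]


def select(rows, fields):
    """Selection: keep only the fields of interest."""
    return [{f: r[f] for f in fields} for r in rows]


def preprocess(rows):
    """Pre-processing: drop records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def transform(rows, threshold=80):
    """Transformation: discretise the numeric speed into 'high' or 'low'."""
    out = []
    for r in rows:
        r = dict(r)
        r["speed"] = "high" if r.pop("speed_kmh") > threshold else "low"
        out.append(r)
    return out


def mine(rows):
    """Data mining (toy version): count each attribute combination."""
    return Counter((r["speed"], r["surface"], r["severity"]) for r in rows)


def interpret(patterns, min_count=2):
    """Interpretation: keep only combinations seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}


fields = ["speed_kmh", "surface", "severity"]
patterns = interpret(mine(transform(preprocess(select(records, fields)))))
print(patterns)
```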

Data mining is a process or algorithm within KDD to extract patterns

from data (Fayyad, Piatetsky-Shapiro & Smyth, 1996). It is also known as

data or knowledge discovery and is a process which analyses large volumes of

data from different points of view to find hidden correlations, patterns and

dependencies in a database. This allows one to extract knowledge via the

information obtained from the analysis. The objective of data mining is to

perform predictions and describe the meaning of the patterns discovered.

Data mining is a relatively new term, but the technology is not. Companies have been using powerful computers and Oracle software to analyse customers’ purchase patterns and behaviour for decades. The use of data mining can increase the number of new customers as well as retain the existing ones.

Other benefits of data mining are as follows:

• It exploits information to obtain competitive advantages.

• It is a data-driven, self-organising, bottom-up approach to data analysis.

• It examines segments of databases automatically.

• It is able to process all types of data and large databases.

Data mining methods

The objectives of data mining, namely performing predictions and describing the meaning of patterns, can be achieved with a variety of data mining methods. The following list explains each method briefly.

1. Classification is a function that groups or maps data items into one

of several predefined classes (Weiss & Kulikowski, 1991; Hand, 1981).

An example of a classification function is when a bank automatically

approves or disapproves loan applications (Fayyad, Piatetsky-Shapiro &

Smyth, 1996).

2. Clustering aims to identify a finite set of clusters that describes the

data. Clusters can be mutually exclusive or may overlap which means

that data can belong to more than one cluster.

3. Summarisation is a method that uses a subset of the data to describe

the entire collection in a compact manner. More sophisticated methods

include the derivation of summary rules (Agrawal, Mannila, Srikant,

Toivonen & Verkamo, 1996), multivariate visualization

techniques and the discovery of functional relationships (Zembowicz &

Zytkow, 1996).

4. Regression is a function that maps data items to a real-valued prediction

variable. The most common regression function is linear

regression.

5. Dependency modelling consists of finding a model that describes sig-

nificant dependencies between variables (Fayyad, Piatetsky-Shapiro &

Smyth, 1996).

6. Change and deviation detection is a method that identifies the most significant

changes in the data based on previously measured values (Berndt

& Clifford, 1996; Guyon, Matic & Vapnik, 1996; Kloesgen, 1996).

The data mining techniques discussed generally analyse numerical values. When the data is in text format, the technique must be able to analyse blocks of text. The variation of data mining that is capable of handling this is text mining, which is explained further in the next section.

3.2 Text mining

The analysis of natural language text is considered difficult, as few software programs are able to fully understand the meaning of text. Recommending solutions is therefore difficult in situations that involve free text. One approach to this issue is the use of Information Retrieval techniques, which have the same goal as text mining. However, this does not meet most users’ needs. Thus, another approach known as statistical language analysis (Garside et al., 1987) was introduced to produce robust parsers. However, the structures extracted are not of any use (Witten, Bray, Mahoui & Teahan, 1999); hence text mining, which is able to discover unknown information from free text, is recommended.

The purpose of text mining is to discover useful information and patterns or trends from textual data instead of numerical data, in particular from large volumes of unstructured, natural language digital text. Traditional data mining is ideal when dealing with numbers but is not feasible for mining text descriptions. Text mining is a form of clustering and is also known as textual data mining, a variation of data mining.

Text mining has been applied in biomedical science, where Swanson (1991) extracted various pieces of evidence from the titles of articles in the biomedical literature when investigating the causes of migraine headaches. The clues suggested that magnesium deficiency could be the cause of migraine headaches; however, this hypothesis did not exist in the literature. The results had to be tested with a non-textual method and, subsequently, Ramadan et al. (1989) found evidence supporting the hypothesis (Welch & Ramadan, 1995).
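As a minimal sketch of how recurring terms might be pulled out of free text, the example below tokenises a few crash descriptions and counts term frequencies. The descriptions, the stop-word list and the function name are hypothetical, and real text mining tools apply far richer processing (stemming, weighting, clustering) than this word count.

```python
import re
from collections import Counter

# Hypothetical stop words that carry little meaning on their own.
STOP_WORDS = {"the", "a", "on", "and", "of", "to", "in", "while"}


def tokenize(text):
    """Lower-case the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


# Hypothetical free-text crash descriptions.
descriptions = [
    "Vehicle lost control on wet curve and hit a tree",
    "Driver speeding on curve, vehicle left the road",
    "Motorcycle lost control while overtaking on curve",
]

# Count how often each remaining term occurs across all descriptions.
term_counts = Counter(t for d in descriptions for t in tokenize(d))
print(term_counts.most_common(4))
```

Terms that recur across many descriptions, such as "curve" or "control" here, are candidates for contributing factors that the numerical fields of a crash record may not capture.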

Due to the capability of text mining, it can be used to analyse crash descrip-

tions in crash records. The following paragraph provides a brief description of

the software programs available to perform text mining.

Text mining Software

Based on a study carried out by Crowsey et al. (2007), the popular software

programs that are able to perform text mining and are user-friendly are SAS

and SPSS Clementine.

• SAS

SAS is a software system which can be used to perform data mining.

The module to perform text mining is the text miner which is within

the enterprise module (SAS, 2006). Text miner can be used to extract

knowledge from textual data. The text miner module is the first mining

solution which closely combines text-based information with structured

data used for improved analyses and decision making.

• SPSS Clementine

This software program performs text mining through a module called

Predictive Text Analytics. This module provides an interface to

access all the text mining features of Clementine (SPSS, 2008). SPSS

Clementine is a mature data mining tool which allows experts and normal

users to perform data mining. Clementine was one of the first general

data mining tools and has a data flow interface that provides easy un-

derstanding of the data mining process.

3.2.1 Text mining algorithm

The clustering algorithm that will be used in SAS is the Ward algorithm. The Ward algorithm does not simply merge the two clusters with the smallest distance. Instead, it joins the clusters whose merger does not increase the heterogeneity too much. The purpose of the Ward algorithm is to unify clusters so that the resulting clusters are as homogeneous as possible (Czek, Hrdle & Weron, 2005). Clustering methods fall into two groups: hierarchical algorithms and partitioning algorithms (Czek et al., 2005). Hierarchical algorithms create clusters with similar characteristics. The Ward algorithm is a hierarchical algorithm of the agglomerative, or bottom-up, type. Agglomerative algorithms use the distance between clusters for clustering. The pseudo code for an agglomerative algorithm is listed below (Czek et al., 2005).

Agglomerative algorithm:

    Perform the finest partition (each object is its own cluster).
    Compute the distance matrix D.
    while more than one cluster remains do
        Find the two clusters with the closest distance.
        Merge the two clusters into one cluster.
        Compute the distances between the new cluster and the remaining
        clusters to obtain a reduced distance matrix D.
    end while

As the Ward algorithm is agglomerative, the general distance computation follows the agglomerative pseudo code discussed previously. The Ward algorithm computes the distance between clusters with Equation 3.1.

Let P, Q and R be three different clusters with P ∩ Q ∩ R = ∅. For explanation purposes, clusters P and Q are merged, so that the new cluster P + Q is formed. The new cluster is then used to compute the distance to cluster R. The distance between the two clusters is calculated as in Equation 3.1 (Czek et al., 2005):

$$d(R, P+Q) = \delta_1\, d(R,P) + \delta_2\, d(R,Q) + \delta_3\, d(P,Q) + \delta_4\, \lvert d(R,P) - d(R,Q) \rvert \qquad (3.1)$$

where the $\delta_j$ are weighting factors. For the Ward algorithm,

$$\delta_1 = \frac{n_R + n_P}{n_R + n_P + n_Q}, \quad \delta_2 = \frac{n_R + n_Q}{n_R + n_P + n_Q}, \quad \delta_3 = -\frac{n_R}{n_R + n_P + n_Q}, \quad \delta_4 = 0,$$

and

$$n_P = \sum_{i=1}^{n} I(x_i \in P)$$

is the number of objects in cluster P. The values of $n_Q$ and $n_R$ are defined equivalently.
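The agglomerative loop and the Ward distance update can be sketched together as follows. This is an illustrative implementation rather than the procedure used inside SAS: it assumes squared Euclidean distances as the starting distance matrix and the standard Ward weighting factors, and the function names and the sample points in the usage note are hypothetical.

```python
from itertools import combinations


def ward_update(d_rp, d_rq, d_pq, n_r, n_p, n_q):
    """Distance from cluster R to the merged cluster P+Q using Ward weights."""
    total = n_r + n_p + n_q
    return ((n_r + n_p) * d_rp + (n_r + n_q) * d_rq - n_r * d_pq) / total


def agglomerate(points, n_clusters=1):
    """Merge the closest pair of clusters until n_clusters remain."""
    # Finest partition: every object starts as its own cluster.
    clusters = {i: [i] for i in range(len(points))}
    # Initial distance matrix of squared Euclidean distances (assumption).
    dist = {}
    for i, j in combinations(range(len(points)), 2):
        dist[frozenset((i, j))] = sum(
            (a - b) ** 2 for a, b in zip(points[i], points[j])
        )
    next_id = len(points)
    while len(clusters) > n_clusters:
        # Find the two clusters with the closest distance.
        pair = min(
            (frozenset(c) for c in combinations(clusters, 2)),
            key=lambda k: dist[k],
        )
        p, q = sorted(pair)
        # Distances from every other cluster R to the new cluster P+Q.
        for r in clusters:
            if r in (p, q):
                continue
            dist[frozenset((r, next_id))] = ward_update(
                dist[frozenset((r, p))], dist[frozenset((r, q))], dist[pair],
                len(clusters[r]), len(clusters[p]), len(clusters[q]),
            )
        # Place the two clusters into one cluster.
        clusters[next_id] = clusters.pop(p) + clusters.pop(q)
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


print(agglomerate([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)], n_clusters=3))
```

For the five sample points, the two tight pairs are merged first and the isolated point stays on its own, which is the behaviour expected of a bottom-up method.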

Data mining techniques can be applied to analyse crash data, and knowledge is derived by understanding the contributing factors of a crash. Besides recognising the causes, the relationships between the contributing factors can also be identified. Singh (2001a) studied the relationships between contributing factors such as age, gender and vehicle type using Principal Component Analysis. Another approach to determining the relationships between the contributing factors is rough set theory analysis, which is explained further in the next section.

3.3 Rough set theory

Rough set theory is a mathematical approach to deal with uncertainty and

vagueness in the data. The uncertainty consists of missing data, noisy data

and ambiguity in semantics (Krishnaswamy, 2008), while vagueness is the lack

of information about elements of the universe. The purpose of using rough set

theory to analyse data is to discover a set of the minimal number of attributes

that can represent the whole data set.

Data is represented in a tabular format, known as an information system in this context. Each row of the table corresponds to an object and each column corresponds to an attribute related to the object. Each row contains a decision attribute in the last column. The formal definition of an information system or table, S, is a pair

S = (U,A)

where:

U is a non-empty, finite set of objects

A is a non-empty, finite set of attributes, which induces an indiscernibility relation on U: if x, y ∈ U and xAy, then x and y are indistinguishable in S.

An indiscernibility (indistinguishability) relation, Ind(B), is a relation under which objects cannot be classified properly because of the limited information available. Two objects xi, xj ∈ U are indiscernible by the set of attributes B ⊆ A if and only if a(xi) = a(xj) for every a ∈ B. That is, (xi, xj) ∈ Ind(B) if and only if ∀a ∈ B, a(xi) = a(xj) (Parmar, Wu & Blackhurst, 2007).

An example of an information system is shown in Table 3.1, where Pi represents an attribute, Oi represents an object, and 0, 1 and 2 are the attribute values of the objects.

Table 3.1: An information system example.

Object   P1   P2   P3
O1       1    2    0
O2       0    2    1
O3       1    2    0
O4       2    0    0
O5       0    2    1
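The indiscernibility relation can be illustrated by grouping the objects of Table 3.1 according to their attribute values. The sketch below is one possible encoding: the dictionary layout and the function name are assumptions made for the example, and objects that share the same values on the chosen attributes end up in the same class.

```python
from collections import defaultdict

# Table 3.1 encoded as object -> attribute values.
TABLE = {
    "O1": {"P1": 1, "P2": 2, "P3": 0},
    "O2": {"P1": 0, "P2": 2, "P3": 1},
    "O3": {"P1": 1, "P2": 2, "P3": 0},
    "O4": {"P1": 2, "P2": 0, "P3": 0},
    "O5": {"P1": 0, "P2": 2, "P3": 1},
}


def indiscernibility_classes(table, attributes):
    """Group objects that have identical values on the given attributes."""
    groups = defaultdict(list)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attributes)].append(obj)
    return sorted(groups.values())


print(indiscernibility_classes(TABLE, ["P1", "P2", "P3"]))
# [['O1', 'O3'], ['O2', 'O5'], ['O4']]
```

With all three attributes, O1 and O3 cannot be told apart, and neither can O2 and O5, while O4 forms a class of its own.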

In rough set theory, a set of similar (indiscernible) objects is called an elementary set, which forms a fundamental atom of knowledge (Pawlak, 1982). Any union of elementary sets forms a crisp set, and any other set is a rough set (Pawlak, 1982). Each rough set has boundary-line objects, as some objects cannot be definitely classified as members of the set due to a lack of knowledge or information. These objects are called the boundary-line cases, also known as objects with indiscernible relationships. Thus, the lower and upper approximations are used to identify the context of each object and reveal the relationships between objects so that objects can be classified properly. The lower approximation contains objects that definitely belong to a set, while the upper approximation contains objects that possibly belong to the set. The lower approximation is formally presented in Equation 3.2. Given the set of attributes B in A and the set of objects X in U, the lower approximation of X is the union of all equivalence classes that are contained in the target set (Parmar et al., 2007).

$$\underline{X_B} = \bigcup \{\, x_i \mid [x_i]_{Ind(B)} \subseteq X \,\} \qquad (3.2)$$

For example, if the target set is {O1, O2, O3}, then the lower approximation will be {O1, O3}.

The upper approximation is formally presented in Equation 3.3. Given the set of attributes B in A and the set of objects X in U, the upper approximation of X is the union of the elementary sets which have a non-empty intersection with X (Parmar et al., 2007).

$$\overline{X_B} = \bigcup \{\, x_i \mid [x_i]_{Ind(B)} \cap X \neq \emptyset \,\} \qquad (3.3)$$

For example, if the target set is {O1, O2, O3}, then the upper approximation will be {O1, O2, O3, O5}.
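The two worked examples can be checked with a short sketch that follows the definitions of the lower and upper approximations directly. The encoding of Table 3.1 and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Table 3.1 encoded as object -> attribute values.
TABLE = {
    "O1": {"P1": 1, "P2": 2, "P3": 0},
    "O2": {"P1": 0, "P2": 2, "P3": 1},
    "O3": {"P1": 1, "P2": 2, "P3": 0},
    "O4": {"P1": 2, "P2": 0, "P3": 0},
    "O5": {"P1": 0, "P2": 2, "P3": 1},
}
ATTRIBUTES = ["P1", "P2", "P3"]


def equivalence_classes(table, attributes):
    """Equivalence classes of the indiscernibility relation Ind(B)."""
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attributes)].add(obj)
    return list(groups.values())


def lower_approximation(table, attributes, target):
    """Union of the classes that are entirely contained in the target set."""
    result = set()
    for cls in equivalence_classes(table, attributes):
        if cls <= target:
            result |= cls
    return result


def upper_approximation(table, attributes, target):
    """Union of the classes that share at least one object with the target."""
    result = set()
    for cls in equivalence_classes(table, attributes):
        if cls & target:
            result |= cls
    return result


target = {"O1", "O2", "O3"}
lower = lower_approximation(TABLE, ATTRIBUTES, target)
upper = upper_approximation(TABLE, ATTRIBUTES, target)
print(sorted(lower))          # ['O1', 'O3']
print(sorted(upper))          # ['O1', 'O2', 'O3', 'O5']
print(sorted(upper - lower))  # boundary-line objects: ['O2', 'O5']
```

O2 is in the target set but is indiscernible from O5, which is not, so both objects fall into the boundary region rather than the lower approximation.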

Reducts and rules are the results of rough set analysis. A reduct is a subset of attributes that is sufficient to represent the information system. A reduct contains no superfluous attributes and at the same time preserves the indiscernibility relation of the original attributes. This can be represented formally as follows. Given a set B, a reduct is a set of attributes B′ ⊆ B such that all attributes a ∈ B − B′ are dispensable and Ind(B) = Ind(B′) (Krishnaswamy, 2008). There can be more than one reduct for a given set B.
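For a small table, reducts can be found by exhaustive search: a subset of attributes is a reduct if it induces the same partition of the objects as the full attribute set and no smaller subset does. The sketch below applies this to Table 3.1; the brute-force strategy and the function names are assumptions for illustration and would not scale to large attribute sets.

```python
from itertools import combinations

# Table 3.1 encoded as object -> attribute values.
TABLE = {
    "O1": {"P1": 1, "P2": 2, "P3": 0},
    "O2": {"P1": 0, "P2": 2, "P3": 1},
    "O3": {"P1": 1, "P2": 2, "P3": 0},
    "O4": {"P1": 2, "P2": 0, "P3": 0},
    "O5": {"P1": 0, "P2": 2, "P3": 1},
}


def partition(table, attributes):
    """Partition of the objects induced by Ind(attributes)."""
    groups = {}
    for obj, row in table.items():
        groups.setdefault(tuple(row[a] for a in attributes), set()).add(obj)
    return {frozenset(g) for g in groups.values()}


def reducts(table, attributes):
    """Minimal attribute subsets that preserve the full partition."""
    full = partition(table, attributes)
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            # Skip supersets of a reduct already found: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if partition(table, subset) == full:
                found.append(subset)
    return found


print(reducts(TABLE, ["P1", "P2", "P3"]))
# [('P1',), ('P2', 'P3')]
```

The table has two reducts, {P1} and {P2, P3}, which illustrates that a set of attributes can have more than one reduct.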

Rules are generated from reducts. Reducts are considered an extensional category representation: they do not provide insight into the set and have limited practical use. As insight into the category is required, a set of rules is generated that can describe the scope of the category. A decision rule is defined as follows (Nguyen & Nguyen, 2003):

Let S be the decision table, defined as the pair S = (U,A)

where:

U is the non-empty, finite set of objects and

A is a non-empty, finite set of attributes that represents the condition attributes.

Rules are generated with respect to a decision attribute, so the decision table is extended as in Equation 3.4:

S = (U, A ∪ {dec}) (3.4)

where:

U is the set of objects,

A represents the condition attributes and

dec is the decision attribute, with dec ∉ A.

The rule is presented in the form:

$$(a_{i_1} = v_1) \wedge \dots \wedge (a_{i_m} = v_m) \Rightarrow (dec = k) \qquad (3.5)$$

where:

$1 \le i_1 < \dots < i_m \le |A|$ and $v_i \in V_{a_i}$.

Each a ∈ A corresponds to a function a : U → V_a, where V_a is the value set of a. This function is known as the evaluation function.

A decision table is required for the rough set analysis process. A decision

table has columns filled with attributes and the rows contain records. There are

two types of attributes: (1) condition attributes and (2) a decision attribute.

Condition attributes are the data of interest, and the decision attribute is the outcome that results from the different combinations of condition attribute values.

Table 3.2 is an example of a decision table which has records: {r1,r2,r3} and

the condition attributes are: {a1, a2, a3}, and D is the decision attribute.

Table 3.2: An example format of a decision table.

      a1   a2   a3   D
r1    1    1    0    1
r2    0    1    0    2
r3    0    0    1    3

The decision table is required because rough set analysis needs a column that contains the decision attribute. Each rule is associated with a set of numerical characteristics: support, coverage, accuracy and confidence. These are defined in the list below.

• Support

Support can be defined as the number of records that satisfy a given rule (Aldridge, 2001). Wang and He (2006) define support as: support(X → Y) = P(X ∪ Y), where X is the set of condition attributes and Y is the decision attribute. That is, the support of the rule X → Y is the number of records or objects in the decision table that contain X ∪ Y.

Two kinds of support are available: (1) LHS support and (2) RHS support. LHS support is defined as the number of records that have the property of the IF conditions, while RHS support is defined as the number of records that have the property of the THEN condition (Sulaiman, Shamsuddin & Abraham, 2008).

• Coverage

Another characteristic of a rule is coverage, of which there are two kinds: (1) LHS coverage and (2) RHS coverage. LHS coverage is obtained by dividing the support of the rules that exhibit the property of the IF conditions by the total number of records used. RHS coverage is obtained by dividing the support of the rules that exhibit the property of the THEN conditions by the number of records that satisfy the THEN condition.

• Accuracy

Accuracy is defined as the number of records or objects that satisfy the

condition and decision of the rule compared to the number of records or

objects that satisfy the condition. RHS accuracy is obtained by dividing

the number of RHS supports by the number of LHS supports.

• Confidence

The confidence of a rule is helpful in identifying optimal and consistent rules. In addition, confidence is useful for determining the reliability of a rule (Wang & He, 2006). Confidence is calculated to avoid using rules blindly, with the following formula as defined by Wang and He (2006).

$$\mathrm{confidence}(rule : A \rightarrow B) = \frac{\mathrm{card}([X]_r \cap Y)}{\mathrm{card}([X]_r)} \qquad (3.6)$$

where:

A represents the condition attributes.

B is the decision attribute.

X represents the set of records or objects that meet the condition A of the decision table.

Y represents the set of records or objects that meet the decision B of the decision table.

r represents the attribute set related to the condition A.

The card function gives the cardinality of a set. Thus, card([X]_r), or support(r), represents the number of records or objects that meet the condition A.

A decision rule whose confidence is equal or close to 1 is considered a consistent rule, and this information is useful for selecting rules.

Rules are selected based on the rule quality and this is explained further

below.
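The characteristics above can be sketched for a single rule as follows. The records, attribute names and values are hypothetical. The sketch also assumes one particular reading of the definitions: LHS support counts records matching the IF conditions, and RHS support counts records matching both the IF conditions and the THEN decision, so that accuracy is RHS support divided by LHS support. Under this reading the ratio in Equation 3.6 coincides with accuracy.

```python
def rule_metrics(records, condition, decision_attr, decision_value):
    """Compute support, coverage and accuracy for the rule condition => decision."""
    # Records that satisfy every IF condition.
    lhs = [r for r in records if all(r[a] == v for a, v in condition.items())]
    # Records that satisfy the IF conditions and the THEN decision.
    rhs = [r for r in lhs if r[decision_attr] == decision_value]
    # Records that satisfy the THEN decision, regardless of the conditions.
    with_decision = [r for r in records if r[decision_attr] == decision_value]
    return {
        "lhs_support": len(lhs),
        "rhs_support": len(rhs),
        "lhs_coverage": len(lhs) / len(records),
        "rhs_coverage": len(rhs) / len(with_decision) if with_decision else 0.0,
        "accuracy": len(rhs) / len(lhs) if lhs else 0.0,
    }

# Hypothetical decision table, for illustration only.
records = [
    {"wet": 1, "alcohol": 0, "severity": "high"},
    {"wet": 1, "alcohol": 0, "severity": "high"},
    {"wet": 1, "alcohol": 0, "severity": "low"},
    {"wet": 0, "alcohol": 1, "severity": "high"},
    {"wet": 0, "alcohol": 0, "severity": "low"},
]
# Rule: (wet = 1) AND (alcohol = 0) => (severity = high)
metrics = rule_metrics(records, {"wet": 1, "alcohol": 0}, "severity", "high")
```

In this example three records match the IF conditions and two of those also match the decision, so the accuracy is 2/3 and the LHS coverage is 3/5.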

Quality of rules. The set of rules generated from reducts can be lengthy and may contain weak rules. Thus, the quality or strength of the rules is measured to identify significant or strong rules. Rule quality is evaluated based on support and accuracy, and rules are classified into (Aldridge, 2001):

• Statistically significant rules

Significant rules are based on the statistical value of the support and

accuracy. Significant rules with high discriminating power have high

classification power (Dey, Ahmad & Kumar, 2005).

• Interesting rules

Experts who are looking for certain patterns control the knowledge discovery process and set a threshold to evaluate and select suitable rules.

• Strong rules

Strong rules are rules that are evaluated from an appropriate combination

of support and accuracy characteristics (Koperski & Han, 1995).

The types of rules that are of interest are rules that have higher strength.

Strength is measured by the support and accuracy (Herbert & Yao, 2005;

Wang & Namgung, 2007).
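Selecting rules by strength can be sketched as a simple threshold filter on support and accuracy. The rules, their values and the thresholds below are all hypothetical; suitable thresholds would depend on the data and on expert judgement.

```python
def select_strong_rules(rules, min_support, min_accuracy):
    """Keep only rules that meet both the support and the accuracy threshold."""
    return [
        r for r in rules
        if r["support"] >= min_support and r["accuracy"] >= min_accuracy
    ]

# Hypothetical rules with made-up support and accuracy values.
rules = [
    {"rule": "wet=Yes AND gravel=Yes => severity=high", "support": 42, "accuracy": 0.91},
    {"rule": "alcohol=Yes => severity=high", "support": 3, "accuracy": 1.00},
    {"rule": "skid=Yes => severity=low", "support": 57, "accuracy": 0.55},
]
strong = select_strong_rules(rules, min_support=10, min_accuracy=0.8)
```

The second rule is perfectly accurate but rests on only three records, and the third is well supported but inaccurate, so only the first rule passes both thresholds.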


3.3.1 Rough sets analysis software

The following list discusses the non-commercial software programs available to

perform rough set analysis.

• ROSE

ROSE, also known as Rough Set Data Explorer, is software that implements rough set theory and rule discovery techniques (Predki, Slowinski, Stefanowski, Susmaga & Wilk, 1998). ROSE consists of two components: a graphical user interface and a set of libraries. The core library is written in the C++ programming language, while the interface is implemented in Borland C++ and Borland Delphi.

• RSES

RSES, also known as Rough Sets Exploration System, is a tool for Windows operating systems. RSES consists of a graphical user interface and an RSES library kernel running in the background. RSES software classifies data based on rough set theory, LTF networks, data discretisation, decision trees and instance based classification (Olson & Delen, 2008). The library is written in Java and partly in the C++ programming language.

The algorithms are based on rough sets theory and two algorithms are

available in the software to calculate reducts. One of them is the ex-

haustive algorithm which observes subsets of the attributes in loops,

classifies and returns those attributes that are reducts of the required

type. However, this algorithm uses a large amount of memory and is

time consuming when the decision table is large and complicated as it

involves very extensive calculations even though it is optimised and used

carefully (Bazan & Szczuka, 2005).

To address the problem with the exhaustive algorithm, a genetic algorithm can be used as an alternative. This algorithm allows the flexibility to set conditions and to shorten the rules and reducts according to different requirements (Bazan & Szczuka, 2000).

• Rosetta

Rosetta is a tool for analysing tabular data with rough set theory. It consists of a computational kernel and a graphical user interface. This application runs on Windows-based operating systems such as Windows NT or Windows 95. The non-commercial version is publicly available; however, it does not make the algorithms from the RSES library available when the decision table is larger than the predefined size of 500 objects and 20 attributes.

• Weka

Weka is a data mining program that contains a collection of machine learning algorithms. Weka has tools for data pre-processing, classification, regression, clustering, association rules and visualisation. It is also designed to support the development of new machine learning schemes (Weka, 2008).

3.3.2 Rough set algorithms

The algorithms available are: genetic reducer, Johnson’s algorithm, Holte’s re-

ducer, dynamic reducer, exhaustive calculation reducer, RSES genetic reducer

and RSES Johnson’s algorithm.

• Genetic reducers:

There are two types of genetic reducer algorithm within Rosetta. The first is the genetic reducer, which implements the genetic algorithm described by Vinterbo and Ohrn (2001) to compute minimal attribute sets. This algorithm supports cost information and approximate solutions. The second, the RSES genetic reducer, computes all reducts by means of brute force. The difference between the two is that the RSES genetic reducer does not support cost information or approximate solutions. In addition, the RSES genetic reducer is not ideal even for tables of moderate size, which makes it unfavourable for the large number of incident records available for analysis.

• Johnson Algorithms

Similar to the genetic reducers, there are two types of Johnson’s algorithm. The first, described by Johnson (2001), computes a single reduct only and supports approximate solutions. The other, the RSES Johnson’s algorithm, is based on the greedy algorithm of Johnson. It also returns a single reduct; however, it does not support approximate solutions.

• Other algorithms

– Holte’s reducer returns singleton attribute sets as reducts, together with univariate rules.

– Dynamic reducer is defined by Bazan et al. (2001). Reducts are obtained by randomly sampling subtables and computing reducts on them.

– Exhaustive calculation reducer uses brute force to obtain reducts. This algorithm is suitable only for tables of moderate size as it does not scale up well.
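The idea behind a greedy, Johnson-style reducer can be sketched as follows: for every pair of objects, record the attributes that tell them apart, then repeatedly pick the attribute that appears in the most unresolved pairs. This is an illustrative sketch on hypothetical rows, not the Rosetta or RSES implementation. It ignores the decision attribute and approximate solutions, and the attribute set it returns distinguishes all pairs but is not guaranteed to be minimal.

```python
from collections import Counter
from itertools import combinations

def discernibility_sets(rows, attrs):
    """For each pair of rows, collect the attributes on which they differ."""
    sets = []
    for r1, r2 in combinations(rows, 2):
        diff = frozenset(a for a in attrs if r1[a] != r2[a])
        if diff:
            sets.append(diff)
    return sets

def johnson_reduct(rows, attrs):
    """Greedily choose attributes until every discernible pair is covered."""
    remaining = discernibility_sets(rows, attrs)
    chosen = []
    while remaining:
        counts = Counter(a for s in remaining for a in s)
        # Most frequent attribute; ties go to the earliest attribute in attrs.
        best = max(attrs, key=lambda a: counts[a])
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen

# Hypothetical rows, for illustration only.
rows = [
    {"a1": 1, "a2": 1, "a3": 0},
    {"a1": 0, "a2": 1, "a3": 0},
    {"a1": 0, "a2": 0, "a3": 1},
]
greedy = johnson_reduct(rows, ["a1", "a2", "a3"])
```

Unlike an exhaustive search, the greedy pass returns a single attribute set, which is why this family of algorithms scales to larger tables.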

The explanation of rough set theory shows that the analysis can also iden-

tify the relationships between contributing factors.


3.4 Summary

Knowledge Discovery in Databases (KDD) is an overall process that discovers

new knowledge from data. Data mining is an algorithm or process within KDD

that is used to determine the patterns in the data. Data mining analyses

numerical data and a variation of data mining is required when data is in

textual format. The variation for mining text is called text mining and is used

to find patterns in unstructured text. Software programs available for text

mining are SAS and SPSS Clementine. The algorithm used in SAS is Ward’s algorithm, which unifies clusters to ensure consistency.

Rough set analysis is a data mining technique. It deals with the uncertainty and vagueness present in data. Rough set analysis aims to discover the relationships between attributes and the minimal number of attributes needed to represent the data. The formal definitions of the terms used in rough set analysis are explained in this chapter. The software programs available to perform rough

set analysis are ROSE, RSES, Rosetta and Weka. The algorithms available

are: genetic reducers, Johnson algorithms, Holte’s reducer, Dynamic reducer,

and Exhaustive calculation reducer.

This chapter briefly explains the concepts of data mining and rough set analysis, which are essential to understanding the rest of this thesis. The next chapter will focus on the design of the proposed approach to this research.


CHAPTER 4

Design of Approach

Chapter Overview

The previous chapter discussed the concepts of data mining and rough set analysis along with their possible limitations. These limitations have led to several research questions being put forward, and this chapter attempts to answer them by designing a suitable approach. Data mining techniques will be employed as they have the ability to identify patterns and relationships in data.

Data mining is not a new technique; however, to the best of our knowledge, applying it to understand contributing factors and their relation to crash severity is novel. For instance, identifying contributing factors using text mining is innovative, as existing reports identify contributing factors from statistics. Past crash reports from insurance companies are used to identify these contributing factors. These reports include records of crashes that cost less than AUD$2500, which are excluded from Queensland Transport’s statistical reports. The use of more crash cases could provide more in-depth information for analysis purposes. In addition, past insurance reports contain detailed crash descriptions, which are not available in statistical reports.

One of the possible areas for exploration is identifying the relationship between the primary contributing factors and other possibly related ones. The approach taken to achieve this is rough set analysis, which is another data mining technique.

Based on these relationships, a minimal set of contributing factors can be obtained. This study differs from preceding ones in its application to crashes on road curves, with its results verified by a traffic simulator.

The proposed approach is supported by established theory such as rough set theory, which has already been proven and used in other research fields such as biomedicine, where it is used to identify patients with cardiovascular diseases. The rest of this chapter explains how data mining techniques can be applied in order to achieve the aims mentioned.

4.1 Scope of proposed approach

The scope of the proposed approach is based on the research questions that are

discussed in the earlier chapter. The research questions are listed as follows:

1. What are the factors discovered from the crash descriptions that contribute to crashes on road curves?

This question leads to the investigation of the contributing factors for crashes on road curves using insurance crash records. The design of the approach to discover the factors is discussed in Section 4.4.

2. What are the characteristics that influence the severity of a crash?

This second question investigates the characteristics, which are made up of combinations of the contributing factors of the crashes. This is followed by the process of understanding how the characteristics influence the severity based on the crash cost. The process to achieve this is discussed in Section 4.5.

3. Which significant factors increase the severity of a crash?

The last question investigates the important contributing factors that


influence the severity levels of crashes. The design of this process is

discussed in Section 4.6.

The next section presents the framework of the proposed approach that

investigates the research questions.

4.2 Framework of approach

Figure 4.1 shows the framework of the proposed approach to investigate the

research questions. The approach consists of four main components and they

are: input, analysis process, validation process and output. Each component

contains sub-components that represent the steps of the process that is used

to achieve each process objective. The input component contains the data

used for analysis. The analysis process component contains three main sub-

components and each is used to investigate a research question. The results

are validated in the validation process component. The last component is the output, which contains the process to understand the relationship of the contributing factors to crash severity.

Figure 4.1: The framework of the proposed approach related to the research

questions.


The available data for analysis is a set of crash records from the insurance

company, IAG. The next section describes the data and the limitations.

4.3 The data

This research used records of crashes that occurred from 2003 to 2006. The information about each crash is recorded by an operator through an interview with the driver involved in the crash. The interview follows the questions on an online system. The data recorded on the system is stored in a database and can be exported for analysis. The data consists of information about the driver and the vehicle, along with a description of the crash.

The following sections describe the data used for the research.

4.3.1 The attributes

The data contains ten attributes that describe the driver, the vehicle, and the description and cost of the crash.

• Gender

The gender of the driver, either male or female. As most drivers in the data are male, this can affect the results.

• Driver age

This attribute indicates the age of the driver. The age ranges from 16 to

89.

• Alcohol consumption

This attribute indicates whether the driver had consumed any alcohol. This is represented with the values Yes or No and could be biased, as most clients have every intention of receiving the claim.


• Vehicle manufactured year

This attribute indicates the year the vehicle was manufactured.

• Time

This attribute stores the time of the crash. The time is stored in the format HH:MM am/pm.

• Date

This attribute stores the date of the crash. The date is stored in the format DD/MM/YYYY.

• Crash description

Description of the crash is stored in this attribute. Descriptions are

stored as unstructured text data.

• Type of crash

This attribute stores the type of crash involved, such as curve, head on, rear, others, etc. This attribute is useful to identify which records are related to road curves.

• Number of parties involved

This attribute stores the number of parties involved in the crash. The parties counted are the insurance client, the opposite party and any other damaged property such as a fence or lamp post. This attribute is useful when filtering single vehicle crashes or multiple vehicle crashes. This research required this attribute to filter out records that are related to single vehicle crashes.

• Crash cost

This attribute stores the calculated total cost incurred by all parties

involved in the crash and relates to property damages and not physical

injuries. The cost value is stored in Australian dollars. This is useful as

it relates to severity of a crash.
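As an illustration of how the date and time fields might be read for analysis, the sketch below combines them into a single timestamp. It assumes dates look like 15/02/2005 and times look like 07:45 pm, with a space before am/pm. The function name and the sample values are hypothetical, and parsing of the am/pm marker depends on the locale in use.

```python
from datetime import datetime

def parse_crash_datetime(date_text, time_text):
    """Combine a DD/MM/YYYY date and an HH:MM am/pm time into one datetime."""
    return datetime.strptime(f"{date_text} {time_text}", "%d/%m/%Y %I:%M %p")

# Hypothetical values, for illustration only.
crash_time = parse_crash_datetime("15/02/2005", "07:45 pm")
```

A combined timestamp of this kind makes it straightforward to derive time-of-day categories before the data is tabulated.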


With the attributes defined, statistical techniques are applied to obtain the statistical information of the data from curve related records only. Tables 4.1 and 4.2 present the frequency table of each attribute in the data.

Table 4.1: The frequency count of each attribute in the data.

Num of rec: 3433

Attribute      Category   Count   Percent
Time           aft          746     21.72
               eph          678     19.74
               even         760     22.13
               morn         481     14.01
               mph          452     13.16
               night        317      9.23
Gender         female      1243     36.20
               male        2191     63.80
Vehicle age    new          783     22.80
               mod         2222     64.71
               old          398     11.59
               older         25      0.73
               oldest         5      0.15
               vin            1      0.03
Driver age     mature1      389     11.33
               mature2      707     20.59
               old          439     12.78
               senior1      686     19.98
               senior2      463     13.48
               young        750     21.84

Legend: Refer to Appendix for abbreviations.


Table 4.2: The frequency count of each attribute in the data (continued).

               Yes                No
Attribute      Count   Percent    Count   Percent
Alcohol          423     12.32     3011     87.68
Embarkment         8      0.23     3426     99.77
Gravel           351     10.22     3083     89.78
Pole             283      8.24     3151     91.76
Gutter           266      7.75     3168     92.25
Wet              699     20.36     2735     79.64
Dirt             123      3.58     3311     96.42
Kangaroo          89      2.59     3345     97.41
Collide         1061     30.90     2373     69.10
Hit             1229     35.79     2205     64.21
Leave              7      0.20     3427     99.80
Skid             183      5.33     3251     94.67
Roll             382     11.12     3052     88.88

Legend: Refer to Appendix for abbreviations.


4.3.2 Limitations

The data has the following limitations.

1. There is no information about the curve, such as its degree of curvature, which could help identify whether curvature is a contributing factor.

2. The data indicates the total crash cost value for all parties involved so

the cost incurred by each party is not known. This leads to a limited

understanding of the severity of the crash by each individual party.

3. The insured client narrated what happened and who was involved in the crash; thus the crash description could be biased, as most clients intend to obtain the claim for the crash.

Now that the data descriptions and limitations have been addressed, the

following sections will explain the process as shown in Figure 4.1.

4.4 Identify factors from crash records

This initial process is designed to investigate the first research question, and Figure 4.2 illustrates an overview of it, highlighted in a darker tone. Each related process is discussed in the following sections.

This phase of the approach aims to understand the contributing factors for crashes on road curves using insurance crash records. When a crash occurs, a police officer briefly collects information based on the traffic incident report form shown in Figure 4.3.

Queensland Transport analyses the information obtained about crashes to identify the contributing factors. Reports available from Queensland Transport contain the contributing factors of crashes that happen over a period of time. The reports are generated from an online database system called Webcrash 2.0 (QT, 2006); however, the crash details available are limited by individual access permissions and privileges. In this research, access was limited and therefore the crash descriptions were unavailable for analysis. This results in the use of statistical values related to the contributing factors. Unfortunately, statistical values of the contributing factors do not accurately describe what occurs in a road crash. In addition, crash reports from Queensland Transport contain contributing factors only for crashes that incur damages above AUD$2500. The exclusion of crashes that cost less than AUD$2500 could mean missing key information. Insurance crash records from IAG include crashes that cost less than AUD$2500, so it is recommended that these records be used in order to fully understand the contributing factors for a crash and its outcomes.

Figure 4.2: The overview of the process for the first research question.

4.4.1 Selection

Insurance crash records contain a crash description field, which describes what happened and the outcome of the crash. This field will be used for analysis in order to determine the causes of the crashes. The descriptions of the crashes are stored in unstructured textual format and there are approximately 11,058 records for analysis. Analysing the descriptions to determine keywords in the text is a challenging task, as most software programs deal with numerical values and so are not able to fully understand or interpret the meaning of textual input. The text data can be analysed manually; however, this is too time consuming due to the huge volume of records. Thus text mining, which is part of data mining, is recommended, as text mining software accepts textual data for analysis and produces a list of keywords found in the textual inputs. A brief explanation of text mining is given in the next section.

Figure 4.3: The traffic incident report.

4.4.2 Technique used to find contributing factors

The recommended technique is text mining, also known as textual data mining. The purpose of text mining is to discover useful

information and patterns or trends from large unstructured, natural language

digital text. Traditional data mining is ideal when dealing with numbers but

is not feasible for mining text descriptions. Text mining is used to locate

keywords for each of the five clusters. The number of clusters is based on the

severity level.
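The idea of locating keywords per cluster can be illustrated with a very small sketch that counts word frequencies within each severity level. It is a plain word count on hypothetical descriptions with an assumed stop-word list, and does not reproduce the clustering performed by any text mining package.

```python
import re
from collections import Counter, defaultdict

# Assumed stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "and", "around", "on", "the"}

def keywords_by_cluster(records):
    """Count non-stop-word frequencies separately for each severity level."""
    clusters = defaultdict(Counter)
    for level, text in records:
        words = re.findall(r"[a-z]+", text.lower())
        clusters[level].update(w for w in words if w not in STOPWORDS)
    return clusters

# Hypothetical (severity level, crash description) pairs.
records = [
    (1, "Hit gravel on bend and hit a fence"),
    (1, "Skid on gravel on the curve"),
    (4, "Lost control on wet road around a bend and rolled"),
]
clusters = keywords_by_cluster(records)
top_level_1 = clusters[1].most_common(2)
```

The most frequent words in each cluster are then candidate keywords, which still need to be reviewed before they are treated as contributing factors.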

An example of text mining is applied in biomedical science where Swanson

(1991) extracted various evidence from titles of articles in biomedical literature

when investigating the causes of migraine headaches. The clues suggested

that magnesium deficiency might be the cause of migraine headaches. This

hypothesis did not exist in the literature. The results had to be tested with a

non-textual method and subsequently, Ramadan et al. (1989) found evidence

supporting the hypothesis (Welch & Ramadan, 1995).

4.4.3 Pre-processing

The brief introduction to text mining above explained that data mining techniques can be applied to analyse crash records and that knowledge can be derived by understanding the contributing factors for a crash. The crash descriptions in the records are used as input for analysis; however, the data needs to be ‘cleaned’ before analysis.


The aim of data cleaning is to ensure the data is free of errors. Data cleaning involves steps to filter out incomplete and duplicate records in order to create a complete data set for analysis. The data is also filtered so that only curve-related crash records are analysed. Curve-related records can be identified by the Type of crash field in the data.

In data mining, 70 to 90% of the effort is spent on data preparation, which includes data cleaning. Most data mining tools require clean and complete data as input, as they are unable to work with incomplete data or data with missing values. In addition, the data format is not consistent across databases and thus data may need to be transformed to different expressions or types, such as characters changed to numbers. The next section explains the process of transforming the data.
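The cleaning steps described above can be sketched as a single pass over the records. The field names (type_of_crash, description, cost), the value "curve" and the sample records are assumptions made for illustration; the actual export may use different names and codes.

```python
def clean_records(records, required_fields):
    """Drop incomplete, non-curve and duplicate records, keeping first occurrences."""
    seen = set()
    cleaned = []
    for rec in records:
        # Skip records with a missing or empty required field.
        if any(rec.get(f) in (None, "") for f in required_fields):
            continue
        # Keep curve-related crashes only.
        if rec.get("type_of_crash") != "curve":
            continue
        # Skip exact duplicates on the required fields.
        key = tuple(rec[f] for f in required_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

# Hypothetical records, for illustration only.
raw = [
    {"type_of_crash": "curve", "description": "Hit pole on bend", "cost": 1200},
    {"type_of_crash": "curve", "description": "Hit pole on bend", "cost": 1200},
    {"type_of_crash": "rear", "description": "Rear ended at lights", "cost": 800},
    {"type_of_crash": "curve", "description": "", "cost": 500},
]
cleaned = clean_records(raw, ["type_of_crash", "description", "cost"])
```

Of the four sample records, one is a duplicate, one is not curve related and one has an empty description, so a single record remains.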

4.4.4 Transformation

Transformation involves selecting the required attributes and changing the data format to meet the requirements of the selected algorithm. Transformation ensures that the data is in the required format and is consistent throughout the process.

With regard to text mining, the crash description attribute is already in text format; therefore, transformation of the data is not required. The selected crash descriptions are tabulated so that each row stores one description, and are then analysed with the selected text mining algorithm. The next section explains the process involved in text mining.

4.4.5 Text mining

The purpose of text mining is to discover contributing factors within the text in the data. This is achieved with the various software programs available. The following section gives a brief description of the software programs and algorithms available for text mining.

4.4.5.1 Text mining software selection

Based on the approach discussed previously, a data mining software program

that has the capability to perform text mining is needed. The software program

has to be able to analyse the text data and produce a list of keywords. The

criteria to select a data mining software program are as follows.

• Ability to analyse text data

The ability to understand textual data and produce a list of keywords is paramount. The keywords are used to represent the contributing factors of curve related crashes.

• Ease of use

Ease of use means a graphical interface that is easy to use and understand. The system should be easy to operate and should not require specialised knowledge.

• Able to import and use Microsoft Excel files

The software program should have the ability to import Excel files and use them for analysis.

• Results are easy to understand

The results produced from the text mining should be easy to interpret

and understand. The results can be presented as graphs or statistical

numbers or in an easy to understand format.

A number of data mining software programs are available; however, only a limited number have the capability to perform text mining. Based on a study carried out by Crowsey et al. (2007), the most popular text mining software programs are SAS and SPSS Clementine.


1. SAS

SAS is a software system which can be used to perform data mining. SAS contains various modules to perform various data mining and analytical processes.

• Ability to analyse text data

Text mining is performed by the text miner module, which is
part of the enterprise module (SAS, 2006). Text miner can be used
to extract knowledge from textual data. The text miner module
is the first mining solution to closely combine text-based
information with structured data for improved analysis and
decision making.

• Ease of use

SAS provides an interactive interface in which computations are
represented by icons placed in the workspace. Each icon
contains data and a specific action or function specified
by the user. The action is specified by right-clicking the
icon to call up a context menu and setting the required action or data.

• Able to import and use Microsoft Excel files

SAS can import Microsoft Excel files from versions up to and
including version 2003. SAS also allows the import of other file types
such as Lotus 1-2-3.

• Results are easy to understand

The text miner module results consist of a list of keywords along with
the related frequency counts. The frequency count indicates the
number of times a keyword appears in the text. The results can
be grouped into a specified number of clusters, and the keywords
are assigned to the appropriate clusters.

2. SPSS Clementine

The other software program that can perform text mining is SPSS
Clementine, a mature data mining tool that allows both experts and
ordinary users to perform data mining. Clementine was one of the first
general data mining tools. However, the tool is not fully developed, as it is
still at the research stage and has limitations. One limitation is that it
requires LexiQuest to perform text mining. LexiQuest is a text mining
product that primarily processes large text documents.

• Ability to analyse text data

SPSS Clementine has a module called Predictive Text Analytics.
This module provides an interface to all the text mining
features of Clementine (SPSS, 2008).

• Ease of use

Clementine has a data flow interface that makes the data mining
process easy to follow.

• Able to import and use Microsoft Excel files

The module has a node that allows the import of Microsoft Excel

files of any version.

• Results are easy to understand

To make the collected results easy to understand, they are
presented as charts and graphs such as histograms, distribution tables
and line plots.

Table 4.3 summarises the text mining software programs and the related
criteria.

Accuracy is measured as the percentage of correct results that the system
achieves within a reasonable running time. The accuracy threshold
is estimated to be 80%.

Table 4.3: A summary of the text mining software programs.

Features                       SAS    SPSS Clementine
Ability to analyse text data   Y      Y
Ease of use                    Y      Y
Import Excel files             Y      Y
Results understandable         Y      N

A good tool suite is one that can perform the above operations. Based
on the above criteria, SAS was selected for its ease of use, its robust
features and its ability to perform text mining. The algorithm used in text mining
is discussed in the following section.

4.4.5.2 Text mining algorithm selection

The text miner module uses a clustering algorithm to find the keywords for the
defined number of clusters. The clustering algorithm selected for
use in SAS was the Ward algorithm. The Ward algorithm builds clusters
by merging existing clusters, but it does not simply merge the two clusters with the
smallest distance between them. Instead, it merges the pair of clusters whose union increases the
heterogeneity the least. The purpose of the Ward algorithm is to unify clusters
so that the resulting clusters are as homogeneous as possible (Čížek, Härdle &
Weron, 2005).
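The Ward criterion described above can be sketched as follows. This is an illustrative sketch only, not the SAS text miner implementation: the function names and the toy one-dimensional data are assumptions. It computes the increase in the within-cluster sum of squares caused by a merge, and shows that the pair of clusters with the closest centroids is not necessarily the pair that Ward merges.

```python
from itertools import combinations

def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_cost(c1, c2):
    """Increase in within-cluster sum of squares if c1 and c2 are merged."""
    n1, n2 = len(c1), len(c2)
    return (n1 * n2) / (n1 + n2) * sq_dist(centroid(c1), centroid(c2))

def best_ward_merge(clusters):
    """Indices of the pair whose merge adds the least heterogeneity."""
    pairs = combinations(range(len(clusters)), 2)
    return min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))

def closest_centroid_pair(clusters):
    """Indices of the pair with the smallest centroid distance."""
    pairs = combinations(range(len(clusters)), 2)
    return min(pairs, key=lambda ij: sq_dist(centroid(clusters[ij[0]]),
                                             centroid(clusters[ij[1]])))

# Toy data (hypothetical): two large clusters and one single point.
clusters = [[(0.0,)] * 10, [(2.0,)] * 10, [(5.0,)]]
ward_pair = best_ward_merge(clusters)            # merges the large cluster at 2.0 with the single point
centroid_pair = closest_centroid_pair(clusters)  # the two large clusters are closest
```

Merging the two large clusters would add more heterogeneity than absorbing the single point, so the two criteria pick different pairs here.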

With the selected software and algorithm, the crash descriptions are
analysed using the text miner module available in SAS. Text miner
uses the Ward algorithm to categorise the text and identifies the
keywords along with their frequency counts. The frequency count is used to identify
the most frequently used keywords among the crash descriptions. The
keywords with the highest counts are identified as the factors. These factors are
then verified before being accepted as contributing factors. The validation process is
explained in the next section.
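The frequency-count step can be illustrated with a minimal sketch. It does not reproduce the SAS text miner: the stop-word list, the tokenisation rule and the three crash descriptions are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real text mining tool uses a far larger one.
STOPWORDS = {"the", "a", "an", "and", "on", "to", "of", "in", "was", "at", "into"}

def keyword_counts(descriptions):
    """Count how often each non-stop-word appears across all descriptions."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

# Made-up crash descriptions, not taken from the study data.
descriptions = [
    "Vehicle lost control on wet road at the curve",
    "Driver lost control and hit a tree",
    "Wet road, vehicle skidded into a tree",
]
counts = keyword_counts(descriptions)
most_frequent = counts.most_common(5)  # candidate factors, highest counts first
```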

4.4.6 Factors validation

The results obtained from text mining need to be verified before they can be
claimed as contributing factors for crashes on road curves. The verification
process compares the keywords obtained for curve-related crashes
with those obtained for non-curve-related crashes. To achieve this, 11,058
non-curve-related crash records are selected and analysed with the same text mining
techniques to obtain a list of keywords. This list is then compared
with the keywords from the curve-related crash records to determine whether
any keyword appears in both lists. A keyword obtained from the curve-related
crash records is recognised as a contributing factor only when it does
not appear in the keyword list from the non-curve-related crash records. Once
the factors are verified, the keywords are used as attributes,
represented as columns of a new table that is later used for rough set
analysis.
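The comparison of the two keyword lists amounts to a set difference. A minimal sketch, with hypothetical keyword lists chosen only to show the mechanics:

```python
def validate_factors(curve_keywords, non_curve_keywords):
    """Keep only the curve-related keywords that are absent from the non-curve list."""
    non_curve = set(non_curve_keywords)
    return [k for k in curve_keywords if k not in non_curve]

# Hypothetical keyword lists, not results from the study.
curve_keywords = ["wet", "speed", "gravel", "tree"]
non_curve_keywords = ["speed", "intersection", "tree"]
factors = validate_factors(curve_keywords, non_curve_keywords)
```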

Data mining techniques can be applied to analyse crash data, and knowledge
is derived from understanding the contributing factors of a crash.
Besides recognising these contributing factors, the relationships
between them can also be examined. Singh (2001a) studied the
relationships between contributing factors such as age, gender and vehicle type
and the crash using Principal Component Analysis. Rough set theory analysis
is another approach that can be used to determine the relationships between
the contributing factors. The background to rough set theory is explained in the
next section.

4.5 Identify relationship between factors

This process aims to identify the relationship between the contributing factors

identified in the previous section. This process is related to the second research

question and Figure 4.4 illustrates the related processes to achieve this aim.

Figure 4.4: The overview of the processes taken to identify the relationships

between the contributing factors.

4.5.1 Technique used to find the relationship

Rough set theory is a mathematical approach to dealing with uncertainty and vagueness
in data. Uncertainty covers missing data, noisy data and ambiguity
in semantics (Krishnaswamy, 2008), while vagueness is the lack of information
about elements of the universe. The purpose of using rough set theory to
analyse data is to discover a minimal set of attributes that can
represent the whole data set.

The rough set analysis process requires a decision table in which the columns
hold the attributes and the rows hold the records. There are two types
of attributes: (1) condition attributes and (2) a decision attribute. Condition
attributes are the data of interest, and the decision attribute is the outcome
associated with each combination of condition attribute values. The next
section explains the process of organising the data as a decision table.

4.5.2 Transformation

The transformation involves the following three processes.

• Classification

This process groups attributes based on criteria and then transforms the

data from numerical to text representations.

• Presence indication

This process indicates the presence of contributing factors in each
record.

• Preparing the decision table

The preparation includes arranging the attributes, including a
decision attribute, and creating a decision table.

The details on each process are explained further in the following sections.

4.5.2.1 Classification

The attributes are classified with semantic criteria for each object or record.
Classification makes the results easier to comprehend than
numerical values alone. No information is lost through the classification of
attributes. The semantics of the classification are given in the following list.

• Time

The time is classified into defined intervals. The intervals are based
on the Queensland Transport crash reports (QT, 2005). The defined
intervals are available in Appendix B.

• Age

The age of the driver is classified based on the age ranges defined in
Queensland Transport crash reports (QT, 2005). The defined age ranges
are available in Appendix B.

• Vehicle age

The age of the vehicle is calculated from the year of manufacture with
reference to the year the data were extracted, which is 2006. The age
intervals are based on those stated in road safety reports.

• Crash cost

Initially the crash cost was classified using percentiles; however, because
that classification is rigid and possibly biased, cost is instead classified using
a clustering method. Clustering is a data mining technique that
classifies data objects into related groups without advance
knowledge of the group definitions. It groups costs on a statistical
basis and is therefore more rigorous and less prone to bias. The
crash cost data are classified into five groups without any prior knowledge of
the cost range of each group. The number of clusters corresponds to the
number of severity levels defined, i.e. (1) lowest, (2) low, (3) medium, (4)
high, (5) highest.

The clustering produces five groups and the cluster proximities. A clustering is
not considered ideal when the clusters are too far
apart or when they overlap. Ideally, the clusters
are evenly spaced with no overlaps.
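The grouping of crash costs into five clusters can be sketched as below. The source does not name the clustering algorithm, so one-dimensional k-means is used here purely as an assumption; the cost values and the function names are hypothetical.

```python
LEVELS = ["lowest", "low", "medium", "high", "highest"]

def kmeans_1d(values, k, iterations=100):
    """Plain k-means on a list of numbers; returns the cluster centres."""
    data = sorted(values)
    # Spread the initial centres evenly over the sorted data.
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous centre.
        new_centres = [sum(g) / len(g) if g else centres[i]
                       for i, g in enumerate(groups)]
        if new_centres == centres:
            break
        centres = new_centres
    return centres

def cost_level(cost, centres):
    """Label a cost with the severity level of its nearest cluster centre."""
    ordered = sorted(centres)
    nearest = min(range(len(ordered)), key=lambda i: abs(cost - ordered[i]))
    return LEVELS[nearest]

# Hypothetical crash costs, not taken from the study data.
costs = [500, 700, 900, 5000, 5500, 20000, 21000, 60000, 65000, 200000]
centres = kmeans_1d(costs, k=5)
```

Inspecting the gaps between neighbouring centres gives a rough indication of whether the clusters are evenly spaced or overlap.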

The classification is performed only on attributes related to the driver and the vehicle.
The contributing factors identified with text mining are not
classified; instead, they use the presence indication discussed in the
next section.
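The interval-based classification of driver age and vehicle age can be sketched as follows. The interval boundaries below are hypothetical placeholders, since the actual intervals are those listed in Appendix B; only the reference year 2006 comes from the text.

```python
from bisect import bisect_right

# Hypothetical boundaries and labels; the real intervals are in Appendix B.
AGE_BOUNDS = [17, 25, 40, 60]
AGE_LABELS = ["0-16", "17-24", "25-39", "40-59", "60+"]

def classify(value, bounds, labels):
    """Map a numeric value to the label of the interval it falls in."""
    return labels[bisect_right(bounds, value)]

def vehicle_age(manufacture_year, extraction_year=2006):
    """Vehicle age relative to the year the data were extracted."""
    return extraction_year - manufacture_year

driver_group = classify(19, AGE_BOUNDS, AGE_LABELS)
car_age = vehicle_age(1998)
```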

4.5.2.2 Presence indication

Once the attributes are classified and organised, the contributing factors
identified using text mining are added. A ‘1’ or ‘0’ value is used to mark the presence
of each contributing factor: ‘1’ indicates that the factor is present in the crash
description attribute and ‘0’ that it is absent. The markings not only indicate which
contributing factors are present in each record but also provide a consistent
format for analysis.
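A minimal sketch of the presence indication is given below. It uses simple substring matching, which is a simplification and an assumption; the factor names and the description are hypothetical.

```python
def presence_row(description, factors):
    """Return a 1/0 marker per factor, based on the crash description text."""
    text = description.lower()
    return {factor: 1 if factor in text else 0 for factor in factors}

# Hypothetical factors and description for illustration only.
factors = ["wet", "tree", "speed"]
row = presence_row("Lost control on wet road", factors)
```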

4.5.2.3 Preparing the decision table

A decision table is required because rough set analysis needs a column that contains
the decision attribute. However, the data obtained from IAG do
not contain that attribute, so the data must be prepared and organised
before rough set analysis.

A decision table is similar to an information system, S = (U, A); however,
the decision table distinguishes the condition attributes, A, from
a decision attribute, d. The decision table can be defined as DS = (U, A, d),
where U is the finite set of objects, A represents the condition attributes and
d represents the decision attribute.
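The structure DS = (U, A, d) can be illustrated with a small sketch in which each object in U is a record, the keys listed in `conditions` play the role of A, and `decision` names d. The attribute names and values are hypothetical. The sketch groups objects that are indiscernible on the condition attributes and checks whether each group agrees on the decision.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group row indices that share the same values on the given attributes."""
    classes = defaultdict(list)
    for idx, row in enumerate(table):
        classes[tuple(row[a] for a in attributes)].append(idx)
    return list(classes.values())

def is_consistent(table, conditions, decision):
    """True if indiscernible objects always share the same decision value."""
    return all(len({table[i][decision] for i in cls}) == 1
               for cls in indiscernibility_classes(table, conditions))

# Hypothetical decision table: U is the list of rows.
U = [
    {"wet": 1, "night": 0, "age": "17-24", "severity": "high"},
    {"wet": 1, "night": 0, "age": "17-24", "severity": "high"},
    {"wet": 0, "night": 1, "age": "25-39", "severity": "low"},
    {"wet": 0, "night": 0, "age": "25-39", "severity": "low"},
]
conditions = ["wet", "night", "age"]   # A
decision = "severity"                  # d
classes = indiscernibility_classes(U, conditions)
```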

4.5.3 Rough set analysis

This section begins with a brief explanation of the selection of rough set
software programs and algorithms. It then describes the process of finding the
minimum number of attributes needed to represent the data using rough set analysis.
The purpose of employing rough set analysis is to observe
relationships between attributes that are not mentioned in most road safety
reports or databases. In addition, the analysis generates decision rules that
are used to determine the common patterns.

4.5.3.1 Rough set software selection

The criteria for selecting a rough set software program are listed below.

• Ability to list the relationships between factors in a simple format

The software program must be able to list the relationships between

factors either using rules or graphical illustrations.

The results produced from rough set analysis should be presented in a
format that is easy to understand. Extra information on each relationship
should be provided, such as the support count, quality and coverage.
This information can be presented as statistics or as graphical
illustrations.

• Ease of use

The software program’s graphical interface should be easy to understand

and use. The design should be intuitive where users know what to do

and how to perform the intended process.

• Ability to import and use Microsoft Excel files

The software program should be able to import Excel files and convert
them so that they can be used in the program.

• Ability to import large data files

The software program must be able to import large data files which

consist of a large volume of records.

The following list discusses the non-commercial software programs available

to perform rough set analysis.

1. ROSE2

ROSE2, also known as Rough Set Data Explorer version 2, is software
that implements rough set theory and rule discovery techniques
(Prędki, Słowiński, Stefanowski, Susmaga & Wilk, 1998). ROSE
consists of two components: a graphical user interface and a set of libraries.
The core library is written in the C++ programming language, while the
interface is implemented in Borland C++ and Borland Delphi.

• Ability to list the relationships between factors in a simple format

ROSE produces a list of rules that represents the
relationships between the factors. The rules are tabulated
and can be used as classifiers for other data sets.

• Ease of use

The software program has a graphical interface in which
commands are issued with a click. This makes the program easy for
end-users to use.

• Ability to import and use Microsoft Excel files

ROSE does not support importing Excel files as a
data source. The accepted file formats are plain text files and
.isf files. A plain text file has to be organised in a format similar
to that of the .isf file.

• Ability to import large data files

ROSE is unable to read large files as inputs.

2. RSES2

RSES2, also known as Rough Sets Exploration System version 2, is a
tool for Windows operating systems. RSES consists of a graphical user
interface and an RSES library kernel operating in the background. The RSES
software classifies data based on rough set theory, LTF networks, data
discretisation, decision trees and instance-based classification (Olson &
Delen, 2008). The library is written in Java and partly in the C++
programming language.

The algorithms are based on rough set theory, and two algorithms are
available in the software to calculate reducts. One is the
exhaustive algorithm, which loops over subsets of the attributes,
classifies them and returns those that are reducts of the required
type. However, this algorithm uses a large amount of memory and is
time consuming when the decision table is large and complicated, as it
involves very extensive calculations even when it is optimised and used
carefully (Bazan & Szczuka, 2005).

As an alternative, the genetic algorithm is recommended, as it
allows conditions to be set and the rules and reducts to be shortened
to suit different requirements (Bazan & Szczuka, 2000).

• Ability to list the relationships between factors in a simple format

RSES2 produces a list of rules that represent the relationships between the
contributing factors identified in the text mining process. The rules
are listed in a tabular format, with only the support count given
for each rule. The set of decision rules can be used as classifiers.

• Ease of use

The software utilises a graphical user interface which allows the

definition of the process flow visually. The flow of the process is

created and visualised by adding icons to a blank project space.

• Ability to import and use Microsoft Excel files

RSES accepts only the following data formats: RSES data files,
Rosetta data files and Weka data files. As the data are stored in
an Excel file, they need to be converted into an accepted file format.
One method is to export the data as a Rosetta data file from the
Rosetta software program.

• Ability to import large data files

There is a limit to the file size which is based on the memory limit

of the computer.

3. Rosetta

Rosetta is a tool for analysing tabular data with rough set theory. It
consists of a computational kernel and a graphical user interface. This
application operates under Windows-based operating systems such as
Windows NT or Windows 95. The non-commercial version is publicly
available; however, the algorithms from the RSES library
are not available when the decision table is larger than the predefined
size of 500 objects and 20 attributes.

• Ability to list the relationships between factors in a simple format

Rosetta produces a list of rules to represent the relationship be-

tween factors. The rules are listed in a tabular format along with

the support count, coverage, accuracy, stability and the length of

the rule. The rules can be used as classifiers for new data.

• Ease of use

The software program interface is designed as a tree. The
main nodes consist of the data source and the algorithms. Each main
node has sub-nodes that contain the details of the data or the
algorithm.

• Ability to import and use Microsoft Excel files

Rosetta can import Excel files as well as other database
formats such as Microsoft Access files.

• Ability to import large data files

The size of the file that it can import depends on the memory
available on the computer.

4. Weka

Weka is a data mining program that contains a collection of machine

learning algorithms. Weka has tools for pre-processing of data, classi-

fication, regression, clustering, association rules and visualization. It is

also designed to develop new machine learning schemes (Weka, 2008).

Weka is also able to perform attribute evaluation and attribute selec-

tion. Attribute selection involves the search of all possible combinations

of attributes in the data in order to discover the subset of attributes

that works best for predictions. In order to achieve this, two objects

are set up: an attribute evaluator and a search method. The evaluator

determines the method to use in order to assign a worth to the subset of

attributes and the search method will determine the style of the search

to perform.

• Ability to list the relationships between factors in a simple format

Weka displays rules that indicate the possible relationships, along
with the support and confidence values.

• Ease of use

This software program offers a choice of either using the command

line or graphical interface. The graphical interface is intuitive and

is easy to use.

• Ability to import and use Microsoft Excel files

Weka can only import files in these formats: csv, arff and dat files.

• Ability to import large data files

The size of the file that it can import depends on the memory
available on the computer.

Table 4.4 summarises the rough set software programs and the related

criteria.

Table 4.4: A summary of the rough set software programs.

Features                        ROSE2   Rosetta   RSES2   Weka
Ability to list relationships   Y       Y         Y       Y
Ease of use                     Y       Y         Y       Y
Import Excel files              N       Y         N       N
Results easy to understand      N       Y         Y       N
Able to read large files        N       Y         Y       Y

The descriptions above show that RSES2 is similar to
Rosetta; however, it offers a limited number of algorithms for performing reduction
on the data, which is not considered ideal. As for ROSE2, the
input data are constrained to a particular file extension, which indirectly affects
the data format. This is inconvenient when, for example,
Excel files are to be used as input, because the data in these files cannot be
read properly by the software program. In addition, converting the input data
into the .isf file format can be complicated. Grobian was not selected as
it is difficult to use and not as fully developed as the other software programs.
Rosetta was selected to perform the analysis in this research because of its ease of
use and easy-to-understand results. Reasons for eliminating the other software
programs are provided in the next section.

4.5.3.2 Rough set algorithm selection

Rosetta has several in-built algorithms such as the genetic reducer, Johnson’s

algorithm, Holte’s reducer, dynamic reducer, exhaustive calculation reducer,

RSES genetic reducer, and RSES Johnson’s algorithm. These algorithms were

briefly explained in the previous chapter.

The data set analysed in this study is large, so algorithms that cannot
accommodate a large volume of data were not considered. What remains are the genetic
reducer, Johnson’s algorithm, RSES Johnson’s algorithm, Holte’s reducer and the
dynamic reducer. The ideal reducts do not consist of a single attribute; hence
Holte’s reducer, Johnson’s algorithm and RSES Johnson’s algorithm are not
considered. The dynamic reducer is also not considered suitable because the
data set is not complicated enough to have sub-tables.

Consequently, the algorithm selected is the genetic algorithm, as it returns a minimal
set of attributes, which meets the aim of this study. The aim of
carrying out rough set analysis is to obtain a minimal set of attributes that is
informative enough to describe or predict the occurrence of an incident.
Such a minimal set is especially useful for real-time streaming analysis.
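What a minimal set of attributes means can be shown on a toy decision table. The sketch below uses an exhaustive search over attribute subsets rather than the genetic algorithm selected in this study, because exhaustive search is short enough to show in full; it is only practical for small tables. It assumes a consistent decision table, and the attribute names and rows are hypothetical.

```python
from itertools import combinations

def preserves_decision(table, attributes, decision):
    """True if rows that agree on the attributes also agree on the decision."""
    seen = {}
    for row in table:
        key = tuple(row[a] for a in attributes)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True

def minimal_reducts(table, conditions, decision):
    """Smallest attribute subsets that still determine the decision."""
    for size in range(1, len(conditions) + 1):
        found = [set(c) for c in combinations(conditions, size)
                 if preserves_decision(table, c, decision)]
        if found:
            return found
    return []

# Hypothetical decision table for illustration only.
table = [
    {"wet": 1, "night": 0, "speeding": 1, "severity": "high"},
    {"wet": 1, "night": 1, "speeding": 0, "severity": "high"},
    {"wet": 0, "night": 1, "speeding": 1, "severity": "high"},
    {"wet": 0, "night": 0, "speeding": 0, "severity": "low"},
    {"wet": 0, "night": 1, "speeding": 0, "severity": "low"},
]
reducts = minimal_reducts(table, ["wet", "night", "speeding"], "severity")
```

In this toy table no single attribute determines the severity, but the pair of wet and speeding does, so the night attribute can be dropped.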

4.5.4 Verification of Rules

The aim of verification is to validate the accuracy of the rules obtained from
the rough set analysis process. The validation results indicate whether the
rules are suitable for further analysis and for deriving
knowledge from the results.

The rules can be validated using two possible methods: dynamic verification using a
simulator, or statistical verification.

4.5.4.1 Dynamic verification

The dynamic method of validating the rules uses a traffic simulator. Because of the
limited availability of real-time data and the danger and difficulties involved
in carrying out the validation on real roads, a simulator is recommended.
Simulation is a dynamic representation of a part of the real world,
achieved with a computer model that advances with time. Traffic
simulators are used to achieve a better understanding of a problem and the
factors involved. A traffic simulator is defined for validation purposes. The
design of the traffic simulator draws on physics, road geometry
and other theories used by traffic engineers. Although the simulator has
limitations, its definition is supported by existing and proven theories. In
addition, the simulator is defined on the assumption that the parameters
are not tuned to obtain the expected results. Details of the design of the
simulator are discussed in the next section.

The validation process is performed using test cases. Test cases are
scenarios set up to be simulated with the traffic simulator. The results are collected
and used to check the accuracy of the rules against a
defined threshold, which is the accepted tolerance for the results
obtained. The defined threshold for the accuracy of the type of crashes
generated by the simulator is 70% ±10%. The threshold is selected based on the
limited availability of the data. In addition, the data inputs are not
real-time data. Hence, the accuracy will not be more than 80%.
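The threshold check can be sketched as follows, reading 70% ±10% as an acceptance band from 60% to 80%; that reading, the function names and the example crash types are assumptions made for illustration.

```python
def rule_accuracy(expected, simulated):
    """Percentage of test cases where the simulated crash type matches the rule."""
    matches = sum(1 for e, s in zip(expected, simulated) if e == s)
    return 100.0 * matches / len(expected)

def within_threshold(accuracy_pct, target=70, tolerance=10):
    """True if the accuracy lies inside the band target +/- tolerance (in percent)."""
    return target - tolerance <= accuracy_pct <= target + tolerance

# Hypothetical test-case outcomes, not results from the study.
expected = ["run-off"] * 6 + ["head-on"] * 4
simulated = ["run-off"] * 5 + ["head-on"] * 5
accuracy = rule_accuracy(expected, simulated)
accepted = within_threshold(accuracy)
```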

4.5.4.2 The features of the defined simulator

Simulator tools are commonly used in traffic engineering to help engineers identify
possible road design and traffic flow issues. Traffic simulators are widely
used in research, planning, development, training and demonstration of traffic
system design.

Traffic simulators are used to determine the effects of control measures and
new traffic rules such as speed limits and restrictions on lane changing and
overtaking on certain sections of the road. Simulators can also be used to discover
the effect of new infrastructure before it is implemented (Treiber, 2008). In
summary, the reason for using a simulator is to test, evaluate and determine a
solution without building new infrastructure. This in turn is useful for research
and for training the people involved.

Existing simulators allow the driver or vehicle parameters to be
configured; however, none offers the flexibility to configure
the environmental factors. Therefore, a simulator is defined for crash research
purposes.

For validation purposes, a traffic simulator will be designed and built with
Matlab. This simulator differs from commercial
simulators in that it imitates crashes on road curves and
includes environmental factors such as a wet road surface, friction and
vision. The simulator uses
the rules without cost obtained from the analysis process as the inputs for the
simulations.

The simulator defined for this study is based on a stochastic
model in which certain parameters, such as aggressiveness, are randomly selected.
A stochastic model is preferred because the results obtained with
random values are more realistic than those of a model that uses the same values
and generates the same results every time. The stochastic model generates random
values that follow a normal distribution, so that the values oscillate with random
but bounded amplitudes and periods.
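Drawing a bounded, normally distributed parameter can be sketched as below. The simulator in this study is built with Matlab; this Python sketch only illustrates the idea, and the mean, standard deviation and bounds chosen for aggressiveness are hypothetical.

```python
import random

def bounded_normal(rng, mean, std, low, high):
    """Draw from a normal distribution, redrawing until the value lies in [low, high]."""
    while True:
        value = rng.gauss(mean, std)
        if low <= value <= high:
            return value

# A seeded generator makes the sketch repeatable; the parameters are assumptions.
rng = random.Random(42)
aggressiveness = [bounded_normal(rng, 0.5, 0.2, 0.0, 1.0) for _ in range(5)]
```

Redrawing out-of-range values keeps every sample inside the bounds, at the cost of slightly distorting the tails of the distribution.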

The simulator is designed to take into consideration the driver,
environment and vehicle factors. The driver-related factors defined in the simulator
are aggressiveness, reaction capability, reaction time and driver experience.
The environment-related factors are friction, light and weather forecast. The
vehicle-related factors are the tyre quality, the braking capability of the car and
the accelerating capability of the car.

The features of the simulator are: (1) construction of the curve, (2) speed

and radius calculations, (3) longitudinal and lateral position of vehicle, and (4)

modelling of crashes. These features are discussed in detail in the following

paragraphs.

• Construction of the curve

The simulator constructs the curve from four parameters: (1) the
straight line at the end of the curve, (2) the angular sector, (3) the linking circle
radius and (4) the clothoid, which is a curve that links the linking circle
and the straight line.

• Speed and radius calculations

One of the driver-related factors is speed, and the simulator is designed
with a safety speed feature based on the safety radius of the
curvature. The safety speed is the speed at which, or within whose range,
the driver is advised to drive. The safety radius of the curvature is the
minimum radius at which the driver can manoeuvre safely through the curve. The
minimum radius can be calculated with Equation 4.1.

\[ \mathrm{MinRad} = \frac{\mathrm{Speed}^2}{\delta + F} \tag{4.1} \]

MinRad is the minimum radius, δ is the slope or super-elevation, and
F is the fraction of g, where g is the acceleration due to gravity.

Based on the calculation in Equation 4.1, Brunel (2005) listed six safety
speeds and radii, which are presented in Table 4.5.

Table 4.5: The list of six safety speeds and radius.

Curvature factors      Speed (km/h)
                       40     60     80     100    120    140
Fraction of g*         0.25   0.16   0.13   0.11   0.1    0.09
Slope (%)              0.07   0.07   0.07   0.07   0.07   0.07
Min. radius (meters)   40     120    240    425    665    1000

Legend:
*Fraction of g is acceleration due to gravity.
Slope is the angle raised or the super-elevation.
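Equation 4.1 as printed does not show a unit conversion. A common highway-design form of the same relationship, for speed in km/h and radius in metres, is R = V² / (127 (e + f)), where the constant 127 comes from the km/h conversion and g. The sketch below assumes that this is the form behind Table 4.5, which is not confirmed by the text; it yields radii within a few percent of the tabulated values, which appear to be rounded.

```python
def min_radius(speed_kmh, slope, fraction_of_g):
    """Minimum curve radius in metres, assuming R = V^2 / (127 * (slope + F))."""
    return speed_kmh ** 2 / (127.0 * (slope + fraction_of_g))

# Rows of Table 4.5: (speed in km/h, fraction of g, slope, tabulated minimum radius in metres)
TABLE_4_5 = [
    (40, 0.25, 0.07, 40),
    (60, 0.16, 0.07, 120),
    (80, 0.13, 0.07, 240),
    (100, 0.11, 0.07, 425),
    (120, 0.10, 0.07, 665),
    (140, 0.09, 0.07, 1000),
]

# Pair each computed radius with the tabulated one for comparison.
comparison = [(speed, min_radius(speed, slope, f), tabulated)
              for speed, f, slope, tabulated in TABLE_4_5]
```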

Besides the safety speed, a reference speed and a driver speed are
defined. The definition of the driver speed is presented in Equation 4.2.


\[ \mathrm{DriverSpd}(t+1) = \mathrm{DriverSpd}(t) + \bigl(\mathrm{RefSpd}(t+1) - \mathrm{DriverSpd}(t)\bigr) \cdot k \qquad (4.2) \]

Where:

k is a coefficient related to the contributing factors which describes the
way a driver adapts the current speed to the environment.

t is the time.

RefSpd is the reference speed.

DriverSpd is the driver's driving speed.

The reference speed is the theoretical speed at which the driver will drive.
For example, a driver on a wet road will tend to reduce speed.
The reference speed is given by Equation 4.3.

\[ \mathrm{RefSpd} = \mathrm{InitSpd} \cdot \mathrm{RefSpdCoeff} \qquad (4.3) \]

Where:

RefSpd is the reference speed.

InitSpd is the initial speed. The value is modified with the contributing

factors.

RefSpdCoeff is the reference speed coefficient which is discussed in the

next paragraph.
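The speed adaptation of Equations 4.2 and 4.3 can be illustrated with a minimal sketch. The numeric values (initial speed, coefficient, k, number of steps) are hypothetical and chosen only to show the behaviour. The reference speed is held constant over the steps, which is a simplifying assumption.

```python
def reference_speed(init_spd: float, ref_spd_coeff: float) -> float:
    """Equation 4.3: RefSpd = InitSpd * RefSpdCoeff."""
    return init_spd * ref_spd_coeff


def next_driver_speed(driver_spd: float, ref_spd_next: float, k: float) -> float:
    """Equation 4.2: move the driver speed towards the reference speed by a fraction k."""
    return driver_spd + (ref_spd_next - driver_spd) * k


def simulate(init_spd: float, ref_spd_coeff: float, k: float, steps: int) -> list:
    """Return the driver speed at each time step, starting from the initial speed."""
    ref = reference_speed(init_spd, ref_spd_coeff)  # kept constant (assumption)
    speeds = [init_spd]
    for _ in range(steps):
        speeds.append(next_driver_speed(speeds[-1], ref, k))
    return speeds


# Hypothetical values: a driver starting at 100 km/h whose reference speed
# is 80% of the initial speed, adapting with k = 0.5.
speeds = simulate(init_spd=100.0, ref_spd_coeff=0.8, k=0.5, steps=4)
print(speeds)
```

With 0 < k ≤ 1 the driver speed approaches the reference speed without overshooting, so k behaves like a smoothing factor.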

Reference speed coefficient (RefSpdCoeff). In Equation 4.3, the
reference speed coefficient is represented as RefSpdCoeff. Its value is
calculated from the values of the contributing factors and their influence
on the reference speed. For example, in the simulator, the driver's reaction
capability and reaction time are not considered in the calculation of the
reference speed. However, the driver adapts his speed and trajectory
according to the dynamic environment. This behaviour is therefore defined
with the reference and driver speed coefficients. The reference speed
coefficient is calculated with Equation 4.4, and the driver speed coefficient
is calculated with Equation 4.5.

\[ \mathrm{RefSpdCoeff} = \mathrm{Driver} \cdot \mathrm{Environment} \cdot \mathrm{Vehicle} \qquad (4.4) \]

where RefSpdCoeff is the reference speed coefficient.

Besides the reference speed coefficient, the driver speed coefficient
is required because a driver adapts the speed to the environment. The driver
speed coefficient is defined in Equation 4.5.

\[ \mathrm{driverSpdCoeff} = \mathrm{Driver} \cdot \mathrm{Speed} \cdot \mathrm{Environment} \qquad (4.5) \]

• Longitudinal and lateral position of vehicle

One of the factors defined in the simulator is the position of the vehicle
on the curve. The longitudinal and lateral positions indicate the location
of the vehicle on the curve in the simulator. The longitudinal position is
calculated with Equation 4.6.

\[ \mathrm{LongPos}(t+1) = \mathrm{LongPos}(t) + \mathrm{DriverSpd}(t) \qquad (4.6) \]

The lateral reference position is as important as the longitudinal
position. Koita (2005) conducted experiments to observe the difference in
lateral position between experienced and inexperienced drivers; the results
are presented in Figures 4.5 and 4.6.

Figure 4.5: The lateral position results for experienced drivers (Abdourahmane, 2005).

Figure 4.6: The lateral position results for inexperienced drivers (Abdourahmane, 2005).

The results indicate that the lateral position of the vehicle changes when
it is at the curve entry and in the curve. Experienced drivers adopt the
‘wait and see’ strategy in order to assess the sharpness of the curve and
adapt their trajectory in the curve. Therefore, the lateral position change
is gradual. The gradual change is observed as the lateral position of the
vehicle increases in the x-axis direction when it proceeds to the centre
part of the curve. There is then a gradual decrease in the x-axis direction
when the vehicle travels out of the curve, with a maximum value that is
half of the curvature value. The point of cord is at half the curvature
value.

On the other hand, drivers with less experience are afraid of the sharpness
of the curve, and their lateral position is only adjusted at the last moment.
Thus, the lateral position changes suddenly. The results in Figure 4.6
show the lateral position for inexperienced drivers; the point of cord
occurs at a point less than half of the curvature value. The defined
simulator models the lateral position according to the driver experience,
and the position is determined with a Gauss curve. The lateral position
is defined in Equation 4.7.

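\[ \mathrm{LatPos} = \mathrm{Min} + \mathrm{Agr} \cdot \Bigl( \mathrm{Gauss}\bigl(N, \mathrm{Min}, \mathrm{Max}, \lceil \mathrm{N2} + \mathrm{driverExp} \cdot \tfrac{\mathrm{N2}}{2} \rceil, \lceil 0.1 \cdot N \rceil \bigr) - \mathrm{Min} \Bigr) \qquad (4.7) \]

Where:

LatPos is the calculated lateral position of the vehicle.

Min is the lowest value of the lateral position.

Agr is the driver aggressiveness.

Gauss is the function that takes in a number of inputs and produces a
normal distribution.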

N is the total number of points for the curve.

Max is the highest value of the lateral position.

N2 is the total number of points on the circle that links the curve.

driverExp is the driver experience which starts from 0 to indicate a bad

driver to a perfect driver with a value of 1.

• Modelling of crashes

A crash is likely to occur when a driver exceeds the speed limit or the
safety speed on a curve. Other factors, such as the reaction time, which
reflects the capability of the driver to see an obstacle early and avoid it,
may also contribute to the crash. The simulator is able to simulate three
types of crashes using a simplistic approach. They are:

– Collision with an obstacle on the road

A collision occurs when a driver is either hyper-aggressive or has a
reaction time of less than one second. Aggressiveness is represented
by a variable in the simulator that controls the speed of the vehicle.
In addition, related variables have to be adjusted according to the
aggressiveness. The simulator determines the collision based on the
longitudinal and lateral position of the car as it advances towards
the obstacle.

– Skid or loss of control

Skid or loss of control occurs when travelling at a high speed on the

curve or over the safety speed limit for the radius of the curvature.

The simulator determines the skid based on the calculation of the

driving speed, safety speed and the radius of the curvature.

– Off road crashes

Off road crashes occur when the lateral position of a vehicle is out

of the bend taking into account the dimensions of the car. The

position of the vehicle leaving the bend is due to loss of control or

speeding.
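The skid and off-road conditions described above can be sketched as simple checks. This is an illustrative reading of the text, not the simulator's implementation: the class `VehicleState`, the parameter names and the numeric values are hypothetical, and the obstacle collision case is omitted because the text does not give enough detail to reproduce it.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    speed_kmh: float      # current driver speed
    lateral_pos_m: float  # offset of the vehicle centre from the road centre
    width_m: float        # vehicle width, so the car's dimensions are considered


def is_skid(state: VehicleState, safety_speed_kmh: float) -> bool:
    """Skid or loss of control: the driver speed exceeds the safety speed."""
    return state.speed_kmh > safety_speed_kmh


def is_off_road(state: VehicleState, road_half_width_m: float) -> bool:
    """Off-road: the outer edge of the vehicle lies beyond the edge of the bend."""
    return abs(state.lateral_pos_m) + state.width_m / 2 > road_half_width_m


def classify(state: VehicleState, safety_speed_kmh: float,
             road_half_width_m: float) -> str:
    """Return the crash type, checking the off-road condition first (assumption)."""
    if is_off_road(state, road_half_width_m):
        return "off-road"
    if is_skid(state, safety_speed_kmh):
        return "skid"
    return "no crash"


# Hypothetical example: 90 km/h on a curve whose safety speed is 80 km/h.
print(classify(VehicleState(90.0, 0.2, 1.8), 80.0, 3.5))
```

The order of the checks is a design choice of the sketch; the text does not say which crash type takes precedence when both conditions hold.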

4.5.4.3 Performance of the simulator

The performance of a simulator is determined by its ability to simulate
crashes close to reality. A simulation of 10,000 runs is performed in order
to observe the driver speed. This number of runs is required to ensure that
the results obtained are not biased towards one crash type.

The results obtained show that the driver speed follows a normal distribution,
which indicates that the simulator is able to simulate crashes close to reality.

The simulator is employed to validate the rules obtained from the rough set
analysis process. The details of the validation process are discussed in the
next section.

4.5.4.4 Statistical verification

Another possible method of verifying rules from rough set analysis is to use
the statistical measurements supported by rough set analysis software
programs. This option is suitable for rules that cannot be tested with the
simulator.

The accuracy measurement verifies that the rules obtained are within
the defined accuracy threshold. The validation criteria use the statistical
information collected during the analysis process, such as the accuracy and
coverage. The accuracy validation is carried out by classifying the validation
data set, which is 20% of the data, with the rules obtained from the analysis
data set, which is the remaining 80% of the data. This follows the 80-20 rule
for dividing data for analysis (Narula, 2005).
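The 80-20 division can be sketched as follows. The text does not say how the records are assigned to the two sets, so the random shuffle with a fixed seed is an assumption, and the function name `split_80_20` is illustrative.

```python
import random


def split_80_20(records, seed: int = 0):
    """Split records into an analysis set (80%) and a validation set (20%).

    ASSUMPTION: records are shuffled randomly before the split; the text only
    states the proportions.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


analysis_set, validation_set = split_80_20(range(100))
print(len(analysis_set), len(validation_set))
```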

The classification produces a confusion matrix, which contains information
about the actual and predicted classifications (Kohavi & Provost, 1998).
The performance can be evaluated from the matrix because it shows the
number of correct and incorrect classifications. Table 4.6 shows
an example of a confusion matrix with two classes.

Table 4.6: An example of a confusion matrix.

                    Predicted
Actual              Negative    Positive
Negative            a           b
Positive            c           d

Based on the definition in Table 4.6, the following values are calculated.

• Accuracy = (a+d)/(a+b+c+d)

• True positive rate (Sensitivity) = d/(c+d)

• True negative rate (Specificity) = a/(a+b)

• Precision = d/(b+d)

Precision is the proportion of the predicted positive cases that were cor-

rect.

• False positive rate = b/(a+b)

• False negative rate = c/(c+d)
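The measures listed above can be computed directly from the four cell counts. The sketch below uses the formulas exactly as given, where a and d are the correctly classified negative and positive cases respectively. The counts in the example are hypothetical.

```python
def confusion_metrics(a: int, b: int, c: int, d: int) -> dict:
    """Measures derived from a two-class confusion matrix with cells a, b, c, d."""
    total = a + b + c + d
    return {
        "accuracy": (a + d) / total,
        "sensitivity": d / (c + d),          # true positive rate
        "specificity": a / (a + b),          # true negative rate
        "precision": d / (b + d),
        "false_positive_rate": b / (a + b),
        "false_negative_rate": c / (c + d),
    }


# Hypothetical counts for illustration only.
metrics = confusion_metrics(a=50, b=10, c=5, d=35)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and the false negative rate sum to 1, as do specificity and the false positive rate, which gives a quick consistency check on the counts.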

The classification performance of the rules is determined from the observed
classification accuracy. The accuracies are compared, and the performance is
acceptable when the accuracy difference is within the defined threshold. The
defined accuracy threshold is 80% with an allowance of ±10% (Narula, 2005).

4.5.5 Filtering

This process filters the set of rules using a rule quality filter. Filtering
is required to remove rules that do not make sense, for example rules with all
false values. The quality filters are categorised into empirical and statistical
algorithms. The statistical algorithms are preferred because they have
theoretical support that provides a reasonable structure for defining rule
quality.

4.5.5.1 Statistical quality filters

Statistical quality filters use a contingency table, which contains the
behaviour of the decision rules when classified with a class. The table has a
layout similar to the confusion matrix shown in Table 4.6.

The filters measure the quality as either a measure of association or a
measure of agreement. The measure of association determines the relationship
between rows and columns; in other words, it finds the relationship on both
diagonals of the table. The measure of agreement finds the relationship of
the elements that are only on the diagonal of the table (An & Cercone, 2001;
Bruha & Kockova, 1993; Agotnes, 1999).

The available quality filters that measure association are:

• Pearson χ² statistic

The Pearson algorithm is applied to a 2x2 contingency table.

• G² likelihood statistic

When the G² likelihood statistic is divided by 2, it is equal to another
algorithm, called the J-measure (Smyth & Goodman, 1990).

Both algorithms have one degree of freedom. The Pearson statistic
is preferred for the quality filtering because it is able to analyse a 2x2
contingency table, which corresponds to the decision table used for the
analysis. Hence, Pearson is applied to filter the rules.
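Both statistics can be computed from the four cells of a 2x2 contingency table. The sketch below uses the standard textbook formulas; the cell names a, b, c, d follow the layout of Table 4.6, and the example counts are hypothetical. How the rough set software turns the statistic into a filtering decision is not described in the text, so no cut-off is applied here.

```python
import math


def pearson_chi2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def g2_statistic(a: int, b: int, c: int, d: int) -> float:
    """G2 likelihood-ratio statistic: 2 * sum(observed * ln(observed / expected))."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g2 += 2.0 * o * math.log(o / expected)
    return g2


# Hypothetical counts for illustration only.
print(pearson_chi2(10, 20, 30, 40), g2_statistic(10, 20, 30, 40))
```

Both statistics are zero when the rows and columns are independent and grow as the association between them becomes stronger.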

Once the rules are filtered, they are sorted according to the support count
in descending order, so that the rule with the highest support count is
at the top of the list. This process is followed by interpretation, which is
explained in the next section.

4.6 Identify the significant contributing factors

This phase of the approach investigates the third research question that aims

to identify the significant contributing factors that can affect the severity of

crashes on road curves. This process is considered as a part of the rough set

analysis as rules are used for the identification process. Figure 4.7 shows an

overview of this process with details discussed in the following sections.


Figure 4.7: The overview of the process for the third research question.

4.6.1 Selected software program

The software program required has to be able to select attributes from a set of
rules with selection algorithms. The selected software program is Weka, which
is data mining software with a collection of machine learning algorithms.
This software program is able to perform tasks such as association rule mining
and attribute selection. Weka is selected to identify the significant
factors, instead of Rosetta or the other rough set programs considered, for the
following reasons:

• Rosetta does not have any algorithm to identify the significant attributes.
The only possible approach is to refer to the statistics and select the
attribute with the highest frequency count.

• RSES2 is not equipped with a feature to identify the significant
attributes. The statistics available do not contain the frequency count
of each attribute either.

• ROSE2 does have a feature to identify the significant attributes. However,
it remains unsuitable because of its inability to import large amounts of data
and its data format restrictions.

Thus, Weka is selected because it offers a selection of algorithms to identify
the significant attributes from the rules and is able to import large amounts
of data for analysis.

Prior to selecting the attributes, the data will be transformed into a format

compatible with the Weka software program. The transformation process is

discussed in the next section.

4.6.2 Transformation

The Weka software program accepts file formats such as ARFF (Attribute-Relation
File Format) data files, CSV (Comma Separated Values) data files, XRFF (XML
Attribute-Relation File Format) data files, and binary serialised instances. The
rules are stored as text in a plain text data file. Because Weka is not able to
import this format, the data need to be transformed into one of the accepted
file formats. The selected file format for the transformation is ARFF, which
is an ASCII text file that contains a list of instances sharing a set of
attributes.

The transformation involves converting the rules into the ARFF format, which
consists of a header section and a data section. The header section consists of
the following items.

• The name of the relation or data set (@RELATION). The format is
defined as:

@RELATION <relation-name>

• The list of attributes (@ATTRIBUTE). Each attribute is defined in the
following format:

@ATTRIBUTE <attribute-name> <datatype>

The data section contains the attribute values. It begins with @DATA,
followed by the list of values. The values are separated with commas, with
the class value at the end. Each data row is defined in the following format:

<attribute-value>,<attribute-value>,<class-value>

An example of an ARFF data file using the rules obtained from
rough set analysis is as follows.

% HEADER section
@RELATION cost
@ATTRIBUTE time numeric
@ATTRIBUTE vehage numeric
@ATTRIBUTE drvage numeric
@ATTRIBUTE ALCOHOL numeric
@ATTRIBUTE tree numeric
@ATTRIBUTE mountain numeric
@ATTRIBUTE losttraction numeric
@ATTRIBUTE fog numeric
@ATTRIBUTE puddle numeric
@ATTRIBUTE loosesurface numeric
@ATTRIBUTE slippery numeric
@ATTRIBUTE oversteer numeric
@ATTRIBUTE phone numeric
@ATTRIBUTE crashtype numeric
@ATTRIBUTE class {1,2,3,4,5}

% DATA section
@DATA
1,2,4,0,0,0,0,0,0,0,0,0,0,0,2
5,2,1,0,1,0,0,0,0,0,0,0,0,1,2
5,2,3,0,1,0,0,0,0,0,0,0,0,1,2
6,2,1,0,1,0,0,0,0,0,0,0,0,3,2
2,2,1,0,0,0,0,0,0,0,0,0,0,3,2
6,1,3,0,0,0,0,0,0,0,0,0,0,1,2
4,2,2,0,0,0,0,0,0,0,0,0,0,0,2
3,2,2,0,0,0,0,0,0,0,0,0,0,3,2
3,2,1,0,0,0,0,0,0,0,0,0,0,1,2

Because the rules are in text format, transforming them into an ARFF data
file requires the header section to be defined with the relation name, the
attributes and their data types. In the data section, the text then has to be
converted to numerical values and separated with commas in the format shown
previously.
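The transformation can be sketched as a small function that writes the header and data sections. It assumes that the rules have already been converted to rows of numerical values; how the rule text is parsed is not covered. The function name `to_arff` and its parameters are illustrative, and the nominal class attribute is written with braces, as the ARFF format requires.

```python
def to_arff(relation: str, attributes: list, class_values: list, rows: list) -> str:
    """Build the text of an ARFF file with numeric attributes and a nominal class."""
    lines = ["% HEADER section", f"@RELATION {relation}"]
    for name in attributes:
        lines.append(f"@ATTRIBUTE {name} numeric")
    # Nominal attributes list their permitted values inside braces.
    lines.append("@ATTRIBUTE class {" + ",".join(str(v) for v in class_values) + "}")
    lines += ["% DATA section", "@DATA"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))  # class value comes last
    return "\n".join(lines) + "\n"


# Two attributes and one data row, shortened from the example above.
arff_text = to_arff(
    relation="cost",
    attributes=["time", "vehage"],
    class_values=[1, 2, 3, 4, 5],
    rows=[[1, 2, 2]],
)
print(arff_text)
```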

4.6.3 Select attributes

The algorithm selection is based on this list of criteria:

• Able to handle rules and evaluate them.

• Able to handle the data format of the relation.

• Able to handle multiple decision classes.

• Able to discover the optimum values.

The available algorithm that can handle rules and numeric data files is
the ClassifierSubsetEval algorithm, which estimates the merit of a set of
attributes. The algorithm has a list of attribute evaluators, and the ones
that are able to handle rules with multiple decision classes in the ARFF
format are:

• ConjunctiveRule is a conjunctive rule learner which can handle numeric

and nominal class types. Conjunctive rules use the logical relation, AND,

to relate the attributes.

• JRip is a propositional rule learner. A propositional rule represents a
particular true or false proposition where a variable can take only one of
those values. The original Repeated Incremental Pruning to Produce Error
Reduction (RIPPER) contains a system error; the version implemented in Weka
avoids this and thus differs slightly from the original version.

• Ridor is the RIpple-DOwn Rule learner. The rules are organised in a tree
structure. Each node of the tree contains a rule and has two child branches,
one containing the node for the satisfied rule and the other the node for the
unsatisfied rule. The tree branches out until there are no more rules, and the
last branch contains the conclusion.

Ridor is the selected attribute evaluator for the ClassifierSubsetEval
algorithm because it is able to analyse rules with multiple decision classes.
The rules are pruned, and thus the final result contains the best attributes
that can be used to represent the data.

4.7 Understanding crash severity

This process is designed to consolidate the results obtained and belongs to
the results interpretation process. Figure 4.8 illustrates an overview of this
process.

Details of each process are discussed in the following sections.

Figure 4.8: The outcome of the analysis processes.

4.7.1 Interpretation

Interpretation is the process of understanding the rules obtained from the
rough set analysis process. The rules are analysed to understand what they
mean and how the factors relate to each other. This process determines the
relationships between the contributing factors identified with the text mining
process.

4.7.2 Findings

For this study, the crash severity is assessed based on the cost value and
contributing factors. This is based on the assumptions that:

• Cost is related to the crash severity.

• A high cost value indicates a high crash severity and vice versa.

The cost values are labelled with a cost level in the classification process.
Each crash severity level corresponds to a cluster that is classified based on
the cost.

There are five crash severity levels: (1) lowest, (2) low, (3) medium,
(4) high and (5) highest. Each severity level is related to the cost
distribution of a cluster, which is defined previously with the clustering
method.
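The link between cost clusters and severity levels can be sketched as a ranking of the clusters by cost. Ranking by the mean cost of each cluster is an assumption, since the text only says that each level is related to the cost distribution of a cluster. The cluster identifiers and cost values are hypothetical.

```python
from statistics import mean

SEVERITY_LEVELS = ["lowest", "low", "medium", "high", "highest"]


def label_clusters(cluster_costs: dict) -> dict:
    """Map each of five cost clusters to a severity level, ordered by mean cost."""
    if len(cluster_costs) != len(SEVERITY_LEVELS):
        raise ValueError("exactly five clusters are expected")
    # ASSUMPTION: a cluster with a higher mean cost has a higher severity.
    ordered = sorted(cluster_costs, key=lambda cid: mean(cluster_costs[cid]))
    return {cid: SEVERITY_LEVELS[rank] for rank, cid in enumerate(ordered)}


# Hypothetical clusters with illustrative cost values.
clusters = {
    "c1": [100, 200],
    "c2": [5000],
    "c3": [900, 1100],
    "c4": [20000, 30000],
    "c5": [2500],
}
severity = label_clusters(clusters)
print(severity)
```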

4.8 Novelty and limitations of approach

This section lists the novelty, contributions and the limitations of the proposed

approach.

4.8.1 Novelty

The novelties of the approach are as follows:

• The use of data mining in road safety

The use of data mining to explore crash records with a different technique

is the first novelty of this approach.

• Identifying the relationships between contributing factors related to road

curves

The relationships between the contributing factors of road curve crashes

are observed to determine the effect on crash severity.

• The use of insurance crash records for analysis

A different data source is used for the analysis to determine whether new
contributing factors can be identified.

• Effectiveness

Assessing the severity of crashes on road curves requires an
understanding of the contributing factors of crashes. Data mining makes
the analysis process more effective, and it is more efficient in obtaining
accurate results in a shorter time. Overall, the proposed approach
is effective and is not time-consuming.

• Understanding of crash severity

The approach to understand crash severity uses the relationships between

the contributing factors. The relationship is used to observe the effect

on the crash severity.

The approach includes a validation phase which ensures that the results
are verified and valid to be used. In addition, this approach is
user-friendly as it has an easy-to-use interface. The concept of
the approach is also easy to understand, as it is designed to be trouble-free
for the users. Moreover, this approach is easy for the users to accept
compared to other approaches.

Extracting useful information from text requires complex algorithms and
lengthy manipulation of the data. Given the complexity of the algorithms,
the large amount of data and the absence of similar research results that can
validate our approach and results, a process of designing, trying and
combining novel approaches is required to show that our results are accurate.

4.8.2 Limitations

There are two limitations that need to be acknowledged and addressed in
this study. First, using past crash data to determine crash severity may limit
the scope of this approach, because it might not be able to adapt to new
situations or circumstances that have never been tested or programmed. This
leads to the second limitation: the approach has to be updated regularly in
order for it to respond to new circumstances. This is due to the limitation of
using streaming data from the sensors in the vehicle. Additionally, the
results will only be accurate to a certain extent, as the approach uses
only past crash data.


4.8.3 Contributions

This research makes several contributions. One is the discovery of
relationships between contributing factors for crashes on road curves that
had not previously been reported by other researchers. Another is the method
to determine the contributing factors and related crash severity on road curves
based on the data and results available. Text mining is an innovative approach
for discovering current and new contributing factors of curve-related crashes
based on crash data. The contributing factors discovered allow a crash to be
understood in depth and accurately. In addition, a traffic simulator
is defined to validate the results obtained from text mining and rough set
analysis.

4.9 Summary

This chapter covers the design of the approach that is used for this research.

The research scope is in response to the research questions covered in Chapter

3. The main processes in the approach are:

1. Identify factors from crash records

2. Identify relationship between factors

3. Identify significant factors

4. Understanding crash severity

Each process has sub-processes that address the goal of the corresponding
research question. The collected results are validated for their accuracy with
either a traffic simulator or a statistical measurement. The first approach
validates rules with no cost using a traffic simulator. The second approach
validates rules with cost based on the accuracy measurement. The validation
results are verified against a defined threshold.

Once verified, the results are used to determine the effect of the factors
and relationships on the crash severity. This is achieved using the significant
factors and related rules. With the design explained, the next chapter
discusses the implementation of the approach.


CHAPTER 5

Implementation of approach

Chapter Overview

Now that the design of the approach has been covered, this chapter will discuss

the implementation of the approach developed in the previous chapter. This

chapter will follow the framework of the approach as shown in the first section

of this chapter.

5.1 Flow of implementation

The course of this chapter follows the framework in Figure 5.1.

Figure 5.1: The analysis process of the proposed approach relates to the re-

search questions.


The details of each process are explained in the following sections.

5.2 Identify factors from past crash records

This section provides details on the implementation of text mining and begins
with the preparation and inputs for the analysis process. Further details on
the text mining process are also covered in this section.

5.2.1 Text mining process preparation

The selection, pre-processing and transformation processes make up the pre-

analysis phase of the text mining process. Details of each process are provided

in the following paragraphs.

• Selection

The data is filtered for road curve related crash records, and any record
that does not match the criteria is excluded, for instance records that are
classified as the ‘other’ incident type. A curve related incident can be
verified through the type of incident field in each record.

• Pre-processing

This process involves ‘cleaning’ the data to ensure that minimal incor-

rect or redundant data is present. The data cleansing process involves

detecting errors, eliminating duplicates and correcting errors which are

discussed in the following paragraphs.

– Error detection

The missing values in the data are detected with the search function
built into the Microsoft Excel program. Other errors, such as invalid
values in numerical fields such as costs, are detected with an ascending
sort of the values, after which they appear at the top of the sorted list.
Invalid values include negative numerical values.

– Duplicates elimination

Repeated or duplicate records in the data are detected with reference
to the incident number in each record. Duplicates occur because the data
extraction process extracts data from multiple tables in the database
and appends them at the end of the extracted table. The duplicate
records are removed to reduce unnecessary data analysis and analysis
time.

– Error correction

The missing values are replaced with an NA value, which means

‘not available’, in the field. However, a record is removed when it

contains more than three NA values in the fields. This is to ensure

that the data is significant for analysis.

• Transformation

Transformation involves organising the data into a format suitable for the
algorithm. Each row contains a crash record and each column contains
an attribute.
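The cleaning steps described above were carried out in Microsoft Excel; the sketch below only restates the same logic in code for clarity. The field names (`incident_no`, `cost`, `description` and so on) are hypothetical, and removing records with a negative cost follows the record selection reported later in this section.

```python
def clean_records(records: list, fields: list, max_missing: int = 3) -> list:
    """Remove duplicates, mark missing values as NA and drop unusable records."""
    seen = set()
    cleaned = []
    for record in records:
        # Duplicates elimination: keep the first record for each incident number.
        if record["incident_no"] in seen:
            continue
        seen.add(record["incident_no"])

        # Error correction: replace missing values with 'NA'.
        row = dict(record)
        for field in fields:
            if row.get(field) in (None, ""):
                row[field] = "NA"

        # Error detection: a negative cost is treated as an invalid value.
        cost = row.get("cost")
        if isinstance(cost, (int, float)) and cost < 0:
            continue

        # A record with more than three NA values is removed.
        if sum(1 for field in fields if row[field] == "NA") > max_missing:
            continue
        cleaned.append(row)
    return cleaned


# Hypothetical records for illustration only.
raw = [
    {"incident_no": 1, "cost": 100, "description": "lost control", "time": "", "age": None},
    {"incident_no": 1, "cost": 100, "description": "lost control", "time": "", "age": None},
    {"incident_no": 2, "cost": -5, "description": "hit tree", "time": "day", "age": 30},
    {"incident_no": 3, "cost": None, "description": "", "time": "", "age": None},
]
result = clean_records(raw, fields=["cost", "description", "time", "age"])
print(result)
```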

• Software settings

The software program used for text mining is SAS, and the module used
to perform the analysis is the Text Miner node. This tool is available in
Enterprise Miner, which is opened from the Analysis item in the Solution menu
(Solution menu → Analysis → Enterprise Miner). Figure 5.2 shows the
layout and work space of Enterprise Miner.

A new project within Enterprise Miner contains a blank space for drawing
the flow of the analysis process with the components available. Figure
5.3 shows the flow of the text mining process.


Figure 5.2: The work space of the Enterprise miner.

Figure 5.3: The flow of the analysis process.

The first component at the beginning of the flow is the data source

which contains the ‘cleaned’ and organised crash records. The records

are analysed without any further filtering. The second component is the

text miner which performs the analysis process. The two components

are linked together with a directional arrow drawn from the first to the

second component. Each component allows its settings to be changed.

The settings that can be configured for the data source components are:

1. The number of records used for analysis.

Out of 11,058 records, 6011 curve related records are selected for

the analysis. The number of records is further reduced to 3434

after removing records with negative cost values.

2. Select the attributes for analysis.

Crash description attributes are selected for analysis.

3. State the role of the attribute such as being an input, reject or

target type of attribute.

The crash description role is set as an input value.

The text miner component has three configuration tabs and they are:

1. Parse tab

This tab configures how the textual data is parsed. The settings control how terms are identified, such as entities (names, addresses, etc.) and words occurring in a single document, and whether selected terms in the text are ignored. Figure 5.4 shows the settings for the parse tab.

Figure 5.4: An example of the parse settings.

2. Transformation tab

This tab is the setting for Singular Value Decomposition (SVD)

computation. SVD decomposes a matrix into a product of simpler matrices whose structure is clearer and easier to interpret. The parsing process generates a term-document frequency matrix, in which each entry stores the number of times a term appears in a document. For a large collection of documents, the matrix can contain a huge number of terms and cannot be analysed effectively with limited computing space. Thus, SVD is used to reduce the dimensions of the matrix, transforming it into a table that is more compact and informative.

Figure 5.5 shows the settings for the transformation tab.

Figure 5.5: An example of the transformation tab.

In order to generate the SVD dimensions, the check box needs to

be selected.

3. Clustering tab

This tab allows the configuration of the clustering of text. The

settings that can be specified are:

– Indicate whether the text is to be clustered manually or automatically.

– Indicate the number of clusters.

– Indicate the minimum number of terms to be included in each

cluster.

Figure 5.6 is an example of the clustering tab.

Figure 5.6: An example of the clustering tab.

Once the data is prepared and settings specified in the software program,

the next step is to run the analysis process which is discussed in the next

section.
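The preparation steps described in this section (removing duplicates, applying the NA rule and building the term-document frequency matrix) can be sketched outside SAS as follows. This is only a minimal illustration in Python and not the procedure performed by the Text miner node; the function names, the record layout and the whitespace tokenisation are assumptions made for the example.

```python
from collections import Counter

NA = "NA"  # marker for 'not available', as used in the text


def clean_records(records, max_na=3):
    """Drop exact duplicates, fill missing values with NA, and remove
    records that contain more than `max_na` NA values."""
    seen = set()
    cleaned = []
    for record in records:
        # Replace missing (None or empty string) values with the NA marker.
        filled = {k: (NA if v in (None, "") else v) for k, v in record.items()}
        key = tuple(sorted(filled.items()))
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        if sum(1 for v in filled.values() if v == NA) > max_na:
            continue  # too many missing fields
        cleaned.append(filled)
    return cleaned


def term_document_matrix(descriptions):
    """Return (terms, matrix) where matrix[i][j] is the number of times
    terms[i] appears in descriptions[j] (simple whitespace tokens)."""
    counts = [Counter(text.lower().split()) for text in descriptions]
    terms = sorted(set().union(*counts)) if counts else []
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix
```

In this sketch each row of the matrix is a term and each column is a crash description, which is the matrix that SVD would subsequently reduce.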

5.2.2 Text mining analysis process

This process aims to analyse the ‘cleaned’ and organised data in order to

determine the contributing factors for crashes on road curves. The analysis

flow diagram is prepared and the components are configured to the required

settings which will activate the text miner component. Right-click on the text

miner component in the workspace and select the Run item in the drop down

menu. The text miner will begin analysis on the text according to the settings.

When the analysis is completed, the results will appear in two separate tables.

The details of the results will be discussed in the Results chapter.

5.3 Identify relationships between factors

This section begins with a brief explanation of rough set analysis. This is

followed by a description of the process of finding the minimum number of

attributes to represent the data using rough set analysis. The purpose of

employing rough set analysis is to observe relationships between attributes

which are not mentioned in most road safety reports or databases.

5.3.1 Rough set analysis preparation

Rough set analysis is strict about the format of its input data; hence, the keywords from text mining have to be organised in an appropriate format for analysis. The format is considered appropriate when it includes a decision attribute and can be read easily by the software. Data organised in this way forms a decision table, and the following explains the process of preparing the decision table.

• Transformation

The process of preparing a decision table with the available data involves

(1) organising the attributes in the decision table and (2) indicating the

presence of contributing factors.

1. Organising the attributes in the decision table

Organising the attributes involves combining both the attributes

related to the driver and vehicle with the contributing factors ob-

tained from text mining. The combined attributes are the condi-

tion attributes and these are organised into columns in the decision

table.

2. Indicating the presence of contributing factors

Once the attributes are organised, the next step is to indicate the

presence of the contributing factors that are obtained with text

mining, for each record.

The presence is checked with a search of the incident description

against the contributing factors listed in the columns and this is

done for each record. A value of TRUE (1) is recorded when the contributing factor is present in the description, and FALSE (0) otherwise. The following pseudo-code describes how the indication is performed.

SET count = 0
while count ≠ end of file do
    for A = firstAttribute to lastAttribute do
        if SEARCH(contribFactor[A], Incdescription[count]) > 0 then
            presence[count][A] = 1
        else
            presence[count][A] = 0
        end if
    end for
    count = count + 1
end while
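The pseudo-code above can be sketched in Python as follows. This is an illustrative version only: the field names `description` and `cost_level` and the function name are assumptions, and the search is a plain case-insensitive substring match, which may differ from the search function actually used.

```python
def build_decision_table(records, factors, decision_key="cost_level"):
    """Return one row per record: a 1/0 flag for each contributing factor,
    followed by the decision attribute in the last column."""
    table = []
    for record in records:
        description = record["description"].lower()
        # 1 when the factor text occurs in the incident description, else 0.
        row = [1 if factor.lower() in description else 0 for factor in factors]
        row.append(record[decision_key])
        table.append(row)
    return table


# Hypothetical records, for illustration only.
example = build_decision_table(
    [
        {"description": "Vehicle hit a TREE on gravel", "cost_level": "C2"},
        {"description": "Lost control on wet road", "cost_level": "C1"},
    ],
    ["tree", "gravel", "wet road"],
)
```

Placing the decision attribute last mirrors the layout of the decision table described in this section.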

Rough set analysis software programs require the data to be consistent and to have a decision attribute. The keywords from the text mining process are used as attributes for rough set analysis. Each attribute is represented by a column in the table, with the decision attribute in the last column. The decision attribute for the new table is the labelled cost.

Table 5.1 not only contains the key attributes obtained from text mining

but also additional contributing factors such as the age group, time of

incident, age of vehicle and driving experience.

Table 5.1: Tabulated contributing factors, age group, time of incident, age of

vehicle, driving experience and outcome.

Cost level | CFn | CFn+1 | AgeGrp | Time | VehAge | DriverExp | Outcome
L1         | Y   | Y     | A      | T    | V      | D         | Z
Ln         | Y   | Y     | A      | T    | V      | D         | Z
Ln+1       | Y   | Y     | A      | T    | V      | D         | Z

Legend:

CF is the contributing factor.

n is the count that increases by 1 until the total count.

AgeGrp is age group.

VehAge is vehicle age.

DriverExp is driver experience.

Outcome is the outcome of a crash.

Y represents the contributing factors.

Z represents the type of incident.

A represents the age group of the driver.

T represents the time of incident.

V represents the age of the vehicle.

D represents the driving experience.

All data is tabulated together in order to generate more complete and meaningful results. In addition, it allows the discovery of more combinations or relationships of contributing factors for a crash.

• Software settings

The software used for rough set analysis is Rosetta. A new project in

Rosetta has a tree-like structure which has two main nodes: the structure

and algorithm.

The structure node is where the data source is specified and imported into the program. Tabular files such as Excel spreadsheets are imported into the program with the ODBC import function, which loads the database or file as a child node under the structure node.

The algorithm node contains the functions for analysis such as the re-

duction rules, filters for rules and classification. The analysis process is

discussed in the next section.

5.3.2 Rough set analysis process

The purpose of employing rough set in the analysis process is to find the re-

lationships between the significant contributing factors and the decision rules.

Rough set analysis produces a set of rules which indicate the relationships among the possible combinations of the contributing factors.

A genetic algorithm is used to obtain reducts that represent the different possible combinations or relationships between the contributing factors. The decision attribute in the data for the analysis is the cost. The cost is further categorised into five sub-categories.

The default settings are used for the genetic algorithm in the configuration

window. Figure 5.7 shows the configuration window for the genetic algorithm.

The rules contain the list of possible contributing factors and the decision attribute, which is the related cost. These rules are useful for predicting or understanding possible crash severity.

5.3.3 Filter rules

The rules are examined thoroughly to locate any redundant or useless rules.

The Basic filtering algorithm is used to filter the rules which removes indi-

Figure 5.7: The genetic algorithm configuration tab.

vidual reducts from the reduct set that meets the removal criteria set in the

configuration tab. Basic filtering is applied to the rules while options such as

the LHS support, RHS support and coverage, can be adjusted to preference.

The criteria set can be a combination of two or more criteria. The removal criteria are based on the decision attribute, the cost group, so that the rules are classified into the individual cost groups. Figure 5.8 shows an example of the configuration window for the filter.

Figure 5.8: The rule filter configuration tab.

For this research, rules are filtered and selected based on the confidence

level. Confidence, also known as strength, is used to measure the quality of the rules obtained (Nguyen & Nguyen, 2003). Confidence is calculated with Equation 5.1:

\[ \text{Confidence} = \frac{\text{LHS} + \text{RHS}}{\text{LHS}} \tag{5.1} \]

LHS support is the number of records in the data that have all the properties described by the IF condition. RHS support is the number of records that have all the properties described by the THEN condition (Suhana, 2007).

Decision rules with high confidence are selected for observation and modelling. This is based on the data mining philosophy that only strong, short decision rules with high confidence are selected (Nguyen & Nguyen, 2003).
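As an illustration of confidence-based selection, the sketch below computes a confidence value for each rule and keeps the rules above a cut-off. It assumes that the numerator of Equation 5.1 denotes the records that satisfy both the IF and the THEN conditions, which is the usual reading of rule confidence; this reading, the rule representation, the `decision` field name and the 0.7 cut-off are all assumptions made for the example, not settings taken from Rosetta.

```python
def rule_support(table, conditions, decision):
    """Return (lhs, both): the number of records matching the IF part,
    and the number matching both the IF part and the THEN part."""
    lhs = both = 0
    for row in table:
        if all(row.get(attr) == value for attr, value in conditions.items()):
            lhs += 1
            if row.get("decision") == decision:
                both += 1
    return lhs, both


def confidence(table, conditions, decision):
    """Share of the records matching the IF part that also match the THEN part."""
    lhs, both = rule_support(table, conditions, decision)
    return both / lhs if lhs else 0.0


def select_strong_rules(table, rules, min_confidence=0.7):
    """Keep the (conditions, decision) rules that reach the assumed cut-off."""
    return [r for r in rules if confidence(table, r[0], r[1]) >= min_confidence]
```

For example, a rule whose IF part matches four records, three of which carry the rule's decision, has a confidence of 0.75 under this reading.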

5.3.4 Rule validation preparation

This section discusses the validation process using a traffic simulator. Due

to the limited availability of real time data which can be used for validation,

a simulator is required to perform the validation. The first part of this sec-

tion presents the background of the traffic simulator and subsequently, the

validation process.

The aim of the validation process is to accurately show that the combina-

tion of contributing factors obtained from rough set analysis does cause the

type of incident as indicated.

The rules can be validated with two possible methods: dynamic validation and statistical validation.

5.3.4.1 Dynamic validation preparation

This section discusses validation with a simulation that tests the hypothesis. The hypothesis is that the contributing factors will produce the type of incident discovered from rough set analysis.

To test this hypothesis, the contributing factors are entered as data inputs in the simulator. The type of incident is observed during the simulation and the counts of correct and incorrect observations are recorded.

The validation is carried out using test cases, which are defined before running the simulator. The defined threshold for the accuracy of the results generated from the simulator is 70% ±10%. The threshold is selected based on the limited availability of the data. In addition, the data inputs are not real-time data, so the accuracy is expected to be below 80%.

Table 5.2 presents the test cases carried out with the simulator. The first column states the test index and the second column states the aim of each test, followed by the inputs for the simulations. The last column states the expected output from the simulations.

Table 5.2: Test cases

# | Aim | Input | ExpOp
1 | To show loss of control causes collision | Speed is set to normal driver speed. Reaction time and capability set to low. No wet road and gravel | Vehicle spins and collide
2 | To show obstacle such as kangaroos causes collision when driver speeds | Driver speed is set to high. Reaction time and capability set to normal | Collide
3 | To show obstacles such as dogs leads to hit or collision when driver speeds | Driver speed is set to high. Reaction time and capability set to normal | Hit object
4 | To show wet roads and loss of control (speeding) leads to off road crash | Driver speed is set to high. Reaction time and capability set to normal | Collide or spin or off road
5 | To show gravel and dirt causes vehicle to skid | Driver speed is set to normal. Friction is decreased | Skid or roll over
6 | To show wet roads only lead to collision or off road crashes | Driver speed is set to normal. Friction is decreased | Skid or collide, spin or off road
7 | To show left-hand bend causes off road crashes assuming that the driver speeds | Driver speed is set to high. Reaction time and capability are decreased | Off road

where:

ExpOp represents the expected output.

Using the test cases, the parameters within the simulators are configured

according to the inputs stated.

5.3.4.2 Accuracy measurement preparation

Accuracy measurement determines the performance of the rules with a classification test. The data is divided into two groups, one for analysis and the other for validation. The division is 80% for analysis and 20% for validation. The data used for analysis generates a set of rules and these rules are tested on the validation data set. In order to perform the classification using the set of rules, the Classify/Test table using rule set function has to be invoked. The parameters that can be configured before classification are:

1. The general test mode which has two options:

• Calculate a confusion matrix: This option is selected only when the

data contains a decision class. The classification method calculates

a confusion matrix which contains the information of the actual and

predicted classes.

• Classify new cases: This option classifies the data and adds a de-

rived decision class to the original data. The derived decision is

usually stored as a last column.

2. Conflict resolution, with either of two options:

• Simple voting

A decision is made based on the vote count in favour of each pos-

sibility (one matching rule - one vote).

• Standard voting

Each rule can have many votes.

3. Rules from set

A drop down list to select the rules to be used for classification and

another drop down list to select the data to be classified.

Figure 5.9 shows the Classify/Test table using rule configuration window.

Figure 5.9: The rule filter configuration tab.

Once the parameters are selected, the classification can be performed. The

process is explained in the next section.
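The accuracy measurement described above (an 80/20 division, classification with simple voting and a confusion matrix) can be sketched as follows. This is a minimal Python illustration and not Rosetta's implementation; the rule representation as (conditions, decision) pairs, the `decision` field name and the fixed random seed are assumptions made for the example.

```python
import random
from collections import Counter


def split_data(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and split them into analysis and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def classify(row, rules, fallback=None):
    """Simple voting: each matching rule casts one vote for its decision."""
    votes = Counter(
        decision
        for conditions, decision in rules
        if all(row.get(a) == v for a, v in conditions.items())
    )
    return votes.most_common(1)[0][0] if votes else fallback


def confusion_matrix(rows, rules, decision_key="decision"):
    """Count (actual, predicted) pairs over the validation rows."""
    return Counter((row[decision_key], classify(row, rules)) for row in rows)


def accuracy(matrix):
    """Share of validation rows whose predicted class equals the actual class."""
    total = sum(matrix.values())
    correct = sum(n for (actual, predicted), n in matrix.items() if actual == predicted)
    return correct / total if total else 0.0
```

The diagonal entries of the confusion matrix, where the actual and predicted classes agree, are the correctly classified records.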

5.3.5 Validation process

This section explains the validation process for both dynamic and statistical

validation methods.

5.3.5.1 Dynamic validation process

The validation process uses the inputs stated in the test cases. Once the parameters are configured according to requirements, each test case is simulated in the simulator. The simulator runs 242 times for each test case. The results are printed in the results window, and the types and number of crashes are collected.
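The counting of correct observations for one test case, and the comparison against the 70% ±10% threshold, can be sketched as below. The observed outcomes in the usage lines are hypothetical values chosen only to exercise the code, and treating the threshold as a band from 60% to 80% is an assumed reading of the text.

```python
def case_accuracy(observed, expected):
    """Fraction of simulation runs whose outcome is among the expected outputs."""
    if not observed:
        return 0.0
    correct = sum(1 for outcome in observed if outcome in expected)
    return correct / len(observed)


def within_threshold(acc, target=0.70, tolerance=0.10):
    """True when the accuracy lies within target ± tolerance.
    A tiny margin absorbs floating-point rounding at the band edges."""
    return abs(acc - target) <= tolerance + 1e-9


# Hypothetical outcomes for 242 runs of a single test case.
runs = ["collide"] * 170 + ["no crash"] * 72
acc = case_accuracy(runs, {"collide", "spin"})
ok = within_threshold(acc)
```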

5.3.5.2 Accuracy measurement process

The classification begins when the Classify/Test table using rule button is

invoked. The data is classified with the set of rules stated in the configuration

window. Once the data is classified, a confusion matrix is produced. This

matrix shows the accuracy of the rules and the results are presented in the

Results chapter.

Once the rules are validated and their accuracy is within the defined threshold, they are used to determine the significant factors amongst the attributes. The next section explains the process of identifying the significant factors.

5.4 Identify significant factors

This section discusses the process of identifying the significant factors amongst

the set of attributes.

5.4.1 Attribute evaluation preparation

The software used for attribute evaluation is Weka and this section explains

the settings prepared for the analysis process.

• Transformation

The data are converted into the ARFF file format as explained in the Design chapter. The converted data is imported into Weka, where it is checked for format and content errors.

• Software settings

Once the data is loaded successfully into Weka, the attributes are selected. For this study, all attributes are selected for analysis. Figure 5.10 is an example of the configuration window for selecting the attributes.

Figure 5.10: The rule filter configuration tab.

The subsequent window to configure is the attribute evaluator where the

algorithm for analysis is available for selection. The parameters available

for configuration in the window are as follows:

1. Attribute evaluator

This is where the algorithm, ClassifierSubsetEval is selected.

2. Search method

This is where the search method Ridor is selected.

3. Attribute Selection modes

This option has two modes: use full training set and cross-validation.

The Use full training set mode is selected for this analysis.

4. The decision class

The decision class is detected by the program; however, this can be changed if it is not correct.

Figure 5.11 is an example of the attribute evaluator window.

Figure 5.11: The attribute evaluator configuration window.

This window also contains boxes for results and they are the Attribute

selection output box and Result list.
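The conversion of a decision table into the ARFF format mentioned under Transformation can be sketched as follows. The sketch writes nominal attributes only and places the decision class last; the relation name, attribute names and values are hypothetical and are not the ones used in this study.

```python
def to_arff(relation, attributes, rows):
    """Render nominal data as ARFF text.

    attributes: list of (name, [allowed values]) in column order,
    with the decision class as the last attribute.
    """
    lines = ["@relation " + relation, ""]
    for name, values in attributes:
        lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical two-attribute table, for illustration only.
arff_text = to_arff(
    "crash",
    [("tree", ["0", "1"]), ("cost", ["C1", "C2"])],
    [[1, "C2"], [0, "C1"]],
)
```

Keeping the decision class as the last attribute matches the column that Weka selects as the class by default.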

5.4.2 Attribute evaluation process

Once the settings are ready for analysis, the evaluation process is invoked and

the analysis is performed. The analysis presents the results in the results boxes. Detailed information about the results is presented in the Attribute selection output box. The results obtained will be presented in the next chapter.

5.5 Understanding crash severity

This section discusses the process of analysing the collected results and understanding how the identified factors affect crash severity.

5.5.1 Interpretation

The first step in understanding crash severity is to interpret and evaluate the rules obtained from the rough set analysis process. The rules indicate the relationships between the contributing factors, which can reveal a combination of factors. At the same time, the combinations that lead to high or low crash severity can be identified. The significant factors identified are derived from the rules and they indicate the factors that can be used for prediction.

5.5.2 Findings

The next process following the interpretation is consolidating the analysis and

defining a table that contains information about what is discovered. The in-

formation is divided into five crash severity levels and the rule with the highest

confidence is used to represent each level. Each level will have detailed infor-

mation about the selected rule such as the combinations of factors and their

relationships. The outcome of this process will be discussed in the Analysis

and Discussion chapter.
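Selecting the rule with the highest confidence to represent each crash severity level can be sketched as below. The rule records and their `level`, `confidence` and `conditions` fields are a hypothetical representation chosen for the example.

```python
def best_rule_per_level(rules):
    """Return {level: rule}, keeping the highest-confidence rule per level.
    When two rules tie, the one encountered first is kept."""
    best = {}
    for rule in rules:
        current = best.get(rule["level"])
        if current is None or rule["confidence"] > current["confidence"]:
            best[rule["level"]] = rule
    return best


# Hypothetical rules, for illustration only.
representatives = best_rule_per_level([
    {"level": "C1", "confidence": 0.43, "conditions": {"time": "Morn"}},
    {"level": "C2", "confidence": 0.80, "conditions": {"time": "Even"}},
    {"level": "C2", "confidence": 0.57, "conditions": {"time": "Morn"}},
])
```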

5.6 Summary

This chapter discusses the implementation of the approach designed in the previous chapter. For each process in the approach, the preparation process and the settings of the software used are explained in detail. Data is selected, ‘cleaned’ and transformed before the analysis process.

Contributing factors are identified with text mining analysis, and SAS is used to perform text mining. The text miner module is used to extract data and the settings for the module are configured as required. The settings of the software and its operations are explained with figures.

Rough set analysis is used in the analysis process as it is able to determine

the relationship or dependency between the contributing factors and decision

making. The results obtained from text mining are used as input for this

analysis process. The data is formatted and organised into a decision table.

Pre-processing is required to ensure consistency in the table. The software used

is Rosetta and the software settings are explained along with screen shots.

Rough set analysis generates a set of rules, which is used to identify the significant factors. Weka is the software program used to discover the factors in the data using a search algorithm which returns the most significant factors. The settings are explained with screenshots of the software program.

The process to understand the effects of the factors and relationships on

the crash severity is achieved with the rules related to the set of significant

factors.

All the results obtained from the processes in this chapter will be explained

further in the thesis.

CHAPTER 6

Results

Chapter Overview

The previous chapter discussed the implementation approach using data min-

ing techniques to achieve the aims of this research. Although data mining

techniques are not new, the use of data mining to understand crash severity is

novel. The four main objectives defined for this research are:

• To identify and discover new contributing factors of crashes on road

curves using the text mining technique.

• To understand the relationships between the contributing factors.

• To identify and understand the significant contributing factors.

• To understand the crash severity in road curves which in turn can reduce

the crash risk or the number of crashes occurring in the road curves.

The text mining technique is used to identify the contributing factors; however, in order to identify new factors, the results from text mining are compared with statistics from Queensland Transport. The new contributing factors revealed are: tree, embankment, gravel, pole, gutter, loss of control, wet road surface, dirt, kangaroo, truck, loss of traction and foggy conditions. The dependency or relationships are determined with the results obtained from rough set analysis.

Rough set analysis produces a set of rules and they are classified into different

crash severity levels.

This chapter presents the results while the analysis will be discussed in the

next chapter.

6.1 Factors from past crash records

The aim of this process is to identify other contributing factors from incident descriptions using the text mining technique. The incident descriptions comprise blocks of free-form text. Traditional data mining techniques are only able to analyse numerical data; therefore, text mining is employed to analyse the text descriptions.

The incident descriptions are used as input for the text mining process. This is achieved with Text miner, a text mining module in SAS. This module clusters the data based on the Ward algorithm, which is explained in detail in the Design chapter.

6.1.1 The factors

The module generates two sets of results and they are:

1. A list of clusters with keywords and

2. A list of words that appeared in the data and their frequencies.

The factors are selected based on the frequency of each keyword in the lists.

The selected keywords that are related to road curves are: tree, embankment,

gravel, pole, gutter, loss control, wet road, dirt, kangaroo, truck, lost traction

and fog. The types of crashes identified amongst the keywords are collide or collision, hit, leave, slide, spin, skid and roll. The list of keywords is used as contributing factors as well as attributes in the rough set analysis.

6.1.2 Factors validation

The aim of the validation process is to verify that the factors obtained are only

related to crashes on road curves. This process involves the comparison of the

keywords obtained for curve related crashes against the ones for non-curve

related crashes. Figure 6.1 shows a comparison of the factors identified from

curve related crashes and non-curve related crashes.

Figure 6.1: The comparison of the factors identified from both curve and non-

curve related crashes.

The figure shows the list of factors for curve-related and non-curve-related crashes identified from text mining. Factors are listed in each category, while common factors are contained in the intersection area. This comparison verifies and refines the factors identified from the text mining analysis. The factors are considered contributing factors for crashes on road curves when they are unique, which means that the factors do not belong to non-curve-related crashes. The refined contributing factors are tree, lost traction, fog, puddle, loose surface, slippery, over steer, phone and mountain.
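The comparison shown in the figure amounts to a set difference and a set intersection, which can be sketched as follows. The two keyword lists in the usage lines are hypothetical and are not the actual lists obtained in this study.

```python
def compare_factors(curve_factors, non_curve_factors):
    """Separate factors unique to curve-related crashes, common factors,
    and factors unique to non-curve-related crashes."""
    curve, other = set(curve_factors), set(non_curve_factors)
    return {
        "curve_only": sorted(curve - other),
        "common": sorted(curve & other),
        "non_curve_only": sorted(other - curve),
    }


# Hypothetical keyword lists, for illustration only.
comparison = compare_factors(["tree", "fog", "gravel"], ["gravel", "truck"])
```

Only the factors under `curve_only` would be kept as contributing factors for crashes on road curves.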

These contributing factors are used as attributes in the decision table for

rough set analysis. The difference with the normal data table is that it requires

a decision attribute which is usually located as the last attribute in the table.

The results obtained from rough set analysis are presented in the next section.

6.2 Relationships of attributes

One of the main aims of rough set analysis is to extract consistent and optimal decision rules from the decision tables (Bazan, Nguyen, Skowron & Szczuka, 2003). According to Bullard et al. (2007), rules can accurately describe the relationship between attributes.

Rules generated can be lengthy and weak; therefore, the quality or strength of the rules is measured to identify significant or strong rules. Rule quality is evaluated based on support and accuracy, and the rules are classified into different crash severity levels (Aldridge, 2001). Crash severity is assumed to be related to the cost of the crash. Thus, cost is used in the assessment. The cost is clustered and each cluster group has a cost range. Table 6.1 lists the defined cost groups.

Table 6.1: The cost groups.

Level | Label | Cost ($)            | Description
1     | C1    | 0.00 – 2499         | Lowest cost/severity
2     | C2    | 2500 – 16762.46     | Low cost/severity
3     | C3    | 16762.47 – 38606.94 | Medium cost/severity
4     | C4    | 38060.95 – 57076.36 | High cost/severity
5     | C5    | 57076.37 – 77216.36 | Highest cost/severity

The rules of interest are those with high strength. Strength is measured by the support and accuracy (Herbert & Yao, 2005; Wang & Namgung, 2007).
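The labelling of a crash cost with one of the cost groups in Table 6.1 can be sketched as below. The sketch uses only the upper bound printed for each group and assigns a cost to the first group whose upper bound it does not exceed; this treatment of values that fall between the printed bounds is an assumption, as are the function and variable names.

```python
from bisect import bisect_left

# Upper bounds of groups C1 to C5 as printed in Table 6.1.
UPPER_BOUNDS = [2499.0, 16762.46, 38606.94, 57076.36, 77216.36]
LABELS = ["C1", "C2", "C3", "C4", "C5"]


def cost_group(cost):
    """Return the label of the first group whose upper bound is >= cost."""
    if cost < 0 or cost > UPPER_BOUNDS[-1]:
        raise ValueError(f"cost {cost} is outside the tabulated range")
    return LABELS[bisect_left(UPPER_BOUNDS, cost)]
```

Negative costs are rejected here, in line with the removal of records with negative cost values during data preparation.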

6.2.1 Selected rules

Rough set analysis generated a large set of 1253 rules. The rules are filtered on quality with the G2 likelihood algorithm and reduced to 1139 rules. Quality is assessed by the strength of the rule. In Rosetta, the support count is the measure of the strength of the reduct (Ohrn, 2001; Sulaiman, Shamsuddin & Abraham, 2008). The relative strength is computed by dividing the support count by the total attributes and multiplying by 100.
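The relative strength calculation can be sketched in Python as follows. This is an illustration under stated assumptions: the divisor is taken to be the total support count of the rule, and the per-group counts (4, 24, 1, 1) are assumed values chosen to sum to 30 rather than figures taken from the Rosetta output. The helper name `relative_support` is hypothetical.

```python
def relative_support(counts):
    """Convert per-outcome support counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a rule needs at least one supporting record")
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}


# Illustrative counts only (assumed), chosen so that the total is 30.
rule_counts = {"C1": 4, "C2": 24, "C3": 1, "C4": 1}
print(relative_support(rule_counts))
```

With these assumed counts the percentages come out close to those reported for the first rule in Table 6.2.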

Strong rules are rules identified through an appropriate combination of support and accuracy characteristics (Koperski & Han, 1995). The higher the support count, the higher the strength; therefore, only rules with high strength are selected. Rules with low strength are not considered because they predict crash circumstances inaccurately.

The six strongest rules from the rule selection process are shown in Table 6.2.

Table 6.2: The six strongest rules.

#  Time  Veh yr  Driver age  Alcohol  Crash type  Cost grp (outcome)    Rel supt (%)
1  Even  mod     yg          N        Hit         C1 OR C2 OR C3 OR C4  13.3, 80, 3.33, 3.33
2  Even  mod     yg          N        None        C1 OR C2 OR C3        36.8, 57.89, 5.26
3  Aft   mod     m2          N        Hit         C1 OR C2 OR C3        26.67, 60, 13.33
4  Morn  mod     od          N        None        C1 OR C2              42.85, 57.14
5  Even  mod     m2          N        None        C1 OR C2              30.76, 69.23
6  Even  mod     yg          N        Collide     C1 OR C2              23.07, 76.92

Legend:

Note: Refer to Appendix for the classification and definition of the label used in the table.

The Rules columns present the common factors of the rules and the value of each factor. These are followed by the cost groups into which the rule is categorised and the accuracy of each cost group as a percentage. The rules are read with an implicit AND between each factor. As an example, the first rule reads: Time is evening

AND vehicle is manufactured between 1991 and 2000 AND driver is between 17 and 25 years old AND no alcohol consumption AND the type of crash is hitting an object. The outcome of this rule is four possible cost group classifications: C1, C2, C3 or C4. The accuracies of the cost groups are 13.3%, 80%, 3.33% and 3.33% respectively. The total support count is 30 and the highest relative strength is 80%, which belongs to C2. This means that 80% of the records supporting this rule fall in cost group C2.
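Reading a rule as a conjunction of conditions can be sketched in Python as follows. The attribute names (`time`, `veh_yr`, `driver_age`, `alcohol`, `crash_type`, `cost_grp`) and the two sample records are assumptions introduced for illustration; only the condition values of the first rule are taken from Table 6.2.

```python
# Conditions of the first rule in Table 6.2, using the table's own labels.
RULE_1 = {
    "time": "Even",
    "veh_yr": "mod",
    "driver_age": "yg",
    "alcohol": "N",
    "crash_type": "Hit",
}


def matches(rule, record):
    """A record matches when every condition holds (the implicit AND)."""
    return all(record.get(attr) == value for attr, value in rule.items())


# Hypothetical crash records; the field names are assumptions.
records = [
    {"time": "Even", "veh_yr": "mod", "driver_age": "yg",
     "alcohol": "N", "crash_type": "Hit", "cost_grp": "C2"},
    {"time": "Morn", "veh_yr": "mod", "driver_age": "yg",
     "alcohol": "N", "crash_type": "Hit", "cost_grp": "C1"},
]

matched = [r["cost_grp"] for r in records if matches(RULE_1, r)]
print(matched)  # ['C2']
```

The second record fails on the time condition alone, which is enough to exclude it because all conditions must hold together.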

The rules can be filtered into the appropriate severity level using the Pearson quality filter. Tables 6.3, 6.4, 6.5, 6.6 and 6.7 present the rules with the highest relative support for each severity level.

Each row in each table contains a rule, and each column indicates the presence of a contributing factor with Yes (Y) or No (N). The last column indicates the type of crash involved, such as collide, hit or no crash. The rules are read with an implicit AND between contributing factors.

Table 6.3 presents the rules with highest relative support for the lowest

severity level.

Table 6.3: The strongest rules for lowest level.

Rules Outcm

TM Vehyr DrvAge Gender Alc Tree Mt LT Fog Pud LS Slip TK OS LC PH CT

1 Mph mod m2 F N N N N N N N N N N N N None

2 Even new s1 F N N N N N N N N N N N N Collide

3 Morn new s2 F N N N N N N N N N N N N Hit

4 Mph mod s2 M N N N N N N N N N N N N Collide

5 Even new od M N N N N N N N N N N N N Hit

Legend:

Outcm: Outcome; TM: Time; Alc: Alcohol; Mt: Mountain; LT: Lost traction; Pud: Puddle; LS: Loose surface; Slip: Slippery; TK: Truck; OS: Over steer; LC: Lost concentration; PH: Phone; CT: Crash type.

Vehyr: Vehicle manufacture year. DrvAge: Driver’s age. Details of labels in Appendix.

Note: Refer to Appendix for the classification and definition of the labels used in the table.

Table 6.4 shows the top five rules for the low severity level.

Table 6.4: The strongest rules for low level.

Rules Outcm

TM Vehyr DrvAge Gender Alc Tree Mt LT Fog Pud LS Slip TK OS LC PH CT

1 Mph mod m1 M N N N N N N N N N N N N Hit

2 Even mod m2 F N N N N N N N N N N N N Hit

3 Even mod s1 F N N N N N N N N N N N N Collide

4 Morn mod s2 M N N N N N N N N N N N N Hit

5 Night mod s1 F N N N N N N N N N N N N None

Legend:

Outcm: Outcome; TM: Time; Alc: Alcohol; Mt: Mountain; LT: Lost traction; Pud: Puddle; LS: Loose surface; Slip: Slippery; TK: Truck; OS: Over steer; LC: Lost concentration; PH: Phone; CT: Crash type.

Vehyr: Vehicle manufacture year. DrvAge: Driver’s age. Details of labels in Appendix.

Note: Refer to Appendix for the classification and definition of the labels used in the table.

Table 6.5 presents the rules with highest strength for the medium severity

level.

Table 6.5: The strongest rules for medium level.

Rules Outcm

TM Vehyr DrvAge Gender Alc Tree Mt LT Fog Pud LS Slip TK OS LC PH CT

1 Even mod m2 M Y N N N N N N N N N N N Roll

2 Morn new m1 M N N N N N N N N N N N N Roll

3 Aft mod s2 M N N N N N N N N N N N N Roll

4 Eph new m2 M N N N N N N N N N N N N Roll

5 Even new m2 M N N N N N N N N N N N N Roll

Legend:

Outcm: Outcome; TM: Time; Alc: Alcohol; Mt: Mountain; LT: Lost traction; Pud: Puddle; LS: Loose surface; Slip: Slippery; TK: Truck; OS: Over steer; LC: Lost concentration; PH: Phone; CT: Crash type.

Vehyr: Vehicle manufacture year. DrvAge: Driver’s age. Details of labels in Appendix.

Note: Refer to Appendix for the classification and definition of the labels used in the table.

Table 6.6 lists the rules with the highest strength for the high severity level.

Table 6.6: The strongest rules for high level.

Rules Outcm

TM Vehyr DrvAge Gender Alc Tree Mt LT Fog Pud LS Slip TK OS LC PH CT

1 Mph new m2 M Y N N N N N N N N N N N Hit

2 Aft new m2 M N N N N N N N N N N N N Collide

3 Aft new od F N N N N N N N N N N N N Hit

4 Night new m2 M Y N N N N N N N N N N N Roll

5 Night new m2 M Y Y N N N N N N N N N N Hit

Legend:

Outcm: Outcome; TM: Time; Alc: Alcohol; Mt: Mountain; LT: Lost traction; Pud: Puddle; LS: Loose surface; Slip: Slippery; TK: Truck; OS: Over steer; LC: Lost concentration; PH: Phone; CT: Crash type.

Vehyr: Vehicle manufacture year. DrvAge: Driver’s age. Details of labels in Appendix.

Note: Refer to Appendix for the classification and definition of the labels used in the table.

Table 6.7 lists the rules for the highest severity level.

Table 6.7: The strongest rules for highest level.

Rules Outcm

TM Vehyr DrvAge Gender Alc Tree Mt LT Fog Pud LS Slip TK OS LC PH CT

1 Morn mod yg M Y Y N N N N N N N N N N Collide

2 Aft new s2 F N N N N N N N N N N N N Roll

Legend:

Outcm: Outcome; TM: Time; Alc: Alcohol; Mt: Mountain; LT: Lost traction; Pud: Puddle; LS: Loose surface; Slip: Slippery; TK: Truck; OS: Over steer; LC: Lost concentration; PH: Phone; CT: Crash type.

Vehyr: Vehicle manufacture year. DrvAge: Driver’s age. Details of labels in Appendix.

Note: Refer to Appendix for the classification and definition of the labels used in the table.

The traffic simulator is not able to simulate all of the factors listed in Table 6.2; thus, another set of rules is generated for the simulator. The factors for the decision table are selected on the basis that they are suitable for the simulator. Table 6.8 shows the rules selected for the simulator.

Table 6.8: The strongest rules for simulation.

Rules Outcome

Time Veh yr Driver age Alcohol Wet Gravel Kangaroo Gutter Crash type

1 Even mod yg N N N N N Hit OR Collision OR Roll
2 Aft mod od N Y N N N Skid OR None
3 Eph mod m2 N N Y N N Hit OR None
4 Even mod yg N N N Y N Hit OR None
5 Even old yg N N N Y N Collide OR None
6 Even mod s2 N Y N N Y None

Legend:

Note: Refer to Appendix for the classification and definition of the label used in the table.

Each row in Table 6.8 contains a rule, and each rule comprises factors that are used to configure the parameters in the simulator. The last column lists the crash types expected to occur with the stated factors. The possible crash types are hit, collision and off road (skid or roll). The rules are used in the traffic simulator for validation, which is discussed in the next section.

6.3 Rule validation

This section discusses the validation of the accuracy of the rules obtained. The rules are verified with both the simulator and an accuracy measurement because of the limited amount of information the simulator can accept as input. The validation is based on the classification accuracy of the rules obtained.

6.3.1 Validation with a traffic simulator

This section presents the validation results using a traffic simulator. Test cases

are designed and used to verify the rules obtained from rough set analysis.

Each test case consists of the aim of the test, the inputs to the simulator, the expected outcome and the actual outcome. There are ten test cases, and each verifies a rule based on the number of crashes of the specified types at the end of the simulations.

The results obtained from the rough set analysis are considered to be accurate when the number of crashes is within the defined threshold and meets the defined hypothesis. Table 6.9 presents the expected results for each test case and Table 6.10 presents the actual results obtained when each test case was run in the traffic simulator.

Table 6.9: Test case results - Expected output.

#   ExpOp                        Crash Accu.* (%)
1   Hit or None                  56, 44
2   Skid or None                 25, 75
3   Hit or None                  33.33, 66.67
4   Hit or None                  66.67, 33.33
5   Collide or None              50, 50
6   None                         100
7   Collide or Hit               66.67, 33.33
8   Hit or None                  50, 50
9   Collide or Hit or None       44.44, 33.33, 22.22
10  Hit, Collide, Skid or None   18.18, 36.36, 9.09, 36.36

*Crash Accu = accuracy of percent of crashes.
ConV = Considered valid.
ExpOp = Expected output.

Table 6.10: Test case results - Actual output.

#   ActOp              Crash Count (%)   Status
1   Off road or None   92.56, 7.43       ConV
2   Off road or None   18.18, 81.81      Success
3   Off road or None   16.11, 83.88      Success
4   Off road or None   60.33, 39.66      Success
5   Off road or None   83.05, 16.94      ConV
6   Off road or None   14.46, 85.53      Fail
7   Hit                100               ConV
8   Off road or None   62.40, 37.60      ConV
9   Off road or None   36.36, 63.63      ConV
10  Off road or None   97.52, 2.48       Fail

*Crash Accu = accuracy of percent of crashes.
ConV = Considered valid.
ActOp = Actual output.

The first column of Table 6.9 presents the test case number, the second column lists the expected crash types and the third gives the estimated accuracy of obtaining each crash type. The first column of Table 6.10 presents the test case number, followed by the actual crash types obtained and the percentage of crashes of each type. The last column states whether the test case is considered successful, with a Success, ConV (considered valid) or Fail label.

6.3.2 Accuracy measurement validation

Table 6.11 shows the statistical information obtained from the accuracy validation. The table consists of the number of objects used, the accuracy and coverage for each cost group, and the overall information at the end of the table.

Table 6.11: The statistical information from accuracy measurement.

Predicted

Actual

Costgrp No of obj Accuracy Coverage

Vlow 933 0.601 0.546

Low 1634 0.674 0.558

Med 153 0.437 0.474

High 25 0.286 0.28

VHigh 0 0 0

Total number of objects 2747

Total accuracy 0.636

Total coverage 0.545

6.4 Identify significant factors

This section presents the results of the process that identifies the significant contributing factors. The aim of the process is to identify the significant contributing factors that affect the severity risk of crashes on road curves. The identification process is based on the rules obtained from the rough set analysis process.

6.4.1 The factors

The factors are evaluated with ClassifierSubsetEval using the Best first search method. The search runs in the forward direction and the total number of subsets evaluated is 128. Further evaluation produces a list of selected factors.

The evaluation of the attributes of the input file returned six significant factors: time, vehicle age, driver age, tree, puddle and crash type.
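The idea of a forward subset search can be illustrated with a short Python sketch. This is a simplified greedy variant written for illustration only: it is not the Best first implementation used in the analysis, which can also backtrack to earlier subsets, and the scoring function `toy_score` with its weights is entirely hypothetical. In practice the score would come from evaluating a classifier on each candidate subset.

```python
def forward_select(attributes, score):
    """Greedy forward search: add the attribute that most improves score."""
    selected = []
    best = score(selected)
    evaluated = 1  # the empty subset counts as one evaluation
    improved = True
    while improved:
        improved = False
        candidate = None
        for attr in attributes:
            if attr in selected:
                continue
            value = score(selected + [attr])
            evaluated += 1
            if value > best:
                best, candidate, improved = value, attr, True
        if improved:
            selected.append(candidate)
    return selected, best, evaluated


# Toy scorer (assumption): rewards two useful attributes, penalises size.
USEFUL = {"time": 0.3, "crash_type": 0.4}


def toy_score(subset):
    return sum(USEFUL.get(a, 0.0) for a in subset) - 0.01 * len(subset)


subset, merit, n = forward_select(["time", "gender", "crash_type"], toy_score)
print(subset, round(merit, 2), n)
```

The search stops as soon as no single additional attribute improves the score, and the counter shows how many subsets were evaluated along the way.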

6.5 Understanding crash severity

This section presents the rules obtained from the significant factors using rough set analysis. The purpose of this process is to determine the minimum combination of contributing factors that can determine the severity level. The analysis generated 460 rules, which are classified into cost groups and severity levels. The details are explained in the following sections.

6.5.1 The rules

This set of rules is generated from the significant factors identified in the previous process. The rules are generated to understand the relationships between factors using a minimum number of factors. The rules are sorted in ascending order of support count. In addition, quality filtering is performed on the rule set to ensure that the rules are refined. Table 6.12 shows the five strongest rules from the rule set.

Table 6.12: The strongest rules generated based on the significant factors.

#  Time  Veh yr  Driver age  Tree  Puddle  Crash type  Cost group (outcome)  Rel supt (%)
1  Aft   mod     m2          N     N       Hit         C1 OR C2 OR C3        18.18, 72.72, 9.09
2  Even  mod     yg          N     N       Collide     C1 OR C2              22.72, 77.27
3  Even  mod     s1          N     N       Hit         C1 OR C2 OR C3        22.72, 72.72, 4.54
4  Aft   mod     od          N     N       Hit         C1 OR C2              42.85, 57.14
5  Mph   mod     s1          N     N       None        C1 OR C2 OR C3        52.38, 42.85, 4.76

Legend:

Note: Refer to Appendix B for the classification and definition of the label used in the table.

Each row represents a rule and is read across the columns with an implicit AND between the factors of the rule. Each rule may have more than one cost group. As an example, the first rule reads: Time is afternoon AND vehicle is manufactured between 1991 and 2000 AND driver age is between 30 and 39 years old AND no tree involved AND no puddle AND crash type is hit object. The outcome is that the rule is classified into one of the cost groups C1, C2 or C3. The relative supports of the cost groups are 18.18%, 72.72% and 9.09% respectively.

The rules are further filtered into their respective cost groups. Tables 6.13, 6.14, 6.15, 6.16 and 6.17 list the strongest rules for each severity level. Each row represents a rule, and the rules are read with an implicit AND between factors.

Table 6.13: The strongest rules generated based on the significant factors for the lowest severity level.

Rules
Time Veh yr Driver age Tree Puddle Crash type
1 Eph new od N N None
2 Night mod od N N Hit
3 Even new s1 N N Collide
4 Eph new od N N Hit
5 Even new od N N Hit

Legend:

Note: Refer to Appendix B for the classification and definition of the label used in the table.

Table 6.14 presents the rules for low severity level.

Table 6.14: The strongest rules generated based on the significant factors for the low severity level.

Rules
Time Veh yr Driver age Tree Puddle Crash type
1 Eph new m1 Y N Hit
2 Night new od N N Collide
3 Even new od N N Hit
4 Eph new m1 Y N Collide
5 Night new yg N N Roll

Legend:

Note: Refer to Appendix B for the classification and definition of the label used in the table.

Table 6.15 presents the rules for medium severity level.

Table 6.15: The strongest rules generated based on the significant factors for the medium severity level.

Rules
Time Veh yr Driver age Tree Puddle Crash type
1 Morn new s2 Y N Collide
2 Eph new m2 Y N None
3 Night new s1 N N Roll
4 Aft new m2 N N Roll
5 Morn new yg Y N Hit

Legend:

Note: Refer to Appendix B for the classification and definition of the label used in the table.

Table 6.16 lists the rules for high severity level.

Table 6.16: The strongest rules generated based on the significant factors for the high severity level.

Rules
Time Veh yr Driver age Tree Puddle Crash type
1 Eph mod s1 Y N Roll
2 Mph new od N N Hit

Legend:

Note: Refer to Appendix B for the classification and definition of the label used in the table.

Table 6.17 presents the rules for the highest severity level.

Table 6.17: The strongest rules generated based on the significant factors for the highest severity level.

Rules

Time Veh yr Driver age Tree Puddle Crash type

1 Aft new s2 N N Roll

2 Morn mod yg Y N Collide

Legend:

Note: Refer to Appendix B for the classification and definition of the label used in the table.

6.6 Summary

This chapter presents the results obtained from each process of the approach. The overall aim is to identify significant contributing factors, their dependencies and the decision rules. Data mining techniques such as text mining and rough set analysis are employed to obtain significant contributing factors. The data is ‘cleaned’ and prepared before rough set analysis is implemented.

A text mining technique is used to identify contributing factors from the incident descriptions in each crash record. The new contributing factors for road curves are tree, embankment, gravel, pole, gutter, loss of control, wet road, dirt, kangaroo, truck, lost traction and fog. The contributing factors are later used as attributes for the decision table.

Rough set analysis produced a set of rules which are classified into different

crash severity risk levels. The rules determine the dependency or relationships

between contributing factors. The rules are selected based on the strength

which is represented as the support count from the statistical information of

the rules. The strength of the rule is measured to avoid using rules blindly

and also to indicate the significant attributes or contributing factors.

The rules obtained are validated based on their accuracy. The significant

contributing factors obtained are time, vehicle age, driver age, tree, puddle and

crash type. Rules are generated based on the significant contributing factors

and used to observe the effect on crash severity.

The results from the processes are presented in this chapter, while the outcomes are analysed and discussed in the next chapter.

CHAPTER 7

Analysis and Discussion

Chapter Overview

This chapter continues with the analysis of the results presented in the previous chapter, followed by a review of the research findings and whether they have adequately addressed the research questions.

7.1 Analysis of results

This section discusses the interpretation of the results obtained in each process

of the approach. The flow of the analysis follows the approach processes.

7.1.1 Factors from past crash records

A text mining technique is used to analyse crash records to discover the contributing factors for a crash. The factors identified are presented in the Results chapter. To ensure that the factors are specific to curve-related crashes, they are compared with factors identified from crashes that are not curve related. A factor is considered related to road curve crashes only when it does not appear in the list for non-curve-related crashes. The contributing factors for curve-related crashes are trees, loss of traction, foggy conditions, puddle, loose surface, slippery surface, over steer, phone and mountain.
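The comparison described above amounts to a set difference, which can be sketched in Python as follows. The function name `curve_specific` and the input lists are assumptions for illustration; in particular, `speed` and `fatigue` are made-up entries and are not factors reported in this study.

```python
def curve_specific(curve_factors, non_curve_factors):
    """Keep factors found in curve crashes but absent from non-curve ones."""
    return sorted(set(curve_factors) - set(non_curve_factors))


# Illustrative inputs only (assumed); the real lists come from text mining.
curve = ["tree", "fog", "puddle", "speed"]
non_curve = ["speed", "fatigue"]
print(curve_specific(curve, non_curve))  # ['fog', 'puddle', 'tree']
```

A factor shared by both lists, such as the made-up `speed` entry here, is dropped because it does not distinguish curve-related crashes.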

7.1.2 Relationships of contributing factors

This section interprets the strongest rules overall and the rules for each cost group.

Rough set analysis produces a set of rules that determine the dependency or relationship between the contributing factors. The rules are selected based on strength; high-strength rules are selected because strength affects prediction accuracy.

The overall view covers the six rules with the highest support count. The next view presents the top five rules for each severity level, in order to obtain a clearer pattern for each severity level.

7.1.2.1 Overall view rule analysis

The strongest rule relates crash cost to the following combination of contributing factors: a driver aged between 17 and 25 years, in a vehicle manufactured between 1991 and 2000, driving between 7 pm and 12 am, with no alcohol consumption, who is involved in a fixed object collision. The possible cost classifications for this rule are lowest, low, medium and high. The low cost group has the highest relative support, at 80%. The relative support is relative to the total support count, which is 30 for this rule.

The second strongest rule corresponds to a driver between 17 and 25 years old, in a vehicle manufactured between 1991 and 2000, driving between 7 pm and 12 am, with no alcohol consumption, who is not involved in any crash. The total support count is 19 and the highest relative support is 57.89%, which belongs to the low cost group.

The third strongest rule states that a driver between 30 and 39 years old, in a vehicle manufactured between 1991 and 2000, driving between 12 pm and 4 pm, with no alcohol consumption, is involved in a fixed object collision. The total support count is 15 and the highest relative support is 60%, which belongs to the low cost group.

The fourth rule states that a driver between 60 and 100 years old, in a vehicle manufactured between 1991 and 2000, driving between 9 am and 12 pm, with no alcohol consumption, is not involved in any crash. The total support count is 14 and the highest relative support is 57.14%, which belongs to the low cost group.

The fifth rule has the combination of a driver between 30 and 39 years old, in a vehicle manufactured between 1991 and 2000, driving between 7 pm and 12 am, with no alcohol consumption, who is not involved in any crash. The total support count is 13 and the highest relative support is 69.23%, which belongs to the low cost group.

The last rule listed in the table has the combination of a driver between 17 and 25 years old, in a vehicle manufactured between 1991 and 2000, driving between 7 pm and 12 am, with no alcohol consumption, who is involved in a collision. The total support count is 13, with a highest relative support of 76.92%, which belongs to the low cost group.

The general observations of this set of rules are:

• Most of the rules have vehicles manufactured between 1991 and 2000

and that could be in relation to when the data was collected. The data

was collected between 2003 and 2006, which means the vehicles were

manufactured before 2003.

• Age groups of 17 to 25 and 30 to 39 years old have a higher count.

• Most drivers who are 17 to 25 years old are involved in a crash between 7 pm and 12 am.

• The low cost group is found to be most common.

• The most common time for car crashes is between 7 pm and 12 am.

This overall view does not provide complete information on the patterns in the data. Therefore, more rules are used to determine a detailed pattern for each severity level.

7.1.2.2 Lowest severity rule analysis

Five rules are selected based on the support count. Rules with higher support

count are selected for analysis.

The first rule describes a female driver between 30 and 39 years old, in a vehicle manufactured between 1991 and 2000, driving between 6 am and 9 am, with no alcohol consumption and no other related factors present. The driver is not involved in any crash. The total support count is 4.

The second rule describes a female driver between 40 and 49 years old, in a vehicle manufactured between 2001 and 2005, driving between 7 pm and 12 am, with no alcohol consumption and no other related factors present. The driver is involved in a collision. The total support count is 4.

The third rule has the combination of a female driver between 50 and 59 years old, in a vehicle manufactured between 2001 and 2005, driving between 9 am and 12 pm, with no alcohol consumption and no other related factors present. The driver is involved in a fixed object collision. The total support count is 3.

The fourth rule has the combination of a male driver between 50 and 59 years old, in a vehicle manufactured between 1991 and 2000, driving between 6 am and 9 am, with no alcohol consumption and no other related factors present. The driver is involved in a collision. The total support count is 3.

The fifth rule describes a male driver between 60 and 100 years old, in a vehicle manufactured between 2001 and 2005, driving between 7 pm and 12 am, with no alcohol consumption and no other related factors present. The driver is involved in a fixed object collision. The total support count is 3.

The observations for the lowest set of rules are:

• The age groups range from mature to older age groups (30 to 39, 40 to 49, 50 to 59, and 60 to 100 years old).

• The vehicles are mostly manufactured between 1991 and 2005 and this

is approximately 1 to 15 years old. Most vehicles are manufactured

between 1991 and 2000 as the data was collected between 2003 and 2006.

Therefore, the vehicles are registered before or at the time the data was

collected.

• Most vehicles involved in a crash were manufactured between 2001 and

2005. Considering these new vehicles are being driven by mature age

drivers, the crash cost is the lowest. This could be due to mature drivers

driving at a slower speed therefore damages to vehicles are not as serious

compared to high speed crashes. No alcohol consumption is evident

therefore no impairment is present to increase the crash severity.

• Both male and female drivers are involved, with a majority of female drivers involved in a crash.

• No other factors are present except for the time of crash, year the vehicle

is manufactured and the driver age.

7.1.2.3 Low severity rule analysis

Rules with a higher support count are selected for analysis, and the five rules selected are listed below.

The first rule describes a male driver between 26 and 29 years old, in a vehicle manufactured between 1991 and 2000, driving between 6 am and 9 am, with no alcohol consumption and no other related factors present. The driver is involved in a fixed object collision. The total support count is 8.

The second rule describes a female driver between 30 and 39 years old, in a vehicle manufactured between 1991 and 2000, driving between 7 pm and 12 am, with no alcohol consumption and no other related factors present. The driver is involved in a fixed object collision. The total support count is 8.

The third rule has the combination of a female driver between 40 and 49 years old, in a vehicle manufactured between 1991 and 2000, driving between 7 pm and 12 am, with no alcohol consumption and no other related factors present. The driver is involved in a collision. The total support count is 7.

The fourth rule has the combination of a male driver between 50 and 59 years old, in a vehicle manufactured between 1991 and 2000, driving between 9 am and 12 pm, with no alcohol consumption and no other related factors present. The driver is involved in a fixed object collision. The total support count is 7.

The fifth rule describes a female driver between 40 and 49 years old, in a vehicle manufactured between 1991 and 2000, driving between 12 am and 6 am, with no alcohol consumption and no other related factors present. The driver is not involved in a crash. The total support count is 6.

Observations for the low set of rules are:

• The drivers are mostly mature drivers. The age ranges from mature to

older age group (26 to 29, 30 to 39, 40 to 49, and 50 to 59 years old).

• The vehicles are manufactured between 1991 and 2005, making them approximately 1 to 15 years old. This is because the data was collected between 2003 and 2006.

• More vehicles are involved in a crash compared to the lowest set of rules.

• Most crashes occur in the later time of day such as evening and night

Page 206: MINING PATTERNS AND FACTORS …MINING PATTERNS AND FACTORS CONTRIBUTING TO CRASH SEVERITY ON ROAD CURVES Shin Huey Chen BCompSc(Hons), MCS This report is submitted as partial fulfilment

7.1. ANALYSIS OF RESULTS 175

time. Poor light affects a driver’s vision and can result in serious mis-

judgement errors in driving.

• Both male and females drivers are involved with a majority of female

drivers involved in a crash.

• No other factors are present except for the time of crash, year the vehicle

is manufactured and the driver age.
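The rules in this chapter are expressed over discretised attributes (age group, vehicle manufacture period and time of day). The sketch below illustrates one way such binning could be done. The bin edges are the ranges that appear in the rules; the field names, the treatment of bin boundaries and the handling of out-of-range values are assumptions made for illustration, not the procedure used in this study.

```python
# Bins taken from the ranges that appear in the rules; treating the bounds
# as inclusive (ages, years) or half-open (hours) is an assumption.
AGE_BINS = [(17, 25), (26, 29), (30, 39), (40, 49), (50, 59), (60, 100)]
YEAR_BINS = [(1991, 2000), (2001, 2005)]
TIME_BINS = [
    (0, 6, "12am-6am"),
    (6, 9, "6am-9am"),
    (9, 12, "9am-12pm"),
    (12, 16, "12pm-4pm"),
    (16, 19, "4pm-7pm"),
    (19, 24, "7pm-12am"),
]


def bin_range(value, bins):
    """Return the 'low-high' label of the inclusive bin containing value, or None."""
    for low, high in bins:
        if low <= value <= high:
            return f"{low}-{high}"
    return None


def bin_hour(hour):
    """Return the time-of-day label for an hour on the 24-hour clock (0-23)."""
    for start, end, label in TIME_BINS:
        if start <= hour < end:
            return label
    raise ValueError(f"hour out of range: {hour}")


def discretise(record):
    """Map a raw crash record (hypothetical field names) to rule attributes."""
    return {
        "age": bin_range(record["age"], AGE_BINS),
        "vehicle_year": bin_range(record["vehicle_year"], YEAR_BINS),
        "time": bin_hour(record["hour"]),
    }
```

For example, a 34-year-old driver in a 1998 vehicle at 8 pm would map to the attribute values 30-39, 1991-2000 and 7pm-12am.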

7.1.2.4 Medium severity rule analysis

Rules with a higher support count are selected for analysis, and the five rules selected are listed below.

The first rule listed states that a male driver between 30 and 39 years old, in a vehicle manufactured between 1991 and 2000, driving between 7 pm and 12 am, had consumed alcohol and no other related factors are present. The driver is involved in a rollover type of crash. The total support count is 2.

The second rule states that a male driver between 26 and 29 years old, in a vehicle manufactured between 2001 and 2005, driving between 9 am and 12 pm, had no alcohol consumption and no other related factors are present. The driver is involved in a rollover type of crash. The total support count is 2.

The third rule combines a male driver between 50 and 59 years old, in a vehicle manufactured between 1991 and 2000, driving between 12 pm and 4 pm, with no alcohol consumption and no other related factors present. The driver is involved in a rollover type of crash. The total support count is 2.

The next rule combines a male driver between 30 and 39 years old, in a vehicle manufactured between 2001 and 2005, driving between 4 pm and 7 pm, with no alcohol consumption and no other related factors present. The driver is involved in a rollover type of crash. The total support count is 1.

The fifth rule on the list states that a male driver between 30 and 39 years old, in a vehicle manufactured between 2001 and 2005, driving between 7 pm and 12 am, had no alcohol consumption and no other related factors are present. The driver is involved in a rollover type of crash. The total support count is 1.

Observations for the medium set of rules are:

• The drivers are mostly mature drivers aged between 30 and 39 years old.

• Most crashes occur later in the day, in the evening and at night. Poor light affects a driver’s vision and can result in serious misjudgement errors in driving.

• Most drivers are male.

• Newer vehicles are involved in crashes.

• Most vehicles are involved in rollover crashes, which indicates that the vehicle went off the road. Possible causes are speeding or misjudgement of the curvature of the road due to poor vision or alcohol consumption.

• No other factors are present except for the time of crash, the year the vehicle was manufactured and the driver’s age. Only the first rule involves alcohol consumption.

7.1.2.5 High severity rule analysis

Rules with a higher support count are selected for analysis, and the five rules selected are listed below.

The first rule listed states that a male driver between 30 and 39 years old, in a vehicle manufactured between 2001 and 2005, driving between 6 am and 9 am, had consumed alcohol and no other related factors are present. The driver is involved in a fixed object collision. The total support count is 1.

The second rule states that a male driver between 30 and 39 years old, in a vehicle manufactured between 2001 and 2005, driving between 12 pm and 4 pm, had no alcohol consumption and no other related factors are present. The driver is involved in a collision. The total support count is 1.

The third rule combines a female driver between 60 and 100 years old, in a vehicle manufactured between 2001 and 2005, driving between 12 pm and 4 pm, with no alcohol consumption and no other related factors present. The driver is involved in a fixed object collision. The total support count is 1.

The fourth rule combines a male driver between 30 and 39 years old, in a vehicle manufactured between 2001 and 2005, driving between 12am and 6am, with no alcohol consumption and no other related factors present. The driver is involved in a rollover type of crash. The total support count is 1.

The fifth rule on the list states that a male driver between 30 and 39 years old, in a vehicle manufactured between 2001 and 2005, driving between 12am and 6am, had no alcohol consumption and hit a tree, and no other related factors are present. The driver is involved in a fixed object collision. The total support count is 1.

In general, the observations for the high set of rules are:

• The drivers are mostly mature drivers in two age groups: 30 to 39 and 60 to 100 years old.

• Most crashes occur later in the day, in the afternoon and at night. During the night, poor light affects a driver’s vision and can result in serious misjudgement errors in driving.

• Most drivers are male.

• Vehicles involved in the crashes are newer, and they cost more due to the cost of repairs and insurance involved.

• Crashes involving hitting fixed objects are the most common crash type. The fifth rule states that there is presence of alcohol and a tree. The presence of alcohol can impair a driver’s reaction time. Misjudgement of the curvature of the road and overestimating the suitable speed to negotiate the curve safely may have led to the crash.

7.1.2.6 Highest severity rule analysis

Rules with a higher support count are selected for analysis. Because the data set for this level is small, only two rules are provided below.

The first rule listed states that a male driver between 17 and 25 years old, in a vehicle manufactured between 1991 and 2000, driving between 9 am and 12 pm, had consumed alcohol and collided with a tree, and no other related factors are present. The driver is involved in a collision. The total support count is 1.

The second rule states that a female driver between 50 and 59 years old, in a vehicle manufactured between 2001 and 2005, driving between 12 pm and 4 pm, had no alcohol consumption and no other related factors are present. The driver is involved in a rollover type of crash. The total support count is 1.

The observations for the highest set of rules are:

• The drivers fall into two age groups: 17 to 25 years old and 50 to 59 years old.

• Most crashes occur during the day, in the morning and afternoon. Visibility is not an issue; however, sun glare could affect a driver’s vision.

• Both male and female drivers are involved.

• A combination of new vehicles and older vehicles between 10 and 15 years old was involved.

• Alcohol consumption and environmental contributing factors are present in the top rule for this level. This indicates that the vehicle went off road and collided with a fixed object.

Both of these contributing factors, which are also listed in the high set of rules, may have increased the severity level of the crash. Further investigation shows that the combinations in these rules are similar to the others; the one point of difference is the age of the driver. Young drivers appear only in this set of rules. Because young drivers tend to be more inexperienced and reckless in their driving, a combination of high speed and judgement error has increased the crash cost and severity.

From the discussion of each severity level, the overall or common patterns discovered are:

• Male drivers tend to be involved in crashes that incur a higher cost range.

• Most female drivers involved in a crash are mature-aged drivers (30 to 39, 40 to 49, 50 to 59 years old).

• Most crashes occur in low-visibility conditions, which affect a driver’s vision and can lead to misjudgement of the road and travel speed.

• Rollover crashes involving mature-aged drivers occur mainly at the medium severity level, with one at the highest severity level.

• Young drivers are involved in crashes that incur the highest cost and severity. This demonstrates that young drivers face a higher crash severity than drivers in other age groups.

• Alcohol affects control and judgement when driving and prolongs reaction time. The most common type of crash is a collision involving fixed objects or animals, which may result in serious damage to the vehicle.

• New vehicles can be costly to repair when they are damaged in a crash. Their body structures are designed as major sections rather than minor individual parts. This design means that a whole major section of a car has to be replaced even when only a minor area is damaged. For example, if a new car is rear-ended, the whole back section of the body has to be replaced, because the vehicle is built to crumple on impact so as to prevent injury to the occupants. This incurs high repair costs.

• The severity of a crash increases when driving a new vehicle because drivers are unfamiliar with the vehicle. The most common age groups of drivers involved are 30 to 39 years old and 50 to 59 years old.

• The most common time for a crash at the lowest and low severity levels is in the evening hours. For the medium, high and highest levels, the most common time is the afternoon or evening hours.

A comparison of the results obtained for each severity level with the analysis of the top five rules across all severity levels shows that the results differ in some factors. For example, the most common time for a crash in the top five rules is in the morning hours, whereas the most common time for a crash for each severity level is in the evening. A detailed analysis of the results from each level provides more information than a general overview of the rules, which is why the difference in results is revealed.

7.1.3 Rule validation

This section discusses the validation of the accuracy of the rules obtained. The rules are verified with the simulator and are also measured for accuracy, because of the limited amount of information the simulator can process. The validation is based on the classification accuracy of the rules obtained.

7.1.3.1 Dynamic validation

Ten test cases are simulated with the traffic simulator, and each result has one of three statuses: fail, success or considered valid. A test case is considered a ‘success’ when both the types of crashes and the percentages of crashes are similar. A test case is considered a ‘fail’ when there are differences in the crash outcomes. A test case is ‘considered valid’ when the types of crashes are similar but there is some difference in the percentage of crashes.
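The three statuses can be illustrated with a short sketch. It assumes that the simulator's "off road" label is treated as the counterpart of an expected "object collision" (as in the discussion of test cases 1 and 4) and that "similar" percentages means a relative difference of at most 10%. Both the tolerance and the function names are assumptions made for illustration, not the procedure used in this study.

```python
# The simulator's "off road" label is treated as the counterpart of an
# expected "object collision" (an assumption based on test cases 1 and 4).
EQUIVALENT_LABELS = {"off road": "object collision"}


def normalise(crash_types):
    """Map equivalent crash-type labels onto one name."""
    return {EQUIVALENT_LABELS.get(t, t) for t in crash_types}


def classify_test_case(expected_types, actual_types,
                       expected_pct, actual_pct, tolerance=0.10):
    """Return 'success', 'considered valid' or 'fail' for one test case."""
    # Different crash outcomes: the test case fails.
    if normalise(expected_types) != normalise(actual_types):
        return "fail"
    # Avoid dividing by zero when no crash percentage is expected.
    if expected_pct == 0:
        return "success" if actual_pct == 0 else "considered valid"
    # Same crash types: decide on the relative difference in percentage.
    relative_diff = abs(expected_pct - actual_pct) / expected_pct
    return "success" if relative_diff <= tolerance else "considered valid"
```

With the figures reported for test case 4 (66.67% expected object collision, 60.33% simulated off road crashes), this sketch returns 'success', because the relative difference is about 9.5%.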

Test cases 1, 5, 7, 8 and 9 are considered valid because of the difference in the percentage of crashes. For example, the expected crash outcome for test case 1 is object collision or no crash. The actual crash outcome is going off road or no crash. This is considered valid because hitting an object occurs either when the vehicle travels off the road and collides with objects on the roadside, or when the vehicle hits an animal or object on the road. The simulator represents these two possible scenarios with two different terms, off road crash and object collision. However, the percentage of the crash count is different, so this test case is only considered valid.

Test cases 6 and 10 failed because the actual outcome differed from the expected outcome. Test case 6 failed due to an additional off road crash outcome in the actual simulation. In test case 10, a car skidding crash was missing from the actual outcome. Therefore, both test cases failed.

Test cases 2, 3 and 4 are successful due to similar outcomes. For example, test case 4 expected a 66.67% chance of object collision, and the simulator produced a 60.33% chance of off road crashes. The two figures differ by about 6.3 percentage points, which is within 10% in relative terms, hence this test case is considered successful.

For test case 5, the results do not indicate a collision because the simulator does not have the ability to simulate this type of crash. The closest scenario produced by the simulator was an off road crash. This implies that the simulator is able to generate the expected outcome but in a different context.

In test case 10, the results generated indicate no collision, skidding or spinning; however, an off road crash did occur. For this test case, the simulator is not able to reflect an accurate result because (1) the simulator is not able to simulate a spin and (2) out of the three valid crashes, the simulator is only able to generate one type of crash.

The overall accuracy of the rules is based on the number of successful and failed test cases. In general, the simulation results indicate that 80% of the rules from the rough set analysis are similar to the results obtained from the simulator.
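The 80% figure can be reproduced from the test case statuses reported above, under the assumption that both 'success' and 'considered valid' cases count as agreeing with the simulator and only 'fail' cases count against it. The sketch below shows that calculation; the function name is illustrative.

```python
# Statuses of the ten test cases as reported in the dynamic validation.
STATUSES = {
    1: "considered valid", 2: "success", 3: "success", 4: "success",
    5: "considered valid", 6: "fail", 7: "considered valid",
    8: "considered valid", 9: "considered valid", 10: "fail",
}


def overall_accuracy(statuses):
    """Share of test cases that did not fail (assumed definition)."""
    agreeing = sum(1 for status in statuses.values() if status != "fail")
    return agreeing / len(statuses)


print(f"{overall_accuracy(STATUSES):.0%}")
```

Eight of the ten test cases did not fail, which gives the reported 80%.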

7.1.3.2 Accuracy measurement validation

The criterion for this validation uses the statistical information collected during the analysis process, such as the accuracy and coverage. The classification power can be determined from the classification accuracy observed. The accuracy is compared with a defined threshold and accepted when the difference is within the allowance. The accuracy threshold defined is 80% with an allowance of ±10%.

The zero values for the very high cost group are due to the limited number of records classified in this group. The data are divided randomly into 60% and 40%, and coincidentally there are no records for this group in the validation data set.

The rules generated from the analysis data set are applied to the validation data to determine the classification accuracy. The new data set contains records for all cost groups and the accuracy obtained from the new data set has improved. The classification accuracy obtained is 63.3% with a 54.5% coverage. This is acceptable as the accuracy is within the threshold defined, 70% ± 10%.
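The acceptance check described in this paragraph amounts to testing whether the observed accuracy lies within an allowance of a target value. A minimal sketch, using the figures quoted in the paragraph above (an accuracy of 63.3% against 70% ± 10%), is given below; the function name is an assumption made for illustration.

```python
def within_threshold(accuracy, target, allowance):
    """True when accuracy lies in [target - allowance, target + allowance]."""
    return abs(accuracy - target) <= allowance


# Figures quoted in the paragraph above, all in percent.
print(within_threshold(63.3, target=70.0, allowance=10.0))
```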

The accuracy measurement is lower than the traffic simulator validation by 16.7 percentage points. One possible reason is that the data used for validation is a random 40% and may not contain data related to certain crash severity levels. When no data is available for a severity level, the average accuracy value decreases, which results in the lower accuracy.
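The effect of an empty severity level on the average can be illustrated as follows. The per-level counts are hypothetical and chosen only to show the mechanism; they are not results from this study. The sketch compares an average that scores an empty level as zero with one that skips it.

```python
def mean_accuracy(per_level, skip_empty=False):
    """Average per-level accuracy; per_level maps level -> (correct, total)."""
    values = []
    for correct, total in per_level.values():
        if total == 0:
            if skip_empty:
                continue
            values.append(0.0)  # an empty level is scored as zero accuracy
        else:
            values.append(correct / total)
    return sum(values) / len(values) if values else 0.0


# Hypothetical counts: the 'highest' level has no validation records.
per_level = {
    "lowest": (8, 10), "low": (6, 10), "medium": (7, 10),
    "high": (5, 10), "highest": (0, 0),
}
print(round(mean_accuracy(per_level), 2))                   # empty level counted as zero
print(round(mean_accuracy(per_level, skip_empty=True), 2))  # empty level skipped
```

In this hypothetical case the average drops from 0.65 to 0.52 when the empty level is counted as zero.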

7.1.4 Identify Significant factors

The significance of a contributing factor is measured by its presence in the derived rules (Wong & Chung, 2007). The contributing factors are a result of the text mining process. The data is combined for meaningful, detailed and complete results. The total number of contributing factors listed in the decision table is 17 attributes.

The 17 attributes are represented as columns across the table and are composed of: gender, age, driving experience, manufacture year of vehicle, alcohol level, time of incident, tree, mountain, loss of traction, foggy conditions, puddle, loose surface, slippery surface, truck, over steer, concentration, phone and type of crash.

The presence of a contributing factor is calculated as the percentage of its presence in each rule, divided by the total percentage of the selected attributes for each crash severity level. On this basis, the number of factors is reduced to six.
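One possible reading of this calculation is sketched below: for a given severity level, the presence of each attribute is the fraction of rules in which it appears, and this is then divided by the sum of the presence values over all selected attributes. This interpretation, the function name and the example rules are assumptions made for illustration; the exact formula used in this study may differ.

```python
from collections import Counter


def factor_shares(rules, attributes):
    """Normalised presence of each attribute across the rules of one level.

    rules: list of sets, each holding the attributes present in one rule.
    """
    if not rules:
        return {a: 0.0 for a in attributes}
    counts = Counter(a for rule in rules for a in rule if a in attributes)
    # Fraction of rules in which each attribute is present.
    presence = {a: counts[a] / len(rules) for a in attributes}
    total = sum(presence.values())
    return {a: (presence[a] / total if total else 0.0) for a in attributes}


# Hypothetical rules for one severity level.
rules = [{"age", "time", "tree"}, {"age", "time"}, {"age", "puddle"}]
attributes = ["age", "time", "tree", "puddle", "phone"]
shares = factor_shares(rules, attributes)
```

Attributes with a share of zero, such as phone in this hypothetical example, would be candidates for removal when reducing the set of factors.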

7.1.5 Understanding crash severity

This section discusses the rules obtained from the set of significant factors using rough set analysis. The aim is to determine the minimum combination of contributing factors that can establish the severity level. One view presents the overall rules, listing the top five rules with the highest support counts. Another view presents the top five rules for each severity level, which gives a better pattern analysis for each severity level.
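The second view, the top five rules per severity level ranked by support count, can be sketched as a simple group-and-sort step. The `Rule` structure and its field names are hypothetical and are not taken from the rough set software used in this study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # e.g. (("age", "30-39"), ("time", "12pm-4pm"))
    severity: str      # decision class, e.g. "low"
    support: int       # number of records matching the rule


def top_rules_by_severity(rules, n=5):
    """Group rules by severity level and keep the n with the highest support."""
    grouped = defaultdict(list)
    for rule in rules:
        grouped[rule.severity].append(rule)
    return {
        severity: sorted(group, key=lambda r: r.support, reverse=True)[:n]
        for severity, group in grouped.items()
    }
```

Because Python's sort is stable, rules with equal support keep their original order, so ties are broken by position in the input list.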

7.1.5.1 Overall view rule analysis

The strongest rule relating the contributing factors to the crash cost involves a driver aged between 30 and 39 years old, in a vehicle manufactured between 1991 and 2000, driving between 12pm and 4pm, with no occurrence of hitting a tree and no puddle present. The driver is involved in a fixed object collision. The outcome of this rule is a possible cost classification of lowest, low or medium. The low cost group has the highest relative support of 72.72%. The relative support is expressed relative to the total support count. The total support count for this rule is 22.
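Relative support can be illustrated as the share of a rule's total support that falls in each cost group. In the sketch below, the count of 16 for the low cost group is back-calculated from the reported 72.72% of 22, and the split of the remaining six records between the lowest and medium groups is hypothetical.

```python
def relative_support(class_counts):
    """Percentage of a rule's total support falling in each cost group."""
    total = sum(class_counts.values())
    if total == 0:
        return {label: 0.0 for label in class_counts}
    return {label: 100.0 * count / total for label, count in class_counts.items()}


# Total support of 22; the per-group split is an assumption for illustration.
counts = {"lowest": 3, "low": 16, "medium": 3}
shares = relative_support(counts)
print(f"{shares['low']:.2f}%")
```

The same calculation is consistent with the other figures quoted in this section, for example 17 of 22 records for 77.27% and 12 of 21 records for 57.14%.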

The second strongest rule in the list corresponds to a driver between 17 and 25 years old, in a vehicle manufactured between 1991 and 2000, driving between 7pm and 12am, with no occurrence of hitting a tree and no puddle present. The driver is involved in a collision. The total support count is 22 and the highest relative support is 77.27%, which belongs to the low cost group.

The third strongest rule involves a driver between 40 and 49 years old, in a vehicle manufactured between 1991 and 2000, driving between 7pm and 12am, with no occurrence of hitting a tree and no puddle present. The driver is involved in a fixed object collision. The total support count is 22 and the highest relative support is 72.72%, which belongs to the low cost group.

The fourth rule on the list involves a driver between 60 and 100 years old, in a vehicle manufactured between 1991 and 2000, driving between 12pm and 4pm, with no occurrence of hitting a tree and no puddle present. The driver is involved in a hit object type of crash. The total support count is 21 and the highest relative support is 57.14%, which belongs to the low cost group.

The fifth rule combines a driver between 40 and 49 years old, in a vehicle manufactured between 1991 and 2000, driving between 6am and 9am, with no occurrence of hitting a tree and no puddle present. The driver is not involved in any crash. The total support count is 21 and the highest relative support is 52.38%, which belongs to the low cost group.

The general observations of the rules are:

• Most of the rules have vehicles manufactured between 1991 and 2000, which could be due to the time period when the data was collected. The data was collected between 2003 and 2006, which means that most of the clients’ vehicles were manufactured before 2003.

• The age group between 40 and 49 years old has a higher count.

• The low cost group is found to be most common.

• The most common time for car crashes is in the later hours of the day, between 7 pm and 12 am.

This overall view does not provide complete information on the patterns amongst the data. Therefore, more rules are used to determine the detailed pattern for each severity level.

7.1.5.2 Lowest severity rule analysis

Five rules are selected based on the support count. Rules with a higher support count are selected for analysis.

The first rule listed involves a driver between 60 and 100 years old, in a vehicle manufactured between 2001 and 2005, driving between 4pm and 7pm, with no indication of hitting a tree or of the presence of a puddle. The driver is not involved in any crash. The total support count is 4.

The next rule involves a driver between 60 and 100 years old, in a vehicle manufactured between 1991 and 2000, driving between 12 am and 6 am, with no indication of hitting a tree or of the presence of a puddle. The driver is involved in a fixed object collision. The total support count is 4.

The third rule combines a driver between 40 and 49 years old, in a vehicle manufactured between 2001 and 2005, driving between 7 pm and 12 am, with no indication of hitting a tree or of the presence of a puddle. The driver is involved in a collision. The total support count is 4.

The next rule combines a driver between 60 and 100 years old, in a vehicle manufactured between 2001 and 2005, driving between 4 pm and 7 pm, with no indication of hitting a tree or of the presence of a puddle. The driver is involved in a fixed object collision. The total support count is 3.

The fifth rule on the list involves a driver between 60 and 100 years old, in a vehicle manufactured between 2001 and 2005, driving between 7pm and 12am, with no indication of hitting a tree or of the presence of a puddle. The driver is involved in a fixed object collision. The total support count is 3.

In general, the observations for the lowest set of rules are:

• Most drivers are between 60 and 100 years old.

• Vehicles are mostly manufactured between 2001 and 2005, which makes them between 1 and 5 years old. These vehicles are relatively new.

• Considering that these new vehicles are being driven by older drivers, the crash cost is the lowest. This could be because they drive at a slower speed, so damage to vehicles is not as serious as in high speed crashes.

• Most crashes occur later in the day, when poor light affects a driver’s vision and can result in serious misjudgement errors in driving.

• No other factors are present except for the time of crash, the year the vehicle was manufactured and the driver’s age.

7.1.5.3 Low severity rule analysis

Rules with a higher support count are selected for analysis, and the five rules selected are listed below.

The first rule listed involves a driver between 26 and 29 years old, in a vehicle manufactured between 1991 and 2000, driving between 4 pm and 7 pm, hitting a tree, with no puddle present. The driver is involved in a fixed object collision. The total support count is 3.

The second rule involves a driver between 60 and 100 years old, in a vehicle manufactured between 1991 and 2000, driving between 12 am and 6 am, with no indication of hitting a tree or of the presence of a puddle. The driver is involved in a collision. The total support count is 3.

The third rule combines a driver between 60 and 100 years old, in a vehicle manufactured between 1991 and 2000, driving between 7 pm and 12 am, with no indication of hitting a tree or of the presence of a puddle. The driver is involved in a fixed object collision. The total support count is 3.

The fourth rule combines a driver between 26 and 29 years old, in a vehicle manufactured between 1991 and 2000, driving between 4 pm and 7 pm, hitting a tree, with no puddle present. The driver is involved in a collision. The total support count is 3.

The fifth rule on the list involves a driver between 17 and 25 years old, in a vehicle manufactured between 1991 and 2000, driving between 12 am and 6 am, with no indication of hitting a tree or of the presence of a puddle. The driver is involved in a rollover crash. The total support count is 3.

In general, the observations for the low set of rules are:

• The main age groups are 26 to 29 and 60 to 100 years old.

• Most of the vehicles were manufactured between 1991 and 2000, which could be due to the period when the data was collected. The data was collected between 2003 and 2005.

• Most crashes occur later in the day, in the evening and at night. Poor light affects a driver’s vision and can result in serious misjudgement errors in driving.

• Crashes involving hitting fixed objects are the most common crash type, and trees are the most common fixed objects hit. During the evening peak hours between 4pm and 7pm, crashes involving hitting fixed objects involve drivers between 26 and 29 years old. Based on the time of the crash, drivers could be driving home from work. Drivers could suffer from fatigue after a full day at work, doze off at the wheel, run off the road and collide with a tree. Due to the high volume of traffic at that time, drivers will be driving at a slower speed, so damage to vehicles is not as serious as in high speed crashes.

7.1.5.4 Medium severity rule analysis

Rules with a higher support count are selected for analysis, and the five rules selected are listed below.

The first rule listed involves a driver between 50 and 59 years old, in a vehicle manufactured between 1991 and 2000, driving between 9 am and 12 pm, hitting a tree, with no puddle present. The driver is involved in a collision. The total support count is 1.

The next rule involves a driver between 30 and 39 years old, in a vehicle manufactured between 2001 and 2005, driving between 4 pm and 7 pm, hitting a tree, with no puddle present. The driver is not involved in a crash. The total support count is 1.

The third rule combines a driver between 40 and 49 years old, in a vehicle manufactured between 2001 and 2005, driving between 12 am and 6 am, with no indication of hitting a tree or of the presence of a puddle. The driver is involved in a rollover collision. The total support count is 1.

The next rule combines a driver between 30 and 39 years old, in a vehicle manufactured between 2001 and 2005, driving between 12 pm and 4 pm, with no indication of hitting a tree or of the presence of a puddle. The driver is involved in a rollover collision. The total support count is 1.

The fifth rule on the list involves a driver between 17 and 25 years old, in a vehicle manufactured between 2001 and 2005, driving between 9 am and 12 pm, hitting a tree, with no puddle present. The driver is involved in a fixed object collision. The total support count is 1.

Observations for the medium set of rules are:

• The drivers are mostly mature drivers aged between 30 and 39 years old.

• Most crashes occur later in the day, such as in the evening.

• Poor light affects a driver’s vision and can result in serious misjudgement errors in driving.

• Most vehicles involved in a crash are relatively new.

• Most vehicles are involved in rollover crashes, which indicates that the vehicle went off the road. Possible causes are speeding or misjudgement of the curvature of the road due to poor vision, as most crashes occur across the late afternoon and night hours.

• Crashes that involve hitting a fixed object such as a tree occur during the day between 9 am and 12 pm.

7.1.5.5 High severity rule analysis

Rules with a higher support count are selected for analysis, and the rules selected are listed below.

The first rule listed involves a driver between 40 and 49 years old, in a vehicle manufactured between 1991 and 2000, driving between 4 pm and 7 pm, hitting a tree, with no puddle present. The driver is involved in a rollover crash. The total support count is 1.

The second rule involves a driver between 60 and 100 years old, in a vehicle manufactured between 2001 and 2005, driving between 6 am and 9 am, with no indication of hitting a tree or of the presence of a puddle. The driver is involved in a fixed object collision. The total support count is 1.

The observations for the high set of rules are:

• The drivers are mostly mature drivers aged 40 to 49 and 60 to 100 years old.

• Most crashes occur during morning and evening peak hours. Morning peak hours, between 6am and 9am, have a high volume of traffic and most vehicles are travelling at a higher speed to get to work on time. When a crash occurs, the impact will be higher than for vehicles travelling at a slower speed. The risk of rear-end collision is higher because vehicles are travelling very close to each other.

• Evening peak hours, between 4pm and 7pm, have a high volume of traffic, and drivers could suffer from fatigue after a full day at work, doze off at the wheel, run off the road, roll over and collide with a tree.

7.1.5.6 Highest severity rule analysis

Rules with higher support count are selected for analysis and only two rules

are selected which is due to a small data set for analysis and therefore the

Page 222: MINING PATTERNS AND FACTORS …MINING PATTERNS AND FACTORS CONTRIBUTING TO CRASH SEVERITY ON ROAD CURVES Shin Huey Chen BCompSc(Hons), MCS This report is submitted as partial fulfilment

7.1. ANALYSIS OF RESULTS 191

limited number of rules generated.

The first rule describes a female driver between 50 and 59 years old, in a
vehicle manufactured between 2001 and 2005, driving between 12 pm and 4 pm,
with no indication of hitting a tree or of a puddle being present. The
driver is involved in a roll over crash. The total support count is 1.

The second rule listed describes a driver between 17 and 25 years old, in a
vehicle manufactured between 1991 and 2000, driving between 9 am and 12 pm,
who hits a tree with no puddle present. The driver is involved in a
collision. The total support count is 1.

Observations for the highest set of rules are:

• The drivers are aged between 17 and 25 years or between 50 and 59 years.

• Most crashes occur during the day, such as in the morning and afternoon.
Visibility is not an issue; however, sun glare could affect a driver’s
vision.

• A collision with a fixed object and the subsequent roll over increase the
severity level of the crash, as these crash types are listed in the high
set of rules but not in the lower severity levels.

With the discussion for each severity level, the overall or common patterns

discovered are:

• Most crashes occurred in the evening or night hours.

• Collisions with fixed objects, e.g. trees, are quite common amongst the
rules. There is an interesting combination between this type of crash and
the time at which it occurs that increases its severity. A collision with a
tree in the morning hours has an increased severity level, as is evident
from comparing the rules in the medium and high severity levels: all other
factors remain the same except for the time of the crash.

• Most drivers aged between 60 and 100 years face the lowest severity when
a crash occurs during the evening and night hours. The severity increases
when the crash occurs during the morning peak hours. The lower crash
severity in the evening hours may be due to poor lighting, which
discourages drivers from driving fast because visibility is not as clear as
in the daytime. Driving at a lower speed therefore reduces the crash
severity. In contrast, the traffic volume is high during the morning peak
hours, and drivers may be impatient, rushing to work or not fully awake.
Impatience and rushing lead a driver to speed, while a driver who is not
fully awake reacts slowly to the surroundings. Hence, speed and slow
reaction time result in a higher crash severity during the morning peak
hours.

7.1.6 Overall analysis of the rules

This set of rules, generated using the significant factors, provides
reliable information on the relationship between the contributing factors.
The analysis for each severity level identifies more detail in the patterns
amongst the data than the results for the top five rules. Based on this
information, the relationship between the time of the crash, the year of
vehicle manufacture and tree collision primarily influences the crash cost
and severity. The presence of other contributing factors will also increase
the crash cost and severity, depending on the impact of the crash. The
impact of the crash is determined by the speed at which the vehicle is
travelling, the object the vehicle collides with and whether any other
contributing factors exist that can influence the outcome.

For example, if a puddle of water is present on the road at the time of a
crash and the vehicle is speeding, the impact of the crash will be high, as
the vehicle could have skidded, run off the road and collided with another
object or vehicle.

In relation to the contributing factors and the related type of crash
involved, the relationship for each crash type differs from the significant
relationship identified. For the hit object crash type, the common
relationship between the contributing factors is a new vehicle and an older
driver. The crash severity increases based on the time of day: evening
hours have lower crash severity while morning hours have higher crash risk.
One possible reason is that older drivers tend not to speed in the evening
hours due to poor visibility. Thus, in general, the year the vehicle is
manufactured and the age of the driver are the common factors related to
hit object crashes, while the time of day influences the crash cost or
severity.

For the collision type of crash, the time is the common factor among the
rules. Evening hours have lower crash severity while morning hours have
higher crash risk. The common driver age group is between 50 and 59 years
old; drivers in this group may drive more carefully, hence the possibly
lower crash risk. In addition, hitting a tree increases the crash severity.

As for the roll over crashes, the common factors are the age of the driver
and the time of the crash. The driver age ranges between 30 and 59 years
and the common time of crash is from 12 pm to 4 pm. The factor that
influences the crash severity is the age of the driver: the older the
driver, the higher the chance of being involved in a roll over crash and
the higher the crash severity.

In comparison with the significant relationship seen at the beginning of
the chapter, the set of rules has more contributing factors, and its
relationship provides more information on the possible causes of a crash.
The significant relationship from the compact information provides
summarised information on the possible causes of a crash. Both significant
relationships are similar; therefore, the latter significant relationship
is preferred to represent the data.

7.2 Discussion

This section discusses the answers to the research questions and the
quality of the results obtained. The details of each are discussed in the
following paragraphs.

7.2.1 Research questions and answers

This subsection identifies whether the results obtained have addressed the

research questions. The research questions and answers are listed as follows.

Q1. What are the factors discovered from the crash descriptions that cause

crashes on road curves?

A1. This question aims to determine the contributing factors for crashes on

road curves using insurance crash records. Text mining is used to identify the

contributing factors by returning a list of keywords. The selection of keywords

is based on the frequency count and keywords with a high frequency count are

selected as contributing factors. However, these keywords are filtered and com-

pared to the factors that are not related to road curves. Only keywords that

do not co-exist with the factors for non-curve related crashes are considered

contributing factors.

Therefore, this research question has been addressed with results obtained

from the text mining process.
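The keyword selection described in A1 can be sketched as follows. This is a minimal illustration only: it assumes a simple word-frequency count, a hypothetical frequency threshold (`min_count`) and a hypothetical set of keywords associated with non-curve crashes (`non_curve_keywords`). The crash descriptions are invented and the actual text mining tool used in this research may work differently.

```python
import re
from collections import Counter


def keyword_counts(descriptions):
    """Count how often each lower-cased word appears across the descriptions."""
    counts = Counter()
    for text in descriptions:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def select_factors(curve_descriptions, non_curve_keywords, min_count):
    """Keep frequent keywords that do not also appear in the non-curve set."""
    counts = keyword_counts(curve_descriptions)
    return sorted(
        word
        for word, n in counts.items()
        if n >= min_count and word not in non_curve_keywords
    )


# Invented crash descriptions for illustration only.
curve_descriptions = [
    "Lost control on the curve and hit a tree",
    "Car slid on gravel and hit a tree",
    "Hit a pole after gravel on bend",
]
# Hypothetical keywords that are also frequent in non-curve crash records.
non_curve_keywords = {"hit", "a", "on", "and", "the"}

factors = select_factors(curve_descriptions, non_curve_keywords, min_count=2)
print(factors)
```

In this toy example only keywords that are both frequent and absent from the non-curve set remain as candidate contributing factors.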

Q2. What are the characteristics that influence the severity of a crash?

A2. This second question aims to identify the characteristics of the con-

tributing factors for crashes. Crash severity is represented with the crash cost

in this research context. The crash cost consists of the damages to the ve-

hicle as well as other objects or vehicles that may be involved in the crash.

Rough set analysis is used to obtain the combination of contributing factors.

The analysis returns a list of rules that are categorised into five severity levels

based on the crash cost. The rules represent the combination of contributing

factors for the crashes and are used to determine patterns amongst the data.

Each severity level has a set of rules, and patterns are determined for
each level, as well as the factors that influence the severity of a crash.
Based on the rules obtained, a significant relationship is identified from
the combination of the time of crash, year of manufacture, alcohol
consumption and collision with a tree.

Therefore, this question is addressed with the rules obtained from the rough

set analysis process.

Q3. Which significant factors increase the severity of a crash?

A3. This final question investigates the important contributing factors that

influence the severity level of crashes. The significant contributing factors are

identified using a search algorithm that returns accurate results from the data.

The significant contributing factors are a minimal representation of the
data set. These factors comprise the time of crash, year of manufacture,
collision with a tree, puddle and crash type. These factors influence the
severity level of a crash; therefore, the results obtained have answered
the research question.
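The idea of a minimal representation can be illustrated with a small sketch of a reduct search on a toy decision table. It assumes a consistent decision table and uses a brute-force search over attribute subsets as a stand-in; the search algorithm actually used in this research is not reproduced here, and the attribute names and rows are invented.

```python
from itertools import combinations


def is_consistent(rows, attrs, decision):
    """True if rows that agree on `attrs` always share the same decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def minimal_reducts(rows, attrs, decision):
    """Return the smallest attribute subsets that still determine the decision."""
    for size in range(1, len(attrs) + 1):
        found = [
            subset
            for subset in combinations(attrs, size)
            if is_consistent(rows, subset, decision)
        ]
        if found:
            return found
    return []


# Invented toy decision table; values are illustrative only.
rows = [
    {"time": "evening", "tree": "no", "gender": "F", "severity": "low"},
    {"time": "morning", "tree": "yes", "gender": "M", "severity": "high"},
    {"time": "evening", "tree": "yes", "gender": "F", "severity": "medium"},
    {"time": "morning", "tree": "no", "gender": "M", "severity": "low"},
    {"time": "evening", "tree": "yes", "gender": "M", "severity": "medium"},
]

reducts = minimal_reducts(rows, ["time", "tree", "gender"], "severity")
print(reducts)
```

In this toy table the time of crash and tree collision together are enough to determine the severity, so the remaining attribute can be dropped without losing information. A brute-force search is only practical for a small number of attributes.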

In order to understand the severity even more, the relationship between the

significant contributing factors is analysed. The relationship is analysed for

each severity level and a common pattern is identified amongst the rules in all

severity levels. The pattern consists of the time of crash, year of manufacture

and collision with a tree. These combinations of significant contributing factors

influence the crash cost as well as the severity level.

Therefore, this question is answered with the list of significant
contributing factors and the relationships between them. Table 7.1
summarises the discussion of the research questions to ascertain whether
they have been addressed.

Table 7.1: A summary of the research questions and answers.

No. | Research question | Answer | Answered?
1 | What are the factors discovered from the crash descriptions that cause crashes on road curves? | Identified a list of contributing factors. | Yes
2 | What are the characteristics that influence the severity of a crash? | Identified a combination of contributing factors that influence the crash severity. | Yes
3 | Which significant factors increase the severity of a crash? | Identified a list of significant factors along with the combination of factors that influence the crash severity. | Yes

The set of contributing factors that are significantly related is: time of the

crash, manufacture year of the vehicle, driver age and the involvement of a

fixed object in the collision.

From this set of contributing factors, the year the vehicle is manufactured
is considered to be a major factor that greatly influences the cost
because, intuitively, new vehicles cost more to repair than older vehicles.
Thus, including this factor could bias the assessment of the crash
severity, as the crash cost is the main factor used to assess the severity
of a crash. The reasons for keeping the manufacture year of the vehicle
factor are as follows:

• The output of the formal model used (Rough set theory) is a set of

contributing factors indicating the relationships between the factors, as

opposed to individual factors by themselves. The year the vehicle is man-

ufactured can be considered as an individual factor; however, the aim of

this research is to discover the relationship between the contributing fac-

tors. Thus, this factor is considered in terms of its relationship to other

contributing factors to determine crash severity. It would be statisti-

cally wrong to remove one factor from the set as the set is defined as an

unalterable whole. Furthermore, crash severity is based on crash cost,

and crash cost is defined as the damage cost of vehicles and any other

damaged objects; the cost does not include driver injuries.

• This is an ARC linkage APAI project working in a partnership with an

insurance company to determine the factors affecting the cost of a crash.

Hence, this research aims to use most of the attributes available from

the insurance crash records to discover any possible significant relation-

ships between contributing factors that affect crash severity. Therefore,

scientifically, the year the vehicle is manufactured cannot be removed.

It is also of interest to the insurance company to know the factors and the
relationship between them with a view to adjusting its insurance premiums.

• The year the vehicle is manufactured can be used to determine the condi-

tion of a vehicle, which can affect the severity of a crash; a vehicle in poor

condition may be involved in head-on or multiple collisions, resulting in

more severe consequences.

7.2.2 Application of results in road safety

The aim of this research is to identify and understand the combination of

factors from crash descriptions. This is to discover a pattern amongst the data

and to determine the significant factors and combinations that increase crash

severity.

The text mining process identifies the contributing factors from the crash
descriptions in the insurance claim records. The results from text mining
are similar to the factors reported by road authorities such as Queensland
Transport. Such similarity in contributing factors validates the accuracy
of the text mining approach. Furthermore, new contributing factors were
identified which are not listed in Queensland Transport reports.

Identifying contributing factors for road curve crashes is useful in
identifying suitable road designs, intelligent transport systems or
technology. Understanding what causes crashes on road curves provides the
opportunity to design new interventions or modify existing ones to improve
the situation and reduce crashes.

The significant factors are a minimal representation of the data. Minimal

information is useful in small or mobile devices such as the ones installed

in vehicles. This information can be used to guide learning models or other

models that analyse streaming data on the vehicles. These models use minimal

data due to the limited processing memory available in the mobile devices.

Thus, minimal information reduces the usage of the processing memory and

time for analysis of the data.

The combination or relationship between the contributing factors is novel

in the road safety domain. The relationship allows the understanding of the

combination of contributing factors that influences crash cost and severity. In-

formation on the relationships is useful in the following areas:

• Improve the understanding of the relationship between the contributing

factors and their influence on cost and severity. This can be useful for

researchers in road safety to understand causes for crashes on road curves.

• Improve the learning phase of an existing prediction model. The
combination of contributing factors can be used to guide a model with past
patterns in order to generate a more accurate prediction.

• Identify the significant factors, and the relationships between them,
that may influence the crash cost. This is useful information for insurance
companies in assessing and determining premium policies for potential
clients.

7.2.3 Ways to reduce the crash severity

Based on the results obtained in this study, it can be ascertained that
most crashes are due to driver error. Unlike road design or vehicle related
factors, which can be redesigned, driver error cannot be dealt with easily
in reducing crash severity. Driver error can be reduced using warning
signs, road signs or campaigns to educate drivers on the danger and
consequences of incorrect driving behaviour.

Tree collision is also a common factor seen in the results. A possible
solution is the reduction or removal of roadside objects in order to reduce
the consequences and impact of colliding with a tree. If removal is not
possible, installing safety barriers that can absorb the impact of the
crash and reduce crash severity is recommended. Another recommendation is
planting other varieties of vegetation, such as shrubs, instead of trees;
these have a lower impact in a crash, thus reducing the severity level.

7.3 Summary

At the beginning of this chapter, results from the analysis process of the
approach were presented. The rules are analysed in two views: (1) the
overall view and (2) the individual severity level.

The observations for the overall view of the rules are:

• Most of the rules have vehicles manufactured between 1991 and 2000

and that could be in relation to when the data was collected. The data

was collected between 2003 and 2006, which means the vehicles were

manufactured before 2003.

• Age groups of 17 to 25 and 30 to 39 years old have a higher count.

• Most drivers who are 17 to 25 years old are involved in a crash between
7 pm and 12 am.

• The low cost group is found to be most common.

• The most common time for car crashes is between 6 am and 12 pm.

This overall view does not provide complete information on the patterns
amongst the data. Therefore, more rules are used to determine the detailed
pattern for each severity level.

The observations of rules for each severity level are presented in the follow-

ing paragraphs.

Observations for the lowest set of rules are:

• The drivers’ ages range from the mature to the older age group (30 to 100
years old).

• The vehicles are mostly manufactured between 1991 and 2005, which makes
them approximately 1 to 15 years old. Most vehicles are manufactured from
1991 to 2000, as the data was collected between 2003 and 2006. Thus, the
vehicles were registered at or before the time the data was collected.

• Most vehicles that are manufactured between 2001 and 2005 are involved in
a crash. Considering these new vehicles are being driven by mature age
drivers, the crash cost is the lowest. This could be due to mature drivers
driving at a slower speed; therefore, damage to vehicles is not as serious
as in high speed crashes. No alcohol consumption is evident; therefore, no
impairment is present to increase the crash severity.

• Both male and female drivers are involved, with female drivers in the
majority.

• No other factors are present except for the time of the crash, year the

vehicle is manufactured and the driver age.

Observations for the low set of rules are:

• The drivers are mostly mature drivers. The ages range from the mature to
the older age group (26 to 29, 30 to 39, 40 to 49, and 50 to 59 years old).

• The vehicles are manufactured between 1991 and 2005, which makes them
approximately 1 to 15 years old. This is due to the data being collected
between 2003 and 2005.

• More vehicles are involved in a crash compared to the lowest set of rules.

• Most crashes occur in the later time of day such as evening and night

time. Poor light affects a driver’s vision and can result in serious mis-

judgement errors in driving.

• Both male and female drivers are involved, with female drivers in the
majority.

• No other factors were present except for the time of crash, year the

vehicle is manufactured and the driver age.

Observations for the medium set of rules are:

• The drivers are mostly mature drivers aged between 30 and 39 years.

• Most crashes occur in the later time of day such as evening and night

time. Poor light affects a driver’s vision and can result in serious mis-

judgement errors in driving.

• Most drivers are male.

• Newer vehicles are involved in crashes.

• Most vehicles are involved in roll over crashes, which indicates that the
vehicle went off the road. Possible causes are speeding or misjudgement of
the curvature of the road due to poor vision or alcohol consumption.

• No other factors were present except for the time of crash, the year the
vehicle is manufactured and the driver age. The one exception is the first
rule, which had alcohol consumption present.

Observations for the high set of rules are:

• The drivers are mostly mature drivers in two age groups: 30 to 39 and 60
to 100 years old.

• Most crashes occur in the later time of day such as afternoon and night

time. During the night, poor light affects a driver’s vision and can result

in serious misjudgement errors in driving.

• Most drivers are male.

• Vehicles involved in the crashes are newer and they cost more due to the

cost of repairs and insurance involved.

• Crashes involving hitting fixed objects are the most common crash type.
The fifth rule states that alcohol and a tree are present. The presence of
alcohol can impair a driver’s reaction time. Misjudgement of the curvature
of the road and overestimating the suitable speed to negotiate the curve
safely may have led to the crash.

Observations for the highest set of rules are:

• The age groups of drivers are between 17 and 25 years old and 50 and

59 years old.

• Most crashes occur during the day, such as in the morning and afternoon.
Visibility is not an issue; however, sun glare could affect a driver’s
vision.

• Both male and female drivers are involved.

• A combination of new vehicles and older vehicles between 10 and 15 years
old was involved.

• Alcohol consumption and environmental contributing factors are present

in the top rule for this level. This indicates that the vehicle went off road

and collided with a fixed object.

Both of these contributing factors, which are also listed in the high set
of rules, may have increased the severity level of the crash. Further
investigation shows that the combination of the rules is similar to the
others; however, the one point of difference is the age of the driver.
Young drivers appear only in this set of rules. Because young drivers tend
to be more inexperienced and reckless in their driving, a combination of
high speed and judgement error has increased the crash cost and severity.

However, the time of crash for each severity level is in the later hours of
the day, for example the evening.

Rules are validated with the traffic simulator, which has an accuracy of
80%, while the accuracy measurement obtained is 63.3%. One possible reason
for the lower accuracy rate is missing data and values for some severity
levels.

Through rough set analysis, significant factors are identified from the
rules. The rules are analysed in the same two views as previously
mentioned. The common patterns observed are:

• Most crashes occurred in the evening or night hours.

• Collisions with fixed objects, for example trees, are quite common
amongst the rules. There is an interesting combination between this type of
crash and the time at which it occurs that increases its severity. A
collision with a tree in the morning hours has an increased severity level,
as is evident from comparing the rules in the medium and high severity
levels: all other factors remain the same except for the time of the crash.

• Most drivers who are between 60 and 100 years old face the lowest
severity when a crash occurs during the evening and night hours. The
severity increases when the crash occurs during the morning peak hours.

The relationship between the time of the crash, year of manufacture and

the tree collision influences the crash cost and severity. The presence of other

contributing factors can influence the crash cost and severity depending on the

impact of the crash. The impact of the crash is determined by the speed the

vehicle is travelling and the object it collides with.

The second part of the chapter discusses whether the research questions
have been addressed. The first research question, identifying the
contributing factors of crashes on road curves, is answered with the
results obtained from the text mining process.

The second research question, identifying the relationships or combinations
of the contributing factors, is answered with the results from the rough
set analysis process.

The third research question, identifying the significant factors, is
answered with the results obtained from a search algorithm in the rough set
analysis process.

This is followed by a discussion of other possible areas of road safety to
which these results can be applied. Along with the results, the main causes
are identified and recommendations are proposed.

CHAPTER 8

Conclusion and Future work

This final chapter draws together the major findings of this study and
determines whether it has met its objectives and contributed to the
research domain. Findings are placed in the context of broader implications
and future research.

8.1 Achievements

8.1.1 The aim

As mentioned in Chapter 1, the aim of this research is to identify and under-

stand the contributing factors to crashes on road curves as well as the effect

of various combinations of these factors on crash severity.

The aim of this research is achieved with the following steps:

• To identify and discover new contributing factors of crashes on road

curves using the text mining technique.

• To understand the relationships between these contributing factors.

• To identify a minimal set of contributing factors.

• To understand the contributing factors and their effect on crash severity.

The next section provides a brief explanation of the approach.

8.1.2 Summary of approach

The proposed approach uses data mining techniques to achieve the aims of
this research. The major processes of the approach are: (1) text mining of
crash descriptions, (2) data analysis with rough set theory, (3) validation
and (4) understanding the relationship between the contributing factors and
their effect on crash severity.

Insurance crash claim records are used as data input and the approach
begins with a data cleaning process prior to analysis. This process ensures
the data contains no errors, as these can affect the results.

The text mining process analyses the ‘cleaned’ data to identify contributing

factors within the crash descriptions from insurance claim records. The identi-

fied contributing factors are sorted and categorised into a decision table which

is then used as an input for the rough set analysis process. Rough set analysis

is used to determine the minimal set of contributing factors, the relationship

or dependency between the contributing factors and decision rules.

A traffic simulator is designed to verify the rules generated with rough
set analysis. The validation process verifies that the crash type obtained
from the simulator is similar to the one indicated in the rules. The
assumption is that the approach is valid when the accuracy of the results
from the simulator is within the defined threshold of 80% ± 10%. The
accuracy is obtained by dividing the number of outcomes from the simulator
that are similar to the rules by the total number of tests and multiplying
by 100. The simulator is designed based on a stochastic model, and
variables can be customised according to the input data.

The second approach to verifying the rules is an accuracy calculation
generated by the rough set analysis process. The data is divided into two
data sets: 80% and 20%. The 80% input data set is used for analysis while
the 20% is used for validation. The rules generated from the 80% input data
set are applied to the 20% validation set using rough set theory analysis,
and this generates the accuracy of the rules.
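A minimal sketch of this hold-out validation is shown below. It assumes a random 80/20 split and represents each rule as a mapping from attribute values to a severity level; both are simplifications for illustration, and the rough set tool used in this research may split the data and apply rules differently. All names and records are hypothetical.

```python
import random


def split_records(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into analysis and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def rule_accuracy(rules, validation, attrs, decision):
    """Percentage of validation records whose decision matches the applicable rule.

    Records that match no rule are counted as misses.
    """
    if not validation:
        raise ValueError("validation set is empty")
    hits = sum(
        1
        for record in validation
        if rules.get(tuple(record[a] for a in attrs)) == record[decision]
    )
    return 100.0 * hits / len(validation)


# Invented records for illustration only.
records = [{"time": "evening", "tree": "no", "severity": "low"}] * 5 + [
    {"time": "morning", "tree": "yes", "severity": "high"}
] * 5
analysis_set, validation_set = split_records(records)

# Hypothetical rules, written by hand here rather than generated from analysis_set.
rules = {("evening", "no"): "low", ("morning", "yes"): "high"}
accuracy = rule_accuracy(rules, validation_set, ["time", "tree"], "severity")
print(len(analysis_set), len(validation_set), accuracy)
```

Counting unmatched records as misses is one possible design choice; it shows how missing values for some severity levels can lower the measured accuracy.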

Once the rules are verified, the next step is to understand the
relationship between the contributing factors and their effect on crash
severity. The crash severity is defined with five levels: (1) lowest, (2)
low, (3) medium, (4) high, and (5) highest. Each severity level is related
to a cost range and a set of related contributing factors, which are
represented as rules. The rules are examined to determine the effect on
crash severity.

8.1.3 Limitations

There are limitations to this approach and they are listed as follows.

• The data source is limited, as only insurance claim records are used to
understand the contributing factors and the related crash severity. This
means that the understanding gained is accurate only to a limited degree of
certainty.

• Using static data means that this approach must be updated constantly, or
it will not be able to react to new circumstances. This is due to the
limitation in using streaming data from sensors in a vehicle. Additionally,
results will only be accurate up to a certain extent because the approach
uses only past crash data.

• The crash descriptions provided can be biased as descriptions are nar-

rated with the intention of a claim. Biased data may affect the results

and analysis.

• There are not as many road curve crash records available compared to
other crashes. Therefore, results will be limited to the data available and
may not be applicable to other possible types of crashes on road curves.

• The data obtained do not have specific information on the design of

the road curve where the crash occurs. For example, no information is

available on the degree of the curve and speed limit imposed on the road

curve.

• No detailed crash description is available when fatalities occur; therefore,
information on the causes of fatal crashes is limited.

• The extent of a driver’s injuries is not taken into consideration when the

cost of a crash is calculated.

• The simulator defined is designed for simulating crashes on road curves.

It provides the flexibility for users to customise the variables according

to the desired situation. Once the settings are defined, the simulator

produces the outcome according to the type of crash involved. The

simulator can be programmed to run a defined number of iterations and

the types of crashes are stored and illustrated in a three dimensional

graph. The input data from insurance claim records are related to a list

of contributing factors, types of crash and the crash cost. The simulator

has the capability to present the results according to the contributing

factors as inputs. However, it is unable to present the cost involved in

the crash.

8.1.4 Research findings and implications

The three main contributing factor categories are: road and environmental,

vehicle and driver. Each category contains detailed and specific contributing

factors that lead to a crash on a road curve. The text mining step in the
data preparation process identified the contributing factors from the crash
descriptions available in the crash claim records. The factors from text mining
are similar to the ones reported by road authorities such as Queensland
Transport, apart from some new ones. The new contributing factors identified


are: tree, embankment, gravel, pole, gutter, lost control, wet road surface, dirt,
kangaroos, trucks, loss of traction and foggy conditions.

Rough set analysis produces a set of rules, which are classified into different
crash severity levels. The rules determine the dependency or relationships
between the contributing factors. Rules of high strength are selected, as rule
strength affects the prediction accuracy.
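A minimal sketch of selecting rules by strength is given below. It assumes the common rough set definition in which the strength of a rule is the number of records that satisfy both its conditions and its decision, divided by the total number of records. The threshold, the field names and the function names are assumptions made for illustration; the software used in this research may compute strength differently.

```python
def rule_support(rule, records):
    """Count records that satisfy the rule's conditions and its decision."""
    return sum(
        1
        for record in records
        if all(record.get(a) == v for a, v in rule["if"].items())
        and record.get("severity") == rule["then"]
    )


def rule_strength(rule, records):
    """Support divided by the total number of records (assumed definition)."""
    return rule_support(rule, records) / len(records) if records else 0.0


def select_strong_rules(rules, records, min_strength):
    """Keep only the rules whose strength reaches the chosen threshold."""
    return [r for r in rules if rule_strength(r, records) >= min_strength]
```

A higher threshold keeps fewer rules that are each backed by more records, which is the trade-off behind selecting rules of high strength.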

The rules are presented in two views: (1) an overall view and (2) individual

severity level view. The first set of rules is obtained from the contributing

factors identified from text mining process. The observations for the overall

view of the rules are:

• Most of the rules involve vehicles manufactured between 1991 and 2000,
which could be due to the time period when the data was collected. The
data was collected between 2003 and 2006, which means the vehicles were
manufactured before 2003.

• The age groups of 17 to 25 and 30 to 39 years old have higher counts.

• Most drivers between 17 and 25 years old are involved in crashes between
7 pm and 12 am.

• The low cost group is found to be most common.

• The most common time for vehicle crashes is between 6 am and 12 pm.

This overall view does not provide complete information on the patterns

amongst the data. Therefore, more rules are used to determine the detailed

pattern for each severity level.

The observations of rules for each severity level are presented in the follow-

ing paragraphs.

Observations for the lowest set of rules


• Most drivers are in the mature to older age groups, between 30 and 100
years old.

• The vehicles are mostly manufactured between 1991 and 2005. Most
vehicles are manufactured from 1991 to 2000 as the data was collected
between 2003 and 2006. Therefore, the vehicles were registered before the
time the data was collected.

• Most vehicles involved in a crash were manufactured between 2001 and

2005.

• When new vehicles are driven by mature age drivers, the crash cost is
the lowest. This could be because mature drivers drive at a slower
speed, so the damage to vehicles is not as serious as in high speed
crashes. No alcohol consumption is evident; therefore, no impairment is
present to increase the crash severity.

• Both male and female drivers are involved, with female drivers forming
the majority of those involved in a crash.

• No other factors were present except for the time of crash, year the

vehicle is manufactured and the driver age.

Observations for the low set of rules

• Most drivers are in the mature to older age groups, between 26 and 59
years old.

• The vehicles are manufactured between 1991 and 2005. This is because
the data was collected between 2003 and 2006.

• More vehicles are involved in a crash compared to the lowest set of rules.

• Most crashes occur in the later time of day such as evening and night

time. Poor lighting affects a driver’s vision and can result in serious

misjudgement errors in driving.


• Both male and female drivers are involved, with female drivers forming
the majority of those involved in a crash.

• No other factors were present except for the time of crash, year the

vehicle is manufactured and the driver age.

Observations for the medium set of rules

• Most drivers are mature age drivers who are between 30 and 39 years

old.

• Most crashes occur in the later time of day such as evening and night

time. Poor lighting affects a driver’s vision and can result in serious

misjudgement errors in driving.

• Most drivers are male.

• New vehicles are involved in crashes.

• Most vehicles are involved in rollover crashes, which indicates that the
vehicles went off the road. Possible causes are speeding or misjudgement
of the curvature of the road due to poor vision or alcohol consumption.

• No other factors were present except for the time of the crash, year the

vehicle is manufactured and the driver age.

Observations for the high set of rules

• The drivers are mostly mature drivers aged 30 to 39 and 60 to 100
years old.

• Most crashes occur in the later time of day such as afternoon and night

time. During the night, poor light affects a driver’s vision and can result

in serious misjudgement errors in driving.


• Most drivers are male.

• Vehicles involved in crashes are newer and they cost more due to the cost

of repairs and the insurance.

• Crashes involving hitting fixed objects are the most common crash type.
This can be linked to alcohol consumption, which impairs the driver’s
reaction time. Misjudging the curvature of the road and overestimating
the suitable speed to negotiate the curve safely may have led to
the crash.

Observations for the highest set of rules

• The drivers are between 17 and 25 years old and between 50 and
59 years old.

• Most crashes occur during the day, such as in the morning and afternoon.
Visibility is not an issue; however, sun glare could affect a driver’s vision.

• Both male and female drivers are involved.

• A combination of new and older vehicles between 10 and 15 years old

were involved.

• Alcohol consumption and environmental contributing factors, such as
hitting a tree, are present in the top rule for this level. This indicates that
the vehicle went off the road and collided with a fixed object such as a
tree. The presence of both contributing factors in the highest set of rules
may have increased the severity level of the crash. Further investigation
shows that the combinations in these rules are similar to the others;
the one point of difference is the age of the driver. Young drivers appear
only in this set of rules because they tend to be more inexperienced and
reckless in their driving; a combination of high speed and judgement
error has increased the crash cost and severity.


Comparing the results obtained for each severity level with the analysis of
the top five rules, the results differ in some factors. However, the time of crash
for each severity level is in the later hours of the day, for example, evening
time.

The rules are verified with rough set theory and have an accuracy of ap-

proximately 63.3%. In addition, the accuracy obtained from the simulation is

80%. Both are acceptable as they are within the defined threshold.

Using the rules, the most significant contributing factors are: time, the

year the vehicle is manufactured, driver age, tree, puddle and crash type. The

factors are used to generate a list of rules to observe the combinations of

contributing factors and related crash severity.
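Since significance is determined by the percentage of presence of a factor in the rules (see the Contributions section), the calculation can be sketched as below. The 50% cut-off in the usage note and the rule layout are hypothetical and serve only to illustrate the idea; they are not values taken from this research.

```python
from collections import Counter


def factor_presence(rules):
    """Percentage of rules in which each contributing factor appears."""
    counts = Counter()
    for rule in rules:
        # Each factor is counted at most once per rule.
        counts.update(set(rule["if"]))
    total = len(rules)
    if total == 0:
        return {}
    return {factor: 100.0 * n / total for factor, n in counts.items()}


def significant_factors(rules, min_percent):
    """Factors present in at least min_percent of the rules, sorted by name."""
    presence = factor_presence(rules)
    return sorted(f for f, p in presence.items() if p >= min_percent)
```

For example, calling significant_factors(rules, 50) would return the factors that occur in at least half of the rules.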

The rules are presented in two views: (1) an overall view and (2) individual

severity level view. The first set of rules is obtained from the contributing

factors identified from text mining process. The observations from the overall

view of the rules are:

• Most of the rules involve vehicles manufactured between 1991 and 2000,
which could be due to the period when the data was collected. The data
was collected between 2003 and 2006, which means that the vehicles were
manufactured before 2003.

• Drivers between 40 and 49 years old have a higher count.

• The common cost group amongst the rules is the low cost group.

• Most crashes occur in the later hours of the day between 7 pm and 12

am.

The observations for each crash severity are listed in the following para-

graphs.

Observations for the lowest set of rules


• Most drivers are between 60 and 100 years old.

• Most vehicles are manufactured between 2001 and 2005. These vehicles

are relatively new.

• Considering that these new vehicles are being driven by older age drivers,
the crash cost is the lowest. This could be due to them driving at a slower
speed; therefore, damage to vehicles is not as serious as for vehicles
that travel at a higher speed.

• Most crashes occur later in the day; therefore, poor light affects a
driver’s vision and can result in serious misjudgement errors in driving.

• No other factors were present except for the time of crash, year the

vehicle is manufactured and the driver age.

Observations for the low set of rules

• The main age groups are: between 25 and 29, and 60 and 100 years old.

• Most vehicles were manufactured between 1991 and 2000, which could
be due to the period when the data was collected. The data was collected
between 2003 and 2005.

• Most crashes occur in the later time of day such as evening and night

time. Poor light affects a driver’s vision and can result in serious mis-

judgement errors in driving.

• Crashes involving hitting fixed objects are the most common crash type
and trees are the most commonly hit fixed object. During the evening peak
hours between 4 pm and 7 pm, crashes involving hitting fixed
objects involve drivers between 25 and 29 years old. Based on the time
of the crash, drivers could be driving home from work. Drivers could
suffer from fatigue after a full day at work and doze off at the wheel, run


off the road and collide with a tree. Due to the high volume of traffic at
the time, drivers will be driving at a slower speed; therefore, damage to
vehicles is not as serious as in high speed crashes.

Observations for the medium set of rules

• The drivers are mostly mature drivers between 30 and 39 years old.

• Most crashes occur in the later time of day such as evening time. Poor

light affects a driver’s vision and can result in serious misjudgement errors

in driving.

• Most vehicles involved in a crash are relatively new.

• Most vehicles are involved in roll over crashes which indicate the vehicles

went off the road. Possible causes are speeding or misjudgement of the

curvature of the road due to poor vision as most crashes occur across the

late afternoon and night hours.

• Crashes that involve hitting a fixed object such as a tree occur during

the day between 9 am and 12 pm.

Observations for the high set of rules

• Most drivers are mature drivers aged 40 to 49 and 60 to 100 years
old.

• Most crashes occur during the morning and evening peak hours. The
morning peak hours, between 6 am and 9 am, have a high volume of traffic
and most vehicles are travelling at a higher speed to get to work on time.
When a crash occurs, the impact will be higher than for vehicles travelling
at a slower speed. The risk of rear-end collision is higher because vehicles
are travelling very close to each other. The evening peak hours, between 4
pm and 7 pm, have a high volume of traffic and drivers could suffer from


fatigue from a full day at work and doze off behind the wheel, run off
the road, roll over and hit a tree.

Observations for the highest set of rules

• The drivers are in the age groups between 17 and 25 years old and
between 50 and 59 years old.

• Most crashes occur during the day, such as in the morning and afternoon.
Visibility is not an issue; however, sun glare could affect a driver’s vision.

• A collision with a fixed object and the subsequent rollover increase
the severity level of the crash, as these factors are listed in the high set
of rules but not in the lower severity levels.

The relationship among the significant contributing factors provides
summarised information on the possible causes of a crash. Both significant
relationships are similar; therefore, the latter significant relationship is
chosen to represent the data.

The relationship between the time of the crash, when the vehicle was manu-

factured and the collision with a fixed object such as a tree, influences the crash

cost and severity. The presence of other contributing factors can also influence

the cost of a crash and its severity depending on the impact of the crash. The

impact of the crash is determined by the speed at which the vehicle is
travelling and the object with which the vehicle collides.

The year the vehicle is manufactured is thought to be a major factor that

greatly influences the cost as theoretically, new vehicles incur more cost than

older vehicles. Thus, using this factor results in a biased assessment of the crash

severity, as the crash cost is a factor used to assess the severity of a crash. The

reasons for keeping the factor are:


• The output of the formal model used (Rough set theory) is a set of

contributing factors indicating the relationships between the factors, as

opposed to individual factors by themselves. The year the vehicle is man-

ufactured can be considered as an individual factor; however, the aim of

this research is to discover the relationship between the contributing fac-

tors. Thus, this factor is considered in terms of its relationship to other

contributing factors to determine crash severity. It would be statisti-

cally wrong to remove one factor from the set as the set is defined as an

unalterable whole. Furthermore, crash severity is based on crash cost,

and crash cost is defined as the damage cost of vehicles and any other

damaged objects; the cost does not include driver injuries.

• This is an ARC linkage APAI project working in a partnership with an

insurance company to determine the factors affecting the cost of a crash.

Hence, this research aims to use most of the attributes available from

the insurance crash records to discover any possible significant relation-

ships between contributing factors that affect crash severity. Therefore,

scientifically, the year the vehicle is manufactured cannot be removed.

It is also of interest to the insurance company to know the factors and

the relationship between them with the view to adjust their insurance

premium.

• The year the vehicle is manufactured can be used to determine the condi-

tion of a vehicle, which can affect the severity of a crash; a vehicle in poor

condition may be involved in head-on or multiple collisions, resulting in

more severe consequences.


8.2 Contributions

This research has contributed a novel approach and findings for contributing

factors and relationships between the factors on curve-related crashes and they

are discussed in the following paragraphs.

• Using data mining technique to identify contributing factors of

crashes on road curves.

The insurance claim records contain crash descriptions that are textual

and unstructured. The specific data mining technique used to identify

the factors is text mining. The use of text mining technique expands

the approach to identify contributing factors of crashes on road curves.

Pande and Abdel-Aty (2006) have studied the application of data mining
techniques to analyse data; however, the data used here consists of a block
of textual description of the crash. Thus, text mining is a new approach to
analysing data in the road safety domain. The text mining technique can be
applied in other research that investigates crashes when the data consists
of a block of textual description.

• Identify the relationship between contributing factors

Road authorities report on road toll figures for a fixed period. In
addition, the reports list contributing factors individually without any
indication of the relationship between the factors. This research identifies
the relationship between the contributing factors, which is represented
as rules. The rules help identify which contributing factors are closely
related and which ones increase the severity of a crash.

• Identify significant contributing factors

The significant contributing factors are identified from the rules and
are determined by the percentage of their presence in the rules. The
significant contributing factors listed in Section 8.1.4 could be useful in
representing a subset of the data.


• Defined a traffic simulator for road curves

A traffic simulator is defined to simulate crashes for road curves. One of

the significant features of the traffic simulator is that variables related

to the driver, road and environment and vehicle can be customised to a

required driving context. Due to the stochastic model within the traffic

simulator which generates values randomly, no simulation produces the

same results as the previous one. This could represent the real driving

situation more closely.

• Validate rules with traffic simulator

The most common way to validate rules is either by 10-fold cross vali-

dation technique or through the accuracy measurement. This research

looks into verifying rules with a traffic simulator. A traffic simulator is
used because of the area of study, namely road safety. A valid verification
denotes that the approach proposed is valid.
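For comparison, the 10-fold cross validation technique mentioned in the last item can be sketched as an index split. This is a generic illustration of the technique with assumed function and parameter names; it is not the validation procedure adopted in this research, which relies on the traffic simulator.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (training indices, test indices) for each of the k folds.

    The record indices are shuffled once and dealt into k folds of
    near-equal size; each fold serves as the test set exactly once.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        training = [
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        ]
        yield training, test_fold
```

Rules would be induced from the training indices of each fold and checked against the held-out fold, with the ten accuracy values then averaged.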

8.3 Future works

The number of records for crashes related to road curves is less than 50% of
the total number of crashes. Hence, the data available for analysis is limited.
Using more data from different sources, such as sensors installed in vehicles,
could therefore improve the prediction accuracy.

A future study could focus on specific black spots for road curves which
have an extraordinarily high volume of crashes. Such a study of a specific road
curve could determine whether the findings are valid and, at the same time,
identify more contributing factors and improve the learning process in the
KDD process.

Another area of improvement for this research is to observe the results of

correspondence analysis methods using multi-dimensional statistical methods

that are based on principal components.


A background study of the capabilities of more rough set analysis software
packages can be another area for future improvements to this research. One
such package to study is the R software, which is available as freeware.


APPENDIX A

Literature Review: Horizontal Curves and Road Engineering Interventions

A.1 Types of horizontal curves

There are four variations of horizontal curves, which are explained below.

Simple Curve A simple curve is composed of a circular arc, and the radius
of the circle determines the degree of sharpness. Simple curves are the most
frequently used because they are simple to design, lay out and construct.
Figure A.1 illustrates a design of the simple curve.

Figure A.1: An illustration of a simple curve.

Compound Curve A compound curve consists of two simple curves

joined together and curved in the same direction (Hanger, 2003). This type


of curve is usually interposed to avoid obstacles that cannot be removed
or relocated, such as interchange ramps, and for transitions into sharper
curves (Highway, 2004). Figure A.2 shows a compound curve.

Figure A.2: An illustration of a compound curve.

Reverse curve This curve is made up of two simple curves joined
together and curving in opposite directions. Figure A.3 illustrates a reverse
curve. Reverse curves are normally avoided for safety reasons.

Figure A.3: An illustration of a reverse curve.

Spiral curve This is a curve with a varying radius that is mostly used in
modern highways. The purpose of a spiral curve is to offer a transition
from the tangent to a simple curve or between simple curves in a compound
curve (Hanger, 2003). Figure A.4 shows a spiral curve.


Figure A.4: An illustration of a spiral curve.

A.2 Road engineering and environmental interventions

As previously presented, friction and curvature are two features of road
curve structure that can cause a driver to lose control of their vehicle. Speed
is also a cause of loss of control. Excessive speed and losing control
usually lead to a single vehicle run-off road crash when entering the road curve.

Road authorities have implemented countermeasures with the aim of
preventing crashes and minimising the consequences of a crash in
horizontal curves. The primary methods adopted are engineering interventions
such as pavement markings, warning signs and delineations. These methods
are implemented to improve roadway designs or minimise their adverse
consequences. For example, in order to prevent vehicles from hitting objects, a
countermeasure is to remove or relocate objects in hazardous locations
(Torbic, Harwood, Gilmore, Pfefer, Neuman, Slack & Hardy, 2004). There follows

a discussion of countermeasures to reduce the speed and the possibilities of

a vehicle leaving the roadway or crossing the centreline at a horizontal curve

(Torbic et al., 2004).

The interventions listed in the rest of this section are summarised as follows:

• Warning signs

Warning signs are used to warn drivers of a hazard ahead and to indi-

cate a change of alignment or to indicate the safe speed for negotiating a

curve. Different types of warning signs exist and are used in road curves


to aid drivers driving in road curves.

(1) Chevron Alignment Signs

A common warning sign installed in hazardous locations such as road

curves is the chevron alignment sign. Chevron alignment signs are used

to provide additional guidance where there is a change in horizontal align-

ment (Carlson, Rose, Chrysler & Bischoff, 2004). This countermeasure

is used to reduce the driving speed and speed variance (Carlson et al.,

2004).

Jennings et al. (2004) discovered that the alignment signs can influence
drivers to reduce their speed. They also found that these signs promote
better lateral placement and that drivers are better able to follow the curve.
However, studies have shown that the alignment signs do not produce
significantly better results than other delineation methods (Carlson et al., 2004).

(2) Advisory speed signs

An advisory speed limit sign indicates the maximum speed at which
a vehicle can travel in a curve safely and comfortably. This advisory speed
is not applicable to all drivers and vehicles; instead, it is a guide to alert
the driver that there is a need to slow down.

However, the advisory speed limit sign is not always effective, as drivers
may exceed the safe speed if they have previously travelled safely in a similar
curve at a higher speed. Hence, the signs are only useful when they are
applied in road curves in a consistent and standardised way so that drivers
will know what to expect ahead.

(3) Horizontal alignment signs and Advisory speed plaques

The other warning sign used in road curves is the horizontal alignment

sign accompanied with an advisory speed plaque. The horizontal align-

ment signs are used to alert drivers of the change of alignment ahead


and the advisory speed plaque suggests a speed to safely manoeuvre in

a road curve. In addition, warning signs can also be accompanied by

flashing lights which are effective in speed reduction.

(4) Variable message or speed limit signs

Variable signs are used to manage traffic flow and safety. The message
or speed limit shown on these electronic signs is updated according to
the traffic and road conditions. Variable signs show their messages
electronically; hence, drivers can see them even when weather conditions are
unfavourable. Besides being electronic, they are also portable. These signs
can be transported to locations where traffic information needs to be conveyed.
Due to their high cost, the signs are only installed on highways in
Australia, where they have only recently been introduced. Therefore,
there are insufficient findings to prove the effectiveness of the signs.

• Delineators

The delineators are light-reflective devices mounted along the side of the

road to indicate the alignment of the road. Delineators act as a guidance

device and are particularly useful for a change of alignment or where the

alignment is confusing. These devices are effective where vision is not

clear such as at night time or during a rainy day.

(1) Post-mounted delineators

One such delineator is the post-mounted delineator (PMD), which is
used when there is a confusing or unexpected alignment on the road.
PMD provide good guidance at night as they are reflective and are designed
to be at a comparable height to the headlights of the vehicles (Vest,
Stamatiadis, Clayton & Pigman, 2005).

PMD are not effective in reducing driving speed but are helpful in re-

ducing the mean lateral placement of the vehicle (Zador, Stein, Wright

& Hall, 1987).


(2) Guideposts

Guideposts are another common type of delineator that are used to show

and enhance the edge of the road (ARRB, 2003). They are placed on

narrow roads which have insufficient road width to mark the centre line.

In some road curves, guideposts are accompanied with retro-reflective

delineators to provide cues of a curve and as advanced warning of unex-

pected changes in horizontal alignment.

PMD and guideposts are used to support safe driving on sharp or
narrow road curves. The delineators help drivers to better judge the
curvature and thus reduce their speed when driving in a road curve.

• Pavement Markings

Pavement markings along the road are one of the countermeasures for

run-off-road crashes in road curves. Transverse pavement markings are

used in horizontal road curves and can give drivers the perception
that the lane is narrower, which encourages them to slow down
in a road curve. One of the purposes of pavement markings is to warn
drivers in advance of the hazards ahead (Fildes & Jarvis, 1994).
This perceptual countermeasure has a significant long-lasting influence on
drivers’ speed.

The signs, delineators and pavement markings are placed on roads to warn

drivers. However, there is no significant reduction of crashes in road curves.

The possible reasons are:

• Drivers tend to ignore them and end up in a crash.

• The warning signs are not placed in a location that is noticeable or they

are blocked by trees.

• Bad weather conditions affect the ability of the driver to see the warnings.

• The signs are damaged.


Many more reasons exist as to why such signs are not effective in reducing

crashes. Thus, such signs are used with other interventions or improved to

reduce more crashes.

A.3 Driver-related interventions

Speeding is one of the contributory causes of crashes. To reduce speeding, various countermeasures have been implemented, such as installing speed cameras that detect whether a vehicle is travelling above the speed limit. Different governments apply different rules. For example, the state government of Western Australia increased speeding fines and the number of demerit points in 2007. The increase was said to be based on the likelihood and severity of crashes (RAC, 2007). In contrast, the state government of New South Wales suspends the driver's license based on the amount by which the speed limit was exceeded. The suspension can last for up to six months (RTA, 2008).

Most speeding offenders are young drivers aged 17 to 24. In Queensland, drivers under the age of 25 are required to accumulate 100 hours of certified, supervised driving experience to be eligible to apply for a provisional license (QT, 2008b). Provisional license drivers under the age of 25 are subject to several other rules and restrictions. One such rule is that, between 11 pm and 5 am, they can carry only one passenger who is under the age of 21. In addition, L-plates and P-plates are compulsory. Such drivers are also restricted from driving sports cars. Many more new rules are listed in the booklet entitled Young drivers from Queensland Transport (QT, 2008b).

Speeding is related to aggressiveness. Aggressive drivers tend to speed more than other drivers, and most aggressive drivers are young drivers. Intoxication can also lead drivers to be aggressive. Intoxicated drivers do not perform well because their judgement is impaired.

Another driver-related factor that might cause off-road crashes is fatigue, which is caused mainly by lack of sleep. Most adults require about six to eight hours of quality sleep per night to remain alert. Night shift workers have lower sleep quality than day workers and may therefore tend to doze off behind the wheel. The only cure for fatigue is adequate, good-quality sleep. Drivers should rest at regular intervals when travelling long distances.

Road authorities have run campaigns to educate drivers on the seriousness and effects of speeding, drink driving, fatigue and other issues. For instance, Queensland Transport carried out a Driver Reviver campaign and strongly encouraged drivers to stop at rest stops to enjoy a cup of tea or coffee and a snack (QT, 2008a). Other approaches include TV commercials, outdoor billboards and online advertising to inform and remind people of the seriousness of the issue.

Correcting driver errors is not easy and does not produce instant results, as it depends on whether drivers are willing to learn and understand the message sent to them.

A.4 Vehicle-related interventions

As technology improves, it is used both to study road crashes and, installed in vehicles, to help drivers to drive safely. Studies have made use of crash or accident prediction models, risk assessment models and simulators to reduce the probability of crashes and crash risk. These solutions do not operate on real-time data streams. In contrast, technologies installed on board vehicles, known as Intelligent Transport Systems (ITS), use sensors to collect and analyse data in real time.

One of the engineering interventions to reduce the number of crashes is the shoulder rumble strip. Studies have shown that shoulder rumble strips reduce run-off-road crashes effectively. However, some drivers do not like the noise and vibration produced, and drivers can overreact to or panic at the stimulus, which may result in their losing control of the vehicle. Shoulder rumble strips combined with other safety countermeasures such as pavement markings and delineation can reduce unintentional lane departure. Examples of other countermeasures are to realign the horizontal alignment, provide dynamic warning signs (Torbic, Harwood, Gilmore, Pfefer, Neuman, Slack & Hardy, 2004) and delineate roadside objects. Other interventions discussed are chevron alignment signs, horizontal alignment signs and advisory speed plaques, post-mounted delineators, and guideposts. All of them are designed to reduce the number of crashes on road curves. However, these interventions can be ignored by drivers; hence, a better approach for reducing crash risk is to employ information technology applications and Intelligent Transport Systems in the vehicle to guide drivers through a road curve.

APPENDIX B

Data categories

B.1 Classification and Labels

This section presents the categories and the labels of the data.

• timeGrp

TimeGrp represents the time category, in which time is divided into six sub categories: night, morning peak hour, morning, afternoon, evening peak hour and evening. Night is defined as midnight to 6 am, followed by the morning peak hour from 6 am to 9 am. The morning sub category ranges from 9 am to 12 pm (noon) and the afternoon sub category from 12 pm to 4 pm. The evening peak hour is from 4 pm to 7 pm and, lastly, the evening sub category ranges from 7 pm to midnight. The ranges and labels are tabulated in Table B.1.

Table B.1: The sub categories and labels for timeGrp.

Time category

Range (24-hour clock)   Label
0–6                     Night
6–9                     mornPH
9–12                    Morn
12–16                   aftn
16–19                   evenPH
19–24                   even

Legend: mornPH = morning peak hours, Morn = morning, aftn = afternoon, evenPH = evening peak hours, even = evening.

• Drvage

The Drvage (ageGrp) category represents the age group of the driver. Ages range from 17 to 100. Three main sub categories are defined: young, mature and senior. Each covers a range of ages and is based on Queensland Transport categories (QT, 2005).

The young category ranges from 17 to 25. The mature category ranges from 26 to 39 and has two sub categories: matureG1 and matureG2. Lastly, the senior category covers ages from 40 to 100 and has two sub categories: seniorG1 and seniorG2. Note: G1, G2, ..., Gn denote Group 1, Group 2, ..., Group n. Table B.2 lists the sub categories in the ageGrp category.

Table B.2: The sub categories and labels for the age group.

Driver age category

Label   Description   Range (years)
yg      Young         17–25
m1      MatureG1      26–29
m2      MatureG2      30–39
s1      SeniorG1      40–49
s2      SeniorG2      50–59
od      Old           60–100

Legend: matureGx = mature drivers group x; seniorGx = senior drivers group x, where x = 1, 2, 3, etc.

• VehAge

The vehAge category represents the age of the vehicle, calculated relative to the year 2008. Seven sub categories are created within the vehAge category: new, oldG1, oldG2, olderG1, olderG2, voldG1 and voldG2. Each sub category represents the age of the vehicle and indicates the period in which the vehicle was manufactured. For example, the new sub category represents vehicles 1 to 5 years old and indicates that they were manufactured between 2003 and 2008. The same applies to the other sub categories, so oldG1 represents vehicles that were manufactured between 2000 and 2003 and are therefore 5 to 10 years old. Table B.3 displays all of the sub categories.

Table B.3: The sub categories and labels for the age of the vehicle.

Vehicle age category

Range (year of manufacture)   Label
2001–2005                     new
1991–2000                     moderate
1981–1990                     old
1971–1980                     older
1961–1970                     very old
1921–1960                     obsolete

Legend: oldGn = old car group n; olderGn = older car group n; voldGn = very old car group n, where n = 1, 2, 3, etc.
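The three categorisations above amount to range lookups, which can be sketched in a few lines of Python. This is only an illustrative sketch and not the procedure used in the thesis: the function names (time_group, driver_age_group, vehicle_age_group) are hypothetical, the night range is read as midnight to 6 am, a boundary hour is assumed to belong to the later time range, and the age and year ranges of Tables B.2 and B.3 are assumed to be inclusive at both ends.

```python
from bisect import bisect_right

# Table B.1: exclusive upper bound of each time range, in 24-hour clock hours.
TIME_EDGES = [6, 9, 12, 16, 19, 24]
TIME_LABELS = ["Night", "mornPH", "Morn", "aftn", "evenPH", "even"]

# Table B.2: driver age ranges in years (assumed inclusive at both ends).
AGE_GROUPS = [
    (17, 25, "yg"),
    (26, 29, "m1"),
    (30, 39, "m2"),
    (40, 49, "s1"),
    (50, 59, "s2"),
    (60, 100, "od"),
]

# Table B.3: year-of-manufacture ranges (assumed inclusive at both ends).
VEHICLE_GROUPS = [
    (2001, 2005, "new"),
    (1991, 2000, "moderate"),
    (1981, 1990, "old"),
    (1971, 1980, "older"),
    (1961, 1970, "very old"),
    (1921, 1960, "obsolete"),
]


def time_group(hour):
    """Map an hour of day (0 <= hour < 24) to its timeGrp label."""
    if not 0 <= hour < 24:
        raise ValueError(f"hour out of range: {hour}")
    # Assumption: a boundary hour belongs to the later range (6 -> mornPH).
    return TIME_LABELS[bisect_right(TIME_EDGES, hour)]


def _lookup(value, groups, name):
    """Return the label of the inclusive range that contains value."""
    for low, high, label in groups:
        if low <= value <= high:
            return label
    raise ValueError(f"{name} outside the tabulated ranges: {value}")


def driver_age_group(age):
    """Map a driver's age in years to its Table B.2 label."""
    return _lookup(age, AGE_GROUPS, "driver age")


def vehicle_age_group(year_of_manufacture):
    """Map a vehicle's year of manufacture to its Table B.3 label."""
    return _lookup(year_of_manufacture, VEHICLE_GROUPS, "year of manufacture")


if __name__ == "__main__":
    # Hypothetical crash record used only to exercise the three lookups.
    record = {"hour": 17.5, "driver_age": 23, "vehicle_year": 1998}
    print(
        time_group(record["hour"]),
        driver_age_group(record["driver_age"]),
        vehicle_age_group(record["vehicle_year"]),
    )
```

Values outside the tabulated ranges raise an error rather than being assigned to a neighbouring group, since the tables do not say how such records were handled.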

References

Abdourahmane, K. (2005). Modelisation des Trajectories et du Trafic. Tech-

nical report, LCPC. (in French).

AECOM (2008). VISSIM Micro-Simulation Software.

AECPortico (2005). Transportation Engineering: Geometric Design Glossary.

Agotnes, T. (1999). Filtering Large Propositional Rule Sets While Retaining

Classifier Performance. PhD thesis, Norwegian University of Science and

Technology.

Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., & Verkamo, I. (1996). Fast Discovery of Association Rules. In Fayyad, U., Piatetsky-Shapiro, G. G., Smyth, P., & Uthurusamy, R. (Eds.), Advances in Knowledge Discovery and Data Mining, (pp. 307–328). AAAI Press.

Aldridge, C. H. (2001). A Rough Set Based Methodology for Geographic

Knowledge Discovery. Proceedings of the 6th International Conference on

GeoComputation, GeoComputation Conference Proceedings.

ALTS (2004). Road Safety Issues Kaikoura District - July 2004. Land Transport Safety Authority.

Amemiya, H. (2004). The Japanese Studies: Market Introduction and Liability

Issues of ADAS in Japan.

An, A. & Cercone, N. (2001). Rule Quality Measures for Rule Induction Systems: Description and Evaluation. In Computational Intelligence, volume 17. Blackwell Publishers.

ARRB (2003). Road Hazard Management Guide. Technical report, ARRB

Transport.

ATSB (2004). Road Safety in Australia. Canberra, Australia: Paragon Printers

Australasia. A Publication Commemorating World Health Day 2004.

Australia-Govt (2008). Road Deaths Australia 2007 Statistical Summary.

Technical Report Road Safety Report 1, Department of Infrastructure,

Transport, Regional Development and Local Government.

Bazan, J., Nguyen, H. S., Skowron, A., & Szczuka, M. (2003). A View on

Rough Set Concept Approximations. Springer Berlin / Heidelberg.

Bazan, J. G. & Szczuka, M. S. (2000). RSES and RSESlib - A Collection

of Tools for Rough Set Computations. In Ziarko, W. & Yao, Y. (Eds.),

Rough Sets and Current Trends in Computing, (pp. 106–113). Springer-

Verlag Berlin Heidelberg.

Bazan, J. G. & Szczuka, M. S. (2005). The Rough Set Exploration System. In

J.F. Peters, A. S. (Ed.), Transactions on Rough Sets III, volume LNCS

3400, (pp. 37–56).

Berndt, D. & Clifford, J. (1996). Finding Patterns in Time Series: A Dynamic

Programming Approach. In Fayyad, U., Piatetsky-Shapiro, G. G., Smyth,

P., & Uthurusamy, R. (Eds.), Advances in Knowledge Discovery and Data

Mining, (pp. 229–248). AAAI Press.

Berthold, M. & Hand, D. J. (2003). Intelligent Data Analysis. Springer; 2nd

edition.

Bishop, R. (2005). In Intelligent Vehicle Technology and Trends: Lateral/Side

Sensing and Control Systems, chapter 6. Artech House.

Bloomberg, L. & Dale, J. (2000). A Comparison of the VISSIM and CORSIM

Traffic Simulation Models. Technical report, Institute of Transportation

Engineers Annual Meeting.

Bruha, I. & Kockova, S. (1993). Quality of Decision Rules: Empirical and

Statistical Approaches. In M. Gams (Ed.), Informatica, An International

Journal of Computing and Informatics, volume 17 (pp. 233–243). Biro M.

BTE (2000). Road Crash Costs in Australia - Report 102. Technical report.

Australia Commonwealth, Bureau of Transport Economics.

Bullard, L. A., Khoshgoftaar, T. M., & Gao, K. (2007). An Application of a Rule-Based Model in Software Quality Classification. In 6th International Conference on Machine Learning and Applications. IEEE Computer Society.

Carlson, P. J., Rose, E. R., Chrysler, S. T., & Bischoff, A. L. (2004). Simplify-

ing Delineator and Chevron Applications for Horizontal Curves. Technical

Report FHWA/TX-04/0-4052-1, Texas Transportation Institute.

CARRS-Q (2008). CARRS-Q Human Behaviour and Technology Interface.

Website, 23 Oct. 2008.

Chu, L., Liu, H. X., & Recker, W. (2003). Development of Capability-Enhanced PARAMICS Simulation Environment. Technical report, University of California, Irvine.

Cliff, D. & Horberry, T. (2007). Driving on Empty: Driver Fatigue is Danger-

ous. In Queensland Government Mining Journal.

Corkle, J., Marti, M., & Montebello, D. (2001). Synthesis on the Effectiveness

of Rumble Strips. Technical Report MN/RC 2002-07, Minnesota Local

Road Research Board. Synthesis Report 1999-2001.

Crowsey, J. M., Ramstad, R. A., Gutierrez, H. D., Paladino, W. G., & White,

K. P. (2007). An Evaluation of Unstructured Text Mining Software.

CTRE (2005). Safety Analysis: Finding Road Safety Problem Locations.

CTRE (2006). Horizontal Curves (Circular Spirals).

http://www.ctre.iastate.edu/educweb/ce353/lec05/lecture.htm.

Čížek, P., Härdle, W., & Weron, R. (2005). Statistical Tools for Finance and Insurance: Cluster Algorithms.

Dey, L., Ahmad, A., & Kumar, S. (2005). Finding Interesting Rules Exploiting

Rough Memberships. In Pattern Recognition and Machine Intelligence,

volume 3776/2005 of Lecture Notes in Computer Science, (pp. 732–737).

Springer Berlin / Heidelberg. 0302-9743 (Print) 1611-3349 (Online).

DOT, G. (2006). Safety Action Plan, Prevent Vehicles from Departing the

Roadway or Lanes. Technical report.

Environment, T. & Works Bureau, H. K. (1997). Traffic Engineering and

Management, chapter 7. Technical report.

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From Data Mining to Knowledge Discovery in Databases. American Association for Artificial Intelligence, 37–54.

Fildes, B. N. & Jarvis, J. (1994). Perceptual Countermeasures: Literature Review. Technical Report CR4/94, Monash University Accident Research Centre, Australian Road Research Board.

French, H. T. & Hutchinson, A. (2003). Measurement of Situation-Awareness

in a C4ISR Experiment.

Fuller, R. (2005). Towards a General Theory of Driver Behaviour. In Accident Analysis and Prevention, volume 37 (pp. 461–472). Elsevier.

Gazill, M. & Robe, R. (2003). An Intelligent Vehicle Initiative Road Departure

Crash Warning Field operational test.

Glennon, J., Neuman, T., & Leisch, J. (1985). Safety and Operational Considerations for Design of Rural Highway Curves. Report FHWA-RD-86-035, Federal Highway Administration, McLean, Virginia.

Guyon, O., Matic, N., & Vapnik, N. (1996). Discovering Informative Patterns

and Data Cleaning. In Fayyad, U., Piatetsky-Shapiro, G. G., Smyth, P.,

& Uthurusamy, R. (Eds.), Advances in Knowledge Discovery and Data

Mining, (pp. 181–204). AAAI Press.

Hand, D. (1981). Discrimination and Classification. Wiley.

Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of Data Mining.

The MIT Press.

Hanger, P. (2003). Engineering Training and Reference Manual.

Haworth, N. & Pronk, L. B. (1997). Characteristics of Fatal Single Vehicle Crashes. Technical Report 120, Monash University, Accident Research Centre.

Herbert, J. & Yao, J. (2005). Time-Series Data Analysis with Rough Sets. (pp. 908–911). 4th International Conference on Computational Intelligence in Economics and Finance (CIEF), Salt Lake City.

Herve, B. (2004/2005). Les Referentiels Techniques et les Champs

D’investigation Necessaires a Lelaboration dun Projet Routier. (pp.3̃6).

Cours de route IUT Bourges.

Highway, M. (2004). Horizontal and Vertical Alignment, chapter 4. Technical

report.

Hillol, K., Ruchita, B., Kun, L., Michael, P., Patrick, B., Samuel, B., James,

D., Kakali, S., Martin, K., Mitesh, V., & David, H. (2004). VEDAS:

A Mobile and Distributed Data Stream Mining System for Real-Time

Vehicle Monitoring. In Proceedings of SIAM International Conference on

Data Mining 2004, California.

John, M. & Gary, V. (2008). Road Safety Engineering Risk Assessment: Re-

lationships between Crash Risk and the Standards of Geometric Design

Elements. Technical Report ST1023, ARRB research.

Keall, M. & Frith, W. (2004). Issues in Estimation of Risk Curve Against

Driver BAC Level.

Kloesgen, W. (1996). A Multipattern and Multistrategy Discovery Assistant.

In Fayyad, U., Piatetsky-Shapiro, G. G., Smyth, P., & Uthurusamy, R.

(Eds.), Advances in Knowledge Discovery and Data Mining, (pp. 249–

271). AAAI Press.

Kohavi, R. & Provost, F. (1998). Glossary of terms. In Machine Learning,

volume 30 (pp. 271–274). Kluwer Academic Publishers.

Koperski, K. & Han, J. (1995). Discovery of Spatial Association Rules in Ge-

ographic Information Databases. In Lecture Notes in Computer Science.

Springer.

Krammes, R., Brakett, R., Shafer, M., Otteson, J., Anderson, I., Fink, K., Collins, K., Pendleton, O., & Messer, C. (1995). Horizontal Alignment Design Consistency for Rural Two-Lane Highways. Report FHWA-RD-94-034, Federal Highway Administration, McLean, Virginia.

Krishnaswamy, S. (2008). Rough Sets: An Introduction. PowerPoint slides.

Krishnaswamy, S., Loke, S. W., Rakotonirainy, A., Horovitz, O., & Gaber, M. M. (2005). Towards Situation-Awareness and Ubiquitous Data Mining for Road Safety: Rationale and Architecture for a Compelling Application. In Proceedings of Conference on Intelligent Vehicles and Road Infrastructure, The University of Melbourne.

Kuhlmann, A., Ralf-Michael, V., Lubbing, C., & Clemens-August, T. (2005).

Data Mining on Crash Simulation Data. Machine Learning and Data

Mining in Pattern Recognition, 3587/2005, 558–569.

Liu, C., Chen, C.-L., Subramanian, R., & Utter, D. (2005). Analysis of

speeding-related fatal motor vehicle traffic crashes. NHTSA Technical Re-

port DOT HS 809 839, Mathematical Statisticians, Mathematical Analysis

Division, National Center for Statistics and Analysis, NHTSA.

Machin, M. A. & Sankey, K. S. (2006). Factors Influencing Young Drivers' Risk Perceptions and Speeding Behaviour. In 2006 Australasian Road Safety Research, Policing and Education Conference. Gold Coast, Qld.

Machin, M. A. & Sankey, K. S. (2008). Relationships Between Young Drivers' Personality Characteristics, Risk Perceptions, and Driving Behaviour. In Accident Analysis and Prevention, volume 40 (pp. 541–547). Elsevier.

Maroles, E. F., Heredia, D. M., & Rodriguez, A. F. (2002). Mining Road Accidents. In C. C. et al. (Ed.), MICAI 2002: Advances in Artificial Intelligence, volume 2313, (pp. 59–71).

Matthews, L. & Barnes, J. (1988). Relations between Road Environment and

Curve Accidents. In 14th Australian Road Research Board Conference,

Canberra, volume 14, (pp. 150–120). ARRB.

McGee, H. W., Hughes, W. E., & Daily, K. (1995). Effect of Highway Standards

on Safety. Transportation Research Board.

Michon, J. (1985). A Critical View of Driver Behavior Models. What do We Know, What should We Do? In Human behavior and traffic safety, (pp. 485–525). Plenum Press.

Morena, D. A. (2003). Rumbling toward safety.

Narula, A. (2005). 80/20 Rule of Communicating Your Ideas Effectively. DK Publishers Distributors. ISBN: 8190174126.

Nguyen, S. H. & Nguyen, H. S. (2003). Analysis of STULONG Data by

Rough Set Exploration System (RSES). Technical report. PKDD/ECML

Discovery Challenge.

NHTSA (2006). Drinking and Driving Data.

OECD (1997). Road Safety Principles and Models: Review of Descriptive and Predictive Risk and Accident Consequence Models. Technical Report GD(97)153, Road Transport Research. Organisation for Economic Co-operation and Development.

OECD (2003). Road Safety Impact of New Technologies. OECD Publishing.

Ohrn, A. (2001). ROSETTA Technical Reference Manual.

Oliver, N. & Pentland, A. P. (2000). Driver Behaviour Recognition and Pre-

diction in SmartCar.

Olson, D. L. & Delen, D. (2008). Advanced Data Mining Techniques. Springer.

Palumbo, J. P. & Rees, C. D. (2001). Accident/incident prevention techniques, chapter 9: Causal Factor Analysis, (pp. 105–109). CRC Press.

Pande, A. & Abdel-Aty, M. (2006). Application of Data Mining Techniques

for Real-Time Crash Risk Assessment on Freeways. In Applications of

Advanced Technology in Transportation, (pp. 250–256).

Parmar, D., Wu, T., & Blackhurst, J. (2007). MMR: An Algorithm for Clus-

tering Categorical Data Using Rough Set Theory. In Data & Knowledge

Engineering, volume 63, (pp. 879–893). Elsevier Science Publishers B. V.

Pawlak, Z. (1995). Rough Sets. In ACM Conference on Computer Science,

(pp. 262–264).

Predki, B., Slowinski, R., Stefanowski, J., Susmaga, R., & Wilk, S. (1998). ROSE - Software Implementation of the Rough Set Theory. In Polkowski, L. & Skowron, A. (Eds.), RSCTC'98, volume LNAI 1424, (pp. 605–608). Springer-Verlag Berlin Heidelberg.

PTV, v. (2009). VISSIM.

QT (2005). Road Traffic Crashes in Queensland, A Report on the Road Toll. Technical report. Queensland Transport.

QT (2006). Webcrash 2.3. Queensland Transport.

QT (2008a). Driver Reviver. Website.

QT (2008b). New License Laws for Young Drivers in Queensland. Website.

RAC (2007). Western Australia has Increased Speeding Fines for 2007.

Ramadan, N., Halvorson, H., Vande-Linde, A., Levine, S., Helpern, J., & Welch, K. (1989). Low Brain Magnesium in Migraine. Journal of cerebral blood flow and metabolism, 29, 590–593.

Reason, J. (2003). Human Error. Cambridge University Press.

Rechnitzer, G. (2000). Risk Control Systems in Road Safety - Relevant Applications for the Prevention of Occupational Trauma. Safety Science Monitor, 1.

RTA (2008). Speeding Penalties.

Salim, F. D., Loke, S. W., Rakotonirainy, A., Srinivasan, B., & Krishnaswamy, S. (2007). Collision Pattern Modeling and Real-Time Collision Detection at Road Intersections. In Intelligent Transportation Systems Conference, (pp. 161–166). IEEE Intelligent Transportation Systems Conference.

Salim, F. D., Krishnaswamy, S., Loke, S. W., & Rakotonirainy, A. (2005). Context-Aware Ubiquitous Data Mining Based Agent Model for Intersection Safety.

Sharke, P. (2004). Smart Cars: ADAS (Advance Driving Assistance System).

Shields, B., Morris, A., Jo, B., & Fildes, B. (2001). Australia's National Crash In-depth Study Progress Report. Technical report, Monash University, Accident Research Centre.

Shinar, D. (2007). Traffic Safety and Human Behavior. Emerald Group Pub-

lishing Limited.

Singh, S. (2001a). Identification of Driver and Vehicle Characteristics through

Data Mining the Highway Crash. Technical report, NCSA, National High-

way Traffic Safety Administration.

Singh, S. (2001b). A Sampling Strategy for Rear-end Pre-crash Data Collec-

tion. Technical report, NCSA, National Highway Traffic Safety Adminis-

tration.

Smyth, P. & Goodman, R. (1990). Rule induction using information theory. In G. Piatetsky-Shapiro & W. Frawley (Eds.), Knowledge Discovery in Databases. MIT Press.

SPSS (2008). SPSS Clementine. 30 Oct. 2008.

Suhana, N. (2007). Generation of Rough Set, Significant Reducts and Rules

for Cardiac Dataset Classification. Master’s thesis, Faculty of Computer

Science and Information System, Universiti Teknologi Malaysia.

Sulaiman, S., Shamsuddin, S. M., & Abraham, A. (2008). An Implementa-

tion of Rough Set in Optimizing Mobile Web Caching Performance. In

UKSIM, Proceedings of the Tenth International Conference on Computer

Modeling and Simulation (UKSIM 2008), volume 00, (pp. 655–660). IEEE

Computer Society Washington, DC, USA.

Swanson, D. R. (1991). Complementary Structures in Disjoint Science Literatures. In 14th Annual International ACM/SIGIR Conference, (pp. 280–289). ACM/SIGIR.

Torbic, D. J., Harwood, D. W., Gilmore, D. K., Pfefer, R., Neuman, T. R., Slack, K. L., & Hardy, K. K. (2004). Guidance for Implementation of the AASHTO Strategic Highway Safety Plan. Technical report, NCHRP, National Cooperative Highway Research Program.

Torbic, D. J., Harwood, D. W., Gilmore, D. K., Pfefer, R., Neuman, T. R., Slack, K. L., & Hardy, K. K. (2004). A Guide for Reducing Collisions on Horizontal Curves. Technical Report NCHRP Report 500, volume 7, National Cooperative Highway Research Program, NCHRP.

Treiber, M. (2008). Microsimulation of Road Traffic.

Vaa, T. (2000). Cognition and Emotion in Driver Behaviour Models: Some

Critical Viewpoints.

Vest, A., Stamatiadis, N., Clayton, A., & Pigman, J. (2005). Effects of Warn-

ing Signs on Curve Operating Speeds. Technical Report KTC-05-20/SPR-

259-03-1F, University of Kentucky.

VicRoads (2007). Electronic Stability Control (ESC).

Vinterbo, S. & Ohrn, A. (2000). Minimal Approximate Hitting Sets and Rule Templates. In International Journal of Approximate Reasoning, volume 25 (pp. 123–143).

Wang, W. & Namgung, M. (2007). Applying Rough Set Theory to Find Re-

lationships between Personal Demographic Attributes and Long Distance

Travel Mode Choices. 2007 International Conference on Multimedia and

Ubiquitous Engineering(MUE’07).

Wang, X. & He, F. (2006). Improving Intrusion Detection Performance Using

Rough Set Theory and association rule mining. volume 2, (pp. 114–119).

Hybrid Information Technology, 2006. ICHIT ’06. International Confer-

ence.

Weiss, S. I. & Kulikowski, C. (1991). Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Networks, Machine Learning and Expert Systems. Morgan Kaufmann Publishers.

Weka (2008). Weka 3: Data mining software in java.

Welch, K. & Ramadan, N. (1995). Mitochondria, Magnesium and Migraine. Journal of the Neurological Sciences, 134, 9–14.

Witten, I. H., Bray, Z., Mahoui, M., & Teahan, B. (1999). Text Mining: A

New Frontier for Lossless Compression. In Proceedings of the Conference

on Data Compression, (pp. 198). IEEE Computer Society Washington,

DC, USA.

Wong, J.-T. & Chung, Y.-S. (2007). Rough Set Approach for Accident Chains

Exploration. In Accident Analysis and Prevention, volume 39 (pp. 629–

637). Elsevier.

World Health Organization (2004). World Report on Road Traffic Injury Prevention. Technical report.

Yamaha, M. (2000). The Experimental Motorcycle Yamaha ASV-2 Mounting "Advanced Safety Vehicle" Technologies.

Zador, P. L., Stein, H. S., Wright, P. H., & Hall, J. W. (1987). Effects of

Highway Standards on Safety Chevrons, Post-Mounted Delineators, and

Raised Pavement Markers on Driver Behavior at Roadway Curves. Transportation Research Record 1114, 1–10.

Zembowicz, R. & Zytkow, J. (1996). From Contingency Tables to Various

Forms of Knowledge in Databases. In Fayyad, U., Piatetsky-Shapiro,

G. G., Smyth, P., & Uthurusamy, R. (Eds.), Advances in Knowledge Dis-

covery and Data Mining, (pp. 329–351). AAAI Press.
