Intelligent Heart Disease Prediction System Using Naïve Bayes
Synopsis
A major challenge facing healthcare organizations (hospitals, medical centers) is
the provision of quality services at affordable costs. Quality service implies diagnosing
patients correctly and administering treatments that are effective. Poor clinical decisions
can lead to disastrous consequences which are therefore unacceptable. Hospitals must
also minimize the cost of clinical tests. They can achieve these results by employing
appropriate computer-based information and/or decision support systems.
Most hospitals today employ some sort of hospital information systems to manage
their healthcare or patient data. These systems are designed to support patient billing,
inventory management and generation of simple statistics. Some hospitals use decision
support systems, but they are largely limited.
Clinical decisions are often made based on doctors’ intuition and experience
rather than on the knowledge-rich data hidden in the database. This practice leads to
unwanted biases, errors and excessive medical costs, which affect the quality of service
provided to patients.
The main objective of this research is to develop an Intelligent Heart Disease
Prediction System using the Naïve Bayes data mining modeling technique. It
is implemented as a web-based questionnaire application. Based on the user’s answers, it can
discover and extract hidden knowledge (patterns and relationships) associated with heart
disease from a historical heart disease database. It can answer complex queries for
diagnosing heart disease and thus assist healthcare practitioners to make intelligent
clinical decisions which traditional decision support systems cannot. By providing
effective treatments, it also helps to reduce treatment costs.
HARDWARE CONFIGURATION
• Intel Pentium IV
• 256/512 MB RAM
• 1 GB Free disk space or greater
• 1 GB on Boot Drive
• 17” XVGA display monitor
• 1 Network Interface Card (NIC)
SOFTWARE CONFIGURATION
• MS Windows XP/2000
• MS IE Browser 6.0/later
• MS DotNet Framework 2.0
• MS Visual Studio.Net 2005
• Internet Information Server (IIS)
• MS SQL Server 2000
• Windows Installer 3.1
SOFTWARE FEATURES
C#.Net
C# is an object-oriented programming language developed by Microsoft as part of the
.Net initiative. C# is intended to be a simple, modern, general-purpose, object-oriented
programming language. Because software robustness, durability and programmer
productivity are important, the language should include strong type checking, array
bounds checking, detection of attempts to use uninitialized variables, source code
portability, and automatic garbage collection.
C# is intended to be suitable for writing applications for both hosted and embedded
systems, ranging from the very large that use sophisticated operating systems, down to
the very small having dedicated functions. C# applications are intended to be economical
with regard to memory and processing power requirements. Programmer portability is
a very important feature of C#.
A C# compiler could generate machine code like traditional compilers of C++ or
FORTRAN; in practice, all existing C# implementations target the Common Language
Infrastructure (CLI). C# is more type safe than C++. The only implicit conversions by
default are those which are considered safe, such as widening of integers and conversion
from a derived type to a base type.
C# is the programming language that most directly reflects the underlying Common
Language Infrastructure (CLI). Most of C#’s intrinsic types correspond to value types
implemented by the CLI framework. C# supports a strict Boolean type, bool; statements
that take conditions, such as while and if, require an expression of Boolean type.
FEATURES OF C#.NET
Visual Studio.Net is a tool-rich programming environment containing all the
functionality for handling C# projects.
The .Net integrated development environment provides enormous advantages for
the programmers.
C# is directly related to C, C++ and Java. C# is a case-sensitive language and is
designed to produce portable code.
C# includes features that directly support the constituents of components such as
properties, methods and events.
C# is an object-oriented language which supports all object-oriented programming
(OOP) concepts such as encapsulation, polymorphism and inheritance.
Encapsulation is a programming mechanism that binds code and the data together.
It manipulates and keeps both safe from outside interference and misuse.
Polymorphism is the quality that allows one interface to be used for a general class
of actions.
Inheritance is the process by which one object can acquire the properties of
another object.
Intellisense displays the names of all the members of a class.
C# allows us to create both managed and unmanaged applications.
Interoperability, simplicity, performance, cross-language integration and language
independence are important features of C#.
Multiple inheritance is not supported, although a class can implement any number
of interfaces.
There are no global variables or functions. All methods and members must be
declared within classes.
FEATURES OF WINDOWS XP
The major features of Windows XP Professional are:
Reliable
Easy to use and Maintain
Internet Ready
Windows File protection
Protects core system files from being overwritten by application installs. In the
event a file is overwritten, Windows File Protection will replace that file with the correct
version. By safeguarding system files in this manner, Windows XP mitigates many
common system failures found in earlier versions of Windows.
Driver Certification
Provides safeguards to assure that device drivers have not been tampered with,
reducing the risk of installing non-certified drivers.
Full 32-bit Operating System
Minimizes the chance of application failures and unplanned reboots.
Microsoft Installer
Works with the Windows Installer service, helping users install, configure, track,
upgrade, and remove software programs correctly, minimizing the risks of user error and
possible loss of productivity.
System Preparation Tool (SysPrep)
Helps administrators clone computer configurations, systems, and applications,
resulting in simpler, faster, more cost-effective deployment.
Remote OS Installation
Permits Windows XP Professional to be installed across the network (including
SysPrep images). Remote OS Installation saves time and reduces deployment costs.
Multilingual Support
Allows users to easily create, read, and edit documents in hundreds of languages.
Faster Performance
Provides 25 percent faster performance than Windows 95 and Windows 98 on
systems with 64 megabytes (MB) or more of memory.
Faster Multitasking
Uses a full 32-bit architecture, allowing you to run more programs and perform
more tasks at the same time than Windows 95 or Windows 98.
Scalable Memory and Processor Support
Supports up to 4 gigabytes (GB) of RAM and up to two symmetric
multiprocessors.
Peer-to-Peer Support for Windows 95/98 and Windows NT
Enables Windows XP Professional to interoperate with earlier versions of
Windows on a peer-to-peer level, allowing the sharing of all resources, such as folders,
printers and peripherals.
Internet Information Services (IIS) 5.0
Includes Web and FTP server support, as well as support for FrontPage
extensions, transactions, Active Server Pages, and database connections. Available as an optional
component, IIS 5.0 is installed automatically for those upgrading from versions of
Windows with Personal Web Server installed.
Search Bar
Helps users quickly search for different types of information, such as Web pages
or people’s addresses, and choose which search engine to use – all from one
location.
History Bar
Helps users find their way back to sites viewed in the past. The History bar not
only tracks Web sites, but also intranet sites, network servers and local folders.
Favorites
Helps users find and organize relevant information, whether it’s stored in files,
folders or Web sites.
Strong Development Platform
Support for Dynamic HTML Behaviors and XML gives developers the broadest
range of options – with the fastest development time.
SQL SERVER 2000
The SQL Server Web Data Administrator enables us to easily manage
our SQL Server data wherever we are. Using its built-in features, we can administer the
database from Microsoft Internet Explorer or our favorite Web browser. Microsoft SQL Server
2000 introduces several server improvements and new features:
XML Support
The relational database engine can return data as Extensible Markup Language (XML)
documents. Additionally, XML can also be used to insert, update, and delete values in the
database.
Federated Database Servers
SQL Server 2000 supports enhancements to distributed partitioned views that allow you
to partition tables horizontally across multiple servers. This allows you to scale out one
database server to a group of database servers that cooperate to provide the same
performance levels as a cluster of database servers. This group, or federation, of database
servers can support the data storage requirements of the largest Web sites and enterprise
data processing systems.
SQL Server 2000 introduces Net-Library support for Virtual Interface Architecture (VIA)
system-area networks that provide high-speed connectivity between servers, such as
between application servers and database servers.
Existing System
Clinical decisions are often made based on doctors’ intuition and experience
rather than on the knowledge rich data hidden in the database.
This practice leads to unwanted biases, errors and excessive medical costs which
affects the quality of service provided to patients.
There are many ways that a medical misdiagnosis can present itself. Whether a
doctor is at fault, or hospital staff, a misdiagnosis of a serious illness can have
very extreme and harmful effects.
The National Patient Safety Foundation cites that 42% of medical patients feel
they have experienced a medical error or missed diagnosis. Patient safety
is sometimes negligently given the back seat to other concerns, such as the cost
of medical tests, drugs, and operations.
Medical misdiagnoses are a serious risk to our healthcare profession. If they
continue, then people will fear going to the hospital for treatment. We can put an
end to medical misdiagnosis by informing the public and filing claims and suits
against the medical practitioners at fault.
Proposed System
The current practice leads to unwanted biases, errors and excessive medical costs,
which affect the quality of service provided to patients.
Thus we propose that the integration of clinical decision support with computer-
based patient records could reduce medical errors, enhance patient safety,
decrease unwanted practice variation, and improve patient outcomes.
This suggestion is promising as data modeling and analysis tools, e.g., data
mining, have the potential to generate a knowledge-rich environment which can
help to significantly improve the quality of clinical decisions.
The main objective of this research is to develop a prototype Intelligent Heart
Disease Prediction System (IHDPS) using three data mining modeling techniques,
namely, Decision Trees, Naïve Bayes and Neural Network.
By providing effective treatments, it also helps to reduce treatment costs and to
enhance visualization and ease of interpretation.
Modules:
Analyzing the Data set:
A data set (or dataset) is a collection of data, usually presented in tabular form.
Each column represents a particular variable. Each row corresponds to a given member of
the data set in question. It lists values for each of the variables, such as height and weight
of an object or values of random numbers. Each value is known as a datum. The data set
may comprise data for one or more members, corresponding to the number of rows.
The values may be numbers, such as real numbers or integers, for example
representing a person's height in centimeters, but may also be nominal data (i.e., not
consisting of numerical values), for example representing a person's ethnicity. More
generally, values may be of any of the kinds described as a level of measurement. For
each variable, the values will normally all be of the same kind. However, there may also
be "missing values", which need to be indicated in some way.
A total of 500 records with 15 medical attributes (factors) were obtained from
the Heart Disease database, which lists the attributes. The records were split into two
datasets: a training dataset and a testing dataset. To avoid bias,
the records for each set were selected randomly.
The attribute “Diagnosis” was identified as the predictable attribute with value
“1” for patients with heart disease and value “0” for patients with no heart disease. The
attribute “PatientID” was used as the key; the rest are input attributes. It is assumed that
problems such as missing data, inconsistent data, and duplicate data have all been
resolved.
In our project, the data set is obtained from a .dat file; a file reader program
reads the records from it as input for the Naïve Bayes based mining process.
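This file-reading and splitting step can be sketched as follows. The project itself is implemented in C#.Net; Python is used here only for illustration, and the comma-separated record layout is an assumption.

```python
import csv
import random

def load_dataset(path):
    """Read patient records from a .dat file (assumed comma-separated)."""
    with open(path) as f:
        return [row for row in csv.reader(f) if row]  # skip blank lines

def split_dataset(records, train_fraction=0.5, seed=42):
    """Shuffle randomly (to avoid bias) and split into training and testing sets."""
    records = list(records)
    random.Random(seed).shuffle(records)
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]
```

The training set would then be fed to the Naïve Bayes learner, with the testing set held back for evaluation.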
Naïve Bayes Implementation in Mining:
Naïve Bayes classification is a special case of density estimation with general
Bayes classifiers, a topic treated in depth in the probability literature for data mining.
The following is a summary of learning and using Naïve Bayes classifiers on
categorical attributes.
Bayes' Theorem finds the probability of an event occurring given the probability
of another event that has already occurred. If B represents the dependent event and A
represents the prior event, Bayes' theorem can be stated as follows.
Bayes' Theorem:
Prob(B given A) = Prob(A and B)/Prob(A)
To calculate the probability of B given A, the algorithm counts the number of cases
where A and B occur together and divides it by the number of cases where A occurs
alone.
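The counting procedure described above can be shown directly in a short sketch (Python for brevity; the example records are made-up (chest_pain_type, diagnosis) pairs, not real patient data).

```python
def prob_b_given_a(records, a, b):
    """Prob(B given A) = (cases where A and B occur together) / (cases where A occurs)."""
    count_a = sum(1 for r in records if a(r))
    count_a_and_b = sum(1 for r in records if a(r) and b(r))
    return count_a_and_b / count_a if count_a else 0.0

# Made-up (chest_pain_type, diagnosis) records, purely for illustration:
records = [(1, 1), (1, 1), (1, 0), (4, 0)]
# P(diagnosis = 1 | chest_pain_type = 1): A and B occur together in 2 of the 3 cases where A occurs
p = prob_b_given_a(records, lambda r: r[0] == 1, lambda r: r[1] == 1)  # 2/3
```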
Naïve Bayes can also be applied to data with numerical attributes. The Laplace
correction is used so that an attribute value that never occurs together with a given class
in the training data does not force the estimated probability to zero; the class of a new
example is then predicted using Naïve Bayes classification.
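A minimal Naïve Bayes learner for categorical attributes with the Laplace (add-one) correction might look as follows. This is an illustrative Python sketch, not the project's C#.Net implementation, and the training rows used in examples are hypothetical.

```python
from collections import Counter, defaultdict

class NaiveBayes:
    """Naive Bayes for categorical attributes with the Laplace (add-one) correction."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.class_totals = Counter(labels)
        # counts[class][attribute_index][attribute_value]
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        # number of distinct values per attribute (for the Laplace denominator)
        self.n_values = [len({r[i] for r in rows}) for i in range(len(rows[0]))]
        for row, label in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[label][i][v] += 1
        return self

    def predict(self, row):
        best_class, best_score = None, -1.0
        for c in self.classes:
            score = self.priors[c]
            for i, v in enumerate(row):
                # Laplace correction: +1 in the numerator and +n_values in the
                # denominator, so unseen value/class pairs never yield zero
                score *= (self.counts[c][i][v] + 1) / (self.class_totals[c] + self.n_values[i])
            if score > best_score:
                best_class, best_score = c, score
        return best_class
```

Given suitably labeled training rows, each questionnaire answer vector would be passed to predict, which returns the most probable class (e.g. 1 for heart disease, 0 for none).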
Designing the Questionnaire:
Questionnaires have advantages over some other types of medical surveys in that
they are cheap, do not require as much effort from the questioner as verbal or telephone
surveys, and often have standardized answers that make it simple to compile data.
However, such standardized answers may frustrate users. Questionnaires are also sharply
limited by the fact that respondents must be able to read the questions and respond to
them.
Our questionnaire is based on the attributes given in the data set, so it contains
the following:
Input attributes
1. Sex (value 1: Male; value 0 : Female)
2. Chest Pain Type (value 1: typical angina; value 2: atypical angina; value 3:
non-anginal pain; value 4: asymptomatic)
3. Fasting Blood Sugar (value 1: > 120 mg/dl; value 0:< 120 mg/dl)
4. Restecg – resting electrocardiographic results (value 0: normal; value 1: having ST-T wave
abnormality; value 2: showing probable or definite left ventricular hypertrophy)
5. Exang – exercise induced angina (value 1: yes; value 0: no)
6. Slope – the slope of the peak exercise ST segment (value 1: upsloping; value 2: flat;
value 3: downsloping)
7. CA – number of major vessels colored by fluoroscopy (value 0 – 3)
8. Thal (value 3: normal; value 6: fixed defect; value 7: reversible defect)
9. Trest Blood Pressure (mm Hg on admission to the hospital)
10. Serum Cholesterol (mg/dl)
11. Thalach – maximum heart rate achieved
12. Oldpeak – ST depression induced by exercise relative to rest
13. Age in Years
14. Height in cms
15. Weight in Kgs.
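For illustration, the answers collected by such a questionnaire can be mapped onto a fixed-order attribute vector before classification. This is a hedged sketch: the attribute key names and helper function below are assumptions, not project code.

```python
# Ordered input attributes, following the questionnaire listing above
ATTRIBUTES = [
    "sex", "chest_pain_type", "fasting_blood_sugar", "restecg", "exang",
    "slope", "ca", "thal", "trest_blood_pressure", "serum_cholesterol",
    "thalach", "oldpeak", "age", "height", "weight",
]

def answers_to_record(answers):
    """Convert a dict of questionnaire answers into an ordered attribute vector."""
    missing = [a for a in ATTRIBUTES if a not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    return [answers[a] for a in ATTRIBUTES]
```

The resulting 15-element record matches the input-attribute order expected by the classifier.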
Heart Disease on the Web
In our heart disease system development, modeling and standardized notations
allow complex ideas to be expressed in a precise way, facilitating communication among
the project participants, who generally have different technical and cultural backgrounds.
MVC architecture has gained wide acceptance for corporate software development.
It divides the system into three different layers in charge of the interface, control
logic and data access; this facilitates the maintenance and evolution of systems thanks
to the independence of the classes in each layer. To illustrate a
successful application built under MVC, this work introduces the different phases of
analysis, design and implementation of a database and web application.
ASP.Net Web Application is a group of interrelated web development techniques
used for creating interactive web applications or rich Internet applications. With ASP.Net
Web Application, web applications can retrieve data from the server asynchronously in
the background without interfering with the display and behavior of the existing page.
In many cases, the pages on a website consist of much content that is common
between them. Using traditional methods, that content would have to be reloaded on
every request. However, using ASP.Net Web Application, a web application can request
only the content that needs to be updated, thus drastically reducing bandwidth usage and
load time. The use of asynchronous requests allows the client's Web browser UI to be
more interactive and to respond quickly to inputs, and sections of pages can also be
reloaded individually. Users may perceive the application to be faster or more responsive,
even if the application has not changed on the server side. The use of ASP.Net Web
Application can reduce connections to the server, since scripts and style sheets only have
to be requested once.
Models for Data Mining
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
Data Flow Diagram
Heart_disease_297
Field Name                      Data Type     Not Null
ID (PK)                         Int(4)        Yes
age                             Numeric(9)    No
sex                             Char(50)      No
Chest_pain_type                 Numeric(9)    No
Trest_blood_pressure            Numeric(9)    No
Serum_cholestorral              Numeric(9)    No
Fasting_blood_sugar             Numeric(9)    No
Resting_electrocardiographic    Numeric(9)    No
Maximum_heart_rate              Numeric(9)    No
Exerice_induced_angina          Numeric(9)    No
St_depression_induced           Numeric(9)    No
Slope_of_the_peak               Numeric(9)    No
Number_of_major_vessels         Numeric(9)    No
thal                            Numeric(9)    No
output                          Numeric(9)    No
SYSTEM TESTING AND IMPLEMENTATION
System testing verifies the whole set of programs that work together. Testing is
very important before the system is accepted by the user; it uncovers errors caused by
communication problems, programmer negligence or time constraints. The strategies
for testing include unit testing, integration testing, system testing and
implementation testing.
SYSTEM TESTING
Testing is a series of different tests whose primary purpose is to fully exercise
the computer-based system. Although each test has a different purpose, all should
verify that every system element has been properly integrated and performs its
allocated function. Testing is the process of checking whether the developed system
works according to the actual requirements and objectives of the system.
The philosophy behind testing is to find errors. A good test is one that uncovers
an as-yet-undiscovered error, and test cases are devised with this purpose in mind. A
test case is a set of data that the system will process as input. The data are created with
the intent of determining whether the system will process them correctly, without any
errors, to produce the required output.
Testing can be viewed as destructive rather than constructive: it is the process
of executing a program with the intent of finding errors. Good testing uncovers
different classes of errors with a minimum amount of time and effort. Testing of the
proposed system was performed to ensure that the software functions appear to be
working according to the specifications and that the performance requirements of the
system are met.
Testing Methodologies
Black Box Testing
Black box testing, also called behavioral testing, focuses on the functional
requirements of the software. That is, black box testing enables the software engineer to
derive sets of input conditions that will fully exercise all functional requirements for a
program. Black box testing attempts to find errors in the following categories: incorrect
or missing functions; interface errors; errors in data structures or external database
access; behavior or performance errors; and initialization and termination errors.
Functional testing is black box type testing geared to the functional requirements
of an application. This type of testing should be done by testers. Our project performs
functional testing of what input is given and what output should be obtained.
System testing is black box type testing based on the overall requirements
specification; it covers all combined parts of a system. The system testing done here
checks the project together with all the peripherals used.
Stress testing is a term often used interchangeably with ‘load’ and ‘performance’
testing. It is also used to describe tests such as system functional testing carried out
under unusually heavy loads, heavy repetition of certain actions or inputs, or input of
large numerical values.
Performance testing is a term often used interchangeably with ‘stress’ and ‘load’
testing. Ideally, ‘performance’ testing is defined in the requirements documentation or
in QA or test plans.
White Box Testing
White box testing, sometimes called glass box testing, is a test case design method
that uses the control structure of the procedural design to derive test cases. Using white
box testing methods, the software engineer can derive test cases that guarantee that all
independent paths within a module have been exercised at least once, exercise all logical
decisions on their true and false sides, execute all loops at their boundaries and within
their operational bounds, and exercise internal data structures to ensure their validity.
Unit Testing
Unit testing is the most ‘micro’ scale of testing, used to test particular functions or
code modules. Typically it is done by the programmer rather than by testers, as it
requires detailed knowledge of the internal program design and code. It is not always
easily done unless the application has a well-designed architecture with tight code, and
may require developing test modules or test harnesses.
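As a concrete illustration of unit testing at this ‘micro’ scale, a single probability function can be exercised in isolation. The sketch below uses Python's unittest framework purely for illustration; the project itself would test its C#.Net modules analogously.

```python
import unittest

def prob_b_given_a(records, a, b):
    """Prob(B given A), computed by counting as in the Bayes' theorem module."""
    count_a = sum(1 for r in records if a(r))
    count_a_and_b = sum(1 for r in records if a(r) and b(r))
    return count_a_and_b / count_a if count_a else 0.0

class TestProbBGivenA(unittest.TestCase):
    def test_joint_count_over_marginal_count(self):
        records = [(1, 1), (1, 0), (0, 1)]
        p = prob_b_given_a(records, lambda r: r[0] == 1, lambda r: r[1] == 1)
        self.assertAlmostEqual(p, 0.5)

    def test_no_cases_where_a_occurs(self):
        # With no cases where A occurs, the function returns 0.0 rather than dividing by zero
        self.assertEqual(prob_b_given_a([], lambda r: True, lambda r: True), 0.0)
```

Such tests can be run with python -m unittest against the module containing the function.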
Quality Assurance
Software Quality Assurance involves the entire software development process-
monitoring and improving the process, making sure that any agreed-upon standards and
procedures are followed, and ensuring that problems are found and dealt with. It is
oriented to ‘prevention’.
Software Life Cycle
The life cycle begins when an application is first conceived and ends when it is no
longer in use. It includes aspects such as initial concept, requirements analysis, functional
design, internal design, documentation planning, test planning, coding, document
preparation, integration, testing, maintenance, updates, retesting, phase-out, and other
aspects.
Verification and Validation
Verification refers to the set of activities that ensure that software correctly
implements a specific function. Validation refers to a different set of activities that
ensure that the software that has been built is traceable to customer requirements.
Verification and validation encompass a wide array of SQA activities that
include formal technical reviews, quality and configuration audits, performance
monitoring, simulation, feasibility study, documentation review, database review,
algorithm analysis, development testing, qualification testing and installation testing.
SYSTEM IMPLEMENTATION
System implementation is the stage in the project where the theoretical design is
turned into a working system. It is the most crucial stage for gaining user confidence
that the new system will work effectively and efficiently.
The performance and reliability of the system were tested and the system gained
acceptance; it was implemented successfully. Implementation is the process of
converting a new system into operation.
Proper implementation is essential to provide a reliable system that meets the
organization’s requirements. During the implementation stage a live demonstration was
undertaken in front of the end-users.
Implementation is the stage of the project when the system design is turned into a
working system. This stage consists of the following steps.
Testing the developed program with sample data.
Detection and correction of internal errors.
Testing the system to meet the user requirements.
Feeding real-time data and retesting.
Making necessary changes as suggested by the user.
Conclusion
A prototype heart disease prediction system is developed using three data mining
classification modeling techniques. The system extracts hidden knowledge from a
historical heart disease database. DMX query language and functions are used to build
and access the models. The models are trained and validated against a test dataset. Lift
Chart and Classification Matrix methods are used to evaluate the effectiveness of the
models. All three models are able to extract patterns in response to the predictable state.
The most effective model to predict patients with heart disease appears to be Naïve Bayes
followed by Neural Network and Decision Trees. Five mining goals are defined based on
business intelligence and data exploration. The goals are evaluated against the trained
models. All three models could answer complex queries, each with its own strength with
respect to ease of model interpretation, access to detailed information and accuracy.
Naïve Bayes could answer four out of the five goals; Decision Trees, three; and Neural
Network, two. Although not the most effective model, Decision Trees results are easier to
read and interpret. The drill through feature to access detailed patients’ profiles is only
available in Decision Trees. Naïve Bayes fared better than Decision Trees as it could
identify all the significant medical predictors. The relationship between attributes
produced by Neural Network is more difficult to understand. IHDPS can be further
enhanced and expanded. For example, it can incorporate other medical attributes besides
the 15 listed in Figure 1. It can also incorporate other data mining techniques, e.g., Time
Series, Clustering and Association Rules. Continuous data can also be used instead of
just categorical data. Another area is to use Text Mining to mine the vast amount of
unstructured data available in healthcare databases. Another challenge would be to
integrate data mining and text mining.
Future Enhancements
As we obtain a data set, we proceed with implementing the Naïve Bayes
algorithm.
The formula used is
◦ Prob(B given A) = Prob(A and B) / Prob(A)
To get input from the client side, we prepare a questionnaire (for the patient).
The modeling and standardized notations used allow complex ideas to be expressed in
a precise way through the Web.
BIBLIOGRAPHY
1. Simon Robinson, Christian Nagel, Karli Watson, “PROFESSIONAL C#”,
Wiley Dreamtech India Pvt. Ltd., third edition.
2. Andy Harris, “MICROSOFT C# PROGRAMMING”, Prentice Hall of India Pvt.
Ltd.
3. Roger S. Pressman, “SOFTWARE ENGINEERING”, Tata McGraw Hill
Publications, fifth edition.
4. Elias M. Awad, “SYSTEM ANALYSIS AND DESIGN”, Galgotia
Publications Private Limited, 1997 edition.
5. Herbert Schildt, “THE COMPLETE REFERENCE C# 2.0”, Tata McGraw Hill
Publications, second edition.
6. V. K. Jain, “THE COMPLETE GUIDE TO C# PROGRAMMING”,
Dreamtech Press.