BirdHit Report

Date post: 11-Oct-2015
Upload: divya-vanacharla
Description:
Implemented data mining techniques using R to improve the safety of airplane flights by examining aircraft safety in the context of wildlife strikes.

PROJECT REPORT

MIS 6324: Business Intelligence Software & Techniques

Topic: Data Mining from Bird Strikes Data
Presented by: Group 2 (Aditi Saluja, Divya Vanacharla, Ishan Dindorkar, Rohan Patil, Valay Raval)
Under the guidance of Prof. Kelly Slaughter

Final Project MIS 6324 Data Mining Techniques


Introduction

Our aim is to apply data mining techniques to a large dataset (around 20 attributes and 20,000 instances) and look for interesting patterns that can aid decision making.

For our study, we carry out the operations on the dataset "Bird Strike.xlsx". It purports to represent all the Bird Strikes reported from 2000 to 2011. The dataset is public and listed on the Tableau Software Community, and was extracted by the Federal Aviation Administration (FAA) (link). As the FAA puts it: "The FAA Wildlife Strike Database contains records of reported wildlife strikes since 1990. Strike reporting is voluntary. Therefore, this database only represents the information we have received from airlines, airports, pilots, and other sources."

By definition, a Bird Strike is a collision between an aircraft and an airborne animal (generally a bird). Our interest in this database was sparked by the fact that Bird Strikes are a major concern for the airline industry and for Air Traffic Control around the world. Some major casualties caused by Bird Strikes are listed below (source: Wikipedia):

- The Federal Aviation Administration (FAA) estimates the problem costs US aviation 400 million dollars annually and has resulted in over 200 worldwide deaths since 1988.
- NASA astronaut Theodore Freeman was killed when a goose shattered the Plexiglas cockpit canopy of his Northrop T-38 Talon, resulting in shards being ingested by the engines, leading to a fatal crash.
- In 1988, Ethiopian Airlines Flight 604 sucked pigeons into both engines during take-off and then crashed, killing 35 passengers.
- On September 22, 1995, a U.S. Air Force Boeing E-3 Sentry AWACS aircraft (callsign Yukla 27, serial number 77-0354) crashed shortly after takeoff from Elmendorf AFB. The aircraft lost power in both port side engines after these engines ingested several Canada Geese during takeoff. It crashed about two miles (3 km) from the runway, killing all 24 crew members on board.

Knowing the Data

Before carrying out any data mining activity, it is very important to know the data and understand exactly what it purports to represent. Each instance of our dataset represents a reported Bird Strike and all the details related to it. The table below explains what each attribute represents about the Bird Strike instance:

Bird Strikes

Attribute | Explanation | Type
----------|-------------|-----
Aircraft_Type | What kind of aircraft was involved in the Bird Strike | Nominal
Airport_Name | At which airport the strike was detected | Nominal
Altitude_bin | At what altitude the strike occurred (1000 ft) | Binomial
Aircraft_Model | Model of the aircraft struck | Nominal
Wildlife_Number_struck | Number of wildlife struck in the instance | Range
Effect_Impact_to_flight | Impact of the strike on the flight, if any | Nominal
Record_ID | Unique record ID for each incident | Numerical
Effect_Indicated_Damage | Damage caused to the aircraft | Binomial
Aircraft_Number_of_engines. | Number of engines in the aircraft struck | Numerical
Aircraft_Airline.Operator | Airline operator of the aircraft struck | Nominal
Origin_State | US state in which the strike occurred | Nominal
When_Phase_of_flight | Phase of flight during which the strike occurred | Nominal
Conditions_Precipitation | Precipitation condition during the strike | Nominal
Remains_of_wildlife_collected | Whether the wildlife remains were collected | Binomial
Wildlife_Size | Size of the wildlife struck (Small, Medium, Large) | Range
Conditions_Sky | Condition of the sky during the strike | Nominal
Wildlife_Species | Species of the wildlife struck | Nominal
Pilot_warned | Whether the pilot was warned of the possible strike | Binomial
Feet_above_ground | Height of the aircraft above the ground, in feet | Numerical
Speed (in Knots) | Speed of the aircraft during the strike | Numerical

By graphing the attribute values of the database, we can answer some basic questions from the dataset:

a) Frequency of wildlife species where damage was caused (most common)

b) Frequency of strikes for each state

c) Plotting of Species & States for each strike

The graphs give us the following results about the dataset:

1) The species that strikes airplanes most frequently is the Canada Goose (after the Unknown category)

2) Florida has the highest frequency of bird strikes, followed by Colorado and Texas

3) Even though the Canada Goose has the highest frequency of Bird Strikes overall (distributed across all the states), when we plot the State vs. Species graph we see that Turkey Vulture strikes are highly prominent in Florida.
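The counts behind graphs like these can be reproduced with frequency tables. The following is a minimal sketch in base R; the data frame `toy` and its six rows are invented for illustration and are not taken from the Bird Strikes dataset. Only the column names follow the attribute table above.

```r
# Hypothetical toy data; the real analysis would use the Bird_Strikes data frame.
toy <- data.frame(
  Origin_State = c("Florida", "Florida", "Texas", "Colorado", "Florida", "Texas"),
  Wildlife_Species = c("Turkey vulture", "Canada goose", "Unknown bird",
                       "Canada goose", "Turkey vulture", "Canada goose")
)

# a) Strikes per species, most frequent first
species_freq <- sort(table(toy$Wildlife_Species), decreasing = TRUE)

# b) Strikes per state, most frequent first
state_freq <- sort(table(toy$Origin_State), decreasing = TRUE)

# c) Cross-tabulation of state against species
state_by_species <- table(toy$Origin_State, toy$Wildlife_Species)

print(species_freq)
print(state_freq)
print(state_by_species)
```

Passing `species_freq` or `state_freq` to `barplot()` would give a bar chart of the same counts, and the cross-tabulation shows which species dominates within each state.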

Data Cleaning

The next important step, after knowing the data and before carrying out data mining activities, is to "clean" the data. As the name suggests, cleaning the data means getting rid of instances that are not required. It helps improve data quality and reduces the negative impact of errors. In our database, we clean the dataset by:

a) Removing instances having NULL/blank values

By manual scanning, we can see that a number of instances have blank values. We address this by first replacing the blank values with "NA" and then removing all the instances that have one or more "NA" values.

Note: We do not replace the NULL values with mean values, because the attributes having NULL values are all Nominal.

R commands:
> Birds_Strikes
> Bird_Strikes[Bird_Strikes == ""]
> Bird_Strikes$Origin_State[Bird_Strikes$Origin_State == "N/A"]
> Bird_Strikes
> Bird_Strikes$Feet_above_ground
> Bird_Strikes$Record_ID
> Bird_Strikes$Speed
> Bird_strikes_trans
> BirdtypeRules
> BirdRules_caused 1.0))
> inspect(sort(BirdRules_Damage, by = "confidence")[1:10])

   lhs => rhs   support   confidence   lift
1  {Aircraft_Type=Airplane, Effect_Impact_to_flight=Precautionary Landing, Wildlife_Size=Large} => {Effect_Indicated_Damage=Caused damage}   0.01099387   0.8265683   7.218743
2  {Aircraft_Type=Airplane, Effect_Impact_to_flight=Precautionary Landing, Conditions_Precipitation=None, Wildlife_Size=Large} => {Effect_Indicated_Damage=Caused damage}   0.01035583   0.8210117   7.170216
3  {Effect_Impact_to_flight=Precautionary Landing, Wildlife_Size=Large} => {Effect_Indicated_Damage=Caused damage}   0.01207362   0.8200000   7.161380
4  {Effect_Impact_to_flight=Precautionary Landing, Conditions_Precipitation=None, Wildlife_Size=Large} => {Effect_Indicated_Damage=Caused damage}   0.01138650   0.8140351   7.109286
5  {Aircraft_Airline.Operator=BUSINESS, Remains_of_wildlife_collected=FALSE, Wildlife_Size=Large, Pilot_warned=N} => {Effect_Indicated_Damage=Caused damage}   0.01006135   0.7620818   6.655558
6  {Aircraft_Type=Airplane, Aircraft_Airline.Operator=BUSINESS, Conditions_Precipitation=None, Wildlife_Size=Large, Pilot_warned=N} => {Effect_Indicated_Damage=Caused damage}   0.01060123   0.7578947   6.618991
7  {Aircraft_Type=Airplane, Aircraft_Airline.Operator=BUSINESS, Wildlife_Size=Large, Pilot_warned=N} => {Effect_Indicated_Damage=Caused damage}   0.01168098   0.7555556   6.598562
8  {Aircraft_Airline.Operator=BUSINESS, Conditions_Precipitation=None, Wildlife_Size=Large, Pilot_warned=N} => {Effect_Indicated_Damage=Caused damage}   0.01128834   0.7516340   6.564313
9  {Aircraft_Airline.Operator=BUSINESS, Wildlife_Size=Large, Pilot_warned=N} => {Effect_Indicated_Damage=Caused damage}   0.01241718   0.7507418   6.556522
10 {Aircraft_Type=Airplane, Aircraft_Number_of_engines.=1, Wildlife_Size=Large} => {Effect_Indicated_Damage=Caused damage}   0.01011043   0.7304965   6.379711

From some of the interesting rules above, we can conclude that damage to the aircraft during a Bird Strike is most often associated with:

a) Wildlife_Size = Large
b) Pilot_warned = N
c) Effect_Impact_to_flight = Precautionary Landing

These results can help airline operators and ATC take precautionary measures. For example, we can see that damage is caused most often when pilots are not warned by ATC about the possibility of a Bird Strike.

To see which species cause damage most often, we run another small Apriori algorithm on Wildlife_Species and Effect_Indicated_Damage:

> Bird_Type
> Bird_Type_trans
> BirdtypeRules
> BirdRules_caused 1.0)
> inspect(BirdRules_caused)

   lhs => rhs   support   confidence   lift
1  {Bird_Strikes.Wildlife_Species=Turkey vulture} => {Bird_Strikes.Effect_Indicated_Damage=Caused damage}   0.005531591   0.6158192   5.466089
2  {Bird_Strikes.Wildlife_Species=Canada goose} => {Bird_Strikes.Effect_Indicated_Damage=Caused damage}   0.009489977   0.6032258   5.354308

The above results show that a Bird Strike is most likely to cause damage when Wildlife_Species = Turkey vulture or Canada goose: about 62% and 60% of the strikes involving these species, respectively, caused damage. ATC could run special operations to relocate these species from the areas surrounding the airport.
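Support, confidence, and lift, the three measures reported for each rule, can be checked by hand for a single rule. The sketch below does this in base R for the rule {Wildlife_Size=Large} => {Damage}; the ten-row data frame `toy` is made up for illustration, and this is a manual calculation of the measures, not the Apriori run used in the report.

```r
# Hypothetical toy data: 10 strikes, with wildlife size and whether damage occurred.
toy <- data.frame(
  Wildlife_Size = c("Large", "Large", "Large", "Small", "Small",
                    "Small", "Medium", "Medium", "Large", "Small"),
  Damage = c(TRUE, TRUE, FALSE, FALSE, FALSE,
             FALSE, FALSE, TRUE, TRUE, FALSE)
)

lhs <- toy$Wildlife_Size == "Large"  # left-hand side of the rule
rhs <- toy$Damage                    # right-hand side of the rule

# support: share of all instances where both sides hold
support <- mean(lhs & rhs)

# confidence: share of LHS instances where the RHS also holds
confidence <- sum(lhs & rhs) / sum(lhs)

# lift: confidence relative to the overall frequency of the RHS
lift <- confidence / mean(rhs)

# The same relation applied to the first reported rule: confidence / lift
# gives the overall share of strikes that caused damage in the mined data.
implied_base_rate <- 0.8265683 / 7.218743

print(c(support = support, confidence = confidence, lift = lift))
print(implied_base_rate)
```

A lift well above 1 means the left-hand side makes damage much more likely than it is overall. Dividing the reported confidence by the reported lift suggests that roughly 11% of the strikes in the mined data caused damage, which is why rules with a confidence near 0.8 reach a lift of about 7.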

Logistic Regression

To predict whether a Bird Strike will cause damage or not, we use Logistic Regression on the dataset. Logistic Regression is generally used when the variable being predicted is binary, such as True or False, or Yes or No.

In our case, the dependent variable is "Effect_Indicated_Damage" and the independent variables are the rest of the attributes.
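A minimal sketch of such a model, using base R's `glm` with `family = binomial`, is shown here. The data are simulated, and the variable names `damage`, `large`, and `warned` and the coefficients used to generate them are assumptions made for illustration; they are not the report's actual model or results.

```r
set.seed(1)

# Simulated toy data: 1 = large wildlife / pilot warned / damage caused.
n <- 500
large  <- rbinom(n, 1, 0.3)
warned <- rbinom(n, 1, 0.5)

# Assumed true relationship, used only to generate the toy outcome.
damage <- rbinom(n, 1, plogis(-2 + 2.5 * large - 0.5 * warned))
toy <- data.frame(damage, large, warned)

# Logistic regression: binary dependent variable, other columns as predictors.
fit <- glm(damage ~ large + warned, data = toy, family = binomial)

# Predicted probability of damage for each instance, then a 0.5 cut-off.
prob <- predict(fit, type = "response")
pred <- as.integer(prob > 0.5)

# Share of instances where the predicted class matches the recorded one.
accuracy <- mean(pred == toy$damage)

print(coef(fit))
print(accuracy)
```

Each coefficient is the change in the log-odds of damage for a one-unit change in that predictor, so a positive coefficient on `large` means large wildlife raises the estimated probability of damage.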

6. Classification: Decision Tree

Through this technique, we tried to predict the value (Caused damage, No damage) of the dependent variable, Effect_Indicated_Damage. We divided our dataset into two parts: training data and test data. Using the training data, we built a predictor model by executing the tree function. We then predicted the values of Effect_Indicated_Damage in the test dataset by applying the predictor model with the predict command. This exercise helped us classify strike cases by whether damage occurred, and gauge how well the predictor model performs when applied to an unknown dataset.

R commands:
> install.packages("rpart")
> install.packages("tree")
> library(rpart)
> library(tree)
> Bird_Training_DataSet$Aircraft_Type
> Bird_Training_DataSet$Altitude_bin
> Bird_Training_DataSet$Aircraft_Model
> Bird_Training_DataSet$Wildlife_Number_struck
> Bird_Training_DataSet$Effect_Impact_to_flight
> Bird_Training_DataSet$Effect_Indicated_Damage
> Bird_Training_DataSet$Aircraft_Number_of_engines.
> Bird_Training_DataSet$Origin_State
> Bird_Training_DataSet$When_Phase_of_flight
> Bird_Training_DataSet$Conditions_Precipitation
> Bird_Training_DataSet$Wildlife_Size
> Bird_Training_DataSet$Conditions
> Bird_Training_DataSet$Wildlife_Species
> Bird_Training_DataSet$Pilot_warned
> Bird_Training_DataSet$Feet_above_ground
> Bird_Training_DataSet$Speed
> Bird_DTModel
> summary(Bird_DTModel)
Record_ID - Origin_State - Conditions_Precipitation - Aircraft_Type - X.1 - Conditions - Speed - X, data = Bird_Training_DataSet)
Variables actually used in tree construction:
[1] "Wildlife_Size" "Effect_Impact_to_flight" "Wildlife_Species"
Number of terminal nodes: 6
Residual mean deviance: 0.08454 = 844.2 / 9985
Distribution of residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.96440  0.03564  0.03564  0.00000  0.14980  0.80840

After creating the model with the training data, we apply it to the test data in order to determine the efficiency of the model:

> Bird_Test_DataSet
> Bird_Test_DataSet$Aircraft_Type
> Bird_Test_DataSet$Altitude_bin
> Bird_Test_DataSet$Aircraft_Model
> Bird_Test_DataSet$Wildlife_Number_struck
> Bird_Test_DataSet$Effect_Impact_to_flight
> Bird_Test_DataSet$Effect_Indicated_Damage
> Bird_Test_DataSet$Aircraft_Number_of_engines.
> Bird_Test_DataSet$Origin_State
> Bird_Test_DataSet$When_Phase_of_flight
> Bird_Test_DataSet$Conditions_Precipitation
> Bird_Test_DataSet$Wildlife_Size
> Bird_Test_DataSet$Conditions
> Bird_Test_DataSet$Wildlife_Species
> Bird_Test_DataSet$Pilot_warned
> Bird_Test_DataSet$Feet_above_ground
> Bird_Test_DataSet$Speed
> Bird_PredModel
> summary(Bird_PredModel)
> Bird_PredModel
> Damage
> Results
> mean(Result == Damage)
[1] 0.7425304

That is, the predictions matched the recorded outcome for about 74% of the test instances.

Explanation for converting columns into numeric types

On executing the tree command without converting the columns to numeric type, we got the following error:
> Bird_DTModel_Tree
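The train/test procedure described in this section can be sketched end to end as follows. This is an illustration under stated assumptions rather than the report's own commands: it uses `rpart` (one of the two packages loaded above) instead of the `tree` function, the data are simulated, and the 70/30 split and the two predictors are choices made only for the example.

```r
library(rpart)

set.seed(42)

# Simulated toy data; column names follow the attribute table, values are invented.
n <- 400
size   <- factor(sample(c("Small", "Medium", "Large"), n, replace = TRUE))
warned <- factor(sample(c("Y", "N"), n, replace = TRUE))
damage <- factor(ifelse(runif(n) < ifelse(size == "Large", 0.8, 0.1),
                        "Caused damage", "No damage"))
toy <- data.frame(Wildlife_Size = size, Pilot_warned = warned,
                  Effect_Indicated_Damage = damage)

# Split into training (70%) and test (30%) data.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Fit a classification tree on the training data only.
fit <- rpart(Effect_Indicated_Damage ~ Wildlife_Size + Pilot_warned,
             data = train, method = "class")

# Predict the class of each test instance and compare with the recorded outcome.
pred <- predict(fit, newdata = test, type = "class")
accuracy <- mean(pred == test$Effect_Indicated_Damage)
conf_mat <- table(Predicted = pred, Actual = test$Effect_Indicated_Damage)

print(conf_mat)
print(accuracy)
```

The `accuracy` value plays the same role as `mean(Result == Damage)` in the report. The confusion matrix adds detail that a single accuracy figure hides, such as how many damaging strikes were predicted as "No damage".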

