Date post: | 06-Aug-2015 |
Category: | Technology |
Upload: | amanda-gilbert |
Created a Python MapReduce job to format the data for ingestion
    Python dictionary to deal with paired data
    Logic to ignore lines with data issues
Executed a Hadoop streaming job to ingest the data
Loaded the data into tables via Hive
Ingested FAA’s Aircraft Registry data
Re-ingested data by site
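
The formatting step above might look roughly like the sketch below. The slides do not show the raw input layout, so the sketch assumes a hypothetical one: comma-separated lines of the form `hexid,A,speed,alt` and `hexid,B,lat,lon`, where an "A" line and a "B" line with the same hexid together make one record. The function name `format_records` and the "A"/"B" tags are illustrative only, not the original job.

```python
import sys


def format_records(lines):
    """Pair 'A' and 'B' lines by hexid (assumed layout, see lead-in).

    Returns (records, rejected_count). Each record is a tuple
    (hexid, speed, alt, lat, lon).
    """
    pending = {}   # hexid -> (kind, values) still waiting for its partner
    records = []
    rejected = 0
    for line in lines:
        parts = line.strip().split(",")
        # Ignore lines with data issues: wrong field count, unknown tag,
        # or an empty hexid.
        if len(parts) != 4 or parts[1] not in ("A", "B") or not parts[0]:
            rejected += 1
            continue
        hexid, kind = parts[0], parts[1]
        try:
            values = (float(parts[2]), float(parts[3]))
        except ValueError:
            rejected += 1
            continue
        other = pending.get(hexid)
        if other is None or other[0] == kind:
            # No partner yet: remember the latest unpaired half.
            pending[hexid] = (kind, values)
            continue
        del pending[hexid]
        a, b = (values, other[1]) if kind == "A" else (other[1], values)
        records.append((hexid,) + a + b)
    return records, rejected


def main():
    # Hadoop streaming feeds input on stdin and reads tab-separated stdout.
    records, rejected = format_records(sys.stdin)
    for rec in records:
        print("\t".join(str(x) for x in rec))
    sys.stderr.write("rejected lines: %d\n" % rejected)
```

Used as a streaming mapper, the script would call `main()` at the bottom; the rejected count written to stderr is one way to arrive at an acceptance figure for the run.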
METHODOLOGY
Most frequently reporting aircraft
Percentage of records accepted and ingested: 99.80% (1924 lines rejected)
INITIAL EXPLORATION
                       Site 1      Site 2
Number of Sightings    563715      449904
Average Speed          342.02      395.37
Average Alt            15919.28    20295.33
SITE COMPARISON
Sampled latitude and longitude data from both sites
Found average latitude and longitude for each
    Site one: 42.22, -70.85
    Site two: 42.12, -71.49
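
A minimal sketch of the averaging step, assuming the sampled data is available as `(site, lat, lon)` tuples. The function name and the input shape are hypothetical. A plain arithmetic mean is a reasonable approximation only because each site's sightings cover a small region that does not cross the antimeridian.

```python
from collections import defaultdict


def average_position(samples):
    """Return {site: (mean_lat, mean_lon)} for (site, lat, lon) samples."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat sum, lon sum, count
    for site, lat, lon in samples:
        acc = sums[site]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {site: (s[0] / s[2], s[1] / s[2]) for site, s in sums.items()}
```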
SITE LATITUDE LONGITUDE DATA
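-- Maximum speed and altitude per aircraft model
CREATE TABLE model_summary AS SELECT mdl_code, make, model, max(speed) AS max_speed, max(alt) AS max_alt FROM master_data GROUP BY mdl_code, make, model;
-- Maximum speed and altitude per individual aircraft
CREATE TABLE aircraft_summary AS SELECT ident, make, model, max(speed) AS max_speed, max(alt) AS max_alt FROM master_data GROUP BY ident, make, model;
-- Number of distinct aircraft (hex IDs) per owner
CREATE TABLE owner_summary AS SELECT owner_name, count(DISTINCT hexid) AS count_hex FROM master_data GROUP BY owner_name;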
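
To try the shape of these summaries without a Hive cluster, two of the queries can be run against a toy table in an in-memory SQLite database. This is only an illustration: SQLite is not Hive, the column types are guesses, the `max_speed` and `max_alt` aliases are added here for readability, and `build_summaries` is a hypothetical helper.

```python
import sqlite3


def build_summaries(rows):
    """Load toy master_data rows and build two summary tables.

    rows: tuples of (hexid, ident, make, model, owner_name, speed, alt).
    Returns the open in-memory connection.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE master_data (hexid TEXT, ident TEXT, make TEXT, "
        "model TEXT, owner_name TEXT, speed REAL, alt REAL)"
    )
    conn.executemany(
        "INSERT INTO master_data VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    # Per-aircraft maxima, grouped as in aircraft_summary.
    conn.execute(
        "CREATE TABLE aircraft_summary AS "
        "SELECT ident, make, model, "
        "MAX(speed) AS max_speed, MAX(alt) AS max_alt "
        "FROM master_data GROUP BY ident, make, model"
    )
    # Distinct aircraft per owner, as in owner_summary.
    conn.execute(
        "CREATE TABLE owner_summary AS "
        "SELECT owner_name, COUNT(DISTINCT hexid) AS count_hex "
        "FROM master_data GROUP BY owner_name"
    )
    return conn
```

Querying `owner_summary` ordered by `count_hex` descending would then list the owners with the most distinct aircraft seen.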
MASTER DATA QUERIES