A-ONE consultants

Post on 17-Feb-2017

A-ONE Consultants

Jessica Morris & Partap Singh

Current Assignment

DATA ANALYSIS

Data Mining Goals

Analyze QVC airtime and sales history to determine the best times to sell certain products on air

Determine which states make the most purchases in order to better target QVC's sales geographically

Determine which brands and products sell the best

Data Provided

Clean the Data

To get the data into a format that could be loaded into HDFS, it first needed to be cleaned

We used a combination of Excel and PowerShell to do this

Quotes needed to be removed and dates needed to be reformatted from MM/DD/YYYY to YYYY-MM-DD
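
The two cleaning steps above can be sketched in a few lines of Python. This is only an illustration, not the Excel and PowerShell process we actually used; the function name clean_row and the sample values are hypothetical.

```python
from datetime import datetime


def clean_row(row):
    """Strip quote characters and convert MM/DD/YYYY dates to YYYY-MM-DD.

    Hypothetical helper: assumes each row is a list of string fields.
    """
    cleaned = []
    for value in row:
        # Remove stray double quotes and surrounding whitespace.
        value = value.replace('"', "").strip()
        try:
            # Reformat only the values that parse as MM/DD/YYYY dates.
            value = datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")
        except ValueError:
            pass  # Not a date in that format; keep the value unchanged.
        cleaned.append(value)
    return cleaned


# Made-up sample row, for illustration only.
sample = clean_row(['"Widget"', "02/17/2017", "19.99"])
```

Fields that do not parse as MM/DD/YYYY dates are left unchanged, so the same helper can be applied to every column.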

Process the Data

We used a mixture of the Hadoop tools Hive and Impala

We ran a combination of queries on the tables including joins and distinct queries to get an idea of the data we were working with
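
As a rough illustration of the kind of join and distinct queries described above, the sketch below runs comparable SQL against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive/Impala here, and the table names, column names, and rows are all made up for the example; they are not the QVC schema or data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive/Impala.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and made-up rows, for illustration only.
conn.executescript("""
CREATE TABLE products (product_id INTEGER, brand TEXT);
CREATE TABLE orders (order_id INTEGER, product_id INTEGER,
                     state TEXT, order_date TEXT);
INSERT INTO products VALUES (1, 'BrandA'), (2, 'BrandB');
INSERT INTO orders VALUES
  (100, 1, 'PA', '2015-11-27'),
  (101, 1, 'NY', '2015-11-27'),
  (102, 2, 'PA', '2015-11-28');
""")

# Distinct query: which states appear in the orders table?
distinct_states = [
    row[0]
    for row in conn.execute("SELECT DISTINCT state FROM orders ORDER BY state")
]

# Join query: how many orders did each brand receive?
orders_per_brand = conn.execute("""
    SELECT p.brand, COUNT(*) AS order_count
    FROM orders o
    JOIN products p ON o.product_id = p.product_id
    GROUP BY p.brand
    ORDER BY order_count DESC, p.brand
""").fetchall()
```

The SELECT DISTINCT and JOIN ... GROUP BY forms shown here are standard SQL, so queries of this shape should carry over to Hive or Impala with little change, although dialect details can differ.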

These queries generated the Excel files that we further analyzed in Tableau

In a real-world situation, one would not limit oneself to a single tool

Example of Hive/Impala Queries

Example of Generated Data

Airtime for each product generated by Impala

Example of Generated Data (cont.)

Hive was used to generate the Excel file here. The chart was created in Excel and shows the top 25 sales dates...

Example of Generated Data (cont.)

...as well as how many orders contained which products on these dates
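
A ranking like the top 25 sales dates can be sketched in Python as follows. This assumes that "top" means the dates with the most orders, which may not match the measure used in the chart; the function name top_sales_dates and the sample dates are hypothetical.

```python
from collections import Counter


def top_sales_dates(order_dates, n=25):
    """Return the n dates with the most orders as (date, count) pairs.

    Assumes one list entry per order; ties are broken by date order.
    """
    counts = Counter(order_dates)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Made-up order dates, for illustration only.
example = top_sales_dates(["2015-11-27", "2015-11-27", "2015-11-28"], n=2)
```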

Visualization of Data in Tableau

Visualization - Tableau

Visualization - Tableau (map)

Thank You!