Hannah Aizenman - Get To Know Your Data

Date post: 08-Jul-2015
Upload: pydata
Description:
A recent article in the New York Times estimates that data scientists spend somewhere between 50% and 80% of their time "collecting and preparing unruly digital data" before they ever get to the analysis. Data is often badly labeled, inconsistently sampled, incorrect in strange places, missing, or otherwise riddled with errors, leading to the "garbage in, garbage out" problem. While detecting the myriad ways in which data can be broken is sometimes difficult, traditional visualization techniques, exploratory data analysis, and cluster analysis can help. This talk will discuss some typical methods for sanity checking small data sets: visualization, simple statistics, and some basic combinations of the two. It will then veer into machine learning techniques for exploring the underlying structure of larger data sets, both to verify that known patterns occur and to detect outliers that could be due to errors rather than to something interesting.
Transcript
Page 1: Hannah Aizenman - Get To Know Your Data

Get To Know Your Data

Hannah Aizenman @story645

Unprocessed Data

Missing Observations
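
The slide gives only its title. One simple first check is to count, per column, how many observations are absent. The sketch below is not the speaker's code; it assumes records are plain Python dicts and treats `None`, a missing key, or a float NaN as missing. The column names and values are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing observations."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_missing(rows, columns):
    """Return {column: number of rows where that column is absent or missing}."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            # row.get returns None for an absent key, so both cases are counted.
            if is_missing(row.get(col)):
                counts[col] += 1
    return counts


# Hypothetical weather-station records.
rows = [
    {"station": "A", "temp": 21.5, "humidity": 0.40},
    {"station": "B", "temp": float("nan"), "humidity": 0.55},
    {"station": "C", "temp": 19.0},  # humidity never recorded
    {"station": "D", "temp": None, "humidity": 0.61},
]
print(count_missing(rows, ["station", "temp", "humidity"]))
# {'station': 0, 'temp': 2, 'humidity': 1}
```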

Misused Technique

Start?

Research

Explore Attributes

Take Snapshots
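
One reading of this slide is to look at a few raw rows before plotting anything: the first few (which may be unrepresentative if the file is sorted) and a reproducible random sample. A minimal sketch assuming the rows sit in a Python list; `snapshot`, its parameters, and the data are illustrative, not from the talk.

```python
import random


def snapshot(rows, n=5, seed=0):
    """Return the first n rows and a reproducible random sample of n rows."""
    head = rows[:n]
    rng = random.Random(seed)  # fixed seed so the same sample comes back on rerun
    sample = rng.sample(rows, min(n, len(rows)))
    return head, sample


# Hypothetical data: 100 rows sorted by id.
rows = [{"id": i, "reading": (i * 37) % 101} for i in range(100)]
head, sample = snapshot(rows, n=3)
for row in head + sample:
    print(row)
```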

Plot

Label

Rearrange

Higher D Data: Plot 1 Dim

Plot Another Dim (or 2)

Fix that Plot

Histogram
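
A histogram bins values and counts each bin, so isolated values show up as lonely bars. The talk presumably used a plotting library. This standard-library sketch only computes equal-width bin counts and prints crude text bars. The data and the choice of four bins are hypothetical.

```python
def histogram(values, bins=5):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if every value is identical
    counts = [0] * bins
    for v in values:
        # The maximum lands exactly on the right edge; clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Hypothetical readings with one value far from the rest.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
counts, edges = histogram(values, bins=4)
for count, left in zip(counts, edges):
    print(f"{left:6.2f} | {'#' * count}")
# counts == [9, 0, 0, 1]: the lone value in the last bin deserves a closer look
```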

Min, Max, Mean, Median
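
These four numbers can be computed with Python's standard `statistics` module. The sketch below is illustrative rather than the speaker's code. It uses hypothetical temperatures containing a made-up `-999.0` "no reading" code to show how a sentinel value exposes itself: an impossible minimum, and a mean dragged far below the median.

```python
import statistics


def summarize(values):
    """Min, max, mean and median: four quick numbers for spotting gross errors."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical temperatures; -999.0 stands in for a "no reading" sentinel code.
temps = [21.5, 22.0, 20.8, 21.9, -999.0, 22.3]
summary = summarize(temps)
print(summary)
# The min of -999.0 and a negative mean next to a median near 21.7 point at the sentinel.
```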

Too Much Data

Multivariate Relationships
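
Scatter plots are the visual route to pairwise relationships. A numeric companion is the Pearson correlation for every pair of columns, which flags pairs that are unexpectedly strong, weak, or of the wrong sign. This is a standard-library sketch, not the talk's code; the column names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / (sx * sy)


def correlation_pairs(columns):
    """Correlation for every pair of named columns."""
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(columns, 2)}


# Hypothetical columns.
columns = {
    "depth_m": [1, 2, 3, 4, 5],
    "pressure": [2.1, 4.0, 6.2, 7.9, 10.1],
    "temp_c": [15.0, 14.1, 13.2, 12.0, 11.1],
}
for pair, r in correlation_pairs(columns).items():
    print(pair, round(r, 3))
```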

Multivariate Relationships With Classes
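
When rows carry class labels, summarizing each feature separately per class can reveal mislabeled rows: one class's range swallows another's. A minimal sketch, assuming dict rows with a label column; the `kind`, `length`, and `width` names and values are hypothetical.

```python
from collections import defaultdict


def per_class_ranges(rows, label_key, feature_keys):
    """(min, max) of each feature, computed separately for every class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    return {
        label: {f: (min(r[f] for r in members), max(r[f] for r in members)) for f in feature_keys}
        for label, members in groups.items()
    }


# Hypothetical labelled measurements.
rows = [
    {"kind": "small", "length": 1.0, "width": 0.5},
    {"kind": "small", "length": 1.2, "width": 0.7},
    {"kind": "large", "length": 3.0, "width": 1.5},
    {"kind": "large", "length": 3.4, "width": 1.9},
    {"kind": "small", "length": 3.3, "width": 1.8},  # labelled small, measures like large
]
ranges = per_class_ranges(rows, "kind", ["length", "width"])
for label, feature_ranges in ranges.items():
    print(label, feature_ranges)
# The "small" length range (1.0, 3.3) reaches into "large" territory, hinting at a bad label.
```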

Known Patterns

Expected Values
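
Domain knowledge often supplies hard bounds that no valid value can cross, such as a relative humidity fraction outside 0 to 1. A sketch of checking every row against such bounds follows. The `EXPECTED_RANGES` table, its limits, and the rows are hypothetical examples, not values from the talk.

```python
# Hypothetical expectations: plausible bounds per column.
EXPECTED_RANGES = {"temp_c": (-60.0, 60.0), "humidity": (0.0, 1.0)}


def out_of_range(rows, ranges):
    """List (row index, column, value) for every present value outside its bounds."""
    problems = []
    for i, row in enumerate(rows):
        for col, (lo, hi) in ranges.items():
            value = row.get(col)
            # Missing values are a separate check; only compare values that exist.
            if value is not None and not (lo <= value <= hi):
                problems.append((i, col, value))
    return problems


rows = [
    {"temp_c": 21.0, "humidity": 0.4},
    {"temp_c": -999.0, "humidity": 0.5},  # sentinel code, not a temperature
    {"temp_c": 18.0, "humidity": 45.0},   # percent where a fraction was expected
]
print(out_of_range(rows, EXPECTED_RANGES))
# [(1, 'temp_c', -999.0), (2, 'humidity', 45.0)]
```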

Look For Structure
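
The description mentions cluster analysis for exploring structure and spotting outliers. As one minimal illustration (not necessarily the technique used in the talk), the sketch below runs plain Lloyd's k-means with a deterministic initialization, then flags points unusually far from their own centroid. The cutoff of 3 times the median distance is a hypothetical heuristic, and the points are made up. In practice a library implementation such as scikit-learn's `KMeans` would usually replace the hand-rolled loop.

```python
import math
import statistics


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points, k, iters=50):
    """Lloyd's algorithm; initial centroids are spread across the input order."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, assign(points, centroids)


def flag_outliers(points, centroids, labels, factor=3.0):
    """Points whose distance to their centroid exceeds factor x the median distance."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = factor * statistics.median(dists)
    return [p for p, d in zip(points, dists) if d > cutoff]


# Hypothetical 2-D data: two tight groups plus one stray point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),
    (3.0, 9.0),
]
centroids, labels = kmeans(points, k=2)
outliers = flag_outliers(points, centroids, labels)
print(outliers)
# [(3.0, 9.0)]
```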

Incorporate Outside Knowledge

Weave it All Together
