DE Presentation v2

Post on 12-Apr-2017

SceneFindr

Stephanie Stark

Motivation

● Interested in hearing live music, but don’t know where to go?

Pipeline

Data Sources

Pipeline

ETL

Artists

Events

Feature Extraction

K-Means Clustering

Recommendations

Database
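
The feature extraction, clustering, and recommendation stages listed above can be sketched roughly as follows. This is a minimal single-machine illustration, not the presenter's implementation: the slides do not show code, and the feature vectors, artist names, event records, and the function names `kmeans` and `recommend_events` are all hypothetical. It assumes each artist has already been reduced to a numeric feature vector and that an event is recommended when its artist falls in the same cluster as an artist the user likes.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return centroids, assignments


def recommend_events(liked_artist, artist_features, events, k=2):
    """Return events whose artist shares a cluster with liked_artist."""
    names = sorted(artist_features)
    vectors = [artist_features[name] for name in names]
    _, assignments = kmeans(vectors, k)
    cluster_of = dict(zip(names, assignments))
    target = cluster_of[liked_artist]
    return [e for e in events if cluster_of.get(e["artist"]) == target]


# Hypothetical data: two-dimensional artist feature vectors and events.
artist_features = {
    "Artist A": [0.9, 0.1],
    "Artist B": [0.8, 0.2],
    "Artist C": [0.1, 0.9],
    "Artist D": [0.2, 0.8],
}
events = [
    {"artist": "Artist B", "venue": "Venue 1"},
    {"artist": "Artist C", "venue": "Venue 2"},
]

recommendations = recommend_events("Artist A", artist_features, events)
```

With this toy data, a user who likes "Artist A" would be pointed at the "Artist B" event, since those two artists have nearby feature vectors. At the data sizes mentioned in the slides, a distributed k-means implementation would presumably replace the in-memory loop.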

Scaling

500 GB Artist Data

9 Hours

500 GB Event Data

Lessons Learned (the hard way!)

● Scala
● Parallelized ML algorithms

About Me

Stephanie Stark

Education

● B.A., Mount Holyoke College
● Major: Mathematics
● Minor: Computer Science

Interests

● Reading
● Art History
● Hiking

Future Work

● Implement TF/IDF compatibility for project
● Use PCA
● Implement cosine similarity for feature clustering
● Cluster within metro area
● Use Redis as a cache for feature vectors
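
Two of the items above, TF/IDF features and cosine similarity, can be illustrated with a small sketch. It is only an assumption about how they might fit together: the slides do not say what text would be weighted, so the per-artist tag lists, the artist names, and the function names `tf_idf` and `cosine_similarity` are hypothetical. Term frequency here is the term count divided by document length, and inverse document frequency is the natural log of (number of documents / documents containing the term).

```python
import math
from collections import Counter


def tf_idf(documents):
    """Map each document name to a sparse {term: tf-idf weight} dict.

    documents: dict of name -> list of tokens.
    """
    n = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))  # count each term once per document
    weights = {}
    for name, tokens in documents.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            term: (count / total) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical tag lists standing in for per-artist text.
artist_tags = {
    "Artist A": ["indie", "rock", "guitar"],
    "Artist B": ["indie", "rock", "synth"],
    "Artist C": ["jazz", "piano", "trio"],
}

weights = tf_idf(artist_tags)
sim_ab = cosine_similarity(weights["Artist A"], weights["Artist B"])
sim_ac = cosine_similarity(weights["Artist A"], weights["Artist C"])
```

In this toy data "Artist A" and "Artist B" share two tags, so `sim_ab` is positive, while "Artist A" and "Artist C" share none, so `sim_ac` is zero. Because cosine similarity ignores vector length, it compares artists with long and short descriptions on the same footing.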