HADOOP IN A RELATIONAL DATA WAREHOUSE Data and Analytics/Enterprise DW, Expedia June 2013 Arek Kaczmarek
Transcript
1. HADOOP IN A RELATIONAL DATA WAREHOUSE. Data and
Analytics/Enterprise DW, Expedia. June 2013. Arek Kaczmarek
2. Background. Expedia: site, competitors. DW: legacy EDW, DNA.
Hadoop at Expedia: original purpose, early expectations.
3. A case study. Project objective. Datasets: competitive shopping
comparisons, properties, bookings, clickstream demand. Forecast.
4. DW architecture: what's different? Normalized vs. denormalized
tables. Does it matter? Performance, ingestion speed, analytical
flexibility.
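
To make the normalized vs. denormalized question concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and rows are hypothetical and not taken from the talk; the point is only that both layouts answer the same question, with the denormalized one trading a join at query time for duplicated data at load time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized layout (hypothetical schema): bookings reference properties by key.
cur.execute("CREATE TABLE properties (property_id INTEGER PRIMARY KEY, city TEXT)")
cur.execute(
    "CREATE TABLE bookings (booking_id INTEGER PRIMARY KEY, property_id INTEGER, amount REAL)"
)
cur.executemany("INSERT INTO properties VALUES (?, ?)", [(1, "Seattle"), (2, "Chicago")])
cur.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [(10, 1, 120.0), (11, 1, 80.0), (12, 2, 200.0)],
)

# Denormalized layout: the city is copied onto every booking row at load time.
cur.execute(
    """
    CREATE TABLE bookings_wide AS
    SELECT b.booking_id, b.amount, p.city
    FROM bookings b JOIN properties p ON p.property_id = b.property_id
    """
)

# Same question, two layouts: revenue per city.
normalized = cur.execute(
    """
    SELECT p.city, SUM(b.amount)
    FROM bookings b JOIN properties p ON p.property_id = b.property_id
    GROUP BY p.city ORDER BY p.city
    """
).fetchall()

denormalized = cur.execute(
    "SELECT city, SUM(amount) FROM bookings_wide GROUP BY city ORDER BY city"
).fetchall()

print(normalized)
print(denormalized)
```

Both queries return the same totals; the wide table simply skips the join, which is the kind of trade-off the slide's "Does it matter?" question points at.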
5. DEV work: do you need different skills? Data files (csv, tsv,
txt or xml): which work best? Hive: HQL; UDFs for analytic
functions: do you need them? Optimization: reuse your knowledge?
Architecture (temp tables, partitions); HQL (set parameters).
Load_tags: partitioning, appending, syncing.
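
The slide does not define Load_tags, so the following is only a sketch under one assumption: that a load tag acts like a partition key. It writes TSV files into Hive-style `key=value` partition directories and appends by adding a new file rather than rewriting old ones. The function names, the `load_date` key, and the sample rows are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def append_partition(root, load_date, rows):
    """Append rows as a new TSV file under a Hive-style partition directory."""
    part_dir = Path(root) / f"load_date={load_date}"
    part_dir.mkdir(parents=True, exist_ok=True)
    # Each call adds a new file, so earlier loads are never rewritten.
    n = len(list(part_dir.glob("part-*.tsv")))
    path = part_dir / f"part-{n:05d}.tsv"
    with path.open("w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(rows)
    return path


def read_partition(root, load_date):
    """Read every file in one partition, touching no other partition."""
    part_dir = Path(root) / f"load_date={load_date}"
    rows = []
    for path in sorted(part_dir.glob("part-*.tsv")):
        with path.open(newline="") as f:
            rows.extend(csv.reader(f, delimiter="\t"))
    return rows


root = tempfile.mkdtemp()
append_partition(root, "2013-06-01", [["10", "120.0"]])
append_partition(root, "2013-06-01", [["11", "80.0"]])  # second load, same tag
append_partition(root, "2013-06-02", [["12", "200.0"]])

june_first = read_partition(root, "2013-06-01")
print(june_first)
```

Reading one tag only scans that tag's directory, which is the basic reason partitioning helps both appends and selective queries.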
6. RDBMSes and Hadoop: what's their relationship?
   - Syncing from DB2
   - Exporting into HBase
   - Importing from SQLServer
   - Exporting into SQLServer
   - Exporting into DB2
7. Place of Hadoop in a Relational Data Warehouse? Conflicting,
mutually exclusive, coexisting, complementing.
8. What's the new Data Warehouse for data and analytics?
Complementing: Polyglot Persistence.