Memory System Characterization of Big Data Workloads
Martin Dimitrov, Karthik Kumar, Patrick Lu, Vish Viswanathan, Thomas Willhalm
Agenda
• Why big data memory characterization?
• Workloads, Methodology and Metrics
• Measurements and results
• Conclusion and outlook
Why big data memory characterization?
• Studies show exponential data growth to come.
• Big Data: extracting information from unstructured data
• Primary technologies are Hadoop and NoSQL
Why big data memory characterization?
It is important to understand the memory usage of big data:
• Performance: memory latency, capacity, and bandwidth are important
• Power: memory consumes up to 40% of total server power
• Large data volumes can put pressure on the memory subsystem
• Optimizations trade off CPU cycles to reduce load on memory, e.g., compression
Why big data memory characterization?
• DRAM scaling is hitting limits
• Emerging memories have higher latency
• Focus on latency-hiding optimizations
How do latency-hiding optimizations apply to big data workloads?
Executive Summary
• Provide insight into memory access characteristics of big data applications
• Examine implications for prefetchability, compressibility, and cacheability
• Understand impact on memory architectures for big data usage models
Agenda
• Why big data memory characterization?
• Workloads, Methodology and Metrics
• Measurements and results
• Conclusion and outlook
Big Data workloads
• Sort
• WordCount
• Hive Join
• Hive Aggregation
• NoSQL indexing
We analyze these workloads using hardware DIMM traces, performance counter monitoring, and performance measurements
General Characterization
Memory footprint from DIMM trace:
• Memory in GB touched at least once by the application
• Amount of memory needed to keep the workload "in memory"
EMON performance counters:
• CPI
• Cache behavior: L1, L2, LLC MPI
• Instruction and Data TLB MPI
Understand how the workloads use memory
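As an illustration of how the footprint metric can be computed from a DIMM trace, here is a minimal Python sketch; the trace format (one hexadecimal physical address per line), the function name footprint_gb, and the 64-byte line size are assumptions for illustration, not details taken from the slides.

    # Hypothetical footprint calculation from a DIMM address trace.
    # Assumes one physical address in hex per line and 64-byte cache lines.
    CACHE_LINE = 64

    def footprint_gb(trace_path):
        lines_touched = set()
        with open(trace_path) as trace:
            for entry in trace:
                addr = int(entry.strip(), 16)           # parse one address per line
                lines_touched.add(addr // CACHE_LINE)   # unique cache lines touched
        return len(lines_touched) * CACHE_LINE / 2**30  # GB touched at least once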
Cache Line Working Set Characterization
1. For each cache line, compute the number of times it is referenced
2. Sort cache lines by their number of references
3. Select a footprint size, say X MB
4. What fraction of total references is contained in the X MB of hottest cache lines?
Identifies the hot working set of application
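The four steps above can be captured in a short Python sketch; the function name hot_fraction and the assumption that the trace is available as a list of physical addresses are illustrative only.

    from collections import Counter

    CACHE_LINE = 64

    def hot_fraction(addresses, footprint_mb):
        """Fraction of all references contained in the hottest footprint_mb MB."""
        refs = Counter(addr // CACHE_LINE for addr in addresses)  # 1. references per cache line
        hottest = sorted(refs.values(), reverse=True)             # 2. sort by reference count
        n_lines = int(footprint_mb * 2**20) // CACHE_LINE         # 3. lines that fit in X MB
        return sum(hottest[:n_lines]) / sum(hottest)              # 4. fraction of total references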
Cache Simulation
Run the workload through an LRU cache simulator and vary the cache size
Considers the temporal nature of accesses, not only the spatial:
• Streaming through regions larger than the cache size
• Eviction and replacement policies impact cacheability
• Focus on smaller sub-regions
Hit rates indicate potential for cacheability in tiered memory architecture
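A minimal sketch of this kind of simulation, assuming a single-level, fully associative LRU cache over 64-byte lines; this is a simplification for illustration, not the simulator actually used in the study.

    from collections import OrderedDict

    CACHE_LINE = 64

    def lru_hit_rate(addresses, cache_bytes):
        """Hit rate of a fully associative LRU cache of the given size."""
        capacity = cache_bytes // CACHE_LINE
        cache, hits = OrderedDict(), 0
        for addr in addresses:
            line = addr // CACHE_LINE
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark line as most recently used
            else:
                cache[line] = None
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used line
        return hits / len(addresses)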
Entropy
• Compressibility and predictability are important
• A signal with high information content is harder to compress and more difficult to predict
• Entropy over a set of cache lines K helps us understand this behavior (formula sketched below)
Lower entropy implies more compressibility and predictability
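The slide's formula is not reproduced here; a common formulation, and an assumption about the exact definition used, is Shannon entropy over the cache-line reference distribution: H(K) = -sum_i p_i * log2(p_i), where p_i is the fraction of all references that fall on cache line i. A short sketch:

    import math

    def reference_entropy(ref_counts):
        """Shannon entropy (in bits) of a cache-line reference distribution."""
        total = sum(ref_counts)
        return -sum((c / total) * math.log2(c / total) for c in ref_counts if c > 0)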
Entropy - example
Three example reference distributions, each with a 640 B footprint (ten 64-byte cache lines), 100 references, and 10 references per line on average:
(A) 64-byte cache captures 10% of references, 192-byte cache captures 30%, Entropy: 1
(B) 64-byte cache captures 19% of references, 192-byte cache captures 57%, Entropy: 0.785
(C) 64-byte cache captures 91% of references, 192-byte cache captures 93%, Entropy: 0.217
Lower entropy implies more compressibility and predictability
Correlation and Trend Analysis
• Examine the trace for trends, e.g., an increasing trend in upper physical address ranges suggests aggressively prefetching to an upper cache
• With s = 64 and l = 1000, the test function f mimics an ascending stride through a memory region of 1000 cache lines (sketch below)
• Negative correlation with f indicates a decreasing trend
High correlation implies a strong trend that can be predicted and prefetched
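A minimal sketch of the idea, assuming a sawtooth form for the test function f (stride s bytes, repeating every l cache lines); the exact definition of f used in the study may differ.

    import numpy as np

    def trend_correlation(addresses, s=64, l=1000):
        """Pearson correlation between the address trace and an assumed sawtooth
        test function f that ascends through l cache lines of stride s, then repeats."""
        trace = np.asarray(addresses, dtype=float)
        f = s * (np.arange(len(trace)) % l)   # assumed ascending-stride test function
        return np.corrcoef(trace, f)[0, 1]    # positive: increasing trend; negative: decreasing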
Agenda
• Why big data memory characterization?
• Big Data Workloads
• Methodology and Metrics
• Measurements and results
• Conclusion and outlook
General Characterization
• NoSQL and sort have highest footprints
• Hadoop Compression reduces footprints and improves execution time
General Characterization
• Sort has the highest cache miss rates (it transforms a large data volume from one representation to another)
• Compression helps reduce LLC misses
(Chart: L2 MPKI per workload)
General Characterization
• Workloads have high peak bandwidths
• Sort has a ~10x larger footprint than WordCount but a lower DTLB MPKI: WordCount's memory references are not well contained within page granularities and are widespread
Cache Line Working Set Characterization
The hottest 100 MB contains 20% of all references
NoSQL has the most spread among its cache lines
Sort keeps 60% of its references within 1 GB of its 120 GB footprint
Cache Simulation
The percentage of cache hits is higher than the percentage of references from the footprint analysis
Big Data workloads operate on smaller memory regions at a time
Entropy
Big Data workloads have higher entropy (>13) than SPEC workloads (>7): they are less compressible and predictable
(SPEC entropy data from [Shao et al. 2013])
Normalized Correlation
• Hive aggregation has high correlation magnitudes (both positive and negative)
• Enabling prefetchers results in higher correlation in general
Potential for effective prediction and prefetching schemes for workloads like Hive aggregation
Take Aways & Next Steps
• Big Data workloads are memory intensive
• Potential for latency-hiding techniques like cacheability and predictability to be successful
• A large 4th-level cache can benefit big data workloads
• Future work: including more workloads in the study, scaling dataset sizes, etc.