Date post: | 22-Jan-2018 |
Category: | Software |
Upload: | dr-jaegwang-lim |
Dongguk University
Jaegwang Lim
CSE7098-01
1. Intro
• GPMR(GPU Map-Reduce)
– Google Map-Reduce
– pronounced G-Primer
– Standalone machine
– Multiple GPU devices
• Existing GPU Map-Reduce work targets only a single GPU
– No Network I/O
2. Background - GPU
• GPU ?
• Nvidia 10-Series Architecture
– 240 Thread Processors execute kernel threads
– 30 Multiprocessors, each containing 8 thread processors (240 / 30)
2. Background - GPU
• “Local” memory resides in device DRAM
– Use registers and shared memory to minimize local memory use
• Host can read and write global memory but not shared memory
2. Background - GPU
• A kernel launches a grid of thread blocks
– Threads within a block cooperate via shared memory
– Threads within a block can synchronize
– Threads in different blocks cannot cooperate
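The block model on this slide can be sketched as a CPU-only simulation. This is not CUDA code and uses no GPU library; the names `block_sum` and `grid_sum` are hypothetical. Each simulated block reduces its own slice in a private list that stands in for shared memory, and because blocks cannot cooperate, the per-block partial results are combined afterwards by the host.

```python
# CPU-only simulation of the CUDA block model; no GPU or CUDA library is used.

def block_sum(block_data):
    """Tree reduction inside one block, using a block-private 'shared memory' list."""
    shared = list(block_data)  # stands in for per-block shared memory
    n = len(shared)
    stride = 1
    while stride < n:
        # Each pass models the work done between two block-wide barriers.
        for i in range(0, n - stride, 2 * stride):
            shared[i] += shared[i + stride]
        stride *= 2
    return shared[0] if shared else 0


def grid_sum(data, block_size):
    """Run one simulated block per slice, then combine the partial sums on the host."""
    partials = [block_sum(data[i:i + block_size])
                for i in range(0, len(data), block_size)]
    return partials, sum(partials)
```

Calling `grid_sum(list(range(1, 11)), 4)` gives one partial sum per block plus the host-side total; no block ever reads another block's list.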
3. Implementation
• CPU to GPU
[Diagram: Data, Networks, Scheduler; Chunk0, Chunk1, Chunk2, Chunk3; PCI]
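The chunking step on this slide might look like the sketch below. The slide only shows a scheduler, four chunks and the PCI link, so the round-robin policy and the names `split_into_chunks` and `assign_round_robin` are assumptions made for illustration, not the scheduling algorithm used by GPMR.

```python
def split_into_chunks(data, chunk_size):
    """Cut the input into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_round_robin(chunks, num_gpus):
    """Assumed policy: give chunk i to GPU (i mod num_gpus)."""
    queues = [[] for _ in range(num_gpus)]
    for idx, chunk in enumerate(chunks):
        queues[idx % num_gpus].append((idx, chunk))
    return queues
```

Each queue holds the (chunk index, chunk) pairs that one device would receive over PCI, in the order they would be sent.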
3. Implementation
• Map
[Diagram: Global Memory holds Chunk0; Block 0, Block 1 and Block 2 each have their own Shared Memory and hold Chunk00, Chunk01 and Chunk02; Scheduler]
3. Implementation
• Map
[Diagram: Global Memory (4GB ~) holds Chunk0; Block 0, Block 1 and Block 2 each have their own Shared Memory and hold Chunk00, Chunk01 and Chunk02; Scheduler, Combiner, Bin]
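The Map, Combiner and Bin stages named on this slide can be illustrated with a word-count sketch on the CPU. The word-count task, the CRC32-based partitioning and the function names are all assumptions chosen for the example; the slides do not say how GPMR assigns keys to bins.

```python
import zlib
from collections import Counter


def map_chunk(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word, 1) for line in chunk for word in line.split()]


def combine(pairs):
    """Combiner: merge pairs that share a key before they leave the device."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def bin_pairs(combined, num_bins):
    """Bin: route every key to exactly one bin (assumed rule: CRC32 mod num_bins)."""
    bins = [dict() for _ in range(num_bins)]
    for key, value in combined.items():
        bins[zlib.crc32(key.encode("utf-8")) % num_bins][key] = value
    return bins
```

Combining before binning reduces the number of pairs that have to be moved, which is the usual motivation for a combiner stage.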
3. Implementation
• Reduce
[Diagram: Bin; Global Memory (4GB ~) holds Chunk0; Block 0, Block 1 and Block 2 each have their own Shared Memory and hold Chunk00, Chunk01 and Chunk02; Combiner, Sort, Scheduler, Reducer, Output]
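The Sort and Reducer boxes on this slide can be sketched as follows: sorting brings equal keys next to each other so that a reducer can process each key's values in one pass. The function name `sort_and_reduce` and the default reduction (a sum) are assumptions for illustration only.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_reduce(pairs, reduce_fn=sum):
    """Sort pairs by key, then apply reduce_fn to the values of each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, reduce_fn(value for _, value in group))
            for key, group in groupby(ordered, key=itemgetter(0))]
```

Passing a different `reduce_fn`, such as `max`, changes the reduction without touching the sort step.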
3. Implementation
• Overall Local
[Diagram: GPU 0, GPU 1, GPU 2, GPU 3, …, each running Map & Reduce; CPU running Scheduler & Bin; Network]
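The single-node layout on this slide, with several GPUs doing Map & Reduce and the CPU doing Scheduler & Bin, can be simulated end to end on the CPU. Everything below is a hypothetical sketch: the word-count task, the round-robin scheduling, and the byte-sum rule that picks a destination GPU for each key are assumptions, not details taken from GPMR.

```python
from collections import defaultdict


def run_node(chunks, num_gpus):
    """Simulate one node: schedule chunks, map per GPU, bin on the CPU, reduce per GPU."""
    # Scheduler (CPU): hand chunks to GPUs in round-robin order (assumed policy).
    per_gpu = [chunks[g::num_gpus] for g in range(num_gpus)]

    # Map (each simulated GPU): emit a (word, 1) pair per word.
    mapped = [[(word, 1) for chunk in mine for word in chunk.split()]
              for mine in per_gpu]

    # Bin (CPU): send all values of a key to one destination GPU.
    bins = [defaultdict(list) for _ in range(num_gpus)]
    for pairs in mapped:
        for key, value in pairs:
            dest = sum(key.encode("utf-8")) % num_gpus  # assumed partition rule
            bins[dest][key].append(value)

    # Reduce (each simulated GPU): sum the values of every key it owns.
    return [{key: sum(values) for key, values in sorted(b.items())} for b in bins]
```

The result has one dictionary per simulated GPU, and every key appears in exactly one of them.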
3. Implementation
• Overall Global
[Diagram: four copies of the local node layout (GPU 0, GPU 1, GPU 2, GPU 3, …, each running Map & Reduce; CPU running Scheduler & Bin), joined by Network]
4. Benchmark Result
• GPMR vs Phoenix
• GPMR vs Mars
5. Conclusion
• High performance
• New capability
• New scalability
• Limitation
– Low GPU Memory (512MB)
– Network I/O for GPU