Date posted: 24-Dec-2015
Category: Documents
Uploaded by: gillian-mcdowell
Multicore Computers
Chip multiprocessor: combines two or more processors (cores) on a die.
Each core consists of: registers, ALU, pipeline hardware, control unit and L1 cache.
Hardware Performance Issues
Goal: increase instruction-level parallelism.
Superscalar: replicate execution resources, enabling parallel execution of instructions in parallel pipelines.
Simultaneous multithreading (SMT): duplicate register banks so that multiple threads can share the use of pipeline resources.
Problem: managing multiple threads and power consumption.
Why Multicore?
Control power density by using more of the chip area for cache memory (instead of logic transistors).
Potential for near-linear performance improvement as cores are added, provided the software can exploit them.
Applications That Benefit From Multicore Systems
Servers
Multithreaded native applications
Multiprocess applications
Java applications
Multi-instance applications
Valve game software
Valve
Reprogrammed the Source engine software to use multithreading to exploit the power of multicore processor chips from Intel and AMD.
Twice the performance with coarse threading.
Hybrid threading approach (combines coarse with fine-grained threading).
Scene-rendering lists for multiple scenes built in parallel (and other graphics-related simulation).
Multicore Organization
Variables in a multicore organization:
Number of core processors on the chip
Number of levels of cache memory
Amount of cache memory that is shared
Superscalar or SMT?
Intel Core Duo: individual cores are superscalar.
Intel Core i7: cores implement SMT.
Advantage of SMT: scales up the number of hardware threads that the system supports.
A multicore system with four SMT cores, each supporting four simultaneous threads, appears the same as 16 cores at the application level.
As software is developed to exploit parallel resources more fully, SMT appears more attractive than a purely superscalar approach.
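The 16-core figure is a simple product of cores and hardware threads per core. A minimal sketch of that arithmetic follows; the function name logical_processors is a hypothetical helper introduced only for illustration.

```python
def logical_processors(cores: int, threads_per_core: int) -> int:
    """Return the number of hardware threads visible at the application level.

    Hypothetical helper: it only multiplies the two counts and says nothing
    about how much real throughput each SMT thread delivers.
    """
    if cores < 1 or threads_per_core < 1:
        raise ValueError("cores and threads_per_core must be at least 1")
    return cores * threads_per_core


# Four cores, each supporting four simultaneous threads.
print(logical_processors(4, 4))  # 16
```

The product counts schedulable hardware threads, not equivalent performance: threads on the same core still share that core's pipeline resources.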
Intel Core Duo
Introduced in 2006.
Two x86 superscalar processors.
Separate thermal control units.
Advanced Programmable Interrupt Controller (APIC).
Interrupt Handling
The distributed interrupt controller (DIC):
Supports between 0 and 255 hardware interrupt inputs.
Maintains a list of interrupts, showing their priority and status.
The DIC satisfies two requirements:
Routing an interrupt request to a single CPU or to multiple CPUs, as required.
Providing interprocessor communication so that a thread on one CPU can cause activity by a thread on another CPU.
Interrupt states:
Inactive: already processed by that CPU, but possibly still pending or active in other CPUs to which it is targeted.
Pending: asserted, but processing has not started.
Active: processing has started but is not completed.
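The three interrupt states are tracked per CPU, which is why an interrupt can be inactive on one CPU while still pending on another. The sketch below models that behavior under stated assumptions: the class and method names (Interrupt, assert_line, start, complete) are hypothetical and do not correspond to any real controller interface.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    ACTIVE = "active"


class Interrupt:
    """Hypothetical model: one interrupt with a separate state per targeted CPU."""

    def __init__(self, irq_id, priority, target_cpus):
        self.irq_id = irq_id
        self.priority = priority
        self.state = {cpu: State.INACTIVE for cpu in target_cpus}

    def assert_line(self):
        # Asserted but not yet processed: becomes pending on every
        # targeted CPU that is currently inactive.
        for cpu in list(self.state):
            if self.state[cpu] is State.INACTIVE:
                self.state[cpu] = State.PENDING

    def start(self, cpu):
        # Processing begins on one CPU: pending -> active.
        if self.state[cpu] is not State.PENDING:
            raise RuntimeError("interrupt is not pending on this CPU")
        self.state[cpu] = State.ACTIVE

    def complete(self, cpu):
        # Processing finishes on one CPU: active -> inactive.
        if self.state[cpu] is not State.ACTIVE:
            raise RuntimeError("interrupt is not active on this CPU")
        self.state[cpu] = State.INACTIVE


irq = Interrupt(irq_id=42, priority=1, target_cpus=[0, 1])
irq.assert_line()
irq.start(0)
irq.complete(0)
# CPU 0 has finished, CPU 1 has not started yet.
print(irq.state[0].value, irq.state[1].value)
```

After CPU 0 completes, the interrupt is inactive there while remaining pending on CPU 1, matching the definition of the inactive state in a multiprocessor setting.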
Cache Coherency
Snoop control unit (SCU): resolves bottlenecks related to access to shared data.
The SCU introduces three types of optimization:
Direct data intervention: enables copying clean data from one CPU's L1 data cache to another CPU's L1 data cache without accessing external memory.
Duplicated tag RAMs: duplicated versions of the L1 tag RAMs, used by the SCU to check for data availability before sending coherency commands to the relevant CPUs.
Migratory lines: enable moving dirty data from one CPU to another without writing to L2 and reading the data back in from external memory.
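The duplicated tag RAMs act as a filter: the SCU consults its own copy of each CPU's L1 tags and sends coherency commands only to CPUs that actually hold the line. The sketch below illustrates that idea only. The class and method names are hypothetical, and a Python set stands in for a tag RAM, which in real hardware is indexed by cache set and way.

```python
class SnoopControlUnit:
    """Hypothetical model of tag-based snoop filtering (not a real SCU interface)."""

    def __init__(self, num_cpus):
        # One duplicate tag store per CPU, mirroring the line addresses
        # currently held in that CPU's L1 data cache.
        self.dup_tags = [set() for _ in range(num_cpus)]

    def record_fill(self, cpu, line_addr):
        # Called when a CPU's L1 allocates a line.
        self.dup_tags[cpu].add(line_addr)

    def record_evict(self, cpu, line_addr):
        # Called when a CPU's L1 drops a line.
        self.dup_tags[cpu].discard(line_addr)

    def cpus_to_snoop(self, requester, line_addr):
        # Only CPUs whose duplicate tags contain the line receive a
        # coherency command; the requester itself is skipped.
        return [
            cpu
            for cpu, tags in enumerate(self.dup_tags)
            if cpu != requester and line_addr in tags
        ]


scu = SnoopControlUnit(num_cpus=4)
scu.record_fill(1, 0x1000)
scu.record_fill(3, 0x1000)
print(scu.cpus_to_snoop(0, 0x1000))  # [1, 3]
print(scu.cpus_to_snoop(0, 0x2000))  # []
```

In the second lookup no CPU holds the line, so no coherency command is sent and no L1 cache is disturbed.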