HYPERTHREADING

DEFINITION:

THREAD: A thread is a program fragment that the multitasking operating system assigns for execution to one of the processors in a multiprocessor system. Threads are sequences of related instructions or tasks, running independently, that together make up a program.

MULTITHREADING: A computation architecture that runs multiple threads simultaneously, aiming to increase the utilization of a single core by exploiting thread-level as well as instruction-level parallelism. It is applied for multitasking between multiple related threads of a program.

ADVANTAGES:

If a thread incurs many cache misses, the other thread(s) can continue executing and take advantage of the unused computing resources. This can lead to faster overall execution, since those resources would have been idle if only a single thread were running.

If a thread cannot use all the computing resources of the CPU (because its instructions depend on each other's results), running another thread keeps those resources from sitting idle.

If several threads work on the same set of data, they can share its cache entries, leading to better cache usage and easier synchronization on shared values.
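The latency-hiding effect claimed in these advantages can be illustrated at the software level with ordinary OS threads (a rough analogy only; real hyper-threading interleaves instruction streams in hardware). The sketch below uses Python's threading module, with a 0.2-second sleep standing in for a long stall such as a memory wait:

```python
import threading
import time

def stalled_task(delay):
    # time.sleep stands in for a long stall (cache miss, I/O wait);
    # while this thread waits, the other thread keeps running.
    time.sleep(delay)

start = time.monotonic()
threads = [threading.Thread(target=stalled_task, args=(0.2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
# Both 0.2 s waits overlap, so total wall time is close to 0.2 s, not 0.4 s.
print(f"elapsed: {elapsed:.2f} s")
```

The measured wall time is close to the longest individual wait rather than the sum of the waits, which is exactly the benefit described above.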

DISADVANTAGES:

Multiple threads can interfere with each other when sharing hardware resources such as caches or translation lookaside buffers (TLBs).

Execution time of a single thread is not improved and can even be degraded, even when only one thread is executing. This is due to slower clock frequencies and/or the additional pipeline stages needed to accommodate thread-switching hardware.

Hardware support for multithreading is more visible to software than multiprocessing is, and thus requires more changes to both application programs and operating systems.

HYPER-THREADING: A high-performance computing technique that overlaps, to some degree, the execution of two or more independent instruction streams. It is Intel's term for its simultaneous multithreading implementation, found in its Pentium 4 and Core i7 CPUs.

11070220


HT is a simultaneous multithreading (SMT) microprocessor technology that supports the concurrent execution of multiple separate instruction streams, referred to as threads of execution, on a single physical processor. When HT is used with the Intel Xeon processors that support it, there are two threads of execution per physical processor.

A feature of certain Pentium 4 chips that makes one physical CPU appear as two logical CPUs. It uses additional registers to overlap two instruction streams and achieve an approximate 30% gain in performance. Multithreaded applications take advantage of the hyper-threaded hardware just as they would on any dual-processor system; however, the performance gain cannot equal that of true dual-processor CPUs.


AN OVERVIEW:

Hyper-threading (officially termed Hyper-Threading Technology or HTT) is an Intel-proprietary technology used to improve parallelization of computations (doing multiple tasks at once) on PC microprocessors. A processor with hyper-threading enabled is treated by the operating system as two processors instead of one: only one processor is physically present, but the operating system sees two virtual processors and shares the workload between them. Because each logical processor within an HT processor appears to the operating system as an individual processor, Windows tools that display processor information, such as the Windows Task Manager or Windows Performance Monitor, show an entry for every logical processor that Windows is utilizing.
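As the paragraph above notes, the operating system simply sees extra processors. A minimal way to observe this from user code, assuming a Python environment, is to query the logical processor count; on Linux, the sysfs topology files (the path shown is the standard Linux location and is absent on other systems) also reveal which logical CPUs are hyper-threaded siblings:

```python
import os

# Logical processors visible to the OS (includes hyper-threaded
# siblings); may be None if the count cannot be determined.
logical = os.cpu_count()
print("logical processors:", logical)

# On Linux, sysfs reports which logical CPUs share one physical core.
siblings = "/sys/devices/system/cpu/cpu0/topology/thread_siblings_list"
if os.path.exists(siblings):
    with open(siblings) as f:
        print("logical CPUs sharing CPU0's core:", f.read().strip())
```

On a hyper-threaded machine the sibling list contains two entries (for example "0,4"), confirming that two of the processors the OS schedules are really one physical core.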

Hyper-threading requires both operating system and CPU support; conventional multiprocessor support is not enough.

The first Intel processors to support Hyper-Threading Technology (HT) were the IA-32 Xeon family of processors released in the first half of calendar year 2002. Although Xeon processors are supported in workstations, HT was initially targeted at dual-processor and multiprocessor server configurations.

The HT in the Xeon processors makes two architectural states available on the same physical processor. Each architectural state can execute an instruction stream, which means that two concurrent threads of execution can occur on a single physical processor. Each thread of execution can be independently halted or interrupted. These architectural states are referred to as logical processors in this report.

The main difference between the execution environment provided by the Xeon HT processor, compared with that provided by two traditional single-threaded processors, is that HT shares certain processor resources: there is only one execution engine, one on-board cache set, and one system bus interface. This means that the logical processors on an HT processor must compete for use of these shared resources. As a result, an HT processor will not provide the same performance capability as two similarly equipped single-threaded processors.

The two logical processors on an HT processor are treated equally with respect to access to the shared resources. In this report, the logical processors on an HT processor are referred to, in order of use, as the first and second logical processors.

Windows XP and Windows .NET Server include generic identification and support for IA-32 processors that implement HT using the Intel-defined CPUID instruction identification mechanism. However, this support is not guaranteed for processors that have not been tested with these operating systems.


SMT processors may support more than two logical processors in the future. However, the discussions and examples in this white paper assume the use of two logical processors, as used in the Xeon family of processors.

FIGURE 1: Intel Pentium 4 @ 3.80 GHz with Hyper-Threading Technology.

_______________________________________________

BASIC WORKING OF HYPERTHREADING:

1. This technology is meant to increase the efficiency of processor operation. According to Intel, only about 30% of the execution units in the processor are busy most of the time, so the idea of putting the other 70% to work is a logical one (the Pentium 4, which introduced this technology, does not suffer from excess performance per megahertz). The main point of Hyper-Threading is that while one thread of a program executes, idle execution units can work on another thread of the same program (or a thread of another program). For example, while one instruction sequence waits for data from memory, another sequence can execute.

2. When executing different threads, the processor must "know" which instructions belong to which thread, so a mechanism exists to let the processor track this.

3. Given the small number of general-purpose registers in the x86 architecture (8 in all), each thread clearly needs its own register set. This limitation is evaded by register renaming: there are many more physical registers than logical ones. The Pentium III has 40; the Pentium 4 obviously has more. According to unconfirmed information, it has 128.

4. When several threads need the same resources, or one thread is waiting for data, the "pause" instruction should be used to avoid a performance drop. Naturally, this requires recompiling the programs.

5. Sometimes executing several threads can worsen performance. For example, because the L2 cache does not grow with the thread count, active threads competing for the cache can cause constant eviction and reloading of data in the L2 cache.

6. Intel states that the gain can reach 30% when programs are optimized for this technology (or rather, that on today's server programs and applications the measured gain is up to 30%). That is a decent incentive for optimization.
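Point 4 above can be illustrated with a software analogue. The x86 PAUSE instruction itself is not reachable from portable Python, so the sketch below (an assumption: a cooperative yield stands in for PAUSE) shows the shape of a polite spin-wait loop:

```python
import threading
import time

flag = threading.Event()

def spin_wait():
    # Polling loop: instead of hammering the shared execution units,
    # yield on every iteration. On real hardware the PAUSE instruction
    # serves this purpose, hinting that the sibling logical processor
    # should get the shared resources.
    while not flag.is_set():
        time.sleep(0)  # cooperative yield, the software stand-in for PAUSE

waiter = threading.Thread(target=spin_wait)
waiter.start()
flag.set()       # release the spinning thread
waiter.join()
print("spin-wait finished")
```

Without the yield, the spinning thread would consume execution resources that the other logical processor could have used, which is exactly the performance drop point 4 warns about.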


FIGURE 2: Threads of program 1, threads of program 2, and idle CPU slots interleaved on a hyper-threaded processor.

Hyper-threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a hyper-threading processor to appear as two "logical" processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously.

When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task. (The processor may stall due to a cache miss, branch misprediction, or data dependency.)


This technology is transparent to operating systems and programs. All that is required to take advantage of hyper-threading is symmetric multiprocessing (SMP) support in the operating system, as the logical processors appear as standard separate processors.

Programs are made up of execution threads. These threads are sequences of related instructions. Earlier, most programs consisted of a single thread. The operating systems in those days were capable of running only one such program at a time. The result was that your PC would freeze while it printed a document or a spreadsheet. The system was incapable of doing two things simultaneously. Innovations in the operating system introduced multitasking in which one program could be briefly suspended and another one run. By quickly swapping programs in and out in this manner, the system gave the appearance of running the programs simultaneously. However, the underlying processor was, in fact, at all times running just a single thread.

It is possible to optimize operating system behaviour on multi-processor hyper-threading capable systems, such as the Linux techniques discussed in Kernel Traffic. For example, consider an SMP system with two physical processors that are both hyper-threaded (for a total of four logical processors). If the operating system's process scheduler is unaware of hyper-threading it will treat all four processors as being the same. If only two processes are eligible to run it might choose to schedule those processes on the two logical processors that happen to belong to one of the physical processors; that processor would become extremely busy while the other would be idle, leading to poorer performance than is possible with better scheduling. This problem can be avoided by improving the scheduler to treat logical processors differently from physical processors; in a sense, this is a limited form of the scheduler changes that are required for NUMA systems.
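The scheduling pitfall described in the paragraph above can be made concrete with a small sketch. The topology dictionary is an assumed example (four logical CPUs on two hyper-threaded physical cores, as in the text), and the helper place_two_tasks is hypothetical, not an actual kernel interface:

```python
# Assumed topology from the example above: four logical CPUs, where
# logical CPUs 0 and 1 share physical core 0, and 2 and 3 share core 1.
PHYSICAL_CORE = {0: 0, 1: 0, 2: 1, 3: 1}

def place_two_tasks(topology):
    """Pick two logical CPUs on different physical cores, so that two
    runnable processes do not fight over one core's execution units."""
    chosen, used_cores = [], set()
    for logical_cpu, core in topology.items():
        if core not in used_cores:
            chosen.append(logical_cpu)
            used_cores.add(core)
        if len(chosen) == 2:
            break
    return chosen

print(place_two_tasks(PHYSICAL_CORE))  # → [0, 2]
```

An HT-unaware scheduler might instead pick logical CPUs 0 and 1, leaving physical core 1 completely idle, which is the poor outcome the text describes.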

By the beginning of this decade, processor designs had gained additional execution resources (such as logic dedicated to floating-point and integer math) to support executing multiple instructions in parallel. Intel saw an opportunity in these extra facilities: it reasoned that it could make better use of them by executing two separate threads simultaneously on the same processor core. Intel named this simultaneous processing Hyper-Threading Technology and introduced it on Intel Xeon processors in 2002. According to Intel benchmarks, applications written with multiple threads could see improvements of up to 30% when running on processors with HT Technology. More important, two programs could now run simultaneously on a processor without having to be swapped in and out (see Figure 3). To induce the operating system to recognize one processor as two possible execution pipelines, the new chips were made to appear as two logical processors to the operating system.

The performance boost of HT Technology was limited by the availability of shared resources to the two executing threads. As a result, HT Technology cannot approach the throughput of two distinct processors, because the threads contend for these shared resources. To achieve greater performance gains on a single chip, a processor would require two separate cores, so that each thread has its own complete set of execution resources. Enter multi-core.

FIGURE 3: Threads being executed by early, recent and next generation processors.

The diagram above portrays the difference between an early processor and a more recent one. The first consists of a single core that handles one program at a time; multitasking was not possible.

The second diagram shows a recent processor, which creates a virtual second processor to handle multiple threads or programs simultaneously.

The third diagram depicts a new-generation multicore processor in action, executing four threads simultaneously. It provides speed coupled with extensive multitasking ability.

_________________________________________________________________________

COMPONENTS OF HYPERTHREADING:


1. REGISTER ALIAS TABLES:

• Map the architectural registers onto physical rename registers.

• Each logical processor needs its own set of architectural registers because they must be tracked independently.

2. ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER:

• The APIC is duplicated so that interrupts for each logical processor can be handled independently.

3. RETURN STACK PREDICTOR:

• The return stack predictor is duplicated for accurate tracking of call/return pairs.

4. INSTRUCTION TRANSLATION LOOK-ASIDE BUFFER:

• The instruction translation look-aside buffer is duplicated because it is small, and replicating it is simpler than sharing it.

5. NEXT-INSTRUCTION POINTER:

• The next-instruction pointer and related control logic let each thread's program progress be tracked independently.

6. TRACE-CACHE NEXT-INSTRUCTION POINTER:

• The trace cache stores decoded instructions and serves as the first-level instruction cache; each logical processor has its own trace-cache next-instruction pointer.
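The register alias tables in point 1 can be sketched as a toy data structure. Everything here is illustrative (the class name RenameTable and the pool of 128 physical registers are assumptions, the latter based on the unconfirmed figure quoted earlier), but it shows why two logical processors writing the same architectural register never collide:

```python
class RenameTable:
    """Toy register alias table: every architectural-register write
    gets a fresh physical register, so the two logical processors can
    both use 'eax' without ever touching the same physical register."""

    def __init__(self, num_physical=128):  # 128 is the unconfirmed figure above
        self.free = list(range(num_physical))
        self.alias = {}  # (logical_processor, arch_reg) -> physical reg

    def rename(self, logical_processor, arch_reg):
        phys = self.free.pop(0)              # allocate a fresh physical register
        self.alias[(logical_processor, arch_reg)] = phys
        return phys

rat = RenameTable()
p0 = rat.rename(0, "eax")  # logical processor 0 writes EAX
p1 = rat.rename(1, "eax")  # logical processor 1 writes EAX
print(p0, p1)  # → 0 1  (two distinct physical registers)
```

Because the table is keyed by logical processor as well as register name, both threads see a private EAX even though the architectural register set is shared.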

THREADING ALGORITHMS:

Time-slicing: The processor switches between threads at fixed time intervals. Switching overhead is high, especially if one of the threads is in a wait state.

Switch-on-event: The processor switches tasks on long pauses; while waiting for data from a relatively slow source, CPU resources are given to other threads.

Multiprocessing: The load is distributed over many processors. This adds extra hardware cost.

Simultaneous multithreading: Multiple threads execute on a single processor without switching. This is the basis of Intel's Hyper-Threading technology.
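The first of these algorithms is easy to picture in a toy simulator. The sketch below models only time-slicing (round-robin with a fixed quantum); the function and thread names are made up for illustration:

```python
def time_slice(threads, quantum=1):
    """Round-robin time-slicing: run `quantum` instructions from one
    thread, then switch to the next, until every queue is empty."""
    queues = {name: list(instrs) for name, instrs in threads.items()}
    trace = []
    while any(queues.values()):
        for name, queue in queues.items():
            for _ in range(min(quantum, len(queue))):
                trace.append((name, queue.pop(0)))
    return trace

trace = time_slice({"A": ["a1", "a2"], "B": ["b1", "b2"]})
print(trace)
# → [('A', 'a1'), ('B', 'b1'), ('A', 'a2'), ('B', 'b2')]
```

Each tuple is one quantum; the processor runs exactly one thread at a time, which is precisely what simultaneous multithreading avoids.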

COMPARING PAST AND HYPERTHREADED PROCESSORS:

Here are some multitasking workloads that are just too much for a single logical processor.


These benchmarks were taken using popular software packages that are already multithreaded; they show the percentage increase in performance.

PROCESSORS COMPATIBLE WITH HYPERTHREADING:

INTEL PROCESSORS:

DESKTOP: Intel® Pentium® 4 processor Extreme Edition supporting Intel® Hyper-Threading Technology

NOTEBOOKS: Mobile Intel® Pentium® 4 processors supporting Intel® Hyper-Threading Technology

SERVERS/WORKSTATIONS: Intel® Xeon® processor

INTEL® CHIPSETS:

Intel Desktop Chipsets (desktops)

Intel Mobile Chipsets (notebooks)

Intel Server Chipsets (workstations)

MULTI CORE PROCESSORS:-

Multi-core processors: Multi-core processors, as the name implies, contain two or more distinct cores in the same physical package. Figure 2 shows how this appears in relation to previous technologies.

In this design, each core has its own execution pipeline. And each core has the resources required to run without blocking resources needed by the other software threads.

While the example in Figure 2 shows a two-core design, there is no inherent limit on the number of cores that can be placed on a single chip. Intel has committed to shipping dual-core processors in 2005 and will add more cores in the future. Mainframe processors today use more than two cores, so there is precedent for this kind of development.


FIGURE 5:

Figure 2. Multi-Core processors have multiple execution cores on a single chip.

The multi-core design enables two or more cores to run at somewhat slower speeds and much lower temperatures. The combined throughput of these cores delivers processing power greater than the maximum available today on single-core processors, at a much lower level of power consumption. In this way, Intel increases the capabilities of server platforms as predicted by Moore's Law, while the technology no longer pushes the outer limits of physical constraints.


TABLE DEPICTING NOMENCLATURE OF INTEL MULTICORE PROCESSORS:

Intel Core 2 processor family, by code name (cores, process, date released):

Core 2 Duo
  Desktop: Conroe (dual, 65 nm, Aug 2006); Allendale (dual, 65 nm, Jan 2007); Wolfdale (dual, 45 nm, Jan 2008)
  Laptop: Merom (dual, 65 nm, Jul 2006); Penryn (dual, 45 nm, Jan 2008)

Core 2 Extreme
  Desktop: Conroe XE (dual, 65 nm, Jul 2006); Kentsfield XE (quad, 65 nm, Nov 2006); Yorkfield XE (quad, 45 nm, Nov 2007)
  Laptop: Merom XE (dual, 65 nm, Jul 2007); Penryn XE (dual, 45 nm, Jan 2008); Penryn XE (quad, 45 nm, Aug 2008)

Core 2 Quad
  Desktop: Kentsfield (quad, 65 nm, Jan 2007); Yorkfield (quad, 45 nm, Mar 2008)
  Laptop: Penryn (quad, 45 nm, Aug 2008)

Core 2 Solo
  Desktop: version not available
  Laptop: Merom (solo, 65 nm, Sep 2007); Penryn (solo, 45 nm, May 2008)


MERITS AND DEMERITS OF HYPER-THREADING:

ADVANTAGES:

The advantages of hyper-threading are:

Improved support for multi-threaded code, allowing multiple threads to run simultaneously.

No performance loss if only one thread is active; increased performance with multiple threads.

Improved reaction and response time; faster speed and higher efficiency.

Better resource utilization.

The extra architecture adds only about 5% to the total die area.

The largest performance boost is likely to be noticed while running CPU-intensive processes, such as antivirus scans, playing high-end games, ripping/burning media (which requires file conversion), or searching folders.

DISADVANTAGES:

To benefit from hyper-threading, code cannot execute serially; it must be parallelized.

Threads are non-deterministic and involve extra design effort; they also have increased overhead.

Shared resources can conflict.

In addition to operating system (OS) support, adjustments to existing software are required to fully utilize the computing resources provided by multi-core processors.

Integrating a multi-core chip drives production yields down, and such chips are harder to manage thermally than lower-density single-chip designs. Intel has partially countered the yield problem by building its quad-core designs from two dual-core dies in a single package with a unified cache: any two working dual-core dies can be used, as opposed to producing four cores on a single die and requiring all four to work to yield a quad-core part.


Hyper-threading suffers from a serious security flaw that permits local information disclosure, including allowing an unprivileged user to steal an RSA private key being used on the same machine. When the flaw was disclosed in 2005, administrators of multi-user systems were strongly advised to disable Hyper-Threading immediately; single-user systems (i.e., desktop computers) were not affected. The flaw exists because threads can infer information about each other through the shared cache, despite having no access to each other's memory space.

Two processing cores sharing the same system bus and memory bandwidth limits the real-world performance advantage.

SYMMETRIC MULTIPROCESSING:

Symmetric Multiprocessing (SMP) is a computer architecture that provides fast performance by making multiple CPUs available to complete individual processes simultaneously (multiprocessing). Unlike asymmetric multiprocessing, any idle processor can be assigned any task, and additional CPUs can be added to improve performance and handle increased loads. A variety of specialized operating systems and hardware arrangements are available to support SMP. Specific applications can benefit from SMP if their code allows multithreading.

In computing, symmetric multiprocessing or SMP involves a multiprocessor computer-architecture where two or more identical processors can connect to a single shared main memory. Most common multiprocessor systems today use an SMP architecture. In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors.

SMP systems allow any processor to work on any task no matter where the data for that task are located in memory; with proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently.
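The task-migration idea in the last sentence can be sketched as a greedy rebalancing loop. This is a simplification (real schedulers also weigh cache affinity and priorities, not just queue lengths), and the function name rebalance is hypothetical:

```python
def rebalance(loads):
    """Move one task at a time from the busiest processor to the
    idlest until the spread between them is at most one task."""
    loads = list(loads)
    while max(loads) - min(loads) > 1:
        loads[loads.index(max(loads))] -= 1  # take a task off the busiest CPU
        loads[loads.index(min(loads))] += 1  # give it to the idlest CPU
    return loads

# Four processors with badly skewed run-queue lengths:
print(rebalance([6, 0, 2, 0]))  # → [2, 2, 2, 2]
```

After rebalancing, every processor carries an equal share of the work, which is the behaviour the paragraph above attributes to a well-supported SMP system.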

____________________________________________________________________________________


SIMULTANEOUS MULTITHREADING (SMT):

Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs (those that execute multiple instructions at the same time) by means of hardware multithreading. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.

SMT is one of the two main implementations of multithreading, the other form being temporal multithreading. In temporal multithreading, only one thread of instructions can execute in any given pipeline stage at a time. In simultaneous multithreading, instructions from more than one thread can be executing in any given pipeline stage at a time. This is done without great changes to the basic processor architecture: the main additions needed are the ability to fetch instructions from multiple threads in a cycle, and a larger register file to hold data from multiple threads. The number of concurrent threads can be decided by the chip designers, but practical restrictions on chip complexity have limited the number to two for most SMT implementations.
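The contrast drawn above between temporal and simultaneous multithreading can be sketched as follows. The issue width of two and the function name smt_issue are illustrative assumptions; a real SMT core selects ready instructions dynamically rather than strictly round-robin:

```python
def smt_issue(thread_a, thread_b, width=2):
    """Each cycle, fill up to `width` issue slots with instructions
    drawn from BOTH threads, rather than dedicating whole cycles to
    one thread as temporal multithreading does."""
    a, b = list(thread_a), list(thread_b)
    cycles = []
    while a or b:
        slot = []
        if a:
            slot.append(a.pop(0))
        if b and len(slot) < width:
            slot.append(b.pop(0))
        cycles.append(slot)
    return cycles

print(smt_issue(["a1", "a2"], ["b1"]))  # → [['a1', 'b1'], ['a2']]
```

In the first cycle, instructions from both threads occupy the pipeline together; a temporal design would have needed a separate cycle for each thread.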

Because the technique is really an efficiency solution and there is inevitable increased conflict on shared resources, measuring or agreeing on the effectiveness of the solution can be difficult. Some researchers have shown that the extra threads can be used to proactively seed a shared resource like a cache, to improve the performance of another single thread, and claim this shows that SMT is not just an efficiency solution. Others use SMT to provide redundant computation, for some level of error detection and recovery.

However, in most current cases, SMT is about hiding memory latency, efficiency and increased throughput of computations per amount of hardware used.

____________________________________________________________________

Steps To Develop A Parallel Program For Hyper-Threading Technology:


Partitioning: The partitioning stage of a design is intended to expose opportunities for parallel execution. Hence, the focus is on defining a large number of small tasks in order to yield what is termed a fine-grained decomposition of a problem.

Communication: The tasks generated by a partition are intended to execute concurrently but cannot, in general, execute independently. The computation to be performed in one task will typically require data associated with another task. Data must then be transferred between tasks to allow computation to proceed. This information flow is specified in the communication phase of a design.

Agglomeration: In the third stage, we move from the abstract toward the concrete. We revisit decisions made in the partitioning and communication phases with a view to obtaining an algorithm that will execute efficiently on some class of parallel computer. In particular, we consider whether it is useful to combine, or agglomerate, tasks identified by the partitioning phase, so as to provide a smaller number of tasks, each of greater size. We also determine whether it is worthwhile to replicate data and/or computation.

Mapping: In the fourth and final stage of the parallel algorithm design process, we specify where each task is to execute. This mapping problem does not arise on uniprocessors or on shared-memory computers that provide automatic task scheduling.
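These four stages can be walked through on a trivial problem, summing an array, using Python's standard library. The chunk count of four and the helper name agglomerate are arbitrary choices for illustration; the reduction at the end plays the role of the communication stage:

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(100))

# Partitioning: fine-grained decomposition, one tiny task per element.
fine_tasks = [[x] for x in data]

# Agglomeration: merge fine tasks into a few larger chunks so that
# scheduling and communication overhead stays small.
# (Assumes len(fine_tasks) divides evenly by n_chunks.)
def agglomerate(tasks, n_chunks):
    size = len(tasks) // n_chunks
    return [sum(tasks[i * size:(i + 1) * size], []) for i in range(n_chunks)]

chunks = agglomerate(fine_tasks, 4)

# Mapping: assign each chunk to a worker thread. Combining the partial
# sums afterwards is the communication stage of the design.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum, chunks))

print(sum(partials))  # → 4950
```

Each worker owns a complete chunk, so the only data exchanged between tasks is the four partial sums, illustrating why agglomeration reduces communication.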

On the server side, multicore processors are ideal because they allow many users to connect to a site simultaneously with independent threads of execution. This allows for Web servers and application servers with much better throughput.

FIGURE 5: Depicts the effect of HTT on processor speed and performance using the Windows Task Manager.

FUTURE:

Older Pentium 4 based CPUs use hyper-threading, but the newer Pentium M based cores Merom, Conroe, and Woodcrest do not. Hyper-threading is a specialized form of simultaneous multithreading (SMT).

The Intel Atom is an in-order single-core processor with hyper-threading, for low power mobile PCs and low-price desktop PCs.


Diagram of a generic dual core processor, with CPU-local level 1 caches, and a shared, on-die level 2 cache.

HT Technology enables gaming enthusiasts to play the latest titles and experience ultra-realistic effects and game play. Multimedia enthusiasts can create, edit, and encode graphically intensive files while running background applications such as a virus scan, all without slowing down. Intel released Nehalem (Core i7) in November 2008, in which hyper-threading makes a return: Nehalem contains four cores and effectively scales to eight threads, providing excellent performance.

It may drastically improve computer performance in multicore processors by doubling the number of logical processors, dividing the workload between the physical and virtual processors. According to Intel, it boosts system performance by almost 30%.

It will improve CPU-intensive processes such as virus scans, ripping and burning CDs and DVDs, multimedia applications, and video quality.

As it consumes less power at higher efficiency, it may be embedded in future multicore processors.

Improve business productivity by doing more at once without slowing down


Provide faster response times for Internet and e-Business applications, enhancing customer experiences

Increase the number of transactions that can be processed simultaneously

Utilize existing technologies while maintaining future readiness, with compatibility for existing 32-bit applications and OSs while being prepared for the future of 64-bit computing
