AMD K-6 Processor Evaluation

Post on 14-Jan-2016

Registers

AMD-K6 Registers

• General purpose registers

• Segment registers

• Floating point registers

• MMX registers

• EFLAGS register

• Control registers

• Task register

• Debug registers

• Test registers

• Descriptor/memory registers

• Model-specific registers (MSRs), Model 6

General-Purpose Registers

• 8 32-bit general-purpose registers

• EAX

• EBX

• ECX

• EDX

• EDI

• ESI

• ESP

• EBP

Segment registers

• 6 16-bit segment registers

• Used as pointers to areas (segments) of memory

• CS

• DS

• ES

• FS

• GS

• SS

Floating-Point Registers

• 8 80-bit numeric floating point registers

• Help the floating-point execution unit

• Labeled FPR0–FPR7

MMX Registers

• 8 64-bit MMX registers

• Used by multimedia software

EFLAGS Register

• Provides for three different types of flags

– System flags

– Control flag

– Status flags
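
The three flag types can be illustrated by decoding an EFLAGS value. This is a minimal sketch: the bit positions follow the standard x86 EFLAGS layout rather than anything stated on the slides, and `decode_eflags` is a hypothetical helper name. Check the positions against the AMD-K6 data sheet before relying on them.

```python
# Bit positions assumed from the standard x86 EFLAGS layout (not given on the slides).
STATUS_FLAGS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}
CONTROL_FLAGS = {"DF": 10}
SYSTEM_FLAGS = {"TF": 8, "IF": 9, "NT": 14, "RF": 16, "VM": 17,
                "AC": 18, "VIF": 19, "VIP": 20, "ID": 21}


def decode_eflags(value):
    """Return the names of the set flags, grouped by flag type."""
    groups = {"status": STATUS_FLAGS, "control": CONTROL_FLAGS, "system": SYSTEM_FLAGS}
    decoded = {
        kind: [name for name, bit in bits.items() if (value >> bit) & 1]
        for kind, bits in groups.items()
    }
    # IOPL is a two-bit system field (bits 13-12), so it is reported as a number.
    decoded["iopl"] = (value >> 12) & 0b11
    return decoded


if __name__ == "__main__":
    # 0x246 has PF, ZF and IF set (bit 1 is a reserved bit and is ignored here).
    print(decode_eflags(0x246))
```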

Control Registers

• 5 control registers

• Contain system control bits and pointers

Task Register

• Contains a pointer to the Task State Segment of the current task

Debug Registers

• 8 Debug registers

• Labeled DR0–DR7

Descriptors

• Define, protect, and isolate code segments, data segments, task state segments, and gates

Memory Management Registers

• The AMD-K6 processor controls segmented memory management with 4 registers:

– Global Descriptor Table Register

– Interrupt Descriptor Table Register

– Local Descriptor Table Register

– Task Register

Model-Specific Registers (MSR)

• 5 MSRs

– Machine Check Address Register (MCAR)

– Machine Check Type Register (MCTR)

– Test Register 12 (TR12)

– Time Stamp Counter (TSC)

– Write Handling Control Register (WHCR)

MCAR and MCTR

• Both are 64-bit

• The AMD-K6 processor does not support the generation of a machine check exception; MCAR and MCTR are provided instead

Test Register 12

• Disable the L1 caches

Time Stamp Counter (TSC)

• 64-bit

• The time stamp counter (TSC) MSR is incremented by the processor with each processor clock cycle
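
Because the counter advances once per processor clock, the difference between two readings converts directly into elapsed time. The sketch below assumes the 64-bit counter width of the x86 TSC and a hypothetical 233 MHz clock; the function names are illustrative, and reading the counter itself (the RDTSC instruction) is outside what Python can show.

```python
TSC_MASK = (1 << 64) - 1  # assumes a 64-bit counter

ASSUMED_CLOCK_HZ = 233_000_000  # hypothetical 233 MHz part, for illustration only


def tsc_delta(start, end):
    """Cycles elapsed between two TSC readings, tolerating one wraparound."""
    return (end - start) & TSC_MASK


def cycles_to_seconds(cycles, clock_hz=ASSUMED_CLOCK_HZ):
    """Convert a cycle count to seconds for a given processor clock."""
    return cycles / clock_hz


if __name__ == "__main__":
    elapsed = tsc_delta(1_000, 466_001_000)
    print(elapsed, "cycles =", cycles_to_seconds(elapsed), "seconds")
```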

Write Handling Control Register (WHCR)

• Contains three fields: WCDE bit, Write Allocate Enable Limit (WAELIM) field, and the Write Allocate Enable 15-to-16-Mbyte (WAE15M) bit
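
A rough sketch of packing and unpacking the three WHCR fields follows. The bit positions (WAE15M at bit 0, WAELIM at bits 7-1, WCDE at bit 8) and the 4-Mbyte granularity of WAELIM are assumptions not stated on the slides, and the helper names are hypothetical; verify the layout against the AMD-K6 data sheet.

```python
# Assumed field layout (not given on the slides).
WAE15M_BIT = 0        # Write Allocate Enable 15-to-16-Mbyte
WAELIM_SHIFT = 1      # Write Allocate Enable Limit occupies bits 7-1
WAELIM_MASK = 0x7F
WCDE_BIT = 8          # WCDE bit
WAELIM_UNIT_MB = 4    # assumed: limit expressed in 4-Mbyte units


def encode_whcr(waelim_mb, wae15m=False, wcde=False):
    """Build a WHCR value from a limit in Mbytes and the two single-bit fields."""
    units = waelim_mb // WAELIM_UNIT_MB
    if not 0 <= units <= WAELIM_MASK:
        raise ValueError("limit does not fit in the WAELIM field")
    return (int(wcde) << WCDE_BIT) | (units << WAELIM_SHIFT) | (int(wae15m) << WAE15M_BIT)


def decode_whcr(value):
    """Split a WHCR value back into its three fields."""
    return {
        "WCDE": bool((value >> WCDE_BIT) & 1),
        "WAELIM_MB": ((value >> WAELIM_SHIFT) & WAELIM_MASK) * WAELIM_UNIT_MB,
        "WAE15M": bool((value >> WAE15M_BIT) & 1),
    }


if __name__ == "__main__":
    value = encode_whcr(64, wae15m=True)
    print(hex(value), decode_whcr(value))
```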

CPU SPEED

• Very fast under Windows NT 4.0

• The 32-bit performance is excellent

• Runs Windows 95 faster than the Intel Pentium MMX

• Good choice for a great gaming machine

• Good engine for running Microsoft Office, surfing the net, and checking email

• If Windows NT is the primary operating system, the AMD K6 should be considered as a low-cost but well-performing alternative to a Pentium Pro or Pentium II

CPU Type: RISC86

RISC86 Superscalar Microarchitecture

• RISC86 microarchitecture: internally translates x86 instructions into RISC86 operations

  – x86 instructions: 1 to 15 bytes

  – RISC86 opcodes: simpler, fixed-length

• Superscalar operation: multiple decode, execution, and retirement

  – Centralized Schedule Buffer / Instruction Control Unit

    • Buffers and manages up to 24 RISC86 operations at one time

      – Equates to 12 x86 instructions

  – Multiple Decoders

    • Buffer can receive up to 4 RISC86 operations from decoders in 1 clock

  – 7 Parallel Execution Units

    • Buffer can issue up to 6 RISC86 operations to execution units in 1 clock
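
The three limits above (24 operations held, 4 accepted per clock, 6 issued per clock) can be explored with a toy occupancy model. This is only a sketch: the workload lists are hypothetical, and the choice to issue before accepting new operations within a clock is a modeling assumption, not a description of the real scheduler.

```python
BUFFER_CAPACITY = 24       # RISC86 operations held at one time
MAX_DECODE_PER_CLOCK = 4   # operations accepted from the decoders per clock
MAX_ISSUE_PER_CLOCK = 6    # operations issued to execution units per clock


def simulate_buffer(decoded_per_clock, issued_per_clock):
    """Return the buffer occupancy after each clock.

    Both arguments are lists of requested operation counts per clock
    (a hypothetical workload); requests are clipped to the limits above.
    """
    occupancy = 0
    history = []
    for decoded, issued in zip(decoded_per_clock, issued_per_clock):
        # Modeling assumption: operations already buffered leave first.
        issue = min(issued, MAX_ISSUE_PER_CLOCK, occupancy)
        occupancy -= issue
        # Then the decoders fill whatever room is left, up to 4 per clock.
        accept = min(decoded, MAX_DECODE_PER_CLOCK, BUFFER_CAPACITY - occupancy)
        occupancy += accept
        history.append(occupancy)
    return history


if __name__ == "__main__":
    # With nothing issuing, the buffer fills in six clocks and then stalls the decoders.
    print(simulate_buffer([4] * 8, [0] * 8))
```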

x86 Instruction Categories (Short and Long Decodes)

• Short Decode

– Common x86 instructions up to 7 bytes in length

– Produce up to 2 RISC86 operations each

– 2 processed per clock

– Processed completely within the decoders

• Long Decode

– More complex and somewhat common x86 instructions up to 11 bytes in length

– Produce up to 4 RISC86 operations

– 1 processed per clock

– Processed completely within the decoders

x86 Instruction Categories (Vector Decode)

• Vector Decode

  – Complex x86 instructions requiring long sequences of RISC86 operations

  – 1 processed per clock

  – Decoders generate an initial set of 4 RISC86 operations

    • Decode is completed by fetching a sequence of additional operations from an on-chip ROM at a rate of 4 operations per clock
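
The decode rates on these two slides can be turned into a rough clock count. The sketch below assumes that two short decodes pair only when adjacent in program order and that ROM fetches for a vector decode start in the clock after the initial set of 4 operations; both are simplifications, and the function names are hypothetical.

```python
import math

INITIAL_VECTOR_OPS = 4   # operations generated by the decoders for a vector decode
ROM_OPS_PER_CLOCK = 4    # additional operations fetched from the on-chip ROM per clock


def vector_decode_clocks(total_ops):
    """Clocks to decode one vector instruction that expands to total_ops operations."""
    if total_ops < 1:
        raise ValueError("an instruction expands to at least one operation")
    rom_ops = max(0, total_ops - INITIAL_VECTOR_OPS)
    # One clock for the initial set, then the remainder at 4 operations per clock.
    return 1 + math.ceil(rom_ops / ROM_OPS_PER_CLOCK)


def decode_clocks(kinds):
    """Clocks to decode a sequence of 'short' and 'long' instructions in order."""
    clocks = 0
    i = 0
    while i < len(kinds):
        if kinds[i] == "short" and i + 1 < len(kinds) and kinds[i + 1] == "short":
            i += 2  # two short decodes share a clock
        elif kinds[i] in ("short", "long"):
            i += 1  # a lone short decode or a long decode takes a clock by itself
        else:
            raise ValueError(f"unknown decode kind: {kinds[i]!r}")
        clocks += 1
    return clocks


if __name__ == "__main__":
    print(decode_clocks(["short", "short", "long", "short"]))
    print(vector_decode_clocks(12))
```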

RISC86 Operations Categories

• Memory load operations (load)

• Memory store operations (store)

• Integer register operations (alu/alux)

• MMX register operations (meu)

• Floating-point register operations (float)

• Branch condition evaluations (branch)

x86 to RISC86 Translation Example

Instructions (x86)     Operations (RISC86)
MOV CX, [SP + 4]       Load
ADD AX, BX             Alu (Add)
CMP CX, [AX]           Load, Alu (Sub)
JZ foo                 Branch
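
As a small worked example, the translation table can be written out as data and used to compute the average number of RISC86 operations per x86 instruction for this fragment. The CMP row is read here as a Load followed by an Alu (Sub); the variable and function names are illustrative only.

```python
# The example translation, one entry per x86 instruction.
TRANSLATION = [
    ("MOV CX, [SP + 4]", ["Load"]),
    ("ADD AX, BX", ["Alu (Add)"]),
    ("CMP CX, [AX]", ["Load", "Alu (Sub)"]),
    ("JZ foo", ["Branch"]),
]


def ops_per_instruction(translation):
    """Average RISC86 operations produced per x86 instruction."""
    total_ops = sum(len(ops) for _, ops in translation)
    return total_ops / len(translation)


if __name__ == "__main__":
    # 5 operations for 4 instructions.
    print(ops_per_instruction(TRANSLATION))
```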

Instruction Set

• Categories

  – Arithmetic

  – Conversions

  – Logical Operations

  – Transfers and Memory Operations

• Compatibility

  – Uses full Intel Instruction Set

• Features

  – Three Separate Instruction Sets

    • Integer Instruction Set

    • Floating-Point Instruction Set

    • MMX Instruction Set

Technologies Used

• RISC86 Superscalar microarchitecture

– This enables leading-edge performance on both Microsoft Windows 95 and Windows NT operating systems, and the installed base of x86 software

• Socket 7-compatible Bus Interface

– This allows PC manufacturers and resellers to leverage today’s infrastructure to quickly bring superior price/performance PC systems to market

Detailed Comparison of the K-6 to the Intel Pentium Pro

Feature                               AMD K-6                                 Intel Pentium Pro
x86 Decoders                          2 sophisticated, 1 long, 1 vector       1 sophisticated, 2 simple
Average RISC ops/x86 (32-bit code)    1.2 (lower is better)                   1.5
Average RISC ops/x86 (16-bit code)    1.5 (lower is better)                   2.0
Maximum ROp Issue Rate                6                                       5
Physical Registers                    48                                      40
Centralized Buffer (max/active)       24/18                                   40/20
FPU Multiply/Add Latency              2/2                                     5/3
Pipeline Stages                       6                                       12
Misaligned Loads                      1-cycle penalty                         6-cycle penalty
Branch History Table                  8192 entries                            512 entries
Branch Prediction Accuracy            95%                                     85-90%
Misprediction Penalty
Instruction/Data TLB                  64/128 entries                          32/64 entries
L1 Instruction Cache                  32 KB + predecode, 2-way set-assoc.     8 KB, 2-way set-assoc.
L1 Data Cache                         32 KB, 2-way set-assoc.                 8 KB, 4-way set-assoc.
Local Bus Bandwidth                   528 MB/sec                              528 MB/sec
Local Bus Latency                     2 clocks                                5-7 clocks
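
The 528 MB/sec bandwidth figure is consistent with a 64-bit data bus clocked at 66 MHz. Both of those values are assumptions here, since the slides do not state the bus width or clock; the sketch simply shows the arithmetic.

```python
ASSUMED_BUS_MHZ = 66          # assumption: 66 MHz local bus clock
ASSUMED_BUS_WIDTH_BITS = 64   # assumption: 64-bit data bus


def peak_bandwidth_mb_per_s(bus_mhz, bus_width_bits):
    """Peak transfer rate in MB/sec: one bus-width transfer per bus clock."""
    bytes_per_transfer = bus_width_bits // 8
    return bus_mhz * bytes_per_transfer


if __name__ == "__main__":
    print(peak_bandwidth_mb_per_s(ASSUMED_BUS_MHZ, ASSUMED_BUS_WIDTH_BITS))
```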

Factors Affecting Performance

• Pipelining, prefetching, and predecoding

– Using a 32-byte instruction cache line, lines are prefetched and predecoded. This enables the decoders to efficiently decode multiple instructions simultaneously

• Multiple Decoders

– The decoders issue up to four operations at a time to the centralized schedule buffer, which buffers and manages up to 24 operations at a time

• Parallel Execution Units

– The Instruction Control Unit issues up to six operations to the execution units, where they are executed in parallel

Addressing Modes

Memory Map

Address Range (Decimal)   Address Range (Hex)   Size      Description
1024K-131072K             100000-8000000        130048K   Extended Memory
960K-1023K                F0000-FFFFF           64K       AMI System BIOS
952K-959K                 EE000-EFFFF           8K        FLASH Boot Block (available as HIMEM)
948K-951K                 ED000-EDFFF           4K        ESCD (Plug and Play configuration area)
944K-947K                 EC000-ECFFF           4K        OEM Logo Area (available as UMB)
896K-943K                 E0000-EBFFF           48K       BIOS Reserved (available as UMB)
640K-895K                 A0000-DFFFF           256K      Available High DOS Memory (open to the ISA & PCI bus)
639K                      9FC00-9FFFF           1K        Extended BIOS Data (moveable by QEMM, 386MAX)
512K-638K                 80000-9FBFF           127K      Extended conventional
0K-511K                   00000-7FFFF           512K      Conventional
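
The sizes in the map can be checked from the hex ranges, and the same data can answer which region holds a given address. This sketch covers only the first megabyte, with both ends of each range treated as inclusive; the Extended Memory row is left out because its upper bound is written differently. The descriptions are shortened and the helper names are hypothetical.

```python
# (start, end inclusive, description) for the regions below 1024K.
MEMORY_MAP = [
    (0x00000, 0x7FFFF, "Conventional"),
    (0x80000, 0x9FBFF, "Extended conventional"),
    (0x9FC00, 0x9FFFF, "Extended BIOS Data"),
    (0xA0000, 0xDFFFF, "Available High DOS Memory"),
    (0xE0000, 0xEBFFF, "BIOS Reserved"),
    (0xEC000, 0xECFFF, "OEM Logo Area"),
    (0xED000, 0xEDFFF, "Plug and Play configuration area"),
    (0xEE000, 0xEFFFF, "FLASH Boot Block"),
    (0xF0000, 0xFFFFF, "AMI System BIOS"),
]


def size_kb(start, end):
    """Size of an inclusive address range in Kbytes."""
    return (end - start + 1) // 1024


def find_region(address):
    """Return the description of the region containing address, or None."""
    for start, end, description in MEMORY_MAP:
        if start <= address <= end:
            return description
    return None


if __name__ == "__main__":
    for start, end, description in MEMORY_MAP:
        print(f"{start:05X}-{end:05X} {size_kb(start, end):4d}K  {description}")
    print(find_region(0xF1234))
```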

Addressing Modes

• Direct Addressing:

- address operand byte points directly to the target data

- only for internal RAM

• Indirect Addressing:

- two address operand bytes pointing to another pair of address bytes, containing the address of the operand

- for internal and external RAM

• Immediate Constants:

- value of a constant can follow the operation code in the program memory

• Indexed Addressing:

- only program memory can be accessed

- only read operations are possible

- addressing mode reads lookup tables in the program memory

- base register points to the base of the table entry and the accumulator is set up with the table entry number

• Register Instructions:

- for register banks containing registers from R0 to R7

- 3-bit register specification in the operation code of the instruction

- no address byte

• Register-Specific Instructions:

- some instructions are specific to certain registers

- no address byte necessary

- operation code does the pointing itself
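
The direct, indirect, immediate, and indexed modes described on these slides can be mimicked with a toy memory model. This is a sketch only: memory is a plain dictionary or list, the low-byte-first order of the address pair in indirect addressing is an assumption, and none of the function names come from the slides.

```python
def direct(ram, address):
    """Direct: the address operand points straight at the target data."""
    return ram[address]


def indirect(ram, pointer_address):
    """Indirect: the operand points at a pair of bytes holding the data's address."""
    low = ram[pointer_address]        # assumed byte order: low byte first
    high = ram[pointer_address + 1]
    return ram[(high << 8) | low]


def immediate(program, opcode_address):
    """Immediate: the constant follows the operation code in program memory."""
    return program[opcode_address + 1]


def indexed(program, base, accumulator):
    """Indexed: read-only table lookup at base plus the entry number."""
    return program[base + accumulator]


if __name__ == "__main__":
    ram = {0x0030: 0x7A, 0x0040: 0x30, 0x0041: 0x00}
    # Hypothetical program memory: an opcode, its constant, then a table of squares.
    program = [0x74, 0x2A, 0, 1, 4, 9, 16]
    print(direct(ram, 0x0030), indirect(ram, 0x0040))
    print(immediate(program, 0), indexed(program, base=2, accumulator=3))
```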

Perspective On Role In The Market Place

• Microprocessor Market:

- short product life cycles

- migration to higher performance microprocessors

- dominant position of Intel Corporation

    setting standards

    affecting margins and profitability of competitors

    restricting innovation and differentiation of product offerings

- successful competition possible if:

    new process technologies

    higher performance microprocessors

    greater volumes

    significant capital expenditures

AMD-K6 Processor Family Roadmap

Perspective On Role In The Market Place

- K6 - key element of further developments

- aggressive technology transition schedule

- possible risks and uncertainties:

    successful fabrication of higher performance AMD-K6?

    Intel’s new product introduction, marketing strategies and pricing?

    continued development of worldwide market acceptance?

    availability of financial and other resources?

    possible adverse market conditions in the PC market?

    unexpected interruptions of production?