3DNow!™ Technology Manual


Trademarks

AMD, the AMD logo, K6, 3DNow!, AMD Athlon, and combinations thereof, and K86 are trademarks, and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc.

MMX is a trademark of Intel Corporation.

Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

© 2000 Advanced Micro Devices, Inc. All rights reserved.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.


21928G/0 March 2000 3DNow! Technology Manual

Contents

Revision History

1 3DNow! Technology
    Introduction
    Key Functionality
    Feature Detection
    Register Set
    Data Types
    3DNow! Instruction Formats
    Definitions
        Execution Resources on AMD-K6 Processors
        Task Switching
        Exceptions
        Prefixes

2 3DNow! Instruction Set
    FEMMS
    PAVGUSB
    PF2ID
    PFACC
    PFADD
    PFCMPEQ
    PFCMPGE
    PFCMPGT
    PFMAX
    PFMIN
    PFMUL
    PFRCP
    PFRCPIT1
    PFRCPIT2
    PFRSQIT1
    PFRSQRT
    PFSUB
    PFSUBR
    PI2FD
    PMULHRW
    PREFETCH/PREFETCHW

3 Division and Square Root
    Division
        Divide Examples
    Square Root
        Square Root Examples


List of Figures

Figure 1. 3DNow!/MMX Registers
Figure 2. 3DNow! Data Type
Figure 3. Single-Precision, Floating-Point Data Format
Figure 4. Integer Data Types
Figure 5. Register X Unit and Register Y Unit Resources


List of Tables

Table 1. 3DNow! Technology Exponent Ranges
Table 2. 3DNow! Floating-Point Instructions
Table 3. 3DNow! Performance-Enhancement Instructions
Table 4. 3DNow! and MMX Instruction Exceptions
Table 5. Numerical Range for the PF2ID Instruction
Table 6. Numerical Range for the PFACC Instruction
Table 7. Numerical Range for the PFADD Instruction
Table 8. Numerical Range for the PFCMPEQ Instruction
Table 9. Numerical Range for the PFCMPGE Instruction
Table 10. Numerical Range for the PFCMPGT Instruction
Table 11. Numerical Range for the PFMAX Instruction
Table 12. Numerical Range for the PFMIN Instruction
Table 13. Numerical Range for the PFMUL Instruction
Table 14. Numerical Range for the PFRCP Instruction
Table 15. Numerical Range for the PFRCPIT1 Instruction
Table 16. Numerical Range for the PFRCPIT2 Instruction
Table 17. Numerical Range for the PFRSQIT1 Instruction
Table 18. Numerical Range for the PFRSQRT Instruction
Table 19. Numerical Range for the PFSUB Instruction
Table 20. Numerical Range for the PFSUBR Instruction
Table 21. Summary of PREFETCH Instruction Type Options


Revision History

Date        Rev   Description
Feb 1998    A     Initial Release
Feb 1998    B     Clarified CPUID usage in Feature Detection on page 3.
May 1998    C     Revised description of 3DNow! instructions in Definitions on page 9.
May 1998    C     Revised function descriptions in Table 2, 3DNow! Floating-Point Instructions, on page 14.
Sept 1998   D     Revised code example for the PFRSQRT instruction on page 48.
Sept 1998   D     Changed exceptions generated for the PREFETCH/PREFETCHW instructions to none, deleted exception table, and revised PREFETCHW description on page 56.
Sept 1998   D     Added PUNPCKLDQ instruction to the division example (24-bit precision) on page 60.
Nov 1998    E     Added sample code that tests for the presence of extended function 8000_0001h on page 3.
Nov 1998    E     Clarified instruction descriptions of PFRCPIT1 on page 41, PFRCPIT2 on page 43, and PFRSQIT1 on page 45.
Nov 1998    E     Added PUNPCKLDQ instruction and clarified comments to the square root examples on page 62.
Aug 1999    F     Changed X variable to Z in Newton-Raphson recurrence definitions, and swapped order of PFMUL and PUNPCKLDQ instructions in square root example (24-bit precision) in Chapter 3 on page 59.
Aug 1999    F     Added references to the AMD Athlon processor throughout the manual.
Mar 2000    G     Updated and clarified the PFACC instruction operation description on page 23.



1 3DNow! Technology

Introduction

3DNow! Technology is a significant innovation to the x86 architecture that drives today's personal computers. 3DNow! technology is a group of new instructions that opens the traditional processing bottlenecks for floating-point-intensive and multimedia applications. With 3DNow! technology, hardware and software applications can implement more powerful solutions to create a more entertaining and productive PC platform. Examples of the type of improvements that 3DNow! technology enables are faster frame rates on high-resolution scenes, much better physical modeling of real-world environments, sharper and more detailed 3D imaging, smoother video playback, and near theater-quality audio.

AMD has taken a leadership role in developing these new instructions that enable exciting new levels of performance and realism. 3DNow! technology was defined and implemented in collaboration with independent software developers, including operating system designers, application developers, and graphics vendors. It is compatible with today's existing x86 software and requires no operating system support, thereby enabling 3DNow! applications to work with all existing operating systems. 3DNow! technology is implemented on the AMD-K6-2, AMD-K6-III, and AMD Athlon processors. The AMD Athlon processor implements five new 3DNow! technology instructions that add streaming and digital signal processing (DSP) technologies. For more information, see the AMD Extensions to the 3DNow! and MMX Instruction Sets Manual, order# 22466.

Key Functionality

The 3DNow! technology instructions are intended to open a major processing bottleneck in a 3D graphics application: floating-point operations. Today's 3D applications are facing limitations due to the fact that only one floating-point execution unit exists in the most advanced x86 processors. The front end of a typical 3D graphics software pipeline performs object physics, geometry transformations, clipping, and lighting calculations. These computations are very floating-point intensive and often limit the features and functionality of a 3D application. The source of performance for the 3DNow! instructions originates from the single instruction multiple data (SIMD) implementation. With SIMD, each instruction not only operates on two single-precision, floating-point operands, but the microarchitecture within the processor can execute up to two 3DNow! instructions per clock through two register execution pipelines, which allows for a total of four floating-point operations per clock. In addition, because the 3DNow! instructions use the same floating-point registers as the MMX technology instructions, task switching between MMX and 3DNow! operations is eliminated.

The 3DNow! technology instruction set contains 21 instructions that support SIMD floating-point operations and includes SIMD integer operations, data prefetching, and faster MMX-to-floating-point switching. To improve MPEG decoding, the 3DNow! instructions include a specific SIMD integer instruction created to facilitate pixel-motion compensation. Because media-based software typically operates on large data sets, the processor often needs to wait for this data to be transferred from main memory. The extra time involved with retrieving this data can be avoided by using the new 3DNow! instruction called PREFETCH. This instruction can ensure that data is in the level 1 cache when it is needed. To improve the time it takes to switch between MMX and x87 code, the 3DNow! instructions include the FEMMS (fast entry/exit multimedia state) instruction, which eliminates much of the overhead involved with the switch. The addition of 3DNow! technology expands the capabilities of the AMD family of processors and enables a new generation of enriched user applications.

Feature Detection

To properly identify and use the 3DNow! instructions, the application program must determine if the processor supports them. The CPUID instruction gives programmers the ability to determine the presence of 3DNow! technology on a processor. Software applications must first test to see if the CPUID instruction is supported. For a detailed description of the CPUID instruction, see the AMD Processor Recognition Application Note, order# 20734.

The presence of the CPUID instruction is indicated by the ID bit (21) in the EFLAGS register. If this bit is writable, the CPUID instruction is supported. The following code sample shows how to test for the presence of the CPUID instruction.

    pushfd                  ; save EFLAGS
    pop     eax             ; store EFLAGS in EAX
    mov     ebx, eax        ; save in EBX for later testing
    xor     eax, 00200000h  ; toggle bit 21
    push    eax             ; put to stack
    popfd                   ; save changed EAX to EFLAGS
    pushfd                  ; push EFLAGS to TOS
    pop     eax             ; store EFLAGS in EAX
    cmp     eax, ebx        ; see if bit 21 has changed
    jz      NO_CPUID        ; if no change, no CPUID

Once the software has identified the processor's support for CPUID, it must test for extended functions by executing extended function 8000_0000h (EAX=8000_0000h). The EAX register returns the largest extended function input value defined for the CPUID instruction on the processor. If the value is greater than 8000_0000h, extended functions are supported. The following code sample shows how to test for the presence of extended function 8000_0001h.

    mov     eax, 80000000h  ; query for extended functions
    CPUID                   ; get extended function limit
    cmp     eax, 80000000h  ; is 8000_0001h supported?
    jbe     NO_EXTENDEDMSR  ; if not, 3DNow! tech. not supported




The next step is for the programmer to determine if the 3DNow! instructions are supported. Extended function 8000_0001h of the CPUID instruction provides this information by returning the extended feature bits in the EDX register. If bit 31 in the EDX register is set to 1, 3DNow! instructions are supported. The following code sample shows how to test for 3DNow! instruction support.

    mov     eax, 80000001h  ; setup ext. function 8000_0001h
    CPUID                   ; call the function
    test    edx, 80000000h  ; test bit 31
    jnz     YES_3DNOW       ; 3DNow! technology supported

If all three tests succeed, the processor supports the CPUID instruction, the extended functions, and the 3DNow! instructions.

Concatenating the code examples above will produce the basis for a CPU detection software routine. A more comprehensive code example is available on the AMD website at http://www.amd.com/products/cpg/bin/.
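The three checks can also be read as a small decision procedure over the register values the assembly samples inspect. The following Python sketch models only that decision logic; it does not execute CPUID. The function names and the sample register values are hypothetical, chosen for illustration.

```python
# Minimal model of the feature-detection decisions described above.
# It works on register values supplied by the caller; nothing here
# actually executes PUSHFD/POPFD or CPUID.

ID_FLAG_BIT = 21              # EFLAGS ID bit
EXT_FUNC_BASE = 0x80000000    # extended function 8000_0000h
EDX_3DNOW_BIT = 31            # 3DNow! bit in EDX from function 8000_0001h


def cpuid_supported(eflags_before, eflags_after_toggle):
    """CPUID is present if EFLAGS bit 21 could be changed (is writable)."""
    changed = eflags_before ^ eflags_after_toggle
    return ((changed >> ID_FLAG_BIT) & 1) == 1


def has_3dnow(max_ext_function, ext_feature_edx):
    """Apply the two extended-function tests in order.

    max_ext_function: EAX returned by extended function 8000_0000h.
    ext_feature_edx:  EDX returned by extended function 8000_0001h.
    """
    # Mirrors "cmp eax, 80000000h / jbe": the limit must exceed 8000_0000h.
    if max_ext_function <= EXT_FUNC_BASE:
        return False
    # Mirrors "test edx, 80000000h / jnz": bit 31 signals 3DNow! support.
    return bool(ext_feature_edx & (1 << EDX_3DNOW_BIT))
```

A caller would obtain the real register values from a native routine such as the assembly samples above and pass them in, which keeps the decision logic easy to check in isolation.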

Register Set

The complete multimedia units in the processor combine the existing MMX instructions with the new 3DNow! instructions. In addition, by merging 3DNow! with MMX, it becomes possible to write x86 programs containing both integer, MMX, and floating-point graphics instructions with no performance penalty for switching between the multimedia (integer) and 3DNow! (floating-point) units.

The processor implements eight 64-bit 3DNow!/MMX registers. These registers are mapped onto the floating-point registers. As shown in Figure 1, the 3DNow! and MMX instructions refer to these registers as mm0 to mm7. Mapping the new 3DNow!/MMX registers onto the floating-point register stack enables backwards compatibility for the register saving that must occur as a result of task switching.




Figure 1. 3DNow!/MMX Registers

Aliasing the 3DNow!/MMX registers onto the floating-point register stack provides a safe method to introduce 3DNow! and MMX technology, because it does not require modifications to existing operating systems. Instead of requiring operating system modifications, new 3DNow! and MMX technology applications are supported through device drivers, 3DNow! and MMX libraries, or Dynamic Link Library (DLL) files.

Current operating systems have support for floating-point operations and the floating-point register state. Using the floating-point registers for 3DNow! and MMX code is a convenient way of implementing non-intrusive support for 3DNow! and MMX instructions. Every time the processor executes a 3DNow! or MMX instruction, all the floating-point register tag bits are set to zero (00b=valid), except for the FEMMS and EMMS instructions, which set all tag bits to one (11b=empty).

Note: Executing the PREFETCH instruction does not change the tag bits.
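The tag-bit rule above can be summarized as a small state update. The Python sketch below is an illustrative model only: the function name and the list-of-tags representation are hypothetical, and treating PREFETCHW the same way as PREFETCH is an assumption of this sketch rather than something stated in the note.

```python
# Illustrative model of the floating-point tag bits (2 bits per register).

NUM_REGS = 8
TAG_VALID = 0b00   # 00b = valid
TAG_EMPTY = 0b11   # 11b = empty


def tags_after(instruction, tags):
    """Return the eight tag values after executing one 3DNow!/MMX instruction.

    instruction: mnemonic of a 3DNow! or MMX instruction (case-insensitive).
    tags:        list of eight 2-bit tag values before execution.
    """
    name = instruction.upper()
    if name in ("PREFETCH", "PREFETCHW"):   # PREFETCHW: assumption, see lead-in
        return list(tags)                   # tag bits are left unchanged
    if name in ("FEMMS", "EMMS"):
        return [TAG_EMPTY] * NUM_REGS       # all tags set to empty
    return [TAG_VALID] * NUM_REGS           # any other instruction: all valid
```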

[Figure 1: the eight 64-bit 3DNow!/MMX registers mm0 through mm7 (bits 63-0), each with an associated 2-bit tag field]



Data Types

3DNow! technology uses a packed data format. The data is packed in a single, 64-bit 3DNow!/MMX register or a quadword memory operand.

Figure 2 shows the 3DNow! floating-point data type. D0 and D1 each hold an IEEE 32-bit single-precision, floating-point doubleword.

[Figure 2: a 64-bit operand (32 bits x 2) holding two packed, single-precision, floating-point doublewords, D1 in bits 63-32 and D0 in bits 31-0]

Figure 2. 3DNow! Data Type

Figure 3 on page 6 shows the IEEE 32-bit, single-precision, floating-point format.

[Figure 3: 32-bit, single-precision, floating-point doubleword with the sign S in bit 31, the biased exponent in bits 30-23, and the significand in bits 22-0]

Figure 3. Single-Precision, Floating-Point Data Format

Value definitions

1. $X = (-1)^{S} \times 0$, for Biased Exponent $= 0$
2. $X = (-1)^{S} \times 2^{(\text{Biased Exponent} - 127)} \times \text{Significand}$, for Biased Exponent $\neq 0$
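As a worked illustration of the value definitions, the following Python sketch splits a 32-bit pattern into its three fields and evaluates it. The function names are hypothetical. The sketch assumes the significand is the 23 stored bits plus the hidden integer bit described in Definitions, and it rejects the biased exponent FFh because Table 1 lists it as unsupported.

```python
# Decode a single-precision bit pattern according to the value definitions.

def decode_single(bits):
    """Split a 32-bit pattern into (sign, biased exponent, stored significand)."""
    sign = (bits >> 31) & 0x1
    biased_exponent = (bits >> 23) & 0xFF
    stored_significand = bits & 0x7FFFFF
    return sign, biased_exponent, stored_significand


def value_of(bits):
    """Evaluate the pattern: zero when the biased exponent is 0, normal otherwise."""
    sign, exponent, fraction = decode_single(bits)
    if exponent == 0x00:
        return -0.0 if sign else 0.0          # value definition 1
    if exponent == 0xFF:
        raise ValueError("biased exponent FFh is unsupported")
    significand = 1.0 + fraction / 2**23      # hidden integer bit, range [1, 2)
    return (-1.0) ** sign * 2.0 ** (exponent - 127) * significand
```

For example, `value_of(0x3F800000)` evaluates the pattern with sign 0, biased exponent 127, and a zero stored significand.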


Figure 4 shows the formats for the integer data types.

[Figure 4: the integer data types in a 64-bit operand: packed bytes B7-B0 (8 bits x 8), packed words W3-W0 (16 bits x 4), packed doublewords D1-D0 (32 bits x 2), and quadword Q0 (64 bits x 1)]

Figure 4. Integer Data Types


3DNow! Instruction Formats

The format of 3DNow! instruction encodings is based on the conventional x86 modR/M instruction format and is similar to the format used by MMX instructions. The assembly language syntax used for the 3DNow! instructions is as follows:

    3DNow! Mnemonic mmreg1, mmreg2/mem64

The destination and source1 operand (mmreg1) must be an MMX register (mm0-mm7). The source2 operand (mmreg2/mem64) can be either an MMX register or a 64-bit memory value.

The encoding uses the opcode prefix 0Fh followed by a second opcode byte of 0Fh. To differentiate the various 3DNow! instructions, a third instruction suffix byte is used. This suffix byte occupies the same position at the end of a 3DNow! instruction as would an imm8 byte. The opcode format is as follows:

    0Fh 0Fh modR/M [sib] [displacement] 3DNow!_suffix

The specific operands (mmreg1 and mmreg2/mem64) determine the values used in modR/M [sib] [displacement], and follow conventional x86 encodings. The 3DNow! suffix is determined by the actual 3DNow! instruction. The 3DNow! suffixes are defined in Table 2 on page 14.

As an example, the 3DNow! PFMUL instruction can produce the following opcodes, depending on its use:

    Opcode               Instruction
    0F 0F CA B4          PFMUL mm1, mm2
    0F 0F 0B B4          PFMUL mm1, [ebx]
    0F 0F 4B 0A B4       PFMUL mm1, [ebx+10]
    26 0F 0F 0B B4       PFMUL mm1, es:[ebx]
    0F 0F 4C 83 0A B4    PFMUL mm1, [ebx+eax*4+10]

The encoding of the two performance-enhancement instructions (FEMMS and PREFETCH) uses a single opcode prefix 0Fh. The details of the opcodes for these instructions are shown on pages 18 and 56 respectively.
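The byte sequences in the PFMUL example follow mechanically from the opcode format. The Python sketch below rebuilds them from their fields. It is a partial encoder written for illustration, not an assembler: the function names are hypothetical, only 8-bit displacements are handled, segment-override prefixes are omitted, and EBP/ESP are left out of the register table because they have special meanings in modR/M and SIB encodings.

```python
# Sketch of the 0Fh 0Fh modR/M [sib] [displacement] suffix layout.

ESCAPE = bytes([0x0F, 0x0F])
SUFFIX = {"PFMUL": 0xB4, "PFADD": 0x9E, "PFSUB": 0x9A}   # subset of Table 2
GPR = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esi": 6, "edi": 7}
SCALE_BITS = {1: 0, 2: 1, 4: 2, 8: 3}


def modrm(mod, reg, rm):
    """Pack the three modR/M fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def encode_reg_reg(mnemonic, dst_mm, src_mm):
    """Encode 'mnemonic mmDST, mmSRC' (register-direct, mod = 11b)."""
    return ESCAPE + bytes([modrm(0b11, dst_mm, src_mm), SUFFIX[mnemonic]])


def encode_reg_mem(mnemonic, dst_mm, base, disp8=None, index=None, scale=1):
    """Encode 'mnemonic mmDST, [base + index*scale + disp8]'."""
    mod = 0b00 if disp8 is None else 0b01      # 01b selects an 8-bit displacement
    body = []
    if index is None:
        body.append(modrm(mod, dst_mm, GPR[base]))
    else:
        body.append(modrm(mod, dst_mm, 0b100))  # rm = 100b means a SIB byte follows
        body.append((SCALE_BITS[scale] << 6) | (GPR[index] << 3) | GPR[base])
    if disp8 is not None:
        body.append(disp8 & 0xFF)
    # The 3DNow! suffix goes last, where an imm8 byte would sit.
    return ESCAPE + bytes(body) + bytes([SUFFIX[mnemonic]])
```

Calling `encode_reg_mem("PFMUL", 1, "ebx", disp8=10, index="eax", scale=4)` reproduces the last row of the example table that this sketch covers.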


Definitions

3DNow! technology provides 21 additional instructions to support high-performance, 3D graphics and audio processing. 3DNow! instructions are vector instructions that operate on 64-bit registers. 3DNow! instructions are SIMD: each instruction operates on pairs of 32-bit values.

The definitions for the 3DNow! instructions starting on page 17 contain designations classifying each instruction as vectored or scalar. Vector instructions operate in parallel on two sets of 32-bit, single-precision, floating-point words. Instructions that are labeled as scalar instructions operate on a single set of 32-bit operands (from the low halves of the two 64-bit operands).
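The difference between vectored and scalar operation can be sketched in a few lines of Python. This is a behavioral illustration under stated assumptions, not a definition of any instruction: the helper names are hypothetical, a 64-bit operand is represented as a (low, high) pair, results are rounded to single precision with round-to-nearest, and how a scalar result is placed in the destination register is instruction-specific and not modeled.

```python
import struct


def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def vector_add(dst, src):
    """Vectored model (PFADD-style): both 32-bit halves are added in parallel.

    dst and src are (low, high) pairs of single-precision values.
    """
    return (f32(dst[0] + src[0]), f32(dst[1] + src[1]))


def scalar_low(op, dst, src):
    """Scalar model: apply op to the low halves only and return one result."""
    return f32(op(dst[0], src[0]))
```

For instance, `vector_add((1.5, 2.0), (0.25, -2.0))` adds the low halves and the high halves independently, while `scalar_low` ignores the high halves entirely.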

The 3DNow! single-precision, floating-point format is compatible with the IEEE-754, single-precision format. This format comprises a 1-bit sign, an 8-bit biased exponent, and a 23-bit significand with one hidden integer bit for a total of 24 bits in the significand. The bias of the exponent is 127, consistent with the IEEE single-precision standard. The significands are normalized to be within the range of [1,2).

In contrast to the IEEE standard that dictates four rounding modes, 3DNow! technology supports one rounding mode: either round-to-nearest or round-to-zero (truncation). The hardware implementation of 3DNow! technology determines the rounding mode. The AMD processors implement round-to-nearest mode. Regardless of the rounding mode used, the floating-point-to-integer and integer-to-floating-point conversion instructions, PF2ID and PI2FD, always use the round-to-zero (truncation) mode.

The largest, representable, normal number in magnitude for this precision in hexadecimal has an exponent of FEh and a significand of 7FFFFFh, with a numerical value of $2^{127}(2 - 2^{-23})$. All results that overflow above the maximum-representable positive value are saturated to either this maximum-representable normal number or to positive infinity. Similarly, all results that overflow below the minimum-representable negative value are saturated to either this minimum-representable normal number or to negative infinity.

The implementation of 3DNow! technology determines how arithmetic overflow is handled: either properly signed maximum- or minimum-representable normal numbers or properly signed infinities. The processor generates properly signed maximum- or minimum-representable normal numbers. Infinities and NaNs are not supported as operands to 3DNow! instructions.

The smallest representable normal number in magnitude for this precision in hexadecimal has an exponent of 01h and a significand of 000000h, with a numerical value of $2^{-126}$. Accordingly, all results below this minimum representable value in magnitude are held to zero. Table 1 shows the exponent ranges supported by the 3DNow! technology.

Like MMX instructions, 3DNow! instructions do not generate numeric exceptions nor do they set any status flags. It is the user's responsibility to ensure that in-range data is provided to 3DNow! instructions and that all computations remain within valid ranges (or are held as expected).
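The two limits quoted above can be checked directly from their bit patterns, and the overflow and underflow behavior can be modeled as a clamp. In this Python sketch the function names are hypothetical. The clamp models the behavior attributed to the processor above (saturation to the signed largest normal number, small results held to zero); the sign of a zero result is not modeled.

```python
import math
import struct


def from_fields(sign, biased_exponent, significand):
    """Assemble a single-precision value from its three fields."""
    bits = (sign << 31) | (biased_exponent << 23) | significand
    return struct.unpack("<f", struct.pack("<I", bits))[0]


LARGEST_NORMAL = from_fields(0, 0xFE, 0x7FFFFF)    # exponent FEh, significand 7FFFFFh
SMALLEST_NORMAL = from_fields(0, 0x01, 0x000000)   # exponent 01h, significand 000000h


def clamp_result(x):
    """Model how an out-of-range result is held within the valid range."""
    magnitude = abs(x)
    if magnitude > LARGEST_NORMAL:
        return math.copysign(LARGEST_NORMAL, x)    # saturate, keeping the sign
    if magnitude < SMALLEST_NORMAL:
        return 0.0                                 # held to zero
    return x
```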

Table 1. 3DNow! Technology Exponent Ranges

Biased Exponent   Description
FFh               Unsupported *
00h               Zero
00h




Execution Resources on AMD-K6 Processors

The register operations of all 3DNow! floating-point instructions are executed by either the register X unit or the register Y unit. One operation can be issued to each register unit each clock cycle, for a maximum issue and execution rate of two 3DNow! operations per cycle. All 3DNow! operations have an execution latency of two clock cycles and are fully pipelined.

Even though 3DNow! execution resources are not duplicated in both register units (for example, there are not two pairs of 3DNow! multipliers, just one shared pair of multipliers), there are no instruction-decode or operation-issue pairing restrictions. When, for example, a 3DNow! multiply operation starts execution in a register unit, that unit grabs and uses the one shared pair of 3DNow! multipliers. Only when actual contention occurs between two 3DNow! operations starting execution at the same time is one of the operations held up for one cycle in its first execution pipe stage while the other proceeds. The delay is never more than one cycle.

For code optimization purposes, 3DNow! operations are grouped into two categories. These categories are based on execution resources and are important when creating properly scheduled code. As long as two 3DNow! operations that start execution simultaneously do not fall into the same category, both operations will start execution without delay.

The first category of instructions contains the operations for the following 3DNow! instructions: PFADD, PFSUB, PFSUBR, PFACC, PFCMPx, PFMIN, PFMAX, PI2FD, PF2ID, PFRCP, and PFRSQRT.

The second category contains the operations for the following 3DNow! instructions: PFMUL, PFRCPIT1, PFRSQIT1, and PFRCPIT2.

Note: 3DNow! add and multiply operations, among other combinations, can execute simultaneously.
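The scheduling rule above reduces to a category comparison. The Python sketch below encodes the two categories and reports the delay when two operations try to start in the same cycle. The function names are hypothetical, PFCMPx is expanded to the three compare instructions listed in Table 2, and the model ignores data dependencies and the two-cycle execution latency.

```python
# Contention model for two 3DNow! operations starting in the same cycle.

CATEGORY_1 = {
    "PFADD", "PFSUB", "PFSUBR", "PFACC",
    "PFCMPEQ", "PFCMPGE", "PFCMPGT",
    "PFMIN", "PFMAX", "PI2FD", "PF2ID", "PFRCP", "PFRSQRT",
}
CATEGORY_2 = {"PFMUL", "PFRCPIT1", "PFRSQIT1", "PFRCPIT2"}


def category(mnemonic):
    """Return 1 or 2 for a 3DNow! floating-point instruction."""
    name = mnemonic.upper()
    if name in CATEGORY_1:
        return 1
    if name in CATEGORY_2:
        return 2
    raise ValueError("not a categorized 3DNow! operation: " + mnemonic)


def start_delay(first, second):
    """Cycles one operation is held up when both start in the same cycle.

    Same category means contention for a shared resource (1 cycle);
    different categories start without delay (0 cycles).
    """
    return 1 if category(first) == category(second) else 0
```

A scheduler built on this rule would interleave category 1 and category 2 operations, for example pairing a PFADD with a PFMUL rather than two PFMULs.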

Normally, in high-performance 3DNow! code, all of the 3DNow! instructions are properly scheduled apart from each other so as to avoid delays due to execution resource contentions (as well as taking into account dependencies and execution latencies).

For further information regarding code optimization, see the AMD-K6 Processor Code Optimization Application Note, order# 21924. This document provides in-depth discussions of code optimization techniques for the processor.

For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order# 22007.

The SIMD 3DNow! instructions for all processors are summarized in Table 2 on page 14. The dedicated and shared execution resources of the register X unit and register Y unit are shown in Figure 5 on page 13. The execution resources for some MMX operations, as well as all 3DNow! operations, are shared between the two register units. For contention-checking purposes, each box represents a category of operations that cannot start execution simultaneously. In addition, the MMX and 3DNow! multiplies use the same hardware, while MMX and 3DNow! adds and subtracts do not.

The 3DNow! performance-enhancement instructions for all AMD processors are summarized in Table 3 on page 14. The FEMMS instruction does not use any specific execution resource or pipeline. The PREFETCH instruction is operated on in the Load unit.




[Figure 5: block diagram of the Register X and Register Y execution pipelines, showing dedicated Register X resources, dedicated Register Y resources, and shared Register X and Y resources. The boxes include the integer ALUs, integer shift, integer multiply and divide, integer byte operations, integer special registers, integer segment register loads, MMX ALU (add/subtract, compare), MMX ALU (logical, pack, unpack), MMX shifter, 3DNow! add/subtract, compare, integer conversion, reciprocal and reciprocal square root table lookup, and MMX and 3DNow! multiply, reciprocal and reciprocal square root iteration.]

Figure 5. Register X Unit and Register Y Unit Resources


Table 2. 3DNow! Floating-Point Instructions

Operation   Function                                                                             Opcode Suffix
PAVGUSB     Packed 8-bit Unsigned Integer Averaging                                              BFh
PFADD       Packed Floating-Point Addition                                                       9Eh
PFSUB       Packed Floating-Point Subtraction                                                    9Ah
PFSUBR      Packed Floating-Point Reverse Subtraction                                            AAh
PFACC       Packed Floating-Point Accumulate                                                     AEh
PFCMPGE     Packed Floating-Point Comparison, Greater or Equal                                   90h
PFCMPGT     Packed Floating-Point Comparison, Greater                                            A0h
PFCMPEQ     Packed Floating-Point Comparison, Equal                                              B0h
PFMIN       Packed Floating-Point Minimum                                                        94h
PFMAX       Packed Floating-Point Maximum                                                        A4h
PI2FD       Packed 32-bit Integer to Floating-Point Conversion                                   0Dh
PF2ID       Packed Floating-Point to 32-bit Integer                                              1Dh
PFRCP       Packed Floating-Point Reciprocal Approximation                                       96h
PFRSQRT     Packed Floating-Point Reciprocal Square Root Approximation                           97h
PFMUL       Packed Floating-Point Multiplication                                                 B4h
PFRCPIT1    Packed Floating-Point Reciprocal First Iteration Step                                A6h
PFRSQIT1    Packed Floating-Point Reciprocal Square Root First Iteration Step                    A7h
PFRCPIT2    Packed Floating-Point Reciprocal/Reciprocal Square Root Second Iteration Step        B6h
PMULHRW     Packed 16-bit Integer Multiply with Rounding                                         B7h

Table 3. 3DNow! Performance-Enhancement Instructions

Operation               Function                                                       Opcode Second Byte
FEMMS                   Faster entry/exit of the MMX or floating-point state           0Eh
PREFETCH/PREFETCHW *    Prefetch at least a 32-byte line into L1 data cache (Dcache)   0Dh

Note:

* The AMD-K6-2 and AMD-K6-III processors execute the PREFETCHW instruction identically to the PREFETCH instruction. On the AMD Athlon processor, PREFETCHW can increase performance by providing a hint to the processor of an intent to modify the cache line.





Task Switching

With respect to task switching, treat the 3DNow! instructions
exactly the same as MMX instructions. Operating system design
must be taken into account when writing a 3DNow! program.
The programmer must know whether the operating system
automatically saves the current states when task switching, or if
the 3DNow! program has to provide the code to save states.

If a task switch occurs, the Control Register (CR0) Task Switch
(TS) bit is set to 1. The processor then generates an interrupt 7
(int 7, Device Not Available) when it encounters the next
floating-point, 3DNow!, or MMX instruction, allowing the
operating system to save the state of the 3DNow!/MMX/FP
registers.

In a multitasking operating system, if there is a task switch
when 3DNow!/MMX applications are running with older
applications that do not include MMX instructions, the
MMX/FP register state is still saved automatically through the
int 7 handler.
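The deferred state save described above can be sketched as a small simulation. This is a toy model, not operating-system code: the class name, its fields, and the assumption that the handler clears TS before returning are all hypothetical choices made for illustration.

```python
class LazyStateModel:
    """Toy model of CR0.TS and an int 7 handler (all names are hypothetical)."""

    def __init__(self):
        self.ts = False        # models the CR0.TS bit
        self.current = None    # task currently scheduled
        self.owner = None      # task whose values are live in the registers
        self.registers = None  # stands in for the shared 3DNow!/MMX/FP state
        self.saved = {}        # per-task saved register state
        self.int7_count = 0    # how many times the handler ran

    def task_switch(self, task):
        self.current = task
        self.ts = True         # a task switch sets TS to 1

    def _int7_handler(self):
        self.int7_count += 1
        if self.owner is not None and self.owner != self.current:
            # Save the previous owner's state, then restore ours (if any).
            self.saved[self.owner] = self.registers
            self.registers = self.saved.get(self.current)
        self.owner = self.current
        self.ts = False        # assumption: the handler clears TS

    def execute(self, new_value=None):
        """Model one floating-point, 3DNow!, or MMX instruction."""
        if self.ts:
            self._int7_handler()
        if new_value is not None:
            self.registers = new_value
        return self.registers
```

In this model a task that never executes a floating-point, 3DNow!, or MMX instruction never triggers the handler, so no state is saved or restored on its behalf.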

Exceptions

Table 4 contains a list of exceptions that 3DNow! and MMX
instructions can generate.

Table 4. 3DNow! and MMX Instruction Exceptions

Exception                               Real  Virtual 8086  Protected  Description
Invalid opcode (6)                       X         X            X      The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)                 X         X            X      Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                     X         X            X      During instruction execution, the stack segment limit was exceeded.
General protection (13)                                         X      During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                     X         X                   One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                                    X            X      A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)    X         X            X      An exception is pending due to the floating-point execution unit.
Alignment check (17)                               X            X      An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)





The rules for exceptions are the same for both MMX and
3DNow! instructions. In addition, exception detection and
handling is identical for MMX and 3DNow! instructions. None
of the exception handlers need modification.

Notes:

1. An invalid opcode exception (interrupt 6) occurs if a
   3DNow! instruction is executed on a processor that does
   not support 3DNow! instructions.

2. If a floating-point exception is pending and the processor
   encounters a 3DNow! instruction, FERR# is asserted and,
   if CR0.NE = 1, an interrupt 16 is generated. (This is the
   same for MMX instructions.)

Prefixes

The following prefixes can be used with 3DNow! instructions:

- The segment override prefixes (2Eh/CS, 36h/SS, 3Eh/DS,
  26h/ES, 64h/FS, and 65h/GS) affect 3DNow! instructions
  that contain a memory operand.

- The address-size override prefix (67h) affects 3DNow!
  instructions that contain a memory operand.

- The operand-size override prefix (66h) is ignored.

- The LOCK prefix (F0h) triggers an invalid opcode exception
  (interrupt 6).

- The REP prefixes (F3h/REP/REPE/REPZ, F2h/REPNE/
  REPNZ) are ignored.
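The prefix rules above can be restated as a small lookup. This is an illustrative sketch only: `prefix_effect` and its return strings are hypothetical, and treating the segment and address-size overrides as having no effect when there is no memory operand is an inference from the wording of the list.

```python
# Prefix bytes and segment names taken from the list above.
SEGMENT_OVERRIDES = {0x2E: "CS", 0x36: "SS", 0x3E: "DS",
                     0x26: "ES", 0x64: "FS", 0x65: "GS"}

def prefix_effect(prefix, has_memory_operand):
    """Describe how a prefix byte interacts with a 3DNow! instruction."""
    if prefix in SEGMENT_OVERRIDES:
        if has_memory_operand:
            return "segment override: " + SEGMENT_OVERRIDES[prefix]
        return "no effect (no memory operand)"  # assumption, see lead-in
    if prefix == 0x67:
        if has_memory_operand:
            return "address-size override"
        return "no effect (no memory operand)"  # assumption, see lead-in
    if prefix == 0x66:
        return "ignored"                         # operand-size override
    if prefix == 0xF0:
        return "invalid opcode exception (interrupt 6)"  # LOCK
    if prefix in (0xF2, 0xF3):
        return "ignored"                         # REPNE/REPNZ, REP/REPE/REPZ
    return "not a listed prefix"
```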



2 3DNow! Instruction Set

The following 3DNow! instruction definitions are in
alphabetical order according to the instruction mnemonics.




FEMMS

mnemonic opcode description

FEMMS 0F 0Eh Faster Enter/Exit of the MMX or floating-point state

Privilege: none

Registers Affected: MMX

Flags Affected: none

Exceptions Generated:

Like the EMMS instruction, the FEMMS instruction can be used to clear the MMX
state following the execution of a block of MMX instructions. Because the MMX
registers and tag words are shared with the floating-point unit, it is necessary to clear
the state before executing floating-point instructions. Unlike the EMMS instruction,
the contents of the MMX/floating-point registers are undefined after a FEMMS
instruction is executed. Therefore, the FEMMS instruction offers a faster context
switch at the end of an MMX routine where the values in the MMX registers are no
longer required. FEMMS can also be used prior to executing MMX instructions where
the preceding floating-point register values are no longer required, which facilitates
faster context switching.


Exception                  Real  Virtual 8086  Protected  Description
Invalid opcode (6)          X         X            X      The emulate MMX instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)    X         X            X      Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.







PAVGUSB

mnemonic opcode/imm8 description

PAVGUSB mmreg1, mmreg2/mem64 0F 0Fh / BFh Average of unsigned packed 8-bit values

Privilege: None


Flags Affected: None


The PAVGUSB instruction produces the rounded averages of the eight unsigned 8-bit
integer values in the source operand (an MMX register or a 64-bit memory location)
and the eight corresponding unsigned 8-bit integer values in the destination operand
(an MMX register). It does so by adding the source and destination byte values and
then adding a 001h to the 9-bit intermediate value. The intermediate value is then
divided by 2 (shifted right one place) and the eight unsigned 8-bit results are stored
in the MMX register specified as the destination operand.

The PAVGUSB instruction can be used for pixel averaging in MPEG-2 motion
compensation and video scaling operations.



Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.

Stack exception (12) X During instruction execution, the stack segment limit was exceeded.

Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.

Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)





Functional Illustration of the PAVGUSB Instruction

The following list explains the functional illustration of the PAVGUSB instruction:

- The rounded byte average of FFh and FFh is FFh.
- The rounded byte average of FFh and 00h is 80h.
- The rounded byte average of 01h and FFh is also 80h.
- The rounded byte average of 0Fh and 10h is 10h.
- The rounded byte average of 00h and 01h is 01h.
- The rounded byte average of 70h and 44h is 5Ah.
- The rounded byte average of 07h and F7h is 7Fh.
- The rounded byte average of 9Ah and A8h is A1h.

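The equations for byte averaging with rounding are as follows:

- mmreg1[63:56] = (mmreg1[63:56] + mmreg2/mem64[63:56] + 01h)/2
- mmreg1[55:48] = (mmreg1[55:48] + mmreg2/mem64[55:48] + 01h)/2
- mmreg1[47:40] = (mmreg1[47:40] + mmreg2/mem64[47:40] + 01h)/2
- mmreg1[39:32] = (mmreg1[39:32] + mmreg2/mem64[39:32] + 01h)/2
- mmreg1[31:24] = (mmreg1[31:24] + mmreg2/mem64[31:24] + 01h)/2
- mmreg1[23:16] = (mmreg1[23:16] + mmreg2/mem64[23:16] + 01h)/2
- mmreg1[15:8] = (mmreg1[15:8] + mmreg2/mem64[15:8] + 01h)/2
- mmreg1[7:0] = (mmreg1[7:0] + mmreg2/mem64[7:0] + 01h)/2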

[Figure: per-byte averaging of the two operands, bits 63 to 0, left to right. An asterisk marks a value that was rounded up.]

Operands (mmreg1 and mmreg2/mem64):
   FFh  FFh   01h  0Fh   00h   70h  07h  9Ah
   FFh  00h   FFh  10h   01h   44h  F7h  A8h

Result (mmreg1):
   FFh  80h*  80h  10h*  01h*  5Ah  7Fh  A1h
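The per-byte equations can be checked with a short sketch. It assumes each operand is modeled as 8 bytes listed from bits 63:56 down to bits 7:0; the function name `pavgusb` is reused here only as a label for the model, not as an API.

```python
def pavgusb(dst, src):
    """Rounded unsigned byte average: (d + s + 1) >> 1 for each byte pair."""
    if len(dst) != 8 or len(src) != 8:
        raise ValueError("both operands must be 8 bytes (64 bits)")
    # The 9-bit intermediate (d + s + 1) fits easily in a Python int;
    # shifting right by one divides it by 2.
    return bytes((d + s + 1) >> 1 for d, s in zip(dst, src))

# Byte pairs from the functional illustration.
operand_a = bytes.fromhex("FFFF010F0070079A")
operand_b = bytes.fromhex("FF00FF100144F7A8")
result = pavgusb(operand_a, operand_b)
```

Because the 001h is added before the shift, any pair whose sum is odd rounds up rather than down, which is why the average of 00h and 01h is 01h.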





PF2ID


PF2ID mmreg1, mmreg2/mem64 0Fh 0Fh / 1Dh Converts packed floating-point operand to packed 32-bit integer

Privilege: none




PF2ID is a vector instruction that converts a vector register containing
single-precision, floating-point operands to 32-bit signed integers using truncation.
Table 5 on page 22 shows the numerical range of the PF2ID instruction.

The PF2ID instruction performs the following operations:

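   IF (mmreg2/mem64[31:0] >= 2^31)
      THEN mmreg1[31:0] = 7FFF_FFFFh
   ELSEIF (mmreg2/mem64[31:0] <= -2^31)
      THEN mmreg1[31:0] = 8000_0000h
   ELSE mmreg1[31:0] = int(mmreg2/mem64[31:0])
   IF (mmreg2/mem64[63:32] >= 2^31)
      THEN mmreg1[63:32] = 7FFF_FFFFh
   ELSEIF (mmreg2/mem64[63:32] <= -2^31)
      THEN mmreg1[63:32] = 8000_0000h
   ELSE mmreg1[63:32] = int(mmreg2/mem64[63:32])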
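A minimal sketch of the conversion with truncation and saturation follows. It assumes each 64-bit operand is modeled as a (low, high) pair of Python floats and that results are returned as 32-bit patterns; the negative saturation value 8000_0000h is the most negative 32-bit signed integer. Unsupported inputs such as NaN are not modeled.

```python
import math

def pf2id_lane(x):
    """Convert one single-precision value to a 32-bit two's-complement pattern."""
    if x >= 2.0 ** 31:
        return 0x7FFFFFFF              # saturate to the largest positive integer
    if x <= -(2.0 ** 31):
        return 0x80000000              # saturate to the most negative integer
    return math.trunc(x) & 0xFFFFFFFF  # truncate toward zero, keep 32 bits

def pf2id(src):
    """Apply the conversion to the low and high halves of the source operand."""
    low, high = src
    return (pf2id_lane(low), pf2id_lane(high))
```

For example, 1.9 converts to 1 and -1.9 converts to -1 (FFFF_FFFFh), since truncation discards the fraction rather than rounding to nearest.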





Related Instructions See the PI2FD instruction.

Table 5. Numerical Range for the PF2ID Instruction

Source 2 Source 1 and Destination

0 0

Normal, abs(Source 1)





PFACC


PFACC mmreg1, mmreg2/mem64 0Fh 0Fh / AEh Floating-point accumulate

Privilege: none




PFACC is a vector instruction that accumulates the two words of the destination
operand and the source operand and stores the results in the low and high words of
the destination operand, respectively. Both operands are single-precision, floating-point
operands with 24-bit significands. Table 6 on page 24 shows the numerical range of the
PFACC instruction.

The PFACC instruction performs the following operations:

   temp = mmreg2/mem64
   mmreg1[31:0] = mmreg1[31:0] + mmreg1[63:32]
   mmreg1[63:32] = temp[31:0] + temp[63:32]
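The accumulate operation is a horizontal add within each operand. The sketch below assumes operands are (low, high) pairs of Python floats and uses a hypothetical helper, `to_f32`, to round each sum to single precision; the range handling in Table 6 is not modeled.

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def pfacc(dst, src):
    """Return the new (low, high) contents of the destination operand."""
    temp = src                            # source is read before dst changes
    low = to_f32(dst[0] + dst[1])         # sum of the destination's two words
    high = to_f32(temp[0] + temp[1])      # sum of the source's two words
    return (low, high)

# Two PFACC steps reduce four partial products to one sum.
partial = pfacc((1.0, 2.0), (3.0, 4.0))  # (1+2, 3+4)
total = pfacc(partial, partial)[0]       # (1+2) + (3+4)
```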
















Table 6. Numerical Range for the PFACC Instruction

                                        Source 2
Source 1 and Destination   0            Normal             Unsupported
0                          +/-0 (1)     Source 2           Source 2
Normal                     Source 1     Normal, +/-0 (2)   Undefined
Unsupported                Source 1     Undefined          Undefined

Notes:

1. The sign of the result is the logical AND of the signs of the source operands.

2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand that is larger in magnitude (if the magnitudes are equal, the sign of source 1 is used). If the absolute value of the result is greater than or equal to 2^128, the result is the largest normal number with the sign being the sign of the source operand that is larger in magnitude.





PFADD


PFADD mmreg1, mmreg2/mem64 0Fh 0Fh / 9Eh Packed, floating-point addition

Privilege: none




PFADD is a vector instruction that performs addition of the destination operand and
the source operand. Both operands are single-precision, floating-point operands with
24-bit significands. Table 7 on page 26 shows the numerical range of the PFADD
instruction.

The PFADD instruction performs the following operations:

   mmreg1[31:0] = mmreg1[31:0] + mmreg2/mem64[31:0]
   mmreg1[63:32] = mmreg1[63:32] + mmreg2/mem64[63:32]
















Table 7. Numerical Range for the PFADD Instruction

                                        Source 2
Source 1 and Destination   0            Normal             Unsupported
0                          +/-0 (1)     Source 2           Source 2
Normal                     Source 1     Normal, +/-0 (2)   Undefined
Unsupported                Source 1     Undefined          Undefined

Notes:

1. The sign of the result is the logical AND of the signs of the source operands.

2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand that is larger in magnitude (if the magnitudes are equal, the sign of source 1 is used). If the absolute value of the result is greater than or equal to 2^128, the result is the largest normal number with the sign being the sign of the source operand that is larger in magnitude.
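The range rules in the notes can be sketched for a single 32-bit lane as follows. This is an approximate model under stated assumptions: inputs are Python floats that already hold single-precision values, the sum is formed in double precision and then rounded to single precision (which may differ from the hardware in rare rounding cases), and unsupported inputs are not modeled. The helper names are hypothetical.

```python
import math
import struct

MAX_NORMAL = (2.0 - 2.0 ** -23) * 2.0 ** 127   # largest normal single
MIN_NORMAL = 2.0 ** -126                        # smallest normal single

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def pfadd_lane(src1, src2):
    """Add one lane; src1 is the destination operand, src2 the source."""
    if src1 == 0.0 and src2 == 0.0:
        # Note 1: the result is negative only if both zeros are negative.
        both_negative = (math.copysign(1.0, src1) < 0 and
                         math.copysign(1.0, src2) < 0)
        return -0.0 if both_negative else 0.0
    total = src1 + src2
    # Note 2: on a tie in magnitude, the sign of source 1 is used.
    larger = src1 if abs(src1) >= abs(src2) else src2
    if abs(total) < MIN_NORMAL:
        return math.copysign(0.0, larger)        # flush to signed zero
    if abs(total) > MAX_NORMAL:
        return math.copysign(MAX_NORMAL, larger) # clamp to largest normal
    return to_f32(total)
```

Clamping at values above `MAX_NORMAL` rather than at exactly 2^128 avoids an overflow in `struct.pack`; sums in that gap would round to one of those two values in any case.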





PFCMPEQ


PFCMPEQ mmreg1, mmreg2/mem64 0Fh 0Fh / B0h Packed floating-point comparison, equal to

Privilege: none




PFCMPEQ is a vector instruction that performs a comparison of the destination
operand and the source operand and generates all one bits or all zero bits based on the
result of the corresponding comparison. Table 8 on page 28 shows the numerical range
of the PFCMPEQ instruction.

The PFCMPEQ instruction performs the following operations:

   IF (mmreg1[31:0] = mmreg2/mem64[31:0])
      THEN mmreg1[31:0] = FFFF_FFFFh
      ELSE mmreg1[31:0] = 0000_0000h
   IF (mmreg1[63:32] = mmreg2/mem64[63:32])
      THEN mmreg1[63:32] = FFFF_FFFFh
      ELSE mmreg1[63:32] = 0000_0000h
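The all-ones or all-zeros result is useful as a bit mask for branch-free selection. The sketch below models the comparison on (low, high) pairs of Python floats and shows one way such a mask might be combined with bitwise AND and OR; `float_bits` and `select_bits` are hypothetical helpers, and unsupported inputs are not modeled.

```python
import struct

def float_bits(x):
    """Return the 32-bit single-precision encoding of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def pfcmpeq(dst, src):
    """Per-lane equality: FFFF_FFFFh where equal, 0000_0000h otherwise."""
    # Python's == treats +0.0 and -0.0 as equal, matching note 1 of Table 8.
    return tuple(0xFFFFFFFF if d == s else 0x00000000
                 for d, s in zip(dst, src))

def select_bits(mask, if_true, if_false):
    """Pick if_true where mask bits are 1, if_false where they are 0."""
    return (if_true & mask) | (if_false & ~mask & 0xFFFFFFFF)

mask = pfcmpeq((1.0, 2.0), (1.0, 3.0))
chosen = tuple(select_bits(m, float_bits(10.0), float_bits(20.0)) for m in mask)
```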
















Related Instructions See the PFCMPGE instruction.

See the PFCMPGT instruction.

Table 8. Numerical Range for the PFCMPEQ Instruction

                                              Source 2
Source 1 and Destination   0                Normal                       Unsupported
0                          FFFF_FFFFh (1)   0000_0000h                   0000_0000h
Normal                     0000_0000h       0000_0000h, FFFF_FFFFh (2)   0000_0000h
Unsupported                0000_0000h       0000_0000h                   Undefined

Notes:

1. Positive zero is equal to negative zero.

2. The result is FFFF_FFFFh if source 1 and source 2 have identical signs, exponents, and mantissas. Otherwise, the result is 0000_0000h.





PFCMPGE


PFCMPGE mmreg1, mmreg2/mem64 0Fh 0Fh / 90h Packed floating-point comparison, greater than or equal to

Privilege: none




PFCMPGE is a vector instruction that performs a comparison of the destination
operand and the source operand and generates all one bits or all zero bits based on the
result of the corresponding comparison. Table 9 on page 30 shows the numerical range
of the PFCMPGE instruction.

The PFCMPGE instruction performs the following operations:

   IF (mmreg1[31:0] >= mmreg2/mem64[31:0])
      THEN mmreg1[31:0] = FFFF_FFFFh
      ELSE mmreg1[31:0] = 0000_0000h
   IF (mmreg1[63:32] >= mmreg2/mem64[63:32])
      THEN mmreg1[63:32] = FFFF_FFFFh
      ELSE mmreg1[63:32] = 0000_0000h











Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)





Related Instructions See the PFCMPEQ instruction.

See the PFCMPGT instruction.

Table 9. Numerical Range for the PFCMPGE Instruction

                                                     Source 2
Source 1 and Destination   0                            Normal                       Unsupported
0                          FFFF_FFFFh (1)               0000_0000h, FFFF_FFFFh (2)   Undefined
Normal                     0000_0000h, FFFF_FFFFh (3)   0000_0000h, FFFF_FFFFh (4)   Undefined
Unsupported                Undefined                    Undefined                    Undefined

Notes:

1. Positive zero is equal to negative zero.

2. The result is FFFF_FFFFh, if source 2 is negative. Otherwise, the result is 0000_0000h.

3. The result is FFFF_FFFFh, if source 1 is positive. Otherwise, the result is 0000_0000h.

4. The result is FFFF_FFFFh, if source 1 is positive and source 2 is negative, or if they are both negative and source 1 is smaller than or equal in magnitude to source 2, or if source 1 and source 2 are both positive and source 1 is greater than or equal in magnitude to source 2. The result is 0000_0000h in all other cases.





PFCMPGT


PFCMPGT mmreg1, mmreg2/mem64 0Fh 0Fh / A0h Packed floating-point comparison, greater than

Privilege: none




PFCMPGT is a vector instruction that performs a comparison of the destination
operand and the source operand and generates all one bits or all zero bits based on the
result of the corresponding comparison. Table 10 on page 32 shows the numerical
range of the PFCMPGT instruction.

The PFCMPGT instruction performs the following operations:

   IF (mmreg1[31:0] > mmreg2/mem64[31:0])
      THEN mmreg1[31:0] = FFFF_FFFFh
      ELSE mmreg1[31:0] = 0000_0000h
   IF (mmreg1[63:32] > mmreg2/mem64[63:32])
      THEN mmreg1[63:32] = FFFF_FFFFh
      ELSE mmreg1[63:32] = 0000_0000h




Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.













Related Instructions See the PFCMPEQ instruction.

See the PFCMPGE instruction.

Table 10. Numerical Range for the PFCMPGT Instruction

                                                     Source 2
Source 1 and Destination   0                            Normal                       Unsupported
0                          0000_0000h                   0000_0000h, FFFF_FFFFh (1)   Undefined
Normal                     0000_0000h, FFFF_FFFFh (2)   0000_0000h, FFFF_FFFFh (3)   Undefined
Unsupported                Undefined                    Undefined                    Undefined

Notes:

1. The result is FFFF_FFFFh, if source 2 is negative. Otherwise, the result is 0000_0000h.

2. The result is FFFF_FFFFh, if source 1 is positive. Otherwise, the result is 0000_0000h.

3. The result is FFFF_FFFFh, if source 1 is positive and source 2 is negative, or if they are both negative and source 1 is smaller in magnitude than source 2, or if source 1 and source 2 are positive and source 1 is greater in magnitude than source 2. The result is 0000_0000h in all other cases.





PFMAX


PFMAX mmreg1, mmreg2/mem64 0Fh 0Fh / A4h Packed floating-point maximum

Privilege: none




PFMAX is a vector instruction that returns the larger of the two single-precision,
floating-point operands. Any operation with a zero and a negative number returns
positive zero. An operation consisting of two zeros returns positive zero. Table 11 on
page 34 shows the numerical range of the PFMAX instruction.

The PFMAX instruction performs the following operations:

   IF (mmreg1[31:0] > mmreg2/mem64[31:0])
      THEN mmreg1[31:0] = mmreg1[31:0]
      ELSE mmreg1[31:0] = mmreg2/mem64[31:0]
   IF (mmreg1[63:32] > mmreg2/mem64[63:32])
      THEN mmreg1[63:32] = mmreg1[63:32]
      ELSE mmreg1[63:32] = mmreg2/mem64[63:32]
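A short sketch of the selection, including the rule that a zero result is always positive zero, is given below. It assumes operands are (low, high) pairs of Python floats and does not model unsupported inputs; the function names are labels for the model only.

```python
import math

def pfmax_lane(dst, src):
    """Return the larger of two values; any zero result is forced to +0."""
    result = dst if dst > src else src
    if result == 0.0:
        return 0.0          # covers -0.0 as well, since -0.0 == 0.0
    return result

def pfmax(dst, src):
    """Apply the maximum to the low and high halves independently."""
    return tuple(pfmax_lane(d, s) for d, s in zip(dst, src))
```

For example, `pfmax((1.0, -2.0), (3.0, -5.0))` yields `(3.0, -2.0)`, and a lane holding a negative zero and a negative number yields positive zero.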
















Related Instructions See the PFMIN instruction.

Table 11. Numerical Range for the PFMAX Instruction

                                              Source 2
Source 1 and Destination   0                 Normal                    Unsupported
0                          +0                Source 2, +0 (1)          Undefined
Normal                     Source 1, +0 (2)  Source 1/Source 2 (3)     Undefined
Unsupported                Undefined         Undefined                 Undefined

Notes:

1. The result is source 2, if source 2 is positive. Otherwise, the result is positive zero.

2. The result is source 1, if source 1 is positive. Otherwise, the result is positive zero.

3. The result is source 1, if source 1 is positive and source 2 is negative. The result is source 1, if both are positive and source 1 is greater in magnitude than source 2. The result is source 1, if both are negative and source 1 is lesser in magnitude than source 2. The result is source 2 in all other cases.





PFMIN


PFMIN mmreg1, mmreg2/mem64 0Fh 0Fh / 94h Packed floating-point minimum

Privilege: none




PFMIN is a vector instruction that returns the smaller of the two single-precision,
floating-point operands. Any operation with a zero and a positive number returns
positive zero. An operation consisting of two zeros returns positive zero. Table 12 on
page 36 shows the numerical range of the PFMIN instruction.

The PFMIN instruction performs the following operations:

   IF (mmreg1[31:0] < mmreg2/mem64[31:0])
      THEN mmreg1[31:0] = mmreg1[31:0]
      ELSE mmreg1[31:0] = mmreg2/mem64[31:0]
   IF (mmreg1[63:32] < mmreg2/mem64[63:32])
      THEN mmreg1[63:32] = mmreg1[63:32]
      ELSE mmreg1[63:32] = mmreg2/mem64[63:32]
















Related Instructions See the PFMAX instruction.

Table 12. Numerical Range for the PFMIN Instruction

                                              Source 2
Source 1 and Destination   0                 Normal                    Unsupported
0                          +0                Source 2, +0 (1)          Undefined
Normal                     Source 1, +0 (2)  Source 1/Source 2 (3)     Undefined
Unsupported                Undefined         Undefined                 Undefined

Notes:

1. The result is source 2, if source 2 is negative. Otherwise, the result is positive zero.

2. The result is source 1, if source 1 is negative. Otherwise, the result is positive zero.

3. The result is source 1, if source 1 is negative and source 2 is positive. The result is source 1, if both are negative and source 1 is greater in magnitude than source 2. The result is source 1, if both are positive and source 1 is lesser in magnitude than source 2. The result is source 2 in all other cases.





PFMUL


PFMUL mmreg1, mmreg2/mem64 0Fh 0Fh / B4h Packed floating-point multiplication

Privilege: none




PFMUL is a vector instruction that performs multiplication of the destination
operand and the source operand. Both operands are single-precision, floating-point
operands with 24-bit significands. Table 13 on page 38 shows the numerical range of
the PFMUL instruction.

The PFMUL instruction performs the following operations:

   mmreg1[31:0] = mmreg1[31:0] * mmreg2/mem64[31:0]
   mmreg1[63:32] = mmreg1[63:32] * mmreg2/mem64[63:32]
















Table 13. Numerical Range for the PFMUL Instruction

                                        Source 2
Source 1 and Destination   0            Normal             Unsupported
0                          +/-0 (1)     +/-0 (1)           +/-0 (1)
Normal                     +/-0 (1)     Normal, +/-0 (2)   Undefined
Unsupported                +/-0 (1)     Undefined          Undefined

Notes:

1. The sign of the result is the exclusive-OR of the signs of the source operands.

2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the exclusive-OR of the signs of the source operands. If the absolute value of the product is greater than or equal to 2^128, the result is the largest normal number with the sign being exclusive-OR of the signs of the source operands.





PFRCP


PFRCP mmreg1, mmreg2/mem64 0Fh 0Fh / 96h Floating-point reciprocal approximation

Privilege: none




PFRCP is a scalar instruction that returns a low-precision estimate of the reciprocal of
the source operand. The single result value is duplicated in both high and low halves
of this instruction's 64-bit result. The source operand is single-precision with a 24-bit
significand, and the result is accurate to 14 bits. Table 14 on page 40 shows the
numerical range of the PFRCP instruction.

Increased accuracy (the full 24 bits of a single-precision significand) requires the use
of two additional instructions (PFRCPIT1 and PFRCPIT2). The first stage of this
increase or refinement in accuracy (PFRCPIT1) requires that the input and output of
the already executed PFRCP instruction be used as input to the PFRCPIT1
instruction. Refer to Division and Square Root on page 59 for an
application-specific example of how to use this instruction and related instructions.

The PFRCP instruction performs the following operations:

   mmreg1[31:0] = reciprocal(mmreg2/mem64[31:0])
   mmreg1[63:32] = reciprocal(mmreg2/mem64[31:0])
















In the following code example, the first line illustrates the PFRCP instruction in a
sequence used to compute q = a/b accurate to 24 bits:

   X0 = PFRCP(b)
   X1 = PFRCPIT1(b,X0)
   X2 = PFRCPIT2(X1,X0)
   q = PFMUL(a,X2)
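The numerical idea behind this sequence can be sketched as one Newton-Raphson step, X1 = X0 * (2 - b * X0), applied to a 14-bit estimate. The code below is a model of that arithmetic only, not of the instructions themselves: `estimate_reciprocal` is a hypothetical stand-in for PFRCP that truncates the significand of 1/b to 14 bits, the two iteration instructions are folded into a single `refine_reciprocal` step, and the work is done in double precision rather than in the hardware's intermediate formats.

```python
import math

def estimate_reciprocal(b, bits=14):
    """Stand-in for a low-precision estimate: 1/b kept to `bits` bits."""
    mantissa, exponent = math.frexp(1.0 / b)   # 1/b = mantissa * 2**exponent
    scale = 1 << bits
    truncated = math.trunc(mantissa * scale) / scale
    return math.ldexp(truncated, exponent)

def refine_reciprocal(b, x0):
    """One Newton-Raphson step for f(x) = 1/x - b."""
    return x0 * (2.0 - b * x0)

def divide(a, b):
    """Compute a/b as a times a refined reciprocal of b."""
    x0 = estimate_reciprocal(b)
    x2 = refine_reciprocal(b, x0)
    return a * x2
```

Each Newton-Raphson step roughly squares the relative error, so an estimate good to about 2^-13 or better becomes good to about 2^-26 or better, which is why a single refinement is enough to reach 24 bits.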

Related Instructions See the PFRCPIT1 instruction.

See the PFRCPIT2 instruction.

Table 14. Numerical Range for the PFRCP Instruction

Source 2       Source 1 and Destination
0              +/- Maximum Normal (1)
Normal         Normal, +/-0 (2)
Unsupported    Undefined

Notes:

1. The result has the same sign as the source operand.

2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand. Otherwise, the result is a normal with the sign being the same sign as the source operand.





PFRCPIT1


PFRCPIT1 mmreg1, mmreg2/mem64 0Fh 0Fh / A6h Packed floating-point reciprocal, first iteration step

Privilege: none




PFRCPIT1 is a vector instruction that performs the first intermediate step in the
Newton-Raphson iteration to refine the reciprocal approximation produced by the
PFRCP instruction (the second and final step completes the iteration and is accurate
to 24 bits). Table 15 on page 42 shows the numerical range of the PFRCPIT1
instruction.

The behavior of this instruction is only defined for those combinations of operands
such that one source operand was the input to the PFRCP instruction and the other
source operand was the output of the same PFRCP instruction. Refer to Division and
Square Root on page 59 for an application-specific example of how to use this
instruction and related instructions.
















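In the following code example, the second line illustrates the PFRCPIT1 instruction in a
sequence used to compute q = a/b accurate to 24 bits:

   X0 = PFRCP(b)
   X1 = PFRCPIT1(b,X0)
   X2 = PFRCPIT2(X1,X0)
   q = PFMUL(a,X2)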


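Related Instructions See the PFRCP instruction.

See the PFRCPIT2 instruction.

Table 15. Numerical Range for the PFRCPIT1 Instruction

                                        Source 2
Source 1 and Destination   0            Normal         Unsupported
0                          +/-0 (1)     +/-0 (1)       +/-0 (1)
Normal                     +/-0 (1)     Normal (2)     Undefined
Unsupported                +/-0 (1)     Undefined      Undefined

Notes:

1. The sign of the result is the exclusive-OR of the signs of the source operands.

2. The sign is positive.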





PFRCPIT2


PFRCPIT2 mmreg1, mmreg2/mem64 0Fh 0Fh / B6h Packed floating-point reciprocal/reciprocal square root, second iteration step

Privilege: none




PFRCPIT2 is a vector instruction that performs the second and final intermediate
step in the Newton-Raphson iteration to refine the reciprocal or reciprocal square root
approximation produced by the PFRCP and PFRSQRT instructions, respectively.
Table 16 on page 44 shows the numerical range of the PFRCPIT2 instruction.

The behavior of this instruction is only defined for those combinations of operands
such that the first source operand (mmreg1) was the output of either the PFRCPIT1 or
PFRSQIT1 instructions and the second source operand (mmreg2/mem64) was the
output of either the PFRCP or PFRSQRT instructions. Refer to Division and Square
Root on page 59 for an application-specific example of how to use this instruction
and related instructions.










Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)





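In the following code example, the third line illustrates the PFRCPIT2 instruction in a
sequence used to compute q = a/b accurate to 24 bits:

   X0 = PFRCP(b)
   X1 = PFRCPIT1(b,X0)
   X2 = PFRCPIT2(X1,X0)
   q = PFMUL(a,X2)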


Related Instructions See the PFRCPIT1 instruction.

See the PFRSQIT1 instruction.

See the PFRCP instruction.

See the PFRSQRT instruction.

Table 16. Numerical Range for the PFRCPIT2 Instruction

                                        Source 2
Source 1 and Destination   0            Normal             Unsupported
0                          +/-0 (1)     +/-0 (1)           +/-0 (1)
Normal                     +/-0 (1)     Normal, +/-0 (2)   Undefined
Unsupported                +/-0 (1)     Undefined          Undefined

Notes:

1. The sign of the result is the exclusive-OR of the signs of the source operands.

2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the exclusive-OR of the signs of the source operands. If the absolute value of the product is greater than or equal to 2^128, the result is the largest normal number with the sign being exclusive-OR of the signs of the source operands.





PFRSQIT1


PFRSQIT1 mmreg1, mmreg2/mem64 0Fh 0Fh / A7h Packed floating-point reciprocal square root, first iteration step

Privilege: none




PFRSQIT1 is a vector instruction that performs the first intermediate step in the
Newton-Raphson iteration to refine the reciprocal square root approximation
produced by the PFRSQRT instruction (the second and final step completes the
iteration and is accurate to 24 bits). Table 17 on page 46 shows the numerical range of
the PFRSQIT1 instruction.

The behavior of this instruction is only defined for those combinations of operands
such that one source operand was the input to the PFRSQRT instruction and the other
source operand is the square of the output of the same PFRSQRT instruction. Refer to
Division and Square Root on page 59 for an application-specific example of how to
use this instruction and related instructions.






Segment over run (13) X X One
