  • 8/2/2019 3D Now Technology Manual

    1/72

3DNow!™ Technology Manual


    Trademarks

AMD, the AMD logo, K6, 3DNow!, AMD Athlon, and combinations thereof, and K86 are trademarks, and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc.

MMX is a trademark of Intel Corporation.

Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

© 2000 Advanced Micro Devices, Inc. All rights reserved.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.


21928G/0, March 2000, 3DNow! Technology Manual

    Contents

Revision History . . . ix
1 3DNow! Technology . . . 1
  Introduction . . . 1
  Key Functionality . . . 2
  Feature Detection . . . 3
  Register Set . . . 4
  Data Types . . . 6
  3DNow! Instruction Formats . . . 8
  Definitions . . . 9
  Execution Resources on AMD-K6 Processors . . . 11
    Task Switching . . . 15
    Exceptions . . . 15
    Prefixes . . . 16
2 3DNow! Instruction Set . . . 17
  FEMMS . . . 18
  PAVGUSB . . . 19
  PF2ID . . . 21
  PFACC . . . 23
  PFADD . . . 25
  PFCMPEQ . . . 27
  PFCMPGE . . . 29
  PFCMPGT . . . 31
  PFMAX . . . 33
  PFMIN . . . 35
  PFMUL . . . 37
  PFRCP . . . 39
  PFRCPIT1 . . . 41
  PFRCPIT2 . . . 43


  PFRSQIT1 . . . 45
  PFRSQRT . . . 47
  PFSUB . . . 49
  PFSUBR . . . 51
  PI2FD . . . 53
  PMULHRW . . . 54
  PREFETCH/PREFETCHW . . . 56
3 Division and Square Root . . . 59
  Division . . . 59
    Divide Examples . . . 60
  Square Root . . . 61
    Square Root Examples . . . 61


    List of Figures

Figure 1. 3DNow!/MMX Registers . . . 5
Figure 2. 3DNow! Data Type . . . 6
Figure 3. Single-Precision, Floating-Point Data Format . . . 6
Figure 4. Integer Data Types . . . 7
Figure 5. Register X Unit and Register Y Unit Resources . . . 13


    List of Tables

Table 1. 3DNow! Technology Exponent Ranges . . . 10
Table 2. 3DNow! Floating-Point Instructions . . . 14
Table 3. 3DNow! Performance-Enhancement Instructions . . . 14
Table 4. 3DNow! and MMX Instruction Exceptions . . . 15
Table 5. Numerical Range for the PF2ID Instruction . . . 22
Table 6. Numerical Range for the PFACC Instruction . . . 24
Table 7. Numerical Range for the PFADD Instruction . . . 26
Table 8. Numerical Range for the PFCMPEQ Instruction . . . 28
Table 9. Numerical Range for the PFCMPGE Instruction . . . 30
Table 10. Numerical Range for the PFCMPGT Instruction . . . 32
Table 11. Numerical Range for the PFMAX Instruction . . . 34
Table 12. Numerical Range for the PFMIN Instruction . . . 36
Table 13. Numerical Range for the PFMUL Instruction . . . 38
Table 14. Numerical Range for the PFRCP Instruction . . . 40
Table 15. Numerical Range for the PFRCPIT1 Instruction . . . 42
Table 16. Numerical Range for the PFRCPIT2 Instruction . . . 44
Table 17. Numerical Range for the PFRSQIT1 Instruction . . . 46
Table 18. Numerical Range for the PFRSQRT Instruction . . . 48
Table 19. Numerical Range for the PFSUB Instruction . . . 50
Table 20. Numerical Range for the PFSUBR Instruction . . . 52
Table 21. Summary of PREFETCH Instruction Type Options . . . 57


    Revision History

Date        Rev  Description
Feb 1998    A    Initial Release
Feb 1998    B    Clarified CPUID usage in "Feature Detection" on page 3.
May 1998    C    Revised description of 3DNow! instructions in "Definitions" on page 9.
May 1998    C    Revised function descriptions in Table 2, "3DNow! Floating-Point Instructions," on page 14.
Sept 1998   D    Revised code example for the PFRSQRT instruction on page 48.
Sept 1998   D    Changed exceptions generated for the PREFETCH/PREFETCHW instructions to none, deleted exception table, and revised PREFETCHW description on page 56.
Sept 1998   D    Added PUNPCKLDQ instruction to the division example (24-bit precision) on page 60.
Nov 1998    E    Added sample code that tests for the presence of extended function 8000_0001h on page 3.
Nov 1998    E    Clarified instruction descriptions of PFRCPIT1 on page 41, PFRCPIT2 on page 43, and PFRSQIT1 on page 45.
Nov 1998    E    Added PUNPCKLDQ instruction and clarified comments to the square root examples on page 62.
Aug 1999    F    Changed X variable to Z in Newton-Raphson recurrence definitions, and swapped order of PFMUL and PUNPCKLDQ instructions in square root example (24-bit precision) in Chapter 3 on page 59.
Aug 1999    F    Added references to the AMD Athlon processor throughout the manual.
Mar 2000    G    Updated and clarified the PFACC instruction operation description on page 23.


Chapter 1: 3DNow! Technology

    Introduction

3DNow! technology is a significant innovation to the x86 architecture that drives today's personal computers. 3DNow! technology is a group of new instructions that relieves the traditional processing bottlenecks of floating-point-intensive and multimedia applications. With 3DNow! technology, hardware and software applications can implement more powerful solutions to create a more entertaining and productive PC platform. Examples of the improvements that 3DNow! technology enables are faster frame rates on high-resolution scenes, much better physical modeling of real-world environments, sharper and more detailed 3D imaging, smoother video playback, and near theater-quality audio.

AMD has taken a leadership role in developing these new instructions, which enable exciting new levels of performance and realism. 3DNow! technology was defined and implemented in collaboration with independent software developers, including operating system designers, application developers, and graphics vendors. It is compatible with today's existing x86 software and requires no operating system support, thereby enabling 3DNow! applications to work with all existing operating systems. 3DNow! technology is implemented on the AMD-K6-2, AMD-K6-III, and AMD Athlon processors. The AMD Athlon processor implements five new 3DNow! technology instructions that add streaming and digital signal processing (DSP) technologies. For more information, see the AMD Extensions to the 3DNow! and MMX Instruction Sets Manual, order# 22466.

    Key Functionality

The 3DNow! technology instructions are intended to relieve a major processing bottleneck in 3D graphics applications: floating-point operations. Today's 3D applications face limitations because only one floating-point execution unit exists in the most advanced x86 processors. The front end of a typical 3D graphics software pipeline performs object physics, geometry transformations, clipping, and lighting calculations. These computations are very floating-point intensive and often limit the features and functionality of a 3D application. The performance of the 3DNow! instructions comes from their single instruction, multiple data (SIMD) implementation. With SIMD, each instruction operates on two single-precision, floating-point operands, and the microarchitecture within the processor can execute up to two 3DNow! instructions per clock through two register execution pipelines. Together, this allows for a total of four floating-point operations per clock (two instructions, each operating on two operands). In addition, because the 3DNow! instructions use the same floating-point registers as the MMX technology instructions, task switching between MMX and 3DNow! operations is eliminated.

The 3DNow! technology instruction set contains 21 instructions that support SIMD floating-point operations and includes SIMD integer operations, data prefetching, and faster MMX-to-floating-point switching. To improve MPEG decoding, the 3DNow! instructions include a specific SIMD integer instruction created to facilitate pixel-motion compensation. Because media-based software typically operates on large data sets, the processor often needs to wait for this data to be transferred from main memory. The extra time involved with retrieving this data can be avoided by using the new 3DNow! instruction called PREFETCH. This instruction can ensure that data is in the level 1 cache when it is needed. To reduce the time it takes to switch between MMX and x87 code, the 3DNow! instructions include the FEMMS (fast entry/exit multimedia state) instruction, which eliminates much of the overhead involved with the switch. The addition of 3DNow! technology expands the capabilities of the AMD family of processors and enables a new generation of enriched user applications.

    Feature Detection

To properly identify and use the 3DNow! instructions, the application program must determine if the processor supports them. The CPUID instruction gives programmers the ability to determine the presence of 3DNow! technology on a processor. Software applications must first test to see if the CPUID instruction is supported. For a detailed description of the CPUID instruction, see the AMD Processor Recognition Application Note, order# 20734.

The presence of the CPUID instruction is indicated by the ID bit (21) in the EFLAGS register. If this bit is writable, the CPUID instruction is supported. The following code sample shows how to test for the presence of the CPUID instruction.

    pushfd                ; save EFLAGS
    pop eax               ; store EFLAGS in EAX
    mov ebx, eax          ; save in EBX for later testing
    xor eax, 00200000h    ; toggle bit 21
    push eax              ; put to stack
    popfd                 ; save changed EAX to EFLAGS
    pushfd                ; push EFLAGS to TOS
    pop eax               ; store EFLAGS in EAX
    cmp eax, ebx          ; see if bit 21 has changed
    jz NO_CPUID           ; if no change, no CPUID

Once the software has identified the processor's support for CPUID, it must test for extended functions by executing extended function 8000_0000h (EAX=8000_0000h). The EAX register returns the largest extended function input value defined for the CPUID instruction on the processor. If the value is greater than 8000_0000h, extended functions are supported. The following code sample shows how to test for the presence of extended function 8000_0001h.

    mov eax, 80000000h    ; query for extended functions
    CPUID                 ; get extended function limit
    cmp eax, 80000000h    ; is 8000_0001h supported?
    jbe NO_EXTENDEDMSR    ; if not, 3DNow! tech. not supported


The next step is for the programmer to determine if the 3DNow! instructions are supported. Extended function 8000_0001h of the CPUID instruction provides this information by returning the extended feature bits in the EDX register. If bit 31 in the EDX register is set to 1, 3DNow! instructions are supported. The following code sample shows how to test for 3DNow! instruction support.

    mov eax, 80000001h    ; setup ext. function 8000_0001h
    CPUID                 ; call the function
    test edx, 80000000h   ; test bit 31
    jnz YES_3DNOW         ; 3DNow! technology supported

The processor supports all of the above features.

Concatenating the code examples above will produce the basis for a CPU detection software routine. A more comprehensive code example is available on the AMD website at http://www.amd.com/products/cpg/bin/.
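The three checks above can also be expressed as plain decision logic. The C sketch below is a minimal illustration, assuming the caller has already obtained the raw register values (EFLAGS before and after the bit-21 toggle, EAX from function 8000_0000h, and EDX from function 8000_0001h), for example through inline assembly or a compiler intrinsic such as GCC/Clang's `__get_cpuid` from `<cpuid.h>` on x86. The macro and function names are hypothetical and are not part of any AMD interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks and values taken from the text above. */
#define EFLAGS_ID_BIT     (UINT32_C(1) << 21)   /* EFLAGS.ID: writable => CPUID present */
#define EXT_FN_BASE       UINT32_C(0x80000000)  /* extended function 8000_0000h */
#define EXT_EDX_3DNOW_BIT (UINT32_C(1) << 31)   /* EDX bit 31 of function 8000_0001h */

/* Step 1: CPUID is supported if bit 21 really changed after being toggled,
 * i.e. the EFLAGS value read back differs from the original in bit 21. */
bool cpuid_supported(uint32_t eflags_original, uint32_t eflags_read_back)
{
    return ((eflags_original ^ eflags_read_back) & EFLAGS_ID_BIT) != 0;
}

/* Step 2: function 8000_0001h exists only if function 8000_0000h returned a
 * limit strictly greater than 8000_0000h (the "jbe" test in the sample). */
bool ext_fn_8000_0001_supported(uint32_t max_ext_fn)
{
    return max_ext_fn > EXT_FN_BASE;
}

/* Step 3: 3DNow! is supported if EDX bit 31 of function 8000_0001h is set. */
bool has_3dnow(uint32_t max_ext_fn, uint32_t ext_fn1_edx)
{
    return ext_fn_8000_0001_supported(max_ext_fn)
        && (ext_fn1_edx & EXT_EDX_3DNOW_BIT) != 0;
}
```

For example, `has_3dnow(0x80000008u, 0x80000000u)` returns true, while `has_3dnow(0x80000000u, 0x80000000u)` returns false because function 8000_0001h is not reported as available. These input values are illustrative, not captured from a real processor.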

    Register Set

The complete multimedia units in the processor combine the existing MMX instructions with the new 3DNow! instructions. By merging 3DNow! with MMX, it becomes possible to write x86 programs containing integer, MMX, and floating-point graphics instructions with no performance penalty for switching between the multimedia (integer) and 3DNow! (floating-point) units.

The processor implements eight 64-bit 3DNow!/MMX registers. These registers are mapped onto the floating-point registers. As shown in Figure 1, the 3DNow! and MMX instructions refer to these registers as mm0 to mm7. Mapping the new 3DNow!/MMX registers onto the floating-point register stack enables backwards compatibility for the register saving that must occur as a result of task switching.


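Figure 1. 3DNow!/MMX Registers
[Figure: eight 64-bit registers, mm0 through mm7 (bits 63 to 0), each shown with its 2-bit tag field.]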

Aliasing the 3DNow!/MMX registers onto the floating-point register stack provides a safe method to introduce 3DNow! and MMX technology, because it does not require modifications to existing operating systems. Instead of requiring operating system modifications, new 3DNow! and MMX technology applications are supported through device drivers, 3DNow! and MMX libraries, or Dynamic Link Library (DLL) files.

Current operating systems have support for floating-point operations and the floating-point register state. Using the floating-point registers for 3DNow! and MMX code is a convenient way of implementing non-intrusive support for 3DNow! and MMX instructions. Every time the processor executes a 3DNow! or MMX instruction, all the floating-point register tag bits are set to zero (00b = valid), except for the FEMMS and EMMS instructions, which set all tag bits to one (11b = empty).

Note: Executing the PREFETCH instruction does not change the tag bits.
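The tag-bit rule above can be modeled in a few lines. This C sketch assumes the conventional x87 tag word layout (16 bits, 2 bits per register); the enum and function names are hypothetical and only illustrate the documented effect of each instruction class, not how the processor implements it.

```c
#include <stdint.h>

#define TAG_VALID UINT16_C(0x0)  /* 00b */
#define TAG_EMPTY UINT16_C(0x3)  /* 11b */

/* Build a 16-bit tag word with the same 2-bit tag for all eight registers. */
uint16_t uniform_tag_word(uint16_t tag2)
{
    uint16_t word = 0;
    for (int reg = 0; reg < 8; reg++)
        word |= (uint16_t)((tag2 & 0x3u) << (2 * reg));
    return word;
}

/* Instruction classes distinguished by the text above. */
typedef enum {
    INSN_3DNOW_OR_MMX,  /* any other 3DNow! or MMX instruction */
    INSN_FEMMS,
    INSN_EMMS,
    INSN_PREFETCH
} insn_kind;

/* Tag word after executing an instruction of the given class. */
uint16_t tag_word_after(insn_kind kind, uint16_t current)
{
    switch (kind) {
    case INSN_FEMMS:
    case INSN_EMMS:
        return uniform_tag_word(TAG_EMPTY);  /* all registers marked empty */
    case INSN_PREFETCH:
        return current;                      /* PREFETCH leaves the tags alone */
    default:
        return uniform_tag_word(TAG_VALID);  /* all registers marked valid */
    }
}
```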


    Data Types

3DNow! technology uses a packed data format. The data is packed in a single 64-bit 3DNow!/MMX register or a quadword memory operand.

Figure 2 shows the 3DNow! floating-point data type. D0 and D1 each hold an IEEE 32-bit single-precision, floating-point doubleword.

Figure 2. 3DNow! Data Type
[Figure: two packed, single-precision, floating-point doublewords (32 bits x 2); D1 occupies bits 63 to 32 and D0 occupies bits 31 to 0.]

Figure 3 on page 6 shows the IEEE 32-bit single-precision, floating-point format.

Figure 3. Single-Precision, Floating-Point Data Format
[Figure: 32-bit single-precision, floating-point doubleword; S (sign) in bit 31, Biased Exponent in bits 30 to 23, Significand in bits 22 to 0.]

Value definitions:
1. $X = (-1)^S \times 0$, if Biased Exponent $= 0$
2. $X = (-1)^S \times 2^{(\text{Biased Exponent} - 127)} \times \text{Significand}$, if Biased Exponent $\neq 0$
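The two value definitions can be checked by decoding a raw 32-bit pattern. The C sketch below is illustrative only: it assumes the bit layout of Figure 3 (sign in bit 31, biased exponent in bits 30 to 23, significand fraction in bits 22 to 0, with the hidden integer bit restored), treats a biased exponent of 0 as zero per definition 1, and deliberately ignores the FFh exponent, which 3DNow! does not support. The helper names are hypothetical.

```c
#include <stdint.h>

/* Fields of a single-precision value, as laid out in Figure 3. */
typedef struct {
    uint32_t sign;             /* bit 31 */
    uint32_t biased_exponent;  /* bits 30..23 */
    uint32_t fraction;         /* bits 22..0, significand without the hidden bit */
} f32_fields;

f32_fields f32_decode(uint32_t bits)
{
    f32_fields f;
    f.sign = bits >> 31;
    f.biased_exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Apply the value definitions. Exponent FFh (unsupported) is not handled. */
double f32_value(uint32_t bits)
{
    f32_fields f = f32_decode(bits);
    double sign = f.sign ? -1.0 : 1.0;
    if (f.biased_exponent == 0)
        return sign * 0.0;                          /* definition 1 */

    /* Significand = 1.fraction = 1 + fraction / 2^23, within [1, 2). */
    double significand = 1.0 + (double)f.fraction / 8388608.0;

    /* 2^(biased exponent - 127), built by exact doublings or halvings. */
    double scale = 1.0;
    int e = (int)f.biased_exponent - 127;
    for (; e > 0; e--) scale *= 2.0;
    for (; e < 0; e++) scale /= 2.0;

    return sign * scale * significand;              /* definition 2 */
}
```

For example, the pattern 3FC0_0000h decodes to S = 0, a biased exponent of 7Fh, and a fraction of 40_0000h, which gives 1.5.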


Figure 4 shows the formats for the integer data types.

Figure 4. Integer Data Types
[Figure: four layouts of a 64-bit quadword.
Packed bytes (8 bits x 8): B7 to B0, with byte Bn in bits 8n+7 to 8n.
Packed words (16 bits x 4): W3 to W0, in bits 63 to 48, 47 to 32, 31 to 16, and 15 to 0.
Packed doublewords (32 bits x 2): D1 in bits 63 to 32, D0 in bits 31 to 0.
Quadword (64 bits x 1): Q0 in bits 63 to 0.]
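Lane numbering in Figure 4 starts at the least significant end of the quadword. A minimal C sketch, assuming the quadword is held in a `uint64_t` (the helper names are hypothetical), shows how each packed element maps to bit positions:

```c
#include <stdint.h>

/* Extract element n from a 64-bit quadword viewed as packed bytes, words,
 * or doublewords. Element 0 is the least significant. The index is masked
 * to the valid range so the shift amount always stays below 64. */
uint8_t packed_byte(uint64_t q, unsigned n)
{
    return (uint8_t)(q >> (8u * (n & 7u)));    /* B0..B7 */
}

uint16_t packed_word(uint64_t q, unsigned n)
{
    return (uint16_t)(q >> (16u * (n & 3u)));  /* W0..W3 */
}

uint32_t packed_dword(uint64_t q, unsigned n)
{
    return (uint32_t)(q >> (32u * (n & 1u)));  /* D0..D1 */
}
```

With q = 0706_0504_0302_0100h, for instance, B7 is 07h, W3 is 0706h, and D1 is 0706_0504h.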


3DNow! Instruction Formats

The format of 3DNow! instruction encodings is based on the conventional x86 modR/M instruction format and is similar to the format used by MMX instructions. The assembly language syntax used for the 3DNow! instructions is as follows:

    3DNow! Mnemonic mmreg1, mmreg2/mem64

The destination and source1 operand (mmreg1) must be an MMX register (mm0 to mm7). The source2 operand (mmreg2/mem64) can be either an MMX register or a 64-bit memory value.

The encoding uses the opcode prefix 0Fh followed by a second opcode byte of 0Fh. To differentiate the various 3DNow! instructions, a third instruction suffix byte is used. This suffix byte occupies the same position at the end of a 3DNow! instruction as would an imm8 byte. The opcode format is as follows:

    0Fh 0Fh modR/M [sib] [displacement] 3DNow!_suffix

The specific operands (mmreg1 and mmreg2/mem64) determine the values used in modR/M [sib] [displacement], and follow conventional x86 encodings. The 3DNow! suffix is determined by the actual 3DNow! instruction. The 3DNow! suffixes are defined in Table 2 on page 14.

As an example, the 3DNow! PFMUL instruction can produce the following opcodes, depending on its use:

    Opcode               Instruction
    0F 0F CA B4          PFMUL mm1, mm2
    0F 0F 0B B4          PFMUL mm1, [ebx]
    0F 0F 4B 0A B4       PFMUL mm1, [ebx+10]
    26 0F 0F 0B B4       PFMUL mm1, es:[ebx]
    0F 0F 4C 83 0A B4    PFMUL mm1, [ebx+eax*4+10]
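The PFMUL encodings above follow directly from the modR/M fields (mod in bits 7 to 6, reg in bits 5 to 3, r/m in bits 2 to 0). The C sketch below is a hypothetical helper, not an assembler: it covers only the register-register form (mod = 11b) and a [base+disp8] memory form (mod = 01b), and it rejects ESP as a base because that case needs a SIB byte. The caller supplies the suffix byte (B4h for PFMUL).

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in the modR/M reg and r/m fields. */
enum { MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7 };
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

/* mnemonic mmreg1, mmreg2  ->  0F 0F modR/M suffix. Returns the length. */
size_t encode_3dnow_rr(uint8_t out[4], uint8_t suffix, unsigned dst, unsigned src)
{
    out[0] = 0x0F;
    out[1] = 0x0F;
    out[2] = modrm(3, dst, src);
    out[3] = suffix;
    return 4;
}

/* mnemonic mmreg1, [base+disp8]  ->  0F 0F modR/M disp8 suffix.
 * Returns the length, or 0 for ESP, which would require a SIB byte. */
size_t encode_3dnow_disp8(uint8_t out[5], uint8_t suffix, unsigned dst,
                          unsigned base, int8_t disp)
{
    if (base == ESP)
        return 0;
    out[0] = 0x0F;
    out[1] = 0x0F;
    out[2] = modrm(1, dst, base);
    out[3] = (uint8_t)disp;
    out[4] = suffix;
    return 5;
}
```

With suffix B4h, `encode_3dnow_rr(buf, 0xB4, MM1, MM2)` yields 0F 0F CA B4 and `encode_3dnow_disp8(buf, 0xB4, MM1, EBX, 10)` yields 0F 0F 4B 0A B4, matching the first and third rows above. Because this helper always emits an 8-bit displacement, [ebx] would come out as the longer 0F 0F 4B 00 B4 rather than the 0F 0F 0B B4 form in the table.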

The encoding of the two performance-enhancement instructions (FEMMS and PREFETCH) uses a single opcode prefix 0Fh. The details of the opcodes for these instructions are shown on pages 18 and 56, respectively.


    Definitions

3DNow! technology provides 21 additional instructions to support high-performance 3D graphics and audio processing. 3DNow! instructions are vector instructions that operate on 64-bit registers. 3DNow! instructions are SIMD: each instruction operates on pairs of 32-bit values.

The definitions for the 3DNow! instructions starting on page 17 contain designations classifying each instruction as vectored or scalar. Vector instructions operate in parallel on two sets of 32-bit, single-precision, floating-point words. Instructions that are labeled as scalar instructions operate on a single set of 32-bit operands (from the low halves of the two 64-bit operands).
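The vector/scalar distinction can be illustrated with a two-lane model. In this C sketch an ordinary multiply stands in for the arithmetic; that choice is an assumption for illustration and does not reproduce any particular 3DNow! instruction's rounding or saturation, nor where a scalar instruction writes its result. The type and function names are hypothetical.

```c
/* Two packed single-precision values: d0 = bits 31..0, d1 = bits 63..32. */
typedef struct {
    float d0;
    float d1;
} f32x2;

/* Vector form: the operation is applied to both lanes in parallel. */
f32x2 vector_mul(f32x2 a, f32x2 b)
{
    f32x2 r = { a.d0 * b.d0, a.d1 * b.d1 };
    return r;
}

/* Scalar form: only the low halves (D0) of the two operands take part. */
float scalar_mul_low(f32x2 a, f32x2 b)
{
    return a.d0 * b.d0;
}
```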

The 3DNow! single-precision, floating-point format is compatible with the IEEE-754 single-precision format. This format comprises a 1-bit sign, an 8-bit biased exponent, and a 23-bit significand with one hidden integer bit for a total of 24 bits in the significand. The bias of the exponent is 127, consistent with the IEEE single-precision standard. The significands are normalized to be within the range of [1,2).

In contrast to the IEEE standard, which dictates four rounding modes, 3DNow! technology supports one rounding mode: either round-to-nearest or round-to-zero (truncation). The hardware implementation of 3DNow! technology determines the rounding mode. The AMD processors implement round-to-nearest mode. Regardless of the rounding mode used, the floating-point-to-integer and integer-to-floating-point conversion instructions, PF2ID and PI2FD, always use the round-to-zero (truncation) mode.
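The effect of round-to-zero in PF2ID can be sketched with C's own float-to-integer conversion, which also truncates toward zero. This is a hypothetical model that assumes both inputs are finite and within `int32_t` range; the behavior for out-of-range inputs is given in the PF2ID numerical range table (Table 5) and is not reproduced here.

```c
#include <stdint.h>

/* Two packed signed 32-bit integers: d0 = low lane, d1 = high lane. */
typedef struct {
    int32_t d0;
    int32_t d1;
} i32x2;

/* Round-to-zero conversion of one lane.
 * Precondition (assumed): x is finite and within int32_t range, because
 * C leaves out-of-range float-to-integer conversion undefined. */
static int32_t to_int_truncate(float x)
{
    return (int32_t)x;  /* C conversion discards the fraction (toward zero) */
}

/* Both lanes are converted independently, as in a vector instruction. */
i32x2 pf2id_model(float d0, float d1)
{
    i32x2 r = { to_int_truncate(d0), to_int_truncate(d1) };
    return r;
}
```

For inputs 2.7 and -2.7 the model returns 2 and -2; a round-to-nearest conversion would instead give 3 and -3.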

The largest representable normal number in magnitude for this precision has, in hexadecimal, an exponent of FEh and a significand of 7FFFFFh, with a numerical value of $2^{127}(2 - 2^{-23})$. All results that overflow above the maximum-representable positive value are saturated to either this maximum-representable normal number or to positive infinity. Similarly, all results that overflow below the minimum-representable negative value are saturated to either this minimum-representable (most negative) normal number or to negative infinity.

The implementation of 3DNow! technology determines how arithmetic overflow is handled: either with properly signed maximum- or minimum-representable normal numbers or with properly signed infinities. The processor generates properly signed maximum- or minimum-representable normal numbers. Infinities and NaNs are not supported as operands to 3DNow! instructions.

The smallest representable normal number in magnitude for this precision has, in hexadecimal, an exponent of 01h and a significand of 000000h, with a numerical value of $2^{-126}$. Accordingly, all results below this minimum representable value in magnitude are held to zero. Table 1 shows the exponent ranges supported by the 3DNow! technology.

Like MMX instructions, 3DNow! instructions do not generate numeric exceptions, nor do they set any status flags. It is the user's responsibility to ensure that in-range data is provided to 3DNow! instructions and that all computations remain within valid ranges (or are held as expected).
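Because no exception or flag signals an out-of-range result, it can help to model the documented behavior when validating data ranges offline. The C sketch below is hypothetical: it assumes the saturating behavior stated above for AMD processors, round-to-nearest for in-range results (C's default rounding mode), and that results below the smallest normal magnitude are held to +0. It takes the exactly computed result as a `double` and does not handle NaN inputs, which 3DNow! does not support.

```c
#include <float.h>

/* Model of how an exactly computed result becomes a 3DNow! single-precision
 * result on a processor that saturates:
 *   |x| > largest normal (FLT_MAX)   -> properly signed FLT_MAX
 *   |x| < smallest normal (FLT_MIN)  -> held to zero
 *   otherwise                        -> rounded to nearest single precision */
float model_3dnow_result(double exact)
{
    double mag = exact < 0.0 ? -exact : exact;
    if (mag > (double)FLT_MAX)
        return exact < 0.0 ? -FLT_MAX : FLT_MAX;
    if (mag < (double)FLT_MIN)
        return 0.0f;
    return (float)exact;
}
```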

Table 1. 3DNow! Technology Exponent Ranges

Biased Exponent   Description
FFh               Unsupported *
00h               Zero
00h


Execution Resources on AMD-K6 Processors

The register operations of all 3DNow! floating-point instructions are executed by either the register X unit or the register Y unit. One operation can be issued to each register unit each clock cycle, for a maximum issue and execution rate of two 3DNow! operations per cycle. All 3DNow! operations have an execution latency of two clock cycles and are fully pipelined.

Even though 3DNow! execution resources are not duplicated in both register units (for example, there are not two pairs of 3DNow! multipliers, just one shared pair of multipliers), there are no instruction-decode or operation-issue pairing restrictions. When, for example, a 3DNow! multiply operation starts execution in a register unit, that unit grabs and uses the one shared pair of 3DNow! multipliers. Only when actual contention occurs between two 3DNow! operations starting execution at the same time is one of the operations held up for one cycle in its first execution pipe stage while the other proceeds. The delay is never more than one cycle.

    For code optimization purposes, 3DNow! operations are grouped into two categories. These categories are based on execution resources and are important when creating properly scheduled code. As long as two 3DNow! operations that start execution simultaneously do not fall into the same category, both operations start execution without delay.

    The first category contains the operations for the following 3DNow! instructions: PFADD, PFSUB, PFSUBR, PFACC, PFCMPx, PFMIN, PFMAX, PI2FD, PF2ID, PFRCP, and PFRSQRT.

    The second category contains the operations for the following 3DNow! instructions: PFMUL, PFRCPIT1, PFRSQIT1, and PFRCPIT2.

    Note: 3DNow! add and multiply operations, among other combinations, can execute simultaneously.
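    The contention rule can be expressed as a lookup: two operations that start execution in the same cycle collide only if they belong to the same category. This minimal sketch assumes that "PFCMPx" stands for PFCMPEQ, PFCMPGE, and PFCMPGT. The function names are hypothetical, and the model covers only the AMD-K6 categories listed above. Operations in neither list (for example, PAVGUSB) are rejected rather than guessed.

```python
# Operation categories for AMD-K6 3DNow! scheduling, as listed in the text.
# Assumption: "PFCMPx" covers PFCMPEQ, PFCMPGE, and PFCMPGT.
CATEGORY_1 = {
    "PFADD", "PFSUB", "PFSUBR", "PFACC", "PFCMPEQ", "PFCMPGE", "PFCMPGT",
    "PFMIN", "PFMAX", "PI2FD", "PF2ID", "PFRCP", "PFRSQRT",
}
CATEGORY_2 = {"PFMUL", "PFRCPIT1", "PFRSQIT1", "PFRCPIT2"}


def category(mnemonic: str) -> int:
    """Return 1 or 2 for a categorized 3DNow! floating-point operation."""
    op = mnemonic.upper()
    if op in CATEGORY_1:
        return 1
    if op in CATEGORY_2:
        return 2
    raise ValueError(f"{op} is not in either 3DNow! operation category")


def start_delay(first: str, second: str) -> int:
    """Cycles one of two operations starting together is held up (0 or 1)."""
    return 1 if category(first) == category(second) else 0
```

For example, an add paired with a multiply starts without delay, while two adds starting in the same cycle cost one cycle. This matches the note that the delay is never more than one cycle.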

    In high-performance 3DNow! code, the 3DNow! instructions are normally scheduled apart from each other to avoid delays caused by execution-resource contention, with dependencies and execution latencies also taken into account.


    For further information regarding code optimization, see the AMD-K6 Processor Code Optimization Application Note, order # 21924. This document provides in-depth discussions of code optimization techniques for the processor.

    For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order # 22007.

    The SIMD 3DNow! instructions for all processors are summarized in Table 2 on page 14. The dedicated and shared execution resources of the register X unit and register Y unit are shown in Figure 5 on page 13. The execution resources for some MMX operations, as well as all 3DNow! operations, are shared between the two register units. For contention-checking purposes, each box in Figure 5 represents a category of operations that cannot start execution simultaneously. In addition, MMX and 3DNow! multiplies use the same hardware, while MMX and 3DNow! adds and subtracts do not.

    The 3DNow! performance-enhancement instructions for all AMD processors are summarized in Table 3 on page 14. The FEMMS instruction does not use any specific execution resource or pipeline. The PREFETCH instruction is executed in the Load unit.


    Figure 5. Register X Unit and Register Y Unit Resources

    [Diagram of the register X and register Y execution pipelines, grouped into dedicated register X resources, dedicated register Y resources, and shared register X and Y resources. Boxes shown: Integer ALU; Integer Shift; Integer Multiply and Divide; Integer Byte Operations; Integer Special Registers; Integer Segment Register Loads; MMX ALU Add/Subtract, Compare; MMX Shifter; MMX ALU Logical, Pack, Unpack; 3DNow! Add/Subtract, Compare, Integer Conversion, Reciprocal and Reciprocal Square Root Table Lookup (shared); MMX and 3DNow! Multiply, Reciprocal and Reciprocal Square Root Iteration (shared).]


    Table 2. 3DNow! Floating-Point Instructions

    Operation   Opcode Suffix   Function
    PAVGUSB     BFh             Packed 8-bit Unsigned Integer Averaging
    PFADD       9Eh             Packed Floating-Point Addition
    PFSUB       9Ah             Packed Floating-Point Subtraction
    PFSUBR      AAh             Packed Floating-Point Reverse Subtraction
    PFACC       AEh             Packed Floating-Point Accumulate
    PFCMPGE     90h             Packed Floating-Point Comparison, Greater or Equal
    PFCMPGT     A0h             Packed Floating-Point Comparison, Greater
    PFCMPEQ     B0h             Packed Floating-Point Comparison, Equal
    PFMIN       94h             Packed Floating-Point Minimum
    PFMAX       A4h             Packed Floating-Point Maximum
    PI2FD       0Dh             Packed 32-bit Integer to Floating-Point Conversion
    PF2ID       1Dh             Packed Floating-Point to 32-bit Integer
    PFRCP       96h             Packed Floating-Point Reciprocal Approximation
    PFRSQRT     97h             Packed Floating-Point Reciprocal Square Root Approximation
    PFMUL       B4h             Packed Floating-Point Multiplication
    PFRCPIT1    A6h             Packed Floating-Point Reciprocal First Iteration Step
    PFRSQIT1    A7h             Packed Floating-Point Reciprocal Square Root First Iteration Step
    PFRCPIT2    B6h             Packed Floating-Point Reciprocal/Reciprocal Square Root Second Iteration Step
    PMULHRW     B7h             Packed 16-bit Integer Multiply with Rounding
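    Table 2 lists only the suffix byte. This sketch assumes the standard x86 ModRM layout, which this section does not describe. Under that assumption, a register-to-register 3DNow! instruction consists of the escape bytes 0Fh 0Fh, a ModRM byte, and finally the suffix from Table 2, placed where an immediate byte would go (the "opcode/imm8" column on the instruction pages). The helper `encode_reg_reg` is hypothetical and handles only the `mmreg1, mmreg2` register form, not memory operands.

```python
# Opcode suffix bytes from Table 2.
SUFFIX = {
    "PAVGUSB": 0xBF, "PFADD": 0x9E, "PFSUB": 0x9A, "PFSUBR": 0xAA,
    "PFACC": 0xAE, "PFCMPGE": 0x90, "PFCMPGT": 0xA0, "PFCMPEQ": 0xB0,
    "PFMIN": 0x94, "PFMAX": 0xA4, "PI2FD": 0x0D, "PF2ID": 0x1D,
    "PFRCP": 0x96, "PFRSQRT": 0x97, "PFMUL": 0xB4, "PFRCPIT1": 0xA6,
    "PFRSQIT1": 0xA7, "PFRCPIT2": 0xB6, "PMULHRW": 0xB7,
}


def encode_reg_reg(mnemonic: str, dst: int, src: int) -> bytes:
    """Encode 'mnemonic mm<dst>, mm<src>' (register form only)."""
    if not (0 <= dst <= 7 and 0 <= src <= 7):
        raise ValueError("MMX register numbers must be 0 to 7")
    modrm = 0xC0 | (dst << 3) | src  # mod = 11b (register), reg = destination, r/m = source
    return bytes([0x0F, 0x0F, modrm, SUFFIX[mnemonic.upper()]])
```

For instance, `encode_reg_reg("PFADD", 0, 1)` yields the bytes 0F 0F C1 9E. The suffix, not the ModRM byte, selects the operation.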

    Table 3. 3DNow! Performance-Enhancement Instructions

    Operation             Opcode Second Byte   Function
    FEMMS                 0Eh                  Faster entry/exit of the MMX or floating-point state
    PREFETCH/PREFETCHW *  0Dh                  Prefetch at least a 32-byte line into L1 data cache (Dcache)

    Note:

    * The AMD-K6-2 and AMD-K6-III processors execute the PREFETCHW instruction identically to the PREFETCH instruction. On the AMD Athlon processor, PREFETCHW can increase performance by providing a hint to the processor of an intent to modify the cache line.


    Task Switching

    With respect to task switching, treat the 3DNow! instructions exactly the same as MMX instructions. Operating system design must be taken into account when writing a 3DNow! program: the programmer must know whether the operating system automatically saves the current register state when task switching, or whether the 3DNow! program has to provide the code to save that state.

    If a task switch occurs, the Task Switch (TS) bit of Control Register 0 (CR0) is set to 1. The processor then generates an interrupt 7 (int 7, Device Not Available) when it encounters the next floating-point, 3DNow!, or MMX instruction, allowing the operating system to save the state of the 3DNow!/MMX/FP registers.

    In a multitasking operating system, if there is a task switch when 3DNow!/MMX applications are running with older applications that do not include MMX instructions, the MMX/FP register state is still saved automatically through the int 7 handler.
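    The text leaves the save strategy to the operating system. As an illustration only, the toy model below mimics one common scheme built on the TS bit. A task switch sets TS. The first FP, MMX, or 3DNow! instruction afterward traps to the int 7 handler. The handler then clears TS, saves the previous owner's registers, and restores the current task's registers. Every class and method name here is hypothetical. A real handler would use instructions such as CLTS and FSAVE/FRSTOR, which the dictionary below only stands in for.

```python
class LazyStateModel:
    """Toy model of TS-based lazy saving of the shared MMX/3DNow!/FP registers.

    Hypothetical throughout: the register file is an opaque Python value and the
    save area is a dict, standing in for what a real int 7 handler does.
    """

    def __init__(self):
        self.ts = 0          # models the CR0.TS bit
        self.current = None  # task that is running now
        self.owner = None    # task whose values are in the registers
        self.regs = None     # live register contents
        self.saved = {}      # per-task saved register images
        self.traps = 0       # number of int 7 (Device Not Available) events

    def task_switch(self, task):
        """A task switch sets TS; no register state is saved yet."""
        self.current = task
        self.ts = 1

    def execute(self, op):
        """Run one FP/MMX/3DNow! instruction, modeled as op(old_regs) -> new_regs."""
        if self.ts:
            self.traps += 1
            self._int7_handler()
        self.regs = op(self.regs)

    def _int7_handler(self):
        self.ts = 0                                  # like CLTS
        if self.owner is not None:
            self.saved[self.owner] = self.regs       # save the previous owner's state
        self.regs = self.saved.get(self.current)     # restore this task's state, if any
        self.owner = self.current
```

In this model, a task that is switched to but never executes such an instruction never traps. No state is saved or restored on its behalf.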

    Exceptions

    Table 4 contains a list of exceptions that 3DNow! and MMX instructions can generate.

    Table 4. 3DNow! and MMX Instruction Exceptions

    Exception  Real  Virtual 8086  Protected  Description
    Invalid opcode (6)  X X X  The emulate instruction bit (EM) of the control register (CR0) is set to 1.
    Device not available (7)  X X X  Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
    Stack exception (12)  X X X  During instruction execution, the stack segment limit was exceeded.
    General protection (13)  X  During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
    Segment overrun (13)  X X  One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
    Page fault (14)  X X  A page fault resulted from the execution of the instruction.
    Floating-point exception pending (16)  X X X  An exception is pending due to the floating-point execution unit.
    Alignment check (17)  X X  An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    The rules for exceptions are the same for both MMX and 3DNow! instructions, and exception detection and handling are identical for both. None of the exception handlers need modification.

    Notes:

    1. An invalid opcode exception (interrupt 6) occurs if a 3DNow! instruction is executed on a processor that does not support 3DNow! instructions.

    2. If a floating-point exception is pending and the processor encounters a 3DNow! instruction, FERR# is asserted and, if CR0.NE = 1, an interrupt 16 is generated. (This is the same for MMX instructions.)

    Prefixes

    The following prefixes can be used with 3DNow! instructions:

    • The segment override prefixes (2Eh/CS, 36h/SS, 3Eh/DS, 26h/ES, 64h/FS, and 65h/GS) affect 3DNow! instructions that contain a memory operand.
    • The address-size override prefix (67h) affects 3DNow! instructions that contain a memory operand.
    • The operand-size override prefix (66h) is ignored.
    • The LOCK prefix (F0h) triggers an invalid opcode exception (interrupt 6).
    • The REP prefixes (F3h/REP/REPE/REPZ, F2h/REPNE/REPNZ) are ignored.
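    The prefix rules amount to a small decision table. The sketch below returns a short description for each listed prefix byte. `InvalidOpcode` and `prefix_effect` are hypothetical names. Treating the segment and address-size overrides as having no effect on register-only forms is an inference from the wording "instructions that contain a memory operand", not a statement from the list itself.

```python
SEGMENT_OVERRIDES = {0x2E: "CS", 0x36: "SS", 0x3E: "DS", 0x26: "ES", 0x64: "FS", 0x65: "GS"}
IGNORED = {0x66, 0xF2, 0xF3}  # operand-size override and REP prefixes


class InvalidOpcode(Exception):
    """Stands in for the invalid opcode exception (interrupt 6)."""


def prefix_effect(prefix: int, has_memory_operand: bool) -> str:
    """Describe how one prefix byte affects a 3DNow! instruction."""
    if prefix == 0xF0:
        raise InvalidOpcode("LOCK prefix (F0h) on a 3DNow! instruction")
    if prefix in IGNORED:
        return "ignored"
    if prefix in SEGMENT_OVERRIDES:
        # Assumption: without a memory operand there is nothing to override.
        return f"segment {SEGMENT_OVERRIDES[prefix]}" if has_memory_operand else "no effect"
    if prefix == 0x67:
        return "address size" if has_memory_operand else "no effect"
    raise ValueError(f"{prefix:02X}h is not a prefix listed for 3DNow! instructions")
```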


    Chapter 2: 3DNow! Instruction Set

    The following 3DNow! instruction definitions are in alphabetical order according to the instruction mnemonics.


    FEMMS

    mnemonic opcode description

    FEMMS 0Fh 0Eh Faster Enter/Exit of the MMX or floating-point state

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    Like the EMMS instruction, the FEMMS instruction can be used to clear the MMX state following the execution of a block of MMX instructions. Because the MMX registers and tag words are shared with the floating-point unit, it is necessary to clear the state before executing floating-point instructions. Unlike with the EMMS instruction, the contents of the MMX/floating-point registers are undefined after an FEMMS instruction is executed. Therefore, the FEMMS instruction offers a faster context switch at the end of an MMX routine where the values in the MMX registers are no longer required. FEMMS can also be used prior to executing MMX instructions when the preceding floating-point register values are no longer required, which facilitates faster context switching.

    Exception  Real  Virtual 8086  Protected  Description
    Invalid opcode (6)  X X X  The emulate MMX instruction bit (EM) of the control register (CR0) is set to 1.
    Device not available (7)  X X X  Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
    Floating-point exception pending (16)  X X X  An exception is pending due to the floating-point execution unit.


    PAVGUSB

    mnemonic opcode/imm8 description

    PAVGUSB mmreg1, mmreg2/mem64 0Fh 0Fh / BFh Average of unsigned packed 8-bit values

    Privilege: None

    Registers Affected: MMX

    Flags Affected: None

    Exceptions Generated:

    The PAVGUSB instruction produces the rounded averages of the eight unsigned 8-bit integer values in the source operand (an MMX register or a 64-bit memory location) and the eight corresponding unsigned 8-bit integer values in the destination operand (an MMX register). It does so by adding each pair of source and destination byte values and then adding 01h to the 9-bit intermediate value. The intermediate value is then divided by 2 (shifted right one place), and the eight unsigned 8-bit results are stored in the MMX register specified as the destination operand.

    The PAVGUSB instruction can be used for pixel averaging in MPEG-2 motion compensation and video scaling operations.

    Exception  Real  Virtual 8086  Protected  Description
    Invalid opcode (6)  X X X  The emulate instruction bit (EM) of the control register (CR0) is set to 1.
    Device not available (7)  X X X  Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
    Stack exception (12)  X  During instruction execution, the stack segment limit was exceeded.
    General protection (13)  X  During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
    Segment overrun (13)  X X  One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
    Page fault (14)  X X  A page fault resulted from the execution of the instruction.
    Floating-point exception pending (16)  X X X  An exception is pending due to the floating-point execution unit.
    Alignment check (17)  X X  An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    Functional Illustration of the PAVGUSB Instruction

    The following list explains the functional illustration of the PAVGUSB instruction:

    • The rounded byte average of FFh and FFh is FFh.
    • The rounded byte average of FFh and 00h is 80h.
    • The rounded byte average of 01h and FFh is also 80h.
    • The rounded byte average of 0Fh and 10h is 10h.
    • The rounded byte average of 00h and 01h is 01h.
    • The rounded byte average of 70h and 44h is 5Ah.
    • The rounded byte average of 07h and F7h is 7Fh.
    • The rounded byte average of 9Ah and A8h is A1h.

    The equations for byte averaging with rounding are as follows:

    • mmreg1[63:56] = (mmreg1[63:56] + mmreg2/mem64[63:56] + 01h)/2
    • mmreg1[55:48] = (mmreg1[55:48] + mmreg2/mem64[55:48] + 01h)/2
    • mmreg1[47:40] = (mmreg1[47:40] + mmreg2/mem64[47:40] + 01h)/2
    • mmreg1[39:32] = (mmreg1[39:32] + mmreg2/mem64[39:32] + 01h)/2
    • mmreg1[31:24] = (mmreg1[31:24] + mmreg2/mem64[31:24] + 01h)/2
    • mmreg1[23:16] = (mmreg1[23:16] + mmreg2/mem64[23:16] + 01h)/2
    • mmreg1[15:8] = (mmreg1[15:8] + mmreg2/mem64[15:8] + 01h)/2
    • mmreg1[7:0] = (mmreg1[7:0] + mmreg2/mem64[7:0] + 01h)/2

    [Figure: per-byte averaging of mmreg1 and mmreg2/mem64 into mmreg1, bits 63 to 0; * marks a value that was rounded up.]

    mmreg1            FFh   00h   FFh   10h   A8h   01h   44h   F7h
    mmreg2/mem64      FFh   FFh   01h   0Fh   9Ah   00h   70h   07h
    mmreg1 (result)   FFh   80h*  80h   10h*  A1h   01h*  5Ah   7Fh
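    The per-byte equations map directly onto a loop over the eight bytes of a 64-bit value. In this sketch (the function name is hypothetical), the sum of two bytes plus 01h is at most 511. That fits the 9-bit intermediate value described above, and the right shift by one performs the division by 2. The check below reuses the byte values from the illustration.

```python
def pavgusb(mmreg1: int, mmreg2: int) -> int:
    """Model PAVGUSB on two 64-bit values: each byte becomes (a + b + 1) >> 1."""
    result = 0
    for shift in range(0, 64, 8):
        a = (mmreg1 >> shift) & 0xFF
        b = (mmreg2 >> shift) & 0xFF
        avg = (a + b + 0x01) >> 1  # 9-bit intermediate, plus 01h, shifted right once
        result |= avg << shift
    return result
```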


    PF2ID

    mnemonic opcode/imm8 description

    PF2ID mmreg1, mmreg2/mem64 0Fh 0Fh / 1Dh Converts packed floating-point operand to packed 32-bit integer

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PF2ID is a vector instruction that converts a vector register containing single-precision, floating-point operands to 32-bit signed integers using truncation. Table 5 on page 22 shows the numerical range of the PF2ID instruction.

    The PF2ID instruction performs the following operations:

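    IF (mmreg2/mem64[31:0] >= 2^31)
    THEN mmreg1[31:0] = 7FFF_FFFFh
    ELSEIF (mmreg2/mem64[31:0] <= -2^31)
    THEN mmreg1[31:0] = 8000_0000h
    ELSE mmreg1[31:0] = int(mmreg2/mem64[31:0])
    IF (mmreg2/mem64[63:32] >= 2^31)
    THEN mmreg1[63:32] = 7FFF_FFFFh
    ELSEIF (mmreg2/mem64[63:32] <= -2^31)
    THEN mmreg1[63:32] = 8000_0000h
    ELSE mmreg1[63:32] = int(mmreg2/mem64[63:32])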
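    A lane-by-lane model makes the truncation and saturation explicit. In this sketch, each value is first rounded to single precision and then truncated toward zero. Values at or above 2^31 saturate to 7FFF_FFFFh, and, as an assumption of the sketch, values at or below -2^31 saturate to 8000_0000h (the int32 minimum). Infinities and NaNs have biased exponent FFh, so they are unsupported operands with undefined results; the sketch rejects them instead of guessing. All names are hypothetical.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pf2id_lane(x: float) -> int:
    """Convert one lane, truncating toward zero and saturating at the int32 limits.

    Returns the 32-bit two's-complement bit pattern stored in the destination.
    """
    x = to_single(x)
    if not math.isfinite(x):
        raise ValueError("unsupported operand: result is undefined")
    if x >= 2.0 ** 31:
        return 0x7FFF_FFFF
    if x <= -(2.0 ** 31):
        return 0x8000_0000          # assumed negative clamp (int32 minimum)
    return math.trunc(x) & 0xFFFF_FFFF


def pf2id(low: float, high: float) -> int:
    """Pack the converted low (bits 31..0) and high (bits 63..32) lanes."""
    return (pf2id_lane(high) << 32) | pf2id_lane(low)
```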


    Related Instructions See the PI2FD instruction.

    Table 5. Numerical Range for the PF2ID Instruction

    Source 2                  Source 1 and Destination
    0                         0
    Normal, abs(Source 1)


    PFACC

    mnemonic opcode/imm8 description

    PFACC mmreg1, mmreg2/mem64 0Fh 0Fh / AEh Floating-point accumulate

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFACC is a vector instruction that accumulates the two doublewords of the destination operand and the two doublewords of the source operand, and stores the results in the low and high doublewords of the destination operand, respectively. Both operands are single-precision, floating-point operands with 24-bit significands. Table 6 on page 24 shows the numerical range of the PFACC instruction.

    The PFACC instruction performs the following operations:

    temp = mmreg2/mem64
    mmreg1[31:0] = mmreg1[31:0] + mmreg1[63:32]
    mmreg1[63:32] = temp[31:0] + temp[63:32]
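    The pseudocode copies the source into temp before writing the destination. Because of that, PFACC gives the intended result even when both operands are the same register. The sketch models each 64-bit operand as a (low, high) pair of floats. It rounds each sum to single precision assuming round-to-nearest, which this section does not specify, and it ignores the range rules of Table 6. Names are hypothetical.

```python
import struct


def to_single(x: float) -> float:
    """Round to the nearest single-precision value (round-to-nearest assumed)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pfacc(dst: tuple, src: tuple) -> tuple:
    """Model PFACC on (low, high) lane pairs; returns the new destination pair."""
    temp = src                            # source is captured before dst is written
    low = to_single(dst[0] + dst[1])      # mmreg1[31:0]  = mmreg1 low + mmreg1 high
    high = to_single(temp[0] + temp[1])   # mmreg1[63:32] = temp low + temp high
    return (low, high)
```

With both arguments equal, `pfacc(v, v)` places the horizontal sum of `v` in both halves.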

    Exception  Real  Virtual 8086  Protected  Description
    Invalid opcode (6)  X X X  The emulate instruction bit (EM) of the control register (CR0) is set to 1.
    Device not available (7)  X X X  Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
    Stack exception (12)  X  During instruction execution, the stack segment limit was exceeded.
    General protection (13)  X  During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
    Segment overrun (13)  X X  One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
    Page fault (14)  X X  A page fault resulted from the execution of the instruction.
    Floating-point exception pending (16)  X X X  An exception is pending due to the floating-point execution unit.
    Alignment check (17)  X X  An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    Table 6. Numerical Range for the PFACC Instruction

                                Source 2
    Source 1 and Destination    0             Normal               Unsupported
    0                           +/-0 [1]      Source 2             Source 2
    Normal                      Source 1      Normal, +/-0 [2]     Undefined
    Unsupported                 Source 1      Undefined            Undefined

    Notes:

    1. The sign of the result is the logical AND of the signs of the source operands.

    2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand that is larger in magnitude (if the magnitudes are equal, the sign of source 1 is used). If the absolute value of the result is greater than or equal to 2^128, the result is the largest normal number with the sign being the sign of the source operand that is larger in magnitude.


    PFADD

    mnemonic opcode/imm8 description

    PFADD mmreg1, mmreg2/mem64 0Fh 0Fh / 9Eh Packed, floating-point addition

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFADD is a vector instruction that performs addition of the destination operand and the source operand. Both operands are single-precision, floating-point operands with 24-bit significands. Table 7 on page 26 shows the numerical range of the PFADD instruction.

    The PFADD instruction performs the following operations:

    mmreg1[31:0] = mmreg1[31:0] + mmreg2/mem64[31:0]
    mmreg1[63:32] = mmreg1[63:32] + mmreg2/mem64[63:32]

    Exception  Real  Virtual 8086  Protected  Description
    Invalid opcode (6)  X X X  The emulate instruction bit (EM) of the control register (CR0) is set to 1.
    Device not available (7)  X X X  Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
    Stack exception (12)  X  During instruction execution, the stack segment limit was exceeded.
    General protection (13)  X  During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
    Segment overrun (13)  X X  One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
    Page fault (14)  X X  A page fault resulted from the execution of the instruction.
    Floating-point exception pending (16)  X X X  An exception is pending due to the floating-point execution unit.
    Alignment check (17)  X X  An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    Table 7. Numerical Range for the PFADD Instruction

                                Source 2
    Source 1 and Destination    0             Normal               Unsupported
    0                           +/-0 [1]      Source 2             Source 2
    Normal                      Source 1      Normal, +/-0 [2]     Undefined
    Unsupported                 Source 1      Undefined            Undefined

    Notes:

    1. The sign of the result is the logical AND of the signs of the source operands.

    2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand that is larger in magnitude (if the magnitudes are equal, the sign of source 1 is used). If the absolute value of the result is greater than or equal to 2^128, the result is the largest normal number with the sign being the sign of the source operand that is larger in magnitude.
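    Table 7 and its notes define more than an ordinary IEEE-754 addition. Results too small to be a normal number flush to zero, and results at or above 2^128 saturate to the largest normal number. The lane model below applies those rules to zero and normal operands only; unsupported operands give undefined results and are not modeled. It assumes round-to-nearest for in-range sums. As a further assumption, it also saturates a sum that rounds past the largest normal number. All names are hypothetical.

```python
import math
import struct

MIN_NORMAL = 2.0 ** -126                           # smallest normal magnitude
LARGEST_NORMAL = (2.0 - 2.0 ** -23) * 2.0 ** 127   # biased exponent FEh, all-ones significand


def to_single(x: float) -> float:
    """Round to the nearest single-precision value (round-to-nearest assumed)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pfadd_lane(s1: float, s2: float) -> float:
    """One lane of PFADD for zero or normal operands, following Table 7 and its notes."""
    if s1 == 0.0 and s2 == 0.0:
        # Note 1: the result is negative only if both zeros are negative.
        both_negative = math.copysign(1.0, s1) < 0 and math.copysign(1.0, s2) < 0
        return -0.0 if both_negative else 0.0
    if s1 == 0.0:
        return s2                                  # 0 + normal = source 2
    if s2 == 0.0:
        return s1                                  # normal + 0 = source 1
    exact = s1 + s2
    # Note 2 uses the sign of the larger-magnitude operand (source 1 on a tie).
    sign = math.copysign(1.0, s1 if abs(s1) >= abs(s2) else s2)
    if abs(exact) < MIN_NORMAL:
        return math.copysign(0.0, sign)            # flush tiny results to zero
    if abs(exact) >= 2.0 ** 128:
        return math.copysign(LARGEST_NORMAL, sign) # saturate large results
    try:
        return to_single(exact)
    except OverflowError:
        # Assumption: a sum that rounds past the largest normal also saturates.
        return math.copysign(LARGEST_NORMAL, sign)


def pfadd(dst: tuple, src: tuple) -> tuple:
    """Add the (low, high) lanes of dst and src pairwise."""
    return (pfadd_lane(dst[0], src[0]), pfadd_lane(dst[1], src[1]))
```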


    PFCMPEQ

    mnemonic opcode/imm8 description

    PFCMPEQ mmreg1, mmreg2/mem64 0Fh 0Fh / B0h Packed floating-point comparison, equal to

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFCMPEQ is a vector instruction that performs a comparison of the destination operand and the source operand and generates all one bits or all zero bits in each 32-bit element, based on the result of the corresponding comparison. Table 8 on page 28 shows the numerical range of the PFCMPEQ instruction.

    The PFCMPEQ instruction performs the following operations:

    IF (mmreg1[31:0] = mmreg2/mem64[31:0])
    THEN mmreg1[31:0] = FFFF_FFFFh
    ELSE mmreg1[31:0] = 0000_0000h
    IF (mmreg1[63:32] = mmreg2/mem64[63:32])
    THEN mmreg1[63:32] = FFFF_FFFFh
    ELSE mmreg1[63:32] = 0000_0000h

    Exception  Real  Virtual 8086  Protected  Description
    Invalid opcode (6)  X X X  The emulate instruction bit (EM) of the control register (CR0) is set to 1.
    Device not available (7)  X X X  Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
    Stack exception (12)  X  During instruction execution, the stack segment limit was exceeded.
    General protection (13)  X  During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
    Segment overrun (13)  X X  One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
    Page fault (14)  X X  A page fault resulted from the execution of the instruction.
    Floating-point exception pending (16)  X X X  An exception is pending due to the floating-point execution unit.
    Alignment check (17)  X X  An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    Related Instructions See the PFCMPGE instruction.

    See the PFCMPGT instruction.

    Table 8. Numerical Range for the PFCMPEQ Instruction

                                Source 2
    Source 1 and Destination    0                 Normal                        Unsupported
    0                           FFFF_FFFFh [1]    0000_0000h                    0000_0000h
    Normal                      0000_0000h        0000_0000h, FFFF_FFFFh [2]    0000_0000h
    Unsupported                 0000_0000h        0000_0000h                    Undefined

    Notes:

    1. Positive zero is equal to negative zero.

    2. The result is FFFF_FFFFh if source 1 and source 2 have identical signs, exponents, and mantissas. Otherwise, the result is 0000_0000h.
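    For zero and normal operands, Table 8 agrees with ordinary floating-point equality. +0 and -0 compare equal (note 1), and any other pair is equal only when sign, exponent, and mantissa all match (note 2). The sketch below produces the per-lane masks under that reading. It uses hypothetical names and does not model unsupported operands.

```python
import struct


def to_single(x: float) -> float:
    """Round to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pfcmpeq(dst: tuple, src: tuple) -> tuple:
    """Per-lane equality masks for zero or normal operands (unsupported not modeled)."""
    # Float == treats +0.0 and -0.0 as equal (note 1); otherwise two single-precision
    # values compare equal only when sign, exponent, and mantissa all match (note 2).
    return tuple(
        0xFFFF_FFFF if to_single(a) == to_single(b) else 0x0000_0000
        for a, b in zip(dst, src)
    )
```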


    PFCMPGE

    mnemonic opcode/imm8 description

    PFCMPGE mmreg1, mmreg2/mem64 0Fh 0Fh / 90h Packed floating-point comparison, greater than or equal to

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFCMPGE is a vector instruction that performs a comparison of the destination operand and the source operand and generates all one bits or all zero bits in each 32-bit element, based on the result of the corresponding comparison. Table 9 on page 30 shows the numerical range of the PFCMPGE instruction.

    The PFCMPGE instruction performs the following operations:

    IF (mmreg1[31:0] >= mmreg2/mem64[31:0])
    THEN mmreg1[31:0] = FFFF_FFFFh
    ELSE mmreg1[31:0] = 0000_0000h
    IF (mmreg1[63:32] >= mmreg2/mem64[63:32])
    THEN mmreg1[63:32] = FFFF_FFFFh
    ELSE mmreg1[63:32] = 0000_0000h

    Exception  Real  Virtual 8086  Protected  Description
    Invalid opcode (6)  X X X  The emulate instruction bit (EM) of the control register (CR0) is set to 1.
    Device not available (7)  X X X  Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
    Stack exception (12)  X  During instruction execution, the stack segment limit was exceeded.
    General protection (13)  X  During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
    Segment overrun (13)  X X  One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
    Page fault (14)  X X  A page fault resulted from the execution of the instruction.
    Floating-point exception pending (16)  X X X  An exception is pending due to the floating-point execution unit.
    Alignment check (17)  X X  An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    Related Instructions See the PFCMPEQ instruction.

    See the PFCMPGT instruction.

    Table 9. Numerical Range for the PFCMPGE Instruction

                                Source 2
    Source 1 and Destination    0                             Normal                        Unsupported
    0                           FFFF_FFFFh [1]                0000_0000h, FFFF_FFFFh [2]    Undefined
    Normal                      0000_0000h, FFFF_FFFFh [3]    0000_0000h, FFFF_FFFFh [4]    Undefined
    Unsupported                 Undefined                     Undefined                     Undefined

    Notes:

    1. Positive zero is equal to negative zero.

    2. The result is FFFF_FFFFh if source 2 is negative. Otherwise, the result is 0000_0000h.

    3. The result is FFFF_FFFFh if source 1 is positive. Otherwise, the result is 0000_0000h.

    4. The result is FFFF_FFFFh if source 1 is positive and source 2 is negative, or if they are both negative and source 1 is smaller than or equal in magnitude to source 2, or if source 1 and source 2 are both positive and source 1 is greater than or equal in magnitude to source 2. The result is 0000_0000h in all other cases.
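    Note 4 spells out "greater than or equal" in sign-and-magnitude terms. The sketch below applies the note literally to nonzero normal operands and then checks it against Python's ordinary `>=` on a few sample pairs. For nonzero values the two agree, which is consistent with the pseudocode's plain `>=`. Function names are hypothetical.

```python
def ge_note4(s1: float, s2: float) -> bool:
    """Table 9, note 4, applied literally to two nonzero normal operands."""
    s1_neg, s2_neg = s1 < 0, s2 < 0
    if not s1_neg and s2_neg:
        return True                   # source 1 positive, source 2 negative
    if s1_neg and s2_neg:
        return abs(s1) <= abs(s2)     # both negative: source 1 not larger in magnitude
    if not s1_neg and not s2_neg:
        return abs(s1) >= abs(s2)     # both positive: source 1 not smaller in magnitude
    return False                      # source 1 negative, source 2 positive


def pfcmpge_lane(s1: float, s2: float) -> int:
    """Mask written to one destination lane."""
    return 0xFFFF_FFFF if ge_note4(s1, s2) else 0x0000_0000
```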


    PFCMPGT

    mnemonic opcode/imm8 description

    PFCMPGT mmreg1, mmreg2/mem64 0Fh 0Fh / A0h Packed floating-point comparison, greater than

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFCMPGT is a vector instruction that performs a comparison of the destination operand and the source operand and generates all one bits or all zero bits in each 32-bit element, based on the result of the corresponding comparison. Table 10 on page 32 shows the numerical range of the PFCMPGT instruction.

    The PFCMPGT instruction performs the following operations:

    IF (mmreg1[31:0] > mmreg2/mem64[31:0])
    THEN mmreg1[31:0] = FFFF_FFFFh
    ELSE mmreg1[31:0] = 0000_0000h
    IF (mmreg1[63:32] > mmreg2/mem64[63:32])
    THEN mmreg1[63:32] = FFFF_FFFFh
    ELSE mmreg1[63:32] = 0000_0000h

    Exception  Real  Virtual 8086  Protected  Description
    Invalid opcode (6)  X X X  The emulate instruction bit (EM) of the control register (CR0) is set to 1.
    Device not available (7)  X X X  Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
    Stack exception (12)  X  During instruction execution, the stack segment limit was exceeded.
    General protection (13)  X  During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
    Segment overrun (13)  X X  One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
    Page fault (14)  X X  A page fault resulted from the execution of the instruction.
    Floating-point exception pending (16)  X X X  An exception is pending due to the floating-point execution unit.
    Alignment check (17)  X X  An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    Related Instructions See the PFCMPEQ instruction.

    See the PFCMPGE instruction.

    Table 10. Numerical Range for the PFCMPGT Instruction

                                Source 2
    Source 1 and Destination    0                             Normal                        Unsupported
    0                           0000_0000h                    0000_0000h, FFFF_FFFFh [1]    Undefined
    Normal                      0000_0000h, FFFF_FFFFh [2]    0000_0000h, FFFF_FFFFh [3]    Undefined
    Unsupported                 Undefined                     Undefined                     Undefined

    Notes:

    1. The result is FFFF_FFFFh if source 2 is negative. Otherwise, the result is 0000_0000h.

    2. The result is FFFF_FFFFh if source 1 is positive. Otherwise, the result is 0000_0000h.

    3. The result is FFFF_FFFFh if source 1 is positive and source 2 is negative, or if they are both negative and source 1 is smaller in magnitude than source 2, or if source 1 and source 2 are both positive and source 1 is greater in magnitude than source 2. The result is 0000_0000h in all other cases.
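    For zero and normal operands, the entries of Table 10 agree with an ordinary strict greater-than comparison. Two zeros never compare greater, a zero is greater than source 2 only when source 2 is negative (note 1), and source 1 is greater than zero only when it is positive (note 2). This sketch uses a hypothetical name and does not model unsupported operands.

```python
def pfcmpgt(dst: tuple, src: tuple) -> tuple:
    """Per-lane greater-than masks for zero or normal operands (unsupported not modeled)."""
    # Ordinary float > reproduces Table 10 for these operands: +0 and -0 are equal,
    # so two zeros never compare greater; 0 > source 2 only if source 2 is negative
    # (note 1); source 1 > 0 only if source 1 is positive (note 2).
    return tuple(0xFFFF_FFFF if a > b else 0x0000_0000 for a, b in zip(dst, src))
```

Table 2 lists no separate less-than comparison. In this model, swapping the arguments (`pfcmpgt(src, dst)`) yields a less-than mask.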


    PFMAX

    mnemonic opcode/imm8 description

    PFMAX mmreg1, mmreg2/mem64 0Fh 0Fh / A4h Packed floating-point maximum

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFMAX is a vector instruction that returns, for each of its two elements, the larger of the two single-precision floating-point operands. Any operation with a zero and a negative number returns positive zero, and an operation on two zeros also returns positive zero. Table 11 on page 34 shows the numerical range of the PFMAX instruction.

    The PFMAX instruction performs the following operations:

    IF (mmreg1[31:0] > mmreg2/mem64[31:0])
    THEN mmreg1[31:0] = mmreg1[31:0]
    ELSE mmreg1[31:0] = mmreg2/mem64[31:0]
    IF (mmreg1[63:32] > mmreg2/mem64[63:32])
    THEN mmreg1[63:32] = mmreg1[63:32]
    ELSE mmreg1[63:32] = mmreg2/mem64[63:32]
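    The operations above translate directly into C if each 32-bit half of the MMX register is modelled as a float. This is a minimal sketch, not AMD's implementation. The names pfmax_lane and pfmax are hypothetical, the asserts reject operand classes for which Table 11 gives an undefined result, and the positive-zero rule from the description is applied after the comparison. That last step matters because the pseudocode alone would keep -0 when, for example, mmreg1 holds -0 and the other operand is negative.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of one 32-bit element of PFMAX. */
static float pfmax_lane(float src1, float src2) {
    assert(isnormal(src1) || src1 == 0.0f);   /* other inputs: undefined (Table 11) */
    assert(isnormal(src2) || src2 == 0.0f);
    float r = (src1 > src2) ? src1 : src2;    /* pseudocode: keep mmreg1 if larger */
    /* A zero result only arises from zero vs. negative or zero vs. zero,
       and the text says both cases return positive zero. */
    return (r == 0.0f) ? 0.0f : r;
}

/* Hypothetical packed form: element 0 is bits [31:0], element 1 is bits [63:32]. */
static void pfmax(float mmreg1[2], const float mmreg2[2]) {
    for (int i = 0; i < 2; i++)
        mmreg1[i] = pfmax_lane(mmreg1[i], mmreg2[i]);
}
```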

    Exception Real Virtual 8086 Protected Description

    Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.

    Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.

    Stack exception (12) X During instruction execution, the stack segment limit was exceeded.

    General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.

    Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.

    Page fault (14) X X A page fault resulted from the execution of the instruction.

    Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.

    Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    Related Instructions See the PFMIN instruction.

    Table 11. Numerical Range for the PFMAX Instruction

    Source 1 and Destination (rows) versus Source 2 (columns):

                         0                   Normal                   Unsupported
    0                +0                  Source 2, +0 (1)         Undefined
    Normal           Source 1, +0 (2)    Source 1/Source 2 (3)    Undefined
    Unsupported      Undefined           Undefined                Undefined

    Notes:

    1. The result is source 2 if source 2 is positive. Otherwise, the result is positive zero.

    2. The result is source 1 if source 1 is positive. Otherwise, the result is positive zero.

    3. The result is source 1 if source 1 is positive and source 2 is negative, if both are positive and source 1 is greater in magnitude than source 2, or if both are negative and source 1 is smaller in magnitude than source 2. The result is source 2 in all other cases.


    PFMIN

    mnemonic opcode/imm8 description

    PFMIN mmreg1, mmreg2/mem64 0Fh 0Fh / 94h Packed floating-point minimum

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFMIN is a vector instruction that returns, for each of its two elements, the smaller of the two single-precision floating-point operands. Any operation with a zero and a positive number returns positive zero, and an operation on two zeros also returns positive zero. Table 12 on page 36 shows the numerical range of the PFMIN instruction.

    The PFMIN instruction performs the following operations:

    IF (mmreg1[31:0] < mmreg2/mem64[31:0])
    THEN mmreg1[31:0] = mmreg1[31:0]
    ELSE mmreg1[31:0] = mmreg2/mem64[31:0]
    IF (mmreg1[63:32] < mmreg2/mem64[63:32])
    THEN mmreg1[63:32] = mmreg1[63:32]
    ELSE mmreg1[63:32] = mmreg2/mem64[63:32]
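    A matching C sketch for PFMIN, under the same assumptions as a PFMAX model: each 32-bit half is a float, pfmin_lane and pfmin are hypothetical names, unsupported operands are rejected with assert, and the positive-zero rule is applied after the comparison. Without that rule, the pseudocode would keep -0 when mmreg1 holds -0 and the other operand is positive.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of one 32-bit element of PFMIN. */
static float pfmin_lane(float src1, float src2) {
    assert(isnormal(src1) || src1 == 0.0f);   /* other inputs: undefined (Table 12) */
    assert(isnormal(src2) || src2 == 0.0f);
    float r = (src1 < src2) ? src1 : src2;    /* pseudocode: keep mmreg1 if smaller */
    /* A zero result only arises from zero vs. positive or zero vs. zero,
       and the text says both cases return positive zero. */
    return (r == 0.0f) ? 0.0f : r;
}

/* Hypothetical packed form: element 0 is bits [31:0], element 1 is bits [63:32]. */
static void pfmin(float mmreg1[2], const float mmreg2[2]) {
    for (int i = 0; i < 2; i++)
        mmreg1[i] = pfmin_lane(mmreg1[i], mmreg2[i]);
}
```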

    Exception Real Virtual 8086 Protected Description

    Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.

    Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.

    Stack exception (12) X During instruction execution, the stack segment limit was exceeded.

    General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.

    Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.

    Page fault (14) X X A page fault resulted from the execution of the instruction.

    Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.

    Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    Related Instructions See the PFMAX instruction.

    Table 12. Numerical Range for the PFMIN Instruction

    Source 1 and Destination (rows) versus Source 2 (columns):

                         0                   Normal                   Unsupported
    0                +0                  Source 2, +0 (1)         Undefined
    Normal           Source 1, +0 (2)    Source 1/Source 2 (3)    Undefined
    Unsupported      Undefined           Undefined                Undefined

    Notes:

    1. The result is source 2 if source 2 is negative. Otherwise, the result is positive zero.

    2. The result is source 1 if source 1 is negative. Otherwise, the result is positive zero.

    3. The result is source 1 if source 1 is negative and source 2 is positive, if both are negative and source 1 is greater in magnitude than source 2, or if both are positive and source 1 is smaller in magnitude than source 2. The result is source 2 in all other cases.


    PFMUL

    mnemonic opcode/imm8 description

    PFMUL mmreg1, mmreg2/mem64 0Fh 0Fh / B4h Packed floating-point multiplication

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFMUL is a vector instruction that multiplies the destination operand by the source operand, element by element. Both operands are single-precision floating-point values with 24-bit significands. Table 13 on page 38 shows the numerical range of the PFMUL instruction.

    The PFMUL instruction performs the following operations:

    mmreg1[31:0] = mmreg1[31:0] * mmreg2/mem64[31:0]
    mmreg1[63:32] = mmreg1[63:32] * mmreg2/mem64[63:32]

    Exception Real Virtual 8086 Protected Description

    Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.

    Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.

    Stack exception (12) X During instruction execution, the stack segment limit was exceeded.

    General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.

    Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.

    Page fault (14) X X A page fault resulted from the execution of the instruction.

    Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.

    Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    Table 13. Numerical Range for the PFMUL Instruction

    Source 1 and Destination (rows) versus Source 2 (columns):

                         0           Normal              Unsupported
    0                +/-0 (1)    +/-0 (1)            +/-0 (1)
    Normal           +/-0 (1)    Normal, +/-0 (2)    Undefined
    Unsupported      +/-0 (1)    Undefined           Undefined

    Notes:

    1. The sign of the result is the exclusive-OR of the signs of the source operands.

    2. If the absolute value of the result is less than 2^-126, the result is zero, with its sign being the exclusive-OR of the signs of the source operands. If the absolute value of the product is greater than or equal to 2^128, the result is the largest normal number, with its sign being the exclusive-OR of the signs of the source operands.
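    A minimal C sketch of one PFMUL element that follows the pseudocode and applies the special cases in Table 13: a zero operand yields a zero whose sign is the exclusive-OR of the operand signs, results below 2^-126 in magnitude flush to a signed zero, and results too large for a float saturate to the largest normal number. The names pfmul_lane and pfmul are hypothetical, and round-to-nearest (C's default) is an assumption, since this section does not state the rounding mode the hardware uses.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical model of one 32-bit element of PFMUL (pseudocode plus Table 13).
   Assumes round-to-nearest, which this section does not specify. */
static float pfmul_lane(float src1, float src2) {
    assert(isnormal(src1) || src1 == 0.0f);   /* other inputs: undefined */
    assert(isnormal(src2) || src2 == 0.0f);
    /* Two 24-bit significands multiply exactly into a double's 53 bits. */
    double p = (double)src1 * (double)src2;
    int neg = signbit(p) != 0;                /* sign = XOR of the operand signs */
    if (p > FLT_MAX || p < -FLT_MAX)          /* too large: largest normal (note 2) */
        return neg ? -FLT_MAX : FLT_MAX;
    float r = (float)p;
    if (r > -FLT_MIN && r < FLT_MIN)          /* |result| < 2^-126: signed zero */
        return neg ? -0.0f : 0.0f;
    return r;
}

/* Hypothetical packed form: each half of mmreg1 times the matching half of the source. */
static void pfmul(float mmreg1[2], const float mmreg2[2]) {
    for (int i = 0; i < 2; i++)
        mmreg1[i] = pfmul_lane(mmreg1[i], mmreg2[i]);
}
```

    Computing the product in double first means the overflow and underflow checks see the exact product rather than an already rounded or infinite float.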


    PFRCP

    mnemonic opcode/imm8 description

    PFRCP mmreg1, mmreg2/mem64 0Fh 0Fh / 96h Floating-point reciprocal approximation

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFRCP is a scalar instruction that returns a low-precision estimate of the reciprocal of the source operand (the low-order 32 bits of mmreg2/mem64). The single result value is duplicated in both the high and low halves of the instruction's 64-bit result. The source operand is single-precision with a 24-bit significand, and the result is accurate to 14 bits. Table 14 on page 40 shows the numerical range of the PFRCP instruction.

    Increasing the accuracy to the full 24 bits of a single-precision significand requires two additional instructions, PFRCPIT1 and PFRCPIT2. The first refinement step (PFRCPIT1) requires that both the input and the output of the previously executed PFRCP instruction be used as its inputs. Refer to "Division and Square Root" on page 59 for an application-specific example of how to use this instruction and related instructions.

    The PFRCP instruction performs the following operations:

    mmreg1[31:0] = reciprocal(mmreg2/mem64[31:0])
    mmreg1[63:32] = reciprocal(mmreg2/mem64[31:0])

    Exception Real Virtual 8086 Protected Description

    Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.

    Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.

    Stack exception (12) X During instruction execution, the stack segment limit was exceeded.

    General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.

    Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.

    Page fault (14) X X A page fault resulted from the execution of the instruction.

    Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.

    Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    In the following code example, the step X0 = PFRCP(b) illustrates the PFRCP instruction in a sequence used to compute q = a/b accurate to 24 bits:

    X0 = PFRCP(b)
    X1 = PFRCPIT1(b,X0)
    X2 = PFRCPIT2(X1,X0)
    q = PFMUL(a,X2)
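    A minimal C sketch of the q = a/b sequence, useful for checking the accuracy claims numerically. It is not a hardware emulator. model_pfrcp is a hypothetical stand-in that reproduces only the stated 14-bit accuracy (by truncating the float reciprocal to 14 significant bits) and the special cases of Table 14. model_refine_rcp models only the combined effect of PFRCPIT1 and PFRCPIT2, assumed here to be the standard Newton-Raphson update x0 * (2 - b * x0). All function names are illustrative.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PFRCP on one value: the float reciprocal with its
   significand truncated to 14 bits. Real hardware uses its own estimate; only
   the stated ~14-bit accuracy and the Table 14 special cases are modelled. */
static float model_pfrcp(float b) {
    assert(isnormal(b) || b == 0.0f);          /* other inputs: undefined */
    if (b == 0.0f)                             /* Table 14, note 1 */
        return signbit(b) ? -FLT_MAX : FLT_MAX;
    float r = 1.0f / b;
    if (r > -FLT_MIN && r < FLT_MIN)           /* Table 14, note 2: |r| < 2^-126 */
        return signbit(r) ? -0.0f : 0.0f;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= ~UINT32_C(0x3FF);                  /* keep 1 implicit + 13 stored bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* PFRCP is scalar: it reads bits [31:0] only and writes the estimate to both halves. */
static void model_pfrcp_packed(float dst[2], const float src[2]) {
    float r = model_pfrcp(src[0]);
    dst[0] = r;
    dst[1] = r;
}

/* Assumed combined effect of PFRCPIT1 then PFRCPIT2: one Newton-Raphson step. */
static float model_refine_rcp(float b, float x0) {
    return x0 * (2.0f - b * x0);
}

/* q = a / b following the four-step sequence above. */
static float model_divide(float a, float b) {
    float x0 = model_pfrcp(b);                 /* X0 = PFRCP(b) */
    float x2 = model_refine_rcp(b, x0);        /* X1 = PFRCPIT1(b,X0); X2 = PFRCPIT2(X1,X0) */
    return a * x2;                             /* q = PFMUL(a,X2) */
}
```

    With b = 3, the truncated estimate is off by about 2^-14 in relative terms, while the refined reciprocal lands within one float unit in the last place of 1/3, consistent with the 24-bit claim.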

    Related Instructions See the PFRCPIT1 instruction.

    See the PFRCPIT2 instruction.

    Table 14. Numerical Range for the PFRCP Instruction

    Source 2         Source 1 and Destination
    0                +/- Maximum Normal (1)
    Normal           Normal, +/-0 (2)
    Unsupported      Undefined

    Notes:

    1. The result has the same sign as the source operand.

    2. If the absolute value of the result is less than 2^-126, the result is zero with the sign of the source operand. Otherwise, the result is a normal number with the same sign as the source operand.


    PFRCPIT1

    mnemonic opcode/imm8 description

    PFRCPIT1 mmreg1, mmreg2/mem64 0Fh 0Fh / A6h Packed floating-point reciprocal, first iteration step

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFRCPIT1 is a vector instruction that performs the first intermediate step in the Newton-Raphson iteration that refines the reciprocal approximation produced by the PFRCP instruction. The second and final step completes the iteration and is accurate to 24 bits. Table 15 on page 42 shows the numerical range of the PFRCPIT1 instruction.

    The behavior of this instruction is defined only for operand combinations in which one source operand was the input to the PFRCP instruction and the other source operand was the output of that same PFRCP instruction. Refer to "Division and Square Root" on page 59 for an application-specific example of how to use this instruction and related instructions.
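    For reference, the Newton-Raphson step that PFRCPIT1 begins and PFRCPIT2 completes can be written out for the reciprocal 1/b. The derivation is standard, not taken from this manual, and shows why a single refinement of a 14-bit estimate is enough for 24 bits. It assumes that "accurate to 14 bits" means a relative error of at most 2^-14, and it shows only the combined update, because this section does not define the intermediate value PFRCPIT1 returns.

```latex
\begin{aligned}
f(x) &= \frac{1}{x} - b, \qquad f'(x) = -\frac{1}{x^{2}} \\
x_{1} &= x_{0} - \frac{f(x_{0})}{f'(x_{0})} = x_{0} + x_{0}^{2}\left(\frac{1}{x_{0}} - b\right) = x_{0}\,(2 - b\,x_{0}) \\
b\,x_{0} = 1 - \varepsilon \;&\Rightarrow\; b\,x_{1} = (1 - \varepsilon)(1 + \varepsilon) = 1 - \varepsilon^{2} \\
|\varepsilon| \le 2^{-14} \;&\Rightarrow\; |\varepsilon^{2}| \le 2^{-28} < 2^{-24}
\end{aligned}
```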

    Exception Real Virtual 8086 Protected Description

    Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.

    Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.

    Stack exception (12) X During instruction execution, the stack segment limit was exceeded.

    General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.

    Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.

    Page fault (14) X X A page fault resulted from the execution of the instruction.

    Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.

    Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    In the following code example, the step X1 = PFRCPIT1(b,X0) illustrates the PFRCPIT1 instruction in a sequence used to compute q = a/b accurate to 24 bits:

    X0 = PFRCP(b)
    X1 = PFRCPIT1(b,X0)
    X2 = PFRCPIT2(X1,X0)
    q = PFMUL(a,X2)

    Related Instructions See the PFRCP instruction.

    See the PFRCPIT2 instruction.

    Table 15. Numerical Range for the PFRCPIT1 Instruction

    Source 1 and Destination (rows) versus Source 2 (columns):

                         0           Normal        Unsupported
    0                +/-0 (1)    +/-0 (1)      +/-0 (1)
    Normal           +/-0 (1)    Normal (2)    Undefined
    Unsupported      +/-0 (1)    Undefined     Undefined

    Notes:

    1. The sign of the result is the exclusive-OR of the signs of the source operands.

    2. The sign is positive.


    PFRCPIT2

    mnemonic opcode/imm8 description

    PFRCPIT2 mmreg1, mmreg2/mem64 0Fh 0Fh / B6h Packed floating-point reciprocal/reciprocal square root, second iteration step

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFRCPIT2 is a vector instruction that performs the second and final intermediate step in the Newton-Raphson iteration that refines the reciprocal or reciprocal square root approximation produced by the PFRCP or PFRSQRT instruction, respectively. Table 16 on page 44 shows the numerical range of the PFRCPIT2 instruction.

    The behavior of this instruction is defined only for operand combinations in which the first source operand (mmreg1) was the output of either the PFRCPIT1 or the PFRSQIT1 instruction and the second source operand (mmreg2/mem64) was the output of either the PFRCP or the PFRSQRT instruction. Refer to "Division and Square Root" on page 59 for an application-specific example of how to use this instruction and related instructions.
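    Because PFRCPIT2 also finishes the reciprocal square root refinement, it helps to see that update written out. Applying the standard Newton-Raphson method to f(y) = 1/y^2 - b gives the following; the derivation is textbook material rather than something stated in this manual, and how the update is divided between PFRSQIT1 and PFRCPIT2 is not specified in this section. The b y0^2 term is the reason PFRSQIT1 takes the square of the PFRSQRT output as an operand.

```latex
\begin{aligned}
f(y) &= \frac{1}{y^{2}} - b, \qquad f'(y) = -\frac{2}{y^{3}} \\
y_{1} &= y_{0} - \frac{f(y_{0})}{f'(y_{0})} = y_{0} + \frac{y_{0}^{3}}{2}\left(\frac{1}{y_{0}^{2}} - b\right) = \frac{y_{0}}{2}\left(3 - b\,y_{0}^{2}\right) \\
b\,y_{0}^{2} = 1 - \varepsilon \;&\Rightarrow\; y_{1}\sqrt{b} = \sqrt{1 - \varepsilon}\,\left(1 + \tfrac{\varepsilon}{2}\right) = 1 - \tfrac{3}{8}\,\varepsilon^{2} + O(\varepsilon^{3})
\end{aligned}
```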

    Exception Real Virtual 8086 Protected Description

    Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.

    Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.

    Stack exception (12) X During instruction execution, the stack segment limit was exceeded.

    General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.

    Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.

    Page fault (14) X X A page fault resulted from the execution of the instruction.

    Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.

    Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)


    In the following code example, the step X2 = PFRCPIT2(X1,X0) illustrates the PFRCPIT2 instruction in a sequence used to compute q = a/b accurate to 24 bits:

    X0 = PFRCP(b)
    X1 = PFRCPIT1(b,X0)
    X2 = PFRCPIT2(X1,X0)
    q = PFMUL(a,X2)

    Related Instructions See the PFRCPIT1 instruction.

    See the PFRSQIT1 instruction.

    See the PFRCP instruction.

    See the PFRSQRT instruction.

    Table 16. Numerical Range for the PFRCPIT2 Instruction

    Source 1 and Destination (rows) versus Source 2 (columns):

                         0           Normal              Unsupported
    0                +/-0 (1)    +/-0 (1)            +/-0 (1)
    Normal           +/-0 (1)    Normal, +/-0 (2)    Undefined
    Unsupported      +/-0 (1)    Undefined           Undefined

    Notes:

    1. The sign of the result is the exclusive-OR of the signs of the source operands.

    2. If the absolute value of the result is less than 2^-126, the result is zero, with its sign being the exclusive-OR of the signs of the source operands. If the absolute value of the product is greater than or equal to 2^128, the result is the largest normal number, with its sign being the exclusive-OR of the signs of the source operands.


    PFRSQIT1

    mnemonic opcode/imm8 description

    PFRSQIT1 mmreg1, mmreg2/mem64 0Fh 0Fh / A7h Packed floating-point reciprocal square root, first iteration step

    Privilege: none

    Registers Affected: MMX

    Flags Affected: none

    Exceptions Generated:

    PFRSQIT1 is a vector instruction that performs the first intermediate step in the Newton-Raphson iteration that refines the reciprocal square root approximation produced by the PFRSQRT instruction. The second and final step completes the iteration and is accurate to 24 bits. Table 17 on page 46 shows the numerical range of the PFRSQIT1 instruction.

    The behavior of this instruction is defined only for operand combinations in which one source operand was the input to the PFRSQRT instruction and the other source operand is the square of the output of that same PFRSQRT instruction. Refer to "Division and Square Root" on page 59 for an application-specific example of how to use this instruction and related instructions.
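    The operand rules for PFRSQIT1 and PFRCPIT2 imply a reciprocal square root sequence of the form X1 = PFMUL(X0,X0), X2 = PFRSQIT1(b,X1), X3 = PFRCPIT2(X2,X0), where X0 is the PFRSQRT output. The C sketch below illustrates how one shared second step can serve both refinements. Be aware that the intermediate formats are entirely hypothetical: this section does not say what PFRCPIT1 or PFRSQIT1 return, and the hyp_* models were chosen only so that the final results equal the textbook Newton-Raphson updates. The function names are illustrative, and the sketch should not be read as a description of the hardware.

```c
#include <assert.h>

/* HYPOTHETICAL intermediate formats, chosen only so that a shared second
   step reproduces the textbook Newton-Raphson updates. Not documented here. */
static float hyp_pfrcpit1(float b, float x0) {      /* b: PFRCP input, x0: its output */
    return 1.0f - b * x0;
}

static float hyp_pfrsqit1(float b, float x0_sq) {   /* b: PFRSQRT input, x0_sq: output squared */
    return 0.5f * (1.0f - b * x0_sq);
}

/* Shared second step: first operand from the *IT1 step, second from the estimate. */
static float hyp_pfrcpit2(float it1, float x0) {
    return x0 + x0 * it1;
}

/* X1 = PFRCPIT1(b,X0); X2 = PFRCPIT2(X1,X0), which equals x0 * (2 - b*x0). */
static float refine_rcp(float b, float x0) {
    return hyp_pfrcpit2(hyp_pfrcpit1(b, x0), x0);
}

/* X1 = PFMUL(X0,X0); X2 = PFRSQIT1(b,X1); X3 = PFRCPIT2(X2,X0),
   which equals x0 * (3 - b*x0*x0) / 2. */
static float refine_rsqrt(float b, float x0) {
    assert(b > 0.0f && x0 > 0.0f);
    float x1 = x0 * x0;
    float x2 = hyp_pfrsqit1(b, x1);
    return hyp_pfrcpit2(x2, x0);
}
```

    For example, refining the deliberately rough estimate 0.49 of 1/sqrt(4) once gives about 0.4997, and a second pass brings it within about 1e-6 of 0.5, matching the error-squaring behavior of Newton-Raphson.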

    Exception Real Virtual 8086 Protected Description

    Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.

    Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.

    Stack exception (12) X During instruction execution, the stack segment limit was exceeded.

    General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.

    Segment overrun (13) X X One

