3DNow!™ Technology Manual
Trademarks

AMD, the AMD logo, K6, 3DNow!, AMD Athlon, and combinations thereof, and K86 are trademarks, and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc.

MMX is a trademark of Intel Corporation.

Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

© 2000 Advanced Micro Devices, Inc. All rights reserved.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. (AMD) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.
3DNow! Technology Manual, 21928G/0, March 2000
Contents

Revision History
1 3DNow! Technology
    Introduction
    Key Functionality
    Feature Detection
    Register Set
    Data Types
    3DNow! Instruction Formats
    Definitions
        Execution Resources on AMD-K6 Processors
        Task Switching
        Exceptions
        Prefixes
2 3DNow! Instruction Set
    FEMMS
    PAVGUSB
    PF2ID
    PFACC
    PFADD
    PFCMPEQ
    PFCMPGE
    PFCMPGT
    PFMAX
    PFMIN
    PFMUL
    PFRCP
    PFRCPIT1
    PFRCPIT2
    PFRSQIT1
    PFRSQRT
    PFSUB
    PFSUBR
    PI2FD
    PMULHRW
    PREFETCH/PREFETCHW
3 Division and Square Root
    Division
        Divide Examples
    Square Root
        Square Root Examples
List of Figures

Figure 1. 3DNow!/MMX Registers
Figure 2. 3DNow! Data Type
Figure 3. Single-Precision, Floating-Point Data Format
Figure 4. Integer Data Types
Figure 5. Register X Unit and Register Y Unit Resources
List of Tables

Table 1. 3DNow! Technology Exponent Ranges
Table 2. 3DNow! Floating-Point Instructions
Table 3. 3DNow! Performance-Enhancement Instructions
Table 4. 3DNow! and MMX Instruction Exceptions
Table 5. Numerical Range for the PF2ID Instruction
Table 6. Numerical Range for the PFACC Instruction
Table 7. Numerical Range for the PFADD Instruction
Table 8. Numerical Range for the PFCMPEQ Instruction
Table 9. Numerical Range for the PFCMPGE Instruction
Table 10. Numerical Range for the PFCMPGT Instruction
Table 11. Numerical Range for the PFMAX Instruction
Table 12. Numerical Range for the PFMIN Instruction
Table 13. Numerical Range for the PFMUL Instruction
Table 14. Numerical Range for the PFRCP Instruction
Table 15. Numerical Range for the PFRCPIT1 Instruction
Table 16. Numerical Range for the PFRCPIT2 Instruction
Table 17. Numerical Range for the PFRSQIT1 Instruction
Table 18. Numerical Range for the PFRSQRT Instruction
Table 19. Numerical Range for the PFSUB Instruction
Table 20. Numerical Range for the PFSUBR Instruction
Table 21. Summary of PREFETCH Instruction Type Options
Revision History

Date       Rev  Description
Feb 1998   A    Initial Release
Feb 1998   B    Clarified CPUID usage in Feature Detection on page 3.
May 1998   C    Revised description of 3DNow! instructions in Definitions on page 9.
May 1998   C    Revised function descriptions in Table 2, 3DNow! Floating-Point Instructions, on page 14.
Sept 1998  D    Revised code example for the PFRSQRT instruction on page 48.
Sept 1998  D    Changed exceptions generated for the PREFETCH/PREFETCHW instructions to none, deleted exception table, and revised PREFETCHW description on page 56.
Sept 1998  D    Added PUNPCKLDQ instruction to the division example (24-bit precision) on page 60.
Nov 1998   E    Added sample code that tests for the presence of extended function 8000_0001h on page 3.
Nov 1998   E    Clarified instruction descriptions of PFRCPIT1 on page 41, PFRCPIT2 on page 43, and PFRSQIT1 on page 45.
Nov 1998   E    Added PUNPCKLDQ instruction and clarified comments to the square root examples on page 62.
Aug 1999   F    Changed X variable to Z in Newton-Raphson recurrence definitions, and swapped order of PFMUL and PUNPCKLDQ instructions in square root example (24-bit precision) in Chapter 3 on page 59.
Aug 1999   F    Added references to the AMD Athlon processor throughout the manual.
Mar 2000   G    Updated and clarified the PFACC instruction operation description on page 23.
1 3DNow! Technology
Introduction
3DNow! technology is a significant innovation to the x86 architecture that drives today's personal computers. 3DNow! technology is a group of new instructions that opens the traditional processing bottlenecks for floating-point-intensive and multimedia applications. With 3DNow! technology, hardware and software applications can implement more powerful solutions to create a more entertaining and productive PC platform. Examples of the type of improvements that 3DNow! technology enables are faster frame rates on high-resolution scenes, much better physical modeling of real-world environments, sharper and more detailed 3D imaging, smoother video playback, and near theater-quality audio.

AMD has taken a leadership role in developing these new instructions that enable exciting new levels of performance and realism. 3DNow! technology was defined and implemented in collaboration with independent software developers, including operating system designers, application developers, and graphics vendors. It is compatible with today's existing x86 software and requires no operating system support, thereby enabling 3DNow! applications to work with all existing operating systems. 3DNow! technology is implemented on the AMD-K6-2, AMD-K6-III, and AMD Athlon processors. The AMD Athlon processor implements five new 3DNow! technology instructions that add streaming and digital signal processing (DSP) technologies. For more information, see the AMD Extensions to the 3DNow! and MMX Instruction Sets Manual, order # 22466.
Key Functionality
The 3DNow! technology instructions are intended to open a major processing bottleneck in a 3D graphics application: floating-point operations. Today's 3D applications are limited because the most advanced x86 processors have only one floating-point execution unit. The front end of a typical 3D graphics software pipeline performs object physics, geometry transformations, clipping, and lighting calculations. These computations are very floating-point intensive and often limit the features and functionality of a 3D application. The performance of the 3DNow! instructions comes from their single instruction multiple data (SIMD) implementation. With SIMD, each instruction operates on two single-precision, floating-point operands, and the microarchitecture within the processor can execute up to two 3DNow! instructions per clock through two register execution pipelines, which allows for a total of four floating-point operations per clock. In addition, because the 3DNow! instructions use the same floating-point registers as the MMX technology instructions, task switching between MMX and 3DNow! operations is eliminated.

The 3DNow! technology instruction set contains 21 instructions that support SIMD floating-point operations and includes SIMD integer operations, data prefetching, and faster MMX-to-floating-point switching. To improve MPEG decoding, the 3DNow! instructions include a specific SIMD integer instruction created to facilitate pixel-motion compensation. Because media-based software typically operates on large data sets, the processor often needs to wait for this data to be transferred from main memory. The extra time involved with retrieving this data can be avoided by using the new 3DNow! instruction called PREFETCH. This instruction can ensure that data is in the level 1 cache when it is needed. To improve the time it takes to switch between MMX and x87 code, the 3DNow! instructions include the FEMMS (fast entry/exit multimedia state) instruction, which eliminates much of the overhead involved with the switch. The addition of 3DNow! technology expands the capabilities of the AMD family of processors and enables a new generation of enriched user applications.
Feature Detection
To properly identify and use the 3DNow! instructions, the application program must determine if the processor supports them. The CPUID instruction gives programmers the ability to determine the presence of 3DNow! technology on a processor. Software applications must first test to see if the CPUID instruction is supported. For a detailed description of the CPUID instruction, see the AMD Processor Recognition Application Note, order # 20734.

The presence of the CPUID instruction is indicated by the ID bit (21) in the EFLAGS register. If this bit is writable, the CPUID instruction is supported. The following code sample shows how to test for the presence of the CPUID instruction.

    pushfd                  ; save EFLAGS
    pop     eax             ; store EFLAGS in EAX
    mov     ebx, eax        ; save in EBX for later testing
    xor     eax, 00200000h  ; toggle bit 21
    push    eax             ; put to stack
    popfd                   ; save changed EAX to EFLAGS
    pushfd                  ; push EFLAGS to TOS
    pop     eax             ; store EFLAGS in EAX
    cmp     eax, ebx        ; see if bit 21 has changed
    jz      NO_CPUID        ; if no change, no CPUID
Once the software has identified the processor's support for CPUID, it must test for extended functions by executing extended function 8000_0000h (EAX=8000_0000h). The EAX register returns the largest extended function input value defined for the CPUID instruction on the processor. If the value is greater than 8000_0000h, extended functions are supported. The following code sample shows how to test for the presence of extended function 8000_0001h.

    mov     eax, 80000000h  ; query for extended functions
    CPUID                   ; get extended function limit
    cmp     eax, 80000000h  ; is 8000_0001h supported?
    jbe     NO_EXTENDEDMSR  ; if not, 3DNow! tech. not supported
The next step is for the programmer to determine if the 3DNow! instructions are supported. Extended function 8000_0001h of the CPUID instruction provides this information by returning the extended feature bits in the EDX register. If bit 31 in the EDX register is set to 1, 3DNow! instructions are supported. The following code sample shows how to test for 3DNow! instruction support.

    mov     eax, 80000001h  ; setup ext. function 8000_0001h
    CPUID                   ; call the function
    test    edx, 80000000h  ; test bit 31
    jnz     YES_3DNOW       ; 3DNow! technology supported
A processor that passes all three tests above supports the 3DNow! instructions. Concatenating the code examples above will produce the basis for a CPU detection software routine. A more comprehensive code example is available on the AMD website at http://www.amd.com/products/cpg/bin/.
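The decision logic of the three tests can also be sketched in a high-level language. The following Python model is illustrative only: it does not execute CPUID. The names has_3dnow, fake_cpuid, and id_bit_writable are hypothetical, and the cpuid argument is assumed to be a callable that takes an EAX input value and returns the (EAX, EBX, ECX, EDX) results.

```python
ID_FLAG = 1 << 21                # EFLAGS.ID, the bit toggled by "xor eax, 00200000h"
EXT_BASE = 0x80000000            # extended function 8000_0000h
EXT_FEATURES = 0x80000001        # extended function 8000_0001h
EDX_3DNOW = 1 << 31              # bit 31 of EDX returned by function 8000_0001h


def has_3dnow(cpuid, id_bit_writable=True):
    """Return True when the modeled processor reports 3DNow! support.

    `cpuid` is a hypothetical callable: cpuid(eax) -> (eax, ebx, ecx, edx).
    `id_bit_writable` stands in for the result of the EFLAGS bit 21 test.
    """
    if not id_bit_writable:                  # corresponds to NO_CPUID
        return False
    max_ext, _, _, _ = cpuid(EXT_BASE)
    if max_ext <= EXT_BASE:                  # corresponds to jbe NO_EXTENDEDMSR
        return False
    _, _, _, edx = cpuid(EXT_FEATURES)
    return bool(edx & EDX_3DNOW)             # corresponds to test edx, 80000000h


def fake_cpuid(max_ext, ext_edx):
    """Build a stand-in CPUID for experiments; the returned values are made up."""
    def cpuid(eax):
        if eax == EXT_BASE:
            return (max_ext, 0, 0, 0)
        if eax == EXT_FEATURES:
            return (0, 0, 0, ext_edx)
        return (0, 0, 0, 0)
    return cpuid
```

For example, has_3dnow(fake_cpuid(0x80000005, 0x80000000)) models a processor that reports extended functions and has bit 31 of EDX set. Keeping the three checks in this order mirrors the assembly samples: each later query is only meaningful if the earlier test passed.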
Register Set
The complete multimedia units in the processor combine the existing MMX instructions with the new 3DNow! instructions. In addition, by merging 3DNow! with MMX, it becomes possible to write x86 programs containing integer, MMX, and floating-point graphics instructions with no performance penalty for switching between the multimedia (integer) and 3DNow! (floating-point) units.

The processor implements eight 64-bit 3DNow!/MMX registers. These registers are mapped onto the floating-point registers. As shown in Figure 1, the 3DNow! and MMX instructions refer to these registers as mm0 to mm7. Mapping the new 3DNow!/MMX registers onto the floating-point register stack enables backwards compatibility for the register saving that must occur as a result of task switching.
Figure 1. 3DNow!/MMX Registers
Aliasing the 3DNow!/MMX registers onto the floating-point register stack provides a safe method to introduce 3DNow! and MMX technology, because it does not require modifications to existing operating systems. Instead of requiring operating system modifications, new 3DNow! and MMX technology applications are supported through device drivers, 3DNow! and MMX libraries, or Dynamic Link Library (DLL) files.

Current operating systems have support for floating-point operations and the floating-point register state. Using the floating-point registers for 3DNow! and MMX code is a convenient way of implementing non-intrusive support for 3DNow! and MMX instructions. Every time the processor executes a 3DNow! or MMX instruction, all the floating-point register tag bits are set to zero (00b=valid), except for the FEMMS and EMMS instructions, which set all tag bits to one (11b=empty).

Note: Executing the PREFETCH instruction does not change the tag bits.
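(Figure 1 layout: the eight registers mm0 through mm7, each 64 bits wide with bits numbered 63 to 0, and a TAG BITS column holding a 2-bit tag, shown as xx, for each register.)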
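The tag-bit rule can be illustrated with a small state model. This is a sketch under stated assumptions, not a description of the hardware: the function name tag_word_after is hypothetical, the eight 2-bit tags are assumed to be packed into one 16-bit word as in the x87 tag word, and PREFETCHW is assumed to behave like PREFETCH with respect to the tags.

```python
ALL_VALID = 0x0000   # eight 2-bit tags, each 00b (valid)
ALL_EMPTY = 0xFFFF   # eight 2-bit tags, each 11b (empty)


def tag_word_after(instruction, tag_word):
    """Model the tag bits after one 3DNow! or MMX instruction executes."""
    name = instruction.upper()
    if name in ("PREFETCH", "PREFETCHW"):
        return tag_word          # tag bits are left unchanged
    if name in ("FEMMS", "EMMS"):
        return ALL_EMPTY         # every tag becomes 11b
    return ALL_VALID             # any other 3DNow! or MMX instruction
```

In this model, a block of 3DNow! code that ends with FEMMS leaves every register marked empty, which is the state x87 code expects to find.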
Data Types
3DNow! technology uses a packed data format. The data is packed in a single, 64-bit 3DNow!/MMX register or a quadword memory operand.

Figure 2 shows the 3DNow! floating-point data type. D0 and D1 each hold an IEEE 32-bit single-precision, floating-point doubleword.

Figure 2. 3DNow! Data Type
(32 bits x 2) Two packed, single-precision, floating-point doublewords: D1 in bits 63-32, D0 in bits 31-0.

Figure 3 shows the IEEE 32-bit, single-precision, floating-point format.

Figure 3. Single-Precision, Floating-Point Data Format
32-bit, single-precision, floating-point doubleword: S (sign) in bit 31, Biased Exponent in bits 30-23, Significand in bits 22-0.

Value definitions:
1. X = (-1)^S * 0, for Biased Exponent = 0
2. X = (-1)^S * 2^(Biased Exponent - 127) * 1.Significand, for 0 < Biased Exponent < 255
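As a worked illustration of the layout and value definitions, the following Python sketch splits a 64-bit operand into D0 and D1 and evaluates one doubleword from its fields. The function names are hypothetical, and the sketch treats a biased exponent of FFh as an error because that encoding is listed as unsupported.

```python
import struct


def unpack_3dnow(qword):
    """Split a 64-bit value into the two packed singles (D0, D1)."""
    raw = struct.pack("<Q", qword & 0xFFFFFFFFFFFFFFFF)
    d0, d1 = struct.unpack("<2f", raw)   # D0 = bits 31-0, D1 = bits 63-32
    return d0, d1


def fields(dword):
    """Return (sign, biased_exponent, significand) of one 32-bit doubleword."""
    return (dword >> 31) & 0x1, (dword >> 23) & 0xFF, dword & 0x7FFFFF


def value(dword):
    """Evaluate the value definitions for zero and normal numbers only."""
    s, e, f = fields(dword)
    if e == 0:
        return (-1.0) ** s * 0.0
    if e == 0xFF:
        raise ValueError("biased exponent FFh is not supported")
    return (-1.0) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)
```

For instance, the doubleword C0400000h has S = 1, a biased exponent of 80h, and a significand field of 400000h, so value(0xC0400000) evaluates -1 * 2^1 * 1.5.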
Figure 4 shows the formats for the integer data types.

Figure 4. Integer Data Types
(8 bits x 8) Packed bytes: B7 (bits 63-56), B6 (55-48), B5 (47-40), B4 (39-32), B3 (31-24), B2 (23-16), B1 (15-8), B0 (7-0)
(16 bits x 4) Packed words: W3 (bits 63-48), W2 (47-32), W1 (31-16), W0 (15-0)
(32 bits x 2) Packed doublewords: D1 (bits 63-32), D0 (bits 31-0)
(64 bits x 1) Quadword: Q0 (bits 63-0)
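The four integer views of the same 64 bits can be sketched as follows. The function name packed_views is hypothetical, and the elements are shown as unsigned values; whether an element is interpreted as signed or unsigned depends on the instruction that consumes it.

```python
import struct


def packed_views(qword):
    """Return the byte, word, and doubleword views of one 64-bit value.

    Each list is ordered from element 0 (least significant) upward, so
    index i corresponds to Bi, Wi, or Di.
    """
    raw = struct.pack("<Q", qword)
    return {
        "bytes": list(struct.unpack("<8B", raw)),
        "words": list(struct.unpack("<4H", raw)),
        "doublewords": list(struct.unpack("<2I", raw)),
        "quadword": qword,
    }
```

Calling packed_views(0x0807060504030201) shows B0 = 01h in the lowest byte and B7 = 08h in the highest.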
3DNow! Instruction Formats

The format of 3DNow! instruction encodings is based on the conventional x86 modR/M instruction format and is similar to the format used by MMX instructions. The assembly language syntax used for the 3DNow! instructions is as follows:

    3DNow! Mnemonic mmreg1, mmreg2/mem64

The destination and source1 operand (mmreg1) must be an MMX register (mm0-mm7). The source2 operand (mmreg2/mem64) can be either an MMX register or a 64-bit memory value.

The encoding uses the opcode prefix 0Fh followed by a second opcode byte of 0Fh. To differentiate the various 3DNow! instructions, a third instruction suffix byte is used. This suffix byte occupies the same position at the end of a 3DNow! instruction as would an imm8 byte. The opcode format is as follows:

    0Fh 0Fh modR/M [sib] [displacement] 3DNow!_suffix

The specific operands (mmreg1 and mmreg2/mem64) determine the values used in modR/M [sib] [displacement], and follow conventional x86 encodings. The 3DNow! suffix is determined by the actual 3DNow! instruction. The 3DNow! suffixes are defined in Table 2.

As an example, the 3DNow! PFMUL instruction can produce the following opcodes, depending on its use:

    Opcode              Instruction
    0F 0F CA B4         PFMUL mm1, mm2
    0F 0F 0B B4         PFMUL mm1, [ebx]
    0F 0F 4B 0A B4      PFMUL mm1, [ebx+10]
    26 0F 0F 0B B4      PFMUL mm1, es:[ebx]
    0F 0F 4C 83 0A B4   PFMUL mm1, [ebx+eax*4+10]

The encoding of the two performance-enhancement instructions (FEMMS and PREFETCH) uses a single opcode prefix 0Fh. The details of the opcodes for these instructions are given in the FEMMS and PREFETCH/PREFETCHW descriptions in Chapter 2.
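The PFMUL encodings listed above can be reproduced with a short encoder sketch. This is a minimal illustration, assuming only register operands and simple memory operands with an optional 8-bit displacement; the function names are hypothetical, the suffix values are taken from Table 2, and ESP and EBP are deliberately left out because their modR/M and SIB encodings have special meanings.

```python
SUFFIX = {"PFADD": 0x9E, "PFSUB": 0x9A, "PFMUL": 0xB4, "PFRCP": 0x96}  # from Table 2
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esi": 6, "edi": 7}


def modrm(mod, reg, rm):
    """Assemble a modR/M byte from its three fields."""
    return (mod << 6) | (reg << 3) | rm


def encode_reg(mnemonic, dst, src):
    """Encode `mnemonic mm<dst>, mm<src>` (register-to-register form)."""
    return bytes([0x0F, 0x0F, modrm(0b11, dst, src), SUFFIX[mnemonic]])


def encode_mem(mnemonic, dst, base, index=None, scale=1, disp8=None):
    """Encode `mnemonic mm<dst>, [base + index*scale + disp8]`.

    Only the simple cases are modeled: no 32-bit displacement, and no
    esp or ebp as base or index.
    """
    mod = 0b00 if disp8 is None else 0b01
    out = [0x0F, 0x0F]
    if index is None:
        out.append(modrm(mod, dst, REG32[base]))
    else:
        ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
        out.append(modrm(mod, dst, 0b100))        # rm = 100b selects a SIB byte
        out.append((ss << 6) | (REG32[index] << 3) | REG32[base])
    if disp8 is not None:
        out.append(disp8 & 0xFF)
    out.append(SUFFIX[mnemonic])                  # suffix sits where an imm8 would
    return bytes(out)
```

For example, encode_mem("PFMUL", 1, "ebx", index="eax", scale=4, disp8=10).hex() gives the bytes of the last table row. A segment override such as es: would be emitted as a 26h prefix byte in front of the sequence.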
Definitions
3DNow! technology provides 21 additional instructions to support high-performance, 3D graphics and audio processing. 3DNow! instructions are vector instructions that operate on 64-bit registers. 3DNow! instructions are SIMD: each instruction operates on pairs of 32-bit values.

The definitions for the 3DNow! instructions in Chapter 2 contain designations classifying each instruction as vectored or scalar. Vector instructions operate in parallel on two sets of 32-bit, single-precision, floating-point words. Instructions that are labeled as scalar instructions operate on a single set of 32-bit operands (from the low halves of the two 64-bit operands).
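The difference between the vector and scalar forms can be sketched with a small model. The names below are hypothetical, operands are represented as (low, high) pairs of Python floats, and results are rounded to single precision with the struct module. The sketch does not model what a scalar instruction writes to the high half of its destination.

```python
import struct


def to_single(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def vector_add(a, b):
    """Vector form: both 32-bit lanes are processed, as (low, high) pairs."""
    return (to_single(a[0] + b[0]), to_single(a[1] + b[1]))


def scalar_low(op, a, b):
    """Scalar form: only the low halves of the two operands feed `op`."""
    return to_single(op(a[0], b[0]))
```

With a = (1.0, 2.0) and b = (3.0, 4.0), vector_add(a, b) processes both lanes, while scalar_low uses only 1.0 and 3.0.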
The 3DNow! single-precision, floating-point format is compatible with the IEEE-754, single-precision format. This format comprises a 1-bit sign, an 8-bit biased exponent, and a 23-bit significand with one hidden integer bit for a total of 24 bits in the significand. The bias of the exponent is 127, consistent with the IEEE single-precision standard. The significands are normalized to be within the range of [1,2).

In contrast to the IEEE standard that dictates four rounding modes, 3DNow! technology supports one rounding mode: either round-to-nearest or round-to-zero (truncation). The hardware implementation of 3DNow! technology determines the rounding mode. The AMD processors implement round-to-nearest mode. Regardless of the rounding mode used, the floating-point-to-integer and integer-to-floating-point conversion instructions, PF2ID and PI2FD, always use the round-to-zero (truncation) mode.
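The practical difference between the two rounding modes shows up when converting to an integer. The following Python sketch contrasts them; the function names are hypothetical, and the sketch covers only the rounding direction, not the range handling of the conversion instructions.

```python
import math


def nearest_int(x):
    """Round to the nearest integer (ties go to the even neighbor)."""
    return round(x)


def truncate_int(x):
    """Round toward zero, the mode the text assigns to PF2ID."""
    return math.trunc(x)
```

For an input of 2.7, the nearest integer is 3 but truncation gives 2; for -2.7, truncation gives -2 rather than -3. Code that relies on a truncating conversion and wants nearest-integer results has to account for this itself.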
The largest, representable, normal number in magnitude for this precision in hexadecimal has an exponent of FEh and a significand of 7FFFFFh, with a numerical value of 2^127 * (2 - 2^-23). All results that overflow above the maximum-representable positive value are saturated to either this maximum-representable normal number or to positive infinity. Similarly, all results that overflow below the minimum-representable negative value are saturated to either this minimum-representable normal number or to negative infinity.

The implementation of 3DNow! technology determines how arithmetic overflow is handled: either properly signed maximum- or minimum-representable normal numbers or properly signed infinities. The processor generates properly signed maximum- or minimum-representable normal numbers. Infinities and NaNs are not supported as operands to 3DNow! instructions.

The smallest representable normal number in magnitude for this precision in hexadecimal has an exponent of 01h and a significand of 000000h, with a numerical value of 2^-126. Accordingly, all results below this minimum representable value in magnitude are held to zero. Table 1 shows the exponent ranges supported by the 3DNow! technology.

Like MMX instructions, 3DNow! instructions do not generate numeric exceptions nor do they set any status flags. It is the user's responsibility to ensure that in-range data is provided to 3DNow! instructions and that all computations remain within valid ranges (or are held as expected).
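The range limits can be checked numerically, and the out-of-range handling can be sketched as a clamp. The names below are hypothetical. The clamp assumes the saturating option (maximum-magnitude normal numbers rather than infinities), returns a positive zero for every underflow, and ignores rounding to 24 bits of precision.

```python
import struct

MAX_NORMAL = 2.0 ** 127 * (2 - 2.0 ** -23)   # exponent FEh, significand 7FFFFFh
MIN_NORMAL = 2.0 ** -126                     # exponent 01h, significand 000000h


def bits_to_single(dword):
    """Interpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<I", dword))[0]


def clamp_result(x):
    """Model out-of-range handling for an exact result `x`."""
    if abs(x) > MAX_NORMAL:
        return MAX_NORMAL if x > 0 else -MAX_NORMAL   # saturate, keep the sign
    if abs(x) < MIN_NORMAL:
        return 0.0                                    # held to zero
    return x
```

Because no exception or status flag reports these cases, a caller that cares about overflow or underflow has to compare its results against MAX_NORMAL and zero explicitly.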
Table 1. 3DNow! Technology Exponent Ranges

Biased Exponent    Description
FFh                Unsupported *
00h                Zero
00h < x < FFh      Normal
Execution Resources on AMD-K6 Processors

The register operations of all 3DNow! floating-point instructions are executed by either the register X unit or the register Y unit. One operation can be issued to each register unit each clock cycle, for a maximum issue and execution rate of two 3DNow! operations per cycle. All 3DNow! operations have an execution latency of two clock cycles and are fully pipelined.

Even though 3DNow! execution resources are not duplicated in both register units (for example, there are not two pairs of 3DNow! multipliers, just one shared pair of multipliers), there are no instruction-decode or operation-issue pairing restrictions. When, for example, a 3DNow! multiply operation starts execution in a register unit, that unit grabs and uses the one shared pair of 3DNow! multipliers. Only when actual contention occurs between two 3DNow! operations starting execution at the same time is one of the operations held up for one cycle in its first execution pipe stage while the other proceeds. The delay is never more than one cycle.

For code optimization purposes, 3DNow! operations are grouped into two categories. These categories are based on execution resources and are important when creating properly scheduled code. As long as two 3DNow! operations that start execution simultaneously do not fall into the same category, both operations will start execution without delay.

The first category of instructions contains the operations for the following 3DNow! instructions: PFADD, PFSUB, PFSUBR, PFACC, PFCMPx, PFMIN, PFMAX, PI2FD, PF2ID, PFRCP, and PFRSQRT.

The second category contains the operations for the following 3DNow! instructions: PFMUL, PFRCPIT1, PFRSQIT1, and PFRCPIT2.

Note: 3DNow! add and multiply operations, among other combinations, can execute simultaneously.

Normally, in high-performance 3DNow! code, all of the 3DNow! instructions are properly scheduled apart from each other so as to avoid delays due to execution resource contentions (as well as taking into account dependencies and execution latencies).
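The two-category rule lends itself to a simple scheduling check. The sketch below is an illustration of the rule as stated in this section, not a pipeline simulator: the names are hypothetical, PFCMPx is expanded into PFCMPEQ, PFCMPGE, and PFCMPGT, and dependencies and the two-cycle latency are not modeled.

```python
CATEGORY_1 = {"PFADD", "PFSUB", "PFSUBR", "PFACC", "PFCMPEQ", "PFCMPGE",
              "PFCMPGT", "PFMIN", "PFMAX", "PI2FD", "PF2ID", "PFRCP", "PFRSQRT"}
CATEGORY_2 = {"PFMUL", "PFRCPIT1", "PFRSQIT1", "PFRCPIT2"}


def category(mnemonic):
    """Return 1 or 2 for the execution-resource category of an operation."""
    name = mnemonic.upper()
    if name in CATEGORY_1:
        return 1
    if name in CATEGORY_2:
        return 2
    raise ValueError(f"not a categorized 3DNow! operation: {mnemonic}")


def start_delay(op_a, op_b):
    """Cycles one operation is held when both try to start in the same cycle."""
    return 1 if category(op_a) == category(op_b) else 0
```

For example, start_delay("PFADD", "PFMUL") is 0 because the operations fall into different categories, while two multiplies starting together cost one cycle. A scheduler built on this check would try to interleave category 1 and category 2 operations.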
For further information regarding code optimization, see the AMD-K6 Processor Code Optimization Application Note, order # 21924. This document provides in-depth discussions of code optimization techniques for the processor.

For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order # 22007.

The SIMD 3DNow! instructions for all processors are summarized in Table 2. The dedicated and shared execution resources of the register X unit and register Y unit are shown in Figure 5. The execution resources for some MMX operations, as well as all 3DNow! operations, are shared between the two register units. For contention-checking purposes, each box represents a category of operations that cannot start execution simultaneously. In addition, the MMX and 3DNow! multiplies use the same hardware, while MMX and 3DNow! adds and subtracts do not.

The 3DNow! performance-enhancement instructions for all AMD processors are summarized in Table 3. The FEMMS instruction does not use any specific execution resource or pipeline. The PREFETCH instruction is operated on in the Load unit.
Figure 5. Register X Unit and Register Y Unit Resources
[Figure: block diagram of the Register X and Register Y execution pipelines, with boxes grouped into dedicated Register X resources, dedicated Register Y resources, and shared Register X and Y resources. Box labels: Integer ALU; Integer Shift; Integer Multiply and Divide; Integer Byte Operations; Integer Special Registers; Integer Segment Register Loads; MMX ALU (Add/Subtract, Compare); MMX ALU (Logical, Pack, Unpack); MMX Shifter; 3DNow! Add/Subtract, Compare, Integer Conversion, Reciprocal and Reciprocal Square Root Table Lookup; MMX and 3DNow! Multiply, Reciprocal and Reciprocal Square Root Iteration.]
Table 2. 3DNow! Floating-Point Instructions
Operation Function Opcode Suffix
PAVGUSB Packed 8-bit Unsigned Integer Averaging BFh
PFADD Packed Floating-Point Addition 9Eh
PFSUB Packed Floating-Point Subtraction 9Ah
PFSUBR Packed Floating-Point Reverse Subtraction AAh
PFACC Packed Floating-Point Accumulate AEh
PFCMPGE Packed Floating-Point Comparison, Greater or Equal 90h
PFCMPGT Packed Floating-Point Comparison, Greater A0h
PFCMPEQ Packed Floating-Point Comparison, Equal B0h
PFMIN Packed Floating-Point Minimum 94h
PFMAX Packed Floating-Point Maximum A4h
PI2FD Packed 32-bit Integer to Floating-Point Conversion 0Dh
PF2ID Packed Floating-Point to 32-bit Integer 1Dh
PFRCP Packed Floating-Point Reciprocal Approximation 96h
PFRSQRT Packed Floating-Point Reciprocal Square Root Approximation 97h
PFMUL Packed Floating-Point Multiplication B4h
PFRCPIT1 Packed Floating-Point Reciprocal First Iteration Step A6h
PFRSQIT1 Packed Floating-Point Reciprocal Square Root First Iteration Step A7h
PFRCPIT2 Packed Floating-Point Reciprocal/Reciprocal Square Root Second Iteration Step B6h
PMULHRW Packed 16-bit Integer Multiply with rounding B7h
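To illustrate how the opcode suffixes in Table 2 are used, the following C sketch assembles the bytes of a register-to-register 3DNow! instruction. It assumes the instruction layout 0Fh 0Fh, ModR/M, suffix, with mmreg1 in the ModR/M reg field and mmreg2 in the r/m field. The function name encode_3dnow_rr is hypothetical, and memory operands are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode suffixes taken from Table 2. */
enum { SUFFIX_PFADD = 0x9E, SUFFIX_PFMUL = 0xB4, SUFFIX_PAVGUSB = 0xBF };

/*
 * Hypothetical helper: encode "OP mmreg1, mmreg2" (register form only).
 * Assumed layout: 0Fh 0Fh, ModR/M (mod = 11b, reg = mmreg1, r/m = mmreg2),
 * followed by the opcode suffix byte.
 * Returns the number of bytes written (4), or 0 if a register number is
 * outside the range 0..7.
 */
size_t encode_3dnow_rr(uint8_t suffix, unsigned mmreg1, unsigned mmreg2,
                       uint8_t out[4])
{
    if (mmreg1 > 7 || mmreg2 > 7)
        return 0;
    out[0] = 0x0F;                                   /* first opcode byte  */
    out[1] = 0x0F;                                   /* second opcode byte */
    out[2] = (uint8_t)(0xC0 | (mmreg1 << 3) | mmreg2); /* ModR/M, mod = 11b */
    out[3] = suffix;                                 /* selects the operation */
    return 4;
}
```

Under these assumptions, PFADD mm0, mm1 encodes as 0Fh 0Fh C1h 9Eh.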
Table 3. 3DNow! Performance-Enhancement Instructions
Operation Function Opcode Second Byte
FEMMS Faster entry/exit of the MMX or floating-point state 0Eh
PREFETCH/PREFETCHW * Prefetch at least a 32-byte line into L1 data cache (Dcache) 0Dh
Note:
* The AMD-K6-2 and AMD-K6-III processors execute the PREFETCHW instruction identically to the PREFETCH instruction. On the AMD Athlon processor, PREFETCHW can increase performance by providing a hint to the processor of an intent to modify the cache line.
Task Switching
With respect to task switching, treat the 3DNow! instructions
exactly the same as MMX instructions. Operating system design
must be taken into account when writing a 3DNow! program.
The programmer must know whether the operating system
automatically saves the current states when task switching, or if
the 3DNow! program has to provide the code to save states.
If a task switch occurs, the Control Register (CR0) Task Switch
(TS) bit is set to 1. The processor then generates an interrupt 7
(int 7, Device Not Available) when it encounters the next
floating-point, 3DNow!, or MMX instruction, allowing the
operating system to save the state of the 3DNow!/MMX/FP
registers.
In a multitasking operating system, if there is a task switch
when 3DNow!/MMX applications are running with older
applications that do not include MMX instructions, the
MMX/FP register state is still saved automatically through the
int 7 handler.
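The deferred state save described above can be modeled with a small C sketch. This is a simplified simulation, not operating system code: the structure cpu_model and all function names are hypothetical, and the handler behavior (saving the previous owner's registers, then clearing TS) is an assumption about a typical int 7 handler rather than something this manual specifies.

```c
#include <stdbool.h>

/* Simplified model of the state involved in deferred register saving. */
struct cpu_model {
    bool cr0_ts;          /* CR0.TS: set to 1 when a task switch occurs   */
    int  owner_task;      /* task whose 3DNow!/MMX/FP state is in the regs */
    int  current_task;    /* task that is currently running               */
    int  saves_performed; /* how many times the handler saved the state   */
};

/* A task switch only sets TS; no register state is saved yet. */
void task_switch(struct cpu_model *cpu, int next_task)
{
    cpu->current_task = next_task;
    cpu->cr0_ts = true;
}

/* Assumed int 7 handler: save the old owner's state if needed, clear TS. */
static void int7_handler(struct cpu_model *cpu)
{
    if (cpu->owner_task != cpu->current_task) {
        cpu->saves_performed++;            /* stands in for the actual save */
        cpu->owner_task = cpu->current_task;
    }
    cpu->cr0_ts = false;
}

/*
 * Models the execution of one floating-point, MMX, or 3DNow! instruction.
 * Returns true if the instruction raised int 7 before executing.
 */
bool execute_fp_mmx_3dnow(struct cpu_model *cpu)
{
    if (cpu->cr0_ts) {
        int7_handler(cpu);
        return true;
    }
    return false;
}
```

In this model, a task that never executes a floating-point, MMX, or 3DNow! instruction after a switch never triggers a save, which is the point of deferring the work to the int 7 handler.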
Exceptions
Table 4 contains a list of exceptions that 3DNow! and MMX
instructions can generate.
Table 4. 3DNow! and MMX Instruction Exceptions
Exception Real Virtual 8086 Protected Description
Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12) X X X During instruction execution, the stack segment limit was exceeded.
General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14) X X A page fault resulted from the execution of the instruction.
Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.
Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
The rules for exceptions are the same for both MMX and
3DNow! instructions. In addition, exception detection and
handling is identical for MMX and 3DNow! instructions. None
of the exception handlers need modification.
Notes:
1. An invalid opcode exception (interrupt 6) occurs if a
3DNow! instruction is executed on a processor that does
not support 3DNow! instructions.
2. If a floating-point exception is pending and the processor
encounters a 3DNow! instruction, FERR# is asserted and,
if CR0.NE = 1, an interrupt 16 is generated. (This is the
same for MMX instructions.)
Prefixes
The following prefixes can be used with 3DNow! instructions:
• The segment override prefixes (2Eh/CS, 36h/SS, 3Eh/DS,
26h/ES, 64h/FS, and 65h/GS) affect 3DNow! instructions
that contain a memory operand.
• The address-size override prefix (67h) affects 3DNow!
instructions that contain a memory operand.
• The operand-size override prefix (66h) is ignored.
• The LOCK prefix (F0h) triggers an invalid opcode exception
(interrupt 6).
• The REP prefixes (F3h/REP/REPE/REPZ, F2h/REPNE/
REPNZ) are ignored.
2 3DNow! Instruction Set
The following 3DNow! instruction definitions are in
alphabetical order according to the instruction mnemonics.
FEMMS
mnemonic opcode description
FEMMS 0F 0Eh Faster Enter/Exit of the MMX or floating-point state
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
Like the EMMS instruction, the FEMMS instruction can be used to clear the MMX
state following the execution of a block of MMX instructions. Because the MMX
registers and tag words are shared with the floating-point unit, it is necessary to clear
the state before executing floating-point instructions. Unlike the EMMS instruction,
the contents of the MMX/floating-point registers are undefined after a FEMMS
instruction is executed. Therefore, the FEMMS instruction offers a faster context
switch at the end of an MMX routine where the values in the MMX registers are no
longer required. FEMMS can also be used prior to executing MMX instructions where
the preceding floating-point register values are no longer required, which facilitates
faster context switching.
Exception Real Virtual 8086 Protected Description
Invalid opcode (6) X X X The emulate MMX instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.
PAVGUSB
mnemonic opcode/imm8 description
PAVGUSB mmreg1, mmreg2/mem64 0F 0Fh / BFh Average of unsigned packed 8-bit values
Privilege: None
Registers Affected: MMX
Flags Affected: None
Exceptions Generated:
The PAVGUSB instruction produces the rounded averages of the eight unsigned 8-bit
integer values in the source operand (an MMX register or a 64-bit memory location)
and the eight corresponding unsigned 8-bit integer values in the destination operand
(an MMX register). It does so by adding the source and destination byte values and
then adding a 001h to the 9-bit intermediate value. The intermediate value is then
divided by 2 (shifted right one place) and the eight unsigned 8-bit results are stored
in the MMX register specified as the destination operand.
The PAVGUSB instruction can be used for pixel averaging in MPEG-2 motion
compensation and video scaling operations.
Exception Real Virtual 8086 Protected Description
Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12) X During instruction execution, the stack segment limit was exceeded.
General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14) X X A page fault resulted from the execution of the instruction.
Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.
Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
Functional Illustration of the PAVGUSB Instruction
The following list explains the functional illustration of the PAVGUSB instruction:
• The rounded byte average of FFh and FFh is FFh.
• The rounded byte average of FFh and 00h is 80h.
• The rounded byte average of 01h and FFh is also 80h.
• The rounded byte average of 0Fh and 10h is 10h.
• The rounded byte average of 00h and 01h is 01h.
• The rounded byte average of 70h and 44h is 5Ah.
• The rounded byte average of 07h and F7h is 7Fh.
• The rounded byte average of 9Ah and A8h is A1h.
The equations for byte averaging with rounding are as follows:
• mmreg1[63:56] = (mmreg1[63:56] + mmreg2/mem64[63:56] + 01h)/2
• mmreg1[55:48] = (mmreg1[55:48] + mmreg2/mem64[55:48] + 01h)/2
• mmreg1[47:40] = (mmreg1[47:40] + mmreg2/mem64[47:40] + 01h)/2
• mmreg1[39:32] = (mmreg1[39:32] + mmreg2/mem64[39:32] + 01h)/2
• mmreg1[31:24] = (mmreg1[31:24] + mmreg2/mem64[31:24] + 01h)/2
• mmreg1[23:16] = (mmreg1[23:16] + mmreg2/mem64[23:16] + 01h)/2
• mmreg1[15:8] = (mmreg1[15:8] + mmreg2/mem64[15:8] + 01h)/2
• mmreg1[7:0] = (mmreg1[7:0] + mmreg2/mem64[7:0] + 01h)/2
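The byte-averaging equations can be checked with a short C model. This is a sketch of the arithmetic only, not of the hardware; the function names pavgusb_byte and pavgusb are hypothetical, and a uint64_t stands in for a 64-bit MMX register or memory operand.

```c
#include <stdint.h>

/* Rounded average of one pair of unsigned bytes: (a + b + 01h) / 2. */
uint8_t pavgusb_byte(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b + 1u; /* fits in 9 bits */
    return (uint8_t)(sum >> 1);                    /* divide by 2    */
}

/* Apply the rounded average to all eight byte lanes of two 64-bit values. */
uint64_t pavgusb(uint64_t mmreg1, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t a = (uint8_t)(mmreg1 >> (8 * lane));
        uint8_t b = (uint8_t)(src >> (8 * lane));
        result |= (uint64_t)pavgusb_byte(a, b) << (8 * lane);
    }
    return result;
}
```

Because the intermediate sum is held in a wider type before the shift, the carry out of bit 7 is preserved, which is why FFh and FFh average to FFh rather than wrapping.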
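[Figure: functional illustration of the PAVGUSB instruction. The eight bytes (bits 63 to 0) of mmreg2/mem64 and mmreg1 are averaged per byte, and the eight results are written to mmreg1. The byte pairs and results are those listed above; a marker in the figure indicates each value that was rounded up.]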
PF2ID
mnemonic opcode/imm8 description
PF2ID mmreg1, mmreg2/mem64 0Fh 0Fh / 1Dh Converts packed floating-point operand to packed 32-bit integer
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PF2ID is a vector instruction that converts a vector register containing
single-precision, floating-point operands to 32-bit signed integers using truncation.
Table 5 on page 22 shows the numerical range of the PF2ID instruction.
The PF2ID instruction performs the following operations:
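IF (mmreg2/mem64[31:0] >= 2^31)
  THEN mmreg1[31:0] = 7FFF_FFFFh
ELSEIF (mmreg2/mem64[31:0] <= -2^31)
  THEN mmreg1[31:0] = 8000_0000h
ELSE mmreg1[31:0] = int(mmreg2/mem64[31:0])
IF (mmreg2/mem64[63:32] >= 2^31)
  THEN mmreg1[63:32] = 7FFF_FFFFh
ELSEIF (mmreg2/mem64[63:32] <= -2^31)
  THEN mmreg1[63:32] = 8000_0000h
ELSE mmreg1[63:32] = int(mmreg2/mem64[63:32])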
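The conversion of one 32-bit half can be sketched in C as follows. The function name pf2id_lane is hypothetical. The negative saturation value 8000_0000h used here is the two's-complement counterpart of 7FFF_FFFFh and should be treated as an assumption of this sketch. NaN inputs fall outside the supported operand range and are not handled.

```c
#include <stdint.h>

/*
 * Convert one single-precision value to a signed 32-bit integer by
 * truncation, saturating at the ends of the int32 range.
 * Precondition: x is not a NaN (unsupported operand, not modeled here).
 */
int32_t pf2id_lane(float x)
{
    if (x >= 2147483648.0f)    /* x >= 2^31: saturate to 7FFF_FFFFh */
        return INT32_MAX;
    if (x <= -2147483648.0f)   /* x <= -2^31: assumed 8000_0000h    */
        return INT32_MIN;
    return (int32_t)x;         /* C conversion truncates toward zero */
}
```

The packed instruction applies the same conversion independently to bits [31:0] and bits [63:32] of the source operand.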
Related Instructions See the PI2FD instruction.
Table 5. Numerical Range for the PF2ID Instruction
Source 2 Source 1 and Destination
0 0
Normal, abs(Source 1)
PFACC
mnemonic opcode/imm8 description
PFACC mmreg1, mmreg2/mem64 0Fh 0Fh / AEh Floating-point accumulate
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFACC is a vector instruction that accumulates the two words of the destination
operand and the source operand and stores the results in the low and high words of
destination operand respectively. Both operands are single-precision, floating-point
operands with 24-bit significands. Table 6 on page 24 shows the numerical range of the
PFACC instruction.
The PFACC instruction performs the following operations:
temp = mmreg2/mem64
mmreg1[31:0] = mmreg1[31:0] + mmreg1[63:32]
mmreg1[63:32] = temp[31:0] + temp[63:32]
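The horizontal sums performed by PFACC can be modeled in C as shown below. The type packed2f and the function pfacc are hypothetical names; ordinary C float addition stands in for the 3DNow! adder, so the range clamping described in Table 6 is not modeled.

```c
/* Two packed single-precision values: lo models bits [31:0], hi bits [63:32]. */
struct packed2f {
    float lo;
    float hi;
};

/*
 * Model of PFACC: the low result is the sum of the two halves of the
 * destination operand, the high result is the sum of the two halves of
 * the source operand.
 */
struct packed2f pfacc(struct packed2f mmreg1, struct packed2f src)
{
    struct packed2f result;
    result.lo = mmreg1.lo + mmreg1.hi;
    result.hi = src.lo + src.hi;
    return result;
}
```

For example, with a destination of (lo = 1.0, hi = 2.0) and a source of (lo = 3.0, hi = 4.0), the result is (lo = 3.0, hi = 7.0). One plausible use is summing the partial products left in a register by a preceding PFMUL.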
Exception Real Virtual 8086 Protected Description
Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12) X During instruction execution, the stack segment limit was exceeded.
General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14) X X A page fault resulted from the execution of the instruction.
Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.
Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
Table 6. Numerical Range for the PFACC Instruction
                            Source 2
Source 1 and Destination    0                Normal                   Unsupported
0                           +/-0 (note 1)    Source 2                 Source 2
Normal                      Source 1         Normal, +/-0 (note 2)    Undefined
Unsupported                 Source 1         Undefined                Undefined
Notes:
1. The sign of the result is the logical AND of the signs of the source operands.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand that is larger in magnitude (if the magnitudes are equal, the sign of source 1 is used). If the absolute value of the result is greater than or equal to 2^128, the result is the largest normal number with the sign being the sign of the source operand that is larger in magnitude.
PFADD
mnemonic opcode/imm8 description
PFADD mmreg1, mmreg2/mem64 0Fh 0Fh / 9Eh Packed, floating-point addition
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFADD is a vector instruction that performs addition of the destination operand and
the source operand. Both operands are single-precision, floating-point operands with
24-bit significands. Table 7 on page 26 shows the numerical range of the PFADD
instruction.
The PFADD instruction performs the following operations:
mmreg1[31:0] = mmreg1[31:0] + mmreg2/mem64[31:0]
mmreg1[63:32] = mmreg1[63:32] + mmreg2/mem64[63:32]
Exception Real Virtual 8086 Protected Description
Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12) X During instruction execution, the stack segment limit was exceeded.
General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14) X X A page fault resulted from the execution of the instruction.
Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.
Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
Table 7. Numerical Range for the PFADD Instruction
                            Source 2
Source 1 and Destination    0                Normal                   Unsupported
0                           +/-0 (note 1)    Source 2                 Source 2
Normal                      Source 1         Normal, +/-0 (note 2)    Undefined
Unsupported                 Source 1         Undefined                Undefined
Notes:
1. The sign of the result is the logical AND of the signs of the source operands.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand that is larger in magnitude (if the magnitudes are equal, the sign of source 1 is used). If the absolute value of the result is greater than or equal to 2^128, the result is the largest normal number with the sign being the sign of the source operand that is larger in magnitude.
PFCMPEQ
mnemonic opcode/imm8 description
PFCMPEQ mmreg1, mmreg2/mem64 0Fh 0Fh / B0h Packed floating-point comparison, equal to
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFCMPEQ is a vector instruction that performs a comparison of the destination
operand and the source operand and generates all one bits or all zero bits based on the
result of the corresponding comparison. Table 8 on page 28 shows the numerical range
of the PFCMPEQ instruction.
The PFCMPEQ instruction performs the following operations:
IF (mmreg1[31:0] = mmreg2/mem64[31:0])
  THEN mmreg1[31:0] = FFFF_FFFFh
  ELSE mmreg1[31:0] = 0000_0000h
IF (mmreg1[63:32] = mmreg2/mem64[63:32])
  THEN mmreg1[63:32] = FFFF_FFFFh
  ELSE mmreg1[63:32] = 0000_0000h
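The all-ones or all-zeros result of a packed comparison is convenient as a bit mask. The C sketch below models one 32-bit lane of PFCMPEQ and then uses the mask to choose between two values without a branch. The function names are hypothetical, the select step is a common use of such masks rather than something this manual prescribes, and unsupported operands such as NaNs are not modeled.

```c
#include <stdint.h>
#include <string.h>

/* One lane of PFCMPEQ: all ones if equal, all zeros otherwise.
 * The C == operator treats +0 and -0 as equal, matching note 1 of Table 8. */
uint32_t pfcmpeq_lane(float a, float b)
{
    return (a == b) ? 0xFFFFFFFFu : 0x00000000u;
}

/* Branch-free select: returns if_set where mask is all ones,
 * if_clear where mask is all zeros. */
float select_lane(uint32_t mask, float if_set, float if_clear)
{
    uint32_t x, y, r;
    float out;
    memcpy(&x, &if_set, sizeof x);     /* reinterpret floats as raw bits */
    memcpy(&y, &if_clear, sizeof y);
    r = (mask & x) | (~mask & y);      /* bitwise AND, AND-NOT, OR */
    memcpy(&out, &r, sizeof out);
    return out;
}
```

The same pattern applies to the masks produced by PFCMPGE and PFCMPGT.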
Exception Real Virtual 8086 Protected Description
Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12) X During instruction execution, the stack segment limit was exceeded.
General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14) X X A page fault resulted from the execution of the instruction.
Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.
Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
Related Instructions See the PFCMPGE instruction.
See the PFCMPGT instruction.
Table 8. Numerical Range for the PFCMPEQ Instruction
                            Source 2
Source 1 and Destination    0                      Normal                             Unsupported
0                           FFFF_FFFFh (note 1)    0000_0000h                         0000_0000h
Normal                      0000_0000h             0000_0000h, FFFF_FFFFh (note 2)    0000_0000h
Unsupported                 0000_0000h             0000_0000h                         Undefined
Notes:
1. Positive zero is equal to negative zero.
2. The result is FFFF_FFFFh if source 1 and source 2 have identical signs, exponents, and mantissas. Otherwise, the result is 0000_0000h.
PFCMPGE
mnemonic opcode/imm8 description
PFCMPGE mmreg1, mmreg2/mem64 0Fh 0Fh / 90h Packed floating-point comparison, greater than or equal to
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFCMPGE is a vector instruction that performs a comparison of the destination
operand and the source operand and generates all one bits or all zero bits based on the
result of the corresponding comparison. Table 9 on page 30 shows the numerical range
of the PFCMPGE instruction.
The PFCMPGE instruction performs the following operations:
IF (mmreg1[31:0] >= mmreg2/mem64[31:0])
  THEN mmreg1[31:0] = FFFF_FFFFh
  ELSE mmreg1[31:0] = 0000_0000h
IF (mmreg1[63:32] >= mmreg2/mem64[63:32])
  THEN mmreg1[63:32] = FFFF_FFFFh
  ELSE mmreg1[63:32] = 0000_0000h
Exception Real Virtual 8086 Protected Description
Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12) X During instruction execution, the stack segment limit was exceeded.
General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14) X X A page fault resulted from the execution of the instruction.
Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.
Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
Related Instructions See the PFCMPEQ instruction.
See the PFCMPGT instruction.
Table 9. Numerical Range for the PFCMPGE Instruction
                            Source 2
Source 1 and Destination    0                                  Normal                             Unsupported
0                           FFFF_FFFFh (note 1)                0000_0000h, FFFF_FFFFh (note 2)    Undefined
Normal                      0000_0000h, FFFF_FFFFh (note 3)    0000_0000h, FFFF_FFFFh (note 4)    Undefined
Unsupported                 Undefined                          Undefined                          Undefined
Notes:
1. Positive zero is equal to negative zero.
2. The result is FFFF_FFFFh, if source 2 is negative. Otherwise, the result is 0000_0000h.
3. The result is FFFF_FFFFh, if source 1 is positive. Otherwise, the result is 0000_0000h.
4. The result is FFFF_FFFFh, if source 1 is positive and source 2 is negative, or if they are both negative and source 1 is smaller than or equal in magnitude to source 2, or if source 1 and source 2 are both positive and source 1 is greater than or equal in magnitude to source 2. The result is 0000_0000h in all other cases.
PFCMPGT
mnemonic opcode/imm8 description
PFCMPGT mmreg1, mmreg2/mem64 0Fh 0Fh / A0h Packed floating-point comparison, greater than
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFCMPGT is a vector instruction that performs a comparison of the destination
operand and the source operand and generates all one bits or all zero bits based on the
result of the corresponding comparison. Table 10 on page 32 shows the numerical
range of the PFCMPGT instruction.
The PFCMPGT instruction performs the following operations:
IF (mmreg1[31:0] > mmreg2/mem64[31:0])
  THEN mmreg1[31:0] = FFFF_FFFFh
  ELSE mmreg1[31:0] = 0000_0000h
IF (mmreg1[63:32] > mmreg2/mem64[63:32])
  THEN mmreg1[63:32] = FFFF_FFFFh
  ELSE mmreg1[63:32] = 0000_0000h
Exception Real Virtual 8086 Protected Description
Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12) X During instruction execution, the stack segment limit was exceeded.
General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14) X X A page fault resulted from the execution of the instruction.
Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.
Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
Related Instructions See the PFCMPEQ instruction.
See the PFCMPGE instruction.
Table 10. Numerical Range for the PFCMPGT Instruction
                            Source 2
Source 1 and Destination    0                                  Normal                             Unsupported
0                           0000_0000h                         0000_0000h, FFFF_FFFFh (note 1)    Undefined
Normal                      0000_0000h, FFFF_FFFFh (note 2)    0000_0000h, FFFF_FFFFh (note 3)    Undefined
Unsupported                 Undefined                          Undefined                          Undefined
Notes:
1. The result is FFFF_FFFFh, if source 2 is negative. Otherwise, the result is 0000_0000h.
2. The result is FFFF_FFFFh, if source 1 is positive. Otherwise, the result is 0000_0000h.
3. The result is FFFF_FFFFh, if source 1 is positive and source 2 is negative, or if they are both negative and source 1 is smaller in magnitude than source 2, or if source 1 and source 2 are positive and source 1 is greater in magnitude than source 2. The result is 0000_0000h in all other cases.
PFMAX
mnemonic opcode/imm8 description
PFMAX mmreg1, mmreg2/mem64 0Fh 0Fh / A4h Packed floating-point maximum
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFMAX is a vector instruction that returns the larger of the two single-precision,
floating-point operands. Any operation with a zero and a negative number returns
positive zero. An operation consisting of two zeros returns positive zero. Table 11 on
page 34 shows the numerical range of the PFMAX instruction.
The PFMAX instruction performs the following operations:
IF (mmreg1[31:0] > mmreg2/mem64[31:0])
  THEN mmreg1[31:0] = mmreg1[31:0]
  ELSE mmreg1[31:0] = mmreg2/mem64[31:0]
IF (mmreg1[63:32] > mmreg2/mem64[63:32])
  THEN mmreg1[63:32] = mmreg1[63:32]
  ELSE mmreg1[63:32] = mmreg2/mem64[63:32]
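One lane of PFMAX can be sketched in C as follows. The function name pfmax_lane is hypothetical. The extra step that forces a zero result to positive zero reflects the statement that operations returning zero return positive zero; unsupported operands such as NaNs are not modeled.

```c
/*
 * Model of one 32-bit lane of PFMAX.
 * Follows the IF/THEN/ELSE description, then normalizes a zero result
 * to positive zero (a negative-zero input would otherwise pass through).
 */
float pfmax_lane(float a, float b)
{
    float r = (a > b) ? a : b;
    if (r == 0.0f)      /* true for both +0 and -0 */
        r = 0.0f;       /* store positive zero     */
    return r;
}
```

PFMIN can be modeled the same way by reversing the comparison.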
Exception Real Virtual 8086 Protected Description
Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12) X During instruction execution, the stack segment limit was exceeded.
General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14) X X A page fault resulted from the execution of the instruction.
Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.
Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
Related Instructions See the PFMIN instruction.
Table 11. Numerical Range for the PFMAX Instruction
                            Source 2
Source 1 and Destination    0                       Normal                        Unsupported
0                           +0                      Source 2, +0 (note 1)         Undefined
Normal                      Source 1, +0 (note 2)   Source 1/Source 2 (note 3)    Undefined
Unsupported                 Undefined               Undefined                     Undefined
Notes:
1. The result is source 2, if source 2 is positive. Otherwise, the result is positive zero.
2. The result is source 1, if source 1 is positive. Otherwise, the result is positive zero.
3. The result is source 1, if source 1 is positive and source 2 is negative. The result is source 1, if both are positive and source 1 is greater in magnitude than source 2. The result is source 1, if both are negative and source 1 is lesser in magnitude than source 2. The result is source 2 in all other cases.
PFMIN
mnemonic opcode/imm8 description
PFMIN mmreg1, mmreg2/mem64 0Fh 0Fh / 94h Packed floating-point minimum
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFMIN is a vector instruction that returns the smaller of the two single-precision,
floating-point operands. Any operation with a zero and a positive number returns
positive zero. An operation consisting of two zeros returns positive zero. Table 12 on
page 36 shows the numerical range of the PFMIN instruction.
The PFMIN instruction performs the following operations:
IF (mmreg1[31:0] < mmreg2/mem64[31:0])
  THEN mmreg1[31:0] = mmreg1[31:0]
  ELSE mmreg1[31:0] = mmreg2/mem64[31:0]
IF (mmreg1[63:32] < mmreg2/mem64[63:32])
  THEN mmreg1[63:32] = mmreg1[63:32]
  ELSE mmreg1[63:32] = mmreg2/mem64[63:32]
Exception Real Virtual 8086 Protected Description
Invalid opcode (6) X X X The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7) X X X Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12) X During instruction execution, the stack segment limit was exceeded.
General protection (13) X During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13) X X One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14) X X A page fault resulted from the execution of the instruction.
Floating-point exception pending (16) X X X An exception is pending due to the floating-point execution unit.
Alignment check (17) X X An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
Related Instructions See the PFMAX instru ction.
Table 12. Numerical Range for the PFMIN Instruction
                                                   Source 2
Source 1 and Destination  0                      Normal                      Unsupported
0                         +0                     Source 2, +0 (Note 1)       Undefined
Normal                    Source 1, +0 (Note 2)  Source 1/Source 2 (Note 3)  Undefined
Unsupported               Undefined              Undefined                   Undefined
Notes:
1. The result is source 2, if source 2 is negative. Otherwise, the result is positive zero.
2. The result is source 1, if source 1 is negative. Otherwise, the result is positive zero.
3. The result is source 1, if source 1 is negative and source 2 is positive. The result is source 1, if both are negative and source 1 is greater in magnitude than source 2. The result is source 1, if both are positive and source 1 is lesser in magnitude than source 2. The result is source 2 in all other cases.
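The rules in Table 12 for zero and normal operands can be sketched in Python. This is an illustrative model only, not AMD code: the function names pfmin_lane and pfmin are hypothetical, each lane is represented as an ordinary Python float, and unsupported operands (whose results the table lists as undefined) are not modeled.

```python
import math


def pfmin_lane(src1, src2):
    """Minimum of two lanes following Table 12, for zero and normal inputs only."""
    # Same comparison as the manual's pseudocode: keep src1 only if it is smaller.
    result = src1 if src1 < src2 else src2
    # Table 12 never yields negative zero: any zero result is returned as +0.
    if result == 0.0:
        return 0.0
    return result


def pfmin(dst, src):
    """Apply pfmin_lane to the low lane [31:0] and the high lane [63:32].

    dst and src are (low, high) tuples standing in for 64-bit registers.
    """
    return (pfmin_lane(dst[0], src[0]), pfmin_lane(dst[1], src[1]))
```

For example, pfmin((1.0, -2.0), (3.0, -5.0)) gives (1.0, -5.0), and a negative zero compared with a positive normal gives positive zero, as Note 1 requires.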
PFMUL
mnemonic opcode/imm8 description
PFMUL mmreg1, mmreg2/mem64 0Fh 0Fh / B4h Packed floating-point multiplication
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFMUL is a vector instruction that performs multiplication of the destination
operand and the source operand. Both operands are single-precision, floating-point
operands with 24-bit significands. Table 13 on page 38 shows the numerical range of
the PFMUL instruction.
The PFMUL instruction performs the following operations:
mmreg1[31:0] = mmreg1[31:0] * mmreg2/mem64[31:0]
mmreg1[63:32] = mmreg1[63:32] * mmreg2/mem64[63:32]
Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
Table 13. Numerical Range for the PFMUL Instruction
                                           Source 2
Source 1 and Destination  0              Normal                 Unsupported
0                         +/-0 (Note 1)  +/-0 (Note 1)          +/-0 (Note 1)
Normal                    +/-0 (Note 1)  Normal, +/-0 (Note 2)  Undefined
Unsupported               +/-0 (Note 1)  Undefined              Undefined
Notes:
1. The sign of the result is the exclusive-OR of the signs of the source operands.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the exclusive-OR of the signs of the source operands. If the absolute value of the product is greater than or equal to 2^128, the result is the largest normal number with the sign being the exclusive-OR of the signs of the source operands.
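The range rules in Notes 1 and 2 can be sketched in Python as follows. This is a hedged model rather than a description of the hardware: pfmul_lane and the two constants are hypothetical names, only zero and normal operands are handled, and the rounding of in-range products (round-to-nearest via the struct module here) is an assumption, since this section does not state the rounding behavior.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126                      # smallest normal single-precision magnitude
FLT_MAX_NORMAL = (2.0 - 2.0 ** -23) * 2.0 ** 127  # largest normal single-precision magnitude


def pfmul_lane(src1, src2):
    """Multiply two lanes and apply the range rules of Table 13, Notes 1 and 2."""
    # Note 1: the sign of the result is the XOR of the operand signs.
    sign = math.copysign(1.0, src1) * math.copysign(1.0, src2)
    # The double-precision product is exact when both inputs are single-precision values.
    magnitude = abs(src1 * src2)
    if magnitude < FLT_MIN_NORMAL:
        return math.copysign(0.0, sign)             # Note 2: too small, signed zero
    if magnitude >= 2.0 ** 128:
        return math.copysign(FLT_MAX_NORMAL, sign)  # Note 2: too large, largest normal
    try:
        # Round the in-range product to single precision (assumed rounding mode).
        rounded = struct.unpack("<f", struct.pack("<f", magnitude))[0]
    except OverflowError:
        # Rounding would pass the largest normal; clamp here as well (assumption).
        rounded = FLT_MAX_NORMAL
    return math.copysign(rounded, sign)
```

With this model, multiplying 2^100 by -2^100 returns the largest normal number with a negative sign, and multiplying -2^-100 by 2^-100 returns negative zero.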
PFRCP
mnemonic opcode/imm8 description
PFRCP mmreg1, mmreg2/mem64 0Fh 0Fh / 96h Floating-point reciprocal approximation
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFRCP is a scalar instruction that returns a low-precision estimate of the reciprocal of
the source operand. The single result value is duplicated in both high and low halves
of this instruction's 64-bit result. The source operand is single-precision with a 24-bit
significand, and the result is accurate to 14 bits. Table 14 on page 40 shows the
numerical range of the PFRCP instruction.
Increased accuracy (the full 24 bits of a single-precision significand) requires the use
of two additional instructions (PFRCPIT1 and PFRCPIT2). The first stage of this
increase or refinement in accuracy (PFRCPIT1) requires that the input and output of
the already executed PFRCP instruction be used as input to the PFRCPIT1
instruction. Refer to "Division and Square Root" on page 59 for an
application-specific example of how to use this instruction and related instructions.
The PFRCP instruction performs the following operations:
mmreg1[31:0] = reciprocal(mmreg2/mem64[31:0])
mmreg1[63:32] = reciprocal(mmreg2/mem64[31:0])
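The scalar behavior shown in the two operations above (only bits [31:0] of the source are read, and the result is written to both halves) can be sketched in Python. The helper names unpack_lanes, pack_lanes, and pfrcp_model are hypothetical, and the reciprocal is computed at full precision as a stand-in; the sketch does not reproduce the 14-bit hardware estimate.

```python
import struct


def unpack_lanes(reg64):
    """Split a 64-bit register image into (low, high) single-precision lanes."""
    low, high = struct.unpack("<ff", reg64.to_bytes(8, "little"))
    return low, high


def pack_lanes(low, high):
    """Build a 64-bit register image from two single-precision lanes."""
    return int.from_bytes(struct.pack("<ff", low, high), "little")


def pfrcp_model(src64):
    """Scalar behavior: read only bits [31:0], write the result to both halves."""
    low, _ignored_high = unpack_lanes(src64)
    estimate = 1.0 / low  # stand-in: full precision, not the 14-bit hardware estimate
    return pack_lanes(estimate, estimate)
```

For a source register holding 4.0 in the low lane and any normal value in the high lane, both lanes of the modeled result hold 0.25.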
Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
In the following code example, the first line illustrates the PFRCP instruction in a
sequence used to compute q = a/b accurate to 24 bits:
X0 = PFRCP(b)
X1 = PFRCPIT1(b,X0)
X2 = PFRCPIT2(X1,X0)
q = PFMUL(a,X2)
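The four-step division sequence can be imitated numerically in Python to see how one Newton-Raphson iteration lifts a roughly 14-bit estimate to 24-bit accuracy. Everything below is an assumption made for illustration: pfrcp_estimate simulates a low-precision estimate by truncating 1/b, and the split of the iteration into a correction factor (2 - b*X0) followed by a multiply is the conventional formulation, not a documented description of what PFRCPIT1 and PFRCPIT2 compute internally.

```python
import math


def pfrcp_estimate(b):
    """Hypothetical stand-in for PFRCP: 1/b truncated to about 14 significant bits."""
    mantissa, exponent = math.frexp(1.0 / b)  # mantissa magnitude lies in [0.5, 1)
    truncated = math.trunc(mantissa * 2 ** 15) / 2 ** 15
    return math.ldexp(truncated, exponent)


def pfrcpit1(b, x0):
    """Assumed first step: the Newton-Raphson correction factor 2 - b*x0."""
    return 2.0 - b * x0


def pfrcpit2(x1, x0):
    """Assumed second step: apply the correction factor to the estimate."""
    return x1 * x0


def divide(a, b):
    """q = a/b using the same operand order as the sequence in the manual."""
    x0 = pfrcp_estimate(b)
    x1 = pfrcpit1(b, x0)
    x2 = pfrcpit2(x1, x0)
    return a * x2  # PFMUL(a, X2)
```

In this model the relative error of the estimate is below 2^-14, and the relative error of divide(a, b) is below 2^-24 for normal, nonzero inputs.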
Related Instructions: See the PFRCPIT1 instruction.
See the PFRCPIT2 instruction.
Table 14. Numerical Range for the PFRCP Instruction
Source 2     Source 1 and Destination
0            +/- Maximum Normal (Note 1)
Normal       Normal, +/-0 (Note 2)
Unsupported  Undefined
Notes:
1. The result has the same sign as the source operand.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand. Otherwise, the result is a normal with the sign being the same sign as the source operand.
PFRCPIT1
mnemonic opcode/imm8 description
PFRCPIT1 mmreg1, mmreg2/mem64 0Fh 0Fh / A6h Packed floating-point reciprocal, first iteration step
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFRCPIT1 is a vector instruction that performs the first intermediate step in the
Newton-Raphson iteration to refine the reciprocal approximation produced by the
PFRCP instruction (the second and final step completes the iteration and is accurate
to 24 bits). Table 15 on page 42 shows the numerical range of the PFRCPIT1
instruction.
The behavior of this instruction is only defined for those combinations of operands
such that one source operand was the input to the PFRCP instruction and the other
source operand was the output of the same PFRCP instruction. Refer to "Division and
Square Root" on page 59 for an application-specific example of how to use this
instruction and related instructions.
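The reason a single Newton-Raphson iteration is enough to go from the PFRCP estimate to 24-bit accuracy can be seen from the textbook error analysis below. It is a general derivation, not a statement of how the hardware divides the work between PFRCPIT1 and PFRCPIT2; the symbols x_0, x_1, and \varepsilon are introduced here for illustration only.

```latex
f(x) = \frac{1}{x} - b, \qquad
x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} = x_0\,(2 - b\,x_0)

x_0 = \frac{1 - \varepsilon}{b}
\;\Longrightarrow\;
x_1 = \frac{1 - \varepsilon}{b}\,\bigl(2 - (1 - \varepsilon)\bigr)
    = \frac{(1 - \varepsilon)(1 + \varepsilon)}{b}
    = \frac{1 - \varepsilon^{2}}{b}
```

If "accurate to 14 bits" is read as a relative error of at most about 2^-14, the refined value has a relative error of at most about 2^-28, which is below the 2^-24 resolution of a 24-bit significand.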
Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
In the following code example, the second line illustrates the PFRCPIT1 instruction in a
sequence used to compute q = a/b accurate to 24 bits:
X0 = PFRCP(b)
X1 = PFRCPIT1(b,X0)
X2 = PFRCPIT2(X1,X0)
q = PFMUL(a,X2)
Related Instructions: See the PFRCP instruction.
See the PFRCPIT2 instruction.
Table 15. Numerical Range for the PFRCPIT1 Instruction
                                           Source 2
Source 1 and Destination  0              Normal           Unsupported
0                         +/-0 (Note 1)  +/-0 (Note 1)    +/-0 (Note 1)
Normal                    +/-0 (Note 1)  Normal (Note 2)  Undefined
Unsupported               +/-0 (Note 1)  Undefined        Undefined
Notes:
1. The sign of the result is the exclusive-OR of the signs of the source operands.
2. The sign is positive.
PFRCPIT2
mnemonic opcode/imm8 description
PFRCPIT2 mmreg1, mmreg2/mem64 0Fh 0Fh / B6h Packed floating-point reciprocal/reciprocal square root, second iteration step
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFRCPIT2 is a vector instruction that performs the second and final intermediate
step in the Newton-Raphson iteration to refine the reciprocal or reciprocal square root
approximation produced by the PFRCP and PFRSQRT instructions, respectively.
Table 16 on page 44 shows the numerical range of the PFRCPIT2 instruction.
The behavior of this instruction is only defined for those combinations of operands
such that the first source operand (mmreg1) was the output of either the PFRCPIT1 or
PFRSQIT1 instructions and the second source operand (mmreg2/mem64) was the
output of either the PFRCP or PFRSQRT instructions. Refer to "Division and Square
Root" on page 59 for an application-specific example of how to use this instruction
and related instructions.
Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)
In the following code example, the third line illustrates the PFRCPIT2 instruction in a
sequence used to compute q = a/b accurate to 24 bits:
X0 = PFRCP(b)
X1 = PFRCPIT1(b,X0)
X2 = PFRCPIT2(X1,X0)
q = PFMUL(a,X2)
Related Instructions: See the PFRCPIT1 instruction.
See the PFRSQIT1 instruction.
See the PFRCP instruction.
See the PFRSQRT instruction.
Table 16. Numerical Range for the PFRCPIT2 Instruction
                                           Source 2
Source 1 and Destination  0              Normal                 Unsupported
0                         +/-0 (Note 1)  +/-0 (Note 1)          +/-0 (Note 1)
Normal                    +/-0 (Note 1)  Normal, +/-0 (Note 2)  Undefined
Unsupported               +/-0 (Note 1)  Undefined              Undefined
Notes:
1. The sign of the result is the exclusive-OR of the signs of the source operands.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the exclusive-OR of the signs of the source operands. If the absolute value of the product is greater than or equal to 2^128, the result is the largest normal number with the sign being the exclusive-OR of the signs of the source operands.
PFRSQIT1
mnemonic opcode/imm8 description
PFRSQIT1 mmreg1, mmreg2/mem64 0Fh 0Fh / A7h Packed floating-point reciprocal square root, first iteration step
Privilege: none
Registers Affected: MMX
Flags Affected: none
Exceptions Generated:
PFRSQIT1 is a vector instruction that performs the first intermediate step in the
Newton-Raphson iteration to refine the reciprocal square root approximation
produced by the PFRSQRT instruction (the second and final step completes the
iteration and is accurate to 24 bits). Table 17 on page 46 shows the numerical range of
the PFRSQIT1 instruction.
The behavior of this instruction is only defined for those combinations of operands
such that one source operand was the input to the PFRSQRT instruction and the other
source operand is the square of the output of the same PFRSQRT instruction. Refer to
"Division and Square Root" on page 59 for an application-specific example of how to
use this instruction and related instructions.
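The operand requirements described above (the PFRSQRT input paired with the square of the PFRSQRT output, followed by PFRCPIT2 on the PFRSQIT1 output and the PFRSQRT output) can be imitated numerically in Python. This is a sketch under stated assumptions, not the hardware algorithm: pfrsqrt_estimate simulates a low-precision estimate by truncation, and the split into a correction factor (3 - b*X0^2)/2 followed by a multiply is the conventional Newton-Raphson formulation chosen for illustration.

```python
import math


def pfrsqrt_estimate(b):
    """Hypothetical stand-in for PFRSQRT: 1/sqrt(b) truncated to low precision."""
    mantissa, exponent = math.frexp(1.0 / math.sqrt(b))  # mantissa lies in [0.5, 1)
    truncated = math.trunc(mantissa * 2 ** 15) / 2 ** 15
    return math.ldexp(truncated, exponent)


def pfrsqit1(b, x0_squared):
    """Assumed first step: the Newton-Raphson correction factor (3 - b*x0^2) / 2."""
    return (3.0 - b * x0_squared) / 2.0


def pfrcpit2(x2, x0):
    """Assumed second step: apply the correction factor to the estimate."""
    return x2 * x0


def reciprocal_sqrt(b):
    """Refine the estimate of 1/sqrt(b) with one Newton-Raphson iteration."""
    x0 = pfrsqrt_estimate(b)
    x1 = x0 * x0             # PFMUL(X0, X0): square of the PFRSQRT output
    x2 = pfrsqit1(b, x1)     # operands: PFRSQRT input and squared output
    return pfrcpit2(x2, x0)  # operands: PFRSQIT1 output and PFRSQRT output
```

In this model the refined value of 1/sqrt(b) has a relative error below 2^-24 for positive normal inputs.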
Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment over run (13) X X One