Post on 21-Jan-2016
1
Assembly Language Fundamentals
Chapter 2
2
Directives and Instructions
Assembly language statements are either directives or instructions.
Instructions are executable statements. They are translated by the assembler into machine instructions. Ex:
call MySub ;transfer of control
mov ax,5 ;data transfer
Directives tell the assembler how to generate machine code and allocate storage. Ex:
count db 50 ;creates 1 byte of storage
;initialized to 50
3
A Template for Assembly Language Programs
.386 = directive to accept all instructions of 386 and previous processors (use .586 to assemble Pentium specific instructions)
end = directive that marks the end of the program
main = label of the entry point of the program (first instruction to execute)
ret = instruction that returns the control to the caller (here the Win32 console)
Macros to perform I/O are included in csi2121.inc
.386
.model flat
include csi2121.inc
.data
;data allocation directives here
.code
main:
;instructions here
ret
end
4
The FLAT Memory Model
The .model flat directive tells the assembler to generate code that will run in protected mode and in 32-bit mode.
It also asks the assembler to do whatever is needed so that code, stack, and data share the same 32-bit memory segment.
All the segment registers will be loaded with the correct values at load time and do not need to be changed by the programmer.
Only the offset part of a logical address becomes relevant.
Each data byte (or instruction) is referred to only by a 32-bit offset address.
The directives .code and .data mark the beginning of the code and data segments. They are used only for protection: .code is read-only; .data is read and write.
5
Steps to Produce an Executable File
The assembler produces an object file from the assembly language source.
The object file contains machine language code with some external and relocatable addresses that will be resolved by the linker. Their values are undetermined at that stage.
The linker extracts object modules (compiled procedures) from a library and links them with the object file to produce the executable file.
The addresses in the executable file are all resolved but they are still logical addresses.
[Figure: source file → assembler → object file; object file + library → linker → executable file]
6
Using Borland’s BCC32
All these steps are performed with the command:
bcc32 -v hello.asm
The bcc32 command calls TASM32 to assemble and produce an object file.
It then calls ILINK32 to link this object file with the C/C++ library functions and Win32 functions used by the program to produce the executable file hello.exe.
The -v option produces full debugging info.
See the LabInfo page for all the info you need.
7
Names
A name identifies either:
a variable
a label
a constant
a keyword (assembler-reserved word).
8
Names (Cont.)
A variable is a symbolic name for a location in memory that was allocated by a data allocation directive. Ex:
count db 50 ;allocates 1 byte to variable count
A label is a name given to an instruction. It must be followed by ‘:’. Ex:
main:
mov eax, 5
xor eax, ebx
jmp main
9
Names (Cont.)
The first character must be a letter or any one of ‘@’, ‘_’, ‘$’, ‘?’.
Subsequent characters can include digits.
A programmer-chosen name must be different from an assembler reserved word.
Avoid using ‘@’ as the first character since many keywords start with it.
When called from bcc32, the TASM32 assembler is case sensitive for user-defined words but case insensitive for the assembler reserved words.
10
Integer Constants
Integer constants are made of numerical digits with, possibly, a sign and a suffix. Ex:
-23 (a negative integer, base 10 is default)
1011b (a binary number)
1011 (a decimal number)
0A7Ch (a hexadecimal number)
A7Ch (this is the name of a variable; a hexadecimal number must start with a decimal digit)
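The suffix rules on this slide can be checked with a small Python sketch. It is an illustration only, not how TASM32 parses source: the function name parse_int_constant is hypothetical, and only the decimal, b, and h forms shown above are handled.

```python
def parse_int_constant(text):
    """Return the value of a suffix-style integer constant, or None if text is a name."""
    s = text.strip().lower()
    sign = 1
    if s and s[0] in "+-":
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    # A numeric constant must start with a decimal digit; otherwise it is a name.
    if not s or not s[0].isdigit():
        return None
    if s.endswith("h"):
        return sign * int(s[:-1], 16)   # hexadecimal
    if s.endswith("b"):
        return sign * int(s[:-1], 2)    # binary
    return sign * int(s, 10)            # decimal is the default base


for example in ["-23", "1011b", "1011", "0A7Ch", "A7Ch"]:
    print(example, "->", parse_int_constant(example))
```

The last example returns None because A7Ch starts with a letter and is therefore read as a name.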
11
Character and String Constants
They are any sequence of characters enclosed either in single or double quotation marks. Embedded quotes are permitted. Ex:
‘A’
‘ABC’
“Hello World!”
“123” (this is a string, not a number)
“This isn’t a test”
‘Say “hello” to him’
12
Simple Data Allocation Directives
The DB (define byte) directive allocates storage for one or more byte values:
[name] DB initval [,initval]
Each initializer can be any constant. Ex:
a db 10, 32, 41h ;allocate 3 bytes
b db 0Ah, 20h,‘A’ ;same values as above
A question mark (?) in the initializer leaves the initial value of the variable undefined. Ex:
c db ? ;the initial value for c is undefined
Everything that follows “;” is ignored by the assembler. It is thus a comment.
13
Simple Data Allocation Directives (cont.)
A string is stored as a sequence of characters. Ex:
aString db “ABCD”
bString DB ‘A’,‘B’,‘C’,‘D’ ;same values
cString db 41h,42h,43h,44h ;same values again
The (offset) address of a variable is the address of its first byte. Ex: If the following data segment starts at address 0:
.data
Var1 db “ABC”
Var2 db “DEFG”
The address of Var1 is 0 = the address of ‘A’
The address of ‘B’ is 1
The address of ‘C’ is 2
The address of Var2 is 3
The address of ‘E’ is 4
…
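The offset rule above (each variable starts where the previous one ends) can be sketched in Python. This models the Var1/Var2 example only; the helper name layout is hypothetical.

```python
def layout(variables, start=0):
    """Assign each (name, data) pair an offset in declaration order."""
    offsets = {}
    address = start
    for name, data in variables:
        offsets[name] = address      # a variable's address is that of its first byte
        address += len(data)         # the next variable follows immediately
    return offsets


segment = [("Var1", b"ABC"), ("Var2", b"DEFG")]
offsets = layout(segment)
memory = b"".join(data for _, data in segment)

print(offsets)                 # {'Var1': 0, 'Var2': 3}
print(memory.index(b"E"))      # 4
```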
14
Simple Data Allocation Directives (cont.)
Define Word (DW) allocates a sequence of words. Ex:
A dw 1234h, 5678h ;allocates 2 words
Intel’s x86 are little endian processors: the lowest order byte (of a word or double word) is always stored at the lowest address.
Ex: if variable A (above) is located at address 0, we have:
address: 0   1   2   3
value:   34h 12h 78h 56h
15
Simple Data Allocation Directives (cont.)
Define Double Word (DD) allocates a sequence of double words. Ex:
B dd 12345678h ;allocates 1 double word
If this variable is located at address 0, we have:
address: 0   1   2   3
value:   78h 56h 34h 12h
If a value fits into a byte, it will be stored in the lowest-order byte available. Ex:
V dw ‘A’
the value will be stored as:
address: 0   1
value:   41h 00h
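The little endian layouts shown for DW and DD can be reproduced with Python's struct module. This is a sketch for checking the byte order by hand; the helper names dw and dd are hypothetical stand-ins for the directives.

```python
import struct


def dw(*values):
    """Bytes a DW directive would emit: 16-bit little endian words."""
    return b"".join(struct.pack("<H", v & 0xFFFF) for v in values)


def dd(*values):
    """Bytes a DD directive would emit: 32-bit little endian double words."""
    return b"".join(struct.pack("<I", v & 0xFFFFFFFF) for v in values)


a = dw(0x1234, 0x5678)
b = dd(0x12345678)
v = dw(ord("A"))

print(a.hex(" "))   # 34 12 78 56
print(b.hex(" "))   # 78 56 34 12
print(v.hex(" "))   # 41 00
```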
16
Simple Data Allocation Directives (cont.)
The DUP operator enables us to repeat values when allocating storage. Ex:
a db 100 dup(?) ;100 bytes uninitialized
b db 3 dup(“Ho”) ;6 bytes: “HoHoHo”
DUP can be nested:
c db 2 dup(‘a’, 2 dup(‘b’)) ;this allocates 6 bytes: ‘abbabb’
DUP must be used with data allocation directives.
There is a bug in some TASM32 versions:
b db 3 dup(“Ho”)
will allocate 6 bytes that will be filled with 0 (i.e. the specified initial values are ignored).
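The intended expansion of DUP, including nesting, can be sketched in Python. The function dup is a hypothetical model of the operator as documented above; it does not reproduce the TASM32 bug.

```python
def dup(count, *items):
    """Repeat the concatenation of items count times, like 'count dup(items)'."""
    return b"".join(items) * count


b = dup(3, b"Ho")                      # 3 dup("Ho")
c = dup(2, b"a", dup(2, b"b"))         # 2 dup('a', 2 dup('b'))

print(b)   # b'HoHoHo'
print(c)   # b'abbabb'
```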
17
Constants
We can use the equal-sign (=) directive or the EQU directive to give a name to a constant. Ex:
one = 1 ;this is a constant
two equ 2 ;also a constant
The EQU and = directives are equivalent.
The assembler does not allocate storage to a constant (in contrast with data allocation directives).
It merely substitutes, at assembly time, the value of the constant at each occurrence of the assigned name.
18
Constants (cont.)
In place of a constant, we can use a constant expression involving the standard operators used in HLLs: +, -, *, /
Ex: the following constant expression is evaluated and given a name at assembly time:
A = (-3 * 8) + 2
A constant can be defined in terms of another constant:
B = (A+2)/2
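For reference, the two expressions above work out as follows. This Python sketch only reproduces the arithmetic; the division is exact here, so how an assembler rounds inexact negative quotients is not modeled.

```python
A = (-3 * 8) + 2      # -24 + 2 = -22
B = (A + 2) // 2      # -20 / 2 = -10 (exact division)

print(A, B)           # -22 -10
```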
19
Exercise 1
Suppose that the following data segment starts at address 0:
.data
A DW 1,2
B DW 6ABCh
Z EQU 232
C DB 'ABCD'
A) Find the address of variable A.
B) Find the address of variable B.
C) Find the address of variable C.
D) Find the address of character ‘C’.
20
Data Transfer Instructions
The MOV instruction transfers the content of the source operand to the destination operand:
mov destination,source
This changes the content of destination (but not the content of source).
Both operands must be of the same size.
An operand can be either direct or indirect.
Direct operands (this chapter) are either:
Immediate (a constant): noted imm
Register: noted reg
Memory variable (with displacement): noted mem
Indirect operands are used for indirect addressing (later chapter).
21
Data Transfer Instructions (cont.)
Some restrictions on MOV:
imm cannot be the destination operand.
EIP cannot be an operand.
Source and destination cannot both be mem.
Direct memory-to-memory data transfer is forbidden!
mov wordVar1,wordVar2 ;illegal
22
Data Transfer Instructions (cont.)
The type of an operand is given by its size (byte, word, doubleword…).
Both operands of MOV must be of the same type.
Type check is done by the assembler.
The type assigned to a mem operand is given by its data allocation directive (DB, DW…).
The type assigned to a register is given by its size.
An imm source operand of MOV must fit into the size of the destination operand.
23
Data Transfer Instructions (cont.)
Examples of MOV usage:
mov bh, 255 ;8-bit operands
mov al, 256 ;error: constant too large
mov bx, AwordVar ;16-bit operands
mov bx, AbyteVar ;error: size mismatch
mov edx, AdoublewordVar ;32-bit operands
mov cx, bl ;error: size mismatch
mov wordVar1,wordVar2 ;error: mem-to-mem
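The "constant too large" case in the examples above can be illustrated with a Python range check. This is a sketch under an assumption the slides do not state: that an immediate is accepted when it fits either as an unsigned value or as a two's complement signed value of the destination size. The function name fits is hypothetical.

```python
def fits(value, bits):
    """True if value is representable in bits bits, unsigned or two's complement."""
    lowest = -(1 << (bits - 1))      # most negative signed value
    highest = (1 << bits) - 1        # largest unsigned value
    return lowest <= value <= highest


print(fits(255, 8))   # True:  mov bh, 255 is accepted
print(fits(256, 8))   # False: mov al, 256 is rejected
```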
24
MOVZX: Move with Zero Extend
Often we want to move the content of a source operand into a destination operand of larger size.
The MOVZX instruction does this operation by filling with zeros the high order part of the destination. Usage:
MOVZX destination,source
Immediate operands are not allowed here.
The size of destination must be strictly larger than the size of source.
Example:
mov bh, 80h
movzx ah,bh ;illegal, size mismatch
movzx ax,bh ;AX = 0080h
movzx ecx,ax ;ECX = 00000080h
Notice that if the signed value in the source operand is negative, then MOVZX will not preserve the sign.
mov bh, 80h ;BH = 80h (negative)
movzx ax,bh ;AX = 0080h (positive)
25
MOVSX: Move with Sign Extend
We can use the MOVSX instruction to preserve the sign of the source operand. Usage:
MOVSX destination,source
The high order part of the destination operand will be the sign extension of the source operand.
The sign extension of a negative number is …111111
The sign extension of a positive number is …0000000
Examples:
mov bh, 80h ;BH = 80h (negative)
movsx ax,bh ;AX = FF80h (negative)
;FFh is the sign extension of 80h
mov bl, 7Ah ;BL = 7Ah (positive)
movsx ax,bl ;AX = 007Ah (positive)
;00h is the sign extension of 7Ah
MOVSX preserves the signed value whereas MOVZX preserves the unsigned value.
Immediate operands are not allowed and the size of destination must be strictly larger than the size of source.
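Zero extension and sign extension can be compared side by side in a short Python sketch. The function names simply mirror the mnemonics and are illustrative models of the bit manipulation, not the instructions themselves.

```python
def movzx(value, src_bits):
    """Zero extension: keep the low src_bits; all higher bits become 0."""
    return value & ((1 << src_bits) - 1)


def movsx(value, src_bits, dst_bits):
    """Sign extension: copy the source's top bit into every higher destination bit."""
    value &= (1 << src_bits) - 1
    if value >> (src_bits - 1):            # sign bit is set
        value -= 1 << src_bits             # take the signed interpretation
    return value & ((1 << dst_bits) - 1)   # re-encode it in dst_bits bits


print(f"{movzx(0x80, 8):04X}")       # 0080
print(f"{movsx(0x80, 8, 16):04X}")   # FF80
print(f"{movsx(0x7A, 8, 16):04X}")   # 007A
```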
26
Data Transfer Instructions (cont.)
We can add a displacement to a memory operand to access a memory value without a name. Ex:
.data
arrB db 10h, 20h
arrW dw 1234h, 5678h
arrB+1 refers to the location one byte beyond the beginning of arrB and arrW+2 refers to the location two bytes beyond the beginning of arrW.
mov al,arrB ;AL = 10h
mov al,arrB+1 ;AL = 20h (mem with displacement)
mov ax,arrW+2 ;AX = 5678h
mov ax,arrW+1 ;AX = 7812h (little endian convention!)
mov ax,arrW-2 ;AX = 2010h (negative displacement permitted)
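The displacement examples above can be traced by laying out the six bytes of the data segment and reading from them. This Python sketch assumes the segment starts at offset 0; load8 and load16 are hypothetical helpers standing in for byte and word MOVs.

```python
# arrB db 10h, 20h followed by arrW dw 1234h, 5678h (words stored little endian)
memory = bytes([0x10, 0x20, 0x34, 0x12, 0x78, 0x56])
arrB, arrW = 0, 2      # offsets of the two variables


def load8(addr):
    """Read one byte, as 'mov al,mem' would."""
    return memory[addr]


def load16(addr):
    """Read a little endian word, as 'mov ax,mem' would."""
    return int.from_bytes(memory[addr:addr + 2], "little")


print(f"{load8(arrB + 1):02X}")    # 20
print(f"{load16(arrW + 2):04X}")   # 5678
print(f"{load16(arrW + 1):04X}")   # 7812
print(f"{load16(arrW - 2):04X}")   # 2010
```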
27
Data Transfer Instructions (cont.)
The XCHG instruction exchanges the content of the source and destination operands:
XCHG destination,source
Only mem and reg operands are permitted (and must be of the same size).
Both operands cannot be mem (direct mem-to-mem exchange is forbidden).
To exchange the content of word1 and word2, we have to do:
mov ax,word1
xchg word2,ax
mov word1,ax
28
Exercise 2
Given the following data segment:
.data
A dw 1234h,-1
B dd 55h,66778899h
Indicate whether each of the following instructions is legal. If it is, indicate the value, in hexadecimal, of the destination operand immediately after the instruction is executed (please verify your answers with a debugger):
MOV eax,A
MOV bx,A+1
MOV bx,A+2
MOV dx,A+4
MOV cx,B+1
MOV edx,B+2
29
Simple Arithmetic Instructions
The ADD instruction adds the source to the destination and stores the result in the destination (source remains unchanged):
ADD destination,source
The SUB instruction subtracts the source from the destination and stores the result in the destination (source remains unchanged):
SUB destination,source
Both operands must be of the same size and they cannot both be mem operands.
Recall that to perform A - B the CPU in fact performs A + NEG(B).
30
Simple Arithmetic Instructions (cont.)
ADD and SUB affect all the status flags according to the result of the operation:
ZF (zero flag) = 1 iff the result is zero
SF (sign flag) = 1 iff the msb of the result is one
OF (overflow flag) = 1 iff there is a signed overflow
CF (carry flag) = 1 iff there is an unsigned overflow
Signed overflow: when the operation generates an out-of-range (erroneous) signed value.
Unsigned overflow: when the operation generates an out-of-range (erroneous) unsigned value.
31
More on Overflows
An unsigned overflow occurs if and only if (IFF) the unsigned value of the result does not fit into the destination operand.
This occurs IFF the unsigned interpretation of the result is erroneous.
It is signaled by CF=1.
A signed overflow occurs IFF the signed value of the result does not fit into the destination operand.
This occurs IFF the signed interpretation of the result is erroneous.
It is signaled by OF=1.
32
Simple Arithmetic Instructions (cont.)
Both types of overflow occur independently and are signaled separately by CF and OF
mov al, 0FFh
add al,1 ; AL=00h, OF=0, CF=1
mov al,7Fh
add al, 1 ; AL=80h, OF=1, CF=0
mov al,80h
add al,80h ; AL=00h, OF=1, CF=1
Hence: we can have either type of overflow or both of them at the same time
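The three ADD cases above can be reproduced with a small Python model of the carry and overflow flags. It is a sketch of the flag rules as described on these slides; the function name add and its return shape are illustrative choices.

```python
def add(a, b, bits=8):
    """Model ADD on bits-wide operands: return (result, CF, OF)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    cf = int(total > mask)     # unsigned overflow: a carry out of the top bit
    # Signed overflow: both operands share a sign that the result does not have.
    of = int(((a ^ result) & (b ^ result) & sign) != 0)
    return result, cf, of


for a, b in [(0xFF, 0x01), (0x7F, 0x01), (0x80, 0x80)]:
    result, cf, of = add(a, b)
    print(f"{a:02X}h + {b:02X}h = {result:02X}h, OF={of}, CF={cf}")
```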
33
Overflow Example
mov ax,4000h
add ax,ax ;AX = 8000h
Unsigned Interpretation:
The sum of the 2 magnitudes 4000h + 4000h gives 8000h. This is the result in AX (the unsigned value of the result is correct). Hence CF=0
Signed Interpretation:
We add two positive numbers, 4000h + 4000h, and obtain a negative number! The signed value of the result in AX is erroneous. Hence OF=1
34
Overflow Example
mov ax,8000h
sub ax,0FFFFh ;AX = 8001h
Unsigned Interpretation:
From the magnitude 8000h we subtract the larger magnitude FFFFh: the unsigned value of the result is erroneous. Hence CF=1
Signed Interpretation:
We subtract -1 from the negative number 8000h and obtain the correct signed result 8001h. Hence OF=0
35
Overflow Example
mov ah,40h
sub ah,80h ;AH = C0h
Unsigned Interpretation:
We subtract from 40h the larger number 80h: the unsigned value of the result is wrong. Hence CF=1
Signed Interpretation:
We subtract from 40h (64) the negative number 80h (-128) and obtain a negative number, C0h (-64), whereas the true result, 192, is positive and does not fit in a signed byte: the signed value of the result is wrong. Hence OF=1
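The two SUB examples can be checked with a Python model of subtraction flags. As before this is only a sketch of the rules described on the slides; the function name sub is illustrative.

```python
def sub(a, b, bits=8):
    """Model SUB on bits-wide operands: return (result, CF, OF)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    cf = int(a < b)            # unsigned overflow: a borrow was needed
    # Signed overflow: operands have different signs and the result's sign differs from a's.
    of = int(((a ^ b) & (a ^ result) & sign) != 0)
    return result, cf, of


result, cf, of = sub(0x8000, 0xFFFF, bits=16)
print(f"AX = {result:04X}h, CF={cf}, OF={of}")   # AX = 8001h, CF=1, OF=0

result, cf, of = sub(0x40, 0x80)
print(f"AH = {result:02X}h, CF={cf}, OF={of}")   # AH = C0h, CF=1, OF=1
```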
36
Exercise 3
For each of these instructions, give the content (in hexadecimal) of the destination operand and the CF and OF flags immediately after the execution of the instruction (verify your answers with a debugger).
ADD AX,BX when AX contains 8000h and BX contains FFFFh.
SUB AL,BL when AL contains 00h and BL contains 80h.
ADD AH,BH when AH contains 2Fh and BH contains 52h.
SUB AX,BX when AX contains 0001h and BX contains FFFFh.
37
Simple Arithmetic Instructions (cont.)
The INC (increment) and DEC (decrement) instructions add 1 to or subtract 1 from a single operand (mem or reg operand):
INC destination
DEC destination
They affect all status flags, except CF. Say that initially we have CF=OF=0:
mov bh,0FFh ; CF=0, OF=0
inc bh ; bh=00h, CF=0, OF=0
mov bh,7Fh ; CF=0, OF=0
inc bh ; bh=80h, CF=0, OF=1
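The difference from ADD is that INC leaves CF alone. A Python sketch of that behaviour follows, with the incoming CF passed in and returned unchanged; the function name inc is illustrative.

```python
def inc(value, cf, bits=8):
    """Model INC: return (result, CF, OF); CF is passed through untouched."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    result = ((value & mask) + 1) & mask
    of = int(result == sign)   # only the largest positive value overflows when incremented
    return result, cf, of


print(inc(0xFF, cf=0))   # (0, 0, 0): wraps to 00h, yet CF stays 0
print(inc(0x7F, cf=0))   # (128, 0, 1): 7Fh -> 80h is a signed overflow
```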
38
Simple Arithmetic Instructions (cont.)
The NEG instruction replaces its operand with its two’s complement:
NEG destination
where destination is either mem or reg.
CF=0 IFF the result is 0
OF=1 IFF there is a signed overflow. Ex:
mov ax,-5
neg ax; CF = 1, OF = 0
mov ax,8000h
neg ax; CF=1, OF=1 signed overflow!
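Both NEG examples can be reproduced with a Python model of the two's complement and its flags. This is a sketch of the rules stated above; the function name neg is illustrative.

```python
def neg(value, bits=16):
    """Model NEG: return (result, CF, OF)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    value &= mask
    result = (-value) & mask
    cf = int(result != 0)      # CF=0 only when the result is 0
    of = int(value == sign)    # the most negative value has no positive counterpart
    return result, cf, of


print(neg(-5))        # (5, 1, 0)
print(neg(0x8000))    # (32768, 1, 1): 8000h negates to itself
```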
39
I/O on the Win32 Console
Our programs will communicate with the user via the Win32 console (the MS-DOS box).
Input is done on the keyboard.
Output is done on the screen.
Modern OSes like Windows forbid user programs from interacting directly with I/O hardware.
User programs can only perform I/O operations via system calls.
For simplicity, our programs will perform I/O operations by using macros that are provided in the csi2121.inc file.
These macros call C library functions like printf() which, in turn, call the Win32 API.
Hence, these I/O operations will be slow but simple to use and easy to migrate to another OS.
We will examine the mechanisms involved in I/O operations later in the course.
40
Character Output
The putch macro prints on the screen the character whose ASCII code is given by the operand. Usage:
putch source
where source must be a 32-bit operand, i.e. either imm, reg32, or mem32 (a double word variable).
.data
aword dw 41h
adword dd 61h
.code
putch aword ;error: 16-bit operand
putch adword ;‘a’ is written on screen
putch ‘b’ ;‘b’ is written on screen
mov eax,‘c’
putch eax ;‘c’ is written on screen
putch ax ;error: 16-bit operand
41
Character Output (cont.)
Also: the cursor will advance one position after printing the character.
The putch macro calls the putchar() function from the C library. Hence:
The number 10 = 0Ah will direct the cursor to the beginning of the next line (the “newline character” in C). So the <CR> and <LF> functions are both performed on the screen.
putch 10 ;move the cursor to the
;beginning of next line
42
String Output
To print a string, use the following macro:
putstr source
where source must be a mem operand (i.e. the name of a variable). It cannot be a reg or imm operand.
This macro calls printf(“%s”, ) of the C library. Hence:
The number 10 = 0Ah will move the cursor to the beginning of the next line (the “newline character” in C).
The string must be a “null-terminated” string: the last character must have ASCII code = 0h. Ex:
.data
msg db “hello”,0ah,“world”,0h
.code
putstr msg ;prints ‘hello’ on one line
;and ‘world’ on the next line
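The role of the terminating 0 can be seen by modeling memory as a byte string and reading up to the first zero byte. This Python sketch is only an analogy for what a C-style string routine does; the helper name c_string is hypothetical.

```python
def c_string(memory, start=0):
    """Return the characters from start up to (not including) the first 0 byte."""
    end = memory.index(0, start)
    return memory[start:end].decode("ascii")


# msg db "hello",0ah,"world",0h
msg = b"hello" + bytes([0x0A]) + b"world" + bytes([0x00])

print(c_string(msg))   # 'hello' on one line and 'world' on the next
```

Without the final 0h, the search for the terminator would run past the end of the variable into whatever bytes follow it.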
43
Integer Output
To print the signed value of an integer, use:
putint source
where source must be a 32-bit operand, i.e. either imm, reg32, or mem32 (a double word variable). Ex:
.data
aword dw 243
adword dd -266
.code
putint aword ;error: 16-bit operand
putint adword ;-266 is written on screen
putint -1 ;-1 is written on screen
mov eax,0FFFFFFFFh
putint eax ;-1 is written on screen
putint ax ;error: 16-bit operand
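The reason 0FFFFFFFFh prints as -1 is that the 32-bit pattern is read as a two's complement signed value. A Python sketch of that interpretation follows; the function name as_signed32 is hypothetical.

```python
def as_signed32(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


print(as_signed32(0xFFFFFFFF))   # -1
print(as_signed32(0xFFFFFEF6))   # -266
```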
44
Character Input
To read one or more characters from the keyboard, we will use the getch macro. Usage:
getch
This macro calls getchar() from the C library, so it uses a memory buffer that we will call the input buffer.
Upon execution of getch, the input buffer is first examined.
If the input buffer is empty, then getch waits for the user to enter an input line (a sequence of characters ended by <CR>).
Each character that the user enters (at the keyboard) is copied into the input buffer.
When the user enters the <CR>: the screen cursor moves to the next line, the value 0Ah is stored in the input buffer, and control is passed to the instruction following getch.
The ASCII code of the first character entered on the keyboard will be stored in AL. The remaining bits of EAX are filled with zeros. Ex:
mov eax,-1
getch ;eax=41h if the user first hits ‘A’
45
Character Input (cont.)
Example: Suppose that the input buffer is initially empty and, upon execution of getch, the user enters “hello”+<CR> on the keyboard.
Then, when the control returns to the instruction following getch, EAX contains 068h (= ‘h’) and the input buffer looks like this:
‘h’ ‘e’ ‘l’ ‘l’ ‘o’ 0Ah
[Figure: the buffer is shown with a “pointer to next char” and a “pointer to last char”]
If the input buffer is not empty when getch is executed, then EAX will get loaded with the ASCII code of the next character in the input buffer and the pointer to the next char will increase by one.
The input buffer is empty only when the pointer to the next char points beyond the last character (i.e. 0Ah).
The user is prompted only when the input buffer is empty.
46
Character Input (example)
Try to understand this program.
It first prints “?” and moves the cursor to the next line awaiting user input.
When the user enters “abcdef” + <CR>, the program displays (before exiting):
abc
But if, instead, the user enters “a” + <CR>, the program displays:
a
and the cursor moves to the next line awaiting user input. If the user then enters “bcdef”+<CR>, the program prints on the next line (before exiting):
b
.386
.model flat
include csi2121.inc
.code
main:
putch '?'
putch 10
getch
putch eax
getch
putch eax
getch
putch eax
ret
end
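The buffering rules described for getch can be replayed with a small Python model of the three getch/putch pairs in this program. It is a sketch, not the macro's implementation: the class InputBuffer and the function run_program are hypothetical, typed lines are supplied up front instead of read from a keyboard, and the initial “?” and the echo of typed keys are left out of the modeled output.

```python
class InputBuffer:
    """Line-buffered input in the style of C's getchar()."""

    def __init__(self, typed_lines):
        self.typed_lines = list(typed_lines)  # lines the user will type, without <CR>
        self.buffer = []                      # unread character codes
        self.prompts = 0                      # how many times the program waited for a line

    def getch(self):
        if not self.buffer:                   # empty buffer: wait for a whole line
            self.prompts += 1
            line = self.typed_lines.pop(0)
            self.buffer = [ord(c) for c in line] + [0x0A]  # <CR> is stored as 0Ah
        return self.buffer.pop(0)             # next character code, as left in EAX


def run_program(typed_lines):
    """Replay getch/putch three times; return (characters printed, number of prompts)."""
    console = InputBuffer(typed_lines)
    printed = "".join(chr(console.getch()) for _ in range(3))
    return printed, console.prompts


print(run_program(["abcdef"]))       # ('abc', 1)
print(run_program(["a", "bcdef"]))   # ('a\nb', 2)
```

In the second run the 0Ah left in the buffer by the first line is what the second putch prints, which is why “b” appears on the following line.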
47
I/O Example: Case Conversion
.386
.model flat
include csi2121.inc
.data
msg1 db "Enter a lower case letter: ",0
msg2 db 'In upper case it is: '
char db ?,0
.code
main:
putstr msg1
getch ;char in eax and go to next line
sub al,20h ;converts to upper case
mov char,al
putstr msg2
ret
end
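Two details of this program are easy to miss: the conversion is a plain subtraction of 20h, and msg2 has no terminating 0, so printing it runs on into char and stops at the 0 that follows. The Python sketch below models both; the names to_upper, memory, and shown are illustrative, and the input letter ‘q’ is an arbitrary choice.

```python
def to_upper(code):
    """Mirror 'sub al,20h' on an 8-bit register."""
    return (code - 0x20) & 0xFF


msg2 = b"In upper case it is: "
memory = bytearray(msg2 + b"?" + b"\x00")   # msg2 followed by 'char db ?,0'
char = len(msg2)                             # offset of char, right after msg2

memory[char] = to_upper(ord("q"))            # mov char,al
shown = memory[:memory.index(0)].decode("ascii")   # putstr msg2 stops at the 0
print(shown)                                 # In upper case it is: Q
```

Since nothing checks that the input really is a lower case letter, any other character is shifted by 20h as well; for example, ‘A’ (41h) would become ‘!’ (21h).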