STEVEN R. BAGLEY THE ASSEMBLER

Date post: 23-Dec-2021
Page 1: STEVEN R. BAGLEY THE ASSEMBLER

THE ASSEMBLER
STEVEN R. BAGLEY

Page 2

INTRODUCTION

• Looking at how to build a computer from scratch

• Started with the NAND gate and worked up…

• Until we can build a CPU

• Reached the divide between hardware and software

• Today, looking at how the Assembler works

Or “Machine Language”, as N2T (Nand2Tetris) calls it

Page 3

MACHINE CODE

[Figure: layers of abstraction, from HARDWARE (NAND; AND, OR, NOT; MUX, ADDER, DMUX, …; MUX16, ADD16, OR16, …; D FLIP-FLOP; BIT; REGISTER; ALU; CPU) up to SOFTWARE (C, ALGORITHMS, THOUGHT)]

We start off with an abstract idea of what we want the program to do, convert that into algorithms, then into C, and then directly into machine code that the hardware can execute. At each step it gets less abstract and more concrete. Assembly sits at the human side of machine code.

Page 4

MACHINE CODE

[Figure: the same layers of abstraction, now with ASSEMBLY added alongside MACHINE CODE]

Page 5

THE ASSEMBLER

• Assembly language is a symbolic representation of machine code

• In a human-readable form

• An assembler is a tool that takes this symbolic representation

• And converts it into the binary bit patterns needed by the CPU

• Can also provide help during the conversion

• Changing the syntax of the program

• Not the semantics

Demo: the N2T Assembler running on a real piece of assembly code. The help it provides includes, e.g., letting you use labels instead of computing each address yourself. Syntax: how it’s expressed. Semantics: what it means.

Page 6

THE ASSEMBLER

• An assembler has many of the same stages as a compiler

• But generally in a much simplified form

• Understanding how an assembler works gives us insight into what the compiler must do

• Also helps us to understand how the bits in an instruction relate to its function

Which might help us understand what the CPU is doing on the other side

Page 7

Typical structure of an assembler or compiler. The parser is usually split into two phases: first, tokenise, breaking the ASCII characters up into tokens; second, use those tokens to build the syntax tree. Then, from the syntax tree, we can generate the machine code for each instruction, i.e. the correct bit patterns.
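As a rough illustration of the tokenising phase, here is a minimal Python sketch. The regular expression and the name `tokenise` are our own assumptions, not part of the N2T tools; it splits a whitespace-free line of Hack assembly into symbols, numbers and single punctuation characters.

```python
import re

# Hypothetical token pattern (our assumption, not from the N2T tools):
# a symbol that does not start with a digit, a run of digits, or one
# punctuation character used by Hack assembly.
TOKEN_RE = re.compile(r"[A-Za-z_.$:][\w.$:]*|\d+|[@()=;+\-!&|]")


def tokenise(line: str) -> list[str]:
    """Break one whitespace-free line of Hack assembly into tokens."""
    return TOKEN_RE.findall(line)


for src in ["@100", "D=D-A", "D;JGT", "(LOOP)"]:
    print(src, "->", tokenise(src))
```

Note that `findall` silently skips any character that matches no alternative; a more careful tokeniser would report such characters as errors.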

Page 8

@i
M=1
@sum
M=0
(LOOP)
@i
D=M
@100
D=D-A
@END
D;JGT
@i
D=M
@sum
M=D+M
@i
M=M+1
@LOOP
0;JMP
(END)
@END
0;JMP

Page 9

[Figure: the example program feeds into the PARSER]

Page 10

[Figure: example program → PARSER → SYNTAX TREE]

Page 11

[Figure: example program → PARSER → SYNTAX TREE → CODE GENERATE]

Page 12

[Figure: example program → PARSER → SYNTAX TREE → CODE GENERATE → machine code]

0000000000010000
1110111111001000
0000000000010001
1110101010001000
0000000000010000
1111110000010000
0000000001100100
1110010011010000
0000000000010010
1110001100000001
0000000000010000
1111110000010000
0000000000010001
1111000010001000
0000000000010000
1111110111001000
0000000000000100
1110101010000111
0000000000010010
1110101010000111
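To tie the stages together, here is a compact two-pass assembler sketch in Python. It skips building an explicit syntax tree (the one-to-one mapping makes that optional) and relies on assumptions not spelled out on these slides: Hack’s predefined symbols (SP, LCL, ARG, THIS, THAT, R0–R15, SCREEN, KBD) and allocation of new variables from RAM[16] upwards, as in the Nand2Tetris specification. `assemble`, `clean` and the table names are our own. Run on the example program, it prints the same 20 words shown above.

```python
# Hypothetical two-pass Hack assembler sketch (not the official N2T tool).
# COMP maps each computation to its 7 bits: a, then c1..c6.
COMP = {
    "0": "0101010", "1": "0111111", "-1": "0111010", "D": "0001100",
    "A": "0110000", "!D": "0001101", "!A": "0110001", "-D": "0001111",
    "-A": "0110011", "D+1": "0011111", "A+1": "0110111", "D-1": "0001110",
    "A-1": "0110010", "D+A": "0000010", "D-A": "0010011", "A-D": "0000111",
    "D&A": "0000000", "D|A": "0010101",
    "M": "1110000", "!M": "1110001", "-M": "1110011", "M+1": "1110111",
    "M-1": "1110010", "D+M": "1000010", "D-M": "1010011", "M-D": "1000111",
    "D&M": "1000000", "D|M": "1010101",
}
JUMP = {"": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}
# Hack's predefined symbols: SP..THAT, R0..R15, SCREEN and KBD.
PREDEFINED = {"SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
              "SCREEN": 16384, "KBD": 24576}
PREDEFINED.update({f"R{i}": i for i in range(16)})


def clean(line: str) -> str:
    """Remove a trailing // comment and all whitespace."""
    return "".join(line.split("//", 1)[0].split())


def assemble(lines: list[str]) -> list[str]:
    """Translate Hack assembly lines into 16-character binary strings."""
    code = [c for c in map(clean, lines) if c]
    symbols = dict(PREDEFINED)

    # Pass 1: record the address each label refers to (labels emit no code).
    address = 0
    for c in code:
        if c.startswith("("):
            symbols[c[1:-1]] = address
        else:
            address += 1

    # Pass 2: generate exactly one word per A- or C-instruction.
    next_var = 16  # new variables are allocated from RAM[16] upwards
    words = []
    for c in code:
        if c.startswith("("):
            continue
        if c.startswith("@"):
            operand = c[1:]
            if operand.isdigit():
                value = int(operand)
            else:
                if operand not in symbols:
                    symbols[operand] = next_var
                    next_var += 1
                value = symbols[operand]
            if value > 0x7FFF:
                raise ValueError(f"value does not fit in 15 bits: {c}")
            words.append(format(value, "016b"))
        else:
            dest, _, rest = c.rpartition("=")    # dest is "" if there is no '='
            comp, _, jump = rest.partition(";")  # jump is "" if there is no ';'
            d_bits = "".join("1" if r in dest else "0" for r in "ADM")
            # An unknown comp or jump mnemonic raises KeyError here.
            words.append("111" + COMP[comp] + d_bits + JUMP[jump])
    return words


SOURCE = ("@i M=1 @sum M=0 (LOOP) @i D=M @100 D=D-A @END D;JGT "
          "@i D=M @sum M=D+M @i M=M+1 @LOOP 0;JMP (END) @END 0;JMP").split()

for word in assemble(SOURCE):
    print(word)
```

A production assembler would also report the offending line number on errors rather than letting a `KeyError` escape.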

Page 13

ASSEMBLER SYNTAX

• Assembly languages almost always use a rigid syntax

• One instruction per line…

• One-to-one mapping between assembly instruction and generated machine code

• Makes writing the parser much simpler…

• Support for labels, which complicates things slightly

In fact, you could get away without a syntax tree for an assembler, since there is a one-to-one mapping between each assembly instruction and its machine-code bit pattern. It also makes two-pass assembly simpler, as we’ll see later.

Page 14

HACK ASSEMBLER SYNTAX

• Hack assembler syntax is simple

• Each line can contain either:

• An instruction

• A-instruction

• C-instruction

• A label

Two different instruction types. A label just marks a particular point in the program.
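Since each line is one of these three kinds, the parser only needs the first character (and, for labels, the last) to classify it. A minimal Python sketch; `line_type` and its return strings are hypothetical names, and the line is assumed to be already cleaned and non-blank:

```python
def line_type(line: str) -> str:
    """Classify a cleaned, non-blank line of Hack assembly.

    Returns "A", "LABEL" or "C" (these names are our own choice).
    """
    if line.startswith("@"):
        return "A"  # e.g. @100, @sum
    if line.startswith("(") and line.endswith(")"):
        return "LABEL"  # e.g. (LOOP)
    return "C"  # anything else is assumed to be a C-instruction


for src in ["@sum", "(LOOP)", "D;JGT"]:
    print(src, line_type(src))
```

Treating every other line as a C-instruction keeps the classifier trivial; malformed lines are then caught later, when the C-instruction fields fail to parse.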

Page 15

ASSEMBLER OPERATION

• The basic operation of the assembler is then straightforward

• While not end of file

• Read a line from file

• Determine type of line (Parser)

• Could be Instruction or a Label

• If an Instruction, generate correct bit-pattern for instruction (Code Generate)

• If a label, note position of label in generated output

And repeat. We’ll ignore comments here; they are simply skipped in the input. The last point means we need to know the address at which each instruction is generated.
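Noting each label’s position only requires counting instructions as we go, because labels themselves generate no code. Here is a minimal Python sketch of that bookkeeping done as a separate first pass (the two-pass approach mentioned on the syntax slide); `label_addresses` is a hypothetical name, and the input lines are assumed to be already stripped of whitespace and comments.

```python
def label_addresses(lines: list[str]) -> dict[str, int]:
    """First pass: map each label to the address of the next instruction."""
    table: dict[str, int] = {}
    address = 0  # address the next generated instruction will occupy
    for line in lines:
        if not line:
            continue  # blank lines generate nothing
        if line.startswith("(") and line.endswith(")"):
            table[line[1:-1]] = address  # labels emit no code
        else:
            address += 1  # every A- or C-instruction is exactly one word
    return table


example = ["@i", "M=1", "(LOOP)", "@LOOP", "0;JMP", "(END)", "@END", "0;JMP"]
print(label_addresses(example))  # {'LOOP': 2, 'END': 4}
```

Doing this first means a forward reference such as `@END` can be resolved even though `(END)` appears later in the program.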

Page 16

PARSING

• Going to use a sample assembly file as an example

• See how we would parse each line

• We will assume that whitespace has already been stripped from the line

So the first character of the buffer will be the first character of the instruction
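Stripping whitespace (and comments) before parsing can be as simple as the following Python sketch. It assumes Hack’s `//` end-of-line comment syntax; `clean` is a hypothetical helper name.

```python
def clean(line: str) -> str:
    """Drop a trailing // comment, then remove every whitespace character."""
    code = line.split("//", 1)[0]  # keep only the text before any comment
    return "".join(code.split())   # str.split() with no args splits on any whitespace


print(repr(clean("   D ; JGT   // if i > 100 goto END")))  # prints 'D;JGT'
```

A line that was blank or contained only a comment comes back as an empty string, which the main loop can simply skip.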

Page 17

(The example program from Page 8.)

We’ll use this program as an example and follow through how we convert each type of instruction as we meet it. The first thing we have is an A-instruction.

Page 18

PARSING A-INSTRUCTION

• In assembly language, all A-instructions start with an ‘@’ symbol

• So if a line starts with an ‘@’, it has to be an A-instruction

• A-instructions load the A register with a value

• The value is the second half of the instruction

• The assembler lets it be any of:

• A literal value

• The address of a label

• The name of a variable

In the case of a label or variable name, we need to calculate the address from the name. We’ll look at that later…

Page 19

PARSING A-INSTRUCTION

• Easy to tell if it’s a value or a label

• Values are a series of digits

• If the first character after the ‘@’ is a digit, then it must be a value

• This is why most programming languages don’t let you start a label with a digit

• Otherwise it would be ambiguous whether the digits were a literal value or part of a label, without parsing the whole label

The aim is to make the programming language easy to understand.
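The first-character rule translates directly into code. A minimal Python sketch; `a_value` is a hypothetical name, and resolving a returned symbol to an address is left to a later step:

```python
from typing import Union


def a_value(line: str) -> Union[int, str]:
    """Return the operand of an A-instruction such as '@100' or '@sum'.

    A literal (first character is a digit) is returned as an int; anything
    else is returned as a symbol name to be looked up later.
    """
    operand = line[1:]  # everything after the '@'
    if operand[:1].isdigit():
        return int(operand)  # raises ValueError for e.g. '@1abc'
    return operand


print(a_value("@100"), a_value("@END"))
```

Because a symbol may not start with a digit, a single-character check is enough to decide which branch to take.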

Page 20

CODE GENERATING A-INSTRUCTION

• Can extract the value (in this case 100) from the line and convert it to an integer

• Then need to generate the correct bit-pattern for an A-instruction

Page 21

A-INSTRUCTION

• The A-instruction is used to set the A-register to a 15-bit value

• Assembler syntax: @value

• Binary: 0vvv vvvv vvvv vvvv

• So @5 loads A with the value 5

• Binary: 0000 0000 0000 0101

where the 15 v’s stand for the 15 bits of the binary value

Page 22

CODE GENERATING A-INSTRUCTION

• Need to make sure the value can fit in 15 bits

• Since this is all we have space to encode in the instruction

• If it can’t, then we have an error: we need to flag it and stop assembling

• Next step is to produce the correct bit-pattern for the instruction

• Most significant bit must be zero to signify that it is an A-instruction

• Rest of the bits (0–14) are just the binary number for the value

15 bits let us store every non-negative number that fits in a 16-bit two’s-complement register (0–32767); we’d need to take a different approach to load a negative number.
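The range check and bit pattern described above fit in a few lines. A minimal Python sketch; `encode_a` is a hypothetical name, and it returns the word as a 16-character string of ‘0’ and ‘1’ (the textual form used in N2T `.hack` files):

```python
def encode_a(value: int) -> str:
    """Encode '@value' as a 16-character binary string."""
    if not 0 <= value <= 0x7FFF:  # only 15 bits are available (0..32767)
        raise ValueError(f"{value} does not fit in 15 bits")
    # format() pads to 16 digits, so bit 15 (the A-instruction flag) is 0.
    return format(value, "016b")


print(encode_a(100))  # prints 0000000001100100
```

Rejecting out-of-range values up front guarantees the most significant bit stays zero, so the word can never be mistaken for a C-instruction.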

Page 23

HACK CPU INTERNALS

It’s effectively the opposite of what happens in the CPU. The assembler produces the bit patterns; the instruction decoder looks at the bit pattern to work out which parts of the CPU to turn on (or off). Demo how this works on the screen. Demo how to write an assembler.
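To see the opposite direction, here is a hypothetical Python sketch of the field extraction the instruction decoder performs on a 16-bit word. The real decoder does this with wiring, not code, and `decode` and its dictionary keys are our own names:

```python
def decode(word: int) -> dict:
    """Split a 16-bit Hack instruction into the fields the decoder uses."""
    if (word & 0x8000) == 0:  # bit 15 clear: A-instruction
        return {"type": "A", "value": word & 0x7FFF}
    return {
        "type": "C",
        "a": (word >> 12) & 0b1,         # 0 selects A, 1 selects M
        "comp": (word >> 6) & 0b111111,  # c1..c6, wired to the ALU
        "dest": (word >> 3) & 0b111,     # d1 d2 d3 (A, D, M)
        "jump": word & 0b111,            # j1 j2 j3
    }


print(decode(0b1110001100000001))  # the word for D;JGT
```

Each shift-and-mask corresponds to a group of wires in the CPU taken straight from the instruction register.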

Page 24

PARSING C-INSTRUCTION

• Next instruction is a C-instruction

• The format of these can vary immensely

• Makes it trickier to write a parser for them

• On the other hand, we can easily tell if a line is a label or an A-instruction

• So we’ll assume for this implementation that any other line is a C-instruction

• Once we know it is a C-instruction we can start to break it down

The first two are relatively straightforward and similar, but some of the others are radically different. “Any other line” means any other non-blank line.

Page 25

C-INSTRUCTION

• Does everything else…

• Assembler syntax: dest=comp;jump

• Either dest field or jump field can be omitted

• comp is some computation, specified by the c-bits (c1–c6) below…

• Binary: 111a c1c2c3c4 c5c6d1d2 d3j1j2j3

• a switches one side of the computation between the A register (when 0) and M, i.e. RAM[A] (when 1)

In our example the jump is omitted (so there is no semicolon). The other side of the computation is always D.
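Given the bit layout above, building a C-instruction word comes down to shifting each field into place. A minimal Python sketch; `pack_c` is a hypothetical helper that takes each field as an integer:

```python
def pack_c(a: int, comp: int, dest: int, jump: int) -> int:
    """Pack C-instruction fields into one 16-bit word: 111a c1-c6 d1-d3 j1-j3."""
    if a not in (0, 1) or not (0 <= comp < 64 and 0 <= dest < 8 and 0 <= jump < 8):
        raise ValueError("field out of range")
    # Bits 15-13 are always 1 for a C-instruction.
    return (0b111 << 13) | (a << 12) | (comp << 6) | (dest << 3) | jump


# D=D-A: a=0, comp=010011 (D-A), dest=010 (D), jump=000 (no jump)
print(format(pack_c(0, 0b010011, 0b010, 0b000), "016b"))  # prints 1110010011010000
```

Keeping the packing separate from the table lookups means each field can be tested on its own.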

Page 26

PARSING C-INSTRUCTION

• C-instructions are split around the ‘;’

• Left-hand side contains the ALU operation to perform

• Including optionally updating a value stored in a register/memory

• Right-hand side specifies whether to jump or not

• Right-hand side is optional

• The ‘;’ can only be omitted if the jump isn’t present

Page 27

PARSING C-INSTRUCTION

• Can effectively split the parsing in two around the ‘;’

• Parse right-hand side to work out jump

• Just string comparison (to find out the correct value)

• Parse left-hand side to work out the ALU operation and register/memory updates

• Look for ‘=’ in the left-hand side

• If found, parse the part before the ‘=’ to find what to update
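Splitting first around ‘;’ and then around ‘=’ can be sketched in Python with `str.partition`, which returns empty strings for a missing separator. `split_c` is a hypothetical name; the line is assumed to be already cleaned:

```python
def split_c(line: str) -> tuple[str, str, str]:
    """Split a C-instruction into (dest, comp, jump); missing parts are ''."""
    left, _, jump = line.partition(";")   # right of ';' is the jump
    dest, eq, comp = left.partition("=")  # left of '=' is the destination
    if not eq:                            # no '=', so there is no destination
        dest, comp = "", left
    return dest, comp, jump


for src in ["D=D-A", "D;JGT", "AM=M-1", "D=D+1;JLE"]:
    print(src, split_c(src))
```

Each of the three strings can then be looked up in its own table, matching the group-by-group approach taken for code generation.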

Page 28

C-INSTRUCTION: COMPUTATION

comp (a=0)   c1 c2 c3 c4 c5 c6   comp (a=1)
0            1  0  1  0  1  0
1            1  1  1  1  1  1
-1           1  1  1  0  1  0
D            0  0  1  1  0  0
A            1  1  0  0  0  0    M
!D           0  0  1  1  0  1
!A           1  1  0  0  0  1    !M
-D           0  0  1  1  1  1
-A           1  1  0  0  1  1    -M
D+1          0  1  1  1  1  1
A+1          1  1  0  1  1  1    M+1
D-1          0  0  1  1  1  0
A-1          1  1  0  0  1  0    M-1
D+A          0  0  0  0  1  0    D+M
D-A          0  1  0  0  1  1    D-M
A-D          0  0  0  1  1  1    M-D
D&A          0  0  0  0  0  0    D&M
D|A          0  1  0  1  0  1    D|M

As used by the ALU: the c-bits select which operation’s result is placed into the destination. These are the same bit patterns that control the ALU we designed earlier, so we can connect these bits straight up to the ALU… and its output to whatever destination we want.
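The table above can be encoded directly as a lookup. Here is a minimal Python sketch that uses a shortcut of our own (not something the slides prescribe): since every M-form shares the c-bits of its A-form, we store only the a=0 column and set the a-bit whenever M appears. `C_BITS` and `comp_bits` are hypothetical names.

```python
# c1..c6 patterns from the table above, keyed by the a=0 mnemonic.
C_BITS = {
    "0": "101010", "1": "111111", "-1": "111010", "D": "001100",
    "A": "110000", "!D": "001101", "!A": "110001", "-D": "001111",
    "-A": "110011", "D+1": "011111", "A+1": "110111", "D-1": "001110",
    "A-1": "110010", "D+A": "000010", "D-A": "010011", "A-D": "000111",
    "D&A": "000000", "D|A": "010101",
}


def comp_bits(comp: str) -> str:
    """Return the 7 bits 'a c1..c6' for a computation such as 'D+M'."""
    a = "1" if "M" in comp else "0"  # any M operand sets the a-bit
    key = comp.replace("M", "A")     # M-forms reuse the A-form c-bits
    if key not in C_BITS:
        raise ValueError(f"unknown computation: {comp!r}")
    return a + C_BITS[key]


print(comp_bits("D+M"), comp_bits("D-A"))
```

Spellings that are not in the table, such as `M+D`, are rejected rather than silently reordered.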

Page 29

C-INSTRUCTION: DESTINATION

d1 d2 d3   mnemonic   destination
0  0  0    null       not stored
0  0  1    M          RAM[A] updated
0  1  0    D          D register updated
0  1  1    MD         RAM[A] and D updated
1  0  0    A          A register updated
1  0  1    AM         A and RAM[A] updated
1  1  0    AD         A and D registers updated
1  1  1    AMD        A, D and RAM[A] updated

Each destination bit describes whether one of the three possible destinations is updated (e.g. memory is updated whenever d3 is set).
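Because each d-bit stands for one register, the destination can be encoded by checking which letters appear, rather than with an eight-row lookup. A small Python sketch; `dest_bits` is a hypothetical name:

```python
def dest_bits(dest: str) -> str:
    """Return d1 d2 d3 for a destination such as 'AM' (d1=A, d2=D, d3=M)."""
    if any(ch not in "ADM" for ch in dest) or len(set(dest)) != len(dest):
        raise ValueError(f"bad destination: {dest!r}")
    return "".join("1" if reg in dest else "0" for reg in "ADM")


for d in ["", "M", "MD", "AMD"]:
    print(repr(d), dest_bits(d))
```

This accepts the letters in any order (e.g. `DM` as well as `MD`), which may be more permissive than the official N2T assembler; a strict version would instead look up the eight mnemonics in the table above.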

Page 30

C-INSTRUCTION: JUMP

j1 j2 j3   mnemonic   effect
0  0  0    null       No jump
0  0  1    JGT        If out > 0 then jump
0  1  0    JEQ        If out = 0 then jump
0  1  1    JGE        If out >= 0 then jump
1  0  0    JLT        If out < 0 then jump
1  0  1    JNE        If out != 0 then jump
1  1  0    JLE        If out <= 0 then jump
1  1  1    JMP        Always jump

We can choose between these eight conditions.
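The jump field is a straight string comparison, so a dictionary lookup is enough. A minimal Python sketch; `JUMP_BITS` and `jump_bits` are hypothetical names, and an omitted jump is represented by the empty string:

```python
JUMP_BITS = {
    "": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
    "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111",
}


def jump_bits(jump: str) -> str:
    """Return j1 j2 j3 for a jump mnemonic ('' means no jump)."""
    try:
        return JUMP_BITS[jump]
    except KeyError:
        raise ValueError(f"unknown jump: {jump!r}") from None


print(jump_bits("JGT"), jump_bits(""))
```

The bits also have a per-bit reading: j1 means jump if out < 0, j2 if out = 0, and j3 if out > 0, so for example JGE (011) is JEQ and JGT combined.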

Page 31

CODE GENERATION C-INSTRUCTION

• Again, just a matter of setting the correct bits based on the input

• This time the 16 bits of the instruction are split into groups

• Need to consider each of the groups separately

• Start with the simple ones

• Jump bits

• Destination bits

• Parsing is more complex for the ALU control bits

Exactly the same as when dealing with the CPU implementation

Page 32

HACK CPU INTERNALS

