STEVEN R. BAGLEY THE ASSEMBLER


INTRODUCTION

• Looking at how to build a computer from scratch

• Started with the NAND gate and worked up…

• Until we can build a CPU

• Reached the divide between hardware and software

• Today, looking at how the Assembler works

Or Machine Language as N2T calls it

[Figure: the layers of abstraction, split into HARDWARE and SOFTWARE: NAND; AND, OR, NOT; MUX, ADDER, DMUX, …; MUX16, ADD16, OR16, …; ALU; D FLIP-FLOP; BIT; REGISTER; CPU; MACHINE CODE; C; OS; ALGORITHMS; THOUGHT]

Start off with an abstract idea of what we want the program to do, convert that into algorithms, then into C, and then directly into machine code the hardware can execute. At each step it’s getting less abstract and more concrete. Assembly sits at the human side of machine code.

[Figure: the same layer diagram, now with an ASSEMBLY layer added]


THE ASSEMBLER

• Assembly language is a symbolic representation of machine code

• In a human-readable form

• An Assembler is a tool that takes this symbolic representation

• And converts it into the binary bit patterns needed by the CPU

• Can also provide help during the conversion

• Changing the syntax of the program

• Not the semantics

Demo with the N2T Assembler running on a real piece of assembly code. E.g. allowing you to use labels instead of needing to compute which address to use. Syntax: how it’s expressed. Semantics: its meaning.

THE ASSEMBLER

• Assembler has many of the same stages as a compiler

• But generally in a much simplified form

• Understanding how an assembler works gives us an insight into what the compiler must do

• Also helps us to understand how the bits in an instruction relate to its function

Which might help us understand what the CPU is doing on the other side

Typical structure of an assembler or compiler. The parser is usually split into two phases. Firstly, tokenise: break the ASCII characters up into tokens. Secondly, use those tokens to build the syntax tree. Then from the syntax tree we can generate the machine code for each instruction, i.e. the correct bit patterns.

[Figure: the assembly source passes through the PARSER to produce a SYNTAX TREE, which CODE GENERATE turns into machine code]

Assembly source:

    @i
    M=1
    @sum
    M=0
(LOOP)
    @i
    D=M
    @100
    D=D-A
    @END
    D;JGT
    @i
    D=M
    @sum
    M=D+M
    @i
    M=M+1
    @LOOP
    0;JMP
(END)
    @END
    0;JMP

Generated machine code:

    0000000000010000
    1110111111001000
    0000000000010001
    1110101010001000
    0000000000010000
    1111110000010000
    0000000001100100
    1110010011010000
    0000000000010010
    1110001100000001
    0000000000010000
    1111110000010000
    0000000000010001
    1111000010001000
    0000000000010000
    1111110111001000
    0000000000000100
    1110101010000111
    0000000000010010
    1110101010000111


ASSEMBLER SYNTAX

• Assembly languages almost always use a rigid syntax

• One instruction per line…

• One to one mapping between assembly instruction and generated machine code

• Makes writing the parser much simpler…

• Support for labels, which complicates things slightly

In fact you could get away without a syntax tree for an assembler, since there is a one-to-one mapping between assembler instruction and machine code pattern. It also makes two-pass assembly simpler, as we’ll see later.

HACK ASSEMBLER SYNTAX

• Hack assembler syntax is simple

• Each line can contain either:

• An instruction

• A-instruction

• C-instruction

• A label

Two different instruction types. A label just labels a particular point in the program.

ASSEMBLER OPERATION

• Basic operation of the Assembler, then, is straightforward

• While not end of file

• Read a line from file

• Determine type of line (Parser)

• Could be Instruction or a Label

• If an Instruction, generate correct bit-pattern for instruction (Code Generate)

• If a label, note position of label in generated output

And repeat. We’ll ignore comments here, but they are simply skipped in the input. The last point means we need to know the address at which each instruction is generated.
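The loop above can be sketched in a few lines of Python. This is only an illustration under some assumptions: comments are taken to start with `//` (the usual Hack convention, not stated on the slide), and the function name `scan_lines` is hypothetical. It classifies each line and notes where each label lands in the generated output, without generating any bit patterns yet.

```python
def scan_lines(lines):
    """Classify each line and record the output address of every label."""
    labels = {}        # label name -> address of the next generated instruction
    instructions = []  # (kind, text) pairs in output order
    for raw in lines:
        # Assumption: '//' starts a comment; drop it and surrounding white space.
        line = raw.split("//")[0].strip()
        if not line:
            continue  # blank or comment-only line
        if line.startswith("(") and line.endswith(")"):
            # A label generates no code; it names the next instruction's address.
            labels[line[1:-1]] = len(instructions)
        elif line.startswith("@"):
            instructions.append(("A", line))
        else:
            # Any other non-blank line is assumed to be a C-instruction.
            instructions.append(("C", line))
    return labels, instructions


program = ["@i", "M=1", "(LOOP)", "@LOOP", "0;JMP  // loop forever"]
labels, instructions = scan_lines(program)
print(labels)        # {'LOOP': 2}
print(instructions)  # [('A', '@i'), ('C', 'M=1'), ('A', '@LOOP'), ('C', '0;JMP')]
```

Counting only the instructions (not the labels) is what gives each label the address of the instruction that follows it.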

PARSING

• Going to use a sample assembly file as an example

• See how we would parse each line

• Will assume that white space has already been stripped from the line

First character of buffer will be the first character of the instruction


Use this program as an example. Follow through how we convert the different types of instruction as we see them. The first thing we have is an A-instruction.

PARSING A-INSTRUCTION

• Assembly language for every A-instruction starts with an ‘@’ symbol

• So if it starts with an ‘@’, it has to be an A-instruction

• A-instructions load the A register with a value

• The value is the second half of the instruction

• Assembler lets it be one of

• A literal value

• The address of a label

• The name of a variable

    @i
    M=1
    @sum
    M=0
(LOOP)
    @i
    D=M
    @100
    D=D-A
    @END
    D;JGT
    @i
    D=M
    @sum
    ...

In the case of a label or variable name, we need to calculate the address from the name. Look at that later…

PARSING A-INSTRUCTION

• Easy to tell if it’s a value or label

• Values are a series of digits

• If the first character after the ‘@’ is a digit, then it must be a value

• This is why most programming languages don’t let you start a label with a digit

• Would be ambiguous whether it was a literal value or part of a label without parsing the whole label


The aim is to make the programming language easy to understand.
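The first-character test can be written directly. The sketch below is illustrative only; the function name `classify_a_operand` and the `("value", …)` / `("symbol", …)` return shape are hypothetical choices, and resolving a symbol to an address is left out because that is covered later.

```python
def classify_a_operand(line):
    """Decide whether an A-instruction carries a literal value or a symbol."""
    if not line.startswith("@"):
        raise ValueError("not an A-instruction: " + line)
    operand = line[1:]
    if not operand:
        raise ValueError("missing operand after '@'")
    if operand[0].isdigit():
        # First character is a digit, so the whole operand must be a number.
        # int() raises ValueError for something like '@1abc'.
        return ("value", int(operand))
    # Otherwise it is a label or variable name, to be resolved to an address.
    return ("symbol", operand)


print(classify_a_operand("@100"))  # ('value', 100)
print(classify_a_operand("@END"))  # ('symbol', 'END')
```

Only one character has to be inspected to pick the branch, which is the point of forbidding names that start with a digit.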

CODE GENERATING A-INSTRUCTION

• Can extract the value (in this case 100) from the line and convert it to an integer

• Then need to generate the correct bit-pattern for an A-instruction



A-INSTRUCTION

• The A-instruction is used to set the A-register to a 15-bit value

• Assembler syntax: @value

• Binary: 0vvv vvvv vvvv vvvv

• So @5 loads A with the value 5

Binary: 0000 0000 0000 0101

where the 15 v’s are the 15 bits of the binary value

CODE GENERATING A-INSTRUCTION

• Need to make sure the value can fit in 15 bits

• Since this is all we have space to encode in the instruction

• If it can’t, then we have an error: need to flag it and stop assembling

• Next step is to produce the correct bit-pattern for the instruction

• Most significant bit must be zero to signify that it is an A-instruction

• Rest of the bits (0–14) are just the binary number for the value

15 bits lets us store every non-negative number that fits in a signed 16-bit register (0 to 32767). We’d need to take a different approach to store a negative number.
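As a small sketch of those two steps (range check, then bit-pattern), assuming the value has already been converted to an integer. The name `encode_a` is hypothetical, and raising `ValueError` stands in for "flag it and stop assembling".

```python
def encode_a(value):
    """Return the 16-bit pattern for an A-instruction as a string of 0s and 1s."""
    # Only 15 bits are available, so the value must lie in 0..32767.
    if not 0 <= value <= 0x7FFF:
        raise ValueError(f"{value} does not fit in 15 bits")
    # Zero-padding to 16 binary digits leaves the most significant bit as 0,
    # which is what marks the word as an A-instruction.
    return format(value, "016b")


print(encode_a(5))    # 0000000000000101
print(encode_a(100))  # 0000000001100100
```

The second result matches the word generated for `@100` in the example program.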

HACK CPU INTERNALS

It’s effectively the opposite of what happens in the CPU. The assembler produces the bit patterns. The instruction decoder looks at the bit pattern to work out which bits of the CPU to turn on (or off). Demo how this works on the screen. Demo how to write an assembler.

PARSING C-INSTRUCTION

• Next instruction is a C-Instruction

• Format of these can vary immensely

• Makes it trickier to write a parser for it

• On the other hand, we can easily tell if it is a label or A-Instruction

• So we’ll assume for this implementation that any other line is a C-instruction

• Once we know it is a C-instruction we can start to break it down


First two are relatively straightforward and similar, but some of the others are radically different. Any other non-blank line is treated as a C-instruction.

C INSTRUCTION

• Does everything else…

• Assembler syntax: dest=comp;jump

• Either dest field or jump field can be omitted

• comp is some computation, specified by the cx bits below…

• Binary: 111a c1c2c3c4 c5c6d1d2 d3j1j2j3

• a switches one side of the computation between A register (when 0) and M (when 1)

In our example the jump is omitted (so no semicolon). The other side of the computation is always D.


• C-instructions are split around the ‘;’

• Left-hand side contains the ALU operation to perform

• Including optionally updating a value stored in a register/memory

• Right-hand side specifies whether to jump or not

• Right-hand side is optional

• The ‘;’ can be omitted only if the jump isn’t present




• Can effectively split the parsing in two around the ‘;’

• Parse right-hand side to work out jump

• Just string comparison (to find out the correct value)

• Parse left-hand side to work out the ALU operation and register/memory updates

• Look for ‘=’ in left-hand side

• If found, parse left-hand side of ‘=’ to find what to update
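The split described above maps neatly onto Python’s `str.partition`. This is a sketch, not the lecture’s implementation: the name `split_c` is hypothetical, and the fields are stripped again so that a line written as `D; JGT` is still accepted.

```python
def split_c(line):
    """Split a C-instruction into its (dest, comp, jump) fields."""
    # Split around ';' first: the right-hand side is the optional jump.
    left, semicolon, jump = line.partition(";")
    jump = jump.strip()
    if semicolon and not jump:
        raise ValueError("';' present but no jump given: " + line)
    # Then look for '=' in the left-hand side: the part before it is the destination.
    dest, equals, comp = left.partition("=")
    if not equals:
        # No '=' means nothing is stored, so the whole left side is the computation.
        dest, comp = "", left
    return dest.strip(), comp.strip(), jump


print(split_c("D=D-A"))   # ('D', 'D-A', '')
print(split_c("D;JGT"))   # ('', 'D', 'JGT')
print(split_c("0; JMP"))  # ('', '0', 'JMP')
```

Each field can then be converted to bits on its own, which is what makes the code generation step a set of table lookups.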



C INSTRUCTION: COMPUTATION

dest = one of the computations below

when a = 0   c1 c2 c3 c4 c5 c6   when a = 1
0            1  0  1  0  1  0    0
1            1  1  1  1  1  1    1
-1           1  1  1  0  1  0    -1
D            0  0  1  1  0  0    D
A            1  1  0  0  0  0    M
!D           0  0  1  1  0  1    !D
!A           1  1  0  0  0  1    !M
-D           0  0  1  1  1  1    -D
-A           1  1  0  0  1  1    -M
D+1          0  1  1  1  1  1    D+1
A+1          1  1  0  1  1  1    M+1
D-1          0  0  1  1  1  0    D-1
A-1          1  1  0  0  1  0    M-1
D+A          0  0  0  0  1  0    D+M
D-A          0  1  0  0  1  1    D-M
A-D          0  0  0  1  1  1    M-D
D&A          0  0  0  0  0  0    D&M
D|A          0  1  0  1  0  1    D|M

As used by the ALU. The c-bits select what operation is placed into the destination. These are the same bit patterns that control the ALU we designed earlier. Can connect these bits up to the ALU… and the output to whatever destination we want.
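One way to turn the table into code is a dictionary keyed on the a = 0 spelling, with the a bit worked out from whether the mnemonic mentions M. This is a sketch under that assumption; the names `COMP_BITS` and `comp_bits` are hypothetical, and only the spellings listed in the table are accepted (so `A+D` would be rejected with a `KeyError`).

```python
# c1..c6 for each computation, keyed by the 'a = 0' spelling from the table.
COMP_BITS = {
    "0": "101010", "1": "111111", "-1": "111010",
    "D": "001100", "A": "110000",
    "!D": "001101", "!A": "110001",
    "-D": "001111", "-A": "110011",
    "D+1": "011111", "A+1": "110111",
    "D-1": "001110", "A-1": "110010",
    "D+A": "000010", "D-A": "010011", "A-D": "000111",
    "D&A": "000000", "D|A": "010101",
}


def comp_bits(comp):
    """Return the 7 bits 'a c1 c2 c3 c4 c5 c6' for a computation mnemonic."""
    # a = 1 selects M instead of A as the non-D side of the computation.
    a = "1" if "M" in comp else "0"
    # The c-bits are identical for the A and M forms, so look up the A spelling.
    return a + COMP_BITS[comp.replace("M", "A")]


print(comp_bits("D-A"))  # 0010011
print(comp_bits("D+M"))  # 1000010
```

Storing only one column keeps the table half the size, at the cost of not catching a mnemonic that mixes A and M.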

C INSTRUCTION: DESTINATION

d1 d2 d3   destination   effect
0  0  0    null          not stored
0  0  1    M             RAM[A] updated
0  1  0    D             D register updated
0  1  1    MD            RAM[A] and D updated
1  0  0    A             A register updated
1  0  1    AM            A and RAM[A]
1  1  0    AD            A and D registers
1  1  1    AMD           A, D and RAM[A] updated

Each destination bit describes whether one of the three possible destinations is updated (e.g. memory is updated whenever d3 is set).
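Because each bit stands for one destination, the three bits can be computed letter by letter instead of looked up. A minimal sketch, with the hypothetical name `dest_bits`; an empty string stands for the null destination.

```python
def dest_bits(dest):
    """Return 'd1 d2 d3' for a destination such as '', 'M', 'MD' or 'AMD'."""
    for letter in dest:
        if letter not in ("A", "D", "M"):
            raise ValueError("unknown destination: " + dest)
    # d1 is set when A is updated, d2 when D is updated, d3 when RAM[A] (M) is updated.
    return "".join("1" if register in dest else "0" for register in "ADM")


print(dest_bits(""))    # 000
print(dest_bits("M"))   # 001
print(dest_bits("MD"))  # 011
```

This version accepts the letters in any order, which is slightly more lenient than the eight spellings in the table.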

C INSTRUCTION: JUMP

j1 j2 j3   mnemonic   effect
0  0  0    null       No jump
0  0  1    JGT        If out > 0 then jump
0  1  0    JEQ        If out = 0 then jump
0  1  1    JGE        If out >= 0 then jump
1  0  0    JLT        If out < 0 then jump
1  0  1    JNE        If out != 0 then jump
1  1  0    JLE        If out <= 0 then jump
1  1  1    JMP        Always jump

We can choose between
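On the assembler side the jump field is just the string comparison mentioned earlier, which a dictionary does for us. The second function is an extra illustration of how the table reads bit by bit (j1 for out < 0, j2 for out = 0, j3 for out > 0). Both names, `JUMP_BITS` and `should_jump`, are hypothetical.

```python
# j1 j2 j3 for each jump mnemonic; the empty string means no jump was given.
JUMP_BITS = {
    "": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
    "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111",
}


def should_jump(bits, out):
    """Apply the jump table: would these j-bits jump for this ALU output?"""
    j1, j2, j3 = (bit == "1" for bit in bits)
    # Each bit enables the jump for one sign of the output.
    return (j1 and out < 0) or (j2 and out == 0) or (j3 and out > 0)


print(JUMP_BITS["JGT"])                    # 001
print(should_jump(JUMP_BITS["JGE"], 0))    # True
print(should_jump(JUMP_BITS["JGT"], 0))    # False
```

An unknown mnemonic raises `KeyError` from the dictionary lookup, which is one simple way to flag the error.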

CODE GENERATION C-INSTRUCTION

• Again just a matter of setting the correct bits based on the input

• This time the 16 bits of the instruction are split into groups

• Need to consider each of the groups separately

• Start with the simple ones

• Jump bits

• Destination bits

• Parsing more complex for the ALU control bits


Exactly the same when dealing with CPU implementation
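Once each group has been worked out separately, the final step is to place the groups in the order given by the binary layout 111a c1c2c3c4 c5c6d1d2 d3j1j2j3. The sketch below assumes each group is already available as a bit string; the name `encode_c` and its parameters are hypothetical.

```python
def encode_c(comp7, dest3, jump3):
    """Join the a+c bits, d bits and j bits into a 16-bit C-instruction."""
    fields = (("comp", comp7, 7), ("dest", dest3, 3), ("jump", jump3, 3))
    for name, bits, width in fields:
        # Each group must be exactly the right width and contain only 0s and 1s.
        if len(bits) != width or set(bits) - {"0", "1"}:
            raise ValueError(f"bad {name} field: {bits!r}")
    # The three leading 1s mark the word as a C-instruction.
    return "111" + comp7 + dest3 + jump3


# D=D-A: a=0 with c-bits 010011, destination D (010), no jump (000)
print(encode_c("0010011", "010", "000"))  # 1110010011010000
# D;JGT: a=0 with c-bits 001100, no destination (000), JGT (001)
print(encode_c("0001100", "000", "001"))  # 1110001100000001
```

Both results match the words generated for those instructions in the example program.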


Date post:	23-Dec-2021