Top-down Perspective: The Computer Organization and Design Underneath
the Execution of C Programming Language
Mingkai Li, University of Science and Technology of China
Abstract
In Yale Patt's book Introduction to Computing Systems: From Bits and Gates to C and Beyond, the intricacies of the magnificent computing world reveal themselves as a huge, systematically interconnected collection of very simple parts. Although implementations of many modern architectures vary greatly in pursuit of shorter response time or greater throughput (sometimes called bandwidth), the underlying computer organization and design is no more than hardware and software consisting of hierarchical layers using abstraction, with each lower layer hiding details from the level above. The C programming language provides a machine-independent interface to the underlying ISA and hardware, tremendously enhancing a program's expressiveness and readability. In contrast to the bottom-up approach adopted in Yale Patt's book, we take a top-down perspective to uncover the details underneath the execution of a C program
step-by-step. To better illustrate the execution of a program on a specific implementation, and to avoid the unnecessary complexities of modern architectures, we choose an education-oriented implementation called Little Computer 3 (LC3 for short), introduced at the University of Texas at Austin. In this article, we briefly discuss some important ideas and protocols in the domain, from the interpreter/compiler down to the most fundamental digital logic devices. We observe the process as a program written in C is translated, assembled and finally executed by the computer, instruction by instruction, clock cycle by clock cycle, over the data path, and see how combinations of the simplest CMOS circuits have shaped today's fast-changing information technology industry.
1 Introduction
The C programming language was developed in 1972 by Dennis Ritchie at Bell Laboratories. The language was initially developed for writing compilers and operating systems, and it therefore allows the C programmer to manipulate data items at a relatively low level. To better illustrate the computer's actions when executing a high-level programming language, we create an example C program. Although simple, the program shows some of the most important features of the language, allowing us to discuss the implementation of mechanisms such as preprocessing, linking, subroutines, control instructions, data movement instructions and memory-mapped I/O, and thus helping the reader quickly form a rough picture of all the lower layers of abstraction behind a high-level programming language.
At the very beginning, we need to establish an overview of the hierarchy, or the layers of abstraction, of the whole computing system. As shown in Figure 2, the instruction set architecture (abbreviated ISA) plays a vital role as the interface between the hardware and the low-level system software. Computer Organization and Design defines the ISA as "anything the programmer needs to know to make a binary machine work correctly" [1].

Figure 1: An example program calculating the absolute value, written in C

Figure 2: Layers of abstractions of modern computer architecture

Later in this article, we'll discuss in more detail some of its most important components, such as memory organization, the instruction set, addressing modes, privilege and priority. After all, it is the ISA that bridges the gap between a high-level programming language like C and the fundamental movements of electrons that support the whole computing system. In the rest of this article,
we'll start from the top of the layers above, beginning with the translation process of programming languages (both high-level and assembly), spanning the gap between software and hardware with the help of the ISA, and observing how the commands from the software are carried out by the underlying circuits. Although LC3 [2, 3] is quite different from most implementations of computer architecture today, it helps beginners understand the intricacies of computing systems in a more elegant manner.
2 Programming Language
In Programming Language Pragmatics, programming is described as "the art of telling another human being what one wants the computer to do" [4]. A high-level programming language such as C is designed to be machine-independent and human-friendly. The creation of high-level languages means that programmers no longer need to write functionally similar code for different machines over and over, which considerably alleviates their workload. Accompanying these benefits, however, the biggest problem for the machine is that a high-level programming language is ambiguous: it does not define any specific actions over specific memory locations or registers, and the implementations of the same program on different machines may be distinctly different. To execute such a machine-independent programming language, it needs to be translated into machine-dependent assembly language with the help of low-level system software such as a compiler or an interpreter. Compared with high-level programming languages, assembly language is much more machine-friendly. It is little more than a set of human-readable mnemonics with a well-defined correspondence to the 0s and 1s of the instructions; the translation is completed easily by the assembler.
2.1 Translating high-level languages
In the rest of this section, we will introduce two distinct
translation techniques adopted by high-level languages,
and then discuss the technique C uses in more detail.
2.1.1 Interpretation and Compilation
In Compilers: Principles, Techniques, and Tools, the study of compilers is described as "full of beautiful examples where complicated real-world problems are solved by abstracting the essence of the problem mathematically" [5]. How the translation is done depends on the particular high-level language. Some languages like LISP, BASIC, Python and Java adopt a translation technique called interpretation, performed by an interpreter, while other languages such as C, C++, Rust and FORTRAN use another technique called compilation, performed by a compiler.
The interpreter is a virtual machine that executes the program. It reads a single line (or a section, command or subroutine) of the high-level language program and directly carries out its effects on the underlying hardware, repeating this until the end of the program. Interpreted code is more portable across different computing systems, since it is nothing more than input data to the interpreter on each platform. However, with the interpreter as an intermediary, the program takes much longer to execute. The compiler, on the other hand, does not execute the program itself. It analyzes the high-level language program as a whole and generates the corresponding assembly language or even machine language for the particular machine. The high-level language program needs to be compiled only once and can be executed many times afterwards, greatly improving the program's efficiency. These two translation techniques each have pros and cons depending on
the specific application scenario. As the C programming language was initially developed for writing compilers and operating systems, the adoption of compilation guarantees the efficiency and dependability of the product.
2.1.2 The Compilation Process of C
The C compiler transforms the C source program into an output assembly language or machine code file called an executable image. Figure 3 shows an illustration of the overall compilation process of C. As we can see, the C compiler has three interconnected components: the preprocessor, the compiler proper and the linker.
At the beginning of the C compilation process, the preprocessor scans the whole C source file, looking for and acting upon C preprocessor directives. Let's take the C program at the beginning of this article as an example. The preprocessor will scan the whole program, substituting the macros ZERO and NEGONE with 0 and -1, and inserting the contents of stdio.h into the source file at the corresponding line.
After that, the compiler transforms the preprocessed program into object modules through two major phases called analysis and synthesis. The analysis phase parses the program, breaking it into its constituent parts, and the synthesis phase translates these parts, optimizing the code for better performance at the same time. Each of these two phases is typically divided into many subphases, such as parsing, register allocation and instruction scheduling. While the compiler is working, an internal bookkeeping mechanism called the symbol table is created.

Figure 3: Overall compilation process of the C programming language

Again, let's take the C program at the beginning of this article as an example. The symbol table of the program is shown in Table 1. The symbol table keeps each variable's identifier, type, location and scope. The memory allocated for the variables is arranged in the form of a stack, so the location of each variable can be expressed as an offset relative to a certain memory location.
Table 1: Symbol table of the example C program

Identifier   Type   Location (as an offset)   Scope
x            int     0                        main
y            int    -1                        main
The linker takes over after the compiler has generated all the object modules. It is the linker's job to link all the object modules together to form an executable image of the program, completing the compilation process. Depending on the C compiler, the executable image may be written in either assembly language or machine code. In the latter case, the executable image can be directly loaded into memory and executed by the underlying hardware. Otherwise, it needs to be assembled first by a two-pass process.
2.2 The Two-pass Assembly Process
Let's take a look at an example RISC-V assembly language program to get a concrete sense of the process [6]. As shown in Figure 4, an assembly language program mainly consists of opcodes/operands, labels, pseudo-ops (also known as assembler directives) and comments. (For more information about RISC-V assembly language, see references [1] and [6].) The transformation from assembly language to machine code is accomplished by a two-pass process performed by the assembler.

Figure 4: RISC-V assembly language program calculating the greatest common divisor of two positive integers
The first pass creates the symbol table. Similar to the symbol table in the compilation process, the symbol table in the assembly process is simply a correspondence between symbolic names (labels) and their specific memory addresses. In the second pass, the assembler goes through the program again. The symbolic names in the control instructions, such as euclid and finish in our example, are substituted with their specific memory addresses according to the symbol table built earlier. After that, the assembly language instructions are translated into 0s and 1s line by line, finally yielding the machine code executable image. As noted in the previous section, the executable image can be directly loaded into memory and executed by the underlying hardware.
3 Instruction Set Architecture
After we obtain an executable image of the C program, we are ready to see how the actions of the computer are actually directed. The instruction set is the core of the ISA, regarded as the vocabulary of the computer's language. In the rest of this section, we will introduce the concept of the von Neumann model, see how an instruction cycle is carried out, discuss the intricacies of operate, data movement and control instructions, and briefly talk about the implementations of memory-mapped I/O, interrupts, subroutines and the user/system mode.
3.1 The von Neumann Model
The von Neumann model, proposed by John von Neumann
in 1946, has become the foundation of most computing systems today. Figure 4 shows an overall block diagram of the von Neumann model. As we can see, the model consists of five parts: memory, a processing unit, input, output and a control unit.
The control unit exists in every machine that can be called a computer, or, by another name, a universal Turing machine. It can be abstracted as a finite state machine (FSM), keeping track of where we are in the execution of both the program and each instruction. The abbreviations PC and IR stand for the program counter, which stores the address of the next instruction, and the instruction register, which keeps the content of the current instruction, respectively. The computer moves from state to state based on the corresponding parts of the current instruction, directing the data path to take specific actions. The state machine of a modern computer is usually very sophisticated, so we show only part of the LC3 state machine to give you a rough picture (Figure 5).

Figure 4: Overall block diagram of the von Neumann model
The central idea of the von Neumann model is that
the program and data are both stored as sequences of bits
in the computer’s memory, and the program is executed
one instruction at a time under the direction of the control
unit.
Before moving on, let's discuss the system (kernel) and user modes a little. As shown in Figure 6, in modern computers, application programs run on top of the operating system. The memory is usually separated into several parts, with certain parts accessible only by the system software. When a programmer or the standard library wants to execute a certain function provided by the operating system, it invokes a system call; otherwise, the application program is denied access to the privileged memory space or the device register addresses (which will be discussed further under memory-mapped I/O). How the operating system works and how to improve its performance are extremely important questions in computer science; interested readers may consult reference [7].
Figure 5: A state machine of LC3; state transitions are activated by the information in the instruction
Figure 6: An example of system call, showing the
difference between the user mode and kernel mode
3.2 The Instruction Cycle
Instructions are executed under the direction of the control unit in a very systematic, step-by-step manner. The sequence of steps (or phases, in computer science terminology) is called the instruction cycle. There are six main phases in a complete instruction cycle (although many instructions require only some of them): fetch, decode, evaluate address, fetch operands, execute and store result.
In the FETCH phase, the computer obtains the next instruction from the address stored in the PC, loading it into the IR and incrementing the PC at the same time. In the DECODE phase, the computer examines the first several bits (called the opcode) of the instruction, figuring out what is being requested of the underlying microarchitecture. If the instruction requests a load or store, the computer calculates the addresses of the corresponding operands, based on the instruction's addressing mode, in the EVALUATE ADDRESS phase. Then the computer accesses memory, obtaining the needed source operands, in the FETCH OPERANDS phase. In the EXECUTE phase, values destined for registers are generated for an operate instruction, the load or store happens for a data movement instruction, and the PC is redirected for a control instruction. Finally, the result is written in the STORE RESULT phase.
Each of these phases in the instruction cycle may take several clock cycles, depending on the specific implementation. Factors like CPI (clock cycles per instruction), instruction count and clock rate are significant when evaluating a program's performance. For hardware and software's influence upon them, see Figure 7.

Figure 7: Different components' influence upon a program's performance
3.3 The Instruction Set, Memory-mapped IO and Interrupts
An instruction is defined by three parts: its opcode, data type and addressing mode. Roughly, all instructions can be divided into three distinct categories: operate instructions, data movement instructions and control instructions. Some of the main RISC-V instructions are shown in Figure 8.

Figure 8: RISC-V reference card (main part), listing the kinds of instructions and their assembly language expressions
Operate instructions process data, performing either arithmetic or logic operations. The operands of this kind of instruction can be found in only two places: in registers or in the instruction itself (an immediate operand, in computer science terminology). RISC-V supports many operate instructions, such as ADD, ADDI, AND, ANDI, SLL and SRL, performing arithmetic, logic and shift operations.
Data movement instructions move information between the general-purpose registers and either memory or input/output devices (which can also be regarded as a kind of special memory space). Specifically, data movement instructions load data from memory into registers, or store data from registers into memory. The specific memory address is calculated from the address generation bits in the instruction. The calculation rule is determined by the instruction's addressing mode: PC-relative mode, indirect mode, base-plus-offset mode, immediate mode and so on [1, 2]. Different addressing modes exist so that as much of the memory space as possible can be reached, including locations relatively far from the PC.
Data movement instructions are also the workhorse for performing input/output tasks. In most modern computers, device registers are mapped to particular addresses allocated for I/O rather than to normal memory space. The computer controls the data in these memory-mapped device registers with exactly the same data movement instructions to perform input/output tasks. This is usually done in one of two ways: polling or interrupts. The difference between them is that the polling method requires the computer to check the device registers repeatedly while an I/O task is pending, whereas with the interrupt method the computer stops to perform the I/O task only when it detects a signal indicating that an input or output is ready, and then returns to the interrupted task as if nothing had happened.
Control instructions change the sequence of executed instructions, conditionally or unconditionally. They reach this goal by changing the content of the PC in the EXECUTE phase of the instruction cycle; otherwise, the computer would simply execute the instruction at the next address, since the PC is always incremented at the end of the FETCH phase. The condition of a conditional control instruction is checked via the condition codes, which reflect the result of the last instruction that changed a register's value.
4 Introduction to Microarchitecture and
Digital Logic Devices
At the end of this article, we'll briefly introduce some concepts of the underlying hardware. Figure 9 shows the microarchitecture of LC3.
The microarchitecture of the computer is composed of combinational logic circuits and sequential logic circuits. The combinational logic circuits are responsible for logic decisions; some basic components include the encoder, decoder, multiplexer, demultiplexer and full adder. The sequential logic circuits, on the other hand, the foundation of storage structures and finite state machines, are affected both by the current inputs and by the results of past ones; some basic components include latches and flip-flops. In fact, all these digital logic devices are systematic combinations of MOSFETs (metal-oxide-semiconductor field-effect transistors). It is the opening and closing of these magical transistors that creates our magnificent world of 1s and 0s.
5 Conclusion
Let us recall the words of David Patterson from the beginning of this article: computing systems are nothing more than "hardware and software consisting of hierarchical layers using abstraction, with each lower layer hiding details from the level above". It could not feel more amazing to see how a high-level C program is compiled, assembled and finally executed by the underlying hardware, instruction by instruction, clock cycle after clock cycle. It is as if we were the conductors of an unprecedentedly sophisticated orchestra, creating splendid symphonies with simple waves of the baton in our hands. No one can be indifferent to this greatest artifact in human history.

Figure 9: The data path of LC3, including components for interrupt control
Acknowledgements: This article could never have existed without Prof. Hong An's and Prof. Junxia Zhang's great efforts in the Introduction to Computing Systems (H) and Analog and Digital Circuits courses.
References
[1] David A. Patterson, John L. Hennessy. Computer Organization and Design, RISC-V Edition.
[2] Yale N. Patt, Sanjay J. Patel. Introduction to Computing Systems, 2nd Edition.
[3] LC3 Simulator. http://wchargin.github.io/lc3web/
[4] Michael L. Scott. Programming Language Pragmatics. Morgan Kaufmann.
[5] Alfred V. Aho et al. Compilers: Principles, Techniques, and Tools.
[6] The RISC-V Instruction Set Manual.
[7] Abraham Silberschatz, Peter Baer Galvin, Greg Gagne. Operating System Concepts.