Top-down Perspective: The Computer Organization and Design Underneath
the Execution of C Programming Language
Mingkai Li, University of Science and Technology of China
Abstract
In Yale Patt's book Introduction to Computing Systems: From Bits and Gates to C and Beyond, the intricacies of the magnificent computing world reveal themselves as a huge, systematically interconnected collection of very simple parts. Although implementations of many modern architectures vary greatly in pursuit of shorter response time or greater throughput (sometimes called bandwidth), the underlying computer organization and design is no more than hardware and software consisting of hierarchical layers using abstraction, with each lower layer hiding details from the level above. The C programming language provides a machine-independent interface to the underlying ISA and hardware, tremendously enhancing a program's expressiveness and readability. In contrast to the bottom-up approach adopted in Yale Patt's book, we take a top-down perspective to uncover the details underneath the execution of a C program
step-by-step. To better illustrate the execution of a program on a specific implementation, and to avoid the unnecessary complexities of modern architectures, we choose an education-oriented implementation called Little Computer 3 (LC3 for short), introduced at the University of Texas at Austin. In this article, we briefly discuss some important ideas and protocols in the domain, from the interpreter/compiler down to the most fundamental digital logic devices. We observe the process as a program written in C is translated, assembled and finally executed by the computer, instruction by instruction, clock cycle by clock cycle, over the data path, and see how combinations of the simplest CMOS circuits have shaped today's fast-changing information technology industry.
1 Introduction
The C programming language was developed in 1972 by Dennis Ritchie at Bell Laboratories. The language was initially developed for writing compilers and operating systems, and it therefore allows the C programmer to manipulate data items at a relatively low level. To better illustrate the computer's actions when executing a high-level programming language, we create an example C program. Although simple, the program shows some of the most important features of the language, allowing us to discuss the implementation of mechanisms such as preprocessing, linking, subroutines, control instructions, data movement instructions and memory-mapped I/O, and thus helping the reader quickly form a rough picture of all the lower layers of abstraction behind a high-level programming language.
At the very beginning, we need to establish an overview of the hierarchy, or the layers of abstraction, of the whole computing system. As shown in Figure 2, the instruction set architecture (abbreviated ISA) plays a vital role as the interface between the hardware and the low-level system software. Computer Organization and Design defines the ISA as "anything the programmer needs to know to make a binary machine work correctly" [1].

Figure 1: An example program calculating the absolute value, written in C

Figure 2: Layers of abstractions of modern computer architecture

Later in this article, we'll discuss in more detail some of its most important components, such as memory organization, the instruction set, addressing modes, privilege and priority. After all, it is the ISA that bridges the gap between a high-level programming language like C and the fundamental movements of electrons that support the whole computing system. In the rest of this article,
we'll start from the top of the layers above, beginning with the translation process of programming languages (both high-level and assembly), spanning the gap between software and hardware with the help of the ISA, and observing how the commands from the software are carried out by the underlying circuits. Although LC3 [2, 3] is quite different from most implementations of computer architecture today, it helps beginners understand the intricacies of computing systems in a more elegant manner.
2 Programming Language
In Programming Language Pragmatics, programming is described as "the art of telling another human being what one wants the computer to do" [4]. A high-level programming language such as C is designed to be machine-independent and human-friendly. The creation of high-level languages means that programmers no longer need to write functionally similar code for different machines over and over, which considerably alleviates their workload. Accompanying these benefits, however, the biggest problem for the machine is that a high-level programming language is ambiguous: it does not define any specific actions over specific memory locations or registers, and the implementations of the same program on different machines may be distinctly different. To execute such a machine-independent programming language, it needs to be translated into machine-dependent assembly language with the help of low-level system software such as a compiler or an interpreter. Compared with high-level programming languages, assembly language is much more machine-friendly. It is little more than a set of human-readable mnemonics with a well-defined correspondence to the 0s and 1s of the instructions; the translation is completed easily by the assembler.
2.1 Translating high-level languages
In the rest of this section, we will introduce two distinct
translation techniques adopted by high-level languages,
and then discuss the technique C uses in more detail.
2.1.1 Interpretation and Compilation
In Compilers: Principles, Techniques, and Tools, the study of compilers is described as "full of beautiful examples where complicated real-world problems are solved by abstracting the essence of the problem mathematically" [5]. How the translation is done depends on the particular high-level language. Some languages like LISP, BASIC, Python and Java adopt a translation technique called interpretation, performed by an interpreter, while other languages such as C, C++, Rust and FORTRAN use another technique called compilation, performed by a compiler.
The interpreter is a virtual machine that executes the program. It reads a single line (or a section, command or subroutine) of the high-level language program and directly carries out its effects on the underlying hardware, repeating this until the end of the program. Interpreted code is more portable across different computing systems, since it is nothing more than input data to the interpreter on each platform. However, with the interpreter as an intermediary, the program takes much longer to execute. The compiler, on the other hand, does not execute the program itself. It analyzes the high-level language program as a whole and generates the corresponding assembly language or even machine language for the particular machine. The high-level language program needs to be compiled only once and can be executed many times afterwards, greatly improving the program's efficiency. These two translation techniques each have pros and cons depending on
the specific application scenario. As the C programming language was initially developed for writing compilers and operating systems, the adoption of compilation guarantees the efficiency and dependability of the product.
2.1.2 The Compilation Process of C
The C compiler transforms the C source program into an output assembly language or machine code file called an executable image. Figure 3 shows an illustration of the overall compilation process of C. As we can see, the C compiler has three interconnected components: the preprocessor, the compiler proper and the linker.
At the beginning of the C compilation process, the preprocessor scans the whole C source file, looking for and acting upon C preprocessor directives. Let's take the C program at the beginning of this article as an example. The preprocessor will scan the whole program, substituting the macros ZERO and NEGONE with 0 and -1, and inserting the contents of stdio.h into the source file at the corresponding line.
After that, the compiler transforms the preprocessed program into object modules through two major phases called analysis and synthesis. The analysis phase parses the program, breaking it into its constituent parts, and the synthesis phase translates these parts, optimizing the code for better performance at the same time. Each of these two phases is typically divided into many subphases, such as parsing, register allocation and instruction scheduling. While the compiler is working, an internal bookkeeping mechanism called the symbol table is created.

Figure 3: Overall compilation process of the C programming language

Again, let's take the C program at the beginning of this article as an example. The symbol table of the program is shown in Table 1. The symbol table keeps each variable's identifier, type, location and scope. The memory allocated for the variables is arranged in the form of a stack, so the location of each variable can be expressed as an offset relative to a certain memory location.
Table 1: Symbol table of the example C program

Identifier   Type   Location (as an offset)   Scope
x            int     0                        main
y            int    -1                        main
The linker takes over after the compiler has generated all the object modules. It is the linker's job to link all the object modules together to form an executable image of the program, completing the compilation process. Depending on the C compiler, the executable image may be written in either assembly language or machine code. In the latter case, the executable image can be directly loaded into memory and executed by the underlying hardware. Otherwise, it needs to be assembled first by a two-pass process.
2.2 The Two-pass Assembly Process
Let's take a look at an example RISC-V assembly language program to get a concrete sense of the process [6]. As shown in Figure 4, an assembly language program mainly consists of opcodes/operands, labels, pseudo-ops (also known as assembler directives) and comments. (For more information about RISC-V assembly language, see references [1] and [6].) The transformation from assembly language to machine code is accomplished by a two-pass process performed by the assembler.

Figure 4: RISC-V assembly language program calculating the greatest common divisor of two positive integers
The first pass creates the symbol table. Similar to the symbol table in the compilation process, the symbol table in the assembly process is simply a correspondence between symbolic names (labels) and their specific memory addresses. In the second pass, the assembler goes through the program again. The symbolic names in the control instructions, such as euclid and finish in our example, are substituted with their specific memory addresses according to the symbol table built earlier. After that, the assembly language instructions are translated into 0s and 1s line by line, finally yielding the machine code executable image. As noted in the previous section, the executable image can be directly loaded into memory and executed by the underlying hardware.
3 Instruction Set Architecture
After we obtain an executable image of the C program, we are ready to see how the actions of the computer are actually directed. The instruction set is the core of the ISA, regarded as the vocabulary of the computer's language. In the rest of this section, we will introduce the concept of the von Neumann model, see how an instruction cycle is carried out, discuss the intricacies of operate, data movement and control instructions, and briefly talk about the implementations of memory-mapped I/O, interrupts, subroutines and the user/system mode.
3.1 The von Neumann Model
The von Neumann model, proposed by John von Neumann
in 1946, has become the foundation of most computing systems today. Figure 4 shows an overall block diagram of the von Neumann model. As we can see, the model consists of five parts: memory, a processing unit, input, output and a control unit.
The control unit exists in every machine that can be called a computer, or, by another name, a universal Turing machine. It can be abstracted as a finite state machine (FSM), keeping track of where we are in the execution of both the program and each instruction. The abbreviations PC and IR stand for the program counter, which stores the address of the next instruction, and the instruction register, which keeps the content of the current instruction, respectively. The computer moves from state to state based on the corresponding parts of the current instruction, directing the data path to take specific actions. The state machine of a modern computer is usually very sophisticated, so we show only part of the LC3 state machine to give you a rough picture (Figure 5).

Figure 4: Overall block diagram of the von Neumann model
The central idea of the von Neumann model is that
the program and data are both stored as sequences of bits
in the computer’s memory, and the program is executed
one instruction at a time under the direction of the control
unit.
Before moving on, let's discuss the system (kernel) and user modes a little. As shown in Figure 6, in modern computers, application programs run on top of the operating system. The memory is usually separated into several parts, with certain parts accessible only by the system software. When a programmer or the standard library wants to execute a certain function provided by the operating system, it invokes a system call; otherwise, the application program is denied access to the privileged memory space or the device register addresses (which will be discussed further under memory-mapped I/O). How the operating system works and how to improve its performance are extremely important questions in computer science; interested readers may consult reference [7].
Figure 5: A state machine of LC3; state transitions are activated by the information in the instruction
Figure 6: An example of system call, showing the
difference between the user mode and kernel mode
3.2 The Instruction Cycle
Instructions are executed under the direction of the control unit in a very systematic, step-by-step manner. The sequence of steps (or phases, in computer science terminology) is called the instruction cycle. There are six main phases in a complete instruction cycle (although many instructions require only some of them): fetch, decode, evaluate address, fetch operands, execute and store result.
In the FETCH phase, the computer obtains the next instruction from the address stored in the PC, loading it into the IR and incrementing the PC at the same time. In the DECODE phase, the computer examines the first several bits (called the opcode) of the instruction, figuring out what is being requested of the underlying microarchitecture. If the instruction requests a load or store, the computer calculates the addresses of the corresponding operands, based on the instruction's addressing mode, in the EVALUATE ADDRESS phase. Then the computer accesses memory, obtaining the needed source operands, in the FETCH OPERANDS phase. In the EXECUTE phase, values destined for registers are generated for an operate instruction, the load or store happens for a data movement instruction, and the PC is redirected for a control instruction. Finally, the result is written in the STORE RESULT phase.
Each of these phases in the instruction cycle may take several clock cycles, depending on the specific implementation. Factors like CPI (clock cycles per instruction), instruction count and clock rate are significant when evaluating a program's performance. For hardware and software's influence upon them, see Figure 7.

Figure 7: Different components' influence upon a program's performance
3.3 The Instruction Set, Memory-mapped IO and Interrupts
An instruction is defined by three parts: its opcode, data type and addressing mode. Roughly, all instructions can be divided into three distinct categories: operate instructions, data movement instructions and control instructions. Some of the main RISC-V instructions are shown in Figure 8.

Figure 8: RISC-V reference card (main part), listing the kinds of instructions and their assembly language expressions
Operate instructions process data, performing either arithmetic or logic operations. The operands of this kind of instruction can be found in only two places: in registers or in the instruction itself (an immediate operand, in computer science terminology). RISC-V supports many operate instructions, such as ADD, ADDI, AND, ANDI, SLL and SRL, performing arithmetic, logic and shift operations.
Data movement instructions move information between the general-purpose registers and either memory or input/output devices (which can also be regarded as a kind of special memory space). Specifically, data movement instructions load data from memory into registers, or store data from registers into memory. The specific memory address is calculated from the address generation bits in the instruction. The calculation rule is determined by the instruction's addressing mode: PC-relative mode, indirect mode, base-plus-offset mode, immediate mode and so on [1, 2]. Different addressing modes exist so that as much of the memory space as possible can be reached, including locations relatively far from the PC.
Data movement instructions are also the workhorse for performing input/output tasks. In most modern computers, device registers are mapped to particular addresses allocated for I/O rather than to normal memory space. The computer controls the data in these memory-mapped device registers with exactly the same data movement instructions to perform input/output tasks. This is usually done in one of two ways: polling or interrupts. The difference between them is that the polling method requires the computer to check the device registers repeatedly while an I/O task is pending, whereas with the interrupt method the computer stops to perform the I/O task only when it detects a signal indicating that an input or output is ready, and then returns to the interrupted task as if nothing had happened.
Control instructions change the sequence of executed instructions, conditionally or unconditionally. They reach this goal by changing the content of the PC in the EXECUTE phase of the instruction cycle; otherwise, the computer would simply execute the instruction at the next address, since the PC is always incremented at the end of the FETCH phase. The condition of a conditional control instruction is checked via the condition codes, which reflect the result of the last instruction that changed a register's value.
4 Introduction to Microarchitecture and
Digital Logic Devices
At the end of this article, we'll briefly introduce some concepts of the underlying hardware. Figure 9 shows the microarchitecture of LC3.
The microarchitecture of the computer is composed of combinational logic circuits and sequential logic circuits. The combinational logic circuits are responsible for logic decisions; some basic components include the encoder, decoder, multiplexer, demultiplexer and full adder. The sequential logic circuits, on the other hand, the foundation of storage structures and finite state machines, are affected both by the current inputs and by the results of past ones; some basic components include latches and flip-flops. In fact, all these digital logic devices are systematic combinations of MOSFETs (metal-oxide-semiconductor field-effect transistors). It is the opening and closing of these magical transistors that creates our magnificent world of 1s and 0s.
5 Conclusion
Let us recall the words of David Patterson from the beginning of this article: computing systems are nothing more than "hardware and software consisting of hierarchical layers using abstraction, with each lower layer hiding details from the level above". It could not feel more amazing to see how a high-level C program is compiled, assembled and finally executed by the underlying hardware, instruction by instruction, clock cycle after clock cycle. It is as if we were the conductors of an unprecedentedly sophisticated orchestra, creating splendid symphonies with simple waves of the baton in our hands. No one can be indifferent to this greatest artifact in human history.

Figure 9: The data path of LC3, including components for interrupt control
Acknowledgements: This article could never have existed without Prof. Hong An's and Prof. Junxia Zhang's great efforts in the Introduction to Computing Systems (H) and Analog and Digital Circuits courses.
References
[1] David A. Patterson, John L. Hennessy. Computer Organization and Design, RISC-V Edition.
[2] Yale N. Patt, Sanjay J. Patel. Introduction to Computing Systems, 2nd Edition.
[3] LC3 Simulator. http://wchargin.github.io/lc3web/
[4] Michael L. Scott. Programming Language Pragmatics. Morgan Kaufmann.
[5] Alfred V. Aho et al. Compilers: Principles, Techniques, and Tools.
[6] The RISC-V Instruction Set Manual.
[7] Abraham Silberschatz, Peter Baer Galvin, Greg Gagne. Operating System Concepts.