Mabry UG Thesis

Date post: 10-Apr-2018
  • 8/8/2019 Mabry UG Thesis

    1/47

    Honors Thesis:

    Asynchronous Implementation of 8051 Microcontroller

    By: Ryan Mabry

    Advisor: Dr. Hao Zheng

    Abstract

    The synchronous 8051 microcontroller is a common processor found in many

    embedded systems. By using asynchronous design techniques, the performance of the

    8051 microcontroller is increased. Through simulation and the use of existing

    synchronous design tools in the asynchronous design flow, a four-phase handshaking

    approach with a stoppable clock is simulated and then implemented. The asynchronous

    architecture added to the existing synchronous architecture includes an ALU wrapper, a

    controller wrapper, and a clocking element. The asynchronous design flow consists of

    functional simulation, synthesis of synchronous blocks, timing analysis, asynchronous

    wrapper design, and timing simulation. After implementation analysis, the asynchronous

    8051 is 28.7% faster than the synchronous 8051 while using only 10% more area.

    Acknowledgements

    I would like to thank Narender Hanchate for devoting a significant amount of his

    time to teaching me many of the tools that I used for this project.

    I would also like to thank Dr. Hao Zheng for providing me with a thesis

    opportunity that allowed me to exercise my hardware engineering design skills.

    Table of Contents

    Chapter 1: Motivation

    Chapter 2: Overview and Design

    Chapter 3: Design Flow

    Chapter 4: Design Implementation and Results

    Chapter 5: Challenges and Conclusion

    Appendix A: Divmul Instructions

    Appendix B: Controller, RAM, and Trace.out Contents

    Appendix C: Primetime and Buildgates Script Files

    Appendix D: ALU and Controller VHDL Code

    Chapter 1: Motivation

    Overview

    The 8051 microcontroller is a device found in many embedded systems. It can be

    found as a component in many analog-to-digital converters, pulse-width modulators, and

    bus interfaces.

    The 8051 microcontroller this project uses comes from the University of

    California's Dalton Project [1]. The microcontroller design found on the Dalton Project

    website is fully synchronous. The goal of this project is to develop an asynchronous

    version of the 8051 microcontroller and use synchronous design tools in the process. The

    divmul program from the Dalton Project website is used as a benchmark comparison

    between the synchronous and asynchronous controllers, and it is also used in verifying

    the asynchronous simulation results. The instructions executed by the divmul program

    are listed in Appendix A.

    Asynchronous Advantages

    There are many reasons to implement a design using asynchronous techniques.

    Compared with traditional synchronous designs, asynchronous designs offer lower

    power consumption, no clock skew, better technology migration, and fewer global timing

    issues [2]. There are numerous ways to implement an asynchronous circuit; these include

    fundamental mode Huffman circuits, burst-mode circuits, and Muller circuits.

    Asynchronous Disadvantages

    While asynchronous circuits may be faster and are not subject to some of the

    problems that synchronous circuits suffer from, they are much harder to design. While

    there are many tools for synchronous design out on the market, there are no complete

    design solution tools for asynchronous circuits. Since there is no global clock in

    asynchronous designs, communication must be done through handshaking or other

    means. Since there are no tools that automatically implement handshaking protocol

    between communicating modules, if an asynchronous circuit makes extensive use of

    handshaking or other asynchronous logic, great care must be taken to ensure timing and

    data integrity.

    Asynchronous Design Flow with Synchronous Tools

    Even though complete asynchronous design solutions do not exist, one can make

    use of existing synchronous design tools in the asynchronous design flow. In this flow,

    each design is partitioned into blocks, and each block is controlled by a local clock or

    handshaking protocol. Simulation tools like Modelsim can be used to simulate the

    functionality of the asynchronous design. Once the design has been functionally

    simulated, it must be implemented. While an entire asynchronous circuit cannot be

    designed in synchronous tools, individual logic blocks can. During the design of the

    asynchronous 8051 chip, VHDL was used to generate certain blocks of the ALU and

    controller wrappers. While timing analyzers like Primetime are used in the synchronous

    design flow to look for timing and critical path violations, they can be used in the

    asynchronous design flow to get delay numbers. With delay numbers from leading

    commercial timing analyzers, delay elements or a local clock can be implemented.

    Chapter 2: Overview and Design

    Synchronous 8051 Architecture

    [Figure: Block diagram of the synchronous 8051 architecture. Blocks: I8051_CTR, I8051_DEC, I8051_ALU, I8051_ROM, and I8051_RAM. Signals shown: Op-code, ip, ALU-Op-code, Src-1, Src-2, Src-3, Carry-in 1 & 2, des, Carry-out 1 & 2, Overflow, rd, wr, addr, data, Is-bit-addr, Data-bit, Ports (Rd, wr, addr, data_out, data_in), rst, and Clock.]

    A block diagram of the synchronous 8051 architecture is shown above. The

    controller retrieves data from the I8051_ROM module and sends this data to the

    I8051_DEC module, where the data is decoded into an appropriate op-code for the

    controller to execute. Depending on the decoded instruction, the I8051_CTR module will

    assert and deassert the proper control signals to the I8051_ALU and I8051_RAM

    modules. During the execution phase of a particular instruction, the I8051_CTR module

    will usually read data from the I8051_RAM module, send the fetched data to the

    I8051_ALU module for an appropriate logical or arithmetic operation, then write the

    results of the ALU operation back into the I8051_RAM module. If the 8051 needs to be

    accessed by an external hardware device, such as a memory bus or another

    microprocessor, the I8051_CTR and I8051_RAM modules feature ports that can

    interface with the device.

    than the required clock period for a division instruction. This functionality is

    implemented through handshaking and a stoppable clock.

    A block diagram of the asynchronous 8051 architecture is shown on the previous

    page. The operation of the asynchronous microcontroller is similar to that of the

    synchronous microcontroller, but with a few key differences. First, unlike the synchronous

    version, the clock is stopped while the I8051_CTR module waits for the I8051_ALU

    module to execute a given operation; this is done through handshaking signals generated

    by the ALU and Controller wrappers. Second, since the clock is stopped while the

    controller waits for data from the ALU, excess cycles that were in the synchronous

    version were eliminated in the asynchronous version; excess cycles are defined as clock

    cycles where the controller is waiting for the ALU to complete an operation. Third, since

    the clock is stopped while the controller waits for a given operation from the ALU to

    complete, the clock must be generated onboard the chip; in the synchronous

    design, by contrast, the clock is generated off the chip.

    The controller wrapper generates a request signal when the controller requests an

    operation from the ALU. Upon receiving an acknowledge signal from the ALU wrapper,

    the request signal is deasserted by the controller.

    The ALU wrapper generates an acknowledge signal to signify that the ALU has

    completed the operation requested by the controller. The delay between the time the

    ALU wrapper receives the request signal from the controller wrapper and the time the

    ALU wrapper acknowledges by asserting the acknowledge signal is dependent upon the

    operation requested. The requested operation is determined by the ALU Op-Code.

    The clocking unit is used to generate an onboard clock signal for the

    asynchronous design. The behavior of the onboard clock is the same as an ordinary

    clock, except that the clock is stopped whenever the request line is asserted and the

    acknowledge line is deasserted. During this time, the controller is waiting for the ALU to

    complete an operation. Once the ALU operation is complete and the ALU wrapper

    asserts the acknowledge signal, the clock will resume normal operation.

    Chapter 3: Design Flow

    Asynchronous Design Flow

    [Figure: Asynchronous design flow, in order: Functional Simulation, Synthesis of Synchronous Blocks, Timing Analysis, Asynchronous Wrapper Design, Timing Simulation.]

    One of the objectives of this project is to develop an asynchronous design flow

    that utilizes synchronous tools. An overview of each design flow stage and how

    synchronous tools are used follows.

    The first step in the asynchronous design flow is functional simulation. Modelsim

    is used to simulate the asynchronous design. Standard VHDL compilers, like Xilinx ISE,

    Altera Quartus II, and NC VHDL, cannot synthesize VHDL code that implements

    asynchronous logic. The wrappers for the ALU and controller were written in VHDL

    and interfaced with the reduced controller. These wrappers use wait statements to

    produce the handshaking logic that is necessary to stop the clock while the ALU

    completes a given operation. Verification of the asynchronous controller was done

    against the reduced synchronous controller by performing a memory dump at the end of

    simulation. The contents of the RAM memory and controller registers were compared

    and verified to be the same.

    The second step in the asynchronous design flow is the synthesis of synchronous

    blocks. In this stage, the synchronous 8051 microcontroller and synchronous elements of

    the asynchronous 8051 microcontroller are synthesized and turned into Verilog netlists

    by Ambit Buildgates. These netlists are used in later stages of the design flow.

    The third step in the asynchronous design flow is timing analysis. In this stage,

    Cadence Encounter and Synopsys Primetime are used. Cadence Encounter is used for the

    generation of the Resistance-Capacitance model of a circuit. Synopsys Primetime is used

    for static timing and critical path analysis. As successive elements of the ALU are

    removed to determine the critical path delay for division, multiplication, addition,

    subtraction, and logical operations, the RC network model for each case is generated in

    Encounter and imported into Primetime to enhance timing accuracy. Critical path

    numbers are also generated for the RAM, ROM, decoder, synchronous controller, and

    asynchronous controller modules in this stage.

    The fourth step in the asynchronous design flow is the design of the asynchronous

    wrappers. Since the required delay numbers are obtained in stage three of the design flow

    process, the correct number of delay elements can now be implemented in designing the

    wrappers. The combinational logic elements of the wrappers are designed in

    synthesizable VHDL code that is importable into a schematic, while the delay elements in

    the ALU wrapper are laid out by hand in a schematic editor. The schematic editor used

    during this project is Cadence Composer.

    The fifth step in the asynchronous design flow is timing simulation.

    Unfortunately, the asynchronous implementation of the 8051 microcontroller could not

    be tested at this stage, since the university does not have a post-synthesis timing simulator installed on

    its servers.

    Chapter 4: Design Implementation and Results

    Handshaking

    In the asynchronous implementation of the 8051 microcontroller, the

    synchronization between the ALU and controller is implemented through four-phase

    handshaking by the use of request and acknowledge signals. When the controller needs

    an ALU operation to be done, the request signal is asserted. The acknowledge signal

    from the ALU wrapper is then asserted after a given delay period, depending on the ALU

    operation requested by the controller. During the period that the request signal is asserted

    and the acknowledge signal is deasserted, the clock is stopped. The controller's state

    should remain the same while the ALU completes the requested operation. After the

    acknowledge signal from the ALU wrapper is asserted, the request signal is deasserted by

    the controller wrapper, then a short time later, the acknowledge signal from the ALU

    wrapper is also deasserted. A figure showing how such handshaking and clocking works

    is given below.

    [Figure: Four-phase handshaking cycle: Req+ (stop clock), Ack+ (start clock), Req-, Ack-, then back to Req+.]
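
    The handshaking sequence described above can be sketched as a small behavioral model. This is an illustrative Python sketch, not the VHDL used in the design; the function names and the release_delay_ns value are assumptions, since the text does not quantify the short delays that follow Ack+.

```python
def clock_running(req: bool, ack: bool) -> bool:
    """The clock is stopped only while req is asserted and ack is deasserted."""
    return not (req and not ack)


def handshake_trace(alu_delay_ns: float, release_delay_ns: float = 1.0):
    """Return (event, time_ns, clock_running) tuples for one four-phase cycle.

    release_delay_ns is a hypothetical placeholder for the unquantified
    delays between Ack+ and Req-, and between Req- and Ack-.
    """
    events = [
        # (event, time, req level after event, ack level after event)
        ("Req+", 0.0, True, False),                              # controller requests
        ("Ack+", alu_delay_ns, True, True),                      # ALU wrapper acknowledges
        ("Req-", alu_delay_ns + release_delay_ns, False, True),  # request withdrawn
        ("Ack-", alu_delay_ns + 2 * release_delay_ns, False, False),
    ]
    return [(name, t, clock_running(req, ack)) for name, t, req, ack in events]


if __name__ == "__main__":
    # 12.8 ns is the add/subtract delay used as an example in the text.
    for name, t, running in handshake_trace(12.8):
        print(f"{t:6.1f} ns  {name:4s}  clock {'running' if running else 'stopped'}")
```

    In this model the clock is stopped only in the interval between Req+ and Ack+, which is the interval whose length depends on the requested ALU operation.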

    Functional Simulation

    The first step towards simulating an asynchronous microcontroller is to

    implement its design in VHDL code. The controller is reduced so that it only implements

    the instructions executed by the divmul program; the asynchronous controller is further

    reduced by removing excess cycles. For example, in the reduced synchronous controller

    the NOP instruction takes 7 clock cycles; in the reduced asynchronous controller, the

    NOP instruction takes 3 clock cycles.

    Wrappers for the ALU and controller are implemented in VHDL to be interfaced

    with the asynchronous reduced controller. The controller wrapper consists of VHDL

    code that generates a request signal for an appropriate ALU op-code. The ALU wrapper

    consists of VHDL code that delays the generation of the acknowledge signal according to

    the ALU operation.

    Simulation verification in Modelsim consists of two phases. The first phase

    compares the memory, registers, and trace.out of the fully synchronous 8051 model and

    the asynchronous 8051 model that does not have any excess cycles removed, but does

    have ALU and controller wrapper logic inserted. The trace.out file is a recording of each

    instruction executed by the controller. As the simulation of both progresses, what is

    written to and read from the 8051 RAM module in each clock cycle is dumped into a text

    file. After these are verified to be the same, the controller register contents of both

    designs are also confirmed to match. The trace.out file produced during simulation is

    also the same for both designs.

    The second phase consists of removing excess cycles from the asynchronous 8051

    model that had ALU and controller wrapper logic inserted. Instead of comparing what

    was written to and read from the 8051 RAM module in each cycle during simulation, a

    dump of the RAM contents at the end of simulation is performed. VHDL code is

    implemented to perform a dump of controller register contents at the end of simulation.

    As excess cycles are removed from the asynchronous controller in each instruction case,

    the contents of the RAM, controller registers, and trace.out are verified to be correct.

    Appendix B gives the final correct controller register, RAM, and trace.out contents.
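
    The end-of-simulation comparison can be sketched as a short script that parses two dumps and reports any entries that differ. This is a minimal sketch, assuming each dump holds one "bits -- name" entry per line as in the Appendix B listing; the function names are hypothetical and are not part of the thesis tooling.

```python
def parse_dump(text: str) -> dict:
    """Map each register or RAM entry name to its bit string.

    Assumes one 'bits -- name' entry per line; lines without '--' are skipped.
    """
    entries = {}
    for line in text.splitlines():
        if "--" not in line:
            continue
        bits, name = line.split("--", 1)
        entries[name.strip()] = bits.strip()
    return entries


def diff_dumps(sync_text: str, async_text: str) -> list:
    """Return the sorted names whose values differ or that appear in only one dump."""
    sync_d, async_d = parse_dump(sync_text), parse_dump(async_text)
    return sorted(
        name
        for name in sync_d.keys() | async_d.keys()
        if sync_d.get(name) != async_d.get(name)
    )


if __name__ == "__main__":
    sync_dump = "00001111 -- sfr_sp\n00001010 -- RAM byte 0\n"
    async_dump = "00001111 -- sfr_sp\n00001010 -- RAM byte 0\n"
    print(diff_dumps(sync_dump, async_dump) or "dumps match")
```

    An empty result means the two models ended the simulation with identical contents, which is the condition checked after each excess cycle is removed.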

    ALU Wrapper Design

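    [Figure: ALU wrapper block diagram. The select logic takes the ALU Opcode and drives select lines S0, S1, and S2 of three 2-to-1 multiplexers. Req passes through cascaded delay elements for logical operations, add/subtract, multiply, and divide, and the multiplexers select the path that drives Ack.]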

    A block diagram showing the ALU wrapper design is given above. The ALU

    wrapper operates by selecting an appropriate delay element for a given ALU Op-code.

    The path from the req signal to the ack signal is delayed for the amount of time it takes for the ALU to

    complete the appropriate operation. The select logic takes the ALU Op-code as input and

    generates appropriate signals for the select lines that control the multiplexers. The delay

    elements for logical operations, addition and subtraction, multiply, and divide are

    designed in such a way that the delay of one element builds upon the delay of the

    previous element. For example, say logical operations have a delay of 9ns and addition

    and subtraction operations have a delay of 12.8ns. The wrapper is designed so that

    addition and subtraction delay has to pass through the 9ns delay element of logical

    operations; thus, only an additional 3.8ns delay needs to be built into the addition and

    subtraction delay element in order to meet the overall delay requirement for addition and

    subtraction operations. This same design idea holds for the multiplication and divide

    elements. The select logic VHDL code for the ALU wrapper is given in Appendix D.

    Primetime is used to calculate the critical path through the ALU. In the

    ALU, a virtual clock was created to drive all the inputs; this was necessary since

    Primetime won't generate critical path delay numbers without a defined clock. The RC

    model for each particular ALU was extracted from Encounter and applied to Primetime

    to enhance timing result accuracy.

    In calculating ALU delay, successive operations are removed. For example, in

    order to calculate multiplication delay, the division case from the ALU VHDL code is

    removed, and the critical path delay is assumed to be that of multiplication. This process

    continues until delay numbers for logical operations are achieved. Besides the explicit

    divide, multiply, add, subtract, and logical operations, there are several ALU operations

    that are placed into the add or logical operations category based on the function they

    perform. For example, the PCSADD ALU op-code is used in calculating address

    offsets in jump instructions executed by the controller, and the PCUADD ALU op-code

    is used in incrementing the program counter inside the controller; since both op-codes

    make use of the addition function, these ALU operations are assumed to be part of the

    addition delay.
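
    The grouping of op-codes into delay categories can be pictured as a lookup table. The sketch below is illustrative only: PCSADD and PCUADD come from the text, while the other op-code names and the fallback to the slowest category are assumptions made for this example and are not taken from the ALU VHDL.

```python
# Delay categories named in the text, ordered fastest to slowest.
DELAY_CLASSES = ("logical", "add_subtract", "multiply", "divide")

# PCSADD and PCUADD are from the text; the remaining keys are hypothetical
# op-code names used only to illustrate the lookup.
OPCODE_CLASS = {
    "PCSADD": "add_subtract",  # address offsets for jump instructions
    "PCUADD": "add_subtract",  # program counter increment
    "ADD": "add_subtract",
    "SUB": "add_subtract",
    "MUL": "multiply",
    "DIV": "divide",
    "AND": "logical",
}


def delay_class(op_code: str) -> str:
    """Return the delay category for an op-code.

    Unknown op-codes fall back to the slowest category (an assumption of
    this sketch) so that an acknowledge would never arrive too early.
    """
    return OPCODE_CLASS.get(op_code, DELAY_CLASSES[-1])


if __name__ == "__main__":
    for op in ("PCSADD", "PCUADD", "MUL", "UNKNOWN"):
        print(op, "->", delay_class(op))
```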

    Initial delay results from Primetime are given below. Since area is also a factor in

    delay, these numbers must be increased; the ALU shrank dramatically as operations were removed, from division

    down to logical operations. In the complete ALU, logical operations may take significantly longer

    than 6046ps; the added area from the division and multiplication functionality results in

    an increased wire length between the input and output ports. Depending on how the

    design is physically implemented on chip, the routing of the wires may impart more delay

    than the rise and fall times of the transistors used in the actual logical operations.

    ALU Ops               Delay (ps)
    Division              24675.8
    Multiplication        10486.1
    Subtraction           8579.21
    Addition              6310.83
    Logical Operations    6046.16

    In order to provide a safety margin to account for real world operating conditions

    like voltage and temperature fluctuations, 50% of the initial delay is added to the initial

    delay numbers. The corrected numbers are given below. Since addition and subtraction

    account for one delay element in the ALU wrapper, they are lumped together in the table

    below.

    ALU Ops               Delay (ps)
    Division              37000
    Multiplication        15800
    Add & Subtract        12800
    Logical Operations    9000
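
    The 50% safety margin can be checked with a few lines of arithmetic. The sketch below applies the margin to the initial Primetime numbers; the exact products are close to, but not identical with, the values adopted in the table (37000, 15800, 12800, and 9000 ps), which appear to be rounded. The names used are illustrative.

```python
# Initial critical path delays from Primetime, in picoseconds (from the text).
INITIAL_DELAY_PS = {
    "Division": 24675.8,
    "Multiplication": 10486.1,
    "Subtraction": 8579.21,
    "Addition": 6310.83,
    "Logical Operations": 6046.16,
}

SAFETY_MARGIN = 0.50  # 50% of the initial delay is added


def with_margin(delay_ps: float, margin: float = SAFETY_MARGIN) -> float:
    """Return the delay after adding the safety margin."""
    return delay_ps * (1.0 + margin)


padded = {op: with_margin(d) for op, d in INITIAL_DELAY_PS.items()}

if __name__ == "__main__":
    for op, delay in padded.items():
        print(f"{op:20s} {delay:10.2f} ps")
```

    Because addition and subtraction share one delay element, the slower of the two (subtraction) sets the Add & Subtract entry.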

    Now that the delay for each stage is set, an appropriate number of buffers must

    be placed in each stage in order to implement the required delay. The buffer unit is found

    to have a delay of 110.28ps by itself after Primetime analysis. In order to simplify delay

    element design, the buffer delay is rounded down to 100ps. The number of buffers

    required for each delay element is given below; note that each successive delay element

    builds upon the delay of the previous element.

    ALU Ops               Buffers
    Division              213
    Multiplication        30
    Add & Subtract        38
    Logical Operations    90
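
    Since each delay element builds on the previous one, the buffer count per element follows from the difference between successive target delays divided by the simplified 100ps buffer delay. The sketch below is a rough cross-check under that assumption; it reproduces the 90, 38, and 30 entries and gives 212 for division, one fewer than the 213 listed, a difference the text does not explain.

```python
# Target delays after the safety margin, fastest to slowest, in picoseconds.
TARGET_DELAY_PS = [
    ("Logical Operations", 9000),
    ("Add & Subtract", 12800),
    ("Multiplication", 15800),
    ("Division", 37000),
]

BUFFER_DELAY_PS = 100  # simplified buffer delay used during wrapper design


def buffers_per_element(targets, buffer_delay_ps=BUFFER_DELAY_PS):
    """Return the buffers needed in each element of the cascaded delay chain.

    Each element only supplies the delay beyond what the previous,
    faster elements already provide.
    """
    counts = {}
    previous_total = 0
    for name, total in targets:
        counts[name] = (total - previous_total) // buffer_delay_ps
        previous_total = total
    return counts


if __name__ == "__main__":
    for name, count in buffers_per_element(TARGET_DELAY_PS).items():
        print(f"{name:20s} {count}")
```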

    After implementing the delay elements for division, multiplication, add &

    subtract, and logical operations, they are wired together with the select logic block and

    multiplexers. The wrapper is then exported into a Verilog netlist and imported into

    Primetime to make sure the critical path delay in the wrapper is equal to the delay

    required by division. Upon finishing Primetime analysis on the wrapper, the

    critical path is found to have a delay of 42688.85ps. The difference in the expected and actual critical

    path delay comes from the fact that the average delay of each buffer changed from

    110.28ps to 114.28ps, and the buffer delay is simplified to 100ps during the design of the

    wrapper. In order to bring the critical path delay closer to required specifications, 50

    buffer elements are removed from the division delay element. After removing 50 buffers

    from the division delay element, the critical path delay in the ALU wrapper is found to be

    36975.75ps, which meets the required specifications. The final number of buffers in each

    delay element is given below.

    ALU Ops               Buffers
    Division              163
    Multiplication        30
    Add & Subtract        38
    Logical Operations    90
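
    The effect of removing 50 buffers can be estimated from the measured numbers. This is a rough cross-check rather than a substitute for the Primetime run: it assumes every removed buffer contributes the 114.28ps average delay, and it lands within about 1ps of the 36975.75ps reported.

```python
MEASURED_PATH_PS = 42688.85       # wrapper critical path before trimming
AVERAGE_BUFFER_DELAY_PS = 114.28  # average buffer delay inside the wrapper
BUFFERS_REMOVED = 50
REPORTED_PATH_PS = 36975.75       # critical path reported after trimming


def estimate_after_trim(path_ps: float, buffer_ps: float, removed: int) -> float:
    """Estimate the critical path after removing buffers from the chain."""
    return path_ps - removed * buffer_ps


estimate_ps = estimate_after_trim(
    MEASURED_PATH_PS, AVERAGE_BUFFER_DELAY_PS, BUFFERS_REMOVED
)

if __name__ == "__main__":
    print(f"estimated: {estimate_ps:.2f} ps, reported: {REPORTED_PATH_PS:.2f} ps")
```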

    Controller Wrapper Design

    [Figure: Controller wrapper (CTR Wrapper) block diagram, with signals ALU Op-code, Ack, and Req.]

    The design of the controller wrapper is much simpler than the ALU wrapper

    design. All the controller wrapper does is assert a request signal while the controller is

    waiting for data from an ALU operation; the request signal is deasserted once an

    acknowledge signal from the ALU wrapper is received. The controller wrapper is

    designed in VHDL code. The code for the controller wrapper can be seen in Appendix D.

    Controller Modifications

    In the case of the synchronous 8051 controller, the controller must wait an

    appropriate number of clock cycles for a given instruction while a given ALU operation

    is carried out. However, in the case of the asynchronous controller, these excess cycles

    can be removed since the clock is stopped while the ALU is carrying out a requested

    operation. For example, the ADDC_1 instruction executed by the controller takes 8

    clock cycles in the synchronous version and 6 clock cycles in the asynchronous version.

    The VHDL code for the ADDC_1 instruction in the asynchronous 8051 controller is on

    the left, and the VHDL code for the ADDC_1 instruction in the synchronous 8051

    controller is on the right. In the synchronous version, execute states ES_5 and ES_6 are

    excess cycles since the controller is doing nothing while waiting for the ALU to complete

    the add operation. In the asynchronous version, the excess cycles ES_5 and ES_6 can be

    eliminated since the clock is stopped while the ALU completes the add operation.

    when OPC_ADDC_1 =>                    when OPC_ADDC_1 =>
      case exe_state is                     case exe_state is
        when ES_0 =>                          when ES_0 =>
          GET_RAM_ADDR_1(v8);                   GET_RAM_ADDR_1(v8);
          START_RD_RAM(v8);                     START_RD_RAM(v8);

    exe_state

    exe_state

    alu_op_code

    START_WR_RAM(R_ACC); START_WR_RAM(R_ACC);

    reg_cy

    chain in the clocking unit is to modulate the clock period of the design. The inverter gate

    at the lower input node to the AND gate in the clocking unit acts to change the clock

    from zero to one and vice-versa during normal operation. The delay of this change to the

    output clock port is dependent on the inverter chain. The delay in the inverter chain must

    be longer than the longest critical path in the RAM, ROM, asynchronous controller, and

    decoder modules to avoid timing violations. Since the length of the inverter chain

    determines how often the clock will change from zero to one and vice-versa during

    normal operation, the delay of the chain is equal to half the period of the clock.

    Primetime analysis shows that the module with the longest critical path in the

    asynchronous 8051 is the RAM, with a critical path of 20.6ns. Applying a 50% safety

    margin to the critical path, a critical path delay of 30.9ns is obtained. Since an inverter

    has a delay of 49.96ps, 682 inverters are necessary to create the inverter chain.

    Targeted Technology

    The asynchronous microcontroller design is targeted towards the standard cell

    library developed by the Virginia Tech VLSI for Telecommunications (VTVT) group. The VTVT

    library is based on the TSMC 0.25 µm CMOS fabrication process.

    Area and Speed

    Area numbers are generated by pks_shell and delay numbers are generated in

    Primetime. The area of the synchronous version is smaller than the asynchronous version

    since the synchronous version does not have an onboard clock generator, ALU wrapper,

    or Controller wrapper. The delay is the same in both versions since the RAM module

    forms the longest critical path.

    The cell area of the synchronous 8051 chip is 65662; the cell area of the

    asynchronous chip is approximately 72400. The RAM module dominates the area of

    both asynchronous and synchronous chips; this can be expected since the 128 bytes of

    memory in the 8051 require a large number of flip-flops.

    The RAM module dominates the critical path delay in the asynchronous and

    synchronous 8051 versions with a delay of 20.6ns. Applying a 50% safety margin to this

    number, one obtains a critical path delay of approximately 30.9ns. This translates to a

    clock period of 30.9ns and an operating frequency of 32.3 MHz for the synchronous and

    asynchronous chip during normal operation when the clock is not stopped.
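
    The clock period and operating frequency follow directly from the RAM critical path and the safety margin. A minimal sketch of that arithmetic, with illustrative function names:

```python
RAM_CRITICAL_PATH_NS = 20.6  # longest critical path (RAM module), from Primetime
SAFETY_MARGIN = 0.50         # 50% safety margin


def padded_period_ns(critical_path_ns: float, margin: float = SAFETY_MARGIN) -> float:
    """Return the clock period after applying the safety margin."""
    return critical_path_ns * (1.0 + margin)


def frequency_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


period_ns = padded_period_ns(RAM_CRITICAL_PATH_NS)
freq_mhz = frequency_mhz(period_ns)

if __name__ == "__main__":
    print(f"period: {period_ns:.1f} ns, frequency: {freq_mhz:.2f} MHz")
```

    The result is a period of about 30.9ns and a frequency between 32.3 and 32.4 MHz, consistent with the figures quoted above.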

    Even though the university lacks a post-synthesis timing simulator, one can gain a

    rough estimate of performance by implementing the delay elements in behavioral VHDL

    code in Modelsim. While the timing characteristics of each gate are not reflected in

    Modelsim, it can model the behavior of the delay elements and asynchronous

    handshaking logic through wait for and wait until statements. During simulation in

    Modelsim, the divmul program took 221,390ns to execute on the synchronous 8051 and

    172,030ns to execute on the asynchronous 8051. The asynchronous 8051 is 28.7% faster

    than the synchronous 8051 in executing the divmul program.
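
    The reported speed and area figures can be reproduced from the raw numbers. The sketch below assumes that "faster" refers to the ratio of synchronous to asynchronous execution time, which is the interpretation that yields 28.7%; the function names are illustrative.

```python
SYNC_TIME_NS = 221_390   # divmul execution time, synchronous 8051
ASYNC_TIME_NS = 172_030  # divmul execution time, asynchronous 8051
SYNC_AREA = 65_662       # cell area, synchronous chip
ASYNC_AREA = 72_400      # approximate cell area, asynchronous chip


def percent_faster(t_sync: float, t_async: float) -> float:
    """Speedup of the asynchronous design, as a percentage."""
    return (t_sync / t_async - 1.0) * 100.0


def percent_more_area(a_sync: float, a_async: float) -> float:
    """Area overhead of the asynchronous design, as a percentage."""
    return (a_async / a_sync - 1.0) * 100.0


speedup_pct = percent_faster(SYNC_TIME_NS, ASYNC_TIME_NS)
area_pct = percent_more_area(SYNC_AREA, ASYNC_AREA)

if __name__ == "__main__":
    print(f"speedup: {speedup_pct:.1f}%, extra area: {area_pct:.1f}%")
```

    Under this interpretation the speedup is about 28.7% and the area overhead is slightly above 10%, in line with the abstract.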

    Chapter 5: Challenges and Conclusion

    Challenges

    There were several difficulties encountered while designing the

    asynchronous 8051 microcontroller. The most significant difficulty came in learning the

    different tools that were used in the asynchronous design flow. For tools like

    Cadence Encounter and Ambit Buildgates, technical assistance from people familiar with

    the tools is readily available; for tools like Synopsys Primetime, however, the

    sole resources available are the Internet and user manuals. By establishing what tools are

    necessary for each stage in the asynchronous design flow, a great amount of time could

    be saved. During the timing analysis stage of the design flow, Synopsys Primetime,

    Design Analyzer, and Timemill are examined, and a large amount of time is spent

    learning each tool. Ultimately, only Primetime proves useful in the search for accurate

    delay numbers. The time spent learning Design Analyzer and Timemill would have been better utilized

    in the later stages of the asynchronous design flow.

    Conclusion

    Even though the idea behind handshaking is very simple, it takes a lot of work to

    change an existing synchronous design to an asynchronous design. The use of common

    synchronous design tools in the asynchronous design flow makes the design of the

    asynchronous 8051 microcontroller much easier.

    Although the asynchronous version is implemented in schematic form, its

    correctness could not be verified, because no suitable post-synthesis timing simulators are

    installed on university servers.

    Works Cited

    1. Dalton Project. http://www.cs.ucr.edu/~dalton/8051/. University of California, Department of Computer Science, Riverside, CA 92521. 7 April 2005.

    2. Scott Hauck, "Asynchronous Design Methodologies: An Overview," in Proceedings of the IEEE, Vol. 83, No. 1, pp. 69-93, January 1995.

    Appendix A Executed divmul Instructions

    ADD A, direct
    ADDC A, Rn
    CJNE Rn, #data, rel
    CLR A
    DIV AB
    DJNZ Rn, rel
    INC Rn
    JNZ rel
    LCALL addr16
    LJMP addr16
    MOV A, Rn
    MOV A, direct
    MOV Rn, A
    MOV Rn, direct
    MOV Rn, #data
    MOV direct, Rn
    MOV direct, direct
    MOV direct, #data
    MOV @Ri, A
    MUL AB
    ORL A, Rn
    RET
    SJMP rel
    XCH A, Rn
    XCH A, direct
    XRL A, #data

    Appendix B Controller, RAM, and Trace.out Contents

    Characters after -- are just comments

    Controller register contents at end of simulation:

    00000 -- reg_pc_15_11
    000 -- reg_pc_10_8
    01110011 -- reg_pc_7_0
    00101110 -- reg_op1
    11111110 -- reg_op2
    11111110 -- reg_op3
    00000000 -- reg_acc
    0 -- reg_cy
    0 -- reg_ac
    0 -- reg_f0
    0 -- reg_rs1
    0 -- reg_rs0
    0 -- reg_ov
    0 -- reg_nu
    0 -- reg_p

    RAM contents at end of simulation:

    00000000 -- sfr_b
    00000000 -- sfr_acc
    00000000 -- sfr_psw
    00000000 -- sfr_ie
    00000000 -- sfr_ip
    00001111 -- sfr_sp
    00000000 -- sfr_dpl
    00000000 -- sfr_dph
    00000000 -- sfr_scon
    00000000 -- sfr_sbuf
    00000000 -- sfr_tcon
    00000000 -- sfr_tmod
    00000000 -- sfr_tl0
    00000000 -- sfr_th0
    00000000 -- sfr_tl1
    00000000 -- sfr_th1
    00001010 -- RAM byte 0
    00000000 -- RAM byte 1
    00000000 -- RAM byte 2
    00001101 -- RAM byte 3
    00000000 -- RAM byte 4
    00001101 -- RAM byte 5
    00000000 -- RAM byte 6
    10000010 -- RAM byte 7
    00000000 -- RAM byte 8
    10000110 -- RAM byte 9
    00000000 -- RAM byte 10
    00001010 -- RAM byte 11
    00000000 -- RAM byte 12
    00000100 -- RAM byte 13
    01001001 -- RAM byte 14
    00000000 -- RAM byte 15
    00000000 -- RAM byte 16
    00000000 -- RAM byte 17
    00000000 -- RAM byte 18
    00000000 -- RAM byte 19
    00000000 -- RAM byte 20
    00000000 -- RAM byte 21
    00000000 -- RAM byte 22
    00000000 -- RAM byte 23
    00000000 -- RAM byte 24
    00000000 -- RAM byte 25
    00000000 -- RAM byte 26

    00000000 -- RAM byte 27
    00000000 -- RAM byte 28
    00000000 -- RAM byte 29
    00000000 -- RAM byte 30
    00000000 -- RAM byte 31
    00000000 -- RAM byte 32
    00000000 -- RAM byte 33
    00000000 -- RAM byte 34
    00000000 -- RAM byte 35
    00000000 -- RAM byte 36
    00000000 -- RAM byte 37
    00000000 -- RAM byte 38
    00000000 -- RAM byte 39
    00000000 -- RAM byte 40
    00000000 -- RAM byte 41
    00000000 -- RAM byte 42
    00000000 -- RAM byte 43
    00000000 -- RAM byte 44
    00000000 -- RAM byte 45
    00000000 -- RAM byte 46
    00000000 -- RAM byte 47
    00000000 -- RAM byte 48
    00000000 -- RAM byte 49
    00000000 -- RAM byte 50
    00000000 -- RAM byte 51
    00000000 -- RAM byte 52
    00000000 -- RAM byte 53
    00000000 -- RAM byte 54
    00000000 -- RAM byte 55
    00000000 -- RAM byte 56
    00000000 -- RAM byte 57
    00000000 -- RAM byte 58
    00000000 -- RAM byte 59
    00000000 -- RAM byte 60
    00000000 -- RAM byte 61
    00000000 -- RAM byte 62
    00000000 -- RAM byte 63
    00000000 -- RAM byte 64
    00000000 -- RAM byte 65
    00000000 -- RAM byte 66
    00000000 -- RAM byte 67
    00000000 -- RAM byte 68
    00000000 -- RAM byte 69
    00000000 -- RAM byte 70
    00000000 -- RAM byte 71
    00000000 -- RAM byte 72

    00000000 -- RAM byte 73
    00000000 -- RAM byte 74
    00000000 -- RAM byte 75
    00000000 -- RAM byte 76
    00000000 -- RAM byte 77
    00000000 -- RAM byte 78
    00000000 -- RAM byte 79
    00000000 -- RAM byte 80
    00000000 -- RAM byte 81
    00000000 -- RAM byte 82
    00000000 -- RAM byte 83
    00000000 -- RAM byte 84
    00000000 -- RAM byte 85
    00000000 -- RAM byte 86
    00000000 -- RAM byte 87
    00000000 -- RAM byte 88
    00000000 -- RAM byte 89
    00000000 -- RAM byte 90
    00000000 -- RAM byte 91
    00000000 -- RAM byte 92
    00000000 -- RAM byte 93
    00000000 -- RAM byte 94
    00000000 -- RAM byte 95
    00000000 -- RAM byte 96
    00000000 -- RAM byte 97
    00000000 -- RAM byte 98
    00000000 -- RAM byte 99
    00000000 -- RAM byte 100
    00000000 -- RAM byte 101
    00000000 -- RAM byte 102
    00000000 -- RAM byte 103
    00000000 -- RAM byte 104
    00000000 -- RAM byte 105
    00000000 -- RAM byte 106
    00000000 -- RAM byte 107
    00000000 -- RAM byte 108
    00000000 -- RAM byte 109
    00000000 -- RAM byte 110
    00000000 -- RAM byte 111
    00000000 -- RAM byte 112
    00000000 -- RAM byte 113
    00000000 -- RAM byte 114
    00000000 -- RAM byte 115
    00000000 -- RAM byte 116
    00000000 -- RAM byte 117
    00000000 -- RAM byte 118

    00000000 -- RAM byte 119
    00000000 -- RAM byte 120
    00000000 -- RAM byte 121
    00000000 -- RAM byte 122
    00000000 -- RAM byte 123
    00000000 -- RAM byte 124
    00000000 -- RAM byte 125
    00000000 -- RAM byte 126
    00000000 -- RAM byte 127

    Trace.out contents at end of simulation:

    NOP
    LJMP
    MOV 7
    CLR 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13

    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13

    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13

    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13

    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13

    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 13
    DJNZ 1
    MOV 12
    LJMP
    MOV 12
    MOV 7
    CLR 1
    MOV 5
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    INC 2

    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1

    JNZ
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    INC 2
    CJNE 3
    INC 2
    CJNE 3
    MOV 1
    XRL 4
    ORL 1
    JNZ
    MOV 6
    LCALL
    CJNE 3
    MOV 1
    MOV 9

    DIV
    MOV 5
    MOV 6
    RET
    MOV 9
    MOV 6
    LCALL
    CJNE 3
    MOV 1
    MOV 9
    DIV
    MOV 5
    MOV 6
    RET
    MOV 9
    MOV 6
    LCALL
    MOV 1
    MOV 5
    MOV 9
    MUL
    MOV 5
    MOV 1
    XCH 2
    XCH 1
    MUL
    ADD 1
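    A trace of this length is easier to check as a tally than by eye. The Python sketch below is an illustration only and was not part of the thesis tool flow; the helper name tally_trace is hypothetical, and it assumes that trace.out holds one entry per line. The sample text is a short excerpt in that form, not the full trace.

```python
from collections import Counter


def tally_trace(text):
    """Count how often each trace entry (e.g. 'MOV 13') occurs."""
    counts = Counter()
    for line in text.splitlines():
        entry = line.strip()
        if entry:  # ignore blank lines
            counts[entry] += 1
    return counts


# Short excerpt in the assumed one-entry-per-line layout.
sample = """
MOV 7
CLR 1
MOV 13
DJNZ 1
MOV 13
DJNZ 1
"""

counts = tally_trace(sample)
for entry, n in counts.most_common():
    print(entry, n)
```

    Run against the complete file, the same tally shows how many times each loop body executed during the simulation.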

    ALU Division Buildgates tcl script file:

    read_tlf vtvtlib25.tlf
    read_lef vtvtlib25.lef
    read_vhdl i8051_lib.vhd i8051_alu.vhd
    do_build_generic
    set_current_module I8051_ALU
    do_optimize
    write_verilog -hier i8051_ALU_division.v
    exit

    ALU Division Primetime tcl script file:

    read_verilog i8051_ALU_division.v
    read_db vtvtlib25.db
    set link_path "* vtvtlib25.db"
    link_design
    create_clock -name vclk -period 2 -waveform {0 1}
    set_input_delay 1 -clock vclk [all_inputs]
    set_output_delay 1 -clock vclk [all_outputs]
    set_driving_cell -lib_cell inv_1 [all_inputs]
    set_load -pin_load 1 [all_outputs]
    read_parasitics -format SPEF \
        I8051_ALU.spef
    report_timing > timing.txt

    Appendix D ALU and Controller VHDL Code

    ALU wrapper select logic VHDL code:

    library IEEE;
    use IEEE.STD_LOGIC_1164.all;
    use IEEE.STD_LOGIC_ARITH.all;
    use WORK.I8051_LIB.all;

    -- Select logic: decodes the ALU op code into three select outputs.
    entity ALU_wrapper is
        port(
            alu_op_code : in UNSIGNED(3 downto 0);
            select0 : out std_logic;
            select1 : out std_logic;
            select2 : out std_logic
        );
    end ALU_wrapper;

    architecture BEHAVIORAL of ALU_wrapper is
    begin
        process(alu_op_code)
        begin
            CASE alu_op_code IS
                WHEN ALU_OPC_NONE =>

    select0

    WHEN ALU_OPC_XOR =>

    select0

    Controller Wrapper VHDL Code:

    library IEEE;
    use IEEE.STD_LOGIC_1164.all;
    use IEEE.STD_LOGIC_ARITH.all;
    use WORK.I8051_LIB.all;

    entity CTR_wrapper is
        port(
            alu_op_code : in UNSIGNED(3 downto 0);
            ack : in std_logic;
            req : out std_logic
        );
    end CTR_wrapper;

    architecture BEHAVIORAL of CTR_wrapper is
    begin
        process(alu_op_code, ack)
        begin
            if ack = '1' then

    req

    req

    req

    req

    req

    req

    req

    req

    req

    req

    req

    req

    req req

    req

    req

    req

    req

    END CASE;
    END IF;
    END PROCESS;
    END BEHAVIORAL;

