NEEDLEMAN-WUNSCH AND SMITH-WATERMAN
IMPLEMENTATION FOR SPAM/UCE INLINE FILTER
CHIEW MING THONG
FACULTY OF COMPUTER SCIENCE AND INFORMATION
TECHNOLOGY
UNIVERSITY OF MALAYA
KUALA LUMPUR
2011
SUBMISSION OF DISSERTATION FOR THE FULFILLMENT OF
THE DEGREE OF MASTER OF COMPUTER SCIENCE
Abstract
Spam has been a significant problem: it consumes internet bandwidth, wastes users' time,
wastes the computational resources of internet service providers and reduces the efficiency of
email as a means of communication. Despite the various anti-spam solutions introduced, spam
mails are often able to avoid detection by slightly modifying their spam signature. This prevents
anti-spam solutions from detecting the keywords in emails that are closely associated with spam.
Two algorithms, Needleman-Wunsch and Smith-Waterman, will be implemented on an FPGA as
spam detection engines. Both algorithms originate from the theory of dynamic programming and
are normally used in bioinformatics for sequence alignment. As both are well known for their
ability to detect sequences with slight changes caused by mutation, these two algorithms will be
used to detect spam messages whose spam keywords have been slightly changed. An FPGA has
been selected as the device for implementation. As hardware is faster than software, using an
FPGA helps to reduce the scanning time and the CPU load of the computer. Advances in FPGA
technology make it capable of serving as a standalone scanning unit. The effectiveness of both
algorithms in spam scanning will be examined. The corpus from the Text Retrieval Conference
(TREC 2007) will be used to test the effectiveness of the anti-spam engines.
Acknowledgement
I would like to give my special thanks to Mr. Emran bin Mohd Tamil for his supervision and
support during this project. My thanks also go to Mr. Mohd Yamani Idna Idris, Mr. Zaidi Razak
and Mr. Noorzaily Mohamed Noor for their continuous guidance and valuable advice. Thanks
to Mr. Farid from Symmid for sharing his Xilinx knowledge and teaching, and to everybody from
the System On Chip development group. Last but not least, I thank my beloved family for their
patience and support during this research.
Table of Contents
Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Motivation and Purposes
1.3 Research Objective
1.4 Thesis outline
Chapter 2 Literature Review and Technology Background
2.1 What is spam?
2.2 The history of spam
2.3 Privacy
2.4 Spam Prevention
2.5 Rules and Regulations
2.6 CAN-SPAM Act
2.7 Ways to prevent spam
2.8 Combined Solutions
2.9 The Challenge of Botnets
2.10 The Effectiveness of Spam Filtering
2.11 Bayesian Poisoning
2.12 Needleman-Wunsch Algorithm
2.13 Smith-Waterman Algorithm
2.14 Global Alignment and Local Alignment
2.15 Processing Time Issue
2.16 Heuristics Based Algorithms
2.17 The Selection of Algorithm
2.18 Calculations of Needleman-Wunsch Algorithm
2.19 Calculations of Smith-Waterman Algorithm
2.20 FPGA
2.21 FPGA Platforms
2.22 Future Demand
2.23 Parallelism
Chapter 3 Design Methodology
3.1 Flow Diagram
3.2 Overall Architecture
3.2.1 Crossover Ethernet Cable
3.2.2 Null Modem cable
3.2.3 JTAG Cable
3.3 Microblaze
3.4 Microblaze Hardware Design
3.5 Microblaze Software Design
3.6 Software and Hardware Development Applications
3.6.1 Xilinx ISE Design Suite 10.1
3.6.2 Xilinx ISE 10.1 (Xilinx Integrated System Environment)
3.6.3 Xilinx XPS 10.1 (Xilinx Platform Studio)
3.6.4 Xilinx XPS SDK 10.1 (Xilinx Platform Studio Software Development Kit)
3.6.5 Xilinx ISE Simulator
3.6.6 Xilinx Chipscope Pro
3.7 Other Applications Used
3.7.1 Hyper Terminal
3.7.2 Bray Terminal v1.9b
3.7.3 Simple TCP Client
3.8 Programming Languages Used
3.8.1 VHDL
3.8.2 C Programming
3.9 Development Board
Chapter 4 Design and Implementation
4.1 Fast Simplex Link (FSL)
4.2 IP Design
4.2.1 FSL side_relay
4.2.2 Needleman-Wunsch and Smith-Waterman Hardware
4.2.2.1 Hardware test_nw and test_sw
4.2.2.2 Hardware array_proc
4.2.2.3 Hardware processing_element
4.3 Parallelism
Chapter 5 Results, Simulations, Analysis, and Testing
5.1 Development of side_relay Unit
5.2 Developing the Needleman-Wunsch Algorithm IP Unit
5.3 Developing the Smith-Waterman Algorithm IP Unit
5.4 Overall Microblaze with Needleman-Wunsch IP and Microblaze with Smith-Waterman IP
5.5 Post-Route Simulation
5.6 Hardware Timing Diagram
5.7 Processing Element of Needleman-Wunsch Post-Route Simulation
5.8 Processing Element of Smith-Waterman Post-Route Simulation
5.9 Microblaze FSL side_relay unit Post-Route Simulation
5.10 Hardware Data Sampling
5.11 Xilinx Chipscope Pro Sampling of FSL Interface
5.12 Spam Mail Testing
5.12.1 Testing Criteria
5.12.2 Results
Chapter 6 Conclusion
6.1 Contributions
6.2 Further Improvements and Future Works
Reference
Appendix A
Appendix B
Appendix C
List of Figures
Figure 2.1 The category of anti-spam solutions (Hunt & Carpinter 2006)
Figure 2.2 Model Of Email Delivery. Source from (Hoanca 2006)
Figure 2.3 The computed score table of Needleman-Wunsch
Figure 2.4 The calculation of a cell in the matrix table of Needleman-Wunsch algorithm
Figure 2.5 The completed calculations of the matrix table of Needleman-Wunsch algorithm
Figure 2.6 The traceback being performed on the matrix table of Needleman-Wunsch algorithm
Figure 2.7 The score table of Smith-Waterman algorithm
Figure 2.8 The computed matrix table of Smith-Waterman algorithm
Figure 2.9 The traceback being performed on the computed matrix table of Smith-Waterman algorithm
Figure 3.1 The stages involved in designing the inline filter
Figure 3.2 Illustrates the operations of the FPGA systems
Figure 3.3 The overall design of the architecture
Figure 3.4 The design of the Microblaze and how it interconnects with the inline filter
Figure 3.5 The flow of the software in Microblaze once the design is started
Figure 3.6 The flow of software after a connection is accepted
Figure 3.7 Further elaboration of the scanning function is shown in this figure
Figure 3.8 The sample coding used to display output from the serial port
Figure 4.1 The block diagram of a FSL bus (Fast Simplex Link (FSL) Bus (v2.11a), 2008)
Figure 4.2 The block diagram of the side_relay hardware
Figure 4.3 The type of interface that could be added with Xilinx Platform Studio
Figure 4.4 The two design unit that are connected to the Microblaze core
Figure 4.5 The new connection after reconfiguration is being made
Figure 4.6 State diagram of the side_relay unit
Figure 4.7 The port interface of the algorithm IP
Figure 4.8 The flow chart of the Needleman-Wunsch test_nw and Smith-Waterman test_sw IP. On the right is the further breakdown of the Delay state
Figure 4.9 Further breakdown of the compute table in the flow chart
Figure 4.10 The interconnection of side_relay, algorithm IP and the Microblaze core
Figure 4.11 The port interface of the array_proc hardware
Figure 4.12 The port interface of the processing_element hardware
Figure 4.13 The mapping of the VHDL block of the algorithm IP
Figure 4.14 The flow of execution for a single processing element system
Figure 4.15 The flow of execution for multiple processing element system in the middle of the matrix table computation
Figure 5.1 The diagrams of the connections between the Microblaze core, the side_relay unit and the main engine or algorithm IP
Figure 5.2 Resource utilization and the timing summary of the side_relay unit
Figure 5.3(a) Resource utilization of Needleman-Wunsch IP
Figure 5.3(b) Resource utilization of Needleman-Wunsch IP
Figure 5.4 Timing summary for Needleman-Wunsch IP
Figure 5.5(a) Resource utilization of Smith-Waterman IP
Figure 5.5(b) Resource utilization of Smith-Waterman IP
Figure 5.6 Timing summary for Smith-Waterman IP
Figure 5.7 Overall resource utilization of the Microblaze system with the Needleman-Wunsch IP
Figure 5.8 Overall resource utilization of the Microblaze system with the Smith-Waterman IP
Figure 5.9 Some of the simulation results of test_nw from Needleman-Wunsch algorithm
Figure 5.10 The test_nw hardware output the result
Figure 5.11 Post-route simulation of Needleman-Wunsch algorithm processing element
Figure 5.12 Post-route simulation of Smith-Waterman algorithm processing element
Figure 5.13 Post-Route Simulation of Microblaze FSL side_relay unit
Figure 5.14 Label 1, 2, 3 and 4 that shows the interface covered by Chipscope Pro Analyzer in Figure 5.15
Figure 5.15 The Chipscope Pro Analyzer result collected from interface labeled 1, 2, 3 and 4 in Figure 5.14
Figure 5.16 Label 5 and 6 that shows the interface covered by Chipscope Pro Analyzer in Figure 5.17
Figure 5.17 The Chipscope Pro Analyzer result collected from interface labeled 5 and 6 in Figure 5.16
Figure 5.18 The Chipscope Pro Analyzer result collected from interface labeled 7 and 8 in Figure 5.16
Figure 5.19 Procedure used by Criteria 1 to calculate the marks in Microblaze software
Figure 5.20 Procedure used by Criteria 2 to calculate the marks in Microblaze software
Figure B.1 The RTL Schematic of the VHDL block of the algorithm IP
Figure B.2 RTL Schematics of array_proc unit for Needleman-Wunsch
Figure B.3 RTL Schematics of array_proc unit for Smith-Waterman
Figure B.4 The RTL Schematic of the processing element for Needleman-Wunsch
Figure B.5 The RTL Schematic of the processing element for Smith-Waterman
Figure B.6 RTL Schematics of side_relay
Figure C.1 The first half post-route simulation for Needleman-Wunsch Algorithm
Figure C.2 The second half of post-route simulation for Needleman-Wunsch Algorithm
Figure C.3 The first half of post-route simulation for Smith-Waterman Algorithm
Figure C.4 The second half of post-route simulation for Smith-Waterman Algorithm
List of Tables
Table 2.1 Various types of anti-spam solution and its description
Table 4.1 The description of ports that are in the master side of the FSL bus
Table 4.2 The description of ports that are in the slave side of the FSL bus
Table 4.3 Port description for the array_proc hardware
Table 4.4 Port description for the processing_element hardware
Table 4.5 The coordinates of values received by diagonal, upper and left registers
Table 5.1(a)(b) The results for testing of spam email
Table A.1 Microblaze terms and definitions
Chapter 1
Introduction
1.1 Background
Spam has been a major problem on the internet. In 2007 alone, spam cost around
US$100 billion in lost productivity worldwide (Ferris Research 2007). The spam
problem is so overwhelming that it reduces the efficiency and dependability of the network
(Ming-Wei et al. 2005) and consumes bandwidth (Hoanca 2006). If the spam issue is not resolved
or reduced, it may stop email from being a means of internet communication altogether.
To date, various ways of blocking spam have been researched, proposed and
implemented in the real world. Though none of these approaches totally eliminates the spam
problem, they do help to reduce it and increase the efficiency of email
usage. One of the most popular and effective techniques for filtering spam is Bayesian
detection. Bayesian detection, however, is not a perfect anti-spam solution, as
spammers can bypass it using Bayesian poisoning techniques (Cumming 2006). In Bayesian
poisoning, spam keywords are slightly modified to evade detection by the filter. Detecting
spam keywords that have been slightly modified calls for suitable approximate matching
algorithms.
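The evasion described above can be illustrated with a minimal sketch. The keyword list, the two messages and the function name `exact_match_hits` are hypothetical and chosen only for illustration; they are not taken from this dissertation. The sketch assumes a filter that looks for spam keywords as exact tokens, and shows how a single changed character is enough to hide a keyword from it.

```python
# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = {"viagra", "lottery", "mortgage"}


def exact_match_hits(message):
    """Return the spam keywords that appear verbatim as tokens in the message."""
    tokens = {token.strip(".,!?").lower() for token in message.split()}
    return tokens & SPAM_KEYWORDS


original = "Cheap viagra, win the lottery now!"
poisoned = "Cheap v1agra, win the lotttery now!"  # keywords slightly modified

print(sorted(exact_match_hits(original)))  # both keywords are found
print(sorted(exact_match_hits(poisoned)))  # nothing is found
```

An exact comparison treats "v1agra" as an unrelated word, which is why a similarity measure that tolerates small edits is needed.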
After careful consideration, two algorithms were selected for implementation:
Needleman-Wunsch and Smith-Waterman. Both algorithms are well known
in the field of bioinformatics for gene sequence detection. In genes, sequences
are compared to find similarities between two sequences that differ slightly as a result of
mutations. String sequence detection in bioinformatics has some similarity with spam
keyword detection. The difference is that the two sequences compared in spam keyword
detection are mostly less than fifteen characters long, while in gene comparison the two
sequences could be from several thousand to billions of characters long. Based on the reviews
done, the two algorithms are not yet widely used because current technology is limited in
supporting the comparison of very long sequences. For this dissertation, however, the two
algorithms were designed to compare sequences of normal text words that are mostly less than
fifteen characters long. The FPGA used was able to accommodate the smaller algorithm
designs.
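As a software illustration of how the two algorithms score short keywords, the following is a minimal sketch of the standard dynamic programming recurrences. The scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions made for this example; they are not necessarily the values or the structure used in the hardware design of this dissertation. Only the final score is computed, without traceback.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b (assumed scoring values)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # The first row and column hold the accumulated gap penalties.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            upper = table[i - 1][j] + gap
            left = table[i][j - 1] + gap
            table[i][j] = max(diagonal, upper, left)
    return table[-1][-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (assumed scoring values)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            upper = table[i - 1][j] + gap
            left = table[i][j - 1] + gap
            # Local alignment never lets a cell drop below zero.
            table[i][j] = max(0, diagonal, upper, left)
            best = max(best, table[i][j])
    return best


print(needleman_wunsch_score("viagra", "viagra"))      # 6
print(needleman_wunsch_score("viagra", "v1agra"))      # 4
print(smith_waterman_score("viagra", "xxviagraxx"))    # 6
print(needleman_wunsch_score("viagra", "xxviagraxx"))  # 2
```

With these assumed values, a keyword with one substituted character still scores close to the exact match, and the local score is unaffected by unrelated characters padded around the keyword, whereas the global score is penalized for them.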
Even though smaller designs of the two algorithms are implemented, they would still
require a lot of processing power and time if implemented in software. Instead,
the systems were implemented in FPGA hardware for faster speed.
Hardware has the advantages of computing at wire speed, the ability to utilize parallelism and
low power consumption.
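One common way in which hardware exploits parallelism for these algorithms is to compute all cells on the same anti-diagonal of the matrix table at once, because each cell depends only on its upper, left and diagonal neighbours. The sketch below is an assumption-labeled illustration of that scheduling idea in software; the function name `wavefront_schedule` is hypothetical and the schedule is not claimed to match the processing element arrangement of this design.

```python
def wavefront_schedule(rows, cols):
    """Group matrix cells into steps; cells within one step are independent."""
    steps = [[] for _ in range(rows + cols - 1)]
    for i in range(rows):
        for j in range(cols):
            # Cells with the same i + j lie on the same anti-diagonal.
            steps[i + j].append((i, j))
    return steps


schedule = wavefront_schedule(15, 15)
print(len(schedule))                        # 29 parallel steps
print(sum(len(step) for step in schedule))  # 225 cells in total
print(max(len(step) for step in schedule))  # at most 15 cells per step
```

For two fifteen-character words, a sequential loop visits 225 cells one at a time, while a design with fifteen parallel units could in principle finish in 29 steps.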
This dissertation provides an insight into some of the anti-spam approaches being
researched and applied in the world. It highlights some of the problems faced by current
anti-spam solutions and proposes to solve some of them using the two algorithms. The
process of implementing the two algorithms in an FPGA is reviewed and discussed. The
dissertation concludes with simulation, testing and analysis, together with a proposal for
future work.
1.2 Motivation and Purposes
(i) The problem of spam has been increasing over time, calling for a faster anti-spam scanning
method to address it.
(ii) A way to address the Bayesian poisoning problem is needed.
(iii) Needleman-Wunsch and Smith-Waterman are computationally intensive algorithms. In
hardware, both algorithms can perform computations faster than in software. FPGAs offer the
advantages of computing at wire speed, the ability to utilize parallelism and low power
consumption.
1.3 Research Objective
(i) To study existing approaches and challenges in filtering spam and current ways of hardware
implementation, and to find two suitable algorithms.
(ii) To design two hardware systems in FPGAs. One of the systems incorporates the first
algorithm and the other incorporates the second algorithm.
(iii) To implement both systems and test their effectiveness in detecting poisoned signatures.
1.4 Thesis outline
Chapter 2 Literature Review and Technology Background – This chapter reviews the literature
on anti-spam solutions and the background of the algorithms. It also covers the technology
background of the design used.
Chapter 3 Design Methodology – The method of developing the environment is explained in this
chapter. It covers the software and hardware used in the design as well as a brief
description of the programming languages used.
Chapter 4 Design and Implementation – This chapter describes how the hardware is
designed and explains the functions of the various hardware blocks.
Chapter 5 Results, Simulations, Analysis, and Testing – The results of the implementation are
shown in this chapter. It also explains the simulations performed on the design, followed
by the testing results at the end. Various analyses are provided throughout the chapter.
Chapter 6 Conclusion – This chapter concludes with the contributions of this research, further
improvements and future work.
Chapter 2
Literature Review and Technology Background
2.1 What is spam?
Spam is also known as Unsolicited Commercial Email (UCE) (Hoanca 2006) (Haupt 2004) and
Unsolicited Bulk Email (UBE) (Hoanca 2006) (The Definition of Spam 2007). Spam messages
are sent to groups of recipients without their consent (Gunnarsson & Ekberg 2003) (The
Definition of Spam 2007). UCE or UBE is commonly used by companies and individuals to
send emails to a large number of people in a short time. Examples of spam are advertisements
for medications, websites, illegal items, rewards and prizes (Oda 2005).
2.2 The history of spam
The first incident of spam can be traced back to Digital Equipment Corporation sending a large
amount of email to publicize its new machine to all ARPANET addresses on the
United States west coast (Gunnarsson & Ekberg 2003) (Haupt 2004). Back then, the term spam
was not yet in use in the computer community (Gunnarsson & Ekberg 2003).
According to Haupt (2004), an email is considered spam if:
i: The identity of the recipient is not relevant.
ii: The recipient never granted consent or permission for the email to be sent.
iii: The sender receives a disproportionate benefit from sending spam mails to recipients.
During the late 1990s, spam became more and more of an issue in the technology world.
Opinions in the media and academia differ on this problem. At one extreme, people regard
the spam problem as a mild annoyance (Crews 2001). At the other, people regard it as
a sign of doom. There are fears that spam will overwhelm users and stop them from
using email altogether. Based on the issue ten report of the Messaging Anti-Abuse Working
Group, the proportion of abusive emails is reported to be steadily between 89% and 92%
(MAAWG 2009). By 2015, the volume of spam is predicted to exceed 95% of all email traffic.
According to an AOL report in 2004, spam volume increased by almost
100,000% from 1997 to 2004. Spam email is also used as a vehicle for delivering viruses,
worms and phishing attacks that can lead to financial losses, data loss and identity theft
(Hoanca 2006) (Ming-Wei et al. 2005) (Catalin & Maria 2009). Even though large companies,
organizations and governments have made many efforts over recent years to stop spam,
spam traffic continues to rise (Oda 2005). The increase in spam traffic results in what is
equivalent to a distributed denial of service (DDoS) attack, as the resources of mail transfer
agents (MTAs) are used to transfer spam traffic alongside real email messages (Ming-Wei et al. 2005).
2.3 Privacy
Generally, internet users prefer to have the best personalized internet services available while
at the same time being able to control their own privacy (Jacobsson & Carlsson 2007). They want
the right to determine how their information is used on the internet and by whom
(Gunnarsson & Ekberg 2003).
Spam is considered one result of poor privacy protection. Companies that collect
information from individuals could sell it to any third party without
the owner's permission. The information obtained by the third party could be used to send spam
to individuals (Gunnarsson & Ekberg 2003).
2.4 Spam Prevention
At present, spam prevention is divided into three broad categories: legislation, protocol
change and filtering (Hunt & Carpinter 2006). In legislation, rules and regulations are made by
a country or a group of countries to keep the spam problem in check. For protocol change, new
ways of email communication are being studied to find better ways to reduce the spam problem.
This includes email taxing and other approaches and techniques to eliminate spam. Spam
filtering is divided into various categories, as shown in Figure 2.1, adapted from (Hunt &
Carpinter 2006).
Figure 2.1 The category of anti-spam solutions (Hunt & Carpinter 2006).
2.5 Rules and Regulations
Punishing spammers tends to be difficult, as spam laws are limited to individual
countries and states. Besides, there is no universally agreed definition of spam
(Oda 2005). A spammer who violates the spam law of one country may not violate the spam laws
of another (Oda 2005). Because of this, all spammers have to do is move to places
where they do not violate the law (Hoanca 2006). Fundamentally, privacy protection in the
European Union is better than in the United States (Gunnarsson & Ekberg 2003). One of the key
differences is that individuals in the United States do not own the data collected from them,
while individuals living in European countries do. Citizens of the United States have different
levels of privacy protection depending on the state they live in. Based on the lessons learnt from
the Second World War, post-war Europe realized the threat posed by the gathering of private
information. Private information in the wrong hands might be devastating. European countries
adopted the United Nations guidelines and the Council of Europe Convention for the Protection
of Human Rights in 1950 (Gunnarsson & Ekberg 2003). The current laws and legislation in the
USA and EU tend to have little effect in controlling spam volume. There are arguments that the
current laws contribute to the increase of spam volume, as spammers are allowed to send spam
legally by following certain rules (Carpinter & Hunt 2006).
2.6 CAN-SPAM Act
Under the CAN-SPAM Act, UBE is required to carry labeling, opt-out instructions and the
sender's physical address. Under this law, messages are prohibited from having deceptive
subject lines and false headers (Gunnarsson & Ekberg 2003). The first case charged under the
CAN-SPAM Act was that of Anthony Greco, 18, from Cheektowaga, New York. Greco was alleged to
have sent more than 1.5 million spam messages over Internet Messaging (SPIM) on MySpace.com.
He threatened to tell other spammers how to send spim unless given the exclusive right to keep
sending spim (Hoanca 2006).
Despite the existence of the CAN-SPAM Act, there are studies that show a
very low rate of compliance by advertisers. The studies were conducted based on 1,133
email messages from 4,800 email messages in 5 email accounts (Galen 2007). These
studies show that more needs to be done to reduce the spam problem besides legislation alone.
The bill of rights in the United States is based on five principles (Gunnarsson & Ekberg 2003):
i. There should be no personal data record-keeping systems whose very existence is secret.
ii. A person should be able to find out what personal information is stored and used.
iii. A person must be able to prevent his or her personal information from being used or made
available for purposes other than the intended purpose.
iv. A person must be able to correct identifiable personal information.
v. Organisations creating, maintaining, using, or selling records of personal data must assure
the reliability of the data for their intended use and prevent misuse.
The European Parliament, in turn, adopted the Data Protection Directive (95/46/EC) in 1995.
The Directive states that (Gunnarsson & Ekberg 2003):
i. Member states shall protect the fundamental rights and freedoms of persons, and in
particular their right to privacy with respect to the processing of data.
ii. There are also principles relating to data quality, which for instance declare that
personal data must be collected for specified, explicit and legitimate purposes and
not further processed in a way incompatible with those purposes.
iii. Personal data must also be processed lawfully, and it must be adequate, relevant and
not excessive in relation to the purposes for which it is collected. The personal
data may only be processed if the data subject has given his or her consent.
iv. The controller must also inform the data subject of his or her right to access and
rectify the data concerning him or her. In cases where the data have not been
obtained from the data subject, the controller must inform the data subject of the
identity of the controller, the purposes of the processing and other information.
The terrorist attack of September 11, 2001 led the American government to
ignore the privacy of internet users (Gunnarsson & Ekberg 2003). In order to track the
terrorists responsible, the Federal Bureau of Investigation (FBI) has been installing the
controversial cyber snooping software DCS-1000, known as Carnivore, at Internet Service
Providers in the United States (Gunnarsson & Ekberg 2003). There are reports that the Central
Intelligence Agency (CIA) leaked sensitive commercial information gathered by the signals
intelligence collection and analysis network ECHELON. The leak of sensitive information
reportedly led Boeing to win an aircraft contract worth $6 billion over Airbus. The former
director of the CIA, James Woolsey, however, clarified that it was only for the purpose of
"leveling the playing field" (Gunnarsson & Ekberg 2003).
[Figure: Simple Model of Email Delivery. (1) Sender Client (Outlook Express, Eudora) → (2) Sender Server (Exchange, Sendmail) → (3) Receiving Server (Exchange, Sendmail) → (4) Receiving Client (Outlook Express, Eudora)]
Figure 2.2 Model Of Email Delivery. Source from (Hoanca 2006).
The sender client, sender server, receiving server and receiving client are software and hardware
subsystems. To send an email, a sender client composes a message while connected to the
sender server. The message is then sent to the sender server. The sender server then connects to
the receiving server and validates the existence of the recipient account before transmitting the
message to be stored on the server. The message is retrieved by the receiving client when
the receiving client is connected to the system.
If any spam blocking or reduction technique is applied at (4) Receiving Client, it
helps reduce the loss of productivity of the human recipient. However, the cost of delivering the
message from the sender server to the receiving server and on to the client is borne by the
server owner and later passed on to the end user. To stop spam effectively, spam control
techniques should be applied before the message even leaves the sending client.
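The cost argument above can be made concrete with a small sketch. The stage names follow the four subsystems of the delivery model; the function name `transfers_before_drop` and the idea of counting stage-to-stage transfers as a stand-in for delivery cost are assumptions made for illustration only.

```python
# The four subsystems of the delivery model, in delivery order.
STAGES = ["sender client", "sender server", "receiving server", "receiving client"]


def transfers_before_drop(filter_stage):
    """Count the stage-to-stage transfers a spam message makes before it is dropped."""
    return STAGES.index(filter_stage)


for stage in STAGES:
    print(stage, transfers_before_drop(stage))
```

A filter at the receiving client lets the message cross all three links before it is discarded, while a filter at the sender client stops it before any transfer takes place.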
2.7 Ways to prevent spam
Generally, there is no silver bullet in spam prevention. That is, no single approach fits all.
Prevention Methods How it works Advantages Disadvantages
Rules and Regulation Using laws and
legislations in
banning the activities
of spam in the
country.
The activities of
spamming in the
country are
prohibited.
Rules and regulations
have been the least
effective methods
used against spam.
Spam laws created in
a region tend to push
spammers offshore
outside the
jurisdiction of the law
instead of eliminating
it (Hoanca 2006).
Spam Filtering (Black
Listing) (Hoanca
2006)(Lorrie & Brian
1998)
The method of
blacklisting functions
by user listing the
email addresses to be
blocked.
Blacklisted email
addresses could not
send emails to users.
Spam filtering at the
receiving end is not
that effective as the
cost of sending the
spam has been borne
by the receiving
server. This method
could be easily
overcome by
spammers by using
botnets or zombie
Spam Filtering (White
Listing)(Hoanca 2006)
Whitelisting allows
only certain email
addresses to deliver
email to the address.
Emails from other
addresses are not
accepted and spam
problems is reduced.
This solution however
require users to add
new recipients
manually
14
Spam Filtering
(Bayesian Decision
Making)(Sahami et al.
1998)
A machine learning
spam filtering method
that are based on
probabilistic
calculations
The anti spam system
will continuously
calculated the
probability of each
spam words and
update the spam
filter.
Possibility to bypass
this filter using
Bayesian Poisoning.
Rate Throttling
Approach(Teergrubing)
(Hoanca 2006)
Teergrubing
functions by delaying
the receipt of email
messages.
The use of
Teergrubing has less
impact when single
messages are sent by
the server. However,
for spammers sending
a large number of
emails, it could slow
down the spammer‟s
server/s significantly.
Softwares that use
this concept are
TarProxy and Jackpot
(Hoanca 2006).
Besides consuming
the resources of the
spammers,
Teergrubing also
consume the
resources of the
server.(TWINING et
al. 2004)
Rate Throttling Approach (TCP damping) (Hoanca 2006)
With TCP damping, the server that receives email messages calculates spam scores for delivered
messages. The server then artificially delays the confirmation of email messages that have high
spam scores.
For a single sender, the delay in sending the email is not significant, but for spammers that
send a large number of messages, it greatly slows down the process. The use of TCP damping could
indirectly help authorities to detect spammers. Legitimate servers keep on delivering messages
even though the delay increases, but servers that are sending spam tend to give up as the delay
increases.
To use TCP damping, however, code on the receiving side must be rewritten to use the spam
scores. Users of servers that are being used for sending spam have to bear the delay imposed by
the receiving server. The spam scores depend on the filters that determine them. Spammers could
also modify their spam message format to evade detection.
Rate Throttling Approach (Grey Listing) (Hoanca 2006)
Grey listing works by first refusing the connection to servers that are not in the whitelist.
Normal servers will attempt to retransmit, but spam servers are unlikely to retry.
There are reports that the combination of grey listing, white listing and black listing helps to
reduce spam by 88%.
The drawback of grey listing is that it has a high false positive rate. False positives are
incidents in which normal mail is misinterpreted as spam. Besides, some poorly configured servers
will drop the connection when they are denied a connection the first time.
Alliance-Based Approach (Yu-Fen et al. 2007)
The alliance-based approach uses multiple servers at different locations that are connected to
each other. Through reliable and secure connections, they synchronize data consisting of spam
signatures with each other. Each server has its own group of users and learns from those users.
The system is able to block more spam and has good performance.
Spam detection still requires a long processing time.
Counterattack solutions (Lorrie & Brian 1998)
Counterattack solutions work by replying to spam with false applications. The false applications
burden the spammers, as they are unable to differentiate between real applications and fake
ones. The technique also works by sending mass complaints to the spammer's ISP.
These tactics sometimes cause inconvenience to the spam sender. The spam sender may also have
their accounts revoked by the ISPs.
Sometimes the true identity of the spam sender is hard to trace, as the sender could be using
other victims' email addresses. The counterattacks may also lead nowhere, leaving a large number
of bounce notice messages.
Opt-out list (Lorrie & Brian 1998)
Users click the link provided in the mail to stop receiving mail from the same source in the
future.
By selecting the opt-out link provided with the spam mail, the user is able to stop receiving
mail from the source.
However, selecting the opt-out link sometimes lets spammers know that the address exists, and
they then spam it even harder.
Channels (Lorrie & Brian 1998)
Channelized email works by using multiple email addresses for a single mail account. A user may
use a public email address alias for business cards, public postings on blogs or submitting
emails. For private purposes, the user may assign another email address alias to the same email
account. The email account stores the emails received in different channels according to the
email alias used.
Once the user starts to receive spam on the public address alias, the user can delete that
address alone and not the whole email account.
The drawback of using channels is that important or wanted mail is sometimes also received on
the same channel that receives spam.
Authentication-Based Spam Control (Hoanca 2006)
Users need to log in before using the system. Each account has a reputation score. Users that
seldom send spam have more control over their account, as they have a higher reputation score.
Users that send a lot of spam have a low reputation score and less control over their account.
As emails are blocked or delayed if the sender's IP does not belong to an authenticated user,
this technique is effective in reducing spam.
Spammers could hijack a trusted sender's computer, or infect it with a worm, and use it to send
spam.
Munging (Ming-Wei et al. 2005)
Munging works by changing the email address to a form that is not detectable by email harvesters
and spambots. For example, an address could be changed to "jack at yahoo dot com". Email
addresses that are munged can temporarily fool spammers.
Munging can fool spambots and other email harvesters temporarily.
Spammers could design their spambots to adapt to munging tricks and make the technique
ineffective.
Table 2.1 Various types of anti-spam solutions and their descriptions.
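As an illustration of the grey listing entry in Table 2.1, the following is a minimal Python
sketch of the idea: an unknown sender is refused on its first attempt and accepted once it
retries. The class name GreyList, the min_delay parameter and the host names are assumptions made
for this example and are not taken from the cited work.

```python
class GreyList:
    """Toy grey list: refuse unknown senders until they retry after a delay."""

    def __init__(self, whitelist=(), min_delay=300):
        self.whitelist = set(whitelist)   # senders that are always accepted
        self.first_seen = {}              # sender -> time of first refused attempt
        self.min_delay = min_delay        # assumed minimum retry delay in seconds

    def accept(self, sender, now):
        """Return True if a connection from `sender` at time `now` is accepted."""
        if sender in self.whitelist:
            return True
        # Record the first attempt; later attempts reuse the stored time.
        first = self.first_seen.setdefault(sender, now)
        if now - first >= self.min_delay:
            # The sender retried like a normal mail server, so trust it from now on.
            self.whitelist.add(sender)
            return True
        return False


greylist = GreyList(whitelist={"mail.example.org"}, min_delay=300)
print(greylist.accept("unknown.example.net", now=0))    # first attempt is refused
print(greylist.accept("unknown.example.net", now=400))  # retry after the delay is accepted
```

A spam server that never retries stays refused, which is the behaviour the table entry relies on.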
2.8 Combined Solutions
Generally, using combined solutions is better than using a single solution alone. Each solution
has its own strengths and weaknesses. By using multiple filtering techniques, the weakness of
one solution can be covered by another solution (Hoanca 2006).
2.9 The Challenge of Botnets
To prevent detection, spammers have started to use botnets for their attacks. In (Al-Bataineh &
White 2009), the use of botnets, which consist of networks of compromised machines, for spamming
is highlighted. A botnet that receives commands from its master can launch an attack and start
mass-mailing from a large number of different machines spread across many network domains. This
makes the source of the spam harder to detect.
2.10 The Effectiveness of Spam Filtering
The outcomes of a spam filter are categorized into four categories (Yu-Fen et al. 2007):
False positive - Non-spam mail is classified as spam mail.
False negative - Spam mail is classified as non-spam mail.
True positive - Spam mail is classified as spam mail.
True negative - Non-spam mail is classified as non-spam mail.
A spam filter that has a high false negative rate is considered less effective, as spam mail can
still enter the inbox. The lower the false negative rate, the less spam mail is found in the
inbox. In contrast, a high false positive rate is considered a more serious problem, as real
email is classified as spam. If a spam filter with a high false positive rate is used for
filtering, important mail is blocked from reaching users.
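The four categories can be counted with a short Python sketch. It takes spam as the positive
class, so a false positive is a legitimate mail flagged as spam, the same sense in which the term
is used for grey listing in Table 2.1. The function names and the sample labels are illustrative
assumptions, not data from this research.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN; True means 'spam' in both lists."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for is_spam, flagged in zip(actual, predicted):
        if is_spam and flagged:
            counts["TP"] += 1   # spam caught by the filter
        elif is_spam and not flagged:
            counts["FN"] += 1   # spam that reached the inbox
        elif flagged:
            counts["FP"] += 1   # legitimate mail blocked as spam
        else:
            counts["TN"] += 1   # legitimate mail delivered
    return counts


def error_rates(counts):
    """Return (false positive rate, false negative rate)."""
    legit = counts["FP"] + counts["TN"]
    spam = counts["FN"] + counts["TP"]
    fp_rate = counts["FP"] / legit if legit else 0.0
    fn_rate = counts["FN"] / spam if spam else 0.0
    return fp_rate, fn_rate


actual = [True, True, False, False, True]      # made-up ground truth
predicted = [True, False, False, True, True]   # made-up filter decisions
counts = confusion_counts(actual, predicted)
print(counts, error_rates(counts))
```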
2.11 Bayesian Poisoning
Bayesian poisoning is a method used by spammers to evade detection by machine-learning anti-spam
systems. Keywords like "Viagra" can be modified in various ways to become "V1agra", "v1@gra" and
so on (Hayes 2007). These slightly modified keywords are injected into spam mail to reduce the
sensitivity of spam filters. As a Bayesian filter uses spam mail for training, the weight of
certain common keywords like Viagra can be reduced, which reduces the effectiveness of the
anti-spam system.
At the MIT Spam Conference of 2004, John Graham-Cumming demonstrated two ways that could
possibly be used to attack POPFile's Bayesian engine (Graham-Cumming 2004). The first way is to
insert random words from various literature into the spam. This method did not work because the
inserted words were either in the spam signature database, among the words identified as ham, or
in neither of these two categories. The other attack, however, was successful. It works by
inserting random words into a small amount of spam and adding a web bug to confirm reception.
Once the web bug confirms the reception, the system is trained to use the same poison words.
After a large amount of spam has been sent to a user, a certain number of words are confirmed as
able to get through the anti-spam engine. As the threat of Bayesian poisoning is real, research
on countermeasures against Bayesian poisoning needs to be done before the problem gets out of
hand (Graham-Cumming 2006).
2.12 Needleman-Wunsch Algorithm
The Needleman-Wunsch algorithm was first published by Saul B. Needleman and Christian D. Wunsch
in (Needleman & Wunsch 1970). It is regarded as the first algorithm to apply the concept of
dynamic programming to biological sequence comparison. Being an approximate algorithm,
Needleman-Wunsch is widely used for research into deoxyribonucleic acid (DNA), amino acid and
protein alignment (Thomas & Rance 2003)(Lesk, Levitt & Chothia 1986)(Rose & Eisenmenger
1991)(Canella & Miglioli 2003)(Needleman & Wunsch 1970)(Du & Lin 2004)(Xia & Dou 2007)(Mark &
Michael 1996). As stated in (Thomas & Rance 2003), DNA consists of three parts: phosphates,
sugars and nitrogenous bases. The nitrogenous bases are the information-containing portion of
the DNA and come in four types: adenine, cytosine, guanine and thymine. In bioinformatics, these
nitrogenous bases are therefore represented as A, C, G and T. The Needleman-Wunsch algorithm has
been widely used to find similarities in DNA, protein and amino acid sequences that would
otherwise be impossible to find by visual comparison, as the sequence of nitrogenous bases in the
mitochondrial genome of the brook trout and the Arctic char alone amounts to approximately 16
thousand characters (Thomas & Rance 2003).
2.13 Smith-Waterman Algorithm
The Smith-Waterman algorithm was published in 1981 in (Smith & Waterman 1981). Since its
publication, Smith-Waterman has been widely used in various areas like DNA (Li, Shum & Truong
2007) (Xiandong & Vipin 2004), RNA (May et al. 2007), amino acid (Brutlag et al. 1993) and
protein sequence comparison (Fa, Xiang-Zhen & Zhi-Yong 2002). Like the Needleman-Wunsch
algorithm, Smith-Waterman belongs to the family of dynamic programming algorithms (Nash, Blair &
Grefenstette 2001). Dynamic programming algorithms require a large amount of processing power
and memory for their calculations. Various studies have been carried out to enhance the speed of
this algorithm. They include the use of clusters (Boukerche, De Melo & Ayala-Rincon 2005) and
distributed computers (Jacob et al. 2007), SIMD (Hasan, Al-Ars & Vassiliadis 2007), FPGAs
(Benkrid, Ying & Benkrid 2007) (Storaasli, Strenski & Inc 2007) and techniques to reduce memory
usage and calculations (Fa, Xiang-Zhen & Zhi-Yong 2002) (Harris et al. 2007).
2.14 Global Alignment and Local Alignment
Global alignment algorithms are used to align two sequences that are almost the same length.
They assume that the two sequences are almost the same, with minor differences between them
(Knees, Schedl & Widmer). Local alignment algorithms attempt to find regions of similarity
within sequences. They are more suitable for aligning two sequences that are very different in
length, where one is very long compared to the other (Boukerche, De Melo & Ayala-Rincon 2005)
(Christian & Jon 2006). Needleman-Wunsch is a global alignment algorithm, while Smith-Waterman
is a local alignment algorithm (Hasan, Al-Ars & Vassiliadis 2007).
2.15 Processing Time Issue
Dynamic programming algorithms require a large number of calculations to build the matrix
table. For example, comparing two strings of 5 characters each requires 25 calculations, while
comparing two strings of 7 characters each requires 49 calculations. The number of calculations
grows with the product of the two string lengths, so it grows quadratically for strings of equal
length, and the memory needed to store the calculated table grows with it.
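The growth can be checked with a small Python sketch. The function name table_cells is an
assumption made for this example. It counts one calculation per cell of the m by n table, which
reproduces the figures of 25 and 49 quoted above.

```python
def table_cells(m, n):
    """Number of matrix cells to compute when comparing strings of length m and n."""
    return m * n


# Doubling both string lengths quadruples the number of calculations.
for length in (5, 7, 10, 20, 40):
    print(length, table_cells(length, length))
```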
2.16 Heuristics Based Algorithms
BLAST is a heuristics-based algorithm that is used as an alternative to dynamic programming
algorithms in bioinformatics. Heuristic algorithms are faster but less sensitive than dynamic
programming algorithms. Even though they are less sensitive, they were chosen for use in
bioinformatics because they require less processing power than dynamic programming algorithms
and are therefore faster (Nash, Blair & Grefenstette 2001) (Hasan, Al-Ars & Vassiliadis 2007)
(Hsien-Yu, Meng-Lai & Yi 2004) (Boukerche, De Melo & Ayala-Rincon 2005) (Li, Shum & Truong 2007)
(Xiandong & Vipin 2004) (Brutlag et al. 1993).
2.17 The Selection of Algorithm
In the process of finding suitable algorithms, papers on various search algorithms were studied.
Among the algorithms studied are the Apostolico-Giancarlo algorithm (APOSTOLICO & GIANCARLO
1986), the Horspool algorithm (HORSPOOL 1980), the Knuth-Morris-Pratt algorithm (KNUTH, MORRIS &
PRATT 1977) and the Brute Force algorithm. These algorithms, however, were found to be
unsuitable because they are exact algorithms. Exact algorithms are unable to detect spam
signatures that have been slightly modified. Further research found that algorithms for sequence
comparison in bioinformatics have the characteristics needed for this kind of sequence
detection. In bioinformatics, algorithms are designed to detect genome strings that differ
slightly from one another as a result of insertion, deletion and substitution. This approximate
matching makes algorithms used in bioinformatics research very suitable for the detection of
slightly modified spam signatures.
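The limitation of exact matching can be seen in a short Python sketch. Exact substring search
misses a keyword as soon as one character is changed, while a measure that tolerates insertion,
deletion and substitution still sees the two strings as close. The edit distance (Levenshtein
distance) used here is a stand-in chosen for brevity and is not one of the algorithms implemented
in this research. The function names and sample strings are assumptions.

```python
def exact_match(signature, text):
    """Exact search: True only if the signature appears unchanged in the text."""
    return signature in text


def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


print(exact_match("viagra", "buy v1agra now"))   # False: the modified keyword is missed
print(edit_distance("viagra", "v1agra"))         # one substitution
print(edit_distance("viagra", "via1gra"))        # one insertion
```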
Some algorithms in bioinformatics, like BLAST (ALTSCHUL et al. 1990), are less sensitive in
detection but less complex to compute, while higher-sensitivity algorithms like Smith-Waterman
and Needleman-Wunsch require more computation and more complex steps (Nash, Blair &
GREFENSTETTE 2001). Less sensitive algorithms are less accurate, even though they are faster
because of their lower computational load. Higher-sensitivity algorithms require more
computation and are more challenging to develop because of the complex steps involved. Despite
the complexity and high demands of high-sensitivity algorithms, this research takes on the
bigger challenge. Instead of one highly sensitive algorithm, two algorithms were implemented. It
should be noted, however, that this research does not implement the two algorithms together as a
single combined unit. The milestone of this research is to implement the algorithms one at a
time in the same hardware system environment. The possibility of combining both algorithms will
be looked into in the future.
2.18 Calculations of Needleman-Wunsch Algorithm
The Needleman-Wunsch algorithm consists of three parts: (i) computing the score, (ii) computing
the matrix table and (iii) performing the traceback. In step (i), the two character sequences to
be tested, S1 "via1gra" and S2 "viagra", are compared to get the match and mismatch scores. A
score of 1 is given for a match and 0 for a mismatch.
S2               v    i    a    g    r    a
ASCII           76   69   61   67   72   61
S1  ASCII  (indices i, j)
            0    0    0    0    0    0    0
v   76      0    1    0    0    0    0    0
i   69      0    0    1    0    0    0    0
a   61      0    0    0    1    0    0    1
1   31      0    0    0    0    0    0    0
g   67      0    0    0    0    1    0    0
r   72      0    0    0    0    0    1    0
a   61      0    0    0    1    0    0    1
Figure 2.3 The computed score table of Needleman-Wunsch.
Based on the score table, the main matrix table is calculated using the formula,
\[
E_{i,j} = \max
\begin{cases}
E_{i-1,j-1} + \mathrm{score}_{i-1,j-1} \\
E_{i,j-1} + W \\
E_{i-1,j} + W
\end{cases}
\]
In this research, a value of 0 is used for the gap W.
S2               v    i    a    g    r    a
ASCII           76   69   61   67   72   61
S1  ASCII  (indices i, j)
            0    0    0    0    0    0    0
v   76      0    1    1
i   69      0
a   61      0
1   31      0
g   67      0
r   72      0
a   61      0
Figure 2.4 The calculation of a cell in the matrix table of Needleman-Wunsch algorithm.
S2               v    i    a    g    r    a
ASCII           76   69   61   67   72   61
S1  ASCII  (indices i, j)
            0    0    0    0    0    0    0
v   76      0    1    1    1    1    1    1
i   69      0    1    2    2    2    2    2
a   61      0    1    2    3    3    3    3
1   31      0    1    2    3    3    3    3
g   67      0    1    2    3    4    4    4
r   72      0    1    2    3    4    5    5
a   61      0    1    2    3    4    5    6
Figure 2.5 The completed calculations of the matrix table of Needleman-Wunsch algorithm for
viagra and via1gra strings.
Upon completing the matrix table, traceback is performed starting from the bottom-right cell
and moving to the upper-left cell using the formula
$\max\{E_{i-1,j-1} + S_{i,j},\ E_{i,j-1},\ E_{i-1,j}\}$. Traceback is performed to get the
results from the table. The white spaces in Figure 2.6 show the traceback performed from the
value 6 to the value 0 of the matrix table.
[Figure 2.6 repeats the matrix table of Figure 2.5, with the cells on the traceback path shown
as white spaces.]
Figure 2.6 The traceback being performed on the matrix table of Needleman-Wunsch algorithm.
Based on the traceback, the following result is produced:
v i a 1 g r a
| | |   | | |
v i a _ g r a
The white spaces in Figure 2.6 show the path of the traceback. The result contains six matches
even though the string S1 has been injected with an extra character to produce via1gra.
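The calculation above can be reproduced with a short Python sketch. It is a software
illustration only, not the VHDL design of this research. It uses the scores from this section
(match 1, mismatch 0, gap W of 0). The function names and the tie-breaking order in the
traceback (matching diagonal first, then up, then left) are assumptions chosen so that the output
matches the alignment shown above.

```python
def nw_matrix(s1, s2, match=1, mismatch=0, gap=0):
    """Fill the Needleman-Wunsch matrix E with S1 on the rows and S2 on the columns."""
    rows, cols = len(s1) + 1, len(s2) + 1
    E = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        E[i][0] = i * gap              # border column (all zeros when gap is 0)
    for j in range(1, cols):
        E[0][j] = j * gap              # border row (all zeros when gap is 0)
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            E[i][j] = max(E[i - 1][j - 1] + score,   # diagonal
                          E[i][j - 1] + gap,         # left
                          E[i - 1][j] + gap)         # up
    return E


def nw_traceback(s1, s2, E, match=1, gap=0):
    """Walk from the bottom-right cell to the top-left cell and build the alignment."""
    i, j = len(s1), len(s2)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s1[i - 1] == s2[j - 1] and E[i][j] == E[i - 1][j - 1] + match:
            top.append(s1[i - 1]); bottom.append(s2[j - 1]); i -= 1; j -= 1
        elif i > 0 and E[i][j] == E[i - 1][j] + gap:
            top.append(s1[i - 1]); bottom.append("_"); i -= 1     # gap in S2
        elif j > 0 and E[i][j] == E[i][j - 1] + gap:
            top.append("_"); bottom.append(s2[j - 1]); j -= 1     # gap in S1
        else:
            top.append(s1[i - 1]); bottom.append(s2[j - 1]); i -= 1; j -= 1  # mismatch
    return "".join(reversed(top)), "".join(reversed(bottom))


E = nw_matrix("via1gra", "viagra")
print(E[-1][-1])                              # final score in the bottom-right cell
print(nw_traceback("via1gra", "viagra", E))   # aligned strings
```

With a gap value of 0 several traceback paths share the same score, so a different tie-breaking
order could give a different but equally scored alignment.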
2.19 Calculations of Smith-Waterman Algorithm
Like the Needleman-Wunsch algorithm, the calculation of the Smith-Waterman algorithm consists of
three parts: computing the score, building the matrix table and performing the traceback. For
the score computation, a score of +2 is given for a match and -1 for a mismatch.
S2             a   p   h   r   o   d   1   s   i   @   c
ASCII         61  70  68  72  6F  64  31  73  69  40  63
S1  ASCII  (indices i, j)
           0   0   0   0   0   0   0   0   0   0   0   0
a   61     0   2  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
p   70     0  -1   2  -1  -1  -1  -1  -1  -1  -1  -1  -1
h   68     0  -1  -1   2  -1  -1  -1  -1  -1  -1  -1  -1
r   72     0  -1  -1  -1   2  -1  -1  -1  -1  -1  -1  -1
o   6F     0  -1  -1  -1  -1   2  -1  -1  -1  -1  -1  -1
d   64     0  -1  -1  -1  -1  -1   2  -1  -1  -1  -1  -1
i   69     0  -1  -1  -1  -1  -1  -1  -1  -1   2  -1  -1
s   73     0  -1  -1  -1  -1  -1  -1  -1   2  -1  -1  -1
i   69     0  -1  -1  -1  -1  -1  -1  -1  -1   2  -1  -1
a   61     0   2  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
c   63     0  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1   2
Figure 2.7 The score table of Smith-Waterman algorithm.
Using the score table as in Figure 2.7, the matrix table is built using the formula shown below.
The completed table is shown in Figure 2.9.
S2             a   p   h   r   o   d   1   s   i   @   c
ASCII         61  70  68  72  6F  64  31  73  69  40  63
S1  ASCII  (indices i, j)
           0   0   0   0   0   0   0   0   0   0   0   0
a   61     0   2   2   2   2   2   2   2   2   2   2   2
p   70     0   2   4   4   4   4   4   4   4   4   4   4
h   68     0   2   4   6   6   6   6   6   6   6   6   6
r   72     0   2   4   6   8   8   8   8   8   8   8   8
o   6F     0   2   4   6   8  10  10  10  10  10  10  10
d   64     0   2   4   6   8  10  12  12  12  12  12  12
i   69     0   2   4   6   8  10  12  12  12  14  14  14
s   73     0   2   4   6   8  10  12  12  14  14  14  14
i   69     0   2   4   6   8  10  12  12  14  16  16  16
a   61     0   2   4   6   8  10  12  12  14  16  16  16
c   63     0   2   4   6   8  10  12  12  14  16  16  18
Figure 2.8 The computed matrix table of Smith-Waterman algorithm.
After the table is computed, the traceback is performed starting from the highest value in the
matrix table until it reaches the neighbouring cells that are zero in value.
The formula used to build the matrix table is:
\[
E_{i,j} = \max
\begin{cases}
0 \\
E_{i-1,j-1} + \mathrm{score}_{i-1,j-1} \\
E_{i,j-1} + W \\
E_{i-1,j} + W
\end{cases}
\]
[Figure 2.9 repeats the matrix table of Figure 2.8, with the traceback path marked from the
highest value in the table.]
Figure 2.9 The traceback being performed on the computed matrix table of Smith-Waterman
algorithm.
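The matrix of Figure 2.8 can be reproduced with a short Python sketch. It is a software
illustration only, not the hardware design. The match and mismatch scores (+2 and -1) come from
this section. The gap value W of 0 is an assumption inferred from the values in Figure 2.8, as
this section does not state it. The function names are also assumptions.

```python
def sw_matrix(s1, s2, match=2, mismatch=-1, gap=0):
    """Fill the Smith-Waterman matrix E with S1 on the rows and S2 on the columns."""
    rows, cols = len(s1) + 1, len(s2) + 1
    E = [[0] * cols for _ in range(rows)]           # borders stay at zero
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            E[i][j] = max(0,                          # local alignment never goes below 0
                          E[i - 1][j - 1] + score,    # diagonal
                          E[i][j - 1] + gap,          # left
                          E[i - 1][j] + gap)          # up
    return E


def best_cell(E):
    """Return (value, i, j) of the first highest cell, the start of the traceback."""
    best = (0, 0, 0)
    for i, row in enumerate(E):
        for j, value in enumerate(row):
            if value > best[0]:
                best = (value, i, j)
    return best


E = sw_matrix("aphrodisiac", "aphrod1si@c")
for row in E:
    print(" ".join(f"{value:3d}" for value in row))
print(best_cell(E))
```

Nine characters match in order (aphrod, s, i and c), so with W set to 0 the highest value is
9 x 2 = 18, which agrees with the bottom-right cell of Figure 2.8.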
2.20 FPGA
Hardware circuits are generally divided into two groups: ASIC and FPGA. An ASIC, or
application-specific integrated circuit, is a circuit that is designed for a specific
application and targeted for mass production. When mass produced, ASICs tend to cost less per
chip because of their low recurring cost. However, the drawback of an ASIC is that it requires a
huge non-recurring cost for design. Once the circuits are burned into chips, no modifications
can be made to them (JOAN 2009). The design stage therefore requires careful design and testing,
as any unforeseen error detected after production could render all the chips that have already
been produced defective. An FPGA, or field-programmable gate array, on the other hand, offers
re-programmability of the chip. After the circuits are designed and programmed into the FPGA,
the designer can make modifications and reload the new design into the FPGA hardware. The
re-programmability of FPGAs makes them more suitable for prototype design, testing and
development. FPGAs also do not have a huge non-recurring cost (XILINX 2010).
2.21 FPGA Platforms
To date, Xilinx is still the market leader for FPGAs, followed by Altera. As the founder of FPGA
technology, Xilinx leads with more than 50 percent of the market share. The two companies
dominate the FPGA market, while smaller companies like Silicon Blue, Achronix, Tabula, Actel,
Lattice, Abound (M2000), Tier Logic and others make their entries and exits from time to time.
Xilinx has been actively rolling out new development boards throughout the years. Every new
product brings improvements in terms of higher capacity, lower power consumption, higher speed,
better throughput and more. Xilinx development boards are divided into the Spartan category and
the more powerful Virtex category. From older to newer models, the Spartan category includes
Spartan-3E, Spartan-3A and Spartan-6. The Virtex category includes the Virtex-4, Virtex-5 and
Virtex-6 models. Each model is further broken down into variants that are customized to suit
different needs.
Like Xilinx, Altera has been actively rolling out its own products. In the low-cost FPGA
category, there are Cyclone II and Cyclone III. Altera also offers Arria GX as a low-cost FPGA
with transceivers. For high-end FPGAs, Altera has Stratix, Stratix II, Stratix III and Stratix
IV in its product line-up.
2.22 Future Demand
The demand for FPGAs has been increasing over the years. FPGAs have been gradually displacing
ASICs and widely available off-the-shelf ASSPs (Application Specific Standard Products). FPGAs
are widely used in various areas such as defense, aerospace, broadcasting, wired and wireless
communications, automotive, industry, medical devices and scientific research.
2.23 Parallelism
Parallelism has been widely researched and applied to speed up the calculation of algorithms.
There are two ways of applying parallelism: using multiple units of hardware that work together,
or applying parallelism inside the hardware itself. For example, in Pentium-powered computers,
parallelism could be implemented by using four single-core Pentium processors on the same
motherboard of the server, or by using one quad-core Pentium processor inside the server.
Parallelism inside the hardware itself has advantages in terms of lower cost, smaller size,
lower power consumption, less heat dissipation and increased performance as a result of shorter
communication distances. Parallelism is applied in the hardware implementations of the
Needleman-Wunsch and Smith-Waterman algorithms in this research. Instead of using a single
processing element, both hardware designs use multiple processing elements working concurrently.
This helps to reduce the number of table processing steps from mn to about m+n, where m and n
are the lengths of the two sequences being compared.
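The reduction from mn to about m+n steps can be illustrated with a Python sketch of anti-diagonal
(wavefront) scheduling. Every cell depends only on its left, upper and upper-left neighbours, so
all cells on the same anti-diagonal are independent of each other and could be handled by
separate processing elements in the same step. This is a software model written under that
assumption and not a description of the actual hardware. The function names are hypothetical.

```python
def antidiagonal_schedule(m, n):
    """Group cells (i, j), 1 <= i <= m, 1 <= j <= n, by anti-diagonal i + j."""
    steps = []
    for d in range(2, m + n + 1):
        lo, hi = max(1, d - n), min(m, d - 1)
        steps.append([(i, d - i) for i in range(lo, hi + 1)])
    return steps


def nw_score_wavefront(s1, s2, match=1, mismatch=0, gap=0):
    """Needleman-Wunsch score computed one anti-diagonal at a time."""
    m, n = len(s1), len(s2)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        E[i][0] = i * gap
    for j in range(1, n + 1):
        E[0][j] = j * gap
    for cells in antidiagonal_schedule(m, n):
        # Cells in one step only read values from the two previous anti-diagonals,
        # so they could all be computed concurrently.
        for i, j in cells:
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            E[i][j] = max(E[i - 1][j - 1] + score, E[i][j - 1] + gap, E[i - 1][j] + gap)
    return E[m][n]


schedule = antidiagonal_schedule(7, 6)
print(len(schedule), sum(len(cells) for cells in schedule))  # steps and total cells
print(nw_score_wavefront("via1gra", "viagra"))
```

For the 7 by 6 table of the via1gra example, the 42 cells are covered in m + n - 1 = 12 steps
instead of 42 sequential ones.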
Chapter 3
Design Methodology
This chapter explains the overall implementation of the Needleman-Wunsch and Smith-Waterman spam
or unsolicited commercial email inline filter. The two systems undergo several stages of
development before being realized. The software and hardware applications and tools used for the
design are also described in this chapter.
3.1 Flow Diagram
Implement and configure the FPGAs with Microblaze and all the hardware units except the
side_relay and the algorithm IP.
Build and code the web server application in C to run on the Microblaze.
Import the design into ISE.
Insert the Chipscope Definition and Connection file and system .ucf file to the hardware in ISE
for monitoring purpose.
Attach 2 FSL interface to the Microblaze. One for side_relay and the other for the algorithm IP.
Design, develop, simulate and test the side_relay unit in VHDL.
Design, develop, simulate and test the algorithm IP of Needleman-Wunsch and Smith-Waterman in
VHDL.
Update the C code of the application running on Microblaze to include operations to handle the
two new hardware.
Perform testing using the complete FPGA system environments.
Figure 3.1 The stages involved in designing the inline filter.
[Figure 3.2 shows two flowcharts with the same structure, one for the Needleman-Wunsch system
and one for the Smith-Waterman system. Box labels: Start; FPGA system environments integrated
with Needleman-Wunsch (or Smith-Waterman) hardware on standby; System on standby; Send emails to
the FPGA system web server via a TCP client application on the desktop; FPGA system perform
scanning; Output results; Done (Yes/No); Power down; End.]
Figure 3.2 Illustrates the operations of the FPGA systems.
3.2 Overall Architecture
The overall connections of the inline filter are shown in Figure 3.3 below. The Xilinx
development platform used is connected to three cables: the crossover Ethernet cable, the null
modem cable and the JTAG Boundary Scan (IEEE/ANSI Standard 1149.1-1990) cable.
[Figure 3.3 labels: Xilinx Development tools for hardware and software development and download,
connected by the JTAG cable (download .bit and .elf to the board); Debugging Terminal, connected
by the Null Modem cable (connected to RS232 of the development board for debugging and
analysis); TCP Client, connected by the Crossover cable (Send email); Xilinx Development Board.]
Figure 3.3 The overall design of the architecture.
3.2.1 Crossover Ethernet Cable
The computer connects to the TCP server using the crossover cable. With simple TCP client
software installed on the computer, emails are sent through this Ethernet cable to the Xilinx
Development Board for scanning. The crossover Ethernet cable is connected to the Ethernet port
on the computer and on the Xilinx Development Board.
3.2.2 Null Modem cable
The Xilinx Development Board model used is the ML505 LX50T FFG1136, and this board contains one
male DB-9 RS232 serial port. This port can be used to communicate and transfer serial data to
other devices. The port is designed to operate at a speed of up to 115200 Bd. A null modem cable
is required to connect the serial port on the Xilinx board to the RS232 serial port on the
computer. Using serial port (COM) terminal emulation software like HyperTerminal or Terminal,
the user can view the output from the Xilinx board on the computer. This helps the user to get
information on the current status of the design in the FPGA and to debug it.
3.2.3 JTAG Cable
The JTAG cable is used to program the FPGA on the Xilinx Development Board with the hardware and
software created by the user. The user can create the bitstream (.bit) and executable and
linkable format (.elf) files on the computer using the relevant development tools and then
download them to the FPGA. The JTAG cable can also be used for debugging using software like
Xilinx Chipscope Pro or Xilinx Platform Studio SDK.
3.3 Microblaze
Microblaze is a soft-core RISC (Reduced Instruction Set Computer) processor developed by Xilinx
to run embedded applications in FPGAs. Microblaze has 32-bit address and data buses, instruction
words and registers. Based on the Harvard architecture, Microblaze has PLB v46, FSL and LMB
(Local Memory Bus) bus interfaces. The number of Microblaze processors that can be implemented
in an FPGA depends on the capacity of the FPGA itself. The Microblaze Debug Module (MDM),
however, can accommodate the debugging of up to 8 Microblaze processors at a time. Microblaze is
highly customizable in that the designer can choose which IP cores are needed to work with the
processor (Embedded Systems Development, 2008).
3.4 Microblaze Hardware Design
The Needleman-Wunsch and Smith-Waterman IP are designed to work with the Xilinx Microblaze
processor using Fast Simplex Link (FSL) as the bus interface. As Microblaze is a programmable
processor created to work in FPGAs, it is highly customizable. The microprocessor can be created
and modified using the Xilinx ISE Design Suite.
Figure 3.4 shows how the Microblaze processor, the algorithm IP and the IO hardware interconnect
with one another in the FPGA. The soft IP cores xps_ethernetlite, xps_uartlite and xps_sysace
connect to the real hardware of Ethernet_MAC, RS232_Uart and SysACE_CompactFlash. The Microblaze
core connects to the BRAM using its own buses, the Data-Side Local Memory Bus (DLMB) and the
Instruction-Side Local Memory Bus (ILMB). The Microblaze core has one FSL interface connected to
the side relay and two connected to the algorithm IP. The design can be created using Xilinx
Platform Studio (XPS). XPS allows users to set the parameters for each block of hardware and
helps to configure the algorithm IP and the side_relay to connect to the Microblaze core using
the FSL interface.
[Figure 3.4 block labels, given as core type (instance name) where both appear: MPMC MODULE
INTERFACE; xps_ethernetlite (Ethernet_MAC); xps_uartlite (RS232_Uart); xps_sysace
(SysACE_CompactFlash); xps_intc (xps_intc_0); xps_timer (xps_timer_1); microblaze; Mdm
(Debug_module); bram_block (lmb_bram); test_nw or test_sw; side_relay; clock_generator
(clock_generator_0); proc_sys_reset (proc_sys_reset_0). Bus and link labels: mb_plb, dlmb, ilmb,
microblaze_0_dbg, fsl (four links), dxcl, ixcl.]
Figure 3.4 The design of the Microblaze and how it interconnects with the inline filter. The
design is located in the FPGA of the Xilinx Development board.
3.5 Microblaze Software Design
To run the Microblaze, software needs to be created and downloaded to the processor. Microblaze
supports the C programming language and can run on various real-time operating systems (RTOS).
For this design, Xilkernel 4.0 is chosen for the development. Xilkernel is an RTOS created by
Xilinx and available for free in XPS. Xilkernel is highly customizable, robust and small in
size. There are also third-party RTOSes available on the internet that work with Microblaze.
Besides the Xilkernel RTOS, the design uses other libraries such as the Xilinx Memory File
System (xilmfs), the LibXil FATFile System (Xilfatfs) and the open-source lightweight IP stack
(lwIP).
[Figure 3.5 flowchart box labels: Initialize main thread; Initialize LWIP; Create network
thread; Set IP and MAC address; Start packet receive thread; Start Echo application thread;
Initialize socket; Bind; Listen; Connection requested? (yes); Accept connection. On the TCP
Client side: Request connection; Accept connection.]
Figure 3.5 The flow of the software in Microblaze once the design is started.
[Figure 3.6 flowchart box labels: TCP Client; Accept connection; Process echo request; Read
data/email from ethernet to memory; Change the data in memory to lower character; Call function
read_file; Read the spam signature inside database flash into memory; Perform scanning of data
from ethernet and flash database; Generate and print result. Example annotation: the data read,
"Hello Jack. How are you? I'm....", becomes "hello jack. how are you? i'm...." after conversion
to lower case.]
Figure 3.6 The flow of software after a connection is accepted. A new thread is created and
certain function will be called to perform the relevant task.
Figure 3.5 and Figure 3.6 show the flow of the software designed to run on the Microblaze
processor. The software acts as a TCP server and constantly listens for connection requests.
Once a connection is accepted, a new thread is created to handle the connection. The TCP server
then goes back to listening mode. The new thread then performs the functions shown in Figure
3.6. The scanning function of Figure 3.6 can be further broken down into what is shown in Figure
3.7.
[Figure 3.7 flowchart box labels: Perform scanning of data from ethernet and flash database;
Tokenize the email; Send signature to FSL master of the side_relay; Send tokenized email to FSL
master of the algorithm IP; Read result from FSL slave of the algorithm IP; Operation complete?
(Yes/No); Generate and print result. Example annotation: "hello jack. how are you? i'm...." is
tokenized into [hello] [jack.] [how] [are] [you?] [i'm]....]
Figure 3.7 Further elaboration of the scanning function is shown in this figure.
43
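The lower-casing and tokenization steps of Figures 3.6 and 3.7 can be sketched in C as follows. This is only an illustrative sketch, not the code that runs on the Microblaze: the function names to_lower_in_place and next_token are hypothetical, tokens are assumed to be separated by whitespace, and truncating each token to 13 characters is an assumption based on the string length handled by the algorithm IP.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 13  /* assumed: the 13-character limit of the algorithm IP */

/* Convert a NUL-terminated buffer to lower case in place. */
void to_lower_in_place(char *buf)
{
    assert(buf != NULL);
    for (; *buf != '\0'; ++buf)
        *buf = (char)tolower((unsigned char)*buf);
}

/*
 * Copy the next whitespace-delimited token from *cursor into out,
 * keeping at most MAX_TOKEN_LEN characters, and advance *cursor past it.
 * Returns the stored token length, or 0 when no tokens remain.
 */
size_t next_token(const char **cursor, char out[MAX_TOKEN_LEN + 1])
{
    assert(cursor != NULL && *cursor != NULL && out != NULL);
    const char *p = *cursor;
    size_t len = 0;

    while (*p != '\0' && isspace((unsigned char)*p))
        ++p;                               /* skip leading whitespace */
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        if (len < MAX_TOKEN_LEN)
            out[len++] = *p;               /* drop characters beyond the limit */
        ++p;
    }
    out[len] = '\0';
    *cursor = p;
    return len;
}
```

Calling to_lower_in_place on "Hello Jack. How are you?" and then calling next_token repeatedly yields "hello", "jack.", "how", "are" and "you?", matching the example in Figure 3.7.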
3.6 Software and Hardware Development Applications
The main software tools used to develop the system are explained below. Several types of
software were involved in developing the system.
3.6.1 Xilinx ISE Design Suite 10.1
The ISE Design Suite, which comprises Xilinx ISE, EDK and Chipscope Pro, was used in the
development of the system.
3.6.2 Xilinx ISE 10.1 (Xilinx Integrated System Environment)
This is the software used to develop the IP (intellectual property) engines of Needleman-Wunsch
and Smith-Waterman and the side_relay unit. The tool supports both the VHDL and Verilog
hardware description languages. For this system, VHDL was chosen as the development language.
The ISE tool provides the ISE simulator for behavioral and post-route simulation. Users can also
choose a third-party simulator such as Modelsim when developing their IP engine. Besides being
used to develop the VHDL IP engine, Xilinx ISE can also import the Microblaze block from XPS,
connect it to the VHDL system and integrate the Chipscope functionality into the ISE project.
3.6.3 Xilinx XPS 10.1 (Xilinx Platform Studio)
Xilinx Platform Studio is the tool used to develop and configure the Microblaze processor.
Using the Base System Builder (BSB) wizard, users can choose and customize what they want
in their Microblaze block. Among the options that can be chosen in BSB are:
-the target development board used by the design.
-the type of processor used, either PowerPC or Microblaze.
-the processor bus clock frequency, data and instruction BRAM.
-the IO devices that will be used.
-the sample application for the device.
-the memory device that holds the simple Memory Test and Peripheral Selftest applications of Microblaze.
-standard input, output and boot memory for the devices.
The BSB wizard also helps to configure the UCF (User Constraints File) that connects the design
to the hardware. Xilinx Platform Studio supports the creation of simple software applications to
run on the processor. It also provides the Xilinx Microprocessor Debugger (XMD) and the GNU
debugger for software debugging. For more complicated applications, users can use the XPS SDK
software.
3.6.4 Xilinx XPS SDK 10.1 (Xilinx Platform Studio Software Development Kit)
The XPS SDK is used to develop and debug more complex applications. Using the XPS SDK,
users can connect to the hardware design that has been generated and downloaded into the
development board. After connecting to the design on the board, the user can write the
application in the C programming language and debug it before generating the .elf file and
downloading it to the board. This cycle of debugging and downloading the application is repeated
throughout software development.
3.6.5 Xilinx ISE Simulator
The ISE simulator was used during the development of the VHDL system. Using the ISE
simulator, the user can create a testbench in VHDL that generates input signals for the system
under development. The user can then view the output signals and carry out any necessary
modifications and debugging.
3.6.6 Xilinx Chipscope Pro
Chipscope Pro is used to monitor the real hardware signals of the engine once it is
downloaded into the FPGA on the board. Using the Chipscope Pro Core Inserter, the user creates
the necessary monitoring cores and connects them to the system design. The .bit file is then
generated in ISE and downloaded to the board. Once the design starts running on the
development board, the Chipscope cores gather signals at the locations in the design chosen by
the designer and send them over the JTAG cable to the Chipscope Pro Analyzer software on the
computer. Chipscope Pro is used for real-time monitoring and debugging of the design. If any
errors are observed in Chipscope Pro Analyzer, the user has to return to the initial system
design to correct them. Chipscope Pro can be integrated as part of the ISE design. Four types
of core can be integrated:
-IBA (Integrated Bus Analyzer)
Used to debug the IBM CoreConnect Processor Local Bus (PLB).
-ILA (Integrated Logic Analyzer)
A module that lets users view the trigger signals of the hardware design.
-VIO (Virtual Input/Output)
Helps to monitor and drive signals into the design in real time. The VIO core can be used to
generate signals into the design and can be integrated as a permanent part of the system
design.
-ICON (Integrated Controller)
The ICON core provides the communication path to the other Chipscope cores. A
single ICON core can support connections to up to 16 Chipscope cores.
3.7 Other Applications Used
This sub-chapter describes other software and programs that complement the system. They
were used to send inputs to and read outputs from the target device for debugging and
analysis purposes.
3.7.1 Hyper Terminal
Hyper Terminal is used to debug the software applications that run on Microblaze. The Xilinx
development board has a debugging RS-232 serial port that supports a null modem cable, so it
can be connected to the RS-232 serial port of the computer. Using Hyper Terminal, users can
view the outputs of the software application, which aids the debugging of the software.
Figure 3.8 The sample coding used to display output from the serial port.
Figure 3.8 shows sample code written in C in the XPS SDK. Note the use of ‘print’ and
‘xil_printf’ in the code. The output of these statements is displayed in Hyper Terminal.
3.7.2 Bray Terminal v1.9b
Bray Terminal is an alternative to Hyper Terminal.
3.7.3 Simple TCP Client
A simple application from bitArt that creates a TCP connection from the computer on which it is
installed to a TCP server. It is used to connect to the TCP server software running on the
Microblaze inside the FPGA of the development board.
3.8 Programming Languages Used
Two programming languages were used during the development of the systems: VHDL and C.
The sub-sections below describe each briefly.
3.8.1 VHDL
VHDL is short for Very High Speed Integrated Circuit Hardware Description Language (VHSIC
HDL). It is one of the most widely used hardware description languages (HDL), alongside Verilog
HDL. VHDL development was initially supported by the US Department of Defense in the 1980s,
which used it as a standard for hardware documentation. VHDL was standardized by the
Institute of Electrical and Electronics Engineers (IEEE) in 1987 as VHDL-87. It was later
revised by IEEE as VHDL-93 and again in the early 2000s (IEEE 1076-2000 and IEEE 1076-2002).
For this thesis, VHDL was used to develop the side_relay and the algorithm IPs of
Needleman-Wunsch and Smith-Waterman. The design units were developed and tested in Xilinx
ISE before being attached to the Xilinx Microblaze.
3.8.2 C Programming
C is a structured programming language widely used around the world to develop applications
and systems. It was used to develop an application that runs on Microblaze with Xilkernel as
the operating system. This application coordinates the hardware input/output on the
development board with the algorithm IP so that they work together. The application also acts
as a TCP server that receives and serves connection requests.
3.9 Development Board
The development board used as the target for system development carries the XC5VLX50T
FPGA. This device belongs to the Xilinx Virtex 5 FPGA family and comes in the FF1136 pin
package. It comprises 7,200 slices, 28,800 CLB flip-flops and a maximum of 480 Kb of
distributed RAM. The device also has 60 blocks of 36 Kb Block RAM (each usable as two 18 Kb
blocks), which amounts to 2,160 Kb of Block RAM. The XC5VLX50T can support a Microblaze
microprocessor running at up to 150 MHz.
Chapter 4
Design and Implementation
This chapter explains the design of the Needleman-Wunsch and Smith-Waterman hardware. The
design consists of the Needleman-Wunsch IP (intellectual property), the Smith-Waterman IP
and the supporting side hardware. The chapter also explains how parallelism is implemented
in the Needleman-Wunsch and Smith-Waterman IP engines.
4.1 Fast Simplex Link (FSL)
FSL, or Fast Simplex Link, is a fast communication bus protocol developed by Xilinx. FSL can
be used to interconnect two design units in an FPGA. FSL is uni-directional and consists of a
master and a slave for each interface. Version 7 of the Microblaze processor supports a
maximum of 16 FSL channels (Embedded Systems Development, 2008). Figure 4.1 shows the ports
of the FSL bus. The master end connects to the design unit that sends data, and the slave end
connects to the design unit that receives data. The FSL bus was chosen as the interface for
the hardware because its design is simpler than the Processor Local Bus (PLB) and it is easier
to use.
Figure 4.1 The block diagram of an FSL bus (Fast Simplex Link (FSL) Bus (v2.11a), 2008).

Port Name | Width | Input to FIFO/Output from FIFO | Description
FSL_M_Clk | 1 | Input | Input clock for the FSL master when the FSL is set to asynchronous FIFO mode.
FSL_M_Data | 32 | Input | Input data from the connected peripheral or Microblaze processor.
FSL_M_Control | 1 | Input | Extra control bit.
FSL_M_Write | 1 | Input | Controls writes to the FSL bus. The FSL bus reads the value on FSL_M_Data on the rising clock edge when FSL_M_Write is set to ‘1’.
FSL_M_Full | 1 | Output | Indicates that the FSL FIFO is full when set to ‘1’.
Table 4.1 The description of ports that are in the master side of the FSL bus.

Port Name | Width | Input to FIFO/Output from FIFO | Description
FSL_S_Clk | 1 | Input | Input clock for the FSL slave when the FSL is set to asynchronous mode.
FSL_S_Data | 32 | Output | Output data to the connected peripheral or Microblaze processor.
FSL_S_Control | 1 | Output | Extra control bit.
FSL_S_Read | 1 | Input | Acknowledges that data has been read. A value of ‘1’ on the rising clock edge removes the first value in the FSL FIFO queue.
FSL_S_Exists | 1 | Output | Set to ‘1’ when there is a value in the FSL bus.
Table 4.2 The description of ports that are in the slave side of the FSL bus.
4.2 IP Design
For the IP design, three hardware blocks were created: the side_relay hardware, the
Needleman-Wunsch hardware and the Smith-Waterman hardware. All three blocks were created to
be compatible with the FSL bus interface. Further details of the three blocks are given below.
4.2.1 FSL side_relay
[Figure 4.2: block diagram of side_relay. Ports: FSL_S_Clk, FSL_S_Read, FSL_S_Data,
FSL_S_Control, FSL_S_Exists, FSL_M_Clk, FSL_M_Write, FSL_M_Data, FSL_M_Control, FSL_M_Full,
FSL_Clk, FSL_Rst.]
Figure 4.2 The block diagram of the side_relay hardware.
The FSL side_relay unit was created to relay data signals from the Microblaze processor to the
algorithm IP. The engine of the algorithm IP requires two slave and one master FSL interfaces,
but adding a peripheral with three FSL channels to one IP is not supported by the Xilinx
Platform Studio (XPS). As shown in Figure 4.3, the Microblaze supports only two channels to one
IP design unit. An IP design unit with three channels, as on the rightmost of Figure 4.3, is not
supported even though the third channel is recognized by XPS, because the software library
driver needed for the third channel to operate cannot be generated by XPS. The side_relay unit
was therefore created to address this problem.
Figure 4.3 The type of interface that could be added with Xilinx Platform Studio.
Two design units are created in Xilinx Platform Studio (XPS), as shown in Figure 4.4. The main
engine unit has two FSL bus channels and the side_relay unit has one FSL bus channel connected
to the Microblaze core. After the two units are created, an additional FSL master interface is
added to the side_relay and an additional slave interface is added to the main engine. A new
FSL channel is created to connect the new side_relay master interface to the main engine slave
interface. Figure 4.5 shows the diagram of the reconfigured units.
Figure 4.4 The two design units that are connected to the Microblaze core. One is side_relay and
the other could be Needleman-Wunsch or Smith-Waterman hardware.
Figure 4.5 The new connection after the reconfiguration is made.
Because the protocols for the master and slave interfaces are different, the side_relay unit is
designed to read data from the slave of the FSL channel connected to the Microblaze core and
send the data to the master of the FSL channel connected to the algorithm IP. The protocol and
sequence of the signals sent to the master interface connected to the algorithm IP must be
exactly the same as those at the master interface of the FSL channel that connects the
Microblaze core to side_relay. The states of the side_relay hardware unit are shown in
Figure 4.6 below.
[Figure 4.6: state diagram with two states: Read data from FSL1_S_Data; Write output to
FSL_M_Data.]
Figure 4.6 State diagram of the side_relay unit.
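The two-state relay behaviour of Figure 4.6 can be sketched in C as below. This is an assumption-laden software model, not the VHDL of the thesis: the channel type, its depth of four words and the function side_relay_step are hypothetical. Each call stands for one relay step in which a word is read from the channel coming from the Microblaze and written to the channel going to the algorithm IP, unless the source is empty or the destination is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNEL_DEPTH 4  /* assumed depth, kept small for illustration */

/* Minimal queue standing in for one FSL channel (hypothetical model). */
typedef struct {
    uint32_t words[CHANNEL_DEPTH];
    size_t   count;
} channel;

bool channel_exists(const channel *c) { return c->count > 0; }              /* FSL_S_Exists */
bool channel_full(const channel *c)   { return c->count == CHANNEL_DEPTH; } /* FSL_M_Full   */

/* One relay step. Returns true if a word was moved. */
bool side_relay_step(channel *from_cpu, channel *to_ip)
{
    assert(from_cpu != NULL && to_ip != NULL);
    if (!channel_exists(from_cpu) || channel_full(to_ip))
        return false;                        /* nothing to read, or stalled */

    uint32_t word = from_cpu->words[0];      /* read the oldest word */
    for (size_t i = 1; i < from_cpu->count; ++i)
        from_cpu->words[i - 1] = from_cpu->words[i];
    from_cpu->count--;

    to_ip->words[to_ip->count++] = word;     /* write it to the next channel */
    return true;
}
```

Words leave the relay in the same order in which they arrived, and no word is dropped while the downstream channel reports full.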
4.2.2 Needleman-Wunsch and Smith-Waterman Hardware
This section explains the design of the Needleman-Wunsch and Smith-Waterman hardware. Both
share the same design approach and use the same side_relay hardware unit. The main IP unit
of each algorithm consists of three main hardware blocks: test_nw, array_proc and
processing_element for Needleman-Wunsch, and test_sw, array_proc and processing_element for
Smith-Waterman.
4.2.2.1 Hardware test_nw and test_sw
The hardware IPs test_nw and test_sw were created to act as the interfacing controller that
manages the communication between the FSL bus and the array_proc. The controller reads data
from the slave interface of the FSL bus and transmits the calculated results back to the
Microblaze processor via the master FSL interface.
[Figure 4.7: block diagram of test_nw/test_sw. Ports: FSL_S_Clk, FSL_S_Read, FSL_S_Data,
FSL_S_Control, FSL_S_Exists, FSL_M_Clk, FSL_M_Write, FSL_M_Data, FSL_M_Control, FSL_M_Full,
FSL1_S_Clk, FSL1_S_Read, FSL1_S_Data, FSL1_S_Control, FSL1_S_Exists, FSL_Clk, FSL_Rst.]
Figure 4.7 The port interface of the algorithm IP.
Five extra ports are added to the algorithm IP: FSL1_S_Clk, FSL1_S_Read, FSL1_S_Data,
FSL1_S_Control and FSL1_S_Exists. These five ports connect the design to the slave of the FSL
channel coming from the side_relay unit.
The Needleman-Wunsch test_nw and Smith-Waterman test_sw are designed to operate in five
states. At the beginning, the algorithm IP is in the Idle state. Once a signal arrives from the
FSL bus connected to the side_relay, the IP moves to the Read Data state. After receiving a
fixed amount of data from the side_relay FSL channel, the algorithm IP changes state to read a
fixed amount of data from the slave of the other FSL channel. The algorithm IP then moves to
the Delay state to allow sufficient time for the data to be processed. Next, the IP moves to
the Write Output state and writes the result to the master interface of the FSL channel
connected to it before returning to the Read Data state. Figure 4.8 shows the flow of the
algorithm IP.
[Figure 4.8: flow chart. Left: Idle; Read data from FSL1_S_Data (side_relay FSL channel); Read
data from FSL_S_Data (algorithm IP FSL channel); Delay state for processing; Write output to
FSL_M_Data. Right, breakdown of the Delay state for processing: Compute table; Perform
traceback; Output result.]
Figure 4.8 The flow chart of the Needleman-Wunsch test_nw and Smith-Waterman test_sw IP.
On the right is the further breakdown of the Delay state.
[Figure 4.9: flow chart. Compute table: for PE required = n (n = 1 to 13), activate PE 1 to
PE n; Table completed? (No: repeat; Yes: Perform traceback).]
Figure 4.9 Further breakdown of the compute table in the flow chart.
Figure 4.10 shows how the Needleman-Wunsch or Smith-Waterman hardware is connected to the
side_relay hardware and the Microblaze processor. Two systems were built: the algorithm IP is
test_nw for the Needleman-Wunsch algorithm in one system and test_sw for the Smith-Waterman
algorithm in the other.
[Figure 4.10: block diagram. The Microblaze connects through FSL interfaces to side_relay
(ports FSL_S_Clk, FSL_S_Read, FSL_S_Data, FSL_S_Control, FSL_S_Exists, FSL_M_Clk, FSL_M_Write,
FSL_M_Data, FSL_M_Control, FSL_M_Full, FSL_Clk, FSL_Rst) and to the algorithm IP, test_nw or
test_sw (the same FSL_S and FSL_M ports plus FSL1_S_Clk, FSL1_S_Read, FSL1_S_Data,
FSL1_S_Control, FSL1_S_Exists, FSL_Clk, FSL_Rst).]
Figure 4.10 The interconnection of side_relay, algorithm IP and the Microblaze core.
4.2.2.2 Hardware array_proc
[Figure 4.11: block diagram of array_proc. Ports: FSL_Exist, FSL_Data, FSL1_Exist, FSL1_Data,
permit_enter, FSL_Data_keluar, Clk, Rst.]
Figure 4.11 The port interface of the array_proc hardware.

Port Name | Width | Input/Output | Description
Clk | 1 | Input | Global clock input for the array_proc. Mapped from FSL_Clk.
Rst | 1 | Input | Used to reset the peripherals connected to it.
FSL_Exist | 1 | Input | Mapped from FSL_S_Exists. Indicates that there is data in the FSL bus.
FSL_Data | 32 | Input | Mapped from FSL_S_Data. Contains the data output of the FSL bus.
FSL1_Exist | 1 | Input | Mapped from FSL1_S_Exists. Indicates that there is data in the FSL bus.
FSL1_Data | 32 | Input | Mapped from FSL1_S_Data. Contains the data output of the FSL bus.
FSL_Data_keluar | 32 | Output | Mapped from FSL_M_Data. Sends data to the FSL bus.
permit_enter | 1 | Input | Mapped from sig_permit_enter.
Table 4.3 Port description for the array_proc hardware.
The array_proc hardware is mapped inside test_nw and test_sw, and was created differently for
the two systems. The function of the array_proc is to receive values from test_nw or test_sw,
store the matrix table, and build the matrix table by sending values to and receiving values
from the processing_element. Once the matrix table is completed, each system performs the
traceback according to its own algorithm and sends the results back to test_nw or test_sw. The
array_proc top-level block diagram and its IO port description are shown above in Figure 4.11
and Table 4.3.
4.2.2.3 Hardware processing_element
[Figure 4.12: block diagram of processing_element. Ports: comp1, comp2, diagonal_value,
up_value, left_value, ctr_in, ctr_out, d_value.]
Figure 4.12 The port interface of the processing_element hardware.

Port Name | Width | Input/Output | Description
comp1 | 32 | Input | Receives a character of the string received from the database.
comp2 | 32 | Input | Receives a character of the string received from the email content.
diagonal_value | 8 | Input | Receives a value from the diagonal position of the matrix table.
up_value | 8 | Input | Receives a value from the upper position of the matrix table.
left_value | 8 | Input | Receives a value from the left position of the matrix table.
ctr_in | 1 | Input | When set to ‘1’, the processing_element reads the values from “comp1”, “comp2”, “diagonal_value”, “up_value” and “left_value”.
ctr_out | 1 | Input | When set to ‘1’, the processing_element outputs the calculation result via “d_value”.
d_value | 8 | Output | Outputs the calculated result.
Table 4.4 Port description for the processing_element hardware.
The processing_element block is the processing element for the Needleman-Wunsch and
Smith-Waterman algorithms. It contains the formula used to calculate each value of the matrix
table. The table and diagram above describe the structure and ports of the
processing_element. The formula for the Needleman-Wunsch algorithm is as below:

\[
E_{i,j} = \max
\begin{cases}
E_{i-1,j-1} + \mathrm{score}_{i-1,j-1} \\
E_{i,j-1} + W \\
E_{i-1,j} + W
\end{cases}
\]

The formula for the Smith-Waterman algorithm is as below:

\[
E_{i,j} = \max
\begin{cases}
0 \\
E_{i-1,j-1} + \mathrm{score}_{i-1,j-1} \\
E_{i,j-1} + W \\
E_{i-1,j} + W
\end{cases}
\]
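As an illustration of the two recurrences, the following C sketch computes one cell the way a processing element would, and then fills a whole table sequentially. It is not the VHDL design: the function names pe_cell and align_score are hypothetical, the scoring values (match +1, mismatch -1, gap weight W = -1) are assumptions because this section does not state them, and the border initialisation (multiples of W for Needleman-Wunsch, zeros for Smith-Waterman) follows the textbook convention rather than anything confirmed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 13  /* maximum string length handled by the IP */

/* Assumed scoring values; the thesis section does not give them. */
enum { MATCH = 1, MISMATCH = -1, GAP_W = -1 };

static int max2(int a, int b) { return a > b ? a : b; }

/* One table cell. 'local' selects Smith-Waterman (clamped at 0)
 * instead of Needleman-Wunsch. */
int pe_cell(char c1, char c2, int diagonal, int up, int left, int local)
{
    int score = (c1 == c2) ? MATCH : MISMATCH;
    int best = max2(diagonal + score, max2(left + GAP_W, up + GAP_W));
    return local ? max2(0, best) : best;
}

/* Fill the table row by row and return the alignment score:
 * the bottom-right cell for Needleman-Wunsch (local == 0),
 * the largest cell for Smith-Waterman (local != 0). */
int align_score(const char *s1, const char *s2, int local)
{
    size_t n = strlen(s1), m = strlen(s2);
    assert(n <= MAX_LEN && m <= MAX_LEN);

    int E[MAX_LEN + 1][MAX_LEN + 1];
    int best = 0;

    for (size_t i = 0; i <= n; ++i) E[i][0] = local ? 0 : (int)i * GAP_W;
    for (size_t j = 0; j <= m; ++j) E[0][j] = local ? 0 : (int)j * GAP_W;

    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= m; ++j) {
            E[i][j] = pe_cell(s1[i - 1], s2[j - 1],
                              E[i - 1][j - 1], E[i - 1][j], E[i][j - 1], local);
            best = max2(best, E[i][j]);
        }
    }
    return local ? best : E[n][m];
}
```

Under these assumed scores, a keyword with one substituted character, such as "v1agra" against "viagra", still scores 4 out of a possible 6 globally, which is the kind of tolerance to small changes that motivates using these algorithms for spam keywords.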
The algorithm IPs are designed in three blocks, with the outermost layer acting as the
interface to the FSL channels. The middle layer, the array_proc, acts as the controller and
holds the memories used during processing. It also stores the strings during processing and
performs the traceback at the end of the process. The processing element contains the formula
for the algorithm and receives five inputs from array_proc to process. Figure 4.13 shows the
mapping of the design. The RTL schematic for the algorithm IP is shown in Figure 3.15, in which
the location of array_proc is circled. The RTL schematics for array_proc are included in the
appendix as they are too large to be displayed here.
[Figure 4.13: block diagram. test_nw or test_sw (one FSL master, two FSL slaves) contains
array_proc, which contains processing_element1, processing_element2, incrementing up to
processing_element13.]
Figure 4.13 The mapping of the VHDL block of the algorithm IP.
4.3 Parallelism
As with other dynamic programming algorithms, Needleman-Wunsch and Smith-Waterman require a
large amount of processing to compute the matrix table. Both algorithms were designed to
compare two strings of at most 13 characters each. Each time the Needleman-Wunsch or
Smith-Waterman IP receives two strings of characters, it therefore has to perform 13 × 13
calculations. As shown in Figure 4.14 below, the IP has to calculate 169 times to complete the
table, which calls for parallel processing. Instead of using one processing element to perform
the calculations, the concurrency of VHDL is used to integrate 13 processing elements into a
single IP. The calculation time of both IPs is reduced significantly, to 13 + 13 cycles. The
calculation cycle count falls from 169 to 26, which means that the parallel system computes the
table in less than one sixth of the time taken by a single processing element system.
Figure 4.15 below shows the execution of the parallel system, which calculates along
anti-diagonals from the top-left to the bottom-right. At most 13 values are calculated
concurrently, and therefore 13 processing elements are required.
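The anti-diagonal schedule can be checked with a short C sketch that counts how many cells lie on each anti-diagonal of the table. The function names are hypothetical and the model is purely geometric: it ignores control and pipeline overhead. For a 13 × 13 table it gives 25 wavefront steps, one fewer than the 13 + 13 cycles quoted above, so the hardware figure may include a cycle that this model does not capture.

```c
#include <assert.h>

/* Number of cells on anti-diagonal d (0-based) of a rows x cols table.
 * A cell (r, c) lies on diagonal d when r + c == d. */
int cells_on_antidiagonal(int rows, int cols, int d)
{
    assert(rows > 0 && cols > 0 && d >= 0 && d <= rows + cols - 2);
    int lo = (d - (cols - 1) > 0) ? d - (cols - 1) : 0;  /* smallest valid row */
    int hi = (d < rows - 1) ? d : rows - 1;              /* largest valid row  */
    return hi - lo + 1;
}

/* Number of wavefront steps needed when every cell on an
 * anti-diagonal is computed in the same step. */
int antidiagonal_count(int rows, int cols)
{
    return rows + cols - 1;
}

/* Largest number of cells computed at once, which is the number of
 * processing elements needed. */
int max_concurrent_cells(int rows, int cols)
{
    int best = 0;
    for (int d = 0; d < antidiagonal_count(rows, cols); ++d) {
        int n = cells_on_antidiagonal(rows, cols, d);
        if (n > best)
            best = n;
    }
    return best;
}

/* Total number of cells, which is the step count of a system with a
 * single processing element. */
int total_cells(int rows, int cols)
{
    int sum = 0;
    for (int d = 0; d < antidiagonal_count(rows, cols); ++d)
        sum += cells_on_antidiagonal(rows, cols, d);
    return sum;
}
```

For rows = cols = 13 the sketch reports 169 cells in total and at most 13 cells on any one anti-diagonal, which agrees with the 169 single-element calculations and the 13 processing elements described in this section.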
[Figure 4.14: matrix table of string S1 (index i, 1 to 13) against string S2 (index j, 1 to 13),
partially filled.]
Figure 4.14 The flow of execution for a single processing element system.
[Figure 4.15: matrix table of string S1 (index i, 1 to 13) against string S2 (index j, 1 to 13),
filled along anti-diagonals, with Process 1 to Process 13 driving PE 1 to PE 13.]
Figure 4.15 The flow of execution for multiple processing element system in the middle of the
matrix table computation.
To control the 13 processing elements in VHDL, 13 processes were created in the architecture
of array_proc. The processes were mapped to the processing elements, which were declared as
components in the code. When instructed by the main process, each process activates its
processing element as needed. In this way the system makes use of the concurrency that VHDL
provides. The pseudo code below demonstrates how each process in the parallel hardware works in
VHDL. Table 4.5 lists the coordinates of the values that each process receives for the
diagonal, upper and left inputs.
With n = 1 to 13

process_table_n : process( Clk )
begin
    if Clk'event and Clk = '1' then
        if ( condition ) then
            :
            :
            :
            sig_diagonal_value(0 to 3) <= diagonal value ;
            sig_up_value(0 to 3) <= upper value ;
            sig_left_value(0 to 3) <= left value ;
        end if;
    end if;
end process ;
Process | Diagonal | Upper | Left
Process 1 | i-1, j-1 | i-1, j | i, j-1
Process 2 | i, j-2 | i, j-1 | i+1, j-2
Process 3 | i+1, j-3 | i+1, j-2 | i+2, j-3
Process 4 | i+2, j-4 | i+2, j-3 | i+3, j-4
Process 5 | i+3, j-5 | i+3, j-4 | i+4, j-5
Process 6 | i+4, j-6 | i+4, j-5 | i+5, j-6
Process 7 | i+5, j-7 | i+5, j-6 | i+6, j-7
Process 8 | i+6, j-8 | i+6, j-7 | i+7, j-8
Process 9 | i+7, j-9 | i+7, j-8 | i+8, j-9
Process 10 | i+8, j-10 | i+8, j-9 | i+9, j-10
Process 11 | i+9, j-11 | i+9, j-10 | i+10, j-11
Process 12 | i+10, j-12 | i+10, j-11 | i+11, j-12
Process 13 | i+11, j-13 | i+11, j-12 | i+12, j-13
Table 4.5 The coordinates of values received by diagonal, upper and left registers.
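The rows of Table 4.5 follow a regular pattern: process n works on the cell (i+n-1, j-n+1), so its diagonal, upper and left neighbours sit at fixed offsets from (i, j). The C sketch below expresses that pattern; the type and function names are hypothetical and the code is only a restatement of the table, not part of the VHDL design.

```c
#include <assert.h>

/* Offset of a table cell relative to the reference cell (i, j). */
typedef struct {
    int i_off;
    int j_off;
} offset;

/* The three neighbour cells read by one process. */
typedef struct {
    offset diagonal;
    offset upper;
    offset left;
} pe_inputs;

/* Offsets of the values read by process n (1 to 13), as in Table 4.5. */
pe_inputs process_offsets(int n)
{
    assert(n >= 1 && n <= 13);
    pe_inputs p;
    p.diagonal.i_off = n - 2;  p.diagonal.j_off = -n;
    p.upper.i_off    = n - 2;  p.upper.j_off    = -(n - 1);
    p.left.i_off     = n - 1;  p.left.j_off     = -n;
    return p;
}
```

For example, process_offsets(1) gives (i-1, j-1), (i-1, j) and (i, j-1), and process_offsets(13) gives (i+11, j-13), (i+11, j-12) and (i+12, j-13), matching the first and last rows of the table.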
Chapter 5
Results, Simulations, Analysis, and Testing
This chapter presents synthesis results and post-route simulations for the co-processors and
their components, created using VHDL in Xilinx ISE. It also includes real-time sampling of
hardware signals using the Chipscope Pro software tool. The chapter then continues with an
explanation of the testing done using the designs.
Figure 5.1 The diagram of the connections between the Microblaze core, the side_relay unit and
the main engine or algorithm IP. The ends of the FSL bus channels are labeled with numbers
from 1 to 8.
5.1 Development of side_relay Unit
The side_relay unit was designed to relay signals from the Microblaze core to the main
engine of the algorithm IP. Figure 5.1 shows the interconnection of these three units. As the
side_relay unit receives signals from the FSL bus channel end labeled number 2, it has to
reproduce signals to be sent via the master interface at number 3. Note that the signals
reproduced at number 3 do not mirror what is received at interface number 2; rather, they
mirror what is produced at interface number 1. The signal protocols for the master and slave
interfaces differ because write operations are performed at the master interface and read
operations at the slave interface. The side_relay unit relays the signals so that the signals
received at interface number 4 are the same as the signals received at interface number 2. The
side_relay unit is also designed to stop relaying once the port FSL_M_Full of the FSL channel
connecting 3 and 4 signals ‘1’, and to continue when FSL_M_Full = ‘0’.
Figure 5.2 Resource utilization and the timing summary of the side_relay unit.
Figure 5.2 shows the resource utilization of the side_relay hardware. The resource utilization
and timing summary are a good reference for checking that the design meets the speed
requirement and that the FPGA has enough resources to implement it. The maximum frequency of
the design is reported at around 848 MHz, comfortably above the 125 MHz at which the Microblaze
processor is set to run.
5.2 Developing the Needleman-Wunsch Algorithm IP Unit
Figure 5.3(a) below shows the resource utilization of the Needleman-Wunsch co-processor. The
design uses around one third of the XC5VLX50T resources. Based on Figure 5.3(b), the design
uses 2,118 slice registers, and most of the resources are consumed by the array_proc block,
which totals 2,713 slices. The design also uses 8,262 slice LUTs, or 28% of the maximum
available on the Virtex 5 device. Of these, 7,993 were used by array_proc and the remainder by
the other blocks. The array_proc uses a large amount of resources because it stores the
3-dimensional array of the matrix table, which is very large. In addition, it is required to
control 13 processing elements concurrently. The array_proc also contains the part of the
algorithm that performs the traceback for Needleman-Wunsch.
Figure 5.3(a) Resource utilization of Needleman-Wunsch IP.
Figure 5.3(b) Resource utilization of Needleman-Wunsch IP.
Figure 5.4 Timing summary for Needleman-Wunsch IP.
The Needleman-Wunsch co-processor is reported by Xilinx ISE during synthesis to have a maximum
frequency of 150.943 MHz. The unit can therefore run comfortably with the Microblaze processor,
which is set to 125 MHz. The design is able to run at this speed thanks to pipelining, which
makes the design run in stages.
5.3 Developing the Smith-Waterman Algorithm IP Unit
As with the Needleman-Wunsch co-processor, most of the Smith-Waterman resource consumption
originates from the array_proc block. As shown in Figure 5.5, the array_proc unit consumes
3,147 of the 3,389 slices used by the whole design. It also accounts for 1,611 of the 1,714
slice registers and 8,950 of the 9,362 LUTs.
Figure 5.5(a) Resource utilization of Smith-Waterman IP.
Figure 5.5(b) Resource utilization of Smith-Waterman IP.
Figure 5.6 Timing summary for Smith-Waterman IP.
The timing summary shows that the Smith-Waterman co-processor achieved a maximum frequency of
175.623 MHz. Going by the two synthesis figures, the design is about 25 MHz faster than the
Needleman-Wunsch unit (150.943 MHz). Both co-processors are capable of running at the 125 MHz
to which the Microblaze processor is set.
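The timing argument can be made concrete by converting the reported maximum frequencies into clock periods and comparing them with the 8 ns period of a 125 MHz system clock. The C helpers below are a hypothetical sketch of that arithmetic; they use only the frequencies quoted in this chapter and say nothing about how Xilinx ISE derives its timing summary.

```c
#include <assert.h>
#include <stdbool.h>

/* Clock period in nanoseconds for a frequency given in MHz. */
double period_ns(double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return 1000.0 / freq_mhz;
}

/* A block can run at the system clock if its minimum period
 * (1 / fmax) does not exceed the system clock period. */
bool meets_timing(double fmax_mhz, double system_mhz)
{
    return period_ns(fmax_mhz) <= period_ns(system_mhz);
}

/* Timing margin in nanoseconds; positive when timing is met. */
double slack_ns(double fmax_mhz, double system_mhz)
{
    return period_ns(system_mhz) - period_ns(fmax_mhz);
}
```

With a 125 MHz system clock, the Needleman-Wunsch figure of 150.943 MHz corresponds to a minimum period of about 6.63 ns and the Smith-Waterman figure of 175.623 MHz to about 5.69 ns, so both fit within the 8 ns period with more than 1 ns of margin.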
5.4 Overall Microblaze with Needleman-Wunsch IP and Microblaze with Smith-Waterman IP
The two systems, Microblaze with Needleman-Wunsch and Microblaze with Smith-Waterman, are
imported into ISE from XPS so that Chipscope Pro monitoring cores can be inserted into the
design. Although XPS supports the insertion of Chipscope Pro cores, it is more limited in
functionality than ISE. Figure 5.7 and Figure 5.8 display the overall resource utilization of
the two designs. Both systems use almost one third of the slice registers available in the
XC5VLX50T and around half of the slice LUTs. Of these amounts, around 20% of the slice
registers and slice LUTs were consumed by the Microblaze processor. About 70% of the 7,200
slices available are used to accommodate the full design; of this amount, about 30% is consumed
by Microblaze and the remainder by the co-processor IP.
Figure 5.7 Overall resource utilization of the Microblaze system with the Needleman-Wunsch IP,
panels (a) and (b).
Figure 5.8 Overall resource utilization of the Microblaze system with the Smith-Waterman IP,
panels (a) and (b).
5.5 Post-Route Simulation
Post-route simulation is also known as post-place-and-route timing simulation. When a
post-route simulation is run, the tool creates a Standard Delay Format (SDF) file, which lets
the simulator add block and routing delays to the design during IP development with Xilinx ISE.
Post-route simulation helps the developer see how the IP design will behave in the actual
circuit before the design is imported and connected to the Microblaze processor.
5.6 Hardware Timing Diagram
Figure 5.9 Some of the simulation results of test_nw from Needleman-Wunsch algorithm.
The Needleman-Wunsch test_nw and the Smith-Waterman test_sw are almost identical in design. In
the initial stage, both hardware blocks obtain 13 characters each from the spam signature
database and from the spam email before processing starts, as shown above. At the end of the
simulation, the results are produced after the calculations and tracebacks have been performed in
the IPs. Both IPs then loop back to the initial stage and stand by to receive new strings of
characters for processing. The full simulations of test_nw and test_sw are included in Appendix C.
In Figure 5.9, test_nw is simulated in the state in which it reads data from the side_relay
hardware. The values on ports fsl_clk, fsl1_s_data, fsl1_s_exists, fsl_s_data and fsl_s_exists are
generated by the testbench. Ports fsl1_s_data and fsl1_s_exists send their values 13 times first,
before ports fsl_s_data and fsl_s_exists continue. The clock fsl_clk has a period of 10 ns.
The testbench drives fsl1_s_data with the first value and then asserts a '1' on fsl1_s_exists.
The test_nw hardware detects the value on fsl1_s_exists on the next rising clock edge. On
detecting a '1' on fsl1_s_exists, test_nw reads the data on fsl1_s_data and then drives
fsl1_s_read to '1' for one clock cycle to indicate that it has read one value from the FSL FIFO.
On a real FSL bus, one value is then removed from the queue according to the first-in, first-out
rule. The delay between the rising clock edge and test_nw responding with a '1' on fsl1_s_read is
measured at 1.6 ns. After test_nw has received 13 values from fsl1_s_data, it changes state to
read data from the Microblaze. Once the testbench has finished generating data for fsl1_s_data
and fsl1_s_exists, it continues by generating data for the FSL port connected to the Microblaze,
namely fsl_s_data and fsl_s_exists. The timing and pattern of data generation on fsl_s_data and
fsl_s_exists are the same as on fsl1_s_data and fsl1_s_exists. As shown in Appendix C, from
31,750 ns to around 35,250 ns there is no response from the hardware because it is in processing
mode: array_proc needs some time to process the matrix table and perform the traceback.
Afterwards, test_nw changes state and presents the results on port fsl_m_data. Port fsl_m_write
is set to '1' for one clock cycle to show that a result value is available on port fsl_m_data.
The sequences in Figure 5.9 and Figure 5.10 form one continuous process for test_nw. In
post-route simulation, different sets of values are applied to fsl1_s_data and fsl_s_data to make
sure that the IP performs correctly.
Figure 5.10 The test_nw hardware outputs the result.
5.7 Processing Element of Needleman-Wunsch Post-Route Simulation
Figure 5.11 Post-route simulation of Needleman-Wunsch algorithm processing element.
Figure 5.11 shows the post-route simulation of the processing element of the Needleman-Wunsch
algorithm. Various values are tested in the post-route simulation to ensure that the design
functions properly. The values for ports comp1, comp2, diagonal_value, up_value, left_value,
ctr_in and ctr_out are generated by the testbench. At T1, when port ctr_in goes high, the
processing element reads the values from ports comp1, comp2, diagonal_value, up_value and
left_value. Ports comp1 and comp2 each receive one character from array_proc to be compared. If
the value on comp1 matches the value on comp2, a score of 1 is added to the value received on
port diagonal_value; if the values do not match, a score of 0 is added. The three candidates,
namely the value from diagonal_value plus the score, the value from up_value and the value from
left_value, are then compared to determine the highest. In Figure 5.11, the processing element
presents the highest value on port d_value at T3, after port ctr_out signals '1' at T2, as
indicated by the arrow.
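The behaviour of the Needleman-Wunsch processing element described above can be sketched in software. This is only an illustrative Python model, not the VHDL of the IP: the function names nw_pe and nw_matrix_score are hypothetical, and initialising the first row and column of the matrix to 0 is an assumption that this section does not state.

```python
def nw_pe(comp1, comp2, diagonal_value, up_value, left_value):
    """Model of one processing element; the return value stands for d_value."""
    score = 1 if comp1 == comp2 else 0  # match adds 1, mismatch adds 0
    return max(diagonal_value + score, up_value, left_value)


def nw_matrix_score(signature, text):
    """Fill the matrix cell by cell with nw_pe and return the last cell.

    Assumption: row 0 and column 0 start at 0 (not stated in the text).
    """
    rows, cols = len(signature) + 1, len(text) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = nw_pe(
                signature[i - 1], text[j - 1],
                table[i - 1][j - 1],  # diagonal_value
                table[i - 1][j],      # up_value
                table[i][j - 1],      # left_value
            )
    return table[-1][-1]


if __name__ == "__main__":
    print(nw_matrix_score("viagra", "v1agra"))
```

Under these assumptions the final cell counts the characters that line up between the two strings, so a keyword with one substituted character scores one less than its length.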
5.8 Processing Element of Smith-Waterman Post-Route Simulation
Figure 5.12 shows the post-route simulation of the Smith-Waterman processing element. Like the
Needleman-Wunsch processing element, it compares three values, namely the value from port
diagonal_value plus the score, the value from port up_value and the value from port left_value,
when port ctr_in signals '1' at T1. Ports comp1 and comp2 also receive their values when ctr_in
signals '1'. The scoring for Smith-Waterman differs from that for Needleman-Wunsch: a match
between comp1 and comp2 gives a score of 2, and a mismatch gives a score of -1. If the value on
port diagonal_value is 0 and comp1 and comp2 do not match, the -1 score is ignored, so the
diagonal candidate does not drop below 0. When port ctr_out signals '1' at T2, the PE responds by
presenting the highest value on port d_value at T3.
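The Smith-Waterman scoring rule above can be written as a short Python model. It is a sketch of the described behaviour only, with the hypothetical name sw_pe; it follows the text literally, so no separate gap penalty is applied to up_value and left_value because none is mentioned in this section.

```python
def sw_pe(comp1, comp2, diagonal_value, up_value, left_value):
    """Model of one Smith-Waterman processing element (returns d_value)."""
    if comp1 == comp2:
        score = 2    # match
    elif diagonal_value == 0:
        score = 0    # the -1 is neglected when the diagonal is already 0
    else:
        score = -1   # mismatch
    return max(diagonal_value + score, up_value, left_value)


if __name__ == "__main__":
    print(sw_pe("a", "a", 3, 1, 2))  # match on a diagonal of 3
    print(sw_pe("a", "b", 0, 0, 0))  # mismatch on a diagonal of 0
```

Keeping the zero floor inside the element means the surrounding array logic never has to handle negative cell values in this model.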
Figure 5.12 Post-route simulation of Smith-Waterman algorithm processing element.
5.9 Microblaze FSL side_relay unit Post-Route Simulation
Figure 5.13 below shows a post-route simulation of the side_relay block. The values on ports
fsl_clk, fsl_rst, fsl_s_data and fsl_s_exists are generated by the testbench. When side_relay
detects that fsl_s_exists is '1' on a rising clock edge, it checks whether fsl_m_full is '1' or
'0', as shown at a. A '1' on port fsl_m_full indicates that the FSL FIFO connected to the main
algorithm IP is full. If fsl_m_full is '0', as at a, side_relay reads the data from port
fsl_s_data at b and drives port fsl_s_read to '1' at c for one clock cycle. After reading the
data, side_relay changes to the write state, sets fsl_m_write to '1' at d for one clock cycle and
writes the data to fsl_m_data. It then returns to the read state and repeats the process.
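The read and write states of side_relay can be modelled as a small cycle-by-cycle loop. The following Python sketch is an assumption-laden illustration rather than the actual VHDL: the function name, the use of deque objects for the two FSL FIFOs, and the dst_depth parameter are all hypothetical.

```python
from collections import deque


def side_relay(src, dst, dst_depth, cycles):
    """Relay values from src to dst, one state transition per clock cycle.

    src, dst  : deques standing in for the slave-side and master-side FIFOs
    dst_depth : assumed capacity of the master-side FIFO
    """
    state = "READ"
    held = None
    for _ in range(cycles):
        if state == "READ":
            exists = len(src) > 0          # fsl_s_exists
            full = len(dst) >= dst_depth   # fsl_m_full
            if exists and not full:
                held = src.popleft()       # fsl_s_read = '1' for one cycle
                state = "WRITE"
        else:
            dst.append(held)               # fsl_m_write = '1' for one cycle
            state = "READ"
    return dst


if __name__ == "__main__":
    out = side_relay(deque("viagra"), deque(), dst_depth=16, cycles=12)
    print("".join(out))
```

In this model each character takes two cycles to pass through, and the relay stalls in the read state while the destination FIFO is full.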
Figure 5.13 Post-Route Simulation of Microblaze FSL side_relay unit.
5.10 Hardware Data Sampling
Using Chipscope Pro Analyzer, hardware signals were collected while the design was running on the
Xilinx development board. With suitable trigger-match and capture settings, Chipscope Pro
Analyzer can capture the real signals while the system is in operation.
5.11 Xilinx Chipscope Pro Sampling of FSL Interface
In this sub-chapter, several screenshots are presented to show how the FSL works in the design.
Figure 5.14 below shows the FSL bus connections between the Microblaze, side_relay and the main
engine. Figure 5.15 shows the sampling results for interfaces 1, 2, 3 and 4 as labeled in
Figure 5.14.
Figure 5.14 Labels 1, 2, 3 and 4 show the interfaces covered by Chipscope Pro Analyzer in
Figure 5.15.
Figure 5.15 The Chipscope Pro Analyzer results collected from the interfaces labeled 1, 2, 3 and
4 in Figure 5.14.
Figure 5.15 shows the samples collected from the interfaces labeled 1, 2, 3 and 4 in Figure
5.14. The trigger used is the FSL_S_Exists port of interface 2, and the storage qualification is
set to all data. When FSL_M_Write is set to '1' with FSL_M_Data containing the data, which is
'a', FSL_S_Exists is set to '1' because there is data in the bus. FSL_S_Read is set to '1' in the
next sample and the data is read from FSL_S_Data at label 2. After getting the data, the
side_relay block writes it to another FSL bus channel, which is connected to the co-processor and
labeled 3 and 4. The side_relay sets FSL_M_Write to '1' with the data on FSL_M_Data, as at 3 in
Figure 5.15. The co-processor responds when it detects that FSL_S_Exists at 4 is set to '1' by
setting FSL_S_Read to '1' in the next cycle, and the data 'a' is read into the co-processor.
Figure 5.16 below shows the same FSL bus connections between the Microblaze, side_relay and the
main engine, but with different labels. The figure is labeled with the numbers 5, 6, 7 and 8 for
further explanation. Figure 5.17 and Figure 5.18 show the sampling results for interfaces 5, 6, 7
and 8 as labeled in Figure 5.16.
Figure 5.16 Labels 5 and 6 show the interfaces covered by Chipscope Pro Analyzer in Figure
5.17.
Figure 5.17 The Chipscope Pro Analyzer results collected from the interfaces labeled 5 and 6 in
Figure 5.16.
Figure 5.17 shows the Chipscope Pro samples collected from the interfaces labeled 5 and 6. The
FSL bus receives the string of characters for the word "viagra" from the master interface and
passes it on to the slave interface. The storage qualification of Chipscope Pro Analyzer is set
to store a sample whenever FSL_S_Exists is '1'.
Figure 5.18 The Chipscope Pro Analyzer results collected from the interfaces labeled 7 and 8 in
Figure 5.16.
Figure 5.18 shows some of the ongoing activity on the interfaces labeled 7 and 8. The storage
qualification is set to store a sample whenever FSL_M_Write is '1'. The co-processor is passing
scanning results back to the Microblaze processor for further analysis after comparing the
strings of characters it received earlier.
Xilinx Chipscope Pro Analyzer is vital for ensuring that a design runs properly, because its
samples are gathered in real time from the running hardware. Improper signals in the Chipscope
Pro samples indicate flaws in the hardware design, and steps should be taken to rectify them.
5.12 Spam Mail Testing
For testing, a selected group of 100 spam mails and 100 non-spam mails (ham) is used. The spam
and ham test corpus is obtained from TREC 2007. The selected test emails are text-based. Spam
keywords are identified from the selected emails and saved to the CompactFlash (CF) card. The
emails are then sent one by one from the TCP client to the Xilinx development board for
scanning.
5.12.1 Testing Criteria
For the experiment, two sets of criteria were used to test the effectiveness of the
Needleman-Wunsch and Smith-Waterman systems. Under Criteria 1, a keyword that is 4 or 5
characters long triggers the counter with a mismatch of at most 1 character. A keyword that is 6
or more characters long triggers the counter with at most 2 mismatched characters. Any keyword
shorter than 4 characters must match exactly. Each time the counter is triggered, it adds a score
of one to the email. The counter starts at 0, and at the end of the scan the system reports the
accumulated score for the email in the result. Criteria 2 applies stricter rules to trigger the
counter. Any word shorter than 5 characters must match exactly. For words that are 5 or 6
characters long, a mismatch of 1 character is tolerated. Words that are 7 or more characters long
are allowed at most 2 mismatched characters. Figure 5.19 and Figure 5.20 below show the flow of
the two criteria.
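The two sets of criteria can be summarised as a tolerance lookup followed by a counter. The Python sketch below is illustrative only; the function names and the representation of each detection as a (keyword length, matched characters) pair are assumptions, and the actual scoring runs in the Microblaze software.

```python
def allowed_mismatches(keyword_len, criteria):
    """Maximum number of mismatched characters that still triggers the counter."""
    if criteria == 1:
        exact_below, one_up_to = 4, 5   # <4 exact, 4-5 allow 1, >=6 allow 2
    elif criteria == 2:
        exact_below, one_up_to = 5, 6   # <5 exact, 5-6 allow 1, >=7 allow 2
    else:
        raise ValueError("criteria must be 1 or 2")
    if keyword_len < exact_below:
        return 0
    if keyword_len <= one_up_to:
        return 1
    return 2


def triggers(keyword_len, matches, criteria):
    """True if a comparison with `matches` matching characters adds one mark."""
    return keyword_len - matches <= allowed_mismatches(keyword_len, criteria)


def email_score(hits, criteria):
    """Accumulated score of one email; hits is a list of (keyword_len, matches)."""
    return sum(1 for keyword_len, matches in hits
               if triggers(keyword_len, matches, criteria))


if __name__ == "__main__":
    hits = [(6, 5), (6, 4), (4, 3), (3, 2)]
    print(email_score(hits, 1), email_score(hits, 2))
```

For example, a 6-character keyword with 2 mismatched characters adds a mark under Criteria 1 but not under Criteria 2.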
[Flowchart. Nodes: Start; "Keyword = exact match?"; "Keyword = 4 characters?" with "Match = 3
characters?"; "Keyword = 5 characters?" with "Match = 4 characters?"; "Keyword >= 6 characters?"
with "0 < Mismatch <= 2 characters?"; "Mark = Mark + 1"; End. Branches are labeled Yes and No.]
Figure 5.19 Procedure used by Criteria 1 to calculate the marks in Microblaze software.
[Flowchart. Nodes: Start; "Keyword = exact match?"; "Keyword = 5 characters?" with "Match = 4
characters?"; "Keyword = 6 characters?" with "Match = 5 characters?"; "Keyword >= 7 characters?"
with "0 < Mismatch <= 2 characters?"; "Mark = Mark + 1"; End. Branches are labeled Yes and No.]
Figure 5.20 Procedure used by Criteria 2 to calculate the marks in Microblaze software.
5.12.2 Results
The results of testing 100 spam and 100 ham emails are collected in Table 5.1(a) and Table
5.1(b) below. A threshold score of 10 is used as the reference. For ham, 83 emails accumulate a
score below 10 under Criteria 1, against 98 under Criteria 2. The same ham emails tend to score
lower under Criteria 2 than under Criteria 1. For spam, 94 emails accumulate a score of more than
10 under Criteria 1, against 90 under Criteria 2. Based on the table, if a score of 10 is used as
the threshold to classify an email as spam, Criteria 2 performs better in terms of false
positives: only 2 ham emails are misclassified as spam, compared with 17 under Criteria 1, which
gives Criteria 2 a false positive rate of only 2%. For spam, Criteria 1 has the higher detection
rate, 94% against 90% for Criteria 2, and therefore the higher true positive rate. It should be
noted that the results depend on the spam signature database that is used; a more up-to-date
signature database would be expected to give more accurate results. In this testing, the two
systems successfully detect most of the spam emails that modify their keywords to evade detection
by exact-match algorithms.
Ham                                           Criteria 1   Criteria 2
Accumulated score of 10 and below (email)         83           98
Accumulated score of 10 and above (email)         17            2
(a)

Spam                                          Criteria 1   Criteria 2
Accumulated score of 10 and below (email)          6           10
Accumulated score of 10 and above (email)         94           90
(b)
Table 5.1(a)(b) The results for testing of spam email.
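The rates discussed in the text follow directly from the counts in Table 5.1. The short Python calculation below reproduces them; the function name and dictionary keys are hypothetical, and the overall accuracy figure is derived here from the table and is not quoted in the text.

```python
def rates(ham_flagged, ham_total, spam_flagged, spam_total):
    """Rates for one criteria, from the number of emails at or above the threshold."""
    correct = spam_flagged + (ham_total - ham_flagged)
    return {
        "false_positive_rate": ham_flagged / ham_total,    # ham marked as spam
        "true_positive_rate": spam_flagged / spam_total,   # spam detected
        "accuracy": correct / (ham_total + spam_total),    # derived, not in text
    }


if __name__ == "__main__":
    print("Criteria 1:", rates(17, 100, 94, 100))
    print("Criteria 2:", rates(2, 100, 90, 100))
```

The calculation makes the trade-off explicit: Criteria 2 gives up four detected spam emails in exchange for fifteen fewer misclassified ham emails.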
Based on the testing performed, Needleman-Wunsch and Smith-Waterman yield the same number of
detected matches for two inputs of 13 characters each. Spam mail accumulates higher marks under
both Criteria 1 and Criteria 2. Most of the spam keywords that have been slightly modified are
detected. For example:
g a r b a g e
| | | |   | |
g a r b i g e

v 1 a g r a
|   | | | |
v i a g r a

p r e s c r 1 p t i o n
| | | | | |   | | | | |
p r e s c r i p t i o n

p h 4 r m a c y
| |   | | | | |
p h a r m a c y
However, some slight false detections of other keywords are unavoidable, such as:
  t a b l e t
  | | | | |
s t a b l e

l a t e r
| |   | |
l a s e r
The accuracy of the spam detection depends on the spam signature database being used. The
more keywords in the database that are the same as in the spam email content, the higher the
detection rate is.
Chapter 6
Conclusion
Chapter 6 concludes the research with the contributions achieved in the development of this
thesis. It also discusses possible improvements and future work following this research.
6.1 Contributions
(i) Creating an FPGA-based co-processor on the Xilinx FSL interface for the Needleman-Wunsch and
Smith-Waterman algorithms in VHDL. Using the FSL interface makes the two hardware designs easier
to commercialize and to apply in other research in the future. Previous FPGA developments in
other research did not standardize the interface of their hardware, which left the resulting
hardware usable only in the lab. When attempts are made to reuse such hardware, an additional
controller that slows the design has to be built because of port incompatibility. At the time of
this research, Xilinx FPGAs are among the most widely used FPGAs in the market, and the FSL bus
is the latest bus created to connect custom peripherals to the Xilinx Microblaze processor. This
makes the two IPs, Needleman-Wunsch and Smith-Waterman, very flexible, as they connect to the
highly customizable Xilinx Microblaze processor.
(ii) Demonstrating the ability of hardware circuits and FPGAs to perform spam scanning. In this
research, two FPGA embedded systems were created: one with a Xilinx Microblaze processor
connected to Needleman-Wunsch and the other with a Xilinx Microblaze processor connected to
Smith-Waterman. The Microblaze processor acts as the TCP server and also as the controller of
the algorithm IP.
(iii) Implementing parallelism in the FSL-based FPGA designs of the Needleman-Wunsch and
Smith-Waterman algorithms. Creating parallel Needleman-Wunsch and Smith-Waterman IPs is a
tedious process considering the number of processing elements involved. By integrating
parallelism in the IP, the computation time of the matrix table is reduced to less than one sixth
of that of a system with a single processing element.
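One common way to parallelise a dynamic programming matrix is to compute all cells on the same anti-diagonal at once, since each cell depends only on its diagonal, upper and left neighbours. The Python sketch below counts the steps for that scheme. It is an assumption made for illustration: this section does not describe how array_proc actually schedules its processing elements, and the function names are hypothetical.

```python
def sequential_steps(rows, cols):
    """Steps when a single processing element computes one cell per step."""
    return rows * cols


def wavefront_steps(rows, cols):
    """Steps when every cell on an anti-diagonal is computed in the same step."""
    return rows + cols - 1


def antidiagonals(rows, cols):
    """Cells (i, j) grouped by anti-diagonal d = i + j, in evaluation order."""
    return [
        [(i, d - i) for i in range(max(0, d - cols + 1), min(rows, d + 1))]
        for d in range(rows + cols - 1)
    ]


if __name__ == "__main__":
    n = 13  # two inputs of 13 characters each
    print(sequential_steps(n, n), wavefront_steps(n, n))
    print(wavefront_steps(n, n) / sequential_steps(n, n))
```

Under these assumptions a 13 by 13 matrix needs 25 steps instead of 169, a ratio below one sixth, which is in line with the figure quoted above, although the actual design may schedule its cells differently.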
6.2 Further Improvements and Future Works
(i) Creating larger FSL-based FPGA implementations of the Needleman-Wunsch and Smith-Waterman
algorithms with future devices. With a larger-capacity device, Needleman-Wunsch and
Smith-Waterman IPs that accept longer strings of characters could be built, and the design could
then be applied to other types of applications.
(ii) Testing both algorithms together with machine learning anti-spam algorithms. As there is no
silver bullet among anti-spam solutions, it would be interesting to see how the Needleman-Wunsch
and Smith-Waterman algorithms perform when coupled with other anti-spam solutions.
(iii) Measuring the performance of the TCP server application. XPS SDK can only measure the
performance of software running standalone on the Microblaze processor. If the Xilkernel
operating system is used, XPS SDK does not support profiling or benchmarking of the applications
running on it. Because the TCP server application running on the Microblaze requires Xilkernel,
its software performance could not be measured. If a future XPS release supports performance
measurement for software that uses Xilkernel, further research could measure the performance of
the TCP server application.
(iv) When the lwIP (Lightweight IP) library is used in socket mode, the maximum achievable TCP
throughput is only about 1 Mbps (Velusamy 2008). The Xilinx adapters are not optimized for
socket mode, and this will only be fixed by Xilinx in subsequent releases. Once this problem is
solved, testing could be done to measure the network throughput of the software.
Reference
GUNNARSSON, A. and EKBERG, S.(2003) Invasion of Privacy, Master Thesis, Blekinge
Institute of Technology.
HOANCA, B. (2006) How good are our weapons in the spam wars? Technology and Society
Magazine, IEEE, 25, 22-30.
YU-FEN, C., CHIA-MEI, C., BINGCHIANG, J. & HSIAO-CHUNG, L. (2007) An Alliance-
Based Anti-spam Approach. Natural Computation, 2007. ICNC 2007. Third International
Conference on.
HAUPT, R. L. (2004) Unsolicited commercial e-mail (UCE). Antennas and Propagation
Magazine, IEEE, 46, 153-154.
SORKIN, D. E. (2003) Spam Laws. Available from : <http://www.spamlaws.com/> [16
December 2003].
LORRIE FAITH, C. & BRIAN, A. L. (1998) Spam! Commun. ACM, 41, 74-83.
ODA, T. (2005) A Spam-Detecting Artificial Immune System. Faculty of Graduate Studies and
Research. Ottawa, Carleton University.
SANPAKDEE, U., WALAIRACHT, A. & WALAIRACHT, S. (2006) Adaptive Spam Mail
Filtering Using Genetic Algorithm. Advanced Communication Technology, 2006. ICACT 2006.
The 8th International Conference.
CARPINTER, J. & HUNT, R. (2006) Tightening the net: A review of current and next
generation spam filtering tools. Computers & Security, 25, 566-578.
MING-WEI, W., YENNUN, H., SHYUE-KUNG, L., ING-YI, C. & SY-YEN, K. (2005) A
multi-faceted approach towards spam-resistible mail. Dependable Computing, 2005.
Proceedings. 11th Pacific Rim International Symposium on.
The Definition of Spam (2007) Available from : <http://www.spamhaus.org/definition.html>
[29 November, 2007].
CREWS, C. W. (2001) Policy Analysis: Why Canning "Spam" is a Bad Idea, Cato Institute.
JACOBSSON, A. & CARLSSON, B. (2007) Privacy and Spam: Empirical Studies of
Unsolicited Commercial e-Mail. Proceedings of IFIP Summer School on Risks & Challenges of
the Network Society.
The Global Economic Impact of Spam, 2005, Available from :
<http://www.ferris.com/?file_id=2004/05/611_409SpamCosts.pdf> [3 December 2007].
BANIT, A., NITIN, K. & MOLLE, M. (2005) Controlling spam Emails at the routers.
Communications, 2005. ICC 2005. 2005 IEEE International Conference on.
GALEN, A. G. (2007) Compliance with the CAN-SPAM Act of 2003. Commun. ACM, 50, 56-62.
HUNT, R. & CARPINTER, J. (2006) Current and New Developments in Spam Filtering.
Networks, 2006. ICON '06. 14th IEEE International Conference on.
DREYFUS, S. (2002) Richard Bellman on the Birth of Dynamic Programming. Operations
Research, 50, 48-51.
LUIS VON, A., MANUEL, B. & JOHN, L. (2004) Telling humans and computers apart
automatically. Commun. ACM, 47, 56-60.
CATALIN, A. & MARIA, C. (2009) Phishing 101. MIT Spam Conference 2009. Massachusetts
Avenue Cambridge.
AL-BATAINEH, A. & WHITE, G. (2009) Detection and Prevention Methods of Botnet-
generated Spam. MIT Spam Conference 2009. Massachusetts Avenue Cambridge.
FRIESS, N. & AYCOCK, J. (2009) A Kosher Source of Ham. MIT Spam Conference 2009.
Massachusetts Avenue Cambridge.
(2009) Email Metrics Program: The Network Operators' Perspective. Report #10 – Third and
Fourth Quarter 2008. San Francisco, Messaging Anti-Abuse Working Group.
(2008) Email Metrics Program: The Network Operators' Perspective. Report #9 – Second
Quarter 2008. San Francisco, Messaging Anti-Abuse Working Group.
CANELLA, M. & MIGLIOLI, F. (2003) Performing DNA comparison on a bio-inspired tissue
of FPGAs. Parallel and Distributed Processing Symposium, 2003. Proceedings. International.
DU, Z. & LIN, F. (2004) Using blocks+ database in Needleman-Wunsch algorithm. Fuzzy
Information, 2004. Processing NAFIPS '04. IEEE Annual Meeting of the.
FUNG, W. W. L., SHAM, I., YUAN, G. & AAMODT, T. M. (2007) Dynamic Warp Formation
and Scheduling for Efficient GPU Control Flow. Microarchitecture, 2007. MICRO 2007. 40th
Annual IEEE/ACM International Symposium on.
KNEES, P., SCHEDL, M. & WIDMER, G. Multiple Lyrics Alignment: Automatic Retrieval of
Song Lyrics. Proceedings of 6th International Conference on Music Information Retrieval
(ISMIR’05), 564–569.
LESK, A. M., LEVITT, M. & CHOTHIA, C. (1986) Alignment of the amino acid sequences of
distantly related proteins using variable gap penalties. Protein Engineering Design and Selection,
1, 77-78.
MARK, G. & MICHAEL, L. (1996) Using Iterative Dynamic Programming to Obtain Accurate
Pairwise and Multiple Alignments of Protein Structures. Proceedings of the Fourth International
Conference on Intelligent Systems for Molecular Biology. AAAI Press.
NAVEED, T., SIDDIQUI, I. S. & AHMED, S. Parallel Needleman-Wunsch Algorithm for Grid.
Available http://www. gridbus. org/alchemi/files/Parallel% 20Needlema.
NEEDLEMAN, S. B. & WUNSCH, C. D. (1970) A general method applicable to the search for
similarities in the amino acid sequence of two proteins. J. Mol. Biol, 48, 443-453.
ROSE, J. & EISENMENGER, F. (1991) A fast unbiased comparison of protein structures by
means of the Needleman-Wunsch algorithm. Journal of Molecular Evolution, 32, 340-354.
THOMAS, R. & RANCE, N. (2003) A parallel algorithm for DNA alignment. Crossroads, 9, 10-15.
XIA, F. & DOU, Y. (2007) Reducing Storage Requirements in Accelerating Algorithm of Global
BioSequence Alignment on FPGA. Advanced Parallel Processing Technologies.
LI, I., SHUM, W. & TRUONG, K. (2007) 160-fold acceleration of the Smith-Waterman
algorithm using a field programmable gate array (FPGA). BMC Bioinformatics, 8, 185.
MAY, P., KLAU, G., BAUER, M. & STEINKE, T. (2007) Accelerated microRNA-Precursor
Detection Using the Smith-Waterman Algorithm on FPGAs. Distributed, High-Performance and
Grid Computing in Computational Biology.
HARRIS, B., JACOB, A. C., LANCASTER, J. M., BUHLER, J. & CHAMBERLAIN, R. D.
(2007) A Banded Smith-Waterman FPGA Accelerator for Mercury BLASTP. Field
Programmable Logic and Applications, 2007. FPL 2007. International Conference on.
XIANDONG, M. & VIPIN, C. (2004) Bio-sequence analysis with cradle's 3SoC™ software
scalable system on chip. Proceedings of the 2004 ACM symposium on Applied computing.
Nicosia, Cyprus, ACM.
WEIGUO, L., SCHMIDT, B., VOSS, G., SCHRODER, A. & MULLER-WITTIG, W. (2006)
Bio-sequence database scanning on a GPU. Parallel and Distributed Processing Symposium,
2006. IPDPS 2006. 20th International.
BRUTLAG, D. L., DAUTRICOURT, J. P., DIAZ, R., FIER, J., MOXON, B. & STAMM, R.
(1993) BLAZE™: An implementation of the Smith-Waterman sequence comparison algorithm
on a massively parallel computer. Computers & chemistry, 17, 203-207.
NASH, H., BLAIR, D. & GREFENSTETTE, J. (2001) Comparing algorithms for large-scale
sequence analysis. Bioinformatics and Bioengineering Conference, 2001. Proceedings of the
IEEE 2nd International Symposium on.
BENKRID, K., YING, L. & BENKRID, A. (2007) Design and Implementation of a Highly
Parameterised FPGA-Based Skeleton for Pairwise Biological Sequence Alignment. Field-
Programmable Custom Computing Machines, 2007. FCCM 2007. 15th Annual IEEE Symposium
on.
GOK, M. & YILMAZ, C. (2006) Efficient Cell Designs for Systolic Smith-Waterman
Implementations. Field Programmable Logic and Applications, 2006. FPL '06. International
Conference on.
CHRISTIAN, K. & JON, C. (2006) Efficient sequence alignment of network traffic. Proceedings
of the 6th ACM SIGCOMM conference on Internet measurement. Rio de Janeiro, Brazil, ACM.
STORAASLI, O., STRENSKI, D. & INC, C. (2007) Exploring Accelerating Science
Applications with FPGAs. Proc. of the Reconfigurable Systems Summer Institute, July.
NUR'AINI ABDUL, R., ROSNI, A., ABDULLAH ZAWAWI HAJI, T. & ZALILA, A. (2006)
Fast Dynamic Programming Based Sequence Alignment Algorithm. Distributed Frameworks for
Multimedia Applications, 2006. The 2nd International Conference on.
LIU, Y., HUANG, W., JOHNSON, J. & VAIDYA, S. (2006) GPU Accelerated Smith-
Waterman. Computational Science – ICCS 2006.
HASAN, L., AL-ARS, Z. & VASSILIADIS, S. (2007) Hardware acceleration of sequence
alignment algorithms-an overview. Design & Technology of Integrated Systems in Nanoscale
Era, 2007. DTIS. International Conference on.
AMAR, S. (2006) Heterogeneous processing: a strategy for augmenting moore's law. Linux J.,
2006, 7.
BENKRID, K., LIU, Y. & BENKRID, A. (2007) High Performance Biosequence Database
Scanning using FPGAs. Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE
International Conference on.
SMITH, T. F. & WATERMAN, M. S. (1981) Identification of common molecular subsequences.
J. Mol. Biol, 147, 195-197.
PEIHENG, Z., GUANGMING, T. & GUANG, R. G. (2007) Implementation of the Smith-
Waterman algorithm on a reconfigurable supercomputing platform. Proceedings of the 1st
international workshop on High-performance reconfigurable computing technology and
applications: held in conjunction with SC07. Reno, Nevada, ACM.
GOTOH, O. (1982) An improved algorithm for matching biological sequences. Journal of
Molecular Biology, 162, 705.
HSIEN-YU, L., MENG-LAI, Y. & YI, C. (2004) A parallel implementation of the Smith-
Waterman algorithm for massive sequences searching. Engineering in Medicine and Biology
Society, 2004. IEMBS '04. 26th Annual International Conference of the IEEE.
FA, Z., XIANG-ZHEN, Q. & ZHI-YONG, L. (2002) A parallel Smith-Waterman algorithm
based on divide and conquer. Algorithms and Architectures for Parallel Processing, 2002.
Proceedings. Fifth International Conference on.
BOUKERCHE, A., DE MELO, A. C. M. A. & AYALA-RINCON, M. (2005) Parallel strategies
for local biological sequence alignment in a cluster of workstations. Parallel and Distributed
Processing Symposium, 2005. Proceedings. 19th IEEE International.
ROGNES, T. & SEEBERG, E. (2000) Six-fold speed-up of Smith-Waterman sequence database
searches using parallel processing on common microprocessors. Bioinformatics, 16, 699-706.
JACOB, A., SANYAL, S., PAPRZYCKI, M., ARORA, R. & GANZHA, M. (2007) Whole
Genome Comparison on a Network of Workstations. Parallel and Distributed Computing, 2007.
ISPDC '07. Sixth International Symposium on.
TAMIL, E. M., IDRIS, M. Y. I., THONG, C. M., SAUDI, M. M. & JALI, M. Z. (2008)
Needleman Wunsch Implementation for SPAM/UCE Inline Filter. Seventh International
Network Conference (INC 2008). Plymouth, United Kingdom.
RAZAK, Z., ZULKIFLEE, K., SALLEH, R., YAACOB, M. & TAMIL, E. M. (2007) A Real-
Time Line Segmentation Algorithm For An Offline Overlapped Handwritten Jawi Character
Recognition Chip. Malaysian Journal of Computer Science, 20, 12.
ABIDIN, S. A. Z., OTHMAN, A. H., TAMIL, E. M. & JALI, Z. M. (2006) e-Mail Spam Source
of Origin and Content In Open Relay Exploits at Home DSL Connection Using Jackpot
Mailswerver 1.2.2 Honeypot. Proceedings of National ICT Conference. Perlis, Malaysia.
TAMIL, E. M. & IDRIS, M. Y. I. (2006) FPGA Based Approximate String Search Algorithm
Implementation To Detect Polymorphic Worm. Proceedings of 3rd International Conference on
Artificial Intelligence in Engineering and Technology (ICAIET 2006). Sabah, Malaysia.
IDRIS, M. Y. I., TENG, Y. G. & TAMIL, E. M. (2007) Hardware-Based Worm Detection
Design Using Knuth-Morris-Pratt Algorithm. Proceedings of the Conference on IT Research and
Application (CITRA 2007). Selangor, Malaysia.
TAMIL, E. M., IDRIS, M. Y. I. & HENG, T. H. (2007) FPGA Design of Spyware Inline Filter
Using Levenshtein Distance Approximate String Search Algorithm. Proceedings of the SCORED
2007. Universiti Tenaga Nasional, Malaysia.
TAMIL, E. M., IDRIS, M. Y. I., HENG, T. H. & SAUDI, M. (2008) Hardware based
SPAM/UCE Filter Design with Levenshtein Distance Algorithm : A Framework. Proceedings of
Internet Convergence Conference (ICC 2007). Kuala Lumpur, Malaysia.
DU, Y. (2005) A SOC Implementation of Ogg Audio Player using MicroBlaze. Department of
Electrical Engineering, Faculty of Electrical Engineering, Mathematics and Computer Science.
Delft, Delft University of Technology.
MAGNUSSON, P. (2004) Evaluating Xilinx Microblaze for Network SoC Applications.
Department of Computer Science and Electrical Engineering. Lulea,Sweden, Luleå University
of Technology.
BERNSPANG, J. (2004) Interfacing an external Ethernet MAC/PHY to a MicroBlaze system on
a Virtex-II FPGA. Computer Engineering, Dept. of Electrical Engineering at Linköpings
universitet. Brisbane, University of Queensland.
(2008) Embedded Systems Development. Xilinx.
Synthesis and Simulation Design Guide. Xilinx.
PEDRONI, V. A. (2004) Circuit Design with VHDL. Cambridge, Massachusetts, MIT Press.
CHU, P. P. (2006) RTL HARDWARE DESIGN USING VHDL, New Jersey, John Wiley & Sons,
Inc.
(2004) Chipscope PLB IBA. Xilinx.
(2004) Chipscope ICON. Xilinx.
(2004) ML401 Evaluation Platform. Xilinx.
(2008) Fast Simplex Link(FSL) Bus (v2.11a). Xilinx.
ROSINGER, H.-P. (2004) Connecting Customized IP to the MicroBlaze Soft Processor Using
the Fast Simplex Link (FSL) Channel. XAPP529. Xilinx.
VELUSAMY, S. (2008) LightWeight IP (lwIP) Application Examples. v1.0 ed., Xilinx.
(2008) XAPP1026. 1.1 ed., Xilinx.
CASAGRANDE, N. (2003) Basic-Algorithms-of-Bioinformatics Applet.
BRAY (2008) Terminal. 1.9b ed.
JOAN, B. (2009) Difference Between ASIC and FPGA. ASIC vs FPGA. Available from :
<http://www.differencebetween.net/technology/difference-between-asic-and-fpga/> [08 January,
2010].
XILINX (2010) FPGA vs. ASIC. Available from :
<http://www.xilinx.com/company/gettingstarted/fpgavsasic.htm> [08 January 2010].
HAYES, B. (2007) How Many Ways Can You Spell V1@gra? , American Scientist. Available
from : <http://amsciadmin.eresources.com/libraries/documents/2008521812126487-2007-
07Hayes.pdf> [25 January, 2010].
GRAHAM-CUMMING, J. (2006) Does Bayesian poisoning exist? Available from :
<http://www.virusbtn.com/spambulletin/archive/2006/02/sb200602-poison> [27 December,
2007].
GRAHAM-CUMMING, J. (2004) How to beat an adaptive spam filter. The Spam Conference
2004.
SAHAMI, M., DUMAIS, S., HECKERMAN, D. & HORVITZ, E. (1998) A Bayesian approach
to filtering junk e-mail. AAAI-98 Workshop on Learning for Text Categorization, 460.
FERRIS RESEARCH(2007) Available from : <http://www.ferris.com/research-library/industry-
statistics/> [02 January, 2007].
SPAMHAUS.org (2007) “The SPAMHAUS Project”, Available from :
<http://www.spamhaus.org/effective_filtering.html> [04 January, 2007].
TWINING, D., WILLIAMSON, M. M., MOWBRAY, M. J. F. & RAHMOUNI, M. (2004)
Email prioritization: Reducing delays on legitimate mail caused by junk mail. USENIX
Association.
COHEN, J. (2005) COMPUTER SCIENCE AND BIOINFORMATICS. COMMUNICATIONS
OF THE ACM. ACM.
SANTARINI, M. (2010) Xcell Journal. 2010 Customer Innovation Issue ed. San Jose, Mike
Santarini.
RODRIGUEZ-RAMOS, L. F., ALONSO, A., GAGO, F., GIGANTE, J. V., HERRERA, G. &
VIERA, T. (2006) Adaptive Optics Real-Time Control Using FPGA. Field Programmable Logic
and Applications, 2006. FPL '06. International Conference on.
APOSTOLICO, A. & GIANCARLO, R. (1986) The Boyer-Moore-Galil string searching
strategies revisited. SIAM J. Comput., 15, 98-105.
HORSPOOL, R. N. (1980) Practical fast searching in strings. Software: Practice and
Experience, 10, 501-506.
KNUTH, D. E., MORRIS JR, J. H. & PRATT, V. R. (1977) Fast pattern matching in strings.
SIAM Journal on Computing, 6, 323.
ALTSCHUL, S. F., GISH, W., MILLER, W., MYERS, E. W. & LIPMAN, D. J. (1990) Basic
local alignment search tool. J. Mol. Biol., 215, 403-410.
Appendix A
Microblaze Terms and Definitions
Term            Definition
Block RAM       Random access memory built into the FPGA. It is the primary storage for
                code that runs on the Microblaze, and the user can choose how much of it
                to allocate to that code. BRAM is also used as the buffer for other IP
                in the Microblaze block. BRAM blocks are scattered across the FPGA and
                limited in size, so they should be assigned carefully.
I-cache BRAM    Instruction cache for the Microblaze. It uses BRAM space, and its size
                is determined by the user.
D-cache BRAM    Data cache for the Microblaze. It also uses BRAM space, and its size is
                determined by the user.
PLB v46         The Processor Local Bus (PLB) interconnects the Microblaze core with
                other IP. PLBv46 has replaced the OPB bus since EDK 9.2i.
OPB             On-Chip Peripheral Bus. Most of its applications have been replaced by
                PLB since EDK 9.2i.
LMB             Local Memory Bus, used by the Microblaze processor to gain fast access
                to the on-chip BRAM.
FSL             Fast Simplex Link, a fast communication bus protocol that can be used to
                connect user-developed IP to the Microblaze core or to other design
                units. The FSL bus is much simpler and easier to use than PLB because it
                has fewer ports. Up to 16 parallel FSL channels are supported in
                version 7 of the Microblaze processor.
Table A.1 Microblaze terms and definitions.
Appendix B
RTL Schematics
Figure B.1 The RTL Schematic of the VHDL block of the algorithm IP.
Figure B.2 RTL Schematics of array_proc unit for Needleman-Wunsch, parts (i) to (x).
Figure B.3 RTL Schematics of array_proc unit for Smith-Waterman, parts (i) to (x).
Figure B.4 The RTL Schematic of the processing element for Needleman-Wunsch.
Figure B.5 The RTL Schematic of the processing element for Smith-Waterman.
Figure B.6 RTL Schematics of side_relay
Appendix C
Full simulations of hardware
Figure C.1 The first half of post-route simulation for Needleman-Wunsch Algorithm.
Figure C.2 The second half of post-route simulation for Needleman-Wunsch Algorithm.
5.4 Smith-Waterman Algorithm IP Post-Route Simulation
Figure C.3 The first half of post-route simulation for Smith-Waterman Algorithm.
Figure C.4 The second half of post-route simulation for Smith-Waterman Algorithm.