National Ribat University
Faculty of Graduate Studies & Scientific Research
Database Hiding in Tag Web Using Steganography by
Genetic Algorithm
Thesis Submitted in Fulfillment of the Requirements for the
Ph.D. in Computer Science
By: Fatma Abdalla Mabrouk
Supervisor: Prof. Mudawi Mukhtar Elmusharaf
1438-2017
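OPENING
In the name of Allah, the Most Gracious, the Most Merciful
"They said: Glory be to You; we have no knowledge except what You have taught us.
Indeed, You are the All-Knowing, the All-Wise."
Allah the Almighty has spoken the truth
Surah Al-Baqarah, verse 32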
DEDICATION
To my big family, and my small family for their continuous support and
encouragement
ACKNOWLEDGEMENTS
First of all, I thank Almighty God for blessing me with more than I deserve and for
granting me the strength and perseverance to complete this research and present it in
a satisfactory manner.
I would like to express my sincere gratitude to my supervisor Prof. Dr. Mudawi
Mukhtar Elmusharaf first for accepting the supervision of this thesis and second for his
patience, continuous support, motivation and immense knowledge. His guidance helped
me a lot during research time and writing of this thesis.
Also, the completion of this project could not have been accomplished without
the support of my friends Muna Ahmed Alsadig and Imtesal Ali Yasin.
I would like to thank my family for their love and encouragement: my mother,
whose continuous prayers have always supported me and given me the strength to keep
striving; my brothers, especially Osama, for his spiritual support throughout this thesis
and in my life as a whole; and my sisters, especially Amal, the first person who
supported me in completing my proposal. Thank you for giving me all this time. I also
thank my uncle Salah Mabrouk, without whom this thesis could not have seen the light.
Special thanks go to my dear husband Amir Osman, whose patience and support
helped me to complete my work.
Last but not least, my heartfelt thanks go to the soul of my dear father, who instilled
in me the love of science and always supported me, in his life and after his death.
ABSTRACT
The main goal of this research is to study steganography based on the genetic algorithm
(GA) and to design a new system, SteganoTag. It is a new steganographic method that
hides a database within saved web pages using a genetic algorithm, without changing the
page size, in order to increase the reliability and confidentiality of the database
system. A sample database designed in XML was selected.
The main scope of this research is the application of the genetic algorithm as a
method of data security, within the field of evolutionary programming in
artificial intelligence. The experimental method is used to analyse different
types of home pages. The proposed technique uses HTML tags and their attributes
to hide an illogical database using the genetic algorithm, and it
considers tags as genes and attributes as chromosomes. The architecture is then
detailed, the proposed information hiding system is implemented
in C#, and the system is simulated using different
scenarios and a variety of data. The advantages of using steganography with a genetic
algorithm are also clarified.
Finally, the most important finding of this research is that combining
the genetic algorithm with steganography raises the efficiency of
hiding data in a web page without changing its parameters, and that the
encryption algorithm increases the difficulty of illegal attempts to remove the
hidden data. The genetic algorithm can achieve a significant improvement in data
security. Following the same methodology, this combination can be extended to the
development of other security systems to obtain safer and more reliable database
hiding systems. The high flexibility of HTML can be exploited in many other techniques;
other, less common languages can be used for database hiding; internet protocols can
be exploited; and the method can be developed further by introducing
developmental algorithms to increase the efficiency of data hiding. Hiding can also be
developed for e-mail data.
ABSTRACT (translated from the Arabic)
The aim of this research is to study information hiding by means of the genetic
algorithm and then to design a system (SteganoTag) as a new method of information
hiding, which hides a database inside a saved web page using the genetic algorithm
without changing the page size, in order to increase the reliability and
confidentiality of the database system. A sample database designed in XML is used.
The main scope of this research is the application of the genetic algorithm as a
data security method, used in the field of evolutionary programming in artificial
intelligence. The experimental method is used to analyse different types of home
pages. The proposed technique uses HTML tags and their attributes to hide an illogical
database using the genetic algorithm, and it considers each tag to represent a gene
and each attribute to represent a chromosome.
The architecture was designed, the proposed information hiding system was implemented
in C#, and the system was simulated using different scenarios and a variety of data.
The advantages of using information hiding with the genetic algorithm are shown.
Finally, one of the most important results of this research is that combining
steganography with the genetic algorithm improves the hiding of data in a web page
without changing its features, and that the encryption algorithm increases the
difficulty of unauthorized attempts to remove the hidden data. The genetic algorithm
can achieve a significant improvement in data security. Following the same
methodology, this combination can be extended to the development of other security
systems to obtain safer and more reliable database hiding systems. The high
flexibility of HTML can be exploited in many other techniques, and other, less common
languages can be used for database hiding. Internet protocols can be exploited for
database hiding, and the method can be developed by introducing developmental
algorithms to increase the efficiency of data hiding. Hiding can also be developed
for e-mail data.
CONTENT
TITLE PAGE
OPENING
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRACT (ARABIC)
CONTENT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
LIST OF SYMBOLS
CHAPTER 1: INTRODUCTION
1.1 Problem Background
1.2 Motivation
1.3 Research Problem
1.4 Research Objectives
1.5 Research Questions
1.6 Research Scope
1.7 Research Methodology and Activities
1.8 Glossary of Research Terms
1.9 Structure of Thesis
1.10 Chapter Summary
CHAPTER 2: LITERATURE REVIEW AND PREVIOUS STUDIES
2.1 Introduction
2.2 Literature Review
2.2.1 The Security On the Internet Environment
2.2.2 Security and Steganography
2.2.3 Data Hiding Techniques
2.2.4 Comparative Between Cryptography and Steganography
2.2.5 The Fundamental Requirements When Hiding Data in Data
2.2.6 Method of Hiding Data
2.2.7 Steganography in Digital Age
2.2.8 The Steganography Approaches
2.2.9 Types of Steganography
2.2.10 Different Types of Media That Hide Data in Steganographic Techniques
2.2.11 Techniques of Steganography On Network
2.2.12 Network Stego Techniques
2.2.13 HTML Characteristics
2.2.14 Data Steganography Techniques On HTML Document
2.2.15 Genetic Algorithm Are Part of Evolutionary Algorithms
2.2.16 History of Genetic Algorithm
2.2.17 Genetic Algorithm's Techniques
2.2.18 Generating Random Population
2.3 Previous Studies
2.3.1 Previous Analysis of Methods in Steganography
2.3.2 Previous Survey of Genetic Algorithm Applications
2.3.3 Previous Review to HTML Web Page's Steganography
2.3.4 Comparative Studies with Other Related Methods
2.4 Chapter Summary
CHAPTER 3: RESEARCH PROCEDURES
3.1 Introduction
3.2 Hiding Data in The Stored Web Page
3.3 Security Definition On the Project
3.4 The Proposed Technique
3.5 The Philosophy of Genetic Algorithm Application
3.6 Creation Database and Generation of an Encryption Key
3.6.1 Creating Database
3.6.2 Generating an Encryption Key
3.6.3 Hiding The Data
3.6.4 Extracting The Data
3.7 The Architecture Design of the Proposed System
3.8 The Implementation of the Proposed System
3.9 The Good Side of Using Genetic Algorithm in The Proposed System
3.10 The Limitations of Using Genetic Algorithm in The Proposed System
3.11 Chapter Summary
CHAPTER 4: ANALYSIS AND DISCUSSION
4.1 Introduction
4.2 System Examination
4.3 Comparative Studies
4.4 Performance Evaluation of Genetic Algorithm
4.4.1 Fitness Value When Select Mutation
4.4.2 Fitness Value When Cancel Mutation
4.5 CPU Time Usage
4.6 Chapter Summary
CHAPTER 5: CONCLUSION AND FUTURE WORK
5.1 Conclusions
5.2 Results
5.3 Future work
5.4 Recommendation
REFERENCES
APPENDICES
A. The Screens of the System
B. Source Code
LIST OF TABLES
Table no Table name
Table 2.1: Comparison between cryptography and steganography
Table 2.2: Summary of studies on steganography
Table 2.3(A): Summary of studies on Genetic Algorithm
Table 2.3(B): Summary of studies on Genetic Algorithm
Table 2.4: Compare between some methods used web page
Table 4.1: Results of efficiency on the companies’ pages
Table 4.2: Results of efficiency on the news’ pages
Table 4.3: Results of efficiency on universities’ pages
Table 4.4: Results of efficiency on Social Media’ pages
Table 4.5: Simulation Tests Results for pages’ size after hiding data
Table 4.6: Comparison between pages Capacity on pages share with other method
Table 4.7: Training dataset to test a select mutation
Table 4.8: Training dataset to test a cancel mutation
Table 4.9: Training dataset to test fitness function when cancel mutation
Table 4.10: Results of CPU time when steganography on dell page
LIST OF FIGURES
Figure no Figure name
Figure 1.1: The Flow of the Research Activities
Figure 2.1: The core standards for Computer and network security
Figure 2.2: Fundamental objectives of Information security
Figure 2.3: Types of security system
Figure 2.4: Types of Steganography
Figure 2.5: Steganographic media Techniques
Figure 2.6: Techniques of Steganography on Network
Figure 2.7: The block situation of natural search techniques
Figure 2.8: A Genetic Algorithm’s Techniques
Figure 2.9: A diagram of the steps of a general Genetic Algorithm
Figure 3.1: The criteria to protection database on internet environment
Figure 3.2: The four security techniques
Figure 3.3: Tag characteristic on proposed technique
Figure 3.4: Flow chart to hide database on HTML document
Figure 3.5: Flow chart to extract database from HTML document
Figure 3.6: The architecture of (SteganoTag) system
Figure 3.7: The main steps in steganography proposal
Figure 4.1: Simulation Tests Results Increase rate after hide data
Figure 4.2: Comparison results between the two methods
Figure 4.3: Experimental dataset when select mutation
Figure 4.4: The evaluation simulation result when select mutation
Figure 4.5: Simulation result of GP modelling of best fitness simulation when select mutation
Figure 4.6: The results of select mutation to training dataset
Figure 4.7: The result of GP model fitness evolution of the program
Figure 4.8: The results of training dataset when select mutation
Figure 4.9: Simulation results of test data when select mutation
Figure 4.10: The evaluation result when cancel mutation
Figure 4.11: The results of modelling of best fitness simulation when cancel mutation
Figure 4.12: The results of training dataset when cancel mutation
Figure 4.13: Model fitness evolution of the program when cancel mutation
Figure 4.14: The result of training dataset when cancel mutation
Figure 4.15: Simulation results for time line when steganography on BBC news page
Figure 4.16: Simulation results for method grid when steganography on BBC news page
Figure 4.17: The results of CPU time when steganography on dell page
LIST OF ABBREVIATIONS
Abbreviation Description
ADO ActiveX Data Objects
AI Artificial Intelligence
BMP BitMap Picture
BOSS Break Our Steganographic System
BPCS Bit-Plane Complexity Segmentation
CPU Central Processing Unit
DWT Discrete Wavelet Transform
EAs Evolutionary Algorithms
EOF End-Of-File
EP Evolutionary Programming
FTP File Transfer Protocol
GA Genetic Algorithm
GIF Graphics Interchange Format
GP Genetic Programming
HTML Hyper Text Markup Language
HTTP HyperText Transfer Protocol
HUGO Highly Undetectable steGO
HVS Human Visual System
ICMP Internet Control Message Protocol
IP Internet Protocol
IS Information Security
IT Information Technology
JPEG Joint Photographic Experts Group
LEC Largest Embedding Capacity
LSB Least Significant Bit
LTE Long Term Evolution
MSE Mean Square Error
OPAP Optimal Pixel Adjustment Process
PSNR Peak signal-to-noise ratio
PVD Pixel Value Differencing
RS Retention of Secrecy
TCP Transmission Control Protocol
UDP User Datagram Protocol
URL Uniform Resource Locator
WWW World Wide Web
XML eXtensible Markup Language
LIST OF SYMBOLS
Symbol Definition
(a1, a2) A couple of attributes
(c1, c2) A couple of chromosomes
(G1, …, Gn) A set of genes
Gn Tags
H HTML
On The more crossover order of the couples of attributes in a particular tag
Pn Pair of chromosomes
Rn,i The sum of for all pairs of chromosomes
nr The number of the attributes
pa The capacity of each attribute on page
CHAPTER 1
INTRODUCTION
1.1 Problem Background
The free flow of information has opened the door for anyone to intervene, with no
distinction between beneficial and harmful intervention, making information
susceptible to damage, sabotage, espionage, theft and other forms of attack. In
addition, easy-to-use advanced software and the evolution of networks, which provide a
greater flow of information, have created more opportunities for unauthorized users to
access and attack most sites, files and information sources.
Sometimes the service provider itself may be the intruder and violate the secrecy and
privacy of users, even though the content of the data is encrypted.
This is why database privacy has become one of the main challenges of this
era of Information Technology (IT) and a source of great anxiety for all users.
Information technology providers have started to treat these customer concerns as one
of their top priorities in present and future products and development activities.
Some protection for information is achieved by securing its environment
through what is termed information security. One such technique is steganography,
the science of hiding information.
The main goal of steganography is to hide data from a third party. It differs
from cryptography, which makes data unreadable by a third party.
“There are a large number of steganographic methods that most of us are familiar with
(especially if you watch a lot of spy movies!), ranging from invisible ink and microdots to secreting a
hidden message in the second letter of each word of a large body of text and spread spectrum radio
communication. With computers and networks, there are many other ways of hiding information”.
(Gary C. Kessler, 2001)
Many different cover file formats can be used in steganography, but
because of the widespread use of digital images on the internet, images have become
the most popular format.
Modern steganography is sophisticated, allowing a
user to hide large amounts of information within image and audio files. These forms of
steganography are often used in conjunction with cryptography so that the information
is doubly protected: first it is encrypted and then hidden, so that an adversary has to
first find the information and then decrypt it.
The main goal of steganography is also to communicate securely in a completely
undetectable manner and to avoid drawing suspicion to the transmission of hidden
data. The aim is not to keep others from knowing the hidden information, but to keep
others from thinking that the information even exists. If a steganography method causes
someone to suspect the carrier medium, the method has failed. Research is still under
way to develop this technology and use it to protect information in all fields.
This study proposes using a genetic algorithm (GA) to hide data because
a GA can increase the hiding capacity compared to other
systems. According to the experimental results of this proposal, the
genetic algorithm is capable of providing a larger embedding capacity, without causing
noticeable distortion of the cover medium, than similar existing methods.
GA is a population-based metaheuristic algorithm that uses genetics-inspired
operators to sample the solution space. This means that the algorithm applies
genetic operators to a population of individuals in order to evolve them
over successive generations.
GA is a combinatorial optimization technique and a general-purpose
optimization method based on Darwin's theory of evolution. It searches for a
near-optimal value of a complex objective function by simulating the natural
evolutionary process.
GA has been successfully used in a wide variety of problem domains. It consists
of three basic operators: selection, crossover, and mutation. The algorithm starts with
a set of candidate solutions to the problem. Each solution is represented by a
chromosome, and the set of solutions is called the population.
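The three operators can be illustrated with a minimal sketch. This is not the thesis implementation (which is written in C#); the bit-string chromosome, the toy fitness function that counts 1-bits, tournament selection, single-point crossover, elitism and all parameter values are assumptions chosen only to keep the example short.

```python
import random

def fitness(chromosome):
    # Toy objective (assumption): the number of 1-bits in the chromosome.
    return sum(chromosome)

def select(population, rng, k=3):
    # Tournament selection: the fittest of k randomly drawn individuals.
    return max(rng.sample(population, k), key=fitness)

def crossover(parent_a, parent_b, rng):
    # Single-point crossover: head of one parent, tail of the other.
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rng, rate=0.01):
    # Flip each bit independently with probability `rate`.
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]

def run_ga(length=20, pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    # Start from a random population of bit-string chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism (assumption): carry the current best over unchanged.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            child = crossover(select(population, rng),
                              select(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = run_ga()
    print(fitness(best), best)
```

Because the best individual is copied into each new generation, the best fitness in this sketch never decreases from one generation to the next.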
Data hiding, whose many methods are described in the literature review
in Chapter 2, is widely used in information security. In data hiding
applications, optimization techniques are used to improve the success of
the algorithms. The genetic algorithm is one of the most widely used heuristic
optimization techniques in these applications.
Current information system security cannot keep up with the rapid
development and increasingly complex nature of computer systems and their
security needs. Because of this deficiency, genetic algorithms have been applied
successfully to information security problems such as steganography systems.
In addition, the proposed number of genes, which are primary numbers
processed by the genetic algorithm, reduces the time needed to hide data through the
primary tag found in the relation table. Suitable chromosomes are selected
by a fitness function, which assesses the chromosomes of the current generation in
order to select the offspring.
The proposed technique uses HTML tags and their attributes to hide a database
illogically using the genetic algorithm. It is based on the fact that the order of the
attributes in an HTML tag has no impact on the appearance of the document, so this
ordering can be used to hide data efficiently. The proposed technique considers
each tag to represent a gene and each attribute to represent a chromosome.
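The idea of carrying data in attribute order can be sketched as follows. This is only an illustration of the principle, not the SteganoTag method: it omits the genetic algorithm and the encryption key, and the encoding convention (attribute names in ascending order mean 0, descending order mean 1) and the function names are assumptions.

```python
def embed_bit(attrs, bit):
    """Order the first two attributes of a tag to encode one bit.

    Assumed convention: names in ascending order encode 0,
    names in descending order encode 1.
    """
    if len(attrs) < 2:
        raise ValueError("need at least two attributes to hide a bit")
    first, second = sorted(attrs[:2], key=lambda pair: pair[0])
    ordered = [first, second] if bit == 0 else [second, first]
    return ordered + attrs[2:]

def extract_bit(attrs):
    # Read the bit back from the order of the first two attribute names.
    return 0 if attrs[0][0] < attrs[1][0] else 1

def render(tag, attrs):
    # Serialize the tag; only the attribute order differs between outputs.
    inner = " ".join(f'{name}="{value}"' for name, value in attrs)
    return f"<{tag} {inner}>"

if __name__ == "__main__":
    attrs = [("src", "a.png"), ("alt", "logo")]
    for bit in (0, 1):
        stego = embed_bit(attrs, bit)
        print(render("img", stego), "->", extract_bit(stego))
```

Both renderings have the same length and display identically in a browser, which is why a page can carry hidden bits this way without a change in size.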
Two techniques are integrated in this study. The first
is the genetic algorithm, which is inspired by natural evolution. Its
main function is to generate useful solutions to
optimization and search problems.
The second is steganography, the
science of hiding, whose main function is to embed data in a transmission medium.
The ultimate goal of this integration is to increase the efficiency of hiding data
with a large capacity.
1.2 Motivation
The main motivations for this work are:
• Protecting a database by hiding it in a mediator without
changing the mediator's features (size and specifications), and scattering the
database within certain parts of the mediator. The hiding program and the
extraction program are separate and protected by a password.
• The increased need of digital content owners to protect intellectual property
rights.
• A biologically inspired technique such as the genetic algorithm, coupled with
a steganography system, can be used efficiently to design future generations of
intelligent information security systems.
• Steganography avoids drawing attention to the transmission of
hidden information.
The goal of the process is to optimize the steganography function and increase
information security.
1.3 Research Problem
The internet has a huge number of web pages, and their number is growing
rapidly. The internet's huge size, combined with the lack of effective
control, creates an opportunity to smuggle content into ordinary web pages.
“Obviously the internet huge size, combined with lack of an effective control, gives one an
opportunity to “smuggle” some undesirable content into the ordinary web pages. Furthermore, a number
of methods exist that allow to hide such a content or hidden message without changing the web page
look. They are taking advantage of steganography”. (L. Polak and Z. Kotulski, 2010)
As more information is transmitted over networks, the
importance of information security has come to be widely recognized. This research
simulates an information security system and applies it to hide a database in a web
page that is saved in local storage.
Steganography has lately attracted considerable attention as a good
solution to information security and copyright problems and as a method of protecting
communication privacy.
The goal of the project is to hide a database within a web page using
steganography with a genetic algorithm, to compare these algorithms in terms of
speed and quality of concealment, and to describe their function in data security.
One of the problems facing data hiding is the limited size of the file in which
the information is to be embedded.
1.4 Research Objectives
1. To study the hiding of a database inside a specific
file (a multimedia file) without changing the size of the file, using a genetic
algorithm to increase the reliability and confidentiality of the database system.
2. To hide information by covering it with other information, integrating the
new information with the existing information so that the hidden
information does not show and the cover information remains visible as before. In this
way the system can hide a database of about 100 bits or more within a web page.
3. To hide the database within a mediator (web page) without changing its
features (size and specifications), scattering it within certain parts of the mediator,
with the hiding and extraction programs separated and
protected by an agreed password.
4. To simulate the steganography system and test the efficiency and accuracy of hiding
the database with the genetic algorithm using C#.NET, and to compare it with
existing steganography systems.
5. To provide more confidentiality, integrity and authentication for confidential data
while making the database easy to access and store.
6. To use both steganography and encryption to ensure the confidentiality of
all data and double the protection of the database.
7. To use steganography to hide secret data so that no one can tell that the two
parties are communicating secretly.
1.5 Research Questions
This thesis is an attempt to find the answers for the following questions:
1. How do genetic algorithms produce a safe information hiding tool?
2. Is steganography on an HTML document a suitable tool for building an integrated
information security system?
3. Does choosing an illogical database have an impact on the proposed steganography
system?
1.6 Research Scope
- Thematic scope: The main scope of this research is to apply the genetic algorithm
as a security technique. It is used in the field of Evolutionary Programming (EP) in
Artificial Intelligence (AI), specifically in the branch of search and problem solving.
The genetic algorithm can find good solutions to problems and
improve optimization by means of randomized statistical search, which is used here to
hide the database.
- Time scope of the research: The period from 2012 to 2017.
1.7 Research Methodology and Activities
This thesis uses an experimental methodology, analysing different types
of home pages such as company pages, news portals, social media pages and university
pages. These pages contain many elements, namely HTML tags with
attributes, which increase the capacity for steganography and make it possible to
evaluate the performance of the genetic algorithm in steganography.
The proposed technique uses HTML tags and their attributes to hide a database
illogically using the genetic algorithm. It is based on the fact that the order of the
attributes in an HTML tag has no impact on the appearance of the document, so this
ordering can be used to hide data efficiently. The proposed technique considers
each tag to represent a gene and each attribute to represent a chromosome.
Home pages of different sizes are examined based on the number of attributes in
their tags. The performance of the system is tested over many runs, which include
examining different web pages and categories and testing the effect of varying
several parameters while simulating the algorithm that hides data in a web page.
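Checking a page by the number of attributes in its tags can be sketched with the Python standard library. The capacity rule used here, one hidden bit per complete pair of attributes in a tag, is an assumption made for illustration (the symbols list mentions couples of attributes, but the exact capacity formula is not given in this part of the thesis), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class AttributeCounter(HTMLParser):
    """Counts attribute pairs per tag while parsing a saved web page."""

    def __init__(self):
        super().__init__()
        self.tags_with_attributes = 0
        self.attribute_pairs = 0

    def handle_starttag(self, tag, attrs):
        # `attrs` is a list of (name, value) tuples for this start tag.
        if attrs:
            self.tags_with_attributes += 1
        # Assumption: each complete pair of attributes can carry one bit.
        self.attribute_pairs += len(attrs) // 2

def capacity_bits(html):
    # Estimated number of bits the page could hide under the assumed rule.
    counter = AttributeCounter()
    counter.feed(html)
    counter.close()
    return counter.attribute_pairs

if __name__ == "__main__":
    page = ('<html><body><img src="a.png" alt="x" width="10">'
            '<a href="#" id="l" class="c" title="t">go</a>'
            '<p>hi</p></body></html>')
    print(capacity_bits(page))
```

Under this rule a page with many multi-attribute tags, such as a news portal, would offer more capacity than a sparse page of the same size.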
The following activities are the main objectives of this research:
1- Analyze a web page and then find out or create the number of the attributes that
must be taken:
• Follow the steps of the genetic algorithm according to their activities.
2- Create and verify the theoretical design:
• Build a theoretical flow of the agent design and verify it.
• Verify the logical and computational models.
3- Create novel mathematical and computational models:
• Build the mathematical and computational model.
4- Simulate the proposed system using C#:
• Simulate the system using different scenarios and diverse datasets.
• Evaluate the model's reliability using traditional and novel metrics and compare it
with contemporary models.
This research will perform the following:
• Focus on hiding a database in a web page.
• Hide information using the genetic algorithm.
• Study the structural features of the HTML tag attributes after the
hidden data is inserted and identify their relation to the attributes of the web page.
Figure 1.1: The Flow of the Research Activities
1.8 Glossary of Research Terms
Allele:
One of two or more alternative forms of a gene that arise by mutation and
are found at the same place on a chromosome; here, 0 or 1.
Cryptography:
The science of securing information. It is the basis of modern security
technologies used to protect information and resources on networks.
Evolutionary Algorithms:
(EAs) A term used to describe computer-based problem-solving systems which use
computational models of evolutionary processes as essential elements in their design
and implementation.
Genetic Algorithm:
(GA) A search heuristic that simulates the process of natural selection. This
heuristic is used to generate useful solutions to optimization and search problems,
using methods inspired by natural evolution.
Plain text:
Any message that is not encrypted, also called clear text.
Steganalysis:
The study of discovering messages hidden using steganography.
Steganographic:
Adjective describing secret data hidden within ordinary visible information by
means of steganography.
Steganography:
The art and science of hiding information in written text, a picture or sound. This
technique can be used together with cryptography to increase data
protection.
(Stego) Object:
The object that is actually going to be seen out in the open: the text, picture or
sound that will be used to carry the message right under everyone’s nose. It is the
result of combining the cover text and the embedded message.
1.9 Structure of Thesis
This thesis is composed of the following chapters:
• The first chapter introduces the research and gives a brief idea
of the main concepts involved in this work: the motivation for the novel approach,
the problem statement, research questions, objectives, scope of research, research
methodology, relevant research activities, and a glossary of research terms.
• The second chapter provides a literature review of steganography and the genetic
algorithm. It gives an overview of steganography approaches and
security, and defines the genetic algorithm with its different types and
techniques. It also gives a brief idea of previous studies in steganography and
genetic algorithm applications and compares these studies with other related
methods.
• The third chapter analyses the problem statement, examines in detail the
theoretical aspects of the proposed system, and discusses its architecture and
implementation. It also demonstrates the advantages of using
steganography with the genetic algorithm.
• The fourth chapter discusses the evaluation and measurement of the system, along
with the experiments and datasets used to verify the model and the results.
• Finally, the fifth chapter gives the conclusion and recommendations for future
work.
1.10 Chapter Summary
This chapter presented the research problem, objectives, motivation, scope,
methodology and activities. The main problems in information security systems are
protecting information from hackers and hiding a database with integrated
security.
CHAPTER 2
LITERATURE REVIEW AND PREVIOUS STUDIES
2.1 Introduction
As the internet has become the fundamental tool for communication services,
information delivery and financial transactions, and as e-governments all over the world
have become heavily dependent on it, data security on the internet has become the most
important factor to be considered. Security and safety requirements that protect
against unauthorized access have therefore become mandatory for most online
applications. Governments, large companies, and the publishing and broadcasting industries
urgently need techniques that can effectively secure and protect their
confidential data, and this has motivated innovators to develop different security
methods such as steganography, genetic algorithms and cryptography.
Genetic algorithms have been used as an effective technique for information hiding
and for improving the performance of information hiding systems. A genetic algorithm
is based mainly on the mechanisms of natural genetics and the theory of evolution.
2.2 Literature Review
2.2.1 The Security on The Internet Environment
Different secrecy measures are used on the internet to prevent the disclosure of
information to unauthorized people. For example, in electronic commerce, sending
credit card details from the buyer to the merchant to complete the transaction
can expose the buyer's confidential data, so a protection system is applied.
The system enforces secrecy by encrypting the card number during transmission,
by limiting access to storage areas, and by hiding the serial number of the card on printed
receipts and records, but unfortunately these measures alone are not enough to protect the data.
Secrecy may be breached in different ways: for example, by spying on a
personal computer screen to steal a password, by exposing a personal database without the
owner's knowledge, or by hacking governmental computers or computers that keep
highly sensitive information, leading to a violation of high confidentiality.
2.2.2 Security and Steganography
Computer and network security have certain core standards that any secret
communication method should address. Although no single method addresses all security
requirements, steganography does satisfy several of them, sometimes in
conjunction with other technologies such as cryptography (Donovan Artz, 2001).
1. Confidentiality
Confidentiality is a basic aspect of network security: it means making sure that no
unauthorized person can gain access to or read information on the network.
Confidentiality is at the heart of what steganography does. Steganography, though,
accomplishes confidentiality in a slightly different manner than cryptography.
With cryptography, an unauthorized person can see that the information exists but cannot
read it. Because they can tell that information is being protected, the
unauthorized person may try to break the encryption. With steganography, because the
data is hidden, an unauthorized party does not even know that sensitive data is there.
From a confidentiality standpoint, steganography keeps the information protected at a
higher level.
2. Survivability
The main activity of communication is that one party transmits information and
the other party receives it. The completion of this cycle represents the feature of
survivability. Even when data is hidden in a message, you have to be sure that
whatever processing of the data takes place between sender and receiver does not
destroy the information. You must be sure that the information is not only received by the
recipient, but can also be extracted so that the message can be read. When using
steganography, it is critical to understand the processing a message will go through and
to determine whether the hidden message has a high chance of surviving transit across the
network.
3. No Detection
It makes no sense to perform data hiding if someone can figure out how or
where the information is hidden. If someone can easily detect where you hide your
information and find your message, it defeats the purpose of using steganography.
Steganography usually makes the hidden data hard to find by changing the
properties of the host file as little as possible.
Therefore, the algorithm that is used must be robust enough that, even if
someone knows how the technique works, they cannot easily find out that you have
hidden data in a given file. A robust algorithm is one where the insertion method is hard
to detect and hard to destroy.
4. Visibility
Hidden data must be undetectable, so people must not be able to
see any visible changes to the host file in which the data is hidden. If you hide a secret
message in an image and it distorts the image in such a way that someone can tell it has
been modified, steganography has been unsuccessful (Eric Cole, 2003).
Figure 2.1: The core standards for computer and network security
2.2.3 Data Hiding Techniques
The maxim "do not believe everything you see or hear" is borne out in
computer science by the field of data hiding. Much of the content on a
computer may contain hidden information without the user's knowledge. Data
hiding techniques, with all their advantages and disadvantages, have therefore
become a topic that deserves attention and in-depth study.
Some facts concerning data hiding can be found in (Eric Cole, 2003), as well as in
(Eiji Kawaguchi, Eason, 2007), (B.B. Zaidan, A.A. Zaidan, A.K. Al-Frajat,
H.A. Jalab, 2010), (Matthew Walker, 2001) and other books and papers.
Information Security (IS) is one of the most misunderstood topics within the
Information Technology (IT) world right now (Robert H. Williams III, 2007), so it is
necessary to discuss these techniques briefly before a thorough review is provided.
There are three fundamental objectives of computer security: confidentiality,
integrity and authentication, as shown in Figure 2.2.
Figure 2.2: Fundamental objectives of information security
A. Confidentiality: Preserving authorized restrictions on information access and
disclosure, including means for protecting personal privacy and secure
information.
B. Integrity: Guarding against improper information modification or destruction,
including ensuring information non-repudiation and authenticity.
C. Authentication: Assuring that the source of the message is an authorized party, and
detecting any unauthorized access to or use of information.
An important aspect of information security is recognizing the value of
information and the attacks it can expect from unauthorized parties, and then
defining appropriate procedures and protection requirements for the information. Not all
information is equal, and so not all information requires the same degree of protection.
This requires information to be assigned a security classification. Top-secret
data needs highly secure software and procedures, and authorized parties are assigned
different levels: for example, some parties are authorized only to view the data
while others are also able to change it (B.B. Zaidan, A.A. Zaidan, A.K. Al-Frajat,
H.A. Jalab, 2010). Protection systems can be classified more specifically into
information encryption (cryptography) and information hiding (steganography).
Figure 2.3: Types of security system [36]
2.2.4 Comparison Between Cryptography and Steganography
With the advent of computers there has been a vast dissemination of information,
some of which needs to be kept private and some of which does not.
Information may be protected in two basic ways: cryptography and
steganography. The methods of cryptography do not conceal the presence of secret
information but render it unintelligible to outsiders through various transformations of the
information that is to be put into secret form, while the methods of steganography conceal
the very existence of the secret information.
The main goal of cryptography is to keep data secure from unauthorized
attackers. The reverse of data encryption is data decryption.
The main goal of steganography is to communicate securely in a completely
undetectable manner and to avoid drawing suspicion to the transmission of hidden
data. It is not to keep others from knowing the hidden information, but to keep
others from thinking that the information even exists. If a steganography method causes
someone to suspect the carrier medium, the method has failed.
Information hiding gives rise to two techniques. One is digital watermarking, the
process of embedding information into a digital signal in a way that is difficult to
remove. The signal may be audio, pictures, video or text files. It is mostly used to
demonstrate intellectual property rights, for example by adding a copyright logo or
text (the author's signature) to multimedia files.
The other is steganography, the art and science of writing hidden messages in such a way
that no one, apart from the sender and intended recipient, suspects the existence of the
message. Since the main use of steganography is to send secure messages between
parties, its aim is to prevent the message from being detected by any other party (Eiji
Kawaguchi and Eason, 2007).
Table 2.1: Comparison between cryptography and steganography
No. | Cryptography | Steganography
1 | The encrypted message can be seen by anyone, but cryptography makes it unintelligible. | Steganography hides the message in another medium so that nobody notices it.
2 | The end result of cryptography is the ciphertext. | The end result of information hiding is the stego-media.
3 | The goal of a secure cryptographic method is to prevent an interceptor from gaining any information about the plaintext from the intercepted ciphertext. | The goal of a secure steganographic method is to prevent an observant intermediary from even obtaining knowledge of the mere presence of the secret data.
4 | Any person is able to detect and modify the encrypted message. | The hidden message is imperceptible to anyone.
5 | Steganography cannot be used to adapt the robustness of a cryptographic system. | Steganography can be used in conjunction with cryptography by hiding an encrypted message.
2.2.5 The Fundamental Requirements When Hiding Data in Data
The requirements of any data hiding system can be categorized into security,
capacity and robustness (Ingemar J. Cox et al., 1996). These factors trade off against
one another: improving one tends to degrade the others, creating the so-called
data hiding dilemma (Arup Kumar Bhaumik, June 2009).
2.2.6 Methods of Hiding Data
There are essentially three ways to hide data: injection, substitution, and
generation.
1. Injection
Finds areas in a file that will be ignored and puts your covert message in those
areas. For example, most files contain an EOF or end-of-file marker. When playing an
audio file, the application that is playing the file will stop playing when it reaches the
EOF because it thinks it is the end of the file. Data placed after the marker is
therefore ignored by the application.
2. Substitution
Finds insignificant information in the host file and replaces it with your covert
data. For example, in sound files each unit of sound you hear is composed of several
bytes. If you modify the Least Significant Bit (LSB), it will modify the sound, but so
slightly that the human ear cannot tell the difference.
3. Generation
Creates a new overt file based on the information that is contained in the covert
message. For example, one generation technique takes a covert file and produces a
picture that resembles a modern painting. This is done by substituting a patch of green
for every 0 and a patch of yellow for every 1. The picture is created solely
from the bit sequence of the covert file (Eric Cole, 2003).
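The substitution method can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis: the function names `embed_lsb` and `extract_lsb` are hypothetical, and the cover is assumed to be a plain sequence of bytes standing in for sound samples. Each secret bit overwrites only the least significant bit of one cover byte.

```python
def embed_lsb(cover: bytes, secret: bytes) -> bytes:
    """Hide `secret` in the least significant bit of each cover byte."""
    # Split every secret byte into its 8 bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for the secret")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)


def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Read back `n_bytes` of hidden data from the low bits of `stego`."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(100, 180))   # assumed stand-in for 80 sound samples
stego = embed_lsb(cover, b"hi")
recovered = extract_lsb(stego, 2)
```

Because only the lowest bit changes, no cover byte differs from its original value by more than 1, which is why the change is hard to hear or see.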
2.2.7 Steganography in Digital Age
Steganography is the art and science of invisible communication. This is
accomplished through hiding information in other information, thus hiding the existence
of the communicated information. The word steganography is derived from the Greek
words “stegos” meaning “cover” and “grafia” meaning “writing”, defining it as
“covered writing”. In image steganography the information is hidden exclusively in
images.
The idea and practice of hiding information has a long history. In the Histories, the
Greek historian Herodotus writes of a nobleman, Histiaeus, who needed to communicate
with his son-in-law in Greece. He shaved the head of one of his most trusted slaves and
tattooed the message onto the slave’s scalp.
When the slave’s hair grew back the slave was dispatched with the hidden
message. In the Second World War the Microdot technique was developed by the
Germans. Information, especially photographs, was reduced in size until it was the size
of a typed period. Extremely difficult to detect, a normal cover message was sent over
an insecure channel with one of the periods on the paper containing hidden information.
“Today steganography has come into its own on the Internet. Used for transmitting data as well
as for hiding trademarks in images and music (called digital watermarking), electronic steganography
may ironically be one of the last bastions of information privacy in our world today”. (Eric Cole,
2003)
Steganography has traditionally been used by the military and criminal classes.
One intriguing trend today is the increasing use of steganography by all sectors,
and research is still underway to develop this technology and use it to protect
information in all fields.
2.2.8 The Steganography Approaches
When an encrypted message is hidden using steganography, the resulting stego-image can be
transmitted without revealing that secret information is being exchanged. Furthermore,
even if an attacker were to defeat the steganographic technique and detect the message
in the stego-object, he would still require the cryptographic decoding key to decipher
the encrypted message (Zaidan, Zaidan, 2009). On this basis, steganography
approaches can be divided into three types:
1. Pure steganography
2. Secret key steganography
3. Public key steganography
1. Pure Steganography
This technique uses steganography alone, without combining it
with other methods. It works by hiding information within a cover
carrier.
2. Secret Key Steganography
Secret key steganography combines the secret key
cryptography technique with the steganography approach. The idea is to
encrypt the secret message or data using a secret key approach and then hide the encrypted
data within a cover carrier.
3. Public Key Steganography
The last type of steganography combines the public key cryptography
approach with the steganography approach. The idea is to encrypt the secret
data using the public key approach and then hide the encrypted data within a cover
carrier. A further direction is to keep the encrypted data small so that it can be hidden
within a multimedia cover.
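The encrypt-then-hide idea behind secret key steganography can be sketched as follows. Everything here is an assumption made for illustration: `keystream`, `xor_crypt`, `hide`, `reveal` and `MARKER` are hypothetical names, the XOR keystream is a toy stand-in for a real secret key cipher and is not secure, and the hiding step simply places the ciphertext after a marker that the cover is assumed not to contain.

```python
import hashlib

MARKER = b"\x00EOF\x00"  # hypothetical end-of-data marker, assumed absent from the cover


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of the key plus a counter. Not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XOR with the keystream (the same call does both)."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


def hide(cover: bytes, secret: bytes, key: bytes) -> bytes:
    """Step 1: encrypt with the shared key. Step 2: hide the ciphertext in the cover."""
    return cover + MARKER + xor_crypt(key, secret)


def reveal(stego: bytes, key: bytes) -> bytes:
    """Locate the hidden ciphertext after the marker and decrypt it."""
    _, _, ciphertext = stego.partition(MARKER)
    return xor_crypt(key, ciphertext)
```

Calling `reveal(hide(cover, secret, key), key)` returns the secret, while a party who finds the hidden bytes but lacks the key only sees ciphertext. In the public key variant, the encryption step would use the recipient's public key instead of a shared one.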
2.2.9 Types Of Steganography
Over the years, people have categorized steganography techniques in different
ways. One important classification scheme breaks steganography down into the
following three groups:
1. Insertion-based
2. Algorithmic-based
3. Grammar-based
This scheme focuses on how data is hidden. Note that as new techniques have been
developed, they do not clearly map into this scheme.
Figure 2.4: Types of Steganography
1. Insertion-Based
Insertion-based steganography techniques work by inserting blocks of data into a
host file. Using an insertion-based technique, data is inserted at the same point in every
file. This type of technique works by finding places in a file that can be changed,
without having any significant effect on the host file.
Once these redundant areas are identified, the data to be hidden can be broken
up and inserted in them and will be fairly hard to detect. Depending on the file format,
this data can be hidden between headers, in color tables, in image data, or in several
other fields.
A very common way to hide data is to insert it into the Least Significant Bits
(LSB) of an 8-bit or 16-bit file—for example, a 16-bit sound file. With sound files, one
can change the first and second LSB of each 16-bit group without having a large impact
on the quality of the sound. Because data is always being inserted at the same point for
each file, this can be categorized as an insertion steganography technique.
2. Algorithmic-Based
Algorithmic-based steganography techniques use some sort of computer
algorithm to designate where in a file data should be hidden. Because this category of
technique doesn’t always insert data in the same spot in each file, it is possible that the
process will degrade the quality of the file. If someone compared the original file to the
one where data is hidden, that person might be able to see or hear a change in the file.
This category of techniques has to be examined carefully to ascertain whether a
technique is detectable. Remember that one of the goals of stego is to make sure nobody
can detect that data is hidden in a file. If you do not create an algorithm and seed
number that place the data in nonessential locations, the hidden data could completely
obliterate the original image file or result in an image that looks very unusual. For
example, if you hide data in an image file, you must provide a number to seed the
steganographic technique. This number could be either a random number or the first five
bytes of the file. The algorithmic technique would take the seed value and use it to
determine where to place the secret data throughout the file.
The algorithm could be very complex or as simple as this: If the first digit is 1,
insert the first bit at location x; if the first digit is 2, insert the first bit at location y; and
so on. If careful thought is not given to the algorithm that is used, it could result in a
disastrous output file.
3. Grammar-Based
Both the insertion and algorithmic techniques take the secret message and
somehow embed it in a host file. Grammar-based steganography techniques require no
host file in which to hide a message because they generate their own host file.
This class of technique uses hidden data to generate an output file based on a
predefined grammar. In fact, the output file produced looks just like the predefined
grammar.
This approach could be used to hide data from automatic scanning programs that
use statistical patterns to identify data. These programs scan data looking for anything
unusual. Such a program can scan for English type text, and anything that fits the profile
would not be flagged by the scanning program. (Eric Cole, 2003)
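The role of the seed in the algorithmic-based category can be sketched in Python. This is an illustrative assumption rather than a technique taken from the cited sources: the names `positions`, `embed` and `extract` are hypothetical, and a pseudo-random generator seeded with a shared number decides which cover bytes carry the secret bits.

```python
import random


def positions(seed: int, cover_len: int, n_bits: int) -> list[int]:
    """Derive distinct embedding positions from the shared seed."""
    # The same seed always yields the same positions, so sender and
    # receiver can agree on them without transmitting the positions.
    return random.Random(seed).sample(range(cover_len), n_bits)


def embed(cover: bytes, bits: list[int], seed: int) -> bytes:
    """Write each secret bit into the low bit of a seed-selected cover byte."""
    stego = bytearray(cover)
    for pos, bit in zip(positions(seed, len(cover), len(bits)), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return bytes(stego)


def extract(stego: bytes, n_bits: int, seed: int) -> list[int]:
    """Regenerate the positions from the seed and read the low bits back."""
    return [stego[pos] & 1 for pos in positions(seed, len(stego), n_bits)]
```

Unlike the insertion-based category, the data does not land at the same point in every file: a different seed scatters the bits over different locations, and a receiver without the seed does not know where to look.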
2.2.10 Different Types of Media That Hide Data in Steganographic Techniques
Steganography uses different kinds of media to hide data.
1) Text Steganography:
This technique hides the data within a text file. It is a difficult technique, because
hiding data requires redundant data in the cover, and such redundancy is
scarce in text files (Neha Rani, July 2013).
2) Image Steganography:
It is one of the most commonly used techniques because of the limitations of the
Human Visual System (HVS). The human eye cannot distinguish small differences across a vast range
of colors, and so it will not notice an insignificant change in the quality of an
image that results from steganography.
3) Audio Steganography:
This technique transmits hidden data within an audio signal. It is a difficult form
of steganography, because it is very hard to embed secret data within digital sound.
4) Video Steganography:
This technique hides secret data inside a video file. The addition of secret
data to a video file is not recognizable by the human eye, as the change in a pixel's color
is not easily detectable.
Figure 2.5: Steganographic media techniques
2.2.11 Techniques of Steganography on Network
A network security system depends on layers of protection and consists of
multiple components such as network monitoring, security software and computer
hardware. The components work together to increase the security and the integrity of the
computer network. On a computer network, stego techniques can be used to hide files in
traffic (Rupali Gawade, 2014).
[Figure 2.5 labels: Steganography; Image, Video, Text, Audio; Plain text, Webpage text; JavaScript, CSS, HTML, XML]
When a simulated connection uses port 80 traffic, which carries the
Hypertext Transfer Protocol (HTTP), the message might pass without raising anyone’s
suspicion.
2.2.12 Network Stego Techniques
There are four techniques used on the network, each with a different level
of sophistication and a different approach to hiding data (Eric Cole, 2003):
1. Hiding in an attachment
2. Hiding in network headers
3. Hiding in a transmission
4. Hiding in an overt protocol
Figure 2.6: Techniques of Steganography on Network
1. Hiding in an Attachment
This is the simplest way of using a network to transport a stego file from one party to
another. A secret message is hidden in a file, and this
file, which contains the hidden data, is attached to some other form of network traffic.
There are three common ways to send these secret messages: by email, by a file transfer
protocol such as FTP, or by posting the file on a web site.
2. Hiding Data in Network Headers
Understanding this technique requires some knowledge of networking and of the Transmission Control
Protocol/Internet Protocol (TCP/IP) suite. TCP/IP
actually contains four major communication protocols: IP, TCP, the User
Datagram Protocol (UDP) and the Internet Control Message Protocol (ICMP). These protocols
run on both the sending computer and the receiving computer to standardize
communications. The TCP protocol on the sending computer communicates with the TCP
protocol on the receiving computer, and the IP protocol on the sender communicates with the IP
protocol on the receiver. This is what makes protocol-header-based stego possible:
every packet that goes across the Internet must contain these headers, so data can easily be embedded
in their unneeded portions and the hidden data transmitted, for example with FTP.
3. Hiding in a Transmission
This builds on hiding in an attachment by using one program to hide the
data and another program to transfer the file. For example, S-Tools can be
used to hide a secret message in a photo, and a separate e-mail program is then used to
attach the photo and send it.
4. Hiding in an Overt Protocol
This technique is called data camouflaging, because it makes data look like
something else. It takes data, puts it in normal network traffic, and
modifies the data in such a way that it looks like the overt protocol. Most networks carry
large amounts of HTTP or web traffic, so data can be sent over port 80 and
will look like web traffic. The problem in this category is that, if someone
examined the payload, it would not look like normal web traffic, which usually contains
HTML. On the other hand, if symbols such as < > </> are added to the data, the traffic
would look like web traffic and would probably slip past the casual observer.
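The data camouflaging idea can be sketched in Python. This is an illustrative assumption, not a method from the cited sources: the names `camouflage` and `uncover` are hypothetical, the payload is Base64-encoded, and it is placed inside an HTML comment so that the body resembles an ordinary web page.

```python
import base64
import re


def camouflage(payload: bytes) -> str:
    """Wrap the payload in markup so the traffic body resembles an HTML page."""
    encoded = base64.b64encode(payload).decode("ascii")
    return (
        "<html><head><title>Status</title></head>"
        f"<body><p>Welcome</p><!-- {encoded} --></body></html>"
    )


def uncover(page: str) -> bytes:
    """Pull the Base64 text back out of the comment and decode it."""
    match = re.search(r"<!-- ([A-Za-z0-9+/=]+) -->", page)
    if match is None:
        raise ValueError("no hidden payload found")
    return base64.b64decode(match.group(1))
```

A sketch like this only defeats the casual observer mentioned above: anyone who inspects the page source will see an unusual comment, so it offers camouflage rather than strong concealment.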
2.2.13 HTML Characteristics
HTML is the language most widely used on the Internet. Hidden data can be added to it
without having any effect on the displayed content, and its wide presence on the Internet gives it the
ability to cover all kinds of data. For example, the case of the letters in opening and closing tags can be
switched to all upper case or all lower case according to whether the secret message bit is ‘0’ or
‘1’ (K. F. Rafat, M. Sher, December 2012).
The factors that promote the use of text steganography in HTML documents are that web
pages exist in huge numbers, so that detecting which one contains
hidden information is next to impossible, and that the order of tags used for formatting the
appearance of a web page does not matter.
HTML tags are enclosed in angle brackets and generally appear in pairs,
referred to as the start tag and the end tag. HTML tags are case insensitive and permit
reordering in a variety of ways. This property of HTML tags is exploited for
hiding bits of secret data. For example,
the HTML tag “<CAPTION> some text </CAPTION>” may represent the secret
bit ‘0’ whereas the tag “<caption> some text </caption>” may represent the secret bit
‘1’.
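The case-based idea described in this section can be sketched in Python. The sketch rests on several assumptions that are not taken from the cited work: the names `embed_bits` and `extract_bits` are hypothetical, the page is assumed to be well nested with every start tag having an end tag, upper case is arbitrarily mapped to ‘0’ and lower case to ‘1’, and the receiver is assumed to know how many bits were hidden.

```python
import re

# Matches a start or end tag: optional slash, tag name, then any attributes.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def embed_bits(html: str, bits: str) -> str:
    """Set the case of each tag pair from one secret bit (0=upper, 1=lower)."""
    queue = list(bits)
    stack = []  # case chosen for each element that is currently open

    def rewrite(match: re.Match) -> str:
        closing, name, rest = match.groups()
        if closing:
            upper = stack.pop() if stack else False  # mirror the start tag
        else:
            upper = (queue.pop(0) == "0") if queue else False
            stack.append(upper)
        name = name.upper() if upper else name.lower()
        return f"<{closing}{name}{rest}>"

    result = TAG.sub(rewrite, html)
    if queue:
        raise ValueError("not enough tags to carry all the bits")
    return result


def extract_bits(html: str, n_bits: int) -> str:
    """Read one bit from the case of each start tag, in document order."""
    names = [m.group(2) for m in TAG.finditer(html) if not m.group(1)]
    return "".join("0" if name.isupper() else "1" for name in names[:n_bits])
```

For instance, hiding the bits `0110` in `<html><body><caption>some text</caption><p>row</p></body></html>` gives `<HTML><body><caption>some text</caption><P>row</P></body></HTML>`, which a browser treats as the same page because tag names are case insensitive.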
2.2.14 Data Steganography Techniques on HTML Document
To hide data in an HTML document, one of the following techniques can be
chosen (Neha Rani, 2013).
1) Selectively hiding:
This technique requires a large amount of plain text, in which the secret characters are
hidden at specific locations within the words. The secret text is
extracted by concatenating those characters.
<caption> <center> <cite>
2) HTML web pages:
Because the names of HTML tags and attributes are case insensitive, this technique uses
letter case to hide the text. The original text can then be retrieved by reading the case
of the same characters (Sandipan Dey, 2010).
3) Hiding characters using whitespace:
This technique uses the number of whitespace characters between words: a 0 is
represented by fewer whitespaces and a 1 by more whitespaces.
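The whitespace technique can be sketched in Python under a simple assumed convention that is not specified in the cited work: one space between two words encodes a 0 and two spaces encode a 1. The names `embed` and `extract` are hypothetical, and the receiver is assumed to know the number of hidden bits.

```python
import re


def embed(text: str, bits: str) -> str:
    """Encode each bit in the gap after a word: one space = 0, two spaces = 1."""
    words = text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("not enough gaps between words for the bits")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        out.append(gap + word)
    return "".join(out)


def extract(text: str, n_bits: int) -> str:
    """Measure each run of spaces and turn it back into a bit."""
    gaps = re.findall(r" +", text.strip())
    return "".join("1" if len(gap) > 1 else "0" for gap in gaps[:n_bits])


stego = embed("the quick brown fox jumps", "101")
hidden = extract(stego, 3)
```

In a rendered web page, runs of ordinary spaces are normally collapsed into one, so the extra spaces would be visible only in the page source.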
2.2.15 Genetic Algorithms Are Part of Evolutionary Algorithms
In the 1950s and the 1960s, several computer scientists independently studied
evolutionary systems with the idea that evolution could be used as an optimization tool
for engineering problems. The idea in all these systems was to evolve a population of
candidate solutions to a given problem, using operators inspired by natural genetic
variation and natural selection.
The field of evolution strategies has remained an active area of research,
mostly developing independently from the field of genetic algorithms. Fogel, Owens, and
Walsh (1966) developed
"evolutionary programming," a technique in which candidate solutions to given tasks
were represented as finite−state machines, which were evolved by randomly mutating
their state−transition diagrams and selecting the fittest. Evolutionary computing is a
rapidly growing area of artificial intelligence.
Evolutionary Algorithms (EAs) are population-based metaheuristic optimization
algorithms that use biology-inspired mechanisms and the theory of survival of the fittest in
order to refine a set of solutions iteratively. Genetic algorithms (GAs) are a subclass of EAs in which the
elements of the search space are binary strings or arrays of other element types.
GAs are computer-based search techniques patterned after the genetic
mechanisms of biological organisms that have adapted and flourished in changing,
highly competitive environments.
The last decade has witnessed many exciting advances in the use of genetic
algorithms to solve optimization problems in process control systems. GAs can
solve hard optimization problems quickly, reliably and accurately.
As the complexity of real-time controllers increases, GA applications
have grown in more than equal measure. Figure 2.7 outlines the place of
natural techniques among other well-known search procedures (S.N. Sivanandam,
S.N. Deepa, 2007).
Figure 2.7: The block situation of natural search techniques [4]
2.2.16 History of Genetic Algorithms
Genetic algorithms (GAs) were invented by John Holland in the
1960s and were developed by Holland and his students and colleagues at the University
of Michigan in the 1960s and the 1970s. In contrast with evolution strategies and
evolutionary programming, Holland's original goal was not to design algorithms to
solve specific problems, but rather to formally study the phenomenon of adaptation as it
occurs in nature and to develop ways in which the mechanisms of natural adaptation
might be imported into computer systems.
Holland's GA is a method for moving from one population of "chromosomes"
(e.g. strings of ones and zeros, or "bits") to a new population by using a kind of "natural
selection" together with the genetics−inspired operators of crossover, mutation, and
inversion.
Each chromosome consists of "genes" (e.g. bits), each gene being an instance of
a particular "allele" (e.g., 0 or 1). The selection operator chooses those chromosomes in
the population that will be allowed to reproduce, and on average the fitter chromosomes
produce more offspring than the less fit ones.
Crossover exchanges subparts of two chromosomes, roughly mimicking
biological recombination between two single−chromosome ("haploid") organisms;
mutation randomly changes the allele values of some locations in the chromosome; and
inversion reverses the order of a contiguous section of the chromosome, thus
rearranging the order in which genes are arrayed. Holland’s introduction of a
population−based algorithm with crossover, inversion, and mutation was a major
innovation (Mitchell, Melanie, 1998).
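Of the operators named above, inversion is the least familiar, so a minimal Python sketch is given here. It assumes a chromosome represented as a list of bits; the function names `invert_section` and `invert` are hypothetical and are not taken from Holland's or Mitchell's text.

```python
import random


def invert_section(chromosome: list[int], start: int, end: int) -> list[int]:
    """Reverse the genes in chromosome[start:end]; the rest stay in place."""
    return chromosome[:start] + chromosome[start:end][::-1] + chromosome[end:]


def invert(chromosome: list[int], rng: random.Random) -> list[int]:
    """Apply inversion to a randomly chosen contiguous section."""
    start, end = sorted(rng.sample(range(len(chromosome) + 1), 2))
    return invert_section(chromosome, start, end)


parent = [1, 0, 1, 1, 0, 0, 1, 0]
child = invert_section(parent, 2, 6)   # reverses the genes at indexes 2 to 5
```

Inversion changes only the order of the genes: the child contains exactly the same alleles as the parent, which distinguishes it from mutation.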
2.2.17 Genetic Algorithm Techniques
Encoding techniques in genetic algorithms are problem specific: they transform a
solution to the problem into a chromosome. Various encoding techniques are used in
genetic algorithms. Binary encoding, permutation encoding, value encoding and tree
encoding are shown in Figure 2.8 (Anit Kumar, 2013). The most common form of
encoding is binary encoding; it gives many possible chromosomes.
Binary encoding is often not natural for many problems, and sometimes
corrections must be made after crossover and mutation. Permutation encoding is best
suited to ordering or queuing problems. In
permutation encoding, each chromosome is a series of numbers that represents a sequence.
Value encoding is a technique in which each chromosome is a series of
values; it is used where more complex values are required. It is often necessary to
develop new crossover and mutation operators specific to the problem.
Tree encoding is used mainly for evolving programs or expressions in
genetic programming, where crossover and mutation can be done relatively easily.
Chromosomes with binary encoding
Chromosome A: 101100101100101011100101
Chromosome B: 111111100000110000011111
Chromosomes with permutation encoding
Chromosome A: 1 5 3 2 6 4 7 9 8
Chromosome B: 8 5 6 7 2 3 1 4 9
Chromosomes with value encoding
Chromosome A: 1.2324 5.3243 0.4556 2.3293
Chromosome B: ABDJEIFJDHDIERJFDLDFL
Chromosome C: (north), (south), (east), (west)
Chromosomes with tree encoding
Chromosome A: y=(x/(9+y))
Chromosome B: ( do_until step wall )
Figure 2.8: A Genetic Algorithm’s Techniques [34]
2.2.18 Generating Random Population
Genetic algorithms (GAs) are mainly inspired by the famous Darwinian theory
of survival of the fittest. They are fundamentally based on a group of solutions represented
by chromosomes, called a population.
A group of solutions is taken from one population and used to create a new
population, in the hope that the new population of chromosomes will be better
than the old one. Solutions are selected to form new solutions according to their fitness.
This process is repeated until a termination condition is satisfied. The main
steps of a genetic algorithm are as follows (Mitchell Melanie, 1999):
Step 1: Generation of Random Population
The initial population is a randomly generated group of solutions. Each individual
has its own genetic representation, and a generation of individuals is maintained at
each point in the search process.
It is important that the initial population contains a wide variety of individuals,
because they learn from each other. Initial diversity comes from the configuration of the
population and from uniform random generation. It does not depend on local optimization
methods, and a lack of this diversity leads to suboptimal solutions.
Step 2: Evaluation of Fitness
The fitness function measures the objective that the genetic process optimizes.
Each solution is evaluated to decide whether it will
contribute to the solutions of the next generation. During evaluation, each
chromosome is assigned a fitness value that depends on how close it comes to
solving the problem. The fitness function must be designed carefully, because it determines
which individuals reproduce and create the next generation of the population.
Step 3: New Population
It is composed of the following processes: selection, crossover, mutation and
acceptance (called elitism).
A) Selection
Two parent chromosomes are selected from the population according to their fitness: the better the fitness of a parent, the greater its chance of being selected. The evaluation function therefore controls which individuals survive or reproduce in the coming generation. Chromosomes with a greater fitness value are more likely to produce offspring.
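Fitness-based selection can be illustrated with a short sketch. It assumes roulette-wheel (fitness-proportionate) selection and non-negative fitness values, which is only one of several possible schemes. The function names and the sample population are hypothetical and are not taken from the cited source.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Return one individual, chosen with probability proportional to fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # Assumption: with no positive fitness, fall back to a uniform choice.
        return rng.choice(population)
    pick = rng.random() * total  # a point on the "wheel" in [0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def select_parents(population, fitnesses, rng):
    """Select two parents; fitter chromosomes are more likely to be picked."""
    return (roulette_select(population, fitnesses, rng),
            roulette_select(population, fitnesses, rng))


rng = random.Random(42)
population = ["0011", "0111", "1111"]
fitnesses = [2, 3, 4]  # hypothetical fitness: number of 1 bits
parent_a, parent_b = select_parents(population, fitnesses, rng)
```

An individual with zero fitness is never picked in this sketch, while an individual with nine times the fitness of another is picked roughly nine times as often.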
B) Crossover
Offspring are the product of parent chromosomes. Crossover forms a new chromosome by combining portions of two good parent chromosomes, so each offspring consists of a combination of its parents' genes.
A crossover point in the parent chromosomes is chosen at random. The portions of the two chromosomes on either side of that point are then swapped to form two new chromosomes. There are many types of crossover; the typical types for binary representation are one-point and two-point crossover. In almost all types of crossover, two strings are picked from the mating pool at random and some portions are exchanged between them to create two new strings.
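The two typical binary crossover operators can be sketched as follows. The sketch assumes chromosomes stored as Python lists of equal length; the function names are illustrative only.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length parents at one random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    point = rng.randint(1, len(parent_a) - 1)  # never cut at the very ends
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the middle segment between two distinct random cut points."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 3:
        raise ValueError("parents must have equal length of at least 3")
    first, second = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:first] + parent_b[first:second] + parent_a[second:]
    child_b = parent_b[:first] + parent_a[first:second] + parent_b[second:]
    return child_a, child_b


rng = random.Random(7)
children = one_point_crossover([0] * 8, [1] * 8, rng)
```

Both operators return two children, and at every position the pair of children carries exactly the pair of genes that the parents carried there.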
C) Mutation
Mutation is an essential operation in GA; it is the most basic way to modify a solution for the next generation. Mutation takes place by making a slight random change to the value of an allele. It can improve the general performance of chromosomes and protects the search process from premature convergence. It also maintains diversity in the population.
The mutation point is chosen at random and the allele at that point is changed. Not all alleles are mutated; whether an allele is mutated depends on the mutation probability. The mutation operation alters the strings in the hope of creating a better string. Since this operation is stochastic, an improvement is not guaranteed.
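For a binary chromosome, mutation can be sketched as an independent bit flip per allele. This assumes alleles are 0 or 1 and that every allele shares the same mutation probability; other encodings need a different operator.

```python
import random


def mutate(chromosome, mutation_probability, rng):
    """Return a copy in which each bit is flipped with the given probability."""
    return [1 - gene if rng.random() < mutation_probability else gene
            for gene in chromosome]


rng = random.Random(1)
original = [0, 1, 1, 0, 1, 0, 0, 1]
mutated = mutate(original, 0.1, rng)  # on average 0.8 of the 8 bits change
```

With a probability of 0 the chromosome is returned unchanged, and with a probability of 1 every bit is flipped, which marks the two extremes of the operator.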
D) Elitism
Elitism is the mechanism that preserves a number of the best solutions in GA. It can be done in many different ways. In the steady-state case it can be introduced through a simple mechanism: genetic operators are used to create two offspring, which are then compared with their parents, and the best two of the four solutions are kept for the next generation. Elitism can also be applied globally, in the generational sense: once the offspring population is formed, it is combined with the current population and the next generation is selected from the best solutions.
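Both forms of elitism described above reduce to keeping the best solutions from a combined pool. The following minimal sketch assumes a fitness function that is to be maximized; the function name is hypothetical.

```python
def next_generation(current, offspring, fitness, size):
    """Combine parents and offspring, then keep the `size` fittest solutions."""
    combined = current + offspring
    return sorted(combined, key=fitness, reverse=True)[:size]


# Generational elitism: the whole current population competes with all offspring.
current = [[0, 0, 0], [1, 0, 0]]
offspring = [[1, 1, 1], [0, 1, 0]]
survivors = next_generation(current, offspring, sum, 2)

# Steady-state elitism: two parents and their two offspring, best two of four.
best_two = next_generation([[1, 1, 0], [0, 0, 0]], [[1, 0, 0], [1, 1, 1]], sum, 2)
```

Because the parents stay in the pool, the best fitness in the population can never decrease from one generation to the next.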
Step 4: Replacement
The new population replaces the old one and is used for the next run of the Genetic Algorithm.
Step 5: Testing and Termination
The solutions are examined. If the end condition is satisfied, that is, the required fitness value of the best solution is reached or the maximum number of generations is reached, the process is terminated and the best solution in the current population is returned.
Step 6: Looping of GA
Genetic algorithm performance is affected by the crossover and mutation operators. If the new generation produces an output that is equal or close to the required answer, the problem is solved. Otherwise, the same process is repeated for the next generation, as it was for their parents, until a solution is found.
Figure 2.9: The steps of a general Genetic Algorithm
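The six steps can be put together in a compact sketch. The problem solved here is a toy one chosen only for illustration (maximizing the number of 1 bits in a binary string), and all parameter values, the weighting used for selection, and the function name are assumptions rather than settings from this thesis.

```python
import random


def run_ga(chromosome_length=20, population_size=30, max_generations=200,
           crossover_probability=0.9, mutation_probability=0.02, seed=1):
    rng = random.Random(seed)
    fitness = sum  # toy fitness: number of 1 bits in the chromosome

    # Step 1: generate a random initial population.
    population = [[rng.randint(0, 1) for _ in range(chromosome_length)]
                  for _ in range(population_size)]

    for generation in range(max_generations):
        # Step 2: evaluate the fitness of every chromosome.
        scores = [fitness(c) for c in population]
        best = max(population, key=fitness)

        # Step 5: stop when the required fitness is reached.
        if fitness(best) == chromosome_length:
            return best, generation

        # Step 3: build the new population, keeping the best solution (elitism).
        offspring = [best[:]]
        weights = [s + 1 for s in scores]  # +1 keeps every weight positive
        while len(offspring) < population_size:
            a, b = rng.choices(population, weights=weights, k=2)  # selection
            if rng.random() < crossover_probability:              # crossover
                point = rng.randint(1, chromosome_length - 1)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            for child in (a, b):                                  # mutation
                offspring.append([1 - g if rng.random() < mutation_probability
                                  else g for g in child])

        # Step 4: the new population replaces the old one.
        population = offspring[:population_size]
        # Step 6: loop back and repeat for the next generation.

    return max(population, key=fitness), max_generations


best_solution, generations_used = run_ga()
```

The function returns the best chromosome found together with the number of generations used, so a caller can tell whether the run stopped on the fitness target or on the generation limit.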
2.3 Previous Studies
2.3.1 Previous Analysis of Methods in Steganography
Many methods have been proposed. One of them is the generalization of Pixel Value Differencing (PVD) by Jen-Chang Liu and Ming-Hong Shih (2008), a method for data hiding in gray-level images. The study also compares steganography with cryptography.
Jen-Chang Liu [18] increased the capacity by proposing two extensions of the PVD method: the first approach is block-based and the second is Haar-based. In the first, the cover image is divided into square blocks of n pixels and nonoverlapping horizontal blocks. In the second, the cover image is decomposed by applying the 2-D integer Haar wavelet.
It was found that the proposed method significantly increased the data hiding capacity, but at the expense of the quality of the stego-image. The Retention of Secrecy (RS) diagram and the difference histogram confirmed the security of the proposed methods.
Souvik Bhattacharyya et al. [53] noted that concerns about the confidentiality of information on the internet have recently increased, owing to illegal access to information and to the increasingly hostile environment of the internet. Steganography has therefore become a wide field of research that tries to give hidden data more immunity.
They proposed a new method of hiding information in text by inserting an extra blank space between words of odd or even size according to the embedding sequence. In some cases the existing blank spaces between the words of the original cover text may also be used to map each two bits of the embedding sequence.
In the proposed system the secret message is first encoded using the proposed encryption algorithm. The encrypted message is then embedded in the cover text using the proposed embedding algorithm to form the stego text.
At the receiver side, the extraction process starts by extracting the encoded form of the message. After extraction, the encrypted message goes through the decryption process, and finally the authenticity of the message is checked through an integrity check.
The results show that the message is transferred more securely than with earlier techniques, with the addition of authenticity checking of the secret information.
Christian Grothoff et al. [8] studied systems that steganographically embed information in the “noise” created by automatic translation of natural language documents. They focus on two problems: generating plausible steganographic texts and avoiding transmission of the original source for stego objects.
The key idea behind translation-based steganography is to hide information in the noise that invariably occurs in natural language translation.
They propose a protocol for covert message transfer in natural language text, for which they have a proof-of-concept implementation.
The protocol assumes that the sender and receiver have previously agreed on a shared secret key. In order to send a message, the sender first needs to obtain an original text in some source language.
The experimental results revealed that different configurations of the system produce translations of varying quality, but even quality degradation is not predictable. Detection is made more difficult by the fact that the translation is transmitted with no reference to the source text.
It was demonstrated that the variations produced by the steganographic encoding are similar to those of various unmodified machine translation systems, showing that it would be impractical for an adversary to establish the existence of a hidden message.
So far, the highest bitrate that their prototype achieved with this steganographic method is about 0.33%.
Jessica Fridrich et al. [19] made a theoretical study analyzing a newly proposed algorithm called Highly Undetectable steGO (HUGO) as part of the Break Our Steganographic System (BOSS) challenge. The aim was to identify the characteristics that make it possible to detect the hidden payload and to get a better picture of the benefit of adaptive steganography with general selection routes.
The study is mainly meant to improve the ability to detect adaptive steganography such as HUGO, which makes its hidden changes in hard-to-model areas of the cover. HUGO is characterized by the preservation of a high-dimensional feature vector, and so it takes into consideration a great number of complex dependencies among neighboring pixels.
HUGO can be applied to different domains and media, although it was designed for pictures in raster format.
In summary, it was not possible to exploit the fact that for HUGO the probability of hiding changes at each pixel can be estimated, so giving the Warden probabilistic information concerning the chosen channel does not seem to be a weakness. Also, steganalysis needs to apply high-dimensional features and scalable machine learning as the level of sophistication of steganographic schemes increases.
Most of the features in this high-dimensional feature vector are uninformative, and preserving them weakens the algorithm. Adding more diverse features, in turn, increases the dimensionality.
Eiji Kawaguchi and Richard O. Eason [12] proposed a new steganography technique that uses an image as the vessel data and embeds secret information in the bit-planes of the vessel. They also compared watermarking and Bit-Plane Complexity Segmentation (BPCS) Steganography in two fundamental ways.
Their experiments with BMP images have shown capacities exceeding 50% of the original image size.
T. Morkel et al. [31] published a paper giving an overview of the different algorithms used for image steganography, to illustrate the security potential of steganography for business and personal use. The discussion is based on a set of criteria that have been identified for image steganography. They found that there is a large selection of approaches to hiding information in images.
All the major image file formats have different methods of hiding messages, each with its own strong and weak points. Where one technique lacks in payload capacity, the other lacks in robustness.
Least Significant Bit (LSB) embedding in both BMP and GIF makes up for this, but both approaches result in suspicious files that increase the probability of detection in the presence of a warden.
Thus, to decide which steganographic algorithm to use, an agent has to decide on the type of application the algorithm is wanted for, and whether some features may be compromised to ensure the security of others.
Hedieh Sajedi and Mansour Jamzad [41] proposed an adaptive steganography technique that can effectively defend against the best-known steganalysis algorithms. Their idea is built on embedding secret data in contourlet coefficients via an iterative embedding procedure that reduces the stego image distortion.
Embedding is done by changing the coefficient values in proportion to the regions in which the coefficients reside, and the hidden data can be retrieved with a zero bit error rate.
The results showed that state-of-the-art steganalysis methods cannot reliably discriminate between clean images and stego images produced by their method. Experiments also illustrated that the cover selection measures improve the undetectability of stego images by the steganalyzer.
Image complexity criteria are very fast measures for selecting a proper cover image from the database, but they are not very precise. In contrast, exact measures are slower but identify the best cover image with respect to the confidential data. The results also indicated that the amount of changes and visual quality measures are reliable criteria for cover selection.
Finally, the results demonstrated that by using cover selection one can embed many more bits in a suitable cover image.
Babloo Saha and Shuchi Sharma [37] made a theoretical study that gives a thorough understanding of the evolution of existing digital image steganography techniques for data hiding in the spatial, transform, and compression domains.
The study covered and integrated recent research work without going into much detail on steganalysis, which is the art and science of defeating steganography.
They presented recent research work in the field of steganography deployed in the spatial, transform, and compression domains of digital images. Transform domain techniques make changes in the frequency coefficients instead of manipulating the image pixels directly, so distortion is kept at a minimum level, which is why they are preferred over spatial domain techniques.
They found that hiding more data results directly in more distortion of the image, so the steganography technique deployed depends on the type of application it is designed for. They also found that steganography can be misused like other technologies.
For instance, terrorists may use this technique for confidential, secure communication, or anti-virus systems can be fooled if viruses are transmitted in this way. It is evident that steganography has numerous useful applications and will remain a point of attraction for researchers.
Cheng-Hsing Yang et al. [38] proposed a novel and efficient steganographic method for hiding data in gray-level images, which embeds a large amount of data and takes human vision into consideration.
The experimental results showed not only that the method has a larger capacity and can pass the detection of programs, but also that the embedded data are totally imperceptible to the human eye, given the higher Peak Signal to Noise Ratio (PSNR) measured in the experiments.
Table 2.2: Summary of studies on steganography
Author | Applications studied | Result
Jen-Chang Liu and Ming-Hong Shih (2008) | Method for data hiding in gray-level images, proposing two extensions of the PVD method. | The data hiding capacity was significantly increased, but at the expense of the quality of the stego-image.
Souvik Bhattacharyya et al. (2010) | Hiding in a text by inserting extra blank space between the words of odd or even size according to the embedding sequence. | More secure transfer of the message compared to earlier techniques, with the addition of authenticity checking of the secret information.
Christian Grothoff et al. (2009) | Studied systems that steganographically embed information in the “noise” created by automatic translation of natural language documents. | Different configurations of the system produce translations of varying quality, but even quality degradation is not predictable.
Jessica Fridrich et al. (2011) | Analyzing a newly proposed algorithm called HUGO. | It was not possible to exploit the fact that for HUGO the probability of hiding changes at each pixel can be estimated, and giving the Warden probabilistic information concerning the chosen channel does not seem to be a weakness.
Eiji Kawaguchi and Richard O. Eason (2007) | Proposing a new steganography technique using the image as the vessel data, with secret information embedded in the bit-planes of the vessel. | The experiments with BMP images have shown capacities exceeding 50% of the original image size.
T. Morkel et al. (2005) | Offering an overview of the different algorithms used for image steganography to illustrate the security potential of steganography for business and personal use. | The choice of algorithm depends on the type of application and on whether some features may be compromised to ensure the security of others.
Hedieh Sajedi and Mansour Jamzad (2010) | Proposing an adaptive steganography technique that can effectively defend against the best-known steganalysis algorithms. | By using cover selection one can embed many more bits in a suitable cover image.
Babloo Saha and Shuchi Sharma (2012) | Giving a thorough understanding of the evolution of existing digital image steganography techniques for data hiding in the spatial, transform, and compression domains. | Hiding more data results directly in more distortion of the image. Steganography can be misused like other technologies.
Cheng-Hsing Yang et al. (2011) | Proposing a novel and efficient steganographic method. | The method has a larger capacity and can pass the detection of programs; the embedded data are totally imperceptible to the human eye, given the higher PSNR measured in experiments.
2.3.2 Previous Survey of Genetic Algorithm Applications
The idea of the Genetic Algorithm (GA) was born in 1975, when it was introduced by Holland. GA is basically an algorithm of mathematical expressions and logic, based on the concept of natural genetics.
Here are some of the theoretical and practical applications in the research literature that use the idea of a genetic algorithm.
Komal R. Hole and Prof. Vijay S et al. [43] presented a theoretical study that gives a brief overview of the canonical genetic algorithm and reviews the tasks of image pre-processing. The main task is to enhance image quality in order to obtain the required image perception. They introduced various approaches based on the genetic algorithm for obtaining an image with good and natural contrast.
The study includes definitions of image enhancement and image segmentation, explains the need for image enhancement, and shows how both image enhancement and image segmentation can be performed using the Genetic Algorithm.
Rajesh Kumar et al. [50] compared conventional image fusion techniques with techniques based on genetic algorithms. The GA-based techniques gave much better results than the traditional ones, and the experimental results show that GA-based image fusion schemes perform better than existing schemes.
Raj Kumar Mohanta [49] reviewed the applications of genetic algorithms to image segmentation. Segmentation is a difficult task in image processing, and subsequent tasks, including object detection, feature extraction and processing, face identification, and classification, depend on the quality of the segmentation process.
The results indicate that genetic algorithms have many advantages in obtaining the optimal solution. They have been shown to be a robust search method over a large search space, which allows for strong performance. The optimal result depends on the chromosome encoding and the genetic operators involved, as well as on the fitness function.
P. Surekha and S. Sumathi [47] proposed a new method to improve digital images in the Discrete Wavelet Transform (DWT) domain. The tradeoff between transparency and robustness is treated as an optimization problem and solved through the application of genetic algorithms.
A series of experiments was performed by varying several GA parameters, such as the number of generations, population size, crossover probability, and mutation probability.
The experimental results show that this approach is secure and robust against attacks such as filtering, additive noise, rotation, scaling, cropping, and Joint Photographic Experts Group (JPEG) compression. Peak Signal to Noise Ratio (PSNR), Mean Square Error (MSE), and computational time are evaluated for a group of images.
Mantas Paulinas and Andrius Ušinskas [23] surveyed the use of GA, which is constantly gaining popularity in image processing. Various tasks, from basic image contrast and level of detail enhancement to complex filter and model parameters, are solved using this paradigm. The algorithm provides an opportunity to perform a robust search without being trapped in local extrema.
Different authors adopt GAs to solve a very wide variety of simple and difficult tasks. Every approach is unique, with different information encoding types and different reproduction and selection schemes. The success of optimization strongly depends on the chosen chromosome encoding scheme, crossover and mutation strategies, and fitness function. For each problem, careful analysis must be done and the correct approach chosen.
K. F. Man et al. [20] presented a theoretical study of the prospects for GA in the field of computing. They argued that genetic algorithms are the most powerful unbiased optimization techniques for sampling a large solution space. Because of this unbiased stochastic sampling, there are large classes of problems that appear to be more amenable to solution by GA than by any other available optimization technique, such as real-time applications and systems on the Internet. Perhaps the most promising areas of application are emerging hybrid AI systems, which use GA together with Neural Networks (NN) and fuzzy logic.
T. R. Gopala Krishnan Nair, Suma V, and Manas S [32] proposed a steganography method that uses a genetic algorithm to protect against Retention of Secrecy (RS) attacks in color images. Applying natural evolution to the stego image by means of a genetic algorithm makes it possible to achieve optimized security and image quality.
RS analysis is one of the strongest steganalysis techniques; it detects the secret message by statistical analysis of pixel values. The objective of this proposal is to establish a highly RS-resistant, secure steganography model. In this method, the pixel values of the stego image are modified by the genetic algorithm so that they retain their statistical characteristics.
As a result, it is difficult to detect the existence of the secret message by RS analysis, and the approach enhances the visual quality of the stego image. Nevertheless, as the length of the secret message increases, the probability that RS analysis detects it also increases.
The study also surveys methods in steganography, such as Least Significant Bit (LSB) replacement, in which the least significant bit of each pixel value is replaced with a bit of the message.
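LSB replacement itself is simple enough to show in a few lines. This minimal sketch assumes the cover is a flat list of 8-bit pixel values and the message is a list of bits; it illustrates plain LSB replacement only, not the GA-based adjustment proposed in [32].

```python
def embed_lsb(pixels, message_bits):
    """Replace the least significant bit of the first pixels with message bits."""
    if len(message_bits) > len(pixels):
        raise ValueError("message is longer than the cover")
    stego = list(pixels)
    for index, bit in enumerate(message_bits):
        stego[index] = (stego[index] & ~1) | bit  # clear the LSB, then set it
    return stego


def extract_lsb(pixels, bit_count):
    """Read the message back from the least significant bits."""
    return [pixel & 1 for pixel in pixels[:bit_count]]


cover = [120, 33, 254, 7]
stego = embed_lsb(cover, [1, 0, 1, 0])  # -> [121, 32, 255, 6]
```

Each pixel changes by at most 1, which is why the change is visually imperceptible, although it alters the pixel statistics that RS analysis examines.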
Shen Wang, Bian Yang and Xiamu Niu [52] presented a new steganography method based on a genetic algorithm. The secret message is embedded in the Least Significant Bit (LSB) of the cover image, and the pixel values of the stego image are then modified by the genetic algorithm to keep their statistical characteristics.
Thus, the existence of the secret message is hard to detect by Retention of Secrecy (RS) analysis. Meanwhile, better visual quality can be achieved.
The experimental results demonstrate the proposed algorithm's effectiveness in resisting steganalysis with better visual quality. The embedding capacity is 90%.
Amrita Khamrui [5] authenticated the image to ensure security against Retention of Secrecy (RS) analysis, which is the most notable steganalysis algorithm. It detects the stego message by statistical analysis of pixel values.
The cover image can be either grayscale or color. The cover and authenticating images are both benchmark images. The proposed techniques obtained a high PSNR along with better image fidelity for various images. The payload may be increased based on the requirement.
A comparative study of Peak Signal to Noise Ratio (PSNR) was made between various techniques. It was found that PSNR lies between approximately 35 and 50 dB, which is satisfactory, as a higher PSNR value indicates better image quality. It was also noticed that PSNR gradually increases across the techniques, which indicates their improvement.
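PSNR is computed from the Mean Square Error (MSE) between the cover and the stego image as PSNR = 10 log10(MAX^2 / MSE), where MAX is the largest possible pixel value. The sketch below assumes 8-bit images (MAX = 255) given as flat lists of pixel values; the function names are illustrative.

```python
import math


def mean_square_error(cover, stego):
    """Average squared difference between corresponding pixels."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("images must be non-empty and of equal size")
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)


def psnr(cover, stego, max_value=255):
    """Peak Signal to Noise Ratio in dB; infinite for identical images."""
    error = mean_square_error(cover, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


# Every pixel differs by exactly 1, so MSE = 1 and PSNR is about 48.13 dB.
quality = psnr([120, 33, 254, 7], [121, 32, 255, 6])
```

A smaller MSE gives a larger PSNR, which is why higher PSNR values are read as better stego image quality.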
Elham Ghasemi et al. [13] proposed a method that embeds the message in Discrete Wavelet Transform (DWT) coefficients based on GA and then applies the Optimal Pixel Adjustment Process (OPAP) algorithm to the resulting embedded image. It introduces a novel steganography technique to increase the capacity and the imperceptibility of the image after embedding.
GA is employed to obtain an optimal mapping function that lessens the error difference between the cover and the stego image, and the block mapping method is used to preserve the local image properties.
OPAP is also applied to increase the hiding capacity of the algorithm in comparison with other systems. However, the computational complexity of the new algorithm is high. The simulation results showed that the capacity and imperceptibility of the image increased simultaneously.
Santi P. Maity and Malay K. Kundu [29] investigated the scope for using GA to optimize data hiding in digital images and proposed two algorithms. The first algorithm is a data hiding method with improved payload capacity intended for covert communication. Decoding reliability improves as the number of GA iterations increases when the set of parameter values is fixed. The algorithm is shown to be secure against stego tests based on higher-order statistics. The second method is an invisible image-in-image communication through a noisy channel, where linear, power-law and parabolic functions are used to modulate the auxiliary messages.
The experimental results show that the parabolic function offers higher visual and statistical invisibility and reasonably good robustness, whereas the linear function offers higher robustness with reasonably good invisibility.
Christine K. Mulunda [9] proposed a secure text steganography algorithm based on the genetic method. A genetic algorithm technique is not prone to visual attacks because of its use of numbers.
This is not the case for the format-based technique, which modifies existing text in order to hide the steganographic text by resizing fonts, inserting spaces or non-displayed characters, and distributing deliberate misspellings throughout the text, among others. The experimental results showed that this approach works, achieving effective optimization, security, and robustness.
Table 2.3(A): Summary of studies on Genetic Algorithm
Author | Applications studied | Result
Komal R. Hole and Prof. Vijay S et al. (2013) | Giving a brief overview of the canonical genetic algorithm and reviewing the tasks of image pre-processing. | The need for image enhancement; the image can be enhanced using the Genetic Algorithm.
Rajesh Kumar et al. (2011) | Comparing conventional image fusion techniques with techniques based on genetic algorithms. | GA-based image fusion schemes perform better than existing schemes.
Raj Kumar Mohanta (2012) | Reviewing the applications of genetic algorithms for image segmentation. | Genetic algorithms have many advantages in obtaining the optimal solution. The optimal result depends on the chromosome encoding and the genetic operators involved.
P. Surekha and S. Sumathi [47] (2011) | Proposing a new method to improve digital images in the Discrete Wavelet Transform (DWT) domain. | Secure and robust against attacks such as filtering, additive noise, rotation, scaling, cropping, and JPEG compression. Peak Signal to Noise Ratio (PSNR), Mean Square Error (MSE), and computational time are evaluated for a group of images.
Table 2.3(B): Summary of studies on Genetic Algorithm
Author | Applications studied | Result
T. R. Gopalakrishnan Nair, Suma V, and Manas S (2012) | Proposing a steganography method using a genetic algorithm to protect against Retention of Secrecy (RS) attacks in color images. | It is difficult to detect the existence of the secret message by RS analysis, and the approach enhances the visual quality of the stego image. As the length of the secret message increases, the probability of its detection by RS analysis also increases.
Shen Wang, Bian Yang and Xiamu Niu (2010) | Presenting a new steganography method based on a genetic algorithm. | Demonstrates the proposed algorithm's effectiveness in resisting steganalysis with better visual quality. The embedding capacity is 90%.
Amrita Khamrui (2014) | Authenticating the image to ensure security against Retention of Secrecy (RS) analysis. | A comparative study of Peak Signal to Noise Ratio (PSNR) was made between various techniques. PSNR lies between approximately 35 and 50 dB, which is satisfactory, as a higher PSNR value indicates better image quality. PSNR gradually increases across the techniques, which indicates their improvement.
Elham Ghasemi et al. (2012) | Proposing a method that embeds the message in DWT coefficients based on GA and then applies the Optimal Pixel Adjustment Process (OPAP) algorithm to the resulting embedded image. | The simulation results showed that the capacity and imperceptibility of the image increased simultaneously.
Santi P. Maity and Malay K. Kundu (2008) | Proposing two algorithms to investigate the scope for using GA to optimize data hiding in digital images. | The parabolic function offers higher visual and statistical invisibility and reasonably good robustness; the linear function offers higher robustness with reasonably good invisibility.
Christine K. Mulunda (2013) | Proposing a secure text steganography algorithm based on the genetic method. | Achieving effective optimization, security, and robustness.
2.3.3 Previous Review to HTML Web Page’s Steganography
There has been a great deal of research in the field of HTML web page steganography. Some of these works are listed below.
L. Polak [22] proposed a method of detecting and removing hidden content that could be transmitted through the HTML code of World Wide Web (WWW) pages. The method includes a procedure that allows monitoring changes in the structure of web pages, and it introduces a measure that describes how the attributes of tags are ordered in Hyper Text Markup Language / EXtensible Markup Language (HTML/XML) documents. The method can be applied to control web pages and to assure that nobody can exploit them as stego channels (L. Polak, 2010).
Chintan Dhanani [6] proposed hiding hexadecimal data in the HTML document to overcome the problems of the limited number of hiding places and the increase in document size, so that one hiding place can hide 4 bits. The paper also includes a survey that classifies steganography techniques and reviews techniques already implemented to hide information in web documents.
The author observed that data hidden in a web document is less suspicious than in other carriers, because HTML web pages are now a routine part of everyone’s life and an HTML document contains a considerable number of tags, attributes, and other elements in which data can be hidden.
Mohit et al. [45] proposed a technique that uses HTML tags and their attributes to hide the secret message. In this method, the message is hidden by changing the order of attributes, as the ordering of attributes has no impact on the appearance of the HTML document. The key file is essentially a collection of key combinations stored in the form of rows and columns.
These combinations are generated by thorough scanning of HTML documents. The attribute combinations used in the HTML tags are utilized to generate a key file.
2.3.4 Comparative Studies with Other Related Methods
In recent years, many good steganography methods have been proposed. HTML steganography is one part of data hiding, which uses an HTML web document as a cover.
Using HTML has some benefits: a large number of pages are available in which to hide data, and decoding of that data by an unauthorized user is very hard. HTML steganography methods include the use of null spaces, attributes on tags, attribute value enclosures, and the case of characters; the performance of an algorithm is evaluated by its Largest Embedding Capacity (LEC) and its level of security (Dhammjyoti V. Dhawase, 2014).
Here is a comparison between some methods that use a web page as a cover. The comparison is made according to experimental results such as strong anti-testing capability, strong security capability, good imperceptibility, and larger embedding capacity.
Mohit Garg et al. [45] proposed a novel approach to text steganography that uses HTML tags and attributes to hide secret messages. An HTML tag can contain numerous attributes, and the order of attributes in the tag does not affect the output of the web page, so data can be hidden using attribute order.
Advantage: This method is a high-security technique for hiding messages.
Disadvantage: This method can be used only when a small amount of data needs to be concealed.
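The principle of hiding data in attribute order can be sketched with one bit per tag. This is a deliberate simplification and not the key-file scheme of [45]: it assumes a tag with exactly two attributes, takes alphabetical order to mean 0 and reverse order to mean 1, and assumes attribute values contain no quotes or equals signs. All names are hypothetical.

```python
import re

ATTRIBUTE_NAME = re.compile(r'([A-Za-z][A-Za-z0-9-]*)="')


def embed_bit(tag_name, attributes, bit):
    """Write a tag whose two attributes are ordered to encode one bit."""
    if len(attributes) != 2:
        raise ValueError("this sketch needs exactly two attributes")
    ordered = sorted(attributes.items())  # alphabetical order encodes 0
    if bit == 1:
        ordered.reverse()                 # reverse order encodes 1
    rendered = " ".join(f'{name}="{value}"' for name, value in ordered)
    return f"<{tag_name} {rendered}>"


def extract_bit(tag):
    """Recover the bit from the order of the attribute names."""
    names = ATTRIBUTE_NAME.findall(tag)
    return 0 if names == sorted(names) else 1


tag_zero = embed_bit("img", {"src": "a.png", "alt": "logo"}, 0)
tag_one = embed_bit("img", {"src": "a.png", "alt": "logo"}, 1)
```

Both tags render identically in a browser and have the same length, which matches the table entry for this method (no change in size), while the one-bit-per-tag rate shows why the capacity is low.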
Shingo Inoue et al. [30] proposed some methods for hiding data in an XML document using null space, white space, or invisible characters.
Advantage: These methods can be applied to existing XML documents easily.
Disadvantage: These methods provide weak security.
There are some conditions for applying these methods:
1. No dependence on the order of the elements in the application.
2. No reordering of the elements before extracting the secret data.
Xin-Guang [33] gives the basic idea of modifying the written state (case) of letters. This method hides secret information in hypertext by modifying the written state of the mark-up letters.
Advantage: The proposed system provides an efficient method for hiding data from hackers and sending hidden data to the destination in a safe manner, without changing the size of the file even after encoding.
Disadvantage: The resistance to detection and the security of this system are very weak, as hidden data can easily be detected and attacked.
48
Mohammad Shirali Shahreza [44]: the main idea of this method is to hide coded
data in the ID attribute of the HTML document tags, replacing a colour code or tag id
with hexadecimal data.
Advantage: This method is effective, secure and robust; the approach aims to increase
embedding capacity and strengthen the security of the data.
Disadvantage: The embedding capacity is medium and the file size changes.
Chintan Dhanani [6] uses relative links and multi web page embedment
technology to move from one web page to another. The method divides the message into
more than one part and embeds the parts in different pages.
Advantage: This approach increases the embedding capacity and provides strong
security of data.
Disadvantage: It is applicable to text files only.
The comparison between the methods shows that embedding capacity is very low in most
of them, although imperceptibility is good in all methods, as shown in Table 2.4.
Table 2.4: Comparison between some methods that use a web page as a cover
Method                                         | Imperceptible | Embedding Capacity | Change Size | Detection | Security
Using null space, white space or               | Good          | Low                | Yes         | Weak      | Weak
  invisible characters                         |               |                    |             |           |
Modifying the written state (case) of letters  | Good          | Low                | No          | Weak      | Weak
Changing the order of attributes               | Good          | Low                | No          | Yes       | Strong
Tag displacement                               | Good          | Low                | Yes         | Yes       | Strong
Colour code or tag id replacement with         | Good          | Medium             | Yes         | Yes       | Average
  hexadecimal data                             |               |                    |             |           |
Using relative links and multi web page        | Good          | High               | No          | Yes       | Strong
  embedment technology                         |               |                    |             |           |
2.4 Chapter Summary
This chapter studied steganography approaches and the fundamental mechanisms
for hiding data. The fundamental objectives of computer security and
steganography were discussed because they are the most important aspects
of information security.
The main goal of steganography was studied, and a comparison was made
between cryptography and steganography. The types of steganography approaches were
also discussed, and the main steps of genetic algorithms were studied as part
of evolutionary algorithms.
The performance analysis of methods in the steganography literature was
discussed. The literature on genetic algorithm applications was reviewed, together with
some of the theoretical and practical applications that use the idea of a genetic
algorithm, and the experimental results of this research were
discussed. Finally, these studies were compared with other studies related to other
methods of security.
CHAPTER 3
RESEARCH PROCEDURES
3.1 Introduction
This chapter presents the computational model of the HTML steganography system
architecture (StegnoTag). This computational model fulfils the abstract
specification and design criteria that have been set.
For each agent an algorithm is proposed. This algorithm should be
implemented in order to achieve the desired design features and the integration of the
GA into the steganography system proposed in this work.
The details of the most important stages of solving the problem are
presented practically. The details are given in different sections, which are
structured to follow the abstract model gradually. Each of the subsequent sections
describes different algorithms that have been built upon the basic mechanisms of the
proposal.
3.2 Hiding Data in The Stored Web Page
The Internet has developed and expanded rapidly in recent years, with an increased
rate of information exchange and turnover. Handling such a huge amount of
information has raised the need for security and confidentiality, especially for
applications that deal with database systems, such as email, e-commerce, medical
records, billing information and others. The database is considered the main store for
internet components and information, and it may also represent the infrastructure for
websites.
Because of the value and sensitivity of the information held in
databases, they are vulnerable to hacking attacks. Protecting data has become a must in
order to secure the valuable information held in databases.
Security consideration must be given to the database itself as well as to the
surrounding environment, including the underlying applications and internet system.
Figure 3.1: The criteria for protecting a database in an internet environment
3.3 Security Definition on The Project
For the system that hides data in the stored web page to function properly, it needs
some elements for accurate information processing, such as the security controls applied
for data protection.
Hwee Hwa Pang [16] published a paper about steganographic schemes. The
paper introduces StegFD, a steganographic file driver that securely hides user-selected
files in a file system so that, without the corresponding access keys, an attacker would
not be able to deduce their existence (HweeHwa Pang, June 2004).
The proposed system has four security techniques. The first technique creates
the database in XML, a language that does not need a local server to save a database. The
second technique encrypts the data with a key to increase the level of protection. The
third technique hides the database in HTML documents by using a genetic algorithm, and
the last technique extracts the database from a cover HTML document.
Hiding a database is one of the most difficult techniques, because the database
must be dynamically updated and it has different attributes.
[Figure 3.1 labels: Database, Security, Survivability, Visibility, Attacker]
Figure 3.2: The four security techniques
3.4 The Proposed Technique
The proposed technique uses HTML tags and their attributes to hide the database
illogically using a genetic algorithm. It is based on the fact that the ordering of the
attributes in HTML tags has no impact on the appearance of the document.
This ordering can be used to hide the data efficiently. The proposed technique
treats each tag as a gene and each attribute as a chromosome.
Figure 3.3: Tag characteristic on proposed technique
The larger the page and the more attributes it has, the better: more bits
can be hidden within the attributes. In this proposal a relation table has
been created. It holds the primary data in two columns, and each row consists of two
chromosomes that represent a gene.
The gene contains a set of properties. Hexadecimal encoding has been chosen for
the conversion of attributes, to overcome the problems of capacity and the limited
amount of data that can be hidden.
[Figure 3.2 labels: Create database, Encryption database, Hide database, Extract database (XML)]
[Figure 3.3 labels: Tag = Gene; Attribute1 = Chromosome1; Attribute2 = Chromosome2]
In this encoding, chromosomes are represented using hexadecimal digits (0-9,
A-F), so a gene can hide 8 bits in this project; each chromosome in the row is supposed
to hide 4 bits.
The relation table stores the primary tag, from which the search is supposed to start
before it continues randomly. The algorithm examines each chromosome (attribute) of
each HTML tag to check whether the chromosome exists in the primary field of
the key file.
If the chromosome exists in the primary field, the algorithm searches for its
corresponding secondary chromosome in the same HTML tag. If the secondary
chromosome is found, this combination of chromosomes is used to hide the
bit. If not, the algorithm skips this chromosome.
The hidden bit is determined by the order of the attributes in the attribute
combination. If the primary attribute is followed by the secondary attribute, the
combination hides bit ‘1’ in the hexadecimal number; if not, it hides bit ‘0’. A genetic
algorithm can be applied in this step.
Data is extracted from the cover page by identifying the chromosome
combinations that hide a bit and then finding the bits that correspond to the order of
those attributes. If the primary chromosome is located before the secondary
chromosome, the algorithm reads bit ‘1’ in the hexadecimal number. If not, it
reads bit ‘0’.
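The attribute-order rule described above can be illustrated with a minimal Python sketch. It is not the thesis implementation (which is written in C# and driven by a genetic algorithm): the key table `KEY_TABLE`, the regular expressions and the function names `hide` and `extract` are all assumptions made for illustration, and the sketch only handles double-quoted attribute values.

```python
import re

# Hypothetical key table: (primary attribute, secondary attribute) pairs.
KEY_TABLE = [("href", "title"), ("src", "alt"), ("width", "height")]

# Matches an opening tag that carries at least one double-quoted attribute.
TAG_RE = re.compile(r'<(\w+)((?:\s+[\w-]+="[^"]*")+)\s*(/?)>')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def _usable_pair(names):
    """Return the first key pair whose two attributes both occur in the tag."""
    for primary, secondary in KEY_TABLE:
        if primary in names and secondary in names:
            return primary, secondary
    return None


def hide(html, bits):
    """Reorder attribute pairs: primary before secondary encodes 1, else 0."""
    remaining = list(bits)

    def rewrite(match):
        name, raw_attrs, slash = match.groups()
        attrs = ATTR_RE.findall(raw_attrs)
        names = [a for a, _ in attrs]
        pair = _usable_pair(names)
        if pair is None or not remaining:
            return match.group(0)  # tag cannot or need not carry a bit
        bit = remaining.pop(0)
        i, j = names.index(pair[0]), names.index(pair[1])
        if (i < j) != (bit == 1):
            attrs[i], attrs[j] = attrs[j], attrs[i]  # swap to encode the bit
        body = " ".join(f'{a}="{v}"' for a, v in attrs)
        return f"<{name} {body}{' /' if slash else ''}>"

    stego = TAG_RE.sub(rewrite, html)
    if remaining:
        raise ValueError("cover page has too few usable attribute pairs")
    return stego


def extract(html, n_bits):
    """Read n_bits back from the order of the key attribute pairs."""
    bits = []
    for match in TAG_RE.finditer(html):
        names = [a for a, _ in ATTR_RE.findall(match.group(2))]
        pair = _usable_pair(names)
        if pair is not None and len(bits) < n_bits:
            bits.append(1 if names.index(pair[0]) < names.index(pair[1]) else 0)
    return bits
```

In this sketch each usable tag carries exactly one bit, and the receiver needs the same key table and the number of hidden bits in order to recover the message.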
3.5 The Philosophy of Genetic Algorithm Application
A genetic algorithm is based on generating a group of candidate solutions by random,
nature-inspired search, then evaluating the candidate solutions (good hiding) and
determining the best solution by its fitness value, so there must be a way to specify how
good a solution is. Chromosomes (attributes) represent the solutions within the genetic
algorithm. The two basic components of a chromosome are the coded solution and its
fitness value.
The population is the set of chromosomes (attributes). During the process of
hiding, the genetic algorithm selects a chromosome from the tags, which represent the
population, computes the fitness value of the chromosome and
produces new chromosomes called offspring. The fitness indicates how good a
solution is, that is, how close the chromosome is to the optimal solution.
The aim of this proposal is to analyze a web page and then find, or create, the
number of attributes that must be used, so that data can be hidden without
exceeding the appropriate number of created fractures, which is identified by the fitness
function. The proposal also aims to reduce the number of genes handled by the
genetic algorithm, by means of the primary tag (relation table) and the fitness function,
in order to reduce the time needed to hide data.
The offspring are the new population, which replaces some of the chromosomes
in the existing population. The population tracks its worst and best chromosomes
and stores additional statistical information to determine the stopping criterion (all
data hidden). The following are the proposed steps of the genetic algorithm:
1. Selection:
Selection picks a pair of individuals by roulette wheel, according to the fitness
values of the individuals. It starts with a primary tag (relation table), which is
held in the first field of the table, and is then more precise
in the selection of the second attribute. After that the properties change from
sequential order to random order, which does not affect the page.
2. Fitness Function:
This function assesses the chromosomes of the current generation in order to select
the best chromosomes, which produce the offspring. Several factors affect the
fitness function, for example the number of attributes.
3. Generation Evaluation:
The generation as a whole is assessed on the basis of fitness, and the
development of future generations is judged by the value of the function. The
fitness function is evaluated for each chromosome, a percentage fitness is assigned to
each chromosome, and the parents of the next generation are chosen by roulette wheel
and random selection. When the fitness function is calculated, the number and size of
the attributes must not exceed the capacity of the page.
nr = the number of attributes.
pa = the capacity of the attributes on the page.
Given a space P of candidate solutions to a problem, the fitness function f(p)
measures the quality of a solution p in P. The quality of a solution may not vary
smoothly as the genes comprising p vary, because of genetic operators such as crossover
and mutation.
\[ f(p) = nr \cdot pa \]
so that
\[ f(p) = \sum_i (nr \cdot pa) < \text{capacity of the page from tags} \]
4. Crossover:
Two attributes, which are chromosomes, are selected randomly,
and a crossing point between the two attributes is selected for the crossover.
The codes following the crossing point are then switched between the two parents. A
single point or several points can be selected.
5. Mutation:
Mutation prevents the program from ending prematurely. It either exchanges
the bits of the chromosome, changing 1 to 0 and vice versa, or chooses random
locations at which 0 is changed to 1 and vice versa.
6. Replacement:
When the two parent chromosomes form two new attributes, the resulting
four individuals are not all added to the population. Instead, the parents and offspring
are switched so that the size of the generation is maintained.
7. Stop:
The algorithm stops when the process of hiding the database is complete.
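The selection, crossover, mutation and replacement steps listed above follow the usual genetic algorithm pattern, which can be sketched in Python as follows. This is a generic illustration on bit-list chromosomes, not the thesis code: the function names, the mutation `rate` parameter and the use of `sum` as a toy fitness function are assumptions.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if fit > 0 and pick <= running:
            return individual
    return population[-1]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails after a random crossing point."""
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Flip each bit (0 to 1 and vice versa) independently with probability rate."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def next_generation(population, fitness_fn, rate, rng):
    """Replace the population with offspring, keeping its size unchanged."""
    fitnesses = [fitness_fn(c) for c in population]
    offspring = []
    while len(offspring) < len(population):
        a = roulette_select(population, fitnesses, rng)
        b = roulette_select(population, fitnesses, rng)
        for child in crossover(a, b, rng):
            offspring.append(mutate(child, rate, rng))
    return offspring[:len(population)]
```

A typical call would be `next_generation(population, sum, 0.05, random.Random(1))`, repeated until the stopping criterion is met. Passing an explicit `random.Random` instance keeps runs reproducible.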
3.6 Creation Database and Generation of An Encryption Key
The proposed technique has been implemented in C#.NET. It consists of:
1. Creating database
2. Generating an encryption key
3. Hiding the data
4. Extracting the data
3.6.1 Creating Database
XML is a very simple data store. It occupies very little memory, almost like a
text file, and its features make the data easy to manage. XML reduces the
programming burden through its simplicity (David Hunter et al., 2007).
XML does not need a database engine to be installed and maintained. XML data
provides simple access using the power of the ActiveX Data Objects (ADO.NET) Dataset,
and it can be shared across the Web. XML documents can be stored without schemas
because they contain metadata. Any XML tag can carry an unlimited number of
attributes such as name or password.
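As a small illustration of an XML file acting as a database without a database engine, the following Python sketch stores records as attributes of XML elements and reads them back. The element names `users` and `user` are hypothetical; the thesis does not give its schema, and its own implementation uses C# with an ADO.NET Dataset.

```python
import xml.etree.ElementTree as ET


def build_database(records):
    """Serialize (name, password) records as attributes of <user> elements."""
    root = ET.Element("users")
    for name, password in records:
        ET.SubElement(root, "user", {"name": name, "password": password})
    return ET.tostring(root, encoding="unicode")


def read_database(xml_text):
    """Parse the XML text back into a list of (name, password) tuples."""
    root = ET.fromstring(xml_text)
    return [(u.get("name"), u.get("password")) for u in root.findall("user")]
```

The resulting text is the secret payload that would then be encrypted and hidden in the cover page.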
3.6.2 Generating Encryption Key:
Encryption is a security method that turns any kind of information into
unreadable cipher text by applying a set of algorithms. These algorithms transform the
data into streams or blocks of seemingly random alphanumeric characters (Nigel Smart,
1997).
An encryption key might encrypt, decrypt, or perform both functions, depending on the
type of encryption software being used.
There are many types of encryption schemes, but not all are secure. Simple
algorithms can be broken easily using modern computing power, and another point of
weakness lies in the decryption method. Even the most secure algorithms will decrypt
for anyone who holds the password or key.
Creating and managing keys is an important part of the cryptographic
process. The key must be kept secret from anyone who should not decrypt the data.
A simple approach in C# has been used, based on a dedicated library (using
System.Security.Cryptography). The secret database is encrypted using a cipher
encryption mechanism and the data is converted to binary format.
textBoxEncrypted.Text = creatdatxml.Encrypt.EncryptString (textBox4.Text, textBoxPassword.Text);
When a given password is encrypted, that password always generates an
array of 24 bytes.
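The thesis does not state how the 24-byte array is derived from the password. Purely as an illustration of a deterministic, fixed-length key derived from a password, the sketch below uses PBKDF2 from the Python standard library; the function name `derive_key`, the salt and the iteration count are assumptions and are not taken from the C# implementation.

```python
import hashlib


def derive_key(password, salt=b"StegnoTag-demo-salt", iterations=100_000):
    """Derive a 24-byte key from a password (illustrative parameters only)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=24
    )
```

The same password and salt always give the same 24 bytes, so the receiver can rebuild the key from the shared password alone.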
3.6.3 Hiding the Data:
To hide data in an HTML document, the first step is to convert the data to binary
form, as a sequence of bits. The HTML document is then scanned for the attribute
combinations that can be used to hide a bit, as shown in Figure 3.4. Each attribute is
converted to a chromosome using the library Geneticalgorithm.dll.
If the first attribute exists in the primary attribute field of the key file, the
corresponding secondary attribute is searched for in the same HTML tag. If that
attribute is found, this couple of attributes is used to hide a bit by applying crossover
to the name and value. The search then continues randomly by roulette wheel selection.
In the genetic algorithm, each couple of chromosomes is created once. If a
chromosome appears more than once, each occurrence is used separately to hide data; for
example, if <hr /> appears 4 times, this gives 16 bits.
if (!ICrossover.Handled)
{
    // The attribute is not a primary key attribute. Is it a secondary key attribute?
    bool copyAttribute = false;
    rows = keyTable.Select(String.Format("secondAttribute = '{0:x2}'",
        ICrossover.QueryFormattedName));
    if (rows.Length > 0)
    {
        // If the corresponding first attribute does not exist in this tag,
        // this attribute will not be used and must be copied.
        HtmlAttribute firstAttribute = FindAttribute(rows[0]["firstAttribute"].ToString(),
            tag.Attributes);
        // ... (excerpt ends here)
    }
}
Suppose the HTML code $H$ contains a set of genes $(G_1, G_2, G_3, \ldots, G_n)$, where the
genes $G_n$ correspond to the tags. Each gene has a couple of chromosomes $(c_1, c_2)$, which
correspond to the attributes $(a_1, a_2)$ of the tag. That $a_1$ precedes $a_2$ in a particular tag
occurrence is denoted by $G_n(c_1, c_2) = (c_1, c_2)$, an ordered pair of
attributes ≡ chromosomes.
$P_n$ denotes a pair of chromosomes: $P_{n,i}(c_1, c_2) = 1$ if $G_{n,i}(c_1, c_2) = (c_1, c_2)$, and
$P_{n,i}(c_1, c_2) = 0$ otherwise, where $i$ identifies the specific tag occurrence.
$O_n$ denotes the predominant order of a couple of attributes in a particular tag:
$O_n(c_x, c_y) = (c_x, c_y)$ if $\sum_i P_{n,i}(c_x, c_y) \ge \sum_i P_{n,i}(c_y, c_x)$.
$R_{n,i}(c_x, c_y)$ describes whether, for a specific tag occurrence, the order of chromosomes
complies with the previously determined predominant order:
$R_{n,i}(c_x, c_y) = 1$ if $G_{n,i}(c_x, c_y) = O_n(c_x, c_y)$, and $R_{n,i}(c_x, c_y) = 0$ otherwise.
The sum of $R_{n,i}$ over all pairs of chromosomes gives the number of their
occurrences in the predominant order.
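The quantities defined above can be computed directly. The following Python sketch is one reading of these definitions, in which the predominant order is the order that occurs at least as often as its reverse; the function names and the representation of each tag occurrence as an ordered tuple are assumptions made for illustration.

```python
def predominant_order(cx, cy, occurrences):
    """Return the more frequent order of the pair (ties favour (cx, cy))."""
    forward = sum(1 for o in occurrences if o == (cx, cy))   # sum_i P(cx, cy)
    backward = sum(1 for o in occurrences if o == (cy, cx))  # sum_i P(cy, cx)
    return (cx, cy) if forward >= backward else (cy, cx)


def compliance(cx, cy, occurrences):
    """Return R_i for each occurrence: 1 if it follows the predominant order."""
    order = predominant_order(cx, cy, occurrences)
    return [1 if o == order else 0 for o in occurrences]
```

For example, with the occurrences `[("href", "title"), ("title", "href"), ("href", "title")]` the predominant order is `("href", "title")` and the compliance values are `[1, 0, 1]`, so two occurrences are in the predominant order.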
Figure 3.4: Flow chart to hide database on HTML document
3.6.4 Extracting the Database
The extractor method extracts data from the cover page. First the encryption key is
entered; if the key is correct, the data is shown. The stego page document code (H)
contains a set of genes, H≡HTML {G1, G2, G3, …, Gn}. Each gene (tag) Gn has a couple of
chromosomes (c1, c2), as shown in Figure 3.5.
Figure 3.5: Flow chart to extract database from HTML document
Each chromosome of each gene of the stego page is analyzed by crossover
selection. Each two consecutive elements of the chromosome G are considered a
couple. If the chromosome (attribute) is found in the primary chromosome (attribute)
field of the key file, the algorithm checks whether its corresponding secondary
chromosome (attribute) is present in the gene currently being processed.
If it is, this pair of chromosomes hides a bit.
if (!ICrossover.Handled)
{
    // Attribute has not been used yet: find the key row for this attribute.
    rows = keyTable.Select(String.Format("firstAttribute = '{0:x2}'",
        ICrossover.QueryFormattedName));
    if (rows.Length > 0)
    {
        // Find the corresponding attribute.
        secondAttribute =
            FindAttribute(rows[0]["secondAttribute"].ToString(), tag.Attributes);
        if (secondAttribute != null)
        {
            attributePosition =
                htmlDocument.IndexOf(ICrossover.Name, tag.BeginPosition);
            secondAttributePosition =
                htmlDocument.IndexOf(secondAttribute.Name, tag.BeginPosition);
            // Compare the attributes' positions.
            messageByte =
                ExtractBit(attributePosition, secondAttributePosition,
                    messageByte, bitIndex, message);
            // ... (excerpt ends here)
        }
    }
}
If the primary chromosome (attribute) is followed by the secondary chromosome
(attribute), a bit 1 is recorded; otherwise a bit 0 is recorded. The chromosomes are
marked as processed after the bit is retrieved.
$P_{n,i}(c_1, c_2) = 1$ if the primary chromosome precedes the secondary one, and 0 otherwise.
If the chromosome is not found in the primary attribute field of the key file,
this chromosome is skipped and the next chromosome (attribute) is examined. At the end,
the bit stream obtained is converted into a stream of characters.
3.7 The Architecture Design of the Proposed System
Many researchers and developers of steganography systems have used different
architectures and methodologies to establish safe hiding systems for many types of
applications.
The construction of security systems based on steganography varies
among developers of intrusion detection and prevention systems. Different developers
use different requirements, architectures and methods.
The architecture of the proposed system depends mainly on the genetic algorithm, as
shown in Figure 3.6. This figure also shows how the genetic algorithm, as a class,
is linked to the other classes. The algorithm code contains five main classes. After the
GeneticAlgorithm class has been created and provided with an implementation of the
IGenomeFactory class, it can create the population of custom Genomes that are
needed in the GA search.
Figure 3.6: The architecture of (StegnoTag) system
During the creation of the RealGenomeFactory class, the minimum and maximum
values of each genome are identified. The genetic algorithm is then provided with the
GenomeFactory class, which is responsible for every aspect of Genome construction, so
that genomes are ready for evaluation and crossover with other genomes.
The IGenomeFactory class collects the methods used to create Genomes. The HtmlAttribute
class, which is considered GeneticAlgorithm’s parent, stores HTML tags and their
attributes. Attributes do not have many properties; they only have a name and a value.
Each attribute in a tag can be used for only one bit, so the program has to mark it
as already handled. An HtmlTag class has a name and a number of attributes. The
constructor searches the tag's text for attributes and their values.
private HtmlAttributeCollection ICrossover;
public HtmlAttributeCollection Attributes {
    get { return ICrossover; }
}
public HtmlTag(String text, int beginPosition, int endPosition) {
    this.beginPosition = beginPosition;
    this.endPosition = endPosition;
    this.ICrossover = new HtmlAttributeCollection();
    // ... (excerpt ends here)
}
It also has a fitness method that calculates a fitness value by comparing values.
The HtmlUtility class calculates the key attribute couples in an HTML document with the
crossover method, according to the rule that the selection operator chooses individuals
with a probability that corresponds to their relative fitness.
Chromosomes with a high fitness value have a great chance of being selected to
generate children for the next generation; the two chosen individuals are called the
parents. The class also holds the path and name of the HTML document, the data table
with the key attributes, and the number of bytes that can be hidden in the specified
document.
public int CompareTo(object obj)
{
    HtmlAttribute compared = (HtmlAttribute)obj;
    if (this.Fitness < compared.Fitness)
        return -1;
    else if (this.Fitness > compared.Fitness)
        return 1;
    else
        return 0;
}
#endregion

internal static void Sort()
{
    throw new NotImplementedException();
}
3.8 The Implementation of The Proposed System
Genetic operations are the basic steps of a genetic algorithm. They are fixed
steps that vary in formulation, and they are related to each other. The algorithm cannot
be applied to a problem unless specific conditions are met; for example, the
components of the problem must be presented in the form of genomes. If these
conditions are not met, the genetic algorithm loses its value and usefulness in
finding the best solution.
The genetic algorithm is implemented in the proposed system in four main
steps, defined as follows:
1. Alteration
In the first step, the database is coded by substitution with the target coding of
the samples. Hexadecimal coding gives the target bits, which increases the capacity of
the cover page. The bits are represented in the form of genomes.
2. Modulation
This is the most important step and an essential part of the algorithm. All the
results and achievements expected of it depend on this step. Active and smart
algorithms are useful here.
In this stage the genetic algorithm tries to decrease the number of faults and improve
the transparency. Two different methods are used to perform this step.
One method is easier and is similar to the ordinary techniques. The other method
is a validation method for better modification of the bits of the samples. The measure
is simply the difference between the original page and the modified page; here more
bits and samples are modified and adjusted than in some previous algorithms.
If the difference of the bits can be decreased, the transparency is improved.
The following are two examples of modulation for the expected smart genetic algorithm.
In the first sample, the bits are 00101111 = 47 (hexadecimal 2F), the target layer
is 5 and the data is 1 bit. Without modification the result is 00111111 = 63
(hexadecimal 3F), a difference of 16. After modification it is 00110000 = 48
(hexadecimal 30), a difference of 1 for embedding 1 bit.
In the second sample, the bits are 00100111 = 39 (hexadecimal 27), the target
layers are 4 and 5, and the data bits are 11. Without modification the result is
00111111 = 63 (hexadecimal 3F), a difference of 24. After modification it is
00011111 = 31 (hexadecimal 1F), a difference of 8 for embedding 2 bits.
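Both samples can be reproduced with a short Python sketch. It assumes that layers are numbered from 1 starting at the least significant bit, which is the reading consistent with the numbers above, and it finds the modified value by exhaustive search over all 8-bit values rather than by a genetic algorithm. The function names are hypothetical.

```python
def naive_embed(sample, layers, data_bits):
    """Write the data bits into the target layers and leave other bits alone."""
    value = sample
    for layer, bit in zip(layers, data_bits):
        mask = 1 << (layer - 1)
        value = (value | mask) if bit else (value & ~mask)
    return value


def closest_embed(sample, layers, data_bits):
    """Return the 8-bit value nearest to sample whose target layers hold the data."""
    mask = 0
    target = 0
    for layer, bit in zip(layers, data_bits):
        mask |= 1 << (layer - 1)
        if bit:
            target |= 1 << (layer - 1)
    candidates = [v for v in range(256) if (v & mask) == target]
    return min(candidates, key=lambda v: abs(v - sample))
```

With `sample = 47`, layer 5 and data bit 1, `naive_embed` gives 63 and `closest_embed` gives 48. With `sample = 39`, layers 4 and 5 and data bits 11, they give 63 and 31 respectively.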
The sample in the proposed system (StegnoTag) is a chromosome (attribute), and
each bit of the sample represents a gene (tag). The first generation, or first parents,
consists of the original page and the altered samples.
Fitness may be determined by a function that calculates the error. The most
transparent sample pattern should be considered the fittest.
It must be ensured that crossover and mutation do not change the place of the
target bit.
3. Verification
This step controls the quality and identifies how well the algorithm has
functioned; here the result must be verified. If the result differs from the original
page and the new page is acceptable and reasonable, the new page is accepted; otherwise
it is rejected and the original page is used in reconstructing the new page instead.
4. Reconstruction
In the last step a new page is created. This is done by testing the samples.
There are two possible inputs to this step: either a modified sample or the original
sample, which is the same as in the host page file.
This is why it can be claimed that the algorithm does not alter all samples or
predictable samples. Figure 3.7 shows the main steps in the steganography proposal.
Figure 3.7: The main steps in steganography proposal
3.9 The Good Side of Using Genetic Algorithm in The Proposal
One of the benefits of a genetic algorithm is that it can solve most optimization
problems that can be described with a chromosome coding, and it solves problems by
developing multiple solutions. Moreover, it is very easy to understand and
practical, it does not demand complicated mathematics, and problems can easily be
transferred to genetic algorithms and simulation models. With respect to this proposal
(StegnoTag), it has been shown that the genetic algorithm has an efficient hiding
ability, so that hidden information cannot be detected through visual, semantic or
statistical attack; this part is described in Chapter Four.
In contrast to the genetic algorithm approach, other steganography techniques are
susceptible to visual, semantic and statistical attacks, because they use random
numbers. The proposed technique is based on the fact that the order of attributes in
HTML tags has no effect on the appearance of the
document. This feature can be used to hide the random data effectively. In the proposed
technique, tags are represented in the form of genes and attributes are represented in the
form of chromosomes.
In this proposal a relational table has been created. It consists of primary data
distributed in two columns, and each row consists of two chromosomes
representing a gene that contains a set of properties. Hexadecimal encoding is
used in the conversion of attributes to overcome the problems of capacity
and the limited amount of hidden data. In this encoding, the chromosome is
represented using hexadecimal digits (0-9, A-F), which increases the
capacity of the cover page, so a gene can hide 8 bits in this project. The first
chromosome in a row is represented by 4 bits and the second is also represented by 4
bits. In the process named crossover, the genetic algorithm randomly combines the
left-hand side of one chromosome with the right-hand side of another chromosome to form
a new chromosome. The new chromosome must be modified by replacing repeated genes with
other genes, so that all genes within each chromosome are different.
In mutation, the processor randomly chooses a chromosome and exchanges any
two genes to form a new chromosome. In this operation the place of the embedded
target bit is not changed. The main job of chromosomal fitness is to maximize or
minimize the value of the chromosome through many alterations until it reaches a
suitable cut-off value that has already been defined; here the minimum
optimal value of the chromosome should be selected.
With regard to the capacity of the hidden data, existing techniques can embed
a large database, but in some cases the meaning of the cover page changes completely,
until no sense can be made of it. To analyse each chromosome of each gene of the
stego page, any two consecutive elements of the chromosome G are considered a
couple. If the chromosome (attribute) is found in the primary chromosome field of the
key file, the presence of its corresponding secondary chromosome in the gene currently
being processed is examined. If it is present, this pair of chromosomes hides a bit. If
the primary chromosome (attribute) is followed by the secondary chromosome (attribute),
bit 1 is recorded, otherwise bit 0 is recorded. After the bit is retrieved, the
chromosomes are marked as processed.
The method that is applied to hide a huge database is vulnerable to visual attack,
because it uses an illogical type of database. The main question to be answered in this
research is how to implement a genetic algorithm to produce an effective tool for hiding
data. An encryption method is used to add an extra layer of security for better
protection of confidential data.
3.10 Limitations of Using Genetic Algorithm in The Proposal
The limitations of this work are primarily due to the conditions of the simulation test:
• The test was implemented and simulated steadily with a standard group of data
that had already been specified.
• Due to limited resources, the tests also used samples of random data.
A proportion of web pages and data was selected by the researcher in order to
understand the behaviour of hiding data.
• The testing and simulation were done under static conditions, so steganography
methods that need to be applied in real time could not be
implemented. For the same reason, the system could not be measured by standards of
latency and simultaneous connections.
3.11 Chapter Summary
This chapter gave an idea of the theoretical aspects of the proposed system,
which uses a genetic algorithm to hide the database in an HTML document (the protection
of the database), and determined the type of media that hides data in the proposed
techniques. The chapter also discussed in detail the processes and stages of the
algorithm and showed the steps in flow charts.
The chapter also discussed the architecture and implementation of the proposed
steganography system (StegnoTag) and demonstrated the good side of using
steganography with a genetic algorithm, where real-world problems can be solved
practically. It also discussed the limitations of using a genetic algorithm.
CHAPTER 4
ANALYSIS AND DISCUSSION
4.1 Introduction
The proposed model for steganography by genetic algorithm is validated by
experimental research methods, in which a series of tests is performed to prove its
capabilities. For this purpose, an application was developed that consists of an XML
database, a genetic algorithm library and steganography with encryption.
The application is developed in the C#.NET programming language. It contains
classes that are designed to represent the logical concepts of the model.
The genetic algorithm needs to be more reliable than those of previous related
studies. The results achieved from the simulation tests are discussed
using new metrics for detection capabilities.
4.2 System Examination
To determine the fitness of the simulation the behavior of application have been
examined to know the efficiency system by calculating the capacity of the cover file to
determine the size of database which can be secretly sent by using the application of
steganography, also the change in the size on the cover page after steganography,
detection of security in the web page by determine how data can be safety and
intangible. The study has been compared with the search results to the searchers L.Polak
and Z.Kotulski [22].
The homepage of web pages has been divided into different categories to
regulate the training and testing, validation of the algorithm is examined across a range
of different parameter settings. While the classification algorithm works efficiently for
real time applications, when evaluating the system performance over many runs such as
71
examining different web pages’ categories and testing the effects of variations in
multiple parameters, simulating the algorithm can be in short time.
The tests used different home pages: company pages, news portals, social media
pages and university pages. These pages have many items that must be described by
HTML tags with attributes, which increases their capacity for steganography. The
results of the analysis are detailed in the tables below.
Table 4.1: Results of efficiency on the companies’ pages
Web Page          File Size (KB)   Stegano Capacity (B)   Characters number   Efficiency (B/kB) %
dell.com          143              134                    67                  93.71%
microsoft.com     62               55                     27                  88.71%
toshiba.com       88               44                     22                  50%
asus.com          85               37                     18                  43.51%
sony.com          145              52                     26                  35.86%
samsung.com       81               14                     14                  17.28%
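As an illustration, the efficiency column of Table 4.1 can be reproduced by dividing the stegano capacity (in bytes) by the file size (in KB) and expressing the ratio as a percentage. The following Python sketch assumes this reading of the column heading; the function name efficiency_percent is introduced here for illustration only and is not part of the C# application.

```python
def efficiency_percent(capacity_bytes: float, file_size_kb: float) -> float:
    """Hidden capacity in bytes per kilobyte of cover file, as a percentage."""
    if file_size_kb <= 0:
        raise ValueError("file size must be positive")
    return round(100.0 * capacity_bytes / file_size_kb, 2)


# (file size in KB, stegano capacity in B) copied from Table 4.1
pages = {
    "dell.com": (143, 134),
    "microsoft.com": (62, 55),
    "toshiba.com": (88, 44),
}

for name, (size_kb, capacity_b) in pages.items():
    print(f"{name}: {efficiency_percent(capacity_b, size_kb):.2f}%")
```

Under this assumption, a value above 100% simply means that more than one byte can be hidden per kilobyte of cover file.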
Table 4.2: Results of efficiency on the news’ pages
Web Page          File Size (KB)   Stegano Capacity (B)   Characters number   Efficiency (B/kB) %
alarabiya.net     276              350                    175                 126.81%
news.google.com   750              721                    360                 96.13%
aljazeera.net     190              121                    60                  63.68%
bbc.com           430              249                    124                 57.91%
cnn.com           893              480                    208                 53.75%
news.yahoo.com    1,013            224                    112                 22.11%
For the simulation, C# software was used and the simulation was run on a
standard PC. Each web page was trained and tested individually for different
parameter settings.
Table 4.3: Results of efficiency on universities’ pages
Web Page          File Size (KB)   Stegano Capacity (B)   Characters number   Efficiency (B/kB) %
fbsu.edu.sa       107              235                    117                 219.61%
ksu.edu.sa/en     71               74                     37                  104.23%
manchester.ac.uk  62               72                     57                  116.13%
cam.ac.uk         61               53                     26                  86.89%
ox.ac.uk          100              83                     42                  83%
hanover.edu       91               11                     5                   12.09%
Table 4.4: Results of efficiency on Social Media pages
Web Page          File Size (KB)   Stegano Capacity (B)   Characters number   Efficiency (B/kB) %
youtube.com       483              304                    152                 62.94%
plus.google.com   1,062            420                    210                 39.55%
facebook.com      856              217                    108                 25.35%
twitter.com       302              51                     25                  16.89%
linkedin.com      329              37                     18                  11.25%
The results show that universities’ pages (Table 4.3) and companies’ pages
(Table 4.1) have a smaller cover-file capacity, and therefore a smaller database that can
be sent secretly using the steganography application, because their main (home) pages
are usually small in order to simplify navigation and help customers find the specific
information they seek.
Social media pages (Table 4.4) generally do not allow much information to be
hidden relative to their size, because attributes do not appear frequently in their source
code; the user must log in from the main page to reach the inner pages, which have
many attributes.
The best results are shown in the news pages table (Table 4.2); these results
represent the best efficiency of capacity. News pages are large because they have many
attributes, so they allow a lot of data to be hidden relative to their size and constitute a
decent steganography environment.
It has been observed that on some pages the efficiency is over 100%. The
fbsu.edu.sa page (Fahad Bin Sultan University) has an efficiency of 219.61%; the hiding
was very good and the size did not change. These results show that this type of
processing can greatly increase the maximum size of hidden data that can be sent in
HTML code with the tag steganography algorithm. In some cases, however, such as
manchester.ac.uk (Manchester University) with an efficiency of 116.13%, the hiding
was poor because of a defect in integrating the hidden data.
Table (4.5) shows the efficiency of the genetic algorithm. Most pages are the
same size as the original after the secret data are hidden in the HTML document
(Figure 4.1). Where the size grows, the increase is between 1 and 3 KB, except for
social media pages, where it is 11 to 14 KB. For example, the youtube.com page is
497 KB after hiding, compared to 483 KB before, an increase of 2.9%.
Table (4.5): Simulation test results for pages’ size after hiding data
Web Page          File Size before hiding (KB)   File Size after hiding (KB)   Increase size (KB)   Increase rate
dell.com          143                            143                           0                    0
microsoft.com     62                             62                            0                    0
alarabiya.net     276                            278                           2                    0.7%
news.google.com   750                            753                           3                    0.4%
aljazeera.net     190                            191                           1                    0.5%
bbc.com           430                            430                           0                    0
cnn.com           893                            893                           0                    0
fbsu.edu.sa       107                            107                           0                    0
ksu.edu.sa/en     71                             72                            1                    1.4%
manchester.ac.uk  62                             62                            0                    0
cam.ac.uk         61                             61                            0                    0
ox.ac.uk          100                            101                           1                    1.0%
hanover.edu       91                             91                            0                    0
youtube.com       483                            497                           14                   2.9%
plus.google.com   1,062                          1,076                         14                   1.3%
facebook.com      856                            867                           11                   1.3%
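The increase rate reported for each page can be checked from the two size columns. The short Python sketch below assumes that the rate is the size increase divided by the size before hiding, expressed as a percentage and rounded to one decimal place; the function name increase_rate is hypothetical.

```python
def increase_rate(size_before_kb: float, size_after_kb: float) -> float:
    """Relative growth of the cover file after hiding, in percent."""
    if size_before_kb <= 0:
        raise ValueError("size before hiding must be positive")
    growth_kb = size_after_kb - size_before_kb
    return round(100.0 * growth_kb / size_before_kb, 1)


# Sizes in KB before and after hiding, as reported for two social media pages
print("youtube.com:", increase_rate(483, 497))
print("facebook.com:", increase_rate(856, 867))
```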
Figure 4.1: Simulation Tests Results Increase rate after hide data
Figure (4.1) shows the increase rate for each page after hiding data. An increase
rate between 0% and 0.02% is obtained for most of the news pages. These values
indicate the strength of the genetic algorithm in hiding data without increasing the size
of the page.
4.3 Comparative Studies
To test the proposed algorithm, a table has been made with three columns: the
first column lists different web pages, the second gives the capacity of the proposed
method, and the last gives the capacity of the L. Polak and Z. Kotulski method, so that
the efficiency of the steganography algorithm can be examined.
Table (4.6): Comparison between pages’ capacity on pages shared with the other method
Web Page          Capacity of proposed method   Capacity of L. Polak method
microsoft.com     55                            126
sony.com          52                            100
youtube.com       304                           255
cnn.com           480                           153
news.yahoo.com    224                           133
Table (4.6) lists some websites that were tested with both methods; each
website was tested for the maximum amount of data that can be hidden in the web
page, and the table shows the largest embedded capacity. The two methods were
compared, and the experimental results are shown in Figure (4.2): the proposed method
has a larger capacity on youtube.com, cnn.com and news.yahoo.com, while the L. Polak
and Z. Kotulski method has a larger capacity on microsoft.com and sony.com. The
comparison between the pages after hiding data (Table 4.6) thus supports the efficiency
of the genetic algorithm.
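The comparison in Table (4.6) can be tabulated with a few lines of Python. This is only a convenience sketch over the reported capacities; the dictionary layout and the function name pages_where_proposed_is_larger are assumptions made for illustration.

```python
# Capacities from Table (4.6): page -> (proposed method, L. Polak method)
capacities = {
    "microsoft.com": (55, 126),
    "sony.com": (52, 100),
    "youtube.com": (304, 255),
    "cnn.com": (480, 153),
    "news.yahoo.com": (224, 133),
}


def pages_where_proposed_is_larger(table):
    """Return the pages on which the proposed method hides more data."""
    return [page for page, (proposed, other) in table.items() if proposed > other]


larger = pages_where_proposed_is_larger(capacities)
print(f"Proposed method is larger on {len(larger)} of {len(capacities)} pages: {larger}")
```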
Figure (4.2): Comparison results between the two methods
4.4 Performance Evaluation of Genetic Algorithm
There are three main components to be designed in GAs. The first component
is the coding, which maps the problem onto the GA paradigm and represents possible
solutions, for example with binary coding or hexadecimal coding [48].
The second is a fitness function, which determines the quality of solutions and
distinguishes good solutions from bad ones. The last is a set of parameters, including
the population size, the population structure, the sequence of genetic operators, the
operators' parameters, and the termination condition [21]. Two models of performance
analysis have been measured in this application:
1. Fitness value when mutation is selected.
2. Fitness value when mutation is canceled.
The GPdotNET v3 software has been used for the experimental analysis [54]. It
is a free artificial intelligence tool for applying genetic programming and genetic
algorithms to the modeling and optimization of different engineering problems. The
application was developed with the .NET (Mono) framework and the C# programming
language and can run on Windows and Linux operating systems.
In this project the fitness function was
where nr = the number of attributes
and pa = the capacity of the attributes on the page.
Suppose x is a candidate solution to the problem; the fitness function for x
measures the quality of the solution x. The goodness of a solution x may change as the
genes that make up x vary under genetic operators such as crossover and mutation. The
equation takes into account the number of genes in the attributes string, the population
size, the capacity of each attribute on the page, the mutation probability, the crossover
probability, a random number generator, and the number of generations for which to
run the simulation.
The roulette wheel selection procedure is employed for the reproduction process.
Because this procedure requires positive fitness values, a compensation technique is
used to rescale the fitness when there is a negative fitness value in the population.
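Roulette wheel selection with a fitness compensation step can be sketched as follows. The thesis does not state which compensation technique was used, so the Python example below assumes a simple shift that subtracts the lowest fitness and adds a small constant; the names shift_to_positive and roulette_select are hypothetical.

```python
import random


def shift_to_positive(fitnesses, epsilon=1e-6):
    """Assumed compensation: shift all values so the lowest becomes epsilon."""
    lowest = min(fitnesses)
    if lowest > 0:
        return list(fitnesses)
    return [value - lowest + epsilon for value in fitnesses]


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its shifted fitness."""
    weights = shift_to_positive(fitnesses)
    pick = rng.uniform(0, sum(weights))
    running = 0.0
    for individual, weight in zip(population, weights):
        running += weight
        if pick <= running:
            return individual
    return population[-1]  # guard against floating point round-off


rng = random.Random(0)
population = ["a", "b", "c"]
fitnesses = [-2.0, 1.0, 5.0]  # contains a negative value on purpose
print([roulette_select(population, fitnesses, rng) for _ in range(10)])
```

With this shift, the individual with the lowest fitness is almost never selected; other compensation schemes, such as ranking, would distribute the selection pressure differently.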
In the crossover operation, one-point crossover was chosen with a set
probability. Mutation is applied by random changes in the binary bit string: each bit of
the string mutates with a specific probability. If useless solutions are reached after
these operations, the string is discarded and a new one is created from scratch
instead.
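The two operators described above can be illustrated on bit lists. The Python sketch below is a generic textbook form of one-point crossover and bit-flip mutation, not the C# code of the application; the function names and the is_useful predicate are assumptions introduced for the example.

```python
import random


def one_point_crossover(parent_a, parent_b, crossover_prob, rng):
    """Swap the tails of two equal-length bit lists at one random point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2 or rng.random() >= crossover_prob:
        return parent_a[:], parent_b[:]  # no crossover: copy the parents
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def bit_flip_mutation(bits, mutation_prob, rng):
    """Flip each bit independently with probability mutation_prob."""
    return [1 - bit if rng.random() < mutation_prob else bit for bit in bits]


def regenerate_if_useless(bits, is_useful, rng):
    """Keep a useful string; otherwise discard it and draw a new random one."""
    if is_useful(bits):
        return bits
    return [rng.randint(0, 1) for _ in bits]


rng = random.Random(42)
child_a, child_b = one_point_crossover([0] * 8, [1] * 8, 0.7, rng)
print(child_a, child_b)
print(bit_flip_mutation(child_a, 0.05, rng))
```

Note that a regenerated string is random and may itself be useless, so a full implementation would normally repeat the check.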
4.4.1 Fitness Value When Select Mutation Model
In the first run, when mutation was selected, two chromosomes were randomly
chosen to create a new offspring by crossing their genetic string patterns at a random
point with a given probability. 500 values were generated with 500 genes and a
population size of 70; the probabilities of the GP operations were a 0.05 chance of
mutation, 0.3 of reproduction and a 0.7 chance of crossover, with an elitism selection
of 1.
There is no rule for determining these values; any values may be given without
affecting the results of the algorithm. To calculate the fitness function, the number of
attributes must not exceed the attribute capacity of the page's tags.
The implementation was responsible for creating an initial generation of genes,
calculating their fitness, mating them and creating offspring, and then analyzing the
next generation.
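The steps above (initial generation, fitness calculation, mating, offspring, next generation) can be sketched as a short loop. This is a minimal Python illustration, not the GPdotNET or C# implementation: the OneMax fitness (the count of 1 bits) is a stand-in for the thesis fitness function, the reproduction probability of 0.3 is not modeled, and the names run_ga and one_max are hypothetical. The default arguments mirror the reported settings (500 genes, population 70, 500 generations, crossover 0.7, mutation 0.05, elitism 1).

```python
import random


def one_max(bits):
    """Stand-in fitness: the number of 1 bits (not the thesis fitness function)."""
    return sum(bits)


def run_ga(n_genes=500, pop_size=70, generations=500, crossover_prob=0.7,
           mutation_prob=0.05, elitism=1, fitness=one_max, seed=0):
    """Return the best fitness observed in each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        scores = [fitness(individual) for individual in population]
        ranked = sorted(zip(scores, population), key=lambda pair: pair[0],
                        reverse=True)
        best_per_generation.append(ranked[0][0])
        # Elitism: copy the best individuals unchanged into the next generation.
        next_population = [individual[:] for _, individual in ranked[:elitism]]
        # Roulette wheel weights; OneMax scores are never negative.
        weights = [score + 1e-9 for score in scores]
        while len(next_population) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            child = mother[:]
            if n_genes > 1 and rng.random() < crossover_prob:
                point = rng.randint(1, n_genes - 1)  # one-point crossover
                child = mother[:point] + father[point:]
            child = [1 - bit if rng.random() < mutation_prob else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
    return best_per_generation


# Small demonstration run; the reported settings are the function defaults.
history = run_ga(n_genes=50, pop_size=20, generations=40)
print("first generation best:", history[0], "last generation best:", history[-1])
```

Calling run_ga with mutation_prob=0.0 gives a run without mutation, which allows the two models of this section to be compared on the same stand-in fitness.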
Experimental data are generated by the GPdotNET program (Figure 4.3). The
entire dataset is randomly divided into a training set of 24 compounds and a test set of
10 compounds, represented by 10 selected molecular descriptors.
Figure (4.3): Experimental dataset when select mutation
The results show that the best solution is found in 500 generations, with a best
fitness value of 137.76, as shown in Figure (4.4). The figure also shows the most fit
individual and the maximum number of correct bits in the most fit individual for each
generation.
Figure (4.4): The evaluation simulation result when select mutation
Figure (4.5): Simulation result of GP modelling of best fitness simulation when select
mutation
When analyzing average fitness, the biggest jump is from the first generation to
the second, since this is the first trained generation to work with the fitness function.
Each time the algorithm selects random chromosomes, it selects the more fit of the two
possibilities surrounding a randomly generated number. When this is performed for an
entire generation, there is substantial growth.
Table (4.7): Training dataset to test a select mutation
Nr X1 X2 X3 X4 R1 R2 R3 R4 Y
1 0 0 0 0.04345 3.83745 2.44014 7.15454 0.60323 0.448
2 0 0 0.5 0.33859 3.83745 2.44014 7.15454 0.60323 0.368
3 0 0 1 0.26253 3.83745 2.44014 7.15454 0.60323 0.336
4 0 0.5 0 0.06103 3.83745 2.44014 7.15454 0.60323 0.576
5 0 0.5 0.5 0.57484 3.83745 2.44014 7.15454 0.60323 0.552
6 0 0.5 1 0.58944 3.83745 2.44014 7.15454 0.60323 0.408
7 0 1 0 0.22962 3.83745 2.44014 7.15454 0.60323 1
8 0 1 0.5 0.67536 3.83745 2.44014 7.15454 0.60323 0.88
9 0 1 1 0.70682 3.83745 2.44014 7.15454 0.60323 0.776
When mutation was selected, the fitness value increased each time, and the
"changed at generation" value was lower.
This led to a high fitness value for the most fit individual; the ability to
mutate randomly was also highly valuable to the genes. Figure (4.6) shows the results.
Figure (4.6): the results of select mutation to training dataset
figure (4.7): The result of GP model fitness evolution of the program when select
mutation
A linear upward slope is not necessary for reaching higher fitness, because the
crossover function may destroy the offspring of a most fit individual in the early
generations, as shown in Figure (4.7).
Figure (4.8): The results of training dataset when select mutation
Figure (4.9): Simulation results of test data when select mutation
Figure (4.9) shows that the average fitness constantly increases, so that despite
some variance in the most fit gene, the population tends to improve consistently over
time.
4.4.2 Fitness Value When Cancel Mutation
In the second run, when mutation was canceled, two chromosomes were
randomly chosen to create a new offspring by crossing their genetic string patterns at a
random point with a given probability. 500 values were generated with 500 genes and a
population size of 70; the probabilities of the GP operations were a 0.00 chance of
mutation, 0.3 of reproduction and a 0.7 chance of crossover, with an elitism selection
of 1. As in the selected mutation model, the entire dataset is randomly divided into a
training set of 24 compounds and a test set of 10 compounds, represented by 10
selected molecular descriptors.
The best fitness values increased more slowly and did not reach as high a total
value, while the most fit individual tended to be more consistent. The results show that
the best solution is found in 500 generations, with a best fitness value of 53.23; the
"changed at generation" value is 84, as shown in Figure (4.10). The result is lower than
with mutation selected.
Figure (4.10): The evaluation result when cancel mutation
Figure (4.11): The results of modelling of best fitness simulation when cancel mutation
Figure (4.11) shows that the best fitness is 53.23, along with the most fit
individual and the maximum number of correct bits in the most fit individual for each
generation. When analyzing average fitness, each time the algorithm selects random
chromosomes, it selects the more fit of the two possibilities surrounding a randomly
generated number. When this is performed for an entire generation, there is substantial
growth.
Table (4.8): Training dataset to test a cancel mutation
Nr X1 X2 X3 X4 X5 X6 X7 Y
1 -0.45239 0.509759 1.606766 2.828297 4.163454 5.600927 7.129132 13.91386
2 0.509759 1.606766 2.828297 4.163454 5.600927 7.129132 8.73632 15.71945
3 1.606766 2.828297 4.163454 5.600927 7.129132 8.73632 10.41068 17.54589
4 2.828297 4.163454 5.600927 7.129132 8.73632 10.41068 12.14042 19.38214
5 4.163454 5.600927 7.129132 8.73632 10.41068 12.14042 13.91386 21.21744
6 5.600927 7.129132 8.73632 10.41068 12.14042 13.91386 15.71945 23.0414
7 7.129132 8.73632 10.41068 12.14042 13.91386 15.71945 17.54589 24.84399
8 8.73632 10.41068 12.14042 13.91386 15.71945 17.54589 19.38214 26.61558
9 10.41068 12.14042 13.91386 15.71945 17.54589 19.38214 21.21744 28.34696
10 12.14042 13.91386 15.71945 17.54589 19.38214 21.21744 23.0414 30.02934
The simulation produced the results detailed in Table (4.8). The maximum best
fitness was 53.23, and the result shows the maximum number of correct bits in the most
fit individual for each generation. When mutation was canceled, the best fitness values
increased slowly and did not reach as high a total value, while the most fit individual
tended to be more consistent; the results are shown in Figure (4.12).
Figure (4.12): The results of training dataset when cancel mutation
The second simulation test of the algorithm was applied for higher fitness on a
dataset that includes a difference of bits, given in Table (4.9).
Table (4.9): Training dataset to test fitness function when cancel mutation
Nr X1 X2 X3 X4 X5 X6 X7 Y
1 6.433781 3.815089 2.54016 2.798055 3.632377 3.527516 2.96315 -0.5019
2 3.815089 2.54016 2.798055 3.632377 3.527516 2.96315 2.165226 -1.27008
3 2.54016 2.798055 3.632377 3.527516 2.96315 2.165226 1.26764 -1.91571
4 2.798055 3.632377 3.527516 2.96315 2.165226 1.26764 0.358334 -2.41751
5 3.632377 3.527516 2.96315 2.165226 1.26764 0.358334 -0.5019 -2.76119
6 3.527516 2.96315 2.165226 1.26764 0.358334 -0.5019 -1.27008 -2.93796
7 2.96315 2.165226 1.26764 0.358334 -0.5019 -1.27008 -1.91571 -2.94334
8 2.165226 1.26764 0.358334 -0.5019 -1.27008 -1.91571 -2.41751 -2.77633
9 1.26764 0.358334 -0.5019 -1.27008 -1.91571 -2.41751 -2.76119 -2.43874
10 0.358334 -0.5019 -1.27008 -1.91571 -2.41751 -2.76119 -2.93796 -1.93468
The results show that a linear upward slope is not necessary for reaching
higher fitness, because the crossover function may destroy the offspring of a most fit
individual in the early generations, as shown in Figure (4.13) and Figure (4.14). The
values of true positive and true negative are also demonstrated. The behavior differs,
however, between the run with mutation and the run without it: there are more
fluctuations in the model fitness evolution of the program when mutation is canceled
(Figure 4.13).
Figure (4.13): Model fitness evolution of the program when cancel mutation
Figure (4.14): the result of training dataset when cancel mutation
The performance analysis of the fitness value shows how the fitness value in the
genetic algorithm relates to mutation selection, crossover, population size, the length
of the random genetic string, and the number of generations when a genetic algorithm
is used in steganography.
When mutation was canceled, very few improvement jumps were observed after
the first dozens of generations and the performance was very poor. These simulation
results show that a GA without mutation does not work at all for this problem, which
means that mutation plays a key role in this optimization problem.
An analysis showed that mutation may decrease average fitness but increases
the probability of a high fitness value for the most fit individual. It also shows that
crossover helped average fitness while lowering the fitness value of the most fit
individual, leading to a reduced negative fitness effect.
4.5 CPU Time Usage
A change in the simulation time has an effect on CPU time. The project
simulates steganography and depends on saving the different web pages on the system,
which affects the CPU time.
Different computers and operating systems vary widely in their buffer levels and
in how they keep track of CPU time. The simulation time increases when more
production processes are running in the program. The performance of large database
applications depends critically on the packet buffer size of the data center.
In this application the database was used without a logical design; it is built
with XML, which does not require a large amount of memory to store.
The kind of computer, like the fitness function, determines a set of decision
variables with respect to CPU time. Buffer levels do not have a significant effect on
simulation time except under some extreme conditions, so buffer levels are not
considered in the performance analysis [25].
The results for CPU time converge when data hiding is executed on different
home pages. The tests used different home pages: company pages, news portals, social
media pages and university pages. Figure (4.15) and Figure (4.16) present the CPU time
results for steganography on the BBC news home page, obtained with the .NET
Reflector 9.0 program [55].
Figure (4.15): Simulation results for time line when steganography on BBC news page
Figure (4.16): Simulation results for method grid when steganography on BBC news
page
Further analysis was carried out by executing many different web pages;
statistical inference about the results is obtained by comparing the performances using
percentage tests. The behavior of the performance measures was examined for
steganography on the BBC news home page, the Dell company page, the Fahad Bin
Sultan University page, and the YouTube home page. Different kinds of web pages give
the same CPU time results when steganography is applied.
Figure 4.17: The results of CPU time when steganography on dell page
The results in Figure (4.17) show that at the beginning of the program there is
a hit count point in the encrypting method; after this point there is an increase in CPU
time in the fitness method (.ctor), which has a large hit count.
Population size and maximum generation number do affect CPU time:
increasing the population size and the generation number enlarges the search space, so
CPU time increases.
The crossover probability is insignificant with respect to CPU time, as in the
case of the fitness response. Whether or not crossover is applied makes no difference,
because in each case the produced children have similar patterns. The curve in
Figure (4.17) shows that increments in the fitness function produce significant
reductions in CPU time.
Table (4.10): Results of CPU time when steganography on dell page
Method name Hit count CPU % Average CPU %
Main 1 99.964 0
btnEncryptDataSet_Click 1 2.65 0.002
EncryptString 1 1.434 0.009
btnSrcFileName_Click 2 41.938 0.005
GetSourceFileName 2 41.874 0.004
btnAnalyse_Click 1 5.442 0.038
GetCapacity 2 6.303 0.108
FindTags 4 3.218 0.05
.ctor 6 2.783 0
btnDstFileName_Click 1 12.317 0.01
GetDestinationFileName 1 12.275 0.003
btnHide_Click 1 24.349 0.037
Hide 1 21.382 0.349
btnExtract_Click 1 3.189 0.026
Extract 1 2.966 0.108
button6_Click 1 1.929 0.01
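Per-method hit counts and CPU shares of the kind listed in Table (4.10) were obtained with a profiling tool. As a rough illustration of what such a tool records, the Python sketch below counts calls and accumulates CPU time per function with a decorator. It is not the profiler used in the thesis, and find_tags and get_capacity are toy stand-ins, not the application's FindTags and GetCapacity methods.

```python
import time
from collections import defaultdict
from functools import wraps

# name -> hit count and accumulated (inclusive) CPU seconds
stats = defaultdict(lambda: {"hits": 0, "cpu_seconds": 0.0})


def profiled(func):
    """Record how often func is called and how much CPU time it uses."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.process_time()
        try:
            return func(*args, **kwargs)
        finally:
            entry = stats[func.__name__]
            entry["hits"] += 1
            entry["cpu_seconds"] += time.process_time() - start
    return wrapper


@profiled
def find_tags(html):
    """Toy stand-in: count opening angle brackets."""
    return html.count("<")


@profiled
def get_capacity(html):
    """Toy stand-in: calls find_tags twice, so it has two child hits."""
    return find_tags(html) + find_tags(html.lower())


def cpu_share(name, total_name):
    """CPU time of one method as a percentage of another (inclusive) method."""
    total = stats[total_name]["cpu_seconds"]
    return 100.0 * stats[name]["cpu_seconds"] / total if total > 0 else 0.0


get_capacity("<html><body><p>cover page</p></body></html>")
for name, entry in stats.items():
    print(name, "hits:", entry["hits"], "cpu seconds:", entry["cpu_seconds"])
```

The times recorded here are inclusive, meaning a caller's time contains the time of the methods it calls, which is one way a top-level method such as Main can account for nearly all of the CPU time.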
The results showed that the simulation of the solutions generated by the GA
(the Main method) accounts for 99.964% of CPU time, with an average CPU of 0%
(Table 4.10). The factors affecting the CPU time change the simulation time. For
example, the simulation time depends on the load of the system. The type of operating
system also increases the simulation time as more production processes run in the
project. According to Table (4.10), the Hide method accounts for 21.382% of CPU time
with an average CPU of 0.349%, while btnSrcFileName_Click accounts for 41.938%
with an average CPU of 0.005%.
After the encryption method there is an increase in CPU time, which occurs in
the fitness method. The mutation probability has a negative effect on CPU time, so it is
better to set it at a high level. The crossover probability and the initial population type
do not have a significant effect on CPU time.
4.6 Chapter Summary
This chapter presented the important traditional metrics needed to check the
simulation tests of the system algorithms and clarified the limitations and assumptions.
The dataset used in the simulation tests was identified and discussed, the mathematical
equations and definitions of the required metrics were determined, and the results that
support the algorithm were analyzed and reported.
The behavior of the application was tested to assess the fitness of the simulation
and the efficiency of the system by calculating the capacity of the cover file, which
determines the size of the database that can be sent secretly using the steganography
application.
Two models of performance analysis were measured: the first is the fitness
value when mutation is selected, and the second is the fitness value when mutation is
canceled. The change in simulation time was observed through its effect on CPU time
in the steganography simulation project.
CHAPTER 5
CONCLUSIONS AND FUTURE WORK
5.1 Conclusions
The main objective of this research is to study the hiding of a database within a
specific file (a web page) without changing the size of this file, using a genetic
algorithm to increase the reliability and confidentiality of the database. The genetic
algorithm is inspired by natural evolution, based on Darwin's theory. The mechanism is
mainly based on the way chromosomes produce new generations through specific
mechanisms, such as cloning and mutation, through which data may be hidden within
the page. In each repetition of the generation process, the concealment process is
performed, including random selection, re-mixing and re-encryption.
The results showed that combining steganography with a genetic algorithm
improves database hiding within a specific file by up to 90% without changing the
general features of the file, which are distributed in specific locations within the file.
The hiding program itself is protected with a password.
The proposed program concentrates on hiding the database within a specific
web page and on applying the genetic algorithm to database concealment. That is why
the general features of the HTML structure have been studied, especially the tag,
together with placing the hidden database inside it and the relation between the features
and the hidden data. The steganography process has been completed successfully with
the required criteria.
An encryption algorithm is designed to strengthen the system and further
complicate illegal attempts to remove the concealment. The main features of the system
were designed and implemented successfully.
5.2 Results
The conceptual framework is a hybrid derived from the conceptual frameworks
for hiding and the framework applied in genetic algorithm engineering. The new
framework is one of the contributions of this research; it can be applied in any system
inspired by biological diversity.
The integration of the genetic algorithm and its adaptation to the steganography
system are essential for the designed specifications of the system to fulfil the main
objectives, through the verification of system selection and the analysis of results by
applying traditional methods of calculation. This research has introduced new
reference measurements through specific programs.
5.3 Future Work
For future work, researchers need deep knowledge of natural mechanisms and
genetic algorithms for more inspiration and to develop new proposals. This work has
shown that integrating steganography with a genetic algorithm can achieve a significant
improvement in data security. Following the same methodology, this combination can
be extended to the development of other security systems to obtain safer and more
reliable systems for database hiding.
5.4 Recommendation
1. The system needs to be validated in real time, to overcome any shortcomings
that may arise during real-time implementation. The tests need to be more
reliable and to use a sufficient volume of data. Implementation testing and
simulation must be done using other data to solve problems in other sectors.
Other algorithms must be improved to increase the amount of hidden databases,
and logical databases should be applied to deal with the memory of the computer.
2. The high flexibility of HTML must be exploited so that it can be manipulated in
the places of features without any change in the status of the page on the web,
so that the change is imperceptible to the browser. Other, less common languages
can also be used in the process of hiding databases. Although the proposed
method is simple, it is not common and requires knowledge and experience to
be discovered.
3. The Internet protocols can also be exploited through steganography. This
method can be developed by introducing evolutionary algorithms to increase the
efficiency of data hiding. The appearance of the fifth generation in the
communication process will also develop the way of hiding data, since it provides
greater speeds for data download compared to other generations through what is
known as Long Term Evolution (LTE). Steganography of e-mail data and its
white spaces and characters can also be developed.
REFERENCES
First: books
[1] David Hunter, et al -Beginning XML-(Indiana: Indianapolis-Wiley
Publishing- 4th ed- N,N,NO,Date).
[2] Mitchell Melanie- An Introduction to Genetic Algorithms- (London:
England- Massachusetts Institute of Technology- Fifth printing, 1999). Available at
:http://www.boente.eti.br/fuzzy/ebook-fuzzy-mitchell.pdf (Accessed on:24/08/2014).
[3] Nigel Smart-Cryptography: An Introduction-(3rd Edition)- McGraw-Hill-
(N.N.NO Date).
[4] S.N.Sivanandam S.N.Deepa- Introduction to Genetic Algorithms –
(Berlin Heidelberg:Springer-Verlag 2008 ).
Second: Scientific papers and researches
[5] Amrita Khamrui Enrolled Scholar-A Report on Genetic Algorithm based
Steganography for Image Authentication-(2014). Available
at:http://jkmandal.com/pdf/amrita_report2.pdf -(Accessed on:09/10/2014).
[6] Chintan Dhanani, et al -HTML Steganography using Relative links &
Multi web-page Embedment-(2014). Available at
:https://www.ijedr.org/papers/IJEDR1402108.pdf(Accessed on:16/01/2016).
[7] Chintan Dhanani, et al-Steganography using web documents as a
carrier:A Survey-(2013).
Available at:https://www.ijedr.org/papers/IJEDR1303036.pdf-(Accessed
on:05/08/2015).
[8] Christian Grothoff et al. -Transection Based Steganography -(2009).
Available at:http://grothoff.org/christian/stego.pdf-(Accessed on:12/08/2012).
[9] Christine K. Mulunda , et al -Genetic Algorithm Based Model in Text
Steganography-(10-1-2013).
Available at:http://dl.acm.org/citation.cfm?id=2636522-(Accessed on:15/10/2014).
[10] David E. Goldberg-Genetic Algorithms and the Variance of Fitness-(
Complex Systems 5 - 1991)- pp. 265-278. Available at:http://www.complex-
systems.com/pdf/05-3-1.pdf-(Accessed on:21/05/2011).
[11] Donovan Artz - Digital Steganography: Hiding Data within Data-( IEEE
Internet Computing May / June 2001)-pp.75-80. Available at
:http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=935180&url=http%3A%2F
%2Fieeexplore.ieee.org%2Fiel5%2F4236%2F20242%2F00935180-(Accessed
on:22/09/2012).
[12] Eiji Kawaguchi and Richard O. Eason -Principle and applications of
BPCS-Steganography-(2007). Available at:http://datahide.org/BPCSe/QtechHV-
program-e.html-(Accessed on:14/07/2011).
[13] Elham Ghasemi, et al -High Capacity Image Steganography Based on
Genetic Algorithm and Wavelet Transform- (2012). Available at
:http://www.iaeng.org/publication/IMECS2011/IMECS2011_pp495-498.pdf-
(Accessed on:27/10/2014).
[14] Eric Cole Ronald D. Krutz-Hiding in Plain Sight: Steganography and the
Art of Covert Communication- (Indiana. Canada: Wiley Publishing- Indianapolis -
2003). Available at: http://www.amazon.com/Hiding-Plain-Sight-Steganography-
Communication/dp/0471444499-(Accessed on:11/04/2014).
[15] Gary C. Kessler-Steganography: Hiding Data Within Data-(2001).
Available at:http://www.garykessler.net/library/steganography.html-(Accessed
on:16/09/2012).
[16] HweeHwa Pang, et al- Steganographic Schemes for File System and B-
Tree- (IEEE Transactions on Knowledge & Data Engineering- vol.16- no. 6- June
2004) pp. 701-713-(Accessed on:16/09/2012).
[17] Ingemar J Cox et al- Information Transmission and Steganography –
(1996). Available at
:http://www.cs.ucl.ac.uk/staff/I.Cox/Content/papers/2005/iwdw2005.pdf-(Accessed
on:12/12/2013).
[18] Jen-Chang Liu, Ming-Hong Shih -Generalizations of Pixel-Value
Differencing Steganography for Data Hiding in Images-(2008).
[19] Jessica Fridrich, et al -Steganalysis of Content-Adaptive Steganography in
Spatial Domain-(2011). Available at
:http://dde.binghamton.edu/kodovsky/pdf/Fri11BOSS.pdf-(Accessed on:22/10/2012).
[20] K.F.Man,et al- K.S. Tang, K.F.-Genetic Algorithms: Concept And
Applications-(IEEE transactions in industrial electronics -Vol.43-No.5-October
1996)-pp.519-534. Available at
:http://www.dca.fee.unicamp.br/~gomide/courses/EA072/artigos/Genetic_Algorithm
s_Concepts_Applications_Kwong_1996.pdf-(Accessed on:22/02/2016).
[21] Kazuo Sugihara-Measures for Performance Evaluation of Genetic
Algorithms-(2007). Available at
:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.8611&rep=rep1&type
=pdf-(Accessed on:20/04/2016).
[22] L.Polak1,Z.Kotulski2 -Sending Hidden Data Through Www
Pages:Detection And Prevention-( Trans- Polish Academy of Sciences - Institute of
Fundamental Technological Research- 2010)-pp.75–89-(Accessed on:10/08/2015).
[23] Mantas Paulinas, Andrius Ušinskas -A Survey of Genetic Algorithms
Applications for Image Enhancement and Segmentation- (Information Technology
and Control- Vol.36- No.3-2007)-Pp. 278- 284. Available at
:http://itc.ktu.lt/itc363/Paulinas363.pdf-(Accessed on:25/11/2011).
[24] Matthew Walker -Introduction to Genetic Programming-( October 7,
2001). Available at
:https://www.cs.montana.edu/~bwall/cs580/introduction_to_gp.pdf-(Accessed
on:16/09/2012).
[25] Onur Boyabatli -Parameter Selection in Genetic Algorithms-(Systemics,
Cybernetics and Informatics- Vol. 2 – No. 4-2004)-pp.78-83. Available at:
http://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1840&context=lkcsb_rese
arch-(Accessed on:17/02/2016).
[26] Robert H. Williams III- Introduction to Information Security Concepts-(
2007). Available at
:http://www.worldcolleges.info/sites/default/files/enggnotes/introduction_to_informa
tion_security_concepts.pdf-(Accessed on:6/10/2012).
[27] Robert H. Williams III- Introduction to Information Security Concepts-
(2007). Available at
:http://www.worldcolleges.info/sites/default/files/enggnotes/introduction_to_informa
tion_security_concepts.pdf-(Accessed on:21/10/2012).
[28] Sandipan Dey -Embedding Secret Data in Html Web Page-(2010).
Available at:http://arxiv.org/pdf/1004.0459v1.pdf-(Accessed on:16/01/2016).
[29] Santi P. Maity, Malay K. Kundu -Genetic Algorithms for Optimality of
Data Hiding in Digital Images- (2008). Available at
:http://www.isical.ac.in/~malay/Papers/IJSoft%20comp_09.pdf-(Accessed
on:16/10/2014).
[30] Shingo Inoue ,et al -A Proposal on Information Hiding Methods using
XML-(2002). Available at:http://takizawa.ne.jp/nlp_xml.pdf-(Accessed
on:11/02/2016).
[31] T. Morkel , et al -An Overview Of Image Steganography-(2005). Available
at:http://repository.root-me.org/St%C3%A9ganographie/EN%20-%20Image%20Steganography%20Overview.pdf-(Accessed on:27/08/2012).
[32] T. R. Gopalakrishnan Nair, et al -Genetic Algorithm to Make Persistent
Security and Quality of Image in Steganography from RS Analysis-(2012).
Available at:https://arxiv.org/ftp/arxiv/papers/1204/1204.2616.pdf-(Accessed on:19/10/2014).
[33] Xin-Guang Sui, Hui Luo-A New Steganography method based on
Hypertext- IEEE-2004.
Third: Journals and periodicals
[34] Anit Kumar -Encoding Schemes In Genetic Algorithm – (International
Journal of Advanced Research in IT and Engineering, Vol. 2 - No. 3 - March 2013)-
PP.1-7. Available at: http://www.garph.co.uk/ijarie/mar2013/1.pdf-(Accessed
on:25/04/2014).
[35] Arup Kumar Bhaumik, et al -Data Hiding in Video- (International Journal
of Database Theory and Application -Vol. 2- No. 2- June 2009)-pp.9-16. Available
at:http://www.sersc.org/journals/IJDTA/vol2_no2/2.pdf-(Accessed on:27/10/2012).
[36] B.B. Zaidan, et al -StegoMos: A secure novel approach of high rate data
hidden using mosaic image and ANN-BMP cryptosystem -(International Journal of
the Physical Sciences- Vol. 5(11)- 18 September, 2010 )- pp. 1796-1806. Available
at:http://www.academicjournals.org/journal/IJPS/article-full-text-pdf/328BB1031824-(Accessed on:18/06/2011).
[37] Babloo Saha , Shuchi Sharma-Steganographic Techniques of Data Hiding
using Digital Images –(Defence Science Journal-Vol. 62- No. 1- January 2012)- pp.
11-18. Available at
:http://publications.drdo.gov.in/ojs/index.php/dsj/article/viewFile/1436/601-
(Accessed on :03/09/2012).
[38] Cheng-Hsing Yang, et al -A data hiding scheme using the varieties of
pixel-value differencing in multimedia images-(The Journal of Systems and
Software- 2011)-pp. 669–678. Available at:
https://lms.ctl.cyut.edu.tw/sysdata/8/21108/doc/d7b84166985e286b/attach/905867.pdf-(Accessed on:02/03/2015).
[39] Dhammjyoti V. Dhawase , Sachin Chavan- Webpage Information Hiding
Using Page Contents- (International Journal of Advanced Research in Computer
Engineering & Technology (IJARCET)Volume 3- Issue 1-January 2014)-pp. 182-
186. Available at:http://ijarcet.org/wp-content/uploads/IJARCET-VOL-3-ISSUE-1-
182-186.pdf-(Accessed on:15/01/2016).
[40] Hamid.A.Jalab, et al -New Design for Information Hiding with in
Steganography Using Distortion Techniques-( IACSIT International Journal of
Engineering and Technology- Vol. 2-No.1- February 2010)-pp.72-77. Available at
:http://www.ijetch.org/papers/103-T463.pdf-(Accessed on:08/08/2015).
[41] Hedieh Sajedi · Mansour Jamzad -Using contourlet transform and cover
selection for secure steganography-( International Journal of Electrical and
Computer Engineering (IJECE)Vol.2- No.5- October 2012)- pp. 699-708. -(Accessed
on:27/10/2015). Available at
http://www.ijcta.com/documents/volumes/vol3issue2/ijcta2012030233.pdf-(Accessed
on:04/10/2014).
[42] K. F. Rafat and M. Sher -StegRithm:Steganographic Algorithm for Digital
ASCII Text Documents-( IACSIT International Journal of Engineering and
Technology- Vol. 4- No. 6- December 2012)-pp.765-769. Available at
:http://www.ijetch.org/papers/480-B210.pdf-(Accessed on:07/12/2015).
[43] Komal R. Hole, et al -Application of Genetic Algorithm for Image
Enhancement and Segmentation-(International Journal of Advanced Research in
Computer Engineering & Technology (IJARCET)-Volume 2-Issue 4- April 2013)- pp.
1342-1346. Available at:http://ijarcet.org/wp-content/uploads/IJARCET-VOL-2-
ISSUE-4-1342-1346.pdf-(Accessed on:08/10/2014).
[44] Mohammad Shirali Shahreza-Arabic/Persian Text Steganography Utilizing
Similar Letters with Different Codes-(The Arabian Journal for Science and
Engineering- Volume 35-Number 1B- 2006)-pp.213-222. Available at
:https://ajse.kfupm.edu.sa/articles/351b-p.14.pdf-(Accessed on:25/02/2015).
[45] Mohit Garg -A Novel Text Steganography Technique Based on Html
Documents-(International Journal of Advanced Science and Technology -Vol. 35-
October 2011)- pp. 129-138. Available at
:http://www.sersc.org/journals/IJAST/vol35/11.pdf-(Accessed on:23/08/2015).
[46] Neha Rani, et al - Text Steganography Techniques-(International Journal
of Engineering Trends and Technology (IJETT) – Vol.4 Issue 7- July 2013)-pp.
3013- 3015. Available at:http://www.ijettjournal.org/volume-4/issue-7/IJETT-
V4I7P186.pdf-(Accessed on:05/10/2015).
[47] P. Surekha, S. Sumathi -Implementation Of Genetic Algorithm For A Dwt
Based Image Watermarking Scheme-(Ictact Journal On Soft Computing: Special Issue
On Fuzzy In Industrial And Process Automation, July 2011, Volume: 02, Issue: 01)-pp.244-252.
Available at:http://ictactjournals.in/paper/IJSC_Vol2_Iss1_244_252.pdf-(Accessed on:20/02/2013).
[48] Priyanka Sharma, Rajesh Gargi -Performance Analysis of Different
Selection Techniques in Genetic Algorithm-(International Journal of Science and
Research (IJSR)-2012)-pp. 2042-2046. Available at
:http://www.ijsr.net/archive/v3i8/MDIwMTU1NDc%3D.pdf-(Accessed
on:25/01/2015).
[49] Raj Kumar Mohanta, Binapani Sethi -A Review of Genetic Algorithm
application for Image Segmentation –(Int. J. Computer
Technology & Applications-Vol 3)-pp. 720-723. Available at
:http://www.ijcta.com/documents/volumes/vol3issue2/ijcta2012030233.pdf
[50] Rajesh Kumar et al. - Genetic Divergence Studies in Pigeonpea –
(American Journal of Plant Sciences-2013)-pp. 2126-2130 . Available at
:http://dx.doi.org/10.4236/ajps.2013.411264-(Accessed on:02/06/2016).
[51] Rupali Gawade, et al -Data Hiding Using Steganography for Network
Security-( International Journal of Advanced Research in Computer and
Communication Engineering-Vol. 3- Issue 2- February 2014)-pp. 5740- 5743.
Available at
:http://www.ijarcce.com/upload/2014/february/IJARCEE9H_S_priyanka_shetya_Data.pdf-(Accessed on:22/06/2016).
[52] Shen Wang, et al -A Secure Steganography Method based on Genetic
Algorithm- (Journal of Information Hiding and Multimedia Signal Processing-Vol.1-
No. 1- January 2010)-pp.28-35. Available at
:http://vanilla47.com/PDFs/Cryptography/Steganography/A%20Secure%20Steganography%20Method%20based%20on%20Genetic%20Algorithm.pdf-(Accessed on:09/10/2014).
[53] Souvik Bhattacharyya, et al- Data Hiding Through Multi Level
Steganography and SSCE-(Journal of Global Research in Computer Science,
Volume 2, No. 2, February 2011)-pp.38-47.
Available at:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.468.8944&rep=rep1&type=pdf-(Accessed on:05/01/2013).
Fourth: Web sites
[54] GPdotNET v3,(https://gpdotnet.codeplex.com/) -(Accessed on:12/03/2016).
[55] Reflector 9.0 program,
(https://documentation.redgate.com/display/REF9/.NET+Reflector+9+documentation). -(Accessed on:22/04/2016).
APPENDICES
APPENDIX A
The Screens of the System
Creating and encrypting the database
Hiding the database in a web page
Extracting and decrypting the database
APPENDIX B
Source Code
%%%%%%%%%%%%%%%%%%%%%%%%%%%% class Encrypt %%%%%%%%%%%%%%%%%%
using System;
using System.Linq;
using System.Text;
using System.Security.Cryptography;
using System.IO;

namespace creatdatxml
{
    public static class Encrypt
    {
        // The IV must be as long as the cipher block. RijndaelManaged uses a
        // 128-bit block by default, so this 16-character ASCII string yields
        // the required 16 bytes.
        private const string initVector = "pemgail9uzpgzl88";

        // This constant is used to determine the key size (in bits) of the encryption algorithm.
        private const int keysize = 256;

        //Encrypt
        public static string EncryptString(string plainText, string passPhrase)
        {
            byte[] initVectorBytes = Encoding.ASCII.GetBytes(initVector);
            // UTF-8 is used here and in DecryptString so that the same text is recovered.
            byte[] plainTextBytes = Encoding.UTF8.GetBytes(plainText);
            PasswordDeriveBytes password = new PasswordDeriveBytes(passPhrase, null);
            byte[] keyBytes = password.GetBytes(keysize / 8);
            RijndaelManaged symmetricKey = new RijndaelManaged();
            symmetricKey.Mode = CipherMode.CBC;
            ICryptoTransform encryptor = symmetricKey.CreateEncryptor(keyBytes, initVectorBytes);
            MemoryStream memoryStream = new MemoryStream();
            CryptoStream cryptoStream = new CryptoStream(memoryStream, encryptor, CryptoStreamMode.Write);
            cryptoStream.Write(plainTextBytes, 0, plainTextBytes.Length);
            cryptoStream.FlushFinalBlock();
            byte[] cipherTextBytes = memoryStream.ToArray();
            memoryStream.Close();
            cryptoStream.Close();
            return Convert.ToBase64String(cipherTextBytes);
        }

        //Decrypt
        public static string DecryptString(string cipherText, string passPhrase)
        {
            byte[] initVectorBytes = Encoding.ASCII.GetBytes(initVector);
            byte[] cipherTextBytes = Convert.FromBase64String(cipherText);
            PasswordDeriveBytes password = new PasswordDeriveBytes(passPhrase, null);
            byte[] keyBytes = password.GetBytes(keysize / 8);
            RijndaelManaged symmetricKey = new RijndaelManaged();
            symmetricKey.Mode = CipherMode.CBC;
            ICryptoTransform decryptor = symmetricKey.CreateDecryptor(keyBytes, initVectorBytes);
            MemoryStream memoryStream = new MemoryStream(cipherTextBytes);
            CryptoStream cryptoStream = new CryptoStream(memoryStream, decryptor, CryptoStreamMode.Read);
            // The plaintext is never longer than the ciphertext. Read in a loop,
            // because a single Read call may return fewer bytes than requested.
            byte[] plainTextBytes = new byte[cipherTextBytes.Length];
            int decryptedByteCount = 0;
            int read;
            while ((read = cryptoStream.Read(plainTextBytes, decryptedByteCount, plainTextBytes.Length - decryptedByteCount)) > 0)
            {
                decryptedByteCount += read;
            }
            memoryStream.Close();
            cryptoStream.Close();
            return Encoding.UTF8.GetString(plainTextBytes, 0, decryptedByteCount);
        }
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%end class%%%%%%%%%%%%%%%%%%%%%%%%%%%%
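The Encrypt class above derives its key from the pass phrase without a salt and reuses one fixed initialization vector for every message. As an illustration of a common alternative, and not the method used in this thesis, the following minimal sketch draws a random salt and IV for each message and stores both in front of the ciphertext. The class name SaltedCipherSketch, the sizes and the iteration count are assumptions introduced for this example only; Rfc2898DeriveBytes (PBKDF2) and Aes are swapped in for PasswordDeriveBytes and RijndaelManaged.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Illustrative sketch only; not part of the thesis code.
public static class SaltedCipherSketch
{
    private const int SaltSize = 16;        // bytes; assumed value
    private const int IvSize = 16;          // AES block size is 128 bits
    private const int KeySize = 32;         // 256-bit key, as in the listing
    private const int Iterations = 100000;  // assumed PBKDF2 iteration count

    public static string Encrypt(string plainText, string passPhrase)
    {
        byte[] salt = new byte[SaltSize];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(salt);
        }
        using (Aes aes = Aes.Create())
        using (var kdf = new Rfc2898DeriveBytes(passPhrase, salt, Iterations, HashAlgorithmName.SHA256))
        {
            aes.Key = kdf.GetBytes(KeySize);
            aes.GenerateIV();
            using (var output = new MemoryStream())
            {
                // Layout: salt | IV | ciphertext
                output.Write(salt, 0, salt.Length);
                output.Write(aes.IV, 0, aes.IV.Length);
                using (ICryptoTransform encryptor = aes.CreateEncryptor())
                using (var cryptoStream = new CryptoStream(output, encryptor, CryptoStreamMode.Write))
                {
                    byte[] plainBytes = Encoding.UTF8.GetBytes(plainText);
                    cryptoStream.Write(plainBytes, 0, plainBytes.Length);
                    cryptoStream.FlushFinalBlock();
                }
                // ToArray still works after the stream has been closed.
                return Convert.ToBase64String(output.ToArray());
            }
        }
    }

    public static string Decrypt(string cipherText, string passPhrase)
    {
        byte[] all = Convert.FromBase64String(cipherText);
        byte[] salt = new byte[SaltSize];
        byte[] iv = new byte[IvSize];
        Buffer.BlockCopy(all, 0, salt, 0, SaltSize);
        Buffer.BlockCopy(all, SaltSize, iv, 0, IvSize);
        using (Aes aes = Aes.Create())
        using (var kdf = new Rfc2898DeriveBytes(passPhrase, salt, Iterations, HashAlgorithmName.SHA256))
        {
            aes.Key = kdf.GetBytes(KeySize);
            aes.IV = iv;
            int offset = SaltSize + IvSize;
            using (ICryptoTransform decryptor = aes.CreateDecryptor())
            using (var input = new MemoryStream(all, offset, all.Length - offset))
            using (var cryptoStream = new CryptoStream(input, decryptor, CryptoStreamMode.Read))
            using (var reader = new StreamReader(cryptoStream, Encoding.UTF8))
            {
                // ReadToEnd keeps reading until the stream is exhausted.
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main()
    {
        string encrypted = Encrypt("secret database row", "pass phrase");
        Console.WriteLine(Decrypt(encrypted, "pass phrase") == "secret database row");
    }
}
```

With this layout, encrypting the same text twice yields different ciphertexts, because the salt and IV differ on each call. The price is 32 extra bytes per message, which reduces the payload that fits into a given web page.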
%%%%%%%%%%%%%%%%%%%%%%%%%%% class HtmlUtility %%%%%%%%%%%%%%%%%%
#region Using directives
using System;
using System.IO;
using System.Data;
using System.Text;
using System.Collections;
using System.Collections.Specialized;
using System.Runtime.InteropServices;
using GeneticAlgorithms;
#endregion

namespace stegeneticweb
{
    public class HtmlUtility : GeneticAlgorithms.GenomeCollection
    {
        public HtmlUtility()
        {
        }

        /// <summary>Counts the key attribute couples in an HTML document</summary>
        /// <param name="sourceFileName">Path and name of the HTML document</param>
        /// <param name="keyTable">DataTable with the key attributes</param>
        /// <returns>Count of carrier couples in the specified document; each couple can hide one bit</returns>
        public int GetCapacity(String sourceFileName, DataTable keyTable)
        {
            int countCarrierCouples = 0;
            StreamReader reader = new StreamReader(sourceFileName, Encoding.Default);
            String htmlDocument = reader.ReadToEnd();
            reader.Close();
            HtmlTagCollection tags = FindTags(htmlDocument);
            StringBuilder insertTextBuilder = new StringBuilder();
            DataRow[] rows;
            HtmlAttribute secondAttribute;
            foreach (HtmlTag tag in tags)
            {
                foreach (HtmlAttribute attribute in tag.Attributes)
                {
                    if (!attribute.Handled)
                    {
                        rows = keyTable.Select("firstAttribute = '" + attribute.Name.Replace("'", "''") + "'");
                        if (rows.Length > 0)
                        {
                            secondAttribute = FindAttribute(rows[0]["secondAttribute"].ToString(), tag.Attributes);
                            if (secondAttribute != null)
                            {
                                countCarrierCouples++;
                            }
                        }
                    }
                }
            }
            return countCarrierCouples;
        }

        /// <summary>Encode one bit as a combination of attributes, add the resulting text to a StringBuilder</summary>
        /// <param name="messageByte">Current byte</param>
        /// <param name="bitIndex">Current position in [messageByte]</param>
        /// <param name="firstAttribute">Key attribute</param>
        /// <param name="secondAttribute">Corresponding attribute</param>
        /// <param name="insertTextBuilder">Receives the new HTML text</param>
        private void HideBit(int messageByte, int bitIndex, HtmlAttribute firstAttribute, HtmlAttribute secondAttribute, StringBuilder insertTextBuilder)
        {
            String firstAttributeText, secondAttributeText;
            if (firstAttribute.Value.Length > 0)
            {
                firstAttributeText = String.Format("{0:x2}={1:x2}", firstAttribute.Name, firstAttribute.Value);
            }
            else
            {
                firstAttributeText = firstAttribute.Name;
            }
            if (secondAttribute.Value.Length > 0)
            {
                secondAttributeText = String.Format("{0:x2}={1:x2}", secondAttribute.Name, secondAttribute.Value);
            }
            else
            {
                secondAttributeText = secondAttribute.Name;
            }
            if (GetBit(messageByte, bitIndex))
            {
                //bit is true: the key attribute comes first
                insertTextBuilder.AppendFormat(@" {0:x2} {1:x2}", firstAttributeText, secondAttributeText);
            }
            else
            {
                //bit is false: the corresponding attribute comes first
                insertTextBuilder.AppendFormat(@" {0:x2} {1:x2}", secondAttributeText, firstAttributeText);
            }
        }

        /// <summary>Hide a message in an HTML document</summary>
        /// <param name="sourceFileName">Path and name of the HTML document</param>
        /// <param name="destinationFileName">Path and name to save the resulting HTML document</param>
        /// <param name="message">The message to hide</param>
        /// <param name="keyTable">DataTable with the key attributes</param>
        public void Hide(String sourceFileName, String destinationFileName, Stream message, DataTable keyTable)
        {
            //read the carrier document
            StreamReader reader = new StreamReader(sourceFileName, Encoding.Default);
            String htmlDocument = reader.ReadToEnd();
            reader.Close();
            message.Position = 0;
            //list the HTML tags
            HtmlTagCollection tags = FindTags(htmlDocument);
            StringBuilder insertTextBuilder = new StringBuilder();
            DataRow[] rows;
            HtmlAttribute secondAttribute;
            int offset = 0;
            int bitIndex = 7;
            int messageByte = 0;
            foreach (HtmlTag tag in tags)
            {
                insertTextBuilder.Remove(0, insertTextBuilder.Length);
                insertTextBuilder.AppendFormat("<{0:x2}", tag.Name);
                foreach (HtmlAttribute ICrossover in tag.Attributes)
                {
                    if (!ICrossover.Handled)
                    {
                        //attribute has not been used, yet
                        //find key row for this attribute
                        rows = keyTable.Select(String.Format("firstAttribute = '{0:x2}'", ICrossover.QueryFormattedName));
                        if (rows.Length > 0)
                        {
                            //find corresponding attribute
                            secondAttribute = FindAttribute(rows[0]["secondAttribute"].ToString(), tag.Attributes);
                            if (secondAttribute != null)
                            {
                                if (bitIndex == 7)
                                {
                                    //get next message byte
                                    bitIndex = 0;
                                    messageByte = message.ReadByte();
                                }
                                else
                                {
                                    //next bit
                                    bitIndex++;
                                }
                                //change the attributes' order
                                HideBit(messageByte, bitIndex, ICrossover, secondAttribute, insertTextBuilder);
                                //mark both attributes as handled
                                ICrossover.Handled = true;
                                secondAttribute.Handled = true;
                            }
                        }
                        if (!ICrossover.Handled)
                        {
                            //the attribute is not a primary key attribute. Is it a secondary key attribute?
                            bool copyAttribute = false;
                            rows = keyTable.Select(String.Format("secondAttribute = '{0:x2}'", ICrossover.QueryFormattedName));
                            if (rows.Length > 0)
                            {
                                //if the corresponding first attribute does not exist in this tag or has already been used,
                                //this attribute will not be used and must be copied.
                                HtmlAttribute firstAttribute = FindAttribute(rows[0]["firstAttribute"].ToString(), tag.Attributes);
                                if (firstAttribute == null)
                                {
                                    copyAttribute = true;
                                }
                                else
                                {
                                    copyAttribute = firstAttribute.Handled;
                                }
                            }
                            else if (rows.Length == 0)
                            {
                                //this attribute is not part of the key and must be copied.
                                copyAttribute = true;
                            }
                            if (copyAttribute)
                            {
                                //copy unused attribute
                                insertTextBuilder.AppendFormat(@" {0:x2}={1:x2}", ICrossover.Name, ICrossover.Value);
                                ICrossover.Handled = true;
                            }
                        }
                    }
                }
                //replace old tag with new tag
                tag.BeginPosition += offset;
                tag.EndPosition += offset;
                String insertText = insertTextBuilder.ToString();
                int newLength = insertText.Length;
                if (newLength > 0)
                {
                    int oldLength = tag.EndPosition - tag.BeginPosition;
                    htmlDocument = htmlDocument.Remove(tag.BeginPosition, oldLength);
                    htmlDocument = htmlDocument.Insert(tag.BeginPosition, insertText);
                    offset += (newLength - oldLength);
                }
                if (messageByte < 0)
                {
                    break; //finished
                }
            }
            //save the new document
            StreamWriter writer = new StreamWriter(destinationFileName);
            writer.Write(htmlDocument);
            writer.Close();
        }

        /// <summary>Extract one bit, add it to a Stream</summary>
        /// <param name="firstAttributePosition">Position of the key attribute in the source document</param>
        /// <param name="secondAttributePosition">Position of the corresponding attribute in the source document</param>
        /// <param name="messageByte">Current message byte</param>
        /// <param name="bitIndex">Current bit index</param>
        /// <param name="message">Message stream</param>
        /// <returns>New message byte</returns>
        private byte ExtractBit(int firstAttributePosition, int secondAttributePosition, byte messageByte, int bitIndex, Stream message)
        {
            if (firstAttributePosition < secondAttributePosition)
            {
                messageByte = SetBit(messageByte, bitIndex, true);
            }
            else
            {
                messageByte = SetBit(messageByte, bitIndex, false);
            }
            if (bitIndex == 7)
            {
                //save to message byte
                message.WriteByte(messageByte);
                messageByte = 0;
            }
            return messageByte;
        }

        /// <summary>Extract a hidden message from an HTML document</summary>
        /// <param name="sourceFileName">Path and name of the HTML document</param>
        /// <param name="message">Empty stream for the message</param>
        /// <param name="keyTable">DataTable with the key attributes</param>
        public void Extract(String sourceFileName, Stream message, DataTable keyTable)
        {
            //read the carrier document
            StreamReader reader = new StreamReader(sourceFileName, Encoding.Default);
            String htmlDocument = reader.ReadToEnd();
            reader.Close();
            //list the HTML tags
            HtmlTagCollection tags = FindTags(htmlDocument);
            StringBuilder insertTextBuilder = new StringBuilder();
            DataRow[] rows;
            HtmlAttribute secondAttribute;
            int attributePosition, secondAttributePosition;
            int messageLength = 0;
            int bitIndex = 0;
            byte messageByte = 0;
            foreach (HtmlTag tag in tags)
            {
                foreach (HtmlAttribute ICrossover in tag.Attributes)
                {
                    if (!ICrossover.Handled)
                    {
                        //attribute has not been used, yet
                        //find key row for this attribute
                        rows = keyTable.Select(String.Format("firstAttribute = '{0:x2}'", ICrossover.QueryFormattedName));
                        if (rows.Length > 0)
                        {
                            //find corresponding attribute
                            secondAttribute = FindAttribute(rows[0]["secondAttribute"].ToString(), tag.Attributes);
                            if (secondAttribute != null)
                            {
                                attributePosition = htmlDocument.IndexOf(ICrossover.Name, tag.BeginPosition);
                                secondAttributePosition = htmlDocument.IndexOf(secondAttribute.Name, tag.BeginPosition);
                                //compare the attributes' positions
                                messageByte = ExtractBit(attributePosition, secondAttributePosition, messageByte, bitIndex, message);
                                //next bit
                                if (bitIndex == 7)
                                {
                                    bitIndex = 0;
                                    if ((message.Length == 1) && (messageLength == 0))
                                    {
                                        //read length (the first extracted byte)
                                        message.Position = 0;
                                        BinaryReader binaryReader = new BinaryReader(message);
                                        messageLength = binaryReader.ReadByte();
                                        reader = null;
                                        message.SetLength(0);
                                        message.Position = 0;
                                    }
                                    else if ((messageLength > 0) && (message.Length == messageLength))
                                    {
                                        break; //finished
                                    }
                                }
                                else
                                {
                                    bitIndex++;
                                }
                                //mark both attributes as handled
                                ICrossover.Handled = true;
                                secondAttribute.Handled = true;
                            }
                        }
                        if (!ICrossover.Handled)
                        {
                            rows = keyTable.Select(String.Format("secondAttribute = '{0:x2}'", ICrossover.QueryFormattedName));
                            if (rows.Length == 0)
                            {
                                //tag not used
                                ICrossover.Handled = true;
                            }
                        }
                    }
                }
                if ((messageLength > 0) && (message.Length == messageLength))
                {
                    break; //finished
                }
            }
        }

        /// <summary>Find the attribute with a specific name</summary>
        /// <param name="name">Name of the attribute</param>
        /// <param name="attributes">Attributes of a tag</param>
        /// <returns>The attribute found in [attributes], or null</returns>
        private HtmlAttribute FindAttribute(String name, HtmlAttributeCollection attributes)
        {
            HtmlAttribute foundAttribute = null;
            foreach (HtmlAttribute attribute in attributes)
            {
                if ((!attribute.Handled) && (attribute.Name == name))
                {
                    foundAttribute = attribute;
                    break;
                }
            }
            return foundAttribute;
        }

        /// <summary>List all HTML tags of a document</summary>
        /// <param name="htmlDocument">Text of the HTML document</param>
        /// <returns>List with the start tags found in the document</returns>
        private HtmlTagCollection FindTags(String htmlDocument)
        {
            HtmlTagCollection tags = new HtmlTagCollection();
            int indexStart = 0, indexEnd = 0;
            String text;
            do
            {
                indexStart = htmlDocument.IndexOf('<', indexEnd + 1);
                if (indexStart > 0)
                {
                    indexEnd = htmlDocument.IndexOf('>', indexStart + 1);
                    if (indexEnd < 0)
                    {
                        //unterminated tag: stop, otherwise the scan would restart from the beginning
                        break;
                    }
                    if (htmlDocument[indexStart + 1] != '/')
                    {
                        //found the end of the start tag
                        text = htmlDocument.Substring(indexStart, indexEnd - indexStart);
                        tags.Add(new HtmlTag(text, indexStart, indexEnd));
                    }
                }
            } while (indexStart > 0);
            return tags;
        }

        /// <summary>Get the value of a bit</summary>
        /// <param name="b">The byte value</param>
        /// <param name="position">The position of the bit</param>
        /// <returns>The value of the bit</returns>
        private bool GetBit(int b, int position)
        {
            return ((b & (byte)(1 << position)) != 0);
        }

        #region Random Methods
        public static byte[] GetRandomBytes(int size, HtmlAttribute parent)
        {
            byte[] buffer = new byte[size];
            RandomProvider.NextBytes(buffer);
            return buffer;
        }

        private static Random _randomProvider = new Random();

        public static Random RandomProvider
        {
            get { return _randomProvider; }
            set { _randomProvider = value; }
        }
        #endregion

        /// <summary>Set a bit to [newBitValue]</summary>
        /// <param name="b">The byte value</param>
        /// <param name="position">The position (0-7) of the bit</param>
        /// <param name="newBitValue">The new value of the bit in position [position]</param>
        /// <returns>The new byte value</returns>
        private byte SetBit(byte b, int position, bool newBitValue)
        {
            byte mask = (byte)(1 << position);
            if (newBitValue)
            {
                return (byte)(b | mask);
            }
            else
            {
                return (byte)(b & ~mask);
            }
        }
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%% end class %%%%%%%%%%%%%%%%%%
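The HideBit and ExtractBit methods above carry one bit in the relative order of two attributes inside a tag. The following minimal sketch isolates that idea for a single byte. It is an illustration under stated assumptions and not the thesis implementation: the class name AttributeOrderSketch, the fixed couple width/height and the generated img tags are hypothetical, whereas the listing reads its couples from keyTable and reorders attributes that already exist in the carrier page.

```csharp
using System;
using System.Text;

// Illustrative sketch only; not part of the thesis code.
public static class AttributeOrderSketch
{
    // Hypothetical key couple. The listing reads couples from keyTable instead.
    private const string First = "width";
    private const string Second = "height";

    // Writes one <img> tag per bit, least significant bit first.
    // Bit 1: First precedes Second. Bit 0: Second precedes First.
    public static string Hide(byte value)
    {
        var html = new StringBuilder();
        for (int bitIndex = 0; bitIndex < 8; bitIndex++)
        {
            bool bit = (value & (1 << bitIndex)) != 0;
            string first = First + "=\"10\"";
            string second = Second + "=\"20\"";
            html.Append(bit
                ? "<img " + first + " " + second + ">"
                : "<img " + second + " " + first + ">");
        }
        return html.ToString();
    }

    // Recovers the byte by comparing the positions of the two attributes in each tag.
    public static byte Extract(string html)
    {
        int value = 0;
        int bitIndex = 0;
        int tagStart = html.IndexOf('<');
        while (tagStart >= 0 && bitIndex < 8)
        {
            int tagEnd = html.IndexOf('>', tagStart);
            if (tagEnd < 0)
            {
                break; // unterminated tag
            }
            string tag = html.Substring(tagStart, tagEnd - tagStart + 1);
            int firstPosition = tag.IndexOf(First + "=", StringComparison.Ordinal);
            int secondPosition = tag.IndexOf(Second + "=", StringComparison.Ordinal);
            if (firstPosition >= 0 && secondPosition >= 0)
            {
                if (firstPosition < secondPosition)
                {
                    value |= 1 << bitIndex;
                }
                bitIndex++;
            }
            tagStart = html.IndexOf('<', tagEnd + 1);
        }
        return (byte)value;
    }

    public static void Main()
    {
        string carrier = Hide(0xA5);
        Console.WriteLine(carrier);
        Console.WriteLine(Extract(carrier) == 0xA5);
    }
}
```

Because a browser ignores attribute order, both orderings of a tag render identically, which is why the capacity of a page equals the number of attribute couples it contains.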
%%%%%%%%%%%%%%%%%%%%%%%%%%%%class HtmlAttribute %%%%%%%%%%%%%%%%%%
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using GeneticAlgorithms;

namespace stegeneticweb
{
    public class HtmlAttribute : IComparable
    {
        protected GeneticAlgorithm Parent;

        protected HtmlAttribute(GeneticAlgorithm parent)
        {
            Parent = parent;
        }

        private String name;
        private String value;
        private bool handled;

        public String Name
        {
            get { return name; }
        }

        public String QueryFormattedName
        {
            get { return name.Replace("'", "''"); }
        }

        public String Value
        {
            get { return this.value; }
            set { this.value = value; }
        }

        public bool Handled
        {
            get { return handled; }
            set { this.handled = value; }
        }

        public HtmlAttribute(String name)
        {
            this.name = name.ToLower();
            this.value = String.Empty;
            handled = false;
        }

        private double _fitness = double.MinValue;

        public double Fitness
        {
            get
            {
                // double Fitness=Math.Pow(15 * x * y * (1 - x) * (1 - y) * Math.Sin(n * Math.PI * x) * Math.Sin(n * Math.PI * y), 2);
                return _fitness;
            }
            set { _fitness = value; }
        }

        #region IComparable Members
        public int CompareTo(object obj)
        {
            HtmlAttribute compared = (HtmlAttribute)obj;
            if (this.Fitness < compared.Fitness)
                return -1;
            else if (this.Fitness > compared.Fitness)
                return 1;
            else
                return 0;
        }
        #endregion

        internal static void Sort()
        {
            throw new NotImplementedException();
        }
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%end class %%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%% IGenomeFactory%%%%%%%%%%%%%%%%%%
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using GeneticAlgorithms;

namespace stegeneticweb
{
    /// <summary>
    /// Collects methods used to create Genomes
    /// </summary>
    public interface IGenomeFactory
    {
        HtmlAttribute CreateGenome(GeneticAlgorithm parent);
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%% end class%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%% GeneticAlgorithm %%%%%%%%%%%%%%%%%%
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace stegeneticweb
{
    /// <summary>
    /// The GeneticAlgorithm class is the cornerstone of a small framework of classes that build
    /// customizable genetic algorithm searches. At a minimum you must set up either a BinaryGenome or RealGenome
    /// that describes a solution to your problem and then implement the IEvaluateGenome interface to
    /// provide a way of 'scoring' each possible candidate in the population of possible solutions.
    ///
    /// See the provided examples for more information.
    /// </summary>
    public class GeneticAlgorithm : stegeneticweb.IGeneticAlgorithm
    {
        protected internal readonly System.Type genomeType;

        /// <summary>
        /// Creates an instance of the GeneticAlgorithm class.
        /// </summary>
        public GeneticAlgorithm()
        {
        }

        #region Methods
        /// <summary>
        /// After you have created the GeneticAlgorithm instance and provided it with an implementation of
        /// IGenomeFactory, this method can be called to actually create the population of custom Genomes for
        /// use in the GA search.
        ///
        /// In the case of the provided RealGenomeFactory you create a RealGenomeFactory instance, then
        /// set the minimum and maximum values you want each genome to have, then provide the Genetic Algorithm
        /// the factory through the GenomeFactory property.
        ///
        /// The GenomeFactory that you provide is responsible for every aspect of constructing the Genome so that it is ready
        /// to evaluate and to cross over with other genomes.
        /// </summary>
        /// <param name="populationSize"></param>
        public void CreateGenomes(int populationSize)
        {
            if (GenomeFactory == null)
                throw new ApplicationException("Cannot create Genomes if the GenomeFactory is null");
            HtmlAttribute[] genomes = new HtmlAttribute[populationSize];
            for (int i = 0; i < populationSize; i++)
                _genomes.Add(GenomeFactory.CreateGenome(this));
        }

        /// <summary>
        /// Attempts to find an optimal solution to the problem.
        /// </summary>
        /// <returns></returns>
        public virtual HtmlAttribute FindOptima()
        {
            return FindOptima(ExitConditions);
        }

        /// <summary>
        /// Attempts to find an optimal solution to the problem by beginning the process of evaluation and crossover/mutation.
        ///
        /// This method behaves as follows: while the ExitConditions are not met, loop through each Genome in the population and select
        /// a mate for it (in greedy mode, do not mate the best solution or replace it with a child; keep the best solution only as
        /// a mate for other Genomes). Use the provided Selector to control the selection process (weighted random, sequential, etc.).
        /// Use the provided Crossover to recombine the selected genome and its mate and to apply any required mutation.
        /// </summary>
        /// <param name="conditions"></param>
        /// <returns></returns>
        public virtual HtmlAttribute FindOptima(ExitConditions conditions)
        {
            // If you run the debug version, the app keeps track of the time spent evaluating the Genomes,
            // recombining and mutating the Genomes (crossover) and selecting mates for the Genomes (selection)
#if DEBUG
            Counter evalTime = new Counter();
            Counter crossoverTime = new Counter();
            Counter selectorTime = new Counter();
#endif
            if (Selector == null)
                throw new ApplicationException("Cannot run FindOptima if the Selector is null");
            if (Crossover == null)
                throw new ApplicationException("Cannot run FindOptima if the Crossover is null");
            startTime = DateTime.Now;
            // this is how greediness is implemented:
            // if we are 'greedy' we keep the top genome every time, so we shorten the
            // count of genomes by one, because we want the last genome to remain
            // untouched
            int genomeCount = (IsGreedy ? _genomes.Count - 1 : _genomes.Count);
            while (conditions.DoesContinue(this))
            {
                if (NewGeneration != null)
                    NewGeneration(this, null);
#if DEBUG
                evalTime.Start();
#endif
                // first, evaluate the fitness:
                for (int g = 0; g < genomeCount; g++)
                {
                    Genomes[g].Fitness = Evaluator.Eval(Genomes[g]);
                }
#if DEBUG
                evalTime.Stop();
#endif
                // sort places genomes in order from least fit at position 0
                // to most fit at the end of the collection:
                Genomes.Sort();
                if (Genomes[Genomes.Count - 1].Fitness > _gbestFitness)
                {
                    _gbestFitness = Genomes[Genomes.Count - 1].Fitness;
                    if (NewGlobalBest != null)
                        NewGlobalBest(this, null);
                }
                for (int g = 0; g < genomeCount; g++)
                {
#if DEBUG
                    selectorTime.Start();
#endif
                    HtmlAttribute mate = Selector.Select();
                    //while(mate.Equals(Genomes[g]))
                    //mate = Selector.Select();
#if DEBUG
                    selectorTime.Stop();
#endif
#if DEBUG
                    crossoverTime.Start();
#endif
                    // Note: although the Crossover method will *often* simply modify the referenced first Genome 'in-place',
                    // it makes sense to return it explicitly so that future implementations *can* create new Genome instances
                    // if that is better or necessary for their particular algorithm
                    _genomes[g] = Crossover.Crossover(_genomes[g], mate);
#if DEBUG
                    crossoverTime.Stop();
#endif
                }
                generations++;
            }
#if DEBUG
            Console.WriteLine("Evaluation time: {0}", evalTime.Seconds);
            Console.WriteLine("Crossover time: {0}", crossoverTime.Seconds);
            Console.WriteLine("Selector time: {0}", selectorTime.Seconds);
#endif
            if (!IsGreedy)
                HtmlAttribute.Sort();
            return Genomes[Genomes.Count - 1];
        }
        #endregion

        #region Properties
        protected DateTime startTime;

        /// <summary>
        /// Records the time FindOptima was last called.
        /// </summary>
        public DateTime StartTime
        {
            get { return startTime; }
        }

        protected int generations = 0;

        /// <summary>
        /// The Generation count since the last time FindOptima was called.
        /// </summary>
        public int GenerationCount
        {
            get { return generations; }
        }

        private HtmlAttributeCollection _genomes = new HtmlAttributeCollection();

        public HtmlAttributeCollection Genomes
        {
            get { return _genomes; }
        }

        private IGenomeSelector _selector;

        /// <summary>
        /// The Selector is an instance of IGenomeSelector that provides the selection strategy for the GA.
        /// </summary>
        public IGenomeSelector Selector
        {
            get { return _selector; }
            set { _selector = value; }
        }

        private IEvaluateGenome _evaluator;

        /// <summary>
        /// The IEvaluateGenome implementation that the GA will use to determine Genomes' fitness.
        /// </summary>
        public IEvaluateGenome Evaluator
        {
            get { return _evaluator; }
            set { _evaluator = value; }
        }

        private ICrossover _crossover;

        /// <summary>
        /// The ICrossover used to recombine and mutate Genomes.
        /// </summary>
        public ICrossover Crossover
        {
            get { return _crossover; }
            set { _crossover = value; }
        }

        private IGenomeFactory _genomeFactory;

        /// <summary>
        /// The IGenomeFactory used to create new genomes as necessary.
        /// </summary>
        public IGenomeFactory GenomeFactory
        {
            get { return _genomeFactory; }
            set { _genomeFactory = value; }
        }

        private ExitConditions _exitConditions = new ExitConditions();

        /// <summary>
        /// Sets/Gets the instance of ExitConditions this GA is using to determine when to quit iterating through
        /// generations.
        /// </summary>
        public ExitConditions ExitConditions
        {
            get { return _exitConditions; }
            set { _exitConditions = value; }
        }

        private bool _isGreedy = true;

        /// <summary>
        /// 'Greediness' in this idiom of a GeneticAlgorithm means that the algorithm always holds on to the most
        /// optimal solution yet found between generations.
        /// </summary>
        public bool IsGreedy
        {
            get { return _isGreedy; }
            set { _isGreedy = value; }
        }
        #endregion

        public event GeneticAlgorithmEventHandler NewGeneration;
        public event GeneticAlgorithmEventHandler NewGlobalBest;

        private double _gbestFitness = double.MinValue;
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%end class %%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%HtmlAttributeCollectio%%%%%%%%%%%%%%%%%%
#region Using directives
using System;
using System.Collections;
using System.Text;
using AForge.Genetic;
using GeneticAlgorithms;
#endregion

namespace stegeneticweb
{
    /// <summary>
    /// <para>A collection that stores <see cref='.HtmlAttribute'/> objects.</para>
    /// </summary>
    /// <seealso cref='.HtmlAttributeCollection'/>
    [Serializable()]
    public class HtmlAttributeCollection : System.Collections.CollectionBase
    {
        /// <summary>
        /// <para>Initializes a new instance of <see cref='.HtmlAttributeCollection'/>.</para>
        /// </summary>
        public HtmlAttributeCollection()
        {
        }

        /// <summary>
        /// <para>Initializes a new instance of <see cref='.HtmlAttributeCollection'/> based on another <see cref='.HtmlAttributeCollection'/>.</para>
        /// </summary>
        /// <param name='val'>A <see cref='.HtmlAttributeCollection'/> from which the contents are copied.</param>
        public HtmlAttributeCollection(HtmlAttributeCollection val)
        {
            this.AddRange(val);
        }

        /// <summary>
        /// <para>Initializes a new instance of <see cref='.HtmlAttributeCollection'/> containing any array of <see cref='.HtmlAttribute'/> objects.</para>
        /// </summary>
        /// <param name='val'>An array of <see cref='.HtmlAttribute'/> objects with which to initialize the collection.</param>
        public HtmlAttributeCollection(HtmlAttribute[] val)
        {
            this.AddRange(val);
        }

        /// <summary>
        /// <para>Represents the entry at the specified index of the <see cref='.HtmlAttribute'/>.</para>
        /// </summary>
        /// <param name='index'><para>The zero-based index of the entry to locate in the collection.</para></param>
        /// <value>
        /// <para>The entry at the specified index of the collection.</para>
        /// </value>
        /// <exception cref='System.ArgumentOutOfRangeException'><paramref name='index'/> is outside the valid range of indexes for the collection.</exception>
        public HtmlAttribute this[int index]
        {
            get { return ((HtmlAttribute)(List[index])); }
            set { List[index] = value; }
        }

        /// <summary>
        /// <para>Adds a <see cref='.HtmlAttribute'/> with the specified value to the
        /// <see cref='.HtmlAttributeCollection'/>.</para>
        /// </summary>
        /// <param name='val'>The <see cref='.HtmlAttribute'/> to add.</param>
        /// <returns>
        /// <para>The index at which the new element was inserted.</para>
        /// </returns>
        /// <seealso cref='.HtmlAttributeCollection.AddRange'/>
        public int Add(HtmlAttribute val)
        {
            return List.Add(val);
        }

        /// <summary>
        /// <para>Copies the elements of an array to the end of the <see cref='.HtmlAttributeCollection'/>.</para>
        /// </summary>
        /// <param name='val'>An array of type <see cref='.HtmlAttribute'/> containing the objects to add to the collection.</param>
        /// <seealso cref='.HtmlAttributeCollection.Add'/>
        public void AddRange(HtmlAttribute[] val)
        {
            for (int i = 0; i < val.Length; i++)
            {
                this.Add(val[i]);
            }
        }

        /// <summary>
        /// <para>Adds the contents of another <see cref='.HtmlAttributeCollection'/> to the end of the collection.</para>
        /// </summary>
        /// <param name='val'>A <see cref='.HtmlAttributeCollection'/> containing the objects to add to the collection.</param>
        /// <seealso cref='.HtmlAttributeCollection.Add'/>
        public void AddRange(HtmlAttributeCollection val)
        {
            for (int i = 0; i < val.Count; i++)
            {
                this.Add(val[i]);
            }
        }

        /// <summary>
        /// <para>Gets a value indicating whether the
        /// <see cref='.HtmlAttributeCollection'/> contains the specified <see cref='.HtmlAttribute'/>.</para>
        /// </summary>
        /// <param name='val'>The <see cref='.HtmlAttribute'/> to locate.</param>
        /// <returns>
        /// <para><see langword='true'/> if the <see cref='.HtmlAttribute'/> is contained in the collection;
        /// otherwise, <see langword='false'/>.</para>
        /// </returns>
        /// <seealso cref='.HtmlAttributeCollection.IndexOf'/>
        public bool Contains(HtmlAttribute val)
        {
            return List.Contains(val);
        }

        /// <summary>
        /// <para>Copies the <see cref='.HtmlAttributeCollection'/> values to a one-dimensional <see cref='System.Array'/> instance at the
        /// specified index.</para>
        /// </summary>
        /// <param name='array'><para>The one-dimensional <see cref='System.Array'/> that is the destination of the values copied from <see cref='.HtmlAttributeCollection'/>.</para></param>
        /// <param name='index'>The index in <paramref name='array'/> where copying begins.</param>
        /// <exception cref='System.ArgumentException'><para><paramref name='array'/> is multidimensional.</para> <para>-or-</para> <para>The number of elements in the <see cref='.HtmlAttributeCollection'/> is greater than the available space between <paramref name='index'/> and the end of <paramref name='array'/>.</para></exception>
        /// <exception cref='System.ArgumentNullException'><paramref name='array'/> is <see langword='null'/>.</exception>
        /// <exception cref='System.ArgumentOutOfRangeException'><paramref name='index'/> is less than <paramref name='array'/>'s lowbound.</exception>
        /// <seealso cref='System.Array'/>
        public void CopyTo(HtmlAttribute[] array, int index)
        {
            List.CopyTo(array, index);
        }

        /// <summary>
        /// <para>Returns the index of a <see cref='.HtmlAttribute'/> in
        /// the <see cref='.HtmlAttributeCollection'/>.</para>
        /// </summary>
        /// <param name='val'>The <see cref='.HtmlAttribute'/> to locate.</param>
        /// <returns>
        /// <para>The index of the <see cref='.HtmlAttribute'/> of <paramref name='val'/> in the
        /// <see cref='.HtmlAttributeCollection'/>, if found; otherwise, -1.</para>
        /// </returns>
        /// <seealso cref='.HtmlAttributeCollection.Contains'/>
        public int IndexOf(HtmlAttribute val)
        {
            return List.IndexOf(val);
        }

        /// <summary>
        /// <para>Inserts a <see cref='.HtmlAttribute'/> into the <see cref='.HtmlAttributeCollection'/> at the specified index.</para>
        /// </summary>
        /// <param name='index'>The zero-based index where <paramref name='val'/> should be inserted.</param>
        /// <param name='val'>The <see cref='.HtmlAttribute'/> to insert.</param>
        /// <seealso cref='.HtmlAttributeCollection.Add'/>
        public void Insert(int index, HtmlAttribute val)
        {
            List.Insert(index, val);
        }

        /// <summary>
        /// <para>Returns an enumerator that can iterate through
        /// the <see cref='.HtmlAttributeCollection'/>.</para>
        /// </summary>
        /// <returns><para>An enumerator for the collection.</para></returns>
        /// <seealso cref='System.Collections.IEnumerator'/>
        public new HtmlAttributeEnumerator GetEnumerator()
        {
            return new HtmlAttributeEnumerator(this);
        }

        /// <summary>
        /// <para>Removes a specific <see cref='.HtmlAttribute'/> from the
        /// <see cref='.HtmlAttributeCollection'/>.</para>
        /// </summary>
        /// <param name='val'>The <see cref='.HtmlAttribute'/> to remove from the <see cref='.HtmlAttributeCollection'/>.</param>
        /// <exception cref='System.ArgumentException'><paramref name='val'/> is not found in the Collection.</exception>
        public void Remove(HtmlAttribute val)
        {
            List.Remove(val);
        }

        public class HtmlAttributeEnumerator : IEnumerator
        {
            IEnumerator baseEnumerator;
            IEnumerable temp;

            public HtmlAttributeEnumerator(HtmlAttributeCollection mappings)
            {
                this.temp = ((IEnumerable)(mappings));
                this.baseEnumerator = temp.GetEnumerator();
            }

            public HtmlAttribute Current
            {
                get { return ((HtmlAttribute)(baseEnumerator.Current)); }
            }

            object IEnumerator.Current
            {
                get { return baseEnumerator.Current; }
            }

            public bool MoveNext()
            {
                return baseEnumerator.MoveNext();
            }

            bool IEnumerator.MoveNext()
            {
                return baseEnumerator.MoveNext();
            }

            public void Reset()
            {
                baseEnumerator.Reset();
            }

            void IEnumerator.Reset()
            {
                baseEnumerator.Reset();
            }
        }

        internal void Sort()
        {
            throw new NotImplementedException();
        }
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%end class %%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%% Counter %%%%%%%%%%%%%%%%
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace stegeneticweb
{
    /// <summary>
    /// Counter is just a high-resolution stopwatch for timing operations.
    /// Slightly modified code created by the legendary Eric Gunnerson of Microsoft.
    /// </summary>
    public class Counter
    {
        long elapsedCount = 0;
        long startCount = 0;
        long lastLapCount = 0;

        public void Start()
        {
            startCount = 0;
            QueryPerformanceCounter(ref startCount);
        }

        public void Stop()
        {
            long stopCount = 0;
            QueryPerformanceCounter(ref stopCount);
            elapsedCount += (stopCount - startCount);
        }

        public void Clear()
        {
            elapsedCount = 0;
        }

        public double Seconds
        {
            get
            {
                long freq = 0;
                QueryPerformanceFrequency(ref freq);
                return ((double)elapsedCount / (double)freq);
            }
        }

        public override string ToString()
        {
            return String.Format("{0} seconds", Seconds);
        }

        public double Lap
        {
            get
            {
                long freq = 0;
                long elapsed = lastLapCount;
                QueryPerformanceFrequency(ref freq);
                QueryPerformanceCounter(ref lastLapCount);
                return ((double)(lastLapCount - elapsed) / (double)freq);
            }
        }

        public static long Frequency
        {
            get
            {
                long freq = 0;
                QueryPerformanceFrequency(ref freq);
                return freq;
            }
        }

        public static long Value
        {
            get
            {
                long count = 0;
                QueryPerformanceCounter(ref count);
                return count;
            }
        }

        [System.Runtime.InteropServices.DllImport("KERNEL32")]
        private static extern bool QueryPerformanceCounter(ref long lpPerformanceCount);

        [System.Runtime.InteropServices.DllImport("KERNEL32")]
        private static extern bool QueryPerformanceFrequency(ref long lpFrequency);
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%end class %%%%%%%%%%%%%%%%%%
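The Counter class above reaches the Win32 functions QueryPerformanceCounter and QueryPerformanceFrequency through P/Invoke, so it can only run on Windows. As an illustration of the same start/stop/seconds pattern, the sketch below uses the standard System.Diagnostics.Stopwatch class instead. The names `TimingSketch` and `TimeSeconds` are hypothetical and are not part of the thesis code.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Runs the action once and returns the elapsed wall-clock time in seconds.
    public static double TimeSeconds(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        double seconds = TimeSeconds(() =>
        {
            // Placeholder workload standing in for one GA run.
            long sum = 0;
            for (int i = 0; i < 1000000; i++) sum += i;
            Console.WriteLine("sum = {0}", sum);
        });
        Console.WriteLine("{0} seconds (high resolution: {1})",
            seconds, Stopwatch.IsHighResolution);
    }
}
```

Stopwatch.IsHighResolution reports whether a high-resolution timer backs the measurement, which gives roughly the information that Counter.Frequency exposes.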
%%%%%%%%%%%%%%%%%%%%%%%%%%%% ICrossover %%%%%%%%%%%%%%%%%%
using System;

namespace stegeneticweb
{
    /// <summary>
    /// Classes that implement this interface provide the logic to crossover (recombine) two target Genomes
    /// or Genome derived classes.
    /// </summary>
    public interface ICrossover
    {
        // HtmlAttribute Crossover(HtmlAttribute name, HtmlAttribute value);
        // double CrossoverProbability { get; set; }
        // double MutationProbability { get; set; }
        HtmlTag Crossover(HtmlTag AttributeName, HtmlTag AttributeValue, HtmlTag Space);
        double CrossoverProbability { get; set; }
        double MutationProbability { get; set; }
        HtmlAttribute Crossover(HtmlAttribute name, HtmlAttribute mate);
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%% end class %%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%% HtmlTag %%%%%%%%%%%%%%%%%%
#region Using directives
using System;
using System.Collections.Specialized;
using System.Text;
using AForge.Genetic;
using GeneticAlgorithms;
#endregion

namespace stegeneticweb
{
    public class HtmlTag : IComparable
    {
        protected GeneticAlgorithm Parent;

        protected HtmlTag(GeneticAlgorithm parent)
        {
            Parent = parent;
        }

        // Scanner state while walking through the attribute part of a tag.
        private enum PositionInTag
        {
            AttributeName,
            AttributeValue,
            Space
        }

        public int beginPosition;
        public int endPosition;
        private String name;

        public int BeginPosition
        {
            get { return beginPosition; }
            set { beginPosition = value; }
        }

        public int EndPosition
        {
            get { return endPosition; }
            set { endPosition = value; }
        }

        public String Name
        {
            get { return name; }
        }

        // Attributes found in this tag, in document order.
        private HtmlAttributeCollection attributes;

        public HtmlAttributeCollection Attributes
        {
            get { return attributes; }
        }

        public HtmlTag(String text, int beginPosition, int endPosition)
        {
            this.beginPosition = beginPosition;
            this.endPosition = endPosition;
            this.attributes = new HtmlAttributeCollection();

            //separate tag name and attributes
            int index = text.IndexOf(' ');
            if (index < 0)
            {
                //this is a tag without any attributes
                name = text.Substring(1, text.Length - 1);
            }
            else
            {
                name = text.Substring(1, index - 1);
            }

            if (index > 0)
            {
                text = text.Substring(index);

                //find and list all attributes in this tag
                PositionInTag status = PositionInTag.Space;
                int startIndex = 0;
                String attributeName;
                String attributeValue;
                char attributeValueQuotation = '\'';
                HtmlAttribute attribute = null;

                for (int n = 1; n < text.Length; n++)
                {
                    if ((status == PositionInTag.Space) && ((text[n] == '\'') || (text[n] == '\"')))
                    {
                        //begin value
                        startIndex = n;
                        attributeValueQuotation = text[n];
                        status = PositionInTag.AttributeValue;
                    }
                    else if ((status == PositionInTag.AttributeValue) && (text[n] == attributeValueQuotation))
                    {
                        //end value (the stored value keeps its surrounding quotation marks)
                        if (attribute != null)
                        {
                            attributeValue = text.Substring(startIndex, n + 1 - startIndex);
                            attribute.Value = attributeValue;
                            attribute = null;
                        }
                        status = PositionInTag.Space;
                    }
                    else if ((status == PositionInTag.Space) && (text[n] != ' '))
                    {
                        //begin attribute
                        status = PositionInTag.AttributeName;
                        startIndex = n;
                    }
                    else if ((status == PositionInTag.AttributeName) && ((text[n] == '=') || Char.IsWhiteSpace(text[n]) || (n == text.Length - 1)))
                    {
                        //end name
                        if (n == text.Length - 1)
                        {
                            //Correct string cursor position.
                            //This is the last character of the tag.
                            //The last attribute does not have a value.
                            n++;
                        }
                        attributeName = text.Substring(startIndex, n - startIndex);
                        attribute = new HtmlAttribute(attributeName);
                        attributes.Add(attribute);
                        status = PositionInTag.Space;
                    }
                    else if ((status != PositionInTag.AttributeValue) && (text[n] == ' '))
                    {
                        status = PositionInTag.Space;
                    }
                }
            }
        }

        private double _fitness = double.MinValue;
        double x = 0;
        double y = 0;
        // double n = 0;
        private double MaxValue = double.MaxValue;

        public double Fitness
        {
            get
            {
                //double Fitnes = Math.Pow(15 * x * y * (1 - x) * (1 - y) * Math.Sin(n * Math.PI * x) * Math.Sin(n * Math.PI * y), 2);
                // NOTE: this candidate value is computed but never returned; with y == 0 it evaluates
                // to positive infinity. The getter returns the stored fitness.
                double Fitness = (((MaxValue) / (y) - (x + 8)));
                return _fitness;
            }
            set { _fitness = value; }
        }

        public int CompareTo(object obj)
        {
            // Compare against another HtmlTag; the original listing cast obj to HtmlAttribute.
            HtmlTag compared = (HtmlTag)obj;
            if (this.Fitness < compared.Fitness)
                return -1;
            else if (this.Fitness > compared.Fitness)
                return 1;
            else
                return 0;
        }
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%% end class %%%%%%%%%%%%%%%%%%
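The HtmlTag constructor above scans the attribute part of a tag with a small state machine (space, attribute name, attribute value). The following self-contained sketch shows the same scanning idea in a simplified form. It is an illustration under stated assumptions rather than the thesis parser: `TagAttributeSketch` and `Parse` are hypothetical names, the result is a plain list of name/value pairs instead of an HtmlAttributeCollection, and, unlike the constructor above, the surrounding quotation marks are stripped from the values.

```csharp
using System;
using System.Collections.Generic;

public static class TagAttributeSketch
{
    // Splits a tag such as <img src='a.png' hidden> into (name, value) pairs.
    // An attribute without a value is returned with a null value.
    public static List<KeyValuePair<string, string>> Parse(string tag)
    {
        var result = new List<KeyValuePair<string, string>>();
        string body = tag.Trim().TrimStart('<').TrimEnd('>').TrimEnd('/');
        int i = 0;

        // Skip the tag name.
        while (i < body.Length && !char.IsWhiteSpace(body[i])) i++;

        while (i < body.Length)
        {
            // State "space": skip separators between attributes.
            while (i < body.Length && char.IsWhiteSpace(body[i])) i++;
            if (i >= body.Length) break;

            // State "attribute name": read up to '=' or whitespace.
            int nameStart = i;
            while (i < body.Length && body[i] != '=' && !char.IsWhiteSpace(body[i])) i++;
            string name = body.Substring(nameStart, i - nameStart);

            // State "attribute value": quoted or bare, only if '=' follows the name.
            string value = null;
            if (i < body.Length && body[i] == '=')
            {
                i++;
                if (i < body.Length && (body[i] == '\'' || body[i] == '"'))
                {
                    char quote = body[i++];
                    int valueStart = i;
                    while (i < body.Length && body[i] != quote) i++;
                    value = body.Substring(valueStart, i - valueStart);
                    if (i < body.Length) i++; // consume the closing quote
                }
                else
                {
                    int valueStart = i;
                    while (i < body.Length && !char.IsWhiteSpace(body[i])) i++;
                    value = body.Substring(valueStart, i - valueStart);
                }
            }
            result.Add(new KeyValuePair<string, string>(name, value));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var pair in Parse("<img src='a.png' alt=\"x y\" hidden>"))
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value ?? "(no value)");
    }
}
```

For the sample tag the sketch prints `src = a.png`, `alt = x y` and `hidden = (no value)`, in the order in which the attributes appear in the tag.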
%%%%%%%%%%%%%%%%%%%%%%%%%%%% HtmlTagCollection %%%%%%%%%%%%%%%%%%
using System;
using System.Collections;
using AForge.Genetic;
using GeneticAlgorithms;

namespace stegeneticweb
{
    /// <summary>
    /// <para>A collection that stores <see cref='.HtmlTag'/> objects.</para>
    /// </summary>
    /// <seealso cref='.HtmlTagCollection'/>
    [Serializable()]
    public class HtmlTagCollection : System.Collections.CollectionBase
    {
        /// <summary>
        /// <para>Initializes a new instance of <see cref='.HtmlTagCollection'/>.</para>
        /// </summary>
        public HtmlTagCollection()
        {
        }

        /// <summary>
        /// <para>Initializes a new instance of <see cref='.HtmlTagCollection'/> based on another <see cref='.HtmlTagCollection'/>.</para>
        /// </summary>
        /// <param name='val'>A <see cref='.HtmlTagCollection'/> from which the contents are copied.</param>
        public HtmlTagCollection(HtmlTagCollection val)
        {
            this.AddRange(val);
        }

        /// <summary>
        /// <para>Initializes a new instance of <see cref='.HtmlTagCollection'/> containing any array of <see cref='.HtmlTag'/> objects.</para>
        /// </summary>
        /// <param name='val'>An array of <see cref='.HtmlTag'/> objects with which to initialize the collection.</param>
        public HtmlTagCollection(HtmlTag[] val)
        {
            this.AddRange(val);
        }

        /// <summary>
        /// <para>Represents the entry at the specified index of the <see cref='.HtmlTag'/>.</para>
        /// </summary>
        /// <param name='index'><para>The zero-based index of the entry to locate in the collection.</para></param>
        /// <value>
        /// <para>The entry at the specified index of the collection.</para>
        /// </value>
        /// <exception cref='System.ArgumentOutOfRangeException'><paramref name='index'/> is outside the valid range of indexes for the collection.</exception>
        public HtmlTag this[int index]
        {
            get { return ((HtmlTag)(List[index])); }
            set { List[index] = value; }
        }

        /// <summary>
        /// <para>Adds a <see cref='.HtmlTag'/> with the specified value to the
        /// <see cref='.HtmlTagCollection'/>.</para>
        /// </summary>
        /// <param name='val'>The <see cref='.HtmlTag'/> to add.</param>
        /// <returns>
        /// <para>The index at which the new element was inserted.</para>
        /// </returns>
        /// <seealso cref='.HtmlTagCollection.AddRange'/>
        public int Add(HtmlTag val)
        {
            return List.Add(val);
        }

        /// <summary>
        /// <para>Copies the elements of an array to the end of the <see cref='.HtmlTagCollection'/>.</para>
        /// </summary>
        /// <param name='val'>An array of type <see cref='.HtmlTag'/> containing the objects to add to the collection.</param>
        /// <seealso cref='.HtmlTagCollection.Add'/>
        public void AddRange(HtmlTag[] val)
        {
            for (int i = 0; i < val.Length; i++)
            {
                this.Add(val[i]);
            }
        }

        /// <summary>
        /// <para>Adds the contents of another <see cref='.HtmlTagCollection'/> to the end of the collection.</para>
        /// </summary>
        /// <param name='val'>A <see cref='.HtmlTagCollection'/> containing the objects to add to the collection.</param>
        /// <seealso cref='.HtmlTagCollection.Add'/>
        public void AddRange(HtmlTagCollection val)
        {
            for (int i = 0; i < val.Count; i++)
            {
                this.Add(val[i]);
            }
        }

        /// <summary>
        /// <para>Gets a value indicating whether the
        /// <see cref='.HtmlTagCollection'/> contains the specified <see cref='.HtmlTag'/>.</para>
        /// </summary>
        /// <param name='val'>The <see cref='.HtmlTag'/> to locate.</param>
        /// <returns>
        /// <para><see langword='true'/> if the <see cref='.HtmlTag'/> is contained in the collection;
        /// otherwise, <see langword='false'/>.</para>
        /// </returns>
        /// <seealso cref='.HtmlTagCollection.IndexOf'/>
        public bool Contains(HtmlTag val)
        {
            return List.Contains(val);
        }

        /// <summary>
        /// <para>Copies the <see cref='.HtmlTagCollection'/> values to a one-dimensional <see cref='System.Array'/> instance at the
        /// specified index.</para>
        /// </summary>
        /// <param name='array'><para>The one-dimensional <see cref='System.Array'/> that is the destination of the values copied from <see cref='.HtmlTagCollection'/>.</para></param>
        /// <param name='index'>The index in <paramref name='array'/> where copying begins.</param>
        /// <exception cref='System.ArgumentException'><para><paramref name='array'/> is multidimensional.</para> <para>-or-</para> <para>The number of elements in the <see cref='.HtmlTagCollection'/> is greater than the available space between <paramref name='index'/> and the end of <paramref name='array'/>.</para></exception>
        /// <exception cref='System.ArgumentNullException'><paramref name='array'/> is <see langword='null'/>.</exception>
        /// <exception cref='System.ArgumentOutOfRangeException'><paramref name='index'/> is less than <paramref name='array'/>'s lowbound.</exception>
        /// <seealso cref='System.Array'/>
        public void CopyTo(HtmlTag[] array, int index)
        {
            List.CopyTo(array, index);
        }

        /// <summary>
        /// <para>Returns the index of a <see cref='.HtmlTag'/> in
        /// the <see cref='.HtmlTagCollection'/>.</para>
        /// </summary>
        /// <param name='val'>The <see cref='.HtmlTag'/> to locate.</param>
        /// <returns>
        /// <para>The index of the <see cref='.HtmlTag'/> of <paramref name='val'/> in the
        /// <see cref='.HtmlTagCollection'/>, if found; otherwise, -1.</para>
        /// </returns>
        /// <seealso cref='.HtmlTagCollection.Contains'/>
        public int IndexOf(HtmlTag val)
        {
            return List.IndexOf(val);
        }

        /// <summary>
        /// <para>Inserts a <see cref='.HtmlTag'/> into the <see cref='.HtmlTagCollection'/> at the specified index.</para>
        /// </summary>
        /// <param name='index'>The zero-based index where <paramref name='val'/> should be inserted.</param>
        /// <param name='val'>The <see cref='.HtmlTag'/> to insert.</param>
        /// <seealso cref='.HtmlTagCollection.Add'/>
        public void Insert(int index, HtmlTag val)
        {
            List.Insert(index, val);
        }

        /// <summary>
        /// <para>Returns an enumerator that can iterate through
        /// the <see cref='.HtmlTagCollection'/>.</para>
        /// </summary>
        /// <returns><para>An enumerator for the collection.</para></returns>
        /// <seealso cref='System.Collections.IEnumerator'/>
        public new HtmlTagEnumerator GetEnumerator()
        {
            return new HtmlTagEnumerator(this);
        }

        /// <summary>
        /// <para>Removes a specific <see cref='.HtmlTag'/> from the
        /// <see cref='.HtmlTagCollection'/>.</para>
        /// </summary>
        /// <param name='val'>The <see cref='.HtmlTag'/> to remove from the <see cref='.HtmlTagCollection'/>.</param>
        /// <exception cref='System.ArgumentException'><paramref name='val'/> is not found in the Collection.</exception>
        public void Remove(HtmlTag val)
        {
            List.Remove(val);
        }

        public class HtmlTagEnumerator : IEnumerator
        {
            IEnumerator baseEnumerator;
            IEnumerable temp;

            public HtmlTagEnumerator(HtmlTagCollection mappings)
            {
                this.temp = ((IEnumerable)(mappings));
                this.baseEnumerator = temp.GetEnumerator();
            }

            public HtmlTag Current
            {
                get { return ((HtmlTag)(baseEnumerator.Current)); }
            }

            object IEnumerator.Current
            {
                get { return baseEnumerator.Current; }
            }

            public bool MoveNext()
            {
                return baseEnumerator.MoveNext();
            }

            bool IEnumerator.MoveNext()
            {
                return baseEnumerator.MoveNext();
            }

            public void Reset()
            {
                baseEnumerator.Reset();
            }

            void IEnumerator.Reset()
            {
                baseEnumerator.Reset();
            }
        }
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%% end class %%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%% create attribute %%%%%%%%%%%%%%%%%%
<?xml version="1.0" standalone="yes"?>
<NewDataSet>
  <keyCombinations>
    <firstAttribute>width</firstAttribute>
    <secondAttribute>height</secondAttribute>
  </keyCombinations>
  <keyCombinations>
    <firstAttribute>src</firstAttribute>
    <secondAttribute>alt</secondAttribute>
  </keyCombinations>
  <keyCombinations>
    <firstAttribute>style</firstAttribute>
    <secondAttribute>class</secondAttribute>
  </keyCombinations>
  <keyCombinations>
    <firstAttribute>cellspacing</firstAttribute>
    <secondAttribute>cellpadding</secondAttribute>
  </keyCombinations>
  <keyCombinations>
    <firstAttribute>background</firstAttribute>
    <secondAttribute>valign</secondAttribute>
  </keyCombinations>
  <keyCombinations>
    <firstAttribute>face</firstAttribute>
    <secondAttribute>size</secondAttribute>
  </keyCombinations>
  <keyCombinations>
    <firstAttribute>name</firstAttribute>
    <secondAttribute>content</secondAttribute>
  </keyCombinations>
  <keyCombinations>
    <firstAttribute>colspan</firstAttribute>
    <secondAttribute>bgcolor</secondAttribute>
  </keyCombinations>
</NewDataSet>
%%%%%%%%%%%%%%%%%%%%%%%%%%%% end class %%%%%%%%%%%%%%%%%%
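The attribute file above lists pairs of HTML attributes under keyCombinations elements. As a hedged illustration of how such a file could be read, the sketch below loads the pairs with LINQ to XML. The names `KeyCombinationSketch` and `Load` are hypothetical, and the thesis code may read the file differently (the NewDataSet root suggests it was written by a DataSet).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class KeyCombinationSketch
{
    // Returns every (firstAttribute, secondAttribute) pair in document order.
    public static List<Tuple<string, string>> Load(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("keyCombinations")
            .Select(e => Tuple.Create(
                (string)e.Element("firstAttribute"),
                (string)e.Element("secondAttribute")))
            .ToList();
    }

    public static void Main()
    {
        // Two of the pairs from the listing above, embedded for a self-contained run.
        const string xml = @"<?xml version=""1.0"" standalone=""yes""?>
<NewDataSet>
  <keyCombinations>
    <firstAttribute>width</firstAttribute>
    <secondAttribute>height</secondAttribute>
  </keyCombinations>
  <keyCombinations>
    <firstAttribute>src</firstAttribute>
    <secondAttribute>alt</secondAttribute>
  </keyCombinations>
</NewDataSet>";

        foreach (Tuple<string, string> pair in Load(xml))
            Console.WriteLine("{0} / {1}", pair.Item1, pair.Item2);
    }
}
```

Run as written, the sketch prints `width / height` followed by `src / alt`. A missing firstAttribute or secondAttribute element yields null for that item rather than an exception, because the explicit string conversion on a null XElement returns null.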
%%%%%%%%%%%%%%%%%%%%%%%%%%%% create database%%%%%%%%%%%%%%%%%%
<?xml version="1.0" standalone="yes"?>
<file>
  <file>
    <id>1</id>
    <name> fatima abdalla</name>
    <pasword>ffi1</pasword>
  </file>
  <file>
    <id> 2</id>
    <name> arig amir</name>
    <pasword>arig2arig</pasword>
  </file>
  <file>
    <id>4</id>
    <name>mahammad ali</name>
    <pasword>mahammed</pasword>
  </file>
  <file>
    <id>5</id>
    <name>abdalla omar</name>
    <pasword>abdabd2013</pasword>
  </file>
  <file>
    <id>6</id>
    <name>nora bshra</name>
    <pasword>nononono</pasword>
  </file>
  <file>
    <id>41</id>
    <name>ali</name>
    <pasword>asdf</pasword>
  </file>
  <file>
    <id>12</id>
    <name>amira ahmad</name>
    <pasword>amamam</pasword>
  </file>
</file>
%%%%%%%%%%%%%%%%%%%%%%%%%%%% end create attrbute%%%%%%%%%%%%%%%%%%