Date post: 18-Dec-2015

An Introduction to Steganography with a Case Study of Steganalysis

Arunabha (Arun) Sen, Huan Liu

Department of Computer Science and Engineering

Arizona State University

Tempe, AZ 85287

E-mail:[email protected], [email protected]

Joint Work with Yanming Di and Avinash Ramineni

Secure Communication

Two parties, Alice and Bob, can exchange information over an insecure medium in such a way that even if an intruder (Willie) is able to intercept, read and perform computation on the intercepted information, Willie will not be able to decipher the content of the exchanged information.

Sometimes encryption may not be enough

• The Prisoners' Problem

• Alice and Bob are in jail and wish to hatch an escape plan. All their communications pass through the warden, Willie, and if Willie detects any encrypted messages, he can simply stop the communication.

• So they must find some way of hiding their secret message in an innocuous-looking text.

Steganography

•Steganography is the art of hiding information in ways that prevent the detection of hidden messages.

•Steganography in Greek means “covered writing”

•Steganography and cryptography are cousins in the spycraft family

•While the goal of cryptography is to conceal the content of messages, the goal of information hiding, or steganography, is to conceal their existence

Steganography

•What to hide

• Texts

• Images

• Sound

•How to hide

•embed text in images/sound files

•embed image in image/sound files

•embed sound in image/sound files

Sometimes the distinction between steganography and cryptography is blurry

[Figure: encryption model. The plain text p passes through the encryption method with encryption key K to give the cipher text c = Ek(p); the decryption method with the decryption key recovers the plain text. A passive intruder just listens; an active intruder alters messages.]

Steganographic System

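Comparison

Cryptography

• Plain text P and key k are combined by a function f to give the cipher text: C = Ek(P)

• Decryption recovers the plain text: P = Dk(C)

Steganography

• Secret message and cover image are combined by a function f to give the stego message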

Encryption Example

Plain Text

Pleasetransferonemilliondollarstomyswissbankaccountsixtwotwo

Cipher Text

ASSASASAAASASASAAAAFDSGGSFSSQWEDSCVBNMDKL

Why does cipher text have to look like gibberish?

Why can't it look like Mydaughtersbirthdayisseptemberthirdnineteensixtytwo

If the cipher text looks like the line above, is it cryptography or steganography?

Real Example

During WW2 the following cipher message was actually sent by a German spy

Apparently neutral’s protest is thoroughly discounted and ignored. Isman hard hit. Blockade issue affects pretext for embargo on by-products, ejecting suets and vegetable oils

Hidden Message

Pershing sails from NY June 1

(Can be obtained by extracting the second letter in each word of the message sent)
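The extraction rule above is easy to check mechanically. The following is a minimal sketch, assuming words are separated by whitespace and that every word contributes its second character; the function name `second_letters` is chosen here for illustration.

```python
MESSAGE = (
    "Apparently neutral's protest is thoroughly discounted and ignored. "
    "Isman hard hit. Blockade issue affects pretext for embargo on "
    "by-products, ejecting suets and vegetable oils"
)

def second_letters(text: str) -> str:
    """Concatenate the second character of every whitespace-separated word."""
    return "".join(word[1] for word in text.split() if len(word) > 1).lower()

hidden = second_letters(MESSAGE)
print(hidden)  # pershingsailsfromnyjunei
```

The final letter comes from "oils" and is read as the numeral in "June 1".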

Information Hiding in Images

•Digital images are typically stored in 24-bit or 8-bit files

•All color variations are derived from three primary colors: red, green and blue

•Each primary color is represented by 1 byte, so a 24-bit image uses 3 bytes per pixel; for example, the color value FFFFFF is 100% red + 100% green + 100% blue

•A 1024 x 768 pixel image with 24 bits per pixel has a file size exceeding 2 Mbytes (1024 x 768 x 3 = 2,359,296 bytes uncompressed)

•In 8-bit color images such as GIF files, each pixel is represented by a single byte that merely points into a color index table (palette) with 256 possible colors

Digital Watermarking

Watermarking is used primarily for identification and entails embedding a unique piece of information within a medium without noticeably altering the medium.

The difference between steganography and watermarking is primarily one of intent. Steganography conceals information; watermarks extend information and become an attribute of the cover image.

Publishing and broadcasting industries are interested in techniques for hiding encrypted copyright marks and serial numbers in digital films, audio recordings, books and multimedia products.

Steganographic Techniques

•Genome steganography: Encoding a hidden message in a strand of human DNA

•Hiding in text: Information can be hidden in documents by manipulating the positions of lines and words, or by hiding the data in HTML files

•Hiding in disk space: Hiding the data in unused or reserved space

•Hiding in network packets: Hiding the data in packets that are transmitted through the Internet

Steganographic Techniques

•Hiding data in software and circuitry: Data can be hidden in the layout of the code distributed in a program or in the layout of electronic circuits on a board

•Information hiding in images: Ranges from least significant bit (LSB) insertion to masking and filtering to applying more sophisticated image processing algorithms

•LSB insertion: A simple approach for embedding information in a cover image. It encodes the message in the least significant bits of the pixels of an image

Some software tools for steganography

•S-Tools: Includes programs that process GIF and BMP images and audio files, and will even hide information in the unused areas of floppy diskettes

•StegoDos: Also known as Black Wolf's Picture Encoder, version 0.90a. It works only for 320 x 200 images with 256 colors

•Camouflage: A steganographic tool that allows hiding files by scrambling them and then attaching them to the file of your choice

•MP3Stego: A steganographic tool that hides information in MP3 files during the compression process

Information Hiding in Images

•Least significant bit insertion

•Masking and filtering

•Algorithms and transformations

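Least Significant Bit Insertion

• A 1024 x 768 pixel image with 24 bits per pixel can hide 1024 x 768 x 3 = 2,359,296 bits = 294,912 bytes of information when one bit is hidden in each color byte

• On average, LSB insertion changes only half of the least significant bits it uses, because a message bit already matches the existing LSB about half the time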
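A minimal sketch of LSB insertion on raw bytes may make the capacity figure concrete. It assumes the cover is a flat sequence of color bytes (3 per pixel) and that one message bit goes into each byte; `embed_lsb`, `extract_lsb` and the tiny synthetic cover are illustrative names and data, not taken from any of the tools mentioned here.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Hide message bits in the least significant bit of successive cover bytes."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("message too long for this cover")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return bytes(stego)

def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back out of the LSBs, most significant bit first."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for b in stego[8 * i: 8 * i + 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)

# Hypothetical cover: 3 bytes (R, G, B) per pixel of a tiny 4 x 8 "image".
cover = bytes((37 * i + 11) % 256 for i in range(4 * 8 * 3))
secret = b"Hi!"
stego = embed_lsb(cover, secret)

capacity_bytes = 1024 * 768 * 3 // 8  # one bit per color byte, as in the bullet above
changed = sum(c != s for c, s in zip(cover, stego))
print(extract_lsb(stego, len(secret)), capacity_bytes, changed)
```

Each cover byte changes by at most 1, which is why the modification is hard to see in a 24-bit image.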

Algorithms and Transformations

Jpeg-Jsteg: A steganography tool that creates a stego-image from the input of a message to be hidden and a lossless cover image.

The software combines the message and the cover image and, using the JPEG algorithm, creates a lossy JPEG stego-image.

JPEG images use the discrete cosine transform (DCT) to do the compression.

Discrete Cosine Transformation

•Two-dimensional DCT is applied on blocks of 8x8 pixels

•Transforms 8x8 pixel blocks into 64 DCT coefficients

•Modifying one coefficient affects all 64 image pixels

•DCT-based image compression relies on two techniques to represent the images

•Quantization

•Entropy coding

•Least-significant bits of quantized DCT coefficients are used as redundant data
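The claim that one coefficient affects all 64 pixels can be checked with a small, unoptimized sketch. It assumes the orthonormal DCT-II on an 8x8 block and uses made-up pixel values; a real JPEG codec would also apply level shifting and quantization, which are omitted here.

```python
import math

N = 8

def scale(k: int) -> float:
    """Normalisation factor of the orthonormal DCT-II."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def basis(k: int, n: int) -> float:
    """Value of the k-th cosine basis function at sample position n."""
    return scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Forward 2-D DCT: 8x8 pixels -> 64 coefficients."""
    return [[sum(block[x][y] * basis(u, x) * basis(v, y)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 2-D DCT: 64 coefficients -> 8x8 pixels."""
    return [[sum(coeffs[u][v] * basis(u, x) * basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

# Made-up pixel values, for illustration only.
block = [[(x * 8 + y) % 256 for y in range(N)] for x in range(N)]
coeffs = dct2(block)
restored = idct2(coeffs)

coeffs[2][3] += 1.0  # modify a single coefficient
modified = idct2(coeffs)
pixels_changed = sum(abs(modified[x][y] - restored[x][y]) > 1e-9
                     for x in range(N) for y in range(N))
print(pixels_changed)  # 64
```

Because every cosine basis function is nonzero at all eight sample positions, a change to any one coefficient spreads over the whole block.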

Discrete Cosine Transformation

Two Dimensional DCT

One Dimensional DCT
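For reference, the 8-point DCT in the form commonly used for JPEG blocks can be written as follows. The symbols $f$, $F$ and $C$ are notation chosen here and are not necessarily those of the original slide.

```latex
% Two-dimensional DCT of an 8x8 block f(x, y)
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7}
         f(x,y)\, \cos\frac{(2x+1)u\pi}{16}\, \cos\frac{(2y+1)v\pi}{16}

% One-dimensional DCT of 8 samples f(x)
F(u) = \frac{1}{2}\, C(u) \sum_{x=0}^{7} f(x)\, \cos\frac{(2x+1)u\pi}{16}

% Normalisation
C(0) = \frac{1}{\sqrt{2}}, \qquad C(k) = 1 \ \text{for } k > 0
```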

TCP Header

Hiding Data in TCP/IP Header

Places to hide the secret message

•Reserved bits

•Sequence number field

The Initial Sequence Number (ISN) is a randomly generated number: ISN = M + F(localhost, localport, remotehost, remoteport)

Information Hiding Experiments in TCP Header

•Take each character of the message to be hidden (8-bit ASCII)

•Scale it to a 32-bit number by multiplying it by an appropriate constant

•Use the scaled number as the Initial Sequence Number

•How good is this information hiding technique?

•Perform an entropy test
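The encoding steps above can be sketched as follows. The slides do not give the scaling constant, so `SCALE = 2**24` is an assumption made here (it places the 8-bit character code in the top byte of the 32-bit field); the function names are likewise illustrative, and no packets are actually sent.

```python
SCALE = 2 ** 24  # assumed constant, not stated in the slides

def char_to_isn(ch: str) -> int:
    """Scale one 8-bit character to a 32-bit Initial Sequence Number."""
    code = ord(ch)
    if code > 0xFF:
        raise ValueError("only 8-bit characters are supported")
    return code * SCALE

def isn_to_char(isn: int) -> str:
    """Recover the hidden character from a received sequence number."""
    return chr(isn // SCALE)

isns = [char_to_isn(ch) for ch in "escape"]
decoded = "".join(isn_to_char(n) for n in isns)
print(isns, decoded)
```

With this scheme, repeated characters produce repeated sequence numbers, which is the kind of regularity an entropy test can pick up.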

Information Theory

•What is information and how do you measure it?

•The crux of information theory is the measure of information

•Consider the following messages:

• The Sun will rise

• There will be scattered rainstorms

• There will be a tornado

•The less likely the message, the more information it conveys

Information Theory

• If $x_i$ denotes an arbitrary message and $P(x_i) = P_i$ is the probability of the event that $x_i$ is selected for transmission, then the amount of information associated with $x_i$ should be some function of $P_i$

• Shannon defined the information measure by the logarithmic function $I_i = \log_b (1/P_i)$

• The quantity $I_i$ is called the self information of message $x_i$
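A short sketch of the self-information formula, applied to the three example messages. The probabilities are hypothetical values picked here to illustrate the trend; the slides do not assign any.

```python
import math

def self_information(p: float, base: float = 2.0) -> float:
    """I = log_b(1/P): information conveyed by a message of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log(1.0 / p, base)

# Hypothetical probabilities for the three example messages.
messages = {
    "The Sun will rise": 1.0,
    "There will be scattered rainstorms": 0.25,
    "There will be a tornado": 1.0 / 1024,
}
for text, p in messages.items():
    print(f"{text}: {self_information(p):.2f} bits")
```

With base 2 the unit is bits: a certain event carries no information, and each halving of the probability adds one bit.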

Entropy

Consider an information source that emits a sequence of symbols selected from an alphabet of $M$ different symbols. Let $X$ denote the entire set of symbols $x_1, x_2, \ldots, x_M$. We can treat each symbol $x_i$ as a message that occurs with probability $P_i$ and conveys self information $I_i$. The set of symbol probabilities must satisfy

$$\sum_{i=1}^{M} P_i = 1$$

The amount of information produced by the source during an arbitrary symbol interval is a discrete random variable having possible values $I_1, I_2, \ldots, I_M$. The expected information per symbol is then given by the statistical average

$$H(X) = \sum_{i=1}^{M} P_i I_i$$

which is called the source entropy.
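The entropy definition can be turned into a small estimator. This sketch assumes the probabilities $P_i$ are estimated from symbol frequencies in the observed sequence and uses base-2 logarithms; the function name `entropy` is chosen here.

```python
import math
from collections import Counter

def entropy(symbols) -> float:
    """Empirical source entropy H(X) = sum of P_i * I_i, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # P_i = n / total and I_i = log2(1 / P_i) = log2(total / n)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(entropy("aaaa"))    # a single repeated symbol carries no information
print(entropy("abab"))    # two equally likely symbols: 1 bit per symbol
print(entropy(range(8)))  # eight distinct values: 3 bits per symbol
```

A sequence of n values that are all distinct has entropy log2(n), the largest value possible for n samples.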

Experimental Results

The input to the program is a text file

Message size                         11     26     340    1353   9477   39675
Entropy of text                      3.45   3.6    4.91   4.84   4.93   4.55
Entropy of random sequence number    3.45   4.7    8.40   10.4   13.21  15.27

Attacks on Steganographic Systems

Statistical attacks

– Statistical tests can reveal if an image holds steganographic content

• Chi-square attack

• Entropy test

Visual attacks

– The idea of visual attacks is to remove all parts of the image covering the message

– The human eye can then distinguish whether there is a potential message or still image content

Steganography and Communication Theory

Steganography can be formalized by communication theory

Parameters of information hiding, such as the number of bits that can be hidden, the invisibility of the message and its resistance to removal, can be related to the characteristics of a communication system, such as capacity, signal-to-noise ratio and jamming margin

Steganography and Communication Theory

The notion of capacity in data hiding indicates the maximum number of bits that can be hidden and successfully recovered by the stegosystem

The S/N ratio serves as a measure of invisibility or detectability

– Signal: the information-bearing signal (message to be concealed)

– Noise: the cover image

Steganography and Communication Theory

High S/N ratio is desired in a typical communication system

In a steganographic system, a very low S/N ratio corresponds to lower perceptibility and therefore a greater chance of concealing the embedded message

The measure of jamming resistance can be used to describe a level of resistance to removal or destruction of the embedded message

A Case Study: LSB-based steganography in JPEG images

•JPEG images use the discrete cosine transform to do the compression

•Two-dimensional DCT is applied on blocks of 8x8 pixels

•Transforms 8x8 pixel blocks into 64 DCT coefficients

•Modifying one coefficient affects all 64 image pixels

•DCT-based image compression relies on two techniques to represent the images

•Quantization

•Entropy coding

•Least-significant bits of quantized DCT coefficients are used for hiding the message bits

Steganalysis of JPEG-based LSB steganography

Modifying the LSBs changes the statistical properties of the image

Pairs of values: when message bits are embedded into the LSBs of quantized DCT coefficients, the frequency counts of the DCT coefficients change in pairs, e.g., (4, 5), (6, 7), …

For the pair 4 (00000100) and 5 (00000101), embedding can only change a 4 into a 5 or a 5 into a 4

In the stego and cover images, the sum of the frequency counts of 4 and 5 remains the same
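The pairs-of-values effect can be reproduced on synthetic data. The sketch below assumes non-negative coefficient values drawn from a made-up, skewed histogram (not real JPEG data) and plain LSB replacement. The `chi_square` function is a simplified statistic in the spirit of the chi-square attack listed earlier, not the exact test used in the case study.

```python
import random
from collections import Counter

def embed_lsb(coeffs, bits):
    """Overwrite the LSB of each (non-negative) coefficient with a message bit."""
    return [(c & ~1) | b for c, b in zip(coeffs, bits)]

def pair_sums(values):
    """Sum of the frequency counts of each pair (2k, 2k+1)."""
    hist = Counter(values)
    return {k: hist[2 * k] + hist[2 * k + 1] for k in sorted({v // 2 for v in values})}

def chi_square(values):
    """Compare each even-value count with the mean count of its pair."""
    hist = Counter(values)
    stat = 0.0
    for k in {v // 2 for v in values}:
        expected = (hist[2 * k] + hist[2 * k + 1]) / 2
        stat += (hist[2 * k] - expected) ** 2 / expected
    return stat

rng = random.Random(0)
# Made-up, deliberately skewed "coefficient" values 2..9.
cover = rng.choices(range(2, 10), weights=[40, 10, 25, 8, 12, 3, 5, 1], k=4000)
bits = [rng.randint(0, 1) for _ in cover]
stego = embed_lsb(cover, bits)

print(pair_sums(cover) == pair_sums(stego))  # pair totals are preserved
print(chi_square(cover), chi_square(stego))  # the statistic collapses after embedding
```

Embedding equalizes the two counts inside each pair while leaving their sum untouched, so a small statistic on a full-capacity stego image is the telltale sign.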

Turn into a Classification Problem

• The objective of steganalysis is to distinguish normal images from stego images

• Classification problem: classify images into two separate classes, stego and normal

• Classification is a supervised learning technique

•Use a set of images with hidden data in them as training data

•Use classification algorithms to construct classifiers

•When a classification algorithm is run on a data set (stego and cover images), it needs to find some decision boundary between the two classes and create a model

•The model so generated can be used to predict the class to which each image in the test data belongs

• We evaluate three classification algorithms

• C4.5: A decision tree induction method

• Logistic Regression

• Neural Networks

Experiments

•Used 180 JPEG images of size 768 x 512

•Created stego images with different amounts of data hidden in them

•Created image sets containing 5000, 3000, 2000, 1000 and 500 bytes of hidden messages

•Tools used for hiding: JSTEG, JPHIDE, F5

•The secret message hidden was taken from Project Gutenberg's e-text of Shakespeare's First Folio

•Measure the amount of data hidden in the images in units of bits/pixel

•Use 10-fold cross validation to test the three methods
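Two of the bullets above involve simple arithmetic that can be sketched directly: converting the message sizes to bits/pixel for 768 x 512 images, and partitioning items into 10 folds. The fold assignment shown (round-robin over 180 items) is an assumption for illustration; the slides do not say how images were assigned to folds.

```python
WIDTH, HEIGHT = 768, 512

def bits_per_pixel(message_bytes: int) -> float:
    """Embedding rate for a message of the given size in one 768 x 512 image."""
    return message_bytes * 8 / (WIDTH * HEIGHT)

def k_fold_indices(n_items: int, k: int = 10):
    """Yield (train, test) index lists; every item is tested exactly once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

for size in (5000, 3000, 2000, 1000, 500):
    print(f"{size} bytes -> {bits_per_pixel(size):.4f} bits/pixel")

splits = list(k_fold_indices(180))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 162 18
```

Even the largest message corresponds to only about 0.1 bits/pixel, so the embedding rates in these experiments are low.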

Some common steganographic methods

JSTEG

•Steganographic program by Derek Upham

•It can be viewed as the prototype of all LSB-based methods

•Hides the data in JPEG images by replacing the LSBs of the quantized DCT coefficients

•Does not use encryption or random bit selection

•It sequentially modifies all quantized DCT coefficients having values other than 0 and 1
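The sequential rule described above can be sketched as follows. This is a simplified illustration on a plain list of integers standing in for quantized DCT coefficients. It omits details of the actual program such as any length header, and the function names are chosen here.

```python
def jsteg_embed(coeffs, bits):
    """Sequentially replace the LSB of every coefficient other than 0 and 1."""
    out = list(coeffs)
    pending = list(bits)
    for i, c in enumerate(out):
        if not pending:
            break
        if c in (0, 1):
            continue  # these values are skipped and never carry data
        out[i] = (c & ~1) | pending.pop(0)
    if pending:
        raise ValueError("not enough usable coefficients for the message")
    return out

def jsteg_extract(coeffs, n_bits):
    """Read the LSBs of the first n_bits usable coefficients."""
    usable = [c & 1 for c in coeffs if c not in (0, 1)]
    return usable[:n_bits]

coeffs = [12, 0, -3, 1, 5, 0, -1, 2, 7, 0]  # made-up values
bits = [1, 0, 1, 1]
stego = jsteg_embed(coeffs, bits)
print(stego)                            # [13, 0, -4, 1, 5, 0, -1, 2, 7, 0]
print(jsteg_extract(stego, len(bits)))  # [1, 0, 1, 1]
```

Skipping 0 and 1 as a pair means a modified coefficient can never turn into a skipped value, so the receiver visits the same positions as the sender.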

Results for JSTEG

Plots and tables

JPHide

• Steganographic program by Allan Latham

•It uses random bit selection: message bits are hidden in randomly selected LSBs

•The selection of random bits is controlled by a key

•Also encrypts the message before embedding it

•It treats the DCT coefficients -1, 0 and 1 in a special manner

Results for JPHIDE

F5

•Program by Andreas Westfeld

•Observed that replacing the LSBs of the DCT coefficients is vulnerable to statistical attack

•Proposed a new method of hiding by decrementing the absolute value of the quantized DCT coefficients

•Tries to minimize the number of bits that are modified while still allowing high capacity

•Uses matrix encoding to minimize the number of bits that are modified
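The idea behind matrix encoding can be illustrated with its smallest case: two message bits are carried by three LSBs while changing at most one of them. This is a sketch of that one scheme on bare bits. It does not model how F5 applies the change (decrementing a coefficient's absolute value) or how it handles coefficients that become zero.

```python
from itertools import product

def embed_2_in_3(a, x):
    """Embed message bits x = (x1, x2) in LSBs a = (a1, a2, a3), flipping at most one."""
    a = list(a)
    ok1 = (a[0] ^ a[2]) == x[0]
    ok2 = (a[1] ^ a[2]) == x[1]
    if not ok1 and ok2:
        a[0] ^= 1  # only the first parity is wrong
    elif ok1 and not ok2:
        a[1] ^= 1  # only the second parity is wrong
    elif not ok1 and not ok2:
        a[2] ^= 1  # both wrong: a3 takes part in both parities
    return tuple(a)

def extract_2_from_3(a):
    """Recover the two message bits as parities of the three LSBs."""
    return (a[0] ^ a[2], a[1] ^ a[2])

# Exhaustive check over all 8 carrier states and 4 messages.
max_changes = 0
for a in product((0, 1), repeat=3):
    for x in product((0, 1), repeat=2):
        s = embed_2_in_3(a, x)
        assert extract_2_from_3(s) == x
        max_changes = max(max_changes, sum(p != q for p, q in zip(a, s)))
print(max_changes)  # 1
```

Plain LSB replacement would change one bit on average to carry two message bits; here the average is 0.75 changes, since one case in four needs no change at all.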

Results for F5

Summary

Conclusion and Future work

•Existing steganalysis methods are specific to a particular embedding method

•Our method is general and can be easily extended to other LSB-based steganographic methods

•Existing LSB steganographic methods are easy to detect if the amount of information hidden is not too small

•The maximum amount of information that can be hidden in an image using a particular steganographic tool still has to be modeled

