Image Level Forgery Identification and Pixel Level Forgery Localization via a Convolutional Neural Network

Haitao D. Deng¹ & Yitao Qiu²
¹ Department of Materials Science and Engineering
² Department of Civil and Environmental Engineering

Introduction

The ease of manipulating digital data through editing and cropping tools such as Photoshop and other photo editors has often negatively impacted information credibility. While several successful cases of forgery detection have been demonstrated [3, 5, 6, 7], progress toward a generic detection technique has been stagnant for two main reasons: first, there are various fundamentally different forgery types; second, it is difficult to pinpoint the location of forged regions [3, 5, 6, 7, 9].

Category  Features/Models  #Parameters  Forgery Types
ML [4]    CFA              <10          S, CM, R
ML [11]   ELA              <10          S, CM, R
ML [8]    NOI              <10          S, CM, R
DL [1]    Bayar            ~20M         S
DL [2]    SRM              ~50k         S, CM, R
DL [10]   Artificial       7M           S, CM, R, E

(S: splicing, CM: copy-move, R: removal, E: enhancement)

Here, we report a light-weight (5k and 700k parameters) VGG-derived convolutional neural network architecture that achieves an accuracy of 91% and an AUC of 85% on test data for image-level forgery detection, and an accuracy of 93% and an AUC of 79% for pixel-level forgery detection.

Dataset and Features

Dataset  Forged images  Forged pixels
Train 1  31.09%         9.10%
Dev 1    29.20%         8.42%
Test 1   30.20%         8.02%
Train 2  98.39%         16.13%
Dev 2    97.13%         16.85%

All of the data was obtained from the Image Manipulation Dataset¹ and the COCO Dataset². A total of 10,000 images (224 pixels × 224 pixels) was obtained and split into train, development, and test sets at an 8:1:1 ratio. In this work, forgery types include a) copy-move, b) local enhancement, and c) splicing.

Image Level Forgery Identification

Following the VGG network architecture and using a far-to-near approach, we built our network as below [architecture figure omitted]. For training, images were split into 56×56 patches to augment the number of training samples. We further augmented the training data by manually enhancing local pixel colors, multiplying the local pixel values by a random coefficient (a.k.a. enhancement) in non-private regions (Train 1 to Train 2), to increase the forged-image sample and pixel ratios, and achieved better performance on the same test data (the augmentation is sketched in code below). This suggests that our classification network was able to capture the common features shared between different manipulation types. By adjusting the hyperparameters, including the learning rate and the optimizer, we found that Adam with an initial learning rate of 0.001 gave the highest training accuracy and lowest loss.

Pixel Level Image Forgery Localization

In the image-level forgery identification network, the pooling layers condense the spatial information down to fewer pixels for the final classification; to achieve pixel-wise prediction, the pooling layers were discarded, producing a pixel-wise feature extractor. The feature extractor is then fed into the local anomaly detection network proposed by Wu et al. [11]. Our full model achieves better performance than the untrained model from [10] and similar performance to the partially trained one. To avoid the potential problem of vanishing gradients and to expedite training, we also modified the network in (C) by adding shortcut paths every two convolution layers in the intermediate blocks (Model D), and found similar performance in less training time (see the sketch below).

Model                   Trained Parameters      Accuracy  AUC
ManTraNet               0 (7M in total)         0.93      0.67
LADN-trained ManTraNet  0.2M (7M in total)      0.91      0.798
Our Model               0.27M (0.27M in total)  0.92      0.794
Our Model with ResNet   0.27M (0.27M in total)  0.93      0.78
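The local-enhancement augmentation and patch extraction described above can be outlined in code. The following is a minimal NumPy sketch, not the authors' exact pipeline: the rectangular region selection, its size range, and the 0.5–1.5 coefficient range are illustrative assumptions.

```python
import numpy as np

def enhance_region(image, rng):
    """Forge an image by local enhancement: multiply the pixel values
    inside a random rectangle by a random coefficient, and return the
    forged image together with a binary mask of the forged pixels."""
    h, w = image.shape[:2]
    rh = rng.integers(16, h // 2)          # region size: assumed range
    rw = rng.integers(16, w // 2)
    y = rng.integers(0, h - rh + 1)        # region position
    x = rng.integers(0, w - rw + 1)
    coeff = rng.uniform(0.5, 1.5)          # random coefficient: assumed range
    forged = image.astype(np.float32)      # work on a float copy
    forged[y:y + rh, x:x + rw] *= coeff
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y:y + rh, x:x + rw] = 1           # 1 marks forged pixels
    return np.clip(forged, 0, 255).astype(image.dtype), mask

def to_patches(image, size=56):
    """Split a 224x224 image into non-overlapping 56x56 patches to
    augment the number of training samples."""
    h, w = image.shape[:2]
    return [image[i:i + size, j:j + size]
            for i in range(0, h, size)
            for j in range(0, w, size)]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
forged, mask = enhance_region(img, rng)
patches = to_patches(forged)               # 16 patches per 224x224 image
```

The per-pixel mask is what makes this augmentation usable for both tasks: the image-level label is simply whether the mask is non-empty, while the mask itself supplies pixel-level supervision.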
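A minimal PyTorch sketch of the Model D idea follows: a pixel-wise feature extractor with no pooling layers, in which an identity shortcut is added around every pair of convolution layers. The channel width and depth are illustrative assumptions, not the poster's exact architecture; the Adam setting matches the hyperparameter finding above.

```python
import torch
import torch.nn as nn

class ShortcutBlock(nn.Module):
    """Two 3x3 convolutions with an identity shortcut, i.e. a shortcut
    path every two convolution layers, to ease gradient flow."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)          # shortcut around two conv layers

# Pixel-wise feature extractor: pooling layers are omitted so the
# spatial resolution is preserved for per-pixel prediction.
extractor = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
    ShortcutBlock(32),
    ShortcutBlock(32),
)

x = torch.randn(1, 3, 224, 224)
features = extractor(x)                    # shape: (1, 32, 224, 224)
optimizer = torch.optim.Adam(extractor.parameters(), lr=1e-3)  # Adam, lr 0.001
```

In the full model these features would be fed into the local anomaly detection network of [11]; that stage is omitted from this sketch.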
Our full model is demonstrated below [full-model architecture figure omitted; source image from [10]].

Conclusion and Future Work

Here we deliver a light-weight network architecture that achieves high performance in both image-level identification and pixel-level localization. Future effort can focus on condensing the LADN network, as well as incorporating more features by using filtering kernels that generate features such as CFA and ELA, to improve network performance.

References

[1] Bayar & Stamm, ACM, 2016, pp. 5–10.
[2] Bayar & Stamm, IEEE Trans. Inf. Forensics Secur. 13.11 (2018), pp. 2691–2706.
[3] Birajdar & Mankar, Digit. Invest. 10.3 (2013), pp. 226–245.
[4] Ferrara et al., IEEE Trans. Inf. Forensics Secur. 7.5 (2012), pp. 1566–1577.
[5] Hill & Rager, "Image Forgery Detection".
[6] Hsu & Chang, ICME, Toronto, Canada, 2006.
[7] Huh et al., ECCV, 2018, pp. 101–117.
[8] Mahdian & Saic, Image Vision Comput. 27.10 (2009), pp. 1497–1503.
[9] Rao & Ni, WIFS, IEEE, 2016, pp. 1–6.
[10] Simonyan & Zisserman, arXiv:1409.1556 (2014).
[11] Wu et al., CVPR, 2019, pp. 9543–9552.
[12] Zhou et al., CVPR, 2018, pp. 1053–1061.

¹ https://www5.cs.fau.de/research/data/image-manipulation/
² http://cocodataset.org/#home