Internet Censorship

Date post: 12-Sep-2021
Page 1: Internet Censorship

Internet Censorship:

Great Firewall of China (GFC)

Jong C. Park
joumon@gmail.com

Page 2: Internet Censorship

Contents

• Internet Censorship
• Censoring Mechanisms
• How to circumvent
• Censorship Research
• Our Work @ UNM

Page 3: Internet Censorship

Information throttled

• Censorship observed
  • Slashdot
  • OpenNet Initiative (ONI)
  • Reporters Without Borders
  • Human Rights Watch
  • UN, etc.
• Censorship in burst
  • 3 out of 41 in ‘02 -> more than 25 in ‘09
  • can view at ONI for more details
  • political, social, security/conflict, Internet tools

Page 4: Internet Censorship

Generic Filtering

• Maximillian Dornseif, Germany
  • from “Government mandated blocking of foreign Web content”
• Packet-level filtering
  • mostly done by firewall (rules), but the rules need to be kept up-to-date
• OSI layer 3 filtering
  • inspects header and forwards/drops
  • coarse-grained due to high false positives (overblocking)
• OSI layer 4 filtering: finer-grained than layer 3
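The difference in granularity between layer 3 and layer 4 filtering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Packet` fields, the rule sets, and the address (taken from the 203.0.113.0/24 documentation range) are hypothetical and do not describe any real firewall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dst_ip: str    # layer 3 information (IP header)
    dst_port: int  # layer 4 information (TCP/UDP header)


# Hypothetical rule sets, for illustration only.
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_ENDPOINTS = {("203.0.113.7", 80)}


def layer3_verdict(pkt):
    """Decide on the IP header alone: everything on the host is blocked."""
    return "drop" if pkt.dst_ip in BLOCKED_IPS else "forward"


def layer4_verdict(pkt):
    """Decide on address and port: only the listed service is blocked."""
    if (pkt.dst_ip, pkt.dst_port) in BLOCKED_ENDPOINTS:
        return "drop"
    return "forward"


web = Packet("203.0.113.7", 80)
mail = Packet("203.0.113.7", 25)
print(layer3_verdict(web), layer3_verdict(mail))
print(layer4_verdict(web), layer4_verdict(mail))
```

With these rules the layer 3 filter drops both packets, including mail traffic that was never meant to be blocked (overblocking), while the layer 4 filter drops only the web request.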

Page 5: Internet Censorship

Generic Filtering (cont’d)

• Application-level filtering
  • inspects payload and performs the most detailed filtering
  • often provides ways of informing users about the filtering
  • hard to do in real-time due to overhead
  • encryption/compression makes it infeasible
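The point about compression can be illustrated with a naive byte-level keyword match. This is a sketch under stated assumptions: the keyword, the page content, and the function name `naive_filter_blocks` are made up for the example, and a real application-level filter may well decompress traffic before matching.

```python
import zlib

KEYWORD = b"falun"  # example keyword, as used elsewhere in the slides


def naive_filter_blocks(payload):
    """Return True if a plain byte-level match finds the keyword."""
    return KEYWORD in payload


# A small page that repeats the keyword several times.
page = b"<html><body>" + b"falun " * 20 + b"</body></html>"
compressed = zlib.compress(page)

print(naive_filter_blocks(page))        # the plain page is matched
print(naive_filter_blocks(compressed))  # the compressed bytes are not
print(zlib.decompress(compressed) == page)
```

The receiver recovers the identical page, but a filter that only scans raw bytes no longer sees the keyword unless it also pays the cost of decompressing the stream.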

Page 6: Internet Censorship

Generic Filtering (cont’d)

• Filtering mechanisms
  • proxy firewall (HW requirement, latency)
  • IDS (signature-based and/or heuristic)
  • DNS poisoning
  • help from search engines
  • web-site delisting
  • keyword filtering
  • keyword forgery

Page 7: Internet Censorship

Circumvent Censorship

• Discard TCP RST
• Encrypt/Compress
• Use proxy server (web cache)
• Use publicly accessible DNS server (caveat: security)
• Tunneling such as ICMP, SSH, SSL, or VPN
• Add comment in HTML
• Use captchas
• etc.

Page 8: Internet Censorship

Great Firewall of China

• Why China?
  • Most complicated censorship mechanism
  • China filters to maintain power
  • Cyber-censors and cyber-police
  • Hierarchical supervisory bodies
  • World’s largest IPv6 backbone

Page 9: Internet Censorship

GFC (cont’d)

• A large number of sites blocked
  • Not all about pornography sites
  • Pennsylvania, USA, blocks child pornography
• Most advanced keyword filtering
  • falun gong, human rights, Taiwan, Tibet, even foreign cities due to similar pronunciations
• TCP RSTs, specious SYN/ACK

Page 10: Internet Censorship

Zittrain & Edelman

• First study of blocked web sites in China
  • from “Internet filtering in China”, IEEE Internet Computing
• Studied filtering methods
  • IP address blacklisting
  • DNS IP address blocking
  • DNS redirection
  • URL keyword filtering
  • HTML response keyword filtering

Page 11: Internet Censorship

Ignore the GFC

• Seminal work by Richard Clayton et al.
  • from “Ignoring the Great Firewall of China”, 6th Workshop on Privacy Enhancing Technologies
• Filtered at borders
• Triggers 3 consecutive TCP RSTs (seq, seq+1460, seq+4380)
• Stateless inspection
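The three offsets are easy to reproduce. Note that 1460 is a common TCP MSS on Ethernet (a 1500-byte MTU minus 20 bytes of IP header and 20 bytes of TCP header) and that 4380 = 3 × 1460; presumably the extra resets are meant to stay acceptable even if more data is already in flight, but that reading is an assumption here, not something the slide states. The function name below is hypothetical.

```python
MSS = 1460  # common Ethernet MSS: 1500-byte MTU - 20 (IP) - 20 (TCP)


def rst_sequence_numbers(seq):
    """Sequence numbers of the three resets: seq, seq+1460, seq+4380.

    TCP sequence numbers are 32-bit values, so the sums wrap modulo 2**32.
    """
    return [(seq + offset) % 2**32 for offset in (0, MSS, 3 * MSS)]


print(rst_sequence_numbers(1))
```

For a relative sequence number of 1 this gives 1, 1461, and 4381.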

Page 12: Internet Censorship

ConceptDoppler

• More work done by Jed Crandall et al.
  • from “ConceptDoppler: a weather tracker for Internet censorship”, ACM CCS, 2007
• Showed GFC as panopticon
  • used tcptraceroute with increasing TTL
  • Also showed 28.3% of requests survived to reach the server
• Most filtering happens at the 1st hop, but some as deep as the 13th hop beyond the border
• Provides blacklist words using LSA (latent semantic analysis)
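The increasing-TTL idea can be sketched as a simple sweep: send the same blocked request with TTL 1, 2, 3, ... and note the first TTL at which a reset comes back. In the sketch below, `probe` is a hypothetical callable standing in for the real measurement (for example a tcptraceroute or Scapy script); it is assumed to return True when a TCP RST is observed for the given TTL. Only the search logic is shown.

```python
def first_filtering_hop(probe, max_ttl=30):
    """Return the smallest TTL for which probe(ttl) reports a reset.

    probe: callable taking a TTL and returning True if a TCP RST was seen.
    Returns None if no reset is observed up to max_ttl.
    """
    for ttl in range(1, max_ttl + 1):
        if probe(ttl):
            return ttl
    return None


# Simulated path: a filtering router sits 13 hops away, so only packets
# that live at least 13 hops can trigger it.
def simulated_probe(ttl):
    return ttl >= 13


print(first_filtering_hop(simulated_probe))
```

Against the simulated path the sweep reports hop 13; a path with no filtering router yields None.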

Page 13: Internet Censorship

TCP RST patterns

• Work done by N. Weaver et al.
  • from “Detecting forged TCP RST Packets”, TRUST, 2008
• Studied forged TCP RST patterns and fingerprints
• Patterns:
  • D+R, R+D, R+R, S+R, SA+R
• Fingerprints:
  • China has IPID 64, IPID -26, SEQ 1460, etc.

Page 14: Internet Censorship

Electronic Big Brother

• Work done by J. Karlin et al.
  • from “Nation-state routing: censorship, wiretapping, and BGP”
• Countries with most information flow: USA, England, and Germany
• China has little effect on interdomain routing
• Power Law can be applied

Page 15: Internet Censorship

Our Work in Progress...

• J. Crandall et al. showed, with a blacklist, that GFCs block HTTP GET requests substantially
• Are GFCs symmetric or asymmetric in HTML filtering?
  • If asymmetric, filtering would be less effective
• How much blocking on HTML responses?
• Which blacklist keywords are blocked & why?
• How do GFCs block & how to escape?

Page 16: Internet Censorship

Constraints

• Internet Measurement
  • No fine-grained protocol support
  • Privacy and legal issues
  • and more…
• China claims world’s largest IPv6 backbone
  • Huge latency due to 4over6
  • Little collaboration from IPv6

Page 17: Internet Censorship

Keyword Probe

GET server/keyword=falun HTTP/1.1

Doesn’t know which direction is blocked…

Page 18: Internet Censorship

Number Probe

• Use a proxy server, a.k.a. a web cache
• Use number query instead of keyword
  • Chinese cannot be guaranteed
  • Neither can English since some abbreviations are blocked
• Bogus response page of ~4KB
  • Hash will be much better, but file used currently
• Latency generated due to table iteration

Page 19: Internet Censorship

Number Probe

GET server/keyword=1 HTTP/1.1

1 means falun

Doesn’t matter which direction is blocked…
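A minimal sketch of the number probe follows. The idea is that the request carries only a number, so nothing in the request direction can match a keyword, while the server looks the number up and places the keyword in a ~4KB response. Everything concrete here is an assumption for illustration: the table contents, the function names, the host name, and the filler byte are hypothetical, and the request path simply mirrors the one on the slide.

```python
# Hypothetical server-side table: probe number -> keyword.
KEYWORD_TABLE = {0: "hello", 1: "falun"}


def build_request(host, keynum):
    """Client side: the request carries only the number, never the keyword."""
    return (
        f"GET /keyword={keynum} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")


def build_response_body(keynum, size=4096):
    """Server side: a bogus page of `size` bytes that starts with the keyword."""
    keyword = KEYWORD_TABLE[keynum].encode("ascii")
    if len(keyword) > size:
        raise ValueError("keyword longer than requested page size")
    return keyword + b"x" * (size - len(keyword))


request = build_request("server.example", 1)
body = build_response_body(1)
print(b"falun" in request, b"falun" in body, len(body))
```

Because the keyword appears only in the response, a reset observed for such a probe can be attributed to filtering in the response direction.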

Page 20: Internet Censorship

Setup

• 7 pairs of single server and single client
• Scripting probes with wget command + Python code on server
  • WGETRC, .wgetrc, or /etc/wgetrc
  • Scapy used

Page 21: Internet Censorship

Setup

• Python code with libpcap records traffic on both server-side & client-side
• tcptraceroute recording latency every 10 mins
• 3 contiguous probes sent
• Preliminary tests from late Feb to early Mar
• Real experiments from spring recess thru now

Page 22: Internet Censorship

FSM for Probe

[Figure: finite-state machine for a probe. Recoverable labels: START, WAIT, CLEAR; QUERY SENT; 200 OK RCV’D; TCP RST RCV’D + QUERY SENT (+port/2), (-port/2), (+256), (-256); exponential backoff + HELLO SENT (+1).]

Page 23: Internet Censorship

Proxy Hunt

• Chinese broadcast a list of proxy servers
  • Financial issue
  • Falun movement
• Geographically diverse web caches
  • Use VisualRoute server 2008 trial
  • Returns guess locations from ISP info
• Proxy lifetime
  • Lasts up to two weeks or so
  • Some last a couple of days

Page 24: Internet Censorship

What to test?

• 12/20 diverse locations randomly chosen
  • Most along the east coast (1 partial)
  • 1 inland location included, but partial
  • 1/2 hr to 2 days each run
• Different locations of a single keyword
  • Beginning/Middle/End
• Different response sizes
  • 600B/4KB/40KB/175KB/350KB
• Static vs dynamic pages

Page 25: Internet Censorship

What to test?

• Keyword threshold (to be tested)
  • How many keywords of the same/different kind trigger filtering & how?
• Different HTTP protocols (to be tested)
  • Probably no effect since proxy uses HTTP 1.1
• No STDDEV data available yet (being tested)
• TCP RSTs trend: how many & how (to be tested)
• Any other ideas?

Page 26: Internet Censorship

Map

[Figure: map showing the numbered probe locations (#1 to #12).]

Page 27: Internet Censorship

IPv6 Backbone

[Figure: IPv6 backbone map with the numbered probe locations (#1 to #12) marked.]

Page 28: Internet Censorship

IPv6 Backbone

Page 29: Internet Censorship

TCP RST

• Depending on location & probe, the number of TCP RSTs received varies
  • Sometimes just 1 RST
  • Sometimes more than 10 RSTs
• Not automated at this time (future work)

Page 30: Internet Censorship

Odd Responses

• Unlike HTTP GET requests, HTML responses receive odd packets
  • 404 Page Not Found (once)
  • 502 Error (58 times)
    • Proxy error/Bad gateway
  • 503 Service Unavailable (236 times)
  • Connection Timed Out (not counted yet): discarded

Page 31: Internet Censorship

Odd Responses

• Two kinds
  • Just 502/503 error received without any preceding/following RSTs
    • Some HELLOs get TCP RSTs (delay)
  • RST received first and then 502/503 error received: categorized same as TCP RST
• Who sends these? Let’s look at client-side!

Page 32: Internet Censorship

200 OK

When a non-blocked keyword is requested

UNM -> China: [SYN] seq=0, TTL=64
China -> UNM: [SYN+ACK] seq=0, ack=1, TTL=48
UNM -> China: [ACK] seq=1, ack=1, TTL=64
UNM -> China: GET server/search.php?keynum=0 HTTP/1.0, TTL=64   * 0 means hello
China -> UNM: [ACK] seq=1, ack=138, TTL=48
China -> UNM: [TCP Dup ACK] [ACK] seq=1, ack=138, TTL=48
China -> UNM: [TCP segment of a reassembled PDU] TTL=48
China -> UNM: [TCP segment of a reassembled PDU] TTL=48
China -> UNM: [TCP previous segment lost] [FIN+ACK] seq=4262, ack=138, TTL=48
UNM -> China: [ACK] seq=138, ack=2897, TTL=64
UNM -> China: [TCP Dup ACK] [ACK] seq=138, ack=2897
China -> UNM: [TCP out-of-order] HTTP/1.0 200 OK (text/html)

Page 33: Internet Censorship

TCP RST

When a TCP RST packet is received

UNM -> China: [SYN] seq=0, TTL=64
China -> UNM: [SYN+ACK] seq=0, ack=1, TTL=48
UNM -> China: [ACK] seq=1, ack=1, TTL=64
UNM -> China: GET server/search.php?keynum=0 HTTP/1.0, TTL=64   * 0 means falun
China -> UNM: [ACK] seq=1, ack=152, TTL=48
China -> UNM: [RST] seq=1, TTL=50
UNM -> China: [FIN+ACK] seq=1, ack=1, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
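In the trace above the RST arrives with TTL=50, while the other packets from the same address arrive with TTL=48. One possible reading, assuming all packets start with the same initial TTL, is that the reset was injected by a device on the path rather than sent by the proxy itself. The check below is a rough sketch of that comparison; the function name and the `(flags, ttl)` tuple format are hypothetical, and a TTL difference is a hint, not proof.

```python
from collections import Counter


def suspicious_rsts(packets):
    """Return RST packets whose TTL differs from the sender's usual TTL.

    packets: list of (flags, ttl) tuples, all received from the same
    remote address. The usual TTL is taken to be the most common TTL
    among the non-RST packets.
    """
    baseline = Counter(ttl for flags, ttl in packets if "RST" not in flags)
    if not baseline:
        return []
    usual_ttl = baseline.most_common(1)[0][0]
    return [(flags, ttl) for flags, ttl in packets
            if "RST" in flags and ttl != usual_ttl]


# Packets received from the remote side in the trace above.
trace = [("SYN+ACK", 48), ("ACK", 48), ("RST", 50)]
print(suspicious_rsts(trace))
```

A trace in which the reset carries the same TTL as the rest of the connection returns an empty list, so the check cannot by itself tell a genuine reset from a forged one sent from the same hop distance.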

Page 34: Internet Censorship

502,503 Error

When a 503 error message is received

UNM -> China: [SYN] seq=0, TTL=64
China -> UNM: [SYN+ACK] seq=0, ack=1, TTL=48
UNM -> China: [ACK] seq=1, ack=1, TTL=64
UNM -> China: GET server/search.php?keynum=0 HTTP/1.0, TTL=64   * 0 means falun
China -> UNM: [ACK] seq=1, ack=134, TTL=48
China -> UNM: [TCP previous segment lost] [FIN+ACK] seq=1578, ack=134, TTL=48
UNM -> China: [TCP Dup ACK] [ACK] seq=134, ack=1, TTL=64
China -> UNM: [TCP retransmission] [TCP segment of a reassembled PDU], TTL=48
China -> UNM: [TCP retransmission] HTTP/1.0 503 Service Unavailable, TTL=48

Page 35: Internet Censorship

502,503 Error

• Based on client-side, GFCs seem to intervene?
  • Probably yes or no depending on preceding/following RSTs
• Let’s look at server-side!

Page 36: Internet Censorship

502,503 Error

[Figure: message sequence between client, proxy, and server, with arrows labeled GET, OK, and 502/503.]

Page 37: Internet Censorship

502,503 Error

• Probes right before the current probe and right after the current were successful
  • i-th keyword (3rd probe): OK
  • (i+1)-th keyword (1st probe): 502 Error
  • (i+1)-th keyword (2nd probe): OK

Page 38: Internet Censorship

502,503 Error

• What caused this kind of 502/503 error?
  • Probably proxy misconfigurations
  • More analysis needed
  • Some HELLOs get delayed RSTs
• 503 same as 502?
  • No concrete evidence
  • Still in analysis

Page 39: Internet Censorship

Odd Packets

• Odd packets are generated by a web cache
  • Mistakenly thought of as TCP RST packets received from GFC on the way from server to a web cache
• Then, self-censorship? Definitely no!
  • Probably due to proxy misconfigurations
• Odd packets can also be considered TCP RST? Some are definitely false positives, though.
• Why do they differ? Can’t explain at this time

Page 40: Internet Censorship

Self-censorship?

When self-censorship is observed at Beijing

UNM -> China: [SYN] seq=0, TTL=64
China -> UNM: [SYN+ACK] seq=0, ack=1, TTL=46
UNM -> China: [ACK] seq=1, ack=1, TTL=64
UNM -> China: GET server/search.php?keynum=0 HTTP/1.0, TTL=64   * 0 means falun
China -> UNM: [ACK] seq=1, ack=152, TTL=46
China -> UNM: [RST] seq=1, TTL=46
UNM -> China: [FIN+ACK] seq=1, ack=1, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64
UNM -> China: [FIN+ACK] seq=134, ack=2954, TTL=64

The proxy ACK’ed the server, on whose side no TCP RST is observed at all.

Page 41: Internet Censorship

Probing Results

• Odd probing results, but the following should be taken into account:
  • Route flutter
  • HTTP request blocking trend varies depending on probing time (diurnal pattern)
• Might be the most efficient way because of distributed filtering on response?
• Can’t explain many phenomena, but can make good guesses at least

Page 42: Internet Censorship

#Block on 4KB

[Figure: chart “#keywords of ~4KB” by location (#1 to #12); y-axis 0 to 350.]

Page 43: Internet Censorship

#Block on 4KB

[Figure: chart “Zoom-up of #keywords of ~4KB” by location (#1 to #12); y-axis 0 to 35.]

Page 44: Internet Censorship

#Block on 600B

[Figure: chart “#keywords of ~600B” by location (#1, #2, #4 to #12); y-axis 0 to 35.]

Page 45: Internet Censorship

#Block on 40KB

[Figure: chart “#keywords of ~40KB” by location (#1, #2, #4 to #12); y-axis 0 to 35.]

Page 46: Internet Censorship

#Block on 175KB

[Figure: chart “#keywords of ~175KB” by location (#1, #2, #4 to #11); y-axis 0 to 35.]

Page 47: Internet Censorship

#Block on 350KB

[Figure: chart “#keywords of ~350KB” by location (#1, #2, #4 to #11); y-axis 0 to 60.]

Page 48: Internet Censorship

#Block on different size

[Figure: chart “#keywords by size” by location (#1 to #12); series s0, s1, s10, s50, s100; y-axis 0 to 60.]

Page 49: Internet Censorship

#Block on different loc

[Figure: chart “Different Location”; x-axis s0, s1, s10, s50, s100; one series per location (#1, #2, #4 to #11); y-axis 0 to 350.]

Page 50: Internet Censorship

#Block on different loc

[Figure: chart “Zoom-up of Different Location”; x-axis s0, s1, s10, s50, s100; one series per location; y-axis 0 to 60.]

Page 51: Internet Censorship

Nation-wide

[Figure: grid of small charts, each labeled RST; x-axis s0, s1, s10, s50, s100; y-axis 0 to 60 in most panels and 0 to 400 in one.]

Page 52: Internet Censorship

STD DEV

[Figure: chart “STDDEV-AVG of #keywords” by location (#1, #2, #4 to #11); y-axis 0 to 350.]

Page 53: Internet Censorship

STD DEV

[Figure: chart “Zoom-up of STDDEV-AVG of #keywords by location” (#1, #2, #4 to #11); y-axis 0 to 50.]

Page 54: Internet Censorship

Latency

[Figure: chart “Latency”; x-axis h1 to h25; one series per location (#1 to #12); y-axis 0 to 1000.]

Page 55: Internet Censorship

Latency

[Figure: chart “Latency” (zoomed); x-axis h1 to h25; one series per location (#1 to #12); y-axis 200 to 600.]

Page 56: Internet Censorship

Beijing on 4KB

[Figure: chart “#keywords by date at Beijing (#9) with keyword at begin”; y-axis 0 to 30; labeled values 6 on 4/6/2008 and 17 on 4/8/2008.]

Page 57: Internet Censorship

False Positive

• Hard to predict, huh?
  • Need more time/tests
• Assume all odd packets false positive
  • Let’s visit all graphs again!
• False positive rate of ~8.5%
  • 2236 times not rec’d in total since Mar
  • 2045 RSTs observed
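The slide does not show how the ~8.5% figure was obtained. One reading that reproduces it, assuming the odd packets are the failed probes that were not RSTs, is the following small calculation (the variable names are made up for the example):

```python
not_received = 2236    # probes whose response was not received (since Mar)
rsts_observed = 2045   # of those, probes that ended with a TCP RST

# Assumption: everything that failed without an RST is an "odd packet".
odd_packets = not_received - rsts_observed
false_positive_rate = odd_packets / not_received

print(odd_packets, round(false_positive_rate * 100, 1))
```

Under that assumption there are 191 odd packets, which is about 8.5% of the 2236 failed probes.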

Page 58: Internet Censorship

STD DEV

[Figure: two charts “#keywords of STDDEV at Beijing (#9) with keyword at beginning on Apr 25-26”, one for RST+Others (y-axis 0 to 8) and one for RST only (y-axis 0 to 7); x-axis ticks 22-48, 23-10, 23-31, 23-51, 00-13, 00-33, 00-54, 01-15, 01-35, 01-56.]

Page 59: Internet Censorship

STD DEV

[Figure: chart “#keywords of STDDEV at Beijing (#9) with keyword at beginning on Apr 26-28”; x-axis s0, s1, s10, s50, s100; y-axis 0 to 10.]

Page 60: Internet Censorship

Beijing on 4KB

[Figure: two charts “#keywords by date at Beijing (#9) with keyword at begin”; y-axis 0 to 30; labeled values 5 and 6 on 4/6/2008, 9 and 17 on 4/8/2008.]

Page 61: Internet Censorship

STD DEV

[Figure: chart “Zoom-up of STDDEV-AVG of #keywords by location” (#1, #2, #4 to #12); y-axis 0 to 50.]

Page 62: Internet Censorship

By Location

[Figure: four charts of #keywords at Guangzhou (#1), Jiangxi (#2), Hangzhou (#3), and Shanghai (#4); x-axis s0, s1, s10, s50, s100; series RST+Others and RST only.]

Page 63: Internet Censorship

By Location

[Figure: four charts of #keywords at Jiangsu (#5), Shandong (#6), Shanxi (#7), and Tianjin (#8); x-axis s0, s1, s10, s50, s100; series RST+Others and RST only.]

Page 64: Internet Censorship

By Location

[Figure: four charts of #keywords at Beijing (#9), Liaoning (#10), Harbin (#11), and Chengdu (#12); x-axis s0, s1, s10, s50, s100; series RST+Others and RST only; the Harbin y-axis runs 0 to 300, the others 0 to 30.]

Page 65: Internet Censorship

Dynamic Page (Beijing)

[Figure: chart “#keywords by dynamic at Jiangxi (#2)”; x-axis s1, s10, s50, s100; series Begin, Middle, End; y-axis 0 to 15.]

Page 66: Internet Censorship

Static Page (Beijing)

[Figure: chart “#keywords by static”; x-axis s1, s10, s50, s100; series Begin, Middle, End; y-axis 0 to 15.]

Page 67: Internet Censorship

Dynamic vs Static: beginning

[Figure: chart “#keywords by pages”; x-axis s1, s10, s50, s100; series static, dynamic; y-axis 0 to 15.]

Page 68: Internet Censorship

Dynamic vs Static: middle

[Figure: chart “#keywords by pages at Jiangxi (#2)”; x-axis s1, s10, s50, s100; series static, dynamic; y-axis 0 to 15.]

Page 69: Internet Censorship

Dynamic vs Static: end

[Figure: chart “#keywords by pages at Jiangxi (#2)”; x-axis s1, s10, s50, s100; series static, dynamic; y-axis 0 to 15.]

Page 70: Internet Censorship

Blacklist updated

[Figure: chart “#keywords by blacklist”; x-axis s0, s1, s10, s50, s100; series old, new; y-axis 0 to 60.]

Page 71: Internet Censorship

Filtering Pattern: probabilistic?

[Figure: chart “filtering pattern (91 out of 2045)”; x-axis xoo, oxo, oox, xxo, xox, oxx, xxx; y-axis 0 to 50.]
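The seven labels on the x-axis are the possible outcomes of three contiguous probes in which at least one probe is filtered. The slide does not define the letters; the sketch below assumes that `x` marks a blocked probe and `o` a probe that got through, and the function name `pattern` is hypothetical.

```python
from itertools import product


def pattern(results):
    """Encode three probe outcomes; True means the probe was blocked."""
    return "".join("x" if blocked else "o" for blocked in results)


# All outcomes of three probes, and those with at least one block.
ALL_PATTERNS = ["".join(p) for p in product("ox", repeat=3)]
BLOCKING_PATTERNS = [p for p in ALL_PATTERNS if "x" in p]

print(pattern([True, False, False]))
print(BLOCKING_PATTERNS)
```

If filtering were fully deterministic, only `xxx` would be expected for a blocked keyword; the mixed patterns are what raise the question of probabilistic filtering.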

Page 72: Internet Censorship

Blacklist

• ~95% of the blacklist keywords for HTTP GET requests are also blocked in HTML responses
  • ~54% blocked when excluding Harbin
• Less substantial blocking on HTML responses
  • Probably due to overhead and/or latency
• Unlike requests, the response blacklist seems to be distributed over the nation
• Some GFCs seem to share

Page 73: Internet Censorship

Blacklist (4KB)

[Figure: diagram of blocked keyword numbers grouped by location for ~4KB responses. Legend: 2, 3 = ∅.]

Page 74: Internet Censorship

Blacklist (600B)

[Figure: diagram of blocked keyword numbers grouped by location for ~600B responses. Legend: 2, 12 = ∅; 3: N/A.]

Page 75: Internet Censorship

Blacklist (40KB)

[Figure: diagram of blocked keyword numbers grouped by location for ~40KB responses. Legend: 12 = ∅; 3: N/A.]

Page 76: Internet Censorship

Blacklist (175KB)

[Figure: diagram of blocked keyword numbers grouped by location for ~175KB responses. Legend: 2 = ∅; 3, 12: N/A.]

Page 77: Internet Censorship

Blacklist (325KB)

[Figure: diagram of blocked keyword numbers grouped by location for ~325KB responses. Legend: 2 = ∅; 3, 12: N/A.]

Page 78: Internet Censorship

Blacklist (all)

[Figure: diagram of blocked keyword numbers grouped by location over all response sizes. Legend: 12 = ∅.]

Page 79: Internet Censorship

Blacklist Size

[Figure: chart “#keywords blocked in all sizes” by location (#1 to #12); y-axis 0 to 400.]

Page 80: Internet Censorship

Blacklist Size

[Figure: chart “Zoom-up of #keywords blocked by location” (#1 to #12); y-axis 0 to 30.]

Page 81: Internet Censorship

Conclusion

• GFC performs symmetric filtering against HTML
  • Less substantial on HTML response probably due to overhead/latency
• Distributed blacklist words
  • Share a subset of blacklist words
• Route path, packet size, and keyword location affect censorship efficiency
• Stateful vs. stateless TCP RSTs
  • Inversion suggests application-layer censorship???
• Still can’t explain many phenomena

Page 82: Internet Censorship

Thank you

• Any questions or feedback?

Page 83: Internet Censorship

Lab 4

• We’re not using an actual IDS, but iptables.
  • the server is too simple and is just buffering up to 4 KB
  • can beat this by sending a sequence of packets which split keywords
    • need to flush first before sending the next
    • otherwise, TCP RSTs will be triggered
• Lab 4 1/2 will be about TCP fragmentation(?)
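The keyword-splitting idea can be sketched as follows. `split_payload` cuts the request in the middle of every keyword occurrence so that no single piece contains the whole keyword; `send_in_pieces` then writes the pieces one at a time. Both function names and the pause length are assumptions made for this example, and since TCP is a byte stream, separate writes are not strictly guaranteed to leave as separate segments even with Nagle's algorithm disabled.

```python
import socket
import time


def split_payload(payload, keyword):
    """Split payload so that no chunk contains the whole keyword."""
    if len(keyword) < 2:
        raise ValueError("keyword must be at least 2 bytes long")
    chunks, start = [], 0
    pos = payload.find(keyword)
    while pos != -1:
        cut = pos + len(keyword) // 2   # cut inside this occurrence
        chunks.append(payload[start:cut])
        start = cut
        pos = payload.find(keyword, cut)
    chunks.append(payload[start:])
    return chunks


def send_in_pieces(sock, chunks, pause=0.2):
    """Send each chunk with its own write, pausing so it is flushed first."""
    # TCP_NODELAY disables Nagle's algorithm, which could otherwise
    # coalesce small writes into a single segment.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for chunk in chunks:
        sock.sendall(chunk)
        time.sleep(pause)


request = b"GET /?q=falun HTTP/1.1\r\n\r\n"
pieces = split_payload(request, b"falun")
print(pieces)
```

Joining the pieces gives back the original request, so the server still sees the full query, while a filter that matches each packet in isolation never sees the keyword in one piece.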

Page 84: Internet Censorship

Socket in Python

import socket  # client

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("10.0.0.3", 8080))
s.sendall(b"hello")  # Python 3 sockets send bytes, not str
d = s.recv(1024)     # read up to 1024 bytes of the reply
print(repr(d))
s.close()

Page 85: Internet Censorship

Scapy

• TCP fragmentation?
  • Not proper jargon! But TCP supports byte streams and out-of-order delivery.
  • Let’s call it that for the sake of simplicity.
• Why not IP fragmentation? RFC1858!
  • Tiny fragment attack
  • can use fragroute for smaller IP packets
• can forge packets with Scapy

Page 86: Internet Censorship

Scapy (cont’d)

• Use whatever you like, for example MuxTCP, C code with libpcap, etc. (Visit Milw0rm for code.)
• Scapy is troublesome in initiating TCP
  • need to change firewall rules with iptables
    • iptables -A OUTPUT -p tcp --tcp-flags RST RST -j DROP
  • wouldn’t work for Lab 4 1/2 since the server is not under control
• initiate TCP handshake with your own socket

Page 87: Internet Censorship

Scapy (cont’d)

• To use tcpdump, you need to set PATH
  • use scapy.sh script mailed
  • use sniff.py to sniff the network
• Need to know some TCP/IP stuff
  • IP
    • IPID, IP addresses
  • TCP
    • flags, sequence #, ack #, ports

Page 88: Internet Censorship

TCP Handshake

# read the message file line by line
fd = open("message.txt", "r")
lines = fd.readlines()
fd.close()

for l in lines:
    print(l.strip() + "\n")

# pause for one second
import time
time.sleep(1)

