Post on 17-Aug-2020

Chair of Network Architectures and Services
Department of Informatics
Technical University of Munich

Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists

Oliver Gasser
Technical University of Munich

RIPE 77, Amsterdam

Joint work

Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 2

Internet measurements
Active Internet measurements

• Important tool to understand specific networks
  • Which IP addresses run an HTTPS web server on the Internet?
  • How securely configured are IoT devices in a company network?
  • Are my DNS servers vulnerable to amplification attacks?
• Used by researchers, security companies, ... but also bad actors

Why is this research relevant for operators?

• Learn measurement techniques used in IPv6 vs. in IPv4
• Understand how devices can be discovered in your network
• Take action by conducting measurements yourself

IPv6 measurements
Differences in IPv4 and IPv6 measurement approaches

• IPv4
  • Brute-force scan of the complete Internet in a few hours (e.g. ZMap)
• IPv6
  • Address space too expansive for brute-force scanning
  • Assemble target list of IPv6 addresses for scanning → IPv6 hitlist
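To put the difference in scale into numbers, here is a back-of-the-envelope sketch of how long an exhaustive sweep would take. The probe rate of one million packets per second is an assumed round figure for illustration, not a value from the talk.

```python
# Back-of-the-envelope duration of an exhaustive address sweep.
# ASSUMPTION: 1,000,000 probes per second is an illustrative rate only.
SECONDS_PER_YEAR = 365 * 24 * 3600


def scan_seconds(address_bits: int, probes_per_second: float) -> float:
    """Time to send one probe to every address of the given bit length."""
    return (2 ** address_bits) / probes_per_second


rate = 1_000_000
ipv4_hours = scan_seconds(32, rate) / 3600
one_subnet_years = scan_seconds(64, rate) / SECONDS_PER_YEAR  # a single /64
ipv6_years = scan_seconds(128, rate) / SECONDS_PER_YEAR

print(f"IPv4 (2^32 addresses): {ipv4_hours:.1f} hours")
print(f"One IPv6 /64 (2^64 addresses): {one_subnet_years:.2e} years")
print(f"IPv6 (2^128 addresses): {ipv6_years:.2e} years")
```

Even a single /64 subnet is out of reach at this rate, which is why a curated target list is needed.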

IPv6 hitlist
Assembling an IPv6 hitlist

• Leverage DNS to gather IPv6 addresses
• Exploit structural properties to learn new addresses
• Use crowdsourcing to get client addresses

Challenges

1. Clusters in hitlist sources
2. Aliased prefixes
3. Finding reachable addresses
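As an illustration of the DNS-based source, the sketch below resolves a list of domains to their IPv6 addresses and deduplicates them. It uses the system resolver through Python's standard library; the function names `resolve_aaaa` and `build_hitlist` are hypothetical, and the tooling actually used for the hitlist is not described on the slides.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List, Set


def resolve_aaaa(domain: str) -> List[str]:
    """Resolve a domain to its IPv6 addresses via the system resolver."""
    try:
        infos = socket.getaddrinfo(domain, None, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # no IPv6 address or resolution failure
    return [info[4][0] for info in infos]


def build_hitlist(
    domains: Iterable[str],
    resolver: Callable[[str], List[str]] = resolve_aaaa,
) -> Set[ipaddress.IPv6Address]:
    """Collect the unique, normalized IPv6 addresses behind a list of domains."""
    hitlist = set()
    for domain in domains:
        for text in resolver(domain):
            # Strip a zone index such as "%eth0" before parsing.
            hitlist.add(ipaddress.IPv6Address(text.split("%")[0]))
    return hitlist
```

Parsing into `IPv6Address` objects means that differently written forms of the same address (for example `2001:db8::1` and `2001:0db8:0000::1`) collapse into one entry.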

Hitlist sources
Where can we learn potential IPv6 addresses?

• Domain lists: zonefiles, toplists, blacklists
• Rapid7 ANY DNS
• Domains extracted from Certificate Transparency
• Bitcoin node addresses
• RIPE Atlas: traceroutes, ipmap
• Scamper: traceroute to all assembled addresses

Figure 1: Cumulative runup of IPv6 addresses. [Plot: cumulative number of addresses (10M to 60M) per source from 2017-08 to 2018-05; sources: Domainlists, FDNS, CT, AXFR, Bitnodes, RIPE Atlas, Scamper.]

Observation

• Many addresses from domain lists, CT, and Scamper

Hitlist sources
How diverse are the addresses from different sources?

Figure 2: AS distribution for hitlist sources. [Plot: fraction of addresses in top X ASes (0.0 to 1.0) against the number of ASes (log scale, 10^0 to 10^4); sources: Domainlists, FDNS, CT, AXFR, Bitnodes, RIPE Atlas, Scamper.]

Autonomous System distribution

• Unbalanced (CT, domain lists) vs. balanced (RIPE Atlas)
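The quantity plotted in Figure 2, the fraction of addresses that fall into the top X ASes, can be sketched as follows. The mapping from address to origin AS is assumed to have been done beforehand, and the AS numbers in the toy input are hypothetical values from the documentation range.

```python
from collections import Counter
from typing import Iterable, List


def top_as_fractions(asns: Iterable[int]) -> List[float]:
    """Cumulative fraction of addresses covered by the top 1, 2, ... ASes.

    `asns` holds one origin AS number per hitlist address.
    """
    counts = sorted(Counter(asns).values(), reverse=True)
    total = sum(counts)
    fractions, running = [], 0
    for count in counts:
        running += count
        fractions.append(running / total)
    return fractions


# Hypothetical toy inputs: one source dominated by a single AS, one spread evenly.
unbalanced = [64500] * 8 + [64501, 64502]
balanced = [64500, 64501, 64502, 64503]
print(top_as_fractions(unbalanced))
print(top_as_fractions(balanced))
```

A curve that jumps close to 1.0 after only a few ASes indicates an unbalanced source; a slowly rising curve indicates a balanced one.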

Hitlist sources
How much of the announced address space do we cover?

Figure 3: IPv6 prefixes with number of hitlist addresses per prefix. [Plot: color scale "Input IP addresses" with ticks 1, 49, 2K, 153K, 5M.]

BGP prefix distribution

• Good coverage of BGP prefixes: 25.5 k of 51.2 k
• Some prefixes with many addresses
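A minimal sketch of how such per-prefix counts and the coverage figure could be computed: each hitlist address is attributed to its most specific announced prefix, and coverage is the share of prefixes with at least one address. The prefixes and addresses below are hypothetical examples from the documentation range, and the linear scan is chosen for clarity rather than speed.

```python
import ipaddress
from collections import Counter
from typing import Optional


def longest_match(addr, prefixes) -> Optional[ipaddress.IPv6Network]:
    """Most specific announced prefix containing addr (linear scan for clarity)."""
    matches = [p for p in prefixes if addr in p]
    return max(matches, key=lambda p: p.prefixlen) if matches else None


def addresses_per_prefix(addresses, prefixes) -> Counter:
    """Count hitlist addresses per announced prefix (most specific match only)."""
    counts = Counter()
    for text in addresses:
        prefix = longest_match(ipaddress.IPv6Address(text), prefixes)
        if prefix is not None:
            counts[prefix] += 1
    return counts


def coverage(counts: Counter, prefixes) -> float:
    """Share of announced prefixes that contain at least one hitlist address."""
    return sum(1 for p in prefixes if counts[p] > 0) / len(prefixes)


# Hypothetical announced prefixes and hitlist addresses.
announced = [ipaddress.IPv6Network(p)
             for p in ("2001:db8::/32", "2001:db8:407::/48", "2001:db8:ffff::/48")]
hitlist = ["2001:db8:407:8000::3", "2001:db8:407:8000::4", "2001:db8:1::1"]

counts = addresses_per_prefix(hitlist, announced)
print(f"covered: {coverage(counts, announced):.1%}")
```

Applied to the numbers on the slide, 25.5 k covered prefixes out of 51.2 k announced ones give a coverage of about 49.8 %.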

Hitlist sources
Key take-aways for network operations

1. IPv6 address space too vast to conduct brute-force measurements
2. Your addresses can be gathered from many different publicly available sources (e.g. DNS, CT)
3. About 50 % of announced prefixes are covered in our IPv6 hitlist

Address entropy clustering
Addressing schemes

• Question: How similar are addressing schemes in our hitlist?
• Approach: Group addresses to find similar address schemes

Figure 4: Addressing schemes. [Plot: median entropy (0.0 to 1.0) per IPv6 nybble (hex character, axis ticks 10 to 32) for cluster IDs 1 to 6, together with the share of /32 prefixes [%] per cluster (0 to 50).]

• Only few addressing schemes
• Low-bit addresses (e.g. ::1), privacy extensions, and EUI-64 mapped MAC addresses clearly visible
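The per-nybble entropy underlying Figure 4 can be sketched as below: for a set of addresses, compute the Shannon entropy of the hex character at each of the 32 positions, normalized to the range 0.0 to 1.0. This is a minimal sketch of the entropy fingerprint only; the grouping of fingerprints into clusters is not shown, and the function name is hypothetical.

```python
import ipaddress
import math
from collections import Counter
from typing import List, Sequence


def nybble_entropies(addresses: Sequence[str]) -> List[float]:
    """Normalized Shannon entropy (0.0 to 1.0) for each of the 32 nybbles."""
    # Expand every address to its 32 hex characters, e.g. "20010db8...0001".
    expanded = [ipaddress.IPv6Address(a).exploded.replace(":", "") for a in addresses]
    entropies = []
    for position in range(32):
        counts = Counter(addr[position] for addr in expanded)
        total = sum(counts.values())
        # H = sum over observed values of p * log2(1 / p)
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h / 4.0)  # 4 bits is the maximum entropy of one nybble
    return entropies


# Hypothetical low-bit addressing scheme: only the last nybble varies.
low_bit = [f"2001:db8::{i:x}" for i in range(16)]
fingerprint = nybble_entropies(low_bit)
print(fingerprint[-4:])
```

A low-bit scheme shows entropy only in the last nybbles, whereas privacy extensions would show high entropy across the whole interface identifier.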

Address entropy clustering
Key take-aways for network operations

1. Most networks use one of a handful of addressing schemes
2. Good: Industry best practices are followed
3. Bad: Addressing schemes might uncover “hidden” hosts

Detecting aliased prefixes
Aliases

• Alias: Multiple addresses belonging to the same host
• Aliased prefix: Complete prefix bound to the same host
• Bias: As some hosts are overrepresented, aliased prefixes introduce bias in the hitlist

Detecting aliased prefixes using pseudo-random probing

2001:0db8:0407:8000::/64
2001:0db8:0407:8000:0151:2900:77e9:03a8
...
2001:0db8:0407:8000:f693:2443:915e:1d2e

Table 1: IPv6 fan-out for multi-level aliased prefix detection.
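The fan-out in Table 1 can be sketched as follows: for a given prefix, generate one pseudo-random address in each of the 16 subprefixes formed by the next nybble (0 to f), then probe them. The decision rule used here, treating a prefix as aliased only if all 16 targets respond, is an assumption for this sketch, and `is_responsive` is a hypothetical callback standing in for the actual probing.

```python
import ipaddress
import random
from typing import Callable, List


def fan_out(prefix: str, rng: random.Random) -> List[ipaddress.IPv6Address]:
    """One pseudo-random address in each of the 16 next-nybble subprefixes."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen % 4 != 0 or net.prefixlen > 124:
        raise ValueError("prefix length must be a multiple of 4 and at most 124")
    random_bits = 128 - net.prefixlen - 4  # bits left after fixing the next nybble
    base = int(net.network_address)
    targets = []
    for nybble in range(16):
        tail = rng.getrandbits(random_bits) if random_bits else 0
        targets.append(ipaddress.IPv6Address(base | (nybble << random_bits) | tail))
    return targets


def looks_aliased(prefix: str,
                  is_responsive: Callable[[ipaddress.IPv6Address], bool],
                  rng: random.Random) -> bool:
    """ASSUMED rule: aliased if every pseudo-random target responds."""
    return all(is_responsive(addr) for addr in fan_out(prefix, rng))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
targets = fan_out("2001:db8:407:8000::/64", rng)
for addr in targets:
    print(addr.exploded)
```

Because the targets are pseudo-random, it is very unlikely that all of them are real, individually configured hosts; uniform responsiveness therefore points to a single host answering for the whole prefix.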

Detecting aliased prefixes

Figure 5: All prefixes covered by hitlist. [Plot: color scale "Input IP addresses" with ticks 1, 48, 2K, 148K, 5M.]

Figure 6: Aliased prefixes. [Plot: same color scale as Figure 5.]

• 55.1 M raw IPv6 addresses in hitlist
• Few prefixes are aliased (e.g. Amazon, see Figure 6)
• 25.7 M IPv6 addresses in aliased prefixes (46.6 %)
• Validation using fingerprinting (initial TTL, TCP options, TCP timestamps)

Detecting aliased prefixes
Key take-aways for network operations

1. Aliased prefixes can introduce bias in IPv6 measurements
2. Can be detected with pseudo-random probing
3. Using aliasing to hide your prefixes and hosts is not very effective

Address responsiveness
Cross protocol responsiveness

• If address responds on protocol X, how likely is it to respond on protocol Y?
• Goal: Identify relevant addresses for specific measurements

Address responsiveness

Figure 7: Likeliness to respond on protocol Y, if responding to protocol X. Values are Pr[Protocol Y | Protocol X]; color scale from 0.2 to 1.0.

Protocol Y \ Protocol X   ICMP    TCP/80  TCP/443  UDP/53  UDP/443
UDP/443                   0.017   0.035   0.054    0.0065  1
UDP/53                    0.069   0.1     0.14     1       0.029
TCP/443                   0.29    0.58    1        0.54    0.98
TCP/80                    0.45    1       0.91     0.61    0.99
ICMP                      1       0.95    0.93     0.89    0.99

• If responsive to one of the probes → at least 89 % chance it will answer to ICMPv6
• Web protocols: QUIC → HTTPS and HTTP, HTTPS → HTTP; but not the other way around
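A matrix like the one in Figure 7 can be derived from per-protocol sets of responsive addresses. The following is a minimal sketch under that assumption; the scan results are hypothetical placeholders, not data from the talk.

```python
from typing import Dict, Set


def conditional_responsiveness(
    responsive: Dict[str, Set[str]],
) -> Dict[str, Dict[str, float]]:
    """Return matrix[y][x] = Pr[responds on y | responds on x]."""
    matrix: Dict[str, Dict[str, float]] = {}
    for y, set_y in responsive.items():
        matrix[y] = {}
        for x, set_x in responsive.items():
            # Share of x-responsive addresses that also respond on y.
            matrix[y][x] = len(set_x & set_y) / len(set_x) if set_x else float("nan")
    return matrix


# Hypothetical scan results: protocol -> addresses that answered.
scan = {
    "ICMP": {"a", "b", "c", "d"},
    "TCP/80": {"a", "b"},
    "TCP/443": {"a"},
}
m = conditional_responsiveness(scan)
print(m["TCP/80"]["TCP/443"])  # Pr[TCP/80 | TCP/443]
print(m["TCP/443"]["TCP/80"])  # Pr[TCP/443 | TCP/80]
```

The matrix is not symmetric, which matches the observation that HTTPS responsiveness implies HTTP responsiveness far more often than the reverse.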

Address responsiveness
Key take-aways for network operations

1. Knowing responsiveness on one service might leak information about other services
2. Horizontal port scanning on all devices is not necessary
3. Attackers might pick one port (e.g. TCP/80) and then continue with only responsive hosts

Learning new addresses
Techniques to learn new addresses

• Entropy/IP: Generate new addresses by leveraging entropy of seed addresses
  • Similar approach to grouping addresses based on their structure as shown earlier
  • Presented at RIPE 74 in Budapest by Paweł Foremski
• 6Gen: Generate new addresses in dense address regions
  • If we see addresses
    • 2001:0db8:0407:8000::3
    • 2001:0db8:0407:8000::4
    • 2001:0db8:0407:8000::5
    • 2001:0db8:0407:8000::8
    • 2001:0db8:0407:8000::9
  • Likely other valid addresses
    • 2001:0db8:0407:8000::6
    • 2001:0db8:0407:8000::7
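The density idea in the example above can be illustrated with a deliberately simplified sketch that fills small holes between neighboring seed addresses. This is not the 6Gen algorithm itself, only an illustration of the intuition; the function name and the `max_gap` parameter are hypothetical.

```python
import ipaddress
from typing import Iterable, List


def fill_dense_gaps(seeds: Iterable[str], max_gap: int = 4) -> List[ipaddress.IPv6Address]:
    """Propose unseen addresses lying in small gaps between neighboring seeds."""
    values = sorted({int(ipaddress.IPv6Address(s)) for s in seeds})
    candidates = []
    for low, high in zip(values, values[1:]):
        if 1 < high - low <= max_gap:  # a small hole in an otherwise dense region
            candidates.extend(ipaddress.IPv6Address(v) for v in range(low + 1, high))
    return candidates


seeds = [
    "2001:0db8:0407:8000::3",
    "2001:0db8:0407:8000::4",
    "2001:0db8:0407:8000::5",
    "2001:0db8:0407:8000::8",
    "2001:0db8:0407:8000::9",
]
for candidate in fill_dense_gaps(seeds):
    print(candidate)
```

For the seed addresses from the slide, the sketch proposes the two missing addresses ending in ::6 and ::7.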

Learning new addresses
How well do Entropy/IP and 6Gen perform?

• Input: All previously found IPv6 addresses
• Generation: 118 M and 129 M, only 675 k overlapping
• Responsiveness: 278 k and 489 k
• Order of magnitude higher response rate for overlapping addresses

Table 2: Top 5 responsive protocol combinations for 6Gen and Entropy/IP.

ICMP  TCP/80  TCP/443  UDP/53  UDP/443  6Gen    Entropy/IP
✓     ✗       ✗        ✗       ✗        66.8 %  41.1 %
✓     ✓       ✓        ✗       ✗        9.2 %   12.3 %
✗     ✗       ✗        ✓       ✗        7.3 %   23.1 %
✓     ✓       ✗        ✗       ✗        4.9 %   3.4 %
✓     ✓       ✓        ✗       ✓        3.2 %   6.1 %

• Different host populations

Learning new addresses
Key take-aways for network operations

1. Address learning uncovers previously unknown addresses
2. Techniques provide complementary address sets
3. Hiding in the expansive IPv6 address space might be more difficult

Conclusion

• IPv6 Internet too vast to conduct brute-force measurements
• But you might be less “hidden” in IPv6 than you’d have thought
• Addressing schemes might uncover “hidden” hosts
• Responsiveness of one service might leak information about other services

ipv6hitlist.github.io

Oliver Gasser <gasser@net.in.tum.de>
https://www.net.in.tum.de/~gasser/
