BGP for big guys - Tieto

Date post: 30-Apr-2019
Upload: nguyennhan
© 2012 Tieto Corporation 1

• This tier hierarchy isn’t defined by any authority. The common understanding is that Tier 1 networks don’t buy transit connectivity from anyone; they can reach any destination via their peers and customers. In practice, this tier-based ranking is used mostly for marketing purposes.

• Interesting connotations:

  • Tier 1 networks are not “the backbone of the Internet”. In reality, a Tier 2 network with good peering can be much closer to most end users than a Tier 1 network.

  • Some Tier 2 networks are significantly larger than some Tier 1 networks and are often able to provide better connectivity (shorter end-to-end paths, higher reliability).

  • Most Tier 1 networks offer global coverage, but some of them (e.g. AT&T) are just regional.

• See the list of Tier 1 networks and other information at http://en.wikipedia.org/wiki/Tier_1_network

• The transit service is typically priced per Mbps per month, with a commitment to a minimum volume of consumption (for example 4€ per Mbps with a minimum consumption of 500 Mbps). The consumption is usually evaluated as the ~90th percentile of the traffic during the billing period.
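
The billing rule above can be sketched in a few lines of Python. This is only an illustration: the sample values, the nearest-rank percentile method, and the function names are assumptions, and real providers differ in sampling interval and percentile definition.

```python
import math


def billable_mbps(samples_mbps, percentile=90, commit_mbps=500):
    """Return the rate to bill: the given percentile of the traffic samples,
    but never less than the committed minimum."""
    if not samples_mbps:
        return commit_mbps
    ordered = sorted(samples_mbps)
    # Nearest-rank percentile: the smallest sample such that at least
    # `percentile` percent of all samples are at or below it.
    rank = math.ceil(percentile * len(ordered) / 100)
    measured = ordered[max(rank, 1) - 1]
    return max(measured, commit_mbps)


def monthly_price_eur(samples_mbps, price_per_mbps=4, **kwargs):
    """Price for the billing period, using the example rate of 4 EUR per Mbps."""
    return billable_mbps(samples_mbps, **kwargs) * price_per_mbps


# Hypothetical traffic samples (Mbps); the single 1500 Mbps burst is not billed.
samples = [300, 350, 400, 450, 520, 610, 640, 700, 900, 1500]
print(billable_mbps(samples), monthly_price_eur(samples))
```

The point of percentile billing is visible in the sample data: a short burst above the percentile does not raise the bill, while traffic below the commitment is still charged at the committed volume.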

• Numbers are taken from NIX.CZ, which is currently the 10th largest IXP in Europe and the 15th largest in the world (based on peak traffic).

• An IXP typically provides just the infrastructure and related services (monitoring, route servers). The members themselves have to agree on the peering policy and configure routing sessions between each other.

• An IXP may require various conditions, e.g.:

  • Connections to at least two locations

  • Only peering traffic may be transferred over the IXP infrastructure; forwarding of transit traffic is not allowed

  • Only L3 devices may be connected to the IXP infrastructure (a pretty popular but not very smart rule)

  • Open peering policy required (not seen very often nowadays)

• Route servers provide functionality similar to route reflectors. While route reflectors are used in an iBGP environment (and configured by neighbor <ip> route-reflector-client), route servers are used in an eBGP environment and configured by neighbor <ip> route-server-client. Like a route reflector, a route server collects routes from its neighbors and redistributes them to the other neighbors. Two important points in the route server behavior are:

  • The next-hop IP address from the originating router is preserved; the route server doesn’t insert its own IP address as the next hop. (This is the default eBGP behavior: the next-hop address is preserved if it is on the same subnet as the neighbor to which the update is sent.)

  • The AS number of the route server is not inserted into the AS path.

• See http://www.de-cix.net for more information

• See http://www.nix.cz/ and http://www.ficix.fi/english/main.php for more information

• ICANN manages not only the IPv4 and IPv6 address spaces, but also autonomous system numbers, protocol identifiers, gTLD and ccTLD top-level DNS, and root servers. See http://www.icann.org/ for more information.

• Only the management of IP addresses and AS numbers is delegated to the RIRs; resources like global DNS and protocol numbers are managed directly by ICANN.

• The RIRs are:

  • African Network Information Centre (AfriNIC, http://www.afrinic.net/) for Africa

  • American Registry for Internet Numbers (ARIN, https://www.arin.net/) for the US, Canada, parts of the Caribbean region, and Antarctica

  • Asia-Pacific Network Information Centre (APNIC, www.apnic.net) for Asia, Australia, New Zealand, and neighboring countries

  • Latin America and Caribbean Network Information Centre (LACNIC, http://www.lacnic.net/en/) for Latin America and parts of the Caribbean region

  • Réseaux IP Européens Network Coordination Centre (RIPE NCC, http://www.ripe.net/) for Europe, Russia, the Middle East, and Central Asia

• The web search interface is at https://apps.db.ripe.net/search/query.html

• The RIPE Database Query Reference Manual is available at https://www.ripe.net/data-tools/support/documentation/ripe-database-query-reference-manual

• More information about the RIPE database structure, objects, and their attributes is available in the RIPE Database Update Reference Manual at https://www.ripe.net/data-tools/support/documentation/update-ref-manual

• Other attributes are also possible for the inetnum object, see the next slide.

• Possible values of the status attribute are:

  • ALLOCATED PA, ALLOCATED PI, ALLOCATED UNSPECIFIED: The block is allocated to a LIR for non-portable (PA), portable (PI), or mixed assignments.

  • ASSIGNED PA, ASSIGNED PI: The address space has been assigned to an end user. (PA = Provider Aggregatable; this assignment cannot be kept when moving to another provider. PI = Provider Independent; it can be kept as long as the criteria for assignment are met.)

  • Other values are also possible, like SUB-ALLOCATED PA, LIR-PARTITIONED PA, and LIR-PARTITIONED PI for LIR sub-delegation purposes, or ASSIGNED ANYCAST for TLD anycast networks.

• LIRs are responsible for creating and updating objects in the RIPE DB. The authorized person’s public PGP key is stored in the RIPE DB as a key-cert object and is referenced by the auth attribute of the mntner object. Every request must be signed by an authorized PGP key.

• Various maintainer-related attributes can be found in the RIPE DB, for example:

  • mnt-by: specifies the maintainer of the containing object and its sub-objects

  • mnt-lower: specifies the maintainer of sub-objects of the containing object. (I.e. it allows managing more specific inetnum objects under a “supernetwork” inetnum object without allowing modification of the supernetwork object itself.)

  • mnt-routes: specifies the maintainer of routing-related information in the containing object and its sub-objects

• The idea behind route objects is to provide a mechanism for verifying that a route is advertised by the autonomous system that owns it.

• Operators were expected to filter advertisements on their edge routers and accept only routes advertised from the AS specified in the RIPE (or other RIR) database. Unfortunately this filtering mechanism is rarely used, but creating correct route objects is still highly recommended.
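
As a rough illustration of the intended filtering, the following Python sketch checks an announcement against a set of route objects. The registry contents, the AS numbers (taken from the documentation ranges), and the function name are hypothetical; a real filter would be generated from the registry and applied on the router.

```python
import ipaddress

# Hypothetical registry contents: (prefix, origin AS) pairs as they would
# appear in the route and origin attributes of route objects.
ROUTE_OBJECTS = {
    (ipaddress.ip_network("192.0.2.0/24"), 64500),
    (ipaddress.ip_network("198.51.100.0/24"), 64501),
}


def is_registered(prefix, as_path):
    """Accept an announcement only if a route object matches both the
    announced prefix and the originating AS."""
    if not as_path:
        return False
    origin = as_path[-1]  # the origin AS is the last (rightmost) AS in the path
    return (ipaddress.ip_network(prefix), origin) in ROUTE_OBJECTS


print(is_registered("192.0.2.0/24", [64510, 64500]))  # registered origin
print(is_registered("192.0.2.0/24", [64510, 64501]))  # wrong origin AS
```

This sketch matches prefixes exactly; whether more specific prefixes of a registered block are also accepted is a policy decision left out here.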

• The aut-num object was intended to build a routing registry, showing inter-AS relations and allowing strict filtering of accepted prefixes between operators. Unfortunately, a lot of AS holders do not keep the information up to date, so the practical use of these objects is limited and they aren’t used for filtering.

• Usually you can at least use them to check the upstream providers of the AS in question; see for example https://apps.db.ripe.net/whois/lookup/ripe/aut-num/AS5610.html for Telefonica O2 CZ, or https://apps.db.ripe.net/whois/lookup/ripe/aut-num/AS1299.html for Telia Sonera.

  • As Telia Sonera is a Tier 1 operator, no upstream providers can be identified in the object, only peers and customers.

  • The object ends with a lot of remarks attributes, providing some information about the Telia Sonera routing policy and supported BGP communities.

• The members attribute may refer to an AS number (aut-num object) or another as-set object. (Note the AS-GTSCZ-CUST macro describing GTS’s customers.)

• Transit providers very often ask their customers to provide the AS macro object and use it for filtering: only prefixes originating in the included ASes are accepted. This kind of filtering should always be applied to customers and peering partners.
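
Because as-set objects can reference other as-set objects, building a filter from one means expanding it recursively. A minimal Python sketch of that expansion follows; the set names, members, and helper names are made up for illustration, and hierarchical set names are not handled.

```python
# Hypothetical as-set objects: name -> members (AS numbers or other as-sets).
AS_SETS = {
    "AS-EXAMPLE": ["AS64500", "AS-EXAMPLE-CUST"],
    # Circular references do occur in real registries, so guard against them.
    "AS-EXAMPLE-CUST": ["AS64501", "AS64502", "AS-EXAMPLE"],
}


def expand_as_set(name, as_sets, _seen=None):
    """Return the set of AS numbers reachable from an as-set, following
    nested as-set references."""
    seen = set() if _seen is None else _seen
    if name in seen:  # already visited: stop to avoid infinite recursion
        return set()
    seen.add(name)
    asns = set()
    for member in as_sets.get(name, []):
        if member.startswith("AS-"):
            asns |= expand_as_set(member, as_sets, seen)
        else:
            asns.add(int(member[2:]))  # "AS64500" -> 64500
    return asns


def origin_allowed(as_path, allowed_asns):
    """Accept a prefix only if its origin AS is a member of the expanded set."""
    return bool(as_path) and as_path[-1] in allowed_asns


allowed = expand_as_set("AS-EXAMPLE", AS_SETS)
print(sorted(allowed))
```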

• LGs are very useful to check if a prefix is correctly propagated in the world routing

• The RIS operated by RIPE can be found at https://www.ripe.net/data-tools/stats/ris

• The picture shows information about AS 1299 (Telia Sonera).

• The looking glass operated by RIPE can be found at http://www.ris.ripe.net/cgi-bin/lg/index.cgi. It provides routing information from route collectors located in different places (usually big Internet exchanges).

• A BGP neighborhood is an explicitly defined 1:1 relation. The intended neighbors (peers) must be properly configured to establish a BGP session.

• BGP uses TCP port 179 to exchange messages. After the initial OPEN messages the BGP session is established, and UPDATE messages are sent as needed to keep the routing information up to date. Periodic KEEPALIVE messages are also exchanged to check the peer’s availability.
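
To make the message exchange more concrete, here is a small Python sketch that builds the byte layout of a KEEPALIVE and a minimal OPEN message as defined in RFC 4271. It only constructs the bytes and opens no TCP session; the function names, AS number, hold time, and router ID are illustrative assumptions.

```python
import socket
import struct

MARKER = b"\xff" * 16  # 16-byte marker field, all ones
OPEN, UPDATE, NOTIFICATION, KEEPALIVE = 1, 2, 3, 4  # message type codes


def bgp_message(msg_type, body=b""):
    """Prepend the 19-byte BGP header (marker, total length, type) to a body."""
    return MARKER + struct.pack("!HB", 19 + len(body), msg_type) + body


def open_message(my_as, hold_time, bgp_id):
    """Build an OPEN with a 2-byte AS number and no optional parameters:
    version, my AS, hold time, BGP identifier, optional parameters length."""
    body = struct.pack("!BHH4sB", 4, my_as, hold_time,
                       socket.inet_aton(bgp_id), 0)
    return bgp_message(OPEN, body)


def keepalive_message():
    """A KEEPALIVE is just the header with no body."""
    return bgp_message(KEEPALIVE)


# Hypothetical values from the documentation ranges.
print(open_message(64500, 180, "192.0.2.1").hex())
print(keepalive_message().hex())
```

Real OPEN messages normally carry optional parameters (capabilities such as 4-byte AS numbers), which this sketch deliberately omits.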

• See “Day in the Life of a BGP Update in Cisco IOS”, http://meetings.ripe.net/ripe-45/presentations/ripe45-routing-bgp-update.pdf, for a more detailed description.

• ORF (Outbound Route Filtering) is a BGP feature that allows a BGP speaker to “install” its inbound prefix-list filter as an outbound filter on a BGP neighbor. This reduces unneeded BGP updates, as the prefixes which would have been rejected anyway are not advertised at all.

• The list on the slide is incomplete; it shows only the attributes important for describing the best route selection process.

• A more complete list of attributes:

  • Well-known, mandatory: AS path, Origin, Next Hop

  • Well-known, discretionary: Local preference, Atomic Aggregate

  • Optional, transitive: for example Community, Aggregator

  • Optional, non-transitive: for example Multi-Exit Discriminator, Originator ID

• Only the metrics important for traffic engineering purposes are described in the slide; the actual best path selection algorithm is much more complex. You can find a detailed description of the selection process for example at http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094431.shtml.

• By default, URPF (Unicast Reverse Path Forwarding) is disabled on Cisco routers. With BGP, loose URPF can be used (ip verify unicast source reachable-via any). When connecting single-homed end users, strict URPF should always be configured (ip verify unicast source reachable-via rx).
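
The difference between the two modes can be sketched as follows. This is a simplified model in Python, not router code: the forwarding table, interface names, and function names are assumptions, and details such as default-route handling are ignored.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> outgoing interface.
FIB = {
    ipaddress.ip_network("192.0.2.0/24"): "Gi0/1",  # single-homed customer
    ipaddress.ip_network("198.51.100.0/24"): "Gi0/2",
}


def lookup(fib, address):
    """Longest-prefix match; returns the interface, or None if no route exists."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]


def urpf_pass(fib, source, in_interface, strict):
    """Strict mode (reachable-via rx): the route back to the source must point
    out of the receiving interface. Loose mode (reachable-via any): any route
    to the source is enough."""
    iface = lookup(fib, source)
    if iface is None:
        return False  # no route back to the source: dropped in both modes
    return iface == in_interface if strict else True


# A packet claiming a customer source address but arriving on another interface:
print(urpf_pass(FIB, "192.0.2.10", "Gi0/2", strict=True))
print(urpf_pass(FIB, "192.0.2.10", "Gi0/2", strict=False))
```

The example also shows why strict mode fits single-homed users only: with asymmetric routing, legitimate traffic may arrive on an interface other than the one used to reach the source.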

• This is an example of so-called hot potato routing. Both operators try to force the traffic

out of their network as soon as possible.

• With inbound traffic engineering we are trying to influence the behavior of neighboring networks. Our possibilities are clearly limited, and our effort may be overridden by settings on devices out of our control.

• Perfection is not possible; whatever we try is just a “recommendation” for other networks.

• As the name implies, the purpose of the multi-exit discriminator (MED) is to discriminate between multiple available exits. A received MED value is advertised to iBGP peers, but not to eBGP peers. Thus the relevance of MED is limited to neighboring ASes.

• By default, MED is compared between routes only if they’re received from the same remote AS (as in the slide). If the bgp always-compare-med option is enabled, MEDs are compared even between routes received from different ASes.

• If the bgp deterministic-med option is enabled, the routes are grouped by neighboring AS before the MED is compared.

• See “How BGP Routers Use the Multi-Exit Discriminator for Best Path Selection” (http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094934.shtml) and “How the bgp deterministic-med Command Differs from the bgp always-compare-med Command” (http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094925.shtml) for a detailed explanation.
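
The comparison rule for MED can be sketched as a single Python function. It models only the MED step in isolation, assuming all earlier selection steps were tied; the dictionary layout and names are illustrative, and the route grouping done by bgp deterministic-med is not modeled.

```python
def med_prefers(route_a, route_b, always_compare_med=False):
    """Return the route preferred by MED alone, or None if MED does not decide.

    Each route is a dict with 'neighbor_as' and 'med'; a lower MED wins.
    """
    same_as = route_a["neighbor_as"] == route_b["neighbor_as"]
    if not (same_as or always_compare_med):
        return None  # MED is skipped; later tie-breakers decide
    if route_a["med"] == route_b["med"]:
        return None
    return route_a if route_a["med"] < route_b["med"] else route_b


# Hypothetical routes for the same prefix.
r1 = {"name": "exit 1", "neighbor_as": 64501, "med": 10}
r2 = {"name": "exit 2", "neighbor_as": 64501, "med": 50}
r3 = {"name": "exit 3", "neighbor_as": 64502, "med": 5}

print(med_prefers(r1, r2))                           # same AS: lower MED wins
print(med_prefers(r1, r3))                           # different AS: not compared
print(med_prefers(r1, r3, always_compare_med=True))  # compared anyway
```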

• By prepending our AS number on a link, we can artificially decrease its attractiveness.

• This technique will not work perfectly. At least the traffic from Upstream B’s own network will arrive via the backup link all the time (because B will always prefer customer links over peer/upstream links for outbound traffic; more in the Outbound traffic engineering part). Similarly, some third-party networks might prefer B over A for their outbound traffic and will not honor the AS path length.
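
Why prepending is only a hint can be shown with a heavily simplified route comparison in Python. The sketch assumes just two selection steps (highest local preference, then shortest AS path) and uses made-up AS numbers and local preference values; it is not the full selection algorithm.

```python
OUR_AS = 64500  # hypothetical AS numbers throughout


def best_route(routes):
    """Pick the best route using only two steps: highest local preference
    first, then shortest AS path."""
    return min(routes, key=lambda r: (-r["local_pref"], len(r["as_path"])))


def announce(via_as, prepend=0):
    """AS path as seen behind `via_as` when we prepend our AS `prepend` times."""
    return [via_as] + [OUR_AS] * (1 + prepend)


# A remote network that treats both paths equally: prepending towards B works.
remote_view = [
    {"via": "A", "local_pref": 100, "as_path": announce(64501)},
    {"via": "B", "local_pref": 100, "as_path": announce(64502, prepend=3)},
]

# Upstream B itself: the direct customer route carries a higher local
# preference, so the longer (prepended) path still wins over the path via A.
b_view = [
    {"via": "customer link", "local_pref": 200, "as_path": [OUR_AS] * 4},
    {"via": "peer A", "local_pref": 100, "as_path": [64501, OUR_AS]},
]

print(best_route(remote_view)["via"])
print(best_route(b_view)["via"])
```

Local preference is evaluated before AS path length, which is exactly why B keeps sending its own traffic over the backup link regardless of how many times we prepend.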

• The important point here is that the best route selection algorithm is used only when different paths are available for an identical prefix.

• Deaggregation means breaking our supernetwork into smaller network blocks and advertising these blocks separately. As 10.0.0.0/16 and 10.0.0.0/17 are not identical prefixes, the BGP selection algorithm does not come into play between them, and the more specific route will always be used.

• The main drawback is that deaggregation increases the size of the global routing table (490k routes currently) and increases the load on the core Internet routers.

• When using deaggregation, use it moderately (do not deaggregate a /16 into 256 /24s)! Also note that prefixes longer than /24 are generally not accepted across the Internet.
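
The effect of a more specific prefix can be illustrated with a longest-prefix-match lookup in Python. The next-hop labels and the routing table are hypothetical; the prefixes are the ones used in the note above.

```python
import ipaddress


def forward(routes, destination):
    """Longest-prefix match: the most specific matching prefix wins, no matter
    which route BGP selected as best for the covering aggregate."""
    dest = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes.items()
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return None
    return max(matching, key=lambda item: item[0].prefixlen)[1]


# Hypothetical view of a remote router.
routes = {
    "10.0.0.0/16": "upstream A",  # aggregate, best path happens to be via A
    "10.0.0.0/17": "upstream B",  # more specific, announced only via B
}

print(forward(routes, "10.0.1.1"))    # inside the /17
print(forward(routes, "10.0.200.1"))  # covered only by the /16

# The "do not do this" case: one /16 split into /24s.
print(len(list(ipaddress.ip_network("10.0.0.0/16").subnets(new_prefix=24))))
```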

• Usually a need for BGP inbound load balancing indicates a non-technical problem. Influencing the traffic is just a workaround, not a solution.

• Case 1: Two upstream providers (intended to be used ~1:1), but significantly more traffic arrives through provider A and much less through B. This means B has worse (longer) paths to most of the Internet and as such should not be treated equally to A, so 1:1 load balancing is not desirable! The workaround is AS path prepending towards A, while the solution is to replace the crappy B with C.

• Case 2: Two upstream providers, traffic (~1.5 Gbps) is balanced in about a 60:40 ratio (which is OK). Unfortunately 60% of 1.5 Gbps is 900 Mbps, which means the 1 GbE link to A is almost saturated. The workaround is tweaking with deaggregation to achieve an almost 50:50 ratio. The solution is to upgrade the link to A (and also to B).

• Traffic blocking is achieved by routing the traffic to the Null0 interface. With loose URPF configured (ip verify unicast source reachable-via any), the source IP address(es) of the attack (if known) may be blocked using the same mechanism.

• See “Remotely Triggered Black Hole Filtering”, http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ps6586/ps6642/prod_white_paper0900aecd80313fac.pdf, for more information about the topic.

• Different customers may have different needs regarding what is advertised to them. (E.g. a customer wants to receive only “national” routes, i.e. routes learned from (national) peering partners.)

• Customers’ prefixes should also be accepted from peering partners and transit providers (just as peering partners’ prefixes should be accepted from transit providers). Should a link to a customer/peering partner fail, the network remains reachable the other way around.

• “Advertising customers’ prefixes to customers” means that routes received from a customer are advertised to all other customers. Of course, routes received from a customer are not advertised back to the same customer.

• Some important incidents were:

  • Turkish TTnet announcing the global routing table as originating from their AS on 24.12.2004

  • Pakistan Telecom taking down YouTube on 24.2.2008

  • China Telecom originating >15000 prefixes not belonging to them on 8.4.2010

• Find more at http://paper.ijcsns.org/07_book/201007/20100704.pdf

• Detecting BGP hijacks and other anomalies is possible e.g. with BGPmon (http://bgpmon.net/) or Renesys Gradus (http://renesys.com/)

• In the man-in-the-middle variant, traffic must be routed back from the attacking AS to the victim’s AS.

  • The attacker has to advertise the hijacked prefix with the real AS path from the attacker’s AS to the victim’s AS prepended. (Thus the transiting ASes between the attacker and the victim will not accept the hijacked prefix due to the loop prevention mechanism, and will keep following the original path towards the victim.)

  • The attacker then routes the traffic for the hijacked prefix back to the victim.

  • By adjusting TTL values, the attacker can also hide their presence.
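
The role of loop prevention in the man-in-the-middle variant can be sketched in Python. The AS numbers are hypothetical and the function models only the eBGP AS path loop check, nothing else.

```python
def accepts(local_as, as_path):
    """eBGP loop prevention: a router rejects any route whose AS path already
    contains its own AS number."""
    return local_as not in as_path


# Hypothetical AS numbers: the attacker's return path to the victim runs
# through TRANSIT_1 and TRANSIT_2.
ATTACKER, TRANSIT_1, TRANSIT_2, VICTIM = 64666, 64501, 64502, 64500

# The hijacked prefix is announced with the real return path included.
hijack_path = [ATTACKER, TRANSIT_1, TRANSIT_2, VICTIM]

for asn in (TRANSIT_1, TRANSIT_2, 64999):
    print(asn, accepts(asn, hijack_path))
```

The ASes on the return path see their own number in the path, discard the hijacked route, and keep forwarding towards the victim, while an unrelated AS (64999 here) accepts it and sends its traffic to the attacker.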

• The 20:07 hotfix by YouTube helped only partially because:

  • As the /24 was advertised from AS 36561 and AS 17557 simultaneously, only some parts of the Internet reached the correct AS

  • Even though deaggregated /25s were advertised, they were filtered by most operators and were visible only to small parts of the Internet

• Find more information at http://www.ripe.net/internet-coordination/news/industry-developments/youtube-hijacking-a-ripe-ncc-ris-case-study or http://www.renesys.com/blog/2008/02/pakistan_hijacks_youtube_1.shtml

• One of the issues also lies in the Mikrotik platform, which was used by Supronet for routing. While Cisco uses the set as-path prepend <ASN> <ASN> … syntax, Mikrotik uses set-bgp-prepend <count>, with count “allowed” to be 0 – 16. The Supronet administrator followed the Cisco logic and used the command set-bgp-prepend 47868 (Supronet’s AS number). Due to a Mikrotik bug, (1) the parameter value is not checked, and (2) only the lower 8 bits are actually used, resulting in prepending the AS 252 times (252 being the least significant byte of 47868).

• A detailed description of Cisco behavior can be found at http://www.lupa.cz/clanky/proc-a-zda-supronet-shodil-internet/ (in Czech)
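
The arithmetic behind the 252 prepends can be checked with a short Python sketch. The first function models the behavior described above (no range check, lower 8 bits used); the second shows the kind of validation that was missing. Both function names are made up for illustration.

```python
def effective_prepend_count(configured):
    """Model of the described bug: the value is not range-checked and only
    its lower 8 bits are used."""
    return configured & 0xFF


def validated_prepend_count(configured, maximum=16):
    """What a range check would look like: reject values outside 0..maximum."""
    if not 0 <= configured <= maximum:
        raise ValueError(f"prepend count must be 0..{maximum}, got {configured}")
    return configured


SUPRONET_AS = 47868
print(hex(SUPRONET_AS))                      # 0xbafc
print(effective_prepend_count(SUPRONET_AS))  # 252, i.e. 0xfc
```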

• With directly connected BGP peers, the TTL should be set to 255 and the peer should only accept packets with a TTL of 255. It’s impossible to deliver a packet with TTL = 255 to a host that is not directly connected, which effectively prevents all spoofing attacks from third parties that are not directly connected.

• Filtering unallocated prefixes requires updating the filters periodically.

• Depending on specific conditions, a default route may be accepted from upstream providers, or may be advertised to a customer.

• The prefix limit for peers, if not configured on a per-peer basis, should be configured lower than the global routing table size.

• “Do not accept prefixes with the first AS in the AS path not belonging to the peer”: this behavior is the default on Cisco routers. If your peering at an IX is done towards a BGP route server with transparent AS path handling, this verification needs to be deactivated (no bgp enforce-first-as).

• Private AS numbers may be used with a multi-homed customer without their own AS, but must be filtered out towards peers and upstreams by configuring the neighbors with remove-private-as.

• More detailed explanations of these policies can be found at http://www.ietf.org/id/draft-jdurand-bgp-security-01.txt.
