Post on 14-Nov-2014
Networking in Linux
Aditya Dev Nayar, Avanish Kushal, Mayank Kukreja, Ravi Gupta
Definitions
TCP/IP: The set of all the protocols used to transfer data from one computer to another.
TCP/IP stack: The functional layers (stacked on top of each other) used to categorize the functions performed by the communication protocols.
DOD reference model
TCP/IP Stack
➢ Process of data transfer
➢ Every protocol communicates with its peer
➢ Headers and trailers
Structure of packet
Note: The data structures for the layers are compatible, for the sake of efficiency and to avoid copying.
Network Access Layer
➢ Transmission of a frame (packet)
➢ Details of the underlying physical network
➢ Adds the suitable header and trailer
Internetwork layer
➢ Sends the data across different networks
➢ Adds the suitable header and trailer
➢ Little or no error checking or retransmission
Host-to-host transport layer
➢ Formation of a connection is possible
➢ Checks for errors can be done
➢ The delivery of the data packet can be ensured
Application Layer
➢ Interacts with the users
➢ Implements the encryption and decryption techniques for data
OSI model
IP protocol
➢ Protocol at the internetwork layer
➢ Defines the datagram, which is the basic unit of transmission in the Internet
➢ Defines the Internet addressing scheme
➢ Moves data between the Network Access Layer and the Host-to-Host Transport Layer
➢ Routes datagrams to remote hosts
➢ Fragments and reassembles datagrams
TCP and UDP protocols
TCP:
➢ Reliable, full-duplex connections
➢ Reliable service
UDP:
➢ Stateless transmission
➢ Minimum protocol overhead
➢ High speed
SOCKETS
Definition: A socket is a software construct representing a single connection between two networking applications.
IP routing information is determined at connection time: ip_route_connect()
Advantage: No need to do continuous routing table look-ups
Socket Structures
There are two main socket structures in Linux: general BSD sockets and IP-specific INET sockets.
BSD SOCKETS
• struct sock *sk
• struct proto_ops *ops
INET SOCKETS
• struct sk_buff_head (receive/write)_queue
• __u32 saddr
• struct proto *prot
Establishing Connections
server = gethostbyname(SERVER_NAME);
sockfd = socket(AF_INET, SOCK_STREAM, 0);
connect(sockfd, (struct sockaddr *)&address, sizeof(address));
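The three calls above can be fleshed out into a small client-side sketch. This is only an illustration under assumptions: the helper names make_address and connect_to are hypothetical, the name lookup through gethostbyname is skipped in favour of a dotted-quad string, and error handling is reduced to returning -1.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fill *out with an IPv4 address and port.
 * Returns 0 on success, -1 if ip_text is not a valid dotted-quad string. */
int make_address(const char *ip_text, unsigned short port,
                 struct sockaddr_in *out)
{
    assert(ip_text != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);            /* port in network byte order */
    return inet_pton(AF_INET, ip_text, &out->sin_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: open a TCP connection.
 * Returns the connected descriptor, or -1 on any failure. */
int connect_to(const char *ip_text, unsigned short port)
{
    struct sockaddr_in address;

    if (make_address(ip_text, port, &address) != 0)
        return -1;

    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;

    if (connect(sockfd, (struct sockaddr *)&address, sizeof(address)) != 0) {
        close(sockfd);                      /* do not leak the descriptor */
        return -1;
    }
    return sockfd;
}
```

A caller would use connect_to("192.0.2.1", 80) and later close() the returned descriptor; the address here is a documentation placeholder, not one taken from the slides.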
Socket Call Walk-Through
➢ Check for errors in call
➢ Create (allocate memory for) socket object
➢ Put socket into INODE list
➢ Establish pointers to protocol functions (INET)
➢ Store values for socket type and protocol family
➢ Set socket state to closed
➢ Initialize packet queues
Connect Call Walk-Through
➢ Check for errors
➢ Determine route to destination: store pointer to routing entry in socket
➢ Call protocol-specific connection function (e.g., send a TCP connection packet)
➢ Set socket state to established
Close Walk-Through
➢ Check for errors
➢ Change the socket state to disconnecting
➢ Do any protocol closing actions
➢ Free memory for socket data structures (TCP/UDP and INET)
➢ Remove socket from INODE list
Sending Messages
Receiving Messages
Address Resolution Protocol
Problem: Given an IP address, find the MAC address.
Solution:
1. Table Lookup: Searching or indexing to get the MAC address.
ARP (contd.)
2. Closed-Form Computation: Using local IEEE 802 addresses, e.g., Hardware Address = (IP_address & 0xFF) | 40:00:00:00:00:00
3. Message Exchange: ARP
– The host broadcasts a request: “What is the MAC address of 127.123.115.08?”
– The host whose IP address is 127.123.115.08 replies: “The MAC address for 127.123.115.08 is 8A-5F-3C-23-45-56 (hexadecimal).”
All three methods are allowed in TCP/IP networks.
Message format
Caching
➢ ARP responses are cached.
➢ An entry is replaced when:
– The cache table fills up (oldest removed)
– After some time, e.g., 20 minutes
➢ The sender's address binding is stored in the cache of the target.
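The two replacement rules can be illustrated with a toy cache. This is a sketch, not the kernel's implementation: the table size of 4, the struct layout and the function names are all assumptions, and time is passed in explicitly as seconds so the behaviour is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_CACHE_SIZE 4            /* deliberately tiny for illustration */
#define ARP_MAX_AGE    (20 * 60)    /* 20 minutes, in seconds */

struct arp_entry {
    int      in_use;
    uint32_t ip;
    uint8_t  mac[6];
    long     stored_at;             /* time the binding was stored, seconds */
};

struct arp_cache {
    struct arp_entry slot[ARP_CACHE_SIZE];
};

void arp_cache_init(struct arp_cache *c)
{
    memset(c, 0, sizeof(*c));
}

/* Store or refresh a binding. Preference order for the slot:
 * an existing entry for the same IP, then a free slot, then the oldest. */
void arp_cache_store(struct arp_cache *c, uint32_t ip,
                     const uint8_t mac[6], long now)
{
    struct arp_entry *pick = NULL;

    for (int i = 0; i < ARP_CACHE_SIZE && pick == NULL; i++)
        if (c->slot[i].in_use && c->slot[i].ip == ip)
            pick = &c->slot[i];
    for (int i = 0; i < ARP_CACHE_SIZE && pick == NULL; i++)
        if (!c->slot[i].in_use)
            pick = &c->slot[i];
    if (pick == NULL) {             /* table full: evict the oldest entry */
        pick = &c->slot[0];
        for (int i = 1; i < ARP_CACHE_SIZE; i++)
            if (c->slot[i].stored_at < pick->stored_at)
                pick = &c->slot[i];
    }

    pick->in_use = 1;
    pick->ip = ip;
    memcpy(pick->mac, mac, 6);
    pick->stored_at = now;
}

/* Return the cached MAC, or NULL if absent or older than ARP_MAX_AGE. */
const uint8_t *arp_cache_lookup(struct arp_cache *c, uint32_t ip, long now)
{
    for (int i = 0; i < ARP_CACHE_SIZE; i++) {
        struct arp_entry *e = &c->slot[i];
        if (!e->in_use || e->ip != ip)
            continue;
        if (now - e->stored_at > ARP_MAX_AGE) {
            e->in_use = 0;          /* expired: drop it */
            return NULL;
        }
        return e->mac;
    }
    return NULL;
}
```

A NULL result is the point at which a real host would fall back to broadcasting an ARP request.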
Proxy & Reverse ARP
➢ Proxy ARP: A router may act as a proxy for many IP addresses.
➢ Reverse ARP: What is the IP address of a given hardware address?
– Used by diskless systems to obtain their IP address
– Needs a RARP server to respond
Routing
Routing: The process of choosing a path over which to send packets.
Routing occurs at a TCP/IP host when it sends IP packets, and occurs again at an IP router.
Router: A device that forwards packets from one physical network to another.
Routers are commonly referred to as gateways.
A Walk Through
➢ When a host attempts communication with another host, IP first determines whether the destination host is local or on a remote network.
➢ If the destination host is remote, IP then checks the routing table for a route to the remote host or remote network.
➢ If no explicit route is found, IP uses its default gateway address to deliver the packet to a router.
➢ At the router, the routing table is again consulted for a path to the remote host or network. If a path is not found, the packet is sent to the router's default gateway address.
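The decision sequence in this walk-through can be sketched as a single function. The struct, the IP4 macro and the function name are hypothetical, addresses are in host byte order, and picking the most specific matching route (largest contiguous netmask) is an added assumption that the slides do not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from four octets. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

struct route {
    uint32_t network;   /* destination network, already masked */
    uint32_t netmask;   /* contiguous netmask */
    uint32_t gateway;   /* router to hand the packet to */
};

/* Return the address the packet should be delivered to next:
 * the destination itself if it is on the local network, otherwise the
 * gateway of the most specific matching route, otherwise default_gw. */
uint32_t next_hop(uint32_t dest, uint32_t local_addr, uint32_t local_mask,
                  const struct route *table, size_t n, uint32_t default_gw)
{
    if ((dest & local_mask) == (local_addr & local_mask))
        return dest;                        /* local: deliver directly */

    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((dest & table[i].netmask) != table[i].network)
            continue;                       /* route does not cover dest */
        if (best == NULL || table[i].netmask > best->netmask)
            best = &table[i];               /* longer prefix wins */
    }
    return best != NULL ? best->gateway : default_gw;
}
```

A router would run the same logic again with its own table and its own default gateway.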
The Details
Linux maintains 3 sets of routing data:
1. Neighbour Table: Directly connected computers.
2. FIB Table: All other networks/computers.
3. Routing Cache: Cache for FIB.
Neighbour Table
• struct neigh_table: Contains common neighbour information. All computers connected by the same type of connection are in the same table.
• struct neighbour: Specific information about a neighbour, such as the device connected to the neighbour and various flags regarding the connection.
• struct neigh_parms: Contains message travel time, queue length and other statistical information.
Forwarding Information Base (FIB)
➢ FIB is a structure containing routing information for any valid IP address.
➢ An exhaustive list of known IP destinations and their best routes.
➢ Complex data structure.
➢ Slow access.
Forwarding Information Base (FIB)
➢ Each IP subnet is represented by a fib_zone data structure.
➢ All of these are pointed at from the fib_zones hash table. The hash index is derived from the IP subnet mask.
➢ Routes to the same subnet are described by pairs of fib_node and fib_info data structures.
Route Cache
➢ Keeps every route that is currently in use or has been used recently in a hash table.
➢ The index into the route table is a hash function based on the least significant two bytes of the IP address.
➢ If a route is not in the cache, the FIB is looked up and a new entry is made in the route cache.
➢ Routes are chained in order of most frequently used first, and removed when old.
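A chained hash table keyed on the low two bytes of the destination can be sketched as follows. The bucket count, the folding step in the hash and all names are assumptions chosen for illustration; the kernel's actual hash function and entry layout are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RT_CACHE_BUCKETS 256

struct rt_entry {
    uint32_t         dest;      /* destination address, host byte order */
    uint32_t         gateway;   /* cached next hop */
    struct rt_entry *next;      /* chain of entries sharing a bucket */
};

struct rt_cache {
    struct rt_entry *bucket[RT_CACHE_BUCKETS];
};

/* Hash on the least significant two bytes only (assumed folding). */
unsigned rt_cache_hash(uint32_t dest)
{
    uint32_t low16 = dest & 0xFFFFu;
    return (unsigned)((low16 ^ (low16 >> 8)) % RT_CACHE_BUCKETS);
}

/* Link an entry at the head of its bucket's chain. */
void rt_cache_insert(struct rt_cache *c, struct rt_entry *e)
{
    unsigned h = rt_cache_hash(e->dest);
    e->next = c->bucket[h];
    c->bucket[h] = e;
}

/* Return the cached entry, or NULL on a miss (the caller would then
 * consult the FIB and insert the result). */
const struct rt_entry *rt_cache_lookup(const struct rt_cache *c, uint32_t dest)
{
    for (const struct rt_entry *e = c->bucket[rt_cache_hash(dest)];
         e != NULL; e = e->next)
        if (e->dest == dest)
            return e;
    return NULL;
}
```

Two destinations that share their low two bytes land in the same bucket, which is why each bucket holds a chain rather than a single route.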
Routing Cache: conceptual organization
Routing Information Protocol (RIP)
RIP: Protocol for routers to track distance to different networks and to share this information among themselves.
RIP contd.
➢ At startup, information from all neighbouring routers is requested.
➢ A received packet can be a 'response' or a 'request'.
➢ A response is sent to all neighbours every 30 sec.
➢ Listens on UDP port 520 for incoming packets.
CSMA/CD Technology
Requirement
➢ An Ethernet network provides shared access to a group of attached nodes.
➢ Each node has a NIC (Network Interface Card).
➢ The shared cable allows any NIC to send whenever it wishes.
➢ But if two NICs happen to transmit at the same time, a collision will occur, resulting in the data being corrupted.
1. Source NIC dispatches frame
2. Frame transmits in both directions
3. Every NIC receives the frame and does MAC address matching
4. Intended NIC picks up the frame; the rest drop it
CSMA/CD Algorithm
➢ Sense for carrier.
➢ If carrier present, wait until carrier ends.
– Sending would force a collision and waste time
➢ Send packet and sense for collision.
➢ If no collision detected, consider packet delivered.
CSMA/CD Algorithm
Otherwise, if a collision is detected:
➢ Send jam signal
➢ Abort immediately
➢ Perform “exponential back off” and send packet again.
– Start to send at a random time picked from an interval
– Length of the interval increases with every retransmission
Collision Detection
[Figure: collision timing diagram for nodes A, B and C; axis label: Time]
Collision Detection: Implications
➢ All nodes must be able to detect the collision.
– Any node can be a sender
➢ The implication is that we must have either short wires or long packets.
– Or a combination of both
➢ Can calculate length/distance based on transmission rate and propagation speed.
– Minimum packet size is 64 bytes
➢ Cable length ~256 bit times
– Example: maximum coax cable length is 2.5 km
[Figure: nodes A, B and C]
CSMA/CD: Some Details
➢ When a sender detects a collision, it sends a “jam signal”.
– Makes sure that all nodes are aware of the collision
– Length of the jam signal is 32 bit times
➢ Exponential backoff operates in multiples of 512 bit times.
– Longer than a round-trip time
– Guarantees that nodes that back off longer will notice the earlier retransmission before starting to send
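The back-off rule can be sketched as below. The 512-bit-time slot comes from the slide; capping the exponent at 10 and giving up after 16 attempts follow classic 802.3 truncated binary exponential back-off and are not stated in the slides. The function names are hypothetical, and the random number is supplied by the caller so the arithmetic stays deterministic.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BIT_TIMES 512   /* back-off unit, from the slide */
#define BACKOFF_CAP    10    /* classic 802.3: exponent stops growing at 10 */
#define MAX_ATTEMPTS   16    /* classic 802.3: give up after 16 attempts */

/* Largest slot count for the nth retransmission attempt (n >= 1):
 * 2^min(n, 10) - 1, so the interval doubles with every collision. */
uint32_t backoff_max_slots(unsigned attempt)
{
    assert(attempt >= 1);
    unsigned k = attempt < BACKOFF_CAP ? attempt : BACKOFF_CAP;
    return (UINT32_C(1) << k) - 1;
}

/* Delay before the nth retransmission, in bit times, using a random
 * value supplied by the caller. Returns -1 once attempts are exhausted. */
long backoff_bit_times(unsigned attempt, uint32_t random_value)
{
    if (attempt > MAX_ATTEMPTS)
        return -1;
    uint32_t slots = random_value % (backoff_max_slots(attempt) + 1);
    return (long)slots * SLOT_BIT_TIMES;
}
```

After the first collision a sender waits 0 or 1 slots, after the second 0 to 3, after the third 0 to 7, and so on.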
Ethernet Frame Format
➢ Preamble marks the beginning of the frame.
– Also provides clock synchronization
➢ Source and destination are 48-bit IEEE MAC addresses.
– Flat address space
– Hardwired into the network interface
➢ Type field is a demultiplexing field.
– What network layer (layer 3) should receive this packet?
– Is actually a length field in the 802.3 standard
➢ CRC for error checking.
Field:         Preamble  Dest  Source  Type  Data      Pad       CRC
Size (bytes):  8         6     6       2     variable  variable  4
Minimum Packet Size
➢ Why put a minimum packet size?
➢ Give a host enough time to detect collisions
➢ In Ethernet, minimum packet size = 64 bytes (two 6-byte addresses, 2-byte type, 4-byte CRC, and 46 bytes of data)
➢ If the host has less than 46 bytes to send, the adaptor pads (adds) bytes to make it 46 bytes
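The padding rule and the time a minimum-size frame occupies the wire can be worked out with two small helpers. The names are hypothetical, and the 10 Mbps figure used in the usage note is the rate quoted elsewhere in the slides.

```c
#include <assert.h>

#define ETH_MIN_FRAME   64                       /* bytes, from the slide */
#define ETH_HEADER_CRC  (6 + 6 + 2 + 4)          /* addresses, type, CRC */
#define ETH_MIN_PAYLOAD (ETH_MIN_FRAME - ETH_HEADER_CRC)   /* 46 bytes */

/* Number of pad bytes the adaptor adds to reach the 46-byte minimum. */
unsigned eth_pad_bytes(unsigned payload_len)
{
    return payload_len < ETH_MIN_PAYLOAD ? ETH_MIN_PAYLOAD - payload_len : 0;
}

/* Time to transmit a frame, in microseconds:
 * bits divided by a rate in Mbit/s gives microseconds directly. */
double frame_time_us(unsigned frame_bytes, double rate_mbps)
{
    assert(rate_mbps > 0.0);
    return frame_bytes * 8.0 / rate_mbps;
}
```

At 10 Mbps a 64-byte frame is 512 bits and takes about 51.2 microseconds to send, which is the window in which the sender can still notice a collision.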
Limited cable length
Limitation: Before the transmitted packet is completely dispatched from the sender, all other nodes on the local network must at least start receiving it. This assumption is required for the “jam signal” protocol to work.
Drawbacks of CSMA/CD
Ethernet Capture
➢ A has to send a bigger file than B.
➢ A transmits first.
➢ A and B then both simultaneously try to transmit. B picks a larger retransmission interval than A and defers.
➢ A sends, then sends again.
➢ Both A and B attempt to resume transmission.
Drawbacks of CSMA/CD
➢ A and B both back off; however, since B was already in back-off (it failed to retransmit), it chooses from a larger range of back-off times (using the exponential back-off algorithm).
➢ A is therefore more likely to succeed, which it does in the example. At the next pause in transmission, A and B both attempt to send; however, since this fails in this case, B further increases its back-off and is now unable to compete fairly with A.
Performance of CSMA/CD
➢ Only one transmitter
– Near 100% utilisation of network
– Possible to completely use 10 Mbps
➢ Many transmitting NICs
– Some bandwidth wasted in collision detection
– Typical busy network gives 2-4 Mbps
References
➢ http://en.wikipedia.org
➢ http://www.erg.abdn.ac.uk
➢ http://www.cisco.com
➢ The Linux Kernel: David A Rusling
➢ Linux IP Networking: Glenn Herrin