Linux Kernel Networking

Neven Skoro

<[email protected]>

Mentor: Dr. Janusz Zalewski

CEN 4516 Computer Networks

Date Modified: December 2nd 2006

http://satnet.fgcu.edu/~nskoro

1. Introduction and Overview

Linux operating systems have long been popular within the networking community, mainly because of
their wide range of capabilities and their relatively high speed. Others prefer Linux to Windows or
Macintosh because it is open source, which means it is both free and accessible. By accessible I mean
that you can read its source code and figure out what is going on and, most importantly, how things
work in the core. Another benefit is the option to modify the source code and build your own version
of the Linux kernel, although this is an advanced topic best left to experts.

Two of the world's fastest supercomputers today are running Linux and many important databases are
held on Linux servers; needless to say, Linux is very important in today's world of networking. It was
initially written by Linus Torvalds in 1991 for Intel 386 PCs and has since become a highly portable
operating system. Since then it has been drastically rewritten, and the code that Linus initially
wrote makes up only 2% of the current Linux system. Linus Torvalds nevertheless owns the Linux
trademark in the United States, and he has made Linux available under the GNU General Public License,
which allows developers to modify and reuse the Linux code.

The Linux kernel is written mostly in the C programming language, and so is its networking part: the
data structures and functions are all written in C. A good knowledge of C is therefore required for
understanding the inner workings of Linux networking. The code makes heavy use of pointers to data
structures, fields, variables and functions. Passing pointers instead of copying everything by value
makes the kernel run a lot faster, which in the long run reduces the number of CPU cycles the kernel
consumes.

2. Important Networking Data Structures

To understand how Linux networking works it is important to understand the critical data structures
that it uses extensively. The struct sk_buff is one of the most important of them, because it is where
a packet is stored. All the networking layers use this structure to store their headers, the payload,
and the information they need to coordinate their work. The structure is defined in
include/linux/skbuff.h and contains many fields that are used for different purposes by many different
functions and macros. Its contents change as it passes through the networking layers. On the way down
the stack each layer adds its own header to the buffer, because adding a header in place is much more
efficient than copying the data into a new buffer. Linux uses the skb_reserve function to reserve
space at the beginning of the buffer for this purpose. On the way up the stack the header of the
previous layer is no longer of interest to the current layer, so instead of removing the old header
the kernel just moves the data pointer to the current header, which costs the CPU less work.

All of the sk_buff structures are kept in a doubly linked list, so both the next and the previous
structure are easily accessible. This gets a little more complicated because every structure also
needs to be able to find the head of the list quickly at any time, and for that reason an extra
structure is added. This structure is sk_buff_head, which contains struct sk_buff *prev and
struct sk_buff *next. We can see the full structure, taken directly from the Linux source code, below.

    struct sk_buff_head {
        /* These two members must be first. */
        struct sk_buff *next;
        struct sk_buff *prev;

        __u32 qlen;
        spinlock_t lock;
    };

The number of elements in the list is stored in qlen, and lock is used to prevent simultaneous
accesses to the list. Diving deeper, every sk_buff structure also holds a pointer to the single
sk_buff_head structure of the list it belongs to.
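
To make the list layout more concrete, here is a minimal user-space sketch of a circular doubly
linked queue with a head structure that tracks its length. It is not kernel code: all toy_ names are
hypothetical, the spinlock is left out, and a shared toy_links member is used so that the sketch
stays within standard C (the kernel instead casts the list head to struct sk_buff *, which is why
next and prev must be its first two members).

```c
#include <assert.h>
#include <stddef.h>

/* Link pair shared by list elements and by the list head (hypothetical). */
struct toy_links {
    struct toy_links *next;
    struct toy_links *prev;
};

/* Stand-in for struct sk_buff: the links come first, then the payload. */
struct toy_skb {
    struct toy_links links;
    int id;                 /* illustrative payload */
};

/* Stand-in for struct sk_buff_head, without the spinlock. */
struct toy_skb_head {
    struct toy_links links; /* next = first element, prev = last element */
    unsigned int qlen;      /* number of queued elements */
};

/* An empty list points back at its own head in both directions. */
static void toy_queue_init(struct toy_skb_head *list)
{
    list->links.next = list->links.prev = &list->links;
    list->qlen = 0;
}

/* Append a buffer at the tail of the list. */
static void toy_queue_tail(struct toy_skb_head *list, struct toy_skb *skb)
{
    struct toy_links *last = list->links.prev;

    skb->links.next = &list->links;
    skb->links.prev = last;
    last->next = &skb->links;
    list->links.prev = &skb->links;
    list->qlen++;
}

/* Remove and return the first buffer, or NULL if the list is empty. */
static struct toy_skb *toy_dequeue(struct toy_skb_head *list)
{
    struct toy_links *first = list->links.next;

    if (first == &list->links)
        return NULL;
    assert(list->qlen > 0);
    list->links.next = first->next;
    first->next->prev = &list->links;
    first->next = first->prev = NULL;
    list->qlen--;
    /* links is the first member, so this conversion is well defined. */
    return (struct toy_skb *)first;
}
```

Because the head is part of the ring, both ends of the list can be reached in constant time, and qlen
gives the queue length without walking the list.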

Figure 1.0 List of sk_buff elements

In Figure 1.0 we can see the different structures and their relationships. The field struct sock *sk
is a pointer to the sock structure of the socket that owns the buffer; if the buffer is merely being
forwarded, this pointer is NULL. The field unsigned int len contains the size of all the data in the
buffer, and unsigned int data_len contains the size of the data held in the fragments. The field
unsigned int mac_len contains the size of the MAC header, and atomic_t users is a reference count used
mainly to ensure that an sk_buff structure is not freed while something else is still using it. The
field unsigned int truesize contains the total size of the buffer, including the sk_buff structure
itself, and is set by the alloc_skb function. Finally, the fields unsigned char *head, *end, *data and
*tail mark the boundaries of the buffer and of the data within it.

Figure 1.1 head and tail pointers

We can see how these pointers work in Figure 1.1: head points to the start of the headroom, data
points to the start of the data, tail points to the end of the data, where the tailroom begins, and
end points to the end of the tailroom.
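
The four pointers are easier to follow with a small worked example. The following user-space sketch
models a buffer with head, data, tail and end pointers and shows how reserving headroom, appending
payload, prepending a header and stripping a header only move pointers. The toy_ names are
hypothetical; the operations are modeled on the kernel helpers skb_reserve, skb_put, skb_push and
skb_pull, but this is a simplified illustration, not their implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_buf {
    unsigned char *head;    /* start of the allocated area */
    unsigned char *data;    /* start of the valid data */
    unsigned char *tail;    /* end of the valid data */
    unsigned char *end;     /* end of the allocated area */
    size_t len;             /* number of valid bytes (tail - data) */
};

/* Allocate the data area; returns 0 on success, -1 on failure. */
static int toy_buf_init(struct toy_buf *b, size_t size)
{
    b->head = malloc(size);
    if (b->head == NULL)
        return -1;
    b->data = b->tail = b->head;
    b->end = b->head + size;
    b->len = 0;
    return 0;
}

static size_t toy_headroom(const struct toy_buf *b)
{
    return (size_t)(b->data - b->head);
}

static size_t toy_tailroom(const struct toy_buf *b)
{
    return (size_t)(b->end - b->tail);
}

/* Reserve headroom in an empty buffer by shifting data and tail. */
static void toy_reserve(struct toy_buf *b, size_t n)
{
    assert(b->len == 0 && n <= toy_tailroom(b));
    b->data += n;
    b->tail += n;
}

/* Append n bytes at the tail; returns where the caller should write them. */
static unsigned char *toy_put(struct toy_buf *b, size_t n)
{
    unsigned char *old_tail = b->tail;

    assert(n <= toy_tailroom(b));
    b->tail += n;
    b->len += n;
    return old_tail;
}

/* Prepend an n-byte header by moving data back into the headroom. */
static unsigned char *toy_push(struct toy_buf *b, size_t n)
{
    assert(n <= toy_headroom(b));
    b->data -= n;
    b->len += n;
    return b->data;
}

/* Skip an n-byte header on the way up the stack; nothing is copied. */
static unsigned char *toy_pull(struct toy_buf *b, size_t n)
{
    assert(n <= b->len);
    b->data += n;
    b->len -= n;
    return b->data;
}
```

For example, after reserving 32 bytes in a 128-byte buffer, appending a 10-byte payload and pushing an
8-byte and a 20-byte header, the buffer holds 38 bytes of data with 4 bytes of headroom left, and the
payload itself was never moved.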

Lastly, we have the destructor function pointer, which can be initialized to a routine that is run
when the buffer is removed. If a buffer belongs to a socket, the destructor is set to sock_rfree or
sock_wfree. These two socket routines update the accounting of the amount of memory held by the
socket in its queues.

Some other important fields include struct timeval stamp, which records when a packet was received
and sometimes when one was scheduled for transmission. The field struct net_device *dev describes a
network device. The field struct net_device *input_dev is used mainly for traffic control; it shows
where the packet came from and is NULL if the packet was generated locally. The field
struct net_device *real_dev associates a real device with a virtual device. The unions union {...} h,
union {...} nh and union {...} mac are pointers to the protocol headers of the TCP/IP stack at L4, L3
and L2 respectively. The array char cb[40] is a control buffer in which each network layer can store
its private data.

Two major parts of Linux networking are the allocation and the freeing of memory. The function
alloc_skb is used to allocate buffers and is defined in the file net/core/skbuff.c. Allocating a
buffer requires memory both for the data buffer and for the sk_buff structure, because the data
buffer and the sk_buff header are two separate entities.

    skb = kmem_cache_alloc(skbuff_head_cache, gfp_mask & ~__GFP_DMA);
    ...
    size = SKB_DATA_ALIGN(size);
    data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);

From this code we can see that alloc_skb calls kmem_cache_alloc to get an sk_buff structure from the
cache, and calls kmalloc to get the data buffer.
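
The two-step allocation can be imitated in user space. The sketch below is only an illustration under
stated assumptions: calloc and malloc stand in for kmem_cache_alloc and kmalloc, the 64-byte cache
line size is an assumed value, the truesize accounting is simplified, and all toy_ and TOY_ names are
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CACHE_BYTES 64u     /* assumed cache line size */

/* Round x up to the next multiple of the cache line size. */
#define TOY_DATA_ALIGN(x) \
    (((x) + (TOY_CACHE_BYTES - 1)) & ~(size_t)(TOY_CACHE_BYTES - 1))

/* Stand-in for struct skb_shared_info, which sits after the data area. */
struct toy_shared_info {
    unsigned int nr_frags;
};

/* Stand-in for the sk_buff descriptor. */
struct toy_skb {
    unsigned char *head, *data, *tail, *end;
    size_t truesize;
};

static struct toy_skb *toy_alloc_skb(size_t size)
{
    /* First allocation: the descriptor itself. */
    struct toy_skb *skb = calloc(1, sizeof(*skb));
    unsigned char *data;

    if (skb == NULL)
        return NULL;

    /* Second allocation: the aligned data area plus the shared info. */
    size = TOY_DATA_ALIGN(size);
    assert(size % TOY_CACHE_BYTES == 0);
    data = malloc(size + sizeof(struct toy_shared_info));
    if (data == NULL) {
        free(skb);
        return NULL;
    }

    /* A fresh buffer is empty: head, data and tail all coincide. */
    skb->head = skb->data = skb->tail = data;
    skb->end = data + size;
    skb->truesize = size + sizeof(*skb);
    return skb;
}

static void toy_free_skb(struct toy_skb *skb)
{
    if (skb == NULL)
        return;
    free(skb->head);
    free(skb);
}
```

With these assumptions a request for 100 bytes is rounded up to 128 bytes, so end lies 128 bytes
after head.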

Figure 1.2 alloc_skb function

The function dev_alloc_skb is used by device drivers to allocate buffers in interrupt mode. It adds
16 bytes to the requested size for optimization purposes and uses GFP_ATOMIC, which requests an
atomic allocation that cannot sleep, because this function is called from within an interrupt handler
routine. The sample code below shows the dev_alloc_skb function.

    static inline struct sk_buff *dev_alloc_skb(unsigned int length)
    {
        return __dev_alloc_skb(length, GFP_ATOMIC);
    }

The __dev_alloc_skb function shown below is used as the default, provided that there is no
architecture-specific definition.

    static inline
    struct sk_buff *__dev_alloc_skb(unsigned int length, int gfp_mask)
    {
        struct sk_buff *skb = alloc_skb(length + 16, gfp_mask);
        if (likely(skb))
            skb_reserve(skb, 16);
        return skb;
    }

The two functions kfree_skb and dev_kfree_skb are used to release a buffer back to the cache. The
function kfree_skb actually frees the buffer only when no other users of it are left, meaning when
skb->users is equal to 1 at the time of the call; otherwise it just decrements the counter. If the
buffer had 5 users, kfree_skb would have to be called 5 times before the memory was freed. Figure 1.3
shows a very important flowchart of how buffers are released. When an sk_buff is freed, dst_release
has to be called to decrement the reference count of the dst_entry data structure if the sk_buff was
holding a reference to it. In the end, the sk_buff data structure returns to the skbuff_head_cache
buffer pool (cache).
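
The reference counting rule described above can be sketched in user space with C11 atomics. This is
a simplified illustration, not the kernel implementation: the toy_ names are hypothetical, toy_get
plays the role of taking an extra reference, and toy_kfree returns 1 when it really released the
memory only so that the behavior is easy to observe (the kernel function returns nothing).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct toy_skb {
    atomic_int users;       /* reference count, like skb->users */
    unsigned char *head;    /* data area */
};

/* Allocate a buffer with a single user. */
static struct toy_skb *toy_alloc(size_t size)
{
    struct toy_skb *skb = malloc(sizeof(*skb));

    if (skb == NULL)
        return NULL;
    skb->head = malloc(size);
    if (skb->head == NULL) {
        free(skb);
        return NULL;
    }
    atomic_init(&skb->users, 1);
    return skb;
}

/* Register one more user of the buffer. */
static struct toy_skb *toy_get(struct toy_skb *skb)
{
    atomic_fetch_add(&skb->users, 1);
    return skb;
}

/*
 * Drop one reference. The memory is released only by the last user.
 * Returns 1 if the buffer was freed, 0 otherwise.
 */
static int toy_kfree(struct toy_skb *skb)
{
    if (skb == NULL)
        return 0;
    assert(atomic_load(&skb->users) > 0);
    /* atomic_fetch_sub returns the value before the decrement. */
    if (atomic_fetch_sub(&skb->users, 1) != 1)
        return 0;           /* other users remain */
    free(skb->head);
    free(skb);
    return 1;
}
```

With five users, the first four calls to toy_kfree only decrement the counter and the fifth call
releases both the data area and the descriptor.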

The function skb_reserve is in charge of reserving headroom in the buffer. It does so by shifting
both the data and the tail pointers, which point to the beginning and the end of the payload. Right
after a buffer is allocated, tail and data have the same value, so calling skb_reserve(skb, n) on the
still empty buffer leaves n bytes of headroom between head and data.

    static inline void skb_reserve(struct sk_buff *skb, unsigned int len)
    {
        skb->data += len;
        skb->tail += len;
    }

It is important to notice that no data is moved within the buffer; the function only moves the two
pointers to their new values.

Figure 1.3 kfree_skb function

3. Transmission and Reception

3.1 Frame Reception

To understand frame reception it is important to know how the interrupt handler works and what its
main responsibilities are. It has four major responsibilities. First, it copies the frame into an
sk_buff structure; if DMA is used, only a pointer needs to be initialized and no copying is needed.
Second, it initializes some sk_buff fields for the higher layers. Third, it updates some other
private parameters that are accessible only to the specific device. Lastly, it signals the kernel
that a new frame is available by scheduling NET_RX_SOFTIRQ for execution.

A generic framework called Netpoll can be used for sending and receiving frames. It polls the
network interface cards, so interrupts are not needed. Most devices support Netpoll.

    skb = dev_alloc_skb(pkt_len + 5);
    ...
    if (skb != NULL) {
        skb->dev = dev;
        skb_reserve(skb, 2);    /* Align IP on 16 byte boundaries */
        ...
        /* copy the DATA into the sk_buff structure */
        ...
        skb->protocol = eth_type_trans(skb, dev);
        netif_rx(skb);
        dev->last_rx = jiffies;
        ...
    }

Most network device drivers use the netif_rx function, which puts the sk_buff into the incoming
packet queue of the current processor and calls __cpu_raise_softirq() to mark NET_RX_SOFTIRQ for
execution. One important thing to note here is that if the queue is full the received packet is
dropped. The example code above comes from vortex_rx, which uses netif_rx. The function dev_alloc_skb
allocates the sk_buff structure into which the frame is copied. The Vortex driver serves Ethernet
cards, so it knows the size of the link layer header and, most importantly, how to read it. The
driver calls skb_reserve with an offset of 2 because the Ethernet header is 14 bytes long: reserving
2 bytes in front of it places the IP header, which follows the Ethernet header, on a 16-byte
boundary. This 2-byte value corresponds to NET_IP_ALIGN, whose definition can be found in
include/linux/skbuff.h.
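
The arithmetic behind the 2-byte reservation in the vortex_rx excerpt can be checked with a few lines
of C. This is only a worked example: the TOY_ constants mirror the 14-byte Ethernet header and the
2-byte alignment pad, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ETH_HLEN     14u    /* 6-byte destination + 6-byte source + 2-byte type */
#define TOY_NET_IP_ALIGN 2u     /* pad reserved in front of the Ethernet header */

/* Offset of the IP header from the start of the buffer for a given pad. */
static size_t toy_ip_header_offset(size_t reserved)
{
    return reserved + TOY_ETH_HLEN;
}

/* Nonzero if offset is a multiple of boundary. */
static int toy_is_aligned(size_t offset, size_t boundary)
{
    assert(boundary != 0);
    return offset % boundary == 0;
}
```

Without the pad the IP header would start at offset 14, which is not even 4-byte aligned; with the
2-byte pad it starts at offset 16.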

3.2 Frame Transmission

The list poll_list contains all the devices that are polled because they have a nonempty receive
queue. The devices that have something to transmit are listed in output_queue. Both fields,
poll_list and output_queue, are contained in the structure softnet_data. It is important to know
that for a device to be eligible for scheduling it has to have the __LINK_STATE_START flag set for
reception and the __LINK_STATE_XOFF flag cleared for transmission. The flags __LINK_STATE_RX_SCHED
and __LINK_STATE_SCHED are set when a device is scheduled for reception and transmission
respectively. The kernel uses the __netif_schedule function to schedule a device for transmission.
We can see how it works in the code below.

    static inline void __netif_schedule(struct net_device *dev)
    {
        if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
            unsigned long flags;
            struct softnet_data *sd;

            local_irq_save(flags);
            sd = &__get_cpu_var(softnet_data);
            dev->next_sched = sd->output_queue;
            sd->output_queue = dev;
            raise_softirq_irqoff(NET_TX_SOFTIRQ);
            local_irq_restore(flags);
        }
    }

First, the function adds the device to the head of the output_queue list. There is exactly one such
list for each CPU. All of the devices in the list are linked together via pointers, more
specifically net_device->next_sched. This function can be called both inside and outside of
interrupt context, and for this reason it temporarily disables interrupts on the local CPU until it
has added the device to the output_queue list.

Figure 3.1 Frame Transmission

Secondly, NET_TX_SOFTIRQ is scheduled for execution. Devices which have something to send are marked
with the __LINK_STATE_SCHED flag. The function __netif_schedule does nothing if the device has
already been scheduled for transmission. To make sure the device has transmission enabled before
putting it on the schedule, the following function is used:

    static inline void netif_schedule(struct net_device *dev)
    {
        if (!test_bit(__LINK_STATE_XOFF, &dev->state))
            __netif_schedule(dev);
    }
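
The scheduling logic of these two functions can be imitated in user space. The sketch below is an
illustration under stated assumptions, not kernel code: a C11 atomic flag stands in for the
__LINK_STATE_SCHED bit, a plain boolean stands in for __LINK_STATE_XOFF, the per-CPU softnet_data is
passed in explicitly, interrupt masking is left out, and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev {
    atomic_bool sched;          /* plays the role of __LINK_STATE_SCHED */
    bool xoff;                  /* plays the role of __LINK_STATE_XOFF */
    struct toy_dev *next_sched; /* link to the next scheduled device */
};

struct toy_softnet {
    struct toy_dev *output_queue;   /* head of the scheduled device list */
    bool tx_softirq_pending;        /* stands in for raising NET_TX_SOFTIRQ */
};

static void toy_dev_init(struct toy_dev *dev)
{
    atomic_init(&dev->sched, false);
    dev->xoff = false;
    dev->next_sched = NULL;
}

/* Put the device at the head of the queue unless it is already scheduled. */
static void toy_schedule_unchecked(struct toy_softnet *sd, struct toy_dev *dev)
{
    /* atomic_exchange returns the previous value, like test_and_set_bit. */
    if (!atomic_exchange(&dev->sched, true)) {
        dev->next_sched = sd->output_queue;
        sd->output_queue = dev;
        sd->tx_softirq_pending = true;
    }
}

/* Schedule the device only if its transmission has not been switched off. */
static void toy_schedule(struct toy_softnet *sd, struct toy_dev *dev)
{
    assert(sd != NULL && dev != NULL);
    if (!dev->xoff)
        toy_schedule_unchecked(sd, dev);
}
```

Scheduling the same device twice leaves the queue unchanged, which is the point of testing and
setting the flag in one atomic step: without it the device would be linked to itself.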

4. Internet Protocol

To understand how IP works we first need to look at its header structure. Some of its

fields have changed over time, but the main purpose and workings are still the same. The version

discussed in this paper is IPv4.

Figure 4.1 IP header

Most of the fields are self-explanatory; the one that might need some explanation is time to live
(TTL). This value is decremented by each router as the packet travels from one hop to the next, and
the packet is dropped when the value runs out. It is supposed to represent a timer, even though it
does not relate to actual wall-clock time. The header checksum verifies that the header has not been
corrupted, and the protocol field identifies the higher layer (L4) protocol carried in the packet.
Transmission of packets is one of the main responsibilities of IP. A packet may need to pass through
multiple routers to reach its intended destination; the number of routers it passes is known as the
number of hops it makes from its source. To find a packet's route the kernel uses a function called
ip_route_output_flow, which can be called at layers L3 and L4.

    int ip_route_output_flow(struct rtable **rp, struct flowi *flp,
                             struct sock *sk, int flags)
    {
        int err;

        if ((err = __ip_route_output_key(rp, flp)) != 0)
            return err;

        if (flp->proto) {
            if (!flp->fl4_src)
                flp->fl4_src = (*rp)->rt_src;
            if (!flp->fl4_dst)
                flp->fl4_dst = (*rp)->rt_dst;
            return xfrm_lookup((struct dst_entry **)rp, flp, sk, flags);
        }

        return 0;
    }
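
Returning to the time to live and header checksum fields described in this section, the following
user-space sketch shows what a router has to do to them at each hop. It is an illustration rather
than kernel code: the toy_ names are hypothetical, the header is handled as a raw byte array in
network byte order, and the checksum is the standard Internet checksum (the one's complement of the
one's complement sum of the 16-bit header words).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over len bytes, read as big-endian 16-bit words. */
static uint16_t toy_ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    if (len & 1)
        sum += (uint32_t)(hdr[len - 1] << 8);

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}

/*
 * Process one hop: decrement the TTL (byte 8) and recompute the header
 * checksum (bytes 10 and 11). Returns 0 if the packet must be dropped
 * because its TTL has run out, 1 if it may be forwarded.
 */
static int toy_forward_hop(uint8_t *hdr, size_t len)
{
    uint16_t csum;

    assert(len >= 20);          /* minimum IPv4 header length */
    if (hdr[8] <= 1)
        return 0;

    hdr[8]--;
    hdr[10] = 0;                /* the checksum field counts as zero */
    hdr[11] = 0;
    csum = toy_ip_checksum(hdr, len);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xFF);
    return 1;
}
```

A receiver can verify a header by running toy_ip_checksum over all of it, checksum field included; an
intact header yields 0. Because the TTL is part of the header, the checksum has to be updated at
every hop.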

The kernel uses multiple functions to prepare a packet for transmission; the two main ones are
ip_append_data and ip_push_pending_frames. We can see an example of sending a packet from the UDP
layer in the function udp_sendmsg shown below.

    int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
                    size_t len)
    {
        ...
        struct udp_opt *up = udp_sk(sk);
        ...
        int corkreq = up->corkflag || msg->msg_flags & MSG_MORE;
        ...
        err = ip_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen,
                             sizeof(struct udphdr), &ipc, rt,
                             corkreq ? msg->msg_flags | MSG_MORE : msg->msg_flags);
        if (err)
            udp_flush_pending_frames(sk);
        else if (!corkreq)
            err = udp_push_pending_frames(sk, up);
        ...
    }

This function calls ip_append_data and then starts the transmission of the data by calling
udp_push_pending_frames if and only if corkreq is false.

Figure 4.2 udp_push_pending_frames function

The udp_sendmsg function uses multiple flags that can be set or cleared for each individual
transmission request. If ip_append_data fails, the queue is flushed by calling
udp_flush_pending_frames, which is a wrapper for the IP function ip_flush_pending_frames. The
flowchart in Figure 4.2 shows the main workings of the udp_push_pending_frames function.
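
The append, push and flush decisions made by udp_sendmsg can be sketched in user space as follows.
This is a simplified model under stated assumptions: a fixed-size byte array stands in for the
socket's queue of pending data, pushing only counts the datagram instead of sending it, and all
toy_ and TOY_ names, including the flag value, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MSG_MORE    0x1     /* hypothetical "more data is coming" flag */
#define TOY_PENDING_MAX 256     /* assumed capacity of the pending queue */

struct toy_sock {
    int corkflag;                           /* socket-wide cork setting */
    unsigned char pending[TOY_PENDING_MAX]; /* data waiting to be sent */
    size_t pending_len;
    size_t sent_datagrams;                  /* datagrams pushed so far */
    size_t last_sent_len;                   /* size of the last datagram */
};

/* Queue more data; returns -1 if it does not fit. */
static int toy_append_data(struct toy_sock *sk, const void *buf, size_t len)
{
    if (len > TOY_PENDING_MAX - sk->pending_len)
        return -1;
    memcpy(sk->pending + sk->pending_len, buf, len);
    sk->pending_len += len;
    return 0;
}

/* Throw away everything that is pending. */
static void toy_flush_pending(struct toy_sock *sk)
{
    sk->pending_len = 0;
}

/* "Transmit" everything that is pending as one datagram. */
static int toy_push_pending(struct toy_sock *sk)
{
    sk->last_sent_len = sk->pending_len;
    sk->sent_datagrams++;
    sk->pending_len = 0;
    return 0;
}

/* Same decision structure as the udp_sendmsg excerpt. */
static int toy_sendmsg(struct toy_sock *sk, const void *buf, size_t len,
                       int flags)
{
    int corkreq = sk->corkflag || (flags & TOY_MSG_MORE);
    int err;

    assert(sk != NULL);
    err = toy_append_data(sk, buf, len);
    if (err)
        toy_flush_pending(sk);
    else if (!corkreq)
        err = toy_push_pending(sk);
    return err;
}
```

Sending "abc" with TOY_MSG_MORE and then "de" without it produces a single 5-byte datagram, while a
failed append discards everything that was pending.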

5. Conclusion

The internal workings of kernel networking are very complex, and it would take a whole book to
explain all the details behind them. Such a book has been written: Understanding Linux Network
Internals, which is in fact a very good book. This paper gives an overview of the most important
functions and data structures used to carry out the most important communication tasks in Linux
kernel networking. For a more detailed understanding of all the networking functions and structures,
please refer to the references listed in the Resources section.

6. Resources

[1] Christian Benvenuti, Understanding Linux Network Internals, O'Reilly Media, Inc., 1005
Gravenstein Highway North, Sebastopol, CA 95472, December 2005

[2] Gianluca Insolvibile, Inside the Linux Packet Filter, Linux Journal, 2002-02-01,
URL: http://www.linuxjournal.com/article/4852

[3] Bill McCarty, Learning Red Hat Enterprise Linux and Fedora, O'Reilly Media, Inc., 1005
Gravenstein Highway North, Sebastopol, CA 95472, Fourth Edition, April 2004
