Some clues for Emulab source code (v1.0)
Lin Xue
June 2010
NOTE
This document walks step by step through how Emulab works, based on the Emulab source code:
parsing the input, reading and writing data in the DB, running assign, calling Dummynet, and so on. I
read the source code because I wanted to find out how Emulab calls Dummynet to create a delay node.
The document does not go into much detail, but it gives a brief view of the Emulab mechanism and many
clues that will help you understand the source code.
1. Emulab parses your NS file
Suppose you have written your own NS file in Emulab and started your experiment. At this point sim.tcl
and parse.tcl (/lib/ns2ir/) parse your NS file and update the DB.
sim.tcl defines the class Simulator; parse.tcl defines the parse functions for the parameters. (At this
stage, Emulab looks very similar to NS2.)
For example, if you write a line like this:
set link0 [$ns duplex-link $nodeB $nodeA 30Mb 50ms DropTail]
Here you set a duplex link between nodeA and nodeB, and you also set the bandwidth, delay, and queue
management, which Dummynet will use later.
Since $ns is an object of class “Simulator”, look at this function in sim.tcl:
Simulator instproc duplex-link {n1 n2 bw delay type args}
(Note: in OTcl, instproc adds a method, here duplex-link, to the class Simulator.)
This is where the parameters from your NS file are picked up and passed on to the lower layers.
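To make the parameter handling concrete, here is a minimal Python sketch (not Emulab's Tcl code) of turning the bandwidth and delay strings from the duplex-link line into numbers. The helper names parse_bw, parse_delay, and parse_duplex_link, the unit tables, and the choice of kbps and milliseconds are assumptions made for illustration only.

```python
import re

# Hypothetical unit tables; Emulab's own parser may accept other suffixes.
BW_UNITS = {"kb": 1, "mb": 1000, "gb": 1000000}      # multipliers to kbps
DELAY_UNITS = {"us": 0.001, "ms": 1.0, "s": 1000.0}  # multipliers to ms


def _split(value):
    """Split a string such as '30Mb' into (30.0, 'mb')."""
    m = re.fullmatch(r"([0-9]*\.?[0-9]+)([A-Za-z]+)", value)
    if m is None:
        raise ValueError("cannot parse %r" % value)
    return float(m.group(1)), m.group(2).lower()


def parse_bw(value):
    number, unit = _split(value)
    return number * BW_UNITS[unit]


def parse_delay(value):
    number, unit = _split(value)
    return number * DELAY_UNITS[unit]


def parse_duplex_link(line):
    """Pull the five arguments out of a '$ns duplex-link ...' line."""
    words = line.replace("[", " ").replace("]", " ").split()
    i = words.index("duplex-link")
    n1, n2, bw, delay, qtype = words[i + 1:i + 6]
    return {"nodes": (n1, n2), "bw_kbps": parse_bw(bw),
            "delay_ms": parse_delay(delay), "queue": qtype}


link = parse_duplex_link(
    "set link0 [$ns duplex-link $nodeB $nodeA 30Mb 50ms DropTail]")
print(link)
```

The result is a plain dictionary holding the two node names, the bandwidth, the delay, and the queue type, which is roughly the information that duplex-link hands on.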
2. Emulab puts the parameters into DB
As you may know, in Emulab all the parameters from the user’s input are put into the DB first.
Inside the duplex-link function, you will see a line like this:
Link $curlink $self "$n1 $n2" $rbw $rdelay $type
This introduces a new class, “Link”. To find its definition, go to lanlink.tcl (/lib/ns2ir/). That file
defines the LanLink class, which has two children, Lan and Link.
A LanLink contains a number of node:port pairs as well as the characteristics bandwidth, delay, and loss
rate. All the links and lans you define in your NS file go through this file.
As you can see, the function:
LanLink instproc init {s nodes bw d type}
is the constructor of the LanLink class. It takes all the parameters from the user’s input (the bandwidth,
delay, and queue management in your NS file). Moreover, since LanLink is the superclass of Link,
initializing a Link also goes through this constructor.
After all the parameters have been collected, you will find:
Link instproc updatedb {DB} and Lan instproc updatedb {DB}
That is the place where your parameters are written to the DB!
Inside these updatedb functions, you can see calls to spitxml_data. This function is in sim.tcl, and it
defines exactly how all the parameters get into the XML file and the SQL DB.
For the bandwidth, delay, and queue information, you will see a call like this:
$sim spitxml_data "virt_lans" $fields $values
This writes all these parameters into the virt_lans table of the DB.
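As a rough picture of what a call like spitxml_data does, the Python sketch below turns a table name plus parallel lists of fields and values into one XML row. The element and attribute names (row, value, name) and the column names are hypothetical; the real XML layout is defined in sim.tcl and is not reproduced here.

```python
import xml.etree.ElementTree as ET


def spit_row(table, fields, values):
    """Build one XML row for `table` from parallel field/value lists."""
    if len(fields) != len(values):
        raise ValueError("fields and values must have the same length")
    row = ET.Element("row", {"table": table})
    for name, value in zip(fields, values):
        col = ET.SubElement(row, "value", {"name": name})
        col.text = str(value)
    return ET.tostring(row, encoding="unicode")


xml_row = spit_row("virt_lans",
                   ["vname", "delay", "bandwidth", "q_type"],  # hypothetical columns
                   ["link0", 50.0, 30000, "DropTail"])
print(xml_row)
```

Keeping fields and values as two parallel lists mirrors the $fields and $values arguments in the call quoted above.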
The calls to these updatedb functions are inside the function “run” in sim.tcl:
Simulator instproc run {}
That means that whenever you write “run” in your NS file, Emulab collects all the parameters you set
and updates the DB. In fact, you can see in the run function that Emulab updates many tables in the DB,
including virt_lans, virt_nodes, and so on.
3. Emulab gets the data from DB
Now that you know how the data is put into the DB, the next question is how Emulab gets the data back
from the DB and uses it to assign specific hardware. As you may know, Emulab uses a program called
“assign” to assign (or map) the user’s requests to specific hardware. The annealing algorithm in assign
chooses a near-optimal assignment of physical machines based on the input and the hardware currently
available. Here you need to know about “assign_wrapper”, which is the interface between the DB data
representation and the resource allocation algorithm. It calls the solver and uses the output to set up
the database state that drives the rest of the process.
So let’s look at assign_wrapper.in (testbed/src/testbed5.0/tbsetup). It contains a very important
comment that describes how Emulab sets up the virtual topology.
That comment mentions the virt_lans table! Emulab is now ready to read the parameters you just set, such
as the delay and bandwidth, from the virt_lans table.
Then look into the function LoadExperiment(), and from there into the function LoadVirtLans(). Here you
will see Emulab load all the data from the DB into local variables (through DBQueryFatal).
At this point, Emulab has loaded all the parameters you set, including bandwidth, delay, queue, and so
on, into its local variables.
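The loading step can be pictured with the following Python sketch, which uses an in-memory SQLite database as a stand-in for the Emulab DB. The column names, the pid/eid keys, and the function name load_virt_lans are assumptions for illustration; the real LoadVirtLans() is Perl and queries the actual virt_lans schema.

```python
import sqlite3

# In-memory stand-in for the Emulab DB; the column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("create table virt_lans (pid text, eid text, vname text, "
           "member text, delay real, bandwidth integer, lossrate real)")
db.executemany(
    "insert into virt_lans values (?, ?, ?, ?, ?, ?, ?)",
    [("proj", "exp", "link0", "nodeA:0", 25.0, 30000, 0.0),
     ("proj", "exp", "link0", "nodeB:0", 25.0, 30000, 0.0)])


def load_virt_lans(conn, pid, eid):
    """Return {lan_name: {member: (delay, bandwidth, loss)}} for one experiment."""
    lans = {}
    rows = conn.execute(
        "select vname, member, delay, bandwidth, lossrate "
        "from virt_lans where pid = ? and eid = ?", (pid, eid))
    for vname, member, delay, bw, loss in rows:
        lans.setdefault(vname, {})[member] = (delay, bw, loss)
    return lans


virt_lans = load_virt_lans(db, "proj", "exp")
print(virt_lans)
```

After this step everything the later stages need is held in ordinary in-memory structures, so no further DB reads are required until the upload.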
4. Emulab creates the TOP file
Let’s go on reading assign_wrapper.in (testbed/src/testbed5.0/tbsetup). In order to run assign, Emulab
needs a file called the TOP file, which records the virtual topology described in the previous section.
First, it opens a TOP file:
open(TOPFILE,"> $topfile")
Second, there are two cases. In one there are exactly two virtual members, elsif (@members == 2), so
it is just a link. In the other there are more than two virtual members, elsif ($#members != 0), and
a virtual lan node is generated.
In both cases, the delay-related values are stored in a variable delaylinks for future use:
$delaylinks{$plink} = [$member,$delay,$bw,$backfill,$loss,$member,$rdelay,$rbw,$rbackfill,$rloss,0];
Of course, all the variables on the right-hand side of the assignment come from the local variables that
Emulab read from the DB previously:
my ($delay,$bw,$ebw,$backfill,$loss, $rdelay,$rbw,$rebw,$rbackfill,$rloss) = virtlandelayinfo($lan,
$member);
After the TOP file has been written, it is closed:
close(TOPFILE);
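The two cases (a plain link for two members, a virtual lan node for more) can be sketched in Python as follows. The line layout written here is a simplified, made-up format and not the real TOP syntax that assign expects; the function name write_top and the node type "pc" are also assumptions.

```python
import io


def write_top(out, lans):
    """Write a simplified virtual topology.

    `lans` maps a lan/link name to a list of (node, bandwidth, delay) members.
    The line layout below is an illustrative format, not assign's real syntax.
    """
    nodes = sorted({node for members in lans.values() for node, _, _ in members})
    for node in nodes:
        out.write("node %s pc\n" % node)
    for name, members in sorted(lans.items()):
        if len(members) == 2:
            # Two members: a plain point-to-point link.
            (a, bw, delay), (b, _, _) = members
            out.write("link %s %s %s %d %.1f\n" % (name, a, b, bw, delay))
        elif len(members) > 2:
            # More than two members: add a virtual lan node and
            # connect every member to it.
            out.write("node %s lan\n" % name)
            for node, bw, delay in members:
                out.write("link %s:%s %s %s %d %.1f\n"
                          % (name, node, node, name, bw, delay))


buf = io.StringIO()
write_top(buf, {
    "link0": [("nodeA", 30000, 50.0), ("nodeB", 30000, 50.0)],
    "lan0": [("nodeA", 100000, 0.0), ("nodeB", 100000, 0.0),
             ("nodeC", 100000, 0.0)],
})
top_text = buf.getvalue()
print(top_text)
```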
5. Emulab runs assign
Still in assign_wrapper.in, after creating the TOP file, Emulab is ready to assign the physical resources
according to the virtual request. See the function:
sub RunAssign ()
Before starting assign, in addition to the virtual topology file, Emulab also needs the current physical
topology, since without a snapshot of the current physical resources it cannot compute a good
assignment. So it creates a PTOP file, similar to the TOP file:
system("ptopgen $ptopargs > $ptopfile");
(If you’re interested in how the file is created, see ptopgen.in (testbed/src/testbed5.0/tbsetup).)
After that, you’ll see these lines:
# Run assign
my $cmdargs = "-P $ptopfile $topfile";
$cmdargs = "-uod -c .75 $cmdargs";
$cmd = "assign";
print "$cmd $cmdargs\n";
which builds the system command: assign -uod -c .75 -P $ptopfile $topfile
At this point Emulab runs its assign program, which you can read starting from assign.cc
(testbed/src/testbed5.0/assign).
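For comparison, the same command line can be assembled in Python as an argument list. The function name build_assign_command and the file names are hypothetical; only the flags and their order are taken from the lines quoted above.

```python
import shlex


def build_assign_command(ptopfile, topfile, extra_flags=("-uod", "-c", ".75")):
    """Return the argument list for the solver, flags first, files last."""
    return ["assign", *extra_flags, "-P", ptopfile, topfile]


argv = build_assign_command("exp.ptop", "exp.top")  # hypothetical file names
print(shlex.join(argv))
# On a machine that has assign installed, the list could be passed to
# subprocess.run(argv, check=True); that call is left out of this sketch.
```

Building a list instead of one string avoids quoting problems if a file name ever contains spaces.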
When assign finishes, Emulab stores the result in the file assign.log and then works on the file handle
ASSIGNFP, which contains the mapping from virtual to physical:
if (!open(ASSIGNFP, "assign.log"))
I have not had time to read through assign itself. It mostly implements the annealing algorithm, which
searches in several steps for a good assignment. If you figure it out in the future, please share it
with me.
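For readers who have not met annealing before, here is a generic toy version in Python. It is not assign's algorithm or scoring function: the cost (number of virtual links that cross switches), the cooling schedule, and all names are assumptions chosen only to show the accept/reject idea.

```python
import math
import random


def cost(mapping, vlinks, switch_of):
    """Count virtual links whose endpoints land on different switches."""
    return sum(1 for a, b in vlinks
               if switch_of[mapping[a]] != switch_of[mapping[b]])


def anneal(vnodes, pnodes, vlinks, switch_of, steps=2000, seed=1):
    rng = random.Random(seed)
    # Start from an arbitrary one-to-one mapping; the rest stay free.
    mapping = dict(zip(vnodes, pnodes))
    free = list(pnodes[len(vnodes):])
    current = cost(mapping, vlinks, switch_of)
    best, best_cost = dict(mapping), current
    temp = 1.0
    for _ in range(steps):
        v = rng.choice(vnodes)
        i = rng.randrange(len(free))
        old = mapping[v]
        mapping[v], free[i] = free[i], old        # try moving v to a free pnode
        new = cost(mapping, vlinks, switch_of)
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new                          # accept the move
            if new < best_cost:
                best, best_cost = dict(mapping), new
        else:
            free[i], mapping[v] = mapping[v], old  # reject: undo the move
        temp = max(temp * 0.995, 1e-3)             # cool down slowly
    return best, best_cost


vnodes = ["nodeA", "nodeB", "nodeC"]
pnodes = ["pc1", "pc2", "pc3", "pc4", "pc5", "pc6"]
switch_of = {"pc1": "sw1", "pc2": "sw2", "pc3": "sw1",
             "pc4": "sw2", "pc5": "sw1", "pc6": "sw2"}
vlinks = [("nodeA", "nodeB"), ("nodeB", "nodeC")]
best, best_cost = anneal(vnodes, pnodes, vlinks, switch_of)
print(best, best_cost)
```

Worse moves are sometimes accepted while the temperature is high, which is what lets the search escape a poor starting mapping.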
6. Emulab stores physical link information
Still in assign_wrapper.in, while parsing the assign result “ASSIGNFP”, Emulab stores the information
about physical links (plinks).
It reads every edge in the assign result file and, by convention, stores it in plinks. plinks is indexed
by virtual name and contains (pnodeportA,pnodeportB), that is, the port on one physical node and the
port on the other physical node that the link connects. The delay node is always the second entry
(pnodeportB). This is related to the delay node we are after, as we’ll see later.
7. Emulab converts the plinks into vlans, delays, and portmap
Still in assign_wrapper.in, Emulab has now obtained the physical information from the assign result.
What it does next is convert the physical information into internal data structures such as vlans,
delays, and portmap, and then write these variables to the DB again for future use.
Emulab loops over every physical link to collect the information.
foreach $plink (keys(%plinks))
In each iteration over plinks, there are several cases, such as:
if (($lan,$virtA,$virtC) = ($plink =~ m|^linksdelaysrc/(.+)/(.+),(.+)$|))
(Node has a single entry in lan. Node is nodeportA, Delay node is nodeportB)
Or:
elsif (($lan,$virtA) = ($plink =~ m|^linkdelaysrc/([^/]+)/(.+)$|))
(Node may have multiple entries in lan, Delay node is nodeB and portB.)
And so on.
In each case, the delay parameters are fetched from the earlier local variable delaylinks and then put
into the local variable nodedelays:
my ($member0,$delay,$bandwidth,$backfill,$loss,$member1,$rdelay,$rbandwidth,$rbackfill,$rloss) =
@{$delaylinks{$plink}};
$nodedelays{$delayid++} = [$nodeB,$portB,$portD,$lan,$member0,$delay,$bandwidth,$backfill,$loss,
$member1,$rdelay,$rbandwidth,$rbackfill,$rloss];
The nodedelays variable is used later for uploading all the delay-related parameters to the DB.
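The two plink patterns quoted in this section translate directly to Python regular expressions. The sketch below only classifies a plink name and returns the captured fields; the function name classify_plink and the sample plink names are made up to exercise the patterns.

```python
import re

# Python translations of the two Perl patterns quoted in this section.
LINKS_DELAY_SRC = re.compile(r"^linksdelaysrc/(.+)/(.+),(.+)$")
LINK_DELAY_SRC = re.compile(r"^linkdelaysrc/([^/]+)/(.+)$")


def classify_plink(plink):
    """Return (kind, captured_fields) for a plink name, or (None, ())."""
    m = LINKS_DELAY_SRC.match(plink)
    if m:
        return "linksdelaysrc", m.groups()   # (lan, virtA, virtC)
    m = LINK_DELAY_SRC.match(plink)
    if m:
        return "linkdelaysrc", m.groups()    # (lan, virtA)
    return None, ()


# The plink names below are invented for this example.
print(classify_plink("linksdelaysrc/link0/nodeA:0,nodeB:0"))
print(classify_plink("linkdelaysrc/lan0/nodeC:1"))
```

Note that the two prefixes differ by a single letter (linksdelaysrc versus linkdelaysrc), so the order of the checks does not matter here, but it is easy to misread them.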
8. Emulab uploads the updated information to DB
This is the second time Emulab puts information into the DB. The first time it stored the user’s input;
this time Emulab has already run assign and wants to store the updated information.
Still in assign_wrapper.in, in Step 4 - Upload to DB, you’ll see Emulab upload the delay
information through:
foreach $delayid (sort {$a <=> $b} keys(%nodedelays))
You’ll see Emulab put the delay information into the delays table:
DBQueryFatal("insert into delays " …….
Of course, there are other tables that Emulab has to upload; they are handled in the same way as the
delays table.
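The upload loop can be sketched in Python with an in-memory SQLite table. The table here is a cut-down, hypothetical version of the delays table, and each nodedelays entry is shortened to six fields; the real insert statement has more columns and is not reproduced.

```python
import sqlite3

# Hypothetical, cut-down version of the delays table.
db = sqlite3.connect(":memory:")
db.execute("create table delays (node_id text, vname text, "
           "delay0 real, bandwidth0 integer, delay1 real, bandwidth1 integer)")

# delay id -> [node, lan, delay, bw, rdelay, rbw], a shortened form of
# the nodedelays entries described in the text.
nodedelays = {
    10: ["pc7", "link1", 5.0, 100000, 5.0, 100000],
    2: ["pc9", "link0", 25.0, 30000, 25.0, 30000],
}

# Walk the ids in numeric order, like sort {$a <=> $b} in the Perl loop.
for delayid in sorted(nodedelays):
    node, lan, delay, bw, rdelay, rbw = nodedelays[delayid]
    db.execute("insert into delays values (?, ?, ?, ?, ?, ?)",
               (node, lan, delay, bw, rdelay, rbw))

rows = db.execute("select node_id, vname from delays order by rowid").fetchall()
print(rows)
```

Sorting numerically matters because a plain string sort would place id 10 before id 2.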
One point of clarification: you may also see information about link delays:
foreach $delayid (sort {$a <=> $b} keys(%linkdelays))
That is another kind of delay, done on the end nodes themselves, as opposed to on a delay node.
So that’s it! This is the entire process of how Emulab reads and writes the DB, runs assign, and so on.
I wrote this based on the delay information. Other information in Emulab follows the same process.
Now you know how Emulab reads the user’s input, puts it into the DB, runs assign based on the
virtual/physical information, and updates the DB a second time. What I want to look at next is how
Emulab is related to Dummynet.
9. Emulab event system
As you may know, Emulab has its own event system, and you can find plenty of docs about it. Still, I
want to know how Emulab deals with its delay events, which is the job of the delay agent.
The delay agent is the agent in the Emulab event system that coordinates control of traffic shaping.
Changes can be initiated anywhere, for example automatic timed changes from Emulab, or manual changes
from the Emulab server or a node. The delay agent allows for reactive traffic shaping, trace playback, etc.
So you need to read main.c (testbed/src/testbed5.0/event/delay-agent). The delay agent is a process
that is triggered by delay events.
You can see that Emulab first creates a raw socket, which it later uses to configure Dummynet through
setsockopt:
s_dummy = socket( AF_INET, SOCK_RAW, IPPROTO_RAW );
Then the delay agent subscribes to its own events:
if (event_subscribe(handle, agent_callback, event_t, NULL) == NULL)
So go to callback.c (testbed/src/testbed5.0/event/delay-agent):
void agent_callback(event_handle_t handle, event_notification_t notification, void *data)
This function is called from the event system when an event notification is received from the server. It
checks whether the notification is valid (a sanity check). If it is not, it prints a warning; otherwise it
calls handle_pipes, which does the rest of the job.
Following that clue, go all the way from the function handle_pipes, to handle_link_modify, to
set_link_params.
set_link_params is the function that passes all the Dummynet-related parameters collected so far on to
Dummynet.
As you can see inside the function:
if (setsockopt(s_dummy,IPPROTO_IP, IP_DUMMYNET_CONFIGURE, &pipe,sizeof pipe)<0)
This call sends the IP_DUMMYNET_CONFIGURE request, together with the pipe settings, through the
Dummynet socket.
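The call chain from agent_callback down to set_link_params can be pictured with the Python sketch below. It is only a model of the control flow: the event dictionary layout, the KEY=VALUE argument syntax, the MODIFY type name, and the FakeDummynet class (which stands in for the raw socket and setsockopt) are all assumptions, not the delay agent's real data structures.

```python
def parse_event_args(args):
    """Turn 'BANDWIDTH=1000 DELAY=20' into {'BANDWIDTH': '1000', 'DELAY': '20'}."""
    params = {}
    for token in args.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError("malformed argument %r" % token)
        params[key.upper()] = value
    return params


class FakeDummynet:
    """Stand-in for the raw socket; it only records configure requests."""

    def __init__(self):
        self.pipes = {}

    def configure(self, pipe_no, settings):
        self.pipes.setdefault(pipe_no, {}).update(settings)


def set_link_params(dn, pipe_no, params):
    settings = {}
    if "BANDWIDTH" in params:
        settings["bandwidth"] = int(params["BANDWIDTH"])
    if "DELAY" in params:
        settings["delay"] = int(params["DELAY"])
    dn.configure(pipe_no, settings)


def handle_link_modify(dn, pipe_no, args):
    set_link_params(dn, pipe_no, parse_event_args(args))


def handle_pipes(dn, event):
    # Only a modify event is sketched here.
    if event["type"] == "MODIFY":
        handle_link_modify(dn, event["pipe"], event["args"])


def agent_callback(dn, event):
    """Sanity-check the notification, then hand it to handle_pipes."""
    if not all(k in event for k in ("type", "pipe", "args")):
        print("warning: ignoring malformed event")
        return
    handle_pipes(dn, event)


dn = FakeDummynet()
agent_callback(dn, {"type": "MODIFY", "pipe": 100,
                    "args": "BANDWIDTH=1000 DELAY=20"})
agent_callback(dn, {"type": "MODIFY"})   # rejected by the sanity check
print(dn.pipes)
```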
10. Emulab calls Dummynet
So now you need to look at ip_dummynet.c
(testbed/src/cron_branch/pelab/bw-bottelneck/backfill_dummynet). It is very important to know that this
is the actual file that implements the Dummynet function on FreeBSD/Linux. You have a copy of it in
your own source tree; if you want to change the behavior of Dummynet, change that file and put it into
the original directory in FreeBSD/Linux to compile it.
Here you will see a function:
static int ip_dn_ctl(struct sockopt *sopt)
This function handles the various dummynet socket options (get, flush, config, del). It reacts to
the IP_DUMMYNET_CONFIGURE request that Emulab just made.
As you can see, there are several requests Dummynet can handle, including IP_DUMMYNET_GET,
IP_DUMMYNET_FLUSH, IP_DUMMYNET_CONFIGURE, and IP_DUMMYNET_DEL.
Now look at the function:
static int config_pipe(struct dn_pipe *p)
This is the function that actually sets up the pipe or queue parameters in Dummynet.
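As a mental model of how ip_dn_ctl dispatches on the four socket options and hands configure requests to config_pipe, consider this Python sketch. It mimics only the shape of the dispatch: the string constants stand in for the kernel's integer option codes, pipes are kept in a plain dictionary, and the pipe_nr key and error handling are assumptions.

```python
import errno

# Symbolic stand-ins for the four socket options named in the text;
# the real constants are integers defined in the kernel headers.
GET, FLUSH, CONFIGURE, DELETE = ("IP_DUMMYNET_GET", "IP_DUMMYNET_FLUSH",
                                 "IP_DUMMYNET_CONFIGURE", "IP_DUMMYNET_DEL")

pipes = {}   # pipe number -> settings


def config_pipe(pipe):
    """Create the pipe if needed, otherwise update its settings."""
    number = pipe.get("pipe_nr", 0)
    if number <= 0:
        return errno.EINVAL
    pipes.setdefault(number, {}).update(
        {k: v for k, v in pipe.items() if k != "pipe_nr"})
    return 0


def ip_dn_ctl(option, arg=None):
    """Return (error_code, data); error_code 0 means success."""
    if option == GET:
        return 0, {n: dict(s) for n, s in pipes.items()}
    if option == FLUSH:
        pipes.clear()
        return 0, None
    if option == CONFIGURE:
        return config_pipe(arg), None
    if option == DELETE:
        if pipes.pop(arg["pipe_nr"], None) is None:
            return errno.EINVAL, None
        return 0, None
    return errno.EINVAL, None


ip_dn_ctl(CONFIGURE, {"pipe_nr": 100, "bandwidth": 1000, "delay": 20})
ip_dn_ctl(CONFIGURE, {"pipe_nr": 100, "delay": 35})   # update the same pipe
err, snapshot = ip_dn_ctl(GET)
print(err, snapshot)
```

A second configure request for an existing pipe number updates that pipe in place, which is the behavior a link-modify event relies on.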
So far you have seen how Emulab interacts with Dummynet. I may go deeper into the Dummynet source
code in the future, since we now know where to change the code of our software emulator.
11. Where can I find some architecture documents about the source code?
https://users.emulab.net/trac/emulab/wiki/Arch
12. How to build your own delay kernel
https://users.emulab.net/trac/emulab/wiki/kb96
Reference
[1] “A Solver for the Network Testbed Mapping Problem”, Robert Ricci, Chris Alfeld, Jay Lepreau, School
of Computing, University of Utah, Salt Lake City, UT 84112, USA
[2] https://users.emulab.net/trac/emulab/wiki