Date post: | 26-Dec-2015 |
Upload: | silvester-bruce |
Before We Start
• How to read a research paper?
• How to write a paper review?
Reading A Research Paper
• Differs from reading a textbook
• May not always provide all background details
• Iterative activity
• Often requires obtaining additional background or skipping details
Active Reading
• Use a highlighter
• Write questions down
• Underline
• Go back and forth between sections
Organization
• Abstract
• Introduction
• Background/Related work
• Proposed Technique
• Evaluation
• Conclusion
Abstract
• Usually less than 300 words. Summarizes the paper, including brief motivation, high-level description of the contribution, and results/conclusions
• Stands on its own. The abstract of a paper can be removed and the rest of the paper is still complete
• Don’t need to define all terms in an abstract
• If you do define them, you need to define them again in the appropriate part of the paper
Introduction
• This provides the high level motivation for the work. If the introduction is not compelling the paper will not be read
• Should also state what problem is being solved, what other work has been done that is similar, and why this work is unique
• Often ends with a list of the contributions
Introduction
[Figure: app and guest OS in a VM, under client web service control; oversubscription]
Motivation
• The principal bottleneck in large-scale clusters is often inter-node communication bandwidth
• Two solutions:
– Specialized hardware and communication protocols, e.g. InfiniBand, Myrinet (supercomputer environments). Cons: no commodity parts (expensive); the protocols are not compatible with TCP/IP
– Commodity Ethernet switches. Cons: scale poorly; cost increases non-linearly with cluster size; need a high-end core switch and oversubscription (a tradeoff)
Oversubscription Ratio
• n servers (Server 1, Server 2, ..., Server n), each with link bandwidth B
• All servers share an upper link with bandwidth UB
• Oversubscription Ratio = B*n/UB
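The ratio B*n/UB can be checked with a few lines of Python. This is a minimal sketch; the function name and the numbers in the usage line (40 servers, 1 Gbit/s links, a 10 Gbit/s uplink) are illustrative assumptions, not values from the paper.

```python
def oversubscription_ratio(n_servers: int, server_bw: float, uplink_bw: float) -> float:
    """Return B*n/UB: worst-case offered load divided by uplink capacity."""
    if uplink_bw <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return n_servers * server_bw / uplink_bw


# Hypothetical rack: 40 servers with 1 Gbit/s links sharing a 10 Gbit/s uplink.
print(oversubscription_ratio(40, 1.0, 10.0))  # 4.0, i.e. 4:1 oversubscription
```

A ratio of 1.0 means every server can send at full rate through the upper link at the same time.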
Current Data Center Topology
• Edge hosts connect to 1G Top of Rack (ToR) switch
• ToR switches connect to 10G End of Row (EoR) switches
• Large clusters: EoR switches to 10G core switches
• Oversubscription of 2.5:1 to 8:1 is typical in guidelines
• Key challenges: performance, cost, routing, energy, cabling
Data Center Cost
Design Goals
• Scalable interconnection bandwidth: arbitrary host communication at full bandwidth
• Economies of scale: commodity switches
• Backward compatibility: compatible with hosts running Ethernet and IP
Related Work
• Sometimes combined with background
• Briefly describes other similar research papers and tells how the work presented in the current paper differs
• Not meant to be a complete summary of each paper
• Usually categorizes other work into like groups. Often found right before conclusions
Background
• Any technical background that is required to understand the techniques that are presented should appear here.
• Often has a motivating example that can be used to explain the ideas.
Proposed Technique
• This is where you provide detail of the new material in your paper. You can present algorithms/methods/processes, etc.
• If you have a running example from the intro or background you can use this to illustrate your ideas.
Fat-Tree Topology
• k/2 servers in each rack
• k/2 edge switches in each pod
• k/2 aggregation switches in each pod
• k pods
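The per-pod counts above determine the size of the whole network. The sketch below tallies them for a given k; it assumes one rack per edge switch and a (k/2)*(k/2) grid of core switches, and the function name is hypothetical.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts for a fat-tree built from k-port switches (k even)."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even number >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,          # k/2 per pod
        "aggregation_switches": k * half,   # k/2 per pod
        "core_switches": half * half,       # assumed (k/2)*(k/2) grid
        "hosts": k * half * half,           # k/2 servers per edge switch
    }


sizes = fat_tree_sizes(4)
print(sizes["hosts"], sizes["core_switches"])  # 16 4
```

For k = 4 this gives 16 hosts and 20 switches, 36 elements in total.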
Fat-tree Topology Equivalent
Routing
• IP needs an extension here
• (k/2)*(k/2) shortest paths between any two hosts in different pods
Single-Path Routing vs. Multi-Path Routing
Static vs. Dynamic
ECMP(Equal-Cost Multiple-Path Routing)
• Static flow scheduling
• Limits the multiplicity of paths to 8-16
• Routing table size increases multiplicatively with the number of paths, and so does lookup latency
• Advantages: no packet reordering; supported by modern switches
Hash-Threshold:
• Extract the source and destination addresses
• Apply a hash function (CRC16)
• Determine which region (0, 1, 2, 3, 4) the hash value falls in
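The hash-threshold steps can be sketched as follows. This is an illustration, not the switch implementation: `ecmp_next_hop` is a hypothetical name, and Python's `binascii.crc_hqx` (a 16-bit CRC using the CCITT polynomial) stands in for whichever CRC16 variant a real switch uses.

```python
import binascii
import ipaddress


def ecmp_next_hop(src: str, dst: str, n_paths: int) -> int:
    """Pick one of n_paths equal-cost next hops for a (src, dst) pair."""
    # Step 1: extract the source and destination addresses as raw bytes.
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    # Step 2: hash them to a 16-bit value in [0, 65535].
    h = binascii.crc_hqx(key, 0)
    # Step 3: split the hash space into n_paths equal regions and
    # return the index of the region the hash falls in.
    return h * n_paths // 65536


port = ecmp_next_hop("10.0.1.2", "10.2.0.3", 4)
print(port)  # same index every time for this address pair
```

Because the choice depends only on the addresses, all packets of a flow take the same path, which is why no reordering is needed.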
Two-level Routing Table
• Routing Aggregation
Address Outgoing Port
192.168.1.2/24 0
192.168.1.10/24 0
192.168.1.45/24 0
192.168.1.89/24 0
192.168.2.3/24 1
192.168.2.8/24 1
192.168.2.10/24 1
Aggregated Prefix Outgoing Port
192.168.1.0/24 0
192.168.2.0/24 1
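Route aggregation can be reproduced with Python's standard `ipaddress` module. The sketch below collapses the host routes from this slide into one entry per /24 network; the `aggregate` helper is a hypothetical name, and it assumes all hosts in a network share one outgoing port.

```python
import ipaddress
from collections import defaultdict


def aggregate(routes):
    """Collapse (address/len, port) host routes into one entry per network."""
    ports_by_net = defaultdict(set)
    for addr, port in routes:
        net = ipaddress.ip_interface(addr).network  # e.g. 192.168.1.0/24
        ports_by_net[net].add(port)
    table = {}
    for net, ports in ports_by_net.items():
        if len(ports) != 1:
            raise ValueError(f"{net} maps to several ports, cannot aggregate")
        table[str(net)] = ports.pop()
    return table


routes = [
    ("192.168.1.2/24", 0), ("192.168.1.10/24", 0),
    ("192.168.1.45/24", 0), ("192.168.1.89/24", 0),
    ("192.168.2.3/24", 1), ("192.168.2.8/24", 1), ("192.168.2.10/24", 1),
]
print(aggregate(routes))  # {'192.168.1.0/24': 0, '192.168.2.0/24': 1}
```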
Two-level Routing Table: Addressing
• Uses the 10.0.0.0/8 private IP address block
• Pod switch: 10.pod.switch.1, where pod is in [0, k-1] (left to right) and switch is in [0, k-1] (left to right, bottom to top)
• Core switch: 10.k.i.j, where (i, j) is the switch's position in the (k/2)*(k/2) grid
• Host: 10.pod.switch.ID, where ID is in [2, k/2+1] (left to right)
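The addressing scheme can be enumerated for a small k. This sketch makes two assumptions that the slide does not spell out: core grid coordinates i and j run over [1, k/2], and hosts attach only to the lower-layer switches, numbered 0 to k/2-1 within each pod. The function name is hypothetical.

```python
def fat_tree_addresses(k: int) -> dict:
    """Enumerate fat-tree addresses in the 10.0.0.0/8 block for even k."""
    half = k // 2
    pod_switches = [f"10.{pod}.{sw}.1" for pod in range(k) for sw in range(k)]
    # Assumed: core grid coordinates start at 1.
    core_switches = [f"10.{k}.{i}.{j}"
                     for i in range(1, half + 1) for j in range(1, half + 1)]
    # Assumed: hosts hang off the lower-layer switches 0 .. k/2-1 of each pod.
    hosts = [f"10.{pod}.{sw}.{host_id}"
             for pod in range(k) for sw in range(half)
             for host_id in range(2, half + 2)]
    return {"pod_switches": pod_switches, "core_switches": core_switches, "hosts": hosts}


addrs = fat_tree_addresses(4)
print(addrs["core_switches"])  # ['10.4.1.1', '10.4.1.2', '10.4.2.1', '10.4.2.2']
print(addrs["hosts"][:2])      # ['10.0.0.2', '10.0.0.3']
```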
Two-level Routing Table
[Figure: example addresses: switch 10.0.0.1; hosts 10.0.0.2, 10.0.0.3, 10.2.0.2, 10.2.0.3; core switches 10.4.1.1, 10.4.1.2, 10.4.2.1, 10.4.2.2]
Two-level Routing Table
• Two-level Routing Table Structure
• Two-level Routing Table implementation: TCAM (Ternary Content-Addressable Memory), parallel searching, priority encoding
Two-level Routing Table: Example
Prefix Outgoing Port
10.0.0.0/24 0
10.0.1.0/24 1
0.0.0.0/0
Suffix Outgoing Port
0.0.0.2/8 3
0.0.0.3/8 2
Prefix Outgoing Port
10.0.1.2/32 0
10.0.1.3/32 1
0.0.0.0/0
Suffix Outgoing Port
0.0.0.2/8 2
0.0.0.3/8 3
Prefix Outgoing Port
10.0.0.0/16 0
10.1.0.0/16 1
10.2.0.0/16 2
10.3.0.0/16 3
Prefix Outgoing Port
10.2.0.0/24 0
10.2.1.0/24 1
0.0.0.0/0
Suffix Outgoing Port
0.0.0.2/8 2
0.0.0.3/8 3
Prefix Outgoing Port
10.2.0.2/32 0
10.2.0.3/32 1
0.0.0.0/0
Suffix Outgoing Port
0.0.0.2/8 2
0.0.0.3/8 3
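A two-level lookup over tables like these can be sketched in software. The sketch assumes that the 0.0.0.0/0 entry with no port points to the suffix table, and that a suffix such as 0.0.0.2/8 matches on the last 8 bits (the host ID). A real switch does this in TCAM; `lookup`, `PREFIXES`, and `SUFFIXES` are hypothetical names.

```python
import ipaddress

# First level: (prefix, port). A port of None means "fall through to suffixes".
PREFIXES = [("10.2.0.0/24", 0), ("10.2.1.0/24", 1), ("0.0.0.0/0", None)]
# Second level: (host ID in the last byte, port), i.e. 0.0.0.2/8 and 0.0.0.3/8.
SUFFIXES = [(2, 2), (3, 3)]


def lookup(dst: str, prefixes=PREFIXES, suffixes=SUFFIXES) -> int:
    """Longest-prefix match first, then a suffix match on the host ID."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, port in prefixes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    if best is None:
        raise LookupError(f"no prefix matches {dst}")
    if best[1] is not None:
        return best[1]             # terminating prefix: intra-pod traffic
    host_id = addr.packed[-1]      # last 8 bits of the destination
    for suffix_id, port in suffixes:
        if host_id == suffix_id:
            return port            # inter-pod traffic, spread by host ID
    raise LookupError(f"no suffix matches {dst}")


print(lookup("10.2.0.3"))  # 0: matches 10.2.0.0/24
print(lookup("10.3.0.2"))  # 2: default prefix, then suffix 0.0.0.2/8
```

Matching on the host ID spreads inter-pod traffic across the uplinks while keeping every packet for a given destination on the same port.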
Two-Level Routing Table
• Avoid Packet Reordering
• Traffic diffusion occurs only in the first half of a packet's journey
• Centralized Protocol to Initialize the Routing Table
Flow Classification(Dynamic)
• Soft State(Compatible with Two-Level Routing Table)
• A flow = packets with the same source and destination IP addresses
• Avoid reordering within a flow
• Balancing
• Assignment and Updating
Flow Classification—Flow Assignment
• Compute Hash(Src, Dst)
• Have we seen this hash value?
‣ Yes: look up the previously assigned port x and send the packet on port x
‣ No: record a new flow f, assign f to the least-loaded port x, and send the packet on port x
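The assignment logic can be sketched as a small class. This is an illustration under assumptions: load is measured as bytes sent per port, a dictionary keyed on (src, dst) stands in for the hash lookup, and the class and method names are hypothetical.

```python
class FlowClassifier:
    """Soft-state flow table: one outgoing port per (src, dst) flow."""

    def __init__(self, ports):
        self.load = {port: 0 for port in ports}  # bytes sent per port (assumed metric)
        self.flows = {}                          # (src, dst) -> assigned port

    def forward(self, src: str, dst: str, size: int) -> int:
        key = (src, dst)                 # the dict hashes this key for us
        port = self.flows.get(key)
        if port is None:
            # New flow: assign it to the least-loaded port and record it.
            port = min(self.load, key=self.load.get)
            self.flows[key] = port
        # Known flow: reuse its port so its packets are not reordered.
        self.load[port] += size
        return port


clf = FlowClassifier(ports=[2, 3])
print(clf.forward("10.0.1.2", "10.2.0.3", 1000))  # 2: first flow, both ports idle
print(clf.forward("10.0.1.2", "10.3.0.2", 10))    # 3: port 3 is now least loaded
print(clf.forward("10.0.1.2", "10.2.0.3", 5))     # 2: existing flow keeps its port
```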
Flow Classification—Update
Flow Scheduling
• The distribution of transfer times and burst lengths of Internet traffic is long-tailed
• Large flows dominate
• Large flows should be handled specially
Flow Scheduling
▪ Eliminates global congestion
▪ Prevents long-lived flows from sharing the same links
▪ Assigns long-lived flows to different links
▪ Edge switch: detects a flow whose size is above a threshold and notifies the central controller
▪ Central controller: assigns this flow to a non-conflicting path
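The interaction between an edge switch and the central controller can be sketched as below. All names, the threshold value, and the link labels are hypothetical; a path is modelled as a tuple of link names, and "non-conflicting" is taken to mean that no link on the path is already reserved by another large flow.

```python
THRESHOLD_BYTES = 1_000_000  # hypothetical size above which a flow counts as large


class CentralScheduler:
    """Places each large flow on a path whose links are all unreserved."""

    def __init__(self):
        self.reserved = set()   # links already carrying a large flow
        self.placement = {}     # flow -> path

    def place(self, flow, candidate_paths):
        if flow in self.placement:
            return self.placement[flow]
        for path in candidate_paths:
            if not self.reserved.intersection(path):
                self.reserved.update(path)
                self.placement[flow] = path
                return path
        return None             # no conflict-free path is left


class EdgeSwitch:
    """Counts bytes per flow and reports flows that cross the threshold."""

    def __init__(self, scheduler, candidate_paths):
        self.scheduler = scheduler
        self.paths = candidate_paths
        self.bytes_seen = {}

    def on_packet(self, flow, size):
        self.bytes_seen[flow] = self.bytes_seen.get(flow, 0) + size
        if self.bytes_seen[flow] > THRESHOLD_BYTES:
            return self.scheduler.place(flow, self.paths)  # notify the controller
        return None             # small flow: leave it to normal routing


paths = [("edge0-agg0", "agg0-core0"), ("edge0-agg1", "agg1-core2")]
edge = EdgeSwitch(CentralScheduler(), paths)
edge.on_packet("f1", 500_000)          # still below the threshold
print(edge.on_packet("f1", 600_000))   # ('edge0-agg0', 'agg0-core0')
print(edge.on_packet("f2", 2_000_000)) # ('edge0-agg1', 'agg1-core2')
```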
Fault-Tolerance
• Bidirectional Forwarding Detection session (BFD)
• Lower- to Upper-layer Switches
• Upper-layer to Core Switches
• With flow scheduling, failures are much easier to handle
Failure between upper-layer and core switches
• Outgoing inter-pod traffic: the local routing table marks the affected link as unavailable and chooses another core switch
• Incoming inter-pod traffic: the core switch broadcasts a tag to the upper-layer switches it is directly connected to, signifying its inability to carry traffic to that entire pod; those upper-layer switches then avoid that core switch when assigning flows destined to that pod
Failure between lower- and upper-layer switches
• Outgoing inter- and intra-pod traffic from the lower-layer switch:
– the local flow classifier sets the link cost to infinity, does not assign it any new flows, and chooses another upper-layer switch
• Intra-pod traffic using the upper-layer switch as an intermediary:
– the switch broadcasts a tag notifying all lower-layer switches, which check it when assigning new flows and avoid the affected link
• Inter-pod traffic coming into the upper-layer switch:
– the upper-layer switch sends a tag to all its core switches signifying its inability to carry traffic; the core switches mirror this tag to all upper-layer switches they connect to; those upper-layer switches then avoid the affected core switch when assigning new flows
Evaluation
• A strong paper has a very extensive evaluation, including both testbed-based and simulation-based evaluation.
• Parts of this section include: methodology, metrics used, results, comparison, etc.
Experiment Description—Fat-tree, Click
• In a 4-port fat-tree, there are 16 hosts, four pods (each with four switches), and four core switches.
• We multiplex these 36 elements onto ten physical machines, interconnected by a 48-port ProCurve 2900 switch with 1 Gigabit Ethernet links.
• Each pod of switches is hosted on one machine; each pod’s hosts are hosted on one machine; and the two remaining machines run two core switches each.
• Links are bandwidth-limited to 96Mbit/s to ensure that the configuration is not CPU limited.
• Each host generates a constant 96Mbit/s of outgoing traffic
Experiment Description—hierarchical tree,click
• four machines running four hosts each, and four machines each running four pod switches with one additional uplink
• The four pod switches are connected to a 4-port core switch running on a dedicated machine.
• 3.6:1 oversubscription on the uplinks from the pod switches to the core switch
• Each host generates a constant 96Mbit/s of outgoing traffic
Result
Power and Heat
End of the Paper
• Conclusions and Future Work:
‣ Summarize your results and any conclusions drawn
‣ Describe briefly the main areas you plan to pursue as future work
Conclusion
• Bandwidth is the scalability bottleneck in large scale clusters
• Existing solutions are expensive and limit cluster size
• Fat-tree topology with scalable routing and backward compatibility with TCP/IP and Ethernet
What Makes A Good Research Paper
• Good methodology/technique
‣ Novel: new and not resembling something formerly known or used
• Good writing
• Publish in good journals/conferences
Paper Reviews
• We will write a series of paper reviews in this class
• The first review, for this paper, is due next Tuesday in class
• Hand in a hard copy
Paper Review Form
• Paper (author, title, complete reference)
• Short 100-word overview
• Questions:
‣ What problem is the paper addressing?
‣ What is the contribution?
‣ Did you find any drawbacks?
‣ What is your assessment of the overall presentation style of the paper (consistency, clarity, ease of reading)?
‣ Any possible future work or improvements?