Date post: | 12-May-2015 |
Category: | Technology |
Upload: | jeff-squyres |
Cisco Public 1© 2013 Cisco and/or its affiliates. All rights reserved.
Open Source for Cisco High Performance Computing
Dr. Jeffrey M. Squyres
My talk today
1. Who am I?
2. Cisco and Open Source
3. My Open Source work at Cisco
Who am I?
Who am I?
Me
Technical Lead at Cisco Systems
Server division, VIC group
Who am I?
I am not in marketing
Who am I?
I cannot fix your Linksys router for you
(perhaps you should try DD-WRT)
Who am I?
I write code
Lots of code
All day
Every day
Awwww… yeah
I work in Open Source
I work in these Open Source projects
I work in Open Source
Open MPI
Hardware Locality (hwloc)
OpenFabrics
Linux kernel
Vast majority of my work is here
I’ve made minor contributions to these other 3
My background story…
My background: research and academia
Undergrad, grad Post doc
My background: research and academia
LAM/MPI
I inherited this
I founded this
Open MPI
PACX-MPI
LAM/MPI
LA-MPI
FT-MPI
Sun CT 6
Project founded in 2003, merging multiple open source MPI projects
My own naivete: This is how I thought it would be
PACX-MPI
LAM/MPI
LA-MPI
FT-MPI
Sun CT 6
Me
…but this is what actually happened
PACX-MPI
LAM/MPI
LA-MPI
FT-MPI
Sun CT 6
Me
…and then this happened
PACX-MPI
LAM/MPI
LA-MPI
FT-MPI
Sun CT 6
…and then this happened
Us
What I learned
Differences = Good
When you write code no one else will see…
You write to your own level of expectations
When your code is reviewed by your peers…
You write to their level of expectations
When your code is reviewed by everyone…
You write your best code
Cisco and Open Source
Open Source around Cisco
Much more than just the 4 projects I participate in
Latest major open source initiative
After the clouds part,
you are left with…
Latest major open source initiative
Major contribution to
OpenDaylight overview: SDN for everyone
Pop quiz, hotshot
Why does Cisco do Open Source?
Pop quiz, hotshot
Why does Cisco do Open Source?
IT MAKES GOOD BUSINESS SENSE
• Stand on the shoulders of giants
• Become part of the community
• Contribute to tools / ecosystem that we all use
• Sell more products
1. Stand on the shoulders of giants
1. Stand on the shoulders of giants
We are elevated
We elevate others
2. Become part of the community
Circle of trust
you
2. Become part of the community
Circle of trust
This is where you need to be
2. Become part of the community
Insert FOSS project name here
This is where you need to be
3. Contribute
Pretend this is a really gross picture of a leech
Just say no to leeches!
3. Contribute
Let’s be clear here…
FOSS is for everyone
EVERYONE (that’s kinda the point, right?)
So how can there be leeches?
3. Contribute
I don’t contribute to every piece of FOSS I use
Do you?
Of course not
3. Contribute
Those who can, should
3. Contribute
Big companies can contribute
3. Contribute
Individuals can contribute
3. Contribute
Small organizations can contribute
3. Contribute
Those who can, should
Submit a patch
Write a new plugin
Test test test test test
Submit a GOOD bug report
Write documentation
Answer questions
Suggest features
Evangelize
Send cash / beer
Use the software
Make schwag
Write a review
Review code
Obtain / provide funding
4. Sell more products
A giant
4. Sell more products
You, standing on the giant’s shoulders…
4. Sell more products
…in the circle of trust
4. Sell more products
…contributing to the community
4. Sell more products
Cisco UCS blade server
Cisco Nexus 7000 router
My Open Source work at Cisco
Some background first…
High Performance Computing (HPC)
Using supercomputers to solve real world problems that are TOO BIG for laptops, desktops, or individual servers
Generally speaking…
Supercomputer = (Many) Racks of (commodity) high-end servers
(this is one definition; there are others)
A typical supercomputer
Rack of 36 1U servers
How does that work?
Computational problem
Input Output
Take your computational problem…
How does that work?
…and split it up!
Computational problem
Input Output
How does that work?
Computational problem
Input Output
Distribute the input data across a bunch of servers
How does that work?
Input Output
Use the network between servers to communicate / coordinate
How does that work?
Message Passing Interface (MPI) middleware is used for this communication
Input Output
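The split / distribute / combine idea can be sketched in a few lines of Python. This is a single-process toy, not MPI: the `split`, `partial_result`, and `solve` names and the sum-of-squares "problem" are made up for illustration, and the loop over chunks stands in for servers working in parallel.

```python
def split(data, n_workers):
    """Split data into n_workers contiguous chunks of near-equal size."""
    base, extra = divmod(len(data), n_workers)
    chunks, start = [], 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional element each.
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def partial_result(chunk):
    """Work done by one (hypothetical) server on its share of the input."""
    return sum(x * x for x in chunk)


def solve(data, n_workers):
    """Distribute the input, compute each share, then combine the outputs."""
    chunks = split(data, n_workers)
    # In a real HPC job each chunk lives on a different server, and the
    # final combine step is where the servers have to communicate.
    partials = [partial_result(chunk) for chunk in chunks]
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1000))
    print(solve(data, 21) == sum(x * x for x in data))
```

In an MPI program the final `sum(partials)` would be a message exchange between servers, which is why the rest of the talk focuses on making that exchange fast.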
Why go to so much trouble?
Computational problem
One processor hour
1 processor = …a long time…
Why go to so much trouble?
Computational problem
One processor hour (× 21)
21 processors = ~1 hour (!)
Disclaimer: scaling is rarely perfect
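One common way to put a number on "scaling is rarely perfect" is Amdahl's law, which is not named on the slide and is used here only as an illustrative model. The sketch below assumes 21 processor-hours of total work (matching the 21 blocks on the slide) and a hypothetical 5% of the work that cannot be parallelized.

```python
def run_time_hours(total_cpu_hours, processors, serial_fraction=0.0):
    """Estimated wall-clock hours under Amdahl's law.

    serial_fraction is the share of the work that must run on one
    processor no matter how many processors are available.
    """
    parallel_fraction = 1.0 - serial_fraction
    return total_cpu_hours * (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Perfect scaling: 21 processor-hours on 21 processors.
    print(run_time_hours(21, 21))
    # Hypothetical 5% serial work: noticeably slower than the ideal.
    print(run_time_hours(21, 21, serial_fraction=0.05))
```

With the 5% assumption the same job takes about 2 hours instead of about 1, so even a small serial portion halves the benefit in this toy case.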
Coordination requires communication
This communication may happen a LOT
It therefore needs to be FAST
Fast communication
Source server
Destination server
Network
Let’s break it down
Source server
HPC application
Operating system
MPI middleware
TCP stack
NIC driver
NIC hardware
Destination server
HPC application
Operating system
MPI middleware
TCP stack
NIC driver
NIC hardware
Ethernet switch
Port A
Port B
Let’s break it down
Ethernet switch: 200 nanoseconds
Wire: 299,792,458 m/s (c)
Each server: ~8-40 microseconds (~8 on modern hardware, ~40 on older hardware)
Total: ~17 – 81 microseconds
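A rough way to see where the total comes from is to add up the pieces. The sketch below assumes the ~8 and ~40 microsecond figures apply to each server (sender and receiver) and ignores propagation time on the wire; the names are made up for illustration. The sums land slightly under the ~17 and ~81 quoted on the slide, which presumably round up for the wire and other small costs.

```python
SWITCH_NS = 200                          # Ethernet switch latency from the slide
SERVER_US = {"modern": 8, "older": 40}   # assumed per-server software + NIC time


def one_way_latency_us(sender, receiver):
    """Approximate one-way latency in microseconds, wire time ignored."""
    return SERVER_US[sender] + SWITCH_NS / 1000 + SERVER_US[receiver]


if __name__ == "__main__":
    print(one_way_latency_us("modern", "modern"))  # best case
    print(one_way_latency_us("older", "older"))    # worst case
```

Either way, the two servers account for nearly all of the time and the switch for about a percent of it.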
Really? Do we care about 17-81μs?
YES
Really? Do we care about 17-81μs?
• Intel Core i7 E5-2690 with turbo boost (3.5-3.8 GHz)
  “Sandy Bridge” 32nm processor
• LinX v0.6.4 (Linpack v10.3.4.007) benchmark
  Measures floating point operations per second
• 81.34 Gflops
  That’s 81,340,000,000 floating point operations per second
17μs ≈ 1,382,780 floating point operations
81μs ≈ 6,588,540 floating point operations
Conclusion: yes, we absolutely care about 17-81μs!
Source: http://www.anandtech.com/show/4503/sandy-bridge-memory-scaling-choosing-the-best-ddr3
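The cost of latency can be checked by multiplying the quoted Linpack rate by the time spent waiting. This is a back-of-the-envelope sketch: the function name is made up, and it assumes the processor could otherwise sustain the full 81.34 Gflops for the whole wait.

```python
GFLOPS = 81.34  # Linpack result quoted on the slide


def flops_during(latency_us, gflops=GFLOPS):
    """Floating point operations the CPU could perform in latency_us microseconds."""
    # 1 Gflops = 1e9 operations per second = 1e3 operations per microsecond.
    return gflops * 1e3 * latency_us


if __name__ == "__main__":
    for latency_us in (17, 81):
        print(latency_us, "us:", round(flops_during(latency_us)), "operations")
```

Even at the low end, that is over a million floating point operations the processor could have done for every message it waits on.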
Network latency matters
HPC apps can do a LOT of computation during network communication
Latency
How can we reduce network latency?
Hardware is faster than software.
The sooner software can hand off to hardware, the better.
Let’s break it down
Can’t do much about the speed of light
Fastest Ethernet switches today are about 200ns (they’ll probably get a little faster over time)
8-40us is, by far, the biggest chunk of time
Reduce this!
Source server
HPC application
Operating system
MPI middleware
TCP stack
NIC driver
NIC hardware
Reducing server-side latency
What if we can skip some of these layers?
Who needs TCP? Raw L2 Ethernet frames, baby!
Who needs the operating system driver?
Let MPI talk directly to the NIC hardware
Hardware usually only seen by the OS
Linux userspace application
Linux kernel
Cisco VIC hardware
Can I see the hardware? Please?
No.
…except in special cases
Linux userspace application
Linux kernel
Cisco VIC hardware
Can I see the OpenFabrics hardware?
Sure!
Yay!
OpenFabrics Linux support
• Coalition of network vendors
• Successfully upstreamed “OS bypass for networking” into Linux
• http://www.openfabrics.org
Source server
HPC application
Linux
MPI middleware
VIC driver
Cisco VIC
Enabling direct MPI communication
Our project: enabling this MPI direct-to-hardware communication on Cisco servers with the Cisco Virtual Interface Card (VIC) in Linux.
Everything above the firmware will be open source.
Normal TCP software architecture
Userspace
Application
MPI library
Userspace sockets library
Kernel
TCP / IP stack
Cisco VIC driver
Cisco VIC hardware
Kernel
Userspace verbs library
Cisco VIC hardware
Cisco USNIC software
MPI library
Userspace
Verbs IB core
Cisco USNIC driver
Bootstrapping and setup
Send and receive fast path
Application
Cisco code here
…and the 4th project
Hardware Locality (hwloc)
Hardware Locality (hwloc)
• Query your server’s topology
• NUMA nodes
  Including memory
• Processor sockets
• L3, L2, L1 caches
  Instruction and data
• Cores
• Hyperthreads
• PCI devices
hwloc CLI
• Output formats supported:
  PDF, JPG, PNG, TIFF, FIG, …
  Text (for console windows)
  Curses
  XML
• Great for feeding into scripts!
hwloc-bind: replaces numactl
• hwloc-bind socket:0.core:2 command
  Bind command to core 2 on socket 0
• hwloc-bind --get
  Print a bitmap of your current bindings
• hwloc-bind --get | hwloc-calc -p -H socket.core
  Print something more readable than a bitmap
• hwloc-bind --get | hwloc-calc -p -H socket.core.pu
  Even show the hardware threads
• hwloc-ps [-a]
  Show where processes are bound
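To see why a raw bitmap is hard to read, here is a minimal Python sketch of decoding one by hand. It assumes the bitmap is a single hexadecimal word such as `0x0000000c`, where bit N set means processing unit N is in the binding; the helper name is made up, and `hwloc-calc` remains the real tool for this job.

```python
def pus_in_bitmap(bitmap):
    """Return the sorted indexes of the bits set in a hex mask string."""
    value = int(bitmap, 16)  # base 16 accepts an optional "0x" prefix
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    print(pus_in_bitmap("0x0000000c"))
```

The result is only a list of unit indexes; mapping those back to sockets and cores is the part that needs the topology, which is what `hwloc-calc -H socket.core` provides.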
hwloc C API
• Get a tree data structure representing the topology
• Many API calls for manipulating / traversing the tree
• Typical actions:
  Get, set processor and memory bindings
  React to cache sizes
• …everything you can do in the CLI, and more
Why is hwloc useful?
• Verify the internal topology of your server
  How much memory do you have?
  Where is that memory?
  What processor(s) are local to that memory?
  How big are your L1, L2, L3 caches?
• Verify your internal PCI devices
  Distinguish ethX devices from each other
Why is hwloc useful?
• Bind services to specific cores
  Ensure related services are on the same NUMA node
  Put non-essential services on core 0 (e.g., NTP)
• Bind server-related services
  Apache, Bind, NFS, …etc.
  Increase performance by not letting them migrate
  Keeps memory local, less inter-NUMA-node traffic
NTP etc.
Apache
NFS
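For a feel of what binding a process means at the operating-system level, here is a small Python sketch. It is not hwloc: it uses the standard library's Linux-only `os.sched_setaffinity`, which works on the kernel's flat logical CPU numbers and knows nothing about sockets, caches, or NUMA nodes. The `bind_to_cpu` helper is made up for illustration.

```python
import os


def bind_to_cpu(cpu, pid=0):
    """Pin process pid (0 = the calling process) to a single logical CPU.

    Returns the resulting affinity set, or None on platforms that do not
    provide the Linux scheduler affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
        print("allowed before:", sorted(allowed))
        print("bound to:", bind_to_cpu(min(allowed)))
        os.sched_setaffinity(0, allowed)  # restore the original binding
```

Choosing which logical CPU to pass in (for example, one on the same NUMA node as a related service) is the topology question that hwloc answers.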
“Open source is good. Open source works.”
Thank you.