#lspe: Dynamic Scaling

Post on 05-Jul-2015



description

Presentation given to the #lspe meetup (Large Systems Performance Engineering) on February 21, 2013 by Steve Shah. Topic for the night was Dynamic Scaling. This presentation is titled "Shock Absorbers and APIs" and covers features typical of ADCs (modern load balancers) that can help in managing scale as well as give a quick overview of what to expect from an API in an ADC.

transcript

Sr. Director, Product Management

February 21, 2013

#lspe: Dynamic Scaling

Shock Absorbers and APIs

Steve Shah

Disclaimer

• I’m going to talk about a product. It’s kind of necessary in order to make this talk useful.

ᵒBut a lot of you have this product or know someone who does!

ᵒThe product is pretty cool…

ᵒIt can also sing and dance.

ᵒMaking coffee is on the roadmap.

• Sorry.

ᵒYes, I am marketing scum.

ᵒNo, I will not do a hard sell.

• My Competition

ᵒGoogle it. No, really… It’s not hard to find them.

ᵒTheir products take various approaches too. I encourage you to ask them.

Performance, Offload, Security, Availability

What is NetScaler?

NetScaler powers some of the world’s largest infrastructures.

1998 to 2012: From Load Balancing to Virtual Networking

• 1998: L4 SLB

• 1999: L7 SLB, GSLB, MUX

• 2002: SSL, CMP, DNS

• 2003: SSLVPN, RHI

• 2005: AppFW, SIP, AAA-TM

• 2006: ICA, IPv6

• 2008: XML, nCore

• 2009: VPX, EdgeSight

• 2011: SDX, AppFlow, DataStream

Secret Decoder Ring:

• SLB = Server Load Balancing

• GSLB = Global Server Load Balancing

• MUX = HTTP Multiplexing

• SSL = SSL Acceleration

• CMP = HTTP Compression

• DNS = DNS Load Balancing / Proxy

• RHI = Route Health Injection

• ICA = App Proxy for ICA

• IPv6 = IPv6 Routing, Switching, LB

• XML = XML Security, Routing

• VPX = Virtual NetScaler

• nCore = Multi-core scaling

• SDX = Multi-tenant NetScaler

Agenda

• Things That Impact Scalability

• Shock Absorbers

• Out Scaling

• Your ADC has an API!

Things That Impact Scalability

Touching on a bit of theory…

Load is Not Linear

• There are startup costs for enabling features in an ADC (memory and CPU)

• However, each incremental request takes a small fraction of resources

• As load increases, some global functions can take resources as well

ᵒE.g., flushing unused IP fragments, running timers, management overhead, etc.
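A toy cost model makes the non-linearity concrete. This minimal Java sketch has three parts: a one-time startup cost, a small per-request cost, and global housekeeping that is assumed to run once per fixed number of requests. Every constant is a hypothetical value chosen for illustration, not a measured NetScaler figure.

```java
// Toy model of ADC resource cost versus load. All constants are hypothetical.
class LoadModel {
    static final long FIXED_COST = 10_000;                // one-time cost of enabling a feature (memory, CPU)
    static final long PER_REQUEST_COST = 2;               // small incremental cost per request
    static final long HOUSEKEEPING_COST = 500;            // one run of global work (timers, fragment flush, ...)
    static final long REQUESTS_PER_HOUSEKEEPING = 1_000;  // assumption: global work runs once per this many requests

    // Total cost after handling the given number of requests.
    static long totalCost(long requests) {
        long housekeepingRuns = requests / REQUESTS_PER_HOUSEKEEPING;
        return FIXED_COST + PER_REQUEST_COST * requests + HOUSEKEEPING_COST * housekeepingRuns;
    }

    public static void main(String[] args) {
        for (long n : new long[] {0, 1_000, 2_000, 10_000}) {
            System.out.printf("%,d requests -> total cost %,d%n", n, totalCost(n));
        }
    }
}
```

With these made-up numbers, going from 1,000 to 2,000 requests moves the total from 12,500 to 15,000 units rather than doubling it, because the startup cost is paid only once.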

Data Structures and Big O

• I/O, data structures, and string processing are big factors

• The two that get you are data structures and string processing

ᵒData structures: ACLs, VLANs, connection table, connection state, persistence table, etc.

ᵒString processing: HTTP request processing and policy execution

• Know your Big O: understand its impact

ᵒBig O notation is how programmers describe the efficiency of algorithms

ᵒE.g., O(n) vs. O(log n) vs. O(1)
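To see why the connection table matters, compare a linear scan (O(n)) with a hash-table lookup (O(1) on average) over the same set of connections. The `FlowKey` record and the addresses are hypothetical. The sketch only illustrates the complexity difference; it does not show how any ADC actually lays out its tables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified connection identifier: the TCP 4-tuple.
record FlowKey(String srcIp, int srcPort, String dstIp, int dstPort) {}

class ConnectionLookup {
    // Build n distinct flows from distinct client IPs to one virtual server.
    static List<FlowKey> makeFlows(int n) {
        List<FlowKey> flows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            flows.add(new FlowKey("10.0." + (i / 256) + "." + (i % 256), 40_000, "192.0.2.10", 80));
        }
        return flows;
    }

    // O(n): walk the list until the key is found; returns how many entries were compared.
    static int linearComparisons(List<FlowKey> table, FlowKey wanted) {
        int comparisons = 0;
        for (FlowKey k : table) {
            comparisons++;
            if (k.equals(wanted)) {
                break;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        List<FlowKey> flows = makeFlows(50_000);
        // O(1) on average: a hash table keyed by the 4-tuple (records supply equals/hashCode).
        Map<FlowKey, String> table = new HashMap<>();
        for (FlowKey k : flows) {
            table.put(k, "state-for-" + k.srcIp());
        }
        FlowKey newest = flows.get(flows.size() - 1);
        System.out.println("linear scan comparisons: " + linearComparisons(flows, newest));
        System.out.println("hash lookup result: " + table.get(newest));
    }
}
```

The scan must compare every entry to find the newest connection, so its cost grows with the table. The hash lookup does roughly the same work at any table size, which is the near-constant lookup time the slides describe.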

Shock Absorbers

Coping with Load

Launching v8: The Role of Data Structures

• Story time… launching a major service and what we learned

• Major new roll-out, expected to double the number of servers to be handled

• Early testing revealed that performance with large numbers of slow connections was meh

• Invest in your data structures! We cleaned up several core structures

• Average connection lookup time driven to near constant time: O(1)

• Stir in a team that dreams in assembly language and can see cache misalignment by glancing at code, and shave another 20% off connection lookup times (absolute times)

• Lesson: drive your apps to good data structures. Drive your vendors to do better.

MaxConns and SurgeQ

[Figure: typical server performance curve, throughput against incoming load; peak performance is where we want to stay]

[Figure: with max conns set at the peak, the server stays operating at maximum throughput while incoming requests are queued in the ADC]
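The MaxConns/SurgeQ idea can be sketched as a small admission controller. The ADC lets at most `maxConns` requests reach a server, holds the rest in a FIFO queue, and hands each freed slot to the oldest waiting request. This is a hypothetical Java model for illustration; the class and method names are not NetScaler configuration parameters.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of one backend server behind the ADC.
class SurgeQueueBackend {
    private final int maxConns;                                   // cap set near the server's peak-throughput point
    private final Deque<String> surgeQueue = new ArrayDeque<>();  // requests held in the ADC
    private int active = 0;                                       // requests currently on the server

    SurgeQueueBackend(int maxConns) {
        this.maxConns = maxConns;
    }

    // Returns true if the request goes straight to the server, false if it waits in the surge queue.
    boolean arrive(String requestId) {
        if (active < maxConns) {
            active++;
            return true;
        }
        surgeQueue.addLast(requestId);
        return false;
    }

    // Called when the server finishes a request. Returns the queued request that takes
    // the freed slot, or null if nothing was waiting.
    String complete() {
        if (active == 0) {
            throw new IllegalStateException("no request in progress");
        }
        String next = surgeQueue.pollFirst();
        if (next == null) {
            active--;   // nothing waiting: the slot stays free
        }
        return next;    // otherwise the slot goes straight to the oldest waiting request
    }

    int active() { return active; }

    int queued() { return surgeQueue.size(); }

    public static void main(String[] args) {
        SurgeQueueBackend server = new SurgeQueueBackend(2);
        for (String r : new String[] {"r1", "r2", "r3", "r4"}) {
            System.out.println(r + (server.arrive(r) ? " -> server" : " -> surge queue"));
        }
        System.out.println("completed one, next up: " + server.complete()); // r3
    }
}
```

The server never sees more than `maxConns` concurrent requests, so it stays at the peak of its curve. The price is queueing delay inside the ADC. A production version would also need a bound or timeout on the queue, which this sketch leaves out.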

Story time: When 4 Hurricanes Hit

Out-Scaling

The SR-71 Approach: Go Faster

• Single System

ᵒConfigured and managed as a single logical system

• Scalable

ᵒScales with the number of devices (distributes work)

• Fault Tolerant

ᵒHandles device failure, addition…

• Dynamic

Treat a collection of NS devices like a grand unified “big” device

The Sheet-metal Test

Steps:

• Take a cluster of NS devices and an L2 switch.

• Configure the devices to your liking.

• Wrap the whole thing in sheet metal, such that only the network ports remain exposed.

Test:

You must be able to configure and use this contraption as if it were just another NS box:

• connect wires into any visible port(s), create LAGs at will, enable L2 mode, MBF …

• point the GUI at the cluster’s IP and configure away

Clustering

• Create a single system image out of a collection of instances

ᵒInstances = virtual machines, physical instances, or instances on multi-tenant boxes

• True shared management + data plane (the sheet metal test)

• Shared state for key data structures (persistence, health check, etc.)

• Linear scale by adding instances (up to 32)

• Ability to manage faults with proportional degradation
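The slides don't say how a cluster spreads traffic across instances. One standard technique with the "scales with devices" and "proportional degradation" properties is rendezvous (highest-random-weight) hashing. It is sketched below in Java as an illustrative stand-in, not a description of NetScaler's clustering internals. Node and flow names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RendezvousCluster {
    // SplitMix64 finalizer: spreads input bits so nearby inputs get unrelated scores.
    static long mix(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Score of a (flow, node) pair; the node with the highest score owns the flow.
    static long score(String flowId, String node) {
        return mix(((long) flowId.hashCode() << 32) ^ (node.hashCode() & 0xFFFFFFFFL));
    }

    static String owner(String flowId, List<String> nodes) {
        String best = null;
        long bestScore = Long.MIN_VALUE;
        for (String node : nodes) {
            long s = score(flowId, node);
            // Break exact ties by name so every instance computes the same answer.
            if (best == null || s > bestScore || (s == bestScore && node.compareTo(best) > 0)) {
                best = node;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>(List.of("ns1", "ns2", "ns3", "ns4"));
        int total = 10_000;
        List<String> before = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            before.add(owner("flow-" + i, nodes));
        }
        nodes.remove("ns3"); // simulate an instance failure
        int moved = 0;
        for (int i = 0; i < total; i++) {
            if (!owner("flow-" + i, nodes).equals(before.get(i))) {
                moved++;
            }
        }
        System.out.println("flows moved after losing ns3: " + moved + " of " + total);
    }
}
```

Only flows owned by the failed instance move; every other flow keeps its owner. This is one way to get the proportional degradation the slide lists.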

Real-time Analytics

[Figure: decision feedback loop linking real-time analytics (bandwidth, connections, top N requests, response time, frequency), policy-based traffic selection, and policy-based actions (compress, cache, log, drop, respond)]
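The feedback loop can be sketched as an ordered rule list whose conditions read live statistics, so what the analytics see changes which action fires. This is a hypothetical Java illustration. NetScaler expresses policies in its own policy language, and the rules, the threshold, and the `FORWARD` action here are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Actions named on the slide, plus a hypothetical FORWARD for "no special handling".
enum Action { COMPRESS, CACHE, LOG, DROP, RESPOND, FORWARD }

record Request(String clientIp, String method, String path) {}

class FeedbackPolicyEngine {
    record Rule(String name, Predicate<Request> when, Action then) {}

    // "Analytics": a running count of requests per client (no time window, for brevity).
    private final Map<String, Integer> requestsPerClient = new HashMap<>();
    private final List<Rule> rules;

    FeedbackPolicyEngine(int maxRequestsPerClient) {
        // Rules are evaluated in order; the first match wins.
        rules = List.of(
                new Rule("too-frequent", r -> requestsPerClient.get(r.clientIp()) > maxRequestsPerClient, Action.DROP),
                new Rule("static-content", r -> r.method().equals("GET") && r.path().startsWith("/static/"), Action.CACHE),
                new Rule("html", r -> r.path().endsWith(".html"), Action.COMPRESS),
                new Rule("default", r -> true, Action.FORWARD));
    }

    Action decide(Request r) {
        requestsPerClient.merge(r.clientIp(), 1, Integer::sum); // feed the statistics first
        for (Rule rule : rules) {
            if (rule.when().test(r)) {
                return rule.then();
            }
        }
        throw new IllegalStateException("no rule matched"); // unreachable: "default" matches everything
    }

    public static void main(String[] args) {
        FeedbackPolicyEngine engine = new FeedbackPolicyEngine(3);
        Request logo = new Request("198.51.100.7", "GET", "/static/logo.png");
        for (int i = 0; i < 5; i++) {
            System.out.println("request " + (i + 1) + " -> " + engine.decide(logo));
        }
    }
}
```

Once a client exceeds the threshold, the same request that used to be cached is dropped. That is the loop closing: traffic feeds the statistics, and the statistics change the decision.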

Scaling Globally

• Global Server Load Balancing (GSLB): NetScaler uses DNS to send users to the closest site based on administrator-defined metrics (geography, topology, site performance, availability).

• Route Health Injection (RHI): NetScaler dynamically updates routing tables to direct clients to the active site based on real-time health monitoring of backend infrastructure.

[Figure: an active site and a mirror site]
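At its core, the GSLB decision answers a DNS query with the address of the best healthy site. This Java sketch uses measured round-trip time as the only metric, with hypothetical site names and documentation-range addresses. It does not model RHI or the full set of administrator-defined metrics.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// One GSLB site as seen by the DNS responder. Fields are hypothetical simplifications.
record Site(String name, String vip, boolean healthy, double rttMillis) {}

class GslbResolver {
    // Answer with the VIP of the healthy site that has the lowest measured round-trip time.
    // Real deployments can also weigh geography, topology, and load; RTT alone keeps this short.
    static Optional<String> resolve(List<Site> sites) {
        return sites.stream()
                .filter(Site::healthy)
                .min(Comparator.comparingDouble(Site::rttMillis))
                .map(Site::vip);
    }

    public static void main(String[] args) {
        List<Site> sites = List.of(
                new Site("active", "192.0.2.10", true, 18.0),
                new Site("mirror", "198.51.100.10", true, 64.0));
        System.out.println(resolve(sites).orElse("no healthy site"));
    }
}
```

If the active site's health checks fail, the same query resolves to the mirror site.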

Your ADC Has an API!

API in a Nutshell: Your ADC Has This

• Interfaces: SOAP, RESTful

• Client Toolkits: Scripting (Perl/PHP/Python/PowerShell); OOP (Java/C#/ASP/.NET based)

• Policy: Reverse Call-Out

• Statistics: JSON/XML, Bulk Reporting, Granular Reporting

More RESTful: HTTP Status Codes

Citrix Confidential - Do Not Distribute

REQUEST

Success Case:
GET http://<nsip>/nitro/v1/config/lbvserver/lbv1

Failure Case:
POST http://<nsip>/nitro/v1/config/lbvserver
Content-Type: application/vnd.com.citrix.netscaler.lbvserver+json

{"lbvserver":{"name":"lbv111", "servicetype":"HTTP"}}

RESPONSE

Success Case:
HTTP 200 OK

Failure Case:
HTTP/1.0 409 Conflict

{"errorcode": 273, "message": "Resource already exists", "severity": "ERROR"}
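The two requests above can be built with the JDK's `java.net.http` client. This minimal sketch assumes the management address is passed as the first command-line argument. It omits authentication, which a real appliance will expect. It also builds the JSON body by string concatenation, which is only safe for simple names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class NitroRequests {
    static final String LBVSERVER_JSON = "application/vnd.com.citrix.netscaler.lbvserver+json";

    // GET one load-balancing virtual server by name (the "success case" request).
    static HttpRequest getLbvserver(String nsip, String name) {
        return HttpRequest.newBuilder(URI.create("http://" + nsip + "/nitro/v1/config/lbvserver/" + name))
                .GET()
                .build();
    }

    // POST a new lbvserver; adding a name that already exists is rejected (the "failure case").
    static HttpRequest addLbvserver(String nsip, String name, String serviceType) {
        String body = "{\"lbvserver\":{\"name\":\"" + name + "\", \"servicetype\":\"" + serviceType + "\"}}";
        return HttpRequest.newBuilder(URI.create("http://" + nsip + "/nitro/v1/config/lbvserver"))
                .header("Content-Type", LBVSERVER_JSON)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java NitroRequests <nsip>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> resp = client.send(addLbvserver(args[0], "lbv111", "HTTP"),
                HttpResponse.BodyHandlers.ofString());
        // The status code carries the outcome; the body carries the error details, if any.
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

NITRO reports outcomes through standard HTTP status codes: 200 for the successful GET, and 409 Conflict when the lbvserver already exists. A client can therefore branch on `statusCode()` before parsing the JSON error body.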

Example: Using Java

[Figure: Java code listing annotated with three steps: indicate we want “rollback on failure” in this session; prepare 3 lbvservers to be added in one bulk operation; print results]

Output: no attempt to add “lb3” because of rollback behavior.
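The Java listing from this slide did not survive, so the sketch below does not reproduce the NITRO Java SDK. It is an in-memory simulation of the rollback-on-failure behavior the slide describes. It uses a hypothetical scenario in which `lb2` already exists: the first failure undoes earlier adds, and later items are never attempted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// In-memory stand-in for the appliance's lbvserver configuration (hypothetical, not the NITRO SDK).
class BulkAddWithRollback {
    private final Set<String> lbvservers = new LinkedHashSet<>();

    BulkAddWithRollback(List<String> alreadyConfigured) {
        lbvservers.addAll(alreadyConfigured);
    }

    // Add all names in order. On the first failure, undo earlier adds (newest first)
    // and stop, so later names are never attempted. Returns a log of what happened.
    List<String> addAllOrNothing(List<String> names) {
        List<String> done = new ArrayList<>();
        List<String> log = new ArrayList<>();
        for (String name : names) {
            if (!lbvservers.add(name)) { // Set.add returns false for a duplicate, like "Resource already exists"
                log.add(name + ": failed (resource already exists)");
                for (int i = done.size() - 1; i >= 0; i--) {
                    lbvservers.remove(done.get(i));
                    log.add(done.get(i) + ": rolled back");
                }
                return log;
            }
            done.add(name);
            log.add(name + ": added");
        }
        return log;
    }

    Set<String> configured() {
        return Set.copyOf(lbvservers);
    }

    public static void main(String[] args) {
        BulkAddWithRollback ns = new BulkAddWithRollback(List.of("lb2"));
        ns.addAllOrNothing(List.of("lb1", "lb2", "lb3")).forEach(System.out::println);
        // Prints "lb1: added", "lb2: failed (resource already exists)", "lb1: rolled back"; lb3 is never attempted.
    }
}
```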

AutoSense and AutoScale

[Figure: CloudStack, the Internet, a NetScaler, and a pool of server instances]

• NetScaler is auto-provisioned by CloudStack

• NetScaler monitors servers for CPU, memory, latency, throughput …

• The NetScaler monitoring engine auto-detects abnormal behavior in servers

• NetScaler triggers the AutoScale capability in CloudStack

• CloudStack “auto-provisions” new server instances based on the AutoScale policy

• On a successful AutoScale, CloudStack provides new service descriptions

• NetScaler automatically adds the new service resources and does the bindings

• Traffic is automatically scaled to the newly added services on NetScaler
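The AutoScale steps above form a monitor, decide, provision, bind loop. This is a hypothetical Java sketch of that loop. The CPU threshold, the service cap, and the `svc-N` names are assumptions. The CloudStack call is replaced by a local stand-in method, and scale-down is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the AutoScale feedback loop: watch server metrics,
// ask the cloud platform for a new instance, then bind the new service.
class AutoScaleLoop {
    private final double scaleUpCpuPercent;  // average CPU above this triggers a scale-up
    private final int maxServices;           // upper bound from the (hypothetical) AutoScale policy
    private final List<String> boundServices = new ArrayList<>();
    private int nextId;

    AutoScaleLoop(int initialServices, double scaleUpCpuPercent, int maxServices) {
        this.scaleUpCpuPercent = scaleUpCpuPercent;
        this.maxServices = maxServices;
        for (int i = 0; i < initialServices; i++) {
            boundServices.add(provision());
        }
    }

    // Stand-in for the CloudStack AutoScale call that returns a new service description.
    private String provision() {
        return "svc-" + (++nextId);
    }

    // One monitoring interval. Returns true if a new service was provisioned and bound.
    boolean tick(List<Double> cpuPercentPerServer) {
        double avg = cpuPercentPerServer.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        if (avg > scaleUpCpuPercent && boundServices.size() < maxServices) {
            boundServices.add(provision()); // bind: traffic can now flow to the new service too
            return true;
        }
        return false;
    }

    List<String> services() {
        return List.copyOf(boundServices);
    }

    public static void main(String[] args) {
        AutoScaleLoop loop = new AutoScaleLoop(2, 70.0, 4);
        System.out.println(loop.tick(List.of(85.0, 90.0)) + " " + loop.services());
    }
}
```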

Work better. Live better.