Post on 14-Jan-2016
Dynamic AWS Server Usage Using Nagios Core
or How to pay only for what you need
Eric Loyd eric@bitnetix.com
877.33.VOICE @Bitnetix @SmartVox
About Bitnetix
About Eric Loyd and Bitnetix
Founder and CEO of Bitnetix Incorporated
VoIP services and IT/network consulting
Over 25 Years in IT and management at places like
Eastman Kodak
Frontier Communications / Global Crossing
Rochester Institute of Technology
Bitnetix started its eighth year in July, 2013
Digital Rochester GREAT Award Finalist in:
2012 for Communications Technology
2013 for Rising Star
Using Nagios since 2004
© 2013 Bitnetix Incorporated
History of SmartVox: Bitnetix’s VoIP Platform
History of SmartVox, our VoIP Platform
Pre-2012 – not yet called SmartVox
Bitnetix primarily focused on IT consulting
VoIP service was ~10% of business with servers located primarily at client sites
Custom Asterisk-based servers running FreePBX
We ran customer’s network so we had control over VoIP
2012 – Focus switched to VoIP
Focused now on hosted VoIP solutions
Made use of Amazon Web Services EC2 VPS
One per customer with no proxies* or media servers
Network/bandwidth was the only customer responsibility
History of SmartVox, our VoIP Platform
2013 – SmartVox name born
Copyright, trademark, domain name, biz cards, etc.
Third generation born with multiple proxies, registrars, configuration servers, and media servers
June – Started Mission Matrix program & sales
AWS architecture leveraged for geography
Each customer gets own EC2 server
Proxies to closest zone, secondary “to the west”
Media servers located in zones based on number of simultaneous calls, conferences, etc.
VMs and CDRs stored in database
Brief Overview of AWS
AWS EC2 Concepts
AWS – Amazon Web Services
Collection of cloud-based services:
Storage (S3), DNS (Route 53), CDN, Server (EC2)
EC2 - Elastic Compute Cloud
Virtual servers in AWS datacenters (zones)
US (3 = VA, CA, OR), EU (1), Asia (3), SA (1)
Persistent storage & flexible IP address assignment
Pay by the hour that it’s up, plus storage and bandwidth
Spot instances – “temporary” EC2 servers
Bring online as needed, terminated when shut down
AWS EC2 Costs
LOTS of variables, but reasonable potential costs:
Reserved servers cost about $2.00 per day
Reserved instance pricing is contractual and static, based on size
Spot servers cost between $0.50-$2.50 per day
Spot instance pricing is dynamic, we assume ~$0.10 per hour
We quantize concurrent calls into 50-call blocks
One media server = 50 calls = 1 spot instance
Two media servers = 100 calls = 2 spot instances
Bandwidth and storage will add ~10%
Reducing AWS usage reduces cost
We keep these savings for ourselves. Shhhh!!!
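The arithmetic on this slide can be sketched in a few lines. This is only an illustration using the figures quoted above (50 calls per media server, an assumed ~$0.10 per hour spot price, ~10% for bandwidth and storage); the function and constant names are made up for the example.

```python
import math

CALLS_PER_SERVER = 50        # one media server per 50-call block (slide figure)
SPOT_PRICE_PER_HOUR = 0.10   # assumed average spot price (slide figure)
OVERHEAD = 0.10              # ~10% extra for bandwidth and storage (slide figure)


def servers_needed(concurrent_calls: int) -> int:
    """Round concurrent calls up to whole 50-call blocks."""
    if concurrent_calls <= 0:
        return 0
    return math.ceil(concurrent_calls / CALLS_PER_SERVER)


def hourly_cost(concurrent_calls: int) -> float:
    """Estimated hourly cost of the spot instances, including overhead."""
    return servers_needed(concurrent_calls) * SPOT_PRICE_PER_HOUR * (1 + OVERHEAD)
```

Because the count is rounded up, 51 calls need two servers just as 100 calls do, which is why shedding a server as soon as usage drops below a block boundary matters.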
Why Nagios?
Extensive experience using it for clients
Bitnetix is a Nagios reseller
Needed centralized monitoring software
Integrate with Twitter for notifications
Integrate with Eventum via email for trouble tickets
Zero cost
Framework
Leverage SSH, HTTP, check_mk and livestatus!!
Custom checks and notifications (very important)
Ability to “cookie cutter” installs for AWS
Initial Hurdles
Customer Premise Equipment
No real control over CPE choices
Routers block some traffic, “help” other traffic incorrectly
Need to be able to remotely [re-]configure phones
Figure out how to “cookie-cutter” EC2 servers
Customer boxes and SIP endpoints
Proxies and media servers
Wanted to monitor upstream providers as well
How to separate apparent from actual failure
Something’s broken, but overall service functional
SmartVox Provisioning Process and Automation
SmartVox Network
DNS SRV records are key to redundant servers
Call flow (diagram): Provider -> Border Proxy -> Customer Proxy
Provider: sends incoming calls to one or more border proxies
Border Proxy: figures out which customer should receive the calls
Customer Proxy: sends the call on to the correct phone/media server (VM, etc.)
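To illustrate why SRV records help with redundancy: each record carries a priority, so a SIP client tries the lowest-numbered target first and falls back to the next. The sketch below generates zone-file lines for a primary and a secondary proxy. The host names, the priority spacing, and the `srv_records` helper are assumptions for the example; the 3540-second TTL corresponds to the 59-minute TTLs mentioned later in the talk.

```python
def srv_records(service_domain, targets, port=5060, ttl=3540):
    """Return zone-file lines, one SRV record per target.

    `targets` is an ordered list of host names. The first target gets the
    lowest priority number, so clients try it first and fall back in order.
    """
    lines = []
    for index, target in enumerate(targets):
        priority = 10 * (index + 1)   # 10, 20, 30, ... (spacing is arbitrary)
        weight = 100                  # equal weight; only priority decides here
        lines.append(
            f"_sip._udp.{service_domain}. {ttl} IN SRV "
            f"{priority} {weight} {port} {target}."
        )
    return lines
```

Calling `srv_records("example.com", ["proxy-east.example.net", "proxy-west.example.net"])` yields one record for the closest proxy and one for the secondary "to the west".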
Provisioning Process
SmartVox AWS EC2 Provisioning Database
Customer information
Account (location/division/etc) information
Number of phones*, VM boxes, etc.
Computes how many proxies customer needs
DNS SRV records created for batch updates
Media server/VM entries created automatically
Phone provisioning info created automatically
Automatically places order for phones* (+some)
Phones drop-shipped to customer in about 3 days
AWS EC2 Automation: Spot Instance API
Create spot instance -> gives request ID
Instance created from SmartVox-built base image
Wait a bit -> query request ID -> get instance ID
Query instance -> get IP address
Update DNS with server information and IP
Update Nagios with server information and IP
When spot instances shut down, they terminate
No more expense for “burstable resources”
This sounds like a Nagios event handler…
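The request, poll, and query steps above could look roughly like this in Python. This is a sketch, not the SmartVox implementation: it assumes `ec2` behaves like a boto3 EC2 client (`request_spot_instances`, `describe_spot_instance_requests`, `describe_instances`), and the function name, polling interval, and retry count are invented for the example.

```python
import time


def launch_spot_instance(ec2, image_id, max_price, instance_type,
                         key_name, security_group,
                         poll_seconds=15, max_polls=40):
    """Request one spot instance and return (instance_id, public_ip)."""
    # Step 1: create the spot request and remember its request ID.
    response = ec2.request_spot_instances(
        SpotPrice=str(max_price),
        InstanceCount=1,
        Type="one-time",
        LaunchSpecification={
            "ImageId": image_id,
            "InstanceType": instance_type,
            "KeyName": key_name,
            "SecurityGroups": [security_group],
        },
    )
    request_id = response["SpotInstanceRequests"][0]["SpotInstanceRequestId"]

    # Step 2: wait a bit, then query the request until it has an instance ID.
    for _ in range(max_polls):
        described = ec2.describe_spot_instance_requests(
            SpotInstanceRequestIds=[request_id]
        )
        instance_id = described["SpotInstanceRequests"][0].get("InstanceId")
        if instance_id:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"spot request {request_id} was not fulfilled")

    # Step 3: query the instance until it reports a public IP address.
    for _ in range(max_polls):
        reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
        public_ip = reservations[0]["Instances"][0].get("PublicIpAddress")
        if public_ip:
            return instance_id, public_ip
        time.sleep(poll_seconds)
    raise TimeoutError(f"instance {instance_id} has no public IP yet")
```

The returned pair is what the DNS and Nagios update steps need. Passing the client in as a parameter keeps the function easy to exercise with a stub.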
AWS EC2 Automation: Our Custom Image
SmartVox media server image includes Asterisk
Asterisk told to exit after waiting for calls to terminate
Startup script shuts down system after Asterisk exits
Instant “spot instance”
Bring it online when needed, and terminate as required
Same basic idea for starting/stopping proxies
These tend to be more static than media servers
Platform can be adjusted automatically
COGS adjusts appropriately
Hey, let’s hook this up to Nagios!!
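A minimal sketch of the "exit, then shut down" behaviour described for the media server image, assuming the startup script is written in Python. `asterisk -f` (run in the foreground) and `asterisk -rx "core stop gracefully"` (exit once active calls have ended) are standard Asterisk invocations; the function names and the injectable `run` parameter are made up for the example, and the actual SmartVox script is not shown in the talk.

```python
import subprocess


def run_media_server(run=subprocess.run):
    """Run Asterisk in the foreground, then halt the machine once it exits."""
    # -f keeps Asterisk in the foreground, so this call blocks until it exits.
    run(["asterisk", "-f"], check=False)
    # Shutting down the spot instance terminates it, as the slides describe.
    run(["shutdown", "-h", "now"], check=True)


def drain_asterisk(run=subprocess.run):
    """Ask a running Asterisk to exit after its current calls have ended."""
    run(["asterisk", "-rx", "core stop gracefully"], check=True)
```

`drain_asterisk` would be run on the box (for example over SSH) when capacity should shrink; `run_media_server` then notices the exit and shuts the system down.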
AWS EC2 Automation: More ideas
Quick aside about spot instances. Useful for:
Database dumps
Spot instance turned up to do MySQL copies
Run reports, dump, compress, purge, etc., then terminate
Distributing web server load
Pop up another server and add to DNS
Instant on-demand capacity
Anything that you want to do repeatedly but not for a long time, and only when you want to (or have to)
Use Nagios for: Provisioning, Monitoring, Capacity Planning
Provisioning
Rather than create EC2s, we just update Nagios
Automatically regenerate SIP proxy and media server dynamic_hosts.cfg file as part of provisioning process
Nagios looks for host up, doesn’t find it, fires off handler
Event handler queries EC2 to see if it’s being turned up (~10 min) or just not running. If it’s not running, it starts it.
DNS is batch updated every hour. 59 min TTLs
Phone provisioning handled via automatic extract from database to create HTTP served configuration files
Master/slave “config servers” (also in AWS) to send all this stuff to customers, with a URL to activate phones
Entire process from signature to functional < 1 week
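The host event handler logic described above (host not found, check whether EC2 is already turning it up, start it otherwise) might be structured like this. It is a sketch under assumptions: `host_state` and `state_type` stand for the Nagios `$HOSTSTATE$` and `$HOSTSTATETYPE$` macros, `instance_state` is the state name EC2 reports (or `None` if no instance exists), and `start_instance` is a hypothetical callback. Acting only on HARD states is a choice made for the example, not something the talk specifies.

```python
def handle_host_event(host_state, state_type, instance_state, start_instance):
    """Decide what to do when Nagios reports on a dynamic host.

    Returns "ignore", "wait", or "started" so the caller can log the outcome.
    """
    if host_state != "DOWN" or state_type != "HARD":
        # Host is up, or Nagios is still retrying the check.
        return "ignore"
    if instance_state in ("pending", "running"):
        # EC2 is already turning the box up; give it time to answer checks.
        return "wait"
    # No instance, or one that is stopped or terminated: start a new one.
    start_instance()
    return "started"
```

Returning a short status string rather than acting silently makes it easier to log why the handler did or did not start a server.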
Monitoring
Nagios looks for hosts (see previous slide)
Automatically creates them if needed
Note that SIP proxies are not spot instances
Dedicated to lifespan of customer/account so they are only terminated as part of de-provisioning process
Nagios looks at health of services
Determine if we have faults, outages, etc.
Can potentially reroute automatically (DNS SRV!)
Store performance info for capacity calculations
Notifications via Twitter and email
Come back tomorrow at 10:30 for how this works
Capacity Planning
Quantize by 50 simultaneous calls per server
Perf data used to calculate historical usage
Can use cron to automatically add/remove servers
Nagios figures out “deltac” in current usage
If deltac = 0, we are just right (OK)
If deltac < 0, we have too much capacity (WARN)
If deltac > 0, we need more capacity (CRITICAL)
Event handler looks at state and either does nothing, tells least used box to stop Asterisk, or adds another box to the mix (see provisioning)
Capacity (and costs) dynamically adjust with usage
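The three-way decision in the event handler can be sketched as a small dispatch function. The state names match the Nagios `$SERVICESTATE$` macro; `calls_by_server`, `stop_asterisk`, and `add_server` are hypothetical names standing in for whatever the real handler uses to find the least used box and to trigger provisioning.

```python
def handle_capacity_state(state, calls_by_server, stop_asterisk, add_server):
    """Map the deltac service state to an action.

    calls_by_server maps server name -> current concurrent calls.
    Returns a (action, server) tuple describing what was done.
    """
    if state == "WARNING" and calls_by_server:
        # Too much capacity: drain the box carrying the fewest calls.
        least_used = min(calls_by_server, key=calls_by_server.get)
        stop_asterisk(least_used)
        return ("stop", least_used)
    if state == "CRITICAL":
        # Not enough capacity: add another box via provisioning.
        add_server()
        return ("add", None)
    # OK (or anything unexpected, such as UNKNOWN): leave the platform alone.
    return ("none", None)
```

Treating unexpected states as "do nothing" is a conservative choice for the example, so a failed check never starts or stops servers.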
Capacity Planning: DeltaC
deltac – Custom Nagios module
Looks at the last three times it ran on a particular host
Quantized by 50 calls = change in 50-call volumes
If deltac = 0 then we return an OK state
If deltac < 0 then we are dropping call volumes and can SSH to a box and tell Asterisk to stop
This will then stop the spot instance and reduce cost
If deltac > 0 then we are gaining call volumes and trigger provisioning process
This will start a spot instance and increase cost
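The slides do not show the deltac code, so the following is only one plausible reading: quantize each of the last three call-count samples into 50-call blocks and report the change between the oldest and newest of them. The exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) are the standard Nagios plugin return codes; the function names and the output format are assumptions.

```python
import math

CALLS_PER_BLOCK = 50
OK, WARNING, CRITICAL = 0, 1, 2   # standard Nagios plugin exit codes


def blocks(calls):
    """Number of 50-call blocks needed for a given call count."""
    return math.ceil(max(calls, 0) / CALLS_PER_BLOCK)


def deltac(samples):
    """Change in 50-call blocks across the last three samples (oldest first)."""
    recent = samples[-3:]
    if len(recent) < 2:
        return 0                  # not enough history to see a change
    return blocks(recent[-1]) - blocks(recent[0])


def check(samples):
    """Return (exit_code, status_line) in Nagios plugin style."""
    delta = deltac(samples)
    if delta == 0:
        code, label = OK, "OK"
    elif delta < 0:
        code, label = WARNING, "WARNING"     # volume dropping: capacity to shed
    else:
        code, label = CRITICAL, "CRITICAL"   # volume rising: capacity needed
    # Text after the pipe is performance data that Nagios can store.
    return code, f"DELTAC {label} - deltac={delta} | deltac={delta}"
```

With samples of 40, 60, and 120 calls the blocks go from 1 to 3, so `check` reports CRITICAL with `deltac=2`, which would trigger the provisioning process.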
Event Handler: DeltaC
How DeltaC Works
Let’s assume we’re creating a new host
ec2-request-spot-instances ami-58296831 -p 0.04 --key "BTC EC2" --group Asterisk --instance-type m1.medium -n 1 --type one-time
Get back a “spotInstanceRequestId” (sir-722f4e34)
ec2-describe-spot-instance-requests sir-722f4e34
Get back an “instanceId” (i-6488e31f)
ec2-describe-instances i-6488e31f
Get back public IP address (ipAddress) of this machine
Now we have IP address and (internal) name
Populate DNS batch update queue
Regenerate /usr/local/nagios/etc/objects/dynamic_hosts.cfg
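Regenerating the dynamic hosts file amounts to writing one Nagios `define host` block per server. A minimal sketch follows, assuming the hosts inherit from the `generic-host` template that ships with the Nagios sample configuration; the helper names and the example host names are hypothetical, and the real file may carry more directives (host groups, check commands, event handlers).

```python
def host_definition(host_name, address, template="generic-host"):
    """Render a single Nagios host object definition."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {host_name}\n"
        f"    address    {address}\n"
        "}\n"
    )


def render_dynamic_hosts(hosts):
    """Render all (host_name, address) pairs, sorted for stable output."""
    return "\n".join(host_definition(name, addr) for name, addr in sorted(hosts))
```

The caller would write the result to `dynamic_hosts.cfg` and then reload Nagios so that the new hosts are picked up.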
DeltaC Saves Lives Money
Small percentage changes in usage result in large changes in Cost Of Goods
For example:
100 calls • 2 boxes • $0.20/hour • ~$75/year
500 calls • 10 boxes • $1.00/hour • ~$375/year
2000 calls • 20 boxes • $2.00/hour • ~$750/year
5000 calls • 50 boxes • $5.00/hour • ~$2000/year
Questions?
Eric Loyd eric@bitnetix.com
877.33.VOICE @Bitnetix @SmartVox