AWS Meetup, San Francisco
Agenda
• Why do we need Log analytics?
• Intro to ELK
• What is Logz.io
• Installing ELK on your own
• Our Architecture
• EC2 machine comparison
Why do we need Log analytics?
Werner Vogels, AWS CTO
“Log Analytics is Fundamental for
Building Cloud Applications”
Multiple Use-Cases
• Product Management
• Business Analysis
• Customer Success
• BI
• Monitoring
• DevOps
• IoT
• Troubleshooting
• Support
• QA
• IT Ops / ITOA
• Compliance
• SecOps / SIEM
Log-driven development
• Errors, warnings, and exceptions
• Metrics
• Alerts
• Dashboards
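Log-driven development starts with emitting structured, machine-parsable events rather than free text. As an illustration (logger name, field names, and message are all made up for this sketch), an application might render its log records as JSON so Logstash and Elasticsearch can index fields directly:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render log records as JSON so the pipeline can index fields
    directly instead of grokking free-form text."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

logger = logging.getLogger("checkout")   # illustrative logger name
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# emits: {"level": "ERROR", "logger": "checkout", "message": "payment timeout after 3000 ms"}
logger.error("payment timeout after %d ms", 3000)
```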
Why Open Source
*based on Logz.io research
The Market is Dominated by Open Source Solutions
Over the past three years, the market has shifted its attention from proprietary to open source
• ELK Stack: 400,000+ companies
• Splunk, Sumo Logic, Loggly: ~20,000 companies
• Graphite: over 1M companies
ELK Popularity
Intro to ELK
Logstash
• Streaming data ingestion
• Time normalization
• Field extraction
Elasticsearch
• Schema-less search DB
• Highly scalable
Kibana
• Visualization
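To make the division of labor concrete, a minimal Logstash pipeline covering ingestion, time normalization, and field extraction might look like the following sketch (the grok pattern, field names, and hosts are illustrative, and option names vary slightly across Logstash versions):

```conf
input { stdin { } }        # ingestion: read raw log lines

filter {
  grok {                   # field extraction from plain text
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  date {                   # time normalization: use the event's own timestamp
    match => ["ts", "ISO8601"]
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }  # index into the search DB
}
```

Kibana then builds its visualizations on top of the fields this pipeline extracted.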
Open source ELK +/-
Simple and beautiful: It’s simple to get started and play with ELK, and the UI is just beautiful.
Open source: The largest user base, with a vibrant open source community that supports and improves the product.
Fast. Very fast: Built on the Elasticsearch search engine, ELK provides blazing-fast responses even when searching through millions of documents.
Hard to scale: Data piles up and organizations experience usage bursts. Building elastic ELK deployments that can scale up and down is super complex.
Poor security: Logs include sensitive data, and open source ELK offers no real security solution, from authentication to role-based access.
Not production ready: Building a production-ready ELK deployment is a great challenge organizations face. With hundreds of different configurations and support matrices, keeping it always up is difficult.
Logz.io Enterprise ELK Cloud Service
Up and running in minutes: Sign up and get insights into your data in minutes.
Production ready: Predefined, community-designed dashboards, visualizations, and alerts are all bundled and ready to provide insights.
Infinitely scalable: Ship as much data as you want, whenever you want.
Alerts: A proprietary alerting system built on top of open source ELK transforms ELK into a proactive system.
Highly available: The data and the entire ingestion pipeline can sustain a full-datacenter outage without losing data or service.
Advanced security: 360-degree security with role-based access and multiple security layers.
Installing ELK on your own
Prototype
• Installing the ELK stack on a single server – 1 hr
• Shipping one type of log – 1 hr
• Log parsing – 2 hrs
• Building a Kibana dashboard – 2 hrs
• ~6 hours to get a simple prototype
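As one way to get that single-server prototype running, the stack can be stood up with Docker (the image tags below are examples; pick versions that match each other):

```yaml
# docker-compose.yml – single-node ELK prototype (all versions illustrative)
version: "2"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.23
    environment:
      - discovery.type=single-node   # no clustering for a prototype
    ports: ["9200:9200"]
  logstash:
    image: docker.elastic.co/logstash/logstash:6.8.23
    depends_on: [elasticsearch]
  kibana:
    image: docker.elastic.co/kibana/kibana:6.8.23
    ports: ["5601:5601"]
    depends_on: [elasticsearch]
```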
Turning ELK Production ready
OS-level optimization: Elasticsearch requires a lot of OS-level tuning in order to run properly.
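The typical OS-level settings look like the following sketch (the values are commonly recommended defaults, not figures from these slides):

```conf
# /etc/sysctl.d/elasticsearch.conf
vm.max_map_count = 262144    # Lucene memory-maps index files heavily
vm.swappiness = 1            # keep the JVM heap out of swap

# /etc/security/limits.d/elasticsearch.conf
elasticsearch soft nofile 65536       # ES holds many open files and sockets
elasticsearch hard nofile 65536
elasticsearch soft memlock unlimited  # allow the heap to be locked in RAM
elasticsearch hard memlock unlimited
```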
Elasticsearch
Shard allocation: Optimizing insert and query times can be tricky and requires a lot of attention.
Index management: Because deletion is an expensive operation, index management is required for log analytics solutions.
Zone awareness: This is AWS-specific and required to achieve high availability.
Cluster topology: Elasticsearch clusters require three master nodes, plus data nodes and client nodes.
Bulk insert optimization: Optimizing insert time and latency.
Capacity provisioning: Need to account for log bursts and be able to provision enough capacity.
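On bulk inserts: batching many documents into one `_bulk` request is far cheaper than one HTTP call per document. A minimal sketch of building such a request body (index name and document shape are illustrative):

```python
import json

def build_bulk_body(index, docs):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON):
    an action line followed by a source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"   # _bulk bodies must end with a newline

body = build_bulk_body("logs-2016.05.01", [
    {"level": "ERROR", "message": "timeout"},
    {"level": "INFO", "message": "ok"},
])
print(body)
```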
Elasticsearch (2)
Archive (DR): Snapshot the data to a different repository for disaster recovery.
Mapping management: Mapping conflicts and sync issues need to be detected and addressed.
Monitoring: Marvel does a good job but requires constant DevOps attention.
Curator: Remove or optimize old indices.
Alias management: For better cluster control you need to define and use aliases.
Data parsing: Extracting values from text messages and enriching them with geo, user agent, etc.
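The retention logic that a tool like Curator automates can be sketched as follows (this assumes the common `logstash-YYYY.MM.DD` daily index naming scheme and is not Curator's actual API):

```python
from datetime import date, timedelta

def indices_to_delete(index_names, today, retention_days):
    """Pick daily log indices older than the retention window.

    Deleting a whole daily index is cheap; deleting individual
    documents is expensive, which is why daily indices are used.
    """
    cutoff = today - timedelta(days=retention_days)
    stale = []
    for name in index_names:
        try:
            day = date(*map(int, name.split("-", 1)[1].split(".")))
        except (IndexError, ValueError):
            continue  # skip indices that don't match the naming scheme
        if day < cutoff:
            stale.append(name)
    return stale

names = ["logstash-2016.04.01", "logstash-2016.05.01", "kibana-int"]
print(indices_to_delete(names, date(2016, 5, 2), 14))  # → ['logstash-2016.04.01']
```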
Logstash
High availability: Running Logstash in a cluster is not trivial.
Scalability: Dealing with increasing load on the Logstash servers.
Burst protection: Logs tend to be bursty; a buffer like Redis, Kafka, etc. is required in front of Logstash.
Rejections from Elasticsearch: Elasticsearch rejects about 1% of messages due to mapping issues; this needs to be addressed.
Configuration management: Special infrastructure needs to be in place to allow config changes with no data loss.
Security: Kibana has no protection by default; user authentication needs to be implemented.
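A Kafka-buffered pipeline of this kind might look like the following Logstash input sketch (broker address and topic name are illustrative, and the option names below are from newer Logstash versions; older releases used ZooKeeper-based options):

```conf
# Kafka fronts Logstash so log bursts queue up in the broker
# instead of overwhelming the Logstash servers
input {
  kafka {
    bootstrap_servers => "kafka1:9092"   # illustrative broker address
    topics => ["app-logs"]               # illustrative topic name
  }
}
output {
  elasticsearch { hosts => ["es1:9200"] }
}
```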
Kibana
High availability: Running Kibana in a cluster for upgrades and high availability.
Role-based access: If you want to restrict access to certain information, this capability needs to be developed.
Alerts: Alerting is not part of the open source stack.
Anomaly detection: Even basic anomaly detection is missing from Kibana.
Pre-canned dashboards: Building dashboards and visualizations in Kibana is tricky and requires special knowledge.
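Since alerting is missing from the open source stack, teams end up building it themselves. A toy sketch of the missing layer (thresholds and numbers are invented; in practice a scheduled job would run an Elasticsearch query and feed the hit count in here):

```python
def error_spike_alert(error_count, baseline, factor=3):
    """Fire when the current error count exceeds `factor` times the
    baseline rate – the simplest possible stand-in for an alert rule."""
    return error_count > factor * baseline

print(error_spike_alert(error_count=120, baseline=10))  # → True
print(error_spike_alert(error_count=25, baseline=10))   # → False
```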
Turning ELK Production ready
~ 4-6 weeks of work
Upgrades: Upgrading is challenging; you need to be aware of backward compatibility.
Maintenance
Overall cluster health: Monitor the health of the environment.
AWS issues: Dealing with AWS stability issues.
Mapping conflicts: Deal with arising mapping conflicts.
Personnel redundancy: Need multiple people with deep knowledge of the stack.
Capacity increases: Provision additional capacity and grow the cluster.
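Cluster health monitoring usually starts from the `GET /_cluster/health` API, which reports a status of green, yellow (replicas unassigned), or red (primary shards down). A minimal sketch of interpreting that response (the paging policy here is an example):

```python
import json

def health_ok(cluster_health_json):
    """Parse the JSON body of GET /_cluster/health and decide whether
    the cluster needs attention. Only 'green' counts as fully healthy."""
    health = json.loads(cluster_health_json)
    status = health.get("status")
    return status == "green", status

ok, status = health_ok('{"status": "yellow", "unassigned_shards": 2}')
print(ok, status)  # → False yellow
```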
Our Architecture
[Architecture diagram] Key components: HAProxy in front of multiple listeners; Kafka; Log Engine (Logstash); S3; Elasticsearch; Play server; Curator; hot/cold migration; shard optimizer; DLQ; Alert Engine; Kibana; API Gateway; cluster protection. Monitoring: ELK, Graphite, Nagios, etc.
Demo
AWS Server Comparison
Machine                Number   TB/Day
m1.xlarge              4        0.6
i2.xlarge              4        1
c3.8xlarge             6        1.5
c4.2xlarge + 1TB EBS   3        1.3
Questions?