Date post: | 12-Jan-2017 |
Category: |
Software |
Upload: | danny-abukalam |
Unravelling Logs
Matt Jarvis - Head of Cloud Computing @ DataCentred
Traditional log file analysis ...
● Troubleshooting
● Post-incident forensics
● Security auditing
● Reporting and analysis
Nova Controller :
● nova-api.log
● nova-cert.log
● nova-conductor.log
● nova-scheduler.log
Glance Server :
● api.log
● image-cache.log
● registry.log
Neutron Controller :
● openvswitch-agent.log
● server.log
Network Node :
● openvswitch-agent.log
● neutron-ns-metadata-proxy*.log
● metadata-agent.log
● dhcp-agent.log
Compute Node :
● openvswitch-agent.log
● nova-compute.log
● INGEST CENTRALLY
● STRUCTURE
● INDEX
● ANALYZE
● Distributed search engine
● Highly scalable
● Super fast
● HTTP interface
FIXME Kibana screenshot
● Collect
● Parse
● Transform
Log Shipping
● Lightweight log shipper
● Written in Go
● Minimal resource usage
● SSL
● Transformation capabilities
Log Courier
{
    "general": {
        "log file": "/var/log/log-courier.log",
        "admin enabled": true
    },
    "network": {
        "transport": "tls",
        "servers": [ "your.logstash.server:55516" ],
        "ssl certificate": "/var/lib/puppet/ssl/certs/yourcert.pem",
        "ssl key": "/var/lib/puppet/ssl/private_keys/yourkey.pem",
        "ssl ca": "/var/lib/puppet/ssl/certs/ca.pem",
        "timeout": 40
    },
    "files": [
        {
            "paths": [ "/var/log/syslog" ],
            "fields": { "shipper": "log-courier", "type": "syslog" }
        }
    ]
}
input {
  courier {
    port => 55516
    ssl_verify => true
    ssl_verify_ca => "/var/lib/puppet/ssl/certs/ca.pem"
    ssl_certificate => "/var/lib/puppet/ssl/certs/yourcert.pem"
    ssl_key => "/var/lib/puppet/ssl/private_keys/yourkey.pem"
    type => "log-courier"
  }
}
filter {
  if [type] == "syslog" {
    # Drop Log Courier's own registrar chatter
    if [message] =~ /Registrar received .* event/ {
      drop {}
    }
    grok {
      # Standard syslog line: program[pid]: message
      match => [ "message", "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" ]
      # Lines carrying a second date, pid and severity after the host
      match => [ "message", "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{YEAR}[/-]%{MONTHNUM}[/-]%{MONTHDAY} %{TIME} %{POSINT:syslog_pid} %{WORD:severity} %{GREEDYDATA:syslog_message}" ]
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
      add_field => [ "program", "%{syslog_program}" ]
      add_field => [ "timestamp", "%{syslog_timestamp}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
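To make the grok step concrete, here is a rough Python sketch of what the first pattern extracts. The regular expression only approximates the grok macros (SYSLOGTIMESTAMP, SYSLOGHOST, DATA), the sample line is invented for illustration, and parse_syslog is a hypothetical helper, not part of Logstash. The priority split (facility = pri // 8, severity = pri % 8) follows the usual syslog convention that syslog_pri decodes.

```python
import re

# Approximation of the first grok pattern; group names mirror the grok fields.
SYSLOG_RE = re.compile(
    r"<(?P<syslog_pri>\d+)>"
    r"(?P<syslog_timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<syslog_hostname>[\w.-]+) "
    r"(?P<syslog_program>[^\s\[:]+)"
    r"(?:\[(?P<syslog_pid>\d+)\])?: "
    r"(?P<syslog_message>.*)"
)


def parse_syslog(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Decode the priority value the way a syslog_pri step would.
    pri = int(fields["syslog_pri"])
    fields["syslog_facility_code"] = pri // 8
    fields["syslog_severity_code"] = pri % 8
    return fields


# Hypothetical sample line, not taken from a real log.
sample = "<13>Jan 12 10:15:02 compute01 nova-compute[2417]: Instance spawned"
event = parse_syslog(sample)
print(event)
```

Once every line is broken into named fields like this, the later stages can filter and alert on syslog_program or syslog_severity_code instead of raw text.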
filter {
  if [type] == "native_syslog" {
    grok {
      match => [ "message", "%{SYSLOGLINE}" ]
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
filter {
  # Add in group tags we didn't add in forwarder due to bug
  # https://github.com/elasticsearch/logstash-forwarder/issues/65
  # By grouping the logs using tags we can then search all the related logs in kibana
  if [type] =~ /cinder.*/ {
    mutate {
      add_tag => [ "cinder", "oslofmt" ]
    }
  }
}
output {
  elasticsearch {
    host => elasticsearch
    embedded => false
    protocol => http
  }
}

output {
  if [type] == "syslog" {
    riemann {
      riemann_event => {
        "description" => "%{syslog_message}"
        "service" => "%{syslog_program}"
        "state" => "%{syslog_severity_code}"
      }
    }
  }
}
FILTER
aggregate, alter, anonymize, collate, csv, cidr, clone, cipher, checksum, date, de_dot, dns, drop, elasticsearch, extractnumbers, environment, elapsed, fingerprint, geoip, grok, i18n, json, json_encode, kv, mutate, metrics, multiline, metaevent, prune, punct, ruby, range, syslog_pri, sleep, split, throttle, translate, uuid, urldecode, useragent, xml, zeromq
INPUT
beats, couchdb_changes, drupal_dblog, elasticsearch, exec, eventlog, file, ganglia, gelf, generator, graphite, github, heartbeat, heroku, http, http_poller, irc, imap, jdbc, jmx, kafka, log4j, lumberjack, meetup, pipe, puppet_facter, relp, rss, rackspace, rabbitmq, redis, salesforce, snmptrap, stdin, sqlite, s3, sqs, stomp, syslog, tcp, twitter, unix, udp, varnishlog, wmi, websocket, xmpp, zenoss, zeromq
OUTPUT
boundary, circonus, csv, cloudwatch, datadog, datadog_metrics, email, elasticsearch, elasticsearch_java, exec, file, google_bigquery, google_cloud_storage, ganglia, gelf, graphtastic, graphite, hipchat, http, irc, influxdb, juggernaut, jira, kafka, lumberjack, librato, loggly, mongodb, metriccatcher, nagios, null, nagios_nsca, opentsdb, pagerduty, pipe, riemann, redmine, rackspace, rabbitmq, redis, riak, s3, sqs, stomp, statsd, solr_http, sns, syslog, stdout, tcp, udp, webhdfs, websocket, xmpp, zabbix, zeromq
Riemann - an event stream processor
● Very low latency
● Extensive Clojure API
● API can also be extended with Java
(streams
  (where (and (service #"^riak") (state "critical"))
    (email "[email protected]")))
(by [:host :service])
(by [:host :service]
  (changed :state
    (rollup 5 3600
      (email "[email protected]"))))
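The by/changed combination above can be pictured with a small Python sketch: events are split per (host, service) pair, and only events whose state differs from the previous one for that pair pass through. This is an illustration of the idea only, not Riemann's implementation. make_change_detector is a hypothetical helper, treating the first event for a pair as a change is an assumption of the sketch, and the rollup step is left out.

```python
def make_change_detector(key_fields=("host", "service")):
    """Return a function that reports whether an event's state changed
    compared with the previous event seen for the same key."""
    last_state = {}  # (host, service) -> last seen state

    def changed(event):
        key = tuple(event.get(f) for f in key_fields)
        previous = last_state.get(key)
        last_state[key] = event.get("state")
        # Assumption: the first event for a key counts as a change.
        return previous != event.get("state")

    return changed


changed = make_change_detector()
events = [
    {"host": "a", "service": "riak", "state": "ok"},
    {"host": "a", "service": "riak", "state": "ok"},
    {"host": "a", "service": "riak", "state": "critical"},
    {"host": "b", "service": "riak", "state": "ok"},
]
results = [changed(e) for e in events]
print(results)
```

Keeping separate state per (host, service) is what stops one noisy host from masking a state change on another.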
(use 'clojure.java.io)

(defn get_messages [filename]
  (with-open [rdr (reader filename)]
    (doall (line-seq rdr))))

(def messages (get_messages "/etc/riemann.conf.d/riemann.whitelist"))

(def whitelist_pattern
  (str "^((?!(" (clojure.string/join "|" messages) ")).)*$"))

(def email (mailer {:from "[email protected]"}))

(streams
  (by :service
    (where (or (state "2") (state "1") (state "0"))
      (where (description (re-pattern whitelist_pattern))
        (rollup 3 3600
          (email "[email protected]"))))))
Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB
tftp: client does not accept options
DHCP packet received on [a-zA-Z0-9-_]+ which has no address
Can\'t create new lease file: Permission denied
\[\-\] Authorization failed\. The request you have made requires authentication\. from 127\.0\.0\.1
\[\-\] \[instance: [a-zA-Z0-9-]+\] Instance not resizing[,] skipping migration\.
^.*dhcp-failover rejected: incoming update is less critical than outgoing update$
^.*Please use the the default quota class for default quota.$
^.*FAILED: Has an address record but no DHCID, not mine.$
^.*Found \d+ in the database and \d+ on the hypervisor.$
^.*Arguments dropped when creating context.*
^.*Failed to inspect.*of instance.*domain is in state of SHUTOFF
^.*Unknown base file: /var/lib/nova/instances/_base/*
^.*Couldn\'t obtain IP address of instance.*
\[*\] IPMI message handler: BMC returned incorrect response, expected*
\[-\] While synchronizing instance power states, found \d+ instances in the database and \d+ instances on the hypervisor
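The whitelist works through a negative lookahead: the combined pattern only matches descriptions in which no whitelisted message appears, so known noise never reaches the mailer. A minimal Python sketch of the same construction follows, using three entries from the list above. build_whitelist_pattern and should_alert are hypothetical names, the test descriptions are invented, and Python's re module stands in for the Java regex engine that Riemann actually uses, so edge cases may differ.

```python
import re


def build_whitelist_pattern(messages):
    """Join whitelist entries into one pattern that matches only text
    containing none of them (same shape as the Riemann config)."""
    return re.compile("^((?!(" + "|".join(messages) + ")).)*$")


# A few entries copied from the whitelist file.
whitelist = [
    "tftp: client does not accept options",
    r"^.*Arguments dropped when creating context.*",
    r"^.*Found \d+ in the database and \d+ on the hypervisor.$",
]
pattern = build_whitelist_pattern(whitelist)


def should_alert(description):
    # A match means nothing on the whitelist was found in the description.
    return pattern.search(description) is not None


print(should_alert("tftp: client does not accept options"))       # whitelisted
print(should_alert("kernel: I/O error on device sda"))            # unknown message
```

The effect is that anything not explicitly whitelisted generates an email, so new and unexpected messages surface by default.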
(use 'clojure.java.io)

(defn get_messages [filename]
  (with-open [rdr (reader filename)]
    (doall (line-seq rdr))))

(def messages (get_messages "/etc/riemann.conf.d/riemann.blacklist"))

(def blacklist_pattern
  (str "^?(" (clojure.string/join "|" messages) ").*$"))

(def pd (pagerduty "pagerduty_api_key"))

(streams
  (by :host
    (where (description (re-pattern blacklist_pattern))
      (with {:state "Failure" :service "Hardware"}
        (throttle 1 43200
          #(info %)
          (:trigger pd))))))
EDAC MC\d+: \d+ CE error on CPU#\d+Channel#\d+_DIMM#\d+.*
ata\d+.\d+: exception.*
ata\d+.\d+: failed command:.*
ata\d+: link is slow to respond, please be patient.*
ata\d+.\d+:.*failed.*
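The blacklist path is the mirror image: anything matching a known hardware-failure pattern is relabelled and paged, but at most once per host every 12 hours (43200 seconds). The Python sketch below illustrates that flow under some assumptions: the Throttle class and handle function are hypothetical, timestamps are passed in explicitly, the window is a simple fixed one that may not match Riemann's throttle exactly, and the leading "^?" of the original pattern is dropped because Python's re rejects a quantifier on an anchor.

```python
import re

# A few entries copied from the blacklist file.
blacklist = [
    r"EDAC MC\d+: \d+ CE error on CPU#\d+Channel#\d+_DIMM#\d+.*",
    r"ata\d+.\d+: failed command:.*",
    r"ata\d+: link is slow to respond, please be patient.*",
]
blacklist_re = re.compile("(?:" + "|".join(blacklist) + ")")


class Throttle:
    """Allow at most `limit` events per `period` seconds for each key."""

    def __init__(self, limit, period):
        self.limit = limit
        self.period = period
        self.windows = {}  # key -> (window_start, count)

    def allow(self, key, now):
        start, count = self.windows.get(key, (now, 0))
        if now - start >= self.period:
            start, count = now, 0  # start a new window
        if count >= self.limit:
            self.windows[key] = (start, count)
            return False
        self.windows[key] = (start, count + 1)
        return True


throttle = Throttle(limit=1, period=43200)


def handle(event, now):
    """Return the relabelled event to page on, or None to stay quiet."""
    if blacklist_re.search(event["description"]) is None:
        return None
    if not throttle.allow(event["host"], now):
        return None
    return dict(event, state="Failure", service="Hardware")


# Hypothetical event for illustration.
disk_error = {"host": "node1", "description": "ata3.00: failed command: READ DMA"}
print(handle(disk_error, now=0))
print(handle(disk_error, now=100))
```

Throttling per host keeps a failing disk, which can log the same error thousands of times, from paging more than once per window.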
Log files
log courier
logstash
elasticsearch
riemann
kibana
pagerduty
Thanks for listening!