Date post: 27-Jun-2020
Transcript
Page 1

Copyright © 2013 Splunk Inc.

Sean Delaney, Client Architect, Splunk #splunkconf

Log Velocity Monitoring

Page 2

Legal Notices

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners.

©2013 Splunk Inc. All rights reserved.

Page 3

About Me

• Splunk Client Architect
– Splunker for 2+ years
– Using Splunk for 6+ years
– Large Splunk deployments
• Previously
– Splunk Professional Services
– 10+ years in Production Services for a large Internet security company

Page 4

Agenda

• Log Velocity
• Monitoring and Alerting
• Drill Down Demo

Page 5

Log  Velocity  

Page 6

Traditional Velocity aka Speed

[Chart: Velocity (m/s), plotting Distance (m) against Time (s)]

Page 7

Log Velocity

• Logging data rate
– Events per second (eps)
– Data volume per second (kbps)
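Here is a minimal sketch of the two rates above. It divides the event count and the data volume seen in one window by the window length. The `WindowTotals` structure and its field names are hypothetical. `kbps` is treated as kilobytes per second, which is an assumption chosen to line up with the `kb` field listed later for metrics.log, so check the units of your own data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class WindowTotals:
    """Hypothetical totals observed over one measurement window."""
    events: int       # number of events seen in the window
    kilobytes: float  # data volume seen in the window, in KB (assumed unit)
    seconds: float    # window length in seconds


def log_velocity(w: WindowTotals) -> Tuple[float, float]:
    """Return (eps, kbps): events per second and kilobytes per second."""
    if w.seconds <= 0:
        raise ValueError("window length must be positive")
    return w.events / w.seconds, w.kilobytes / w.seconds


# 18,000 events totalling 9,000 KB over a 10-minute (600 s) window
eps, kbps = log_velocity(WindowTotals(events=18_000, kilobytes=9_000.0, seconds=600))
print(f"eps={eps:.1f} kbps={kbps:.1f}")  # eps=30.0 kbps=15.0
```

Because both rates share the same denominator, a jump in eps without a matching jump in kbps suggests smaller events, and the reverse suggests larger ones.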

Page 8

Increases or Decreases in Log Velocity

• Environmental changes
– New services, servers or data sources added to Splunk
– Application change (new code deployment, configuration change)
– Networking change (firewall, routing)
– Service migration
• Traffic changes
– More users accessing service(s)
– Change in application logging level (debug mode)
– Core component (e.g. a database) is down or has intermittent issues
– Logs not being generated or forwarded (changed log file directory, syslog server down)

Page 9

Higher Level Approach to Service Monitoring

• Look at the forest, not just the trees
– License usage
– Event rate per index, sourcetype, source
– Network throughput
– Monitor event counts for errors and alerts

Page 10

Is There an Issue?

• Operations team is alerted that Splunk is slow
• Service owners notice a slowdown, then their website becomes unavailable
• Customer Service call volume jumps
• Operations team is now flooded with monitoring alerts, phone calls, emails and chat messages

Page 11

Is There an Issue?

• Splunk admins notice a major spike in indexing volume

Page 12

Is There an Issue?

• Further investigation detects a corresponding spike in webserver access and error logs

Page 13

Is There an Issue?

• Service was DoSed (from an internal source)
• Early detection would have mitigated the issue and reduced customer impact
• Alerts on either indexing volume or webserver event counts would have notified Operations of the change in activity

Page 14

Log Velocity Use Cases

• Security use cases
– DoS/DDoS
– Service or port knocking
• Webserver access and error logs
• Marketing campaigns

Page 15

Log Velocity Use Cases

• Application error logs
• Production code updates/rollouts
• Infrastructure changes
• Network routing or spanning tree changes
• DNS/SMTP changes

Page 16

Monitoring  Log  Velocity  

Page 17

Where to Measure Log Velocity

• Splunk's metrics.log:
– Event counts (ev), events per second (eps)
– Data indexed (kb), index throughput (kbps)
• Metrics data is logged by group:
– per_index_thruput
– per_sourcetype_thruput
– per_source_thruput
– per_host_thruput
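The Python sketch below shows what these per-group fields look like outside Splunk. It pulls the key=value pairs out of a metrics.log throughput line. The sample line is illustrative only: its timestamp, values and field order are assumptions, so compare it against a real line from your own metrics.log before reusing the regex. `parse_metrics_line` and `thruput` are hypothetical helper names.

```python
import re
from typing import Dict, Optional

# key=value pairs; values may be double-quoted, e.g. series="main"
KV_RE = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')

# Illustrative line only; timestamp, field order and values are assumed
SAMPLE = ('06-27-2013 10:20:01.123 -0700 INFO  Metrics - group=per_index_thruput, '
          'series="main", kbps=2.513, eps=12.097, kb=77.902, ev=375, avg_age=0.912, max_age=3')


def parse_metrics_line(line: str) -> Dict[str, str]:
    """Return the key=value pairs after the 'Metrics - ' marker (empty dict if absent)."""
    _, marker, payload = line.partition("Metrics - ")
    if not marker:
        return {}
    return {key: value.strip('"') for key, value in KV_RE.findall(payload)}


def thruput(line: str, group: str = "per_index_thruput") -> Optional[dict]:
    """Return the velocity fields for one series, or None if the line is another group."""
    fields = parse_metrics_line(line)
    if fields.get("group") != group:
        return None
    return {
        "series": fields["series"],
        "ev": int(fields["ev"]),
        "eps": float(fields["eps"]),
        "kb": float(fields["kb"]),
        "kbps": float(fields["kbps"]),
    }


print(thruput(SAMPLE))
```

Passing a different `group` value, such as `per_sourcetype_thruput`, selects lines from that group instead. The `series` field then names a sourcetype rather than an index.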

Page 18

Where to Measure Log Velocity

• Example searches:
– index=_internal source="*/metrics.log" "group=per_index_thruput" | timechart span=10m sum(ev) by series
– index=_internal source="*/metrics.log" "group=per_index_thruput" | timechart span=10m avg(kbps) by series
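Both searches bucket metrics events into 10-minute spans and aggregate each series. A rough Python equivalent can help you reason about what the chart shows. It has two simplifications that may not match `timechart`: it floors each timestamp to a multiple of the span in epoch seconds, and it omits empty buckets. The `timechart` function here is a hypothetical stand-in, not Splunk code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[float, str, float]  # (epoch seconds, series, value such as ev or kbps)


def timechart(records: Iterable[Record], span: int = 600,
              agg: str = "sum") -> Dict[int, Dict[str, float]]:
    """Aggregate values per (span bucket, series), like `timechart span=10m <agg>(x) by series`."""
    if agg not in ("sum", "avg"):
        raise ValueError("agg must be 'sum' or 'avg'")
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, series, value in records:
        bucket = int(ts // span) * span  # floor the timestamp to its bucket start
        totals[(bucket, series)] += value
        counts[(bucket, series)] += 1
    table: Dict[int, Dict[str, float]] = defaultdict(dict)
    for (bucket, series), total in sorted(totals.items()):
        table[bucket][series] = total if agg == "sum" else total / counts[(bucket, series)]
    return dict(table)


events = [(0, "main", 100), (300, "main", 50), (599, "_internal", 7), (600, "main", 20)]
print(timechart(events))             # sum per bucket and series
print(timechart(events, agg="avg"))  # mean per bucket and series
```

Summing `ev` answers "how many events arrived in this window". Averaging `kbps` answers "how fast was data arriving on average". That is why the two searches use different aggregations.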

Page 19

Where to Measure Log Velocity

• Other sources:
– Splunk license logs
– Custom event count searches, for example:

index=myapp error | timechart span=10m count

Page 20

Logging Workloads

• Log data workloads are normally cyclic
• Service peaks often correspond to business or trading hours

Page 21

Logging Workloads

• Weekday trends normally follow the same cycle
• Logging may drop off on weekends/holidays (business services)
• Log volume could be greater in the evening or on weekends (online gaming)
• Logging can go crazy
– Black Friday/Cyber Monday (online shopping)
• Take into account global/regional time zones
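One way to respect these cycles is to average history per (weekday, hour) slot, in the time zone that drives the business. A new hour is then compared against its own slot rather than a single global average. This sketch assumes you already have hourly counts. The function names are illustrative, and the fixed UTC-5 offset is an arbitrary stand-in for a real time zone that ignores daylight saving.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone, tzinfo
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

Slot = Tuple[int, int]  # (weekday 0=Monday, hour 0-23) in the chosen time zone


def seasonal_baseline(history: Iterable[Tuple[datetime, int]], tz: tzinfo) -> Dict[Slot, float]:
    """Average hourly event counts per (weekday, hour) slot, evaluated in tz."""
    slots = defaultdict(list)
    for when, count in history:
        local = when.astimezone(tz)  # `when` must be timezone-aware
        slots[(local.weekday(), local.hour)].append(count)
    return {slot: mean(counts) for slot, counts in slots.items()}


def expected_count(baseline: Dict[Slot, float], when: datetime, tz: tzinfo) -> Optional[float]:
    """Baseline for the same weekday and hour as `when`, or None if that slot was never seen."""
    local = when.astimezone(tz)
    return baseline.get((local.weekday(), local.hour))


UTC_MINUS_5 = timezone(timedelta(hours=-5))  # fixed offset; no daylight-saving handling
history = [
    (datetime(2013, 6, 3, 14, 0, tzinfo=timezone.utc), 100),   # Monday 09:00 at UTC-5
    (datetime(2013, 6, 10, 14, 0, tzinfo=timezone.utc), 140),  # following Monday 09:00
]
baseline = seasonal_baseline(history, UTC_MINUS_5)
print(expected_count(baseline, datetime(2013, 6, 17, 14, 30, tzinfo=timezone.utc), UTC_MINUS_5))
```

The same UTC instant falls into a different slot when evaluated in another time zone. This is the practical reason regional time zones matter when choosing the baseline.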

Page 22

Monitoring and Alerting

Alerting Thresholds

• When defining alerting thresholds, consider either setting a fixed upper boundary or deriving the threshold from your data workload
• Compare to the same time period yesterday, last week, last month

Page 23

Monitoring and Alerting

Alerting Thresholds

• Absolute thresholds:
– index=_internal source="*/metrics.log" group="per_index_thruput" series="main" | timechart span=10m sum(ev) as ev_count | stats max(ev_count) as max_ev | search max_ev>`lv_threshold`
• Macro used to hold the `lv_threshold` value:

[lv_threshold]
definition = 600
iseval = 0
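The absolute-threshold search sums `ev` into 10-minute buckets, takes the largest bucket, and fires when that bucket exceeds the macro value. The Python sketch below mirrors that logic, assuming you have (timestamp, ev) pairs for the `main` index. `absolute_threshold_alert` is a hypothetical name, and the constant simply copies the 600 from the macro definition.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

LV_THRESHOLD = 600  # mirrors `definition = 600` in the lv_threshold macro


def absolute_threshold_alert(events: Iterable[Tuple[float, int]], span: int = 600,
                             threshold: int = LV_THRESHOLD) -> Optional[int]:
    """Return the busiest bucket's event count if it exceeds threshold, else None."""
    buckets = defaultdict(int)
    for ts, ev in events:
        buckets[int(ts // span)] += ev  # timechart span=10m sum(ev) as ev_count
    if not buckets:
        return None
    max_ev = max(buckets.values())      # stats max(ev_count) as max_ev
    return max_ev if max_ev > threshold else None  # search max_ev>`lv_threshold`


print(absolute_threshold_alert([(0, 400), (100, 300), (700, 50)]))  # 700
print(absolute_threshold_alert([(0, 300), (700, 200)]))             # None
```

Keeping the threshold in a macro, as the slide does, means the alert search does not need editing when the boundary changes.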

Page 24

Monitoring and Alerting

Alerting Thresholds

• Compare to the same time on the previous day, the same day of the week, etc.:

earliest=-10m@m latest=@m index=_internal source="*/metrics.log" group="per_index_thruput" series="main" | stats sum(ev) as ev_count_1 | append [search earliest=-1450m@m latest=-1440m@m index=_internal source="*/metrics.log" group="per_index_thruput" series="main" | stats sum(ev) as ev_count_2] | stats first(ev_count_1) as ev_count_today, first(ev_count_2) as ev_count_yesterday | eval delta=abs(ev_count_today - ev_count_yesterday) | eval threshold=ev_count_yesterday*0.1 | where delta>threshold
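The search compares the last 10 minutes with the same 10 minutes 1,440 minutes (24 hours) earlier. It alerts when the absolute difference exceeds 10% of the earlier count. The Python sketch below does the same comparison, assuming you already have (timestamp, ev) pairs. The function names are hypothetical. The `offset` parameter is an illustrative addition so the same check can target last week (10,080 minutes) instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

Event = Tuple[datetime, int]  # (timezone-aware timestamp, ev)


def window_sum(events: Iterable[Event], earliest: datetime, latest: datetime) -> int:
    """Sum ev for events with earliest <= timestamp < latest."""
    return sum(ev for ts, ev in events if earliest <= ts < latest)


def velocity_changed(events: List[Event], now: datetime,
                     window: timedelta = timedelta(minutes=10),
                     offset: timedelta = timedelta(minutes=1440),
                     tolerance: float = 0.1) -> bool:
    """True when |current - earlier| > tolerance * earlier, as in the eval/where chain."""
    latest = now.replace(second=0, microsecond=0)  # snap to the minute, like latest=@m
    current = window_sum(events, latest - window, latest)
    earlier = window_sum(events, latest - offset - window, latest - offset)
    delta = abs(current - earlier)
    threshold = earlier * tolerance
    return delta > threshold


now = datetime(2013, 6, 27, 10, 7, 33, tzinfo=timezone.utc)
yesterday = (datetime(2013, 6, 26, 10, 0, tzinfo=timezone.utc), 1000)
print(velocity_changed([yesterday, (datetime(2013, 6, 27, 10, 0, tzinfo=timezone.utc), 1150)], now))  # True: 150 > 100
print(velocity_changed([yesterday, (datetime(2013, 6, 27, 10, 0, tzinfo=timezone.utc), 1050)], now))  # False: 50 <= 100
```

Snapping both windows to the minute keeps them the same length, so the two counts are directly comparable.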

Page 25

Monitoring and Alerting

Summary Indexing

• Summary indexing your log velocity has benefits:
– Faster loads for monitoring dashboards
– Faster stats for comparative alerting

Page 26

Error Log Velocity

• Monitor and baseline error counts for an application
– Table the top 50 error types/codes
• When a new code release is deployed, monitor for an increase in errors
– Table the top 50 error types/codes and compare with the results from the previous release
• Deploy a patch/hotfix/update, and repeat until a stable state has been re-established

sourcetype="apache_error" | rex "^(?:[^\]]*\]){3}\s*(?<phperr>[^\:]+)\:\s*(?<msg>.*)" | stats count by phperr,msg | sort - count | head 50 | fields count,msg
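To show what the `rex` extracts and how two releases might be compared, here is a Python sketch. It translates the slide's regular expression; Python writes named groups as `(?P<name>...)`. The pattern assumes three bracketed prefix fields such as `[date] [level] [client ip]`, and the sample lines are made up for illustration. `count_errors`, `top_errors` and `regressions` are hypothetical names. Comparing against the previous release's full counts, rather than only its top 50, is a design choice of this sketch.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Translation of the slide's rex; assumes three bracketed prefix fields.
ERR_RE = re.compile(r"^(?:[^\]]*\]){3}\s*(?P<phperr>[^:]+):\s*(?P<msg>.*)")

Key = Tuple[str, str]  # (phperr, msg)


def count_errors(lines: Iterable[str]) -> Counter:
    """Count (phperr, msg) pairs; lines that do not match the pattern are skipped."""
    counts: Counter = Counter()
    for line in lines:
        m = ERR_RE.match(line)
        if m:
            counts[(m.group("phperr"), m.group("msg"))] += 1
    return counts


def top_errors(lines: Iterable[str], n: int = 50) -> List[Tuple[Key, int]]:
    """Like: stats count by phperr,msg | sort - count | head n"""
    return count_errors(lines).most_common(n)


def regressions(previous: Iterable[str], current: Iterable[str],
                n: int = 50) -> List[Tuple[Key, int, int]]:
    """Entries in the current top n that are new or more frequent than in the previous release."""
    before = count_errors(previous)
    return [(key, count, before[key]) for key, count in top_errors(current, n)
            if count > before[key]]


warn = ("[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] "
        "PHP Warning:  Division by zero in /var/www/a.php on line 3")
fatal = ("[Wed Oct 11 14:33:05 2000] [error] [client 127.0.0.1] "
         "PHP Fatal error:  Call to undefined function foo() in /var/www/b.php on line 9")
previous_release = [warn]
current_release = [warn, warn, fatal]
print(top_errors(current_release))
print(regressions(previous_release, current_release))
```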

Page 27

Drill  Down  Demo  

Page 28

Summary

Monitoring log velocity provides additional insight into your environment:

• Detect and alert on environmental changes and abnormal traffic volumes
• Provide feedback on code deployments
• First-level alerting for issues
• Useful for NOC/SOC monitoring
• Starting point for drill-down investigations

Page 29

Questions?

Page 30

Next Steps

1. Download the .conf2013 Mobile App. If not on iPhone, iPad or Android, use the Web App.
2. Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags!

Page 31

THANK  YOU  


Recommended