
Troubleshooting Zenoss: A Support Perspective

Date post: 21-Jan-2018
Upload: zenoss

Copyright and Trademark Notices

Copyright © 2015–2017 Zenoss, Inc. All rights reserved. Confidential and Proprietary Information of Zenoss.

Zenoss and the Zenoss logo are trademarks or registered trademarks of Zenoss, Inc. in the United States and other countries. All other trademarks, logos, and service marks are the property of Zenoss or other third parties. Use of these marks is prohibited without the express written consent of Zenoss, Inc. or the third-party owner.

Cisco, Cisco UCS, and Cisco Unified Computing System are registered trademarks or trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other countries.

UNIX is a registered trademark of The Open Group.

Linux is a registered trademark of Linus Torvalds.

Oracle, the Oracle logo, MySQL, and Java are registered trademarks of the Oracle Corporation and/or its affiliates.

MariaDB is a trademark or registered trademark of MariaDB Corporation Ab in the European Union and United States of America and/or other countries.

VMware, VMware vCloud, and VMware vSphere are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions.

RabbitMQ is a trademark and/or registered trademark of Pivotal Software, Inc. in the United States and/or other countries.

Active Directory, Microsoft, SQL Server, Windows, Windows PowerShell, and Windows Server are registered trademarks of Microsoft Corporation in the United States and other countries.

All other trademarks, registered trademarks, product names, and company names or logos mentioned in this document are the property of their respective owners.

Revision: 5.3.0 GXZ-TRN

Copyright © Zenoss, Inc. 2009–2017. All rights reserved.

Confidential and Proprietary Information of Zenoss.


Viewing Control Center Logs

• If there is a problem with Control Center startup, view Control Center’s log output
• Control Center logs to systemd-journald
– Log path is /var/log/journal/
• To display logs in a terminal session:
– journalctl -lu serviced --since='2017-04-10 00:00:00'
o For logs with journalctl timestamps
o Good for diagnosing problems that may have occurred in the past
– journalctl -flu serviced -o cat
o For logs without journalctl timestamps
o Good for troubleshooting live issues; you don’t need extra timestamps if you’re watching what’s going on live
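The two invocations above can be wrapped in small shell helpers. This is a minimal sketch rather than anything shipped with Control Center: the function names cc_logs_since and cc_logs_follow are hypothetical, and only the journalctl options already shown on this slide are used.

```shell
# Hypothetical helpers; only the journalctl options shown above are used.

# Print serviced log entries, with journal timestamps, starting at the given time.
# Usage: cc_logs_since '2017-04-10 00:00:00'
cc_logs_since() {
    journalctl -lu serviced --since="$1"
}

# Follow serviced log output live, without the journal's own timestamps.
cc_logs_follow() {
    journalctl -flu serviced -o cat
}
```

Use cc_logs_since when reconstructing a past failure and cc_logs_follow while reproducing a problem live.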


Viewing Service Logs in Control Center


Displaying Log Messages

To access the Kibana interface, click the Logs tab in Control Center


Service Logs and Instance Logs

There are other ways to access Kibana that may prove to be more useful than going straight into Kibana and adjusting filters manually:

• Service Logs displays logs for all instances of a service
• Instance Logs displays logs for only the instance specified


Serviced Log Export

• The serviced log export command can export logs from all services, or all instances of a service and package them in a tgz

– The index.txt file in the tgz file contains an index of which files correlate to which services/instances

– You can specify the --service=$SERVICE switch to only export one service

o Example: serviced log export --service=zope
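Once an export exists, its index can be read without unpacking the whole archive. The following is a minimal sketch, assuming only that the archive contains a member whose name ends in index.txt (its exact path inside the tgz is not stated here); show_export_index is a hypothetical helper name.

```shell
# Hypothetical helper: print the index.txt stored inside a log export archive.
# Assumes the archive holds one member whose name ends in "index.txt".
# Usage: show_export_index /path/to/export.tgz
show_export_index() {
    archive="$1"
    # Find the index member's path inside the archive, wherever it lives.
    member="$(tar -tzf "$archive" | grep -m1 'index\.txt$')" || return 1
    # Extract just that member to standard output.
    tar -xzOf "$archive" "$member"
}
```

Reading the index first shows which file in the archive belongs to which service or instance before anything is extracted.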


Interacting with Containers


Control Center Service Instances

• These serviced service subcommands require an instance specifier:
– action
– attach
– logs

• To specify a service instance, append a slash (“/”) and the instance number to a service designation; for example: zope/0

• Instance numbers begin at zero (0)

• If the instance is running on a delegate host, serviced will attempt to open an SSH connection to the remote host
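The instance specifier format described above is simple enough to build and validate in a script. This is a minimal sketch; instance_spec is a hypothetical helper name, not a serviced command.

```shell
# Hypothetical helper: build an instance specifier such as "zope/0" from a
# service designation and an instance number. Instance numbers start at zero.
# Usage: instance_spec SERVICE NUMBER
instance_spec() {
    case "$2" in
        ''|*[!0-9]*)
            echo "instance number must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf '%s/%s\n' "$1" "$2"
}
```

For example, serviced service attach "$(instance_spec zope 0)" attaches to the first zope instance.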


serviced service

• serviced service attach [srvc-instance | docker-id] [command [arg…]]

Run command in the existing container where the specified service is running (/bin/bash by default)

• serviced service shell [-i] service [command [arg…]]

Run an arbitrary command (/bin/bash by default) in a new container created using service's service definition


serviced: Connecting to Services

Command  Spawns New Container  Connects to Existing Container  Runs Anywhere  Fixed Command List  Mounts /mnt/pwd
Action   -                     ✓                               ✓              ✓                   -
Attach   -                     ✓                               -              -                   -
Run      ✓                     -                               ✓              ✓                   ✓
Shell    ✓                     -                               ✓              -                   ✓


Example: Collector Daemon Debug Run

• Locate the appropriate service ID and attach to the service:
– serviced service status
– serviced service attach $SERVICEID
• Switch to the zenoss user:
– su - zenoss
• Run the daemon:
– $DAEMONNAME run -v10 -d $DEVICEID --monitor=$COLLECTORID
• For example:
– zenmodeler run -v10 -d centos73-test.zenoss.loc --monitor=austin-col1
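The debug-run syntax can be captured in a small wrapper so that no argument is forgotten. This is a minimal sketch, meant to run as the zenoss user inside the attached container; debug_run is a hypothetical function name, and it only assembles the command line shown on this slide.

```shell
# Hypothetical wrapper around the debug-run syntax shown above.
# Usage: debug_run DAEMON DEVICE_ID COLLECTOR
debug_run() {
    if [ "$#" -ne 3 ]; then
        echo "usage: debug_run DAEMON DEVICE_ID COLLECTOR" >&2
        return 2
    fi
    # Runs: $DAEMONNAME run -v10 -d $DEVICEID --monitor=$COLLECTORID
    "$1" run -v10 -d "$2" --monitor="$3"
}
```

For example, debug_run zenmodeler centos73-test.zenoss.loc austin-col1 reproduces the command above. Requiring the collector argument avoids silently falling back to the default collector, localhost.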


Example: Collector Daemon Debug Run (cont.)

You may occasionally see a message like this in your debug run:

2017-01-10 17:40:05,440 ERROR zen.collector.config: Configuration for cent73-test.zenoss.loc unavailable -- is that the correct name?

This is usually an indication of one of the following:
• You used the wrong daemon name. Confirm the daemon used to collect for that device type.
• You used the wrong device id. Confirm the id in the URL of the device overview page in the UI.
• You used the wrong collector name or failed to specify the collector (defaults to localhost).
• You did everything right but zenhub has no configuration for this device. Check the zenhub logs.


Event Flow Problems and Flood Mitigation


Identifying the Bottleneck - RabbitMQ

• If you suspect you’re having a problem with the event pipeline, check rabbitmq first:
– rabbitmq public endpoint
– rabbitmqctl in the rabbitmq container
o rabbitmqctl list_queues -p /zenoss
o rabbitmqctl list_queues -p /zenoss messages consumers name
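The three-column listing from the second rabbitmqctl command is easy to filter for trouble spots. This is a minimal sketch under stated assumptions: flag_queues is a hypothetical helper, the default limit of 1000 messages is an arbitrary illustration rather than a Zenoss recommendation, and the input is assumed to be whitespace-separated rows in the order messages, consumers, name.

```shell
# Hypothetical filter: reads "messages consumers name" rows on stdin and prints
# queues whose backlog exceeds a limit, or that hold messages with no consumer.
# The default limit of 1000 is an arbitrary assumption.
# Usage: rabbitmqctl list_queues -p /zenoss messages consumers name | flag_queues [LIMIT]
flag_queues() {
    awk -v limit="${1:-1000}" '
        # Skip banner lines; only rows that start with two numbers are queues.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            if ($1 + 0 > limit + 0 || ($1 + 0 > 0 && $2 + 0 == 0))
                print $3 ": " $1 " messages, " $2 " consumers"
        }'
}
```

A queue that keeps growing while it has consumers suggests a slow consumer; a queue with messages and zero consumers suggests the consuming service is down.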


Event Flood Mitigation - Overview

• If you’re experiencing a debilitating event flood and need to stop it, you can
– Identify the source
o Look for lots of new events in the event console
o Look at the collector daemon performance graphs in the CC UI
o Run a zenqdump on the affected queue to see if you can see a trend in the source of events; note: do this as the ‘zenoss’ user in any container with access to rabbitmq
– Run this to capture 1000 messages:
» zenqdump -M 1000 zenoss.queues.zep.rawevents &> /tmp/queuedump.txt
– Check the agent field on some sample events (sort before uniq -c so that identical lines are counted together):
» grep agent /tmp/queuedump.txt | sort | uniq -c | sort -rn


Event Flood Mitigation - Overview

• Fix the problem at the source
– Disable syslog/traps on a device that is spamming the system
– Set an iptables rule on the delegate host to block syslog/traps
• Stop the collector daemon that is receiving the flood
– This will end collection for all other targets on that collector, but sometimes that’s a smaller impact


Event Flood Mitigation - Overview (cont.)

• Once you’ve identified and corrected the source of the problem, you may have a large backlog of events queued in RabbitMQ that will need to be processed before you see real-time event data
– If you wish to discard the events in queue, you can use the zenq purge command to remove messages from a queue
o Runs from any container that has access to rabbitmq (zenhub, zeneventd, zeneventserver, zenactiond, zenjobs)


Best Practices: Monitoring Control Center


Monitoring Control Center

• Control Center ZenPack included with Resource Manager

• Adds /ControlCenter device class

• Components:
– CC-Pools

– CC-Hosts

– CC-Services

– CC-Running

– CC-Volumes

– CC-ThinPools


Monitoring Control Center

Zenoss highly recommends monitoring Control Center storage and being prepared to take action in the event that you’re running low on storage

• CC-Volumes
– Built-in thresholds for 80% and 90%
– Events go to /Storage/Full event class
• CC-ThinPools
– Built-in thresholds for 80% and 90%
– Events go to /Storage/Full event class

Failure to monitor your CC instance or take action before your system runs out of disk space can lead to data corruption or an emergency-stop
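For a quick manual check that mirrors the 80% and 90% thresholds above, usage figures can be filtered on the command line. This is a minimal sketch, not the ZenPack's implementation: check_usage is a hypothetical helper that expects rows of "name percent" on standard input, and feeding it from LVM assumes the pools are LVM thin pools.

```shell
# Hypothetical check mirroring the 80% and 90% thresholds above.
# Reads "name percent" rows on stdin and reports rows at or above 80%.
check_usage() {
    awk '$2 ~ /^[0-9.]+$/ {
        if ($2 + 0 >= 90)
            print $1 ": " $2 "% used, at or above 90%"
        else if ($2 + 0 >= 80)
            print $1 ": " $2 "% used, at or above 80%"
    }'
}
```

One possible source of input, run as root on the host, is lvs --noheadings -o lv_name,data_percent | check_usage. Rows without a percentage, such as ordinary logical volumes, are ignored.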


Monitoring Control Center

Control Center ZenPack Documentation
• https://www.zenoss.com/product/zenpacks/controlcenter


Best Practices: Time Synchronization


Time Synchronization

Zenoss requires that all hosts running Control Center have synchronized time
• Control Center’s internal networking mux and Zookeeper both rely heavily on time synchronization
• Time synch problems can lead to a wide range of failures
• If public NTP isn’t available, the Control Center master can be configured to be an NTP server and delegates can reference it for time
– This process is documented in the Control Center Installation Guide
• Relevant KB article here
– https://support.zenoss.com/hc/en-us/articles/115000450583
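A quick way to compare hosts is to read the clock offset each one reports. This is a minimal sketch that assumes the hosts run ntpd, so that ntpq -pn is available; selected_offset_ms is a hypothetical helper, and it relies on the standard ntpq peer listing, in which the selected peer is marked with a leading asterisk and the offset is the ninth column.

```shell
# Hypothetical helper: reads `ntpq -pn` output on stdin and prints the offset,
# in milliseconds, of the peer that ntpd has selected (marked with "*").
# Returns non-zero when no peer is selected, i.e. the host is not synchronized.
# Usage: ntpq -pn | selected_offset_ms
selected_offset_ms() {
    awk '/^\*/ { print $9; found = 1 } END { exit (found ? 0 : 1) }'
}
```

Run it on the master and on each delegate; a host that prints nothing and returns non-zero has no selected time source.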


Best Practices: Stay Informed!


Stay Informed!

You can subscribe to receive email notifications when we publish new Knowledge Base content

Use the “Follow” button on any section of the Zenoss Knowledge Base
• Subscribe to the Releases sections to be notified when we release software updates
• Subscribe to the Troubleshooting and System Administration & Configuration sections for content that may help you in your day-to-day role


New Features: Control Center 1.3.0


New Features in Control Center 1.3.0

• Rolling restarts for instanced services
– Each service instance will restart after the prior has restarted and passed its health checks.


New Features in Control Center 1.3.0


• Graceful application start, stop, and restart

– Hierarchical start means no more services “rushing the gate”

– Better service-to-delegate scheduling


New Features in Control Center 1.3.0

• Delegate Support for NAT
– Just specify the NAT device hostname and port when adding a new delegate host.


New Features in Control Center 1.3.0

• Emergency Shutdown
– Protects the system from data loss due to running out of disk space
– Based on a predictive threshold, not MinMax
– After correcting the issue, clear the emergency flag and start services


New Features in Control Center 1.3.0

• Features and fixes are detailed in CC 1.3.0 Release Notes
– https://www.zenoss.com/services-support/documentation


Best Practices: Support Tickets


Support Tickets: Terminology

• Using the correct terminology can shave off several interactions
– Events and Notifications, not “Alerts.”
– Modeling vs. Monitoring
– Plugins vs. Templates, Datasources vs. Datapoints
– Device Classes, Groups, Systems, Locations
• If you’re unsure, grab a screenshot.
– Annotate it as much as you need to. MSPaint is your friend!


Support Tickets: SLAs

• Adhering to SLAs helps everyone get better service
– Low: Workaround exists or “How do I Zenoss?”
– Normal: Non-critical function failure
– High: Critical function failure
– Urgent: Total product outage
• SLAs do overlap. We’re cool with that.
– E.g., notification failures are “High,” unless that’s all you use the product for; then they’re “Urgent.”
– Mark your configuration questions “Low,” but explain in the ticket why they are important.


Support Tickets: Tell us your goals

• Let us know what you’re trying to accomplish
• Don’t fixate on the method; we may have a better one!
• Check (and subscribe to) ZenPack and platform changelogs
– The feature you’re looking for may be in an update


Support Tickets: Providing Data

• Export events as .csv for easier viewing.
– If we need them in .xml format for replay, we’ll request that explicitly.
• Debug monitor/modeler syntax, one more time:
– $DAEMONNAME run -v10 -d $DEVICEID --monitor=$COLLECTORID
– zencommand run -v10 -d test-rhel54.zenoss.loc --monitor=austin-col1
• If no one is home
– Leave a voicemail with a ticket number or your issue.
– Chats are turned into tickets; leave as much information in your initial chat request as possible


Bonus De-mystification!


De-Mystifying Control Center Storage

Control Center has two primary datastores

• Docker pool

– Container filesystems

– /dev/mapper/docker-docker--pool

• Serviced pool (DFS)

– Distributed filesystem

o Served from the CC Master

– Automatically mounted via NFS on some RM hosts

o If a service that requires DFS is scheduled to start on a delegate host, serviced will mount the DFS there as long as that pool has DFS permissions

– /dev/mapper/serviced-serviced--pool


De-Mystifying Zookeeper

Control Center and Resource Manager both rely on Zookeeper

• Control Center’s Zookeeper instance runs as an internal service

– Responsible for scheduling services

– Keeps application configuration data

o Which services are running on which delegates, etc

– Quorum configuration is optional, but recommended

– Logs in the Control Center journal

• Resource Manager’s Zookeeper instance runs as part of the Zenoss.resmgr service

– Part of the metrics pipeline

o Used to manage HBase cluster

– 3 instances by default

o Configured in a quorum out-of-box

– Logs in its own container; gets sent to Elasticsearch/Logstash/Kibana


THANK YOU!

