Troubleshooting Core Dumps

8/11/2019 Troubleshooting Core Dumps


Postings may contain unverified user-created content and change frequently. The content is provided as-is and is not warrantied by Cisco.


Core dumps occur when a Linux process experiences a fault. The fault causes an outage of the affected process or service, which must restart to recover. During these incidents the server may remain up, but certain services may experience a brief outage. This document covers troubleshooting core dumps and backtraces that may occur on Communications Manager (CM or CUCM), Unity Connection (UC), Cisco Emergency Responder (CER), Cisco Unified Presence Server (CUPS), Cisco Unified Contact Center Express (UCCX or IPCC Express), or any product based on Cisco's Voice Operating System (VOS) appliance model.

Identifying Core Dump Events
Listing Core Dump Files
Performing Core Analysis
Understanding the Backtrace of a Core File
Example 1: File Size Limit Exceeded
Example 2: Core when memory leak reaches maximum process memory size
Example 3: Core Stack Corruption
Cisco Bug Toolkit Search
Troubleshooting Intentional Aborts
Useful Information for Creating TAC Service Requests

Identifying Core Dump Events

Two of the most common ways to identify the occurrence of a core dump in CallManager are:

CoreDumpFileFound RTMT alert messages found in Alert Central


Within RTMT Alert Central, more detail on the specific application that generated the core can be found by right-clicking the alert. An example of the core dump alert details is shown below:

[Screenshot: core dump alert details in RTMT Alert Central]

Application Event log alert messages indicating a core dump has occurred:

May 15 05:32:09 ccm-pub local7 2 : 0: May 15 09:32:08.865 UTC : %CCM_LPM-LPMTCT-2-CoreDumpFileFound: The new core dump file(s) have been found in the system. TotalCoresFound:1 CoreDetails:The following lists up to 6 cores dumped by corresponding applications. Core1:Cisco CallManager (core.10499.6.ccm.1273915815) App ID:Cisco Log Partition Monitoring Tool Cluster ID: Node ID: ccm-pub

May 15 05:32:15 ccm-pub local7 2 : 138: May 15 09:32:15.231 UTC : %CCM_RTMT-RTMT-2-RTMT-ERROR-ALERT: RTMT Alert Name:CoreDumpFileFound Detail:CoreDumpFileFound TotalCoresFound : 1 CoreDetails: The following lists up to 6 cores dumped by corresponding applications. Core1 : Cisco CallManager(core.10499.6.ccm.1273915815) AppID : Cisco Log Partition Monitoring Tool ClusterID : NodeID : ccm-pub . The alarm is generated on Sat May 15 05:32:08 EDT 2010. AppID:Cisco AMC Service Cluster ID: Node ID:ccm-pub
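When many alerts have to be reviewed, the core file names can be pulled out of these messages with a short script. The Python sketch below is illustrative only: the function names are hypothetical, and decode_core_name assumes, without confirmation from this document, that VOS core file names follow the layout core.<pid>.<signal number>.<process>.<epoch seconds>.

```python
import re
from datetime import datetime, timezone

# Matches both "Core1:Cisco CallManager (core...)" and "Core1 : Cisco CallManager(core...)".
CORE_RE = re.compile(r"Core(\d+)\s*:\s*(.+?)\s*\((core\.[^)\s]+)\)")


def cores_from_alert(message):
    """Return (index, application, core file) tuples listed in a CoreDumpFileFound alert."""
    return [(int(idx), app, core) for idx, app, core in CORE_RE.findall(message)]


def decode_core_name(core_file):
    """Split a core file name, ASSUMING the layout core.<pid>.<signal>.<process>.<epoch>."""
    parts = core_file.split(".")
    if len(parts) < 5 or parts[0] != "core":
        raise ValueError(f"unexpected core file name: {core_file!r}")
    return {
        "pid": int(parts[1]),
        "signal": int(parts[2]),            # e.g. 6 is SIGABRT on Linux
        "process": ".".join(parts[3:-1]),   # keeps any dots inside the process name
        "time": datetime.fromtimestamp(int(parts[-1]), tz=timezone.utc),
    }
```

With that assumption, core.10499.6.ccm.1273915815 decodes to PID 10499, signal 6 (SIGABRT on Linux), process ccm and 2010-05-15 09:30:15 UTC, a little under two minutes before the 09:32:08 UTC alert above. That is consistent with the assumed layout but does not prove it.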

Listing Core Dump Files

On the CallManager server in question, a list of core dumps can be obtained by issuing the following command:

utils core list (CallManager version 5.x, 6.x)

utils core active list (CallManager version 7.x and later)

An example of 'utils core list' output is provided below, showing the CCM service as the core dump generator:

[Screenshot: 'utils core list' output]

Output of the command 'utils core active list' is similar to the example above; the difference is the partition parameter ("active") in the command. This parameter was added in CallManager 7.x so that core files on the inactive partition (the previous CM version on the system, if an upgrade has taken place) can be listed without performing a version switch and reboot. To list core files on the inactive partition, supply "inactive" instead of "active": 'utils core inactive list'.

An example of 'utils core active list' is provided below:

[Screenshot: 'utils core active list' output]

Performing Core Analysis

Once the core dump instance has been identified via the list command, the next step is to obtain the core file backtrace for review. This function is provided by the following command:

utils core analyze (CallManager version 5.x, 6.x)

utils core active analyze (CallManager version 7.x and later)

An example of the 'utils core analyze' command is provided below, where we supply a ccm service core file that was generated on 11/30/2009 at 11:11:50:

[Screenshot: 'utils core analyze' output]

As with 'utils core active list', core file analysis can also be performed on the inactive partition via the 'utils core inactive analyze' command. This feature is available in CallManager 7.x and later. A screenshot of the 'utils core active analyze' command is provided below:

[Screenshot: 'utils core active analyze' output]

In both examples, a warning states that this procedure causes a considerable amount of I/O and may impact system performance. During the analysis process, the raw core file is parsed and interpreted into a backtrace that can be used to identify the cause of the core dump.

On average, the analysis completes in a minute or less. The warning about system performance is a suggestion to run this command during off-peak hours to avoid a potential resource issue.
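The version and partition rules for the list and analyze commands can be summarized in a small Python helper, for example when writing instructions for clusters on different releases. It only assembles command strings and does not connect to a node. The function name is hypothetical, and it assumes that the analyze variants take the core file name as their final argument, as in the analyze example above.

```python
def core_command(version, action, partition="active", core_file=None):
    """Build the VOS CLI core command for a CallManager version such as '7.1.3.32900-4'."""
    if action not in ("list", "analyze"):
        raise ValueError("action must be 'list' or 'analyze'")
    if partition not in ("active", "inactive"):
        raise ValueError("partition must be 'active' or 'inactive'")
    major = int(version.split(".", 1)[0])
    if major < 7:
        # 5.x and 6.x have no partition keyword, so only the active partition is reachable.
        if partition != "active":
            raise ValueError("inactive-partition core commands need CallManager 7.x or later")
        words = ["utils", "core", action]
    else:
        words = ["utils", "core", partition, action]
    if action == "analyze":
        # ASSUMPTION: the core file name is passed as the last argument.
        if not core_file:
            raise ValueError("analyze needs a core file name")
        words.append(core_file)
    return " ".join(words)
```

For example, core_command("7.1.3.32900-4", "list") returns "utils core active list", while core_command("6.1.5", "list") returns "utils core list".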

Understanding the Backtrace of a Core File

The chief component of the core analysis process is retrieving the backtrace for review. Once the analysis command has been executed, a section titled "Backtrace" is displayed on the command line, similar to the screenshot below:

[Screenshot: Backtrace section of the analyze output]

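The core backtrace output is composed of several function calls (stack frames), denoted by #0, #1, #2, etc. These lines show the calls that were on the stack of the faulting thread at the time of the service fault: #0 is the innermost call, the one executing when the fault occurred, and each higher-numbered frame is the caller of the one before it. In many cases, a backtrace signature is a unique fingerprint that can identify a particular known or new defect in CallManager.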
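When backtraces are compared outside the CLI, for example across several cores, it helps to split them into frames first. The Python sketch below assumes the gdb frame layout seen in this document (#N, an optional address followed by "in", the function, its argument list, then "at file:line" or "from library") and re-joins frames that the terminal wrapped across lines. The names parse_backtrace and FRAME_RE are hypothetical.

```python
import re

# One gdb frame: "#N [0xADDR in ]function (args) [at file:line | from library]".
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+"
    r"(?:(?P<addr>0x[0-9a-fA-F]+)\s+in\s+)?"
    r"(?P<func>[^(]+?)\s*\("
    r"(?P<args>.*)\)"
    r"(?:\s*(?:at|from)\s*(?P<loc>.*?))?\s*$"
)


def parse_backtrace(text):
    """Split gdb backtrace text into frame dicts, re-joining lines wrapped by the terminal."""
    frames = []
    for chunk in re.split(r"(?=#\d+\s)", text):
        chunk = " ".join(chunk.split())  # collapse line wraps and repeated spaces
        match = FRAME_RE.match(chunk)
        if match is None:
            continue  # empty or incomplete frame, e.g. a truncated "#4 0x00000000 in ?"
        frame = match.groupdict()
        frame["num"] = int(frame["num"])
        frame["loc"] = frame["loc"] or None  # "at" with no location counts as unknown
        frames.append(frame)
    return frames
```

Each frame comes back as a dict with num, addr, func, args and loc keys, so the function names alone (the most stable part of a signature) can be compared between cores.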

Example 1: File Size Limit Exceeded

Core was generated by `/usr/local/cm/bin/ccm'.
Program terminated with signal 25, File size limit exceeded.
#0 0x006

In this example, the process was attempting to write to a file. The write exceeded the process's file size limit, so the process received signal 25 (SIGXFSZ on Linux), which terminated it and generated a core file. The cause, "Program terminated with signal 25, File size limit exceeded.", is a direct match to:

CSCsu94937 Multiple services core dumping with signal 25, File size limit exceeded.
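The termination reason can also be read programmatically. This Python sketch looks for gdb's "Program terminated with signal" line and maps the number to a name using a small table of Linux x86 signal numbers, on the assumption that the core came from the x86 Linux platform underlying VOS; signal numbering differs on some other architectures. The function name is hypothetical.

```python
import re

# Linux x86 signal numbers (ASSUMED to match the VOS platform); other architectures differ.
LINUX_X86_SIGNALS = {
    3: "SIGQUIT", 4: "SIGILL", 6: "SIGABRT", 7: "SIGBUS",
    8: "SIGFPE", 11: "SIGSEGV", 24: "SIGXCPU", 25: "SIGXFSZ",
}

TERMINATED_RE = re.compile(r"Program terminated with signal (\w+), ([^.\n]+)")


def termination_signal(gdb_output):
    """Return (signal name, description) from gdb's 'Program terminated' line, or None."""
    match = TERMINATED_RE.search(gdb_output)
    if match is None:
        return None
    sig, description = match.groups()
    if sig.isdigit():
        name = LINUX_X86_SIGNALS.get(int(sig), f"signal {sig}")
    else:
        name = sig  # some gdb versions print the symbolic name (e.g. SIGXFSZ) directly
    return name, description
```

For the output above this gives ("SIGXFSZ", "File size limit exceeded"), which is also a reasonable first search string for the Bug Toolkit.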

Example 2: Core when memory leak reaches maximum process memory size

Memory leak in CCM process, resulting in intentional abort.

#0 0x00a157a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x01276825 in raise () from /lib/tls/libc.so.6
#2 0x01278289 in abort () from /lib/tls/libc.so.6
#3 0x0050d58b in __gnu_cxx::__verbose_terminate_handler () from /usr/local/cm/lib/libstlport.so.5.1
#4 0x0050b2a1 in __cxxabiv1::__terminate () from /usr/local/cm/lib/libstlport.so.5.1
#5 0x0050b2d6 in std::terminate () from /usr/local/cm/lib/libstlport.so.5.1
#6 0x0050b41f in __cxa_throw () from /usr/local/cm/lib/libstlport.so.5.1
#7 0x0050b86c in operator new () from /usr/local/cm/lib/libstlport.so.5.1
#8 0x0a06bb2d in SdlProcessBase::operator new (size=102700) at SdlProcessBase.cpp:105
#9 0x0a0014e2 in H245SessionManager::create (parentId={mSdlProcessName = 0x0, mSdlNodeId = 4, mSdlAppId = 100, mSdlProcessNumber = 150, mSdlProcessInstance = 2629}, vH245TerminalType=H245_Gateway, vH245TransportConnectionMode=H245Client, vH245IpAddress=404699044, vH245IpPort=40076, vTCPTos=96, vPassThruMSD=false, vTCSTimeout=10, vFastStartInd=0, vFsAudioOutgoingLCN=0, vFsAudioIncomingLCN=0, pktCaptureContext=0xbffab74d "", allowTCPKeepAlivesForH323=true) at ProcessH245SessionManager.cpp:221
#10 0x08a5629c in H245Interface::start_Transition (this=0xbff99008, s=@0x5c70990) at /vob/ccm/Common/Include/Sdl/SdlProcessBase.hpp:123
#11 0x08a99354 in H245Interface::fireSignal (this=0xbff99008, sdlSignal=@0x5c70990) at /vob/ccm/Common/Include/Sdl/SdlProcessBase.hpp:175
#12 0x0a06c904 in SdlProcessBase::inputSignal (this=0xbff99008, rSignal=0x5c70990, traceType=SdlSystemLog::SignalRouterThread, highPriority=0, normalPriority=0, lowPriority=0, veryLowPriority=0, lazyPriority=0, dbUpdatePriority=0) at
#13 0x0a0746ce in SdlRouter::callProcess (this=0xe225ac0, _sdlSignal=0x5c70990, _deleteSignal=@0x36b8d07, _traceType=SdlSystemLog::SignalRouterThread, _hp=0, _np=0, _lp=0, _vlp=0, _lzp=0, _dbp=0) at SdlRouter.cpp:371
#14 0x0a0740f3 in SdlRouter::scheduler (sdlRouter=0xe225ac0) at SdlRouter.cpp:281
#15 0x05514bd7 in ACE_OS_Thread_Adapter::invoke (this=0xfe57a30) at OS_Thread_Adapter.cpp:94
#16 0x054d5087 in ace_thread_adapter (args=0x0) at Base_Thread_Adapter.cpp:137
#17 0x00db73cc in start_thread () from /lib/tls/libpthread.so.0
#18 0x0131a96e in clone () from /lib/tls/libc.so.6

In this example, the CCM process cores due to a memory leak and the resulting resource exhaustion. Backtraces that include a failing call to "operator new" are typically the result of a memory leak. The process had already requested the maximum amount of memory allowed by the operating system, so the allocation in frame #7 failed; the exception it threw (__cxa_throw, frame #6) was not handled, and std::terminate aborted the process, forcing a core. The core alone can show that the process ran out of memory, but not which code is leaking it, so other methods must be used to identify the source of the leak. Frequently this is possible by parsing SDL traces to identify objects that are "Started" or "Created" and not subsequently "Stopped". From the traces, the above core was eventually diagnosed as:

CSCte50152 Memory Leak in CCM due to Transient SIP Connections.
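The "Started or Created but never Stopped" check is essentially bookkeeping over trace events. The Python sketch below uses a hypothetical, simplified line layout ("Started: <ProcessName>(<instance>)"); real SDL trace lines are formatted differently, so the regular expression is an assumption that would need adapting. It counts, per process name, the objects still open at the end of the trace.

```python
import re
from collections import Counter

# HYPOTHETICAL simplified layout: "<Started|Created|Stopped>: <ProcessName>(<instance>)".
EVENT_RE = re.compile(r"\b(Started|Created|Stopped)\b:?\s+(\w+)\(([^)]*)\)")


def unmatched_objects(trace_lines):
    """Count, per process name, objects that were Started/Created but never Stopped."""
    live = set()
    for line in trace_lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not an object lifecycle event
        event, name, instance = match.groups()
        key = (name, instance)
        if event == "Stopped":
            live.discard(key)
        else:
            live.add(key)
    return Counter(name for name, _ in live)
```

A process type whose count keeps growing across successive trace files is a candidate for the leak; this is a heuristic for narrowing the search, not a diagnosis.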

Example 3: Core Stack Corruption

Memory corruption results in a corrupted stack, with "??" characters in place of function calls.

#0 0x4e52500a in ?? ()
#1 0xaffb3070 in ?? ()
#2 0xaffb9084 in ?? ()
#3 0x030dc678 in ?? ()
#4 0x00000000 in ?

In this example, a memory corruption incident occurred that overwrote the stack, so gdb cannot map the saved return addresses to functions and prints "??" in their place. Unfortunately, a search against this backtrace alone will not correlate to a known defect. It is recommended that the corresponding service logs (e.g. ccm traces, tomcat logs) and the complete core file be retrieved from the affected system for TAC review.
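A backtrace dominated by "??" frames can be flagged automatically before any Bug Toolkit search is attempted. In the Python sketch below, the function names and the 0.5 threshold are arbitrary assumptions. Unresolved frames can also come from libraries without symbols, so a high ratio is a cue to collect the full core file for TAC rather than proof of stack corruption.

```python
import re

# Captures the function token of each frame: "#N [0xADDR in ]function ...".
FUNC_RE = re.compile(r"#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def unresolved_ratio(backtrace):
    """Fraction of frames whose function gdb could not resolve ("??", or a truncated "?")."""
    names = FUNC_RE.findall(backtrace)
    if not names:
        return 0.0
    unresolved = sum(1 for name in names if set(name) == {"?"})
    return unresolved / len(names)


def looks_like_stack_corruption(backtrace, threshold=0.5):
    """Heuristic only; the 0.5 threshold is an arbitrary assumption, not a Cisco rule."""
    return unresolved_ratio(backtrace) >= threshold
```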

Cisco Bug Toolkit Search

Once a backtrace has been retrieved for the core dump event, the next step is to search the Bug Toolkit for potential known defects. The following defect will be used for this example:

CSCta39769 UnicastBridgeControl Causes CUCM to Crash

#0 0x097a0850 in UnicastBridgeControl::removeConfResources (this=0x6a69f698) at /vob/ccm/Common/Include/CallManager/TDCLCpShares.hpp:2622
#1 0x097ab5ae in UnicastBridgeControl::star_StationClose (this=0x6a69f698, s=@0x6a981938) at ProcessUnicastBridgeControl.cpp:2193
#2 0x097bff64 in UnicastBridgeControl::fireSignal (this=0x6a69f698, sdlSignal=@0x6a981938) at /vob/ccm/Common/Include/Sdl/SdlProcessBase.hpp:174
#3 0x09e4ae58 in SdlProcessBase::inputSignal (this=0x6a69f698, rSignal=0x6a981938, traceType=SdlSystemLog::SignalRouterThread, highPriority=0, normalPriority=0, lowPriority=0, veryLowPriority=0, lazyPriority=0, dbUpdatePriority=0) at
#4 0x09e52c1a in SdlRouter::callProcess (this=0xde9bcc8, _sdlSignal=0x6a981938, _deleteSignal=@0x324bd97, _traceType=SdlSystemLog::SignalRouterThread, _hp=0, _np=0, _lp=0, _vlp=0, _lzp=0, _dbp=0) at SdlRouter.cpp:372
#5 0x09e5263f in SdlRouter::scheduler (sdlRouter=0xde9bcc8) at SdlRouter.cpp:282
#6 0x00a00ef3 in ACE_OS_Thread_Adapter::invoke (this=0x10b70b90) at
#7 0x009c1abf in ace_thread_adapter (args=0x0) at Base_Thread_Adapter.cpp:137
#8 0x003bf371 in start_thread () from /lib/tls/libpthread.so.0
#9 0x01339ffe in clone () from /lib/tls/libc.so.6

The following line will be used to perform initial searching in the Bug Toolkit:

#2 0x097bff64 in UnicastBridgeControl::fireSignal(this=0x6a69f698, sdlSignal=@0x6a981938) at /vob/ccm/Common/Include/Sdl/SdlProcessBase.hpp:174

[Screenshot: Bug Toolkit search using this backtrace line]

In the search example screenshot, the memory addresses, which are unique to each core, have been removed from the search statement to ensure that matches are found. If no matches are returned, it may also be necessary to refine the search criteria to a specific CUCM version, as shown below:

[Screenshot: Bug Toolkit search refined to a specific software version]

Software version 7.0 was selected in the modified search above to narrow the results to the subset of defects applicable to that CUCM release. With the search re-submitted for defects related to version 7.0, the following results are displayed:

[Screenshot: Bug Toolkit results for version 7.0]
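The address stripping described for the Bug Toolkit search can be scripted when several backtraces have to be searched. The Python sketch below removes hex memory values (including @-prefixed references) and the frame number, which also varies between cores. It is an illustration with a hypothetical function name; whether its output matches the exact search string used in the screenshots is an assumption.

```python
import re

# Per-core values: hex memory addresses, optionally prefixed with '@' for references.
ADDRESS_RE = re.compile(r"@?0x[0-9a-fA-F]+")


def search_string(frame_line):
    """Strip per-core values from one backtrace frame for use as a search string."""
    text = " ".join(frame_line.split())              # undo terminal line wrapping
    text = ADDRESS_RE.sub("", text)                  # remove memory addresses
    text = re.sub(r"^#\d+\s+(?:in\s+)?", "", text)   # remove frame number and leftover "in"
    return " ".join(text.split())                    # tidy spaces left behind
```

Applied to frame #2 above, this produces "UnicastBridgeControl::fireSignal(this=, sdlSignal=) at /vob/ccm/Common/Include/Sdl/SdlProcessBase.hpp:174".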




Troubleshooting Intentional Aborts

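Core dumps whose backtrace includes an "IntentionalAbort" call were triggered deliberately by the service after it detected a problem, typically a system resource issue that was responsible for the service fault; the reason argument of IntentionalAbort describes what was detected. The following ccm service core dump backtrace will be used to demonstrate the steps involved in troubleshooting intentional aborts: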

#0 0x001627a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00d64815 in raise () from /lib/tls/libc.so.6
#2 0x00d66279 in abort () from /lib/tls/libc.so.6
#3 0x084c4e7a in preabort () at ProcessCMProcMon.cpp:101
#4 0x084c4e92 in IntentionalAbort (reason=0xa9fdbdc "CallManager's timers appear incorrect. This may be due to CPU or blocked function. Attempting to restart CallManager.") at ProcessCMProcMon.cpp:106
#5 0x084c66c3 in CMProcMon::verifySdlTimerServices () at ProcessCMProcMon.cpp:843
#6 0x084c7035 in CMProcMon::callManagerMonitorThread (cmProcMon=0xec122d0) at ProcessCMProcMon.cpp:439
#7 0x0107e5fb in ACE_OS_Thread_Adapter::invoke (this=0xf3ef3b8) at
#8 0x01040cbf in ace_thread_adapter (args=0x0) at Base_Thread_Adapter.cpp:137
#9 0x002dc3cc in start_thread () from /lib/tls/libpthread.so.0
#10 0x00e061ae in clone () from /lib/tls/libc.so.6
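The reason string passed to IntentionalAbort (frame #4 above) states what the service detected and is worth recording before the Perfmon review. The Python sketch below assumes the reason is printed as a quoted string after reason=0x..., as in this example; the function name is hypothetical.

```python
import re

# ASSUMED gdb rendering: IntentionalAbort (reason=0x... "text") at file:line
REASON_RE = re.compile(r'IntentionalAbort\s*\(reason=0x[0-9a-fA-F]+\s+"((?:[^"\\]|\\.)*)"')


def intentional_abort_reason(backtrace):
    """Return the IntentionalAbort reason string from a backtrace, or None if absent."""
    match = REASON_RE.search(backtrace)
    if match is None:
        return None
    return " ".join(match.group(1).split())  # re-join text wrapped across lines
```

For the backtrace above, the result is "CallManager's timers appear incorrect. This may be due to CPU or blocked function. Attempting to restart CallManager.", which points the review toward CPU and blocking, the first counters examined next.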

Using RTMT, retrieve the RIS Data Collector Perfmonlog information from the CUCM node that experienced the core dump, covering the timestamp of the core dump alert. Using the Windows Performance log viewer, review the process CPU utilization counters first, as shown below:

[Screenshot: process CPU utilization counters in the Perfmon log]

In the screenshot above, CPU utilization appears stable prior to the core dump incident and dips during the crash as resources are released. When troubleshooting a potential CPU utilization issue, the concern would instead be an upward trend in CPU utilization leading up to the core dump incident.
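To check for an upward CPU trend more objectively than by eye, one option is a least-squares slope over the samples just before the core timestamp. This Python sketch assumes the counter samples have already been read from the Perfmon log into (timestamp, value) pairs; the 30-minute default window and the function name are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta


def trend_before(samples, core_time, window=timedelta(minutes=30)):
    """Least-squares slope, in counter units per minute, of samples in the window before core_time."""
    points = sorted((t, v) for t, v in samples if core_time - window <= t < core_time)
    if len(points) < 2:
        return None  # not enough data to fit a line
    start = points[0][0]
    xs = [(t - start).total_seconds() / 60 for t, _ in points]
    ys = [float(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den if den else None


if __name__ == "__main__":
    core = datetime(2010, 5, 15, 9, 30)
    flat = [(core - timedelta(minutes=m), 20.0) for m in range(1, 31)]
    print(trend_before(flat, core))  # 0.0: stable CPU before the core
```

A slope close to zero, as in the stable CPU graph described above, argues against a CPU-driven abort; a clearly positive slope in the minutes before the core is the trend to look for.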

The next component to examine in the Perfmon data is the percentage of VM used by the system. In the current example, this counter is particularly high for the time period leading up to the core dump incident:

[Screenshot: % VM used counter in the Perfmon log]

Next, the VMSize counters for all processes are examined to determine what caused the gradual increase in memory utilization on the system. In this example, the VMSize counter for the CCM process is relatively high and sloping upwards, which indicates that CCM cored due to a memory leak:

[Screenshot: VMSize counter for the CCM process]
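The per-process comparison can be reduced to ranking processes by how much their VMSize grew before the core. This Python sketch assumes the Perfmon samples are already loaded as a dict from process name to (timestamp, value) pairs; the 6-hour default window, the function name and the example numbers are made up for illustration.

```python
from datetime import datetime, timedelta


def vmsize_growth(series_by_process, core_time, window=timedelta(hours=6)):
    """Rank processes by VMSize increase (last minus first sample) in the window before core_time."""
    growth = {}
    for process, samples in series_by_process.items():
        points = sorted((t, v) for t, v in samples if core_time - window <= t < core_time)
        if len(points) >= 2:
            growth[process] = points[-1][1] - points[0][1]
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    core = datetime(2010, 5, 15, 9, 30)
    hours = [core - timedelta(hours=h) for h in range(5, 0, -1)]
    series = {
        "ccm": [(t, 400_000 + 50_000 * i) for i, t in enumerate(hours)],  # made-up values
        "tomcat": [(t, 300_000) for t in hours],                           # made-up values
    }
    print(vmsize_growth(series, core))  # [('ccm', 200000), ('tomcat', 0)]
```

Growth alone does not separate a leak from a legitimate load increase, so the top entry is a candidate to confirm with service traces, as described for Example 2.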



Useful Information for Creating TAC Service Requests

When opening a new TAC service request to troubleshoot a core dump incident, providing the following information helps expedite the process:

Full CallManager version in use (e.g. 7.1.3.32900-4)

Date/Time of the core dump incident
  The Application Event log is useful for this information.
  Provide the output of 'utils core list' (or 'utils core active list' on 7.x and later) for absolute timestamps and core file names.

Offending service that generated the core dump (e.g. CCM, CEF, Tomcat)

Core file backtrace output

Core file

Service logs for the offending process
  For example, for a CCM core dump, provide Cisco CallManager traces for the time period of the incident.

RIS Data Collector Perfmonlog
  See the Troubleshooting Intentional Aborts section.

Date post:	02-Jun-2018
Upload:	aravindant11