Update on HP Caliper, the Performance Tool for Itanium® HP-UX and Linux Systems
September 2006
Speaker: Stephen Williams, Caliper Development Team, Hewlett-Packard
Previous webcasts
• An introduction to HP Caliper, what it is, and how to use it.
  Webcast: September 9, 2003
  Slides: http://h21007.www2.hp.com/dspp/files/unprotected/caliper/HPCaliper090903_ppt.ppt
• An update on HP Caliper for HP-UX and Linux Itanium.
  Webcast: September 21, 2004
  Slides: http://h21007.www2.hp.com/dspp/files/unprotected/caliper/Caliper36_092104.ppt
• Yet more HP Caliper: an update on the Itanium HP-UX and Linux Performance Tool.
  Webcast: September 20, 2005
  Slides: http://h21007.www2.hp.com/dspp/files/unprotected/caliper/Caliper050920.ppt
Agenda
• Quick overview of HP Caliper
• New features in HP Caliper 3.9, 4.0, and 4.1
• Future directions
• Hints and tips
• Summary
• DSPP information
• Q & A
What is HP Caliper?
• Per-process or system-wide performance measurement tool, for any Itanium®/Itanium®2 native applications
• For both HP-UX and Linux Integrity servers
• “Swiss army knife”
- Many different measurements
- Common user interface and options
- Multiple report formats – text, CSV, HTML
- Graphical user interface (new at 4.0)
• Uses Performance Monitor Unit (PMU) hardware and dynamic instrumentation as needed
Example command lines
caliper [measurement] [options] application [ app-opts ]
caliper [measurement] [options] PID1 [PID2 …]
caliper [measurement] [options] -w
Examples:
caliper fprof --html dir_name sweep3d
caliper dcache -t -p all cc himom.c
caliper cpu -w -o out.txt --dur 10
caliper scgprof -p myproc myscript.sh
caliper icache -o out.txt 8451 8452 8453
Measurements
Overview: cpu, ecount
Profiles: alat, branch, dcache, dtlb, fprof, icache, itlb, cycles
Traces: pmu_trace
Call graph: scgprof, cgprof*
Coverage: fcover*
Counts: acount*, fcount*
* not in Linux version
Used for:
What?
Where?
Details? (instrumented)
New features since HP Caliper 3.9
• Improved command line usability
• Quick Start reference card
• Improved reports for multi-process applications
• New ‘cycles’ measurement (dual-core Itanium 2 only)
• Richer sets of PMU events (dual-core Itanium 2 only)
• System-wide measurements
• Graphical user interface
Improved command line usability
• scgprof is now the default measurement:
  $ caliper myprog          (collect scgprof data on myprog)
• -a is no longer required for attaching to processes:
  $ caliper 1234            (collect scgprof data on process 1234)
• Re-reporting of last recorded data is simple:
  $ caliper report [options]
• Reporting from an HP Caliper database is simplified:
  $ caliper mydb.db
• New default: reports go down to source level, but not to instruction level (use -r all to get disassembly)
• New default: --process all (-p all)
Improved command line usability (short options)
More short options added. Here is the complete list:
Short Form                     Long Form
-d                             --database
-e (for elapsed time)          --duration
-f                             --options-file
-H (long form help)            --help
-m                             --metrics
-o                             --output-file
-p                             --process
-r                             --report-details
-s                             --sampling-spec
-t                             --threads all
-v                             --version
-w                             --scope system,attr_mod
-h or -? (short form help)     no equivalent
Improved command line usability (short measurement names)
Measurement names have been shortened:
New Name     Old Name
alat         alat_miss
acount       arc_count
branch       branch_prediction
cpu          cpu_metrics
dcache       dcache_miss
dtlb         dtlb_miss
fcount       func_count
fcover       func_cover
icache       icache_miss
itlb         itlb_miss
ecount       total_cpu
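Scripts written for earlier releases may still use the old measurement names. As a rough sketch, the renaming table can be held in a dictionary and applied to such scripts. The helper name `to_new_name` is hypothetical, and the mapping is simply the slide's table; nothing here is queried from Caliper itself.

```python
# Old-to-new measurement names, copied from the table on this slide.
OLD_TO_NEW = {
    "alat_miss": "alat",
    "arc_count": "acount",
    "branch_prediction": "branch",
    "cpu_metrics": "cpu",
    "dcache_miss": "dcache",
    "dtlb_miss": "dtlb",
    "func_count": "fcount",
    "func_cover": "fcover",
    "icache_miss": "icache",
    "itlb_miss": "itlb",
    "total_cpu": "ecount",
}


def to_new_name(measurement: str) -> str:
    """Return the shortened name for an old measurement name.

    Names that are not in the table (for example fprof, which was not
    renamed) are returned unchanged.
    """
    return OLD_TO_NEW.get(measurement, measurement)


if __name__ == "__main__":
    for old in ("dcache_miss", "total_cpu", "fprof"):
        print(old, "->", to_new_name(old))
```

Returning unknown names unchanged keeps the helper safe to run over scripts that already use the new names.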
Improved command line usability (simplified merge and diff syntax)
• --join deprecated. Instead, use:
$ caliper merge -o out.txt db1 [db2 . . .]
$ caliper diff -o out.txt db1 db2
• Note that you can merge per-process data in a single database:
$ caliper merge -o out.txt mydb
Quick Start reference card
http://h21007.www2.hp.com/dspp/files/unprotected/caliper/caliper-quick-start.pdf
Quick Start reference card (back side)
Improved reports for multi-process applications
• Caliper can now report:
- Across-process CPU events
- Histograms of processes and associated metrics:
  $ caliper report -o out.txt mydb
- Histograms of executables and associated metrics:
  $ caliper merge -o out.txt mydb
• Use --process-cutoff to change the number of processes or executables reported in the process or executable histogram.
Improved reports for multi-process applications (cont.)
Example of a merged process (executable) summary:
Process Summary
----------------------------------------------------------
% Total IP    Cumulat %     IP
Samples       of Total      Samples    Process
----------------------------------------------------------
     67.86        67.86        1797    be (1 instances)
     20.17        88.03         534    ecom (1 instances)
      5.25        93.28         139    u2comp (1 instances)
      4.83        98.11         128    ld (1 instances)
      0.72        98.83          19    sh (4 instances)
----------------------------------------------------------
[Minimum process entries: 5, percent cutoff: 2.00, cumulative percent cutoff: 100.00]
----------------------------------------------------------
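The percent and cumulative-percent columns in the summary above follow from the IP sample counts. The sketch below recomputes them as a worked example. The run total of 2648 samples is an assumption inferred from the printed percentages (it is not shown on the slide), and `summarize` is a hypothetical helper, not part of Caliper.

```python
from typing import List, Tuple


def summarize(samples: List[Tuple[str, int]], total: int) -> List[Tuple[str, int, float, float]]:
    """Return (process, samples, percent, cumulative percent) rows, hottest first."""
    rows = []
    running = 0
    for name, count in sorted(samples, key=lambda item: item[1], reverse=True):
        running += count
        rows.append((name, count, 100.0 * count / total, 100.0 * running / total))
    return rows


# IP sample counts copied from the slide's process summary.
SLIDE_SAMPLES = [("be", 1797), ("ecom", 534), ("u2comp", 139), ("ld", 128), ("sh", 19)]

# ASSUMPTION: total samples for the run, inferred from 1797 samples being 67.86%.
ASSUMED_TOTAL = 2648

if __name__ == "__main__":
    for name, count, pct, cum in summarize(SLIDE_SAMPLES, ASSUMED_TOTAL):
        print(f"{pct:6.2f} {cum:8.2f} {count:8d}  {name}")
```

The listed processes do not reach 100% cumulatively because processes below the report cutoffs are not shown.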
New measurement: cycles
• On dual-core Itanium 2 systems, HP Caliper can now report average cycles per bundle:
  $ caliper cycles -o out.txt -r all myprog
• The resulting report resembles an fprof report (showing IP sample hits), but provides the following additional information at disassembly level:
- Average cycles used to retire bundles. (With no stalls, a bundle should be retired in one cycle.)
- Instructions that were split issued (i.e., instructions not issued at the same time as the instruction that precedes them).
Richer PMU event sets
On dual-core Itanium 2 systems, HP Caliper now reports many more PMU events (and derivations) in one run. An example from an IP Sample (fprof) report:
Metrics Summed for Entire Run
--------------------------------------------------------
                         PLM
Event Name               U..K  TH  AC  AT       Count
--------------------------------------------------------
BE_L1D_FPU_BUBBLE.ALL    x___   0   T   F      175989
BE_RSE_BUBBLE.ALL        x___   0   T   F        3250
BE_FLUSH_BUBBLE.ALL      x___   0   T   F       33615
BACK_END_BUBBLE.FE       x___   0   F   F     1208011
CPU_OP_CYCLES.ALL        x___   0   T   F   752736219
BE_EXE_BUBBLE.ALL        x___   0   F   F      209463
BE_L1D_FPU_BUBBLE.L1D    x___   0   T   F      175989
BE_EXE_BUBBLE.GRALL      x___   0   F   F      199727
BE_EXE_BUBBLE.FRALL      x___   0   F   F        8014
BE_EXE_BUBBLE.GRGR       x___   0   F   F          67
CPU_CPL_CHANGES.ALL      x___   0   F   F        1731
--------------------------------------------------------
Richer PMU event sets (cont.)
% Unstalled execution (higher is better): 47.44 = % Unstalled execution
% of Cycles lost due to Front end stalls (lower is better): 6.43 = % stalls due to ICACHE, ITLB and branch execution
% of Cycles lost due to Pipeline flush stalls (lower is better): 9.23 = % stalls due to branch misprediction or interruption flush
% of Cycles lost due to data access stalls (lower is better): 33.23 = % stalls due to DCACHE and DTLB (includes FR/FR stalls)
% of Cycles lost due to RSE stalls (lower is better): 1.45 = % stalls due to RSE spilling/filling registers to/from memory
% of Cycles lost due to Scoreboard stalls (lower is better): 2.22 = % stalls due to FPU and register dependency (excludes FR/FR stalls)
Number of privilege level changes to/from all privileges: 73385 = CPU_CPL_CHANGES.ALL
% of Cycles lost due to Front end stalls: 6.43 = 100 * (BACK_END_BUBBLE.FE / CPU_OP_CYCLES.ALL)
% of Cycles lost due to Pipeline flush stalls: 9.23 = 100 * (BE_FLUSH_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
% of Cycles lost due to data access stalls (includes FR/FR stalls): 33.23 = % register load stalls (includes FR/FR) + % stalls due to L1D
% of Cycles lost due to RSE stalls: 1.45 = 100 * (BE_RSE_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
% of Cycles lost due to Scoreboard stalls (excludes FR/FR stalls): 2.22 = % stalls due to FPU + % register dependency stalls
% of Cycles lost due to register load stalls (includes FR/FR stalls): 26.81 = % GR/load dependency stalls + % FR/load or FR/FR dependency stalls
% of Cycles lost due to FR/load or FR/FR dependency stalls: 0.20 = 100 * BE_EXE_BUBBLE.FRALL / CPU_OP_CYCLES.ALL
% of Cycles lost due to GR/load dependency stalls: 26.61 = 100 * (BE_EXE_BUBBLE.GRALL - BE_EXE_BUBBLE.GRGR) / CPU_OP_CYCLES.ALL
% of Cycles lost due to stalls in L1D cache and L1/L2 DTLB: 6.42 = 100 * (BE_L1D_FPU_BUBBLE.L1D / CPU_OP_CYCLES.ALL)
% of Cycles lost due to register dependency stalls (excludes FR/FR stalls): 2.22 = (100 * BE_EXE_BUBBLE.ALL / CPU_OP_CYCLES.ALL) - % register load stalls
% of Cycles lost due to GR/GR dependency stalls: 2.14 = 100 * BE_EXE_BUBBLE.GRGR / CPU_OP_CYCLES.ALL
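The derived metrics above are simple arithmetic on the raw event counts. The sketch below applies only the formulas that the slide spells out. The function name, the dictionary keys of the result, and the input counts are all hypothetical; the Scoreboard and Unstalled figures are left out because the slide does not give a formula for the FPU term.

```python
from typing import Dict


def stall_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Apply the slide's stall formulas to a dictionary of raw PMU event counts."""
    cycles = counts["CPU_OP_CYCLES.ALL"]

    def pct(value: float) -> float:
        # Express an event count as a percentage of all CPU cycles.
        return 100.0 * value / cycles

    gr_load = pct(counts["BE_EXE_BUBBLE.GRALL"] - counts["BE_EXE_BUBBLE.GRGR"])
    fr_load = pct(counts["BE_EXE_BUBBLE.FRALL"])
    register_load = gr_load + fr_load
    l1d = pct(counts["BE_L1D_FPU_BUBBLE.L1D"])
    return {
        "front_end": pct(counts["BACK_END_BUBBLE.FE"]),
        "pipeline_flush": pct(counts["BE_FLUSH_BUBBLE.ALL"]),
        "rse": pct(counts["BE_RSE_BUBBLE.ALL"]),
        "gr_load_dependency": gr_load,
        "fr_load_dependency": fr_load,
        "register_load": register_load,
        "l1d_and_dtlb": l1d,
        "data_access": register_load + l1d,
        "register_dependency": pct(counts["BE_EXE_BUBBLE.ALL"]) - register_load,
        "gr_gr_dependency": pct(counts["BE_EXE_BUBBLE.GRGR"]),
    }


# HYPOTHETICAL counts chosen for easy arithmetic; not taken from any real run.
HYPOTHETICAL_COUNTS = {
    "CPU_OP_CYCLES.ALL": 1000,
    "BACK_END_BUBBLE.FE": 60,
    "BE_FLUSH_BUBBLE.ALL": 90,
    "BE_RSE_BUBBLE.ALL": 10,
    "BE_EXE_BUBBLE.ALL": 290,
    "BE_EXE_BUBBLE.GRALL": 280,
    "BE_EXE_BUBBLE.GRGR": 20,
    "BE_EXE_BUBBLE.FRALL": 5,
    "BE_L1D_FPU_BUBBLE.L1D": 64,
}

if __name__ == "__main__":
    for name, value in stall_percentages(HYPOTHETICAL_COUNTS).items():
        print(f"{name:22s} {value:6.2f}%")
```

Note that the event counts and the percentages shown on these two slides appear to come from different runs (CPU_CPL_CHANGES.ALL differs between them), so feeding the slide's counts into these formulas would not be expected to reproduce the slide's percentages.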
System-wide measurements
• Most measurements can now be made system-wide, across all processes and CPUs, in both user and kernel space.
• Three levels of sample attribution:
--scope system[,attr-mod|attr-proc|attr-none]
• -w equivalent to: --scope system,attr-mod
• PLM: --event-defaults user|kernel|all
• Sample command (collect IP samples in both kernel and user space for 20 seconds):
$ caliper fprof -o o.txt --ev all -w -e 20
System-wide measurements (cont.)
• Limitations on HP-UX:
- You must be logged in as the root user.
- Caliper may not be able to locate some executables and shared libraries, resulting in many “unattributed” samples. Workaround: use --module-search-path
• Limitations on Linux:
- You cannot exclude idle time and the caliper process (though we hope to provide this feature in the future).
• Limitations on both HP-UX and Linux:
- While caliper runs in system-wide mode, no other caliper process can be run on the same system.
New graphical user interface
• An Eclipse RCP application
• Makes it easy to:
- Perform measurement collections
- Browse Caliper databases
- See measurement data, with easy drill down
• Can be run on a remote Integrity server, with the display shown on your desktop X server (not recommended on a wide-area network), via:
  $ caliper -g
• Can be run locally on a Windows or Linux x86-based system (local GUI client communicates with Caliper server via ssh or rexec)
New graphical user interface (Projects view and Collect view)
• Required fields and tabs in red
• Only applicable collection tabs enabled
• Saved collection setup
• Previously collected data
• Start process, System wide, Attach process
• Start data collection
New graphical user interface (Measurement tab of Collect view)
• Collection in progress
• Stop data collection
• Data cache misses selected
New graphical user interface (viewing data)
• Saved collection specification
• Process tree tab opened
• Analyze view
• Available data sets
• Application output
New graphical user interface (CPU event counts)
• Show CPU events tab
• Show data for entire application
New graphical user interface (metrics derived from CPU events)
• CPU events tab scrolled to show derived metrics
New graphical user interface (histogram viewer)
• Hottest process (double-click to drill down)
• Overview of entire histogram
• Maximize or minimize by double-clicking Analyze view tab
• Percent of application’s total misses in process be
New graphical user interface (drill down to functions)
• Previous levels visited
• Show ‘local’ percents (percent of total for be)
• Use stacking bars
• Popups for long function names, e.g. DagNode::dagConstMarkPredArc(DagNode *, DagNode *, Dag*)
• Area viewed in table highlighted in Overview
New graphical user interface (drill down to disassembly)
• Sorted by address
• Show: Source, Source/disasm, Disassembly
• Click to show hotspots in table
New graphical user interface (sorting)
• Sort bundles by misses
New graphical user interface (call graph viewer)
• Current function
• Callees visited
• Multiple Analyze views allowed
• Callers
• Callees
Future directions
• Expected new features at HP Caliper 4.2 (January 2007):
- Load module-centric reports (e.g., across-process profile of libc.so)
- Call stack profiling (with wall-clock sampling)
- Bucketing of data cache miss latencies (to help ascertain cache levels accessed)
- Trap profiling
- Merge/diff capability in graphical user interface
- Caliper Advisor integrated with graphical user interface
• Features beyond HP Caliper 4.2:
- Caliper Advisor cheatsheets in graphical user interface
- Data-centric cache miss reports
- Integration with Ktrace/Kprofile
- More data visualization aids in graphical user interface
- Per-CPU/per-thread CPU metrics
Load modules as top level (v4.2)
View load modules as top level
Call-stack profile (v4.2)
Graph hot call paths by running time, blocked time, or both
CPU metrics overview (v4.2)
Overview of metrics collected by cpu measurement (default metrics)
Call-stack samples display (potential future display)
Overview of running and stopped threads
Call stacks at sample 754
“Playback” controls
Sample cursor (drag to any point)
Data-centric cache miss profile display (potential future display)
Double-click row (below) to view data addresses (above)
Double-click row (below) to view instruction addresses (above)
Double-click row to see function’s disassembly
3D histograms (potential future display)
Figure from CxPerf User’s Guide
Hints and tips: caliper command
• Getting CPU event names from caliper:
- Dump all event names and descriptions:
  $ caliper info all
- List all event names (no other fields):
  $ caliper info all -d name
- List names of all events containing string “L3”:
  $ caliper info L3 -d name
- Or, use an ambiguous event name:
  $ caliper ecount --metric L3_READ myprog
  HP Caliper: usage error:
  Ambiguous event name ("L3_READ") specified for "--metrics".
  Matches L3_READS.ALL.ALL, L3_READS.ALL.HIT, L3_READS.ALL.MISS, L3_READS.DATA_READ.ALL, L3_READS.DATA_READ.HIT, L3_READS.DATA_READ.MISS, L3_READS.DINST_FETCH.ALL, L3_READS.DINST_FETCH.HIT, L3_READS.DINST_FETCH.MISS, L3_READS.INST_FETCH.ALL, L3_READS.INST_FETCH.HIT, L3_READS.INST_FETCH.MISS.
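The lookup behind the tip above can be pictured as a filter over the list of event names. The sketch below uses plain substring matching, which is an assumption: the slide does not describe Caliper's actual matching rules. `matching_events` is a hypothetical helper, and the event list is only the names that appear in this deck.

```python
from typing import List


def matching_events(fragment: str, event_names: List[str]) -> List[str]:
    """Return every event name that contains the given fragment."""
    return [name for name in event_names if fragment in name]


# Event names that appear in this deck (a small subset of what caliper reports).
EVENTS = [
    "L3_READS.ALL.ALL", "L3_READS.ALL.HIT", "L3_READS.ALL.MISS",
    "L3_READS.DATA_READ.ALL", "L3_READS.DATA_READ.HIT", "L3_READS.DATA_READ.MISS",
    "L3_READS.DINST_FETCH.ALL", "L3_READS.DINST_FETCH.HIT", "L3_READS.DINST_FETCH.MISS",
    "L3_READS.INST_FETCH.ALL", "L3_READS.INST_FETCH.HIT", "L3_READS.INST_FETCH.MISS",
    "CPU_OP_CYCLES.ALL", "BE_RSE_BUBBLE.ALL",
]

if __name__ == "__main__":
    matches = matching_events("L3_READ", EVENTS)
    if len(matches) > 1:
        # More than one candidate: the fragment is ambiguous, so list them all.
        print("Ambiguous event name; matches:", ", ".join(matches))
    else:
        print("Matches:", matches)
```

A fragment with several matches is what triggers the "Ambiguous event name" listing, which is why a deliberately short name doubles as a quick search.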
Hints and tips: caliper command (cont.)
• Getting report help:
- Dump help file for cycles measurement:
  $ caliper info -r cycles
- Append help to a report:
  $ caliper cycles --info -o out.txt myprog
• Providing command options using a file:
  $ caliper fprof -f myOptionsFile
• Helping Caliper find:
- Source code:
  --source-path-map dir|map[:dir|map:…]*
- Symbols and disassembly:
  --module-search-path dir[:dir:…]
* Where map == old_path,new_path
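To make the --source-path-map syntax concrete, the sketch below splits a value into its colon-separated entries and tells plain directories apart from old_path,new_path maps. It illustrates the syntax as written on the slide only; how Caliper itself handles edge cases is not covered here. The function name and the example paths are hypothetical.

```python
from typing import List, Tuple, Union

# An entry is either a directory or an (old_path, new_path) map.
Entry = Union[str, Tuple[str, str]]


def parse_source_path_map(value: str) -> List[Entry]:
    """Split a dir|map[:dir|map:...] value into directories and path maps."""
    entries: List[Entry] = []
    for item in value.split(":"):
        if not item:
            continue  # ignore empty segments such as a trailing colon
        if "," in item:
            old_path, new_path = item.split(",", 1)
            entries.append((old_path, new_path))
        else:
            entries.append(item)
    return entries


if __name__ == "__main__":
    # HYPOTHETICAL paths: one extra source directory, then one old-to-new mapping.
    for entry in parse_source_path_map("/src/app:/build/old,/home/me/src"):
        print(entry)
```

A map entry is useful when the sources have moved since the program was built, for example from a build server path to a local checkout.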
Hints and tips: using views
Detached view
Close
Minimize
Maximize
Local view menu
Restore views
Restore default locations
Common view menu (right-click on tab)
(not supported by Motif)
Summary
• Itanium execution performance tool
• Measures production applications
• Measures entire system
• Wide range of performance metrics available
• Explore performance data using textual or graphical reports
• Help available from [email protected]
• Available on HP-UX and Linux
http://www.hp.com/go/caliper
DSPP Tools & Resources for Itanium®2 Architecture Set You Up for Success
• Software: development environments, compilers, operating systems, installation/configuration tools, performance tools and more
• Technical documentation: white papers, tutorials, reference documents and manuals, FAQs, known problems, sample code, etc.
• Partner Resources: webconferencing services, podcast production services, trade show discounts
• Equipment: rentals and purchase discounts
• Community: Itanium® architecture forums, source code repository, document sharing and mailing lists
• Training and Education: online and classroom training
• News & Events
Where to go …
Software Developer Resource Kit for the Intel® Itanium®2 microarchitecture: www.hp.com/go/hpitaniumdvd
Development and Business Resources from HP & Intel for HP Integrity-based solutions: www.hp.com/go/dspp-eap
Contact points for additional information:
Americas: email [email protected], telephone 1.800.249.3294
Europe: email [email protected], telephone 800.100.929.70
Asia-Pac: email [email protected]
or go to www.hp.com/go/dspp for local country phone numbers
Complete Survey to Win
HP & Intel are giving away an HP laptop to one lucky winner!
• Promotion Period ends November 19, 2006
• Attend a webcast AND complete the post-event survey.
• Full promotion details can be found on DSPP at: http://h21007.www2.hp.com/dspp/bus/bus_BusDetailPage_IDX/1,1252,9284,00.html
More Events
• Tuesday, October 24: New Dual-Core Processor and Server Hardware
• Tuesday, November 28: OpenMP
• Tuesday, December 19: HP-MPI
Sign up for the DSPP newsletter to get the latest webcast information sent to you directly.
Webcast replays may also be found at: www.hp.com/go/itaniumwebcasts
Did you know... that your company can use this same webconferencing tool, at a discounted price, to promote your HP Integrity solutions to your staff and customers? For members only:
http://h21007.www2.hp.com/dspp/bus/bus_BusDetailPage_IDX/1,,9173!0!,00.html
Intel® Early Access Program - Technology
The Early Access Program (EAP) gives you access to Intel® technology to support your current development cycle as well as early access to tools and information on new technologies. Your membership includes:
- Early access to pre-release software development platforms
- Access to Intel and 3rd party software and testing tools
- Training through Intel® Software College and Web events
- Technical content and how-to articles
- Protected remote access to easily evaluate and develop software safely and securely on platforms over the Internet
Intel® Early Access Program - Marketing Opportunities and Support
• Extensive marketing and business development opportunities:
- Inclusion in online and print versions of the Intel® Developer Solutions Catalog
- Intel quotes to support your PR
- Case studies
- Access to Intel’s event marketing asset kit
- Participation in selected industry events and trade shows
• Support in your development efforts provided through:
- Access to an Intel Account Representative who will act as your primary contact
- Intel® Premier Support for confidential technical support
- 24/7 online support via www.intel.com/software/support
Related Intel® Resources
• Intel® Early Access Program: http://www.intel.com/software/EAP
• Intel® Software Network: http://www.intel.com/software
• Intel® Software College: http://www.intel.com/software/college
• Intel® Software Development Tools: http://www.intel.com/software/products
• Experience Intel® Itanium® 2 Architecture: http://www.intel.com/cd/ids/developer/asmo-na/eng/66176.htm
Q&A Session: To ask a question over the phone, press *1 on your touch-tone telephone.