Practical Performance Management for Oracle RAC
Barb Lundhild, RAC Product Management
Michael Zoll, RAC Development, Performance
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
• Oracle RAC Fundamentals and Infrastructure
• Common Problems and Symptoms
• Application and Database Design
• Diagnostics and Problem Determination
• Summary: Practical Performance Analysis
• Appendix
OBJECTIVE
• Realize that Oracle RAC performance does not require “Black Magic”
• General system and SQL analysis and tuning experience is practically sufficient for Oracle RAC
• Problems can be identified with a minimum of metrics and effort
• Diagnostics framework and advisories are efficient
RAC Fundamentals and Infrastructure
Oracle RAC Architecture

[Architecture diagram: Each node (Node 1 … Node n) runs its operating system, Oracle Clusterware, a database instance, ASM, a virtual IP (VIP), and a listener; services are published to clients over the public network. All nodes attach to shared storage that holds the database and control files, the redo and archive logs of all instances, and the OCR and voting disks, which can be managed by ASM or placed on raw devices.]
Oracle Clusterware
[Diagram: On each node (Node 1 … Node n), Oracle Clusterware runs the EVMD, CRSD, OPROCD, ONS, and CSSD daemons and manages the node’s VIP; CSSD runs at real-time priority. The OCR and voting disks reside on shared storage (raw devices).]
Under the Covers
[Diagram: Each instance (Instance 1 … Instance n) has its own SGA (buffer cache, library cache, dictionary cache, log buffer, Global Resource Directory) and background processes (VKTM, LGWR, DBW0, SMON, PMON, LMON, LMD0, LMS0, DIAG); LMS0 runs at real-time priority. The instances communicate over the cluster’s private high-speed network. Each instance has its own redo log files; the data files and control files are shared by all.]
Global Cache Service (GCS)
• Manages coherent access to data in buffer caches of all instances in the cluster
• Minimizes access time to data which is not in the local cache
  • access to data in the global cache is faster than disk access
• Implements fast direct memory access over high-speed interconnects
  • for all data blocks and types
• Uses an efficient and scalable messaging protocol
  • never more than 3 hops
• New optimizations for read-mostly applications
Cache Hierarchy: Data in Remote Cache
[Diagram: local cache miss → data block requested → data block returned from the remote cache (remote cache hit).]
Cache Hierarchy: Data On Disk
[Diagram: local cache miss → data block requested → grant returned (remote cache miss) → disk read.]
Cache Hierarchy: Read Mostly
[Diagram: local cache miss → no message required → disk read.]
Performance of Cache Fusion
[Diagram: the requestor initiates a send (~200-byte message) and waits; LMS on the holding instance receives the message, processes the block, and sends it back (e.g. an 8K block); the requestor receives it. Wire times: 200 bytes / (1 Gb/sec) for the message, 8192 bytes / (1 Gb/sec) for the block.]

Total access time: e.g. ~360 microseconds (UDP over GbE). Network propagation delay (“wire time”) is a minor factor in the roundtrip time (approx. 6%, vs. 52% in the OS and network stack).
Fundamentals: Minimum Latency (*), UDP/GbE and RDS/IB

Roundtrip time (ms) by block size:

            2K      4K      8K      16K
UDP/GbE     0.30    0.31    0.36    0.46
RDS/IB      0.12    0.13    0.16    0.20

(*) Roundtrip; blocks are not “busy”, i.e. no log flush, no serialization (“buffer busy”). AWR and Statspack reports show averages as if they were normally distributed; the session wait history, included in Statspack in 10.2 and AWR in 11g, shows the actual quantiles (a query sketch follows). The minimum values in this table are the optimal values for 2-way and 3-way block transfers, but they can be taken as the expected values (i.e. 10 ms for a 2-way block would be very high).
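To look at actual wait times rather than averages, the recent per-session wait history can be inspected. A minimal sketch, assuming 10.2 or later; V$SESSION_WAIT_HISTORY keeps roughly the last 10 waits per session, and WAIT_TIME is in centiseconds:

-- Recent global cache waits per session; WAIT_TIME is in centiseconds,
-- 0 means the wait completed without the session actually waiting.
SELECT sid, event, wait_time
FROM   v$session_wait_history
WHERE  event LIKE 'gc%'
ORDER  BY wait_time DESC;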
Infrastructure: Private Interconnect
• The network between the nodes of a RAC cluster MUST be private
  • best practice is not to share the interconnect with iSCSI storage (a verification query follows)
• Supported links: GbE, IB (IPoIB: 10.2)
• Supported transport protocols: UDP, RDS (10.2.0.3)
• Use multiple or dual-ported NICs for redundancy, and increase bandwidth with NIC bonding
• Large (Jumbo) Frames recommended for GbE
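A quick check of which interface each instance actually uses, and that it is not public; a sketch assuming 10g or later, where GV$CLUSTER_INTERCONNECTS is available:

-- Interconnect interface per instance: name, address, whether it is
-- public, and where the setting came from (OS, parameter, or OCR).
SELECT inst_id, name, ip_address, is_public, source
FROM   gv$cluster_interconnects;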
Infrastructure: Interconnect Bandwidth
• Bandwidth requirements depend on several factors (e.g. buffer cache size, number of CPUs per node, access patterns) and cannot be predicted precisely for every application
• Typical utilization is approx. 10-30% in OLTP
  • 10000-12000 8K blocks per second saturate 1 x Gb Ethernet (75-80% of theoretical bandwidth; see the worked arithmetic below)
• Generally, 1 Gb/sec is sufficient for performance and scalability in OLTP
• DSS/DW systems should be designed with > 1 Gb/sec capacity
• A sizing approach with rules of thumb is described in:
  • Project MegaGrid: Capacity Planning for Large Commodity Clusters (http://otn.oracle.com/rac)
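The saturation figure can be sanity-checked with back-of-the-envelope arithmetic, assuming 8K blocks and 75-80% of the nominal 1 Gb/sec being usable:

\[
\frac{0.75 \times 10^{9}\ \mathrm{bit/s}}{8192\ \mathrm{B} \times 8\ \mathrm{bit/B}} \approx 11{,}400\ \mathrm{blocks/s},
\qquad
\frac{0.80 \times 10^{9}\ \mathrm{bit/s}}{8192\ \mathrm{B} \times 8\ \mathrm{bit/B}} \approx 12{,}200\ \mathrm{blocks/s}
\]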
Infrastructure: IPC configuration
• Important settings:
  • negotiated top bit rate and full duplex mode
  • NIC ring buffers
  • Ethernet flow control settings
  • CPU(s) receiving network interrupts
• Verify your setup:
  • CVU does checking
  • load testing eliminates potential for problems
  • AWR and ADDM give estimations of link utilization (a query sketch follows)
• Buffer overflows, congested links and flow control can have severe consequences for performance
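The block traffic behind those estimations can be approximated from the global cache counters. A rough sketch only; the statistic names exist in V$SYSSTAT, an 8K block size is assumed, and message traffic (~200 bytes each, per the Cache Fusion slide) is not counted:

-- Approximate interconnect block traffic per instance since startup:
-- blocks received plus blocks served, converted to MB at 8K per block.
SELECT inst_id,
       SUM(value)                   AS gc_blocks,
       SUM(value) * 8192 / 1048576  AS approx_mb
FROM   gv$sysstat
WHERE  name IN ('gc cr blocks received', 'gc current blocks received',
                'gc cr blocks served',   'gc current blocks served')
GROUP  BY inst_id;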
Infrastructure: Operating System
• Block access latencies increase when CPU(s) are busy and run queues are long
• Immediate LMS scheduling is critical for predictable block access latencies when CPUs are > 80% busy
• Fewer and busier LMS processes may be more efficient
  • monitor their CPU utilization (see the sketch after this list)
  • caveat: 1 LMS can be good for runtime performance but may impact cluster reconfiguration and instance recovery time
  • the default is good for most requirements
• Higher priority for LMS is the default
  • the implementation is platform-specific
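A sketch for monitoring LMS CPU consumption from inside the database; the statistic 'CPU used by this session' is in centiseconds, and the LIKE pattern on PROGRAM is an assumption about how the platform names the background processes:

-- Cumulative CPU seconds consumed by each LMS process, per instance.
SELECT s.inst_id, s.program,
       ROUND(st.value / 100, 1) AS cpu_seconds
FROM   gv$session  s
JOIN   gv$sesstat  st ON st.sid = s.sid AND st.inst_id = s.inst_id
JOIN   gv$statname n  ON n.statistic# = st.statistic# AND n.inst_id = st.inst_id
WHERE  n.name = 'CPU used by this session'
AND    s.type = 'BACKGROUND'
AND    s.program LIKE '%LMS%'
ORDER  BY s.inst_id, s.program;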
Common Problems and Symptoms
• “Lost Blocks”: interconnect or switch problems
• Slow or bottlenecked disks
• System load and scheduling
• Contention
• Unexpectedly high latencies
Misconfigured or Faulty Interconnect Can Cause:

• Dropped packets/fragments
• Buffer overflows
• Packet reassembly failures or timeouts
• Ethernet flow control kicking in
• TX/RX errors

These show up as “lost blocks” at the RDBMS level and are responsible for 64% of escalations.
“Lost Blocks”: NIC Receive Errors
Db_block_size = 8K
ifconfig -a:
eth0 Link encap:Ethernet HWaddr 00:0B:DB:4B:A2:04
inet addr:130.35.25.110 Bcast:130.35.27.255 Mask:255.255.252.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
TX packets:273120 errors:0 dropped:0 overruns:0 carrier:0
…
“Lost Blocks”: IP Packet Reassembly Failures
netstat -s
Ip:
    84884742 total packets received
    …
    1201 fragments dropped after timeout
    …
    3384 packet reassembles failed
Top 5 Timed Events
Event               Waits      Time(s)   Avg wait (ms)  %Total Call Time  Wait Class
------------------  ---------  --------  -------------  -----------------  ----------
log file sync         286,038    49,872            174               41.7  Commit
gc buffer busy        177,315    29,021            164               24.3  Cluster
gc cr block busy      110,348     5,703             52                4.8  Cluster
gc cr block lost        4,272     4,953          1,159                4.1  Cluster
cr request retry        6,316     4,668            739                3.9  Other
Finding a Problem with the Interconnect or IPC
(gc cr block lost should never be among the top events)
Global Cache Lost block handling
• Detection time reduced in 11g
  • 500 ms (around 5 secs in 10g)
  • can be lowered if necessary
  • robust (no false positives)
  • no extra overhead
• The cr request retry event is related to lost blocks
  • it is highly likely to appear when gc cr blocks lost show up
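A quick check for lost blocks; a sketch using statistics that exist in V$SYSSTAT in 10g and 11g:

-- Blocks lost or corrupt in transit, per instance, since startup;
-- a steadily growing 'gc blocks lost' points at the interconnect.
SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc blocks lost', 'gc blocks corrupt')
ORDER  BY inst_id, name;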
Interconnect Statistics: Automatic Workload Repository (AWR)

Target     Avg Latency  Stddev    Avg Latency  Stddev
Instance   500B msg     500B msg  8K msg       8K msg
-----------------------------------------------------
1          .79          .65       1.04         1.06
2          .75          .57        .95          .78
3          .55          .59        .53          .59
4          1.59         3.16      1.46         1.82
-----------------------------------------------------

• Latency probes for different message sizes
• Exact throughput measurements (not shown)
• Send and receive errors, dropped packets (not shown)
“Blocks Lost”: Solution
• Fix interconnect NICs and switches
• Tune IPC buffer sizes
Disk IO Performance Issues
• Log flush IO delays can cause “busy” buffers
• “Bad” queries on one node can saturate an interconnect link
• IO is issued from ALL nodes to shared storage
• Use Automatic Database Diagnostic Monitor (ADDM) / AWR
  • single system image of I/O across the cluster

Cluster-wide impact of IO or query plan issues is responsible for 23% of escalations.
Cluster-Wide I/O Impact
Node 1:

Top 5 Timed Events
Event               Waits      Time(s)   Avg wait (ms)  %Total Call Time
------------------  ---------  --------  -------------  -----------------
log file sync         286,038    49,872            174               41.7
gc buffer busy        177,315    29,021            164               24.3
gc cr block busy      110,348     5,703             52                4.8

Node 2 (expensive query):

Load Profile          Per Second
~~~~~~~~~~~~       ---------------
Redo size:              40,982.21
Logical reads:          81,652.41
Physical reads:         51,193.37

1. IO on the disk group containing the redo logs is bottlenecked
2. Block shipping for “hot” blocks is delayed by log flush IO
3. Serialization/queues build up
IO and/or Bad SQL problem fixed
Top 5 Timed Events
Event                    Waits      Time (s)  Avg wait (ms)  %Total Call Time  Wait Class
-----------------------  ---------  --------  -------------  -----------------  ----------
CPU time                               4,580                              65.4
log file sync              276,281     1,501              5              21.4  Commit
log file parallel write    298,045       923              3              13.2  System I/O
gc current block 3-way     605,628       631              1               9.0  Cluster
gc cr block 3-way          514,218       533              1               7.6  Cluster

1. Log file writes are normal
2. Global serialization has disappeared
Drill-down: An IO capacity problem
Symptom of full table scans and I/O contention:

Top 5 Timed Events
Event                      Waits       Time(s)   Avg wait (ms)  %Total Call Time  Wait Class
-------------------------  ----------  --------  -------------  -----------------  ----------
db file scattered read      3,747,683   368,301             98              33.3  User I/O
gc buffer busy              3,376,228   233,632             69              21.1  Cluster
db file parallel read       1,552,284   225,218            145              20.4  User I/O
gc cr multi block request  35,588,800   101,888              3               9.2  Cluster
read by other session       1,263,599    82,915             66               7.5  User I/O
IO issues: Solution
• Tune the IO layout
• Tune queries that do a lot of IO
CPU Saturation or Long Run Queues
Top 5 Timed Events
Event                       Waits      Time(s)  Avg wait (ms)  %Total Call Time  Wait Class
--------------------------  ---------  -------  -------------  -----------------  ----------
db file sequential read     1,312,840   21,590             16              21.8  User I/O
gc current block congested    275,004   21,054             77              21.3  Cluster
gc cr grant congested         177,044   13,495             76              13.6  Cluster
gc current block 2-way      1,192,113    9,931              8              10.0  Cluster
gc cr block congested          85,975    8,917            104               9.0  Cluster

“Congested”: LMS could not dequeue messages fast enough.
Cause: long run queue, CPU starvation.
High CPU Load: Solution
• Run LMS at higher priority (the default)
• Start more LMS processes
• Reduce the number of user processes
• Find the cause of the high CPU consumption
Contention
Event                   Waits    Time (s)  Avg (ms)  % Call Time
----------------------  -------  --------  --------  -----------
gc cr block 2-way       317,062     5,767        18         19.0
gc current block 2-way  201,663     4,063        20         13.4
gc buffer busy          111,372     3,970        36         13.1
CPU time                            2,938                    9.7
gc cr block busy         40,688     1,670        41          5.5

Global contention on data: serialization. It is very likely that gc cr block busy and gc buffer busy are related.
Contention: Solution
• Identify “hot” blocks in the application (a query sketch follows)
• Reduce concurrency on hot blocks
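A sketch for finding the hot segments; V$SEGMENT_STATISTICS is available in 10g, and the LIKE pattern allows for the 11g split of the statistic into acquire/release variants:

-- Segments with the most global cache "buffer busy" contention.
SELECT * FROM (
  SELECT owner, object_name, object_type, statistic_name, value
  FROM   v$segment_statistics
  WHERE  statistic_name LIKE 'gc buffer busy%'
  ORDER  BY value DESC
) WHERE ROWNUM <= 10;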
High Latencies
Event                   Waits    Time (s)  Avg (ms)  % Call Time
----------------------  -------  --------  --------  -----------
gc cr block 2-way       317,062     5,767        18         19.0
gc current block 2-way  201,663     4,063        20         13.4
gc buffer busy          111,372     3,970        36         13.1
CPU time                            2,938                    9.7
gc cr block busy         40,688     1,670        41          5.5

Tackle latency first, then tackle busy events.
Expected: 2-way and 3-way events.
Unexpected: averages > 1 ms (the avg should be around 1 ms).
High Latencies : Solution
• Check the network configuration
  • private
  • running at the expected bit rate
• Find the cause of high CPU consumption
  • runaway or spinning processes
Health Check
Look for:
• Unexpected events:
    gc cr block lost              1159 ms
• Unexpected “hints” of contention and serialization:
    gc cr/current block busy        52 ms
• Load and scheduling:
    gc current block congested      14 ms
• Unexpectedly high averages:
    gc cr/current block 2-way       36 ms

(A query to check the actual averages follows.)
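The same health check can be scripted against the system-wide event statistics. A sketch assuming 10g or later, where TIME_WAITED_MICRO exists in GV$SYSTEM_EVENT:

-- Average latency of global cache events per instance, in ms.
SELECT inst_id, event, total_waits,
       ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0), 1) AS avg_ms
FROM   gv$system_event
WHERE  event LIKE 'gc%'
ORDER  BY time_waited_micro DESC;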
Application and Database Design
General Principles
• No fundamentally different design and coding practices for RAC
• BUT: flaws in execution or design have a higher impact in RAC
  • performance and scalability in RAC are more sensitive to bad plans or bad schema design
  • serializing contention makes applications less scalable
• Standard SQL and schema tuning solves > 80% of performance problems
Scalability Pitfalls
• Serializing contention on a small set of data/index blocks
  • monotonically increasing keys
  • frequent updates of small cached tables
  • segments without Automatic Segment Space Management (ASSM) or Free List Groups (FLG)
• Full table scans
  • optimization for full scans in 11g can save CPU and latency
• Frequent invalidation and parsing of cursors
  • requires data dictionary lookups and synchronization
• Concurrent DDL (e.g. truncate/drop)
Health Check
Look for:
• Indexes with right-growing characteristics
  • eliminate indexes which are not needed
• Frequent updates and reads of “small” tables
  • “small” = fits into a single buffer cache
  • sparse blocks (PCTFREE 99) will reduce serialization (see the sketch below)
• SQL which scans large amounts of data
  • perhaps more efficient when parallelized
  • direct reads do not need to be globally synchronized (hence less CPU for the global cache)
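A minimal sketch of the sparse-block idea; the table and its columns are hypothetical. With PCTFREE 99, each block holds roughly one row, so concurrent updates of different rows rarely collide on the same block (at the cost of space):

-- Hypothetical small, frequently updated lookup table kept sparse.
CREATE TABLE app_counters (
  counter_id   NUMBER PRIMARY KEY,
  counter_val  NUMBER
) PCTFREE 99;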
Diagnostics and Problem Determination
MOST OF THE TIME, A PERFORMANCE PROBLEM IS NOT AN Oracle RAC PROBLEM
Checklist for the Skeptical Performance Analyst (AWR based)

• Check where most of the time in the database is spent (“Top 5”)
• Check whether gc events are “busy” or “congested”
• Check the avg wait time
• Drill down:
  • SQL with the highest cluster wait time (a query sketch follows)
  • segment statistics with the highest block transfers

or JUST USE ADDM with Oracle RAC 11g!
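A sketch of the SQL drill-down; V$SQL exposes CLUSTER_WAIT_TIME (in microseconds) from 10g onward:

-- Statements with the highest cluster wait time across all instances.
SELECT * FROM (
  SELECT inst_id, sql_id, executions, cluster_wait_time, elapsed_time
  FROM   gv$sql
  ORDER  BY cluster_wait_time DESC
) WHERE ROWNUM <= 10;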
Drill-down: An IO capacity problem
Symptom of full table scans and I/O contention:

Top 5 Timed Events
Event                      Waits       Time(s)   Avg wait (ms)  %Total Call Time  Wait Class
-------------------------  ----------  --------  -------------  -----------------  ----------
db file scattered read      3,747,683   368,301             98              33.3  User I/O
gc buffer busy              3,376,228   233,632             69              21.1  Cluster
db file parallel read       1,552,284   225,218            145              20.4  User I/O
gc cr multi block request  35,588,800   101,888              3               9.2  Cluster
read by other session       1,263,599    82,915             66               7.5  User I/O
Drill-down: SQL Statements
“Culprit”: Query that overwhelms IO subsystem on one node
Physical Reads  Executions  Reads per Exec  %Total
--------------  ----------  --------------  ------
   182,977,469       1,055       173,438.4    99.3

SELECT SHELL FROM ES_SHELL WHERE MSG_ID = :msg_id ORDER BY ORDER_NO ASC

The same query reads from the interconnect:

Cluster         CWT % of      CPU
Wait Time (s)   Elapsed Time  Time (s)    Executions
--------------  ------------  ----------  ----------
    341,080.54          31.2   17,495.38       1,055
SELECT SHELL FROM ES_SHELL WHERE MSG_ID = :msg_id ORDER BY ORDER_NO ASC
Tablespace  Object    Subobject  Obj    GC Buffer  % of
Name        Name      Name       Type   Busy       Capture
----------  --------  ---------  -----  ---------  -------
ESSMLTBL    ES_SHELL  SYS_P537   TABLE    311,966     9.91
ESSMLTBL    ES_SHELL  SYS_P538   TABLE    277,035     8.80
ESSMLTBL    ES_SHELL  SYS_P527   TABLE    239,294     7.60
…
Drill-Down: Top Segments
Apart from being the table with the highest IO demand, it was also the table with the highest number of block transfers AND global serialization.
Findings Summary in EM
• Each finding type has a descriptive name
• Facilitates search / aggregation / directives etc.
Recommendations
• Most relevant data for analysis can be derived from the wait events
• Always use Enterprise Manager (EM) and ADDM reports for performance health checks and analysis
• Active Session History (ASH) can be used for session-based analysis of variation
• Export the AWR repository regularly to preserve all of the above (see the note below)
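For the export, the AWR extract script shipped under $ORACLE_HOME/rdbms/admin can dump the repository to a Data Pump file; prompts and details vary by version:

-- Run from SQL*Plus as a DBA; the script prompts for the DBID,
-- the snapshot range, and a directory object for the dump file.
@?/rdbms/admin/awrextr.sql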
ADDM Diagnosis for RAC
• Data sources:
  • wait events (especially Cluster class and buffer busy)
  • ASH
  • instance cache transfer data
  • interconnect statistics (throughput, usage by component, pings)
• ADDM analyzes both the entire database (DATABASE analysis mode) and each instance (INSTANCE analysis mode)
• Analysis of both database and instance resources is summarized in a single report
• Allows drill-down to a specific instance
What ADDM Diagnoses for RAC
• Latency problems in the interconnect
• Congestion (identifying top instances affecting the entire cluster)
• Contention (buffer busy, top objects, etc.)
• Top consumers of multiblock requests
• Lost blocks
• Reports information about interconnect devices; warns about the use of PUBLIC interfaces
• Reports the throughput of devices, and how much of it is used by Oracle and for what purpose (GC, locks, PQ)
Q & A
OTHER SESSIONS TO CHECK OUT

THURSDAY
10:00 AM  S291242  Demystifying Oracle RAC Internals (South 104)
 1:00 PM  S291662  Using Oracle RAC and Microsoft Windows 64-bit as the Foundation (with Intel and Talx) (South 309)
 4:00 PM  S291670  Oracle Database 11g: First Experiences with Grid Computing (with Mobiltel and BCF) (South 310)
For More Information
http://search.oracle.com
or
otn.oracle.com/rac
REAL APPLICATION CLUSTERS