Real World Experience Running
GoldenGate on Exadata
January 20, 2013
Presented by: Alex Fatkulin, Senior Consultant
Who am I?
Senior Technical Consultant at Enkitec
11 years using Oracle
Clustered and HA solutions
Database Development and Design
Technical Reviewer
Blog at http://afatkulin.blogspot.com
My Replication Experience
Materialized View Replication – since 8i
Oracle Streams – since 9iR2
Oracle GoldenGate – since 10.4 (2009)
GoldenGate + Exadata
Gaining a lot of market momentum
Common scenarios: Zero Downtime Migrations and Upgrades, ETL Data Feeds, Data Replication
Solution effectiveness depends on in-depth technical knowledge
Standard documentation is often not enough
Agenda
General Configuration
Tips & Tricks: Manager, Extract, DataPump, Replicat
DBFS
Grid Infrastructure Integration
General Configuration
GoldenGate binaries local on each compute node
DBFS: trail files, parameter files, checkpoint files, bounded recovery files, report files (optional)
DB accounts: GGEXT (Extract), GGREP (Replicat, GGSCHEMA)
Manager
PURGEOLDEXTRACTS to delete old trail files:
purgeoldextracts ./dirdat/aa, usecheckpoints, minkeephours 8, maxkeephours 8
PURGEDDLHISTORY to clean up the DDL history tables:
purgeddlhistory minkeepdays 7, maxkeepdays 7
PURGEMARKERHISTORY to clean up the marker table:
purgemarkerhistory minkeepdays 7, maxkeepdays 7
AUTOSTART to start other processes when Manager starts:
AUTOSTART ER *
Required if using Oracle's Grid Infrastructure integration scripts
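The interaction between USECHECKPOINTS and MINKEEPHOURS can be sketched as a simplified Python model of which trail files would be purge candidates. This is an illustration, not Manager's actual algorithm: it assumes a file qualifies only when every process reading the trail has checkpointed past it and the file has been unmodified for more than MINKEEPHOURS, and it ignores MAXKEEPHOURS. TrailFile and purgeable are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TrailFile:
    seqno: int          # trail sequence number, e.g. 2 for ./dirdat/aa000002
    modified: datetime  # last modification time of the file


def purgeable(files, reader_seqnos, now, minkeephours=8):
    """Return sequence numbers a simplified USECHECKPOINTS + MINKEEPHOURS rule would purge.

    reader_seqnos: current read-checkpoint sequence of every process reading the trail
    (hypothetical input; Manager derives this from the checkpoint files).
    """
    if not reader_seqnos:
        return []  # no known readers: this model conservatively keeps everything
    oldest_needed = min(reader_seqnos)  # the slowest reader still needs this file
    cutoff = now - timedelta(hours=minkeephours)
    return [f.seqno for f in files
            if f.seqno < oldest_needed and f.modified < cutoff]


now = datetime(2013, 1, 27, 12, 0)
files = [TrailFile(0, now - timedelta(hours=20)),
         TrailFile(1, now - timedelta(hours=10)),
         TrailFile(2, now - timedelta(hours=5)),
         TrailFile(3, now - timedelta(hours=1))]
print(purgeable(files, reader_seqnos=[3, 2], now=now))  # -> [0, 1]
```

The slowest reader wins: even though one process has moved on to sequence 3, file 2 stays because another reader is still positioned on it.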
Extract
Redo Access
Online redo logs are located on ASM
Archived logs are usually located on ASM
Extract redo access options: ASM instance, DBLOGREADER, Integrated Capture
Redo Access - ASM Instance
TRANLOGOPTIONS ASMUSER, ASMPASSWORD
Works through ASM instance calls: dbms_diskgroup.getfileattr, dbms_diskgroup.open, dbms_diskgroup.read
Not very efficient
Legacy
Redo Access - DBLOGREADER
TRANLOGOPTIONS DBLOGREADER
Works through OCI calls: OCIPOGGRedoLogOpen, OCIPOGGRedoLogRead, OCIPOGGRedoLogClose
SELECT ANY TRANSACTION privilege required
Available since GoldenGate 11.1 and Oracle 10.2.0.5
Redo Access - Integrated Capture
Oracle Streams Capture front end
Extract becomes an XStream client: it receives LCRs and transforms them into trail files, while GGSCI hides the Oracle Streams complexity
Allows access to all Oracle Streams Capture features
Available since GoldenGate 11.2
Latest bundle patch (BP) recommended because of Streams Capture bugs
Extract – SCN token
Capture the SCN for every operation in the trail file:
table user1.*, tokens(SCN=@getenv("oratransaction","scn"));
Logdump 10 >open ./dirdat/aa000002
Current LogTrail is /u01/app/oracle/dbfs_mount/dbfs/ggs/dirdat/aa000002
Logdump 11 >usertoken detail
Logdump 12 >ggstoken detail
Logdump 15 >n
2013/01/26 15:00:18.000.000 Insert Len 9 RBA 1092
Name: SRC1.T
After Image: Partition 4 GU s
 0000 0005 0000 0001 32 | ........2
User tokens: 12 bytes
SCN : 9352124
GGS tokens:
TokenID x52 'R' ORAROWID Info x00 Length 20
 4141 414f 7261 4141 4641 4144 4141 5441 4142 0001 | AAAOraAAFAADAATAAB..
TokenID x4c 'L' LOGCSN Info x00 Length 7
 3933 3532 3132 34 | 9352124
TokenID x36 '6' TRANID Info x00 Length 8
 3130 2e36 2e37 3639 | 10.6.769
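Logdump prints each token as hex bytes next to an ASCII column. A small Python helper (decode_token_hex is a hypothetical name, not a Logdump feature) reproduces that column and makes it easy to confirm that the LOGCSN token carries the same value, 9352124, as the SCN user token captured by @getenv.

```python
def decode_token_hex(dump: str) -> str:
    """Decode the hex column of a Logdump token dump into printable text.

    Non-printable bytes are rendered as '.', mirroring Logdump's ASCII column.
    """
    raw = bytes.fromhex(dump.replace(" ", ""))
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


print(decode_token_hex("3933 3532 3132 34"))    # LOGCSN -> 9352124
print(decode_token_hex("3130 2e36 2e37 3639"))  # TRANID -> 10.6.769
print(decode_token_hex("4141 414f 7261 4141 4641 4144 4141 5441 4142 0001"))
# ORAROWID -> AAAOraAAFAADAATAAB..
```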
Extract – Compressed Tables
Extract will ABEND if not using Integrated Capture
ERROR OGG-01028 Object with object number 60573 is compressed. Table compression is not supported.
Space Advisor is often the cause (DBMS_TABCOMP_TEMP_CMP)
The table may no longer exist (dropped), so looking it up in DBA_OBJECTS returns zero rows
Extract – Compressed Tables
SQL> select owner, object_name from dba_objects where object_id=60573;
no rows selected
SQL> select objectowner, objectname, optime from ggrep.ggs_ddl_hist where objectid = 60573 and fragmentno=1;
OBJECTOWNER     OBJECTNAME      OPTIME
--------------- --------------- -------------------
SRC1            COMP_TABLE      2013-01-26 16:09:43
SQL> begin
  dbms_logmnr.start_logmnr(
    startTime => to_date('2013-01-26 16:09:00', 'yyyy-mm-dd hh24:mi:ss'),
    endTime   => to_date('2013-01-26 16:10:00', 'yyyy-mm-dd hh24:mi:ss'),
    options   => dbms_logmnr.DICT_FROM_ONLINE_CATALOG + dbms_logmnr.CONTINUOUS_MINE
  );
end;
/
PL/SQL procedure successfully completed
SQL> select seg_owner, seg_name, to_char(timestamp, 'yyyy-mm-dd hh24:mi:ss') dt from v$logmnr_contents where data_obj#=60573 and operation='DDL' and rownum=1;
SEG_OWNER       SEG_NAME        DT
--------------- --------------- -------------------
SRC1            COMP_TABLE      2013-01-26 16:09:45
Extract – Down Instances
Down instances may prevent Extract from starting: instances kept offline in the cluster, instances that crashed
For each thread, Extract looks up in V$LOG the latest SEQUENCE# whose FIRST_TIME is earlier than Extract's begin time
If ARCHIVED = 'YES', it looks up that SEQUENCE# in V$ARCHIVED_LOG
If the archived log has been deleted, Extract will ABEND; this commonly happens if an instance has been down for a long time
Extract – Down Instances
SELECT sequence#, DECODE(archived, 'YES', 1, 0)
FROM v$log
WHERE thread# = 2
AND sequence# = (select max(sequence#) from v$log where first_time < TO_DATE('2013-01-26 20:56:05', 'YYYY-MM-DD HH24:MI:SS') AND thread# = 2);
Result: sequence# = 34, archived = 'YES'
SELECT name
FROM v$archived_log
WHERE sequence# = 34 AND thread# = 2 AND resetlogs_id = 786746958 AND archived = 'YES' AND deleted = 'NO' AND standby_dest = 'NO'
order by name DESC
Result: no rows!
ERROR OGG-00446 Could not find archived log for sequence 34 thread 2 under default destinations
Extract – Down Instances
create or replace view ggext.v$log as
select group#, thread#, sequence#, bytes, blocksize, members,
       case thread# when 2 then 'NO' else archived end archived,
       status, first_change#, first_time, next_change#, next_time
from sys.v_$log;
Temporary workaround (hack)
Extract will no longer try to look up the archived log and will be able to start
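The start-up lookup described above, and why the view trick gets around it, can be sketched as a simplified Python model. It is not GoldenGate code: LogRow, find_start_log and the archived_names mapping are hypothetical stand-ins for V$LOG and V$ARCHIVED_LOG, the data is made up, and the ARCHIVED = 'NO' case is modeled only as "no archived-log lookup".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LogRow:
    """Hypothetical stand-in for the V$LOG columns Extract looks at."""
    thread: int
    sequence: int
    first_time: str  # 'YYYY-MM-DD HH24:MI:SS' strings sort chronologically
    archived: str    # 'YES' or 'NO'


def find_start_log(v_log, archived_names, thread, begin_time):
    """Simplified model of Extract's per-thread start-up lookup.

    archived_names maps (thread, sequence) -> file name for archived logs that
    still exist (V$ARCHIVED_LOG rows with DELETED = 'NO').
    """
    older = [r for r in v_log if r.thread == thread and r.first_time < begin_time]
    if not older:
        return None
    row = max(older, key=lambda r: r.sequence)  # latest SEQUENCE# before begin time
    if row.archived != "YES":
        return ("no archived lookup", row.sequence)
    name = archived_names.get((thread, row.sequence))
    if name is None:  # models the ABEND with OGG-00446
        raise RuntimeError(
            f"Could not find archived log for sequence {row.sequence} thread {thread}")
    return ("archived", name)


# Thread 2 belongs to an instance that has been down for days; the archived log
# for its last sequence has already been deleted (made-up data).
v_log = [LogRow(2, 33, "2013-01-20 10:00:00", "YES"),
         LogRow(2, 34, "2013-01-21 09:00:00", "YES")]
begin = "2013-01-26 20:56:05"
try:
    find_start_log(v_log, {}, 2, begin)
except RuntimeError as err:
    print("ABEND:", err)

# The ggext.v$log view reports ARCHIVED = 'NO' for thread 2, so no lookup happens.
patched = [replace(r, archived="NO") if r.thread == 2 else r for r in v_log]
print(find_start_log(patched, {}, 2, begin))  # -> ('no archived lookup', 34)
```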
Extract – Cache Manager
CACHEMGR virtual memory values (may have been adjusted)
CACHESIZE: 64G
CACHEPAGEOUTSIZE (normal): 8M
PROCESS VM AVAIL FROM OS (min): 128G
CACHESIZEMAX (strict force to disk): 96G
Defaults might be set too high
Large transactions will cause Extract to consume up to CACHESIZE, which might result in excessive swapping and memory usage on the compute nodes
Adjust using CACHEMGR CACHESIZE 4G (example)
Insufficient cache will hurt the performance of large transactions due to excessive page-outs
Extract – Bounded Recovery
Allows Extract to save the state of in-flight transactions
Located in the GGS_HOME/BR directory
Done every 4 hours by default
To perform one now: SEND <GROUP> BR BRCHECKPOINT IMMEDIATE
Make these files available to every node in case of a failover
If the bounded recovery files get corrupted, Extract can still be started with BRRESET
Extract – Bounded Recovery
info EXA_EXT, showch
...
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
  Thread #: 1
  Sequence #: 84
  RBA: 62266896
  Timestamp: 2013-01-27 12:32:58.000000
  SCN: 0.10578483 (10578483)
  Redo File: +DATA/dbm/onlinelog/group_2.258.786746973
...
BR Begin Recovery Checkpoint:
  Thread #: 2
  Sequence #: 49
  RBA: 340992
  Timestamp: 2013-01-27 12:50:01.000000
  SCN: 0.10600667 (10600667)
  Redo File:
Check bounded recovery info
DataPump
DataPump – General Config
Use PASSTHRU to skip data dictionary lookups
Specify the GoldenGate VIP in RMTHOST if using Grid Infrastructure integration
Use TCPFLUSHBYTES to allow larger writes on the Collector side
Use different names for source and destination trails (avoids trail file purge bugs)
DataPump – Network Compression
Trail files generally compress well: everything is passed as strings, and every changed row carries a fully qualified object name
Use the COMPRESS option of RMTHOST to compress trails sent over the network
GGSCI (exa1.test.com) 37> send exa_dp tcpstats
...
Data compression is enabled
Compress CPU Time 0:00:00.000000
Compress time 0:00:00.581401, Threshold 1000
Uncompressed bytes 77449138
Compressed bytes 6291347, 133211222 bytes/second
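The counters above are enough to quantify the savings. A small Python calculation (tcpstats_summary is a hypothetical helper) gives a ratio of about 12.3:1, roughly 92% less data on the wire, and suggests that the reported bytes/second figure is simply uncompressed bytes divided by compress time.

```python
def tcpstats_summary(uncompressed: int, compressed: int, compress_secs: float):
    """Derive compression ratio, savings and throughput from TCPSTATS counters."""
    ratio = uncompressed / compressed
    saved_pct = 100.0 * (1 - compressed / uncompressed)
    rate = uncompressed / compress_secs  # matches the reported bytes/second above
    return ratio, saved_pct, rate


ratio, saved, rate = tcpstats_summary(77449138, 6291347, 0.581401)
print(f"{ratio:.1f}:1 ratio, {saved:.1f}% less network traffic, {rate:,.0f} bytes/s")
```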
DataPump – Trail not Available
The process will get stuck while positioning if the trail file [sequence] is not available
GGSCI (exa1.test.com) 4> add extract exa_dp, exttrailsource ./dirdat/aa
EXTRACT added.
GGSCI (exa1.test.com) 2> info EXA_DP
EXTRACT EXA_DP Last Started 2013-01-26 19:51 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:03 ago)
Log Read Checkpoint File ./dirdat/aa000000
First Record RBA 0
...
open("./dirdat/aa000000", O_RDONLY) = -1 ENOENT (No such file or directory)
nanosleep({1, 0}, NULL) = 0
open("./dirdat/aa000000", O_RDONLY) = -1 ENOENT (No such file or directory)
nanosleep({1, 0}, NULL) = 0
...
GGSCI (exa1.test.com) 7> alter EXA_DP, extseqno 2
EXTRACT altered.
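As the INFO output shows, the pump was added without EXTSEQNO and positioned on aa000000, which is not present, so it retries the open() once a second. A hedged Python helper for finding a candidate EXTSEQNO value: it scans a dirdat directory for the lowest sequence number with a given two-character prefix, assuming the 6-digit naming seen in this deck (aa000002). first_available_seqno is a hypothetical name, and the demo uses a throwaway directory with made-up file names.

```python
import re
import tempfile
from pathlib import Path


def first_available_seqno(dirdat, prefix):
    """Return the lowest trail sequence number present for a two-character prefix.

    Assumes names of the form prefix + 6-digit sequence (e.g. aa000002).
    Returns None if no matching trail file exists.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d{{6}})$")
    seqnos = [int(m.group(1)) for p in Path(dirdat).iterdir()
              if (m := pattern.match(p.name))]
    return min(seqnos, default=None)


# Demo: aa000000 and aa000001 are missing, the oldest remaining aa file is aa000002.
with tempfile.TemporaryDirectory() as d:
    for name in ("aa000002", "aa000003", "bb000000"):
        (Path(d) / name).touch()
    seq = first_available_seqno(d, "aa")
    print(f"alter EXA_DP, extseqno {seq}")  # -> alter EXA_DP, extseqno 2
```

The lowest available file is only a candidate: whether the pump should start there depends on what has already been delivered to the target.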
Replicat
Replicat – General Configuration
Use BATCHSQL where appropriate
Capturing SCNs as tokens on Extract side greatly helps in troubleshooting
Use multiple Replicats and service names to direct the workload
Segregate workload by instance affinity if you can
srvctl add service -d dbm -s ogg_rep1 -r dbm1 -a dbm2,dbm3,dbm4 ...
srvctl add service -d dbm -s ogg_rep2 -r dbm2 -a dbm1,dbm3,dbm4 ...
...
Replicat - Sequences
The sequence replication algorithm is not very efficient: no bind variables in replicateSequence calls
A larger sequence cache on the source helps somewhat
BEGIN ggext.replicateSequence(TO_NUMBER(2), TO_NUMBER(20), TO_NUMBER(1), 'REP1', TO_NUMBER(0), 'S1', UPPER('ggrep'), TO_NUMBER(1), TO_NUMBER(0), ''); END;
Sequence values are incremented one by one in nocache mode, so SYS.SEQ$ might become a point of contention
Can result in a significant drag on highly active DBs
Replicat – Transient PK Updates
In the past, transient primary key (PK) updates were problematic
SQL> select * from src1.t;
 N V
-- -
 1 a
 2 a
 3 a
SQL> update src1.t set n=n+1;
3 rows updated
SQL> commit;
Commit complete
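Why this used to be a problem can be shown with a toy Python model (not GoldenGate or Oracle code). On the source, the set-based UPDATE is checked against the unique constraint once the statement finishes; replaying the same changes one row at a time, in ascending order, collides on the very first row. The row ids and the ascending apply order are assumptions for illustration; the order Replicat actually sees depends on how the source statement processed the rows.

```python
def apply_changes(table, changes, deferred=False):
    """Apply single-row PK changes and enforce a unique constraint.

    table: dict row_id -> PK value; changes: list of (row_id, new_pk).
    deferred=False checks uniqueness after every row (row-by-row replay);
    deferred=True checks once at the end (like the original set-based UPDATE).
    """
    table = dict(table)  # work on a copy
    for row_id, new_pk in changes:
        table[row_id] = new_pk
        if not deferred and len(set(table.values())) != len(table):
            raise ValueError(f"unique constraint violated setting row {row_id} to {new_pk}")
    if len(set(table.values())) != len(table):
        raise ValueError("unique constraint violated at end of transaction")
    return table


source = {"r1": 1, "r2": 2, "r3": 3}           # src1.t keyed by made-up row ids
ascending = [("r1", 2), ("r2", 3), ("r3", 4)]  # assumed apply order for n=n+1
try:
    apply_changes(source, ascending)
except ValueError as err:
    print("row by row:", err)                  # fails on the very first row
print("deferred:", apply_changes(source, ascending, deferred=True))
print("reverse order:", apply_changes(source, list(reversed(ascending))))
```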
Replicat – Transient PK Updates
Handled transparently since 11.2.0.2
SQL> update src1.t set n=2 where n=1;
update src1.t set n=2 where n=1
ORA-00001: unique constraint (SRC1.SYS_C004692) violated
SQL> exec dbms_xstream_gg.enable_tdup_workspace;
PL/SQL procedure successfully completed
SQL> update src1.t set n=2 where n=1;
1 row updated
...
SQL> exec dbms_xstream_gg.disable_tdup_workspace;
PL/SQL procedure successfully completed
SQL> commit;
Commit complete
Replicat – GGS_STICK table
Temporary table used by the DDLREPLICATION package
Any session that performed DDL will hold a TO (Temporary Table Object) enqueue on GGS_STICK
This will prevent the GGSCHEMA user from being dropped
SQL> drop table ggrep.ggs_stick;
drop table ggrep.ggs_stick
ORA-14452: attempt to create, alter or drop an index on temporary table already in use
DBFS
Create a non-partitioned file system
Mount on all nodes
Use Oracle Grid Infrastructure to control where GoldenGate is running (avoids accidental trail corruption)
DBFS Performance
Understanding the I/O profile:
Extract: 4KB writes into the trail
DataPump: 1MB reads from the trail
Collector: 24KB (and smaller) writes into the trail by default; use DataPump's RMTHOST TCPFLUSHBYTES to tune
Replicat: 1MB reads from the trail
Asynchronous I/O (AIO) is not used by GoldenGate
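Because GoldenGate issues these requests without AIO, the request size directly determines how many I/O calls each process makes against DBFS. A back-of-the-envelope Python calculation, treating KB/MB as binary units (an assumption), gives the per-GiB call counts: 262,144 for Extract's 4KB writes, 43,691 for the Collector's default 24KB writes, and 1,024 for the 1MB reads.

```python
import math

KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3


def io_calls(volume_bytes, io_size_bytes):
    """Number of I/O calls needed to move volume_bytes at a fixed request size."""
    return math.ceil(volume_bytes / io_size_bytes)


# Request sizes from the I/O profile above (binary units assumed).
profile = {
    "Extract writes": 4 * KIB,
    "DataPump reads": 1 * MIB,
    "Collector writes (default)": 24 * KIB,
    "Replicat reads": 1 * MIB,
}
for who, size in profile.items():
    print(f"{who:28} {io_calls(GIB, size):>8,} calls per GiB of trail")
```

The Collector figure is a lower bound, since the slide notes its writes can also be smaller than 24KB.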
DBFS Performance
All I/O ends up in a SecureFiles segment inside a database: relatively long code path, favors throughput over latency
Set SecureFiles segments to cache:
alter table dbfs.t_dbfs modify lob (filedata) (cache);
Put segments into the recycle pool (if configured):
alter table dbfs.t_dbfs modify lob (filedata) (storage (buffer_pool recycle));
Grid Infrastructure Integration
Note 1313703.1: Oracle GoldenGate high availability using Oracle Clusterware
Relies on the Manager process to control everything else
Manipulates GoldenGate checkpoint files (copy/delete)
Use Oracle Grid Infrastructure Bundled Agents
Also relies on the Manager process
Write your own scripts
Grid Infrastructure Bundled Agents
Download from the Oracle Clusterware web page: http://oracle.com/goto/Clusterware
Unzip into a temporary location and install
./xagsetup.sh --install --directory /u01/app/oracle/xag --nodes exa2,exa3,exa4
Grid Infrastructure Bundled Agents
Make sure the CRS_HOME environment variable is set: the script relies on CRS_HOME to find the crsctl executable
./agctl.pl add goldengate ogg1 \
  --gg_home /u01/app/oracle/ggs \
  --instance_type both \
  --oracle_home /u01/app/oracle/product/11.2.0/db_1 \
  --db_services dbm.ogg_rep1 \
  --databases dbm \
  --monitor_extracts exa_ext \
  --monitor_replicats exa_rep \
  --vip_name ora.dbm1.vip
[oracle@exa1 ~]$ crsctl status res xag.ogg1.goldengate
NAME=xag.ogg1.goldengate
TYPE=xag.goldengate.type
TARGET=OFFLINE
STATE=OFFLINE
[oracle@exa1 ~]$ crsctl start res xag.ogg1.goldengate
CRS-2672: Attempting to start 'xag.ogg1.goldengate' on 'exa1'
CRS-2676: Start of 'xag.ogg1.goldengate' on 'exa1' succeeded
Write your own scripts
Not as hard as you might imagine
Create separate resource scripts: Manager, Extract, Replicat, DataPump
Add resource example:
crsctl add resource $RESNAME \
  -type local_resource \
  -attr "ACTION_SCRIPT=$ACTION_SCRIPT,\
CHECK_INTERVAL=30,RESTART_ATTEMPTS=10,\
START_DEPENDENCIES='hard(ora.dbm.db,dbfs_mount,intermediate:ora.dbm1.vip)pullup(ora.dbm.db,dbfs_mount,intermediate:ora.dbm1.vip)',\
STOP_DEPENDENCIES='hard(ora.dbm.db,dbfs_mount,intermediate:ora.dbm1.vip)',\
SCRIPT_TIMEOUT=300"
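A minimal Python sketch of what the ACTION_SCRIPT for the Manager resource might look like. It assumes the Clusterware script-agent convention of passing start, stop, check, clean or abort as the first argument and reading the exit code (0 = success, and ONLINE for check; 1 = failure or OFFLINE), and it assumes GGSCI's INFO MGR output contains "Manager is running" when Manager is up. GG_HOME, ggsci, manager_running and action are hypothetical names; the scripts behind this slide were not shown and may well be shell scripts.

```python
import subprocess
import sys

GG_HOME = "/u01/app/oracle/ggs"  # taken from the agctl example; adjust as needed


def ggsci(commands, gg_home=GG_HOME):
    """Feed commands to GGSCI on stdin and return its combined output."""
    result = subprocess.run([f"{gg_home}/ggsci"], input="\n".join(commands) + "\n",
                            capture_output=True, text=True, cwd=gg_home, timeout=120)
    return result.stdout + result.stderr


def manager_running(output):
    # Assumption: INFO MGR prints "Manager is running" when Manager is up.
    return "manager is running" in output.lower()


def action(name, run=ggsci):
    """Map one Clusterware action to GGSCI commands and return the exit code."""
    if name == "start":
        run(["start mgr"])  # with AUTOSTART ER *, Manager then starts the other groups
        return 0            # Clusterware is expected to confirm with a check
    if name in ("stop", "clean", "abort"):
        run(["stop mgr !"])  # '!' skips the confirmation prompt
        return 0             # a real clean action may also need to kill leftovers
    if name == "check":
        return 0 if manager_running(run(["info mgr"])) else 1  # 0 = ONLINE, 1 = OFFLINE
    return 1  # unknown action


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(action(sys.argv[1]))
```

Passing the GGSCI runner in as a parameter keeps the action-to-command mapping testable without a GoldenGate installation; separate scripts for Extract, Replicat and DataPump would follow the same pattern with their own GGSCI commands.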