Date post: | 31-Dec-2015 |
Upload: | louis-durham |
04/19/23 2
TOC
• Term & History
• Disaster Recovery Planning
• Backup & Restore Procedures
• Architecture (XPS differences)
• The grab bag
Terminology
• Serial Backup– Archives the entire system at a single point in time using only one data stream
• Parallel Backup– Archives each requested dbspace individually, across N data streams
• External Backup– Allows a third-party application to back up the database server while maintaining logical consistency
Terminology
• Cold Restore– Restoring the server while the database engine is offline
• Warm Restore– Restoring dbspaces while the database engine is online
• Mixed Restore– A cold restore of some dbspaces followed by a warm restore of the other dbspaces
Terminology
• Imported Restore– Transferring an archive taken on one computer and restoring it on a second computer
• Point-in-Time Restore– Restoring the entire system to a single point in time
• Restartable Restore– Allows the DBA to pick up the restore from the failure point
Early Backup and Restore History
• 1.X Turbo– Only quiescent-mode archives
• 4.X named OnLine for its advanced archiving technology
• 5.X same core technology– limitations revealed (scalability & extensibility)
DSA Backup and Restore History
• 6.0 new client/server model developed
• 7.1 & 7.20 same core technology
• 7.21 new client (onbar)
• 7.3 server API rewrite
• 9.2 onbar usability features added
Pre-DSA Archive (“Bad Grammar” Archive)
• Archive checkpoint (get timestamp)
• Free extents recorded
• Reserve pages saved
• Chunks backed up in ascending chunk-number order
• Pages modified during the archive are placed in the physical log
• tbtape routinely scans the physical log for unarchived before-images
• Pages written directly to tape
Pre-DSA Restore
• Begins with OnLine offline
• Reads the configuration file and matches its parameters to the configuration parameters on the archive tape
• Zeroes out the logs (physical & logical)
• Validates the size of all chunks
• Reads the tape, copying pages directly to disk based on their address
DSA Archive Architecture
Major Differences
• True client-server architecture
• Archived pages logically grouped by dbspaces
• Granularity of archive creation
• Granularity of restores
• Warm restores
• Physical log pages kept in temp tables
Server Algorithm Changes (“Good Grammar” Archive)
• A list is made of all pages that should be archived
  – Cost vs. benefit
• Before-images are queued by the thread that modifies the page
• A new thread is responsible for before-image handling
What is a Successful Recovery?
• “Successful” recovery is defined by your business needs
Goals For Recovery
• Determine acceptable recovery time
  – How long can your business function without the data?
  – How long can your production system be down during a restore?
Recovery Strategy
• Plan recovery goals
• Select tools
• Implement the strategy
• Analyze/test the strategy
• Tune the strategy
Data Layout
• Poor data layout can hurt BAR performance
• Isolating the different types of data can facilitate restore prioritization
• Example
  – 8 dbspaces, each with 2 chunks, but one dbspace has 68 chunks
Data Layout Examples
• Place important, frequently modified data in its own dbspaces
  – Important data such as orders should go in its own dbspace, e.g. dbspace_orders
  – A dbspace containing zip codes and other lightly modified data can be backed up less frequently
Select Tools
• Backup utilities
  – ontape
  – ON-Bar
  – External Backup/Restore
• Fault tolerance mechanisms
  – Mirroring
  – High Availability Data Replication (HDR)
  – Enterprise Data Replication (DR)
• Load/Unload
  – High Performance Loader (HPL)
  – dbexport/dbimport
  – dbschema
  – SQL load/unload
  – onload/onunload
  – dbload
  – Customer ESQL programs
Ontape Backup Features
• Backup at the Server level
• Support for incremental backups
• Manual or continuous logical log backup
• Restore entire system or single dbspace
• Backup is self-describing
On-Bar Backup Features
• Parallel backup and restore
• System and dbspace level backup and restore
• Support for incremental backups
• Manual or automatic backup of logical logs
• Instance point-in-time recovery
• Open interface for communication with storage managers (XBSA)
External Backup Features
• EBR allows administrators to make a consistent copy of their dbspaces using external tools
• Used with many third-party backup products
• Allows for both cold and warm restores
EBR - Examples
• Planned uses:
  – File system snapshots
  – Breaking of mirrors
  – Third-party “raw” backup
• Basic steps:
  – Block coserver(s) at a checkpoint
  – Back up dbspaces using third-party tools
  – Unblock coserver(s)
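The three basic steps can be scripted so that the unblock always runs, even when the copy fails. The following is a minimal Python sketch. It assumes IDS-style `onmode -c block` / `onmode -c unblock` commands; XPS coservers may need a different blocking command. The caller supplies the third-party copy command, which is hypothetical here:

```python
import subprocess
from typing import Callable, Sequence

# Assumption: IDS-style blocking commands; adjust for XPS coservers.
BLOCK_CMD = ["onmode", "-c", "block"]
UNBLOCK_CMD = ["onmode", "-c", "unblock"]

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Run one command; raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def external_backup(copy_cmd: Sequence[str], runner: Runner = run) -> None:
    """Block the server, copy the dbspaces externally, then always unblock."""
    runner(BLOCK_CMD)          # 1. block at a checkpoint
    try:
        runner(copy_cmd)       # 2. back up dbspaces with the third-party tool
    finally:
        runner(UNBLOCK_CMD)    # 3. unblock, even if the copy failed
```

The `runner` parameter makes the sequence testable without a live server. In production, `copy_cmd` would be the site's snapshot or mirror-split command, for example a hypothetical `["snapshot-tool", "create", "ifx_vol"]`.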
Restoring
• Logical Logs required
• Restore looks hung; nothing is happening
• Handling unanticipated problems
Logical Logs Required for a Restore
• Cold Parallel Restore
  – Starting log is the log that contains the beginning of the oldest active transaction when the first archive checkpoint occurred
  – Must include at least the logical log that contains the last archive checkpoint
• Cold Whole System (Non-Parallel) Restore
  – No logical logs required
  – Logs are included with the archive
Logical Logs Required for a Restore
• Warm Restore
  – Starting log is the log that contains the beginning of the oldest active transaction when the first archive checkpoint occurred
  – All logs up to the current point in time
• If you are using DR, you must also include the replay point
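The rules above reduce to a range calculation over logical log unique IDs. The following is a minimal sketch. It assumes the caller already knows three log IDs: the log holding the oldest open BEGIN WORK at the first archive checkpoint, the log holding the last archive checkpoint, and the current log. The function and parameter names are hypothetical:

```python
from typing import List, Tuple


def required_logs(start_log: int, last_ckpt_log: int, current_log: int,
                  warm: bool) -> Tuple[List[int], List[int]]:
    """Return (required, optional) logical log unique IDs for a restore.

    start_log     -- log holding the BEGIN of the oldest transaction that was
                     open at the first archive checkpoint
    last_ckpt_log -- log holding the last archive checkpoint (cold restore)
    current_log   -- newest log available
    """
    if warm:
        # Warm restore: every log up to the current point in time is required.
        return list(range(start_log, current_log + 1)), []
    # Cold parallel restore: required through the last archive checkpoint;
    # later logs are optional and only roll the data further forward.
    required = list(range(start_log, last_ckpt_log + 1))
    optional = list(range(last_ckpt_log + 1, current_log + 1))
    return required, optional
```

Consider a cold restore where the start log is 10, the last archive checkpoint is in log 12, and the current log is 13. It needs logs 10-12, with log 13 optional, which matches the cold-restore case in the example that follows.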
Example of Logical Logs Required for a Restore
[Figure: logical logs 10 to 13, with the archive checkpoint and the oldest BEGIN WORK marked]
Logs required:
• Cold restore: logs 10-12 (log 13 optional)
• Warm restore: log 11 onward (no optional logs)
Restartable vs. Suspended Restores
• Restartable Restore– When the database engine shuts down prematurely, the engine may be restarted in recovery mode
• Suspended Restore– When the archive client receives an error that is restartable and the database engine does not shut down
Restartable Restore
• Turned OFF by default
• What can restart when?
  – Whole system
  – Partial restore
  – Logical recovery from a cold restore
• Only available with On-BAR
• onbar -RESTART
Architecture
• Overview
• Archive Clients
• Moving Data
  – IDS
  – XPS
• Server Threads
• XPS Architecture
What Pages are Sent to the Archive
• If a page's timestamp is older than maxstamp and newer than minstamp, the page is put to tape
• If a page's timestamp is greater than the current stamp but older than minstamp, the page is put to tape and its timestamp is updated to maxstamp-1
• Pages newer than maxstamp but older than the current stamp are considered to have been modified after the archive started, and are ignored
OnLine Wheel-O-Death
[Figure: timestamp wheel starting at 0, marking Max-Stamp, Min-Stamp, Current Stamp, and two “Not Archived” regions]
• Max-Stamp: the timestamp at the start of the archive
• Current Stamp: the timestamp at the current point in time
• Min-Stamp: the timestamp 50% of the way around the wheel from Max-Stamp, i.e. Max-Stamp - 2GB
• All pages in the red region have their timestamp updated along with being archived
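The three rules from “What Pages are Sent to the Archive” can be expressed with modular arithmetic around the wheel. The following is a minimal sketch, not the server's code. It assumes that page timestamps are unsigned 32-bit counters, so “2GB” means 2^31 timestamps, and that the archive runs for fewer than 2^31 timestamps. The boundary handling is also an assumption: a page stamped exactly at Max-Stamp is archived normally.

```python
from enum import Enum
from typing import Tuple

WRAP = 1 << 32   # assumption: page timestamps are unsigned 32-bit counters
HALF = 1 << 31   # "50% away from Max-Stamp" (the slide's 2GB)


class Action(Enum):
    ARCHIVE = "archive as-is"
    ARCHIVE_AND_RESTAMP = "archive, then set timestamp to Max-Stamp - 1"
    IGNORE = "ignore: modified after the archive started"


def classify_page(ts: int, max_stamp: int, current: int) -> Tuple[Action, int]:
    """Return what the archive does with a page and the page's resulting stamp."""
    fwd = (ts - max_stamp) % WRAP            # distance forward from Max-Stamp
    elapsed = (current - max_stamp) % WRAP   # distance Max-Stamp to Current Stamp
    if 0 < fwd <= elapsed:
        # Between Max-Stamp and Current Stamp: changed during the archive.
        return Action.IGNORE, ts
    back = (max_stamp - ts) % WRAP           # distance backward from Max-Stamp
    if back < HALF:
        # Between Min-Stamp and Max-Stamp: an ordinary page.
        return Action.ARCHIVE, ts
    # Past Current Stamp but older than Min-Stamp: the red region.
    return Action.ARCHIVE_AND_RESTAMP, (max_stamp - 1) % WRAP
```

Working modulo 2^32 keeps the rules correct after the counter wraps past 0. A plain `<` comparison would misclassify pages in that case.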
DSA Client Server Model
[Figure: the Archive Client connects to ArchiveBE over SQLI/ASF (network connection) and streams (local connection)]
Moving Data between Client and Server
[Figure: data moving between the Archive Client and ONINIT through shared memory. SQLI requests an archive data buffer; SQLI returns the shared memory address.]
Moving Data between Client/Server
• The size of the buffers used to transmit data
  – ontape: controlled by the onconfig parameter TAPEBLOCK
  – onbar: BAR_XFER_BUFSIZE; the maximum size is one OnLine page smaller than 64 KB
• The number of buffers
  – ontape
  – onbar: BAR_XPORT_COUNT, min 3, max 99
• Monitoring the data transfer
  – onstat -g stq
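The limits above can also be applied as a quick sizing check. The sketch below assumes BAR_XFER_BUFSIZE is expressed in bytes and that one stream's buffer memory is simply the buffer size times BAR_XPORT_COUNT. Both are assumptions, so check your version's documentation for the actual units:

```python
KB = 1024
MIN_XPORT, MAX_XPORT = 3, 99   # BAR_XPORT_COUNT limits from the slide


def max_xfer_bufsize(page_size: int) -> int:
    """Largest transfer buffer in bytes: one page smaller than 64 KB."""
    return 64 * KB - page_size


def stream_buffer_memory(bufsize: int, xport_count: int, page_size: int) -> int:
    """Bytes of buffer memory for one stream, after checking the limits.

    Assumes bufsize is in bytes and memory is simply bufsize * xport_count.
    """
    if not MIN_XPORT <= xport_count <= MAX_XPORT:
        raise ValueError(f"BAR_XPORT_COUNT must be {MIN_XPORT}..{MAX_XPORT}")
    if bufsize > max_xfer_bufsize(page_size):
        raise ValueError("BAR_XFER_BUFSIZE exceeds 64 KB minus one page")
    return bufsize * xport_count
```

With 2 KB pages, the largest buffer is 63,488 bytes (65,536 minus 2,048). Ten such buffers use 634,880 bytes per stream.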
What Data is Shipped to the Archive Client
• Server sends raw online pages just like they exist on disk
Example of onstat -g stq
Stream Queue: (session 11 cnt 10) 0:ad91400 1:ada1400 2:adb1400 3:adc1400 4:add1400 5:ade1400 6:adf1400 7:ae01400 8:ae11400 9:ae21400
Full Queue: (cnt 0 waiters 0) 0:0 1:ada1400 2:adb1400 3:adc1400 4:add1400 5:ade1400 6:adf1400 7:ae01400 8:ae11400
Empty Queue: (cnt 0 waiters 1)
Stream Queue: (session 10 cnt 10) 0:ac8d400 1:ac9d400 2:acad400 3:acbd400 4:accd400 5:acdd400 6:aced400 7:acfd400 8:ad0d400 9:ad1d400
Full Queue: (cnt 9 waiters 0) 0:ac9d400 1:acad400 2:0 3:accd400 4:acdd400 5:aced400 6:acfd400 7:ad0d400 8:ad1d400
Empty Queue: (cnt 0 waiters 1)
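During a long archive, it helps to pull the counters out of repeated `onstat -g stq` runs. The small parser below assumes the line layout shown above; the dictionary keys are hypothetical names:

```python
import re
from typing import Dict, Optional

STREAM_RE = re.compile(r"Stream Queue: \(session (\d+) cnt (\d+)\)")
QUEUE_RE = re.compile(r"(Full|Empty) Queue: \(cnt (\d+) waiters (\d+)\)")


def parse_stq(text: str) -> Dict[int, Dict[str, int]]:
    """Map each session to its buffer count and full/empty queue counters."""
    sessions: Dict[int, Dict[str, int]] = {}
    session: Optional[int] = None
    for line in text.splitlines():
        m = STREAM_RE.search(line)
        if m:
            session = int(m.group(1))
            sessions[session] = {"buffers": int(m.group(2))}
            continue
        m = QUEUE_RE.search(line)
        if m and session is not None:
            kind = m.group(1).lower()           # "full" or "empty"
            sessions[session][f"{kind}_cnt"] = int(m.group(2))
            sessions[session][f"{kind}_waiters"] = int(m.group(3))
    return sessions
```

Applied to the sample, session 10 reports 9 of its 10 buffers on the full queue, while session 11 reports none. Comparing these counters across successive runs shows whether buffers are piling up.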
Ontape Thread
• Always called ontape regardless of archive client
• Responsible for all communication to archive client
Scanner Thread (arc_backup1)
• The “dummy” thread, geared for I/O performance rather than decision-making
• Handed a list of pages to back up
• Scans data from disk into shared memory buffers
• Makes NO decisions about the data
• Ensures the page address is correct
Before Image Processor Thread (arc_backup2)
• Monitors the before-image queues
• Determines whether a before-image needs to be saved or discarded
• Drains the before-image memory queue by storing the page images in temp tables
• Creates multiple temp tables if required
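The split between modifying threads and arc_backup2 is a producer/consumer pattern. The minimal Python sketch below makes three assumptions:

- Each queued item is a (page address, page image) pair.
- A dict stands in for the temp tables.
- The save-or-discard rule keeps only the first before-image of a page that is on the archive list and not yet scanned.

```python
import queue
import threading
from typing import Dict, Optional, Set, Tuple

# Hypothetical item shape: (page address, page image bytes).
BeforeImage = Tuple[int, bytes]


def before_image_processor(q: "queue.Queue[Optional[BeforeImage]]",
                           pending: Set[int],
                           temp_table: Dict[int, bytes],
                           lock: threading.Lock) -> None:
    """Drain queued before-images, saving or discarding each one.

    pending    -- pages on the archive list the scanner has not read yet
    temp_table -- a dict standing in for the archive temp tables
    A None item is a sentinel telling the thread to stop.
    """
    while True:
        item = q.get()
        if item is None:
            break
        page, image = item
        with lock:
            # Save the first before-image of a page that still has to be archived;
            # later images of that page, and pages already archived, are discarded.
            if page in pending and page not in temp_table:
                temp_table[page] = image
```

In this sketch, a scanner would remove pages from `pending` under the same lock as it reads them. It would also prefer a saved image from `temp_table` over the modified page on disk.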
XPS Difference & Architecture Overview
• Basic XPS Architecture
  – Client sub-systems
  – Server sub-systems
• Differences
  – sysutils
  – Configuration
Basic XPS Architecture
[Figure: onbar using Storage Manager 1 and Storage Manager 2 with an OnLine XPS instance made up of Coservers 1-4]
Client Sub-Systems
Executable Function
onbar Shell script wrapper
onbar_d The driver
start_worker Shell script wrapper
onbar_w Worker process
onbar_m Distributes bootfiles
onbar_s Checks server state
Client Sub-Systems
[Figure: onbar, with its onbar_d driver and onbar_w workers, between Storage Manager 1 and OnLine XPS Coservers 1-4]
Server Sub-Systems
• ASF/local streams– send/receive commands and data buffers
• Backup Scheduler (BUS) (new)– distributes tasks to workers
• XBAR (new)– communicates between coservers
• RSAM
  – Only sees a single coserver
  – Manages all I/O to disk (dbspaces/chunks)
XBAR
• Interfaces with both BUS and RSAM
• Manages distributed execution of backups and restores
  – Transfers data from the object's coserver (the coserver where the dbspace/chunk exists) to onbar_w's coserver (the output coserver)
  – Uses XMF between coservers
  – Uses a local stream between onbar_w and the output coserver
Backup Scheduler (BUS)
• Manages user requests, workers, storage managers, and coservers
• Farms out work to onbar_w
• Reports success or failure to onbar_d after each work item has been attempted
• Each onbar_w creates a new worker queue in the BUS when it is started
XBAR/BUS support in SMI
• New tables for BUS data structures:
  – sysbusession: list of sessions
  – sysbuobject: what's in the queue
  – sysbuobjses: for which session
  – sysbusm: BAR_SM paragraphs
  – sysbusmdbspace: space to BAR_SM map
  – sysbusmlog: logstream to BAR_SM map
  – sysbusmworker: worker to BAR_SM map
  – sysbuworker: info about each onbar_w
Moving Data between Client/Server Version 8
[Figure: onbar_d and onbar_w communicating with OnLine XPS (Coservers 1-4) over SQLI and through shared memory, plus Storage Manager 1]
Differences Between Versions 8 and 7
• Multiple nodes
• Non-locality of devices and data
  – Backup data may be shipped between nodes
• Multiple storage managers
  – One storage manager can serve the entire system
  – Multiple storage managers can eliminate performance bottlenecks for large systems
Differences Immediately Seen by DBAs
• Command line is slightly different
• Configuration parameters are very different
  – Version 7 has 6 configuration parameters, none of which needs to be set
  – Version 8 has 15 configuration parameters, most of which must be configured
Differences Immediately Seen by DBAs
• sysutils has more columns
• Emergency bootfiles
  – More columns
  – One boot file per coserver
  – Merge boot files
• Additional onstat options
arc_very_old_pages()
• Permanent solution #1
  – No longer use timestamps for recovery
  – Disk timestamps do not need to be refreshed
  – Memory and disk timestamps are different
  – Bitmaps are used to keep track of foreground writes
• Permanent solution #2
  – Multiple instances of the same page in the physical log
  – Only the oldest instance of a page is restored during physical recovery
7.31 Solution #1
• Must be enabled through CCFLAGS
Physical Recovery Started at Page(1:1065).
Physical Recovery Complete: 0 Pages Examined 0 Pages Restored.
9.21 Solution #2
Physical Recovery Started at Page(1:1065).
Physical Recovery Complete: 0 Pages Examined 0 Pages Restored.
Override Internal Error Checks
• The -O option is much like -f for UNIX rm
• Does many different things:
  – Allows restore of a space that is still online
  – Creates a file system entry for each chunk if there isn't one
  – Allows expiration of objects from sysutils and the storage manager that may be needed in a restore
Archive Utilities
• Explaining onstat & oncheck options
  – onstat -d
  – onstat -g arc
  – onstat -g stq
• Validating archives
• Managing the archive catalogs
onstat -g arc
num  DBSpace   Q Size  Q Len  Buffer  partnum   size  scanner
2    dbspace1  92      0      4       0x100085  240   0x2033ee
3    dbspace2  69      0      1       0x100084  150   0x302f1a

Dbspaces - Archive Status
name      number  level  date              log  log-position
rootdbs   1       0      10/04/2001.10:17  5    0x10b608
dbspace1  2       0      10/04/2001.10:17  5    0x10b608
dbspace2  3       0      10/04/2001.10:17  5    0x10b608
sbspace1  4       0      10/04/2001.10:17  5    0x10b608
sbspace2  5       0      10/04/2001.10:17  5    0x10b608
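The Archive Status section is easy to script against, for example to flag spaces whose last archive is too old. The sketch below assumes one row per line, as `onstat -g arc` prints it. It also assumes an MM/DD/YYYY.HH:MM date format, which may differ with locale settings. `stale` and the thresholds in the test are hypothetical:

```python
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple


class ArchiveStatus(NamedTuple):
    number: int
    level: int
    date: datetime
    log: int
    log_position: int


def parse_archive_status(text: str) -> Dict[str, ArchiveStatus]:
    """Parse the 'Dbspaces - Archive Status' rows of onstat -g arc."""
    rows: Dict[str, ArchiveStatus] = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have six fields ending in a hex log position;
        # the header row ends in 'log-position' and is skipped.
        if len(parts) == 6 and parts[5].startswith("0x"):
            name, num, level, date, log, pos = parts
            rows[name] = ArchiveStatus(
                int(num), int(level),
                datetime.strptime(date, "%m/%d/%Y.%H:%M"),
                int(log), int(pos, 16))
    return rows


def stale(rows: Dict[str, ArchiveStatus], now: datetime,
          max_age: timedelta) -> List[str]:
    """Names of spaces whose last archive is older than max_age."""
    return sorted(n for n, r in rows.items() if now - r.date > max_age)
```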
onstat -d information
• D Chunk is down
• L Storage space is being logically restored
• O Chunk is online
• P Storage Space is physically restored
• R Storage space is being restored
oncheck -pr
Validating PAGE_1DBSP & PAGE_2DBSP...
DBspace number 2
DBspace name dbspace1
. . . . . DBspace archive status
Archive Level 0
Real Time Archive Began 10/04/2001 10:33:09
Time Stamp Archive Began 306128
Logical Log Unique Id 6
Logical Log Position 0x3d2018
Archive Level 1
Real Time Archive Began 10/04/2001 10:35:28
Time Stamp Archive Began 323695
Logical Log Unique Id 8
Logical Log Position 0x208018
Validating Archives
• What is actually validated
• What other information is there for me
• What else can go wrong with my validated restore
• How do I validate my archives?
What is actually validated
• Format of each page on the archive is checked (similar to oncheck -cd)
• Tape control pages are sanity-checked
• Each table is checked ensuring all pages of the table exist on the archive tape
• Reserve page format is validated
• Each chunk free list is verified
• Table extents are checked for overlap (oncheck -pe)
Other Information for the DBA
• AC_MSGPATH - Message log for archecker
• {AC_STORAGE}/INFO– extent list for each dbspace, oncheck -pe DBS.{dbspace_#}
– time to process each tape/object
– Information about the number and type of pages processed; profile.{pid}
• {AC_STORAGE}/SAVE– contains a binary image of control information
Profile Information
=======================
Total pages processed 51227
Total Data pages 49327
Total index pages 828
Total smart blob pages 6
Total blob space pages 0
Total partition pages 328
Total chunk free list pages 5
Total Reserve pages 12
Total bit map pages 335
MORE . . .
Extent Information
db1:sysprocedures 0x00200235 8
db1:sysprocbody 0x0020023D 32
db1:sysprocauth 0x0020025D 8
db1:sysprocedures 0x00200265 8
db1:sysprocbody 0x0020026D 32
db1:t1 0x0020028D 24344
FREE 0x002061A5 3
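The listing is contiguous: each extent starts where the previous one ends. For example, 0x0020028D + 24344 = 0x002061A5, which is where the free extent begins. This supports reading the third column as a size in pages. Under that reading, the sketch below performs a check in the spirit of the extent-overlap validation; it is not archecker's own code:

```python
from typing import List, Tuple

# (owner such as "db1:t1" or "FREE", start page address, size in pages)
Extent = Tuple[str, int, int]


def find_overlaps(extents: List[Extent]) -> List[Tuple[str, str]]:
    """Report each extent that starts inside an earlier extent's page range."""
    overlaps: List[Tuple[str, str]] = []
    far_name, far_end = "", -1       # extent reaching furthest so far
    for name, start, size in sorted(extents, key=lambda e: e[1]):
        if start < far_end:
            overlaps.append((far_name, name))
        if start + size > far_end:
            far_name, far_end = name, start + size
    return overlaps


SAMPLE = [
    ("db1:sysprocedures", 0x00200235, 8),
    ("db1:sysprocbody", 0x0020023D, 32),
    ("db1:sysprocauth", 0x0020025D, 8),
    ("db1:sysprocedures", 0x00200265, 8),
    ("db1:sysprocbody", 0x0020026D, 32),
    ("db1:t1", 0x0020028D, 24344),
    ("FREE", 0x002061A5, 3),
]
```

The check tracks the furthest end reached so far rather than only the previous extent. This way, a large extent that swallows several later ones is still caught.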
Validating Archives
• ontape
  – archecker -tdvs
  – AC_TAPEBLK, AC_TAPEDEV
• onbar
  – onbar -r -v (version 7.3X)
  – onbar -v (9.20 & 8.30)
  – onbar -b -v (8.30)
onsmsync
• Adds objects from ixbar files to sysutils
• Removes objects from sysutils
• Three expiration policies
  – -g: remove backups older than the Nth generation
  – -t: remove backups from before a datetime
  – -i: remove backups older than an interval
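The three policies differ only in how the cutoff is chosen. The minimal sketch below works over a hypothetical in-memory list of backup objects. It does not model the dependencies the real onsmsync has to respect, such as keeping the level-0 backup that a retained level-1 depends on. Reading -g per dbspace is also an assumption:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class BackupObject:
    name: str          # dbspace the backup belongs to
    level: int         # archive level 0, 1, or 2
    ended: datetime    # backup end time


def expire(objects: List[BackupObject], *, generations: Optional[int] = None,
           before: Optional[datetime] = None,
           interval: Optional[timedelta] = None,
           now: Optional[datetime] = None) -> List[BackupObject]:
    """Return objects that one onsmsync-style policy would remove."""
    if generations is not None:                      # like -g N
        if generations < 1:
            raise ValueError("generations must be >= 1")
        doomed: List[BackupObject] = []
        for name in sorted({o.name for o in objects}):
            mine = [o for o in objects if o.name == name]
            level0 = sorted((o.ended for o in mine if o.level == 0), reverse=True)
            if len(level0) > generations:
                cutoff = level0[generations - 1]     # oldest level-0 kept
                doomed += [o for o in mine if o.ended < cutoff]
        return doomed
    if before is not None:                           # like -t datetime
        cutoff = before
    elif interval is not None:                       # like -i interval
        cutoff = (now or datetime.now()) - interval
    else:
        raise ValueError("choose a policy: generations, before, or interval")
    return [o for o in objects if o.ended < cutoff]
```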
Understanding ixBar Files
• Server name
• Object name
• Object type
• is_serial
• Action id
• Archive level
• SMV copy id high
• SMV copy id low
• Backup start date
• Backup start time
• Backup end date
• Backup end time
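For scripting against ixbar files, the field list above maps directly onto a record. The sketch below assumes one object per line, with whitespace-separated fields in exactly the order listed. Verify this against a real ixbar file before relying on it. The key names and the sample line in the test are hypothetical:

```python
from typing import Dict, List

# Assumed field order, taken from the list above; key names are hypothetical.
IXBAR_FIELDS: List[str] = [
    "server_name", "object_name", "object_type", "is_serial", "action_id",
    "archive_level", "copy_id_hi", "copy_id_lo",
    "start_date", "start_time", "end_date", "end_time",
]


def parse_ixbar_line(line: str) -> Dict[str, str]:
    """Split one ixbar entry into named fields (whitespace-separated assumed)."""
    parts = line.split()
    if len(parts) != len(IXBAR_FIELDS):
        raise ValueError(f"expected {len(IXBAR_FIELDS)} fields, got {len(parts)}")
    return dict(zip(IXBAR_FIELDS, parts))
```

Keeping every value as a string avoids guessing at date formats. Callers can convert the fields they actually use.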
Storage Manager Snafus
• Timeout of onbar
• Error 131 Object not found
• Salvaging logs and getting wrong object
Recovery Snafus
• Check that the devices are linked properly
  – KAIO only uses raw I/O
  – Overlapping data
• While restoring, the database appears hung
Restore seems Hung
• The tape is done
• onstat -D shows no I/O
• Very little CPU activity
• While the system clears the physical and logical logs there is very little activity, and the system appears to be hung
Improvements
• A message in the online log indicating when this phase of the restore starts and completes
• Intelligent parallelism when clearing the logs: all the logs in a single chunk are cleared by one thread, i.e. one disk-clearing thread per chunk
Clearing the physical and logical logs has started
Cleared 2100 MB of the physical and logical logs in 612 seconds
Parallel Archive Procedures
• The archive is broken down into archive jobs, with each dbspace being its own backup
• An onbar_d is started to back up a single dbspace
• It connects to the database server and the storage manager, requesting the backup session
• It updates sysutils and the ixbar file
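The job breakdown above can be sketched with a worker pool: one job per dbspace, at most N running at once, and a per-dbspace result for the final report. `backup_one` is a hypothetical callable standing in for the per-dbspace onbar_d work, which connects to the server and storage manager and updates sysutils and the ixbar file:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def parallel_archive(dbspaces: Iterable[str],
                     backup_one: Callable[[str], None],
                     max_streams: int) -> Dict[str, str]:
    """Run one backup job per dbspace, at most max_streams at a time."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_streams) as pool:
        jobs = {name: pool.submit(backup_one, name) for name in dbspaces}
        for name, job in jobs.items():
            try:
                job.result()                 # re-raises the job's exception
                results[name] = "ok"
            except Exception as exc:         # one failed dbspace does not stop the rest
                results[name] = f"failed: {exc}"
    return results
```

Collecting a status per dbspace mirrors the per-work-item success/failure reporting described for the BUS. A failed dbspace can then be retried on its own.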