Backup And Recovery Requirements
- Routine backups must have minimal impact on the development environment: VOBs must be locked for a minimal amount of time during backup
- Routine backups must capture relevant data in a way that can be quickly and accurately recovered
- Data validation is required prior to backing up data to tape
- All relevant data must be backed up at the same time (i.e. registry, configuration, VOB storage)
Backup And Recovery Requirements (Continued)
- Recovery time must minimize impact to developers: a typical VOB server has 80 to 90 VOBs and 100-200 GB of storage (hub servers have 130-160 VOBs)
- Typical recovery scenario (takes a week):
  • Restore data from backup media, i.e. tape (days!)
  • Data validation on the restored data, i.e. checkvob & dbcheck (days! 10-15 GB VOBs with 3-4 GB databases)
  • Sync replicas to get the changes since the last backup (this alone takes about 8-12 hours)
  • Reset client machines (rebooting required?)
- Minimize downtime during recovery: it needs to be minutes/hours, not days or weeks
Warm High Availability (WHA) Configuration
- Aspects of the WHA implementation:
  • Using SAN technology
  • Snapshots to minimize VOB locks
  • Specialized ClearCase configuration
- Currently only on VOB servers; view servers could be implemented the same way
- Now some details!
WHA Configuration (Continued)
- Using SAN technology
  • Any server can dynamically control any storage device, allowing for quick failover of VOB servers
  • Use of a "shadow" disk as the initial backup medium
- Snapshots to minimize VOB locks
  • Minimizes VOB lock times to less than 2 minutes
- Specialized ClearCase configuration
  • Allows failover to a new server with no required changes to the ClearCase registry and configuration
- More details later!
WHA Configuration (Continued)
- Hardware configuration
- SAN configuration
- ClearCase configuration
WHA Configuration (Continued)
- Hardware configuration
  • Unix Solaris servers
  • SAN storage appliance: currently about 5-6 TB of ClearCase storage (VOBs and views) in San Diego
  • Each VOB server has primary disk storage plus 2 "shadow images" of the VOB storage (3 copies on disk)
  • Large servers: 16 GB RAM, 4 CPUs, GB network, and a 2 GB interface to the storage device
  • We have implemented WHA on all our VOB servers, large and small
WHA Configuration (Continued)
- SAN configuration
  • Many-to-many connectivity between servers and storage locations
  • Dynamic control of storage locations
  • Accommodates snapshots and shadow images (where dbcheck is run)
  • Using 2 shadow images, one day apart:
    - The oldest one has successfully passed dbcheck and is/has been dumped to tape
    - The newest one is undergoing dbcheck
    - We always have a validated copy of all necessary data on disk for restoration
WHA Configuration (Continued)
- ClearCase configuration
  • Currently using ClearCase 4.2
  • When implementing a recovery, NO ClearCase configuration changes are required (i.e. registry)
  • Back up ALL relevant data at the same time: the VOB data and /var/adm/atria are located on the same disk location
  • A DNS alias is used instead of the real host name for the ClearCase license server
  • Use logical vs. physical VOB storage locations when registering
  • A DNS alias is used for VOB servers (the VOB server can change by moving the alias)
WHA Configuration (Continued)
- ClearCase configuration (continued)
  • Use logical vs. physical VOB storage locations when registering: the path to the VOB storage must be the same, independent of host and storage location
  • Create links to the VOB storage, for example:
    - /local/mnt (this mount point always exists and is always shared)
    - Use links to create the logical-to-physical mapping; you need unique logical paths for all VOB storage within the same region:

      /local/mnt/VOBSA -> /net/dnsalias/local/mnt2/vobs
      /local/mnt/VOBSB -> /net/dnsalias/local/mnt3/vobs
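The mapping above can be sketched with plain symlinks. In this hedged sketch, /tmp/local/mnt stands in for the real /local/mnt mount point (so the commands can be tried anywhere), and "dnsalias" is the placeholder alias from the example:

```shell
# Sketch of the logical-to-physical mapping with symlinks.
# /tmp/local/mnt is a stand-in for /local/mnt; "dnsalias" is the
# placeholder DNS alias from the example above.
mkdir -p /tmp/local/mnt
ln -sfn /net/dnsalias/local/mnt2/vobs /tmp/local/mnt/VOBSA
ln -sfn /net/dnsalias/local/mnt3/vobs /tmp/local/mnt/VOBSB
# Any path under the VOBSA link now resolves through the alias, so the
# logical path survives a move of the physical storage or the server.
```

Because clients only ever see the logical path, repointing the links (and the alias) is all it takes to move storage.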
WHA Configuration (Continued)
- ClearCase configuration (continued)
  • Once the links are created, register and tag (mkvob, mkreplica, ...). You must use the fully-qualifying method:
    -host <dns alias of VOB server>
    -hpath <the linked path, not the physical path>
    -gpath <the global and linked path>
  • Never use the real host name or real physical path!!
  • To switch servers: restore data, move the host alias, create the links, stop and start ClearCase
  • The clients and view servers must reacquire the new VOB storage mount points, so restart ClearCase or reboot the clients
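As a sketch, creating a VOB with the fully-qualifying method might look like this; the alias and paths are borrowed from the /vobs/bsc example in these slides, so treat this as illustrative rather than a verbatim site command:

```shell
# Hedged sketch: create and tag a VOB using only the DNS alias and
# the linked (logical) storage path; never the real host or path.
cleartool mkvob -tag /vobs/bsc \
    -host edbvobA \
    -hpath /local/mnt/VOBS/bsc.vob \
    -gpath /net/edbvobA/local/mnt/VOBS/bsc.vob \
    /net/edbvobA/local/mnt/VOBS/bsc.vob
```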
WHA Configuration (Continued)
- ClearCase configuration (continued)
- Example: /vobs/bsc
  • The host name is cyclone, and the VOB storage location is:
    /local/mnt2/vobs/bsc.vob (physical)
    /local/mnt/VOBS/bsc.vob (logical)
  • DNS alias: cyclone == edbvobA
  • Register and tag /vobs/bsc to the DNS alias and logical link instead of the physical storage location:

    /net/edbvobA/local/mnt/VOBS/bsc.vob
      -vs-
    /net/cyclone/local/mnt2/vobs/bsc.vob
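For a VOB that already exists, re-pointing its registry entries at the alias and logical path could be sketched as below. This is a hedged example built from the slide's names; -replace assumes the VOB was previously registered and tagged:

```shell
# Hedged sketch: re-register and re-tag /vobs/bsc under the DNS alias
# (edbvobA) and the logical linked path instead of cyclone's physical path.
cleartool register -vob -replace \
    -host edbvobA -hpath /local/mnt/VOBS/bsc.vob \
    /net/edbvobA/local/mnt/VOBS/bsc.vob
cleartool mktag -vob -replace -tag /vobs/bsc \
    -host edbvobA -gpath /net/edbvobA/local/mnt/VOBS/bsc.vob \
    /net/edbvobA/local/mnt/VOBS/bsc.vob
```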
WHA Configuration (Continued)
- ClearCase configuration (continued)
- Example of lsvob output (2 VOB servers, 3 storage locations):

  * /vobs/mgw/msf_erab   /net/mother/local/mnt/VOBSA/mgw/msf_erab.vob      public
  * /vobs/mgw/msf_eedn   /net/mother/local/mnt/VOBSA/mgw/msf_eedn.vob      public
  * /vobs/mgw/msf_etm    /net/mother/local/mnt/VOBSA/mgw/msf_etm.vob       public
  * /vobs/cello/ose      /net/mother/local/mnt/VOBSC/cello/ose.vob         public
  * /vobs/ewu/perl       /net/stepmother/local/mnt/VOBSB/ewu/perl.vob      public
  * /vobs/ewu/freeware   /net/stepmother/local/mnt/VOBSB/ewu/freeware.vob  public
  * /vobs/stre/det       /net/stepmother/local/mnt/VOBSB/stre/det.vob      public
WHA Configuration (Continued)
- ClearCase configuration (continued)
  • A DNS alias is used for VOB servers (the VOB server can change by moving the alias)
  • The registered path and host are always the same, no matter which physical host is the VOB server!
  • Always use the alias, for MultiSite as well. Machines can come and go, but the VOB server host name is always the same
- There is a joint Rational and Sun white paper documenting this configuration and setup:
  http://www.rational.com/media/partners/sun/Ericsson_final.pdf
Backup Process
- All setup is completed and WHA is implemented
- Lock the VOBs (less than 2 minutes)
- We use Sun Instant Image™ to snapshot the VOB storage partition
  • Both the VOB storage and /var/adm/atria are located here (we also have trigger scripts and ...)
- The snapshot goes to shadow1
  • Another disk partition; it could be a totally different disk
- Shadow2 passed data validation with dbcheck yesterday and is being dumped to tape
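The locking window can be sketched as below. The snapshot step is a placeholder, since Sun Instant Image is driven by its own utilities, and the loop over all VOBs is an assumption about how the lock is applied:

```shell
# Hedged sketch of the backup window.
for VOB in `cleartool lsvob -s`
do
    cleartool lock vob:$VOB      # total lock time stays under 2 minutes
done
# ... trigger the Instant Image snapshot of the storage partition here,
#     which also captures /var/adm/atria on the same partition ...
for VOB in `cleartool lsvob -s`
do
    cleartool unlock vob:$VOB
done
# shadow1 now holds the snapshot; dbcheck can start against it
```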
Backup Process (Continued)
- Once the backup to shadow1 is complete, dbcheck is started for data validation
- Once data validation is successful (and it's a new backup day), shadow1 becomes shadow2, shadow2 becomes shadow1, and it starts all over
- If an error is found during dbcheck, we take immediate corrective action: keep the validated copy on disk (shadow2) while we check out the production data
- There is ALWAYS a "good copy" on the shadow2 disk!
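The rotation described above can be sketched with symlinks over two on-disk image areas. The /tmp paths are placeholders so the sketch is self-contained; the real site presumably swaps roles in its backup tooling:

```shell
# Hedged sketch of the shadow rotation; /tmp/backup stands in for the
# real shadow storage, and imageA/imageB are the two on-disk copies.
mkdir -p /tmp/backup/imageA /tmp/backup/imageB
ln -sfn /tmp/backup/imageA /tmp/backup/shadow1   # receives tonight's snapshot
ln -sfn /tmp/backup/imageB /tmp/backup/shadow2   # validated copy, dumped to tape

# On a new backup day, after shadow1 passes dbcheck, the roles swap:
ln -sfn /tmp/backup/imageB /tmp/backup/shadow1
ln -sfn /tmp/backup/imageA /tmp/backup/shadow2
# shadow2 (imageA) is now the validated copy; shadow1 takes the next snapshot
```

Either way the invariant holds: one image has always passed dbcheck, so a validated copy is always on disk.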
Recovery Process
- Typical recovery scenario:
  • Get another server or fix the broken one: you have to give it the same server hostname or change the ClearCase registry information!
  • Restore data from backup tape (100-200 GB, 2+ days)
  • Do data validation, checkvob and dbcheck (2+ days)
  • Restore the replica (MultiSite users) for 80+ VOBs; this takes at least 8-12 hours
  • Clean up clients: typically a crash means NFS/MVFS is messed up. REBOOT!
- Is that it? I wish it was! Developers can't work!
- WHA recovery scenario?
Recovery Process (Continued)
- WHA recovery scenario: get another server or fix the broken one
  • ANY server can act as the new VOB server. Of course, using an existing VOB/view server would degrade performance
  • Get the VOBs on-line and back in service as fast as possible; WHA means I can "cut over" to another server again later!
Recovery Process (Continued)
- WHA recovery scenario: get another server or fix the broken one (cont)
- STEPS (the same for any WHA cut-over):
  • Move the DNS alias to the new server
  • Create the links (links for /var/adm/atria and the VOB physical storage locations from /local/mnt/VOBS?)
  • Since /var/adm/atria was backed up with the VOB storage, they are in sync
  • Just turn ClearCase off and on: NEW VOB SERVER!
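Putting the steps together, a cut-over on the replacement server might look like the sketch below. The link targets are assumptions pieced together from the examples in these slides (notably that /var/adm/atria lives alongside the VOB storage); the DNS change itself happens in DNS, not on the host:

```shell
# Hedged sketch of a WHA cut-over on the new VOB server.
# Step 1 (not shown): move the DNS alias, e.g. edbvobA, to this host.

# Step 2: recreate the links; targets are illustrative and assume
# /var/adm/atria was stored on the same partition as the VOB storage.
ln -s /net/edbvobA/local/mnt2/var/adm/atria /var/adm/atria
ln -s /net/edbvobA/local/mnt2/vobs /local/mnt/VOBSA

# Step 3: restart ClearCase so it picks up the restored registry/storage.
/etc/init.d/atria stop
/etc/init.d/atria start
```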
Recovery Process (Continued)
- WHA recovery scenario: restore data from backup tape, 100-200 GB
  • Not 2+ days
  • We don't go to tape unless we've had a real disaster!
  • We don't do a "restore"; we have 2 copies on disk!
  • Use shadow1 if data validation is complete or the confidence level is high; shadow2 is only 24-48 hrs old
  • Mount the shadow disk to the new VOB server (SAN makes this easy)
Recovery Process (Continued)
- WHA recovery scenario: restore data from backup tape (cont)
  • Create the links to the VOB physical storage location
  • Much faster than transferring 100-200 GB of data from tape! 15 minutes MAX!
Recovery Process (Continued)
- WHA recovery scenario: do data validation, checkvob and dbcheck
  • Not 2+ days
  • It normally takes a "very" long time (100-200 GB of VOBs, some with 4-6 GB databases), but:
  • checkvob and dbcheck are run on all servers monthly
  • Daily successful dbcheck runs on the shadow disk give high confidence
Recovery Process (Continued)
- WHA recovery scenario: do data validation, checkvob and dbcheck (cont)
  • If shadow1 has completed dbchecks, use it; if not, use shadow2
  • NO time is spent on data validation during recovery, because it was done during the backup phase!
  • We would like checkvob and other data validation utilities that can be run on off-line VOBs!
Recovery Process (Continued)
- WHA recovery scenario: restore the replica
  • MultiSite is heavily used, with internal syncing every 30 minutes; checked-in changes will be available in another replica since the shadow image was snapshot!
  • Get the changes since the snapshot from another replica
  • By default, restorereplica wants to sync with ALL replicas (NOT all 30-40 we have)
  • **CAREFUL**
Recovery Process (Continued)
- WHA recovery scenario: restore the replica (continued)
  • Lots of VOBs, 80+; this will still take at least 8-12 hours with only 2-4 replicas
  • Must get update packets (which have the changes since the backup) from the other replicas
  • See example commands on the next slides!
Recovery Process (Continued)
- WHA recovery scenario: restore the replica (continued)
- Example commands:

  mt restorereplica (the default requires updates from all replicas)
  OR
  mt restorereplica replica:ewuhub_bscng_aim replica:ewucth_bscng_aim replica:ewubo_bscng_aim

- ** You MUST INCLUDE THE LAST REPLICA THAT WAS EXPORTED TO JUST BEFORE THE CRASH!! We need to avoid divergence in the VOB replicas!
- * Check via lsepoch; make sure the replica with the record of the most changes that took place in the restored replica is included! (mt lsepoch ewuhub_bscng_aim@/vobs/bscng/aim)
Recovery Process (Continued)
- WHA recovery scenario: restore the replica (continued)
- **WARNING: POSSIBLE DIVERGENCE**
  • You MUST INCLUDE THE LAST REPLICA THAT WAS EXPORTED TO JUST BEFORE THE CRASH!! We need to avoid divergence in the VOB replicas!
  • Check for the latest replica synced to:
    - lsepoch
    - lshistory
Recovery Process (Continued)
- WHA recovery scenario: restore the replica (continued)
  • Check via lsepoch; make sure the replica with the record of the most changes that took place in the restored replica is included!
  • With ClearCase 4.x you can use -actual to query remote replicas
Recovery Process (Continued)
- WHA recovery scenario: restore the replica (continued)
- Check via lsepoch. EXAMPLE: the restored replica is ewucello_bscng_aim

  mt lsepoch -actual ewuhub_bscng_aim@/vobs/bscng/aim
  oid:834d7251.f24c11d4.a4df.00:01:80:b8:c7:b4=450831 (ewucello_bscng_aim)

  mt lsepoch -actual ewucth_bscng_aim@/vobs/bscng/aim
  oid:834d7251.f24c11d4.a4df.00:01:80:b8:c7:b4=450745 (ewucello_bscng_aim)
Recovery Process (Continued)
- WHA recovery scenario: restore the replica (continued)
- Example commands to find the last replica exported to. This is not trivial; you have to check each replica you have been syncing with.
- Example: mt lsreplica -invob /vobs/nmis

  Replicas (14): boulder_nmis, bscclassic_nmis, cbssw_nmis, edbbsc_nmis, edbbsm_nmis, edbspe_nmis, edbtetra_nmis, ewubo_nmis, ewucth_nmis, ewuhub_nmis, ewustre_nmis, ramstest_nmis, servicenet_nmis, streit2_nmis

- These replicas are the only ones the restored replica syncs with: boulder_nmis, bscclassic_nmis, ewubo_nmis, ewucth_nmis, ewuhub_nmis
Recovery Process (Continued)
- WHA recovery scenario: restore the replica (continued)
- Example (cont): /vobs/nmis (you must do an lshistory at each remote replica site!)

  cleartool lshistory replica:boulder_nmis
  cleartool lshistory replica:bscclassic_nmis
  cleartool lshistory replica:ewubo_nmis
  cleartool lshistory replica:ewucth_nmis
  cleartool lshistory replica:ewuhub_nmis

- Example results:
  12-Jun.15:55 root import sync from replica "bscclassic_nmis" to replica "ewuhub_nmis"
- Review the output of the above commands and see which was the last replica to be sent an export sync packet
Recovery Process (Continued)
- WHA recovery scenario: restore the replica (continued)
- Now run the restorereplica command with the appropriate replica(s) identified! (We use ALL the replicas we sync with, but not replicas we never sync with)

  mt restorereplica replica:boulder_nmis replica:bscclassic_nmis \
      replica:ewubo_nmis replica:ewucth_nmis replica:ewuhub_nmis

- Now send export packets to those replicas, and they send packets with the changes back. The VOB stays locked until the replica you are restoring gets update packets from each!
- Once all changes have been processed by the restored replica, you can unlock the VOBs and go to the next step
Recovery Process (Continued)
- WHA recovery scenario: clean up clients
  • Typically a crash means NFS/MVFS is messed up
  • The easiest way to get clients and servers working properly is to REBOOT
  • To try to clean up clients without a reboot, see the basic script on the next page
Recovery Process (Continued)
- WHA recovery scenario: clean up clients (continued)
- Script:

  #!/bin/sh -x
  # Kill any processes holding the view root or VOB mounts
  /usr/sbin/fuser -uck /view
  for VOB in `/usr/atria/bin/cleartool lsvob -s`
  do
      /usr/sbin/fuser -uck $VOB > /dev/null 2>&1
  done
  # Unmount all VOBs, then any remaining storage mounts under local/mnt
  /usr/atria/bin/cleartool umount -all > /dev/null 2>&1
  for MNT in `df | grep local/mnt | grep -v "/dev/dsk" | cut -f1 -d "("`
  do
      umount $MNT > /dev/null 2>&1
  done
  # Remove stale VOB mount points and stop ClearCase
  rm -r /vobs/*
  /etc/init.d/atria stop
Recovery Process (Continued)
- WHA restore completed! But developers can't work!
  • Build issues: you need error handling in build scripts
  • VOBs and views may have been created or deleted since the backup:
    - Created since backup: storage exists without an entry in the registry
    - Deleted since backup: a registry entry exists without storage
- FIRST, MAKE SURE ALL VOB AND VIEW SERVER PROCESSES HAVE BEEN KILLED. This eliminates lots of potential problems (stop and restart ClearCase on all systems)
Recovery Process (Continued)
- Build issues
- Case #1: VOBs that have been restored HAVE references to derived objects (DOs)
  • The DOs physically exist in the VOB (no problem)
  • The DOs exist in a view (ref count = 1) (again, no problem)
  • DO references exist in the VOBs, but the DO data DOES NOT exist anymore (maybe removed since the backup by rmview or rmdo)
- Case #2: VOBs that have been restored DO NOT have references to DOs that exist
  • DOs exist in a single view (reference count == 1); the reference is in the view but not the VOBs
  • DOs were promoted, so references exist in multiple views (ref count > 1) but not in the VOBs
Recovery Process (Continued)
- Build issues, Case #1: VOBs that have been restored HAVE references to DOs
  • DO references exist in the VOBs, but the DO data DOES NOT exist anymore
  • It was maybe removed since the backup by rmview or rmdo
Recovery Process (Continued)
- Build issues, Case #1 (continued)
  • Since DO pointers exist in the restored VOB, these DOs are considered during configuration lookup in builds. This results in warnings, but it does rebuild the DOs

  clearmake -C sun -f /vobs/wds/build/include/Makefile.if -e
  clearmake: Warning: Unable to evaluate derived object "libimc.a.1@@07-Nov.19:10.220156" in VOB directory "/vobs/bscng/ccl/imc/imc_if/lib.sp750@@"

- ** recoverview does NOT clean this up; you just keep getting warnings! We created a script to clean this up, but you might be able to just ignore the messages!
Recovery Process (Continued)
- Build issues, Case #1 (continued)
  • If the view has been deleted, an ERROR message will be generated (scripts need error handling)

  >>> (clearmake): Build evaluating lib1.a
  >>> (clearmake): Build evaluating one.o
  No candidate in current view for "one.o"
  >>> (clearmake): Shopping for DO named "one.o" in VOB directory "/vobs/stre/do_test/.@@"
  >>> (clearmake): Evaluating heap derived object "one.o@@05-Jun.12:24.74"
  >>> clearmake: Error: Unable to find view by uuid:5b997e3d.78b711d6.ad2c.00:01:80:b6:87:eb, last known at "lime:/tmp/do3.vws".
  >>> clearmake: Error: Unable to contact View - ClearCase object not found
  >>> clearmake: Warning: View "lime:/tmp/do3.vws" unavailable - This process will not contact the view again for 60 minutes.
  NOTE: Other processes may try to contact the view.
  >>> clearmake: Warning: Unable to evaluate derived object "one.o@@05-Jun.12:24.74" in VOB directory "/vobs/stre/do_test/.@@"
Recovery Process (Continued)

Build issues – Case #2
- VOBs that have been restored DO NOT have references to DOs that exist:
  • DOs exist in a single view (reference count == 1); the reference is in the view but not the VOBs
  • DOs were promoted, so references exist in multiple views (ref count > 1), but not in the VOBs
Recovery Process (Continued)

Build issues – Case #2 (continued)
- DOs exist in a single view (reference count == 1); the reference is in the view but not the VOBs
- DOs were promoted, so references exist in multiple views (ref count > 1), but not in the VOBs
- recoverview can be used to clean this up; it needs to be run in each view with a problem. It moves stranded DOs to the view's .s/lost+found:

    recoverview -vob <vob uuid> -tag <view tag>
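Since recoverview must be run once per affected view, a small wrapper can batch it across every view tag on a host. This is a sketch under the assumption that all registered views may be affected; the function only prints the commands so an administrator can review them before piping to a shell:

```shell
#!/bin/sh
# Sketch: compose one recoverview invocation per view tag read on stdin.
# Nothing is executed here; output can be inspected and then piped to sh.
# The /usr/atria/etc path matches the deck's other tool paths -- adjust
# for your installation.
emit_recoverview_cmds() {
  vob_uuid="$1"
  while IFS= read -r view_tag; do
    printf '/usr/atria/etc/recoverview -vob %s -tag %s\n' "$vob_uuid" "$view_tag"
  done
}

# Illustrative real run:
#   cleartool lsview -short | emit_recoverview_cmds "$VOB_FAMILY_UUID" | sh
```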
Recovery Process (Continued)

Build issues – Case #2.2 (continued)
- DOs promoted, so references exist in multiple views (ref count > 1), but not in the VOBs
- *Careful: view server processes have NOT been terminated!

    lime /vobs/stre/do_test 53 ct setview do2
    lime /vobs/stre/do_test 51 ct ls -l
    view private object  .cmake.state
    version              Makefile@@/main/1    Rule: element * /main/LATEST
    derived object       four.o [no config record]
    derived object       lib1.a [no config record]
    dir version          lost+found@@/main/0  Rule: element * /main/LATEST
    derived object       one.o [no config record]
    derived object       three.o [no config record]
    derived object       two.o [no config record]
Recovery Process (Continued)

Build issues – Case #2.2 (continued)
- DOs promoted, so references exist in multiple views (ref count > 1), but not in the VOBs
- *View server processes HAVE been terminated!

    lime /vobs/stre/do_test 52 ct ls
    .cmake.state
    Makefile@@/main/1    Rule: /main/LATEST
    cleartool: Error: Trouble looking up element "four.o" in directory ".".
    cleartool: Error: Trouble looking up element "lib1.a" in directory ".".
    lost+found@@/main/0  Rule: /main/LATEST
    cleartool: Error: Trouble looking up element "one.o" in directory ".".
    cleartool: Error: Trouble looking up element "three.o" in directory ".".
    cleartool: Error: Trouble looking up element "two.o" in directory ".".
Recovery Process (Continued)

Build issues – Case #2.2 (continued)
- DOs promoted, so references exist in multiple views (ref count > 1), but not in the VOBs
- *View server processes HAVE been terminated!

    > ls -l
    ./one.o: No such file or directory
    ./two.o: No such file or directory
    ./three.o: No such file or directory
    ./four.o: No such file or directory
    ./lib1.a: No such file or directory
Recovery Process (Continued)

Build issues – Case #2.2 (continued)
- With proper shutdown of the view server process, ClearCase automatically purges the references and enters a log message in /var/adm/atria/view_log:

    06/12/02 10:54:44 view_server(24163): Warning: Cover object mother:/local/mnt2/workspace/vobs/stre/do_test.vbs:336e07d7.7e2b11d6.b659.00:01:80:b6:87:eb for 0x8000000a not found in VOB: ClearCase object not found
    06/12/02 10:54:44 view_server(24163): Warning: Cover object mother:/local/mnt2/workspace/vobs/stre/do_test.vbs:336e07df.7e2b11d6.b659.00:01:80:b6:87:eb for 0x80000007 not found in VOB: ClearCase object not found
    06/12/02 10:54:44 view_server(24163): Warning: Cover object
    06/12/02 10:54:53 view_server(24163): Warning: Vob stale 0x8000000d: Purging
Recovery Process (Continued)

VOBs and Views may have been created or deleted since the backup:
- VOBs or Views created since the backup: storage exists without an entry in the registry
- VOBs or Views deleted since the backup: a registry entry exists without storage
- At least the registry is in sync with the data that was restored
  • ClearCase configuration and VOB storage are on the same device, so they get backed up at the same time!
Recovery Process (Continued)

VOBs and Views may have been created or deleted since the backup (continued):
- You can use rgy_check to help clean this up:

    /usr/atria/etc/rgy_check -views (or -vobs)

- It helps if you have standard storage locations for VOBs and Views; you know where to look
- Sometimes you just need to wait for users to complain! Remember those "error/warning" messages!
- Views are supposed to be temporary working space, right?
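For the "registry entry exists without storage" case, the cleanup after rgy_check is typically a tag removal plus an unregister. The sketch below composes those commands for review rather than executing them; the `unregister -vob -uuid` form is an assumption to check against your cleartool man pages:

```shell
#!/bin/sh
# Sketch: given a dangling VOB tag and family uuid reported by rgy_check,
# print the registry cleanup commands. Nothing is executed here.
# The "unregister -vob -uuid" form is an assumption -- verify it against
# the cleartool reference pages for your ClearCase release.
dangling_vob_cleanup_cmds() {
  vob_tag="$1"
  vob_uuid="$2"
  printf 'cleartool rmtag -vob %s\n' "$vob_tag"
  printf 'cleartool unregister -vob -uuid %s\n' "$vob_uuid"
}
```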