Rightsizing Data Sharing with DB2 Member Consolidation #6930A - 10/29/2014
Mike Tobe - Progressive Insurance
Bob Vargo - Progressive Insurance
© 2014 IBM Corporation
Agenda
• Introduction
Company Background
DB2 History at Progressive
• High Level Review of Member Destroy
Deactivate
Restore
Destroy
• Data Sharing Consolidation in Practice
Configuration Changes
Testing
Our Implementation
• Wrap Up
Progressive/IBM Collaboration
Futures
2
Progressive Insurance – “Who are We?”
• Founded in 1937
• 25,000 Employees
• HQ in Cleveland Suburb of Mayfield Village, Ohio
• Houses one of the country's largest contemporary art collections
• Progressive Firsts – “A few of many”
Drive-in Claims Office
Introduce MyRate based on “Your driving habits”
Offer Pet Injury coverage
First major auto insurer in the world to launch a Web site
Serve customers at the accident scene - Immediate Response® Claims Service
Introduce reduced rates for low-risk drivers
Offer Comparison rates on the Web
Introduce Snapshot program and Name Your Price Option
4
Glossary of Terms and Acronyms
• XRC – IBM System Storage z/OS Global Mirroring Solution for Disaster Recovery
• IRT – Infrastructure Recovery Testing using our XRC environment
• Shadow Member – DB2 DS member up and running but all work has been routed away from it
• Quiesced Member – Place the DB2 DS member into a temporarily inactive state
• EMW – Extended Maintenance Window
5
z/OS Environment
6
[Diagram: Data Center 1 CPU configuration – Production LPARs plus Dev, Stress, and Tec LPARs, grouped into the Production, Dev, and Tec Sysplexes with their Coupling Facilities. Data Center 2 CPU configuration – Production Disaster Recovery Sysplex plus Dev and Stress LPARs, mirrored from Data Center 1 by XRC Datamovers]
DB2 Environment – Current
7
DB2 V10 for z/OS @ Progressive (as of 10.10.2014)
[Diagram: zEC12 CPUs hosting P-, D-, S-, and T-LPARs, with CF01/CF02 plus DEV CF01/CF02 and TEC CF01/CF02. Data sharing groups and members:
DB0P – Production Operational: DB1P, DB2P, DB3P, DB4P, DB5P, DB6P
DW0P – Production Warehouse: DW1P, DW2P, DW3P, DW4P, DW5P*, DW6P*
DB0D – Dev Operational / DW0D – Dev Warehouse (DW1D, DW2D)
DB0Q – QA Operational / DW0Q – QA Warehouse (DW1Q, DW2Q)
DB0K – Dev/Stress Operational / DB0F – Stress Operational
DB0M – Dev Sandbox / DB0L – Training / DB0S – Tech Sandbox]
• DB2 V10
• z/OS 2.1
• CICS V4.2
• MQ V7
• 6 Prod LPARs
• 11 DB2 Env
History - Production 2002
• 4 Way Data Sharing
across 4 LPARs
• DB2 V7
• Installed New Billing system
• Increase Thread Footprint
8
DB2 for z/OS @ Progressive
[Diagram: 4-way data sharing group DB0P (Production Operational) – members DB1P, DB2P, DB3P, DB4P, one per P-LPAR, sharing CF01 and CF02]
History – Production 2002
• Start of our Virtual Storage
issues
• Went on the JC virtual
storage diet
• Create DBxPSTOR
• Some relief but….
Weekly DB2 recycles
• Horizontal growth required
9
DB2 for z/OS @ Progressive
[Diagram: 4-way data sharing group DB0P (Production Operational) – members DB1P, DB2P, DB3P, DB4P, one per P-LPAR, sharing CF01 and CF02]
History – Production 2002
• Working closely with our z/OS, CICS and MQ Teams
Created 4 new LPARs
Created 4 additional DB2 DS members
8 Way DS group running on 8 LPARs
• Provide relief but…..
• DB2 V7 - Ongoing VS Monitoring required
• Additional resources, added complexity, troubleshooting, Operational support
• Increased TCO of the Mainframe platform
10
DB2 for z/OS @ Progressive
[Diagram: 8-way data sharing group DB0P (Production Operational) – members DB1P through DB8P, one per P-LPAR, sharing CF01 and CF02]
History – Production 2006
11
• 2006 - Claims Conversion IDMS to DB2
• Created 2 additional DS members
10 Way DS group
On 8 LPARs
New members added to Dev & QA DS groups
• 2007 - Migrated to DB2 V8
• Average 200MB of additional
available VS
• Finally some relief
• 2012 - Migrated to DB2 V10
[Diagram: members DB9P and DBAP added, doubling up on existing LPARs – DB5P/DBAP, DB6P/DB9P, DB7P, DB8P]
DB2 V10 Virtual Storage Footprint
PROGRESSIVE
DATA SHARING VIRTUAL STORAGE CHECK
COMMAND ==>
CONNECT TO ( DB0P )
REPORT( ____ , ____ , ____ , ____ , ____ , ____ , ____ , ____ , ____ , ____ )-
      ( ____ , ____ , ____ , ____ , ____ , ____ , ____ , ____ , ____ , ____ )-
----------------------------------------------------------------------------
WLM - DB2      *- STG -*  *- THREADS -*    *- 64-bit STMT CACHE -*
CMD - PR - MEM -L  *- PRU -*  *- ACTV - DBAT -*  *- HWM - CRNT -*
                                                           More: +
___ - 2 - DB1P A -  23%  -  -  26  -  52  -  -    27 -    32 -
___ - 3 - DB2P B -  24%  -  -  26  -  51  -  -    57 -    32 -
___ -   - DB3P   -  20%  -  -  12  -   0  -  -     5 -     5 -
___ - 1 - DB4P B -  23%  -  -  30  -  42  -  -    33 -    11 -
___ - 5 - DB5P   -  22%  -  -  60  -   0  -  -   904 -   891 -
___ - 4 - DB6P D -  22%  -  -  44  -  25  -  - 1,931 - 1,927 -
12
DB2 V10
• During V10 Migration Project
High Log RBA warning on 1 member
The first member created over 20 years ago
Routed CICS and DDF workload away from the member
Kept a close eye on it moving forward
• VS issues a thing of the past
• Paved the way for z/OS LPAR and DB2 member consolidation
13
Action Plan
• Project Planning process
Planner / SME Meetings
Mainframe Platform Review Board Meetings
Infrastructure Review Board Meetings
RTB Steering Review Board Meetings
• Approves Projects for High-level Planning
• After High-level planning revisit for approval for execution
• z/OS LPAR Consolidation Project
Moving MQ, CICS, DB2 workload to other LPARs
• DB2 Production Re-Architecture Workload Package
DB2 Footprint Reduction
Exploit New Cost Saving Features of DB2 V10
3rd Party DB2 Tools Replacement with IBM DB2 Tools Suite
R&D DB2 Warehouse Appliance IDAA Netezza
14
z/OS LPAR and DB2 Member Consolidation
15
• Retired 2 LPARs
• 10 Way on 6 LPARs
• Over a 6 month period 4 DB2 members were turned into shadows, shut down, deleted, and destroyed
• 2 Waves
• 8 Way on 6 LPARs
• 6 Way on 6 LPARs
• Dev and QA DS groups were downsized first
[Diagram: P-LPARs during consolidation – DB6P/DB9P, DB8P, DB5P/DBAP, DB7P – with the destroyed members marked]
Deleting (Deactivate) a DB2 Data Sharing Member
Delete (Deactivate) a DB2 Data Sharing Member
• Pre - DB2 10
To remove a DB2 member you could only Quiesce
Easy, no group outage
Still physically existed
DISPLAY GROUP still visible
• DB2 V10 NFM
Allows a member to be permanently deleted from the group
Only once you are certain you will not need the member or its BSDS and log data sets again
• A deleted member cannot be restarted
Must be in NFM for members to be deleted from a data sharing group
• Members do not need to be active in DB2 10 NFM before being deleted
• Deleting data sharing members is a two-step process
Delete (Deactivate)
Destroy
17
Delete (Deactivate) a DB2 Data Sharing Member
• DSNJU003
Utility was enhanced to delete Quiesced members
• Deleting is an offline implementation
Entire data sharing group must be brought down
All members' BSDSs must contain a record of the Quiesce
Member BSDSs are updated to remove the member details
• 3 new operators in DSNJU003
DELMBR DEACTIV (Deactivate member)
RSTMBR (Restore a deactivated member to Quiesced state)
DELMBR DESTROY (Destroy member)
18
Delete (Deactivate) a DB2 Data Sharing Member
• Deleting (DELMBR DEACTIV) a member from a data sharing group
• Ensures no new log data is created for the member
• Before deactivating a member
• Ensure that the data sharing group is DB2 V10 NFM
The member to be deleted must be Quiesced before the surviving members are stopped
Ensures the BSDSs of surviving members record the Quiesced state of the member
• Deactivating a data sharing member
Ensure member has no outstanding URs or active utilities
Stop all members of the group
Use the DSNJU003 DELMBR DEACTIV,MEMBERID=x control statement against all BSDSs of all members of the group
Restart surviving members
19
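The deactivation steps above can be sketched as a DSNJU003 job run against each member's BSDS pair while the entire group is stopped. Dataset names, library names, and the MEMBERID below are hypothetical examples, not Progressive's actual configuration:

```jcl
//DEACTMBR JOB (ACCT),'DELMBR DEACTIV',CLASS=A,MSGCLASS=X
//* Run one step like this against the BSDS pair of EVERY member
//* of the data sharing group, while ALL members are stopped.
//* Dataset names and MEMBERID are hypothetical; use your own.
//STEP1    EXEC PGM=DSNJU003
//STEPLIB  DD DISP=SHR,DSN=DSNA10.SDSNLOAD
//SYSUT1   DD DISP=OLD,DSN=DB0P.DB3P.BSDS01
//SYSUT2   DD DISP=OLD,DSN=DB0P.DB3P.BSDS02
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELMBR DEACTIV,MEMBERID=3
/*
```

After every BSDS in the group has been updated, restart the surviving members; the deactivated member's new status is recorded as each member restarts.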
Restore a DB2 Data Sharing Member
Restoring a Deactivated DB2 Data Sharing Member
• A member that has been deactivated (but not destroyed) can be reactivated and restarted
No other members of the data sharing group need to be stopped before restoring a deactivated member
• To restore a deactivated member:
Use DSNJU003 RSTMBR MEMBERID=x control statement against BSDS of the member to be reactivated
• x is the member ID
Restart the member
Restart all surviving members
• So the updated status of the reactivated member can be recorded
21
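The restore above amounts to one DSNJU003 run against the deactivated member's own BSDS pair, followed by restarts. A sketch with hypothetical dataset names:

```jcl
//RSTMBR   JOB (ACCT),'RSTMBR',CLASS=A,MSGCLASS=X
//* Update only the BSDSs of the member being reactivated.
//* Dataset names and MEMBERID are hypothetical; use your own.
//STEP1    EXEC PGM=DSNJU003
//STEPLIB  DD DISP=SHR,DSN=DSNA10.SDSNLOAD
//SYSUT1   DD DISP=OLD,DSN=DB0P.DB3P.BSDS01
//SYSUT2   DD DISP=OLD,DSN=DB0P.DB3P.BSDS02
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  RSTMBR MEMBERID=3
/*
```

Then restart the reactivated member, and restart the surviving members so their BSDSs record its updated status.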
Destroying a DB2 Data Sharing Member
Destroying a Deactivated DB2 Data Sharing Member
• Permanently deletes it from the data sharing group
• Cannot be restored or reactivated
• A member must be deactivated before it can be destroyed
If a deactivated member is later reactivated and restarted
It must be deactivated again before it can be destroyed
• To destroy a member:
All members must be shut down
Use DSNJU003 DELMBR DESTROY,MEMBERID=x control statement against all BSDSs of all members of the group
• x is member ID to be destroyed
23
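Destroy uses the same DSNJU003 mechanics as deactivate, with DESTROY in place of DEACTIV, again run against every member's BSDS pair while the whole group is down (hypothetical names):

```jcl
//DSTRMBR  JOB (ACCT),'DELMBR DESTROY',CLASS=A,MSGCLASS=X
//* Repeat for the BSDS pair of every member of the group,
//* with all members shut down. Names below are hypothetical.
//STEP1    EXEC PGM=DSNJU003
//STEPLIB  DD DISP=SHR,DSN=DSNA10.SDSNLOAD
//SYSUT1   DD DISP=OLD,DSN=DB0P.DB3P.BSDS01
//SYSUT2   DD DISP=OLD,DSN=DB0P.DB3P.BSDS02
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELMBR DESTROY,MEMBERID=3
/*
```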
Destroying a Deactivated DB2 Data Sharing Member - Cont.
• To destroy a member:
Restart surviving members
• All surviving members should be restarted
The member is now deleted
• Its BSDS and logs are no longer needed
Can reuse member name/ID of a destroyed member
• Must add a new member to the data sharing group
24
Delete and Destroy Approach – 2 Options
• Option 1 - 2 step process
Deactivate now and then Destroy at a later date
• Allows for longer period for validation
• Allows opportunity to restore the member. But…….
• Requires an additional group outage
• Option 2 - 1 step process
Deactivate and Destroy at that same time
• Only requires one group outage. But….
• Restore cannot be a major part of your fallback plan
• If needed, must recreate new member using the old name
25
Data Sharing Consolidation in Practice
Implementation Timeline
27
Implementation – DS Groups Involved
• Non Production groups
Development Data Sharing Group reduced from 3-way to 2-way
QA Data Sharing Group also reduced from 3-way to 2-way
Other larger non production groups were not changed
Both of these groups were collapsed at the same time
• Production Group: 10-way to 6-way in two waves
Two members were destroyed in June 2014
Two more were destroyed in September 2014
28
Archive Logs and Recovery Considerations
• We keep our archive logs for thirty days
Quiescing thirty days prior to Destroy ensures that logs from the destroyed members cannot be used for recovery
• There are other options, such as . . .
System Backup
Mass Image Copies
29
Shadow Member and Quiesce Timeline
• We Deactivated and Destroyed the members at the same time
Back out, for us, would involve adding a new member with the same name
The sixty day lead time was used to validate our configuration changes
• Sixty days covers two of our peak processing periods
• The Shadow period was much longer for some members
30
Workload Redistribution
• We had been running with two members on two of our LPARs for a number of years
• The work was split across these members
CICS on both (member specific)
DDF on one member only
• We have very little outbound DDF traffic
Batch and other split with RANDOMATT=YES
• LPAR consolidation occurred prior to DB2 member destruction
Eight LPARs were reduced to six
Ten-way Data Sharing on those LPARs
31
Shadow Member
• Used to find any legacy member affinities
Some non-critical affinities were detected
• The subsystem was running and could be used if need be
No CICS regions
DDF=COMMAND
RANDOMATT=NO
Most subsystem resources were left intact
32
Configuration Changes
Configuration Changes – Active Logs
• The active log space from all destroyed subsystems will be added to the surviving members
This was not a part of our Deactivate/Destroy plan
• Our current peak logging rate is not much higher than the pre-consolidation peak
• The active log configuration will be redesigned next year
The logs are on smaller volumes at the present time
34
Configuration Changes – DB2 Work File Datasets
• The Work File Databases were dropped right after the subsystems were destroyed
Half of the space was reclaimed - This added time to the outage
The other half of the space was added to the surviving members
Note: We originally intended to reclaim all of this space
35
Configuration Changes – Buffer Pools
• Buffer Pool memory was given to the surviving subsystems
Some of this memory was redeployed prior to the Deactivate/Destroys
• Additional Real Storage was purchased
Group Buffer pools were increased to support much larger Buffer Pools
But . . .
36
Configuration Changes – Real Storage
• Make sure that you have enough real storage to avoid paging!
• We ran into paging issues for DB2 V10 that were unrelated to Data Sharing consolidation
SVCDUMPs and paging caused slowdowns
Monitoring is now in place to set SLIPs to turn off DB2 dumps when paging for DBM1 is detected
This is the main reason we added more real storage
Real Storage usage dropped as subsystems were destroyed
37
Configuration Changes – One last word on storage
• No increase in CSA or SQA as of yet
The total number of threads has decreased on some LPARs
• MAXDBAT will be bumped up in the near future for High Performance DBATs
• CTHREAD and MAXDBAT are still closely monitored
38
ZPARM Changes
• Both MAXDBAT and CONDBAT were increased
CTHREAD was left as is
• DSMAX was increased and will be increased again in the near future
Was set to a conservative value prior to V10
• No other ZPARM changes were implemented
39
MAXTEMPS
“Recommendation: If you need additional space, consider adding 32 KB work file data sets. Since Version 9 became available, DB2 exploits 32 KB work file data sets more aggressively than in earlier releases.”
• DB2 will use 32 KB work files for queries that will benefit from using it but do not require it
• This can cause issues for queries that require 32 KB work files
• MAXTEMPS does not help with this problem
• Has anyone converted to all 32 KB work file space ?
40
Testing Member Deactivate/Destroy
Options for Testing Member Destroy
• Build a Sandbox Data Sharing Group
We considered this option but did not take this approach
• Test with copies of the BSDSs
Although this is only a partial test it is actually quite valuable
• Use your Disaster Recovery Site
• Non-production systems
42
Change Log Inventory USERMOD
Hmmmm … This might be trouble !
• The USERMOD, AN1DDSM, only affects one CSECT in DSNJU003
• Invaluable for testing and potentially for your implementation
43
Disaster Site Testing
• Our Mirror systems were used for all initial testing
Most of the group was destroyed
• Dry run for each production change
Including the back out scenarios
May not be an exact Dry Run
• Also used for first DR test after the actual production member destroys
44
Our Implementation
Member Quiesce
• Stop the member with MODE(QUIESCE)
• Issue these commands from another member
DISPLAY GROUP
DISPLAY UTILITY (*) MEMBER(Quiesced-member-name)
DISPLAY DATABASE(*) RESTRICT
• Restart the member with ACCESS(MAINT) if
You have any unresolved work or you want to create one last archive log
• Force an Archive (optional)
• Stop the member with MODE(QUIESCE)
46
Member Deactivate
• Ensure that the member is really quiesced
Similar to the Quiesce process
“Ensure that the BSDSs of all surviving members of the group indicate that the member that is to be deactivated is in the quiesced state.”
• Use DSNJU004 (print log map)
• Remember the USERMOD
47
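Checking the recorded state with DSNJU004 can be sketched as below; point SYSUT1 at a surviving member's BSDS and look for the target member's quiesced state in the data sharing section of the report (dataset names here are hypothetical):

```jcl
//PRTLOG   JOB (ACCT),'PRINT LOG MAP',CLASS=A,MSGCLASS=X
//* Print log map for one member's BSDS; the member table in the
//* report shows the recorded state of every member of the group.
//* Dataset names are hypothetical; use your own.
//STEP1    EXEC PGM=DSNJU004
//STEPLIB  DD DISP=SHR,DSN=DSNA10.SDSNLOAD
//SYSUT1   DD DISP=SHR,DSN=DB0P.DB1P.BSDS01
//SYSPRINT DD SYSOUT=*
```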
Member Deactivate – cont.
• Stop all members of the group
• Backup the BSDSs for all members
• Use DSNJU003 to update all BSDSs in the group
DELMBR DEACTIV,MEMBERID=x (x is the member to be deactivated)
We also ran DSNJU004 (print log map)
The USERMOD may be required at this point
48
Member Deactivate – cont.
• Restart all surviving members
A member’s BSDSs will not be updated with the new status until the member has been restarted
Run DSNJU004 (print log map) for the group after restarting the first member to see this
49
Member Destroy
• Stop all members of the group
• Backup the BSDSs for all members
• Use DSNJU003 to update all BSDSs in the group
DELMBR DESTROY,MEMBERID=x (x is the member to be destroyed)
We also ran DSNJU004 (print log map)
The USERMOD may be required at this point
50
Member Destroy – cont.
• Restart all surviving members
DSNJU004 (print log map) view:
51
Deactivate/Destroy – Two steps or one ?
• We used two steps for the production members
Allows for member restore: DSNJU003 RSTMBR MEMBERID=x
• But only one step for non-production
52
Things to Consider
• Define a Timeline/Approach that fits your needs
Archive Logs/Recovery Issues
Capacity
• Find your best option for testing
DR Site or Mirror
• Work with IBM
Remember the USERMOD
They can help validate your approach
53
Progressive / IBM Collaboration
IBM Collaboration
• Take advantage of Insight
Reach out and discuss plans with DB2 Developers
• Monthly Meetings with Progressive’s IBM DB2 Support Team
• Close working relationship with our Advocate
Reviewed approach and plan
Applied recommended PTFs and Usermod
• Opened a proactive IBM Service Request a week prior to implementation
Ensures the proper IBM resources are available during your change
55
Progressive – Next Steps
Next Steps
• Continue exploitation of DB2 V10 new features
HPDBATs
More Native SQL Stored Procedures
• Conversion has been challenging
• We have hundreds of newly created procedures
1MB Page Size
• Future Consolidation of LPARs and DB2 members
57
We Value Your Feedback!
• Don’t forget to submit your Insight session and speaker feedback! Your feedback is very important to us – we use it to continually improve the conference.
• Access the Insight Conference Connect tool to quickly submit your surveys from your smartphone, laptop or conference kiosk.
58
Questions ?
Thank You