Very Large DB2 pureScale Implementation Sharing
YunCheol Ha, IBM Australia
GyouByoung Kim, IBM Korea
Session Code: 4063
Platform: DB2 for Linux, UNIX, Windows
Agenda
• Why DB2 pureScale
• Business requirements
• Technical challenges
• Solutions
• Architecture
• Workloads
• Database configuration
• Migration and consolidations
• Lessons Learned
Why DB2 pureScale?
• Continuous high availability
  • 24 x 7 x 365
• Linear scalability
  • Data explosion and workload growth
• Simple cluster management
  • Automation
• Capacity on Demand
• Application transparency
• Database consolidation
  • Multi-tenancy
  • Mixed workload
Business requirements
• A rapid change in the customer service environment, from provider-centric services to consumer-based services
• Enhancement of the customer experience
  • An integrated single system for simple usability
  • Mobile portal support
  • 24x7x365 services
• Systematic compliance support
  • Reduction of the opportunity for irregularities and corruption
  • Reduction of the cost of compliance
• Improvement in productivity of internal staff
  • Administrative efficiency
  • Simplicity of process
  • Agility of systems along with rule amendments
Technical Challenges
• Poor customer service due to the aging and complexity of the IT infrastructure
• Data integrity issues and duplicated customer information across multiple systems
• Difficulty in managing heterogeneous systems and maintaining consistent performance due to the complexity of the technology applied
• Expectations on the newly integrated systems
  • Efficient management and operation of very large data volumes
  • Consistent performance during seasonal peak workloads
  • Flexible capacity management under workload seasonality
  • Mixed workload
  • Consolidation of 40 applications
  • Bulk data load and batch workload against very large tables during online transaction operation
  • Proven technologies and deployment experience for very large DBMSs
Solutions
• Operational DBMS: DB2 pureScale for highly available and scalable operational systems
• H/W: Robust and stable AIX and POWER7 with Capacity on Demand (CoD) and 10GE RoCE interconnect
• Storage copy: Storage snapshot copies for Very Large Database (VLDB) backups and daily batch processes
• Disaster recovery: Synchronous storage mirroring
• DW DBMS: PureData System for Operational Analytics (PDOA) for the enterprise data warehouse
• Replication: CDC from DB2 pureScale to PDOA
• Data migration: DB2 Database Partitioning Feature and federation technology for data consolidation and migration
Architecture > System Configurations
[Diagram: production servers in an internal domain and a public domain, each running a DB2 pureScale cluster (DB2 AESE on AIX with TSA/GPFS) of three 30-core members, plus Dev/QA clusters of two 2-core members and a single-node server, and other business clusters of two 35-core members and two 8-core members.]
• Total of 17 pureScale clusters
  • 2 main Very Large Database (VLDB) clusters with 3 members each
Architecture > Various pureScale Cluster Topologies
[Diagram: pureScale cluster topologies used — 3 members + 2 dedicated CFs, 3 members + 2 collocated CFs, 2 members + 2 dedicated CFs, 2 members + 2 collocated CFs, and logical members (multiple members and a CF on a single host). In every topology the members share the global buffer pool and global lock manager in the CFs and the data and log files on shared storage, and connect to the application servers through Ethernet, interconnect, and SAN switches.]
Architecture > Application configuration
• A consolidated database on pureScale for multiple applications
• DB2 and WAS workload balancing (WLB) and client affinity setup:
  • Online applications on members 0 and 1
  • Online batch applications on member 2
  • Member subsets were also considered
• Automatic client reroute (ACR) setup for Java and non-Java applications (see the db2dsdriver.cfg sketch below)
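A minimal db2dsdriver.cfg sketch for the non-Java (CLI/.NET) clients is shown below. The host names, ports, and database alias are placeholders, and the exact parameter set should be verified against the data server driver documentation for your fix pack level:

  <configuration>
    <dsncollection>
      <dsn alias="CONSDB" name="CONSDB" host="member0-host" port="50001"/>
    </dsncollection>
    <databases>
      <database name="CONSDB" host="member0-host" port="50001">
        <wlb>
          <parameter name="enableWLB" value="true"/>         <!-- spread work across members -->
          <parameter name="maxTransports" value="100"/>      <!-- cap on physical connections per application process -->
        </wlb>
        <acr>
          <parameter name="enableAcr" value="true"/>          <!-- automatic client reroute -->
          <parameter name="enableSeamlessAcr" value="true"/>  <!-- retry in-flight work without surfacing an error where possible -->
          <alternateserverlist>
            <server name="m1" hostname="member1-host" port="50001"/>
            <server name="m2" hostname="member2-host" port="50001"/>
          </alternateserverlist>
        </acr>
      </database>
    </databases>
  </configuration>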
[Diagram: application servers connect through Ethernet, interconnect, and SAN switches to the pureScale cluster; the members share the global buffer pool and global lock manager in the CFs and the data and log files on shared storage.]
Architecture > Network Topology
[Diagram: network topology — three member LPARs (30 cores / 224 GB each) and two CF LPARs (16 cores / 160 GB each), each with redundant 10G RoCE ports to two RoCE switches, GPFS private-network NICs to two private switches, 10G public NICs to two public switches, and 8G HBAs to two SAN switches attached to the shared storage.]
• Redundant RoCE adapters and switches for the interconnect network, redundant adapters and switches for the GPFS private network, and a redundant public network
• Storage Area Network (SAN) connections for the shared storage
Architecture > Multiple RoCE Switches configuration
[Diagram: two 10G RoCE switches (20-port + 16-port, 36 ports in total) connected by 8 inter-switch links; each member, CF, and additional host spreads its RoCE adapter ports across both switches.]
• High availability and performance
  • 10 RoCE adapter ports per switch
  • 8 inter-switch links (ISLs) per switch
  • Rule of thumb: number of ISLs = (total number of CF interconnects + number of members) / 2
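As a worked reading of the rule of thumb (assuming, as stated on the database manager configuration slide later in this deck, 4 RoCE ports per CF and a 3-member cluster): (2 CFs x 4 ports + 3 members) / 2 = 5.5, or about 6 ISLs; this cluster was cabled with 8 ISLs per switch, which leaves additional headroom.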
Architecture > Backups and Database Clone
• Fast storage snapshots without impact on online transactions
  • VLDB database backups
  • Point-in-time database clones
  • Around 40 TB, including 20 TB of compressed data
[Diagram: the production pureScale cluster (Mem-0, Mem-1, two CFs) serves the OLTP workload; storage snapshots of the data and log file systems feed both the database backup/restore path and a point-in-time clone used for the large batch workload.]
Architecture > Database Clone
• Daily batch processing runs on the cloned database
• Logical members and a CF on a single server
• Snapshot of only the data and transaction log file systems
• Database activation on only one member
[Diagram: the cloned database is brought up from the snapshot on a single host using logical members (with separate FCM ports) and one CF.]
Data Migration > Data Migration & Consolidations
[Diagram: migration flow from the as-was systems to the new systems — about 40 data sources (DB2 via federation, roughly 30% of the volume; Oracle via federation, roughly 60%; MSSQL audit data and Cubrid/legal data via files) are standardized and quality-checked in a DB2 DPF staging area against the to-be model, then loaded into the consolidated operational DB2 pureScale databases (internal and public systems, master data management, GIS/BPM/CRM and the new integrated portal) and replicated into the roughly 60 TB PDOA enterprise data warehouse.]
Data Migration > Data Migration & Consolidations
• Large data volume migration within a limited window of time
  • 50 to 60 TB in total from source systems on Oracle, Sybase, DB2, etc.
• Parallel bulk data processing in the staging area using the DB2 Database Partitioning Feature (DPF)
  • Data consolidation in DB2 DPF
  • Source-to-target table mappings
  • Source data extraction methods (see the sketch after this list)
    • Cursor load from Oracle and DB2 using a federation server
    • Export and load from Sybase
  • Table and SQL designs using collocated joins to maximize DPF performance
• Cold/history data migrated in advance while the source systems stayed online
• Offline data migration for the active data to minimize downtime
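A minimal sketch of the cursor-load path through federation, assuming a nickname ORA.CUSTOMER has already been created over the Oracle source table (database and object names are illustrative; the DECLARE and LOAD must run in the same CLP session):

  db2 "CONNECT TO STAGEDB"
  db2 "DECLARE src_cur CURSOR FOR SELECT * FROM ORA.CUSTOMER"
  db2 "LOAD FROM src_cur OF CURSOR MESSAGES load_cust.msg INSERT INTO STG.CUSTOMER NONRECOVERABLE"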
Workloads > Transactions
• Read/write ratio
  • 20:80 read-to-write ratio during the night
  • 85:15 read-to-write ratio during the day
• Transactions
  • 360M commits and 1,300M SQL statements daily
    • 4K commits per second
    • 15K SQL statements per second
  • 30M commits and 100M SQL statements at the peak hour
    • 8K commits per second
    • 30K SQL statements per second
Workloads > Transactions
• Mixed workload
  • Substantial rows read by complex queries for online reporting, alongside short millisecond lookup queries and write operations
  • Heavy read workload on member 2, which is used for batch processing
• Database connections
  • Average of 19K connections across the 3 members
Workloads > System utilization during peak
• Workload balance
  • Online transaction workload on member 0 and member 1
  • Online batch processing on member 2
  • Application-server-level workload balancing across the two online members
• Seasonal workload spikes
  • Capacity on Demand used to absorb the seasonal workload
[Chart: CPU utilization of member 0, member 1, and member 2 during the peak period.]
Database Configuration > Database Manager CFG
• NUMDB 1
  • One critical main database per instance
  • Simple CF-related memory configuration for a single database
    • CF_MEM_SZ: AUTOMATIC
    • CF_NUM_WORKERS: AUTOMATIC
• CF_NUM_WORKERS set based on best practices
  • At least the number of interconnect ports (4 RoCE adapter ports per member or CF)
  • One or two less than the number of CF cores
• A dedicated LPAR for each CF
• CF CPU usage monitored with db2pd -cfinfo or the ENV_CF_SYS_RESOURCES administrative view
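A minimal sketch of checking and setting these values from the CLP (run as the instance owner):

  db2 "UPDATE DBM CFG USING CF_MEM_SZ AUTOMATIC CF_NUM_WORKERS AUTOMATIC"
  db2 get dbm cfg | grep -i cf_                           # verify the CF-related settings
  db2pd -cfinfo                                           # CF processor and memory usage
  db2 "SELECT * FROM SYSIBMADM.ENV_CF_SYS_RESOURCES"      # CF host resource usage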
Database Configuration > Database CFG
• LOCKLIST size of around 1 GB
• MAXLOCKS set to 1
  • An extremely low MAXLOCKS value to discourage developers from writing bulk data processing SQL
• SHEAPTHRES_SHR to SORTHEAP ratio of 100:1
  • Many concurrent users and a substantial number of complex queries
• PCKCACHESZ
  • Because of the complex queries, parameter markers could not be used
  • An optimal package cache size was assigned to reduce compilation time for the huge number of SQL statements
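A minimal sketch of these settings; the page counts below are illustrative values chosen only to match the ratios described above, not the customer's actual numbers:

  db2 "UPDATE DB CFG FOR CONSDB USING LOCKLIST 250000"                        # ~1 GB of 4 KB pages for the lock list
  db2 "UPDATE DB CFG FOR CONSDB USING MAXLOCKS 1"                             # escalate quickly to expose bulk-processing SQL
  db2 "UPDATE DB CFG FOR CONSDB USING SHEAPTHRES_SHR 500000 SORTHEAP 5000"    # 100:1 shared sort memory to per-sort memory
  db2 "UPDATE DB CFG FOR CONSDB USING PCKCACHESZ 524288"                      # large package cache for non-parameterized SQL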
Database Configuration > Table spaces and tables
• Multiple automatic storage groups to handle a database of tens of terabytes
  • 4 storage paths per automatic storage group, for I/O performance and a balanced configuration
  • 4 containers per table space
• Over 1,000 table spaces storing around 10,000 tables
• Big tables are defined as range-partitioned tables
  • Each partition is stored in its own table space
  • Fast roll-in and roll-out
  • Designed with online operations and manageability in mind
• Compression turned on for all tables and indexes
  • 70% table compression ratio
  • 50% index compression ratio
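A minimal DDL sketch of this layout; the storage paths, names, and ranges are illustrative, not the customer's actual objects:

  db2 "CREATE STOGROUP SG_SALES ON '/db2fs1', '/db2fs2', '/db2fs3', '/db2fs4'"    -- 4 storage paths per storage group
  db2 "CREATE TABLESPACE TS_SALES_2014Q4 USING STOGROUP SG_SALES"                  -- one table space per range
  db2 "CREATE TABLE APP.SALES (SALE_DATE DATE NOT NULL, AMOUNT DECIMAL(15,2))
         PARTITION BY RANGE (SALE_DATE)
           (PARTITION P2014Q4 STARTING '2014-10-01' ENDING '2014-12-31' IN TS_SALES_2014Q4)
         COMPRESS YES"
  db2 "CREATE INDEX APP.IX_SALES_DATE ON APP.SALES (SALE_DATE) COMPRESS YES"       -- index compression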
Lessons Learned > GPFS configuration
• Most GPFS parameters are pre-optimized for pureScale
• AIX AIO parameters were tuned to avoid contention among AIO server processes
  • The number of AIO server processes on an AIX host is the number of logical cores times the value of the aio_maxservers parameter
  • For example, with 30 physical cores, SMT4 enabled, and aio_maxservers = 30, up to 3,600 AIO server processes can be started: 30 x 4 (SMT4) x 30
  • AIO server monitoring: nmon, capital "A" view
• Tuned values: aio_maxservers = 1, aio_minservers = 1 (defaults: aio_maxservers = 30, aio_minservers = 3)
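A sketch of how these AIO tunables might be checked and changed on AIX 6.1/7.1, where they are ioo tunables; they are restricted tunables, so ioo may warn or prompt before changing them, and the process name used in the count below may vary by AIX level (run as root):

  ioo -o aio_maxservers -o aio_minservers            # display the current values
  ioo -p -o aio_maxservers=1 -o aio_minservers=1     # set the tuned values and persist them across reboots
  pstat -a | grep -c aioserver                       # count the AIO server kernel processes currently started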
Lessons Learned > Syslog configuration
• Syslog setup is recommended for troubleshooting
• Syslog integrates log data from many different sources
  • RSCT, TSA, and DB2 write messages to the syslog daemon that can be helpful for pureScale diagnosis
• On AIX the syslog daemon is not configured and running by default; as root:
  • vi /etc/syslog.conf and add: *.debug /var/log/syslog.out rotate time 1d files 7
  • touch /var/log/syslog.out
  • refresh -s syslogd
• http://www-01.ibm.com/support/docview.wss?uid=swg21302886
Lessons Learned > RoCE switch configuration
• Two RoCE switches were set up for high availability, and RoCE switch failover was configured:
  • Disable the Converged Enhanced Ethernet (CEE) feature
  • Enable Global Pause (IEEE 802.3x) flow control to avoid dropped packets
      # interface port <first port>-<last port>
      # flowcontrol both
  • Disable Spanning Tree Protocol (STP)
      # spanning-tree mode disable
  • Enable Link Aggregation Control Protocol (LACP) on the inter-switch links (ISLs) of each switch to remove network loops
      # interface port <port number>
      # lacp mode active
      # lacp key <any available key, 1 is ok if there is no previous config>
      # exit
…
Lessons Learned > uDAPL_ping
• RDMA network health check
  • The uDAPL_ping script validates uDAPL connectivity, via uDAPL ping, for every listed HCA and cluster interconnect netname combination
  • Download udapl_ping.zip from IBM
• uDAPL ping validation
  • Create a host-hca file listing the available HCAs (from /etc/dat.conf) and the interconnect netnames of the pureScale hosts
  • Run: validateUdaplPing <host-hca-file> <dat-version>
• Example host-hca file:
    vi host-hca
    HostA HostA-en1-1 hca0
    HostA HostA-en1-2 hca1
    HostB HostB-en1-1 hca0
    HostB HostB-en1-2 hca1
• uDAPL_ping validation example:
    validateUdaplPing host-hca 2.0
    100 bytes from 10.10.1.1: seq=0 time=82495
    100 bytes from 10.10.1.1: seq=1 time=22
    100 bytes from 10.10.1.1: seq=2 time=22
    100 bytes from 10.10.1.1: seq=3 time=22
    100 bytes from 10.10.1.1: seq=4 time=23
    round-trip average: 16516
    uDAPL ping from HostA-en1-1 (client) to HostA-en1-1 (server) was successful
    ...
Lessons Learned > Interconnect performance
• Interconnect performance can be monitored through the CrossInvalidate message send time and the execution times of other CF commands reported by the MON_GET_CF_CMD table function
  • Around 10 microseconds or less indicates good health
• The MON_GET_CF_WAIT_TIME table function also provides the interconnect transport time plus CF command execution time
  • Around 100 µs or less indicates good health
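A minimal monitoring sketch using these table functions; the NULL and -2 arguments follow the usual "all CFs" / "all members" conventions of the MON_GET functions, and the exact column names should be verified for your fix pack level before building alerts on them:

  db2 "SELECT * FROM TABLE(MON_GET_CF_CMD(NULL)) AS T"         -- per-CF-command request counts and times
  db2 "SELECT * FROM TABLE(MON_GET_CF_WAIT_TIME(-2)) AS T"     -- per-member CF wait times (transport + execution)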
Lessons Learned > Member & CF Collocation
• A dedicated LPAR for each CF is recommended in an AIX environment
• However, when a member and a CF need to be collocated in an AIX RoCE environment, CPU binding is recommended
  • An 80/20 rule is applied when assigning logical cores between the member and the CF processes
  • CF_NUM_WORKERS = logical cores of the CF - 1
• CPU affinity setup using rsets
  • Create rsets for the member and the CF
      mkrset -c 0-20 pscale/memberrset
      mkrset -c 21-26 pscale/cfrset
  • Start the DB2 database manager
      db2start
  • Bind the DB2 member processes to the member rset
      ps -ef | grep db2
      attachrset pscale/memberrset <db2 process id>
  • Bind the CF processes to the CF rset
      ps -ef | grep ca
      attachrset pscale/cfrset <cf process id>
Lessons Learned > Database backup & clone
• Snapshot storage copy
    Step 1: db2 flush bufferpool all
            db2 set write suspend for database
    Step 2: /usr/lpp/mmfs/bin/mmfsctl <filesystem> suspend-write   (data and log file systems)
            take the snapshot copy (storage copy)
    Step 3: /usr/lpp/mmfs/bin/mmfsctl <filesystem> resume
            db2 set write resume for database
• Database clone and DB2 backup from the snapshot copy
  • Clone database:
      Step 1: Attach and mount the snapshot copy to a clone server
      Step 2: db2start
      Step 3: db2inidb <dbname> as snapshot
  • DB2 backup from the snapshot image:
      Step 1: Attach and mount the snapshot copy to a backup server
      Step 2: db2start
      Step 3: db2 backup db <dbname> to /<filesystem> or to a backup library
Lessons Learned > Online fixpak update
• Fixpak update from fixpak 3 to fixpak 4 during online operations, in the order members, secondary CF (CF-S), primary CF (CF-P)
• Pre-checks (all hosts):
  • Copy the DB2 fixpak image: tar -xvf /IBM/db2105fp4/v10.5fp4_aix64_server_t.tar
  • Verify the minimum committed code level: pure1[root]:/IBM/db2105fp4/server_t> ./installFixPack -show_level_info
  • Check free disk space: /opt (6300000 KB), /tmp (2000000 KB)
  • Verify the DB2 fixpak version: db2level
  • Verify the TSAMP version: pure1[root]:/IBM/db2105fp4/server_t/db2/aix/tsamp> ./db2cktsa -v install
                              pure1[root]:/IBM/db2105fp4/server_t/db2/aix/tsamp> ./db2cktsa -v media
  • Verify the GPFS version: pure1[root]:/IBM/db2105fp4/server_t/db2/aix/gpfs> ./db2ckgpfs -v install
                             pure1[root]:/IBM/db2105fp4/server_t/db2/aix/gpfs> ./db2ckgpfs -v media
• Steps:
  1. Install the online fixpak (member, CF-S, CF-P order), on all hosts:
     pure1[root]:/IBM/db2105fp4/server_t> ./installFixPack -p /opt/IBM/db2/V10.5.4 -I db2 -online -l /tmp/fp4install.log -t /tmp/fp4install.trc
  2. Determine the success of the online fixpak update, on all hosts:
     pure1[root]:/IBM/db2105fp4/server_t> ./installFixPack -check_commit -I db2
  3. Commit the online fixpak update:
     pure1[root]:/IBM/db2105fp4/server_t> ./installFixPack -commit_level -I db2 -l /tmp/fp4install.log -t /tmp/fp4install.trc
  4. Verify the fixpak version:
     pure1@db2:/home/db2> db2pd -rustatus
Lessons Learned > Offline fixpak update (simplified)
• Offline fixpak update to apply a special build during the system maintenance window, in the order members, CF-S, CF-P
• Pre-checks: same as for the online (rolling) fixpak update
• Steps:
  1. Stop the database manager:
     pure1@db2:/home/db2> db2stop
  2. Stop the DB2 instance, on all hosts:
     db2stop instance on pure1
  3. Install the fixpak, on all hosts:
     pure1[root]:/IBM/db2105fp4/server_t> ./installFixPack -p /opt/IBM/db2/V10.5.4 -I db2 -offline -l /tmp/fp4install.log -t /tmp/fp4install.trc -f TSAMP -f GPFS
  4. If db2instance -list shows an inconsistent state, refresh the resource model:
     pure1@db2:/home/db2> db2cluster -cm -repair -resources
  5. Determine the success of the fixpak update, on all hosts:
     pure1[root]:/IBM/db2105fp4/server_t> ./installFixPack -check_commit -I db2 -t /tmp/checkcommit.trc -l /tmp/checkcommit.log
  6. Commit the fixpak update:
     pure1[root]:/IBM/db2105fp4/server_t> ./installFixPack -commit_level -I db2 -l /tmp/commitlevel.log -t /tmp/commitlevel.trc
  7. Restart the DB2 instance, on all hosts:
     pure1@db2:/home/db2> db2start instance on pure1
  8. Restart the database manager:
     pure1@db2:/home/db2> db2start
Lessons Learned > Online system maintenance
• Online (rolling) system maintenance to apply hardware configuration changes
• Steps, run on each host in turn, in the order members, CF-S, CF-P:
  1. Quiesce the member:
     $ db2stop member 0 quiesce 30
  2. Stop the DB2 instance on the member host:
     $ db2stop instance on pure1
  3. Enter cluster manager maintenance mode:
     # /opt/IBM/db2/V10.5.4/bin> ./db2cluster -cm -enter -maintenance
     Host 'pure1' has entered maintenance mode.
  4. Enter shared file system cluster maintenance mode:
     # /opt/IBM/db2/V10.5.4/bin> ./db2cluster -cfs -enter -maintenance
     Host 'pure1' has successfully entered file system maintenance mode.
  (Perform the system maintenance on the host)
  5. Exit cluster manager maintenance mode:
     # /opt/IBM/db2/V10.5.4/bin> ./db2cluster -cm -exit -maintenance
     Host 'pure1' has exited maintenance mode. Domain 'db2domain_20141025220444' has been started.
  6. Exit shared file system cluster maintenance mode:
     # /opt/IBM/db2/V10.5.4/bin> ./db2cluster -cfs -exit -maintenance
     Host 'pure1' has successfully exited file system maintenance mode.
  7. Restart the instance:
     $ /home/db2> db2start instance on pure1
  8. Restart the member:
     $ /home/db2> db2start member 0
Lessons Learned > WebSphere Application Server
• Workload balancing (WLB) and automatic client reroute (ACR) setup
  • WLB and ACR are configured through JCC data sources connecting to the DB2 pureScale cluster, from WAS version 7.0.0.9 or later
• Data source custom properties:
  • Driver type: the JCC driver type, 4
  • Database name: the pureScale database name
  • Server name: host name of one DB2 pureScale connection member
  • Port number: TCP/IP port number of that DB2 pureScale connection member
  • enableSysplexWLB: true, to enable WLB
  • Dynamic alternate server list
    • db2.jcc.outputDirectory
  • Or a static alternate server list
    • clientRerouteAlternateServerName: the server list, separated by commas
    • clientRerouteAlternatePortNumber: the TCP/IP port list, separated by commas, for the servers specified in clientRerouteAlternateServerName
• Data source connection pool properties:
  • Purge Policy: set to FailingConnectionOnly for WAS to support seamless ACR
Lessons Learned > WebSphere Application Server
• maxTransportObjectIdleTime may need to be tuned
  • Under WLB, each logical connection from WAS may have a physical connection (a "transport"; one transport object per physical connection) to every single member
  • The number of transport objects is controlled by the maxTransportObjects property, and transport objects can be dropped when they have been idle for more than the number of seconds defined by maxTransportObjectIdleTime (default 10 seconds)
  • Dropping idle transport objects from a member can cause performance degradation, because no idle transport objects are then available on that member when new connections need them
  • maxTransportObjectIdleTime was set to 3,600 seconds to avoid frequent dropping of transport objects
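A minimal sketch of these JCC global properties; the values are illustrative, and in WAS the same settings can also be supplied per data source as custom properties (check the JCC documentation for which scope applies to your driver level):

  # DB2JccConfiguration.properties on the WAS classpath
  db2.jcc.maxTransportObjects=1000
  db2.jcc.maxTransportObjectIdleTime=3600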
Lessons Learned > Distributed Shell (dsh)
• Simplifies the task of managing multiple pureScale AIX hosts
  • dsh distributes the same command to all the pureScale hosts
• dsh setup:
  1. Install the AIX fileset csm.dsh.1.6.0.0.bff
  2. Add the node list to the environment
     vi /.profile
     export WCOLL=/node_list
  3. Check the node list
     [pure1] root:/> cat node_list
     pure1
     pure2
     pure3
     pure4
• Example:
    pure1[root]:/> dsh -c
    dsh> date
    pure1: Sun Nov 16 18:42:52 2014
    pure4: Sun Nov 16 18:42:52 2014
    pure3: Sun Nov 16 18:42:52 2014
    pure2: Sun Nov 16 18:42:52 2014
    dsh> exit
• Impressive DB2 pureScale performance, without much tuning effort, for complex queries that had been quite slow before
• Most SQL runs well when the statistics of the database objects are kept up to date and appropriate indexes are created; a few SQL statements still require tuning, which demands deep knowledge of and experience with the DB2 optimizer
• Preparing the hardware and OS software prerequisites is the most important step of a pureScale installation
• Easier monitoring and problem determination for the pureScale cluster services (RSCT/TSA/GPFS) are needed
Feedback
YunCheol Ha, IBM, [email protected]
GyouByoung Kim, IBM, [email protected]
Very Large DB2 pureScale Implementation Sharing
Please fill out your session evaluation before leaving!