
Post on 28-Dec-2015


Clustering: Types of Clustering

Objectives

At the end of this module the student will understand the following tasks and concepts:

• What clustering is and why you would want it
• Clustering options
• Differences between various types of clustering; advantages and disadvantages
• Factors to consider when choosing a cluster type

What is a cluster?

Multiple systems performing a single function

Black box

A cluster is a group of independent servers that function as a single system.

OR

Q: What is RAC?

A: Oracle’s clustered database for real (i.e. everyday) applications

[Diagram: an Oracle database with one instance, once clustered, becomes Oracle RAC: one database with multiple instances.]

What Is a RAC Cluster?

• Nodes
• Interconnect
• Shared disk subsystem
• Instances
• Database

[Diagram: nodes linked by an interconnect and sharing disks.]

What is Real Application Clusters?

• Two or more interconnected but independent servers

• One instance per node

• Multiple instances accessing the same database

• Database files stored on disks physically or logically connected to each node, so that every instance can read from or write to them

Database vs. Instance

A RAC cluster consists of:

One or more instances

One Database residing on shared storage

[Diagram: Node 1 (Instance 1) and Node 2 (Instance 2), each with a local disk, joined by an interconnect. Both nodes access the database on shared storage.]

Database Cluster Types

Shared Everything
– Shared disk/cache
– Oracle and IBM mainframes
– More reliable as you add computers
– No data partitioning required

Shared Nothing
– Private disk/cache
– Microsoft and IBM Unix/NT
– Less reliable as you add computers
– Static data partitioning

[Diagram: in a shared-everything cluster, every server accesses all of the data (A-Z). In a shared-nothing cluster, each server owns one partition (A-E, F-K, L-S, T-Z).]
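To make static data partitioning concrete, here is a minimal sketch of how a shared-nothing cluster might route a key to the one server that owns it. It uses the A-E / F-K / L-S / T-Z split from the slide. The partition table, the server-to-range mapping, and the `owner_of` function are hypothetical illustrations. They do not describe any vendor's API.

```python
# Hypothetical static partition map based on the A-E / F-K / L-S / T-Z split.
# In a shared-nothing cluster, each key range is owned by exactly one server.
PARTITIONS = [
    ("A", "E", "Server 1"),
    ("F", "K", "Server 2"),
    ("L", "S", "Server 3"),
    ("T", "Z", "Server N"),
]


def owner_of(key: str) -> str:
    """Return the server that owns the partition holding this key."""
    first = key[:1].upper()
    for low, high, server in PARTITIONS:
        if low <= first <= high:
            return server
    raise KeyError(f"no partition covers key {key!r}")


print(owner_of("Garcia"))  # Server 2
```

The ranges are fixed, so adding a server means redefining the ranges and moving rows between machines. That is the practical cost behind "static data partitioning".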

Key Software Components

[Diagram: Servers 1, 2, 3 … N, each running the OS, clusterware, an Oracle RAC instance, and a listener. All servers use shared storage provided by raw devices, a cluster file system (CFS), NAS, or ASM.]

Cluster Definitions

• Shared Nothing (Federated)
• Replicated Site
• Shared Disk
• Failover (Active/Passive, Active/Active)
• Shared Everything

Shared Nothing Cluster

• Only one CPU is connected to a disk
• May have shared memory
• MPP (massively parallel processing) systems are shared nothing
• Other vendors have “shared nothing” clusters

Federated (Shared Nothing) Cluster

• Distributed database (separate database on each machine)
• Data is spread across nodes; each machine has part of the data
• Function is spread across nodes

Two-Phase Commit

[Diagram: two database servers complete a transaction in three steps: 1. “Got it?” 2. “Got it!” 3. “Good!”]
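The three steps of the two-phase commit diagram map onto a prepare phase and a commit phase. Below is a minimal in-memory sketch assuming one coordinator and reliable participants. It has no network, timeouts, or durable logging, all of which a real implementation needs. The `Participant` class and `two_phase_commit` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """One database server taking part in a distributed transaction (illustrative)."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote "yes" only if the local changes can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Commit everywhere only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]  # "Got it?" / "Got it!"
    decision = all(votes)
    for p in participants:                       # "Good!" (or roll back everywhere)
        p.finish(decision)
    return decision


nodes = [Participant("db1"), Participant("db2")]
print(two_phase_commit(nodes), [p.state for p in nodes])  # True ['committed', 'committed']
```

A single "no" vote makes every participant roll back. This is how a federated cluster keeps the separate databases consistent.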

Replicated System

• Data replicated at the server (network) level or at the storage (SAN) level
• Multiple copies of the same database
• Most common implementation is active/passive
• Failover between nodes

[Diagram: an active node and a passive node, each with a database server and its own copy of the database, kept in sync by either server-level or storage-level replication.]
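Whether the passive copy is complete at failover depends on when changes reach it. The Factors slide lists synchronous replication as a high-availability option. Below is a minimal sketch contrasting synchronous and asynchronous replication, assuming a single key/value "database" in memory. The `ReplicatedDatabase` class and its methods are hypothetical.

```python
class ReplicatedDatabase:
    """Toy model of an active copy with one passive replica (illustrative only)."""

    def __init__(self, synchronous: bool = True) -> None:
        self.synchronous = synchronous
        self.primary: dict[str, str] = {}
        self.replica: dict[str, str] = {}
        self.unshipped: list[tuple[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: the write is not complete until the replica has it too.
            self.replica[key] = value
        else:
            # Asynchronous: queue the change and ship it later.
            self.unshipped.append((key, value))

    def ship(self) -> None:
        """Send queued changes to the replica in the order they were made."""
        for key, value in self.unshipped:
            self.replica[key] = value
        self.unshipped.clear()

    def fail_over(self) -> dict[str, str]:
        """The passive copy becomes active with whatever it has received so far."""
        return dict(self.replica)
```

With `synchronous=False`, a failover before `ship()` runs loses the queued writes. This trade of latency against data loss is what separates the replication options.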

Shared Disk Cluster

• Shared file system
• Multiple systems attached to the same disk
• All nodes must have access to data
• Only one database instance; only one node has “ownership” of the shared disk
• Synchronization between systems; if one node fails, the other takes over
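The ownership rule above can be modeled directly. Every node is attached to the disk, but only the current owner may write, and ownership passes to a survivor when the owner fails. Below is a minimal sketch assuming ownership moves to the next surviving node in a configured order. The class and method names are hypothetical.

```python
class SharedDiskCluster:
    """Illustrative model: several nodes see one disk, but only the owner may use it."""

    def __init__(self, nodes: list[str]) -> None:
        self.nodes = list(nodes)
        self.alive = set(self.nodes)
        self.owner = self.nodes[0]
        self.blocks: list[tuple[str, str]] = []

    def write(self, node: str, data: str) -> None:
        if node != self.owner:
            raise PermissionError(f"{node} does not own the shared disk")
        self.blocks.append((node, data))

    def node_failed(self, node: str) -> None:
        self.alive.discard(node)
        if node == self.owner:
            survivors = [n for n in self.nodes if n in self.alive]
            if not survivors:
                raise RuntimeError("no surviving node can take over the disk")
            # Ownership passes to the next surviving node in the configured order.
            self.owner = survivors[0]
```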

Failover Cluster

• One system is a standby system for another
• Only one system doing work at a time
• Pseudo-shared disk
• Limited scalability in active/passive mode

Failover Clustering

• Fault-tolerant systems; highly available
• Basic failover clusters don’t scale beyond two nodes

[Diagram: users connected to a database server and its database.]

Active/Passive vs. Active/Active

Both are failover only.

Active/Passive
• One node is active
• The other is passive until failover

Active/Active
• Still uses active/passive technology
• 2 separate databases
• One is active on node A and passive on node B
• The second database is active on node B and passive on node A
• Separate applications and user connections to each of the different databases

Active/Passive

• Node A is active
• Node B is passive until/unless Node A fails
• Only one Oracle license is required

[Diagram: Node A (active) and Node B (passive).]

If Node A fails …

• Node B becomes active
• Node A is dead (definitely passive!) until repaired and then “failed back” if necessary

[Diagram: Node A marked X (failed); Node B active.]
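The slides do not say how the passive node learns that Node A has failed. A common approach is a heartbeat with a timeout, which this minimal sketch assumes. The timeout value, the `ActivePassivePair` class, and its methods are all hypothetical. The failed node simply becomes the passive one until it is repaired and failed back.

```python
from dataclasses import dataclass


@dataclass
class ActivePassivePair:
    """Two-node failover pair driven by heartbeats (illustrative only)."""
    active: str
    passive: str
    timeout: float              # assumed: seconds of silence before declaring failure
    last_heartbeat: float = 0.0

    def heartbeat(self, now: float) -> None:
        """Record that the active node was seen alive at time `now`."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Fail over if the active node has been silent too long; return the active node."""
        if now - self.last_heartbeat > self.timeout:
            # The passive node takes over; the failed node stays passive until repaired.
            self.active, self.passive = self.passive, self.active
            self.last_heartbeat = now
        return self.active


pair = ActivePassivePair("Node A", "Node B", timeout=3.0)
pair.heartbeat(0.0)
print(pair.check(2.0))  # Node A
print(pair.check(5.0))  # Node B
```

A short timeout fails over faster but risks reacting to a brief network hiccup. A long one avoids false alarms at the cost of longer outages.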

Active/Active

• Application Group A and User Group A are active on Node A
• Application Group B and User Group B are active on Node B
• Each node serves as failover for the other
• 2 separate databases; the two nodes never access the same data at the same time
• Oracle license required on each node

[Diagram: Node A runs Application A for User Group A and is the passive failover for B. Node B runs Application B for User Group B and is the passive failover for A.]
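Active/active can be read as two active/passive arrangements running in opposite directions. Below is a minimal sketch assuming each database has a fixed home node and a fixed partner node. The placement table and the `where_running` function are hypothetical.

```python
# Hypothetical placement table for the active/active slide: each database has
# a home node where it normally runs and a partner node that takes it over.
PLACEMENT = {
    "Database A": ("Node A", "Node B"),
    "Database B": ("Node B", "Node A"),
}


def where_running(alive: set[str]) -> dict[str, str]:
    """Map each database to the node that should host it, given the surviving nodes."""
    hosting = {}
    for database, (home, partner) in PLACEMENT.items():
        if home in alive:
            hosting[database] = home
        elif partner in alive:
            hosting[database] = partner
        # If neither node is alive, the database is unavailable and is left out.
    return hosting


print(where_running({"Node A", "Node B"}))  # {'Database A': 'Node A', 'Database B': 'Node B'}
print(where_running({"Node B"}))            # {'Database A': 'Node B', 'Database B': 'Node B'}
```

After Node A fails, Node B hosts both databases, but they are still two separate databases. No data is shared between nodes, which is the key difference from RAC.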

Storage Options

Raw Files
– All platforms

Clustered File System
– Windows & Linux: OCFS
– Solaris & AIX: Veritas CFS

Automatic Storage Management (ASM)
– All platforms

NAS (via NFS)
– Linux & Solaris

Real Application Clusters

Use RAC for scalability:
– Add instances against the same database files, providing more Oracle processes and increasing the number of users
– Add or remove nodes when needed

Oracle 10g provides platform-independent Cluster Ready Services (CRS) to handle failover of services to surviving nodes.
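Failover of services to surviving nodes can be pictured as a placement decision. It is loosely modeled on the idea of a service having "preferred" instances and fallback "available" instances. The minimal sketch below is not CRS's actual algorithm. The `place_services` function and the service and instance names are hypothetical.

```python
def place_services(
    services: dict[str, tuple[list[str], list[str]]], alive: set[str]
) -> dict[str, list[str]]:
    """Decide which instances run each service after failures (illustrative only).

    `services` maps a service name to (preferred instances, available instances).
    """
    placement = {}
    for name, (preferred, available) in services.items():
        running = [inst for inst in preferred if inst in alive]
        missing = len(preferred) - len(running)
        if missing:
            # Relocate to surviving "available" instances that are not already used.
            spares = [inst for inst in available if inst in alive and inst not in running]
            running += spares[:missing]
        placement[name] = running
    return placement


SERVICES = {  # hypothetical services and instance names
    "OLTP": (["inst1", "inst2"], ["inst3"]),
    "BATCH": (["inst3"], ["inst1"]),
}
print(place_services(SERVICES, {"inst1", "inst2", "inst3"}))
# {'OLTP': ['inst1', 'inst2'], 'BATCH': ['inst3']}
print(place_services(SERVICES, {"inst1", "inst3"}))
# {'OLTP': ['inst1', 'inst3'], 'BATCH': ['inst3']}
```

Every instance opens the same database, so moving a service only changes where users connect. No data has to move.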

RAC Instance Architecture

[Diagram: Nodes 1, 2, and 3 connected by a public network and a private network. Each node runs an ASM instance, a database instance (DB inst 1, 2, 3), CRS, and nodeapps (VIP, ONS, GSD). All nodes share the OCR, the voting disk, and the database.]

OCR and Voting Disk

Oracle Cluster Registry (OCR): The OCR maintains cluster configuration information that each node of the cluster uses to determine the state of the cluster. The OCR also maintains information about cluster resources:
• Databases
• Instances
• Services

Each node in the cluster keeps a copy of the OCR in memory for better performance and is also responsible for updating the OCR in shared storage as required.

Voting Disk: Oracle Clusterware uses the voting disk files to determine which nodes are currently members of the cluster.
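Membership decisions based on voting disks are usually described as a majority rule. A node stays in the cluster only if it can reach more than half of the voting disks. Below is a minimal sketch of that rule, plus a simplified split-brain tie-break. The tie-break (the larger group wins; on a tie, the group holding the lowest-sorting node name wins) is an assumption for illustration. Both function names are hypothetical.

```python
def has_voting_quorum(visible_disks: int, total_disks: int) -> bool:
    """A node stays in the cluster only if it can reach a strict majority of voting disks."""
    return visible_disks > total_disks // 2


def surviving_subcluster(groups: list[set[str]]) -> set[str]:
    """Simplified split-brain rule (assumption): the largest group of nodes survives;
    on a tie, the group containing the lowest-sorting node name wins."""
    return min(groups, key=lambda group: (-len(group), min(group)))


print(has_voting_quorum(2, 3))  # True: one of three voting disks can be lost
print(has_voting_quorum(1, 3))  # False: this node must leave the cluster
print(sorted(surviving_subcluster([{"node1"}, {"node2", "node3"}])))  # ['node2', 'node3']
```

This majority rule is why an odd number of voting disks is the natural choice. With three disks, one can be lost. With four, still only one can be lost.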

ASM Disk Groups, Disks and Database Files

[Diagram: three ASM disk groups (Disk Groups 1 to 3), each built from one or more disks (Disks 1 to 7), with database files (Files 1 to 6) stored in the disk groups.]

RAC Scalability & Availability

High Availability
– Node and database instance no longer represent a single point of failure
– Survive node and instance failures

RAC Network Checklist

• Public adapter should be first
• Private adapter should be second
• Ping the node’s public hostname to verify
• Ping each node’s public and private hostnames
• Don’t use the name ‘Private’ for the private network

[Screenshot: Network Connections > Advanced > Advanced Settings]

RAC Best Practices

Eliminate single points of failure
– NICs, switches, interconnect, shared storage, power supplies
– Understand the cost vs. availability tradeoff

Use the fastest switch available for the private interconnect
– Disable additional protocols such as spanning tree protocol
– Increase the MTU size as high as the switch allows, e.g., 9000
– Allow cards and switch ports to autonegotiate speed

Use static IP addresses
– Public LAN resolved by DNS and hosts file
– For the cluster interconnect, use non-routable IP addresses (10.x or 192.168.x)
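Two of these practices can be checked with a little code. One check tests whether an interconnect address falls in the 10.x or 192.168.x ranges. The other is fragmentation arithmetic showing why a larger MTU helps. The arithmetic assumes one 8 KB (8192-byte) database block is sent as a single UDP datagram over IPv4 with a 20-byte header and no options. It ignores any extra headers Oracle itself adds. The function names are hypothetical; `ipaddress` and `math` are Python standard library modules.

```python
import ipaddress
import math

# The two private ranges named in the checklist (10.x and 192.168.x).
INTERCONNECT_RANGES = [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("192.168.0.0/16")]


def is_interconnect_address(addr: str) -> bool:
    """True if `addr` falls in one of the non-routable ranges the checklist recommends."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERCONNECT_RANGES)


def ip_fragments(payload: int, mtu: int, ip_header: int = 20, udp_header: int = 8) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram of `payload` bytes."""
    # Each fragment's data must be a multiple of 8 bytes and fit in (mtu - ip_header).
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((payload + udp_header) / per_fragment)


print(is_interconnect_address("192.168.10.2"))  # True
print(is_interconnect_address("172.16.0.1"))    # False (private, but not a range listed above)
print(ip_fragments(8192, 1500))  # 6
print(ip_fragments(8192, 9000))  # 1
```

Under these assumptions, a standard 1500-byte MTU splits each block into six fragments. A 9000-byte jumbo frame carries it in one, so there are fewer packets to send and reassemble.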

Clustering Solutions from Oracle

• Oracle Fail Safe
• Oracle Data Guard
• Advanced Replication
• Shared Nothing Cluster
• Oracle Parallel Server
• Real Application Clusters (RAC)

Oracle Fail Safe

Integrated with Microsoft Clustering, Fail Safe is a core feature included with every Oracle version.

In the event of a system failure, Oracle Fail Safe works with Microsoft Cluster Server (MSCS) to restart Oracle databases and applications on a surviving cluster node.

MSCS and Fail Safe use a “shared-nothing” architecture (only one node can access the shared datafiles at any time).

Fail Safe

• MS Clustering enabled
• Two servers, one disk subsystem
• Switches over in the event of a hardware failure
• Requires recovery

Oracle Fail Safe

[Diagram: Node A and Node B each run Fail Safe Server, MSCS, and a resource DLL, and each has a private disk. Both nodes attach to the cluster disks and are managed from Fail Safe Manager.]

[Diagrams: before failover, node goes down, after failover. Node A is the primary server and Node B the secondary server for Virtual Server C. Each node has private disks and runs MSCS and Oracle Fail Safe Server, and both connect to the database disks over a shared I/O interconnect. Group 1 (disk, IP address, network name, Net listener, and Oracle database resources) serves database clients from the Oracle Fail Safe database instance on Node A. When Node A goes down, Group 1 moves to Node B. Both nodes are managed from Fail Safe Manager.]
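Moving Group 1 to the surviving node means bringing its resources online in an order that respects their dependencies. Below is a minimal sketch using Python's standard `graphlib.TopologicalSorter`. The dependency graph is an assumption for illustration, not taken from MSCS or Fail Safe documentation. The `failover_steps` function is also hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies among the Group 1 resources from the diagrams:
# each resource maps to the resources that must be online before it starts.
GROUP_1 = {
    "Disk": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "Net Listener": {"Network Name"},
    "Oracle Database": {"Disk", "Net Listener"},
}


def failover_steps(group: dict[str, set[str]], failed: str, survivor: str, crashed: bool = True):
    """List the steps to move a resource group from `failed` to `survivor`."""
    order = list(TopologicalSorter(group).static_order())
    steps = []
    if not crashed:
        # Planned move: take resources offline in reverse dependency order first.
        steps += [(failed, "offline", r) for r in reversed(order)]
    # Bring every resource online on the surviving node, dependencies first.
    steps += [(survivor, "online", r) for r in order]
    return steps


for step in failover_steps(GROUP_1, "Node A", "Node B"):
    print(step)
```

After a crash, there is nothing to take offline, so only the online half runs. The database still needs instance recovery when it starts, which matches "Requires recovery" on the Fail Safe slide.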

Oracle Data Guard

Data Guard is Oracle’s disaster recovery product. It maintains and monitors one or more standby databases to protect enterprise data from failures, disasters, errors, and corruptions.

Standby databases can be located far from the primary database, even in another geographic region. They can be switched to the production role if a problem occurs with the primary.

Different Windows versions can be used for the primary and the standby (e.g., 2003 for the primary, 2000 for the standby).

Data Guard is included free with the Enterprise Edition of the RDBMS.

http://www.oracle.com/technology/deploy/availability/htdocs/DataGuardOverview.html

Oracle Data Guard

Mirrored Server

Physical Standby
– Archive logs are applied to the remote database
– The standby takes over (failover) in the event of a failure

Logical Standby
– LogMiner technology is used to generate SQL
– The standby database can also be used for read-only reporting

Advantages
– Safe from user failure
– Can be in a different location
– No recovery required

www.brianhitchcock.net

Brian Hitchcock, October 23, 2007

Data Guard Classic*

[Diagram: the primary database writes online redo logs, which are archived. Data Guard ships the archived redo logs to the standby database.]

The standby is either mounted and recovering, or open read-only with no apply.

Can switch back and forth:
– Primary becomes standby
– Standby becomes primary

*Before the choice of physical or logical standby
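A "mounted, recovering" standby applies archived redo logs in sequence order, and a missing log stops apply until it arrives. Below is a minimal sketch of that behavior and of the role swap. It assumes logs are identified by consecutive sequence numbers starting at 1. The `StandbyDatabase` class, the `switchover` function, and the database names are hypothetical. They are not Data Guard's interfaces.

```python
class StandbyDatabase:
    """Toy physical standby: applies archived redo logs strictly in sequence order."""

    def __init__(self) -> None:
        self.applied = 0             # highest log sequence number applied so far
        self.received: set[int] = set()

    def receive(self, seq: int) -> None:
        """Accept an archived log shipped from the primary and apply what we can."""
        self.received.add(seq)
        # Recovery cannot skip a sequence number, so a missing log stalls apply.
        while self.applied + 1 in self.received:
            self.applied += 1
            self.received.discard(self.applied)

    def gap(self) -> list[int]:
        """Sequence numbers still missing below the highest log received."""
        highest = max(self.received, default=self.applied)
        return [s for s in range(self.applied + 1, highest) if s not in self.received]


def switchover(roles: dict[str, str]) -> dict[str, str]:
    """Swap roles: the primary becomes standby and the standby becomes primary."""
    return {db: "standby" if role == "primary" else "primary" for db, role in roles.items()}


standby = StandbyDatabase()
for seq in (1, 2, 4):
    standby.receive(seq)
print(standby.applied, standby.gap())  # 2 [3]
standby.receive(3)
print(standby.applied, standby.gap())  # 4 []
print(switchover({"PROD": "primary", "STBY": "standby"}))  # {'PROD': 'standby', 'STBY': 'primary'}
```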

RAC + Data Guard
– Eliminates physical location as a single point of failure (SPOF)

Maximum Availability Architecture (MAA)

[Diagram: a primary RAC cluster in NYC and a single-instance standby in Boston, linked by Data Guard.]

[Diagram: Maximum Availability Architecture. A WAN traffic manager sits in front of a primary site and a secondary site, each running 10g AS and RAC, connected by a dedicated network. Data Guard links the sites, and both sites use Flashback, RMAN, and Grid Control.]

Advanced Replication

• Uses updatable snapshots
• Replicates to another system
• Systems stay in sync

Oracle Parallel Server

• Shared disk cluster product
• Loosely coupled
• Scalable performance
• No downtime in the event of a system failure
• Replaced by RAC in 9i

Factors to Consider for Clustering

Which do you need most?
– High availability: failover clusters, synchronous replication, Data Guard
– Performance scalability: active/active failover clusters, N-to-N failover clusters
– Both: Oracle RAC

Administration complexity
– Failover clusters: relatively low
– Oracle RAC: relatively high
– Substantially less complex for 10g RAC than for 9i RAC

Local or long distance?
– Local: failover, RAC
– Remote: federated database, replication, standby database/Data Guard

Oracle license costs
– Active/passive failover clusters: active nodes only
– Active/active failover clusters, RAC: per node
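The license-cost rule above reduces to counting nodes. Below is a minimal sketch that captures only the rule of thumb on this slide, not Oracle's full licensing terms. The function name and cluster-type labels are hypothetical.

```python
def licensed_nodes(cluster_type: str, total_nodes: int, active_nodes: int) -> int:
    """Nodes needing an Oracle license under this slide's rule of thumb (illustrative only)."""
    if cluster_type == "active/passive":
        return active_nodes          # passive standby nodes are not counted
    if cluster_type in ("active/active", "rac"):
        return total_nodes           # every node runs an active database instance
    raise ValueError(f"unknown cluster type: {cluster_type!r}")


print(licensed_nodes("active/passive", total_nodes=2, active_nodes=1))  # 1
print(licensed_nodes("rac", total_nodes=4, active_nodes=4))             # 4
```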