IA L01 – Cluster File System Hands-On Lab

Description Learn how to configure CFS inside VMware using VMDK files as backend storage. This lab will guide you through the steps to configure a two node cluster, setting up the fencing configuration and creating an active/active file system across the virtual machines.

Using Campus Cluster features you will understand how to create a resilience architecture achieving zero RPO and near zero RTO in virtual environments. Understand I/O shipping policies and how it can enhance your cluster resiliency providing better recovery capabilities.

At the end of this lab, you should be able to

Get familiar with Cluster File System running in a VMware environment

Setup a Clustered File System

Configure Fencing using Coordination Point Servers

Create a Campus Cluster configuration, with storage mirrored across sites

Enable I/O Shipping for enhanced reliability

Utilize Storage Checkpoints across cluster nodes

Configure Preferred Fencing

Notes A brief presentation will introduce this lab session and discuss key concepts.

The lab will be directed and provide you with step-by-step walkthroughs of key features.

Feel free to follow the lab using the instructions on the following pages. You can optionally perform this lab at your own pace.

Be sure to ask your instructor any questions you may have.

Thank you for coming to our lab session.


Lab Environment

The lab environment will simulate a Campus Cluster configuration, where two ESX servers are available, each at a different site. Each ESX has access to both a local Data Store and a remote Data Store from the other site. Therefore, each Virtual Machine (VM) will have access to its local VMDK and a remote VMDK, and will be able to write and read from each.

Using Cluster File System, a mirrored volume will be created to ensure a synchronous copy of the data exists at each site at any time. Since Cluster File System enables concurrent access and data integrity through fencing, both nodes of the cluster (VMs) will be able to read and write to the same file system (built on the mirrored volume) at the same time.

Arbitration will be performed via Coordination Point (CP) Servers. Ideally, there will be one CP Server in each site, plus a third CP Server in a third site.

Note your group number, as it will be used to name your servers. In the names below, substitute XX with your group number, e.g. 01, 02, 12, etc.

Group XX:

Node 1: vm-cfs-XXn1

Node 2: vm-cfs-XXn2


Cluster Name: vm-cfsXX

Example for Group 01:

Node 1: vm-cfs-01n1

Node 2: vm-cfs-01n2

Cluster Name: vm-cfs01

Cluster Node Credentials:

User: root / Password: Password (for each of the cluster nodes)

Windows Server Credentials:

Username: Administrator

Password: Passw0rd (Passw –zero- rd)

RDP Gateway Credentials:

Username: tso\ia-lab-X – where X is the number of your lab group 1, 2, etc…

Password: Vi$1On - Capital letter V, lowercase letter i, dollar sign, numeral 1, Capital letter O, lowercase letter n

Coordination Point Servers:

CP Servers provide a lock mechanism to determine which nodes get to fence off data from other nodes. A CP Server runs on a remote system or cluster and provides arbitration functionality by allowing cluster nodes to perform the following tasks:

- Self-register to become a member of the cluster with access to the data drives

- Check which other nodes are registered as members of the cluster

- Self-unregister from the cluster

- Forcefully unregister other nodes (preempt) as members of the cluster

An odd number of CP Servers is always required, and three is the minimum supported for a production environment in order to avoid single points of failure. For this lab, though, only one CP Server will be used.

Take note of its name and IP address

CP Server 2: vm-cps2
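As an optional sanity check (not part of the original lab steps), once the cluster software is configured you can confirm that a node can reach the CP Server using the cpsadm ping action. This is a minimal sketch assuming vm-cps2 is resolvable from the cluster nodes:

# cpsadm -s vm-cps2 -a ping_cps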

Software Location:

If the software is needed, it is located under the /SW folder. Please note that the packages have already been deployed in order to save time and use this lab session efficiently.


Connecting to the Demo Environment

The first step is to open a PuTTY connection to each node of the cluster. To do that, you first need to establish a Windows RDP session to your assigned desktop. From the VMware Workstation guest, click on the assigned RDP icon for your system.

There are 16 available systems, and you may need to work in groups. You can take turns, or designate an individual to perform the steps. Make sure to use the appropriate RDP connection file located on the desktop of your classroom virtual machine environment to connect.

Please refer to the assigned systems you were given at the beginning of the lab. You will access the Windows server over RDP. The VMware Workstation image you are using has the necessary RDP sessions already on the desktop.

Double-click on your assigned RDP connection. You will be presented with a credentials prompt similar to the following:

Enter the RDP Gateway username and password as provided earlier in this guide:

RDP Gateway Username: tso\ia-lab-X – where X is the number of your lab group 1, 2, etc…

RDP Gateway Password: Vi$1On - Capital letter V, lowercase letter i, dollar sign, numeral 1, Capital letter O, lowercase letter n

Your screen should appear similar to the following:


You will next be prompted for the desktop credentials. Provide the Windows Server login and password as indicated earlier in this guide (Administrator/Passw0rd). The completed credentials should appear similar to the following:

Should you be prompted for any certificate or site acceptance criteria, please accept them. The login process may take a few moments while the session is secured; please be patient. After successful authentication, an RDP desktop should be visible, similar to the picture below:


From that window, double-click on the PuTTY icon and open a session with each of the cluster nodes assigned to your lab group:

vm-cfs-XXn1

vm-cfs-XXn2

where XX is your group number. Remember that the user and password are root/Password.
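If you prefer launching the sessions from a command prompt rather than the PuTTY GUI, the PuTTY executable also accepts the target on its command line. This is only a convenience sketch and assumes putty.exe is in the PATH of the Windows desktop:

putty -ssh root@vm-cfs-XXn1
putty -ssh root@vm-cfs-XXn2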


Exercise 1: Configure CFS

Cluster Setup

Cluster File System has already been installed in the cluster as part of the base VM images. Therefore, as each new Virtual Machine is deployed from that base template, it already includes CFS software. The only remaining step is to create and configure the cluster, which is the purpose of this exercise.

Go to the first node of your cluster (vm-cfs-XXn1) and run the installer with the configure option:

/opt/VRTS/install/installsfcfsha601 -configure

Type the names of the two nodes:

vm-cfs-XXn1 vm-cfs-XXn2

The installer will first verify the systems' configuration. SSH or RSH communication is needed between the nodes for installation purposes. The installer can set it up automatically; answer yes:

Would you like the installer to setup ssh or rsh communication automatically between the systems?

Superuser passwords for the systems will be asked. [y,n,q] (y) y

Enter the root password for the second node (Password)

Enter the superuser password for system vm-cfs-01n2:Password

Select ssh as the communication method (1):

Select the communication method [1-2,b,q,?] (1) 1

If the system clocks are not synchronized, you will have to synchronize them. Select y and enter ntp.symantec.com:

Do you want to synchronize system clocks with NTP server(s)? [y,n,q] (y) y

Enter the NTP server names separated by spaces: [b] ntp.symantec.com


You will next be prompted for the Fencing configuration. Given that this is a virtual environment, where VMDK files are used for backend storage, Coordination Point (CP) Servers will be used. Answer y:

Do you want to configure I/O Fencing in enabled mode? [y,n,q,?] (y) y

Enter the cluster name (vm-cfsXX, where XX is your group number):

Cluster Name: vm-cfsXX

Configure the heartbeat links using LLT over Ethernet (option 1)

How would you like to configure heartbeat links? [1-3,b,q,?] (1) 1

The installer should auto-detect the networks used. eth3 will be the private link and eth2 will be the public link used as the low-priority link. Note that this may differ in your lab environment. The default values suggested by the installer are typically correct.

Enter the NIC for the first private heartbeat link on vm-cfs-01n1: [b,q,?] (eth3)

Would you like to configure a second private heartbeat link? [y,n,q,b,?] (n) n

Do you want to configure an additional low-priority heartbeat link? [y,n,q,b,?]

(n) y

Enter the NIC for the low-priority heartbeat link on vm-cfs-01n1: [b,q,?] (eth2)

Are you using the same NICs for private heartbeat links on all systems?

[y,n,q,b,?] (y) y

The installer will check the media speed and connectivity for each interface. If the check does not pass, go back and review your configuration.

Allow the installer to choose a unique cluster ID and check that it is not in use by another cluster:

Enter a unique cluster ID number between 0-65535: [b,q,?] (xxxxxx)


Now check that the cluster ID is not currently in use on the same network:

Would you like to check if the cluster ID is in use by another cluster? [y,n,q] (y) y

Check that the configuration summary is correct (remember that XX is your group number):

Cluster information verification:

    Cluster Name: vm-cfsXX
    Cluster ID Number: xxxxxx
    Private Heartbeat NICs for vm-cfs-XXn1: link1=eth3
    Low-Priority Heartbeat NIC for vm-cfs-XXn1: link-lowpri1=eth2
    Private Heartbeat NICs for vm-cfs-XXn2: link1=eth3
    Low-Priority Heartbeat NIC for vm-cfs-XXn2: link-lowpri1=eth2

Is this information correct? [y,n,q,?] (y)
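For reference, the heartbeat selections above are written to /etc/llttab on each node. The following is an illustrative sketch only; the cluster ID and MAC addresses are placeholders and the exact contents in your lab may differ:

# cat /etc/llttab
set-node vm-cfs-XXn1
set-cluster <cluster ID>
link eth3 eth-<MAC of eth3> - ether - -
link-lowpri eth2 eth-<MAC of eth2> - ether - -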

The installer will ask for a Virtual IP for the cluster. This is not mandatory and it will not be used in this lab.

Do you want to configure the Virtual IP? [y,n,q,?] (n) n

The next step is whether or not to use secure mode. Secure mode allows the use of operating system users and passwords instead of the traditional VCS admin user. This configuration is outside the scope of this lab, so secure mode will not be used.

Would you like to configure the VCS cluster in secure mode? [y,n,q,?] (n) n

Leave the default credentials for user admin and password password

Do you wish to accept the default cluster credentials of 'admin/password'?

[y,n,q] (y) y

Select no for both SMTP and SNMP questions:

Do you want to configure SMTP notification? [y,n,q,?] (n) n


Do you want to configure SNMP notification? [y,n,q,?] (n) n

Answer y to stop all SFCFSHA processes

Do you want to stop SFCFSHA processes now? [y,n,q,?] (y)

The cluster has been configured and the installer will continue with the fencing configuration described in the next section.

Fencing configuration

Now it is time to configure fencing. Select option 1 for Coordination Point client-based fencing:

Fencing configuration
    1) Configure Coordination Point client based fencing
    2) Configure disk based fencing

Select the fencing mechanism to be configured in this Application Cluster: [1-2,q] 1

The storage used in this lab is based on VMDK files, which do not support SCSI3-PR, so choose n for the following question:

Does your storage environment support SCSI3 PR? [y,n,q,b,?] n

In the next question, select y for Non-SCSI3 fencing:

Do you want to configure Non-SCSI3 fencing? [y,n,q,b] (y) y

The minimum number of CP Servers for a supported configuration is 3. For this lab, fencing with only one CP Server will be configured (vm-cps2):

Enter the total number of coordination points. All coordination points should be Coordination Point servers: [b] (3) 1

The CP Server (vm-cps2) has already been configured and will be used by all the clusters in this lab.


One IP address will be used to communicate with the CP Server. For production environments, the CP Server can be reached through multiple networks if needed.

How many IP addresses would you like to use to communicate to Coordination Point Server #1? [b,q,?] (1) 1

Enter the virtual IP address or fully qualified host name for the CP Server:

Enter the Virtual IP address or fully qualified host name #1 for the Coordination Point Server #1: [b] 10.60.118.183

The default port should be fine

Enter the port in the range [49152, 65535] which the Coordination Point Server vm-cps2.engba.symantec.com would be listening on or simply accept the default port suggested: [b] (14250)

Review that the information is correct:

CPS based fencing configuration: Coordination points verification

    Total number of coordination points being used: 1
    Coordination Point Server ([VIP or FQHN]:Port):
        1. 10.182.100.137 ([10.182.100.137]:14250)

Is this information correct? [y,n,q] (y)

Each node has now been registered with the CP Server.

To apply the fencing configuration, VCS needs to be restarted. Answer yes:

Are you ready to stop VCS and apply fencing configuration on all nodes at this time? [y,n,q] (y)

It is recommended to configure the Coordination Point Agent on the client (the cluster we are configuring), so the CP Server processes are proactively monitored from the cluster.

The Coordination Point Agent monitors the registrations on the coordination points.

Do you want to configure Coordination Point Agent on the client cluster? [y,n,q] (y)


Enter a non-existing name for the service group for Coordination Point Agent: [b] (vxfen)

Verify Fencing and Cluster Configuration

Verify all the nodes registered with the CP Server:

# cpsadm -s vm-cps2 -a list_nodes
ClusterName   UUID                                     Hostname(Node ID)   Registered
===========   ======================================   =================   ==========
vm-cfs01      {bf24835a-1dd1-11b2-85d1-0e94196c94f6}   vm-cfs-01n1(0)      1
vm-cfs01      {bf24835a-1dd1-11b2-85d1-0e94196c94f6}   vm-cfs-01n2(1)      1

And the nodes registered for your cluster

# cpsadm -s vm-cps2 -a list_membership -c vm-cfs01
List of registered nodes: 0 1

Check that your cluster configuration is correct:

# hastatus -sum
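Optionally (these checks are not part of the original lab steps), you can also confirm GAB port membership and the fencing mode; with CP Server based fencing, vxfenadm should report a customized fencing mode using the cps mechanism:

# gabconfig -a
# vxfenadm -d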


Exercise 2: Storage Configuration

The goal of this exercise is to have two disks connected to each VM, each one located in a different data store in a different data center. The steps involving a login to the ESX server or vCenter have already been done. They are highlighted here just for clarity, but the student will not have to perform those steps. Further details and documentation are available in the Storage Foundation Cluster File System HA on VMware VMDK Deployment Guide, which can be downloaded here:

https://www-secure.symantec.com/connect/articles/storage-foundation-cluster-file-system-ha-vmware-vmdk-deployment-guide

These are the steps that need to be followed in order to add VMDKs in shared mode to more than one Virtual Machine, indicated by the check marks.

This step has already been completed

Step to be completed by the student


Enable Disk UUID on VMs

NOTE: This step has already been completed by the instructor, because access to vCenter is needed

The first step that needs to be taken is to set the disk.EnableUUID parameter for each VM to “TRUE”. This step is necessary so that the VMDK always presents a consistent UUID to the VM, thus allowing the disk to be mounted properly. For each of the nodes (VMs) that will be participating in the cluster, follow these steps:

From vSphere client:

Power off the guest

Right click on guest and select "Edit Settings..."

Click on "Options" tab on top

Click on "General" under the "Advanced" section

Click on the "Configuration Parameters.." on right hand side

Check whether the parameter "disk.EnableUUID" is set; if it is there, make sure it is set to "TRUE"

If it is not there, click "Add Row" and add it, setting it to "TRUE"

Power on the guest
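A quick way to verify the setting from inside the guest (an optional check, not in the original steps) is to look under /dev/disk/by-id; when disk.EnableUUID is set to TRUE, the VMDKs expose a serial number and scsi-* entries should appear there:

# ls -l /dev/disk/by-id/ | grep scsi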

VMDK creation and mapping to VMs

NOTE: This step has already been configured by the lab instructor because access to vCenter is needed


Two VMDKs have been created and provisioned to the two nodes. Each one is located in a different data store which is located in a different site.

Data Store   Virtual Disk on ESX   VMDK Name           Virtual Device   Virtual SCSI Driver   VMDK Size (GB)
DS1          Hard disk 2           cfsX/shared1.vmdk   SCSI 1:0         Paravirtual           25
DS2          Hard disk 3           cfsX/shared2.vmdk   SCSI 2:0         Paravirtual           25

Using vCenter, those VMDKs are attached to each VM in the cluster using the parameters specified in the previous table.

Enable the multi-writer flag

This has already been configured.

In order to share a VMDK file between two or more VMs, the multi-writer flag needs to be enabled. The procedure is explained in the following VMware article:

http://kb.vmware.com/kb/1034165

These are the steps to enable it for the two VMDKs we have created:

1. On the vSphere Client, right-click on the virtual machine. Go to "Edit Settings", then click on the "Options" tab. Click on "General" under the "Advanced" option. Press the "Configuration Parameters..." button.

2. Click on the "Add Row" button.

3. Enter scsi1:0.sharing in the Name column.

4. Enter multi-writer in the Value column.

5. Repeat steps 2 through 4 and enter the multi-writer value for the rest of the SCSI controllers and targets. In our case:

   scsi1:0.sharing multi-writer
   scsi2:0.sharing multi-writer

6. Press OK.
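For reference, these configuration parameters are stored in the virtual machine's .vmx file, so the equivalent entries (shown here only as an illustration of what vCenter writes) look roughly like this:

scsi1:0.sharing = "multi-writer"
scsi2:0.sharing = "multi-writer"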

Install ASL (Array Support Library) for VMDKs

In order for the cluster file system to work properly with the VMDK files, an ASL must be installed in the virtual server. The ASL package (VRTSaslapm) version that contains the VMDK ASL is 6.0.100.100. This version is available for download from http://sort.symantec.com. The direct link to this package is https://sort.symantec.com/asl/details/609.


# cd /SW/ASL/RHEL6
# rpm -Uvh VRTSaslapm-6.0.100.100-GA_RHEL6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:VRTSaslapm             ########################################### [100%]
Installing keys for APMs
# vxdctl enable
# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
vmdk0_0      auto:none       -            -            online invalid
vmdk0_1      auto:none       -            -            online invalid
vmdk0_2      auto:none       -            -            online invalid
[root@vm-cfs-01n1 RHEL6]#

Follow the same steps on the second node.
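If you want to double-check that the new ASL/APM package was picked up by Volume Manager (an optional check; the exact library name reported for the VMDK ASL may differ), you can list the supported libraries:

# vxddladm listsupport all | grep -i vmdk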

Exclude boot disk from Volume Manager configuration

It is a best practice to exclude the boot disk from Volume Manager. This will allow the shared VMDK files to be configured to use the same name. In order to exclude the disk, run the command vxdmpadm with the name of the boot disk.

First check that sda is the disk used for boot:

# df -k | grep boot

/dev/sda1 495844 32671 437573 7% /boot

Now verify with the vxdisk command which Volume Manager name points to disk sda:

[root@vm-cfs-01n1 RHEL6]# vxdisk list vmdk0_0 | grep state
sdb     state=enabled
[root@vm-cfs-01n1 RHEL6]# vxdisk list vmdk0_1 | grep state
sda     state=enabled
[root@vm-cfs-01n1 RHEL6]# vxdisk list vmdk0_2 | grep state
sdc     state=enabled
[root@vm-cfs-01n1 RHEL6]#

Note that depending on your system environment, the name of the disk in your lab may be different.

In our case it is vmdk0_1, so let’s exclude that disk:


[root@cfs01 RHEL6]# vxdmpadm exclude dmpnodename=vmdk0_1

And verify that the boot disk is no longer reported under VxVM configuration:

[root@vm-cfs-01n1 RHEL6]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
vmdk0_0      auto:none       -            -            online invalid
vmdk0_2      auto:none       -            -            online invalid

Now, in order to have a cleaner configuration, let's reassign the disk names:

[root@vm-cfs-01n1 RHEL6]# vxddladm assign names
[root@vm-cfs-01n1 RHEL6]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
vmdk0_0      auto:none       -            -            online invalid
vmdk0_1      auto:none       -            -            online invalid
[root@vm-cfs-01n1 RHEL6]#

Repeat the same steps on the second node, ensuring that the boot disk is excluded from DMP.


Exercise 3: Campus Cluster Configuration

Enabling site consistency

To create a Campus Cluster configuration, we are first going to tag each VM and each disk with the site it belongs to. In a VMware environment, this assumes that the VM will be able to vMotion between ESX hosts located in the same data center, but not to ESX hosts in the other data center, which would break site redundancy.

First let’s tag each of the VMs:

Commands on the first node (vm-cfs-XXn1):

# vxdctl set site=site1

Verify the settings:

# vxdctl list | grep site

Commands on the second node (vm-cfs-XXn2):

# vxdctl set site=site2

Verify the settings:

# vxdctl list | grep site

Initialize and configure the new diskgroup with the two disks, enabling site consistency:

From the first node execute the following commands:

Switch the CVM master to this node

# vxclustadm setmaster vm-cfs-XXn1

Initialize the disks

[root@vm-cfs-01n1 RHEL6]# vxdisksetup -i vmdk0_0

[root@vm-cfs-01n1 RHEL6]# vxdisksetup -i vmdk0_1

Create a new DiskGroup

# vxdg -s init datadg vmdk0_0=vmdk0_0

# vxdg -g datadg adddisk vmdk0_1=vmdk0_1

Tag each of the disks


# vxdisk settag site=site1 vmdk0_0

# vxdisk settag site=site2 vmdk0_1
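You can confirm the site tags just assigned with vxdisk listtag (an optional verification, not part of the original steps):

# vxdisk listtag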

Add the disk group to the sites it belongs to:

# vxdg -g datadg addsite site1

# vxdg -g datadg addsite site2

Enable Site Consistency

# vxdg -g datadg set siteconsistent=on

Verify the vxprint output:

[root@vm-cfs-01n1 RHEL6]# vxprint
Disk group: datadg

TY NAME         ASSOC        KSTATE   LENGTH     PLOFFS   STATE    TUTIL0   PUTIL0
dg datadg       datadg       -        -          -        -        -        -

SR site1        -            ACTIVE   -          -        -        -        -
SR site2        -            ACTIVE   -          -        -        -        -

dm vmdk0_0      vmdk0_0      -        33381536   -        -        -        -
dm vmdk0_1      vmdk0_1      -        33381536   -        -        -        -

Volume and File System Creation

Create a volume in that Disk Group

[root@vm-cfs-01n1 RHEL6]# vxassist -g datadg make vol1 maxsize

VxVM vxassist INFO V-5-1-16204 Site consistent volumes use new dco log types (dcoversion=30). Creating it by default.

Create the file system:

# mkfs -t vxfs /dev/vx/rdsk/datadg/vol1

Add the mount point to the cluster configuration


# cfsmntadm add datadg vol1 /data1 all=crw

Mount it

[root@vm-cfs-01n1 RHEL6]# cfsmount /data1
  Mounting...
  [/dev/vx/dsk/datadg/vol1] mounted successfully at /data1 on vm-cfs-01n1
  [/dev/vx/dsk/datadg/vol1] mounted successfully at /data1 on vm-cfs-01n2

Simple Cluster File System testing

Those of you who are not familiar with cluster file system behavior can follow these steps to verify the behavior of a file system that is mounted on all the nodes at the same time.

Verify the file system is mounted on both nodes:

# df -kh | grep data1
                      16G   40M   15G   1% /data1

On vm-cfs-XXn1:

# echo "hello cfs" >> /data1/file1
# cat /data1/file1
hello cfs

On vm-cfs-XXn2:

# echo "hi there" >> /data1/file1
# cat /data1/file1
hello cfs
hi there


Optional Exercise 4: I/O Shipping & Detach Policies

Introduction

The following figure outlines the storage configuration that has been created in this lab. There is a file system that is accessible from both sites at the same time, built on a volume that is available at both sites, ensuring a synchronous copy of the data resides on both arrays at all times.

Cluster Volume Manager (CVM) uses a shared storage model. A shared disk group provides concurrent read and write access to the volumes that it contains for all nodes in a cluster.

Cluster resiliency means that the cluster functions with minimal disruptions if one or more nodes lose connectivity to the shared storage. When CVM detects a loss of storage connectivity for an online disk group, CVM performs appropriate error handling for the situation. For example, CVM may redirect I/O over the network, detach a plex, or disable a volume for all disks, depending on the situation.

The different marks on the diagram show the failure scenarios CVM will be able to recover from:

1. Loss of site connectivity


2. Host (ESX or VM) failure

3. Storage Failure

4. Complete Site Failure

5. Loss of connectivity to storage at one site from hosts on all sites

6. Loss of connectivity to storage at all sites from the host at a site

The behavior of CVM can be customized to ensure the appropriate handling for your environment.

In previous CFS releases, you could configure a disk group failure policy for a shared disk group. This requirement was removed in 6.0.1, as CVM now maintains connectivity to the shared disk group as long as at least one node in the cluster has access to the configuration copies. This is known as I/O Shipping. The master node performs changes to the configuration and propagates the changes to all the nodes. If the master node loses access to the copies of the configuration, it sends the writes to a slave node that has access. I/O Shipping in CFS improves resiliency and reduces previous complexities.

Storage connectivity failures can be grouped into two types: failures that affect only a subset of nodes, and failures that affect all the nodes in the cluster. To determine what action needs to be taken, the disk detach policy needs to be configured.

There are two disk detach policies, and they can be classified depending on the goal you have for your configuration:

Maintain computational process

This is the goal of the global detach policy. For any I/O error, the plex will be detached globally from all the nodes to maintain data consistency. All the nodes will be kept alive, but one copy of the data (the failed plex) will not be available. All nodes are able to continue to perform I/O, but the redundancy of the mirrors is reduced. With this policy, a local fault on one node has a global impact on all nodes in the cluster.

Maintain configuration copies

This is the goal of the local detach policy. For any local I/O error, the node suffering the storage connectivity problem loses access to all the plexes, in order to avoid partial writes. The plex is not detached for the whole cluster. This policy ensures that all of the plexes remain available for I/O on the other nodes. Only the node or nodes that had the failure are affected.

To minimize failures and improve resiliency, the 6.0.1 release incorporates I/O shipping, which allows local writes to be transferred to other nodes of the cluster, thereby avoiding errors when a local failure happens.

I/O shipping improves the behavior of the local detach policy: for any local partial or total failure, the I/Os will be shipped to another node in the cluster, while also maintaining all of the plex copies.


Local detach policy and I/O shipping configuration

In this lab we will configure the local detach policy and enable I/O shipping. To test these, a disk will be disconnected from one of the nodes and we will see how operation continues. Later, the same disk will be disconnected from the other node, losing all connectivity to that plex; operations will still continue with the remaining copy.

I/O shipping is disabled by default and needs to be enabled. This parameter is set per disk group:

# vxdg -g datadg set ioship=on

And now set the disk detach policy to local (global is the default)

# vxdg -g datadg set diskdetpolicy=local

Verify the changes

# vxdg list datadg | egrep "ioship|detach"

detach-policy: local

ioship: on

Testing the changes

We are going to produce a local I/O error to verify that CFS can continue writing to the disk through the other node. Therefore, we are going to trigger failure scenario 5, described above, in two phases.

Connect to the second node in the cluster and in one terminal console produce some writes. You can simply execute the script /TEST/genload:

# /TEST/genload

This is the content of that script:

while true
do
    dd if=/dev/zero of=/data1/file2 bs=64k count=150000
    rm /data1/file2
done


On each node, open a terminal window and run the iostat command to measure the IO. Note that the second node is the one writing to the two disks in parallel.

# iostat -xn 5

Now you will need help from the instructor in order to connect to vCenter and remove one of the disks attached to the second cluster node.

Call your lab instructor and ask them to disconnect the disk for you.

In our case, it will be Hard disk 3, which is connected to vm-cfs-XXn1_1.vmdk

Observe now how the iostat output shows some activity on the first node while your script continues to work on the second one.

The output of vxdisk list on the second node shows the disk as locally failed (lfailed):

[root@vm-cfs-01n2 data1]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
vmdk0_0      auto:cdsdisk    vmdk0_0      datadg       online shared
vmdk0_1      auto:cdsdisk    vmdk0_1      datadg       online shared lfailed

Now let’s disconnect the same disk on the first node, simulating a total array failure in the second site.

Again ask your lab instructor to do so.


The script still continues writing to the storage.

But now you can check how the disk is in error and disabled:

[root@vm-cfs-01n2 data1]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
vmdk0_0      auto:cdsdisk    vmdk0_0      datadg       online shared
vmdk0_1      auto            -            -            error shared
-            -               vmdk0_1      datadg       detached was:vmdk0_1
[root@vm-cfs-01n2 data1]#

The cluster still has access to the first plex, which is why it continues writing with no issues. The service has not been affected at all.

To recover the original situation, ask the instructor to reconnect the storage back to the VMs.

After a while, the configuration will be automatically recovered with no impact to the users. The DCO will be used to sync only the writes done while the mirror was disconnected.
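If you are curious about the resynchronization progress, you can list the running Volume Manager tasks while the mirror is catching up (optional; the list will be empty once the resync has completed):

# vxtask list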

[root@vm-cfs-01n1 data1]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
vmdk0_0      auto:cdsdisk    vmdk0_0      datadg       online shared
vmdk0_1      auto:cdsdisk    vmdk0_1      datadg       online shared

[root@vm-cfs-01n2 data1]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
vmdk0_0      auto:cdsdisk    vmdk0_0      datadg       online shared
vmdk0_1      auto:cdsdisk    vmdk0_1      datadg       online shared


Optional Exercise 5: Storage Checkpoints

A storage checkpoint within Storage Foundation is a file system snapshot that provides a consistent view of the file system. This exercise will demonstrate the ease of creating storage checkpoints in a CFS environment.

A checkpoint will be created for the /data1 file system, and the snapshot will be mounted on the second node of the cluster as read-only.

First, the checkpoints need to be created for the /data1 file system. Connect to the second node and take a couple of checkpoints:

# fsckptadm create chkpt1_data1 /data1

# fsckptadm create chkpt2_data1 /data1

List them

# fsckptadm list /data1

Using the command cfsmntadm, the checkpoint can be included under the cluster configuration for the second node. Use the first checkpoint taken and use the mount point /backup:

# cfsmntadm add ckpt chkpt1_data1 /data1 /backup all=ro vm-cfs-01n2
  Mount Point is being added...
  /backup added to the cluster-configuration

Let’s mount the storage checkpoint at the second node:

# cfsmount /backup vm-cfs-01n2

You can run the df command on each node to verify that the /backup folder is only available on the second node.
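For example (a minimal check; the grep filter is just a convenience):

# df -k | grep backup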

Now unmount the checkpoint and remove it from the configuration.

# cfsumount /backup

Unmounting...

/backup got successfully unmounted from seintcfsg01n2

# cfsmntadm delete /backup

Mount Point is being removed...


/backup deleted successfully from cluster-configuration

# fsckptadm remove chkpt1_data1 /data1

And verify only one checkpoint remains available

# fsckptadm list /data1


Optional Exercise 6: Preferred Fencing

Configuration

A campus cluster environment may have less reliable network interconnects, and therefore it is important to have a way to define either the most critical server, or which applications should not suffer any interruption if the network communication between sites is lost.

This is called Preferred Fencing in CFS. Preferred fencing can be enabled by server or by group.

In this exercise, we will revisit the failure scenario from the previous exercise, and we are going to prioritize the first node to continue providing the service. To prioritize the node, the FencingWeight attribute will be used. A higher value means a higher priority to keep the service.

To configure preferred fencing, run the following commands:

# haconf -makerw
# haclus -modify PreferredFencingPolicy System
# hasys -modify vm-cfs-01n1 FencingWeight 50
# hasys -modify vm-cfs-01n2 FencingWeight 10
# haconf -dump -makero
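You can verify the attributes just set with the read-only -value form of the same commands (an optional check, not part of the original steps):

# haclus -value PreferredFencingPolicy
# hasys -value vm-cfs-01n1 FencingWeight
# hasys -value vm-cfs-01n2 FencingWeight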

Testing

We will simulate a cluster interconnect failure via software by disabling the LLT (private interconnect) communication.

Given that neither of the servers will be able to reach the other side, they cannot determine whether the network is faulted or the other server is down. In order to recover from a possible service failure, both servers will race for the CP Server; one of them will continue providing the service, while the other will panic in order to avoid any data corruption.

The preferred fencing configuration previously set will favor one of the servers (the first node in our case).

First verify LLT is running:

# lltstat -nvv | head
LLT node information:
    Node                  State    Link   Status   Address
  * 0 vm-cfs-01n1         OPEN
                                   eth3   UP       00:50:56:AA:13:0D
                                   eth2   UP       00:50:56:AA:4E:D9
    1 vm-cfs-01n2         OPEN
                                   eth3   UP       00:50:56:AA:4A:21
                                   eth2   UP       00:50:56:AA:4F:4E
    2                     CONNWAIT
                                   eth3   DOWN


If you want to take a look at the events happening on each node, open a terminal console on each node and leave this command running:

# tail -f /var/VRTSvcs/log/vxfen/vxfend_A.log

In our case, eth3 and eth2 are the interfaces where we are going to simulate an error. Type the following command:

# lltconfig -t eth2 -L 0; lltconfig -t eth3 -L 0

On the node retaining the service, you can observe the following entries in the log:

checking membership: 0 and my_id: 0

Successfully preempted node 1 on cp [10.182.100.137]:14250

I won the race

To recover the service on the second node, enable LLT again from the first node:

# lltconfig -t eth2 -L 3

# lltconfig -t eth3 -L 3

And reboot the failed node.

