Date posted: 23-Dec-2015
Continuously Available File Server: Under the Hood
Claus Joergensen
Principal Program Manager
Microsoft Corporation
WSV410
Agenda
• Remote File Storage for Server Applications
• Scale-Out File Server for application data
    • Setup and configuration
    • Cluster Shared Volumes
    • Scale-Out File Server cluster group
    • Scale-Out File Server scalability
• SMB Transparent Failover
This session assumes familiarity with:
• Windows Server 2008 R2 Failover Clustering, including Cluster Shared Volumes
• Windows Server 2008 R2 File Server
Remote File Storage for Server Applications
New scenario in Windows Server 2012
Server apps storing data files on file shares
Examples:
• Hyper-V VHD, configuration files, snapshots etc.
• SQL Server database and log files
• IIS content and configuration files
Benefits:
• Easy provisioning and management
    • Share management instead of LUNs and zoning
• Flexibility
    • Dynamically relocate server in datacenter without needing to reconfigure network or storage access
• Leverage network investments
    • Specialized storage networking infrastructure or knowledge is not required
• Lower CapEx and OpEx
Example:
[Diagram: Hyper-V Server, SQL Server, and IIS (App Server, DB Server, Web Server) connected to a Clustered File Server made up of two File Server nodes with Shared Storage]
Scale-Out File Server for Application Data
New clustered file server
Targeted for server app storage
Key capabilities*:
• Dynamic scaling with active-active file shares
• Fault tolerance with zero downtime
• Fast failure recovery
• Cluster Shared Volume cache
• CHKDSK with zero downtime
• Application consistent snapshots
• Support for RDMA enabled networks
• Simpler management
Requirements:
• Windows Failover Cluster with Cluster Shared Volumes
• Both application server and file server cluster must be running Windows Server 2012
[Diagram: Application Servers reach a Single Logical File Server (\\fs\share) over the Data Center Network (Ethernet, InfiniBand or combination); the file server presents a Single File System Namespace on Cluster Shared Volumes]
*) Capabilities highlighted in orange are unique to Scale-Out File Server
Setup and Configuration
Install the necessary roles and features on all nodes
• File Server role
• Failover Clustering feature
Create cluster
• No special requirements
Add cluster disks to Cluster Shared Volumes
Configure networks for:
• Client Access Points (CAP)
• Cluster Shared Volumes (CSV)
Create File Server Role
• Select "Scale-Out File Server for application data"
• Give it a network name
Create file shares
Windows PowerShell Example
# Install Roles and Features
Import-Module ServerManager
Add-WindowsFeature -Name File-Services, Failover-Clustering, RSAT-Clustering

# Create Failover Cluster
New-Cluster -Name smbclu -Node FSF-260403-07, FSF-260403-08, FSF-260403-09

# Add Cluster Disk 1 to Cluster Shared Volumes
Add-ClusterSharedVolume -Name "Cluster Disk 1"

# Configure Cluster Network 1 for Client Access and Cluster Network 2 for CSV (may not be needed)
(Get-ClusterNetwork -Name "Cluster Network 1").Role = 3
(Get-ClusterNetwork -Name "Cluster Network 2").Role = 1

# Create Scale-Out File Server
Add-ClusterScaleOutFileServerRole -Name smbsofs

# Create File Share
New-SmbShare -Name vm1 -Path c:\clusterstorage\volume1\vm1 -FullAccess domain\hvhost$
Cluster Shared Volumes
Cluster Shared Volumes File System
Fundamental to and required for Scale-Out File Servers
• Scale-Out file shares require CSVFS paths
• Supports VSS for SMB file shares
CSVFS supports most NTFS features and operations
• Detailed information available with Windows Server 2012 Release Candidate
Direct I/O support for file data access
• Caching of CSVFS file data (controlled by oplocks)
Redirects I/O for metadata operations to coordinator node
Redirects I/O for data operations when a file is being accessed simultaneously by multiple CSVFS instances
• Leverages SMB Direct and SMB Multichannel for internode communication
Cluster Shared Volumes Caching
Improve CSV I/O Performance
Windows Cache Manager integration
• Buffered read/write I/O is cached the same way as NTFS
Cluster Shared Volumes Block Cache
• Read-only cache for un-buffered I/O
    • I/O which is excluded from Windows Cache Manager
• Distributed cache guaranteed to be consistent across the cluster
• Significant value for Pooled VM VDI scenarios
Enabling CSV Block Cache:
• SharedVolumeBlockCacheSizeInMB: cluster common property
    • 0 = Disabled
    • Non-zero = the amount of RAM in MB to be used for cache on each cluster node
    • Recycling of the resource is not needed
• CsvEnableBlockCache: Physical Disk resource private property
    • 0 = Disabled (default)
    • 1 = Enabled for that Cluster Shared Volume
    • Requires recycling the resource to take effect
CHKDSK with Cluster Shared Volumes
CHKDSK is seamless with CSV
• CHKDSK is significantly improved with scanning (online) separated from repair (offline)
• With CSV, repair is online as well
CHKDSK processing with CSV
1. Cluster checks (once a minute) to see if CHKDSK (spotfix) is required
2. Cluster enumerates NTFS $corrupt to identify affected files
3. Cluster pauses the affected CSV file system (CSVFS) to pend I/O
4. The underlying NTFS volume is dismounted
5. CHKDSK (spotfix) is run against affected files for a maximum of 15 seconds to ensure applications are not affected
6. The underlying NTFS volume is mounted and the CSV namespace is un-paused
If CHKDSK (spotfix) did not process all records
• Cluster will wait 3 minutes before continuing
• Enables a large set of affected files to be processed over time
If corruption is too large
• CHKDSK (spotfix) is not run and is marked to run at next Physical Disk online
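The pacing described above (at most 15 seconds of repair per pass, then a 3-minute wait if records remain) can be sketched as a small arithmetic model. This is only an illustration: the function name and the idea of expressing the backlog as "seconds of repair work" are assumptions, and the model ignores the once-a-minute detection poll and the dismount/mount overhead.

```python
import math

SPOTFIX_BUDGET_S = 15         # from the slide: maximum repair time per pass
WAIT_BETWEEN_PASSES_S = 180   # from the slide: cluster waits 3 minutes before continuing


def spotfix_schedule(total_repair_seconds):
    """Return (passes, paused_seconds, elapsed_seconds) for a repair backlog.

    total_repair_seconds is a hypothetical measure of how long spotfix
    would need if it could run uninterrupted.
    """
    if total_repair_seconds <= 0:
        return (0, 0, 0)
    passes = math.ceil(total_repair_seconds / SPOTFIX_BUDGET_S)
    paused = total_repair_seconds                      # time I/O is actually pended
    elapsed = paused + (passes - 1) * WAIT_BETWEEN_PASSES_S
    return (passes, paused, elapsed)


print(spotfix_schedule(40))
```

Under these assumptions, a backlog needing 40 seconds of repair is spread over three passes, so no single I/O pause exceeds the 15-second budget even though the whole repair takes several minutes of wall-clock time.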
Anatomy of a Scale-Out File Server
Scale-Out File Server group
Contains
• Distributed Network Name
• Scale-Out File Server
Group Type:
• ScaleoutFileServer
Resource Types:
• Scale Out File Server
• Distributed Network Name

Get-ClusterGroup | ? {$_.GroupType -eq "ScaleoutFileServer"} | FL Name, OwnerNode, State, GroupType

Name      : smbsofs33
OwnerNode : FSF-260403-07
State     : Online
GroupType : ScaleoutFileServer

Get-ClusterGroup | ? {$_.GroupType -eq "ScaleoutFileServer"} | Get-ClusterResource

Name                  State  OwnerGroup ResourceType
----                  -----  ---------- ------------
Scale-Out File Server Online smbsofs33  Scale Out File Server
smbsofs33             Online smbsofs33  Distributed Network Name
Distributed Network Name (DNN)
Client Access Point (CAP) for a Scale-Out File Server
• DNS name on the network
Security
• Creates and manages computer object in AD
• Registers credentials with LSA on each node
DNS
• Registers the CAP with DNS
    • Registers node IP address for all nodes
    • Does not use virtual IP addresses
• DNN updates DNS when
    • DNN resource comes online and every 24 hours
    • A node is added to or removed from the cluster
    • A cluster network is added or removed as a client network
    • IP address changes
• If not using dynamic DNS, you must manually add the DNS records with the node IPs for the cluster networks enabled for client access for each node

> smbsofs33
Server:  stb-red-dc-01.stbtest.microsoft.com
Address:  10.200.81.201

Non-authoritative answer:
Name:    smbsofs33.ntdev.corp.microsoft.com
Addresses:  2001:4898:0:fff:0:5efe:10.217.108.49
            2001:4898:0:fff:0:5efe:10.217.108.103
            2001:4898:0:fff:0:5efe:10.217.108.148
            10.217.108.148
            10.217.108.49
            10.217.108.103

(IPs on same subnet, one for each node.)
Distributed Network Name (DNN)
DNS will round robin client DNS lookups
• DNS sorts IPv6 and IPv4 separately
• Concatenates with IPv6 at top
SMB client is resilient to unavailable IPs
1. Attempts to connect to first IP address
2. After 1 second, client attempts the next 7 IP addresses
3. If any of the previous attempts fail, client attempts next IP address
4. Client will continue until it reaches end of list
5. Client will proceed with the first server to respond
SMB client
• Connects to one and only one cluster node for a given scale-out file server
• Can connect to different cluster nodes for each scale-out file server

> smbsofs33
Server:  stb-red-dc-01.stbtest.microsoft.com
Address:  10.200.81.201

Non-authoritative answer:
Name:    smbsofs33.ntdev.corp.microsoft.com
Addresses:  2001:4898:0:fff:0:5efe:10.217.108.49
            2001:4898:0:fff:0:5efe:10.217.108.103
            2001:4898:0:fff:0:5efe:10.217.108.148
            10.217.108.148
            10.217.108.49
            10.217.108.103

> smbsofs33
Server:  stb-red-dc-01.stbtest.microsoft.com
Address:  10.200.81.201

Non-authoritative answer:
Name:    smbsofs33.ntdev.corp.microsoft.com
Addresses:  2001:4898:0:fff:0:5efe:10.217.108.103
            2001:4898:0:fff:0:5efe:10.217.108.148
            2001:4898:0:fff:0:5efe:10.217.108.49
            10.217.108.49
            10.217.108.148
            10.217.108.103
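The effect of the lookups and connection steps above can be approximated with a deliberately simplified model: each address family is reordered separately, IPv6 is listed first, and the client ends up on the first address in list order that responds. Everything here is an assumption made for illustration: the function names are hypothetical, a plain rotation stands in for whatever ordering the DNS server actually applies, and the 1-second delay and parallel attempts from the slide are not modeled.

```python
def order_for_client(ipv6, ipv4, lookup_index):
    """Rotate each address family separately, then list IPv6 ahead of IPv4.

    A simple rotation is an assumed stand-in for DNS round robin.
    """
    def rotate(addrs, k):
        if not addrs:
            return []
        k %= len(addrs)
        return addrs[k:] + addrs[:k]

    return rotate(ipv6, lookup_index) + rotate(ipv4, lookup_index)


def pick_node(ordered_addresses, reachable):
    """Return the first address, in list order, that responds (or None)."""
    for addr in ordered_addresses:
        if addr in reachable:
            return addr
    return None


# Placeholder addresses, one IPv6 and one IPv4 per node.
ipv6 = ["v6-node1", "v6-node2", "v6-node3"]
ipv4 = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

answer = order_for_client(ipv6, ipv4, lookup_index=1)
print(answer)
print(pick_node(answer, reachable={"v6-node1", "10.0.0.3"}))
```

Because successive lookups return a different ordering, different clients tend to land on different nodes, while an unreachable address at the top of the list only delays the connection instead of failing it.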
Scale Out File Server (SOFS)
Scale Out File Server resource is responsible for
• Bringing scale-out file shares online on each node
• Listening for scale-out share creations, deletions and changes
• Replicating changes to other nodes
• Ensuring consistency across all nodes for the Scale-Out File Server
Implemented using cluster clone resources
• All nodes run a SOFS clone
• The clones are started and stopped by the SOFS leader
• The SOFS leader runs on the node where the Scale Out File Server resource is online
Scale-Out File Server group behavior
The group is online on one of the nodes
Moving the group
• Moves the responsibility for coordination
• Does not affect the availability of the name or shares
Admin can constrain which cluster nodes can be used
• Modify "possible owners" list for DNN and SOFS resource
• Useful if some nodes must be reserved for other workloads
Client Redirection
SMB Clients are distributed at initial connect through DNS Round Robin
SMB Clients are not redistributed automatically
SMB Clients connected to a Scale-Out File Server can be redirected to use a different cluster node
[Diagram: Scale-Out File Server Cluster with Node A, Node B, and Node C, each running a Witness; SQL Server holds an SMB connection to one node and a Witness connection to another]

Get-SmbWitnessClient | FL ClientName, FileServerNodeName, WitnessNodeName

ClientName         : SQLServer
FileServerNodeName : A
WitnessNodeName    : B

Move-SmbWitnessClient -ClientName SQLServer -DestinationNode C
Cluster Network Planning
SMB Client to SMB Server
• Use cluster networks enabled for client access
• If using multiple network adapters, each must be on separate IP subnets
CSV traffic
• Metadata updates
    • Infrequent for Hyper-V and SQL Server workload
• Mirrored Storage Spaces
• No storage connectivity
• Prefers cluster networks not enabled for client access
• Leverages SMB Multichannel and SMB Direct (SMB over RDMA)
Disable iSCSI networks for cluster use, to prevent unpredictable latencies
[Diagram: traffic types shown are SMB Client to SMB Server; CSV traffic for Metadata, Mirrored Spaces, and Storage Link Failures; and Storage IO (FC, iSCSI, SAS)]
Scale-Out File Server Scalability and Performance
Test Bed Topology
SMB clients
• 8 computers, each with 2x10Gbps
Scale-Out File Server cluster
• 8 nodes, each with 2x10Gbps
SAN Storage
• 2x8Gbps FC Fabric to File Server
• 4x4Gbps FC Fabric to Storage
• RAID 5 LUNs
Bandwidth Scalability
IOMeter parameters
• 512KiB IO size
• 100% Sequential Read
• 1 thread
• 144 outstanding IOs
Overall throughput (MiBps)
           Local   Remote
           6,100   6,000
Delta from local: ~2%
Preliminary results based on Windows Server 2012 Beta
[Chart: Overall Throughput (MiBps, 0 to 7,000) vs. # of Nodes (1 to 8), Run 1 through Run 10]
[Chart: Overall Throughput (MiBps, 0 to 7,000) vs. # of Nodes and Clients (1 to 8), Run 1 through Run 3]
Bottlenecked on 2x4Gbps FC
Hyper-V boot-storm
Local vs. Remote
• Uses parent/diff VHDX
• 8GB CSV block cache
• From VM state change to user logon complete
• 320 virtual machines / host
• 2,560 virtual machines (8 hosts)
Individual VM Boot Time (in seconds)
           Local   Remote
Minimum    18      19
Maximum    34      36
Average    23      25

CSV Cache Enabled vs. Disabled
• Uses parent/diff VHDX
• 8GB CSV block cache
• From VM state change to user logon complete
• 320 virtual machines / host
• 5,120 virtual machines (16 hosts)
Individual VM Boot Time (in seconds)
           Enabled   Disabled
Minimum    19        18
Maximum    61        1141
Average    29        211
With CSV cache enabled, 90% booted in <40s

Preliminary results based on Windows Server 2012 RC
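To put the CSV cache numbers in proportion, the ratios between the disabled and enabled boot times can be computed directly from the table. The snippet below is only a convenience calculation over the figures from the slide; the dictionary layout and function name are illustrative.

```python
# Boot times in seconds, copied from the "CSV Cache Enabled vs. Disabled" table.
boot_times = {
    "enabled":  {"min": 19, "max": 61,   "avg": 29},
    "disabled": {"min": 18, "max": 1141, "avg": 211},
}


def slowdown(metric):
    """How many times longer a boot takes with the CSV block cache disabled."""
    return boot_times["disabled"][metric] / boot_times["enabled"][metric]


for metric in ("min", "avg", "max"):
    print(f"{metric}: {slowdown(metric):.1f}x")
```

The best case is essentially unchanged, while the average boot is roughly 7x slower and the worst case roughly 19x slower without the cache, which is consistent with the cache mattering most when many VMs read the same parent VHDX at once.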
SMB Transparent Failover
Historical - Windows Server 2008 R2
Failovers are not transparent
Targeted for traditional file server use scenarios
Server applications expect storage to be continuously available
In Windows Server 2008 R2
• Connections and file handles are lost on share failover, leading to
    • Application disruption
    • Administrator intervention required to recover
[Diagram: SQL Server connected to \\fs1\share on a File Server Cluster (Node A, Node B)]
1. Normal operation
2. Share fails over; connections and handles are lost
3. Administrator intervention needed to recover
Windows Server 2012
SMB Transparent Failover
Failover transparent to server application
• Zero downtime
• Small IO delay during failover
Supports planned and unplanned failovers
• Hardware/software maintenance
• Hardware/software failures
• Load balancing / Client Redirection
Resilient for both file and directory operations
Interoperable with both types of clustered file servers:
• Scale-Out File Server
• "Classic" File Server
Requires:
• Windows Server 2012 Failover Cluster
• SMB Client with SMB 3.0
• File shares configured with the Continuous Availability property (default)
[Diagram: SQL Server connected to \\fs1\share on a File Server Cluster (Node A, Node B)]
1. Normal operation
2. Failure occurs: connections and handles lost, temporary stall of IO
3. Connections and handles auto-recovered; application IO continues with no errors
SMB Transparent Failover
New components (1/2)
SMB Client (Redirector)
• Client operation replay
• End-to-end support for replayable and non-replayable operations
SMB Server
• Support for network state persistence
• Files are always opened Write-Through
[Diagram: SMB Client and SMB Server stacks across user and kernel mode, showing Witness Client, Witness Service, Witness Protocol, SMB Redirector, SMB Server, Resume Key Filter, and File System; SMB 3.0 carries operation replay and state persistence]
SMB Transparent Failover
New components (2/2)
Resume Key Filter
• Resume handle state after planned or unplanned failover
• Fence handle state information
Witness Protocol
• Enables faster unplanned failover because clients do not wait for timeouts
• Enables dynamic reallocation of load with Scale-Out File Servers
Resume Key Filter
Overview
Resume handle state after planned or unplanned failover
• Persist state information only for handles with continuous availability context
Installs with Failover Clustering feature
• Sits on file server file system stack
• Attaches to all cluster disks
Resume Key Filter
Features (1/3)
Protection of handle state so the client can reconnect
• For example, needed when failure occurs when the client has an exclusive no-share handle
• Block new handle creation until the previously known handles are resumed or cancelled (timed out)
Protection from namespace inconsistency
• Needed when failure occurs as a file rename is in flight
Resume Key Filter
Features (2/3)
Enable Create Replay
• Needed when failover occurs as a FILE_CREATE is in-flight
    • RKF records the pre-existence state for the file BEFORE the create is passed down to NTFS
    • After failover, the client re-issues the create as a Replay
    • On receipt of the Replay, RKF figures out the correct processing for FILE_CREATE so that the client sees the correct result
    • Now exists: FILE_CREATE => FILE_OPEN and the return result is FILE_CREATED
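The Create Replay behaviour above can be illustrated with a toy model. It is a sketch under stated assumptions, not the filter's actual implementation: the class, its in-memory set standing in for NTFS, and the dictionary keyed by create GUID are all hypothetical. The only ideas taken from the slides are that the pre-existence state is recorded before the create goes down, and that a replayed FILE_CREATE for a file the original request already created is handled as an open that reports FILE_CREATED.

```python
FILE_CREATED = "FILE_CREATED"
NAME_COLLISION = "OBJECT_NAME_COLLISION"  # a FILE_CREATE fails if the file already exists


class ResumeKeyFilterModel:
    """Toy model of create replay; not the real Resume Key Filter."""

    def __init__(self, existing_files=()):
        self.files = set(existing_files)  # stand-in for the NTFS namespace
        self.pre_existed = {}             # create GUID -> did the file exist beforehand?

    def file_create(self, create_guid, path, replay=False):
        if replay and create_guid in self.pre_existed:
            if self.pre_existed[create_guid]:
                # The original request would have failed too.
                return NAME_COLLISION
            # The file did not exist before the original create. Whether or not
            # that create reached the file system, finish it as an open and
            # report the result the client originally expected.
            self.files.add(path)
            return FILE_CREATED
        existed = path in self.files
        self.pre_existed[create_guid] = existed  # recorded BEFORE passing the create down
        if existed:
            return NAME_COLLISION
        self.files.add(path)
        return FILE_CREATED


rkf = ResumeKeyFilterModel()
rkf.file_create("guid-1", "vm1.vhdx")                      # reply lost in the failover
print(rkf.file_create("guid-1", "vm1.vhdx", replay=True))  # replay still sees FILE_CREATED
print(rkf.file_create("guid-2", "vm1.vhdx"))               # an unrelated create collides
```

Without the recorded pre-existence state, the replayed request would look like any other FILE_CREATE against an existing file and fail, which is exactly the application-visible error that transparent failover is meant to avoid.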
Resume Key Filter
Features (3/3)
Restoration of Delete Pending state
• Needed when a file has multiple handles open and has been marked for deletion when failover occurs
• RKF holds Delete Pending state above NTFS so that existing handles can be resumed after failover
Handling for the change of the Read Only attribute
• Needed when the read only attribute is changed with pre-existing writers
• RKF undoes the RO attribute to allow the restoration of the prior granted access
Opaque storage for remote file system specific data
• E.g. SRV stores information needed to resume Byte Range Locks
Resume Key Filter
Volume instance attach
Volume Protection
• Database is being loaded from store
• All creates are held until complete (<3s)
Namespace protection
• Local handles are being established
• All rename and create operations are blocked until complete (<60s)
Create Protection
• Remote handles are being resumed
• All new creates are blocked until all handles are resumed or cancelled (<60s)
Handles Cancelled
• Unclaimed handles are cancelled to release file create blackout
SMB Witness
Overview
Enables faster recovery from unplanned failures
• SMB clients do not need to wait for TCP timeouts
Enables dynamic reallocation of load with Scale-Out File Servers
• Administrator can redirect SMB client to a different cluster node
Installs with Failover Clustering feature
• Is a service and runs on all cluster nodes
Not to be confused with Failover Cluster File Share Witness
SMB Witness
Registration process
1. SMB client connects to \\fs1\share on Node A and notifies the Witness client
2. Witness client obtains list of cluster members from witness service on Node A
3. Witness client removes the data node (Node A) and selects a witness server (Node B)
4. Witness client registers with Node B for notification on events for \\fs1
5. Witness server on Node B registers with cluster infrastructure for event notification on \\fs1
[Diagram: SQL Server holds an SMB connection to \\fs1\share on Node A and a Witness connection to Node B of the File Server Cluster]
SMB Witness
Notification process
1. Normal operation
    • SMB connection with Node A
    • Witness connection with Node B
2. Unplanned failure on Node A
3. Cluster infrastructure notifies Witness server on Node B
4. Witness server on Node B notifies Witness client that Node A went offline
5. Witness client notifies SMB client
6. SMB client drops its connection to Node A and starts reconnecting with another cluster node (Node B)
7. Witness client attempts to select new Witness server
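The registration and notification steps can be condensed into a small state model of the witness client. This is a sketch under assumptions: the class and method names are hypothetical, "pick the first remaining node" is an arbitrary selection rule chosen for the example, and the RPC registration with the witness service and the cluster infrastructure is not modeled.

```python
def select_witness(members, data_node):
    """Pick a node other than the one serving SMB traffic (first candidate, by assumption)."""
    candidates = [n for n in members if n != data_node]
    return candidates[0] if candidates else None


class WitnessClientModel:
    """Toy model of the witness client's view of the cluster."""

    def __init__(self, members, data_node):
        self.members = list(members)   # cluster member list obtained at registration
        self.data_node = data_node     # node holding the SMB connection
        self.witness = select_witness(self.members, data_node)

    def on_node_offline(self, node):
        """Handle a 'node went offline' notification."""
        self.members = [n for n in self.members if n != node]
        if node == self.data_node:
            # SMB client drops the connection and reconnects to a surviving node.
            self.data_node = self.members[0] if self.members else None
        if node == self.witness or self.witness == self.data_node:
            # The witness must differ from the data node, so select a new one.
            self.witness = select_witness(self.members, self.data_node)


client = WitnessClientModel(["A", "B", "C"], data_node="A")
print(client.data_node, client.witness)   # A B
client.on_node_offline("A")
print(client.data_node, client.witness)   # B C
```

In a two-node cluster the same sequence leaves the client connected to Node B with no witness available, which matches step 7 being described as an attempt.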
[Diagram: after Node A fails, the Witness server on Node B notifies SQL Server, which reconnects to \\fs1\share on Node B of the File Server Cluster]
Enhanced and New Event Logs
• Application and Services – Microsoft – Windows – SMBClient
• Application and Services – Microsoft – Windows – SmbServer
• Application and Services – Microsoft – Windows – ResumeKeyFilter
• Application and Services – Microsoft – Windows – SMBWitnessClient
• Application and Services – Microsoft – Windows – SMBWitnessService
Example: SMB Transparent Failover
demo
Claus Joergensen
Principal Program Manager
Windows File Server Team
Scale-Out File Server
The TechEd Cluster in a Box Demo Stack
Cluster in a Box prototypes
• Quanta
• Wistron
LSI HA-DAS MegaRAID® and SAS controllers
Quanta application servers, JBOD expansion, and 10GbE switch
Mellanox IB FDR NICs and switch
OCZ SAS SSDs
Infrastructure
• Domain Controller server
• Power distribution unit
• 1GbE switch
• Keyboard & monitor
MegaRAID® is a registered trademark of LSI Corporation
Related Content
Breakout Sessions
• VIR306 Hyper-V over SMB2: Remote File Storage Support in Windows Server 2012 Hyper-V
• WSV303 Windows Server 2012 High-Performance, Highly-Available Storage Using SMB
• WSV310 Windows Server 2012: Cluster-in-a-Box, RDMA, and More
• WSV314 Windows Server 2012 NIC Teaming and Multichannel Solutions
• WSV322 Update Management in Windows Server 2012: Revealing Cluster-Aware Updating
• WSV330 How to increase SQL availability and performance using Windows Server 2012 SMB 3.0 solutions
• WSV334 Windows Server 2012 File and Storage Services Management
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Appendix
SMB Transparent Failover Semantics (1/2)
Server Side: State persistence until client reconnects
• Server obeys a contract with the client to ensure replay of operation is transparent to application
• All race conditions cleanly addressed
• Protocol documentation will fully define behavior

Server State Preservation / State Affected: Comments

Preserved (for persistent handle timeout interval), for transient/permanent network disconnects and server failovers
• In-progress CREATEs: Replay, duplicate resolved via GUID
• Opened file handles: Fenced until client replays the open with same GUID. Includes support for Desired Access, Share modes
• Read/Write I/Os: Must ensure all writes prior to failover are flushed before processing replay of reads or writes
• In-progress byte range locks: Replay, duplicate resolved via sequence numbers
• Established byte range locks: Server preserves, client does not replay
• Sticky timestamps: Office interop
• SMB2 FIDs describing open handles: Only the persistent portion of SMB2 FID is needed

Not preserved (client replays, etc.)
• Enumeration of dir & EAs: Client restarts enumeration (Win32 API compliant)
• Close: Client replays
• Change notification queue/block: Client handles this
• Oplock state: Not Continuously Available; only used by down-level SMB clients, which don't use CA

Mixed
• In-progress lease breaks: Replay if reconnect to same node. More complex if new node.

Renegotiated on reconnect
• File and directory lease state: Renegotiated on open re-establishment. Write+Handle leases are preserved.
[Diagram: SMB2 Redirector on the client and SMB2 Server, Resume Key Filter, and File System on the SMB 3.0 Server, across user and kernel mode; SMB 3.0 carries operation replay and state persistence]
SMB Transparent Failover Semantics (2/2)
Client Side: State recovery
• Client obeys a contract with the server to ensure replay of operation is transparent to application
• All race conditions cleanly addressed
• Protocol documentation will fully define behavior

State Preservation Action / State Affected: Comments

Simple replay of operation (requires server state to ensure correct operation)
• CREATE (file or directory): Using prior Create GUID, issue "re-open".
• Read or Write I/Os: Replay (after Create is reconnected).
• Rename/set DELETE_DISPOSITION: SMB2 FID or GUID used for open data handles, lease handles, opened for delete/rename handles.
• In-progress byte range lock requests: Replay; duplicates resolved via sequence numbers
• FSCTLs: Replay after re-open
• Close: Replay (re-open, then close); it is okay if the re-open fails

Attempt to replay, potentially renegotiate
• Directory lease state: Renegotiation can cause directory cache flush.
• File lease state: Write+Handle leases preserved, all else could be renegotiated
• Cached file data & metadata: Write-back data cache is preserved. May cause flush of metadata and/or read caches.

Other action
• Granted byte range locks: No replay; server preserves.
• Enumeration state (dir and EAs): Start enumeration over, skip entries already returned.
• Change notification queue/block: Complete to app with error code to force re-enumeration/requeue.
Cluster File Server – feature interoperability
Columns: Area / Feature or capability / Clustered File Server ("Classic", Scale-Out)

Data management
• BranchCache
• Data de-duplication
• DFS Namespaces - Root
• DFS Namespaces - Leaf
• DFS Replication
• FSRM (Quota, Screening, Reporting)
• FSRM Classification
• File Server VSS Agent
• Folder Redirection
• Client Side Caching

Apps
• Information Worker (Not recommended)
• Hyper-V
• SQL Server

SMB Capabilities
• SMB Transparent Failover
• SMB Scale-Out
• SMB Multichannel
• SMB Direct
• SMB Encryption

File System
• NTFS
• ReFS
• CSVFS