Date post: | 22-Jan-2018 |
Upload: | inwin-stack |
CEPH LUMINOUS UPDATE
XSKY
Haomai Wang
2017.06.06
• Haomai Wang, active Ceph contributor
• Maintains multiple components
• CTO of XSKY, a China-based storage startup
• [email protected]/[email protected]
Who Am I
• Hammer v0.94.x (LTS) – March '15
• Infernalis v9.2.x – November '15
• Jewel v10.2.x (LTS) – April '16
• Kraken v11.2.x – December '16
• Luminous v12.2.x (LTS) – September '17 (delayed)
Releases
Ceph Ecosystem
• BlueStore = Block + NewStore
• Key/value database (RocksDB) for metadata
• All data written directly to raw block device(s)
• Fast on both HDDs (~2x) and SSDs (~1.5x)
  – Similar to FileStore on NVMe, where the device is not the bottleneck
• Full data checksums (crc32c, xxhash, etc.)
• Inline compression (zlib, snappy, zstd)
• Stable and default
RADOS – BlueStore
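The checksum and inline-compression bullets can be illustrated with a minimal sketch. It is not BlueStore's code: `store_blob` and `load_blob` are hypothetical names, `zlib.crc32` is plain CRC-32 standing in for crc32c, and checksumming the bytes as stored (after compression) is an assumption made for the example.

```python
import zlib


def store_blob(data: bytes) -> dict:
    """Compress a blob if that saves space, then checksum the stored bytes."""
    compressed = zlib.compress(data)
    # Keep the compressed form only when it is actually smaller.
    use_compressed = len(compressed) < len(data)
    payload = compressed if use_compressed else data
    return {
        "payload": payload,
        "compressed": use_compressed,
        # Plain CRC-32 here; a stand-in for crc32c or xxhash.
        "csum": zlib.crc32(payload),
    }


def load_blob(blob: dict) -> bytes:
    """Verify the checksum before returning (and decompressing) the data."""
    if zlib.crc32(blob["payload"]) != blob["csum"]:
        raise IOError("checksum mismatch: stored data is corrupt")
    if blob["compressed"]:
        return zlib.decompress(blob["payload"])
    return blob["payload"]


if __name__ == "__main__":
    blob = store_blob(b"ceph " * 1000)
    print(blob["compressed"], len(blob["payload"]))
    print(load_blob(blob) == b"ceph " * 1000)
```

Verifying on every read is what lets a store detect silent corruption on the device instead of returning bad data.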
• requires BlueStore to perform reasonably
• significant improvement in efficiency over 3x replication
  – 2+2 → 2x
  – 4+2 → 1.5x
• small writes slower than replication
  – early testing showed 4+2 is about half as fast as 3x replication
• large writes faster than replication
  – less IO to device
• implementation still does the “simple” thing
  – all writes update a full stripe
RADOS – RBD Over Erasure Code
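The overhead figures above follow from (k + m) / k for a k+m erasure code. The sketch below reproduces them and gives a rough count of bytes sent to devices when every write updates a full stripe. The 4 KiB stripe unit and the function names are assumptions for illustration, and the model ignores any reads needed to rebuild the stripe.

```python
from fractions import Fraction


def ec_overhead(k: int, m: int) -> Fraction:
    """Raw-to-usable storage ratio for a k+m erasure code."""
    return Fraction(k + m, k)


def ec_device_bytes(write_size: int, k: int, m: int, stripe_unit: int = 4096) -> int:
    """Bytes written to devices if every write updates whole stripes.

    stripe_unit is an assumed chunk size per device, not a Ceph default.
    """
    stripe_data = k * stripe_unit
    stripes = -(-write_size // stripe_data)  # ceiling division
    return stripes * (k + m) * stripe_unit


def replica_device_bytes(write_size: int, copies: int = 3) -> int:
    """Bytes written to devices under plain replication."""
    return write_size * copies


if __name__ == "__main__":
    print("2+2 overhead:", ec_overhead(2, 2))
    print("4+2 overhead:", ec_overhead(4, 2))
    for size in (4096, 4 * 1024 * 1024):
        print(size, ec_device_bytes(size, 4, 2), replica_device_bytes(size))
```

Under these assumptions a 4 KiB write costs a whole 4+2 stripe (more device IO than 3x replication), while a 4 MiB write costs 1.5x its size instead of 3x.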
• ceph-mgr
  – new management daemon to supplement ceph-mon (monitor)
  – easier integration point for Python management logic
  – integrated metrics
• make ceph-mon scalable again
  – offload PG stats from mon to mgr
  – push to 10K OSDs (planned “big bang 3” @ CERN)
• new REST API
  – pecan
  – based on previous Calamari API
• built-in web dashboard
CEPH-MGR
AsyncMessenger
• AsyncMessenger
  – Core library included by all components
  – Kernel TCP/IP driver
  – Epoll/Kqueue driver
  – Maintains connection lifecycle and session
  – replaces aging SimpleMessenger
  – fixed-size thread pool (vs. 2 threads per socket)
  – scales better to larger clusters
  – healthier relationship with tcmalloc
  – now the default!
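The contrast between two threads per socket and a fixed-size pool comes down to event-driven IO: one thread waits on many sockets at once. A minimal sketch of that pattern, using Python's `selectors` module (which picks epoll or kqueue where available), is shown here. It illustrates the idea only and is not AsyncMessenger itself; `serve_connections` is a hypothetical name.

```python
import selectors
import socket
from typing import Dict, List


def serve_connections(num_connections: int = 4) -> List[bytes]:
    """Read one message from each connection using a single thread."""
    sel = selectors.DefaultSelector()
    peers = []
    for conn_id in range(num_connections):
        local, remote = socket.socketpair()
        local.setblocking(False)
        # One selector tracks every connection; no per-socket thread.
        sel.register(local, selectors.EVENT_READ, data=conn_id)
        remote.sendall(f"msg-{conn_id}".encode())
        peers.append((local, remote))

    received: Dict[int, bytes] = {}
    while len(received) < num_connections:
        # Wake up only for sockets that are ready to read.
        for key, _events in sel.select(timeout=1.0):
            received[key.data] = key.fileobj.recv(1024)
            sel.unregister(key.fileobj)

    sel.close()
    for local, remote in peers:
        local.close()
        remote.close()
    return [received[i] for i in range(num_connections)]


if __name__ == "__main__":
    print(serve_connections(4))
```

With this pattern the thread count stays constant as connections grow, which is why it scales better to larger clusters.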
DPDK Support
• Built for high performance
  – DPDK
  – SPDK
  – Full userspace IO path
  – Shared-nothing TCP/IP stack (adapted from Seastar)
• RDMA backend
  – Inherits NetworkStack and implements RDMAStack
  – Uses user-space verbs directly
  – TCP as control path
  – Exchanges messages using RDMA SEND
  – Uses shared receive queue
  – Multiple connection QPs in many-to-many topology
  – Built into ceph master
  – All features are fully available on ceph master
• Support:
  – RH/CentOS
  – InfiniBand and Ethernet
  – RoCE v2 for cross-subnet
  – Front-end TCP and back-end RDMA
RDMA Support
Plugin | Default | Hardware Requirement | Performance | Compatible | OSD Storage Engine Requirement | OSD Disk Backend Requirement
Posix (Kernel) | YES | None | Medium | TCP/IP compatible | None | None
DPDK + Userspace TCP/IP | NO | DPDK-supported NIC | High | TCP/IP compatible | BlueStore | Must be NVMe SSD
RDMA | NO | RDMA-supported NIC | High | RDMA-supported network | None | None
Messenger Plugins
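Choosing a messenger plugin is a configuration decision. The sketch below generates a `[global]` fragment for the "front-end TCP and back-end RDMA" setup mentioned on the RDMA slide. The option names `ms_public_type` and `ms_cluster_type` and the `async+posix`, `async+dpdk`, `async+rdma` values are given as I understand them for Luminous and should be checked against the release in use; `messenger_conf` is a hypothetical helper.

```python
import configparser
import io

# Assumed mapping from the plugins in the table to ms_type values.
MS_TYPES = {
    "posix": "async+posix",
    "dpdk": "async+dpdk",
    "rdma": "async+rdma",
}


def messenger_conf(public: str, cluster: str) -> str:
    """Render a ceph.conf-style fragment selecting messenger plugins."""
    cfg = configparser.ConfigParser()
    cfg["global"] = {
        "ms_public_type": MS_TYPES[public],    # client-facing network
        "ms_cluster_type": MS_TYPES[cluster],  # OSD replication network
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(messenger_conf(public="posix", cluster="rdma"))
```

Keeping the public network on TCP lets ordinary clients connect unchanged while only the back-end traffic needs RDMA-capable NICs.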
Recovery Improvements
RBD – iSCSI
• TCMU-RUNNER + LIBRBD
• LIO + Kernel RBD
RBD Mirror HA
RGW METADATA SEARCH
RGW MISC
• NFS gateway
  – NFSv4 and v3
  – full object access (not general purpose!)
• dynamic bucket index sharding
  – automatic (finally!)
• inline compression
• Encryption
  – follows S3 encryption APIs
• S3 and Swift API odds and ends
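Bucket index sharding spreads a bucket's index entries over several shards so no single index object grows without bound. A minimal sketch of the idea follows. The CRC-32 hash is a stand-in rather than RGW's actual hash, the 100,000 objects-per-shard threshold is an assumed value, and both function names are hypothetical.

```python
import zlib


def shard_for(object_name: str, num_shards: int) -> int:
    """Map an object name to an index shard (stand-in hash, not RGW's)."""
    return zlib.crc32(object_name.encode("utf-8")) % num_shards


def shards_needed(num_objects: int, max_objs_per_shard: int = 100_000) -> int:
    """Shard count that keeps every shard at or under the threshold."""
    return max(1, -(-num_objects // max_objs_per_shard))  # ceiling division


if __name__ == "__main__":
    shards = shards_needed(1_250_000)
    print("shards:", shards)
    for name in ("photos/a.jpg", "photos/b.jpg", "logs/2017-06-06.gz"):
        print(name, "->", shard_for(name, shards))
```

Automatic resharding amounts to re-running the second calculation as a bucket grows and redistributing entries when the answer changes.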
[Diagram: NFS gateway and RadosGW access paths. Labels: NFS-Client, NFS-Server, nfs-ganesha (nfs-v4), librgw-file, Apps, RadosGW, S3 API, Swift API, RadosHandler, rados api, RADOS]
• multiple active MDS daemons (finally!)
• subtree pinning to specific daemon
• directory fragmentation on by default
• (snapshots still off by default)
• so many tests
• so many bugs fixed
• kernel client improvements
CephFS
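Subtree pinning is set per directory through an extended attribute. The sketch below only builds the `setfattr` command for the `ceph.dir.pin` attribute and does not run it, since that requires a mounted CephFS. The path and rank are example values and `pin_command` is a hypothetical helper.

```python
from typing import List


def pin_command(directory: str, mds_rank: int) -> List[str]:
    """Build the setfattr call that pins a directory subtree to an MDS rank.

    A rank of -1 is understood to remove the pin.
    """
    return ["setfattr", "-n", "ceph.dir.pin", "-v", str(mds_rank), directory]


if __name__ == "__main__":
    # Example values only: pin a project tree to MDS rank 1.
    print(" ".join(pin_command("/mnt/cephfs/projects", 1)))
```

Pinning is useful with multiple active MDS daemons when a known hot directory should stay on one daemon rather than be moved by the balancer.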
CephFS – Multi MDS
Container
• RADOS
  – IO path refactor
  – BlueStore performance
• QoS
  – dmclock
• Dedup
  – based on tiering
• Tiering
Future
Growing Developer Community
How To Help
Thank you