Agenda
• Ceph Introduction and Architecture
• Why MySQL on Ceph
• MySQL and Ceph Performance Tuning
• Head-to-Head Performance MySQL on Ceph vs. AWS
• Architectural Considerations
• Where to go next?
What is Ceph?
• Open source
• Software-defined storage solution
• Unified storage platform (block, object and file storage)
• Runs on commodity hardware
• Self-managing, self-healing
• Massively scalable
• No single point of failure
Architectural Components
• RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
• LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
• RBD: A reliable, fully-distributed block device with cloud platform integration (virtual disks)
• RGW: A web services gateway for object storage, compatible with S3 and Swift (objects)
• CEPHFS: A distributed file system with POSIX semantics and scale-out metadata (filesystem)
RADOS Components
OSDs (Object Storage Daemons)
• 10s to 10,000s in a cluster
• Typically one daemon per disk
• Stores actual data on disk
• Intelligently peer for replication & recovery
Monitors
• Maintain cluster membership and health
• Provide consensus for distributed decision-making
• Small, odd number
• Do not store data
(CLI examples below)
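Both daemon types can be inspected from the standard Ceph CLI on a running cluster; a minimal sketch (output varies by deployment):

    # List OSDs and how CRUSH organizes them into hosts/racks
    ceph osd tree

    # Show the monitor quorum (the small, odd-numbered set of monitors)
    ceph mon stat

    # Overall cluster health and capacity at a glance
    ceph -s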
CRUSH Algorithm
CRUSH: Controlled Replication Under Scalable Hashing
[Diagram: objects are hashed into placement groups (PGs), and CRUSH maps each PG onto OSDs across the cluster]
Data is organized into pools
[Diagram: objects are grouped into placement groups within pools (Pool A, Pool B, Pool C, Pool D); each pool's PGs are distributed across the cluster's OSDs]
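The object-to-PG-to-OSD mapping sketched above can be checked directly with the Ceph CLI; the pool and object names below are illustrative only:

    # List the pools in the cluster
    ceph osd lspools

    # Ask CRUSH where a hypothetical object would land: prints the placement
    # group and the acting set of OSDs for that object name
    ceph osd map rbd my-test-object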
Why MySQL on Ceph? MARKET DRIVERS
• Ceph: #1 block storage for OpenStack
• MySQL: #4 workload on OpenStack
• (#1-3 often use a database too!)
• 70% of apps on OpenStack use LAMP
• MySQL: leading open-source RDBMS
• Ceph: leading open-source SDS
Why MySQL on Ceph? OPS EFFICIENCY DRIVERS
• Distributed, elastic storage pools on commodity servers
• Dynamic data placement
• Flexible volume resizing
• Live instance migration
• Pool and volume snapshots
• Read replicas via copy-on-write snapshots
• Familiar environment, like public clouds
Why MySQL on Ceph? DATABASES REQUIRE HIGH IOPS

Workload              Media          Access Method
General Purpose       Spinning/SSD   Block
Capacity ($/GB)       Spinning       Object
High IOPS ($/IOPS)    SSD / NVMe     Block
Tuning MySQL
• Buffer pool > 20%
• Flush each transaction, or in batches?
• Percona parallel doublewrite buffer feature
(a my.cnf sketch follows below)
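A minimal my.cnf sketch of the knobs behind these bullets, assuming Percona Server 5.7; the sizes and values shown are placeholders, not the tuned settings from this lab:

    # Illustrative only: append tuning options to my.cnf
    cat >> /etc/my.cnf <<'EOF'
    [mysqld]
    # Size the InnoDB buffer pool generously ("buffer pool > 20%")
    innodb_buffer_pool_size = 8G

    # 1 = flush the redo log on every commit (durable, slower)
    # 2 = write at commit, flush roughly once per second (batched, faster)
    innodb_flush_log_at_trx_commit = 1

    # Percona Server 5.7 parallel doublewrite buffer (on by default; path shown explicitly)
    innodb_parallel_doublewrite_path = xb_doublewrite
    EOF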
Tuning Ceph
• RHCS 1.3.2, tcmalloc 2.4, 128MB thread cache
• If (OSDs on flash media); then
  • Co-resident journals
  • 2-4 OSDs per SSD/NVMe
• If (OSDs on magnetic media); then
  • SSD journals
  • RAID write-back cache
• RBD cache
• Software cache
(settings sketched below)
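Two of these settings can be expressed concretely; a hedged sketch assuming RHEL 7 paths and RHCS 1.3.x defaults (verify against your own deployment):

    # 128MB tcmalloc thread cache for the OSD daemons (read at service start)
    echo 'TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728' >> /etc/sysconfig/ceph

    # Client-side RBD cache, set in ceph.conf on the MySQL client nodes
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
    rbd cache = true
    rbd cache writethrough until flush = true
    EOF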
Tuning for Harmony: Creating a separate pool to serve the IOPS workload
Creating multiple pools in the CRUSH map:
• Distinct branch in the OSD tree
• Edit the CRUSH map, add SSD rules
• Create a pool, set crush_ruleset to the SSD rule
• If (storage provisioning via OpenStack); then
  • Add a volume type to Cinder
• If (!OpenStack); then
  • Provision database storage volumes from the SSD pool
(a CLI sketch follows below)
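A CLI sketch of the steps above; the pool name, rule name/id, PG counts, volume type, and image size are assumptions for illustration, not values from the deck (commands use the Hammer-era crush_ruleset syntax that matches RHCS 1.3.x):

    # 1. Decompile the CRUSH map, add an SSD-only root and rule, recompile, inject
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    #    ...edit crushmap.txt: add an "ssd" root containing the flash OSD hosts,
    #    plus a replicated rule (e.g. "ssd_rule") that selects from that root...
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

    # 2. Create the database pool and point it at the SSD rule
    ceph osd pool create mysql-ssd 512 512
    ceph osd pool set mysql-ssd crush_ruleset 1    # rule id of ssd_rule

    # 3a. With OpenStack: expose the pool as a Cinder volume type
    cinder type-create ceph-ssd
    cinder type-key ceph-ssd set volume_backend_name=ceph-ssd

    # 3b. Without OpenStack: provision RBD volumes for the database directly
    rbd create mysql-ssd/db01-data --size 102400   # size in MB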
Head-to-Head Lab: Test Environment
AWS:
• EC2 r3.2xlarge and m4.4xlarge
• EBS Provisioned IOPS and General Purpose SSD
• Percona Server
Ceph lab:
• Supermicro servers
• Red Hat Ceph Storage RBD
• Percona Server
Ceph OSD Nodes: 5x SuperStorage SSG-6028R-OSDXXX
• Dual Intel Xeon E5-2650 v3 (10 cores each)
• 32GB DDR3 SDRAM
• 2x 80GB boot drives
• 4x 800GB Intel DC P3700 (hot-swap U.2 NVMe)
• 1x dual-port 10GbE network adaptor (AOC-STGN-i2S)
• 8x Seagate 6TB 7200 RPM SAS (unused in this lab)
• Mellanox 40GbE network adaptor (unused in this lab)
MySQL Client Nodes: 12x SuperServer 2UTwin2 nodes
• Dual Intel Xeon E5-2670 v2 (cpuset limited to 8 or 16 vCPUs)
• 64GB DDR3 SDRAM
Storage server software:
• Red Hat Ceph Storage 1.3.2
• Red Hat Enterprise Linux 7.2
• Percona Server 5.7.11
SUPERMICRO Ceph Cluster: Lab Environment
[Diagram: 5x OSD nodes and 12x client nodes connected over shared 10G SFP+ networking, plus monitor nodes]
Architectural Considerations: Understanding the Workloads
Traditional Ceph Workload
• $/GB
• PBs
• Unstructured data
• MB/sec
MySQL Ceph Workload
• $/IOPS
• TBs
• Structured data
• IOPS
Fundamentally Different Design
Traditional Ceph Workload
• 50-300+ TB per server
• Magnetic Media (HDD)
• Low CPU-core:OSD ratio
• 10GbE->40GbE
MySQL Ceph Workload
• < 10 TB per server
• Flash (SSD -> NVMe)
• High CPU-core:OSD ratio
• 10GbE
Architectural Considerations
SUPERMICRO MicroCloud: Ceph MySQL Performance SKU
8x nodes in a 3U chassis, model SYS-5038MR-OSDXXXP
Per-node configuration (1x CPU + 1x NVMe + 1x SFP+):
• CPU: Single Intel Xeon E5-2630 v4
• Memory: 32GB
• NVMe storage: Single 800GB Intel P3700
• Networking: 1x dual-port 10G SFP+
MySQL on Red Hat Ceph Storage: Reference Architecture White Paper
Download the PDF: http://bit.ly/mysql-on-ceph
Red Hat Ceph Storage Test Drive: Learning by Doing
• Absolutely free
• Ceph playground
• 10-node Ceph lab on AWS
• Self-paced, instruction-led
http://bit.ly/ceph-test-drive
Ceph Test Drive: http://bit.ly/ceph-test-drive
MySQL on Ceph Reference Arch: http://bit.ly/mysql-on-ceph
Thank You
Join us to hear about MySQL and the Red Hat Storage free Test Drive environment
Today, 3:40 PM, Room: Lausanne