Storage Has Become Software
SUSE® Enterprise Storage: Powered by Ceph Technology

Alessandro Renna, Sales Engineer - [email protected]
Ettore Simone
SUSE Software-Defined Infrastructure: An Open, Flexible Infrastructure Approach

Application Delivery
• Custom microservice applications: Kubernetes / Magnum
• Platform as a Service: Cloud Foundry
• Containers: SUSE CaaS Platform

Management
• Operations, monitoring and patching: SUSE Manager, openATTIC
• Cluster deployment: Crowbar, Salt
• Orchestration: Heat, Kubernetes

Software-Defined Everything
• Private cloud / IaaS: SUSE OpenStack Cloud
• Storage: SUSE Enterprise Storage
• Networking: SDN and NFV
• Virtualization: KVM, Xen, VMware, Hyper-V, z/VM
• Operating system: SUSE Linux Enterprise Server

Physical Infrastructure: servers, switches, storage

Public Cloud: SUSE Cloud Service Provider Program
Enterprise Data Storage Challenges

Common limitations of traditional enterprise storage:
• Unable to scale and manage data growth
• Expensive
• Won't extend to the software-defined data center
Data Management Challenges

What are your key challenges in terms of data management in your organisation?
• Security and data governance: 56%
• Increasing volumes of data: 54%
• Challenges with backup, disaster recovery and archiving: 46%
• Performance / availability: 46%
• Cost of storage: 45%
• Storage capacity planning: 39%
• Complex administration: 35%
• Silos of storage: 26%
• None of the above: 3%
*1202 senior IT decision makers across 11 countries completed an online survey in July / August 2016
Storage Costs

9 in 10 organizations are worried about how to manage storage costs as capacity increases:
• To a great extent: 40.9%
• To some extent: 50.7%
• Not at all: 8.4%
*1202 senior IT decision makers across 11 countries completed an online survey in July / August 2016
A growing challenge in 2016: LARGE data storage

• Individual pieces of data that are enormous (100s of GB and up)
• Massive, unstructured data types: video, audio, graphics, CAD, etc.
• Does not require real-time or fast access
• Applications include:
  - Energy: seismic exploration maps
  - Healthcare: X-ray libraries, 3D ultrasound
  - Media: video storage, streaming media
  - HPC: shared research storage
  - CCTV, police body cams, security surveillance and more
Why traditional storage is not best for LARGE data

• Runs out of storage capacity quickly
  - Appliances usually hold ~500 TB of total capacity, and large data eats into that fast
• Not designed to scale into the multiple-100 TB+ range
  - Designed when data and storage requirements were lower, typically ~500 TB in total capacity
• Not designed to support unstructured data types
  - Better suited for structured data that can live in a relational database
• Expensive: typically ~$3M-5M per appliance

The sooner you use up the storage space in your appliance, the sooner you need to make another expensive storage decision!
“Object storage” technology handles LARGE data much better

Object storage was designed...
• To handle storage of large, unstructured data from the ground up
• To handle both structured and unstructured data, whereas traditional storage was designed for structured data
• On cloud principles, so it can scale to TB, PB and beyond: no known capacity limit

And when using open source software rather than proprietary software:
• The economics of the whole solution become even better, with no software licensing costs
• Open source projects bring rapid innovation, so open source object storage technology is improved frequently and collaboratively

Since scale-out, open source object storage is essentially software running on commodity servers with local drives, you should never run out of space in the object store: if more capacity is needed, you just add another server.
Reduce IT Costs with an Intelligent Software-Defined Storage Solution

Reduce IT costs with an intelligent software-defined storage management solution that uses commodity off-the-shelf servers and disk drives:
• Significant CAPEX savings
• Reduced IT operational expense (OPEX)
• Optimized infrastructure without growing IT staff
SUSE Enterprise Storage: Software-Defined Storage

Open Source Ceph as the Base
• Code developers: 782 (22 core, 53 regular, 705 casual)
• Total downloads: 160,015,454
• Unique downloads: 21,264,047
[Architecture diagram: client servers (Windows, Linux, Unix), applications and file shares access RADOS, the common object store, through block (RBD, iSCSI), object (S3, Swift) and file (CephFS*) interfaces. The storage servers run the OSDs, monitor nodes (MON) maintain cluster state, and everything is connected over the cluster network.]
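None of this is on the slides, but the components in the diagram map directly onto the standard Ceph admin CLI; a minimal sketch of how to inspect them on a running cluster:

    # cluster health, monitor quorum and OSD/PG summary
    ceph -s
    # OSDs laid out against the physical topology (hosts, racks)
    ceph osd tree
    # raw and per-pool capacity usage
    ceph df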
SUSE Enterprise Storage: Enterprise-Class Storage Using Commodity Servers and Disk Drives
• Latest hardware
• Reduced capital expense
• Hardware flexibility
SUSE Enterprise Storage: Unlimited Scalability with Self-Managing Technology

[Diagram: monitor nodes and a management node oversee storage nodes that serve object storage, block storage and the file system.]

Increase capacity and performance by simply adding new storage or storage nodes to the cluster.
Support Today's Investment, Adapt to the Future

Legacy data center:
• Network, compute and storage silos
• Traditional protocols: Fibre Channel, iSCSI, CIFS, NFS
• Process driven, slow to respond
• This is where you probably are today

Software-defined data center:
• Software-defined everything
• Agile infrastructure supporting a DevOps model
• Business driven
• This is where you need to get to
SUSE Enterprise Storage: Enable Transformation
Introducing SUSE Enterprise Storage 4
SUSE Enterprise Storage 4 – Major Features
• Unified block, object and file storage, with a production-ready CephFS filesystem
• Expanded hardware platform choice with support for 64-bit ARM
• Asynchronous replication for block storage and multisite object replication
• Enhanced ease of management with SUSE openATTIC
• Enhanced cluster orchestration using Salt
• Early access to NFS Ganesha support and NFS access to S3 buckets
• Advanced graphical user interface for simplified management and improved cost efficiency, using the openATTIC open source storage management system
Demo Time! openATTIC
SUSE Enterprise Storage Protocols & Use Cases
Enterprise Data Capacity Utilization: Software-Defined Storage Use Cases

• Tier 0 – Ultra high performance: 1-3% of enterprise data
• Tier 1 – High-value, online transaction processing (OLTP), revenue generating: 15-20%
• Tier 2 – Backup/recovery, reference data, bulk data: 20-25%
• Tier 3 – Object, archive, compliance archive, long-term retention: 50-60%

Tiers 2 and 3 together account for roughly 80% of enterprise data.

Source: Horison Information Strategies - Fred Moore
Large Data - Content Store
Scientific Organizations
• Meteorological data
• Telescope recordings
• Satellite feeds
Media Industries
• TV stations
• Radio stations
• Motion picture distributors
• Web music/video content
Large Data - Video Surveillance
• Facility security surveillance
• Red light/traffic cameras
• License plate readers
• Body cameras for law enforcement
• Military/government visual reconnaissance
Object or Block Bulk Storage
• Data that constantly grows during the course of business
• SharePoint data
• D2D backup: HPE Data Protector, CommVault, Veritas and others
• Financial records
• Medical records
SUSE Enterprise Storage: Fit in the Backup Architecture

How does SUSE Enterprise Storage fit? It replaces dedupe appliances and augments tape devices by keeping more data online cost-effectively.

[Diagram: application servers feed a backup server; SUSE Enterprise Storage replaces the dedupe appliance or disk array and augments the tape library.]
HPC Storage Archive Use Case: Archive Data Not Needed Immediately to a Secondary Tier

[Diagram: an HPC compute cluster writes to primary storage; a round-robin policy manager moves archive data to SUSE Enterprise Storage.]

• Lower TCO
• Easy to grow
• Reduced footprint
• Never migrate
OBJECT

Protocols: RADOS native, S3, Swift, NFS to S3

Useful for:
• Backup
• Cloud storage
• Large data store for applications
OBJECT – Characteristics
• WAN friendly
• Tolerant of high latency
• Cloud-native apps
• Usually MB-sized and larger objects
• Scales well with a large number of users
Demo Time! Object Storage with the S3 Interface
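As a sketch of what such a demo typically exercises with the s3cmd client (the gateway endpoint and bucket name here are illustrative, and credentials are assumed to be configured via s3cmd --configure):

    # point s3cmd at the RADOS gateway instead of Amazon S3; the endpoint
    # normally lives in ~/.s3cfg, shown inline here for clarity
    s3cmd --host=rgw.example.com --host-bucket=rgw.example.com mb s3://demo-bucket
    # upload and list a large object
    s3cmd put video.mp4 s3://demo-bucket/
    s3cmd ls s3://demo-bucket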
Virtual Machine (VM) Storage
• Ceph is already the leading storage choice for OpenStack environments
• Low- and mid-I/O virtual machine storage for major hypervisor platforms:
  - KVM: native RBD
  - Hyper-V: iSCSI
  - VMware: iSCSI
RADOS Block Device (RBD)
● Block device abstraction
● Can be used as:
  - a mount point (with the kernel module)
  - VM virtual volumes (QEMU librbd bindings)
  - iSCSI
● A block device is called an “RBD image”
● Thin provisioning
● Images are striped into objects of fixed size, stored in OSDs
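A minimal sketch of that lifecycle with the rbd CLI (pool and image names are illustrative):

    # create a thin-provisioned 10 GB image (size is in MB) in pool "rbd"
    rbd create rbd/vm-disk01 --size 10240
    # map it as a local block device via the kernel module (returns e.g. /dev/rbd0)
    rbd map rbd/vm-disk01
    # show size, object size and striping details
    rbd info rbd/vm-disk01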
RBD Mirroring
● RBD images are stored in a single cluster
  - A cluster is composed of a set of servers, interconnected by a LAN, in a single data center
● Asynchronously mirror RBD images to a remote site: RBD mirroring
● Available since the Ceph Jewel release → SUSE Enterprise Storage 4
RBD Mirroring

[Diagram: clients (Alice, Bob) write to RBD images at a primary site; rbd-mirror daemons asynchronously mirror the images to secondary and tertiary sites.]
RBD Mirroring
● Asynchronous replication
  - The rbd-mirror daemon fetches data from the primary site and writes it to the local site
  - The rbd-mirror daemon requires access to both the remote and local clusters
● Crash consistency
  - Write events and barriers are preserved during replication
RBD Mirroring - Journaling
● Mirroring relies on the new RBD image feature: journaling
● The image journal:
  - logs all operations that change the image's contents and structure (writes, resizes, flushes, etc.)
  - is sequential
  - is stored as objects in OSDs
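As a sketch, journaling is enabled per image (it builds on the exclusive-lock feature) before the image can be mirrored; pool and image names are illustrative:

    # journaling requires exclusive-lock
    rbd feature enable rbd/vm-disk01 exclusive-lock
    rbd feature enable rbd/vm-disk01 journaling
    # enable mirroring on the pool in per-image mode, then on the image
    rbd mirror pool enable rbd image
    rbd mirror image enable rbd/vm-disk01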
RBD-Mirror Daemon
● The core component of the RBD mirroring feature
● Runs on one of the Ceph cluster servers
● Requires access to the primary site's monitors and OSDs
● Performs all mirroring work:
  - reads journal entries from the primary site and replays them in the local cluster
  - syncs images upon bootstrap, and resyncs when needed
  - handles failover operations: promotions and demotions
Primary Site Failure
● When the primary site fails:
  - demote the primary image to non-primary
  - promote the secondary image to primary
  - force-promote if the primary site is completely inaccessible
● Re-sync images once the failed site becomes accessible again
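The promotion and demotion steps above map onto a handful of rbd commands; a sketch with illustrative names:

    # planned failover: demote on the primary cluster, promote on the secondary
    rbd mirror image demote rbd/vm-disk01
    rbd mirror image promote rbd/vm-disk01
    # unplanned failover, when the primary site is unreachable
    rbd mirror image promote --force rbd/vm-disk01
    # once the failed site is back, discard its divergent copy and resync
    rbd mirror image resync rbd/vm-disk01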
CephFS
Introduction to CephFS
● A POSIX-like clustered file system on top of Ceph
● File access remains important in storage:
  - allows existing applications to utilize Ceph storage
  - interoperability with existing infrastructure
  - directories and permissions
  - elastic capacity
Architecture
● Object Storage Daemons (OSDs)
● Monitors
● Metadata Server (MDS)
  - manages the filesystem namespace; state is stored within the RADOS cluster
  - active/standby: a standby MDS steps in on failure of the primary
  - active/active: not currently supported (sharding of the directory tree)
● Client
  - communicates directly with the OSD and MDS daemons
Clients
● In-kernel CephFS client (mount.cephfs)
● FUSE (libcephfs)
● NFS Ganesha
● Samba
● Your application
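A minimal sketch of the in-kernel client in action (monitor address, mount point and secret file are illustrative):

    # mount the CephFS root from a monitor, authenticating as client.admin
    mount -t ceph 192.168.100.10:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret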
45
NFS Ganesha
● NFS server in user space
● Comprehensive protocol support: v2, v3, v4, v4.1, v4.2
● Pluggable back-end for filesystem-specific functionality: the Ceph back-end (FSAL)
● Technical preview with SUSE Enterprise Storage 4
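A sketch of how the Ceph FSAL is wired up in ganesha.conf (export ID, paths and options are illustrative, not from the slides):

    EXPORT {
        Export_ID = 1;
        Path = "/";              # CephFS path to export
        Pseudo = "/cephfs";      # NFSv4 pseudo-filesystem path
        Access_Type = RW;
        FSAL {
            Name = CEPH;         # route the export through libcephfs
        }
    }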
Samba
● Windows interoperability suite: file sharing, authentication and identity mapping
● Ceph module for Samba: access CephFS from any SMB client (Windows, macOS, etc.)
● Enabled via an smb.conf parameter: vfs objects = ceph (see the sketch below)
● Coming soon → SUSE Enterprise Storage 5
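In smb.conf terms, that parameter sits inside a share definition; a minimal sketch (share name and options are illustrative):

    [cephfs]
        path = /                          # path within CephFS to share
        vfs objects = ceph                # route I/O through libcephfs
        ceph:config_file = /etc/ceph/ceph.conf
        read only = no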
OpenStack Manila
● Not a client per se
● Management and provisioning of file shares
  - independent of the underlying file server and data path
  - back-end-specific file server drivers
OpenStack Manila

[Diagram: the Manila service provisions file shares (CephFS, SMB, NFS) backed by the Ceph cluster.]
Supported CephFS Scenarios (SES4)
● A single active MDS, with a minimum of one (better, two) standby MDSes
● CephFS snapshots are disabled (default) and not supported in this version
● Clients are SUSE Linux Enterprise Server 12 SP2 based, using the cephfs kernel module driver; the FUSE module is not supported
● No directory may have more than 100,000 entries (files, subdirectories or links)
Demo Time! CephFS
SUSE Enterprise Storage Deployment Guidelines
SUSE Enterprise Storage Minimum Configuration

4 SES OSD storage nodes:
• 10 Gb Ethernet (2 networks bonded to multiple switches)
• 32 OSDs per storage cluster
• The OSD journal can reside on the OSD disk
• Dedicated OS disk per OSD storage node
• 1 GB RAM per TB of raw OSD capacity for each OSD storage node
• 1.5 GHz per OSD for each OSD storage node
• Monitor nodes, gateway nodes and metadata server nodes can reside on SES OSD storage nodes:
  - 3 SES monitor nodes (an SSD is required as a dedicated OS drive)
  - iSCSI gateway, object gateway or metadata server nodes require redundant deployment
  - iSCSI gateway, object gateway or metadata server nodes require an incremental 4 GB RAM and 4 cores

Separate management node:
• 4 GB RAM, 4 cores, 1 TB capacity
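As an illustrative sizing example (not from the slides): a storage node holding 8 OSD drives of 8 TB each carries 64 TB of raw capacity, so the 1 GB RAM per TB guideline calls for 64 GB of RAM in that node, and the 1.5 GHz per OSD guideline for roughly 12 GHz of aggregate CPU, for example 8 cores at 1.5 GHz or faster.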
https://www.suse.com/documentation/ses-3/book_storage_admin/data/cha_ceph_sysreq.html
Minimum Recommended Configuration (Production)

7 SES OSD storage nodes (no single node exceeding ~15% of total capacity):
• 10 Gb Ethernet (4 physical networks bonded to multiple switches)
• 56+ OSDs per storage cluster
• RAID 1 OS disks for each OSD storage node
• SSDs for journals, with a 6:1 ratio of OSD journals per SSD
• 1.5 GB RAM per TB of raw OSD capacity for each OSD storage node
• 2 GHz per OSD for each OSD storage node

Dedicated physical infrastructure nodes:
• 3 SES monitor nodes: 4 GB RAM, 4-core processor, RAID 1 SSDs for disk
• 1 SES management node: 4 GB RAM, 4-core processor, RAID 1 SSDs for disk
• Redundant physical deployment of gateway or metadata server nodes:
  - SES object gateway nodes: 32 GB RAM, 8-core processor, RAID 1 SSDs for disk
  - SES iSCSI gateway nodes: 16 GB RAM, 4-core processor, RAID 1 SSDs for disk
  - SES metadata server nodes (one active / one hot standby): 32 GB RAM, 8-core processor, RAID 1 SSDs for disk
https://www.suse.com/documentation/ses-3/book_storage_admin/data/cha_ceph_sysreq.html
SUSE Enterprise Storage Business Benefits

SAVINGS: Total cost of ownership
• Reduced capital expenditure (CAPEX)
• Reduced operating expenditure (OPEX)

FLEXIBILITY: Adaptability to evolving business needs
• Reduced dependency on proprietary “locked-in” storage

CONFIDENCE: Reliability and availability
• Leverage SUSE world-class support and services
Appendix
SUSE Enterprise Storage 5 – Ceph BlueStore
Ceph Architecture Overview
SUSE Enterprise Storage – Object Storage Daemon (OSD)
SUSE Enterprise Storage – Storage Node
SUSE Enterprise Storage – Monitor Node
SUSE Enterprise Storage Reliable Autonomous Distributed Object Store (RADOS) Cluster
SUSE Enterprise Storage – CephFS Metadata Server
• Manages metadata for a POSIX-compliant shared filesystem (CephFS)
• Active with hot standby (multiple MDSes in a later release)
Ceph Architecture Overview: Data Placement
SUSE Enterprise Storage – Ceph Pools

A pool is a logical container for storage objects.

A pool has a set of parameters:
• name
• numerical ID (internal to RADOS)
• number of replicas OR erasure coding settings
• number of placement groups
• placement rule set
• owner

Pools support certain operations:
• create object
• remove object
• read object
• write entire object
• snapshot the entire pool
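Those parameters and operations map directly onto the ceph and rados CLIs; a sketch with illustrative names:

    # create a replicated pool with 128 placement groups
    ceph osd pool create mypool 128 128 replicated
    # change the replica count
    ceph osd pool set mypool size 3
    # write, then read back, a whole object
    rados -p mypool put myobject ./file.bin
    rados -p mypool get myobject ./file-copy.bin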
SUSE Enterprise Storage – Ceph Placement Group (PG)
Placement groups help balance data across OSDs in the cluster
One PG typically exists on several OSDs for replication
One OSD typically serves many PGs
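A widely used rule of thumb (not from the slides) is to target on the order of 100 PGs per OSD across all pools: with 56 OSDs and 3-way replication, that gives 56 × 100 / 3 ≈ 1,867 PGs, rounded up to the next power of two, 2,048.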
SUSE Enterprise Storage – CRUSH Placement Algorithm

CRUSH is a pseudo-random data placement algorithm:
• fast calculation, no lookup table
• repeatable, deterministic
• statistically uniform distribution

CRUSH uses a map of the OSDs in the cluster:
• includes the physical topology, such as row, rack and host
• includes rules describing which OSDs to consider
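The map itself can be pulled from the cluster and read; a minimal sketch:

    # dump the compiled CRUSH map from the cluster
    ceph osd getcrushmap -o crushmap.bin
    # decompile it into an editable text form (topology buckets and rules)
    crushtool -d crushmap.bin -o crushmap.txt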
SUSE Enterprise Storage – CRUSH Example
Ceph Pool Name = SwimmingPool
Object Name = RubberDuck
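This placement can be reproduced on a live cluster with a single command, which prints the placement group and OSD set that CRUSH computes for the object (the output shown is illustrative):

    # where does RubberDuck land in SwimmingPool?
    ceph osd map SwimmingPool RubberDuck
    # osdmap e42 pool 'SwimmingPool' (3) object 'RubberDuck'
    #   -> pg 3.9ad6e6a2 (3.a2) -> up ([2,0,1], p2) acting ([2,0,1], p2)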
SUSE Enterprise Storage – Ceph Read Example
SUSE Enterprise Storage – Ceph Write Example
SUSE Enterprise Storage – Multiple Replication Options
SUSE Enterprise Storage – Cache Tiered Pools
Benchmarking
Why Benchmark at all?
To understand the ability of the cluster to meet your performance requirements
To establish a baseline performance that allows for tuning improvement measurements
Provides a baseline for future component testing for inclusion into the cluster and understanding how it may affect the overall cluster performance
77
The Problem – Lack of Clarity

Most storage requirements are expressed in nebulous terms that likely don't apply well to the use case being explored:
• IOPS
• GB/s

Requirements should instead be expressed as:
• Protocol type, with specifics if known: block, file or object
• I/O size: 64k, 1 MB, etc.
• Read/write mix with the type of I/O, e.g. 60% sequential writes with 40% random reads
• The throughput requirement
Protocols & Use Cases
OBJECT – When to use journals

There are occasions where journals make sense in object scenarios today:
• Smaller clusters that may receive high bursts of write traffic: data center backups, smaller service providers
• Use cases with a high number of small-object writes
• Rebuild requirements: journals reduce the time the cluster needs to fully rebalance after an event
• Burst ingest of large objects: bursty writes of large objects can tie up a cluster without journals much more easily
BLOCK

Protocols: RBD, iSCSI

Use cases:
• Virtual machine storage
• D2D backups
• Bulk storage location
• Warm archives
File

CephFS is a Linux-native, distributed filesystem:
• Will eventually support sharding and scaling of MDS nodes

Today, SUSE recommends the following usage scenario:
• Application home
Should I Use Journals?

What exactly are the journals? Ceph OSDs use a journal for two reasons: speed and consistency. The journal enables the Ceph OSD daemon to commit small writes quickly and to guarantee atomic compound operations.

Journals are usually recommended for block and file use cases. There are a few cases where they are not needed:
• All-flash configurations
• Where responsiveness and throughput are not a concern

Journals do not help when you are trying to gain read performance; they have no effect there.
Benchmarking the right thing

Understand your needs:
• Do you care more about bandwidth, latency or high operations per second?

Understand the workload:
• Is it sequential or random?
• Read, write or mixed?
• Large or small I/O?
• What type of connectivity?
Watch for the bottlenecks

Bottlenecks in the wrong places can create a false result:
• Resource-bound testing nodes? Network, RAM, CPU
• Cluster network maxed out? Uplinks maxed, testing node links maxed, switch CPU maxed
• Old drivers?
Benchmarking Tools - Block & File

• FIO: current and the most commonly used
• Iometer: old and not well maintained
• IOzone: also old and not in wide use
• SPEC (spec.org): industry-standard audited benchmarks; SPECsfs covers network file systems; fee based
• SPC: another industry standard, used heavily by SAN providers; fee based
Block - FIO

FIO is used to benchmark block I/O and has a pluggable storage engine, meaning it works well with iSCSI, RBD and CephFS with the ability to use an optimized storage engine.
• Has a client/server mode for multi-host testing
• Included with SES
• Info: http://git.kernel.dk/?p=fio.git;a=summary
• Sample command and common options (see the sketch below)
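The slide's sample command did not survive this extraction; as a sketch, a 4K random-write run against an RBD image through fio's rbd engine (pool, image and client names are illustrative):

    fio --name=rbd-randwrite \
        --ioengine=rbd --clientname=admin --pool=rbd --rbdname=testimg \
        --rw=randwrite --bs=4k --iodepth=32 \
        --runtime=60 --time_based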
Benchmarking Tools - Object

COSBench - Cloud Object Storage Benchmark

COSBench is a benchmarking tool to measure the performance of cloud object storage services. Object storage is an emerging technology that is different from traditional file systems (e.g., NFS) or block device systems (e.g., iSCSI). Amazon S3 and OpenStack Swift are well-known object storage solutions.

https://github.com/intel-cloud/cosbench
Object - COSBench

• Supports multiple object interfaces, including S3 and Swift
• Supports use from the CLI or a web GUI
• Capable of building and executing jobs using multiple nodes, with multiple workers per node
• Can really hammer the resources available on a radosgw, and on the testing node
Summary
Choose the benchmark(s) and data pattern(s) that best fit what you want to learn about the solution:
• Benchmarking can help determine 'how much' a solution can do, but also helps you understand the 'sweet spots' for SLA and cost.
• Ceph supports different kinds of I/O ingest, so it is important to cover each type.

Build from benchmark results:
• More complex testing starts with baseline expectations.
• Next steps: canned application workloads, canary/beta deployments.