
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

Transcript
Page 1: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

CEPH@DeutscheTelekom - A 2+ Years Production Liaison
Ievgen Nelen, Gerd Prüßmann - Deutsche Telekom AG, DBU Cloud Services, P&I

Page 2: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Speakers: Ievgen Nelen & Gerd Prüßmann

• Ievgen Nelen - Cloud Operations Engineer

• Ceph since Cuttlefish

• OpenStack since Diablo

• @eugene_nelen

• [email protected]

• Gerd Prüßmann - Head of Platform Engineering

• Ceph since Argonaut

• OpenStack since Cactus

• @2digitsLeft

• [email protected]

Page 3: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

Overview: The Business Case

Page 4: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Overview: Business Marketplace

• https://portal.telekomcloud.com/

• SaaS Applications from Software Partners (ISVs) and DT offered to SME customers

• e.g. Saperion, Sage, PadCloud, Teamlike, Fastbill, Imeet, Weclapp, SilverERP, Teamdisk ...

• Complements other cloud offerings from Deutsche Telekom (Enterprise cloud from T-Systems, Cisco Intercloud, Mediencenter etc.)

• IaaS platform based only on Open Source technologies like OpenStack, CEPH and Linux

• Project started in 2012 with OpenStack Essex; CEPH in production since 03/2013 (Bobtail)

Page 5: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Overview: Why Open Source? Why Ceph?

• no vendor lock-in!

• easier to change and adopt new technologies / concepts - more independent of vendor priorities

• low cost of ownership and operation, utilizing commodity hardware and Open Source

• no license fees - but professional support is available

• modular and horizontally scalable platform

• automation and flexibility allow for faster deployment cycles than in traditional hosting

• control over the open source code - faster bug fixing and feature delivery

Page 6: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

Details: Basics

Page 7: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Details: Ceph Basics

• Bobtail > Cuttlefish > Dumpling > Firefly (0.80.9)

• Multiple CEPH clusters

• overall raw capacity 4.8 PB

• One S3 cluster (~810 TB raw capacity, 15 storage nodes, 3 MONs)

• multiple smaller RBD clusters for REF, LIFE and DEV

• S3 storage for cloud-native apps (Teamdisk, Teamlike) and for backups (i.e. RBD backups)

• RBD for persistent volumes / data via OpenStack Cinder (e.g. DB volumes) - see the sketch below
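To illustrate the Cinder/RBD integration mentioned above, here is a minimal cinder.conf backend sketch; the section name, pool, user and secret UUID are placeholders and not taken from the slides (option names follow the standard RBD driver of the OpenStack releases of that period):

[rbd-ceph]                                      # hypothetical backend section name
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-ceph
rbd_pool = volumes                              # placeholder pool name
rbd_user = cinder                               # placeholder cephx user
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_secret_uuid = <libvirt secret uuid>         # placeholder; used by nova/libvirt to attach volumes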

Page 8: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Details: Ceph Basics

Page 9: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

Details: Hardware

Page 10: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Details: Hardware

Page 11: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

• Supermicro: 2x Intel Xeon E5-2640 v2 @ 2.00 GHz, 64 GB RAM, 7x SSDs, 18x HDDs

• Seagate Terascale ST4000NC000 4 TB HDDs

• LSI MegaRAID SAS 9271-8i

• 18 OSDs per node: RAID1 with 2 SSDs for /, 3x RAID0 with 1 SSD each for journals, 18x RAID0 with 1 HDD each for OSDs

• 2x 10 Gb network adapters


Details: Hardware

• Supermicro: 1x Intel Xeon E5-2650L @ 1.80 GHz, 64 GB RAM, 36x HDDs

• Seagate Barracuda ST3000DM001 3 TB HDDs

• LSI MegaRAID SAS 9271-8i

• 10 OSDs per node: RAID1 for /, 10x RAID0 with 1 HDD each for journals, 10x RAID0 with 2 HDDs each for OSDs (journal paths end up in ceph.conf - see the sketch below)

• 2x 10 Gb network adapters
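With journals on dedicated devices as described above, each OSD is pointed at its own journal path; a minimal ceph.conf sketch following the path pattern visible in the log excerpt later in the deck (the second OSD section is a hypothetical example):

[osd.151]
    # journal on a dedicated journal device instead of the OSD's data disk
    osd journal = /var/lib/ceph/osd/journal-disk9/osd.151-journal

# further OSDs follow the same pattern (hypothetical example):
[osd.152]
    osd journal = /var/lib/ceph/osd/journal-disk10/osd.152-journal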

Page 12: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

Details: Configuration & Deployment

Page 13: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Details: Configuration & Deployment

• Razor

• Puppet

• https://github.com/TelekomCloud/puppet-ceph

• dm-crypt disk encryption

• osd location

• XFS

• 3 replicas (reflected in the ceph.conf sketch after this list)

• OMD/Check_mk http://omdistro.org/

• ceph-dash https://github.com/TelekomCloud/ceph-dash for dashboard and API

• check_mk plugins (Cluster health, OSDs, S3)
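A minimal ceph.conf sketch reflecting the choices above (XFS, 3 replicas, explicit CRUSH locations); the option names are standard Ceph settings, but the concrete values and the room/rack names are assumptions:

[global]
    osd pool default size = 3            # 3 replicas
    osd pool default min size = 2        # assumed here; min_size=2 is mentioned for the RBD pool later

[osd]
    osd mkfs type = xfs                  # OSD filestores on XFS
    osd mount options xfs = noatime,inode64
    # per-OSD CRUSH location (typically set per OSD section or via a crush location hook; placeholders):
    # osd crush location = root=default room=room1 rack=rack01 host=cephosd5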

Page 14: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

Details: Performance Tuning

Page 15: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Details: Performance Tuning

• Problem - Low IOPS, IOPS drops

• fio

• Enable RAID0 Writeback cache

• Use separate disks for Ceph journals (better: use SSDs - planned with the scale-out project)

• Problem - recovery/backfilling consumes a lot of CPU and degrades performance

• osd_recovery_max_active = 1 (number of active recovery requests per OSD at one time)

• osd_max_backfills = 1 (maximum number of backfills allowed to or from a single OSD); see the sketch below
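A hedged sketch of how these throttles can be applied at runtime and made persistent, plus an illustrative fio random-write run (all values below are examples, not taken from the slides):

# apply to all OSDs at runtime
ceph tell osd.* injectargs '--osd-recovery-max-active 1 --osd-max-backfills 1'

# persist in ceph.conf
# [osd]
#     osd recovery max active = 1
#     osd max backfills = 1

# illustrative 4k random-write benchmark against a test file
fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=32 --size=1G --runtime=60 --time_based --filename=fio.test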

Page 16: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Details: Performance Tests – Current Hardware / IO

Page 17: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Details: Performance Tests – Current Hardware / Bandwidth

Page 18: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

lessons learned

Page 19: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Lessons Learned: Operational Experience

• Choose your hardware well!

• i.e. RAID and hard disks -> enterprise-grade disks (desktop HDDs are missing important features like TLER/ERC)

• CPU/RAM planning: calculate 1 GHz of CPU power and 2 GB of RAM per OSD

• pick nodes with low storage capacity density for smaller clusters

• At least 5 nodes for a 3-replica cluster (even for PoC, testing and development purposes)

• Cluster configuration “adjustments”:

• increasing pg_num -> big impact on the cluster because of massive data migration (see the sketch after this list)

• Rolling software updates / upgrades worked perfectly

• CEPH has a character - but it is highly reliable; we never lost data
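One way to soften the impact of a pg_num increase is to raise it in small steps and let the cluster settle in between; a hedged sketch with a hypothetical pool name and target values:

ceph osd pool get volumes pg_num           # hypothetical pool name
ceph osd pool set volumes pg_num 1024      # raise in modest increments rather than in one jump
ceph osd pool set volumes pgp_num 1024     # pgp_num must follow pg_num before data starts moving
ceph -s                                    # wait for recovery/backfill to finish before the next step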

Page 20: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Lessons Learned: Operational Experience

• Failed / ”Slow” disks

• Inconsistent PGs

• Incomplete PGs

• RBD pool configured with min_size=2

• Dropping below min_size blocks IO operations to the pool / cluster

• fixed in Hammer (PG recovery is allowed while the replica count is below the pool's min_size) - see the sketch below
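A hedged sketch of the kind of commands involved in diagnosing such PG problems and temporarily unblocking a pool (PG id and pool name are placeholders; lowering min_size trades safety for availability and should only be a last resort):

ceph health detail                         # lists inconsistent / incomplete / stuck PGs
ceph pg dump_stuck unclean                 # PGs that are not active+clean
ceph pg repair 3.1a                        # trigger repair of an inconsistent PG (example id)
# temporary workaround when min_size=2 blocks IO because only one replica is left:
ceph osd pool set rbd min_size 1           # placeholder pool name; set back to 2 after recovery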

Page 21: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

Lessons Learned: Operational Experience

/var/log/syslog.log

Apr 12 04:59:47 cephosd5 kernel: [12473860.669262] sd 6:2:10:0: [sdk] Unhandled error code

root@cephosd5:/var/log# mount | grep sdk
/dev/mapper/cephosd5-journal-sdk on /var/lib/ceph/osd/journal-disk9

root@cephosd5:/var/log# grep journal-disk9 /etc/ceph/ceph.conf
osd journal = /var/lib/ceph/osd/journal-disk9/osd.151-journal

/var/log/ceph/ceph-osd.151.log.1.gz

2015-04-12 04:59:47.891284 7f8a10c76700 -1 journal FileJournal::do_write: pwrite(fd=25, hbp.length=4096) failed: (5) Input/output error
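For illustration, one possible way to handle such a failed journal device; the OSD id and paths come from the log excerpt above, but the service commands and the safety of simply recreating the journal depend on the environment, so treat this as a sketch rather than the procedure used at DT:

ceph osd set noout                         # avoid rebalancing while the OSD is down
service ceph stop osd.151                  # stop the OSD whose journal device failed (init-system dependent)
# ... replace the failed SSD, recreate the dm-crypt mapping and the journal filesystem ...
ceph-osd -i 151 --mkjournal                # create a fresh journal for osd.151
                                           # (only safe if the old journal could be flushed or the
                                           #  OSD is rebuilt/backfilled afterwards)
service ceph start osd.151
ceph osd unset noout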


Page 22: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Lessons Learned: Incomplete PGs - What Happened?

[Diagram: three OSD nodes, each with OSDs, their journals and the placement groups mapped to them]

Page 23: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

glimpse of the future

Page 24: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Overview: Scale-Out Project

+40%

Current overall capacity: ~60 storage nodes, 5.4 PB gross storage, ~0.5 PB net S3 storage

Planned capacity for 2015: ~90 storage nodes, 7.5 PB gross storage, ~1.5 PB net S3 storage

Page 25: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Future Setup: Scale-Out Project

• 2 physically separated rooms

• Data distributed according to the rule:

• not more than 2 replicas in one room, not more than 1 replica in one rack

Page 26: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Future Setup: New CRUSH Map Rules

rule myrule {
    ruleset 3
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 2 type room
    step chooseleaf firstn 2 type rack
    step emit
}

crushtool -i real7 --test --show-statistics --rule 3 --min-x 1 --max-x 1024 --num-rep 3 --show-mappings

CRUSH rule 3 x 1 [12,19,15]
CRUSH rule 3 x 2 [14,16,13]
CRUSH rule 3 x 3 [3,0,7]
…

Listing 1: CRUSH map rule. Listing 2: simulating mappings for 1024 objects.
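For context, a hedged sketch of the usual round trip for getting such a rule into a running cluster; file names are placeholders, and only the offline crushtool test is shown on the slide:

ceph osd getcrushmap -o crush.bin                                    # export the compiled CRUSH map
crushtool -d crush.bin -o crush.txt                                  # decompile, then add the new rule
crushtool -c crush.txt -o crush.new                                  # recompile after editing
crushtool -i crush.new --test --rule 3 --num-rep 3 --show-mappings   # offline check, as in Listing 2
ceph osd setcrushmap -i crush.new                                    # inject; expect data movement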

Page 27: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Future Setup: Dreams

• cache tiering

• make use of shiny new SSDs in a hot zone / cache pool (see the sketch after this list)

• SSD pools

• OpenStack live migration for VMs (boot from RBD volume)
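A hedged sketch of what such a cache tier could look like once the SSD pool exists; pool names and thresholds are hypothetical, the commands are the standard cache-tiering calls available since Firefly:

ceph osd pool create rbd-cache 128 128               # hypothetical SSD-backed pool (CRUSH placement not shown)
ceph osd tier add rbd rbd-cache                      # attach the cache pool to the backing pool "rbd"
ceph osd tier cache-mode rbd-cache writeback         # serve writes from the cache tier
ceph osd tier set-overlay rbd rbd-cache              # redirect client IO through the cache
ceph osd pool set rbd-cache hit_set_type bloom       # hit-set tracking required by the tiering agent
ceph osd pool set rbd-cache target_max_bytes 1099511627776   # flush/evict beyond ~1 TB (example)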

Page 28: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison

Q & A

Page 29: Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison


Questions & Answers

• Ievgen Nelen

• @eugene_nelen

[email protected]

• Gerd Prüßmann

• @2digitsLeft

[email protected]

