OpenStack Swift usage at Turkcell

Post on 14-Feb-2017

OpenStack Summit Barcelona, October 2016

Doruk Aksoy, Orhan Bıyıklıoglu, Christian Schwede

About Turkcell

Leading mobile operator in Turkey and the region

● 9 countries

● 66+ million subscribers

Ankara, Istanbul

Turkcell's public cloud storage offering

Meet Lifebox (Akıllı Depo)

● Store and share photos, videos & music in the cloud

● https://mylifebox.com

● Can be used on mobile and desktop

● Open to everyone

● Started in 2014 with a legacy solution

● Migrated in 2015 to an in-house developed solution, using OpenStack Swift as storage backend

● Today: 3.3 PB of storage, over 3M users

Planning the Swift deployment

Using Swift

Why Swift is a good fit in this case

● Unstructured data: object storage makes sense

○ Metadata stored separately (filenames, directories, tags)

● Availability, durability, scalability, flexibility

○ Failure Resiliency

○ Use existing hardware

○ Run customized middlewares

● Swift can be deployed standalone

Things to keep in mind

● Distribute objects and containers

○ Storing billions of objects in a single container doesn’t scale

● Keep eventual consistency in mind

● Estimate your growth

○ … and choose your partition powers wisely

● Know your failure domains

○ … and design your rings around them

Plan well and avoid future worries
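
Two of the points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the approach used for Lifebox: the shard count, the container naming scheme and the helper names are all hypothetical. The 100-partitions-per-disk figure is the starting point suggested in the Swift deployment guide, applied to the largest number of disks the cluster is ever expected to hold.

```python
import hashlib

# Hypothetical values for illustration; none of them come from the talk.
CONTAINERS_PER_USER = 16    # assumed number of containers per user
PARTITIONS_PER_DISK = 100   # common starting point, not a hard rule


def container_for(user_id: str, object_name: str,
                  shards: int = CONTAINERS_PER_USER) -> str:
    """Map an object to one of `shards` containers via a stable hash."""
    digest = hashlib.sha256(object_name.encode("utf-8")).hexdigest()
    return f"{user_id}_{int(digest, 16) % shards:04d}"


def partition_power(max_disks: int,
                    partitions_per_disk: int = PARTITIONS_PER_DISK) -> int:
    """Smallest power such that 2**power >= max_disks * partitions_per_disk."""
    target = max_disks * partitions_per_disk
    return (target - 1).bit_length()


if __name__ == "__main__":
    print(container_for("user42", "holiday/beach.jpg"))
    print(partition_power(1000))  # 1000 disks at full growth
```

Because the partition power cannot easily be changed after the ring is created, the estimate should be based on the expected final cluster size rather than the initial one.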

App architecture

Components: App, Swift Proxy, Keystone, MySQL DB, RabbitMQ, Oracle DB, Elasticsearch, ImageMagick, ffmpeg

Swift deployment & monitoring

Initial architecture

8 identical servers, 3 storage systems

● Loadbalancer

● Region 1: swift01, swift02, swift03, Storage 1

● Region 2: swift04, swift05, swift06, Storage 2

● Region 3: swift07, swift08, Storage 3

● Statsd / grafana

Initial architecture

● Loadbalancer

● Swift Proxy, Keystone

● Swift Account, Swift Container, Swift Object

● MySQL

● Disks

Deploying Swift

Red Hat Enterprise Linux & OpenStack Platform

● Customized standalone Swift deployment

● Baremetal server deployment using Kickstart

● Manual ring management

● Ansible to install & configure Swift

○ Started using the manual install guide

○ Tuned settings later on based on metrics

Customized Ansible playbook

● Single Ansible playbook using tags for:

○ Repository management & RPM installation

○ Installation of customized middlewares

○ Configuration & tuning of Swift & Keystone

○ Ring deployment

○ Enabling & restarting of services

Monitoring

The usual suspects: statsd, Grafana, recon, ...

● Separate INFO & WARN log files for each service

● statsd metrics collected and visualized using Grafana

● swift-recon to collect important metrics and trigger alarms

● swift-dispersion-report to monitor rebalance progress

● healthcheck middleware queried by existing monitoring system
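
As a rough sketch of the last point, an external monitoring system could probe Swift's healthcheck middleware like this. The middleware answers `GET /healthcheck` with `200` and the body `OK`. The function names, the timeout and the example URL are assumptions made for illustration, and the talk does not say how the existing monitoring system performs its query.

```python
import urllib.error
import urllib.request


def interpret_healthcheck(status: int, body: bytes) -> bool:
    """Healthy means HTTP 200 with a body of "OK"."""
    return status == 200 and body.strip() == b"OK"


def probe(base_url: str, timeout: float = 2.0) -> bool:
    """GET <base_url>/healthcheck and report whether the node looks healthy."""
    url = base_url.rstrip("/") + "/healthcheck"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_healthcheck(resp.status, resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status such as 503.
        return False


if __name__ == "__main__":
    # Hypothetical proxy address; replace with a real endpoint.
    print(probe("http://127.0.0.1:8080"))
```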

swift-dispersion-report

Monitor rebalance progress

swift-dispersion-report --object-only

Queried 8192 objects for dispersion reporting, 25s, 0 retries
There were 3190 partitions missing 0 copy.
There were 5002 partitions missing 1 copy.
79.65% of object copies found (19574 of 24576)
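
The percentage in the sample output follows directly from the partition counts. The sketch below reproduces the arithmetic, assuming 3 replicas (24576 expected copies for 8192 sampled partitions); the function name is made up for illustration and is not part of Swift.

```python
from typing import Mapping, Tuple

REPLICAS = 3  # assumed from 24576 copies / 8192 partitions in the sample output


def copies_found(missing_counts: Mapping[int, int],
                 replicas: int = REPLICAS) -> Tuple[int, int, float]:
    """Return (found, expected, percent) from {copies_missing: partitions}."""
    expected = replicas * sum(missing_counts.values())
    found = sum((replicas - missing) * partitions
                for missing, partitions in missing_counts.items())
    return found, expected, round(100.0 * found / expected, 2)


if __name__ == "__main__":
    found, expected, percent = copies_found({0: 3190, 1: 5002})
    print(f"{percent}% of object copies found ({found} of {expected})")
```

As the rebalance progresses, partitions move from the "missing 1 copy" group to the "missing 0 copy" group and the percentage approaches 100.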

swift-recon

Querying metrics directly from Swift

curl http://192.168.10.1:6002/recon/load
{"5m": 0.18, "15m": 0.35, "processes": 16105, "tasks": "1/131", "1m": 0.11}

swift-recon --replication
[replication_time] low: 2863, high: 53089, avg: 24440.5, total: 195523, Failed: 0.0%, no_result: 0, reported: 8
Oldest completion was 2016-07-27 21:12:36 (2 days ago) by...
Most recent completion was 2016-07-29 21:19:34 (3 hours ago)
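
A minimal sketch of how the JSON from `/recon/load` could feed an alarm. The threshold and the function name are hypothetical; the talk does not describe the alarm rules Turkcell uses. The sample payload is the one shown in the curl output.

```python
import json

# Payload copied from the /recon/load response shown in the curl example.
SAMPLE = ('{"5m": 0.18, "15m": 0.35, "processes": 16105, '
          '"tasks": "1/131", "1m": 0.11}')


def overloaded(recon_load_json: str, max_load_5m: float) -> bool:
    """True if the 5-minute load average exceeds the given threshold."""
    load = json.loads(recon_load_json)
    return load["5m"] > max_load_5m


if __name__ == "__main__":
    # 8.0 is an arbitrary example threshold, not a recommendation.
    print(overloaded(SAMPLE, max_load_5m=8.0))
```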

Challenges

Challenges

Rebalancing, write_affinity & inodes

● Started with 8 servers, added 5 new servers

○ 40% of data needed to be redistributed evenly across all nodes

● write_affinity: write to two regions initially, replicate to 3rd afterwards

○ Requires more space in primary regions

● Data grew as fast as new disks and servers were added

○ Running replicators with handoffs_first helped
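
The 40% figure is roughly what a back-of-the-envelope calculation gives. Assuming all 13 servers carry equal weight in the ring (an assumption, the talk does not state the weights), the new servers end up holding 5/13 of the data after an even rebalance:

```python
from fractions import Fraction


def fraction_to_move(old_servers: int, added_servers: int) -> Fraction:
    """Share of data held by the new servers after an even rebalance."""
    return Fraction(added_servers, old_servers + added_servers)


if __name__ == "__main__":
    share = fraction_to_move(8, 5)
    print(f"{float(share):.1%}")  # prints 38.5%
```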

Tuning Swift

Process concurrency, timeouts, cache pressure

● Increased thread concurrency / workers

○ replicator workers affect IOPS

● Increased object-replicator timeout settings

○ node_timeout

○ http_timeout

○ rsync_io_timeout

○ rsync_timeout
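
These options live in the `[object-replicator]` section of the object server configuration. The sketch below renders such a section with Python's `configparser` to show where they go. The numbers are placeholders chosen for illustration; the talk does not state the values Turkcell settled on.

```python
import configparser
import io

# Illustrative values only; not the settings used in the talk.
REPLICATOR_OPTIONS = {
    "concurrency": "4",
    "node_timeout": "30",
    "http_timeout": "120",
    "rsync_io_timeout": "60",
    "rsync_timeout": "1800",
}


def render_replicator_section(options: dict) -> str:
    """Return an INI fragment with an [object-replicator] section."""
    config = configparser.ConfigParser()
    config["object-replicator"] = options
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_replicator_section(REPLICATOR_OPTIONS))
```

Generating the fragment from one dictionary makes it easier to keep the same values across all storage nodes, for example from a configuration management template.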

Things are seldom what they seem

When something’s broken, it’s likely not Swift’s fault

● Know your load balancer well

○ Especially when streaming data

● Closely monitor other moving parts

○ Keystone response times

○ low-level IO stats

■ inode cache misses slowed down replication a lot

■ vfs_cache_pressure = 1
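
On Linux the setting is exposed as the sysctl `vm.vfs_cache_pressure`, with a kernel default of 100; lower values make the kernel hold on to cached inodes and dentries longer. A small sketch for checking the current value on a storage node follows. The helper names are made up for illustration.

```python
from pathlib import Path

# Linux only: procfs view of the vm.vfs_cache_pressure sysctl.
VFS_CACHE_PRESSURE = Path("/proc/sys/vm/vfs_cache_pressure")
KERNEL_DEFAULT = 100


def read_vfs_cache_pressure(path: Path = VFS_CACHE_PRESSURE) -> int:
    """Read the current setting as an integer."""
    return int(path.read_text().strip())


def prefers_inode_cache(value: int, default: int = KERNEL_DEFAULT) -> bool:
    """True if the kernel is tuned to keep inode/dentry caches longer."""
    return value < default


if __name__ == "__main__":
    current = read_vfs_cache_pressure()
    print(current, prefers_inode_cache(current))
```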

Outlook

Growing usage

More servers, distributed services, and more clusters

● Growth is actually higher than initially expected

● Expand server and storage capacity 5x by the end of 2017

● Upgrade RHEL and OpenStack Platform

○ While in production

○ Add Elasticsearch/Kibana/Logstash

Next steps

More servers, distributed services, and more clusters

● Run services separately

○ A few dedicated Keystone servers

○ Dedicated object storage nodes

○ Use storage policies to keep disk usage balanced

● A second Swift cluster for a different app is in place

Questions?