
OpenStack Swift usage at Turkcell

Date post: 14-Feb-2017
OpenStack Summit Barcelona, October 2016
Doruk Aksoy, Orhan Bıyıklıoglu, Christian Schwede

About Turkcell
Leading mobile operator in Turkey and the region

● 9 countries

● 66+ million subscribers

[Map: Ankara, Istanbul]

Meet Lifebox (Akıllı Depo)
Turkcell's public cloud storage offering

● Store and share photos, videos & music in the cloud

● https://mylifebox.com

● Can be used on mobile and desktop

● Open to everyone

● Started in 2014 with a legacy solution

● Migrated in 2015 to an in-house developed solution using OpenStack Swift as the storage backend

● Today: 3.3 PB of storage, over 3 million users

Planning the Swift deployment

Using Swift
Why Swift is a good fit in this case

● Unstructured data: object storage makes sense

○ Metadata stored separately (filenames, directories, tags)

● Availability, durability, scalability, flexibility

○ Failure resiliency

○ Use existing hardware

○ Run customized middleware

● Swift can be deployed standalone

Things to keep in mind

● Distribute objects and containers

○ Putting billions of objects in a single container doesn’t scale

● Keep eventual consistency in mind

● Estimate your growth

○ … and choose your partition powers wisely

● Know your failure domains

○ … and design your rings around them

Plan well and avoid future worries
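A commonly cited rule of thumb for choosing the partition power is to aim for roughly 100 partitions per drive at the largest size you expect the cluster to reach. A minimal Python sketch of that arithmetic; the 100-partitions target is an assumption exposed as a parameter, and the drive counts in the usage part are hypothetical, not figures from the talk.

```python
def suggest_part_power(max_drives: int, parts_per_drive: int = 100) -> int:
    """Smallest partition power P with 2**P >= max_drives * parts_per_drive."""
    if max_drives < 1 or parts_per_drive < 1:
        raise ValueError("max_drives and parts_per_drive must be positive")
    target = max_drives * parts_per_drive
    # (n - 1).bit_length() == ceil(log2(n)) for n >= 1, without float rounding.
    return (target - 1).bit_length()


def partitions_per_drive(part_power: int, drives: int) -> float:
    """Average partitions per drive for one replica (multiply by the replica count for all copies)."""
    return 2 ** part_power / drives


if __name__ == "__main__":
    # Hypothetical: 200 drives today, planning for 5x growth.
    today, planned = 200, 1000
    power = suggest_part_power(planned)
    print(f"part_power={power}: "
          f"{partitions_per_drive(power, today):.0f} partitions/drive today, "
          f"{partitions_per_drive(power, planned):.0f} at {planned} drives")
```

Sizing for the final cluster rather than the initial one matters because a partition power that is too small limits how evenly data can be spread once many more drives are added.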

App architecture

[Diagram components: App, Swift Proxy, Keystone, MySQL DB, Oracle DB, RabbitMQ, Elasticsearch, ImageMagick, ffmpeg]

Swift deployment & monitoring

Initial architecture
8 identical servers, 3 storage systems

● Load balancer

● Region 1: swift01, swift02, swift03 (Storage 1)

● Region 2: swift04, swift05, swift06 (Storage 2)

● Region 3: swift07, swift08 (Storage 3)

● Metrics: statsd / Grafana
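Swift's ring builder spreads each partition's replicas across regions and zones as uniquely as the device weights allow, so a layout like this maps directly onto ring region numbers. Since rings were managed manually, a small helper that emits the swift-ring-builder commands for this layout can be sketched in Python. All concrete values are hypothetical: the IP addresses, one sdb1 device per server, one zone per server, port 6000, weight 100, partition power 17 and min_part_hours 1.

```python
# Hypothetical layout mirroring the diagram: region -> servers.
TOPOLOGY = {
    1: ["swift01", "swift02", "swift03"],
    2: ["swift04", "swift05", "swift06"],
    3: ["swift07", "swift08"],
}
# Hypothetical addresses; replace with the real storage network IPs.
IPS = {f"swift{i:02d}": f"10.0.0.{i}" for i in range(1, 9)}


def ring_commands(topology, ips, part_power, replicas=3, min_part_hours=1,
                  builder="object.builder", port=6000, device="sdb1",
                  weight=100):
    """Return swift-ring-builder commands: create, one add per server, rebalance."""
    cmds = [f"swift-ring-builder {builder} create "
            f"{part_power} {replicas} {min_part_hours}"]
    for region, hosts in sorted(topology.items()):
        # One zone per server inside each region (an assumption).
        for zone, host in enumerate(hosts, start=1):
            cmds.append(f"swift-ring-builder {builder} add "
                        f"r{region}z{zone}-{ips[host]}:{port}/{device} {weight}")
    cmds.append(f"swift-ring-builder {builder} rebalance")
    return cmds


if __name__ == "__main__":
    for cmd in ring_commands(TOPOLOGY, IPS, part_power=17):
        print(cmd)
```

Region 3 has fewer servers than the other two, so with equal per-device weights it carries less total weight; whether every partition still gets one replica there depends on the weights and the ring's overload setting.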

Initial architecture

[Diagram components: Load balancer, Swift Proxy, Keystone, Swift Account, Swift Container, Swift Object, MySQL, Disks]

Deploying Swift
Red Hat Enterprise Linux & OpenStack Platform

● Customized standalone Swift deployment

● Baremetal server deployment using Kickstart

● Manual ring management

● Ansible to install & configure Swift

○ Started from the manual installation guide

○ Tuned settings later on based on metrics

Customized Ansible playbook

● Single Ansible playbook using tags for:

○ Repository management & RPM installation

○ Installation of customized middleware

○ Configuration & tuning of Swift & Keystone

○ Ring deployment

○ Enabling & restarting services

Monitoring
The usual suspects: statsd, Grafana, recon, ...

● Separate INFO & WARN log files for each service

● statsd metrics collected and visualized using Grafana

● swift-recon to collect important metrics and trigger alarms

● swift-dispersion-report to monitor rebalance progress

● healthcheck middleware queried by the existing monitoring system
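Swift's healthcheck middleware answers GET /healthcheck with 200 and the body OK, and with 503 and DISABLED BY FILE when its optional disable_path file exists, which makes it easy to poll from an external monitoring system. A minimal Python sketch of such a probe; the state names (OK, MAINTENANCE, CRITICAL), the URL and the timeout are hypothetical.

```python
import urllib.error
import urllib.request


def classify(status: int, body: str) -> str:
    """Map a /healthcheck response to a (hypothetical) monitoring state."""
    if status == 200 and body.strip() == "OK":
        return "OK"
    if status == 503 and "DISABLED BY FILE" in body:
        # The operator created the middleware's disable_path file on purpose.
        return "MAINTENANCE"
    return "CRITICAL"


def probe(base_url: str, timeout: float = 5.0) -> str:
    """GET <base_url>/healthcheck and classify the answer."""
    url = base_url.rstrip("/") + "/healthcheck"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 503) arrive as HTTPError.
        return classify(err.code, err.read().decode("utf-8", "replace"))
    except OSError:
        # Connection refused, DNS failure, timeout, ...
        return "CRITICAL"


if __name__ == "__main__":
    print(probe("http://127.0.0.1:8080"))  # hypothetical proxy address
```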

swift-dispersion-report
Monitor rebalance progress

swift-dispersion-report --object-only

Queried 8192 objects for dispersion reporting, 25s, 0 retries
There were 3190 partitions missing 0 copy.
There were 5002 partitions missing 1 copy.
79.65% of object copies found (19574 of 24576)
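The figures in this report are consistent with three replicas: 8192 partitions × 3 = 24576 expected copies, and a partition missing k copies contributes 3 - k found copies, so 3190 × 3 + 5002 × 2 = 19574. A small Python sketch of that cross-check, assuming a replica count of 3 (which the numbers imply); as replication catches up after a rebalance, the percentage should climb back towards 100%.

```python
def dispersion(missing: dict[int, int], replicas: int = 3):
    """missing maps 'copies missing' -> number of partitions with that count.

    Returns (partitions, copies_found, copies_expected, percent_found).
    """
    partitions = sum(missing.values())
    expected = partitions * replicas
    # A partition missing k copies still has (replicas - k) copies in place.
    found = sum((replicas - k) * n for k, n in missing.items())
    return partitions, found, expected, round(100 * found / expected, 2)


if __name__ == "__main__":
    # Counts taken from the report above.
    print(dispersion({0: 3190, 1: 5002}))
    # -> (8192, 19574, 24576, 79.65)
```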

swift-recon
Querying metrics directly from Swift

curl http://192.168.10.1:6002/recon/load
{"5m": 0.18, "15m": 0.35, "processes": 16105, "tasks": "1/131", "1m": 0.11}

swift-recon --replication
[replication_time] low: 2863, high: 53089, avg: 24440.5, total: 195523, Failed: 0.0%, no_result: 0, reported: 8
Oldest completion was 2016-07-27 21:12:36 (2 days ago) by...
Most recent completion was 2016-07-29 21:19:34 (3 hours ago)
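Because the recon middleware returns plain JSON, turning its load averages into alarms takes little code. A minimal Python sketch using the /recon/load response shown above; the thresholds and the helper names are hypothetical, and in practice swift-recon or the existing monitoring system would do the polling.

```python
import json
import urllib.request

# Response copied from the curl call above.
SAMPLE = ('{"5m": 0.18, "15m": 0.35, "processes": 16105, '
          '"tasks": "1/131", "1m": 0.11}')


def load_alarms(recon_json: str, limits: dict[str, float]) -> list[str]:
    """Return the load-average keys ("1m", "5m", "15m") above their limit."""
    data = json.loads(recon_json)
    return [key for key, limit in sorted(limits.items())
            if float(data[key]) > limit]


def fetch_load(host: str, port: int, timeout: float = 5.0) -> str:
    """GET http://<host>:<port>/recon/load and return the raw JSON text."""
    url = f"http://{host}:{port}/recon/load"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(load_alarms(SAMPLE, {"1m": 8.0, "15m": 4.0}))  # -> []
```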

Challenges

Challenges
Rebalancing, write_affinity & inodes

● Started with 8 servers, added 5 new servers

○ 40% of data needed to be redistributed evenly across all nodes

● write_affinity: write to two regions initially, replicate to the third region afterwards

○ Requires more free space in the two primary regions, which temporarily hold the third region's copies

● Data grew as fast as new disks/servers were added

○ Running the replicators with handoffs_first helped
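The 40% figure matches simple proportional arithmetic: if the five new servers carry the same weight as the eight existing ones, they must end up holding 5/13 ≈ 38.5% of all partition replicas, and that data has to be copied over the network. A Python sketch of the calculation; equal server weights are an assumption, since the talk does not give disk sizes.

```python
from fractions import Fraction


def share_to_move(old_weight: int, added_weight: int) -> Fraction:
    """Minimum fraction of all data that must move onto new capacity,
    assuming data ends up spread in proportion to weight."""
    return Fraction(added_weight, old_weight + added_weight)


if __name__ == "__main__":
    # 8 existing + 5 new servers, all assumed to carry the same weight.
    share = share_to_move(8, 5)
    print(share, f"{float(share):.1%}")  # -> 5/13 38.5%
```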

Tuning Swift
Process concurrency, timeouts, cache pressure

● Increased thread concurrency / workers

○ replicator workers affect IOPS

● Increased object-replicator timeout settings

○ node_timeout

○ http_timeout

○ rsync_io_timeout

○ rsync_timeout
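The timeouts listed here are options of the [object-replicator] section in object-server.conf, the same section that holds the replicator's concurrency setting. In this deployment such files were produced by Ansible; the Python sketch below only illustrates what the resulting section looks like. Every value in it is hypothetical, since the talk does not say which numbers were chosen.

```python
import configparser
import io


def replicator_section(**settings) -> str:
    """Render an [object-replicator] section for object-server.conf."""
    cfg = configparser.ConfigParser()
    cfg["object-replicator"] = {key: str(value) for key, value in settings.items()}
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical values: tune them from your own metrics.
    print(replicator_section(concurrency=4, node_timeout=60, http_timeout=120,
                             rsync_io_timeout=120, rsync_timeout=1800))
```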

Things are seldom what they seem
When something’s broken, it’s likely not Swift’s fault

● Know your load balancer well

○ Especially when streaming data

● Closely monitor other moving parts

○ Keystone response times

○ low-level IO stats

■ Inode cache misses slowed down replication a lot

■ Fix: vm.vfs_cache_pressure = 1, so the kernel prefers keeping inode and dentry caches
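vm.vfs_cache_pressure is a Linux sysctl; values below the default of 100 make the kernel prefer keeping dentry and inode caches, which reduces the disk reads needed when the replicator walks large numbers of files. It can be set with sysctl -w vm.vfs_cache_pressure=1 and persisted in /etc/sysctl.conf. A small Python check that a monitoring job could run on each storage node; the target of 1 comes from the slide, the helper names are illustrative.

```python
from pathlib import Path

SYSCTL = Path("/proc/sys/vm/vfs_cache_pressure")  # Linux only


def read_cache_pressure(path: Path = SYSCTL) -> int:
    """Current vm.vfs_cache_pressure (kernel default is 100)."""
    return int(path.read_text().strip())


def cache_pressure_ok(value: int, target: int = 1) -> bool:
    """Lower values make the kernel keep dentry/inode caches longer."""
    return value <= target


if __name__ == "__main__":
    if SYSCTL.exists():
        value = read_cache_pressure()
        print(value, "ok" if cache_pressure_ok(value) else "consider lowering it")
```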

Outlook

Growing usage
More servers, distributed services, and more clusters

● Growth is actually higher than initially expected

● Expand server and storage capacity fivefold by the end of 2017

● Upgrade RHEL and OpenStack Platform

○ While in production

○ Add Elasticsearch/Kibana/Logstash

Next steps
More servers, distributed services, and more clusters

● Run services separately

○ A few dedicated Keystone servers

○ Dedicated object storage nodes

○ Use storage policies to keep disk usage balanced

● Second Swift cluster for a different app already in place

Questions?

