Date post: | 02-Nov-2014 |
Category: |
Technology |
Upload: | angelo-rajadurai |
View: | 8,691 times |
Download: | 10 times |
Tuning Storage Subsystem for DatabasesAngelo Rajadurai
Agenda
Performance issues in StorageHybrid Storage (Disks, SSDs, Memory)ZFS - Not Just Another File SystemTuning for databases (General principles)Tuning for MySQLTuning for PostgreSQLTuning for Oracle
• Some very practical advice based on > recent test results
> Improved pgbench results from 70 tps for pure disk to 5003 tps with SSD and tuning
> Improved sysbench results from 425 tps to 1811 tps with SSD and tuning for read/write.
> Improved sysbench results from 786 tps to 3085 tps with SSD and tuning for read.
> collection of tuning knowledge from Sun performance engineers and the community
• Some very good resources at the end of the talk for further study
Why?
Storage Performance
Cache
Memory
Disk Cache
Large Capacity Disks
High Performance Disks
Price
Performance
100,000 X
Performance diffe
rential
Latency ComparisonBridging the DRAM to HDD Gap
1nS
10nS
100nS
1uS
10uS
100uS
1mS
10mS
100mS
1 S
TAPE
HDDFLASH/
SSD
DRAMCPU
Storage TechnologyPrice, Performance & Capacity
Technologies Capacity (GB)
Latency(microS) IOPs Cost/IOPS
($)Cost/GB($)
CloudStorage Unlimited 60,000 20 17c/GB 0.15/month
Capacity HDDs 2,500 12,000 250 1.67 0.15
Performance HDDs 300 7,000 500 1.52 1.30
SSDs(write) 64 300 5000 0.20 13
SSDs(read only) 64 45 30,000 0.03 13
DRAM 8 0.005 500,000 0.001 52
Incorporating FlashStorage Hierarchy
Hybrid StorageFlash as Cache
Application
DRAMLevel 1 Cache
FlashLevel 2 Cache
Disk Primary Storage
Rea
d FlashWrite side Log
Write
ZFS - Last Word in FilesystemPooled Storage Vs Traditional Volumes
Data Management UnitSmarts Built Right Into the Filesystem
Administering ZFS in two slidesAs easy as pie
• zpool commands> create a single disk pool:
# zpool create newpool diskname> create a pool with a mirror
# zpool create newpool mirror disk1name disk2name> Add device to a pool:
# zpool add poolname diskname> Replace a bad disk
# zpool replace poolname baddiskname newdiskname> History of commands on the pool:
# zpool history poolname> How is my pool performing:
# zpool iostat poolname
No format command, No fdisk partitions, No volumes
Administering ZFS in two slidesAs easy as pie
• zfs commands> create a filesystem:
# zfs create poolname/fs-name> set filesystem property:
# zfs set quota=size poolname/fs-name# zfs set compression=on poolname/fs-name# zfs set nfsshare=on poolname/fs-name# zfs set recordsize=16k poolname/fs-name
> get filesystem property: # zfs get compressratio poolname/fs-name# zfs get all poolname/fs-name
> snapshot the filesystem: # zfs snapshot poolname/fs-name@snapshotname
No newfs, No mkfs, No /etc/vfstab, No fsck
ZFS and Hybrid StorageAs easy as pie
• Read side> Add ssd as a read side cache> # zpool add poolname cache ssd-device
• Write side> Add SSD as a ZFS Intent Log device> # zpool add poolname log ssd-device
ZFS Performance Features
• Copy-on-write> Turns Random writes to Sequential writes
• Dynamic Striping across all devices> Maximize throughput
• Multiple Block Sizes> Automatically chosen to match workload
• IO Pipelining> Priority/Deadline scheduling, sorting, aggregation
• Intelligent prefetch• Compression - Improves performance & Capacity• Can safely use write cache on disks
DatabasesNot Just Another Application
• Most Databases do their own buffering> Filesystem caching can get in the way> “double buffer” problem
• Most Databases do “prefetch”> Filesystems prefetch can cause extra IO> “directio” gets filesystem out of the way
• Have their own “log” mechanism. > Interesting interaction with a transaction based filesystem
• Multiple blocks sizes> Database & Transaction log, block sizes are normally different
Tuning ZFS for DatabasesTuning is Evil - Long live Tuning
• In general tuning is evil. Let ZFS do it for you.• A few fine tuning tips for databases
> Get to the latest update of OS> Set the recordsize to match database
block size> Separate Transaction logs and data
onto separate zpools> [Note: This will be addressed with the ZIL bypass property fix]> Reduce the impact of double buffering by changing the caching
method to “metadata only”> Use separate ZIL (ZFS Intent Log) preferably SSD> Use SSD as secondary cache - L2ARC (Level 2. Adaptive
Replacement Cache)
ZFS tuning for MySQL• Many tuning depends on storage engine• For Innodb
> Prefer to cache in Innodb rather than ARC zfs set primarycache=metadata poolname/database
> Set recordsize to 16k for data and 128k for log zfs set recordsize=16k poolname/database (Note: do this before you load any data)
> Turn off prefetch set zfs:zfs_prefetch_disable = 1 (in /etc/system) (File level prefetch not triggered if you change record size to 16k)
> Use raid0 or mirror over raidz raidz is no suitable for random IO
> Add SSDs for either read side or write side based on workload zpool add datapool cache ssd-disk zpool create logpool ssd-disk3 In my.cnf set innodb_data_home_dir & innodb_log_group_home_dir
ZFS tuning for MySQL
• More tuning for Innodb> Some device vendors flush cache even
when not needed. (eg. battery backed cache) set zfs:zfs_nocacheflush = 1
> Turn on compression zfs set compression=on poolname/database ZFS does not turn on compression if less than 12.5% saving. IO reduction may offset the extra cpu cost
> Disable double writes innodb_doublewrite=0 (in my.cnf) ZFS does not allow any partial writes so no need to guard against it.
ZFS tuning for PostgreSQL
• Postgres tuning hints> Set recordsize to 8k
zfs set recordsize=8k poolname/database> Turn down ARC cache.
set zfs:zfs_arc_max in /etc/system> Add SSDs for either read side or write side based on workload
zpool add poolname cache ssd-name zpool add poolname log ssd-name
> Use separate pool for log (preferably one with SSD) & data initdb -X log_directory_name create tablespace datatbs location 'database_directory_name' create database mydb with tablespace datatbs
> Don’t forget to basic Postgres tuning on Solaris - (huge gains) Set shared_buffers, temp_buffers, work_mem, maintenance_work_mem, wal_sync_method, synchronous_commits etc see: http://blogs.sun.com/jkshah/entry/best_practices_with_postgresql_8
ZFS tuning for Oracle
• Oracle tuning hints> Set recordsize to match db_block_size (default 8k)
zfs set recordsize=8k poolname/database> Use separate pool for Oracle logs
make sure record size of the log filesystem is left to the 128k default> Add SSDs for either read side or write side based on workload
zpool add poolname cache ssd-name zpool add poolname log ssd-name
Benchmark results
• Hardware> Sun x4150 2 x Quad core 2.3 GHz Xeon
12 GB ram 3 x 10000 rpm drives 3 x 32 GB SSDs
• Software> OpenSolaris 2009.06> Postgres 8.3.7> MySQL 5.4 beta
Benchmark results
• pgbench & Postgres> command line: pgbench -c 10 -s 10 -t 10000 pgbench
Description TPSSingle disk ZFS 72 tps
2 Raid 0 disk + SSD as level 2 cache 241 tps
Above + general postgres optimization 2026 tps
+ all the data on SSD 2603 tps
+ data on hdd & log on SSD 4372 tps
+ primarycache=metadata 5003 tps
Benchmark results
• sysbench & mysql 5.4> read/write test: sysbench --max-time=300 --max-requests=0 --test=oltp --
oltp-dist-type=special --oltp-table-size=10000000 --num-threads=20 run
Description TPSSingle disk ZFS 425 tps
raid0 ZFS 670 tps
+ SSD cache 788 tps
+ Separate intent log 1352 tps
+ With optimization 1809 tps
Benchmark results
• sysbench & mysql 5.4> read test: sysbench --max-time=300 --max-requests=0 --test=oltp --oltp-
dist-type=special --oltp-table-size=10000000 --num-threads=20 --oltp-read-only=on run
Description TPSSingle disk ZFS 786 tps
2 disk raid0 ZFS 1501 tps
+ SSD cache 1981 tps
+ Separate intent log on SSD 2567 tps
+ optimization 3065 tps
!"#$ %&'(&)*+,&-./'#0)1*#+),23$ 456$ 7*8")&*1
!"#$!%&'()*$'&(+,(-$.//012.$
9
*#%'345*6*5$!%(#+(5&#*
:$;<$=*='&-$>$?@A9BB;<$!6!$1)CDC
78)74+*#!8%3$!9(5(:5*
E($+'$F@;<$=*='&-G$142A?H<$!6H6$1)CDC
E($+'$IA?:$;< write-optimized SSDs
!9(5(:5*$95"!%*'49(-(:5*
E($+'$?I:;<$=*='&-G$I::A?H<$!6H6$1)CDC
E($+'$FA?BB;<$&*,1.?FA?:;<$J&)+*$!!5C
6K+)L*M6K+)L*$/2"C+*&)#N$0'&$O.6
&-%8&#(5$#*%;&'<8#)$=$9&##*9%868%3$>(55$,&+*5!?
IA?B$;P$Q+R*&#*+$S'(+)K,2T
@A?$;P$Q+R*&#*+$SK'((*&T$>$IA?;P$Q+R*&#*+$S'(+)K,2T
U/$'&$!/!V$O<6$0'&$+,(*$P,KD"(
!%(#+('+$@*(%"'*!$>(55$,&+*5!?
622$5,+,$%&'+'K'2C$,#1$5,+,$!*&L)K*C$V#K2"1*1S6#,2-+)KCG$!#,(CR'+CG$7*(2)K,+)'#G$/'=(&*CC)'#G$4U!G$/VU!G$)!/!VWT
<")2+M)#$@A?$;P$Q+R*&#*+$('&+C
7*='+*$/'#C'2*$S!*&),2$'&$Q+R*&#*+T
X)NR+CM'"+$Y,#,N*=*#+
*#%'345*6*5$95"!%*'49(-(:5*
E($+'$F@;<$=*='&-G$ZFA?H<$!6H6$1)CDC
E($+'$FA?BB;<$&*,1.?FA?:;<$J&)+*$!!5C
6K+)L*M6K+)L*$/2"C+*&)#N$0'&$O.6
Sun Unified Storage
Getting these systems at a discountSun Startup Essentials
• Exclusive program for startups• Eligibility <6 yrs. Old, <150
employees• Co-marketing opportunities• Funding assistance• Deeply discounted storage and
servers certified for Linux, Windows, and Solaris
• Hosting starting at $40• Open source software, and
discounted MySQL• Free email based tech support • Free and discounted training on
Sun technologies • Member-only webinars
sun.com/startup
Resources
• ZFS info: http://www.opensolaris.org/os/community/zfs/
• ZFS Best Practices Guide: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
• ZFS Evil Tuning Guide:http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
• Blogs of note:> All things performance tuning:
http://blogs.sun.com/realneelhttp://blogs.sun.com/roch
> Postgres tuning - Jignesh’s Bloghttp://blogs.sun.com/jkshah
> Angelo’s bloghttp://blogs.sun.com/angelo
Tuning Storage Subsystem for DatabasesAngelo [email protected]://blogs.sun.com/angelotwitter: rajadurai