Date posted: 10-Jun-2015
Category: Technology
Uploaded by: marc-fielding
It’s a Solid State World
How Exadata X3 leverages flash memory
Gwen Shapira
Marc Fielding
About Gwen
– Solutions Architect, Cloudera
– Oracle ACE Director
– Presents, Blogs, Tweets
– @gwenshap
© 2013 Pythian 2
About Marc
• Senior Consultant with Pythian’s Advanced Technology Group
• 12+ years Oracle production systems experience starting with Oracle 7
• Blogger and conference presenter: pythian.com/news/author/fielding
• Occasionally on Twitter: @mfild
Remember your first SSD?
… you’ll never forget it
Sh*t people say about SSDs
Fast for reads
Don’t use for writes
Use for random writes
Don’t use for REDO
Used for REDO
Only used in Exadata
Only Sun flash devices are supported
Unreliable
Becomes slower over time
Type of SSD matters
Use SATA SSD
Use PCI SSD
Use SSD in SAN
Too expensive
Is it the same as flash?
Solid State Disk
=
No moving parts
=
Low-latency random I/O
The technology: NAND flash
• Slower than RAM, but both nonvolatile and affordable in large capacities
• SLC
– One bit per cell
– High performance
• MLC
– Two bits per cell
– More capacity = cheaper
[Figure: an SLC cell holds one of two states (0, 1); an MLC cell holds one of four (00, 01, 10, 11)]
We will talk about
• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs
Cells, pages, and blocks
• Cell = 1 bit
• Page = 4K
• Block = 128 pages = 512K
• Plane = 1024 blocks = 512MB
• Planes are grouped into dies, which are grouped into packages
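The sizes on this slide follow from simple multiplication. A small sketch that checks them, using the page, block, and plane figures from the slide (the constant names are made up for illustration):

```python
# Geometry figures from the slide; the constant names are illustrative only.
PAGE_BYTES = 4 * 1024       # one page = 4K
PAGES_PER_BLOCK = 128       # one block = 128 pages
BLOCKS_PER_PLANE = 1024     # one plane = 1024 blocks

block_bytes = PAGE_BYTES * PAGES_PER_BLOCK
plane_bytes = block_bytes * BLOCKS_PER_PLANE

print(block_bytes // 1024, "KB per block")
print(plane_bytes // (1024 * 1024), "MB per plane")
```

Both results come out to 512, matching the 512K block and 512MB plane on the slide.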
The big gotcha
• Reads = 4KB pages
• Writes = 4KB pages
• Deletes = 512KB blocks
Reads: orders of magnitude
• CPU registers – 0.3* ns (1 cycle)
• CPU Cache L1 – 1.2* ns
• CPU Cache L2 – 3.0* ns
• CPU Cache L3 – 12-24 ns
• Main Memory (RAM) – 60-100 ns
• SSD – 60,000 ns
• Magnetic Storage (“DISK”) – 3,000,000 ns
• SAN devices ~ 15,000,000 ns
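To make the gaps concrete, here is a rough sketch that turns the latencies above into ratios. The numbers are the ones on the slide (taking the upper end of the RAM range); the dictionary and function names are illustrative.

```python
# Read latencies in nanoseconds, as listed on the slide.
latency_ns = {
    "RAM": 100,          # upper end of the 60-100 ns range
    "SSD": 60_000,
    "Disk": 3_000_000,
    "SAN": 15_000_000,
}

def slowdown(slower, faster):
    """How many times slower one tier is than another."""
    return latency_ns[slower] / latency_ns[faster]

print("Disk vs SSD:", slowdown("Disk", "SSD"))
print("SSD vs RAM:", slowdown("SSD", "RAM"))
```

With these figures a disk read is about 50 times slower than an SSD read, and an SSD read is still about 600 times slower than RAM.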
Don’t forget throughput
• 15K RPM SAS HDD – 120-200MB/s
• PCIe SSD – 1-2GB/s
• But … How many disks do you use?
• Network bandwidth?
• CPU Bus bandwidth?
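The "how many disks" question can be answered with back-of-envelope arithmetic. This sketch assumes throughput adds up linearly across spindles and that 1GB/s = 1000MB/s; both are simplifications, and the function name is made up.

```python
import math

def disks_to_match(ssd_mb_s, hdd_mb_s):
    """Number of HDDs needed to match one SSD's throughput,
    assuming throughput scales linearly with disk count."""
    return math.ceil(ssd_mb_s / hdd_mb_s)

# Best case for the disks: slow SSD (1GB/s) against fast HDDs (200MB/s).
print(disks_to_match(1000, 200))
# Worst case for the disks: fast SSD (2GB/s) against slow HDDs (120MB/s).
print(disks_to_match(2000, 120))
```

Somewhere between 5 and 17 spindles already match a single PCIe card on sequential throughput, which is why the network and bus questions on the slide matter.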
Writes
• Writes on new SSD – 250,000 ns
• Comparable to rotating disk
How much data can you write to a new 250GB SSD?
Deletes
• Can’t overwrite data without deleting first
• Can only delete blocks of 128*4K pages
• To Overwrite a page:
– Read 127 pages
– Write 127 to a free block
– Delete old block
– Perform the write we originally requested
• Takes 2ms
• Each cell can only be written 100K times
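The four overwrite steps can be sketched as a toy model. This is not how any real controller is implemented; it just counts the page reads and writes that one logical page update costs when the block is full. The function and variable names are hypothetical.

```python
PAGES_PER_BLOCK = 128  # from the slide: a block is 128 pages of 4K

def overwrite_page(block, page_no, new_data):
    """Toy model of overwriting one page in a full block.

    Returns (new_block, pages_read, pages_written).
    """
    # 1. Read the other 127 pages.
    survivors = {i: data for i, data in enumerate(block) if i != page_no}
    # 2. Write them to a free (already erased) block.
    new_block = [None] * PAGES_PER_BLOCK
    for i, data in survivors.items():
        new_block[i] = data
    # 3. Delete (erase) the old block as a whole.
    block[:] = [None] * PAGES_PER_BLOCK
    # 4. Perform the write that was originally requested.
    new_block[page_no] = new_data
    return new_block, len(survivors), len(survivors) + 1

old = [f"old-{i}" for i in range(PAGES_PER_BLOCK)]
new, reads, writes = overwrite_page(old, 5, "new-5")
print(reads, "pages read,", writes, "pages written for 1 logical write")
```

In this worst case a single 4K update costs 127 page reads and 128 page writes, which is where the extra latency and the extra wear come from.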
The SSD controller
• Does the “magic” behind the scenes
• Deletes in the background (“garbage collection”)
• Tracks free space
• Balances I/O over cells (“wear leveling”)
• Manages spare capacity (“overprovisioning”)
• Manages RAM cache
The consequences
• Write Amplification
– How much data is really written when we write 1MB
– 1 means no overhead
– The closer to 1 the better
– Less than 1 means the vendor is lying
• Never benchmark a brand-new SSD
– Run benchmarks long enough to run out of overprovisioned space
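Write amplification is usually expressed as bytes written to flash divided by bytes the host asked to write. A minimal sketch of that ratio, with an illustrative function name and the 4K page and 128-page block sizes from the earlier slides:

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Ratio of data physically written to flash vs data the host wrote."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written

MB = 1024 * 1024
PAGE = 4 * 1024

# Fresh drive: 1MB requested, 1MB written, no overhead.
print(write_amplification(1 * MB, 1 * MB))
# Worst case: updating one page forces a whole 128-page block rewrite.
print(write_amplification(128 * PAGE, 1 * PAGE))
```

The first case gives 1.0 and the second 128.0; real drives land somewhere in between depending on the controller and on how much spare capacity is left.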
Solid-state your whole database?
• SSDs solve I/O latency problems
• But not if db file sequential read is not in your top 5 wait events
• And not if you haven’t maxed out your RAM for buffer cache (yet)
• If your CPU utilization is high, solve this first.
SSD mistakes
• SSD in primary but not DR site
– I/O capacity to apply real-time updates
– What if you need a switchover
• Over-managing active segments
– If DBAs didn’t have enough to do already…
• Database smart flash cache
Database “smart” flash cache
[Diagram: Disk, SGA, Flash Cache]
1. Block read from disk
2. Block evicted from SGA is written to SSD cache by DBWR
3. If block is needed, it is read from SSD
Database “smart” flash cache
• Pros:
– Automatically keeps active data in SSD
• Cons:
– Large overhead for managing cache, all taken from SGA
– Overhead for DBWR
– No benefit and some overhead for writes
– Only one disk
Using Smart Flash Cache will make your I/O faster than using just disks, but smartly placing data on SSD will be even faster.
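The victim-cache flow (disk to SGA, evicted blocks to SSD, SSD back to SGA) can be illustrated with a toy model. This is a sketch under loud assumptions: plain LRU on both tiers, and a block leaves the flash tier when it is promoted back to the SGA. Oracle's real replacement policy is more involved, and the class name is hypothetical.

```python
from collections import OrderedDict

class VictimCacheModel:
    """Toy two-tier cache: SGA in front, flash holding SGA evictions."""

    def __init__(self, sga_blocks, flash_blocks):
        self.sga = OrderedDict()
        self.flash = OrderedDict()
        self.sga_blocks = sga_blocks
        self.flash_blocks = flash_blocks
        self.stats = {"sga": 0, "flash": 0, "disk": 0}

    def read(self, block_id):
        """Return which tier served the read."""
        if block_id in self.sga:
            self.sga.move_to_end(block_id)   # mark as most recently used
            self.stats["sga"] += 1
            return "sga"
        if block_id in self.flash:
            del self.flash[block_id]         # promoted back into the SGA
            source = "flash"
        else:
            source = "disk"
        self.stats[source] += 1
        self._load_into_sga(block_id)
        return source

    def _load_into_sga(self, block_id):
        self.sga[block_id] = True
        if len(self.sga) > self.sga_blocks:
            victim, _ = self.sga.popitem(last=False)   # evict LRU block
            self.flash[victim] = True                  # the "DBWR" write to SSD
            if len(self.flash) > self.flash_blocks:
                self.flash.popitem(last=False)

cache = VictimCacheModel(sga_blocks=2, flash_blocks=2)
served = [cache.read(b) for b in [1, 2, 3, 1]]
print(served)
```

Block 1 is pushed out of the SGA by block 3, so the final read of block 1 is served from flash rather than disk. Note that every eviction costs a flash write, which is the DBWR overhead listed in the cons.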
In the beginning
• Exadata V1, 2008
• Joint project of HP and Oracle
• Designed for big and long-running queries (think data warehouses)
• No flash cache
And then
• Exadata V2, 2009
• Brand-new PCI-based flash cache
• Integrated with storage servers
• A full high-performance rack has:
– 4 * 14 Sun F20 flash accelerator cards
– 96GB * 4 * 14 = 5.4TB SLC flash
– 75 GB/sec flash throughput
– 1.5m IOPS
• Note that InfiniBand will limit you to 4GB/sec per DB node
Fast-forward to 2012
• Exadata X3, 2012
• Still integrated with storage servers
• A full high-performance rack has:
– 4 * 14 Sun F40 flash accelerator cards
– 400GB * 4 * 14 = 22.4TB MLC flash
– 100 GB/sec flash throughput
– 1.5m IOPS
• Same InfiniBand speeds
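The rack-level capacities on the V2 and X3 slides are just card size times cards times storage servers. A quick sketch to check them, reading "4 * 14" as 4 cards in each of 14 storage servers and assuming 1TB = 1000GB (the function name is made up):

```python
def rack_flash_tb(card_gb, cards_per_server=4, servers=14):
    """Raw flash capacity of a full rack in TB (1 TB = 1000 GB)."""
    return card_gb * cards_per_server * servers / 1000

v2 = rack_flash_tb(96)    # Sun F20 cards, SLC
x3 = rack_flash_tb(400)   # Sun F40 cards, MLC

print(f"V2: {v2:.1f} TB, X3: {x3:.1f} TB, growth: {x3 / v2:.1f}x")
```

That reproduces the 5.4TB and 22.4TB figures and shows roughly a fourfold capacity jump, bought by moving from SLC to MLC.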
Just announced
• Flash cache compression
– Fit more data into your flash
– Exadata hardware support TBD
– Only if the data isn’t already compressed (HCC)
Exadata smart flash cache
• Not the database smart flash cache
• No victim caching here
• Flash memory on storage servers
• Can be used for traditional storage too (but you lose capacity to redundancy)
Uncached reads
1. Uncached data is read from disk first
2. Sent to the database
3. And then copied to cache
[Diagram: Disks, SSD Cache, cellsrv, Database]
Cached reads
– Cached blocks come from flash cache directly
– Except smart scans: disk only
– If you set cell_flash_cache keep, smart scans read from both disk and flash
Writes (1)
– Writes go to disk first
– Then copied to cache, sometimes
• Indexes and tables with random read I/O are prioritized
• Or use cell_flash_cache keep
Writes (2)
– Write back cache
– 11.2.0.3 BP9+
– Writes go to SSD first
– Then copied to disk, eventually
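The difference between the two write paths can be sketched with a toy model. This is not cellsrv's logic: in particular, the rule used here for "sometimes" (only refresh blocks that are already cached) is an assumption made for illustration, and the class and method names are hypothetical.

```python
class CellModel:
    """Toy storage cell with a disk tier and a flash cache tier."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.disk = {}
        self.flash = {}
        self.dirty = set()   # blocks in flash that are not yet on disk

    def write(self, block_id, data):
        """Return the tier that acknowledged the write."""
        if self.write_back:
            # Writes (2): flash first, disk later.
            self.flash[block_id] = data
            self.dirty.add(block_id)
            return "flash"
        # Writes (1): disk first.
        self.disk[block_id] = data
        # Assumption: only refresh blocks that are already in the cache.
        if block_id in self.flash:
            self.flash[block_id] = data
        return "disk"

    def flush(self):
        """Copy dirty blocks to disk; returns how many were flushed."""
        for block_id in sorted(self.dirty):
            self.disk[block_id] = self.flash[block_id]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed

write_through = CellModel(write_back=False)
write_back = CellModel(write_back=True)
print(write_through.write(1, "a"), write_back.write(1, "a"))
print("on disk before flush:", write_back.disk)
print("flushed:", write_back.flush(), "on disk after:", write_back.disk)
```

In write-back mode the block exists only in flash until the flush, which is why the flash tier has to be treated as real storage rather than a disposable cache.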
Exadata smart flash logging
• In some Exadata systems: I/O outliers
• Slow log file syncs
• But aren’t flash writes slow?
• We now write to both disk and flash
• Puts an upper limit on latency
• Data corruption bug fixed in 11.2.3.2.1, and ASM resilvering bug fixed in 11.2.0.3 BP9
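Why writing to both devices caps latency: if the redo write is acknowledged as soon as either copy completes, the effective latency is the minimum of the two. A small sketch with made-up latency samples in milliseconds (the numbers and names are hypothetical, chosen so that each device has one outlier):

```python
def redo_write_latency(disk_ms, flash_ms):
    """Effective latency when the first device to finish acknowledges."""
    return min(disk_ms, flash_ms)

# Hypothetical samples: each device has one bad outlier.
disk_ms = [0.5, 0.6, 40.0, 0.5]
flash_ms = [1.0, 1.1, 1.0, 25.0]

effective = [redo_write_latency(d, f) for d, f in zip(disk_ms, flash_ms)]
print("effective:", effective)
print("worst case: disk", max(disk_ms), "flash", max(flash_ms),
      "both", max(effective))
```

Neither device is faster on average here, but as long as the outliers do not hit both at the same moment, the worst case drops from tens of milliseconds to about one.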
Mixed workloads
• Classic example: OLTP and DW on same system
• DW does long-running, I/O-intensive queries
• OLTP does relatively little I/O transfer
• But OLTP is very latency sensitive
• DW monopolizes the flash cache
• How to prioritize cache for OLTP?
The workaround
• Control via I/O resource manager
    alter iormplan dbplan=((name=dss, level=1, flashCache=off), (name=other, level=1, flashCache=on));
• Disables flash cache entirely for a DB
• Very coarse control: on or off
• Obvious effect in I/O performance
• Use only if you need it
• In CellCLI, list flashcachecontent shows what is in the cache
Interfaces
• SATA
– 32 outstanding IO
– 6Gb/s = 600MB/s
– significant latency
• SAS
– 256 outstanding IO
– 6Gb/s = 600MB/s
Interfaces
• PCIe
– “Flash” “Accelerator”
– Multiple 500 MB/s lanes
– Low latency
– Multiple SAS/SATA controllers on card for extra throughput
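The "6Gb/s = 600MB/s" and "500 MB/s lanes" figures both fall out of 8b/10b line encoding, where every 8 payload bits travel as 10 bits on the wire. The sketch below assumes the 500MB/s lane refers to PCIe 2.0 at 5 GT/s, which the slide does not state; the function name is illustrative.

```python
def usable_mb_per_s(line_rate_gbit, payload_bits=8, line_bits=10):
    """Usable bandwidth in MB/s for a link using 8b/10b encoding."""
    megabits = line_rate_gbit * 1000
    return megabits * payload_bits / line_bits / 8   # 8 bits per byte

sata_or_sas = usable_mb_per_s(6)   # 6Gb/s SATA or SAS link
pcie_lane = usable_mb_per_s(5)     # assumed: one PCIe 2.0 lane at 5 GT/s

print(sata_or_sas, "MB/s per SATA/SAS link")
print(pcie_lane, "MB/s per PCIe lane,", 8 * pcie_lane, "MB/s for an x8 card")
```

A single 6Gb/s link tops out at 600MB/s, while an x8 card has several times that available, which is where the per-card throughput advantage of PCIe comes from.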
Interfaces
• Fibre Channel
– Use existing storage infrastructure
– High latency
– Shared: works with RAC
• Proprietary PCI
– By flash array vendors
– Avoids latency penalty of FC
Write faster than read?
Identical read/write?
Intel SSD 910
RAMSAN
Wrapping up
• SSDs make random reads wicked fast
• Writes and deletes are complicated
• Exadata’s smart flash cache speeds up random reads
• Not all SSDs are the same
• Read vendor specs carefully
Thank you and Q&A
@gwenshap
@mfild