• Managed Services • Cloud Services • Consul3ng Services • Licensing
Using Exadata Flash Disks to Speed up Joins and Sorts Kasey Parker Enterprise Architect [email protected]
Managed Services
Cloud Services
Consul3ng Services
Licensing
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Who is Centroid?
§ Centroid is a leading provider of Oracle Technology, Applica8ons and Infrastructure/Hos8ng solu8ons
§ Oracle Pla8num Partner • Oracle Excellence Award: 2014 Engineered Systems Partner of the Year • Selected to Oracle’s Top 25 Strategic Partner Program • Top 5 Oracle Partner for Hardware/Storage
§ Established in 1997
§ Office loca8ons: Troy, MI (HQ); San Francisco, CA; Los Angeles, CA; Dallas, TX
§ 200+ Consultants
§ “Clients for life” approach to customer rela8onships
§ Oracle Exadata Center of Excellence established in 2011 • Centroid Authored -‐ Oracle Exadata Recipes (Published Feb-‐2013)
QUICK FACTS
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Agenda § Exadata Overview § Why Exadata? § Exadata Flash
§ Smart Flash Cache § Flash-‐based Grid disks
§ Using Flash for Temp § Risk/Reward § How to set it up
§ Q&A
• Managed Services • Cloud Services • Consul3ng Services • Licensing
EXADATA OVERVIEW
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Exadata Architecture Database hardware and soOware plaPorm “in a box”
Scale-‐Out Database Servers • 8x 2-‐socket, or 2x 8-‐socket Xeon database servers • Oracle Database, ASM, RAC; Linux or Solaris • Standard Ethernet to data center
Scale-‐Out Intelligent Storage Servers • 2-‐socket storage servers, Exadata Storage SoOware • Up to 672 terabytes disk per rack • 56 PCI Flash memory cards per rack
InfiniBand Network • Unified internal connec3vity ( 40 Gb/sec )
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Exadata Configura3on Op3ons Start small and grow as needed – upgraded onsite
Half Rack X4-‐2
Full Rack X4-‐2 X4-‐8
Quarter Rack X4-‐2
Eighth Rack X4-‐2
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Exadata Hardware Summary X4-2 Full X4-2 Half X4-2 Quarter X4-2 Eighth
Database Servers 8 4 2 2
Database Grid Cores 192 96 48 24
Database Grid Memory (GB) 2048 (max 4096) 1024 (max 2048) 512 (max 1024) 512 (max 1024)
InfiniBand switches 2 2 2 2
Ethernet switch 1 1 1 1
Exadata Storage Servers 14 7 3 3
Storage Grid CPU Cores 168 84 36 18
Raw Flash Capacity 44.8 TB 22.4 TB 9.6 TB 4.8 TB
Raw Storage Capacity High Perf 200 TB 100 TB 43.2 TB 21.6 TB
High Cap 672 TB 336 TB 144 TB 72 TB
Usable mirrored capacity High Perf 90 TB 45 TB 19 TB 9 TB
High Cap 300 TB 150 TB 63 TB 30 TB
Usable Triple mirrored capacity High Perf 60 TB 30 TB 13 TB 6.3 TB
High Cap 200 TB 100 TB 43 TB 21.5 TB
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Exadata Hardware Summary X4-2 Full X4-2 Half X4-2 Quarter X4-2 Eighth
Database Servers 8 4 2 2
Database Grid Cores 192 96 48 24
Database Grid Memory (GB) 2048 (max 4096) 1024 (max 2048) 512 (max 1024) 512 (max 1024)
InfiniBand switches 2 2 2 2
Ethernet switch 1 1 1 1
Exadata Storage Servers 14 7 3 3
Storage Grid CPU Cores 168 84 36 18
Raw Flash Capacity 44.8 TB 22.4 TB 9.6 TB 4.8 TB
Raw Storage Capacity High Perf 200 TB 100 TB 43.2 TB 21.6 TB
High Cap 672 TB 336 TB 144 TB 72 TB
Usable mirrored capacity High Perf 90 TB 45 TB 19 TB 9 TB
High Cap 300 TB 150 TB 63 TB 30 TB
Usable Triple mirrored capacity High Perf 60 TB 30 TB 13 TB 6.3 TB
High Cap 200 TB 100 TB 43 TB 21.5 TB
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Exadata Hardware Exadata X4-‐2 SQL IO Performance
1 -‐ Bandwidth is peak physical scan bandwidth achieved running SQL, assuming no compression. Effec3ve data bandwidth will be much higher when compression is factored in. 2 -‐ IOPS – Based on read IO requests of size 8K running SQL, typically with sub-‐millisecond latencies. Note that the IO size greatly effects flash IOPS. Others quote IOPS based on 2K, 4K or smaller IOs that are not relevant for databases and measure IOs using low level tools instead of SQL. 3-‐ Actual Performance varies by applica3on. 4 –Load rates are typically limited by database server CPU, not IO. Rates vary based on load method, indexes, data types, compression, and par33oning
X4-2 Full Rack
X4-2 Half Rack
X4-2 Quarter
X4-2 Eighth
Flash Cache SQL Bandwidth1,3
High Cap Disk 100 GB/s 50 GB/s 21.5 GB/s 10.7 GB/s
High Perf Disk 100 GB/s 50 GB/s 21.5 GB/s 10.7 GB/s
Flash SQL IOPS2,3 8K Reads 2,660,000 1,330,000 570,000 285,000
8K Writes 1,960,000 980,000 420,000 210,000
Disk SQL Bandwidth1,3
High Cap Disk 20 GB/s 10 GB/s 4.5 G/s 2.25 GB/s
High Perf Disk 24 GB/s 12 GB/s 5.2 GB/s 2.6 GB/s
Disk SQL IOPS High Cap Disk 32,000 16,000 7,000 3,500
High Perf Disk 50,000 25,000 10,800 5,400
Data Load Rate4 20 TB/hr 10 TB/hr 5 TB/hr 2.5 TB/hr
• Managed Services • Cloud Services • Consul3ng Services • Licensing
WHY EXADATA?
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Why Exadata? Exadata is designed to eliminate the most common bomleneck for large databases…
Timely transfer of large data sets from storage subsystem to database server
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Why Exadata? Solving the IO BoUleneck Solu3on 1: Enlarge the pipe
• Physical disks, on all cells, work in parallel to serve IO requests • Large Infiniband pipe (40GB/Sec)
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Why Exadata? Can’t we do that with other high performance storage soluWons?
YES… There is nothing Magical about Exadata hardware, and it’s s3ll the same Oracle Database
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Intelligent storage – Scale-out InfiniBand storage – Smart Scan query offload
+ + +
Hybrid Columnar Compression – 10x compression for warehouses – 15x compression for archives
Smart PCI Flash Cache – Accelerates random I/O up to 30x – Triples data scan rate
Data remains compressed for scans and
in Flash
Benefits Cascade to Copies
compress
primary DB
standby test dev backup
uncompressed
Exadata’s Secret Sauce
Why Exadata?
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Why Exadata? Solving the IO BoUleneck Solu3on 2: Reduce the IO opera3ons
• Done using Exadata’s Secret Sauce: Storage Offloading, Smart Flash Cache and Hybrid Columnar Compression
• 10X reduc3on in data sent to database servers is common
• Managed Services • Cloud Services • Consul3ng Services • Licensing
EXADATA FLASH
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Smart Flash Cache • Caches Read and Write I/Os in PCI flash • Transparently accelerates read and write intensive
workloads – Up to 2.66 million 8K read IOPS from SQL – Up to 1.96 million 8K write IOPS from SQL
• Persistent write cache speeds database recovery • Exadata Flash Cache is much more effec3ve than
flash 3ering architectures used by others – Caches current hot data, not yesterday’s – Caches data in granules 8x to 16x smaller than 3ering
• Greatly improves the effec3veness of flash
I/Os
2.66 Million 8K Read 1.96 Million 8K Write IOPS from SQL
Other Flash Features can be configured if needed E.g. Cache compression, Cache pinning, Flash Disks (for Temp)
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Smart Flash Cache • Caches Read and Write I/Os in PCI flash • Transparently accelerates read and write intensive
workloads – Up to 2.66 million 8K read IOPS from SQL – Up to 1.96 million 8K write IOPS from SQL
• Persistent write cache speeds database recovery • Exadata Flash Cache is much more effec3ve than
flash 3ering architectures used by others – Caches current hot data, not yesterday’s – Caches data in granules 8x to 16x smaller than 3ering
• Greatly improves the effec3veness of flash
I/Os
2.66 Million 8K Read 1.96 Million 8K Write IOPS from SQL
Other Flash Features can be configured if needed E.g. Cache compression, Cache pinning, Flash Disks (for Temp)
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Flash Based Cell Disks Usage • Smart Flash Cache
– Uses all available space by default – Managed automa3cally for maximum efficiency
• Flash-‐based Grid Disks – Premium, persistent database storage – Requires deliberate planning for efficient usage – One poten3al use case is for temp tablespace
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Why Use Flash for Temp? • Even well tuned data warehouse environments oOen have
significant temp I/O – Par3cularly related to large hash joins
• Large quan3ty of Exadata flash provides a great mechanism to offload IOPS from hard disks and improve temp I/O performance
• Makes even more sense with flash cache compression and more flash on newer Exadatas
• By default Exadata does nothing to speed up temp I/O – No storage cell offloading for temp – Flash cache is not used for temp opera3ons
• Must use flash-‐based grid disks to use flash for temp opera3ons
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Risks Evaluate the trade-‐offs and determine if flash-‐based grid disks are right for your environment • Reduced flash cache size
– More impacPul for OLTP workloads than DW workloads
• Redundancy requirements for temp tablespace – External redundancy carries availability risk – even for temp – Normal redundancy requires using double the amount of flash
• May require addi3onal maintenance during patching • Does your database even have a lot of temp I/O?
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Reward Case Study: • Large organiza3on in Salt Lake City Area • Data Warehouse running on Exadata X2-‐2 ¼ rack (1TB flash) • Temp I/O significant bomleneck
• Temp read and write I/O 4th and 5th top wait events on DB • Significant performance improvement aOer moving temp
tablespace to 340GB flash disk • Dropped temp I/O out of the top 10 wait events • Temp I/O latency reduced 8X • Temp heavy SQL saw average of 3X performance improvement
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Crea3ng Temp Tablespace on Flash 1) Determine size of flash-‐based grid disk 2) Drop Flash Cache 3) Recreate Flash Cache at new reduced size 4) Create the flash grid disk 5) Create ASM Diskgroup on the flash grid disk 6) Create temporary tablespace on the new diskgroup 7) Alter users to use the new temp tablespace
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Crea3ng Flash-‐based Grid Disks
CELLCLI> list celldisk detail name: CD_00_cm01celadm01
diskType: HardDisk size: 528.734375G
... (12 total hard disks disks on a cell) name: FD_00_cm01celadm01 diskType: FlashDisk
size: 22.875G ... (15 total flash disks on a cell)
Find Current Celldisk detail using CellCLI:
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Crea3ng Flash-‐based Grid Disks
CELLCLI> list flashcache detail name: cm01celadm01_FLASHCACHE
size: 364.75G
Find Current Flashcache detail:
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Crea3ng Flash-‐based Grid Disks Note: If keeping databases online do all steps to create the flash-‐based grid disk one cell at a 3me, otherwise could do cells in parallel. 1) Calculate the new flash cache and flash grid disk sizes:
• E.g. 100GB flash grid disk on X2-‐2 ¼ rack • 364.75 (original flash cache) * 3 (cells) -‐ 100 (flash disk) =
994.25GB • 994.25GB available for flash cache • Divide by number of cells (e.g. 3) = 331.417GB flashcache per
cell
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Crea3ng Flash-‐based Grid Disks
oracle@cm01dbadm01 ~]$ dcli -c cm01celadm01 cellcli -e drop flashcache
cm01celadm01: Flash cache cm01celadm01_FLASHCACHE successfully dropped
2) Drop Flash Cache: • If keeping databases online do one cell at a 3me,
otherwise could do in parallel • If write-‐back flash is enabled will need to flush first
• cellcli > alter flashcache all flush
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Crea3ng Flash-‐based Grid Disks
[oracle@cm01dbadm01 ~]$ dcli -c cm01celadm01 cellcli -e create flashcache all size=331.4167g
cm01celadm01: Flash cache cm01celadm01_FLASHCACHE successfully created
3) Create Flashcache at new size: • Total flash minus the size of the new grid disk
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Crea3ng Flash-‐based Grid Disks
[oracle@cm01dbadm01 ~]$ dcli -c cm01celadm01 cellcli -e create griddisk all flashdisk prefix=flash
cm01celadm01: GridDisk flash_FD_00_cm01celadm01 successfully created
cm01celadm01: GridDisk flash_FD_02_cm01celadm01 successfully created
… cm01celadm01: GridDisk
flash_FD_15_cm01celadm01 successfully created
4) Create new Flash-‐based Grid Disks
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Crea3ng Flash-‐based Grid Disks Repeat Steps 2-‐4 for other storage cells • E.g.
• dcli -‐c cm01celadm02 cellcli -‐e drop flashcache • dcli -‐c cm01celadm02 cellcli -‐e create flashcache all
size=331.4167g • dcli -‐c cm01celadm02 cellcli -‐e create griddisk all flashdisk
prefix=flash • dcli -‐c cm01celadm03 cellcli -‐e drop flashcache • dcli -‐c cm01celadm03 cellcli -‐e create flashcache all
size=331.4167g • dcli -‐c cm01celadm03 cellcli -‐e create griddisk all flashdisk
prefix=flash
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Create Flash-‐based ASM Diskgroup
SQL> create diskgroup FLASH_TEMP normal redundancy disk 'o/*/flash*'
attribute 'compatible.rdbms'='11.2.0.2.0',
'compatible.asm'='11.2.0.4.0',
'cell.smart_scan_capable'='TRUE', 'au_size'='4M';
Diskgroup created.
1) Login via SQLPlus to ASM instance as sysasm 2) Create new diskgroup on the flash grid disk
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Create Flash-‐based ASM Diskgroup
SELECT GROUP_NUMBER AS GRP_NUM, NAME,
STATE,
TOTAL_MB,
FREE_MB,
USABLE_FILE_MB,
ROUND((CASE WHEN (TOTAL_MB != 0) THEN FREE_MB / TOTAL_MB ELSE 0 END), 2)*100 || '%' PERCENT_FREE
FROM V$ASM_DISKGROUP
ORDER BY 1;
3) Mount new diskgroup on all ASM instances • First, login to each ASM instance and verify state • Should already be mounted on instance created on
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Create Flash-‐based ASM Diskgroup SQL> SELECT GROUP_NUMBER AS GRP_NUM, NAME, STATE, TOTAL_MB, FREE_MB FROM V$ASM_DISKGROUP ORDER BY 1;
GRP_NUM NAME STATE TOTAL_MB FREE_MB
------- ------------ ------------ ----------- ----------- 0 FLASH_TEMP DISMOUNTED 0 0
1 DATAC1 MOUNTED 7815168 7448472
2 DBFS_DG MOUNTED 894720 893344
3 RECOC1 MOUNTED 11674368 11550648
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Create Flash-‐based ASM Diskgroup SQL> alter diskgroup FLASH_TEMP mount;
Diskgroup altered.
SQL> SELECT GROUP_NUMBER AS GRP_NUM, NAME, STATE, TOTAL_MB, FREE_MB FROM V$ASM_DISKGROUP ORDER BY 1;
GRP_NUM NAME STATE TOTAL_MB FREE_MB ------- ------------ ------------ ----------- -----------
1 DATAC1 MOUNTED 7815168 7448472
2 DBFS_DG MOUNTED 894720 893344
3 RECOC1 MOUNTED 11674368 11550648
4 FLASH_TEMP MOUNTED 102912 102168
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Create Flash Temp Tablespace SQL> CREATE TEMPORARY TABLESPACE FLASH_TEMP TEMPFILE
'+FLASH_TEMP' SIZE 20480M AUTOEXTEND ON NEXT 4096M MAXSIZE 20480M,
'+FLASH_TEMP' SIZE 20480M AUTOEXTEND ON NEXT 4096M MAXSIZE 20480M
TABLESPACE GROUP ''
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 64M
/
Tablespace created.
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Move Users to Temp Tablespace SQL> ALTER USER SH TEMPORARY TABLESPACE FLASH_TEMP;
User altered.
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Conclusion When to use Exadata Flash for Temp tablespace? • IO bomlenecked • Significant por3on of IO is from Temp opera3ons • Typically suited more for DW than OLTP workloads
– Because DW workloads typically don’t use flash cache as much and drive larger temp opera3ons
• Newer Exadata versions – have more flash
TEMP
• Managed Services • Cloud Services • Consul3ng Services • Licensing
Ques3ons?