Date post: | 25-Dec-2014 |
Category: |
Technology |
Upload: | grandisau |
In-memory Database & MySQL Cluster
Grandis He ([email protected])
http://www.linkedin.com/in/grandis
Any questions? Any comments? Just let me know.
2010-10-29 Xiongwei He (Grandis) [email protected]
Personal Introduction
• Before immigrating to Australia
– Led the Zero Downtime Upgrade feature for Alcatel-Lucent Subscriber Data Management (MySQL Application of the Year 2009 award)
– Led the Super Distributed Home Location Register (SDHLR) development team, which used Oracle TimesTen for 7 years
In-memory Database Development
• Before Y2000
– Vendor DIY
• No SQL support and limited search options
• Not easy to manage
– Alternative choice: Berkeley DB (key-value pairs)
• After Y2000 – Independent vendors
– SQL/ODBC/JDBC support
– Easy to manage
• Now – Major database vendors (except Microsoft) have in-memory options, obtained by acquisition or in-house development
• Market value of in-memory databases: SAP acquired Sybase, and one major reason mentioned in SAP's PR was the Sybase in-memory database
Different ways to be in-memory
• In-memory only database (also called diskless)
– Data lives in memory only
• In-memory cache to a database
– Data is synced to a backing database, which in turn syncs to disk
• In-memory database
– Data is also written to disk
Note: Only for products with SQL Support
In-memory Only Database (Diskless)
• Typical usage
– Session management
– Stores for automatically generated data, such as smartphone GPS location data or the base-station location of mobile phones
• Mainstream products
– Oracle TimesTen
– MySQL Cluster
– IBM SolidDB
– Sybase ASE IMDB
• Java Open Source DB
– HyperSQL (HSQLDB)
– Apache Derby
– H2
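As a minimal sketch of the diskless idea, the snippet below uses SQLite's `:memory:` mode from the Python standard library as a stand-in (SQLite is not one of the products listed above). It models the session-management use case: the data exists only while the connection is open. The `sessions` table and its columns are hypothetical.

```python
import sqlite3

def open_session_store():
    """Create a diskless SQL database; nothing is written to a file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (session_id TEXT PRIMARY KEY, user_name TEXT)"
    )
    return conn

def put_session(conn, session_id, user_name):
    # INSERT OR REPLACE keeps exactly one row per session id.
    conn.execute(
        "INSERT OR REPLACE INTO sessions VALUES (?, ?)", (session_id, user_name)
    )

def get_session(conn, session_id):
    row = conn.execute(
        "SELECT user_name FROM sessions WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] if row else None

store = open_session_store()
put_session(store, "s-1", "alice")
found = get_session(store, "s-1")

# A new in-memory connection is a fresh, empty database:
# the earlier data was never persisted anywhere.
store.close()
fresh = open_session_store()
missing = get_session(fresh, "s-1")
```

The trade-off is the one the slide implies: losing the process loses the data, which is acceptable for sessions or location data that is regenerated automatically.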
In-memory Cache to Database
• Main products
– IBM SolidDB Universal Cache
• Supports DB2, Informix, Oracle, Sybase and Microsoft
– Oracle In Memory Cache (renamed from TimesTen Cache)
• Supports Oracle only
• Advantage: existing applications need no changes, and selected applications can be optimized to real-time speed
• Cost: cache license + database license
• Possible motivation for Oracle to buy TimesTen and IBM to buy SolidDB
In Memory Database
• Abbreviation: IMDB
• Another name: Main Memory Database (MMDB)
• Today it is close to a disk-based database in operational convenience, while holding the data in memory
• Real-time database access speed (advertising always claims "10 times faster")
• Main products
– Oracle TimesTen
– MySQL Cluster
– IBM SolidDB
In-memory Database Features
• The FULL database is in memory; queries do not trigger disk I/O
• ACID – non-durable commits for fast performance (some databases also provide a durable option, some do NOT)
• Low-level API for database access besides JDBC/ODBC
• Low latency: very fast database access
– Speed in microseconds, or 2-5 milliseconds
– Among select/update/insert, select is fastest, then update, then insert (their latency might be 10 times that of a select)
• High throughput
• High availability (HA) support
HA – Data Safety for 2 Nodes (Shared Nothing)
– 2-Safe Durable (for disk-based databases): a transaction waits until it has been committed to disk on both Node1 and Node2
– 2-Safe Visible (for in-memory databases): a transaction waits until it has been committed to the system (non-durably, for an in-memory database) on both Node1 and Node2
– 2-Safe Received: for a transaction issued on Node1, Node1 commits the transaction after it receives the message from Node2 confirming that Node2 has received the replication log
– 1-Safe: the transaction commit on Node1 does not depend on log replication to Node2
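The four safety levels differ only in what Node1 waits for before it reports a commit. The sketch below encodes that as a lookup table. The event names (`node1_disk_commit` and so on) are illustrative labels chosen for this example, not product terminology.

```python
from enum import Enum

class Safety(Enum):
    TWO_SAFE_DURABLE = "2-Safe Durable"
    TWO_SAFE_VISIBLE = "2-Safe Visible"
    TWO_SAFE_RECEIVED = "2-Safe Received"
    ONE_SAFE = "1-Safe"

# Events Node1 must observe before acknowledging the commit to the client.
# The event names are hypothetical labels for this sketch.
WAITS_FOR = {
    Safety.TWO_SAFE_DURABLE: ["node1_disk_commit", "node2_disk_commit"],
    Safety.TWO_SAFE_VISIBLE: ["node1_system_commit", "node2_system_commit"],
    Safety.TWO_SAFE_RECEIVED: ["node1_commit", "node2_log_received"],
    Safety.ONE_SAFE: ["node1_commit"],
}

def can_acknowledge(level, events_seen):
    """Return True once every event required by `level` has happened."""
    return all(event in events_seen for event in WAITS_FOR[level])

def node2_involved_at_ack(level):
    """Does Node2 hold at least the replication log when Node1 acknowledges?"""
    return any(event.startswith("node2_") for event in WAITS_FOR[level])
```

With 1-Safe, `can_acknowledge` is satisfied by the local commit alone, which is why a failure of Node1 right after the acknowledgement can lose the transaction.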
HA Term – Sync vs Safe
Sync/Async | 2-Safe/1-Safe | Other Term
Sync | 2-Safe Durable, 2-Safe Visible | 2-Phase Commit
Semi-Sync | 2-Safe Received | Return reception replication
Async | 1-Safe |
HA – Database Node Redundancy
• Two-node redundancy modes
– Shared disk (used by disk-based databases; the concern is disk array quality)
• Active/Standby (via 3rd-party clusterware)
• Active/Active (supported by database vendors: RAC, ASE Cluster, DB2 pureScale)
– Shared nothing (used by in-memory databases)
• Database Active/Standby (less used)
• Database Active/Standby read-only, also called Writer/Reader
• Database Active/Active
– 2-way replication with conflict resolution (3rd server: an extremely reliable NTP server for conflict resolution)
– 2-phase commit
• Three-node/four-node redundancy modes: a mix of the above technologies
• Switchover behavior for Active/Standby: the standby becomes active and the active becomes standby (automatically or manually)
Feature | MySQL Cluster | Oracle TimesTen | IBM SolidDB
Latest version | 7.1 | 11g Release 2 | 6.5
Shared memory access | No (distributed) | Yes – direct driver connection | Yes – shared memory access
Latency | 2-5 ms (distributed) | Tens to hundreds of microseconds | Tens to hundreds of microseconds
Throughput (database but NOT application) | Tens of thousands to hundreds of thousands | Tens of thousands to hundreds of thousands | Tens of thousands to hundreds of thousands
Durable option | No | Yes | Yes
HA support – replication | 2-Safe Visible for NDB nodes; 1-Safe for MySQL Cluster | 2-Safe Durable; 2-Safe Visible; 2-Safe Received; 1-Safe | 2-Safe Durable; 2-Safe Visible; 2-Safe Received; 1-Safe
Node redundancy | Active/Active for NDB nodes; Active/Active or Active/Standby read-only for 2 MySQL Clusters | Active/Active; Active/Standby read-only | Active/Standby read-only
Transaction isolation | Read Committed | Serializable; Read Committed | Repeatable Read (primary node only for HSB); Read Committed
Feature | MySQL Cluster | Oracle TimesTen | IBM SolidDB
Disk field support | Yes, with index in memory | No. Alternative solution: Oracle Cache + Oracle Database, where the index is NOT in memory | Not at field level, but a whole table can be on disk, where the index is NOT in memory
Scalability | 256 nodes at max, 48 data nodes at max | Limited to the machine | Limited to the machine
Diskless option | Yes | Yes | Yes
Change notification (asynchronous) | NDB Event notification | XLA (Transaction Log API) | Transaction Log Reader
Trigger | Yes (only on MySQL nodes; cannot be called via the NDB API) | No | Yes
Stored procedure | Yes (only on MySQL Server nodes; cannot be called via the NDB API) | Yes | Yes
Friendly interface | MySQL Server interface, JDBC, ODBC; NDB API (MySQL Cluster only) | Oracle friendly (OCI, Pro*C); JDBC and ODBC; XA and JTA (DTP support); TTClasses (TimesTen only, C++) | SA API (low-level API); light client; ODBC, JDBC (JTA)
MySQL Cluster
• 2003: acquired with Alzato, an Ericsson venture
• Another name: NDB Cluster
• Separate versioning: the MySQL Cluster version number differs from the MySQL Server version number
• Cost: inexpensive license in comparison to TimesTen
MySQL Cluster Features
• Low cost – uses commodity hardware without a disk array
• Highly reliable
– Shared nothing within a single cluster (better for maintenance than a shared disk array with mirroring)
– Geo-redundancy support through cluster-level replication
• High performance/frequency (especially with the NDB API)
• Distributed for application access
• Low latency: 2-5 ms
• Disk field support: addresses memory limitations when the application needs to support big fields (a major advantage over other IMDBs)
MySQL Cluster Architecture
[Figure: MySQL Cluster architecture. Components shown: Applications using the NDB Native API, MySQL Apps connecting through MySQL Servers, a Management (MGM) Node, and Data Nodes (Data 1 … Data N). Numbered callouts in the figure:]
1. 2-Phase Commit between Data Nodes
2. Replication between MySQL Clusters
3. Standard MySQL Server Interface
MySQL Cluster Nodes
• Management Node (MGM Node): the node that manages the cluster (ndb_mgmd)
– Multiple MGM Nodes are supported
– Why there is an MGM Node: it monitors the system, and its log is helpful for database startup after a shutdown
– MGM API: can be used to develop 3rd-party monitoring software, such as an SNMP agent that notifies the SNMP manager for fault management (sending an alarm on abnormal node status)
• Data Node – core of NDB Cluster
• SQL Node
– MySQL Server Node
– NDB API Node
Node Groups, Replica and Partition
• Data Node (NDB Node): a node running ndbd or ndbmtd (the multithreaded version), which stores a replica
– Each Data Node in a node group (data group) can handle traffic
– No conflicts in 2-phase commit, since different nodes handle different data
• Replica: a copy of a cluster partition. The number of replicas equals the number of nodes per node group
• Node Group: consists of 1-4 Data Nodes storing the same set of data for reliability. One cluster can have multiple node groups
– Number of NDB Nodes = Number of Node Groups × Number of Replicas
• Partition: done automatically by KEY or LINEAR KEY, or defined by the user. It distributes data automatically across the node groups
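The relationship between data nodes, replicas and node groups can be sketched in a few lines. This is a toy model under two assumptions that are not from the slides: consecutive node ids form a node group, and CRC32 stands in for the hash that NDB applies to the partitioning key.

```python
import zlib

def node_groups(num_data_nodes, num_replicas):
    """Split data node ids 1..N into node groups of `num_replicas` nodes each."""
    if num_data_nodes % num_replicas != 0:
        raise ValueError("data nodes must be a multiple of the replica count")
    ids = list(range(1, num_data_nodes + 1))
    return [ids[i:i + num_replicas] for i in range(0, num_data_nodes, num_replicas)]

def group_for_key(key, groups):
    """Pick the node group that stores `key` (CRC32 is a stand-in hash)."""
    index = zlib.crc32(str(key).encode("utf-8")) % len(groups)
    return groups[index]

groups = node_groups(num_data_nodes=8, num_replicas=2)
# 8 data nodes with 2 replicas give 4 node groups:
# [[1, 2], [3, 4], [5, 6], [7, 8]]
owner = group_for_key("subscriber-42", groups)
```

Every node in `owner` holds a copy of the row, so either of them can serve it, while rows with other keys land in other groups and spread the load.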
MySQL Cluster Replication
• Replication latency is a little longer than with TimesTen/SolidDB, due to the distributed architecture
• 2-way replication is supported, but my personal suggestions are:
– Use 2-way replication only when the update operations on cluster 1 and cluster 2 use different keys (for example, odd keys to cluster 1, even keys to cluster 2)
– Use only 1-way replication for most applications
• Conflict resolution is NOT easy in complex scenarios
• Latency due to the distributed architecture
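The odd/even suggestion above amounts to a routing rule in the application layer. A minimal sketch, assuming integer keys and two clusters numbered 1 and 2 (the function names are hypothetical):

```python
def cluster_for_key(key):
    """Route odd keys to cluster 1 and even keys to cluster 2."""
    return 1 if key % 2 == 1 else 2

def route_updates(keys):
    """Group integer keys by the only cluster allowed to update them."""
    routed = {1: [], 2: []}
    for key in keys:
        routed[cluster_for_key(key)].append(key)
    return routed

routed = route_updates([101, 102, 103, 104])
# routed == {1: [101, 103], 2: [102, 104]}
```

Because each key has exactly one writing cluster, the two replication streams never carry competing updates for the same row, so no conflict resolution is needed.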
Commit, GCP, LCP
• Commit: commits to all the replicas (but in memory only, until the next GCP happens)
• Global Checkpoint (GCP): a GCP occurs every few seconds, when transactions for all nodes are synchronized and the redo log is flushed to disk
• Local Checkpoint (LCP): a checkpoint that is specific to a single node. An LCP saves all of a node's data to disk, and so usually occurs every few minutes
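The gap between "committed" and "on disk" can be shown with a toy model. This is only a sketch of the behavior described above; the class and method names are made up for the example and it ignores LCPs entirely.

```python
class NdbDurabilityModel:
    """Toy model: commits live in memory until a global checkpoint (GCP)."""

    def __init__(self):
        self.in_memory = []   # committed, not yet covered by a GCP
        self.on_disk = []     # covered by a flushed redo log

    def commit(self, txn_id):
        self.in_memory.append(txn_id)

    def global_checkpoint(self):
        # The redo log is flushed: everything committed so far is now on disk.
        self.on_disk.extend(self.in_memory)
        self.in_memory.clear()

    def crash_all_nodes(self):
        """Return the transactions lost if every replica fails at once."""
        lost = list(self.in_memory)
        self.in_memory.clear()
        return lost

model = NdbDurabilityModel()
model.commit("t1")
model.commit("t2")
model.global_checkpoint()
model.commit("t3")            # committed after the last GCP
lost = model.crash_all_nodes()
# lost == ["t3"]
```

Only a simultaneous failure of all replicas loses `t3`; as long as one replica in each node group survives, the committed data is still available in memory.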
Commit, GCP, LCP
• NDB GCP ~= commit in a disk-based database, in terms of data safety
• NDB LCP ~= checkpoint in a disk-based database
• LCP performance (full database flush)
– NOT as good as a checkpoint in a disk-based database or TimesTen, which flush dirty pages only
– Mitigation: the distributed architecture reduces the disk I/O on any single data node
GCP and LCP for NDB Recovery
• NDB recovery:
– Load the LCP
– Load the GCP
• Why global synchronization is needed for GCP: it keeps the data of the whole cluster consistent for recovery
– Committed transactions that are only in memory are lost if the database crashes
• Mitigation for data safety: use multiple replicas
• These are the "internal driving factors" for a distributed architecture with multiple replica support
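The two recovery steps can be sketched as "start from the LCP snapshot, then replay the redo log up to the last completed GCP". This is a simplified model, not NDB's actual on-disk format: the tuple layout of the redo log and the `lcp_gcp` field are assumptions made for the example.

```python
def recover(lcp_snapshot, lcp_gcp, redo_log, last_gcp):
    """Rebuild table contents after a full cluster restart.

    lcp_snapshot: dict of key -> value saved by the local checkpoint
    lcp_gcp:      GCP number the snapshot corresponds to (assumed field)
    redo_log:     list of (gcp_number, key, value) entries found on disk
    last_gcp:     last GCP completed on all nodes before the crash
    """
    data = dict(lcp_snapshot)
    for gcp_number, key, value in redo_log:
        # Replay only changes newer than the snapshot and inside a
        # globally completed checkpoint, so all nodes agree on the result.
        if lcp_gcp < gcp_number <= last_gcp:
            data[key] = value
    return data

snapshot = {"a": 1, "b": 2}
redo = [(10, "a", 1), (11, "b", 20), (12, "c", 3), (13, "a", 99)]
restored = recover(snapshot, lcp_gcp=10, redo_log=redo, last_gcp=12)
# restored == {"a": 1, "b": 20, "c": 3}
```

The entry tagged with GCP 13 is ignored because that checkpoint never completed cluster-wide, which is the consistency argument for synchronizing GCPs globally.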
Why at least gigabit networking is needed, and why latency is higher than TimesTen
• Assume the 2-replica case
– NDB1/NDB2 – paired Data Nodes in a data group
– Transaction Coordinator (TC): the NDB node that the SQL Node is connected to
• An update needs 10 messages for a MySQL app and 8 messages for an NDB API app; a read takes 5 messages for a MySQL app and 3 messages for an NDB API app
• Update for a MySQL app
1. MySQL App (update statement) → MySQL Server
2. MySQL Server (update statement) → TC
3. TC (prepare message) → NDB1
4. NDB1 (prepare message) → NDB2
5. NDB2 (prepare result) → TC
6. TC (commit message) → NDB2
7. NDB2 commits and sends acknowledgement → NDB1
8. NDB1 commits and sends acknowledgement → TC
9. TC sends result → MySQL Server
10. MySQL Server sends result → MySQL App
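The message counts above can be checked by writing the hops down as data. In this sketch the per-message time (0.25 ms) is a hypothetical figure chosen only to make the arithmetic visible; it is not a measured value.

```python
# The ten hops of an update issued by a MySQL app (2-replica case).
MYSQL_APP_UPDATE = [
    ("MySQL App", "MySQL Server", "update statement"),
    ("MySQL Server", "TC", "update statement"),
    ("TC", "NDB1", "prepare"),
    ("NDB1", "NDB2", "prepare"),
    ("NDB2", "TC", "prepare result"),
    ("TC", "NDB2", "commit"),
    ("NDB2", "NDB1", "commit acknowledgement"),
    ("NDB1", "TC", "commit acknowledgement"),
    ("TC", "MySQL Server", "result"),
    ("MySQL Server", "MySQL App", "result"),
]

def ndb_api_path(messages):
    """An NDB API app talks to the TC directly, so the two hops between
    the MySQL App and the MySQL Server disappear."""
    direct = [m for m in messages if "MySQL App" not in (m[0], m[1])]
    rename = lambda node: "NDB API App" if node == "MySQL Server" else node
    return [(rename(src), rename(dst), what) for src, dst, what in direct]

def estimated_latency_ms(messages, per_message_ms):
    """Serial estimate: every message pays one network traversal."""
    return len(messages) * per_message_ms

ndb_api_update = ndb_api_path(MYSQL_APP_UPDATE)
mysql_ms = estimated_latency_ms(MYSQL_APP_UPDATE, per_message_ms=0.25)
ndb_api_ms = estimated_latency_ms(ndb_api_update, per_message_ms=0.25)
```

Since every update crosses the network many times, the per-message cost dominates the total, which is the reason a fast LAN matters more here than for a single-machine product.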
Programming Interface
• Distributed by nature: uses multiple MySQL Nodes and NDB API Nodes
– Why: transaction control is done by the NDB Data Nodes
• The choices
– Standard MySQL clients
– NDB API (best way for high performance)
• Single-table operations
• Cannot access triggers, but can use NDB Events
Programming Interface
• Java Interface
– MySQL JDBC: use Connector/J 5.1.7+ for load balancing support
– ClusterJ for Java – a Java interface based on the NDB API
– ClusterJPA – an OpenJPA implementation which takes advantage of JDBC for complex queries and ClusterJ for single-table operations
• LDAP interface support (based on the NDB API)
– Impressive performance
– Data store for OpenLDAP and OpenDS
Your Options
• Use the NDB API or an interface derived from it (maximum performance, at several times the development cost)
• Use the MySQL interface (best development efficiency)
• LDAP (depending on your application type)
System Architecture Input
• Performance/reliability requirements
– System volume/throughput/latency
– Disk mirrored or NOT
– Redundancy model
• Node redundancy (example: N+K with K=1 or 2 for SQL Nodes; 1 or 2 Data Nodes in different data groups being down)
• TCP/IP redundancy (Ethernet port + WAN/LAN network redundancy)
– CPU budget for busy hours (related to the redundancy mode)
• Memory/disk per subscriber (per order)
– Memory usage per subscriber (per order)
– Disk usage per subscriber (per order), or disk/memory ratio
• Disk I/O performance/behavior (MySQL Cluster – LCP)
• Replication
– WAN budget for geo-redundancy
– Replication daemon throughput
– Replication latency
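The memory-per-subscriber input above feeds directly into a node count. The sketch below is a rough first-order estimate only: it ignores index, overhead and buffer memory, and all the figures in the example (10 million subscribers, 2 KB each, 16 GB usable per node) are hypothetical.

```python
import math

def memory_per_data_node_gb(subscribers, bytes_per_subscriber, replicas, data_nodes):
    """Rough data memory per node: all replicated data spread over all data nodes."""
    total_bytes = subscribers * bytes_per_subscriber * replicas
    return total_bytes / data_nodes / 1024 ** 3

def data_nodes_needed(subscribers, bytes_per_subscriber, replicas, usable_gb_per_node):
    """Smallest node count (a multiple of `replicas`) that fits the data in memory."""
    total_gb = subscribers * bytes_per_subscriber * replicas / 1024 ** 3
    nodes = max(1, math.ceil(total_gb / usable_gb_per_node))
    # Round up so that every node group has a full set of replicas.
    return math.ceil(nodes / replicas) * replicas

nodes = data_nodes_needed(10_000_000, 2048, replicas=2, usable_gb_per_node=16)
per_node = memory_per_data_node_gb(10_000_000, 2048, replicas=2, data_nodes=8)
```

In practice the result is only a starting point: the redundancy model and the busy-hour CPU budget from the list above can push the node count higher than memory alone requires.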
Your Adjustment
• Hardware key indicators
– CPU/Memory/Disk (speed/volume)
– Switch/Router
• Hardware/WAN adjustment
– CPU/Memory/Disk
– Number of NDB Nodes
– Network configuration (LAN)
– Router and WAN bandwidth request
• Software adjustment
– Data model design adjustment
– Database tuning
– Move INSERT/DELETE actions to non-busy hours if possible
– Local caching of service/configuration data (MySQL Cluster – using NDB Events) to reduce database access for small tables
– Others, such as an optimized software system pattern/design
Hardware Environment
• Use VMware for testing, and even for functionality demos to customers
• Use non-ATCA blades (IBM/HP/Oracle/Dell) or ATCA for performance tests
– 1 Gb/10 Gb Ethernet at the backplane
– NDB Node: multiple CPUs are not necessary, but SAS disks are needed if insert/update/delete make up a significant percentage of the transactions
– MySQL Cluster node for replication: for fast replication, do not turn on Intel Hyper-Threading or use Sun CMT CPUs
Upgrade
• MySQL Cluster without Cluster-Level Replication
• MySQL Cluster with Cluster-Level Replication
Single MySQL Cluster Upgrade
• 4 data groups (2 replicas each), 2 MGM Nodes
– Stop the front-end application
– Back up the cluster data
– Split the nodes into 2 clusters
• Old cluster, cluster 1: MGM1, NDB1, NDB3, NDB5, NDB7
• New cluster, cluster 2: MGM2, NDB2, NDB4, NDB6, NDB8
– Upgrade NDB and the schema on cluster 2
– Let the front-end application connect to cluster 2
– Change cluster 1 to become part of cluster 2
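The split step works because each half takes one replica from every data group, so both halves hold a full copy of the data. A small sketch, assuming that NDB1/NDB2, NDB3/NDB4 and so on are the paired nodes of each data group:

```python
def split_for_upgrade(data_groups):
    """Take one replica from each 2-node data group for each cluster."""
    old_cluster, new_cluster = [], []
    for first, second in data_groups:
        old_cluster.append(first)
        new_cluster.append(second)
    return old_cluster, new_cluster

data_groups = [("NDB1", "NDB2"), ("NDB3", "NDB4"), ("NDB5", "NDB6"), ("NDB7", "NDB8")]
cluster1, cluster2 = split_for_upgrade(data_groups)
# cluster1 == ["NDB1", "NDB3", "NDB5", "NDB7"]
# cluster2 == ["NDB2", "NDB4", "NDB6", "NDB8"]
```

During the split each half runs without a second replica, which is one reason the procedure starts with a backup.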
MySQL Cluster Upgrade with replication support
• Database version: NDB 7.1
– NDB 7.1 feature: attribute promotion/demotion support for replication
– NDB 7.0 feature: default values for newly added columns in tables
• Environment: Cluster 1 (Active) and Cluster 2 (Standby read-only)
• Enterprise upgrade procedure
– Upgrade cluster 2 (NO service interruption)
• Upgrade the NDB version and apply schema changes
• Cluster 1 → Cluster 2 replication verification
– Switch cluster 2 to be active
• Cluster 2 → Cluster 1 replication verification
• If the new version does NOT work, switch back to cluster 1
– Upgrade cluster 1 (upgrade the NDB version and apply the application schema changes)
• Carrier upgrade requirement
– Be able to downgrade even after both clusters are upgraded
Limitation of MySQL Cluster
• NO Foreign Key Constraints
• Transaction Limitation
– NO Savepoints
– Read Committed isolation level only
• NO durable commits
– Mitigation: increase the number of replicas to 3 or 4 if you want extreme reliability
Take Care with In-memory Databases (Beyond MySQL Cluster)
• Possibly slow startup in comparison to a disk-based database (impacts MTTR in the single-cluster case)
– All the data must be loaded into memory
– For MySQL Cluster: more data groups, less startup time
• Features are not as complete as in disk-based databases, but they are improving
• Ensure that queries/updates use a key
– Do not run "select count(*) from $table_name" on live machines unless it has been tested. Find a solution from the database dictionary instead
– If the application is designed for high performance/frequency, a non-key search inside a transaction (for example, caused by a bug) might keep the database busy, which can hang the application and trigger an outage
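The cost difference between keyed and non-key access can be illustrated without any database at all. This sketch counts the rows examined by each access path on a plain Python table; the `msisdn` column and the row layout are hypothetical.

```python
def lookup_by_key(index, key):
    """Keyed access: one probe, independent of table size."""
    return index.get(key), 1            # (row, rows examined)

def lookup_by_scan(rows, column, value):
    """Non-key access: every row may have to be examined."""
    examined = 0
    for row in rows:
        examined += 1
        if row[column] == value:
            return row, examined
    return None, examined

rows = [{"id": i, "msisdn": f"61400{i:06d}"} for i in range(100_000)]
index = {row["id"]: row for row in rows}

_, keyed_cost = lookup_by_key(index, 99_999)
_, scan_cost = lookup_by_scan(rows, "msisdn", "61400099999")
# keyed_cost == 1, scan_cost == 100000
```

At thousands of transactions per second, a single code path that scans instead of probing multiplies the work by the table size, which is how one missed key can saturate the database.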