Slice: OpenJPA for Distributed Persistence

© 2009 IBM Corporation. Conference materials may not be reproduced in whole or in part without the prior written permission of IBM.

WebSphere Services Technical Virtual Conferences: world-class skill building and technical enablement

Scale Your JPA Applications with Distributed Database Partitions

Session Number: D05

Dr. Pinaki Poddar

Application Integration & Middleware

Pinaki Poddar, SWG/IBM

Overview

Brief tour of JPA

– Design-time Features

– Runtime Behavior

Role of JPA in JEE

– Scalability

Horizontal Distributed Data Partition as a scaling strategy

Slice: JPA for Distributed, Partitioned Databases

– Using Slice

– Under the hood

– Future work

Q & A



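// Imports used by both classes (each class lives in its own .java file)
import java.util.List;
import javax.persistence.*;

@Entity
public class Customer {
    @Id
    private long id;
    private String name;
    @OneToMany(mappedBy="customer")
    private List<Order> orders;
}

@Entity
public class Order {
    @Id
    private long id;
    @ManyToOne
    private Customer customer;
}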

JPA uses POJOs for the domain model

Annotate as @Entity

Define persistent identity

Annotate relational mapping

Use full power of Java

– No interface to implement

– No class to inherit from

– Use Collection, List, Set, Map

– Use generics

Convention over configuration

– Implied database naming

– Implied persistence property



Persistence Unit

• Set of persistent classes

• Mapping metadata

• Database & other configurations

Files: *.class, META-INF/persistence.xml, orm.xml



How to obtain a Persistence Unit?

Instantiate via bootstrap:

EntityManagerFactory emf = Persistence.createEntityManagerFactory("test");

Look up in JNDI:

InitialContext ctx = new InitialContext();
EntityManagerFactory emf = (EntityManagerFactory) ctx.lookup("myEMF");

Inject as a resource:

@PersistenceUnit(unitName="test")
private EntityManagerFactory emf;

WARNING: EntityManagerFactory construction is costly



Persistence Context

• Session/Transaction

• Cache of managed instances

• Persistent operations: find(), persist(), merge(), remove(), refresh(), createQuery(), …



How to obtain a Persistence Context?

Inject as a resource:

@PersistenceContext
private EntityManager em;

Construct from Persistence Unit:

EntityManager em = emf.createEntityManager();



Persistence Context manages instances in a group

Figure: two Persistence Contexts created from one Persistence Unit

em1 = emf.createEntityManager();
em2 = emf.createEntityManager();
Account pc1 = em1.find(Account.class, 1245);
Account pc2 = em2.find(Account.class, 1245);

The row is loaded with:

SELECT ID,NAME FROM ACCOUNT t WHERE t.ID=1245

ACCOUNT table

ID    NAME  AMOUNT
2347  John  $ 12000.57
1245  Mary  $ 34568.89



A Question about Identity

pc1 == pc2 ?

pc1.equals(pc2) ?

Important questions

Java supports two identities

• Reference-based identity

• Value-based identity

JPA adds another identity

• Persistence-based identity

• Persistence identity defines uniqueness within a Persistence Context

• An instance is managed by one and only one Persistence Context at a time
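The identity rules above can be pictured as an identity map. This is a minimal sketch, not the JPA API: `Context` is a hypothetical stand-in for a persistence context, and `Account` is a hypothetical entity that, by assumption, overrides `equals()` on its primary key (JPA does not require this). Within one context, repeated `find()` calls return the same reference; two contexts return different references for the same row.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity: value-based equality on the primary key (an assumption, not a JPA rule).
final class Account {
    final long id;
    final String name;
    Account(long id, String name) { this.id = id; this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof Account && ((Account) o).id == id;
    }
    @Override public int hashCode() { return Long.hashCode(id); }
}

// Hypothetical stand-in for a persistence context: at most one managed instance per primary key.
final class Context {
    private final Map<Long, Account> managed = new HashMap<>();
    private final Map<Long, String> database; // simulated ACCOUNT table: ID -> NAME

    Context(Map<Long, String> database) { this.database = database; }

    Account find(long id) {
        // Reuse the already-managed instance; otherwise "load" the row once and remember it.
        return managed.computeIfAbsent(id, k -> {
            String name = database.get(k);
            return name == null ? null : new Account(k, name);
        });
    }
}
```

With the ACCOUNT rows from the figure in a shared map, `em1.find(1245) == em2.find(1245)` is false, `equals()` between them is true, and `em1.find(1245) == em1.find(1245)` is true.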



Life of Persistence Context

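Figure (timeline): em = emf.createEntityManager(); then begin(); commit(); flush(); clear(); and finally close(). An Extended Persistence Context spans the whole period from createEntityManager() to close(), across several begin()/commit() cycles; a Transactional Persistence Context spans a single begin()/commit().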



Query executes in a Persistence Context

Persistence Context is a factory for Query

Query is expressed in JPQL (Java Persistence Query Language)

Selected instances are added to the persistence context

Selected instances are returned in a List

EntityManager em = …;
String jpql = "SELECT p FROM Person p WHERE p.name=:name";
List result = em.createQuery(jpql)
                .setParameter("name", "John")
                .getResultList();



Transaction Scaling in JPA

Figure (timeline): the same life cycle (em = emf.createEntityManager(); begin(), commit(), flush(), clear(); close()) shown against four layers: the Extended Persistence Context, the Transaction-scoped Persistence Context, the L2 Data Cache and the Database Transaction.



Optimistic versioning scales transactions

Two transactions read Item:1234 (QTY 5, VERSION 56) and update it concurrently:

– Transaction 1: begin(); pc1 = find(Item.class, 1234); reads qty 5, version 56

– Transaction 2: begin(); pc2 = find(Item.class, 1234); reads qty 5, version 56

– pc1.setQty(30); pc2.setQty(87);

– Transaction 1 commits first:
  UPDATE ITEM SET QTY=30, VERSION=57 WHERE ID=1234 AND VERSION=56
  The row changes from (ID 1234, QTY 5, VERSION 56) to (ID 1234, QTY 30, VERSION 57).

– Transaction 2 then commits:
  UPDATE ITEM SET QTY=87, VERSION=57 WHERE ID=1234 AND VERSION=56
  No row has VERSION 56 any more, so nothing is updated and the commit fails with OptimisticException.
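The version check can be simulated without a database. In this sketch `ItemTable`, `OptimisticCommit` and `StaleVersionException` are hypothetical names; `update()` mimics the `UPDATE ... WHERE ID=? AND VERSION=?` statement above by returning the number of rows it changed, and a zero count is what signals the conflict.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory ITEM table: ID -> {qty, version}.
final class ItemTable {
    private final Map<Long, long[]> rows = new HashMap<>();

    void insert(long id, long qty, long version) { rows.put(id, new long[] {qty, version}); }
    long qty(long id) { return rows.get(id)[0]; }
    long version(long id) { return rows.get(id)[1]; }

    // Mimics: UPDATE ITEM SET QTY=?, VERSION=expected+1 WHERE ID=? AND VERSION=expected
    // Returns the number of rows updated (0 or 1).
    int update(long id, long newQty, long expectedVersion) {
        long[] row = rows.get(id);
        if (row == null || row[1] != expectedVersion) return 0;
        row[0] = newQty;
        row[1] = expectedVersion + 1;
        return 1;
    }
}

// Hypothetical exception standing in for the failure the slide calls OptimisticException.
final class StaleVersionException extends RuntimeException {
    StaleVersionException(String msg) { super(msg); }
}

final class OptimisticCommit {
    // Commit succeeds only if nobody changed the row since `versionRead` was observed.
    static void commit(ItemTable table, long id, long newQty, long versionRead) {
        if (table.update(id, newQty, versionRead) == 0) {
            throw new StaleVersionException("Item " + id + " changed after version " + versionRead + " was read");
        }
    }
}
```

Replaying the slide: both transactions read version 56, the first commit writes QTY 30 and VERSION 57, and the second commit finds no row with version 56 and fails, leaving QTY at 30.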



Multi-level caches favor read-mostly sessions

Figure (timeline): em1 = emf.createEntityManager() with em1.begin(), em1.query(), em1.commit(); later em2 = emf.createEntityManager() with em2.begin(), em2.find(), em2.find(), em2.remove(), em2.commit(); both sessions are shown against the L2 Data Cache and the Database Access & Transaction layer.



Scaling against Data Volume

Data is growing rapidly

– 64%: compounded annual growth rate of worldwide capacity of compliant records from 2003 to 2006

– Unbounded nature of the Web

• From a web site, a company can generate several gigabytes of data each day



Distributed Horizontal Partition

Horizontal Partition

– put different rows into different tables

Distributed Horizontal Partition

– put different rows into different databases



divide et impera

Natural partitions exist in many domains

– Geographical (Customers by State)

– Temporal (PurchaseOrders by Month)

– Personal (Blog Posts by User)

Partitioning is natural in some scenarios

– Hosted Platforms

– Software-as-a-Service






What is Slice?

Slice is an OpenJPA module to transact with distributed, horizontally partitioned databases

Incubated as an Apache Labs project in January 2008

Included as OpenJPA module since June 2008

Slice is bundled with OpenJPA within WAS v7

Slice is not the best thing since sliced bread



What is OpenJPA?

An implementation of JPA Specification

Default persistence provider for the WebSphere EJB3 Feature Pack v6.1 and WAS v7.x

Apache Project since May 2007

– http://openjpa.apache.org

Operational codebase since 2002

Rich, extended, ahead-of-the-curve feature set

Powerful configurability



Architectural Tiers of JPA-based service

Tiers, top to bottom: JPA-based User Application → Standard JPA API → OpenJPA → JDBC API



Architectural Tiers of Slice-based service

Tiers, top to bottom: Slice-based User Application → Standard JPA API → OpenJPA with Slice plugged in → JDBC API

OpenJPA is a pluggable platform



Features of Slice

No change to application code or domain model

User-defined distribution & replication policy

Flexible per-slice configuration

Parallel query execution

Heterogeneous databases

Master-based sequence

Targeted query






Using Slice

No change in Application Code

No change to Domain Model

OK, almost



<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             version="1.0"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd">
  <persistence-unit name="test" transaction-type="RESOURCE_LOCAL">
    <provider>org.apache.openjpa.persistence.PersistenceProviderImpl</provider>
    <class>domain.EntityA</class>
    <class>domain.EntityB</class>
    <properties>
      <property name="openjpa.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
      <property name="openjpa.ConnectionURL" value="jdbc:mysql://localhost/test"/>
      <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/>
      <property name="openjpa.Log" value="SQL=TRACE"/>
    </properties>
  </persistence-unit>
</persistence>

Persistence Unit Configuration

META-INF/persistence.xml, identified by unit name

Governed by XML Schema

JPA provider is pluggable

List of known persistent types

Vendor-specific configuration



Per-Slice Configuration

<properties>
  <property name="openjpa.BrokerFactory" value="slice"/>
  <property name="openjpa.slice.Names" value="One,Two,Three"/>
  <property name="openjpa.slice.Master" value="One"/>
  <property name="openjpa.slice.DistributionPolicy" value="acme.org.MyDistroPolicy"/>
  <property name="openjpa.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
  <property name="openjpa.slice.One.ConnectionURL" value="jdbc:mysql://localhost/slice1"/>
  <property name="openjpa.slice.Two.ConnectionURL" value="jdbc:mysql://localhost/slice2"/>
  <property name="openjpa.slice.Three.ConnectionDriverName" value="com.ibm.db2.jcc.DB2Driver"/>
  <property name="openjpa.slice.Three.ConnectionURL" value="jdbc:db2://mac3:50000/slice3"/>
  <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/>
</properties>

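META-INF/persistence.xml

Activate Slice (openjpa.BrokerFactory)

Declare slices (openjpa.slice.Names, openjpa.slice.Master)

Configure each slice (openjpa.slice.One.*, openjpa.slice.Two.*, openjpa.slice.Three.*)

Configure common behavior (openjpa.ConnectionDriverName, openjpa.jdbc.SynchronizeMappings)

Define Data Distribution Policy (openjpa.slice.DistributionPolicy)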



Rules of Configuring slices

Each slice is identified by a logical name

All slice names can be listed in the openjpa.slice.Names property

– or determined implicitly: a property openjpa.slice.XYZ.abc declares a slice with logical name XYZ

Each slice can be configured independently

Each slice property defaults to the common configuration

– If openjpa.slice.XYZ.abc is not specified, it defaults to the value of the openjpa.abc property

A master slice is either configured by the openjpa.slice.Master property

– or detected by convention as the first slice

Unreachable slices are ignored at startup if openjpa.slice.Lenient property is set to true.
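The naming and defaulting rules above can be sketched against a plain `java.util.Properties` object. `SliceConfigSketch` is a hypothetical helper, not OpenJPA's configuration code. It assumes a comma-separated `openjpa.slice.Names`, and for implicitly declared slices it sorts names alphabetically purely to make "the first slice" deterministic; the ordering OpenJPA actually uses may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

final class SliceConfigSketch {
    private static final String SLICE_PREFIX = "openjpa.slice.";

    // Explicit openjpa.slice.Names if present; otherwise every XYZ found in a key of the
    // form openjpa.slice.XYZ.abc (keys like openjpa.slice.Master have no second dot).
    static List<String> sliceNames(Properties props) {
        List<String> names = new ArrayList<>();
        String explicit = props.getProperty(SLICE_PREFIX + "Names");
        if (explicit != null) {
            for (String n : explicit.split(",")) {
                if (!n.trim().isEmpty()) names.add(n.trim());
            }
            return names;
        }
        TreeSet<String> implicit = new TreeSet<>(); // sorted only for determinism (assumption)
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(SLICE_PREFIX)) continue;
            String rest = key.substring(SLICE_PREFIX.length());
            int dot = rest.indexOf('.');
            if (dot > 0) implicit.add(rest.substring(0, dot));
        }
        names.addAll(implicit);
        return names;
    }

    // Value of property "abc" for one slice: openjpa.slice.XYZ.abc, falling back to openjpa.abc.
    static String sliceProperty(Properties props, String slice, String abc) {
        String common = props.getProperty("openjpa." + abc);
        return props.getProperty(SLICE_PREFIX + slice + "." + abc, common);
    }

    // Master slice: openjpa.slice.Master if set, otherwise the first slice name.
    static String master(Properties props) {
        List<String> names = sliceNames(props);
        return props.getProperty(SLICE_PREFIX + "Master", names.isEmpty() ? null : names.get(0));
    }
}
```

Applied to the persistence.xml shown earlier, slice Three resolves `ConnectionDriverName` to the DB2 driver while One and Two inherit the common MySQL driver.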



How to distribute data across slices?

User Application:

EntityManager em = …;
em.getTransaction().begin();
Person person = new Person();
person.setName("John");
person.setAge(42);
Address addr = new Address();
addr.setCity("New York");
person.setAddress(addr);
em.persist(person);
em.getTransaction().commit();

Data Distribution Policy:

import java.util.List;
import org.apache.openjpa.slice.DistributionPolicy;

public class MyDistributionPolicy implements DistributionPolicy {
    public String distribute(Object pc, List<String> slices, Object ctx) {
        // Assumes every root instance is a Person: older than 40 goes to the first slice, others to the second
        return ((Person) pc).getAge() > 40 ? slices.get(0) : slices.get(1);
    }
}

Domain Classes:

@Entity
public class Person {
    @Id @GeneratedValue private long id;
    private String name;
    private int age;
    @OneToOne(cascade=CascadeType.ALL) private Address address;
    // getters and setters omitted
}

@Entity
public class Address {
    @Id @GeneratedValue private long id;
    private String city;
    // getters and setters omitted
}



Distribution Policy

public interface DistributionPolicy {
    /**
     * Gets the name of the slice where the given newly persistent
     * instance will be stored.
     *
     * @param pc The newly persistent or to-be-merged object.
     * @param slices name of the configured slices.
     * @param context persistence context managing the given instance.
     *
     * @return identifier of the slice. This name must match one of the
     * configured slice names.
     * @see DistributedConfiguration#getSliceNames()
     */
    String distribute(Object pc, List<String> slices, Object context);
}

Slice will call this method while persisting or merging a root instance. The instance and its closure will be stored in the returned slice.
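Besides an age-based rule, a common choice is to hash a natural partition key, such as the state in the "Customers by State" example. The sketch below declares a local copy of the `DistributionPolicy` signature so it compiles without OpenJPA (a real policy would implement OpenJPA's interface); `Customer` and `StateHashPolicy` are hypothetical. Because the slice is derived only from the key, the same state always lands in the same slice.

```java
import java.util.Arrays;
import java.util.List;

// Local copy of the signature shown above, so this sketch compiles on its own.
interface DistributionPolicy {
    String distribute(Object pc, List<String> slices, Object context);
}

// Hypothetical root entity carrying a natural partition key.
final class Customer {
    final String state;
    Customer(String state) { this.state = state; }
}

// Hash partitioning: map the key's hash onto one of the configured slice names.
final class StateHashPolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object context) {
        if (!(pc instanceof Customer)) {
            return slices.get(0); // assumption: other root types go to the first slice
        }
        String key = ((Customer) pc).state;
        int index = Math.floorMod(key.hashCode(), slices.size()); // never negative
        return slices.get(index);
    }
}
```

One design caveat: with modulo hashing, changing the number of slices remaps most keys, which is exactly the kind of redistribution listed under Future Work.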



Collocation Constraint

All instances reachable from a root instance, at the time of persist(), are stored in the same slice

– Because Slice cannot join across databases

Compliant domain models are referred to as Constrained Tree Schemas

– Customer has Orders has LineItems

– http://www.devwebsphere.com/devwebsphere/2008/01/constrained-tre.html

Figure: Customer (1) → (0+) Order (1) → (1+) LineItem
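The collocation rule amounts to "walk everything reachable from the root and put it where the root goes". The sketch below shows that walk over a hypothetical `Node` graph (not OpenJPA types); it is an illustration of the rule, not a claim about how Slice traverses object graphs internally. An identity map tracks visited nodes, so back-references such as Order → Customer do not loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical graph node standing in for an entity and its relations.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
    Node link(Node other) { refs.add(other); return this; }
}

final class Collocation {
    // Breadth-first walk from the root; every reachable node gets the root's slice.
    static Map<Node, String> assign(Node root, String slice) {
        Map<Node, String> placement = new IdentityHashMap<>();
        Deque<Node> todo = new ArrayDeque<>();
        todo.add(root);
        while (!todo.isEmpty()) {
            Node n = todo.poll();
            if (placement.containsKey(n)) continue; // already placed (handles cycles)
            placement.put(n, slice);
            todo.addAll(n.refs);
        }
        return placement;
    }
}
```

If an Employee in one department's slice referenced a shared Country, this walk would drag Country into that slice too, which is the problem the next slide raises.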



What if schema is not a Constrained Tree Schema?

Figure: Company, Department, Employee, Address and Country

• Partition into databases per Department

• Tree Schema Constraint is violated

• In which database should Company and Country reside?



Replicate Master Data across slices

Annotate Company and Country as @Replicated

By default, @Replicated entities are stored in all slices

– or implement ReplicationPolicy

@Entity
@org.apache.openjpa.persistence.Replicated
public class Company {..}



Replication Policy

public interface ReplicationPolicy {
    /**
     * Gets the name of the slices where the given newly persistent
     * instance will be replicated.
     *
     * @param pc The newly persistent or to-be-merged object.
     * @param slices name of the configured slices.
     * @param context persistence context managing the given instance.
     *
     * @return identifier(s) of the slice. Each name must match one of the
     * configured slice names.
     * @see DistributedConfiguration#getSliceNames()
     */
    String[] replicate(Object pc, List<String> slices, Object context);
}

Slice will call this method while persisting any @Replicated instance.
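Two replication policies can be sketched against a local copy of the `ReplicationPolicy` signature (so the code compiles without OpenJPA). `AllSlices` mirrors the default behavior described above, storing the instance in every slice; `AllSlicesExcept` is a hypothetical variant that skips some slices, for instance ones that never need the master data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Local copy of the signature shown above.
interface ReplicationPolicy {
    String[] replicate(Object pc, List<String> slices, Object context);
}

// Replicate to every configured slice (the default behavior for @Replicated).
final class AllSlices implements ReplicationPolicy {
    @Override
    public String[] replicate(Object pc, List<String> slices, Object context) {
        return slices.toArray(new String[0]);
    }
}

// Hypothetical variant: replicate everywhere except an excluded set of slices.
final class AllSlicesExcept implements ReplicationPolicy {
    private final Set<String> excluded;

    AllSlicesExcept(String... excluded) { this.excluded = new HashSet<>(Arrays.asList(excluded)); }

    @Override
    public String[] replicate(Object pc, List<String> slices, Object context) {
        List<String> targets = new ArrayList<>();
        for (String s : slices) {
            if (!excluded.contains(s)) targets.add(s); // keep configured order
        }
        return targets.toArray(new String[0]);
    }
}
```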



Distributed Query

Each query is executed across all slices in parallel

Query response time is therefore bounded by the size of the largest partition, not by the size of the entire dataset.



Distributed Query

Results from individual slices are appended

Figure: Employee rows (NAME AGE JOIN_YEAR) spread over three slices (slice1, slice2, slice3):

– ROB 22 2008, LEUNG 37 2005, BILL 29 2001

– HARI 31 2002, SHIVA 35 1999, JOSE 41 1987

– JOHN 35 2001, MARY 24 2007, SANDRA 43 1975

Appended result for age < 30: MARY 24 2007, BILL 29 2001, ROB 22 2008

List result = em.createQuery("SELECT e FROM Employee e WHERE e.age < 30").getResultList();
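The parallel-then-append behavior can be simulated with a thread pool, one task per slice. This is a minimal sketch over in-memory lists, not Slice's executor code; `Emp` and `ParallelAppend` are hypothetical names. The appended order follows the order in which slices are listed, which is why the figure's result order (MARY first) can differ from the order below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical row of the Employee table used on these slides.
final class Emp {
    final String name;
    final int age;
    final int joinYear;
    Emp(String name, int age, int joinYear) { this.name = name; this.age = age; this.joinYear = joinYear; }
    @Override public String toString() { return name + " " + age + " " + joinYear; }
}

final class ParallelAppend {
    // Run the same filter on every slice concurrently, then append results in slice order.
    static List<Emp> query(List<List<Emp>> slices, Predicate<Emp> where) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, slices.size()));
        try {
            List<Future<List<Emp>>> futures = new ArrayList<>();
            for (List<Emp> slice : slices) {
                // Each task plays the role of one database executing the query.
                futures.add(pool.submit(() -> {
                    List<Emp> hits = new ArrayList<>();
                    for (Emp e : slice) {
                        if (where.test(e)) hits.add(e);
                    }
                    return hits;
                }));
            }
            List<Emp> result = new ArrayList<>();
            for (Future<List<Emp>> f : futures) {
                try {
                    result.addAll(f.get()); // no global ordering: plain append
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a slice", ex);
                } catch (ExecutionException ex) {
                    throw new IllegalStateException("a slice query failed", ex.getCause());
                }
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```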



Distributed Query (Sorting)

Results from individual slices are sorted across all slices for ORDER BY queries

Figure: Employee rows (NAME AGE JOIN_YEAR) spread over three slices (slice1, slice2, slice3):

– ROB 22 2008, LEUNG 37 2005, BILL 29 2001

– HARI 31 2002, SHIVA 35 1999, JOSE 41 1987

– JOHN 35 2001, MARY 24 2007, SANDRA 43 1975

Appended per-slice results for age < 30: MARY 24 2007, BILL 29 2001, ROB 22 2008

Sorted by name across all slices: BILL 29 2001, MARY 24 2007, ROB 22 2008

List result = em.createQuery("SELECT e FROM Employee e WHERE e.age < 30 ORDER BY e.name").getResultList();
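If each slice has already applied the ORDER BY, the per-slice lists only need a k-way merge rather than a full re-sort. The sketch below merges them with a `PriorityQueue` holding one cursor per slice; `SortedMerge` is a hypothetical name, and whether Slice merges this way or simply re-sorts the appended rows is not claimed here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class SortedMerge {
    // Inputs: one list per slice, each already sorted by `order`.
    // Output: a single list sorted by `order` across all slices.
    static <T> List<T> merge(List<List<T>> sortedPerSlice, Comparator<? super T> order) {
        // Heap entries are cursors {sliceIndex, position}, ordered by the row they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<int[]>(
            (a, b) -> order.compare(sortedPerSlice.get(a[0]).get(a[1]),
                                    sortedPerSlice.get(b[0]).get(b[1])));
        for (int s = 0; s < sortedPerSlice.size(); s++) {
            if (!sortedPerSlice.get(s).isEmpty()) heap.add(new int[] {s, 0});
        }
        List<T> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();                  // smallest remaining row
            List<T> slice = sortedPerSlice.get(top[0]);
            merged.add(slice.get(top[1]));
            if (top[1] + 1 < slice.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance that slice's cursor
            }
        }
        return merged;
    }
}
```

For the query above, the per-slice name lists are (BILL, ROB), (MARY) and an empty list, and the merge yields BILL, MARY, ROB.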



Distributed Top-N Query

The top-N results from each slice are merged (respecting the ordering, if any) for queries limited by setMaxResults()

Figure: Employee rows (NAME AGE JOIN_YEAR) spread over three slices (slice1, slice2, slice3):

– ROB 22 2008, LEUNG 37 2005, BILL 29 2001

– HARI 31 2002, SHIVA 35 1999, JOSE 41 1987

– JOHN 35 2001, MARY 24 2007, SANDRA 43 1975

Per-slice top 2 by age: ROB 22 2008, BILL 29 2001; MARY 24 2007, JOHN 35 2001; HARI 31 2002, SHIVA 35 1999

Merged top 2: ROB 22 2008, MARY 24 2007

List result = em.createQuery("SELECT e FROM Employee e ORDER BY e.age").setMaxResults(2).getResultList();
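The top-N merge works because each slice's own top N already contains every row from that slice that could appear in the global top N, so the coordinator only re-sorts at most (number of slices × N) candidates. A minimal sketch over in-memory lists, with `TopN` as a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TopN {
    // What one slice returns for ORDER BY ... with setMaxResults(n).
    static <T> List<T> sliceTopN(List<T> rows, Comparator<? super T> order, int n) {
        List<T> sorted = new ArrayList<>(rows);
        sorted.sort(order);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Collect each slice's top n, then apply the same ORDER BY and limit once more.
    static <T> List<T> globalTopN(List<List<T>> slices, Comparator<? super T> order, int n) {
        List<T> candidates = new ArrayList<>();
        for (List<T> slice : slices) {
            candidates.addAll(sliceTopN(slice, order, n));
        }
        return sliceTopN(candidates, order, n);
    }
}
```

With the ages from the figure, the slices contribute (22, 29), (31, 35) and (24, 35), and the global top 2 is 22 and 24, i.e. ROB and MARY.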



Distributed Top-N Query

Top-N results from individual slices are appended for queries limited by setMaxResults() that have no ORDER BY clause.

Figure: Employee rows (NAME AGE JOIN_YEAR) spread over three slices (slice1, slice2, slice3):

– ROB 22 2008, LEUNG 37 2005, BILL 29 2001

– HARI 31 2002, SHIVA 35 1999, JOSE 41 1987

– JOHN 35 2001, MARY 24 2007, SANDRA 43 1975

Per-slice results (2 rows each): ROB 22 2008, BILL 29 2001; MARY 24 2007, JOHN 35 2001; HARI 31 2002, SHIVA 35 1999

Result: ROB 22 2008, MARY 24 2007

List result = em.createQuery("SELECT e FROM Employee e").setMaxResults(2).getResultList();



Targeted Query

Query and find() can be targeted to a subset of slices by hints

Figure: Employee rows (NAME AGE JOIN_YEAR) spread over three slices (slice1, slice2, slice3):

– ROB 22 2008, LEUNG 37 2005, BILL 29 2001

– HARI 31 2002, SHIVA 35 1999, JOSE 41 1987

– JOHN 35 2001, MARY 24 2007, SANDRA 43 1975

Only the two targeted slices are queried and their results appended: SANDRA 43 1975, JOHN 35 2001, JOSE 41 1987, SHIVA 35 1999 (LEUNG 37 also matches age > 34 but lives in the untargeted slice, so it is not returned)

List result = em.createQuery("SELECT e FROM Employee e WHERE e.age > 34")
                .setHint("openjpa.slice.Targets", "slice1,slice3")
                .getResultList();



Aggregate Query

Aggregate results are supported when the aggregate operation is commutative to partition

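Figure: Employee rows (NAME AGE JOIN_YEAR) spread over three slices (slice1, slice2, slice3):

– ROB 22 2008, LEUNG 37 2005, BILL 29 2001

– HARI 31 2002, SHIVA 35 1999, JOSE 41 1987

– JOHN 35 2001, MARY 24 2007, SANDRA 43 1975

Per-slice SUM(e.age) for age > 30: 37 (LEUNG), 107 (HARI, SHIVA, JOSE), 78 (JOHN, SANDRA)

Combined: 37 + 107 + 78 = 222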

Number sum = (Number) em.createQuery("SELECT SUM(e.age) FROM Employee e WHERE e.age > 30").getSingleResult();



Aggregate Query

Aggregate results are not supported when the aggregate operation is not commutative to partition

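Figure: Employee rows (NAME AGE JOIN_YEAR) spread over three slices (slice1, slice2, slice3):

– ROB 22 2008, LEUNG 37 2005, BILL 29 2001

– HARI 31 2002, SHIVA 35 1999, JOSE 41 1987

– JOHN 35 2001, MARY 24 2007, SANDRA 43 1975

Per-slice AVG(e.age) for age > 30: 37.0 (LEUNG), 35.7 (HARI, SHIVA, JOSE), 39.0 (JOHN, SANDRA)

Averaging the per-slice averages gives 37.2: WRONG! The true average over the six matching rows is 222 / 6 = 37.0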

Number avg = (Number) em.createQuery("SELECT AVG(e.age) FROM Employee e WHERE e.age > 30").getSingleResult();



Distributed Aggregate Query Limitations

Commutativity

– ability to change the order of operations without changing the end result.

SUM() and MAX() are commutative to partition

– SUM(D) = SUM(SUM(D1), SUM(D2), SUM(D3)), where Partition(D) = {D1, D2, D3}

But AVG() is not

– AVG(D) != AVG(AVG(D1), AVG(D2), AVG(D3))
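The difference can be checked numerically with the ages over 30 from the aggregate figures: {37}, {31, 35, 41} and {35, 43}. `PartialAggregates` is a hypothetical helper. SUM and MAX combine correctly from per-slice partials; averaging per-slice averages does not, because slices hold different row counts. Shipping a (sum, count) pair from each slice and dividing once is a standard way to recover AVG; this sketch does not claim that Slice does so.

```java
final class PartialAggregates {
    // SUM of per-slice SUMs equals the SUM over all rows.
    static long sumOfSums(int[][] slices) {
        long total = 0;
        for (int[] slice : slices) {
            long partial = 0;
            for (int v : slice) partial += v;
            total += partial;
        }
        return total;
    }

    // MAX of per-slice MAXes equals the MAX over all rows (empty slices are skipped).
    static int maxOfMaxes(int[][] slices) {
        int result = Integer.MIN_VALUE;
        for (int[] slice : slices) {
            if (slice.length == 0) continue;
            int partial = Integer.MIN_VALUE;
            for (int v : slice) partial = Math.max(partial, v);
            result = Math.max(result, partial);
        }
        return result;
    }

    // The naive combination: average of per-slice averages (wrong for unequal slice sizes).
    static double avgOfAvgs(int[][] slices) {
        double sumOfAvgs = 0;
        int nonEmpty = 0;
        for (int[] slice : slices) {
            if (slice.length == 0) continue;
            long s = 0;
            for (int v : slice) s += v;
            sumOfAvgs += (double) s / slice.length;
            nonEmpty++;
        }
        return sumOfAvgs / nonEmpty;
    }

    // Correct combination: add up per-slice (sum, count) pairs, then divide once.
    static double avgFromPartials(int[][] slices) {
        long sum = 0;
        long count = 0;
        for (int[] slice : slices) {
            for (int v : slice) sum += v;
            count += slice.length;
        }
        return (double) sum / count;
    }
}
```

On this data the sum is 222, the maximum is 43, the (sum, count) average is 222 / 6 = 37.0, and the average of averages is about 37.22.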



Query for Replicated Entities

Replicated instances are detected and queried in a single slice

Number count = (Number) em.createQuery("SELECT COUNT(c) FROM Country c")
                           .getSingleResult();

Figure: slice1, slice2 and slice3 each hold the same Country rows (CODE, POPULATION): US 300M, GERMANY 82M, INDIA 1200M. The query runs in a single slice and returns 3, not 9.

@Entity
@Replicated
public class Country {..}



Updates

Slice remembers the original slice of each instance.

– SlicePersistence.getSlice(Object pc) returns the logical slice name for the given argument.

If an instance is modified, the update occurs in its original slice.

Replicated instances are updated in many slices

– SlicePersistence.isReplicated(Object pc)

Commit will not be invoked for a slice if no update exists for that slice
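The bookkeeping described above can be sketched as a map from each managed instance to its origin slice plus a set of slices that received changes. `SliceTracker` is hypothetical and not Slice's API; `sliceOf()` only plays the role that `SlicePersistence.getSlice(pc)` plays in the text. Slices absent from the dirty set are simply never committed.

```java
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class SliceTracker {
    private final Map<Object, String> origin = new IdentityHashMap<>(); // instance -> slice it came from
    private final Set<String> dirty = new LinkedHashSet<>();            // slices with pending updates

    void loaded(Object pc, String slice) { origin.put(pc, slice); }

    // Analogous in role to SlicePersistence.getSlice(pc) in the text above.
    String sliceOf(Object pc) { return origin.get(pc); }

    // An update is routed to the instance's original slice.
    void modified(Object pc) {
        String slice = origin.get(pc);
        if (slice == null) throw new IllegalStateException("instance is not managed");
        dirty.add(slice);
    }

    // Only these slices need a commit; untouched slices are skipped.
    Set<String> slicesToCommit() { return new LinkedHashSet<>(dirty); }
}
```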



Database and Transaction

Slices can be in heterogeneous database platforms

– Each slice can use its own JDBC driver

A Master slice is identified for sequence generation

Commits are executed in parallel, with no guarantee of atomicity across slices

If all JDBC drivers are XA-compliant, two-phase commit is available

– The individual slice transactions are not seen by the application server's Transaction Manager.






Core Architectural constructs of OpenJPA

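Figure:

• EntityManagerFactory creates EntityManager

• EntityManagerFactory delegates to BrokerFactory, which creates Broker

• EntityManager delegates to Broker

• Broker works through a StoreManager; JDBCStoreManager is the implementation that uses the JDBC API

• The stack is configured by OpenJPAConfiguration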



Distributed Template Design Pattern

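import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for T: Java cannot implement a type parameter directly
interface Executable { boolean execute(String arg0); }

public class DistributedTemplate implements Executable, Iterable<Executable> {
    protected List<Executable> _delegates = new ArrayList<Executable>();

    public boolean execute(String arg0) {
        boolean ret = true;
        // Non-short-circuit & so that every delegate executes, even after a failure
        for (Executable t : this) ret = t.execute(arg0) & ret;
        return ret;
    }

    public Iterator<Executable> iterator() { return _delegates.iterator(); }
}

• Distributed Template Design Pattern is the main metaphor, applied to

• JDBC artifacts (Statement, ResultSet)

• major OpenJPA artifacts such as StoreManager and Query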
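Java cannot write `implements T` for a type parameter, but a dynamic proxy (`java.lang.reflect.Proxy`) gives the same effect at runtime: one object that implements any interface and fans each call out to all delegates. This is a sketch of the pattern, not a claim about how Slice implements it; `Store` and `DistributedProxy` are hypothetical, and for brevity only `boolean` and `void` methods are forwarded (boolean results are AND-ed, and every delegate runs).

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface standing in for a JDBC or OpenJPA artifact.
interface Store {
    boolean execute(String sql);
}

final class DistributedProxy {
    // Returns one object implementing `iface` that forwards every call to all delegates.
    @SuppressWarnings("unchecked")
    static <T> T of(Class<T> iface, List<? extends T> delegates) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
            (proxy, method, args) -> {
                Class<?> rt = method.getReturnType();
                if (rt != boolean.class && rt != void.class) {
                    // Includes Object methods such as hashCode(); out of scope for this sketch.
                    throw new UnsupportedOperationException("only boolean/void methods: " + method.getName());
                }
                boolean all = true;
                for (T d : delegates) {
                    Object r = invoke(method, d, args);
                    if (r instanceof Boolean) all &= (Boolean) r; // every delegate runs
                }
                return rt == boolean.class ? all : null;
            });
    }

    private static Object invoke(Method m, Object target, Object[] args) throws Throwable {
        try {
            return m.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the delegate's own exception
        }
    }
}
```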



Slice extends OpenJPA by Distributed Template

Figure:

• EntityManagerFactory and EntityManager delegate to BrokerFactory and Broker, which create each other as in plain OpenJPA

• Broker delegates to a DistributedStoreManager, which applies the Distributed Template Pattern over several JDBCStoreManagers, each using the JDBC API

• DistributedConfiguration configures the stack

• Not aware of partitioned databases






Future Work: Evolving data distribution

Gradual Redistribution

– Complete migration from one slice to another is currently supported

– Gradual migration of data from one slice to another

• Read from one slice, write to another



Future Work: Courage under Fire

Graceful degradation

– can ignore unreachable slices at bootstrap

– can cope with unreachable slices at runtime

– cannot reconnect dynamically



Future Work: Heterogeneity

Heterogeneous Schema

– currently assumes each slice has an identical schema

– goal: relax this assumption

Join relation across slices

– this is a hard problem






Thank You!
