
Advanced Database Topics

Copyright © Ellis Cohen 2002-2005

Object Persistence

These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

For more information on how you may use them, please see http://www.openlineconsult.com/db


Lecture Topics

Handling Result Collections
Client Access to Server-Side Objects
Client-Side Object Caching
Modifying Persistent OO Data
Cache Management for OO Data
Deleting and Inserting Persistent Objects
Java Data Objects (JDO)
JDO Query Language (JDOQL)
Capability-Based Access Control


Handling Result Collections


Using Result Collections

OQL queries return collections of references to objects.

Client programs will want to iterate through the collection and access fields of the referenced object.

This suggests that we may want to cache the objects at the client.


Embedded OQL Programming

Imagine a PL with embedded OQL

var richEmps set<Employee> :=
  SELECT e FROM e IN emps WHERE e.sal > 3000;

for e in richEmps loop
  if e.job = "MANAGER" then
    pl( e.name || " " || e.sal );
  end if;
end loop;

Creates a temporary collection to hold references to rich Employees


Result Management Alternatives

1) The OQL result collection richEmps is left on the Database Server as a temporary collection (deleted when the session ends)

2) richEmps is returned to the client (it contains OIDs for the rich employees which are stored at the server)

3) richEmps is returned to the client, and the rich employees are cached at the client as well.

Which is better and why?
Consider what happens when the code iterates through richEmps.
How does this compare to RDB Result Sets?


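Storing OQL Result Collections

[Figure: the DB Server holds the collection emps (emp1 through emp5). The variable richEmps is shown in three placements: (1) on the DB Server, (2) at the client, holding OID references to objects on the server, (3) at the client, holding pointer references to copies of emp1 and emp2 in the Client-side Object Cache.]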


Result Collection Tradeoffs

• If the result collection and every object in it are copied to a client-side object cache
– A significant amount of copying may need to be done as a result of the query
– If the program runs remotely, then it all needs to be copied across the wire
– Queries involving result collections must be disallowed or require a query engine that can integrate data stored both in the DB server and in the client cache
– Wasteful if NOT every object in the collection is accessed

• If the objects in the result collection are not cached at the client
– Every access to a field in an object in the result collection (e.job, e.name, e.sal) requires an interaction with the DB server
– If the program runs remotely, then every access goes across the wire
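The tradeoff can be made concrete by counting client/server interactions. The following is only a sketch: CountingServer, ResultCollectionTradeoff and their methods are hypothetical names invented for illustration, and a real OODB client would batch and cache differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a DB server that counts client/server interactions.
class CountingServer {
    private final Map<Integer, String[]> emps = new HashMap<>(); // oid -> {name, job}
    int roundTrips = 0;

    void store(int oid, String name, String job) {
        emps.put(oid, new String[] { name, job });
    }

    // One interaction per field access (the result collection holds only OIDs).
    String readField(int oid, int fieldIndex) {
        roundTrips++;
        return emps.get(oid)[fieldIndex];
    }

    // One interaction that copies every requested object to the client.
    Map<Integer, String[]> fetchAll(List<Integer> oids) {
        roundTrips++;
        Map<Integer, String[]> copies = new HashMap<>();
        for (int oid : oids) {
            copies.put(oid, emps.get(oid).clone());
        }
        return copies;
    }
}

class ResultCollectionTradeoff {
    static final int NAME = 0, JOB = 1;

    // Objects not cached: every e.job / e.name access goes to the server.
    static List<String> managersViaOids(CountingServer server, List<Integer> richEmps) {
        List<String> names = new ArrayList<>();
        for (int oid : richEmps) {
            if ("MANAGER".equals(server.readField(oid, JOB))) {
                names.add(server.readField(oid, NAME));
            }
        }
        return names;
    }

    // Objects cached: copy everything once, then iterate locally.
    static List<String> managersViaCache(CountingServer server, List<Integer> richEmps) {
        Map<Integer, String[]> cache = server.fetchAll(richEmps);
        List<String> names = new ArrayList<>();
        for (int oid : richEmps) {
            String[] e = cache.get(oid);
            if ("MANAGER".equals(e[JOB])) {
                names.add(e[NAME]);
            }
        }
        return names;
    }
}
```

With three rich employees of whom two are managers, the OID-only loop costs five interactions, while the cached loop costs one interaction but copies all three objects, including the one whose name is never read.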


Query Integration

Imagine a PL with embedded OQL (used either at the server or a remote client)

var richNyEmps set<Employee> :=
  SELECT e FROM e IN richEmps
  WHERE e.dept.loc = "New York";

for e in richNyEmps loop
  if e.job = "MANAGER" then
    pl( e.name || " " || e.sal );
  end if;
end loop;

Implies client cache manager has a query engine that can process queries that access both persistent and temporary collections


Java OQL Programming

OqlConnection conn = …;
OqlQuery query = conn.createQuery();

String oqlstr = "SELECT e FROM e IN emps WHERE e.sal > 3000";

OqlCollection richEmps = query.getCollection( oqlstr );

oqlstr = "SELECT e FROM e IN &empset " +
         "WHERE e.dept.loc = 'New York'";

query.prepare( oqlstr );
query.bind( "empset", richEmps );
OqlCollection richNyEmps = query.getCollection();

Iterator iter = richNyEmps.iterator();
while (iter.hasNext()) {
  Employee e = (Employee) iter.next();
  if (e.job.equals( "MANAGER" ))
    pl( e.name + " " + e.sal );
}


Client Access to Server-Side Objects


Remote References

If objects, including those referenced by query results, are never cached at the client

– local program variables must reference objects stored on the DB server

– for client-side programs, these are remote references, often referred to as proxies.


Distributed Object Proxies

Each client has proxies (holding OIDs) to objects kept only on the server. Every access to an object must be passed back to the server.

[Figure: OO Client 1 and OO Client 2 hold proxies to objects kept in the Memory and on the Disk of the OO DB Server]


Concurrency Control for Proxies

Lock-Based
Objects locked at the server

Cache-Based
The server must maintain a cache for each client holding versions of data read/written by that client


Global vs Local OIDs

Globally unique OIDs
OIDs are globally unique: they usually encode IP address + locally unique id of database
Enables object mobility (at cost of object location service or forwarding)
Facilitates replication & remote references

Locally managed OIDs
OIDs are only unique to the database
OID might simply be the virtual memory address (especially if objects are versioned and are immutable once committed)
Remote references must add IP address + database id to the OID


Persistent vs Active OIDs

Persistent OIDs
Names object whether
• active (in memory) or
• passive (only in secondary storage).
Can be used as persistent references

Active OIDs
Object given OID only when it is activated.
Persistent references to object use a soft naming scheme (e.g. monikers)
A name is bound to an OID only when the object is activated
Useful for distributed services


Remote Persistent Objects

CORBA
Uses proxies and globally unique persistent OIDs

DCOM
Uses proxies and globally unique active OIDs.
Monikers used for persistent naming

EJB (Entity Beans)
Uses proxies and locally managed persistent OIDs, based on primary keys

JDO
Uses persistent OIDs
May be either globally unique or locally managed
Doesn't support proxies; uses distributed local caching (would probably be useful to specify that some objects stay only on the server)


Client-Side Object Caching


Client-Side Object Cache

A client-side object cache is
– a client-side cache (i.e. a client-side lazy replica of the server DB state)
– where the data items held in the cache are objects (with their OIDs) instead of tuples (with their ROWIDs)

Unlike an RDB client-side cache
– queries do NOT automatically load all returned objects into the cache
– objects are loaded into the cache by navigation as well as queries
– applications access and modify the items in the cache directly!


Object Loading Approaches

Explicit
Client decides when an object is explicitly loaded (a single object, all objects in a collection, etc.)
The system will raise an exception when following a reference to an unloaded object

Transparent
(on Access) When the client needs the content of an unloaded object, it will automatically load the object
(on Reference) In addition, when the client stores a reference to an object, it will make space for the object (hollow objects)

Pre-Fetching
In either case, when an object is loaded, the system may also load objects clustered with it (e.g. other objects on the same page as the object being loaded and/or other objects it references)
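The difference between explicit loading and transparent loading on access can be sketched with a small cache class. LoadingCache and its members are hypothetical names for illustration only; pre-fetching and hollow objects are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side cache illustrating explicit vs transparent loading.
class LoadingCache {
    private final Map<String, String> server;                    // oid -> object state (stand-in for the DB server)
    private final Map<String, String> loaded = new HashMap<>();  // objects currently in the cache
    private final boolean transparent;
    int serverLoads = 0;

    LoadingCache(Map<String, String> server, boolean transparent) {
        this.server = server;
        this.transparent = transparent;
    }

    // Explicit loading: the client asks for the object by OID.
    void load(String oid) {
        if (!loaded.containsKey(oid)) {
            loaded.put(oid, server.get(oid));
            serverLoads++;
        }
    }

    // Following a reference to the object identified by oid.
    String deref(String oid) {
        if (!loaded.containsKey(oid)) {
            if (!transparent) {
                // Explicit mode: following a reference to an unloaded object is an error.
                throw new IllegalStateException("object " + oid + " is not loaded");
            }
            load(oid); // transparent loading on access
        }
        return loaded.get(oid);
    }
}
```

In explicit mode the client must call load before deref; in transparent mode the first deref triggers the load and later ones are served from the cache.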


Transparent Loading on Access

richEmps := SELECT …
anEmp := richEmps.pick()
theDept := anEmp.dept

[Figure: emps (emp1 through emp5) on the server; richEmps, anEmp (emp2) and theDept (dept4) in the client cache as each statement executes]


Representing Object References

Database Objects
Database objects are referenced by OIDs
OIDs (directly or indirectly) locate the object in the DB: [DB identity +] page + slot in page

Program Objects
Objects in memory are generally referenced by pointers, which generally hold the memory address of the object

References may indirect through another object

References to objects in memory can take much less space than OIDs!


The Reference Representation Problem

Suppose both objects A and B are persistently stored in the database, and A holds a reference to B, represented by an OID.

If both A and B are loaded into a client's object cache, they both now have a memory address. Should A's reference to B remain an OID, or be swizzled to a pointer (directly or indirectly) to B?

[Figure: A holding an OID for B, versus A holding a Ptr to B]

If the reference is swizzled, and A is modified, then the ptr must be unswizzled back to an OID before writing A back to the DB.
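A minimal sketch of swizzling and unswizzling, assuming the referenced object is already in the cache when the reference is first followed. Ref, CachedObject and the method names are hypothetical, chosen only for this illustration.

```java
import java.util.Map;

// A cached object; 'oid' is its database identity.
class CachedObject {
    final String oid;
    CachedObject(String oid) { this.oid = oid; }
}

// A reference field that is stored as an OID and may be swizzled to a pointer.
class Ref {
    private String oid;            // database form of the reference
    private CachedObject pointer;  // in-memory form; null while unswizzled

    Ref(String oid) { this.oid = oid; }

    boolean isSwizzled() { return pointer != null; }

    // Swizzle on first dereference: replace the OID lookup by a direct pointer.
    CachedObject deref(Map<String, CachedObject> cache) {
        if (pointer == null) {
            pointer = cache.get(oid); // assumes the target is already cached
        }
        return pointer;
    }

    // Unswizzle before the holder is written back: recover the OID form.
    String unswizzle() {
        if (pointer != null) {
            oid = pointer.oid;
            pointer = null;
        }
        return oid;
    }
}
```

After the first deref, later dereferences skip the cache lookup; unswizzle restores the form that can be stored in the database.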


Memory-Based Object Representation Approaches

Goal: efficient dereferencing & traversal

Indirect Approach
Memory reference points to intermediate object which points to referenced object
Extra overhead; easy invalidation

Direct Approach
Memory reference points directly to referenced object. Uses same model as ordinary programming language objects!

Page-Translation Approach
Depends on Page Translation table (usually with hardware support) to translate reference to address of object
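The "easy invalidation" property of the indirect approach can be sketched with an object table whose rows act as the intermediate objects. ObjectTable, TableRow and the server map are hypothetical stand-ins for illustration, not a real OODB interface.

```java
import java.util.HashMap;
import java.util.Map;

// One object-table row: the OID plus a pointer that is null when the object is not in memory.
class TableRow {
    final String oid;
    Object target; // null = not loaded, or invalidated
    TableRow(String oid) { this.oid = oid; }
}

class ObjectTable {
    private final Map<String, TableRow> rows = new HashMap<>();
    private final Map<String, Object> server; // stand-in for the DB server
    int loads = 0;

    ObjectTable(Map<String, Object> server) { this.server = server; }

    // References held by cached objects point at rows, never at the target itself.
    TableRow rowFor(String oid) {
        return rows.computeIfAbsent(oid, TableRow::new);
    }

    // Dereference through the row, (re)loading the target if necessary.
    Object deref(TableRow row) {
        if (row.target == null) {
            row.target = server.get(row.oid);
            loads++;
        }
        return row.target;
    }

    // Invalidation only clears the row, so no holder is left with a dangling pointer.
    void invalidate(String oid) {
        TableRow row = rows.get(oid);
        if (row != null) {
            row.target = null;
        }
    }
}
```

The cost is one extra hop on every dereference; the benefit is that invalidating an object touches a single row, however many cached objects refer to it.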


Other Representation Issues

Dereferencing
When we want to load an object given its OID, if it's
– In memory: how do we find it?
– Not in memory: what do we do once it's loaded?

Tracking Dirty Objects
Can the run-time system automatically keep track of objects which have been modified, so that the corresponding changes can be automatically or semi-automatically made to the database?

Managing Invalidation
We may want to selectively invalidate cached objects when they become stale or when transactions complete
– If we invalidate a cached object, and another cached object points to it, how do we prevent dangling pointers?

Each of the following approaches handles these issues differently


Oid Lookup Approach

OODB & Memory representation are identical

On loading anEmp from server: anEmp (30493, 'Joe Java') keeps the OID B709342A8… in its dept field, and the Object Table (columns: 128 bit oid, info, ptr) has a row for anEmp's OID A34F02397….

When anEmp.dept is dereferenced, aDept is loaded from the server (if not in memory)

[Figure: before and after the dereference; afterwards the Object Table also has a row for B709342A8… whose ptr locates aDept (30, SALES, NY)]


Deferred Indirect Approach

OODB & Memory representation are initially identical

anEmp.dept is swizzled to point to the object table row on first dereference, which points to aDept.

aDept is loaded if not already in memory

[Figure: before the dereference, anEmp (30493, 'Joe Java') holds the OID B709342A8… in dept and the Object Table has a row only for A34F02397…; afterwards dept holds a Ptr to obj table row, and the row for B709342A8… points to aDept (30, SALES, NY)]


Immediate Indirect Approach

On loading anEmp, anEmp.dept is immediately swizzled to point to an object table row. If aDept's OID is not already in the table, a row is created for it.

[Figure: on loading anEmp (30493, 'Joe Java'), the Object Table row for B709342A8… has a NULL ptr; on loading aDept, that row's ptr is set to point to aDept (30, SALES, NY)]


Deferred Direct Approach

OODB & Memory representation are initially identical

When anEmp.dept is dereferenced, the ptr is swizzled to point directly at object

[Figure: before the dereference, anEmp (30493, 'Joe Java') holds the OID B709342A8… in dept and the Object Table has a row only for A34F02397…; afterwards dept holds a Ptr to object aDept (30, SALES, NY), and the Object Table also has a row for B709342A8…]


Indirect/Direct Approach

On Loading anEmp

When anEmp.dept is dereferenced, the ptr is swizzled to point directly at object

[Figure: on loading anEmp (30493, 'Joe Java'), the Object Table has a row for B709342A8… with a NULL ptr; after the dereference, dept holds a Ptr to object aDept (30, SALES, NY) and the table row's ptr is set]


Hollow-Object Approach

On Loading anEmp

When anEmp.dept is dereferenced, aDept is pulled over from the server and used to fill in the hollow object

[Figure: on loading anEmp (30493, 'Joe Java'), dept holds a Ptr to obj that refers to a hollow object for aDept, and the Object Table has rows for A34F02397… and B709342A8…; after the dereference, the hollow object is filled in with aDept (30, SALES, NY)]


Hollow Object Loading (JDO)

Hollow objects are strictly more expensive (take more space) than other approaches; they are encouraged by languages that are type-safe (e.g. Java)

When a hollow object is created, its exact subclass must be known to create the correct type of hollow object

Hollow objects can't be created for arrays, since their size may change in the database before they're loaded: referenced arrays must be loaded immediately.

More generally, an object may want to indicate that an object it references should be loaded with it.


Cache Integration

In some cases, a server and client can share the same address space

– Embedded systems (the DB runs as a module of the client)

– Memory mapping (pages of the DB server can be explicitly shared with the client)

In this situation, the client's object cache can be integrated with the DB server's page cache. That is, clients can directly address objects in the server page cache.

– If multiple clients can map the same DB server page, then locking must be used (generally at the page level, possibly on individual objects)

Why can't optimistic concurrency be used?

All of the previous approaches can be used with either separate or integrated caches.

Page Translation is designed for integrated caches.


Page-Translation Approach

When loading anEmp, pull over entire page from the OODB server

When anEmp.dept is dereferenced, entire page containing aDept must be pulled over from the server

Assumes DB page# can be used in a virtual address; more likely, needs to be swizzled to a virtual page #

[Figure: the Page Translation Table (columns: DB page#, info, page#) has a row A34F023 … 34 after anEmp's page is loaded, and a second row B709324 … 97 after the page containing aDept (30, SALES, NY) is loaded; anEmp (30493, 'Joe Java') keeps B709342A8… in dept]


Modifying Persistent OO Data


Perspectives on Modifying Persistent Object Data

Standard DB Perspective
How is data stored in an OODB modified?
How do we make sure that transactions satisfy the ACID properties?

OO Programming Perspective
How do we easily save and restore the programming state of OO programs?


Historic Approaches to Persistence of OO Program Data

Checkpointing
Takes lots of space
Only useful for program that did checkpoint

Serialization (Pickling)
Program controls what is saved
Serialized state may be able to be read by other programs

Persistent Object Store
Allows individual objects to be saved and reloaded in random order by multiple programs

Object Database
Persistent Object Store supporting querying and transactional semantics


Query & Update Approaches for Persistent Data

Result Set Approach
"Real" client-side cache is invisible to clients
Result set returned by query is transferred into the client's memory, where it is independent of any client-side cache
Persistent data (or data in the cache) must be modified via the commands INSERT/DELETE/UPDATE
Most common approach used with RDBs

Visible Client-Side Cache Approach
Queries (and navigation) cache results in a client-side cache, which is integrated with the user's address space.
Persistent data modifications result from writing back data modified in the client's address space
Approach used with OODBs and with Object-Relational Mapping


OODB Command-Based Update

If the Standard RDB Perspective / Result Set Approach were used with OODBs, here's how a set of employees might be updated

…

UPDATE e IN emps
  SET e.sal := e.sal * 1.1
  WHERE e.job = "MANAGER"
    AND e IN richEmps

COMMIT;

This is NOT the approach that is commonly used.


OOPL Cache-Based Approach

for e in richEmps loop
  if e.job = "MANAGER" then
    e.sal := e.sal * 1.1;
    e.markDirty();
  end if;
end loop;

COMMIT;

On COMMIT, all objects marked as dirty are written back to the database


Explicit Persistence

In explicit persistence, the client explicitly notifies the runtime system

– obj.markDirty()
  • Indicates that a persistent object has been modified (needed for cache management as well as controlling DB update on commit)

– obj.updateWhenDirty( true/false )
  • Tells the system whether or not to persist any changes to the DB on commit if the object is modified (default is true)

– obj.update()
  • Persists object changes to the DB if object has been modified (used if updateWhenDirty is false)

If an object is already persistent, then after updating the locally cached version of the object, the updated contents of the object will generally need to be persisted back to the server.
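The three calls can be sketched as follows. This is an illustration under assumptions: PersistentEmp and ExplicitStore are hypothetical classes, the "database" is just a map, and only the method names markDirty, updateWhenDirty and update come from the slide.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical persistent object offering the three calls named above.
class PersistentEmp {
    final String oid;
    double sal;
    private boolean dirty = false;
    private boolean updateWhenDirty = true; // default is true
    private final ExplicitStore store;

    PersistentEmp(String oid, double sal, ExplicitStore store) {
        this.oid = oid;
        this.sal = sal;
        this.store = store;
        store.track(this);
    }

    void markDirty() { dirty = true; }
    void updateWhenDirty(boolean flag) { updateWhenDirty = flag; }
    boolean writesOnCommit() { return dirty && updateWhenDirty; }
    void clean() { dirty = false; }

    // Persist now if modified (needed when updateWhenDirty is false).
    void update() {
        if (dirty) {
            store.write(this);
            dirty = false;
        }
    }
}

// Hypothetical cache manager plus a map standing in for the DB server.
class ExplicitStore {
    final Map<String, Double> database = new HashMap<>();
    private final Set<PersistentEmp> cache = new LinkedHashSet<>();

    void track(PersistentEmp e) { cache.add(e); }
    void write(PersistentEmp e) { database.put(e.oid, e.sal); }

    // On COMMIT, write back every dirty object that has updateWhenDirty set.
    void commit() {
        for (PersistentEmp e : cache) {
            if (e.writesOnCommit()) {
                write(e);
                e.clean();
            }
        }
    }
}
```

Note the failure mode explicit persistence invites: an object that is modified but never marked dirty is silently left out of the commit.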


Transparent Persistence

In transparent persistence, the underlying system "automatically" detects when an object has been modified and automatically calls markDirty()

Still supports updateWhenDirty and update

If an object is already persistent, then after updating the locally cached version of the object, the updated contents of the object will generally need to be persisted back to the server.


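Program Compilation & Execution

[Figure: a Program (Java, C#, etc) is turned into ByteCode by program compilation. ByteCode is either turned into an Executable by bytecode compilation, which the Hardware executes, or is executed (interpreted) by a ByteCode Interpreter, which itself executes on the Hardware.]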


Transparent Modification Detection

The system must be enhanced to call markDirty when a (persistent) field in a persistent object is modified.

• Execution Enhancement
– Modify Firmware (for executables)
– Modify Bytecode Interpreter (for bytecodes)

• Compiler Enhancement
– Modify the program compiler or the bytecode compiler to add the call to markDirty

• Code Enhancement
– Use a separate tool that revises the program code, bytecode, or executable code to add the call to markDirty


Mutators for Transparent Persistence

Suppose all persistent objects are modified through mutator methods

Instead of
joe.sal := 1400

Use
joe.setSal( 1400 )

Either clients can be required to write code this way, or the code can be changed through compiler or code enhancement.

Advantage: The call to markDirty can simply be added during enhancement to all setXxx mutator methods

Note: Accessor methods [e.g. joe.getSal()] can be used in a similar way as the place to call the code that loads an object if necessary.
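A sketch of what a class might look like after such enhancement, written out by hand. DirtyTracker and EnhancedEmployee are hypothetical names; a real enhancer would generate the equivalent code rather than have the programmer write it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical state manager that records which objects were modified.
class DirtyTracker {
    final Set<Object> dirty = new HashSet<>();
    void markDirty(Object o) { dirty.add(o); }
}

// The field is private, so every modification must go through the mutator.
class EnhancedEmployee {
    private double sal;
    private final DirtyTracker tracker;

    EnhancedEmployee(double sal, DirtyTracker tracker) {
        this.sal = sal;
        this.tracker = tracker;
    }

    // Accessor: reading does not make the object dirty.
    double getSal() {
        return sal;
    }

    // Mutator: the markDirty call is what enhancement would add.
    void setSal(double newSal) {
        sal = newSal;
        tracker.markDirty(this);
    }
}
```

Because the client only ever writes joe.setSal( 1400 ), dirty tracking happens without any explicit markDirty call in client code.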


Transactional Persistence

On first access to a persistent object in a transaction
– Load data if not present (and if loading is transparent)
– Perhaps pull anyway if stale
– Mark as read, which obtains SHARE lock at server if lock-based

On first update of a persistent object in a transaction
– Mark as dirty, which obtains EXCLUSIVE lock at server if lock-based

On Commit
– If optimistic, do validation at server. [If validation fails, DB sends identity (or even contents) of all those objects which are stale]
– Provide content of written (i.e. dirty) objects to the server
– If lock-based, locks are released at the server


OODB Locking

OODB Locking is very similar to RDB Locking

• The Table/Row locking hierarchy in RDBs corresponds to the Extent/Object hierarchy in OODBs (and both use index locking)

• Non-extent collections just contain a group of object references. They need to be locked just like other objects when read/written, but do not form a hierarchy with the objects they reference.

• OODBs can also use a Page/Object hierarchy (lock the page instead of all objects on the page). Extent/Page/Object hierarchies are less common, since a page often contains objects from many extents.

• OODBs may use a Cluster/Object hierarchy, where a cluster is a group of objects which are used together. A Cluster/Page/Object hierarchy may be used if they are stored together.


Cache Management for OO Data


Cache Data Lifetimes

Transaction Lifetime
– The objects in a client's cache are cleared at the end of each transaction, i.e. every transaction starts with an empty cache

Session Lifetime
– Cached objects remain in the cache at the end of a transaction. At the start of a transaction, the cache may contain objects used in the client's previous transactions.

• Does require a (relatively simple) local undo mechanism if a transaction aborts


Object Timestamps for Session Lifetime Caches

DB Server Timestamp
Every object on the DB server has a timestamp: the time when the object was last updated at the server

Client Cache Timestamp
When an object is retrieved from the DB server, its server timestamp is retrieved as well, and stored in the client cache along with the object


Per-Object Metadata

For each object in the client's cache, the cache manager maintains

– the OID of the object
– whether the object is dirty
– whether to write the object on commit if it is dirty
– whether it was used during the current transaction
– [for a session lifetime cache] the object's timestamp (may be in the cache and/or kept at the server)


Managing Object Retrieval

An employee anEmp is in the client's object cache. anEmp contains a reference to a department, (represented by the department's OID).

empno 30493

ename 'Joe Java'

dept B709342A8…

anEmp

The client's program executes

myLoc := anEmp.dept.loc

Under what circumstances should anEmp.dept be retrieved from the DB server and be placed in the cache?


Object Retrieval Rules

If the OID in anEmp.dept doesn't identify an object already cached
– Read in anEmp.dept from the DB server & place it in the cache

If the OID in anEmp.dept identifies an object that is already cached but not yet used in this transaction
– Replace the cached object with the latest version read from the DB server
– DB server can instead indicate that the version in the client cache is the latest version
– [If using optimistic concurrency] Replacement is not required. There is a chance that it is the latest version (fail validation later if wrong)

Don't retrieve if the object is cached and has already been used in the current transaction.
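These rules can be sketched for a session lifetime cache that compares timestamps. SessionCache and Versioned are hypothetical names, and the sketch assumes the non-optimistic variant in which a cached object is always checked against the server on first use in a transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// An object's state together with its server timestamp.
class Versioned {
    final String value;
    final long timestamp; // server time of last update
    Versioned(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

class SessionCache {
    private final Map<String, Versioned> server;                   // stand-in for the DB server
    private final Map<String, Versioned> cache = new HashMap<>();
    private final Set<String> usedInTxn = new HashSet<>();
    int fullFetches = 0;

    SessionCache(Map<String, Versioned> server) { this.server = server; }

    void beginTransaction() { usedInTxn.clear(); }

    Versioned get(String oid) {
        Versioned cached = cache.get(oid);
        if (cached == null) {
            // Not cached: read it from the server and place it in the cache.
            cached = fetch(oid);
        } else if (!usedInTxn.contains(oid)
                && server.get(oid).timestamp != cached.timestamp) {
            // Cached, not yet used in this transaction, and stale: replace it.
            cached = fetch(oid);
        }
        // Otherwise the cached copy is used as is.
        usedInTxn.add(oid); // mark as used in the current transaction
        return cached;
    }

    private Versioned fetch(String oid) {
        fullFetches++;
        Versioned v = server.get(oid);
        cache.put(oid, v);
        return v;
    }
}
```

The timestamp comparison stands in for the server indicating that the client's version is still the latest; only a stale object costs a full fetch.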


Partial Queries

Suppose the cache is empty, and the very first client operation in a transaction is

SELECT e.empno, e.sal
FROM e IN emps
WHERE e.job = "CLERK"

How should the client cache manager process this query?

Consider what happens if the same request is made later in the transaction.


Partial Query Alternatives

1. Send the request to the server. Cache nothing.

Reasonable

2. Send the request to the server. Cache the results for reuse.

Possible, but would significantly complicate concurrency and cache management: it would require field-level granularity, not just tuple-level granularity.


Caching for Reuse

Request & cache entire objects from the server

var tempemps Set<Employee> :=
  SELECT e FROM e IN emps WHERE e.job = "CLERK"

Then execute the query

SELECT e.empno, e.sal FROM e IN tempemps

locally. That is, each employee in tempemps will be downloaded and cached at the client, and empno and sal will be retrieved from it

Makes most use of the cache. The cost of downloading objects is worth it if they will be used in future queries; it may not be if they won't. It may be possible to set cache manager hints to decide.


Complex Query Problem

Suppose some employees are already loaded in the client's cache. Possibly, some of their salaries have been updated. Then (in the same transaction), the client executes

SELECT deptno, sum( SELECT p.sal FROM p IN partition )
FROM e IN emps WHERE e.job != 'CLERK'
GROUP BY deptno: e.deptno

How should the client's cache manager process this query?


Processing Complex Queries

Executing the query at the server would ignore any non-clerk salaries already updated at the client.

The simplest approach is for the client cache manager to first process

var tempemps Set<Employee> :=
  SELECT e FROM e IN emps WHERE e.job != 'CLERK'

Then, process the following query locally (which pulls the remaining employees into the cache)

SELECT deptno, sum( SELECT p.sal FROM p IN partition )
FROM e IN tempemps
GROUP BY deptno: e.deptno

More sophisticated cache managers may be able to apply the query to the employees in the cache, send a request to the server to compute the query for the rest of the employees, and then integrate the results.

A sophisticated query manager can also send the entire request to the server if it keeps track of the fact that no employee (or non-clerk) salaries have been updated so far during the transaction!
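The simplest approach can be sketched as: pull every non-clerk that is not yet cached, keep cached copies (which may carry uncommitted updates), then group and sum locally. Emp and LocalAggregation are hypothetical names, and the server is modeled as a plain map.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal employee record used by the sketch.
class Emp {
    final int empno;
    final int deptno;
    final String job;
    double sal;

    Emp(int empno, int deptno, String job, double sal) {
        this.empno = empno;
        this.deptno = deptno;
        this.job = job;
        this.sal = sal;
    }
}

class LocalAggregation {
    // Returns deptno -> total salary of non-clerks, computed over the client cache.
    static Map<Integer, Double> sumByDept(Map<Integer, Emp> server, Map<Integer, Emp> cache) {
        // Step 1: pull the non-clerks into the cache.
        for (Emp e : server.values()) {
            if (!"CLERK".equals(e.job)) {
                // putIfAbsent keeps an existing cached copy, including its local updates.
                cache.putIfAbsent(e.empno, new Emp(e.empno, e.deptno, e.job, e.sal));
            }
        }
        // Step 2: evaluate the grouping query locally.
        Map<Integer, Double> totals = new TreeMap<>();
        for (Emp e : cache.values()) {
            if (!"CLERK".equals(e.job)) {
                totals.merge(e.deptno, e.sal, Double::sum);
            }
        }
        return totals;
    }
}
```

If employee 1 has been given a raise in the cache that is not yet committed, the department total reflects the raise, which a query executed purely at the server would miss.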


Invalidation Approaches

Complete Invalidation
Whenever a transaction completes, invalidate all objects in the cache (except perhaps for ones explicitly marked)

Aggressive Invalidation
Whenever objects are persisted to the DB server by a client, the server notifies every other client that has cached any of those objects to invalidate them. Server maintains cache timestamps for all cached objects.

Failure Invalidation
The DB server notifies the client on validation failure that the server has a more recent version of the object

The client can also invalidate objects that have not been used recently


Deleting and Inserting Persistent Objects


Deleting Persistent Objects

var oldnlst Employee :=
  last( SELECT e FROM e IN emps
        WHERE e.job = "ANALYST" ORDER BY e.age )

coolemps.remove( oldnlst )
– Removes (the reference to) oldnlst from coolemps
– Removes it immediately in the cache; at commit time, from the database

oldnlst.delete()
– Removes (the reference to) the object from its extent [immediately in cache, at commit from the database]
– Deletes the actual object [at commit, from the database; in cache, invalidates it]


About Deletion …

[Figure: the object e is referenced both from its extent emps (alongside emp1, emp2, …) and from the collection coolemps]

Can e.delete() delete e (in addition to removing it from its extent) if there are other persistent references to it?


Models for Deletion

Can e.delete() delete e (in addition to removing it from its extent) if there are other persistent references to it?

1. No; e itself will continue to exist until all references to it are deleted. Semantically problematic to have a persistent object which is not part of its extent.

2. No. e.delete() raises an exception if there are other references to it. Problematic if you really do want to get rid of the employee. Also, like #1, it requires reference counting or garbage collection, which prevents objects from being on remote hosts or removable media.

3. Yes, and it completely deletes e. But this causes dangling references. (How are they handled?)

4. Kind of. Mark e as DELETED (acts like it is deleted). Really delete it when there are no references to it. Problem: unnecessarily takes up memory, and also prevents distributed/removable object repositories.


Dangling References & OIDs

If we do delete e, what do we do about the dangling reference to e left in coolemps?

Dangling references are only a problem when it is impossible to tell when the memory of a freed object is filled with a different object (possibly of a different type or on a different memory boundary).

In an OODB, references hold OIDs, which are unique. Suppose an OID contains a page #, a slot #, and a generation # (which distinguishes different objects stored at that page/slot over time).

Dereferencing the OID finds the page, and uses the slot # to get the offset to the object. The object header contains the generation #, which is checked against the dereferenced OID. If they don't match, the OID must reference a deleted object.
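The generation check can be sketched as follows. Oid and SlottedStore are hypothetical names, and pages and slots are modeled with a map rather than real storage.

```java
import java.util.HashMap;
import java.util.Map;

// An OID made of page #, slot # and generation #.
final class Oid {
    final int page, slot, generation;
    Oid(int page, int slot, int generation) {
        this.page = page;
        this.slot = slot;
        this.generation = generation;
    }
}

class SlottedStore {
    // Object header (generation #) plus payload for one page/slot.
    private static final class Slot {
        int generation = 0;
        Object payload; // null when the slot is free
    }

    private final Map<Long, Slot> slots = new HashMap<>();

    private static long key(int page, int slot) {
        return ((long) page << 32) | (slot & 0xffffffffL);
    }

    // Store an object in a page/slot, bumping the generation so older OIDs no longer match.
    Oid put(int page, int slot, Object payload) {
        Slot s = slots.computeIfAbsent(key(page, slot), k -> new Slot());
        s.generation++;
        s.payload = payload;
        return new Oid(page, slot, s.generation);
    }

    void delete(Oid oid) {
        Slot s = slots.get(key(oid.page, oid.slot));
        if (s != null && s.generation == oid.generation) {
            s.payload = null;
        }
    }

    // Returns the object, or null if the OID refers to a deleted object.
    Object deref(Oid oid) {
        Slot s = slots.get(key(oid.page, oid.slot));
        if (s == null || s.generation != oid.generation) {
            return null; // generation mismatch: the reference is dangling, and we can tell
        }
        return s.payload;
    }
}
```

A stale OID left in coolemps therefore dereferences to "deleted" rather than to whatever object later reuses the same page and slot.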


OIDs and ROWIDs

emp
empno   ename   …   dept
6291    SMITH   …   AAAGDxAABAAAH9EAAD27

referenced department row
header (generation #)   deptno   dname      loc       …
27                      622      Auditing   CHICAGO   …

A local OID can be represented as a ROWID + a generation #, which also appears in the referenced object's header


Persisting New Objects

Explicitly
– Specify each object or collection of objects to be made persistent (not every object created by a program needs to persist)

Declaratively
– By object class
– By object property
• May be able to be changed dynamically

Reachability
– Specify root objects
– All newly created objects reachable (via object references) from root objects are made persistent
– Stop traversing if you get to a newly created object marked as transient (either explicitly or based on its class)
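Persistence by reachability amounts to a graph traversal from the root objects that stops at transient objects. The sketch below uses hypothetical Node and Reachability classes; a real system would discover references through the object's fields rather than an explicit refs list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A newly created object with outgoing references and an optional transient mark.
class Node {
    final String name;
    final boolean transientMarked;
    final List<Node> refs = new ArrayList<>();

    Node(String name, boolean transientMarked) {
        this.name = name;
        this.transientMarked = transientMarked;
    }
}

class Reachability {
    // Returns the set of objects to persist, starting from the given roots.
    static Set<Node> persistentClosure(List<Node> roots) {
        Set<Node> persistent = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (n.transientMarked || !persistent.add(n)) {
                continue; // stop at transient objects and at objects already visited
            }
            work.addAll(n.refs);
        }
        return persistent;
    }
}
```

An object reachable only through a transient object is not persisted, and cycles are handled because each object is visited at most once.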


Extents & Inheritance

Models for Extents
• Only superclasses have extents
• New object is only added to its own class extent; not extents of superclasses
• New object added to its own class extent as well as extents of all its ancestor superclasses

Extent-Related Functions (non-standard)
• all(class) - returns a collection of all objects in that class (or any of its subclasses)
• only(class) - returns a collection of all objects in that class which are not instances of any of its subclasses

Consider how to implement these functions in each of the models above


Versioning

Temporal Versioning
On each commit that modifies an object, the OODB remembers the previous state (often using a delta), and the timestamp of the commit
Supports "flashback" queries: queries executed with the state at a specified time

Immutable Versioning
Committed objects are immutable (and easily cached and replicated)
Modifications made to an object within a transaction are made to a new version of the object (with a new OID), which is persisted on commit (the old object is often retained as a delta of the new object, or vice versa)
If object X is modified => X', and object Y references X, then we may want to automatically create Y', a new version of Y, which now references X' instead of X.

Version Groups & Selection
Versions are immutable and have OIDs, but additionally, a separate OID can represent a version group (a group of different versions of an object)
Following a reference for an OID for a specific version gets that version
Following a reference for a version group selects a particular instance of the group, typically the latest, but there are mechanisms to select other versions based on their properties.
This approach supports version branches and merges


Java Data Objects


JDO 1.0

Persistent Object Layer + Methodology

Transparent Transactional Persistence

Can be used to map objects to RDBs as well as OODB's (and other Persistent Object Stores)

The material here is mostly based on JDO 1.0 (The JDO 2.0 spec was published in Dec 2004)


JDO Persistence Approach

Transactional Transparent Persistence
– Based on bytecode enhancement with added accessor/mutator methods
– Extends persistence-capable classes so they implement the PersistenceCapable interface, which delegates to a StateManager

Java source itself does not indicate
– Which classes are persistence capable
– Which fields of a class should be saved
– Which fields represent relationships
– How to map data structures to persistent store (which could be an OODB or an RDB)

These details are specified via a standard configuration file with vendor-specific extensions


1st & 2nd Class Objects

In C or C++
Objects may be contained within other objects
Same distinction as between Class and REF Class
When the parent object is made persistent, the contained object is persisted with it

In Java
This distinction can't be made in the same way
All objects are independent and accessed via references
JDO distinguishes between
• 1st class objects (stand on their own) and
• 2nd class objects (act as if contained)

© Ellis Cohen 2002-2005 73

2nd Class Objects

• 2nd class objects are automatically loaded & stored with the 1st class objects that refer to them

– Problems arise if a 2nd class object is shared by multiple 1st class objects

• 2nd class objects do not have their own JDO identity (OID visible to JDO client)

• Transparent persistence (esp. automatic dirty detection) is not supported for 2nd class objects

– Only the 1st class object which refers to a 2nd class object should update it.

– It must then explicitly call makeDirty on itself!

• Arrays are always 2nd class objects

© Ellis Cohen 2002-2005 74

Object Characteristics

Persistent vs Transient
– Persistent objects are automatically persisted
– Transient objects are not persisted and do not have a persistent object identity

Transactional vs Non-Transactional
– Transactional objects participate in transactions (ACID properties)
  • No support for different isolation levels
  • No support for nested transactions
– Non-transactional objects don't. Persisted explicitly.

These characteristics can be changed dynamically

© Ellis Cohen 2002-2005 75

Life Cycle

Make an instance persistent

Employee emp = new Employee( ... );
pm.makePersistent( emp );

– For transactional objects, object not made persistent until commit is done

Persistence by reachability
– All instances reachable from emp become persistent as well

emp.dept = new Dept( … );

Delete an instance from the database

pm.deletePersistent( emp );

– Depends upon underlying DB Delete Model
– Does not delete by reachability
– For transactional objects, object not deleted until commit is done

pm is a persistence manager: it manages an object cache connected to a persistent store
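Persistence by reachability can be illustrated with a small self-contained simulation. Node and SketchPersistenceManager are hypothetical names, and the traversal below is only one plausible way a persistence manager might compute the closure at commit; it is not JDO's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical object with outgoing references (e.g. emp -> dept).
class Node {
    final String label;
    final List<Node> refs = new ArrayList<>();
    Node(String label) { this.label = label; }
}

// Hypothetical persistence manager: nothing is stored until commit.
final class SketchPersistenceManager {
    private final Set<Node> roots =
        Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
    private final Set<Node> stored =
        Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());

    void makePersistent(Node n) { roots.add(n); }

    // At commit, store each root plus everything reachable from it.
    void commit() {
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (stored.add(n)) {        // first visit only, so cycles terminate
                work.addAll(n.refs);
            }
        }
        roots.clear();
    }

    boolean isStored(Node n) { return stored.contains(n); }
}
```

Identity-based sets are used because object identity, not equals(), decides whether two references denote the same persistent object.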

© Ellis Cohen 2002-2005 76

Updating Persistent Objects

trans = pm.currentTransaction();
trans.begin();

emp.name = "Joe B. Jones"; emp.dept.name = "Financing";

trans.commit();

Default caching model:
All transactional objects are invalidated (i.e. made hollow!) when the transaction completes

JDO extensions support other caching models:
e.g. maintain for duration of the session
Requires specification of caching strategy

The variable emp refers to an object used in a previous transaction

Is it still in the object cache or will it need to be reloaded?

© Ellis Cohen 2002-2005 77

Persisting New Objects

pm = pmf.getPersistenceManager();
trans = pm.currentTransaction();
trans.begin();

Employee emp = new Employee( … );
emp.name = "Joe Jones"; …
emp.dept = new Department( … );
emp.dept.deptno = 24; …
pm.makePersistent( emp );

trans.commit();
rememberId = pm.getObjectId( emp );
// could pass to some other machine

pmf is a Persistence Manager Factory. It is connected to a persistent store

© Ellis Cohen 2002-2005 78

Updating Remembered Objects

pm = pmf.getPersistenceManager();
trans = pm.currentTransaction();
trans.begin();

Employee emp = (Employee) pm.getObjectById( rememberId );

emp.name = "Joe B. Jones"; emp.dept.name = "Financing";

trans.commit();

pm must be connected to the same persistent store where the object was persisted; Could be a "meta" store that knows how to find and connect to the current store holding the object

© Ellis Cohen 2002-2005 79

JDOQuery Language

© Ellis Cohen 2002-2005 80

JDOQL 1.0 vs OQL

JDO Query Language
– Simpler but less functional than OQL
– Easier to map to RDB SQL (or more primitive persistent stores)
– Limited forms of Joins/Subqueries
  • Navigation only
– No DISTINCT or GROUP BY (added in JDO 2.0)
– No projection: Can only return elements of the collection queried, not their attributes (changed in JDO 2.0)

Can be used to query collections returned to client

© Ellis Cohen 2002-2005 81

JDO 1.0 Queries

Queries
– filter Collections
– return Collections

Required elements
– Class of results
– Collection to filter
  • may be an Extent: done on back end
  • may be a Collection: done in client cache
– Filter (Java boolean expression)
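The three required elements (result class, candidate collection, filter) can be mimicked in plain Java for the client-cache case. SketchQuery and QEmp are hypothetical names; a real JDO query takes the filter as a string and is created through the persistence manager, whereas this sketch uses a Java predicate so that it runs without a JDO implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory analogue of a JDO 1.0 query over a client collection.
final class SketchQuery<T> {
    private final Class<T> candidateClass;   // class of results
    private final Collection<?> candidates;  // collection to filter
    private final Predicate<T> filter;       // boolean filter expression

    SketchQuery(Class<T> candidateClass, Collection<?> candidates, Predicate<T> filter) {
        this.candidateClass = candidateClass;
        this.candidates = candidates;
        this.filter = filter;
    }

    // Returns the elements of the candidate collection that are instances
    // of the candidate class and satisfy the filter (no projection).
    List<T> execute() {
        List<T> result = new ArrayList<>();
        for (Object o : candidates) {
            if (candidateClass.isInstance(o)) {
                T t = candidateClass.cast(o);
                if (filter.test(t)) result.add(t);
            }
        }
        return result;
    }
}

// Hypothetical employee class for the usage example.
final class QEmp {
    final String name;
    final double sal;
    QEmp(String name, double sal) { this.name = name; this.sal = sal; }
}
```

For example, `new SketchQuery<>(QEmp.class, emps, e -> e.sal > 10000).execute()` plays the role of a query with the filter "sal > 10000".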

© Ellis Cohen 2002-2005 82

Example Queries

select e from e in emps where e.sal > 10000

Query q = pm.newQuery( Employee.class, emps, "sal > 10000" );

Collection result = (Collection) q.execute();

select e from e in emps where e.sal > e.mgr.sal

Query q = pm.newQuery( Employee.class, emps, "sal > mgr.sal" );

select e from e in emps where "Yael" in e.kidnames

Query q = pm.newQuery( Employee.class, emps, "kidnames.contains(\"Yael\")" );

© Ellis Cohen 2002-2005 83

Query Parameters

select e from e in emps where e.sal > minsal

Parameter declarations

Query q = pm.newQuery( Employee.class, emps );
q.declareParameters( "float minsal" );

Filter uses declarations as if they were in scope

q.setFilter ("sal > minsal");

Parameter binding at Query execution
(primitive values passed as wrapper objects)

result = (Collection) q.execute( new Float( 10000 ) );

© Ellis Cohen 2002-2005 84

Querying Navigated Collections

Navigate through collections referenced by an object

Find Boston Departments with at least one well-compensated Employee

select d from d in depts
where d.loc = "Boston"
and exists e in d.empls : e.sal > 10000

Declare variables used to iterate through those collections

Query q = pm.newQuery( Dept.class, depts );

q.declareVariables( "Employee e" );

q.setFilter( "loc == \"Boston\" && empls.contains(e) && e.sal > 10000" );

Collection wcdepts = (Collection) q.execute();

© Ellis Cohen 2002-2005 85

Querying Local Collections

Find departments in wcdepts that have more than 20 employees

select d from d in wcdepts
where count(d.empls) > 20

Query q = pm.newQuery( Dept.class, wcdepts, "empls.size() > 20" );

© Ellis Cohen 2002-2005 86

Using Collections As Parameters

Find employees whose spouses are in one of the departments in wcdepts

select e from e in emps
where e.spouse.dept in wcdepts

Query q = pm.newQuery( Employee.class, emps );
q.declareParameters( "Collection wcdepts" );
q.setFilter( "wcdepts.contains(spouse.dept)" );
Collection wcspouses = (Collection) q.execute( wcdepts );

(To get employees who are themselves in one of the departments in wcdepts, use dept instead of spouse.dept)

wcdepts is not persistent; so it must be passed as a parameter.

© Ellis Cohen 2002-2005 87

JDO 2.0 Results

JDO 2.0 allows queries to return values, not just objects

select distinct(e.job) from e in emps where e.sal > 10000

JDO 2.0

Query q = pm.newQuery( Employee.class, emps, "sal > 10000" );
q.setResult( "distinct job" );
Collection result = (Collection) q.execute();

JDO 1.0

Query q = pm.newQuery( Employee.class, emps, "sal > 10000" );
Collection result = (Collection) q.execute();
Collection jobs = new HashSet();
Iterator r = result.iterator();
while (r.hasNext()) { jobs.add( ((Employee) r.next()).job ); }

© Ellis Cohen 2002-2005 88

Joins by Hand

select distinct e from e in emps, s in starbucks
where e.zip = s.zip

Find employees who live in a zipcode where a Starbucks is located

zips := select distinct s.zip from s in starbucks

Query q = pm.newQuery( Starbuck.class, starbucks );
q.setResult( "distinct zip" );
Collection zips = (Collection) q.execute();

select e from e in emps where e.zip in zips

Query q = pm.newQuery( Employee.class, emps );
q.declareParameters( "Collection zips" );
q.setFilter( "zips.contains(zip)" );
Collection staremps = (Collection) q.execute( zips );

© Ellis Cohen 2002-2005 89

Joins by Hand

select distinct e from e in emps, s in starbucks
where e.zip = s.zip

Find employees who live in a zipcode where a Starbucks is located

zips := select distinct s.zip from s in starbucks

Collection zips = new HashSet();
Iterator s = starbucks.iterator();
while (s.hasNext()) {
  int zip = ((Starbuck) s.next()).zip;
  zips.add( new Integer( zip ) );
}

select e from e in emps where e.zip in zips

Query q = pm.newQuery( Employee.class, emps );
q.declareParameters( "Collection zips" );
q.setFilter( "zips.contains(zip)" );
Collection staremps = (Collection) q.execute( zips );
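The two-step hand join (collect the distinct join keys, then filter by membership) is an ordinary hash semi-join. The sketch below runs it on in-memory lists without any JDO calls; Shop, Worker and HandJoin are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical classes standing in for Starbuck and Employee.
final class Shop {
    final int zip;
    Shop(int zip) { this.zip = zip; }
}

final class Worker {
    final String name;
    final int zip;
    Worker(String name, int zip) { this.name = name; this.zip = zip; }
}

final class HandJoin {
    // Returns the workers who live in a zipcode that has at least one shop.
    static List<Worker> workersNearShops(List<Worker> workers, List<Shop> shops) {
        // Step 1: distinct zips of all shops (the set removes duplicates).
        Set<Integer> zips = new HashSet<>();
        for (Shop s : shops) zips.add(s.zip);

        // Step 2: keep each worker whose zip is in the set; every worker is
        // tested once, so the result needs no further duplicate removal.
        List<Worker> result = new ArrayList<>();
        for (Worker w : workers) {
            if (zips.contains(w.zip)) result.add(w);
        }
        return result;
    }
}
```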

© Ellis Cohen 2002-2005 90

Capability-Based Access Control

© Ellis Cohen 2002-2005 91

Object-Based Access Control Approaches

• Access control on objects (by granting privileges for, or associating security predicates or ACLs with objects or object classes). Possibly a collection of objects could be specified statically, or even dynamically by OQL expressions

• Security domains: allowing some objects to be accessed and/or updated only through operations (to which execute access can be selectively granted, or which dynamically determine when or how to execute)

• Capabilities: Access control included as part of references. The capability system described here is based (very loosely) on the classic CMU Hydra system.

© Ellis Cohen 2002-2005 92

Capability: Reference + Privileges

richEmps := SELECT …

anEmp := richEmps.pick()

[Figure: the variables richEmps and anEmp, the richEmps collection, and the objects emp1 and emp2. Each reference is labelled with its privileges: RdC, WrC on the reference to the richEmps collection; RdD on the references to the employee objects.]

RdD – read the data in the referenced object

RdC – read the capabilities in the referenced object

WrC – write capabilities to the referenced object

Capability: Reference + privileges related to the reference

© Ellis Cohen 2002-2005 93

Capability Copy & Restrict

The creator of an object gets a capability for that object with all privileges.

A user can create copies of a capability (possibly with restricted privileges), and make them available to other users

A user can restrict, but cannot arbitrarily increase a capability's privileges (though see privilege amplification)

This requires a protected mechanism (similar to Java's Reference classes) that doesn't allow forging capabilities or direct access to a capability's privileges.
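A minimal Java sketch of copy-and-restrict, assuming a three-privilege system: the privilege set is private and immutable from outside, the only public factory grants all privileges to the creator, and restrict can only intersect. Priv, Capability, forNewObject and restrict are hypothetical names, and Java's access modifiers are only a loose analogue of the protected mechanism the slide calls for.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical privilege names mirroring RdD, RdC and WrC.
enum Priv { RD_D, RD_C, WR_C }

final class Capability<T> {
    private final T target;             // the reference
    private final EnumSet<Priv> privs;  // never exposed for modification

    // Private constructor: outside code cannot forge a capability.
    private Capability(T target, EnumSet<Priv> privs) {
        this.target = target;
        this.privs = privs.clone();
    }

    // The creator of an object gets a capability with all privileges.
    static <T> Capability<T> forNewObject(T target) {
        return new Capability<>(target, EnumSet.allOf(Priv.class));
    }

    // Copy with privileges restricted to those also present in keep.
    // Intersection can only shrink the set, never grow it.
    Capability<T> restrict(Set<Priv> keep) {
        EnumSet<Priv> narrowed = privs.clone();
        narrowed.retainAll(keep);
        return new Capability<>(target, narrowed);
    }

    boolean has(Priv p) { return privs.contains(p); }

    // Reading the referenced object's data requires RdD.
    T readData() {
        if (!has(Priv.RD_D)) throw new SecurityException("RdD privilege required");
        return target;
    }
}
```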

© Ellis Cohen 2002-2005 94

Privileges

Capabilities contain 3 kinds of privileges:

• Class-Specific Privileges
– Privileges specific to objects of a specific class
– For example, capabilities for Operation objects have an Execute privilege

• Generic Object Privileges
– Privileges like RdD, RdC, and WrC that apply to any object, regardless of its class

• Meta Privileges
– Privileges which constrain the capability itself, rather than the object referenced by the capability (invented for Hydra by E. Cohen, 1973)

© Ellis Cohen 2002-2005 95

Class-Specific Privileges

Every capability designates the class of the object it references.

The capability can either designate the actual class of the object, or any of its superclasses.

The class-specific privileges of a capability correspond to the class actually designated by the capability.

Example: A Stack is a subclass of a List. A capability for a stack object can either be designated

– As a Stack, in which case its Push privilege would control whether or not it can be passed to the Stack.Push operation, and List operations could not be called using the capability.

– As a List, in which case the capability could only be used with the built-in List operations (depending upon its privileges), but not Stack operations, which expect Stack capabilities.

© Ellis Cohen 2002-2005 96

Generic Object Privileges

Read Privileges
RdD – read the data in the referenced object
RdC – read the capabilities in the referenced object

Modification Privileges
Mod* – see next slide
WrD – write data to the referenced object
WrC – write capabilities to the referenced object
ClrC – clear or overwrite capabilities in the referenced object (a capability for an object with only WrC cannot be used to overwrite capabilities in the object, but can be used to append capabilities to the object)
Del – delete the referenced object

© Ellis Cohen 2002-2005 97

Transitive Privileges

Transitive Privileges (invented for Hydra by E. Cohen, 1973) refer not just to a specific capability or object, but to any one reached through it!

Mod* is a transitive privilege

– Mod* is required to modify the referenced object in any way (it is needed in addition to any of the specific modification privileges such as WrD or ClrC)

– Mod* is required to modify any object reached through the referenced object

© Ellis Cohen 2002-2005 98

The Modification Problem

Mod* is used to solve the Modification Problem.

An abstract data type may be represented by a group of objects linked together (via capabilities)

If a user has access to a capability for an abstract data type object, without the Mod* privilege, that user cannot modify any part (including linked objects) of the representation of the abstract data type.

Note that if richEmps is a capability variable (without the Mod* privilege) referencing an array of capabilities, and a program executes myEmp := richEmps[1], then myEmp will not have the Mod* privilege either

A copy of a capability will not have the Mod* privilege, if the object holding the original capability was accessed through a capability without the Mod* privilege
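The transitive behaviour of Mod* can be sketched as follows: loading a capability out of an object yields a copy that keeps Mod* only if the capability used to reach that object had Mod* as well. TPriv, CapObject, TCap, load and append are hypothetical names, and only RdC, WrC and Mod* are modelled.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical subset of privileges: RdC, WrC and the transitive Mod*.
enum TPriv { RD_C, WR_C, MOD_STAR }

// An object that holds capabilities in numbered slots.
final class CapObject {
    final List<TCap> slots = new ArrayList<>();
}

final class TCap {
    final CapObject target;
    private final EnumSet<TPriv> privs;

    TCap(CapObject target, EnumSet<TPriv> privs) {
        this.target = target;
        this.privs = privs.clone();
    }

    boolean has(TPriv p) { return privs.contains(p); }

    // Copy the capability stored in slot i of the referenced object.
    // The copy loses Mod* unless this capability has Mod*.
    TCap load(int i) {
        if (!has(TPriv.RD_C)) throw new SecurityException("RdC required");
        TCap stored = target.slots.get(i);
        EnumSet<TPriv> p = stored.privs.clone();
        if (!has(TPriv.MOD_STAR)) p.remove(TPriv.MOD_STAR);
        return new TCap(stored.target, p);
    }

    // Appending a capability modifies the object: needs both WrC and Mod*.
    void append(TCap c) {
        if (!has(TPriv.WR_C) || !has(TPriv.MOD_STAR)) {
            throw new SecurityException("WrC and Mod* required");
        }
        target.slots.add(c);
    }
}
```

Because the stripped copy is itself used for further loads, the restriction propagates through any chain of linked objects, which is what protects the whole representation of an abstract data type.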

© Ellis Cohen 2002-2005 99

Meta Privileges

Capabilities may also have meta privileges, which constrain the capability itself, rather than the object it references

– MClr – allows the capability to be deleted, moved or overwritten from the field it is in

– MLoad* – a transitive privilege which allows the capability (or a capability obtained from any object reached from it) to be moved or copied.

– MStor* – a transitive privilege which allows the capability (or a capability obtained from any object reached from it) to be stored in a persistent object [Note: This implies that objects marked as temporary cannot later be remarked as persistent]. Absence of MStor* can be used to limit delegation.

– MAlias – used for aliased revocation (discussed later)

© Ellis Cohen 2002-2005 100

Security Domains

Every database connection has a security domain associated with it

The security domain is a dictionary collection containing capabilities for objects that can be accessed by the user/program connected to the database.

Because a domain is a dictionary, each capability is named, and these names are the top-level names which can be used in queries.

When a user connects to a database, the user's login domain is initially associated with the connection. It contains a number of capabilities for special objects.

See next slide

© Ellis Cohen 2002-2005 101

Granting Privileges

No need to explicitly "grant" privileges to a user.

Instead, a user can store a capability in an object where another user can read it.

For example, suppose each user (through their user login domain) has capabilities for

• Their schema. This holds capabilities for a user's objects.

• Their inbox. Another user can add a capability to a user's inbox to grant them a capability.

• The global inbox dictionary. Allows a user to get a capability for any other user's inbox.

• Their exports. A user stores a capability here when they want all users to be able to get it.

• The global exports dictionary. Allows a user to get a capability for any other user's exports.

What privileges should a user have for each of these capabilities, and what privileges should be included with capabilities in the global inbox and exports dictionary?

© Ellis Cohen 2002-2005 102

User Access Architecture

[Figure: Joe's Login Domain holds the capabilities AllExports, MyExports, AllInboxes, MyInbox and MySchema, which reference the Global Exports Dictionary, Joe's Exports, the Global Inbox Dictionary, Joe's Inbox and Joe's Schema respectively. Privileges shown:
– AllExports: Mod*, RdC, MLoad*, MStor*
– MyExports: Mod*, RdC, WrC, ClrC, MLoad*, MStor*
– MyInbox: …, MLoad*, MStor*
– MySchema: …, MLoad*, MStor*
– Entries (Joe, Sue, …) in the Global Exports Dictionary: Mod*, RdC, MLoad*, MStor*
– Entries (Joe, Sue, …) in the Global Inbox Dictionary: Mod*, WrC, MLoad*, MStor*]

Could you design a capability-based access control model for an RDB where the objects are tables, views, operations, etc.?

© Ellis Cohen 2002-2005 103

The Capability Revocation Problem

How can a capability be made available to another user, so that its use can later be revoked?

Hint: Consider using Meta Privileges?

© Ellis Cohen 2002-2005 104

Revocation with Meta Privileges

Store the capability in a separate object (and make a capability for that object available to other users).

Place the capability in the separate object without MLoad* privileges.

– This allows the capability to be used (to reference data in the object it references), but prevents the capability from being moved or copied.
– The original holder of the capability can, at any time, clear (i.e. delete) the capability in the separate object, which means the other user can no longer use it.
– This is a bit restrictive, since it means that the other user cannot load the capability into a local variable, or store it (e.g. as the result of a query) in a transient object.

Place the capability in the separate object without MStor* privileges.

– This prevents the other user from storing it in a persistent object. After the capability is cleared, other connected users may still access copies of the capability in local variables or transient objects, but only as long as they remain connected.

© Ellis Cohen 2002-2005 105

Aliases & Immediate Revocation

A user who has a capability for an object with the MAlias privilege can interpose an alias between the capability and the object. Aliases are invisible to users through capabilities which do not have an MAlias privilege, and always when performing ordinary reads and writes.

A user who has a capability for an aliased object with the MAlias privilege can

– Block all access through the alias (i.e. revoke access)
– Reallow access to the aliased object
– (Perhaps) Change the alias to refer to a different object
– Interpose an additional (chained) alias between the capability and the alias

[Figure: a capability referencing an Alias, which in turn references an object]
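Immediate revocation through an alias can be sketched as an indirection cell that every use of the capability passes through. Alias, AliasedCap, revoke, reallow and copyWithoutMAlias are hypothetical names; the MAlias privilege is modelled as a simple boolean on the capability.

```java
// Hypothetical indirection cell interposed between capabilities and an object.
final class Alias<T> {
    private T target;
    private boolean blocked;

    Alias(T target) { this.target = target; }

    void block() { blocked = true; }
    void unblock() { blocked = false; }
    void retarget(T newTarget) { target = newTarget; }

    // Ordinary reads and writes go through here without noticing the alias,
    // unless access has been blocked.
    T resolve() {
        if (blocked) throw new SecurityException("access revoked");
        return target;
    }
}

// Hypothetical capability for an aliased object.
final class AliasedCap<T> {
    private final Alias<T> alias;
    private final boolean mAlias;   // does this capability carry MAlias?

    AliasedCap(Alias<T> alias, boolean mAlias) {
        this.alias = alias;
        this.mAlias = mAlias;
    }

    // A copy handed to another user, without the MAlias privilege.
    AliasedCap<T> copyWithoutMAlias() { return new AliasedCap<>(alias, false); }

    T use() { return alias.resolve(); }

    void revoke() { requireMAlias(); alias.block(); }
    void reallow() { requireMAlias(); alias.unblock(); }

    private void requireMAlias() {
        if (!mAlias) throw new SecurityException("MAlias privilege required");
    }
}
```

Since all copies share the same alias cell, blocking it takes effect at once for every copy, including copies held in local variables.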

© Ellis Cohen 2002-2005 106

Aliases & Fine-Grained Access Control

More general alias models can be used to provide fine-grained access control.

Fine-grained aliases can have functions associated with them, which are executed when the alias is traversed, and might either (depending on the system)

– Return true or false, indicating whether access to the aliased object is allowed or not

– Return a capability for an object, which is used in place of the original capability.

© Ellis Cohen 2002-2005 107

Domain Switching

When a definer-rights operation is called in an RDB, it switches its security domain to the domain of the user who defined the operation.

In a capability-based system, a definer rights operation is created with an associated dictionary.

– The operation's dictionary contains capabilities for objects (some private) potentially needed by the operation when it executes

When a definer-rights operation is executed, a new security domain (a transient dictionary) is created and used while the operation executes (the previous domain is used again when the operation returns). It contains

– the capabilities from the operation's dictionary, plus

– capabilities explicitly passed as parameters (the operation specification indicates the classes and the required privileges of each parameter)
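A small sketch of the domain switch, under the assumption that a security domain is just a dictionary from names to capabilities: the operation body sees only a fresh transient dictionary built from the operation's own dictionary plus the explicit parameters, and never the caller's domain. DefinerRightsOperation and call are hypothetical names, and capabilities are represented as plain Objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical definer-rights operation with an associated dictionary.
final class DefinerRightsOperation<R> {
    // Capabilities bundled with the operation when it was created.
    private final Map<String, Object> dictionary;
    // The operation body; it can reach only what is in the domain it is given.
    private final Function<Map<String, Object>, R> body;

    DefinerRightsOperation(Map<String, Object> dictionary,
                           Function<Map<String, Object>, R> body) {
        this.dictionary = new HashMap<>(dictionary);
        this.body = body;
    }

    // Executes the body in a new transient domain. The caller's own domain
    // is not a parameter, so the body cannot reach it.
    R call(Map<String, Object> params) {
        Map<String, Object> domain = new HashMap<>(dictionary);
        domain.putAll(params);   // explicitly passed capabilities
        return body.apply(domain);
    }
}
```

The caller likewise never receives the operation's dictionary, only the result, so each side exposes just what it chooses to pass.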

© Ellis Cohen 2002-2005 108

Mutually Suspicious Subsystems

The mutually suspicious subsystems problem describes the following situation

• A service has private data, and does not want clients of the service to be able to access the data

• A client has access to a great deal of sensitive information. This client wants to use a service, but only wants the service to be able to access a small subset of the information it can access.

Describe how the capability-based domain switching model solves the mutually suspicious subsystems problem.

Explain how this approach might be adapted for a relational database

© Ellis Cohen 2002-2005 109

The Confinement Problem

The confinement problem:
A client wants to use a service, but wants to guarantee that the service cannot retain any of the client's data.

When an operation is executed through an Operation capability without the Mod* privilege, all capabilities copied from the operation's dictionary to the newly created security domain are copied without the Mod* privilege.

Explain why this solves the confinement problem

© Ellis Cohen 2002-2005 110

Class Factory Objects

Every object has a class
– Dictionary objects have class Dictionary
– Operation objects have class Operation

There are also objects that act as class factories. These have class Class.

Creating an object of a specific class requires a capability for its Class object with the Create privilege.

Creating an operation requires a capability for the Operation Class object with the Create privilege.

(Although a capability system is likely to have built-in operations which allow creation of basic types of objects such as operations and dictionaries)

Creating a new Class object requires a capability for the Class Class object with the Create privilege.

© Ellis Cohen 2002-2005 111

Recasting Capabilities

A user with a capability for a Class object with the Super privilege can cast a capability to its superclass.

The built-in operation Supercast( classcapa, capa, reqprivs, newprivs ) takes

– classcapa, a capability for a class object (e.g. the Stack Class object) with Super privileges
– capa, a capability designating that class (e.g. a Stack)
– reqprivs, the class-specific privileges the capability must have (e.g. Push)
– newprivs, privileges of the returned capability

Supercast returns a capability for the same object designated with its superclass (e.g. List), with

– Meta-privileges and the Mod* privilege taken from the original capability, and

– the remaining privileges taken from newprivs.

Subcast works in a similar way.

© Ellis Cohen 2002-2005 112

Protected Class-Based Encapsulation

A Class object also describes the class-specific privileges, and has a dictionary containing its operations (i.e. class methods)

This supports protected class-based encapsulation in combination with class-specific privileges, domain switching & recasting.

Example: consider a Stack class, implemented as a subclass of List

– Stacks have privileges Push, Pop, Length, and Nth.
– Stacks also have operations Push, Pop, Length, and Nth (it is common, but not required, for the privileges to match the class methods).
– The Push operation expects, as its first parameter, a capability for a stack with Push privileges.
– It uses Supercast to obtain a capability for the stack denoted as a List, with the privileges needed to implement the Push (RdD and WrC), and then appends its second parameter onto the stack/list.

These mechanisms eliminate the need for class-based subsystems to maintain extents

© Ellis Cohen 2002-2005 113

Templates for Recasting

Some capability-based systems support two-step recasting.

– The operation MkSupercaster( classcapa, reqprivs, newprivs ) returns a template, a pseudo-capability actually used to do the supercast

– Supercast( template, capa ) returns the supercasted capability based on the template

This allows each class method's dictionary to include only the template(s) it needs, limiting the damage it can do in case of error.

© Ellis Cohen 2002-2005 114

Principle of Least Authority

Capability-based systems support the Principle of Least Authority:

Each subject is authorized to perform all and only the actions necessary for its work.

In particular, operations (including class methods which use templates for supercasting) have access only to the privileges they need.
