Advanced Database Topics
Copyright © Ellis Cohen 2002-2005
Object Persistence
These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
For more information on how you may use them, please see http://www.openlineconsult.com/db
Lecture Topics
Handling Result Collections
Client Access to Server-Side Objects
Client-Side Object Caching
Modifying Persistent OO Data
Cache Management for OO Data
Deleting and Inserting Persistent Objects
Java Data Objects (JDO)
JDO Query Language (JDOQL)
Capability-Based Access Control
Handling Result Collections
Using Result Collections
OQL queries return collections of references to objects.
Client programs will want to iterate through the collection and access fields of the referenced object.
This suggests that we may want to cache the objects at the client.
Embedded OQL Programming
Imagine a PL with embedded OQL
var richEmps set<Employee> :=
    SELECT e FROM e IN emps WHERE e.sal > 3000;

for e in richEmps loop
    if e.job = "MANAGER" then
        pl( e.name || " " || e.sal );
    end if;
end loop;

(The SELECT creates a temporary collection to hold references to rich Employees.)
Result Management Alternatives
1) The OQL result collection richEmps is left on the Database Server as a temporary collection (deleted when the session ends)
2) richEmps is returned to the client (it contains OIDs for the rich employees which are stored at the server)
3) richEmps is returned to the client, and the rich employees are cached at the client as well.
Which is better and why?
Consider what happens when the code iterates through richEmps.
How does this compare to RDB Result Sets?
Storing OQL Result Collections
[Figure: the three alternatives (1, 2, 3) for storing var richEmps. Labels: DB Server holding emps (emp1 to emp5); Client-side Object Cache holding emp1 and emp2; OID reference; pointer reference.]
Result Collection Tradeoffs
• If the result collection and every object in it are copied to a client-side object cache
  – A significant amount of copying may need to be done as a result of the query
  – If the program runs remotely, then it all needs to be copied across the wire
  – Queries involving result collections must be disallowed or require a query engine that can integrate data stored both in the DB server and in the client cache
  – Wasteful if NOT every object in the collection is accessed
• If the objects in the result collection are not cached at the client
  – Every access to a field in an object in the result collection (e.job, e.name, e.sal) requires an interaction with the DB server
  – If the program runs remotely, then every access goes across the wire
Query Integration
Imagine a PL with embedded OQL (used either at the server or a remote client)

var richEmps set<Employee> :=
    SELECT e FROM e IN emps WHERE e.sal > 3000;

var richNyEmps set<Employee> :=
    SELECT e FROM e IN richEmps
    WHERE e.dept.loc = "New York";

for e in richNyEmps loop
    if e.job = "MANAGER" then
        pl( e.name || " " || e.sal );
    end if;
end loop;

Implies the client cache manager has a query engine that can process queries that access both persistent and temporary collections
Java OQL Programming
OqlConnection conn = …;
OqlQuery query = conn.createQuery();

String oqlstr = "SELECT e FROM e IN emps WHERE e.sal > 3000";
OqlCollection richEmps = query.getCollection( oqlstr );

// Second query runs against the first result collection, bound as &empset
oqlstr = "SELECT e FROM e IN &empset " +
         "WHERE e.dept.loc = 'New York'";
query.prepare( oqlstr );
query.bind( "empset", richEmps );
OqlCollection richNyEmps = query.getCollection();

Iterator iter = richNyEmps.iterator();
while (iter.hasNext()) {
    Employee e = (Employee) iter.next();
    if (e.job.equals("MANAGER"))
        pl( e.name + " " + e.sal );
}
Client Access to Server-Side Objects
Remote References
If objects, including those referenced by query results, are never cached at the client
– local program variables must reference objects stored on the DB server
– for client-side programs, these are remote references, often referred to as proxies.
Distributed Object Proxies
Each client has proxies (holding OIDs) to objects kept only on the server. Every access to an object must be passed back to the server.
[Figure: OO Client 1 and OO Client 2 connected to the OO DB Server, which has Memory and Disk.]
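The proxy idea can be sketched in Java as follows. This is only an illustration under stated assumptions: SketchServer and EmployeeProxy are hypothetical names, the "server" is an in-memory map, and the roundTrips counter stands in for network interactions. It shows that a proxy holds nothing but the OID, so each field access is a separate trip to the server.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the OO DB server: maps OIDs to field values.
class SketchServer {
    private final Map<String, Map<String, Object>> objects = new HashMap<>();
    int roundTrips = 0;  // counts client-to-server interactions

    void store(String oid, Map<String, Object> fields) {
        objects.put(oid, fields);
    }

    // Every remote field access lands here.
    Object readField(String oid, String field) {
        roundTrips++;
        return objects.get(oid).get(field);
    }
}

// A proxy holds only the OID; it keeps no copy of the object's state.
class EmployeeProxy {
    private final SketchServer server;
    private final String oid;

    EmployeeProxy(SketchServer server, String oid) {
        this.server = server;
        this.oid = oid;
    }

    String getJob()  { return (String) server.readField(oid, "job"); }
    String getName() { return (String) server.readField(oid, "name"); }
    int getSal()     { return (Integer) server.readField(oid, "sal"); }
}
```

Reading job, name and sal through the proxy costs three interactions, which is the tradeoff noted earlier for result collections that are not cached at the client.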
Concurrency Control for Proxies
Lock-Based
  Objects locked at the server
Cache-Based
  The server must maintain a cache for each client holding versions of data read/written by that client
Global vs Local OIDs
Globally unique OIDs
  OIDs are globally unique: they usually encode IP address + locally unique id of database
  Enables object mobility (at the cost of an object location service or forwarding)
  Facilitates replication & remote references
Locally managed OIDs
  OIDs are only unique to the database
  OID might simply be the virtual memory address (especially if objects are versioned and are immutable once committed)
  Remote references must add IP address + database id to the OID
Persistent vs Active OIDs
Persistent OIDs
  Names object whether
  • active (in memory) or
  • passive (only in secondary storage)
  Can be used as persistent references
Active OIDs
  Object given OID only when it is activated
  Persistent references to object use a soft naming scheme (e.g. monikers)
  A name is bound to an OID only when the object is activated
  Useful for distributed services
Remote Persistent Objects
CORBA
  Uses proxies and globally unique persistent OIDs
DCOM
  Uses proxies and globally unique active OIDs
  Monikers used for persistent naming
EJB (Entity Beans)
  Uses proxies and locally managed persistent OIDs, based on primary keys
JDO
  Uses persistent OIDs
  May be either globally unique or locally managed
  Doesn't support proxies; uses distributed local caching (would probably be useful to specify that some objects stay only on the server)
Client-Side Object Caching

Client-Side Object Cache
A client-side object cache is
– a client-side cache (i.e. a client-side lazy replica of the server DB state)
– where the data items held in the cache are objects (with their OIDs) instead of tuples (with their ROWIDs)
Unlike an RDB client-side cache
– queries do NOT automatically load all returned objects into the cache
– objects are loaded into the cache by navigation as well as queries
– applications access and modify the items in the cache directly!
Object Loading Approaches
Explicit
  Client decides when an object is explicitly loaded (a single object, all objects in a collection, etc.)
  The system will raise an exception when following a reference to an unloaded object
Transparent
  (on Access) When the client needs the content of an unloaded object, it will automatically load the object
  (on Reference) In addition, when the client stores a reference to an object, it will make space for the object (hollow objects)
Pre-Fetching
  In either case, when an object is loaded, the system may also load objects clustered with it (e.g. other objects on the same page as the object being loaded and/or other objects it references)
Transparent Loading on Access
richEmps := SELECT …
anEmp := richEmps.pick()
theDept := anEmp.dept
[Figure: objects emp1 to emp5 in emps; variables richEmps, anEmp and theDept; *emp2 and dept4 are shown as the objects loaded when anEmp and theDept are accessed.]
Representing Object References
Database Objects
  Database objects are referenced by OIDs
  OIDs (directly or indirectly) locate the object in the DB: [DB identity +] page + slot in page
Program Objects
  Objects in memory are generally referenced by pointers, which generally hold the memory address of the object
  References may indirect through another object
  References to objects in memory can take much less space than OIDs!
The Reference Representation Problem
Suppose both objects A and B are persistently stored in the database, and A holds a reference to B, represented by an OID.
If both A and B are loaded into a client's object cache, they both now have a memory address. Should A's reference to B remain an OID, or be swizzled to a pointer (directly or indirectly) to B?
[Figure: A holding the OID for B, versus A holding a Ptr to B.]
If the reference is swizzled, and A is modified, then the ptr must be unswizzled back to an OID before writing A back to the DB.
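A minimal Java sketch of swizzling and unswizzling, assuming a simple in-memory object table. The names Ref, CachedObj and SwizzleCache are hypothetical and chosen for illustration; a reference slot holds either an OID or a pointer, never both.

```java
import java.util.HashMap;
import java.util.Map;

// A reference slot that holds either an OID (unswizzled) or a pointer (swizzled).
class Ref {
    String oid;        // set while unswizzled, null once swizzled
    CachedObj target;  // null while unswizzled

    Ref(String oid) { this.oid = oid; }
}

class CachedObj {
    final String oid;  // the object's own identity, kept in its header
    final Map<String, Ref> refs = new HashMap<>();

    CachedObj(String oid) { this.oid = oid; }
}

class SwizzleCache {
    // Object table: OID -> in-memory object
    final Map<String, CachedObj> table = new HashMap<>();

    // Deferred swizzling: on first dereference, replace the OID with a pointer.
    // Assumes the target is already cached; a real cache would load it here.
    CachedObj deref(Ref r) {
        if (r.target == null) {
            r.target = table.get(r.oid);
            if (r.target != null) {
                r.oid = null;
            }
        }
        return r.target;
    }

    // Before writing obj back, turn each pointer back into the OID it stands for.
    Map<String, String> unswizzle(CachedObj obj) {
        Map<String, String> stored = new HashMap<>();
        for (Map.Entry<String, Ref> e : obj.refs.entrySet()) {
            Ref r = e.getValue();
            stored.put(e.getKey(), r.target != null ? r.target.oid : r.oid);
        }
        return stored;
    }
}
```

Because the pointer replaces the OID in the slot, unswizzling has to recover the OID from the referenced object's header, which is why the write-back step cannot be skipped.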
Memory-Based Object Representation Approaches
Goal: efficient dereferencing & traversal
Indirect Approach
  Memory reference points to intermediate object which points to referenced object
  Extra overhead; easy invalidation
Direct Approach
  Memory reference points directly to referenced object. Uses same model as ordinary programming language objects!
Page-Translation Approach
  Depends on Page Translation table (usually with hardware support) to translate reference to address of object
Other Representation Issues
Dereferencing
  When we want to load an object given its OID, if it's
  – In memory: how do we find it?
  – Not in memory: what do we do once it's loaded?
Tracking Dirty Objects
  Can the run-time system automatically keep track of objects which have been modified, so that the corresponding changes can be automatically or semi-automatically made to the database?
Managing Invalidation
  We may want to selectively invalidate cached objects when they become stale or when transactions complete
  – If we invalidate a cached object, and another cached object points to it, how do we prevent dangling pointers?
Each of the following approaches handles these issues differently
Oid Lookup Approach
On loading anEmp from server: OODB & memory representation are identical.
When anEmp.dept is dereferenced, aDept is loaded from the server (if not in memory).
[Figure: before and after views. anEmp holds 30493, 'Joe Java' and dept = OID B709342A8…. The Object Table (columns: 128 bit oid, info, ptr) has a row for A34F02397… and, after the dereference, a row for B709342A8… whose ptr locates aDept (30, SALES, NY).]
Deferred Indirect Approach
On loading anEmp from server: OODB & memory representation are initially identical.
anEmp.dept is swizzled to point to the object table row on first dereference, which points to aDept. aDept is loaded if not already in memory.
[Figure: before and after views. Before, anEmp (30493, 'Joe Java') holds dept = OID B709342A8… and the Object Table (columns: 128 bit oid, info, ptr) has a row for A34F02397…. After, anEmp.dept holds a ptr to the obj table row for B709342A8…, whose ptr locates aDept (30, SALES, NY).]
Immediate Indirect Approach
On loading anEmp, anEmp.dept is immediately swizzled to point to an object table row. If aDept's OID is not already in the table, a row is created for it.
[Figure: on loading anEmp, anEmp (30493, 'Joe Java') holds a ptr to the obj table row for B709342A8…, whose ptr is NULL. On loading aDept, that row's ptr locates aDept (30, SALES, NY). Object table columns: 128 bit oid, info, ptr.]
Deferred Direct Approach
On loading anEmp from server: OODB & memory representation are initially identical.
When anEmp.dept is dereferenced, the ptr is swizzled to point directly at the object.
[Figure: before and after views. Before, anEmp (30493, 'Joe Java') holds dept = OID B709342A8… and the object table (columns: 128 bit oid, info, ptr) has a row for A34F02397…. After, anEmp.dept holds a ptr to the object aDept (30, SALES, NY), and the table also has a row for B709342A8….]
Indirect/Direct Approach
On loading anEmp: anEmp.dept holds a ptr to an obj table row.
When anEmp.dept is dereferenced, the ptr is swizzled to point directly at the object.
[Figure: before and after views. On loading, anEmp (30493, 'Joe Java') holds a ptr to the Object Table row for B709342A8…, whose ptr is NULL. After the dereference, anEmp.dept holds a ptr to the object aDept (30, SALES, NY) and the table row's ptr is set. Object table columns: 128 bit oid, info, ptr.]
Hollow-Object Approach
On loading anEmp: anEmp.dept holds a ptr to a (hollow) object that stands in for aDept.
When anEmp.dept is dereferenced, aDept is pulled over from the server and used to fill in the hollow object.
[Figure: before and after views. anEmp (30493, 'Joe Java') holds a ptr to obj; the Object Table (columns: 128 bit oid, info, ptr) has rows for A34F02397… and B709342A8…. Before the dereference the referenced object is hollow; afterwards it contains aDept (30, SALES, NY).]
Hollow Object Loading (JDO)
Hollow objects are strictly more expensive (take more space) than other approaches. They are encouraged by languages that are type-safe (e.g. Java).
When a hollow object is created, its exact subclass must be known to create the correct type of hollow object.
Hollow objects can't be created for arrays, since their size may change in the database before they're loaded: referenced arrays must be loaded immediately.
More generally, an object may want to indicate that an object it references should be loaded with it.
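The hollow-object idea can be sketched in plain Java as below. This is an illustration only, not JDO's actual mechanism: HollowDept is a hypothetical class, and the loader function stands in for a fetch from the server. The object has an identity and a memory address from the start, but its fields are filled in on first access.

```java
import java.util.function.Function;

// Hypothetical department object that starts hollow.
class HollowDept {
    final String oid;
    private final Function<String, String[]> loader; // stands in for a server fetch
    private boolean hollow = true;
    private String name;
    private String loc;
    int loads = 0;  // how many times state was pulled from the "server"

    HollowDept(String oid, Function<String, String[]> loader) {
        this.oid = oid;
        this.loader = loader;
    }

    // Fill in the fields the first time any of them is needed.
    private void fillIfHollow() {
        if (hollow) {
            String[] state = loader.apply(oid);  // assumed layout: {name, loc}
            name = state[0];
            loc = state[1];
            hollow = false;
            loads++;
        }
    }

    boolean isHollow() { return hollow; }
    String getName()   { fillIfHollow(); return name; }
    String getLoc()    { fillIfHollow(); return loc; }
}
```

Note that the class of the hollow object (here HollowDept) has to be chosen when it is created, before any state has arrived, which is the subclass issue raised above.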
Cache Integration
In some cases, a server and client can share the same address space
– Embedded systems (the DB runs as a module of the client)
– Memory mapping (pages of the DB server can be explicitly shared with the client)
In this situation, the client's object cache can be integrated with the DB server's page cache. That is, clients can directly address objects in the server page cache.
– If multiple clients can map the same DB server page, then locking must be used (generally at the page level, possibly on individual objects)
Why can't optimistic concurrency be used?
All of the previous approaches can be used with either separate or integrated caches.
Page Translation is designed for integrated caches.
Page-Translation Approach
When loading anEmp, pull over the entire page from the OODB server.
When anEmp.dept is dereferenced, the entire page containing aDept must be pulled over from the server.
Assumes DB page# can be used in a virtual address; more likely, it needs to be swizzled to a virtual page #.
[Figure: before and after views of the Page Translation Table (columns: DB page#, info, page#). Before, it has an entry A34F023 … 34; after the dereference it also has an entry B709324 … 97. anEmp holds 30493, 'Joe Java' and dept = B709342A8…; aDept holds 30, SALES, NY.]
Modifying Persistent OO Data
Perspectives on Modifying Persistent Object Data
Standard DB Perspective
  How is data stored in an OODB modified?
  How do we make sure that transactions satisfy the ACID properties?
OO Programming Perspective
  How do we easily save and restore the programming state of OO programs?
Historic Approaches to Persistence of OO Program Data
Checkpointing
  Takes lots of space
  Only useful for the program that did the checkpoint
Serialization (Pickling)
  Program controls what is saved
  Serialized state may be able to be read by other programs
Persistent Object Store
  Allows individual objects to be saved and reloaded in random order by multiple programs
Object Database
  Persistent Object Store supporting querying and transactional semantics
Query & Update Approaches for Persistent Data
Result Set Approach
  "Real" client-side cache is invisible to clients
  Result set returned by query is transferred into the client's memory, where it is independent of any client-side cache
  Persistent data (or data in the cache) must be modified via INSERT/DELETE/UPDATE commands
  Most common approach used with RDBs
Visible Client-Side Cache Approach
  Queries (and navigation) cache results in a client-side cache, which is integrated with the user's address space
  Persistent data modifications result from writing back data modified in the client's address space
  Approach used with OODBs and with Object-Relational Mapping
OODB Command-Based Update
If the Standard RDB Perspective / Result Set Approach were used with OODBs, here's how a set of employees might be updated:

var richEmps set<Employee> :=
    SELECT e FROM e IN emps WHERE e.sal > 3000;
…
UPDATE e IN emps
    SET e.sal := e.sal * 1.1
    WHERE e.job = "MANAGER" AND e IN richEmps
COMMIT;

This is NOT the approach that is commonly used.
OOPL Cache-Based Approach

var richEmps set<Employee> :=
    SELECT e FROM e IN emps WHERE e.sal > 3000;

for e in richEmps loop
    if e.job = "MANAGER" then
        e.sal := e.sal * 1.1;
        e.markDirty();
    end if;
end loop;
COMMIT;

On COMMIT, all objects marked as dirty are written back to the database
Explicit Persistence
In explicit persistence, the client explicitly notifies the runtime system
– obj.markDirty()
  • Indicates that a persistent object has been modified (needed for cache management as well as controlling DB update on commit)
– obj.updateWhenDirty( true/false )
  • Tells the system whether or not to persist any changes to the DB on commit if the object is modified (default is true)
– obj.update()
  • Persists object changes to the DB if the object has been modified (used if updateWhenDirty is false)
If an object is already persistent, then after updating the locally cached version of the object, the updated contents of the object will generally need to be persisted back to the server.
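One way the three operations could fit together is sketched below in Java. PersistentObj and SketchSession are hypothetical names, the operations are placed on a session object rather than on obj for brevity, and the list of written OIDs stands in for the actual write-back to the DB server.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PersistentObj {
    final String oid;
    boolean dirty = false;
    boolean updateWhenDirty = true;  // default is true, as on the slide

    PersistentObj(String oid) { this.oid = oid; }
}

class SketchSession {
    private final Set<PersistentObj> dirtySet = new LinkedHashSet<>();
    final List<String> written = new ArrayList<>();  // OIDs written back, in order

    // The client notifies the runtime that it has modified o.
    void markDirty(PersistentObj o) {
        o.dirty = true;
        dirtySet.add(o);
    }

    // Explicit write-back, for objects whose updateWhenDirty flag is false.
    void update(PersistentObj o) {
        if (o.dirty) {
            written.add(o.oid);
            o.dirty = false;
            dirtySet.remove(o);
        }
    }

    // On commit, write back every dirty object unless updateWhenDirty is false.
    void commit() {
        for (PersistentObj o : dirtySet) {
            if (o.updateWhenDirty) {
                written.add(o.oid);
                o.dirty = false;
            }
        }
        dirtySet.clear();
    }
}
```

An object with updateWhenDirty set to false stays dirty across commit in this sketch, so a later update(o) still writes it.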
Transparent Persistence
In transparent persistence, the underlying system "automatically" detects when an object has been modified and automatically calls markDirty()
Still supports updateWhenDirty and update
If an object is already persistent, then after updating the locally cached version of the object, the updated contents of the object will generally need to be persisted back to the server.
Program Compilation & Execution
[Figure: program compilation turns a Program (Java, C#, etc) into ByteCode. The ByteCode is either executed (interpreted) by a ByteCode Interpreter, which the Hardware executes, or turned by bytecode compilation into an Executable, which the Hardware executes.]
Transparent Modification Detection
The system must be enhanced to call markDirty when a (persistent) field in a persistent object is modified.
• Execution Enhancement
  – Modify Firmware (for executables)
  – Modify Bytecode Interpreter (for bytecodes)
• Compiler Enhancement
  – Modify the program compiler or the bytecode compiler to add the call to markDirty
• Code Enhancement
  – Use a separate tool that revises the program code, bytecode, or executable code to add the call to markDirty
Mutators for Transparent Persistence
Suppose all persistent objects are modified through mutator methods
Instead of
  joe.sal := 1400
Use
  joe.setSal( 1400 )
Either clients can be required to write code this way, or the code can be changed through compiler or code enhancement.
Advantage: The call to markDirty can simply be added during enhancement to all set_xxx methods
Note: Accessor methods [e.g. joe.getSal()] can be used in a similar way as the place to call the code that loads an object if necessary.
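As a rough picture of what an enhanced class might look like, here is a small Java sketch. EnhancedEmp is a hypothetical class written by hand; a real enhancer would generate the equivalent calls rather than rely on the programmer.

```java
// Hypothetical persistent class as it might look after enhancement.
class EnhancedEmp {
    private double sal;
    private boolean dirty = false;

    private void markDirty() { dirty = true; }

    // An accessor is the natural place for a load-if-needed call (omitted here).
    public double getSal() { return sal; }

    // The mutator is the single place where the field changes,
    // so it is also the single place that needs the markDirty call.
    public void setSal(double newSal) {
        sal = newSal;
        markDirty();
    }

    public boolean isDirty() { return dirty; }
}
```

With this shape, joe.setSal( 1400 ) both updates the cached field and records that joe must be written back on commit.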
Transactional Persistence
On first access to a persistent object in a transaction
– Load data if not present (and if loading is transparent)
– Perhaps pull anyway if stale
– Mark as read, which obtains a SHARE lock at the server if lock-based
On first update of a persistent object in a transaction
– Mark as dirty, which obtains an EXCLUSIVE lock at the server if lock-based
On Commit
– If optimistic, do validation at server. [If validation fails, db sends identity (or even contents) of all those objects which are stale]
– Provide content of written (i.e. dirty) objects to the server
– If lock-based, locks are released at the server
OODB Locking
OODB Locking is very similar to RDB Locking
• The Table/Row locking hierarchy in RDBs corresponds to the Extent/Object hierarchy in OODBs (and both use index locking)
• Non-extent collections just contain a group of object references. They need to be locked just like other objects when read/written, but do not form a hierarchy with the objects they reference.
• OODBs can also use a Page/Object hierarchy (lock the page instead of all objects on the page). Extent/Page/Object hierarchies are less common, since a page often contains objects from many extents.
• OODBs may use a Cluster/Object hierarchy, where a cluster is a group of objects which are used together. A Cluster/Page/Object hierarchy may be used if they are stored together.
Cache Management for OO Data
Cache Data Lifetimes
Transaction Lifetime
– The objects in a client's cache are cleared at the end of each transaction, i.e. every transaction starts with an empty cache
Session Lifetime
– Cached objects remain in the cache at the end of a transaction. At the start of a transaction, the cache may contain objects used in the client's previous transactions.
  • Does require a (relatively simple) local undo mechanism if a transaction aborts
Object Timestamps for Session Lifetime Caches
DB Server Timestamp
  Every object on the DB server has a timestamp: the time when the object was last updated at the server
Client Cache Timestamp
  When an object is retrieved from the DB server, its server timestamp is retrieved as well, and stored in the client cache along with the object
Per-Object Metadata
For each object in the client's cache, the cache manager maintains
– the OID of the object
– whether the object is dirty
– whether to write the object on commit if it is dirty
– whether it was used during the current transaction
– [for a session lifetime cache] the object's timestamp (may be in the cache and/or kept at the server)
Managing Object Retrieval
An employee anEmp is in the client's object cache. anEmp contains a reference to a department (represented by the department's OID).
empno 30493
ename 'Joe Java'
dept B709342A8…
anEmp
The client's program executes
myLoc := anEmp.dept.loc
Under what circumstances should anEmp.dept be retrieved from the DB server and be placed in the cache?
Object Retrieval Rules
If the OID in anEmp.dept doesn't identify an object already cached
  Read in anEmp.dept from the DB server & place it in the cache
If the OID in anEmp.dept identifies an object already cached but not yet used in this transaction
  Replace the cached object with the latest version read from the DB server
  – DB server can instead indicate that the version in the client cache is the latest version
  – [If using optimistic concurrency] Replacement is not required. Take the chance that it is the latest version (fail validation later if wrong)
Don't retrieve if the object is cached and has already been used in the current transaction.
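These rules amount to a small decision function, sketched here in Java. RetrievalCache and its Action values are hypothetical names; the sketch only decides what to do on an access and leaves the actual server traffic to the caller.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RetrievalCache {
    enum Action { FETCH, REVALIDATE, USE_CACHED }

    private final Map<String, Long> cachedTimestamps = new HashMap<>(); // OID -> server timestamp
    private final Set<String> usedThisTxn = new HashSet<>();
    private final boolean optimistic;

    RetrievalCache(boolean optimistic) { this.optimistic = optimistic; }

    // Called by the client after an object has been fetched from the server.
    void put(String oid, long serverTimestamp) {
        cachedTimestamps.put(oid, serverTimestamp);
    }

    void beginTransaction() { usedThisTxn.clear(); }

    // Decide what to do when the program dereferences the given OID.
    Action onAccess(String oid) {
        Action action;
        if (!cachedTimestamps.containsKey(oid)) {
            action = Action.FETCH;          // not cached: read it from the server
        } else if (usedThisTxn.contains(oid)) {
            action = Action.USE_CACHED;     // already used in this transaction
        } else if (optimistic) {
            action = Action.USE_CACHED;     // take the chance; validation may fail at commit
        } else {
            action = Action.REVALIDATE;     // get the latest version, or confirm the timestamp
        }
        usedThisTxn.add(oid);
        return action;
    }
}
```

REVALIDATE covers both variants on the slide: the server can send the latest version, or simply confirm that the cached timestamp is current.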
Partial Queries
Suppose the cache is empty, and the very first client operation in a transaction is
SELECT e.empno, e.sal
FROM e IN emps
WHERE e.job = "CLERK"

How should the client cache manager process this query?
Consider what happens if the same request is made later in the transaction.
Partial Query Alternatives
1. Send the request to the server. Cache nothing.
   Reasonable
2. Send the request to the server. Cache the results for reuse.
   Possible, but would significantly complicate concurrency and cache management: it would require field-level granularity, not just tuple-level granularity.
Caching for Reuse
Request & cache entire objects from the server

var tempemps Set<Employee> :=
    SELECT e FROM e IN emps WHERE e.job = "CLERK"

Then execute the query

SELECT e.empno, e.sal FROM e IN tempemps

locally. That is, each employee in tempemps will be downloaded and cached at the client, and empno and sal will be retrieved from it.
Makes the most use of the cache. The cost of downloading objects is worth it if they will be used in future queries; it may not be if they won't. May be able to set cache manager hints to decide.
Complex Query Problem
Suppose some employees are already loaded in the client's cache. Possibly, some of their salaries have been updated. Then (in the same transaction), the client executes
SELECT deptno, sum( SELECT p.sal FROM p IN partition )
FROM e IN emps
WHERE e.job != 'CLERK'
GROUP BY deptno: e.deptno

How should the client's cache manager process this query?
Processing Complex Queries
Executing the query at the server would ignore any non-clerk salaries already updated at the client.
The simplest approach is for the client cache manager to first process

var tempemps Set<Employee> :=
    SELECT e FROM e IN emps WHERE e.job != 'CLERK'

Then, process the following query locally (which pulls the remaining employees into the cache)

SELECT deptno, sum( SELECT p.sal FROM p IN partition )
FROM e IN tempemps
GROUP BY deptno: e.deptno

More sophisticated cache managers may be able to apply the query to the employees in the cache, send a request to the server to compute the query for the rest of the employees, and then integrate the results.
A sophisticated query manager can also send the entire request to the server if it keeps track of the fact that no employee (or non-clerk) salaries have been updated so far during the transaction!
Invalidation Approaches
Complete Invalidation
  Whenever a transaction completes, invalidate all objects in the cache (except perhaps for ones explicitly marked)
Aggressive Invalidation
  Whenever objects are persisted to the DB server by a client, the server notifies every other client that has cached any of those objects to invalidate them. Server maintains cache timestamps for all cached objects.
Failure Invalidation
  The DB server notifies the client on validation failure that the server has a more recent version of the object
The client can also invalidate objects that have not been used recently

Deleting and Inserting Persistent Objects
Deleting Persistent Objects

var oldnlst Employee :=
    last( SELECT e FROM e IN emps WHERE e.job = "ANALYST" ORDER BY e.age )

coolemps.remove( oldnlst )
– Removes (the reference to) oldnlst from coolemps
– Removes it immediately in the cache; at commit time, from the database

oldnlst.delete()
– Removes (the reference to) the object from its extent [immediately in cache, at commit from the database]
– Deletes the actual object [at commit, from the database; in cache, invalidates it]
About Deletion …
[Figure: an employee e (shown alongside emp1, emp2, …) is referenced both from the extent emps and from the collection coolemps.]
Can e.delete() delete e (in addition to removing it from its extent) if there are other persistent references to it?
Models for Deletion
Can e.delete() delete e (in addition to removing it from its extent) if there are other persistent references to it?
1. No; e itself will continue to exist until all references to it are deleted. Semantically problematic to have a persistent object which is not part of its extent.
2. No. e.delete() raises an exception if there are other references to it. Problematic if you really do want to get rid of the employee. Also, like #1, it requires reference counting or garbage collection, which prevents objects from being on remote hosts or removable media.
3. Yes, and it completely deletes e. But this causes dangling references. (How are they handled?)
4. Kind of. Mark e as DELETED (acts like it is deleted). Really delete it when there are no references to it. Problem: unnecessarily takes up memory, and also prevents distributed/removable object repositories.
Dangling References & OIDs
If we do delete e, what do we do about the dangling reference to e left in coolemps?
Dangling references are only a problem when it is impossible to tell that the memory of a freed object has been filled with a different object (possibly of a different type or on a different memory boundary).
In an OODB, references hold OIDs, which are unique. Suppose an OID contains a page #, a slot #, and a generation # (which distinguishes different objects stored at that page/slot over time).
Dereferencing the OID finds the page, and uses the slot # to get the offset to the object. The object header contains the generation #, which is checked against the dereferenced OID. If they don't match, the OID must reference a deleted object.
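The generation-number check can be sketched in Java as follows. GenOid and SlottedStore are hypothetical names, and two small arrays stand in for pages, slots and object headers; the point is only that a stale OID is detected rather than silently resolving to whatever now occupies the slot.

```java
// Hypothetical OID layout: page #, slot #, generation #.
final class GenOid {
    final int page;
    final int slot;
    final int generation;

    GenOid(int page, int slot, int generation) {
        this.page = page;
        this.slot = slot;
        this.generation = generation;
    }
}

class SlottedStore {
    private final Object[][] contents;   // current occupant of each (page, slot)
    private final int[][] generations;   // generation # kept in the occupant's header

    SlottedStore(int pages, int slotsPerPage) {
        contents = new Object[pages][slotsPerPage];
        generations = new int[pages][slotsPerPage];
    }

    // Storing a new object in a slot bumps the slot's generation #.
    GenOid store(int page, int slot, Object value) {
        generations[page][slot]++;
        contents[page][slot] = value;
        return new GenOid(page, slot, generations[page][slot]);
    }

    private boolean isLive(GenOid oid) {
        return generations[oid.page][oid.slot] == oid.generation
                && contents[oid.page][oid.slot] != null;
    }

    void delete(GenOid oid) {
        if (isLive(oid)) {
            contents[oid.page][oid.slot] = null;
        }
    }

    // Returns null for a dangling reference (object deleted, or slot since reused).
    Object deref(GenOid oid) {
        return isLive(oid) ? contents[oid.page][oid.slot] : null;
    }
}
```

A collection such as coolemps can therefore keep the old OID: dereferencing it after the delete yields "deleted" instead of an unrelated object.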
OIDs and ROWIDs
[Figure: an emp row (empno 6291, ename SMITH, …) whose dept column holds AAAGDxAABAAAH9EAAD27, i.e. a ROWID followed by generation # 27. The referenced dept row (deptno 622, dname Auditing, loc CHICAGO, …) carries 27 in its header.]
A local OID can be represented as a ROWID + a generation #, which also appears in the referenced object's header
Persisting New Objects
Explicitly
– Specify each object or collection of objects to be made persistent (not every object created by a program needs to persist)
Declaratively
– By object class
– By object property
  • May be able to be changed dynamically
Reachability
– Specify root objects
– All newly created objects reachable (via object references) from root objects are made persistent
– Stop traversing if you get to a newly created object marked as transient (either explicitly or based on its class)
Extents & Inheritance
Models for Extents
• Only superclasses have extents
• New object is only added to its own class extent; not extents of superclasses
• New object added to its own class extent as well as extents of all its ancestor superclasses
Extent-Related Functions (non-standard)
• all(class) - returns a collection of all objects in that class (or any of its subclasses)
• only(class) - returns a collection of all objects in that class which are not instances of any of its subclasses
Consider how to implement these functions in each of the models above
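For the second model (a new object is added only to its own class extent), one possible implementation of the two functions is sketched below in Java. ExtentRegistry, SketchEmp and SketchManager are hypothetical names; the registry keeps one list per exact class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExtentRegistry {
    // One extent per exact class; an object is added only to its own class's extent.
    private final Map<Class<?>, List<Object>> extents = new HashMap<>();

    void add(Object o) {
        extents.computeIfAbsent(o.getClass(), k -> new ArrayList<>()).add(o);
    }

    // only(class): the extent of exactly that class.
    List<Object> only(Class<?> cls) {
        return new ArrayList<>(extents.getOrDefault(cls, new ArrayList<>()));
    }

    // all(class): union of the extents of the class and every one of its subclasses.
    List<Object> all(Class<?> cls) {
        List<Object> result = new ArrayList<>();
        for (Map.Entry<Class<?>, List<Object>> e : extents.entrySet()) {
            if (cls.isAssignableFrom(e.getKey())) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }
}

// Tiny hierarchy used to exercise the registry.
class SketchEmp {}
class SketchManager extends SketchEmp {}
```

In this model only(class) is a direct lookup and all(class) needs a union; in the third model (objects also added to ancestor extents) the costs are reversed, since all(class) becomes the direct lookup and only(class) needs a filter.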
Versioning
Temporal Versioning
  On each commit that modifies an object, the OODB remembers the previous state (often using a delta), and the timestamp of the commit
  Supports "flashback" queries: queries executed with the state at a specified time
Immutable Versioning
  Committed objects are immutable (and easily cached and replicated)
  Modifications made to an object within a transaction are made to a new version of the object (with a new OID), which is persisted on commit (the old object is often retained as a delta of the new object, or vice versa)
  If object X is modified => X', and object Y references X, then we may want to automatically create Y', a new version of Y, which now references X' instead of X.
Version Groups & Selection
  Versions are immutable and have OIDs, but additionally, a separate OID can represent a version group (a group of different versions of an object)
  Following a reference for an OID for a specific version gets that version
  Following a reference for a version group selects a particular instance of the group, typically the latest, but there are mechanisms to select other versions based on their properties.
  This approach supports version branches and merges
Java Data Objects
JDO 1.0
Persistent Object Layer + Methodology
Transparent Transactional Persistence
Can be used to map objects to RDBs as well as OODBs (and other Persistent Object Stores)
The material here is mostly based on JDO 1.0
(The JDO 2.0 spec was published in Dec 2004)
JDO Persistence Approach
Transactional Transparent Persistence
– Based on bytecode enhancement with added accessor/mutator methods
– Extends persistence-capable classes so they implement the PersistenceCapable interface, which delegates to a StateManager
Java source itself does not indicate
– Which classes are persistence capable
– Which fields of a class should be saved
– Which fields represent relationships
– How to map data structures to the persistent store (which could be an OODB or an RDB)
These details are specified via a standard configuration file with vendor-specific extensions
1st & 2nd Class Objects
In C or C++
– Objects may be contained within other objects
– Same distinction as between Class and REF Class
– When the parent object is made persistent, the contained object is persisted with it
In Java
– This distinction can't be made in the same way
– All objects are independent and accessed via references
JDO distinguishes between
• 1st class objects (stand on their own) and
• 2nd class objects (act as if contained)
2nd Class Objects
• 2nd class objects are automatically loaded & stored with the 1st class objects that refer to them
– Problems arise if a 2nd class object is shared by multiple 1st class objects
• 2nd class objects do not have their own JDO identity (OID visible to JDO client)
• Transparent persistence (esp. automatic dirty detection) is not supported for 2nd class objects
– Only the 1st class object which refers to a 2nd class object should update it.
– It must then explicitly call makeDirty on itself!
• Arrays are always 2nd class objects
Object Characteristics
Persistent vs Transient
– Persistent objects are automatically persisted
– Transient objects are not persisted and do not have a persistent object identity
Transactional vs Non-Transactional
– Transactional objects participate in transactions (ACID properties)
• No support for different isolation levels
• No support for nested transactions
– Non-transactional objects don't. Persisted explicitly.
These characteristics can be changed dynamically
Life Cycle
Make an instance persistent
Employee emp = new Employee( ... );
pm.makePersistent( emp );
– For transactional objects, object not made persistent until commit is done
Persistence by reachability
– All instances reachable from emp become persistent as well
emp.dept = new Dept( … );
Delete an instance from the database
pm.deletePersistent( emp );
– Depends upon underlying DB Delete Model
– Does not delete by reachability
– For transactional objects, object not deleted until commit is done
pm is a persistence manager: it manages an object cache connected to a persistent store
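Persistence by reachability amounts to a graph traversal from the object passed to makePersistent. The following is a rough sketch in plain Java, with no JDO dependency: the Node class and the reachableFrom method are hypothetical stand-ins for persistence-capable objects and for whatever closure computation a JDO implementation performs internally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilitySketch {
    // Hypothetical persistence-capable object: a name plus outgoing references.
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Collect every object reachable from root (including root itself).
    // Objects are compared by identity, as object references are.
    static Set<Node> reachableFrom(Node root) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            Node n = todo.pop();
            if (seen.add(n)) {          // first visit: follow its references
                for (Node r : n.refs) {
                    todo.push(r);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node emp = new Node("emp");
        Node dept = new Node("dept");
        emp.refs.add(dept);
        dept.refs.add(emp);             // cycles are handled by the seen set
        for (Node n : reachableFrom(emp)) {
            System.out.println("would persist: " + n.name);
        }
    }
}
```

Deletion does not use this closure: as the slide notes, deletePersistent removes only the instance it is given.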
Updating Persistent Objects
trans = pm.currentTransaction();
trans.begin();
emp.name = "Joe B. Jones";
emp.dept.name = "Financing";
trans.commit();
Default caching model:
All transactional objects are invalidated (i.e. made hollow!) when the transaction completes
JDO extensions support other caching models:
e.g. maintain for duration of the session
Requires specification of caching strategy
The variable emp refers to an object used in a previous transaction
Is it still in the object cache or will it need to be reloaded?
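Under the default model the answer is that the object must be reloaded: it stays in the cache only as a hollow instance. A minimal sketch of that behavior follows, in plain Java with no JDO dependency. HollowCacheSketch, its maps, and the loads counter are all hypothetical; a real implementation tracks state per instance rather than per OID map.

```java
import java.util.HashMap;
import java.util.Map;

class HollowCacheSketch {
    // Stands in for the persistent store: OID -> name field.
    private final Map<Integer, String> store = new HashMap<>();
    // Loaded field values by OID; an OID missing here means the instance is hollow.
    private final Map<Integer, String> cache = new HashMap<>();
    // Number of reads that had to go to the store.
    int loads = 0;

    void put(int oid, String name) { store.put(oid, name); }

    // Reading a field loads the state first if the instance is hollow.
    String getName(int oid) {
        if (!cache.containsKey(oid)) {
            cache.put(oid, store.get(oid));
            loads++;
        }
        return cache.get(oid);
    }

    // Writing changes the cached copy; it reaches the store at commit.
    void setName(int oid, String name) { cache.put(oid, name); }

    // Commit flushes cached state, then makes every instance hollow.
    void commit() {
        store.putAll(cache);
        cache.clear();
    }

    public static void main(String[] args) {
        HollowCacheSketch pm = new HollowCacheSketch();
        pm.put(1, "Joe Jones");
        pm.getName(1);                    // first access loads from the store
        pm.setName(1, "Joe B. Jones");
        pm.commit();                      // instance becomes hollow
        System.out.println(pm.getName(1) + ", loads = " + pm.loads);
    }
}
```

A caching model that keeps objects for the session would simply skip the cache.clear() step, at the cost of possibly serving stale data.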
Persisting New Objects
pm = pmf.getPersistenceManager();
trans = pm.currentTransaction();
trans.begin();
Employee emp = new Employee( … );
emp.name = "Joe Jones"; …
emp.dept = new Department( … );
emp.dept.deptno = 24; …
pm.makePersistent( emp );
trans.commit();
rememberId = pm.getObjectId( emp );   // could pass to some other machine
pmf is a Persistence Manager Factory. It is connected to a persistent store
Updating Remembered Objects
pm = pmf.getPersistenceManager();
trans = pm.currentTransaction();
trans.begin();
Employee emp = (Employee) pm.getObjectById( rememberId, true );
emp.name = "Joe B. Jones";
emp.dept.name = "Financing";
trans.commit();
pm must be connected to the same persistent store where the object was persisted; this could be a "meta" store that knows how to find and connect to the current store holding the object
JDO Query Language
JDOQL 1.0 vs OQL
JDO Query Language
Simpler but less functional than OQL
Easier to map to RDB SQL (or more primitive persistent stores)
Limited forms of Joins/Subqueries
Navigation only
No DISTINCT or GROUP BY (added in JDO 2.0)
No projection: Can only return elements of the collection queried, not their attributes (changed in JDO 2.0)
Can be used to query collections returned to client
JDO 1.0 Queries
Queries
– filter Collections
– return Collections
Required elements
– Class of results
– Collection to filter
• may be an Extent: done on back end
• may be a Collection: done in client cache
– Filter (Java boolean expression)
Example Queries
select e from e in emps where e.sal > 10000
Query q = pm.newQuery( Employee.class, emps, "sal > 10000" );
Collection result = (Collection) q.execute();
select e from e in emps where e.sal > e.mgr.sal
Query q = pm.newQuery( Employee.class, emps, "sal > mgr.sal" );
select e from e in emps where "Yael" in e.kidnames
Query q = pm.newQuery( Employee.class, emps, "kidnames.contains(\"Yael\")" );
Query Parameters
select e from e in emps where e.sal > minsal
Parameter declarations
Query q = pm.newQuery( Employee.class, emps );
q.declareParameters( "float minsal" );
Filter uses declarations as if they were in scope
q.setFilter( "sal > minsal" );
Parameter binding at Query execution
(primitive values passed as wrapper objects)
result = (Collection) q.execute( new Float( 10000 ) );
Querying Navigated Collections
Navigate through collections referenced by an object
Find Boston Departments with at least one well-compensated Employee
select d from d in depts
where d.loc = "Boston"
and exists e in d.empls : e.sal > 10000
Declare variables used to iterate through those collections
Query q = pm.newQuery( Dept.class, depts );
q.declareVariables( "Employee e" );
q.setFilter( "loc == \"Boston\" && empls.contains(e) && e.sal > 10000" );
Collection wcdepts = (Collection) q.execute();
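The declared variable e acts as an existential quantifier: a department qualifies as soon as one member of empls satisfies the rest of the filter. The same meaning can be spelled out in plain Java over in-memory collections. This is only a sketch of the semantics, with no JDO dependency; the nested Employee and Dept classes and the method bostonDeptsWithHighEarner are hypothetical stand-ins for the persistent classes on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class NavigatedQuerySketch {
    static class Employee {
        final double sal;
        Employee(double sal) { this.sal = sal; }
    }

    static class Dept {
        final String loc;
        final List<Employee> empls;
        Dept(String loc, Employee... empls) {
            this.loc = loc;
            this.empls = Arrays.asList(empls);
        }
    }

    // Plain-Java reading of the filter:
    //   loc == "Boston" && empls.contains(e) && e.sal > 10000
    static List<Dept> bostonDeptsWithHighEarner(Collection<Dept> depts) {
        List<Dept> result = new ArrayList<>();
        for (Dept d : depts) {
            if (!"Boston".equals(d.loc)) {
                continue;
            }
            for (Employee e : d.empls) {   // the variable e ranges over d.empls
                if (e.sal > 10000) {
                    result.add(d);         // one witness is enough
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Dept a = new Dept("Boston", new Employee(5000), new Employee(12000));
        Dept b = new Dept("Boston", new Employee(9000));
        Dept c = new Dept("Austin", new Employee(20000));
        System.out.println(bostonDeptsWithHighEarner(Arrays.asList(a, b, c)).size());
    }
}
```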
Querying Local Collections
Find departments in wcdepts that have more than 20 employees
select d from d in wcdepts
where count(d.empls) > 20
Query q = pm.newQuery( Dept.class, wcdepts, "empls.size() > 20" );
Using Collections As Parameters
Find employees whose spouses are in one of the departments in wcdepts
select e from e in emps
where e.spouse.dept in wcdepts
Query q = pm.newQuery( Employee.class, emps );
q.declareParameters( "Collection wcdepts" );
q.setFilter( "wcdepts.contains(spouse.dept)" );
Collection wcspouses = (Collection) q.execute( wcdepts );
(To get employees who are themselves in one of the departments in wcdepts, use dept instead of spouse.dept)
wcdepts is not persistent; so it must be passed as a parameter.
JDO 2.0 Results
JDO 2.0 allows queries to return values, not just objects
select distinct(e.job) from e in emps where e.sal > 10000
JDO 2.0:
Query q = pm.newQuery( Employee.class, emps, "sal > 10000" );
q.setResult( "distinct job" );
Collection result = (Collection) q.execute();
JDO 1.0:
Query q = pm.newQuery( Employee.class, emps, "sal > 10000" );
Collection result = (Collection) q.execute();
Collection jobs = new HashSet();
Iterator r = result.iterator();
while (r.hasNext()) { jobs.add( ((Employee) r.next()).job ); }
Joins by Hand
select distinct e from e in emps, s in starbucks
where e.zip = s.zip
Find employees who live in a zipcode where a Starbucks is located
zips := select distinct s.zip from s in starbucks
Query q = pm.newQuery( Starbuck.class, starbucks );
q.setResult( "distinct zip" );
Collection zips = (Collection) q.execute();
select e from e in emps where e.zip in zips
q = pm.newQuery( Employee.class, emps );
q.declareParameters( "Collection zips" );
q.setFilter( "zips.contains(zip)" );
Collection staremps = (Collection) q.execute( zips );
Joins by Hand
select distinct e from e in emps, s in starbucks
where e.zip = s.zip
Find employees who live in a zipcode where a Starbucks is located
zips := select distinct s.zip from s in starbucks
Collection zips = new HashSet();
Iterator s = starbucks.iterator();
while (s.hasNext()) {
  int zip = ((Starbuck) s.next()).zip;
  zips.add( new Integer( zip ) );
}
select e from e in emps where e.zip in zips
Query q = pm.newQuery( Employee.class, emps );
q.declareParameters( "Collection zips" );
q.setFilter( "zips.contains(zip)" );
Collection staremps = (Collection) q.execute( zips );
Capability-Based Access Control
Object-Based Access Control Approaches
• Access control on objects (by granting privileges for, or associating security predicates or ACLs with objects or object classes). Possibly a collection of objects could be specified statically, or even dynamically by OQL expressions
• Security domains: allowing some objects to be accessed and/or updated only through operations (to which execute access can be selectively granted, or which dynamically determine when or how to execute)
• Capabilities: Access control included as part of references. The capability system described here is based (very loosely) on the classic CMU Hydra system.
Capability: Reference + Privileges
richEmps := SELECT …
anEmp := richEmps.pick()
[Figure: capabilities held in the variables anEmp and richEmps and in the richEmps collection, referencing emp1, emp2, …; each capability is labeled with its privileges (RdC, WrC or RdD)]
RdD – read the data in the referenced object
RdC – read the capabilities in the referenced object
WrC – write capabilities to the referenced object
Capability: Reference + privileges related to the reference
Capability Copy & Restrict
The creator of an object gets a capability for that object with all privileges.
A user can create copies of a capability (possibly with restricted privileges), and make them available to other users
A user can restrict, but cannot arbitrarily increase a capability's privileges (though see privilege amplification)
This requires a protected mechanism (similar to Java's Reference classes) that doesn't allow forging capabilities or direct access to a capability's privileges.
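The copy-and-restrict rule can be sketched in Java, using private state as a stand-in for the protected mechanism. This is an illustrative sketch only: the class CapabilitySketch, the Priv enum (where MOD_STAR stands for Mod*), and the methods create, restrict, has and sameObject are hypothetical names, and Java access control is much weaker than the protection a real capability system provides.

```java
import java.util.EnumSet;
import java.util.Set;

class CapabilitySketch {
    enum Priv { RdD, RdC, WrD, WrC, ClrC, Del, MOD_STAR }

    // A capability: a reference plus privileges. Both are private, and the
    // constructor is private, so holders can only obtain capabilities via
    // create() or restrict().
    static final class Capability {
        private final Object target;
        private final Set<Priv> privs;

        private Capability(Object target, Set<Priv> privs) {
            this.target = target;
            EnumSet<Priv> copy = EnumSet.noneOf(Priv.class);
            copy.addAll(privs);
            this.privs = copy;
        }

        // The creator of an object gets a capability with all privileges.
        static Capability create(Object target) {
            return new Capability(target, EnumSet.allOf(Priv.class));
        }

        // Copy with restricted privileges: the result is the intersection of
        // the current privileges and 'keep', so it can never gain a privilege.
        Capability restrict(Set<Priv> keep) {
            EnumSet<Priv> p = EnumSet.noneOf(Priv.class);
            p.addAll(privs);
            p.retainAll(keep);
            return new Capability(target, p);
        }

        boolean has(Priv p) { return privs.contains(p); }

        boolean sameObject(Capability other) { return target == other.target; }
    }

    public static void main(String[] args) {
        Capability full = Capability.create(new Object());
        Capability readOnly = full.restrict(EnumSet.of(Priv.RdD));
        // Asking for more than the capability has does not add privileges.
        Capability attempt = readOnly.restrict(EnumSet.allOf(Priv.class));
        System.out.println(readOnly.has(Priv.RdD) + " " + attempt.has(Priv.WrD));
    }
}
```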
Privileges
Capabilities contain 3 kinds of privileges:
• Class-Specific Privileges
– Privileges specific to objects of a specific class
– For example, capabilities for Operation objects have an Execute privilege
• Generic Object Privileges
– Privileges like RdD, RdC, and WrC that apply to any object, regardless of its class
• Meta Privileges
– Privileges which constrain the capability itself, rather than the object referenced by the capability (invented for Hydra by E. Cohen, 1973)
Class-Specific Privileges
Every capability designates the class of the object it references.
The capability can either designate the actual class of the object, or any of its superclasses.
The class-specific privileges of a capability correspond to the class actually designated by the capability.
Example: A Stack is a subclass of a List. A capability for a stack object can either be designated
– as a Stack, in which case its Push privilege would control whether or not it can be passed to the Stack.Push operation, and List operations could not be called using the capability.
– as a List, in which case the capability could only be used with the built-in List operations (depending upon its privileges), but not Stack operations, which expect Stack capabilities.
Generic Object Privileges
Read Privileges
RdD – read the data in the referenced object
RdC – read the capabilities in the referenced object
Modification Privileges
Mod* – see next slide
WrD – write data to the referenced object
WrC – write capabilities to the referenced object
ClrC – clear or overwrite capabilities in the referenced object (a capability for an object with only WrC cannot be used to overwrite capabilities in the object, but can be used to append capabilities to the object)
Del – delete the referenced object
Transitive Privileges
Transitive Privileges (invented for Hydra by E. Cohen, 1973) refer not just to a specific capability or object, but to any one reached through it!
Mod* is a transitive privilege
– Mod* is required to modify the referenced object in any way (it is needed in addition to any of the specific modification privileges such as WrD or ClrC)
– Mod* is required to modify any object reached through the referenced object
The Modification Problem
Mod* is used to solve the Modification Problem.
An abstract data type may be represented by a group of objects linked together (via capabilities)
If a user has access to a capability for an abstract data type object, without the Mod* privilege, that user cannot modify any part (including linked objects) of the representation of the abstract data type.
Note that if richEmps is a capability variable (without the Mod* privilege) referencing an array of capabilities, and a program executes myEmp := richEmps[1], then myEmp will not have the Mod* privilege either
A copy of a capability will not have the Mod* privilege, if the object holding the original capability was accessed through a capability without the Mod* privilege
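The propagation rule for Mod* can be sketched as a load operation that strips the privilege whenever the access path lacks it. The sketch below is a simplified model, not the Hydra mechanism itself: ModStarSketch, CapObject, load and canWriteData are hypothetical names, MOD_STAR stands for Mod*, and loading is assumed to require RdC on the capability used to reach the object.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class ModStarSketch {
    enum Priv { RdD, RdC, WrD, WrC, MOD_STAR }

    // An object that stores capabilities (for example the array behind richEmps).
    static final class CapObject {
        final List<Capability> slots = new ArrayList<>();
    }

    static final class Capability {
        final CapObject target;
        final Set<Priv> privs;

        Capability(CapObject target, Set<Priv> privs) {
            this.target = target;
            EnumSet<Priv> copy = EnumSet.noneOf(Priv.class);
            copy.addAll(privs);
            this.privs = copy;
        }
    }

    // Copy the capability in slot i of the object referenced by 'via'.
    // The copy loses Mod* when 'via' itself lacks Mod*.
    static Capability load(Capability via, int i) {
        if (!via.privs.contains(Priv.RdC)) {
            throw new SecurityException("RdC required to read capabilities");
        }
        Capability stored = via.target.slots.get(i);
        EnumSet<Priv> p = EnumSet.noneOf(Priv.class);
        p.addAll(stored.privs);
        if (!via.privs.contains(Priv.MOD_STAR)) {
            p.remove(Priv.MOD_STAR);
        }
        return new Capability(stored.target, p);
    }

    // Writing data needs the specific privilege WrD and also Mod*.
    static boolean canWriteData(Capability c) {
        return c.privs.contains(Priv.WrD) && c.privs.contains(Priv.MOD_STAR);
    }

    public static void main(String[] args) {
        CapObject emp = new CapObject();
        CapObject array = new CapObject();
        array.slots.add(new Capability(emp, EnumSet.of(Priv.RdD, Priv.WrD, Priv.MOD_STAR)));
        Capability richEmps = new Capability(array, EnumSet.of(Priv.RdC));  // no Mod*
        Capability myEmp = load(richEmps, 0);
        System.out.println("myEmp can write data: " + canWriteData(myEmp));
    }
}
```

Because the check is applied at every load, the restriction follows any chain of capabilities reached through the original one.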
Meta Privileges
Capabilities may also have meta privileges, which constrain the capability itself, rather than the object it references
– MClr – allows the capability to be deleted, moved or overwritten from the field it is in
– MLoad* – a transitive privilege which allows the capability (or a capability obtained from any object reached from it) to be moved or copied.
– MStor* – a transitive privilege which allows the capability (or a capability obtained from any object reached from it) to be stored in a persistent object [Note: This implies that objects marked as temporary cannot later be remarked as persistent]. Absence of MStor* can be used to limit delegation.
– MAlias – used for aliased revocation (discussed later)
Security Domains
Every database connection has a security domain associated with it
The security domain is a dictionary collection containing capabilities for objects that can be accessed by the user/program connected to the database.
Because a domain is a dictionary, each capability is named, and these names are the top-level names which can be used in queries.
When a user connects to a database, the user's login domain is initially associated with the connection. It contains a number of capabilities for special objects.
See next slide
Granting Privileges
No need to explicitly "grant" privileges to a user.
Instead, a user can store a capability in an object where another user can read it.
For example, suppose each user (through their user login domain) has capabilities for
• Their schema. This holds capabilities for a user's objects.
• Their inbox. Another user can add a capability to a user's inbox to grant them a capability.
• The global inbox dictionary. Allows a user to get a capability for any other user's inbox.
• Their exports. A user stores a capability here when they want all users to be able to get it.
• The global exports dictionary. Allows a user to get a capability for any other user's exports.
What privileges should a user have for each of these capabilities, and what privileges should be included with capabilities in the global inbox and exports dictionary?
User Access Architecture
[Figure: Joe's Login Domain and the objects its capabilities reference]
Joe's Login Domain holds these capabilities:
MySchema (for Joe's Schema): Mod*, RdC, WrC, ClrC, MLoad*, MStor*
MyInbox (for Joe's Inbox): Mod*, RdC, WrC, ClrC, MLoad*, MStor*
MyExports (for Joe's Exports): Mod*, RdC, WrC, ClrC, MLoad*, MStor*
AllInboxes (for the Global Inbox Dictionary): Mod*, RdC, MLoad*, MStor*
AllExports (for the Global Exports Dictionary): Mod*, RdC, MLoad*, MStor*
The Global Inbox Dictionary holds a capability for each user's inbox (Joe, Sue, …): Mod*, WrC, MLoad*, MStor*
The Global Exports Dictionary holds a capability for each user's exports (Joe, Sue, …): Mod*, RdC, MLoad*, MStor*
Could you design a capability-based access control model for an RDB where the objects are tables, views, operations, etc.?
The Capability Revocation Problem
How can a capability be made available to another user, so that its use can later be revoked?
Hint: Consider using Meta Privileges?
Revocation with Meta Privileges
Store the capability in a separate object (and make a capability for that object available to other users).
Place the capability in the separate object without MLoad* privileges.
– This allows the capability to be used (to reference data in the object it references), but prevents the capability from being moved or copied.
– The original holder of the capability can, at any time, clear (i.e. delete) the capability in the separate object, which means the other user can no longer use it.
– This is a bit restrictive, since it means that the other user cannot load the capability into a local variable, or store it (e.g. as the result of a query) in a transient object.
Place the capability in the separate object without MStor* privileges.
– This prevents the other user from storing it in a persistent object. After the capability is cleared, other connected users may still access copies of the capability in local variables or transient objects, but only as long as they remain connected.
Aliases & Immediate Revocation
A user who has a capability for an object with the MAlias privilege can interpose an alias between the capability and the object. Aliases are invisible to users through capabilities which do not have an MAlias privilege, and always when performing ordinary reads and writes.
A user who has a capability for an aliased object with the MAlias privilege can
– Block all access through the alias (i.e. revoke access)
– Reallow access to the aliased object
– (Perhaps) Change the alias to refer to a different object
– Interpose an additional (chained) alias between the capability and the alias
[Figure: a capability (Mod*, RdC, MLoad*, MStor*) references an Alias, which references an object]
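An alias is essentially a controllable indirection. The sketch below models one in plain Java; it is a simplified illustration rather than a definitive design. AliasSketch, Alias, deref, block, unblock and retarget are hypothetical names, and the MAlias check that would guard the control operations is only noted in comments.

```java
class AliasSketch {
    // Hypothetical alias object interposed between capabilities and the real object.
    static final class Alias<T> {
        private T target;
        private boolean blocked = false;

        Alias(T target) { this.target = target; }

        // Ordinary reads and writes pass through the alias transparently,
        // unless access has been blocked.
        T deref() {
            if (blocked) {
                throw new SecurityException("access through this alias has been revoked");
            }
            return target;
        }

        // The operations below would require a capability with MAlias.
        void block() { blocked = true; }                      // immediate revocation
        void unblock() { blocked = false; }                   // reallow access
        void retarget(T newTarget) { target = newTarget; }    // refer to a different object
    }

    public static void main(String[] args) {
        Alias<StringBuilder> alias = new Alias<>(new StringBuilder("payroll data"));
        System.out.println(alias.deref());
        alias.block();
        try {
            alias.deref();
        } catch (SecurityException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```

Unlike clearing a stored capability, blocking the alias takes effect at once for every copy of the capability, including copies held in local variables.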
Aliases & Fine-Grained Access Control
More general alias models can be used to provide fine-grained access control.
Fine-grained aliases can have functions associated with them, which are executed when the alias is traversed, and might either (depending on the system)
– Return true or false, indicating whether access to the aliased object is allowed or not
– Return a capability for an object, which is used in place of the original capability.
Domain Switching
When a definer-rights operation is called in an RDB, it switches its security domain to the domain of the user who defined the operation.
In a capability-based system, a definer-rights operation is created with an associated dictionary.
– The operation's dictionary contains capabilities for objects (some private) potentially needed by the operation when it executes
When a definer-rights operation is executed, a new security domain (a transient dictionary) is created and used while the operation executes (the previous domain is used again when the operation returns). It contains
– the capabilities from the operation's dictionary, plus
– capabilities explicitly passed as parameters (the operation specification indicates the classes and the required privileges of each parameter)
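Domain switching can be sketched as a stack of dictionaries. In the rough model below, capabilities are plain Objects, and DomainSwitchSketch, current and call are hypothetical names; checking parameter classes and required privileges against the operation specification is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class DomainSwitchSketch {
    // The top of the stack is the security domain currently in effect.
    private final Deque<Map<String, Object>> domains = new ArrayDeque<>();

    DomainSwitchSketch(Map<String, Object> loginDomain) {
        domains.push(new HashMap<>(loginDomain));
    }

    Map<String, Object> current() { return domains.peek(); }

    // Call a definer-rights operation. The new transient domain holds only the
    // operation's own dictionary plus the explicitly passed parameters; the
    // caller's domain is not visible inside, and is restored on return.
    <R> R call(Map<String, Object> opDictionary,
               Map<String, Object> params,
               Function<Map<String, Object>, R> body) {
        Map<String, Object> fresh = new HashMap<>(opDictionary);
        fresh.putAll(params);
        domains.push(fresh);
        try {
            return body.apply(fresh);
        } finally {
            domains.pop();
        }
    }

    public static void main(String[] args) {
        Map<String, Object> login = new HashMap<>();
        login.put("MySchema", "capability for the caller's schema");
        DomainSwitchSketch conn = new DomainSwitchSketch(login);

        Map<String, Object> opDict = new HashMap<>();
        opDict.put("auditLog", "capability private to the service");
        Map<String, Object> params = new HashMap<>();
        params.put("report", "capability passed by the caller");

        boolean seesSchema = conn.call(opDict, params, d -> d.containsKey("MySchema"));
        System.out.println("operation sees caller's schema: " + seesSchema);
        System.out.println("caller sees auditLog: " + conn.current().containsKey("auditLog"));
    }
}
```

Neither side sees the other's dictionary: the operation gets only what is passed to it, and the caller never gains the operation's private capabilities.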
Mutually Suspicious Subsystems
The mutually suspicious subsystems problem describes the following situation
• A service has private data, and does not want clients of the service to be able to access the data
• A client has access to a great deal of sensitive information. This client wants to use a service, but only wants the service to be able to access a small subset of the information it can access.
Describe how the capability-based domain switching model solves the mutually suspicious subsystems problem.
Explain how this approach might be adapted for a relational database
The Confinement Problem
The confinement problem:
A client wants to use a service, but wants to guarantee that the service cannot retain any of the client's data.
When an operation is executed through an Operation capability without the Mod* privilege, all capabilities copied from the operation's dictionary to the newly created security domain are copied without the Mod* privilege.
Explain why this solves the confinement problem
Class Factory Objects
Every object has a class
Dictionary objects have class Dictionary
Operation objects have class Operation
There are also objects that act as class factories. These have class Class.
Creating an object of a specific class requires a capability for its Class object with the Create privilege.
Creating an operation requires a capability for the Operation Class object with the Create privilege.
(Although a capability system is likely to have built-in operations which allow creation of basic types of objects such as operations and dictionaries)
Creating a new Class object requires a capability for the Class Class object with the Create privilege.
Recasting Capabilities
A user with a capability for a Class object with the Super privilege can cast a capability to its superclass.
The built-in operation Supercast( classcapa, capa, reqprivs, newprivs ) takes
– classcapa, a capability for a class object (e.g. the Stack Class object) with Super privileges.
– capa, a capability designating that class (e.g. a Stack)
– reqprivs, the class-specific privileges the capability must have (e.g. Push)
– newprivs, privileges of the returned capability
Supercast returns a capability for the same object designated with its superclass (e.g. List), with
– Meta-privileges and the Mod* privilege taken from the original capability, and
– the remaining privileges taken from newprivs.
Subcast works in a similar way.
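The Supercast rules can be sketched as a single checked function. This is a simplified model under stated assumptions: SupercastSketch, ClassObject and the Priv enum are hypothetical, classes are identified by name, and only three of the carried privileges (Mod*, MLoad*, MStor*) are modeled.

```java
import java.util.EnumSet;
import java.util.Set;

class SupercastSketch {
    enum Priv { RdD, WrC, PUSH, SUPER, MOD_STAR, MLOAD_STAR, MSTOR_STAR }

    // Privileges always taken from the original capability, never from newPrivs.
    private static final Set<Priv> CARRIED =
        EnumSet.of(Priv.MOD_STAR, Priv.MLOAD_STAR, Priv.MSTOR_STAR);

    static final class ClassObject {
        final String name;
        final String superName;
        ClassObject(String name, String superName) {
            this.name = name;
            this.superName = superName;
        }
    }

    static final class Capability {
        final Object target;
        final String designated;   // name of the class the capability designates
        final Set<Priv> privs;
        Capability(Object target, String designated, Set<Priv> privs) {
            this.target = target;
            this.designated = designated;
            EnumSet<Priv> copy = EnumSet.noneOf(Priv.class);
            copy.addAll(privs);
            this.privs = copy;
        }
    }

    static Capability supercast(Capability classCapa, Capability capa,
                                Set<Priv> reqPrivs, Set<Priv> newPrivs) {
        if (!(classCapa.target instanceof ClassObject) || !classCapa.privs.contains(Priv.SUPER)) {
            throw new SecurityException("need a Class capability with the Super privilege");
        }
        ClassObject cls = (ClassObject) classCapa.target;
        if (!cls.name.equals(capa.designated)) {
            throw new SecurityException("capability does not designate class " + cls.name);
        }
        if (!capa.privs.containsAll(reqPrivs)) {
            throw new SecurityException("capability lacks a required privilege");
        }
        EnumSet<Priv> result = EnumSet.noneOf(Priv.class);
        for (Priv p : CARRIED) {            // Mod* and meta privileges: from the original
            if (capa.privs.contains(p)) result.add(p);
        }
        for (Priv p : newPrivs) {           // everything else: from newPrivs
            if (!CARRIED.contains(p)) result.add(p);
        }
        return new Capability(capa.target, cls.superName, result);
    }

    public static void main(String[] args) {
        Capability stackClass = new Capability(new ClassObject("Stack", "List"),
                                               "Class", EnumSet.of(Priv.SUPER));
        Capability stack = new Capability(new Object(), "Stack",
                                          EnumSet.of(Priv.PUSH, Priv.MOD_STAR));
        Capability asList = supercast(stackClass, stack,
                                      EnumSet.of(Priv.PUSH), EnumSet.of(Priv.RdD, Priv.WrC));
        System.out.println(asList.designated + " " + asList.privs);
    }
}
```

Taking Mod* and the meta privileges from the original capability means a class method cannot use Supercast to escape restrictions placed on the capability by its caller.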
Protected Class-Based Encapsulation
A Class object also describes the class-specific privileges, and has a dictionary containing its operations (i.e. class methods)
This supports protected class-based encapsulation in combination with class-specific privileges, domain switching & recasting.
For example, consider a Stack class, implemented as a subclass of List
– Stacks have privileges Push, Pop, Length, and Nth.
– Stacks also have operations Push, Pop, Length, and Nth (it is common, but not required, for the privileges to match the class methods).
– The Push operation expects as its first parameter, a capability for a stack with Push privileges.
– It uses Supercast to obtain a capability for the stack denoted as a List, with the privileges needed to implement the Push (RdD and WrC), and then appends its second parameter onto the stack/list.
These mechanisms eliminate the need for class-based subsystems to maintain extents
Templates for Recasting
Some capability-based systems support two-step recasting.
– The operation MkSupercaster( classcapa, reqprivs, newprivs ) returns a template, a pseudo-capability actually used to do the supercast
– Supercast( template, capa ) returns the supercasted capability based on the template
This allows each class method's dictionary to include only the template(s) it needs, limiting the damage it can do in case of error.
Principle of Least Authority
Capability-based systems support the Principle of Least Authority:
Each subject is authorized to perform all and only the actions necessary for its work.
In particular, operations (including class methods which use templates for supercasting) have access only to the privileges they need.