October 16, 2003, Art Vandenberg
Internet2 Fall Member Meeting
Promoting Semantic Interoperability of Metadata for Directories of the Future
Art Vandenberg, Georgia State University
Vijay K. Vaishnavi, Georgia State University
Chris Shaw, Georgia Institute of Technology
Abstract
A challenge in LDAP schema design and interoperability is better understanding of schema inter-relationships across organizations. Georgia State has received NSF funding to research an approach based on the proposition that monitoring, clustering, and visualization of cross-organizational metadata can help identify patterns of practice and lead to dynamic evolution of standards. A semantic facilitator tool is demonstrated that uses Self-Organizing Maps for clustering and viewing metadata, and implements an instance of the Stereoscopic Field Analyzer (SFA) to visualize directory objects in 3-dimensional, interactive space.
New Approach to Metadata
• Domain – directory metadata standards
• Team & Funding
• Research & experimentation
• Semantic Facilitator™℠ Prototype
– Schema repository
– Select schema & universal input vector, cluster, view
– Repeat with tailored input vector (reference set)
• LSA/LSI with localDomainPerson
• SFA (Stereoscopic Field Analyzer)
Problem Domain
• Inter-organizational directory metadata
– Standard objectClasses beneficial
– Working group approach (often lengthy) to defining standards
– No sooner adopted than “adapted and changed”
– No sooner finished than new requirement
• How to enhance/improve this time-consuming practice?
• Relevant NMI Integration Testbed Components
– eduPerson, eduOrg, commObject (ITU H.350), (courseID…)
– LDAP Recipe
– Metadirectory Practices for Enterprise Directories in Higher Ed
– LDAP Analyzer
Proposed Approach
• Hypothesis: monitoring, clustering, and appropriate visualization of cross-organizational metadata can help identify patterns of practice and lead to automatic evolution of standards
• Research literature, prototype, experimental validation
• Key insight: self-organizing of complex systems
Team & Funding
• Directory Services Team
– http://www.gsu.edu/~wwwacs/DSR/index.htm
– CIS faculty / IT middleware / 2 PhD, 5 Masters, 2 undergrad
– College of Computing faculty, Georgia Tech / (2 recent Masters)
• Initial discussions Fall 2000, formal meetings June 2001…
• Sun Microsystems, Academic Equipment Grant, Fall 2001
• Internet2 Middleware – working groups et al.
• NMI Integration Testbed Program participant
• NSF-ITR Award 0312636, Sep 2003-Aug 2006
– Promoting Semantic Interoperability of Metadata for Directories of the Future
Research & Experimentation
• Research on metadata approaches, clustering approaches
• Kohonen Self-Organizing Maps (SOM), neural-networks
• Latent Semantic Analysis/Latent Semantic Indexing (LSA/LSI)
• Genetic Algorithm SOM implementation (using Condor-NT)
Research & Experimentation
• Hypotheses:
– SOM parameters from other domains not best for LDAP metadata
– Can find SOM parameters giving results comparable to experts
– SOM parameters so good that new data from domain clusters well
• Experiment design
– LDAP experts cluster iPlanet objectClasses
– Run SOM algorithm with varied parameter values
– Compare SOM results to experts
• Conclusion: can cluster LDAP metadata as well as experts
• Genetic Algorithm can find SOM parameter solution
– evaluate on order of 10,000 SOM values
Semantic Facilitator™℠
• Initial Prototype WITS02 Conference, December 2002
• Current version
– Runs on IBM WebSphere (Apache/Tomcat), Java
– Oracle database repository for schemas
– User selects schema, sets input vector (reference set)
– User selects SOM parameter values
• map dimensions, neighborhood size, iterations
– ObjectClasses are mapped
• Prototype Demonstration
– select schema(s), cluster, map
– select schema(s), define reference set, cluster, map
BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo. Next slides are screen captures of a “walk through” demonstrating how prototype is used by user to:
• Select LDAP from repository;
• Accept default feature & cluster objectclasses;
• Submit;
• Accept default SOM parameter values;
• Choose rectangular display;
• Display;
• Show text;
• Uncover nearby person objects.
BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo. Next slides continue “walk through” demonstrating how prototype is used by user:
• By clearing feature objectclasses,
• using only inetOrgPerson, eduPerson, gsuPerson as reference
• and submitting with default SOM parameter values,
• person objects are drawn out from whole schema set.
BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo. Next slides continue “walk through” demonstrating how prototype is used by user:
• Continuing to refine reference set by
• adding person, organizationalPerson,
• further improving discovery of person objects.
Summary of preceding
• It is possible to cluster objectClasses from a directory schema in a way comparable to experts (based on experimental validation of computer vs. expert results).
• By specifying a “reference set” of objectClasses, it is possible to draw out particular objectClasses (in this case person related objects) from all the other objectClasses.
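The slides do not say how the prototype encodes objectClasses as SOM input vectors. As a minimal sketch, assume each objectClass becomes a 0/1 vector over the attributes that occur in the reference set. The toy schema, its abbreviated attribute lists, and the function name build_input_vectors are all hypothetical.

```python
# Hypothetical toy schema: objectClass name -> set of attribute names.
# Attribute lists are abbreviated for illustration, not full definitions.
schema = {
    "person": {"cn", "sn", "telephoneNumber"},
    "inetOrgPerson": {"cn", "sn", "mail", "givenName"},
    "uabPerson": {"cn", "sn", "mail", "uabId"},
    "groupOfNames": {"cn", "member"},
    "device": {"serialNumber", "owner"},
}


def build_input_vectors(schema, reference_set):
    """Encode every objectClass as a 0/1 vector over the reference set's attributes.

    Assumption: the input vector dimensions are the attributes used by the
    reference objectClasses; the prototype's real encoding is not documented here.
    """
    features = sorted(set().union(*(schema[name] for name in reference_set)))
    vectors = {
        name: [1 if attr in attrs else 0 for attr in features]
        for name, attrs in schema.items()
    }
    return features, vectors


features, vectors = build_input_vectors(schema, ["person", "inetOrgPerson"])
for name, vector in vectors.items():
    print(f"{name:15s} {vector}")
```

Under this encoding, an objectClass that shares many attributes with the reference set gets a vector close to the reference vectors, while an unrelated objectClass such as the toy "device" gets an all-zero vector, which is one plausible reason person-related objects separate from the rest.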
BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo. Next slides show “walk through” where user:
• Selects UAB schema;
• Directly specifies a “reference set” of person objects;
• Displays result;
• Finds clustering of additional uabPerson objects.
SF / What if we used “person” reference set? (person, organizationalPerson, inetOrgPerson, residentialPerson, newPilotPerson, eduPerson)
SF / “unstacking the objects” finds “uab-” objects: uabPerson, uabAlum, uabEmployee, uabStudent as well as pabPerson, uabEntity...
Summary of preceding
• Using a “reference set” of common person objectClasses (person, organizationalPerson, inetOrgPerson, residentialPerson, newPilotPerson, eduPerson), it is possible to draw out new, unknown person objectClasses (uabPerson, uabAlum, uabEmployee, uabStudent as well as pabPerson, uabEntity...).
BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo. Next slides show “walk through” where user:
• Selects IBM vendor delivered schema.
• Default options reveal no obvious person objects.
• User picks ePerson as start of reference set.
• By iteratively adding newly revealed person objectclasses,
• User finds successive person related objectclasses.
Summary of preceding
• Rather than starting with a known reference set, one can build up a reference set incrementally, starting with a single objectClass of likely relevance and adding newly discovered objectClasses to refine the results.
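The incremental procedure can be sketched as a loop. In the demo the user inspects the SOM display and picks new objectClasses by hand; the sketch below swaps in a simple shared-attribute count as a stand-in for that judgment. The schema contents, the names staffPerson and payrollPerson, the attribute lists, and the min_shared threshold are hypothetical.

```python
# Hypothetical vendor schema: objectClass name -> set of attribute names.
# All attribute lists are invented for illustration.
schema = {
    "ePerson": {"cn", "sn", "mail"},
    "staffPerson": {"cn", "sn", "employeeNumber"},
    "payrollPerson": {"employeeNumber", "mail", "title"},
    "device": {"serialNumber", "cn"},
}


def grow_reference_set(schema, seed, min_shared=2):
    """Grow a reference set from one seed objectClass.

    Each round adds every objectClass that shares at least `min_shared`
    attributes with the attributes already covered by the reference set.
    This overlap test is a stand-in for the user's visual selection.
    """
    reference = {seed}
    while True:
        covered = set().union(*(schema[name] for name in reference))
        found = {
            name
            for name, attrs in schema.items()
            if name not in reference and len(attrs & covered) >= min_shared
        }
        if not found:
            return reference
        reference |= found


print(sorted(grow_reference_set(schema, "ePerson")))
```

In this toy data, payrollPerson is only reached in the second round, after staffPerson has added employeeNumber to the covered attributes, which mirrors how each newly revealed objectClass helps uncover the next.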
BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo. Next slides show multiple schema clustering
• First:
– Cluster CMU and UMich schemas
– show clustering of cmuPerson, umichPerson, eduPerson.
• Then:
– Cluster Novell, OpenLDAP, IBM, and iPlanet schemas
– show clustering of related person objectClasses: 3 eduPerson, gsu/ufl/um/admin/liPerson.
SF / unstack & Show Node Text – objects exploded out from middle right of screen (3 eduPerson, gsu/ufl/um/admin/liPerson)
Summary of preceding
• Multiple schemas, even from different vendor LDAPs, can be clustered.
Following slides...
• Simulate¹ the time steps in Self Organizing Map solution
• University of Michigan OpenLDAP schema objects
• Time steps of 1000 iterations for SOM parameters:
– X_dimension = 7 and Y_dimension = 8
– Neighborhood_size = 2
– Iterations = 10,000
• Illustrates clustering state progression (with person objects tagged)
– Our experiment indicated that 10,000 iterations was best
– This sequence simulates iterations up to 20,000
– Shows “good fit” for 10,000 based on clustering of person objects
• ¹ NB: this state function not yet implemented by prototype
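For readers unfamiliar with how these parameters drive a SOM, here is a minimal sketch of a rectangular Kohonen map using the values listed above (7 x 8 map, neighborhood size 2, 10,000 iterations). It is not the prototype's implementation: the linear decay of the learning rate and radius, the initial rate of 0.5, the hard-edged neighborhood, and the toy input vectors are all assumptions.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid node whose weight vector is closest to `vector`."""
    return min(weights, key=lambda node: math.dist(weights[node], vector))


def train_som(vectors, x_dim=7, y_dim=8, neighborhood_size=2,
              iterations=10_000, seed=0):
    """Train a rectangular SOM and return {(x, y): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(x_dim) for y in range(y_dim)}
    for step in range(iterations):
        frac = 1.0 - step / iterations        # assumed linear decay from 1 toward 0
        rate = 0.5 * frac                     # assumed learning-rate schedule
        radius = neighborhood_size * frac     # neighborhood shrinks over time
        sample = rng.choice(vectors)
        bmu = best_matching_unit(weights, sample)
        for node, w in weights.items():
            if math.dist(node, bmu) <= radius:    # node lies inside the neighborhood
                for i, value in enumerate(sample):
                    w[i] += rate * (value - w[i])  # pull weights toward the sample
    return weights


# Hypothetical 0/1 attribute vectors for a handful of objectClasses.
objects = {
    "person":        [1, 1, 0, 0, 0],
    "inetOrgPerson": [1, 1, 1, 0, 0],
    "umichPerson":   [1, 1, 1, 0, 0],
    "groupOfNames":  [1, 0, 0, 1, 0],
    "device":        [0, 0, 0, 0, 1],
}

weights = train_som(list(objects.values()))
placement = {name: best_matching_unit(weights, vec) for name, vec in objects.items()}
for name, node in placement.items():
    print(f"{name:15s} -> map node {node}")
```

Recording `placement` every 1000 steps inside the training loop would give the kind of intermediate "state" sequence the following slides simulate.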
Summary of preceding
• Providing a “state” function that displays intermediate states of clustering may help in selecting SOM parameter values. The user may get a better sense of a “good” clustering result by visually following the convergence rate.
LSA/LSI analysis
• “Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text.” ref: Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284. (http://lsa.colorado.edu/)
• Latent Semantic Analysis/Indexing is another technique for analyzing information content.
• Typically used for document searching where one wants to rank order relevance of documents based on their inclusion of a set of terms
• “Latency” in the sense that, while not having all terms being queried, a document may still be ranked high because other terms usually do occur in conjunction with the missing term(s).
LSA/LSI analysis: localDomainPerson
• localDomainPerson – analyzing the variations
• 21 schemas used in LSA/LSI test set
– 13 localDomainPerson
– 2 eduPerson (structural, auxiliary)
– liPerson, iGNPerson (Secureway)
– Top, person, organizationalPerson, inetOrgPerson
• Challenges on vendor/institution schema:
– Explicit statement of inherited attributes vs. implicit
– Multiple inclusion of attributes in one objectClass!
– No, or Non-standard, OIDs (cf. eduPerson-oid, uwPerson-oid)
– Variations on objectClasses specification format
LSA/LSI Analysis
• Latent Semantic Analysis/Indexing
– Jorge Civera Saiz, Georgia Tech
– Taruna Hariani, Georgia State
• Basic idea
– Document X Term matrix created (cf. objectClass X attribute)
– singular value decomposition (SVD)
• X = T * S * D’
• t x d = t x k * k x k * k x d
• k corresponds to “noise factor” - goal is to optimize
– Construct query on SVD
• In other words:
– Find relevant documents containing terms
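The decomposition and query step can be illustrated with a small NumPy sketch. It follows the common LSI formulation: truncate the SVD to k factors, fold the query into the k-dimensional space as q' * T_k * S_k^-1, and rank documents by cosine similarity. The slides do not state which similarity measure the team used, so that part is an assumption, and the 5 x 4 matrix and its attribute assignments are hypothetical toy data.

```python
import numpy as np

# Hypothetical term-by-document matrix X (t x d): rows are attributes (terms),
# columns are objectClasses (documents); 1 means the class lists the attribute.
attributes = ["cn", "sn", "mail", "eduPersonAffiliation", "member"]
object_classes = ["person", "inetOrgPerson", "eduPerson", "groupOfNames"]
X = np.array([
    [1, 1, 1, 1],   # cn
    [1, 1, 1, 0],   # sn
    [0, 1, 1, 0],   # mail
    [0, 0, 1, 0],   # eduPersonAffiliation
    [0, 0, 0, 1],   # member
], dtype=float)


def lsi_similarity(X, query, k):
    """Cosine similarity between a term-space query and each document,
    measured in the rank-k LSI space."""
    T, s, Dt = np.linalg.svd(X, full_matrices=False)   # X = T @ diag(s) @ Dt
    T_k, s_k, D_k = T[:, :k], s[:k], Dt[:k, :].T        # keep the k largest factors
    q_hat = (query @ T_k) / s_k                         # fold in: q' T_k S_k^-1
    norms = np.linalg.norm(D_k, axis=1) * np.linalg.norm(q_hat)
    return (D_k @ q_hat) / norms


# Query with the attributes of the (toy) eduPerson column.
query = X[:, object_classes.index("eduPerson")]
scores = lsi_similarity(X, query, k=2)
ranking = sorted(zip(object_classes, scores), key=lambda pair: -pair[1])
for name, score in ranking:
    print(f"{name:15s} {score:+.3f}")
```

Re-running `lsi_similarity` for each k from 1 up to the number of documents gives one ranking per k, which is the kind of sweep used when looking for a suitable k.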
Following slides…
• Results of SVD of objectClass by attributes matrix of 21 person schemas
• The query was based on structural eduPerson
• Results of K=1 to K=21 are graphed
LSA/LSI – finding “k”
• K can reduce dimensionality... noise reduction
• What’s best “k”?
– Usually look to mid-range
– Too high, includes noise
– Too low, trivial
• Query vector composed of terms (attributes)
– Returns ranking of documents (objectClasses)
– Ranking based on containment of terms (attributes)
– Document may contain many other terms…
– Issue of latency & similarity
• DRAFT results
• Values of k=10
• eduPerson (structural) query vector
• Attribute similarity is an issue (oids, names…)
objectClass            rank        abs val dif rank   matching attributes   total attributes
ucdperson              -0.354474   0.349               0                     9
gsuperson              -0.090412   0.085               0                     8
organizationalperson   -0.065578   0.060              25                    28
edupersonaux           -0.010739   0.005               7                    10
isuperson              -0.007394   0.002              52                    70
ustperson              -0.006752   0.001              52                    64
eduperson              -0.005851   0.000                                    61
guperson               -0.004702   0.001              52                    79
inetorgperson           0.002533   0.008              47                    53
ugaperson               0.005138   0.011              47                    54
uwperson                0.005221   0.011              52                    61
uabperson               0.007272   0.013              58                    71
utsieduperson           0.009338   0.015              58                    61
utmeduperson            0.009338   0.015              58                    61
tneduperson             0.015221   0.021              60                    73
person                  0.156449   0.162               8                    11
top                     0.191709   0.198               2                     4
liperson                0.397216   0.403              26                    38
ignperson               0.441857   0.448               6                    17
umichperson             0.481966   0.488              47                    91
ubperson                0.637099   0.643              34                    63
Summary of preceding
• LSA/LSI may provide another mode of analyzing relationship of objectClasses based on their attributes
SFA: Stereoscopic Field Analyzer
• SFA: visualize high-dimensional spaces
– Chris Shaw, College of Computing, Georgia Tech
– SFA Windows 2000 version
– Analyzing complex data in greater than 3D space
– Using color, size, glyphs, vectors for additional dimensions
• General approach:
– Tokenize schema data (use SOM prep, or LSA results) for set file
– Set file “length” is number of vectors – objectclasses
– Set file “Dimension” is vector length – attributes
– Convert to binary
– In SFA space x,y,z axes, color, glyph, etc. correspond to attributes
– Plotted objects are the objectClasses
BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo. Next slides show “walk through” of SFA operation
• Initial interface
• Open a data file (schema)
• Select glyph type
• Scale glyph size
• Inspect mappings (attributes matched to dimensions)
• Rotate, move 3D display volume
Summary of preceding
• SFA provides a 3D volume in which objectClasses can be mapped
• Additional dimensions provided by color, glyphs, x-size…
• Manipulation of attribute mappings to various dimensions can highlight objectClasses containing those attributes
BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo. Next slides demonstrate multidimensionality of SFA
• Given a set of 3 attributes (cn, fullname, emailAddress) mapped to x, y, z dimensions,
• Using additional “dimensions” (color, opacity, xsize) can provide additional (reinforcing) information
91attr 86obj EDIR with sim / cn, fullname, emailAddress, + color emailAddress + opacity cn + xsize fullname
Summary of preceding
• Using “extra” dimensions (color, opacity, x-size…) can help visualize information and relationship of objects
BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo. Next slides show more complex visualization
• 3 initial attribute dimensions (cn, fullname, emailAddress) set;
• Adding 4th dimension (groupMembership) refines object set.
• Opening a second schema file provides further opportunity to refine & compare objects.
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership / open 2nd data set (497 attr, 86obj EDIR) …
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership / open 2nd data set… select different glyph type
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership / open 2nd data set, select different glyph type… display together
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership / open 2nd data set… edit mappings 2nd data set
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership / open 2nd data set… select sn, fullname, displayName, givenName, groupid
91attr 86obj EDIR with sim / cn, fullname, emailAddress + groupMembership / open 2nd data set… compare… (iterate)
Summary of preceding
• Additional dimensions can be represented by mapping attributes beyond the x, y, z axes...
• Such as using color as 4th dimension for data set 1.
• Opening of additional data set 2 with 5 dimensions (using color and opacity).
• Comparing data between data sets may provide insight.
BREAK PAGE [Live demo of prototype tool]
NOTE: Internet2 Presentation was live demo. Next slides show various additional functions of SFA.
Overall Summary
• Challenges of cross-organizational LDAP schema
• New approach to metadata:
– monitoring, clustering, and visualization
– identify patterns of practice
– dynamic evolution of standards
• Semantic Facilitator™℠ tool
– Schema repository
– Self-Organizing Map technology
• Latent Semantic Analysis/Latent Semantic Indexing
• Stereoscopic Field Analyzer (SFA) 3D visualization
Concepts & Challenges
• Validating clustering (without recourse to “humans”…)
• Interface design and usability
• Reference sets (automated; library of; cf. my_refs…)
• Monitoring
• SOM - additional interfaces and parameters
• Genetic Algorithm: extend J. Liang Thesis work
• DirNet a la WordNet® (an online lexical reference system)
• “DNA” (Directory Node Analysis) signatures
• Generalize as knowledge engine for virtual organizations
Near Future Work
• Deploy prototype as component based architecture
• Extend schema repository
• Build, validate reference sets
• LSA/LSI and SFA as “drill down” analysis components
[Architecture diagram. Labels: SF Tables, ERD; Browser (users); Http, JSP, Servlet; Semantic Facilitator; DB; Shibboleth Client; AuthN/Z; Web Services]
Q&A
Contact: Art Vandenberg
Vijay Vaishnavi [email protected]
Chris Shaw [email protected]
Directory Services Team
http://www.gsu.edu/~wwwacs/DSR/index.htm