Date post: | 02-Jul-2015 |
Upload: | tommaso-teofili |
Flexible search in Apache Jackrabbit Oak
Tommaso Teofili
Apache Jackrabbit Oak
• Scalable content repository
• JCR 2.0
• Designed for concurrent access (MVCC)
• Pluggable components (storage, indexes)
• Powering AEM 6.0
18/11/14 2
Oak Architecture
• Oak-JCR
• Oak-Core
  – MVCC (node states and immutable trees)
  – Core components (Security, Query engine, …)
  – Plugins
• Oak-MK
  – Pluggable storage
Oak – the Query Engine
• Query languages
  – XPath
  – SQL-2
• Selects the index(es) expected to perform best
  – Search is delegated to the underlying indexes
  – No index? The repository is traversed
• ACLs are applied afterwards
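The flow on this slide can be pictured as a cost comparison followed by a read-access filter. The sketch below is only an illustration under stated assumptions, not Oak's implementation: `CandidateIndex`, `estimateCost`, `select` and `filterReadable` are hypothetical names, and the costs are made-up numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the query engine's flow; none of these types are Oak's API.
class IndexSelectionSketch {

    /** An index that can estimate how expensive answering a query would be. */
    interface CandidateIndex {
        String name();
        double estimateCost(String query);
    }

    /** Demo helper: an index that reports the same cost for every query. */
    static CandidateIndex fixedCost(final String name, final double cost) {
        return new CandidateIndex() {
            public String name() { return name; }
            public double estimateCost(String query) { return cost; }
        };
    }

    /** Returns the name of the cheapest index, or "traverse" if none beats traversal. */
    static String select(List<CandidateIndex> indexes, String query, double traversalCost) {
        String best = "traverse";
        double bestCost = traversalCost;
        for (CandidateIndex index : indexes) {
            double cost = index.estimateCost(query);
            if (cost < bestCost) {
                best = index.name();
                bestCost = cost;
            }
        }
        return best;
    }

    /** Access control is applied to the index results, after the index has answered. */
    static List<String> filterReadable(List<String> paths, Predicate<String> canRead) {
        List<String> readable = new ArrayList<>();
        for (String path : paths) {
            if (canRead.test(path)) {
                readable.add(path);
            }
        }
        return readable;
    }

    public static void main(String[] args) {
        List<CandidateIndex> indexes = List.of(fixedCost("property", 10), fixedCost("lucene", 5));
        System.out.println(select(indexes, "select * from [nt:folder]", 1000));
        System.out.println(filterReadable(List.of("/a", "/secret/b"), p -> !p.startsWith("/secret")));
    }
}
```

Filtering after the index lookup keeps the indexes unaware of permissions, at the price of fetching rows the session may not be allowed to see.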
Indexing – the IndexEditor API
• NodeState before = builder.getNodeState();
• builder.child("a").setProperty("foo", "bar");
• NodeState after = builder.getNodeState();
• NodeState indexed = editorHook.processCommit(before, after, …); // who said MVCC?
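To make the before/after idea concrete, here is a minimal, self-contained sketch of an editor that diffs two immutable snapshots and derives a new index state without touching the old one. It does not use Oak's classes: snapshots are modeled as plain `path -> value` maps for a single property, and `processCommit` here is a hypothetical stand-in that only borrows the name from the slide.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for an index editor; snapshots are path -> property value maps.
class IndexEditorSketch {

    /** Returns a new value -> paths index reflecting the diff between two snapshots. */
    static Map<String, Set<String>> processCommit(Map<String, String> before,
                                                  Map<String, String> after,
                                                  Map<String, Set<String>> index) {
        // Deep-copy the index so the previous state stays readable (MVCC style).
        Map<String, Set<String>> updated = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : index.entrySet()) {
            updated.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        // Removed or changed values: drop the stale entry.
        for (Map.Entry<String, String> e : before.entrySet()) {
            String path = e.getKey();
            String oldValue = e.getValue();
            if (!oldValue.equals(after.get(path))) {
                Set<String> paths = updated.get(oldValue);
                if (paths != null) {
                    paths.remove(path);
                    if (paths.isEmpty()) {
                        updated.remove(oldValue);
                    }
                }
            }
        }
        // Added or changed values: add the fresh entry.
        for (Map.Entry<String, String> e : after.entrySet()) {
            if (!e.getValue().equals(before.get(e.getKey()))) {
                updated.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> empty = new TreeMap<>();
        Map<String, Set<String>> v1 = processCommit(Map.of(), Map.of("/a", "bar"), empty);
        Map<String, Set<String>> v2 = processCommit(Map.of("/a", "bar"), Map.of("/a", "baz"), v1);
        System.out.println(v1 + " -> " + v2);
    }
}
```

Because every commit yields a fresh index state, a query that started against the older state can keep reading it while newer commits are indexed.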
Searching – the QueryIndex API
• Filter filter = … ; // "select * from [nt:folder]"
• filter.restrictPath("/somenode", Filter.PathRestriction.DIRECT_CHILDREN);
• Cursor cursor = queryIndex.query(filter, nodeState); // search against a state
• IndexRow row = cursor.next(); // results
Searching – Filters
• Full text expressions
• Property restrictions
• Path restrictions
  – Exact
  – Parent
  – Child
  – Descendant
• Node type restrictions
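The four path restriction kinds can be sketched as simple string checks on absolute paths. This is an illustrative model only: the `Restriction` enum and `matches` method are hypothetical, named after the slide rather than after Oak's `Filter.PathRestriction`, and the reading of "Parent" as "the candidate is the parent of the restricted path" is an assumption.

```java
// Hypothetical matcher for the four path restriction kinds listed on the slide.
class PathRestrictionSketch {

    enum Restriction { EXACT, PARENT, CHILD, DESCENDANT }

    /** Returns the parent of an absolute path, or null for the root. */
    static String parentOf(String path) {
        if (path.equals("/")) {
            return null;
        }
        int lastSlash = path.lastIndexOf('/');
        return lastSlash == 0 ? "/" : path.substring(0, lastSlash);
    }

    /** Checks whether a candidate node path satisfies the restriction on restrictedPath. */
    static boolean matches(Restriction restriction, String restrictedPath, String candidate) {
        switch (restriction) {
            case EXACT:
                return candidate.equals(restrictedPath);
            case PARENT:
                // Assumption: the candidate must be the parent of the restricted path.
                return candidate.equals(parentOf(restrictedPath));
            case CHILD:
                // Direct children only.
                return restrictedPath.equals(parentOf(candidate));
            case DESCENDANT: {
                // Any depth below the restricted path, excluding the path itself.
                String prefix = restrictedPath.equals("/") ? "/" : restrictedPath + "/";
                return candidate.startsWith(prefix) && !candidate.equals(restrictedPath);
            }
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(Restriction.CHILD, "/somenode", "/somenode/a"));
        System.out.println(matches(Restriction.DESCENDANT, "/somenode", "/somenode/a/b"));
    }
}
```

Appending "/" before the prefix check keeps `/somenode` from matching as a descendant of `/some`.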
Configuring indexes
• Indexes are declared by adding “query index configuration” nodes in the repository
  – Type
  – Asynchronous
  – Reindex
  – Index-specific properties
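As a rough picture of what such a configuration node carries, the sketch below models a property index definition as a plain map instead of a real JCR node, so it runs without a repository. The property names (`jcr:primaryType`, `type`, `propertyNames`, `reindex`, `async`) follow my understanding of Oak's index definitions under `/oak:index` and should be checked against the Oak documentation; the `IndexDefinitionSketch` class and its method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, repository-free model of an index definition node's properties.
class IndexDefinitionSketch {

    /** Builds the properties a property index definition node would carry. */
    static Map<String, Object> propertyIndexDefinition(String propertyName, boolean async) {
        Map<String, Object> definition = new LinkedHashMap<>();
        definition.put("jcr:primaryType", "oak:QueryIndexDefinition");
        definition.put("type", "property");                      // which index implementation handles it
        definition.put("propertyNames", List.of(propertyName));  // index-specific property
        definition.put("reindex", true);                          // request a (re)build of the index
        if (async) {
            definition.put("async", "async");                     // index in the background
        }
        return definition;
    }

    public static void main(String[] args) {
        System.out.println(propertyIndexDefinition("foo", false));
    }
}
```

In a real repository these values would be set as properties on a node and saved with the session; the map only mirrors their shape.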
In repository indexes
• Data structures designed as content
  – Property index
  – Ordered property index
  – Node type index
  – Reference index
Lucene index
• Full text and (sorted) property restrictions
• Stored in repository
• Tika for indexing binaries
• Configurable indexing rules (boost), codec, analyzers
Lucene index
• Interesting facts
  – DocValues for sorted property restrictions
  – Uncompressed stored fields
  – Property exists queries
    • TermRange vs Wildcard vs Term vs MatchAll + FieldExistsFilter
Solr index
• Full text, property, path restrictions
• Embedded or remote Solr(Cloud)
• Configurable
  – Mapping restriction / fields
  – Page size
  – Commit policy
• Most is configured on the Solr side
Problems
• Hard to express complex queries
• Cannot leverage the underlying indexes' advanced capabilities
Native language support
• Leverage underlying index capabilities
  – Multiple query languages/parsers
• More accurate full text queries (and results)
  – … where native('lucene', 'name:(hello world) "hello world"^3')
• Advanced index capabilities (e.g. MLT)
  – … where native('solr', 'mlt?q=path:/content/sample1&mlt.fl=jcr:title')
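A small helper can assemble such statements and keep the quoting straight. This is a sketch under assumptions: the `nativeQuery` helper and the `select [jcr:path] from [...]` prefix are illustrative choices, since the slides elide the beginning of the statement. Only the `native('<index>', '<expression>')` constraint comes from the slide.

```java
// Hypothetical helper that builds a SQL-2 statement with a native() constraint.
class NativeQuerySketch {

    /** SQL-2 string literals escape a single quote by doubling it. */
    static String escape(String literal) {
        return literal.replace("'", "''");
    }

    /** Wraps an index-specific expression in a native() constraint for the given index type. */
    static String nativeQuery(String nodeType, String indexType, String nativeExpression) {
        return "select [jcr:path] from [" + nodeType + "] where native('"
                + escape(indexType) + "', '" + escape(nativeExpression) + "')";
    }

    public static void main(String[] args) {
        System.out.println(nativeQuery("nt:base", "lucene", "name:(hello world) \"hello world\"^3"));
        System.out.println(nativeQuery("nt:base", "solr", "mlt?q=path:/content/sample1&mlt.fl=jcr:title"));
    }
}
```

The expression is passed through as text, so its syntax is whatever the target index (Lucene or Solr) understands; only the single quotes need escaping at the SQL-2 level.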
Adding more indexes
• Create an IndexEditor
  – Turn diff into an “indexable”
• Create a QueryIndex
  – Turn a Filter into an index-specific query
• “Declare” the index
Looking forward
• Results aggregation features (e.g. facets)
• More configuration options (Lucene, Solr)
• Smarter index selection
• Covering indexes
Thanks