Date posted: 04-Dec-2014
Category: Technology
Uploaded by: lucenerevolution
5/14/12 http://dar.bibalex.org
Accessing Your Library Book Collections Using Solr
By: Engy Morsy, Software Project Manager, Bibliotheca Alexandrina
BA & Solr
http://bibalex.org
http://wamcp.bibalex.org
http://ssc.bibalex.org
http://dar.bibalex.org
Introductory Video
Agenda
• Brief introduction to DAR architecture
• Indexing the book collection
• Searching across metadata and content
• Faceting
• Searching book content
• Solr with personalization
• Future
• Q&A
About 1.5 Million books
Digital Assets Repository
Book site
• Approximately 260,000 books
• Nearly 220,000 books published online
• About 1.5 TB of content
• Average book size: 6 MB
• Daily indexing rate: about 150 books
What do we want…?
• Allow simple and advanced search across metadata and content in 5 languages
Simple Search
What do we want…?
• Allow simple and advanced search across metadata and content in 5 languages
• Faceting
What do we want…?
• Allow simple and advanced search across metadata and content in 5 languages
• Faceting
• Annotations
Text Underlining
Text Highlighting
Adding Sticky Notes
What do we want…?
• Allow simple and advanced search across metadata and content in 5 languages
• Faceting
• Annotations
• Personalization
Arranging Books in Bookshelves
Submitting Comments
Rating
Embedding
Sharing the book link in other social networks
What lies beneath!!
Book site indices
AR Index
EN Index
FR Index
IT Index
SP Index
Query
Indexing Book Collection
• Index per language
• A document in the content index corresponds to a page in a book
• Maintain a field to distinguish between metadata records and content records (e.g. SolrType)
• Use static fields for all content indexes (e.g. PageID, etc.)
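The scheme above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DAR implementation: only `SolrType` and `PageID` appear in the slides, while `id`, `BookID`, `Title`, `Text`, and the helper name `build_documents` are hypothetical.

```python
def build_documents(book_id, title, pages):
    """Return one metadata record plus one content record per page.

    Field names other than SolrType and PageID are assumptions.
    """
    docs = [{"id": book_id, "SolrType": "Meta", "BookID": book_id, "Title": title}]
    for page_number, text in enumerate(pages, start=1):
        docs.append({
            "id": f"{book_id}_{page_number}",
            "SolrType": "Content",   # distinguishes page records from metadata records
            "BookID": book_id,       # parent book of this page
            "PageID": page_number,
            "Text": text,
        })
    return docs


docs = build_documents("b1", "Mobile Technology", ["intro page", "cloud computing page"])
```

Each book produces one `Meta` record and as many `Content` records as it has pages, all destined for the index of the book's language.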
What is the problem with this solution?
Problem for content search
Example: Advanced Search for Title: Mobile Technology AND Content: “cloud computing”
SolrType Content
SolrType Meta
Proposed solution
Title: Mobile Technology
Content: “cloud computing”
.. index
.. index
Get intersection
Result IDs
Facet result
Final result
Parent Book IDs
.. index
The problem is…
• Can’t get the faceting result directly from the content index
• Need to query the metadata index in order to get the final facet result
Processing time!!!
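The extra round trip can be illustrated with a toy in-memory version of the flow. This is a sketch, not Solr code: the records, the `Category` facet, and the function name are made up, and simple substring matching stands in for real queries.

```python
from collections import Counter

# Toy stand-ins for metadata records and per-page content records (made-up data).
meta_records = [
    {"BookID": "b1", "Title": "Mobile Technology", "Category": "Computing"},
    {"BookID": "b2", "Title": "Mobile Technology Trends", "Category": "Business"},
    {"BookID": "b3", "Title": "Gardening", "Category": "Science"},
]
content_records = [
    {"BookID": "b1", "PageID": 4, "Text": "an overview of cloud computing"},
    {"BookID": "b3", "PageID": 9, "Text": "cloud computing for greenhouses"},
]


def search_with_facets(title_term, content_phrase):
    # Step 1: query the content records and collect the parent book IDs.
    content_hits = {r["BookID"] for r in content_records if content_phrase in r["Text"]}
    # Step 2: query the metadata records by title.
    title_hits = {r["BookID"] for r in meta_records if title_term in r["Title"]}
    # Step 3: intersect the two ID sets.
    result_ids = content_hits & title_hits
    # Step 4: go back to the metadata records to compute the facets.
    facets = Counter(r["Category"] for r in meta_records if r["BookID"] in result_ids)
    return result_ids, facets


result_ids, facets = search_with_facets("Mobile Technology", "cloud computing")
```

Step 4 is the costly part: the facet counts cannot be read off the content hits, so the metadata has to be consulted a second time.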
Solution…!
• Metadata denormalization
  – Denormalize metadata into content index
SolrType Content
SolrType Meta
Proposed solution
Title: Mobile Technology
Content: “cloud computing”
.. index
.. index
Get intersection
Result IDs
Facet result
Final result
Problem for content search
• Metadata denormalization…..
Worst choice!
• Re-indexing for changes in metadata
• Data processing is required.
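A small sketch of why denormalization forces re-indexing. It assumes hypothetical field names (`BookID`, `Title`, `Category`, `Text`) and a hypothetical helper `denormalize`; only `SolrType` and `PageID` come from the slides.

```python
def denormalize(meta, pages):
    """Copy the book's metadata fields into every page record."""
    return [
        {**meta, "SolrType": "Content", "PageID": n, "Text": text}
        for n, text in enumerate(pages, start=1)
    ]


meta = {"BookID": "b1", "Title": "Mobile Technology", "Category": "Computing"}
pages = ["page one", "page two", "page three"]
page_docs = denormalize(meta, pages)

# A single metadata change means every page record of the book
# has to be rebuilt and sent to the index again.
meta["Category"] = "Telecommunications"
docs_to_reindex = denormalize(meta, pages)
```

For a book with hundreds of pages, one corrected metadata field translates into hundreds of document updates.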
New Solution
Indexing Metadata
• Index per language
• Separate content and metadata index
• Text field holds the whole book content in the metadata index
  – The maxFieldLength has been set to the maximum
    • e.g. 2147483647
Back to the example
Example: Advanced Search for Title: Mobile Technology AND Content: “cloud computing”
Solution
Title: Mobile Technology
Content: “cloud computing”
Meta index
Facet result
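With the whole book text held in the metadata index, the example becomes a single faceted request. The sketch below only builds the query string. `q`, `facet`, `facet.field`, and `wt` are standard Solr request parameters, while the field names `Title`, `Text`, `Category`, and `Language` are assumptions about the schema.

```python
from urllib.parse import urlencode


def build_meta_query(title, content_phrase, facet_fields):
    """Build the query string for one faceted request against the metadata index."""
    params = [
        # Both conditions are evaluated in the same index (hypothetical field names).
        ("q", f'Title:"{title}" AND Text:"{content_phrase}"'),
        ("facet", "true"),
        ("wt", "json"),
    ]
    # One facet.field parameter per facet field.
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


query_string = build_meta_query("Mobile Technology", "cloud computing", ["Category", "Language"])
```

The resulting string could be appended to the select URL of the relevant language core, and the facet counts come back in the same response as the hits.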
Solution
Title: Mobile Technology
Content: “cloud computing”
Meta index
Content index
Get intersection
Meta index
Facet result
Separate indexes Vs. All in one
• Separate indexes
  + Indexing time
  + Index size
  - Processing results (facets..)
  - Scoring
• One index
  - Index size
  - Indexing time
  + Scoring
  + Processing time
Book content index
AR Index
EN Index
FR Index
IT Index
SP Index
Searching
• Simple and advanced search
  – Cache the resulting IDs only
• Highlighting search results
  – Get the full search result and highlight per page result
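Caching only the result IDs might look like the sketch below. The class, its method names, and the `run_search` callable are hypothetical. The slides do not say how the cache is implemented.

```python
class ResultIdCache:
    """Caches only the ordered result IDs for each query, not full documents."""

    def __init__(self):
        self._ids_by_query = {}

    def get_ids(self, query, run_search):
        # run_search is a hypothetical callable returning the matching book IDs.
        if query not in self._ids_by_query:
            self._ids_by_query[query] = list(run_search(query))
        return self._ids_by_query[query]

    def get_page(self, query, run_search, page_number, page_size=10):
        # Slice the cached ID list; full records are fetched only for this page.
        ids = self.get_ids(query, run_search)
        start = (page_number - 1) * page_size
        return ids[start:start + page_size]


calls = []


def fake_search(query):
    # Stand-in for a real search call; records how often it is invoked.
    calls.append(query)
    return ["b1", "b2", "b3"]


cache = ResultIdCache()
first_page = cache.get_page("cloud computing", fake_search, page_number=1, page_size=2)
second_page = cache.get_page("cloud computing", fake_search, page_number=2, page_size=2)
```

Paging through the results reuses the cached ID list, so the search itself runs once per distinct query.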
Book Content Search
• Search using
  – Search query
  – Book ID
  – List of pages’ IDs
• Highlights
• Annotations
  – Saved currently in DB
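A search restricted to one book and a list of pages could be expressed with Solr filter queries, as in this sketch. `q`, `fq`, `hl`, and `hl.fl` are standard Solr parameters. `PageID` appears in the slides, while `BookID`, `Text`, and the function name are assumptions.

```python
def build_book_content_params(query, book_id, page_ids):
    """Build request parameters for searching inside one book (hypothetical schema)."""
    page_clause = " OR ".join(str(page_id) for page_id in page_ids)
    return {
        "q": f"Text:({query})",
        # Filter queries narrow the search to the given book and pages.
        "fq": [f"BookID:{book_id}", f"PageID:({page_clause})"],
        # Ask Solr to return highlighted snippets from the page text.
        "hl": "true",
        "hl.fl": "Text",
    }


params = build_book_content_params("cloud computing", "b1", [4, 5, 6])
```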
Faceting
• Fixed facet fields
  – Category, sub-category, language, etc.
  – Stored, indexed, exact fields
• Process facets from different indices
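Combining facets from the per-language indices can be as simple as summing the counts per facet value. The sketch below assumes each index returns a mapping from facet value to count. The sample numbers are made up.

```python
from collections import Counter


def merge_facets(per_index_facets):
    """Sum facet counts returned by several indices into one result."""
    merged = Counter()
    for facet_counts in per_index_facets.values():
        merged.update(facet_counts)  # adds counts for values seen in more than one index
    return merged


merged = merge_facets({
    "AR": {"History": 12, "Science": 3},
    "EN": {"History": 5, "Art": 7},
})
```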
Personalization
• Using a separate index for personalization
  – Different Solr fields for different languages
  – Search across all fields
• Saving in both Solr and DB
• Indexing tags, rating and comments using a type field
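One way to picture the personalization records is a single document shape with a type field and a language-specific value field. All names below (`UserID`, `BookID`, `Type`, `Value_<language>`, and both helpers) are hypothetical. The slides only state that a type field and per-language fields are used.

```python
ALLOWED_TYPES = {"tag", "rating", "comment"}


def personalization_doc(user_id, book_id, entry_type, value, language):
    """Build one personalization record (hypothetical field names)."""
    if entry_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown entry type: {entry_type}")
    return {
        "UserID": user_id,
        "BookID": book_id,
        "Type": entry_type,            # tag, rating, or comment
        f"Value_{language}": value,    # one value field per language
    }


def search_all_language_fields(docs, term):
    """Return records whose value matches the term in any language field."""
    return [
        doc for doc in docs
        if any(key.startswith("Value_") and term in str(val) for key, val in doc.items())
    ]


records = [
    personalization_doc("u1", "b1", "tag", "cloud", "en"),
    personalization_doc("u1", "b2", "comment", "très utile", "fr"),
]
matches = search_all_language_fields(records, "cloud")
```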
Future
• Book mobile application using Solr
• Using Hadoop
• Indexing other digital media (maps, audio, video)
Contact
engy.morsy@bibalex.org
Library website: http://bibalex.org
Digital Assets Repository: http://dar.bibalex.org
Thank you…