Making Your Domain Objects Searchable with Hibernate Search
Gustavo Fernandes
Sunday, 23 May 2010
Apache Lucene EuroCon 20 May 2010
Agenda
Motivations and Goals
Indexing
Retrieval
Scalability
Hibernate in a nutshell
    @Entity
    public class Author {
        @Id @GeneratedValue
        private Integer id;
        private String name;
        @OneToMany
        private Set<Book> books;
    }

    @Entity
    public class Book {
        @Id @GeneratedValue
        private Integer id;
        private String title;
    }
    Author author = new Author("Stephen King");
    Book aBook = new Book("Blaze");
    HashSet<Book> books = new HashSet<Book>();
    books.add(aBook);
    author.setBooks(books);
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    // No cascade is configured on books, so the book is saved explicitly
    session.save(aBook);
    session.save(author);
    tx.commit();
    Select * from Author;
    +----+--------------+
    | id | name         |
    +----+--------------+
    |  1 | Stephen King |
    +----+--------------+

    Select * from Book;
    +----+-------+
    | id | title |
    +----+-------+
    |  1 | Blaze |
    +----+-------+

    Select * from Book_Author;
    +---------+------------+
    | Book_id | authors_id |
    +---------+------------+
    |       1 |          1 |
    +---------+------------+
Meet Hibernate Search
Hibernate extension which uses Lucene internally
Bring full-text search capabilities to Hibernate
Object-Document mapping
Take care of the plumbing
Keep database and index in sync
Convention over configuration
Flexible
Current version: 3.2.0.Final (May 2010)
LGPL License
Lucene version supported: 2.9.2
Solr version supported: 1.4
Dependencies:
    <dependency>
        <groupId>org.hibernate</groupId>
        <artifactId>hibernate-search</artifactId>
        <version>3.2.0.Final</version>
    </dependency>
Indexing
Mapping Objects <-> Documents
Support for types
Analyzers/Boost
Transparent/Manual Indexing
Indexing - Mapping Entities
    @Indexed
    @Entity
    public class Author {
        @Id @GeneratedValue
        private Integer id;
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Entities
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue
        private Integer id;
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Entities
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Fields
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Field
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Fields
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Field(name = "name_field", store = Store.YES, index = Index.TOKENIZED)
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Fields
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Fields({
            @Field(index = Index.TOKENIZED),
            @Field(name = "nameForSort", index = Index.UN_TOKENIZED)
        })
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Relationships
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Field(index = Index.TOKENIZED)
        private String name;
        @OneToMany @IndexEmbedded
        private Set<Book> books;
    }
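The idea behind the mapping annotations can be sketched in plain Java: annotated fields of an object are copied into a flat document of named values. This is only an illustration of object-document mapping, not how Hibernate Search is implemented. The `SketchField` annotation, the `SketchAuthor` class and the `DocumentMapper` helper are all hypothetical names introduced for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a field-level mapping annotation (not the library's @Field).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchField {
    String name() default "";
}

// Hypothetical entity used only for this sketch.
class SketchAuthor {
    Integer id = 1;                                   // not annotated, so it is skipped
    @SketchField String name = "Stephen King";
    @SketchField(name = "nameForSort") String sortName = "king, stephen";
}

class DocumentMapper {
    // Copies every annotated field into a "document": field name -> string value.
    static Map<String, String> toDocument(Object entity) {
        Map<String, String> document = new LinkedHashMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            SketchField mapping = field.getAnnotation(SketchField.class);
            if (mapping == null) {
                continue;
            }
            field.setAccessible(true);
            // An explicit name on the annotation overrides the Java field name.
            String key = mapping.name().isEmpty() ? field.getName() : mapping.name();
            try {
                document.put(key, String.valueOf(field.get(entity)));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return document;
    }

    public static void main(String[] args) {
        System.out.println(toDocument(new SketchAuthor()));
    }
}
```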
Indexing - Types
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Field(index = Index.TOKENIZED)
        private String name;
        @OneToMany @IndexEmbedded
        private Set<Book> books;
        @Field(bridge = @FieldBridge(impl = AddressBridge.class))
        private Address address;
    }
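The slides do not show the body of `AddressBridge`. As a rough sketch of what such a bridge does, the example below turns a custom type into a single indexable string. The `SketchStringBridge` interface is a local stand-in modeled on the idea of a string bridge, and the `street` and `city` fields of `SketchAddress` are assumptions made for illustration.

```java
// Local stand-in for a bridge contract; not the library interface itself.
interface SketchStringBridge {
    String objectToString(Object value);
}

// Hypothetical value type: the slides do not show which fields an address has.
class SketchAddress {
    final String street;
    final String city;

    SketchAddress(String street, String city) {
        this.street = street;
        this.city = city;
    }
}

class SketchAddressBridge implements SketchStringBridge {
    // Flattens the address into one string so it can be stored as an index field.
    @Override
    public String objectToString(Object value) {
        if (value == null) {
            return null;
        }
        SketchAddress address = (SketchAddress) value;
        return address.street + ", " + address.city;
    }

    public static void main(String[] args) {
        SketchAddress address = new SketchAddress("221B Baker Street", "London");
        System.out.println(new SketchAddressBridge().objectToString(address));
    }
}
```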
Indexing - Boost
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Field(index = Index.TOKENIZED) @Boost(1.5f)
        private String name;
        @OneToMany @IndexEmbedded
        private Set<Book> books;
        @Field(bridge = @FieldBridge(impl = AddressBridge.class)) @Boost(0.75f)
        private Address address;
    }
Indexing - Analyzers
    @Entity
    @Indexed
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        private String bio;
        ...
    }
Indexing - Analyzers
    @Entity
    @Indexed
    @AnalyzerDef(name = "combinedAnalyzers",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class))
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        private String bio;
        ...
    }
Indexing - Analyzers
    @Entity
    @Indexed
    @AnalyzerDef(name = "combinedAnalyzers",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class) })
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        private String bio;
        ...
    }
Indexing - Analyzers
    @Entity
    @Indexed
    @AnalyzerDef(name = "combinedAnalyzers",
        charFilters = { @CharFilterDef(factory = MappingCharFilterFactory.class) },
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class) })
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        private String bio;
        ...
    }
Indexing - Analyzers
    @Entity
    @Indexed
    @AnalyzerDef(name = "combinedAnalyzers",
        charFilters = { @CharFilterDef(factory = MappingCharFilterFactory.class) },
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class) })
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Analyzer(definition = "combinedAnalyzers")
        private String bio;
        ...
    }
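The analyzer definition above chains three stages: a char filter, a tokenizer and a token filter. The plain-Java sketch below mimics that order on a single string. It is a simplified illustration, not the Lucene or Solr implementation: the character mapping is an assumption, and the tokenizer is a crude approximation that splits on anything that is not a letter or digit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class AnalyzerChainSketch {
    // Char filter stage: rewrite characters before tokenizing (mapping chosen for illustration).
    static String mapChars(String text, Map<Character, Character> mapping) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            char mapped = mapping.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    // Tokenizer stage: split on runs of characters that are neither letters nor digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Token filter stage: lower-case every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Runs the three stages in the same order as the analyzer definition.
    static List<String> analyze(String text) {
        // \u00e9 is "e with acute accent"; it is mapped to a plain "e".
        Map<Character, Character> mapping = Map.of('\u00e9', 'e');
        return lowerCase(tokenize(mapChars(text, mapping)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Caf\u00e9 BLAZE, by Stephen King"));
    }
}
```

Because the same chain would normally run at query time as well, a search for "cafe" and a search for "Café" end up comparing the same token.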
Indexing - Fluent API
    SearchMapping mapping = new SearchMapping();
    mapping
        .analyzerDef("customAnalyzer", StandardTokenizerFactory.class)
            .filter(LowerCaseFilterFactory.class)
            .filter(SnowballPorterFilterFactory.class)
                .param("language", "English")
        .entity(Author.class)
            .indexed()
            .property("id", ElementType.FIELD).documentId()
            .property("address", ElementType.FIELD)
                .field().bridge(AddressBridge.class).store(Store.YES)
            .property("books", ElementType.FIELD).indexEmbedded()
            .property("name", ElementType.METHOD).field().store(Store.YES)
        .entity(Book.class)
            .indexed()
            .property("id", ElementType.METHOD).documentId()
            .property("title", ElementType.METHOD)
                .field().analyzer("customAnalyzer");
Indexing - Backend
hibernate.search.worker.execution = async
hibernate.search.worker.thread_pool_size = 10
Source: Hibernate Search in Action
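What asynchronous execution with a thread pool means for the caller can be sketched with a plain `ExecutorService`: the indexing work is handed off after the commit and the caller does not wait for it. This is a minimal sketch under that assumption, not the Hibernate Search backend. The class and method names are hypothetical, and a list stands in for the index.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncIndexingSketch {
    private final List<String> index = new CopyOnWriteArrayList<>(); // stand-in for the index
    private final ExecutorService pool;

    AsyncIndexingSketch(int threadPoolSize) {
        this.pool = Executors.newFixedThreadPool(threadPoolSize);
    }

    // Called after a transaction commits: queue the index work and return immediately.
    void afterCommit(String document) {
        pool.submit(() -> index.add(document));
    }

    // Stops accepting work, waits for queued work to finish, then returns the index content.
    List<String> shutdownAndGetIndex() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return index;
    }

    public static void main(String[] args) {
        AsyncIndexingSketch backend = new AsyncIndexingSketch(10);
        backend.afterCommit("Author: Stephen King");
        backend.afterCommit("Book: Blaze");
        System.out.println(backend.shutdownAndGetIndex().size() + " documents indexed");
    }
}
```

The trade-off is the usual one for asynchronous work: the transaction returns sooner, but a query issued right after the commit may not see the new document yet.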
Indexing - JMS backend
hibernate.search.worker.backend = jms
hibernate.search.worker.jms.connection_factory = /ConnectionFactory
hibernate.search.worker.jms.queue = queue/hsearch
Source: Hibernate Search in Action
Manual Indexing
Use case: non-exclusive database
Manual indexing types:
• Single entity
• Mass indexer
Manual Indexing - Single Entity
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    Transaction tx = fullTextSession.beginTransaction();
    Object author = fullTextSession.load(Author.class, 1);
    fullTextSession.index(author);
    tx.commit();
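Mass Indexing
    // Blocks until the whole index has been rebuilt
    fullTextSession.createIndexer().startAndWait();
    // Starts the rebuild in the background and returns immediately
    fullTextSession.createIndexer().start();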
Retrieval - Lucene Queries + Hibernate API
    // Wraps the Hibernate Session object
    org.hibernate.search.FullTextSession fullTextSession =
        org.hibernate.search.Search.getFullTextSession(session);
    // Lucene query
    Version v = Version.LUCENE_29;
    org.apache.lucene.queryParser.QueryParser queryParser =
        new org.apache.lucene.queryParser.QueryParser(v, "name", new StandardAnalyzer(v));
    org.apache.lucene.search.Query query = queryParser.parse("+King");
    // Hibernate Search query
    org.hibernate.Query textQuery = fullTextSession.createFullTextQuery(query, Author.class);
    // list() returns every match, so the result is a List rather than a single Author
    List<Author> loadedAuthors = textQuery.list();
Retrieval - Hibernate Search
1. Executes the Lucene query and gets the results
2. Retrieves the document ids from the index
3. Loads the objects from the database
4. Returns the domain objects
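The four steps above can be sketched with two in-memory maps, one standing in for the index and one for the database. This is only a toy model of the flow: the substring match replaces a real Lucene query, and the class name, the sample data and the string "domain objects" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class RetrievalFlowSketch {
    // Stand-in for the index: document id -> indexed (lower-cased) text.
    static final Map<Integer, String> INDEX = new LinkedHashMap<>();
    // Stand-in for the database: primary key -> domain object (a plain string here).
    static final Map<Integer, String> DATABASE = new LinkedHashMap<>();

    static {
        INDEX.put(1, "stephen king");
        INDEX.put(2, "peter straub");
        DATABASE.put(1, "Author(1, Stephen King)");
        DATABASE.put(2, "Author(2, Peter Straub)");
    }

    static List<String> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);

        // Steps 1 and 2: run the query against the index and keep only the matching ids.
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : INDEX.entrySet()) {
            if (entry.getValue().contains(needle)) {
                ids.add(entry.getKey());
            }
        }

        // Step 3: load the matching objects from the database by id.
        List<String> results = new ArrayList<>();
        for (Integer id : ids) {
            results.add(DATABASE.get(id));
        }

        // Step 4: hand back domain objects, not index documents.
        return results;
    }

    public static void main(String[] args) {
        System.out.println(search("King"));
    }
}
```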
Retrieval - Results Manipulation
Pagination
Type restriction
Projection
Result mapping
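Pagination is the simplest of these to illustrate. On a Hibernate query it is usually expressed with `setFirstResult` and `setMaxResults`; the sketch below shows the same windowing logic over an in-memory list of hits. The `PaginationSketch` class is a hypothetical helper, and `maxResults` is assumed to be non-negative.

```java
import java.util.List;

class PaginationSketch {
    // Returns the window [firstResult, firstResult + maxResults) clamped to the list bounds.
    static <T> List<T> page(List<T> hits, int firstResult, int maxResults) {
        int from = Math.min(Math.max(firstResult, 0), hits.size());
        int to = Math.min(from + maxResults, hits.size());
        return hits.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> hits = List.of("Blaze", "Carrie", "It", "Misery", "The Shining");
        // Second page when each page holds two results.
        System.out.println(page(hits, 2, 2));
    }
}
```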
Retrieval - IndexReader
shared strategy: shared IndexReader (default)
    hibernate.search.reader.strategy = shared
not-shared strategy: opens an IndexReader for every query
    hibernate.search.reader.strategy = not-shared
Extensible by using the ReaderProvider interface
    hibernate.search.reader.strategy = com.mycompany.CoolReaderProvider
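The difference between the two built-in strategies can be sketched as two small providers: one hands out the same reader to every caller, the other opens a new one per request. The `SketchReader` and `SketchReaderProvider` types below are local stand-ins invented for this example; they do not reproduce the actual `ReaderProvider` contract.

```java
// Stand-in for an index reader; a real one would hold open files and caches.
class SketchReader {
}

// Hypothetical, simplified provider contract used only in this sketch.
interface SketchReaderProvider {
    SketchReader open();
}

// "shared": every query reuses the same reader instance.
class SharedReaderProviderSketch implements SketchReaderProvider {
    private SketchReader shared;

    @Override
    public synchronized SketchReader open() {
        if (shared == null) {
            shared = new SketchReader();
        }
        return shared;
    }
}

// "not-shared": every query pays the cost of opening its own reader.
class NotSharedReaderProviderSketch implements SketchReaderProvider {
    @Override
    public SketchReader open() {
        return new SketchReader();
    }
}

class ReaderStrategyDemo {
    public static void main(String[] args) {
        SketchReaderProvider shared = new SharedReaderProviderSketch();
        SketchReaderProvider notShared = new NotSharedReaderProviderSketch();
        System.out.println("shared reuses reader: " + (shared.open() == shared.open()));
        System.out.println("not-shared reuses reader: " + (notShared.open() == notShared.open()));
    }
}
```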
Scalability
Sharding
Clustering
Scalability - Sharding
• Default: one index per entity type
• Shard: two or more indexes per entity type
• Use cases
  • Performance
  • Maintenance
[Diagram: without sharding, the application queries and indexes a single index (A - Z); with sharding, it uses Shard A (A - H), Shard B (I - N) and Shard C (O - Z).]
Scalability - Sharding
Indexes separated physically
Virtual Index
[Diagram: the application queries and indexes a virtual index that spans Shard A (A - H), Shard B (I - N) and Shard C (O - Z).]
Scalability - Sharding
Configuration
    hibernate.search.com.sourcesense.Author.sharding_strategy.nbr_of_shards = 2
Scalability - Shard Strategy
Default algorithm: ID Hash
f(x) = x % N
[Diagram: ids 1 to 5 are distributed across Shard 1 and Shard 2 by f(x).]
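The formula f(x) = x % N can be tried out directly. The sketch below applies it to the ids 1 to 5 with N = 2 shards. It follows the simplified formula from the slide and assumes numeric ids; the class name is hypothetical, and shards are numbered from 0 here, whereas the slide labels them Shard 1 and Shard 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IdHashShardingSketch {
    // f(x) = x % N; floorMod keeps the result in [0, N) even for negative ids.
    static int shardFor(int id, int numberOfShards) {
        return Math.floorMod(id, numberOfShards);
    }

    // Groups the given ids by the shard they are routed to.
    static Map<Integer, List<Integer>> distribute(int[] ids, int numberOfShards) {
        Map<Integer, List<Integer>> shards = new TreeMap<>();
        for (int id : ids) {
            shards.computeIfAbsent(shardFor(id, numberOfShards), k -> new ArrayList<>()).add(id);
        }
        return shards;
    }

    public static void main(String[] args) {
        // prints {0=[2, 4], 1=[1, 3, 5]}
        System.out.println(distribute(new int[] {1, 2, 3, 4, 5}, 2));
    }
}
```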
Custom Sharding Strategy
Implement IndexShardingStrategy
hibernate.search.com.sourcesense.Author.sharding_strategy BookTitleStrategy
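The slides do not show what `BookTitleStrategy` does. One plausible routing rule, borrowed from the A - H, I - N, O - Z ranges in the sharding diagrams, is to pick the shard from the first letter of the title. The sketch below shows only that routing decision as a standalone helper; it is hypothetical and does not implement the `IndexShardingStrategy` interface.

```java
class TitleRangeShardingSketch {
    // Picks a shard from the first letter: A-H -> 0, I-N -> 1, O-Z -> 2.
    // Anything else (empty title, digits, punctuation) falls back to shard 0.
    static int shardForTitle(String title) {
        if (title == null || title.isEmpty()) {
            return 0;
        }
        char first = Character.toUpperCase(title.charAt(0));
        if (first >= 'A' && first <= 'H') {
            return 0;
        }
        if (first >= 'I' && first <= 'N') {
            return 1;
        }
        if (first >= 'O' && first <= 'Z') {
            return 2;
        }
        return 0;
    }

    public static void main(String[] args) {
        for (String title : new String[] {"Blaze", "It", "The Shining"}) {
            System.out.println(title + " -> shard " + shardForTitle(title));
        }
    }
}
```

A rule like this makes it possible to search a single shard when the query is restricted to a title range, at the cost of shards that may grow unevenly.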
Synchronous Clustering
Every node can read and write to the index
Pessimistic locking prevents corruption
Single index shared among every node
Choose your flavour: NFS, Database, distributed caches
Clustering
Read-Write Synchronous cluster
[Diagram: Nodes 1 to 4 each have an IndexWriter and write to a single shared index.]
Asynchronous Clustering
Advantages
• Only master writes
• No indexing in slaves -> no waiting for locks
Downside
• Data is not visible immediately by the slaves
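The trade-off can be sketched with a queue and two lists. Slaves only enqueue changes, the master is the single writer, and a slave sees new data only after it refreshes its copy of the index. Everything here is an in-memory stand-in chosen for illustration: the queue plays the role of the message queue, the lists play the role of the index directories, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class AsyncClusterSketch {
    private final Queue<String> queue = new ArrayDeque<>();      // stand-in for the message queue
    private final List<String> masterIndex = new ArrayList<>();  // only the master writes here
    private List<String> slaveCopy = new ArrayList<>();          // read-only copy used by a slave

    // A slave never writes to the index; it only sends the change to the master.
    void slaveSubmits(String document) {
        queue.add(document);
    }

    // The master is the single writer: it drains the queue into its own index.
    void masterProcessesQueue() {
        String document;
        while ((document = queue.poll()) != null) {
            masterIndex.add(document);
        }
    }

    // Periodic refresh: the slave replaces its copy with a snapshot of the master index.
    void slaveRefreshes() {
        slaveCopy = new ArrayList<>(masterIndex);
    }

    boolean slaveCanSee(String document) {
        return slaveCopy.contains(document);
    }

    public static void main(String[] args) {
        AsyncClusterSketch cluster = new AsyncClusterSketch();
        cluster.slaveSubmits("Blaze");
        cluster.masterProcessesQueue();
        System.out.println("visible before refresh: " + cluster.slaveCanSee("Blaze"));
        cluster.slaveRefreshes();
        System.out.println("visible after refresh: " + cluster.slaveCanSee("Blaze"));
    }
}
```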
To learn more...
hibernate.org/subprojects/search.html
anonsvn.jboss.org/repos/hibernate/search/
Thank you
twitter: @gustavonalle