Word Up! Using Lucene for full-text search of your data set.

Date post: 24-Dec-2015
Upload: leona-sullivan

Full-text search

Review of full-text search options

Focus on Lucene

Integrating Lucene with JPA/Hibernate

Full-text search options

‘LIKE’ queries

SQL extensions

Kludge with web search engine

Kludge with web search appliance

Embeddable search library

‘LIKE’ queries

Simple, straightforward

Fast, easy to implement

Large result set

Limited fuzziness (wildcard or regex)
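
To make the "limited fuzziness" point concrete, here is a minimal sketch that emulates SQL LIKE wildcards in plain Java. The class and method names (LikeSketch, likeToRegex, like) are hypothetical, and case-insensitive matching is an assumption, since real LIKE behaviour depends on the database and its collation.

```java
import java.util.regex.Pattern;

final class LikeSketch {

    /** Translates a LIKE pattern into a regex: % matches any run, _ matches one character. */
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                // Quote everything else so regex metacharacters are taken literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // Assumption: case-insensitive collation.
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    /** Returns true when the whole value matches the LIKE pattern. */
    static boolean like(String value, String pattern) {
        return likeToRegex(pattern).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("Apache Lucene in Action", "%lucene%"));
        // A one-letter misspelling is not found by a plain wildcard search.
        System.out.println(like("Apache Lucine in Action", "%lucene%"));
    }
}
```

A wildcard only helps when you already know where the variation is (for example 'Luc_ne'); it does not rank results or tolerate arbitrary typos.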

Full-text search extensions

No standard syntax (Sybase, MSSQL, DB2, etc. all different)

Administrative overhead for text search indices

Other limitations

Kludge with search engine

External indexing/search software

ht://Dig

mnoGoSearch

Sphinx

Xapian

Not necessarily pure Java

Can be database-intensive

Lag in updating search index

Kludge with search appliance

“Black-box” solutions

Thunderstone

Google Search Appliance

Your data set mixes with public content

Doesn’t always work as advertised

Can’t fine-tune search

Embeddable search library

Search library

Example: Apache Lucene

Deploys as part of your application

100% Java

Fuzzy full-text search (Levenshtein algorithm)

Searches against text, numeric, boolean fields with multiple options

Can be integrated with JPA/Hibernate via Hibernate Search, Compass
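
The Levenshtein algorithm mentioned above measures how many single-character edits separate two strings. The following is a minimal standalone sketch of that idea, not Lucene's own implementation; the names LevenshteinSketch, distance and fuzzyMatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LevenshteinSketch {

    /** Edit distance (insert, delete, substitute) computed with two rolling rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the indexed terms that lie within maxEdits of the query term. */
    static List<String> fuzzyMatches(String query, List<String> terms, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (distance(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("lucene", "lucerne", "luke", "license");
        System.out.println(fuzzyMatches("lucine", terms, 2));
    }
}
```

A fuzzy search accepts a misspelled query such as "lucine" because it sits one edit away from the indexed term "lucene".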

About Lucene

Search index stored on file system (also JDBC and BDB options)

Can store/retrieve data to/from search index (Lucene Projections)

Can index HTML, XML, Office docs, PDFs, Exchange mail with external tools

Supports extended and multi-byte character sets by default

More about Lucene

Indexes each record as a Lucene Document object

Lucene Document doesn’t have to be a literal document – can be any arbitrary object

Document can have any number of name-value pairs

Synchronizing your data with search index is someone else’s problem …
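
The notion of a Document as a bag of name-value pairs can be illustrated with a toy inverted index in plain Java. This is a conceptual sketch only: it does not use or reproduce the Lucene API, and ToyIndexSketch, add and search are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ToyIndexSketch {

    // "field:token" -> ids of the documents that contain that token in that field
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Stored documents; each one is just a set of name-value pairs.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Adds a document and indexes every token of every field; returns the document id. */
    int add(Map<String, String> fields) {
        int id = docs.size();
        docs.add(fields);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Split on anything that is not a Unicode letter or digit.
            String[] tokens = field.getValue().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
            for (String token : tokens) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + token, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return id;
    }

    /** Returns the ids of the documents whose given field contains the term. */
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndexSketch index = new ToyIndexSketch();
        Map<String, String> marker = new HashMap<>();
        marker.put("name", "City Hall");
        marker.put("notes", "Main entrance on Elm Street");
        index.add(marker);
        System.out.println(index.search("notes", "elm"));
    }
}
```

Nothing in the sketch keeps the index in step with a database; as the last bullet notes, that synchronization has to come from somewhere else.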

Integrating with JPA / Hibernate

Most common method: Hibernate Search

Supports only Hibernate provider

Automatically updates search index when object persisted to database

Entity classes mapped to separate indexes

Entity fields mapped to Lucene index fields using Java annotations

Integrating with JPA/Hibernate …

Alternate method: Compass Project

Supports Hibernate, OpenJPA, others

No release since 2009 – effectively unsupported

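Annotated class example …

    import java.io.Serializable;
    import javax.persistence.*;                 // @Entity, @Table, @Id, @Column, @Cacheable
    import org.hibernate.search.annotations.*;  // @Indexed, @Field, @NumericField, Store

    @Indexed
    @Entity
    @Cacheable(true)
    @Table(name="MARKER", schema="MAPLINK")
    public class Marker extends MarkerA implements Serializable {

        @Id
        @Column(name="MKR_MARKERID")
        @Field(store=Store.YES)
        private long mkrMarkerid;

        @Column(name="MKR_LAT", nullable = true)
        @Field(store=Store.YES)
        @NumericField
        private Double mkrLat;

        @Column(name="MKR_LONG", nullable = true)
        @Field(store=Store.YES)
        @NumericField
        private Double mkrLong;

        // ... the slide excerpt ends here
    }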

@Indexed – tells Hibernate Search that this entity class should be indexed

@Field – tells Hibernate Search to create a matching name-value pair in the search index for the annotated property

Store.YES – stores the value for retrieval directly from the index, without touching the database

@NumericField – index as a numeric value, enables greater than / less than / range searches
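
A small plain-Java sketch can show why numeric indexing matters for range searches. It uses a sorted map as a stand-in for a numeric index; Lucene uses its own numeric encoding, so this is an illustration of the principle only, and NumericRangeSketch, add and range are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class NumericRangeSketch {

    // Values kept in numeric order, each mapped to the ids of the records holding it.
    private final TreeMap<Double, List<Long>> byValue = new TreeMap<>();

    void add(long id, double value) {
        byValue.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
    }

    /** Returns ids whose value lies in [min, max], in ascending value order. */
    List<Long> range(double min, double max) {
        List<Long> ids = new ArrayList<>();
        for (List<Long> bucket : byValue.subMap(min, true, max, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Compared as text, "9.5" sorts after "10.25", so a textual range would be wrong.
        System.out.println("9.5".compareTo("10.25") > 0);

        NumericRangeSketch latitudes = new NumericRangeSketch();
        latitudes.add(1L, 9.5);
        latitudes.add(2L, 10.25);
        latitudes.add(3L, 45.0);
        System.out.println(latitudes.range(9.0, 11.0));
    }
}
```

Indexed as plain text, a latitude of 9.5 would fall outside a "between 9 and 11" comparison done character by character; indexed as a number, it is found.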

Let’s take a Luke at the index …

Practical search exercise

Questions!