DB2 Net Search Extender
Presenter: Sudeshna Banerji (CIS 595: Bioinformatics)
Topics to discuss:
– Information retrieval
– Text-indexing
– DB2 Text Extenders
– DB2 Net Search Extender
– References
– Questions
A Little Background…
Information Retrieval (IR):
• Extraction of “relevant” information from huge volumes of data scattered across different databases.
• Examples: textual search, image search, video search, etc.
• The efficiency (time and speed) of IR depends on the INDEXING technology used.
• Indexing increases the performance of the system.
• An example of an indexing technology: text-indexing, used for textual search.
A Little Background…
Text-Indexing:
• The process of deciding what will be used to represent a given document.
• A text index consists of significant terms extracted from the text documents, each term stored together with information about the document that contains it.
• The search is then handled as a query to look up the index.
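The idea of a text index can be sketched in a few lines of Python. This is only an illustration of the general technique (an inverted index), not how DB2 Net Search Extender stores its index. The function names build_index and lookup are hypothetical, and the two sample sentences are the ones loaded into the books table later in this presentation.

```python
from collections import defaultdict


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            # Strip trailing punctuation so "street." is indexed as "street".
            index[term.strip(".,;:!?")].add(doc_id)
    return index


def lookup(index, term):
    """Answer a one-word query by looking the term up in the index."""
    return sorted(index.get(term.lower(), set()))


documents = {
    1: "A man was running down the street.",
    2: "The cat hunts some mice.",
}
index = build_index(documents)
print(lookup(index, "cat"))  # ids of the documents that contain "cat"
```

Note that this sketch indexes every word; restricting the index to significant terms is a separate step.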
A Little Background…
Text-Indexing (continued):
• Involves the following:
– Parsing the documents to recognize their structure, e.g., title, date, and other fields.
– Scanning for word tokens: handling numbers, special characters, hyphenation, capitalization, etc.
– Stopword removal: based on a short list of common words like “the”, “and”, “or”.
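The token scan and stopword removal can be sketched as follows. This is a simplified illustration, not the tokenizer used by any DB2 extender: the regular expression, the function names, and the choice to keep hyphenated words as single tokens are assumptions, and the stopword list contains only the three words named above.

```python
import re

# Short stopword list taken from the examples above; real lists are longer.
STOPWORDS = {"the", "and", "or"}


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Hyphenated words are kept as one token; other punctuation separates tokens.
    """
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def significant_terms(text):
    """Return the tokens that remain after stopword removal."""
    return [token for token in tokenize(text) if token not in STOPWORDS]


print(significant_terms("The cat hunts some mice."))
```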
Indexing only Significant Terms
DB2 Extenders
– A family of IBM products that provides support for data beyond the traditional character and numeric data types.
– Extenders are available for images, voice, video, complex documents (full-text search), spatial objects, etc.
– Trial and beta versions are available for testing.
– Link for extenders: http://www-3.ibm.com/software/data/db2/extenders/index.html
DB2 Text Extenders
– To meet the increasing demands of content management, IBM has introduced three full-text retrieval applications for DB2 Universal Database (DB2 UDB):
• DB2 Net Search Extender
• DB2 Text Information Extender
• DB2 Text Extender
– When to use what?
• Link for a comparison of the above: http://www-3.ibm.com/software/data/db2/extenders/fulltextcomparison.html
DB2 Net Search Extender
Replaces DB2 Text Information Extender Version 7.2.
Some important features:
– Indexing speed of about 1 GB per hour.
– Different text formats: ASCII plain text, HTML, XML, GPP.
– Base support for 37 languages, including English, Spanish, French, Japanese, and Chinese.
– Sub-second search response times.
– No decrease in search performance with up to 1000 concurrent queries per second.
DB2 Net Search Extender
Some text-search capabilities:
– Searches can be performed using SQL (a fourth-generation language whose queries read almost like English).
– Searches can include:
• Boolean operations.
• Proximity search for words in the same sentence or paragraph: for HTML, XML, and GPP.
• “Fuzzy” searches for words with a spelling similar to the search term, e.g., “Andrew” and “Andru”.
• Thesaurus-related search.
• Restricting the search to sections within documents.
• The user can limit the search results with a “hit count” and can also specify how the results are to be sorted.
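To give a feel for what a “fuzzy” match means, here is a small Python sketch. It uses the similarity ratio from Python's standard difflib module as a stand-in; the matching algorithm that DB2 Net Search Extender actually uses is not described in this presentation, and the function name, the vocabulary, and the cutoff of 0.7 are assumptions chosen for illustration.

```python
import difflib


def fuzzy_matches(term, vocabulary, cutoff=0.7):
    """Return indexed terms whose spelling is similar to the search term."""
    # Compare case-insensitively, but report the terms as originally spelled.
    lowered = {word.lower(): word for word in vocabulary}
    close = difflib.get_close_matches(term.lower(), list(lowered), n=5, cutoff=cutoff)
    return [lowered[word] for word in close]


vocabulary = ["Andrew", "Mike", "John"]  # hypothetical indexed terms
print(fuzzy_matches("Andru", vocabulary))
```

A lower cutoff accepts more distant spellings; a higher one makes the search stricter.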
DB2 Net Search Extender
System requirements:
– DB2 Version 8.1
– Java Runtime Environment (JRE) Version 1.3.1
Windows installation:
– Administrative rights are required.
– Call db2text start to start the DB2 Net Search Extender Instance Services.
DB2 Net Search Extender
Simple example with SQL queries
– The following steps are required to do a basic textual search in DB2 Net Search Extender:
1. Creating a database
2. Enabling a database for text search
3. Creating a table
4. Creating a full-text index
5. Loading sample data
6. Synchronizing the text index
7. Searching with the text index
DB2 Net Search Extender
1. Creating a database:
db2 "create database sample"
2. Enabling a database for text search:
• To start the Net Search Extender service:
db2text "START"
• To prepare the database for use with DB2 Net Search Extender:
db2text "ENABLE DATABASE FOR TEXT CONNECT TO sample"
DB2 Net Search Extender
3. Creating a table:
db2 "CREATE TABLE books (isbn VARCHAR(18) not null PRIMARY KEY, author VARCHAR(30), story LONG VARCHAR, year INTEGER)"
4. Creating a full-text index:
db2text "CREATE INDEX db2ext.myTextIndex FOR TEXT ON books (story) CONNECT TO sample"
DB2 Net Search Extender
5. Loading sample data:
db2 "INSERT INTO books VALUES ('0-13-086755-1', 'John', 'A man was running down the street.', 2001)"
db2 "INSERT INTO books VALUES ('0-13-086755-2', 'Mike', 'The cat hunts some mice.', 2000)"
6. Synchronizing the text index:
db2text "UPDATE INDEX db2ext.myTextIndex FOR TEXT CONNECT TO sample"
DB2 Net Search Extender
7. Searching with the text index:
• Using the CONTAINS scalar search function:
db2 "SELECT author, story FROM books WHERE CONTAINS (story, '\"cat\"') = 1 AND year >= 2000"
(The backslashes escape the inner double quotes for the operating system shell; the escape character may differ depending on the shell.)
The following result table is returned:
AUTHOR  STORY
Mike    The cat hunts some mice.
NOTE:
– To create a text index, the text column must be one of the following data types: CHAR, VARCHAR, LONG VARCHAR, CLOB.
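Conceptually, the query combines a lookup in the text index with an ordinary predicate on another column. The Python sketch below imitates that logic on the two sample rows. It is not how DB2 evaluates CONTAINS; the helper names build_text_index and contains are hypothetical and exist only for this illustration.

```python
# The two sample rows loaded in step 5.
books = [
    {"isbn": "0-13-086755-1", "author": "John",
     "story": "A man was running down the street.", "year": 2001},
    {"isbn": "0-13-086755-2", "author": "Mike",
     "story": "The cat hunts some mice.", "year": 2000},
]


def build_text_index(rows, column):
    """Map each word of the text column to the primary keys of rows containing it."""
    index = {}
    for row in rows:
        for word in row[column].lower().split():
            index.setdefault(word.strip(".,;:!?"), set()).add(row["isbn"])
    return index


def contains(index, row, word):
    """Return 1 if the row's text contains the word, else 0 (mirrors '= 1' in the query)."""
    return 1 if row["isbn"] in index.get(word.lower(), set()) else 0


story_index = build_text_index(books, "story")
result = [
    (row["author"], row["story"])
    for row in books
    if contains(story_index, row, "cat") == 1 and row["year"] >= 2000
]
print(result)
```

Only Mike's row passes both conditions, which matches the result table above.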
DB2 Net Search Extender
Thesaurus support:
– A thesaurus is structured like a network of nodes linked together by relations:
• Associative relations: RELATED_TO
• Synonym relations: SYNONYM_OF
• Hierarchical relations: LOWER_THAN, HIGHER_THAN
– Creating and compiling a thesaurus:
1. Create a thesaurus definition file (explained below).
2. Compile the definition file into a thesaurus dictionary using the DB2EXTTH utility.
DB2 Net Search Extender
Create a thesaurus definition file:
– Define its content in a definition file using a text editor.
Example of some definition groups:
:WORDS
football
.RELATED_TO goal
.SYNONYM_OF soccer
:WORDS
chapel
.LOWER_THAN skyscraper
.HIGHER_THAN house
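To show how such definition groups could drive a thesaurus-related search, here is a Python sketch that reads the two groups above and expands a search term along one relation. It handles only the layout shown in this example (a :WORDS line, one term, then relation lines starting with a dot); the real definition file format and the compiled dictionary produced by DB2EXTTH may differ, and the function names are hypothetical.

```python
DEFINITION = """\
:WORDS
football
.RELATED_TO goal
.SYNONYM_OF soccer
:WORDS
chapel
.LOWER_THAN skyscraper
.HIGHER_THAN house
"""


def parse_definition(text):
    """Parse definition groups into {term: {relation: [related terms]}}."""
    thesaurus = {}
    current = None
    expect_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == ":WORDS":
            expect_term = True          # the next line names the group's term
        elif expect_term:
            current = line
            thesaurus.setdefault(current, {})
            expect_term = False
        elif line.startswith(".") and current is not None:
            relation, _, target = line[1:].partition(" ")
            thesaurus[current].setdefault(relation, []).append(target.strip())
    return thesaurus


def expand(thesaurus, term, relation):
    """Return the term plus the terms linked to it by the given relation."""
    return [term] + thesaurus.get(term, {}).get(relation, [])


thesaurus = parse_definition(DEFINITION)
print(expand(thesaurus, "football", "SYNONYM_OF"))
```

The sketch follows relations only in the direction they are written, so a search for soccer would not be expanded to football here.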
DB2 Net Search Extender
An example of the structure of a thesaurus:
[Figure: a thesaurus network of the terms Game, Ball Game, Tennis, Soccer, and Football, linked by HIGHER_THAN relations and one SYNONYM_OF relation.]
DB2 Net Search Extender
References:
- http://www-3.ibm.com/cgibin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=desu9m03.htm#ToC
- Information retrieval site containing good lecture slides: http://ciir.cs.umass.edu/cmpsci646/
- Net Search Extender Administration and User’s Guide, Version 8.1 (can be downloaded with the software)
ANY QUESTIONS????