
Johnson Graduate School of Management Library Project

Date post: 02-Jan-2016
Upload: kirby-cotton

JGSM Library Project - CS 501 1

Johnson Graduate School of Management Library Project

Clients:

Ken Bolton

Lynn Brown

Angela K. Horne

Don Schneder

Doris Smith

JGSM Library Reference Team

Project Team:

Jonathan Gong

Benson Lee

Man Fai Matthew Lee

Greg Leedberg

Liz Xu


Since the last presentation…

Tasks accomplished:
    Decided on using PHPDig as the backend
    Implemented many functional requirements
    Adjusted PHPDig code to improve ranking based on client requirements
    Discussed with the client additional functionality to be added to the system


Presentation Outline

New Requirements / Why PHPDig?
Implemented Functionality
    Abstract Display
    Advanced Search
    Administrative Features
    Ranking Adjustments
Task List for Final Milestone (Things to Do)
Demo of Current System


New Requirements

Boosting
Display Statistics Page
Batch Adding
Search Results Display
Add/Remove Categories


Why PHPDig? Non-technical

Client prefers using PHP/MySQL since both technologies are on their web server
JGSM Library site has fewer than 300 HTML pages
A requirement: database
Client involved in decision of continuing with PHPDig
Focus on maintainability and usability


Why PHPDig? Technical

PHPDig code is relatively short
PHPDig = Open Source = Free to modify
    Florida State University, Dept. of Biology: www.bio.fsu.edu/phpdig
    The Kiwi Search Engine: http://www.linknz.co.nz/ (123,000+ web sites indexed)
Ranking is similar to Lucene since they both use the same ranking algorithm (tf-idf)
PHPDig version 1.8.7: www.phpdig.net


Implemented Functionality: Abstract Display

Purpose
    Users can get a description written by a librarian/administrator
Implementation
    Modified PHPDig code to look for an abstract
    Added a table to the database: auxiliary
        spider_id : int
        full_url : string
        abstract : string
        category : string
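As a rough illustration of the auxiliary table described on this slide, the sketch below builds it in an in-memory SQLite database (the project itself uses MySQL) and looks up a librarian-written abstract, falling back to an automatically generated snippet when none is stored. The SQL types, the primary key, the function names, the fallback behaviour, and the sample row are all assumptions made for illustration; they are not taken from the project's code.

```python
import sqlite3

def create_auxiliary_table(conn):
    # Column names follow the slide; the SQL types and the key are assumed.
    conn.execute(
        """CREATE TABLE auxiliary (
               spider_id INTEGER PRIMARY KEY,
               full_url  TEXT NOT NULL,
               abstract  TEXT,
               category  TEXT
           )"""
    )

def description_for(conn, spider_id, fallback_snippet):
    # Prefer the librarian-written abstract; otherwise use the snippet.
    row = conn.execute(
        "SELECT abstract FROM auxiliary WHERE spider_id = ?", (spider_id,)
    ).fetchone()
    if row is not None and row[0]:
        return row[0]
    return fallback_snippet

conn = sqlite3.connect(":memory:")
create_auxiliary_table(conn)
# Hypothetical sample row.
conn.execute(
    "INSERT INTO auxiliary VALUES (?, ?, ?, ?)",
    (1, "http://example.org/databases.html",
     "Guide to business databases.", "Databases"),
)
```

Calling `description_for(conn, 1, snippet)` returns the stored abstract, while a page without an auxiliary row gets the snippet passed in.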


Example of Abstract Display


Example of Abstract Display (Cont’d)


Our Current Working Interface

We now have a functional interface that can perform searches and display results.

The interface has evolved from the prototype previously presented, based on feedback from our clients.


Evolved Interface

Started with the prototype presented for progress report 1 as the target design.

Once we started working with PhpDig’s template system, we made some slight changes to the original target interface to fit what PhpDig can handle.


Evolved Interface


Evolved Interface

After presenting this design to our clients and discussing possible alternatives, we jointly came up with the current working design:


Our Current Working Interface: Advanced Search


Our Current Working Interface: Search Results


How We Implemented the Interface

PhpDig uses a template system
    Allows us to write HTML code for the search page, and use special PhpDig tags to generate form controls, results, etc., within that page
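To make the template idea concrete, here is a minimal sketch of tag substitution: hand-written HTML contains placeholders, and a renderer swaps in generated markup. The `{{name}}` placeholder syntax, the `render` and `result_item` functions, and the sample values are hypothetical and chosen only for illustration; PhpDig's actual template tags use a different syntax.

```python
import re
from html import escape

# Hypothetical placeholder syntax {{name}}; PhpDig's real tags differ.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def render(template, values):
    # Replace each placeholder with its value; unknown names become "".
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), ""), template)

def result_item(title, url):
    # Escape dynamic text so it cannot break the surrounding markup.
    return '<li><a href="{}">{}</a></li>'.format(
        escape(url, quote=True), escape(title)
    )

page = render(
    "<h1>Search</h1>{{form}}<ul>{{results}}</ul>",
    {
        "form": '<form><input name="query"></form>',
        "results": result_item("Hours & Locations", "http://example.org/hours"),
    },
)
```

The static parts of the page stay under the designer's control, and only the placeholder positions are filled by generated HTML.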


How We Implemented the Interface

Some problems came up during this process:
    Problem: Some of the static HTML generated automatically by PhpDig tags to produce the search form does not match our desired style.
    Solution: We do not depend on PhpDig to generate all of the form HTML; some is hand-coded by us to match our style.


How We Implemented The Interface

Some problems arose during this process:
    Problem: Some of the dynamic HTML generated by PhpDig tags also does not match our style.
    Solution: We cannot hand-code this HTML (category drop-down, etc.), so we modified the PhpDig source code which is called in response to these tags so that the generated HTML matches our desired style.


Where To Go From Here

Based on future discussions with our client, we will continue to refine the interface towards an ideal goal.

More source-level changes to PhpDig to get the details right
    Example: Context currently cuts off words in the middle
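One possible way to avoid cutting words in the middle is to widen the excerpt outward to the nearest whitespace on each side. The sketch below is an assumption about how such a fix could look, not the change made to PhpDig; the function name and the "..." markers are illustrative.

```python
def trim_to_words(text, start, end):
    """Return text[start:end], widened outward to whole-word boundaries."""
    start = max(0, min(start, len(text)))
    end = max(start, min(end, len(text)))
    # Move start left until it sits at the beginning of a word.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # Move end right until it sits just past the end of a word.
    while end < len(text) and not text[end].isspace():
        end += 1
    snippet = text[start:end].strip()
    # Mark the sides on which the excerpt was cut from a longer text.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + snippet + suffix
```

For the text "The library offers business databases", the raw slice from offset 6 to 17 would begin inside "library" and end inside "offers"; the function returns "...library offers..." instead.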


Administrative Features

Implemented:
    Add a page
        Options: abstract & category
    Remove a page from database
    Update a page in database
        Options: update abstract & category
        Content is re-indexed


Administrative Features

To be Implemented:
    Manual ranking abilities
        Give a page more weight overall
        Give a page more weight for certain words
    Feedback
    Kerberos authentication


Administrative Features

To be Implemented: (continued)
    Display statistics
        Statistics useful to the administrators, such as most frequent searches, searches with no results, etc.
    Batch adding of pages
    Category Administration
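The two statistics named above (most frequent searches and searches with no results) could be derived from a search log along the following lines. This is a sketch under assumptions: the log format of (query, result count) pairs, the case-insensitive normalization, and the sample entries are invented for illustration.

```python
from collections import Counter

def search_statistics(log, top_n=3):
    # log: iterable of (query, result_count) pairs; the format is assumed.
    normalized = [(q.strip().lower(), n) for q, n in log]
    frequency = Counter(q for q, _ in normalized)
    # Queries that returned nothing at least once, in alphabetical order.
    no_results = sorted({q for q, n in normalized if n == 0})
    return {
        "most_frequent": frequency.most_common(top_n),
        "no_results": no_results,
    }

# Hypothetical log entries.
log = [("Bloomberg", 4), ("bloomberg ", 4), ("annual reports", 7), ("zzyzx", 0)]
stats = search_statistics(log, top_n=1)
```

Normalizing case and whitespace before counting keeps "Bloomberg" and "bloomberg " from being reported as separate searches.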


Ranking

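Improved from before, mostly complete

Formula similar to Lucene default now:

$$\mathrm{score}(q,d) = \sum_{t \in Q} \mathrm{coord}(q,d) \cdot \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t) \cdot \mathrm{getBoost}(t.\mathrm{field} \text{ in } d) \cdot \mathrm{lengthNorm}(t.\mathrm{field} \text{ in } d)$$

Our formula:

$$\mathrm{score}(q,d) = \sum_{t \in Q} \mathrm{coord}(q,d) \cdot \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t) \cdot \mathrm{getBoost}(t \text{ in } d)$$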


coord function

coord():

$$\mathrm{coord}() = \frac{q}{Q}$$

q is the # of query terms matched in document

Q is # terms in query

only relevant in search for “any of the terms”
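The project's score (coord times tf times idf times boost, summed over the query terms) can be sketched as follows. The slides do not say how tf and idf are computed, so this sketch borrows the definitions from Lucene's classic default similarity (square root of the term count, and ln(N / (df + 1)) + 1); treat both as assumptions. Documents are modelled as plain lists of terms, and all function names are illustrative. Because coord does not depend on the term, it is factored out of the sum.

```python
import math

def coord(query_terms, doc_terms):
    # q / Q: matched query terms over total query terms.
    unique = set(query_terms)
    matched = sum(1 for t in unique if t in doc_terms)
    return matched / len(unique) if unique else 0.0

def tf(term, doc_terms):
    # Assumed: square root of the raw count, as in Lucene's classic default.
    return math.sqrt(doc_terms.count(term))

def idf(term, docs):
    # Assumed: ln(N / (df + 1)) + 1, as in Lucene's classic default.
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / (df + 1)) + 1.0

def score(query_terms, doc_terms, docs, boost=None):
    # boost maps a term to its getBoost value for this document (default 1).
    boost = boost or {}
    total = sum(
        tf(t, doc_terms) * idf(t, docs) * boost.get(t, 1.0)
        for t in set(query_terms)
    )
    return coord(query_terms, doc_terms) * total

# Hypothetical three-document collection.
docs = [
    ["library", "hours"],
    ["library", "databases", "databases"],
    ["career", "services"],
]
```

In an "all of the terms" search every returned document matches every query term, so coord is always 1 and has no effect on the ordering; it only matters for "any of the terms" searches.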


Current Progress

Completed:
    Ranking implementation complete
Left to do:
    Admin Panel to modify boosted pages/words
    Uses boost, but need to finalize how to modify boosting parameter


Boosting Methods

Two possibilities:

1. Admin modifies score of page relative to current score.

2. Specify position a page should appear given a one-term query.


Pros and Cons

Method 1: Modify relative to current score

+ More careful manipulation of score possible

+ Faster to code, more time to test

- More difficult to use

Method 2: Modify rank

+ Easier to use

- Adjustments only possible on one-word queries
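The two boosting methods can be contrasted in a short sketch. Method 1 simply scales the current score. Method 2 has to work backwards from a desired position to a boost factor, which is only well defined against the ranking for a single query term. The function names, the 1% margin, and the sample scores are assumptions; scores are assumed positive, and ties among the other pages are not handled.

```python
def boost_relative(score, factor):
    # Method 1: scale the current score by an administrator-chosen factor.
    return score * factor

def boost_for_position(scores, page, position):
    # Method 2: factor that moves `page` to the 1-based `position`.
    # scores: {page: current score for one single-term query}, all > 0.
    others = sorted((s for p, s in scores.items() if p != page), reverse=True)
    if not 1 <= position <= len(others) + 1:
        raise ValueError("position out of range")
    if not others:
        return 1.0
    if position == 1:
        target = others[0] * 1.01   # just above the current leader
    elif position == len(others) + 1:
        target = others[-1] * 0.99  # just below the current last page
    else:
        # Midway between the two pages it should sit between.
        target = (others[position - 2] + others[position - 1]) / 2
    return target / scores[page]

# Hypothetical scores for a one-term query.
scores = {"a": 4.0, "b": 2.0, "c": 1.0}
factor = boost_for_position(scores, "c", 2)
boosted = dict(scores, c=boost_relative(scores["c"], factor))
```

The factor computed this way depends on the other pages' scores for that one term, which is why Method 2 cannot be adjusted for multi-word queries.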


Task List for Final Milestone

Feedback

Confirmation and error messages will be displayed on the administrative page to improve usability.


Display stats page

Links for the relevant log pages will be added to the main administration page.


Batch adding

To facilitate the indexing process, we will add a batch-adding feature to the main administration page.
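A batch-adding feature needs some input format for the list of pages. The sketch below assumes one page per line, with tab-separated URL, category, and abstract (the latter two optional, mirroring the options available when adding a single page). The format, the function name, and the validation rule are hypothetical, not the project's design.

```python
def parse_batch(text):
    # Assumed input: one page per line as URL, category, abstract (tab-separated).
    pages, errors = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = [f.strip() for f in line.split("\t")]
        url = fields[0]
        if not url:
            continue  # skip blank lines
        if not url.startswith(("http://", "https://")):
            errors.append((line_no, url))  # report the line, keep going
            continue
        pages.append({
            "url": url,
            "category": fields[1] if len(fields) > 1 else "",
            "abstract": fields[2] if len(fields) > 2 else "",
        })
    return pages, errors

# Hypothetical batch input.
batch = (
    "http://example.org/a\tDatabases\tGuide to databases\n"
    "\n"
    "not-a-url\n"
    "http://example.org/b\n"
)
pages, errors = parse_batch(batch)
```

Collecting errors with their line numbers instead of stopping at the first bad line lets the administrator fix the whole file in one pass.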


Adjust search results display

We will ensure that the page description has no cut-off words and that the client is satisfied with the search results interface.


Limit by category

Search by category will be implemented.


Administrative function to add and remove categories

Adding and removing categories will be implemented and linked to the administrative page.


Administrative function to weight ranking

Manual ranking adjustments will be added so that the client is fully satisfied with the search results.


Authentication

Access to the administration page will use Cornell University’s Web Authentication (CUWebAuth) for authentication.


Unit Testing and Integration Testing

Every unit that is implemented will be fully unit tested on our own computers, and also integrated into the rest of the code for integration testing.


Installation and Refinement

The installation of the final system will take place well before the next milestone in order to avoid any delay.

This time period is reserved for any last-minute minor changes to the system to ensure the client’s satisfaction.


Documentation and Training Slides

Our final milestone includes detailed documentation of the project, training slides, and an informal training session to help administrators learn to operate the system.


Deployment

After careful testing and feedback, the search system will go live.


Timeline


Demo…


The End.

Questions? Comments?

