Discovering discovery tools

Evaluating vendors and implementing Web 2.0 environments

Dean James, Michael Garrett and Leah Krevit
Houston Academy of Medicine, Texas Medical Center Library, Houston, Texas, USA

Abstract

Purpose – Many libraries are now designing and implementing their own tools to meet users’ needs for search and data discovery. The aim of this study is to share the experiences of the HAM-TMC Library, one of the largest US medical libraries, in creating and implementing such a tool.

Design/methodology/approach – A narrative of the process demonstrates the genesis of the project and highlights the importance of collaboration with entities outside the usual library sphere.

Findings – Results show that libraries have choices to make in designing their own futures and in offering innovative services to their users. Taking a proactive approach can yield exciting results.

Originality/value – Many libraries accept federated search and other technologies directly from their library management system vendors as the simplest way to proceed with implementing new technologies. The HAM-TMC Library recognized that its particular information environment required learning the “problem space” thoroughly before investigating available options. As a result, the new tool the Library is providing is much more likely to meet specific user information needs.

Keywords Medical libraries, Library users, User interfaces, Software tools, Search engines, United States of America

Paper type Case study

Introduction

In fall 2006, staff from the Houston Academy of Medicine-Texas Medical Center Library (HAM-TMCL) and the Life Sciences Data Archive (LSDA) of NASA-Johnson Space Center began discussions about possible projects for collaboration[1]. This case study consists of two parts. The first part examines the process by which the team identified the scope of the project and the evaluative tool it developed, leading to the identification of appropriate vendors for the needed data-discovery solution. The second part examines the implementation of one of these tools by the HAM-TMC Library to address its own needs for data discovery.

Library Hi Tech, Vol. 27 No. 2, 2009, pp. 268-276. © Emerald Group Publishing Limited, 0737-8831. DOI 10.1108/07378830910968218
Received 11 December 2008. Revised 14 December 2008. Accepted 14 February 2009.

Part I

After initial discussions began in fall 2006, the group settled on a project: finding and evaluating searching and indexing tools to solve certain critical data-discovery issues for the Life Sciences Data Archive (LSDA). The LSDA at NASA-Johnson Space Center provides information and data from space flight experiments funded by NASA. The archive includes investigations from the Mercury Project (1961) through more current missions, such as the International Space Station and the Shuttle. These investigations involve human, plant and animal studies. The LSDA is a part of the Human Health and Performance Program of the Exploration Systems Missions Directorate, which is dedicated to “safe, sustained, affordable exploration of the Moon, Mars, and beyond . . . ”[2].

The quantity of the LSDA’s data and the disparate forms in which they exist mean that effective data discovery is critical. In recent years, search has become a vital issue for any web-oriented service or organization, largely thanks to Google and the power and popularity of its single search box (White, 2007, p. xiii). Given the diversity of structured and unstructured data possessed by any organization, effective search functionality is a key component in an organization’s success, whether it needs to make its data findable internally, externally, or both (White, 2007, pp. xiii-xiv). The LSDA has recognized the critical importance of these facts, and this recognition led to the forming of the partnership with the HAM-TMC Library.

The data held by the LSDA exist in many forms, everything from physical specimens and autopsy reports to video, audio, and datasets, some of them structured but much unstructured. The data are also held in numerous locations, some of which are network-accessible and some of which are not. Thus, information architecture becomes an issue as well, in order to enable effective search (White, 2007, p. 10). Metadata are available for many of these items, but not for all of them, particularly for video. A search available via the LSDA web site[3] allows access to the data that are currently available, but more data could be accessed if the proper tools were in place. Much of the data have been published in book form, such as Space Medicine in Project Mercury[4] and Biomedical Results from Skylab[5]. As accessible as these data are, however, there is need for deeper, more sophisticated access.

From fall 2006 through summer 2008, the investigators met regularly to review presentations by a number of vendors who specialized in enterprise search, indexing, and clustering engines. This investigatory process allowed the participants to gain a better understanding of the scope of the issues involved in creating a multi-functional discovery tool. Viewing demonstrations and talking with vendors (either in person or via webinars and teleconferences) were critical components in the process. These exchanges gave the team the opportunity to talk with information scientists and technologists and glean a better-rounded picture of the search/discovery universe. This in turn helped the team better identify the criteria it would use in creating an evaluative tool, the “Vendor Matrix”. Among the criteria included in this matrix were ontology support, security, interoperability with existing systems and tools, search modalities, and automatic metadata generation[6]. Using this tool the team eventually selected three vendors to evaluate for the final phase of the selection process (Figure 1).
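One way to picture how an evaluative matrix of this kind can narrow a field of vendors is as a weighted scoring table. The sketch below is illustrative only: the criteria names come from the list above, but the weights, the 0-5 rating scale, the vendor names and the ratings are all hypothetical and are not taken from the team's actual Vendor Matrix.

```python
# Hypothetical weights for the criteria named in the text (they sum to 1.0).
CRITERIA_WEIGHTS = {
    "ontology_support": 0.25,
    "security": 0.20,
    "interoperability": 0.25,
    "search_modalities": 0.15,
    "automatic_metadata": 0.15,
}


def weighted_score(ratings, weights=CRITERIA_WEIGHTS):
    """Return the weighted sum of a vendor's 0-5 ratings; a missing rating counts as 0."""
    return sum(weight * ratings.get(criterion, 0) for criterion, weight in weights.items())


def shortlist(vendor_ratings, top_n=3, weights=CRITERIA_WEIGHTS):
    """Rank vendor names by weighted score, highest first, and keep the top_n."""
    ranked = sorted(
        vendor_ratings,
        key=lambda name: weighted_score(vendor_ratings[name], weights),
        reverse=True,
    )
    return ranked[:top_n]


# Made-up vendors and ratings, for illustration only.
EXAMPLE_RATINGS = {
    "Vendor A": {"ontology_support": 5, "security": 4, "interoperability": 4,
                 "search_modalities": 3, "automatic_metadata": 2},
    "Vendor B": {"ontology_support": 3, "security": 3, "interoperability": 3,
                 "search_modalities": 3, "automatic_metadata": 3},
    "Vendor C": {"ontology_support": 2, "security": 5, "interoperability": 5,
                 "search_modalities": 4, "automatic_metadata": 4},
    "Vendor D": {"ontology_support": 1, "security": 2, "interoperability": 2,
                 "search_modalities": 5, "automatic_metadata": 1},
}

if __name__ == "__main__":
    for name in shortlist(EXAMPLE_RATINGS):
        print(name, round(weighted_score(EXAMPLE_RATINGS[name]), 2))
```

Keeping the weights separate from the ratings means the same ratings can be re-scored when priorities differ, as they might between an archive such as the LSDA and a library.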

Part II

Both the LSDA and the HAM-TMC Library have critical data discovery needs, and during the investigatory phase the team members representing the Library achieved a better understanding of those needs and the possible solutions presented by the various vendors and their tools. The Library selected one of the vendors, Vivísimo, as a partner for the in-house development of a Library-specific discovery tool. The second part of this case study focuses on the Library’s activities in identifying, selecting, and implementing the tool.

The Houston Academy of Medicine-Texas Medical Center (HAM-TMC) Library, founded in 1949, serves the educational, research, and clinical programs of the Texas Medical Center (TMC)[7]. The TMC is home to over 40 member institutions, including two medical schools[8], three schools of nursing[9], 13 hospitals and more[10]. The Library also serves as a resource for the greater Houston area and five states through its designation as the Regional Medical Library for the National Network of Libraries of Medicine South Central Region (NN/LM SCR)[11]. The Library’s users range from first-year medical students and undergraduate nursing students through graduate and postgraduate students, interns, residents, faculty, researchers, and clinicians. The Library provides extensive collections of research materials, in both print and electronic formats, historical and current, to its users, and users gain access to these materials via the Library’s web site at: http://resource.library.tmc.edu (Figure 2). In addition to its OPAC, the library offers various proprietary databases through which users have access to electronic books and full-text electronic journals. Searching for these materials in the past meant that users had to search various silos to consult different databases.

Figure 1. Vendor Matrix

This environment, particularly in the age of Google, is no longer sufficient to meet library users’ needs. Forcing users to search multiple silos on a library’s web site is not in tune with a Web 2.0 world. Metasearching, or federated searching, appears to be the answer to this problem, but the solution has inherent problems of its own. No matter how powerful the metasearch engine might be, it is nevertheless constricted by the way in which each database structures its queries. Relevancy is often sacrificed for quantity of retrieval, and different metasearch engines present results in different ways and with varying response times (Baule, 2007; Breeding, 2007; Tennant, 2001). These issues have led a number of libraries to develop their own solutions (Breeding, 2007).

The ability of these tools to retrieve results from multiple databases is impressive, but how the results are presented to the user is another critical issue. Google’s seemingly simple single-search-box approach has accustomed its users to enter search terms without having to give any thought to choosing databases or subject areas or making choices among result sets to narrow or broaden the focus of a search. For example, in usability testing at one research university, findings indicated that “people clearly prefer a simple search because it uses a default set of databases and avoids complex query statements, such as Boolean operators” (Boyd et al., 2006; Cervone, 2005; Singer Gordon and West, 2008). The task thus confronting libraries these days is how to provide a one-box-search alternative to Google that can offer the relevancy ranking that seems fairly standard throughout the web. Users are now more accustomed to additional features as well, such as faceted navigation, ratings and reviews from other users, and visual clues to content, such as book jackets and web site snapshots (Breeding, 2007).

Figure 2. Library web site prior to October 15, 2008

The HAM-TMC Library, like most other medical and academic libraries, has had to confront the problems of effective search and data discovery across multiple databases and information repositories. Through the collaboration with the LSDA at NASA-Johnson Space Center, the librarians on the team gained broader and deeper knowledge of not only the number and diversity of tools available through vendors outside the usual range of library vendors, but also the complex mosaic of issues surrounding federated searching. In common with the LSDA, the HAM-TMC Library has data available in various formats and across multiple platforms. One of the key issues in choosing a data-discovery tool was the ability of that tool to interact successfully with proprietary databases, local information repositories, and other locally-created and maintained databases, including the Library’s online catalog. After more than two dozen vendor demonstrations and webinars, the librarians on the team also understood that they had to consider factors such as the cost of the software, its implementation, ongoing maintenance and personnel costs, its scalability, and its performance under heavy use.

The HAM-TMC Library chose Vivísimo’s search platform, Velocity 6.0, to build a discovery tool for implementation. Velocity 6.0 consists of three components: a search engine, a federated search tool, and a clustering engine. The local administrator of the software has considerable flexibility in building connectors for the federated search tool and in building and maintaining the resulting clusters[12]. The search engine organizes results via the clustering engine into topical categories selected from words and phrases contained in the results or the documents themselves. Users can choose clusters that are most relevant to their needs, and if none of the clusters that appear among the first results are what the user wants or expects, Velocity allows for a remixing of the clusters. The administrator can modify relevancy rankings through use of various parameters, including proximity, synonyms, source (i.e. ranking databases in order of relevance), link analysis, and more. Finally, Velocity is extremely flexible in its ability to index many file formats and to search diverse sources such as internal documents, intranets, the web, and syndicated news feeds as well as many proprietary databases for which it has established connectors[13].
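The general idea of grouping results into topical categories drawn from words in the results themselves can be sketched in a few lines. This is not Velocity's clustering algorithm, which the source does not describe; it is a deliberately simple document-frequency approach, and the function names, stopword list and sample titles are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def terms(text):
    """Lowercase the text and return its set of non-stopword word tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def cluster_by_terms(titles, query, max_clusters=3, min_size=2):
    """Group result titles under the terms that occur in the most titles.

    Query terms are not used as labels, since they appear in nearly every
    result. A title may fall into more than one cluster.
    """
    query_terms = terms(query)
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(terms(title) - query_terms)
    # Most frequent terms first; ties are broken alphabetically so the
    # output is the same on every run.
    ordered = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    labels = [term for term, count in ordered if count >= min_size][:max_clusters]
    return {label: [t for t in titles if label in terms(t)] for label in labels}


# Made-up result titles, for illustration only.
EXAMPLE_TITLES = [
    "Diabetes treatment in adults",
    "Insulin therapy for diabetes",
    "Diabetes and diet: a review",
    "Insulin resistance and diet",
    "Treatment of diabetes with insulin",
]

if __name__ == "__main__":
    for label, members in cluster_by_terms(EXAMPLE_TITLES, "diabetes").items():
        print(label, len(members))
```

Raising `max_clusters` or lowering `min_size` changes which labels appear, loosely analogous to asking for a different mix of clusters when the first set is not what the user expected.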

Because of the problems inherent in performing federated searches across multiple proprietary databases (as mentioned above), the Library did not want to mislead users by calling the new tool a “search” tool. Rather, the team decided that the emphasis should be on data discovery, the idea being that this new “discovery” tool would help users find sources of information they might not have “discovered” before in their separate searches through various silos. Using this discovery tool does not provide the user an exhaustive search of hundreds of databases; instead, the tool currently focuses on ten sources: the Library Catalog, PubMed, the Library’s E-Journals A-Z list, SCOPUS, ScienceDirect, CINAHL Plus with Full Text, MD Consult, PsycInfo, Digital Commons (theses and dissertations submitted by members of the Texas Medical Center community), and Go Local Texas Gulf Coast (MedlinePlus Go Local for the South Central Region). These ten are among the most used resources the Library offers. Other connectors can be built, but the team decided to focus at first on these ten[14]. Thus, users may perform what is more properly described as a cross-search, rather than a metasearch, and do it quickly and easily (Rochkind, 2007).
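The trade-off between the number of connectors and response time can be illustrated with a small cross-search sketch that sends one query to a fixed set of sources in parallel and keeps only what returns before a deadline. It assumes nothing about how Velocity connectors are built: the connector functions, source names and timeout value below are hypothetical stand-ins, and real connectors would query the databases themselves.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def cross_search(query, connectors, timeout=2.0):
    """Send the query to every connector in parallel and merge what returns in time.

    `connectors` maps a source name to a callable that takes the query and
    returns a list of hits. Returns (hits, skipped): hits is a list of
    (source name, hit) pairs in connector order, and skipped lists the
    sources that failed or did not answer before the timeout.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(connectors)))
    futures = {pool.submit(fn, query): name for name, fn in connectors.items()}
    done, _ = wait(list(futures), timeout=timeout)
    pool.shutdown(wait=False)  # do not block on sources that are still running

    hits, skipped = [], []
    for future, name in futures.items():
        if future in done and future.exception() is None:
            hits.extend((name, hit) for hit in future.result())
        else:
            skipped.append(name)
    return hits, skipped


# Hypothetical connectors standing in for real database queries.
def catalog(query):
    return [f"Catalog record about {query}"]


def pubmed(query):
    return [f"PubMed citation about {query}", f"PubMed review of {query}"]


def slow_source(query):
    time.sleep(1.0)  # simulates a database that answers too slowly
    return [f"Late hit for {query}"]


if __name__ == "__main__":
    sources = {"Library Catalog": catalog, "PubMed": pubmed, "Slow source": slow_source}
    found, missed = cross_search("asthma", sources, timeout=0.2)
    print(len(found), "hits; skipped:", missed)
```

Because the slowest source sets the pace unless it is cut off, a small, fixed set of heavily used sources keeps the wait short, which is consistent with the decision to begin with ten.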

Part of the implementation process included discussing with vendors the use of their products as part of the Library’s data discovery tool. Access to electronic resources via the new tool requires authentication, just as access via each individual database does. Another part of the implementation process consisted of a redesign of the Library’s existing web site. The new web site features less overall verbiage and an emphasis on access points. The new discovery tool features prominently on the redesigned site (Figure 3).

In addition to its enhanced searching capabilities, the new tool offers some unusual features to aid the user in easily comprehending the results display. The current default is ten results per page. Each result offers title, author and source information (including a logo). Some results include an abstract generated on the fly. The user has two choices to view the full citation. Clicking on “new window” will take the user via a new window into the particular database that is the source of the citation. Choosing “preview” will open a small window in the current page. This window is scrollable, and users can navigate through this preview into the database itself or to the full text of the article or book (when available) (Figures 4 and 5).

Figure 3. New library web site

The Library licensed the software in September 2007. The actual implementation process began in November 2007, after the librarian selected as administrator (Michael Garrett, Technology Coordinator) attended intensive training sessions at Vivísimo’s corporate headquarters in Pittsburgh, Pennsylvania. The Collections Management Department of the Library oversaw the development and implementation of the project. Leah Krevit, Associate Director, put together a team that met on a weekly basis to advise the administrator on development and implementation issues. The “Velocity Team” consisted of professionals from various departments within the Library, and the team also served as advisors in the development of the redesigned web site. The process from purchase of the software to beta-testing of the redesigned web site featuring the new data discovery tool lasted from November 2007 through October 2008. In mid-September 2008 the Library’s users were invited to view the redesigned site via a link on the existing page and encouraged to try the new tool. Final roll-out of the redesigned site and the discovery tool took place in November 2008. The Velocity Team will continue to explore issues related to the implementation of this new tool. One of those issues is the creation of specially-designed information portals that the Library plans to offer over the coming year.

Figure 4. Data discovery tool in action


Notes

1. The investigators in this project include Leah Krevit, Associate Director, Collections Management (HAM-TMCL), Michael Garrett, Technology Coordinator (HAM-TMCL), Dean James, Associate Director, Collections (HAM-TMCL), Kathy Johnson-Throop, Medical Informatics and Health Care Systems Branch Chief (NASA-JSC), Mary Fitts, Medical Informatics and Health Care Systems Deputy Branch Chief (NASA-JSC), Meena Husein, Manager, Information Systems Projects, Medical Informatics and Health Care Systems Branch (NASA-JSC), and Jack W. Smith, Dean (School of Health Information Sciences, University of Texas Health Science Center-Houston).

2. http://lsda.jsc.nasa.gov/

3. http://lsda.jsc.nasa.gov/scripts/mission/mis_search_start_adv.cfm

4. Access to this publication is available online at http://lsda.jsc.nasa.gov/books/mercury/cover.htm

5. Access to this title is available online at http://lsda.jsc.nasa.gov/books/skylab/skylabcover.htm

6. Given the nature of the LSDA data, as mentioned above, identifying these criteria was crucial.

7. http://resource.library.tmc.edu/about/

8. Baylor College of Medicine and The University of Texas Health Science Center at Houston.

9. The University of Texas School of Nursing at Houston, Texas Woman’s University, and Prairie View A&M College of Nursing.

Figure 5. Discovery tool preview window

10. For a complete list, see http://resource.library.tmc.edu/about/supinst.cfm

11. http://nnlm.gov/scr/

12. Michael Garrett, Technology Coordinator, is the administrator at the library for this project. An interdepartmental team at the Library, the Velocity Team, served as advisors for the project.

13. See http://vivisimo.com/products/flexible

14. See http://resource.library.tmc.edu. Click on the link at “Scan the Library’s top resources” for further information. When searching proprietary databases, time spent on executing the query is a factor. Retrieval of results is not instantaneous, and various factors can affect retrieval. Thus, the more connectors enabled, the more possibilities for slowing down the process.

References

Baule, S. (2007), “Data, data everywhere, and how do you sort through it?”, Library Media Connection, Vol. 25 No. 6, pp. 54-6.

Boyd, J., Hampton, M., Morrison, P., Pugh, P. and Cervone, F. (2006), “The one-box challenge: providing a federated search that benefits the research process”, Serials Review, Vol. 32 No. 4, pp. 247-54.

Breeding, M. (2007), “The birth of a new generation of library interfaces”, Computers in Libraries, Vol. 27 No. 9, pp. 34-7.

Cervone, F. (2005), “What we’ve learned from doing usability testing on open URL resolvers and federated search engines”, Computers in Libraries, Vol. 25 No. 9, pp. 10-14.

Rochkind, J. (2007), “(Meta)search like Google”, Library Journal, Vol. 132 No. 3, pp. 28-30.

Singer Gordon, R. and West, J. (2008), “Making search better for patrons”, Computers in Libraries, Vol. 28 No. 5, pp. 54-5.

Tennant, R. (2001), “Cross-database search: one-stop shopping”, Library Journal, Vol. 126 No. 17, pp. 29-30.

White, M. (2007), Making Search Work: Implementing Web, Intranet and Enterprise Search, Information Today, Medford, NJ.

Further reading

Breeding, M. (2008), “Managing resources comprehensively”, Computers in Libraries, Vol. 28 No. 8, pp. 28-30.

Drake, M.A. (2008), “Federated search: one simple query or simply wishful thinking?”, Searcher, Vol. 16 No. 7, pp. 22-62.

Fryer, D. (2004), “Federated search engines”, Online, Vol. 28 No. 2, pp. 16-19.

Gordon, R.S. and West, J. (2008), “Making search better for patrons”, Computers in Libraries, Vol. 28 No. 5, pp. 54-5.

Helfer, D.S. and Wakimoto, J.C. (2005), “Metasearching: the good, the bad, and the ugly of making it work in your library”, Searcher, Vol. 13 No. 2, pp. 40-1.

Corresponding author
Dean James can be contacted at: [email protected]
