
DOCUMENT RESUME

ED 396 752 IR 055 923

TITLE Research Agenda for Networked Cultural Heritage.

INSTITUTION Getty Art History Information Program, Santa Monica, CA.

REPORT NO ISBN-0-89236-414-9

PUB DATE 96

NOTE 78p.

PUB TYPE Reports General (140)

EDRS PRICE MF01/PC04 Plus Postage.

DESCRIPTORS Access to Information; Archives; *Computers; Fund Raising; *Humanities; Information Retrieval; Internet; Knowledge Representation; Learning Processes; *Liberal Arts; Multimedia Materials; *Research; *State of the Art Reviews; Teaching Methods; *Technological Advancement

IDENTIFIERS Authenticity; Digitizing; Distributed Computing; Electronic Media; J Paul Getty Museum CA; Software Tools

ABSTRACT

The rapid growth of multimedia computing and the Internet, and the entrance of the commercial sector into information and the education sector previously dominated by academic interests, have raised the stakes for arts and humanities computing. In addition, ongoing reductions in funding for arts, humanities, and educational research have made it imperative that dollars be well spent. The Getty Art History Information Program (AHIP) commissioned eight individuals to write papers on research issues considered critical to future progress in arts and humanities computing, and conducted two electronic discussions, open to the Internet community, to stimulate reaction to their views. In addition to the full text of the papers, this report provides a summary of the papers and discussions as a basis for identifying issues that any research agenda in arts and humanities computing should address. The papers are: (1) "Tools for Creating and Exploiting Content" (Robert Kolker and Ben Shneiderman); (2) "Knowledge Representation" (Susan Hockey); (3) "Resource Search and Discovery" (Gary Marchionini); (4) "Conversion of Traditional Source Materials into Digital Form" (Anne R. Kenney); (5) "Image and Multimedia Retrieval" (Donna M. Romer); (6) "Learning and Teaching" (Janet H. Murray); (7) "Archiving and Authenticity" (David Bearman); (8) "New Social and Economic Mechanisms To Encourage Access" (John Garrett). A topical index to the papers and a glossary are located at the end of the report. (Author/SWC)

***********************************************************************

Reproductions supplied by EDRS are the best that can be made from the original document.

***********************************************************************


U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement

EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or organization originating it.

Minor changes have been made to improve reproduction.

Points of view or opinions stated in this document do not necessarily represent official OERI position or policy.

"PERMISSION TO REPRODUCE THIS MATERIAL HAS BEEN GRANTED BY

Nancy L. Bryan

TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)"

Cover image:

Construction in Space, 1928,

by Naum Gabo,

from the Philadelphia Museum of Art:

A. E. Gallatin Collection

Research Agenda for Networked Cultural Heritage

THE GETTY ART HISTORY INFORMATION PROGRAM

THE GETTY ART HISTORY INFORMATION PROGRAM

The mission of the Getty Art History Information Program (AHIP) is to enhance access to art and humanities information for research and education through the use of computer technology. In collaboration with institutions and organizations internationally, AHIP addresses the research needs, standards, and practices that can bring the full benefits of digitized information to the cultural heritage community.

Copyright © 1996 The J. Paul Getty Trust
All rights reserved

Design: James Robie Design Associates

Printed in the United States

ISBN 0-89236-414-9

The Getty Art History Information Program

401 Wilshire Boulevard, Suite 1100

Santa Monica, California 90401-1455

Telephone: (310) 395-1025

Telefax: (310) 451-5570

Electronic mail: [email protected]

Internet (World Wide Web): http://www.ahip.getty.edu/ahip/home.html

CONTENTS

OVERVIEW AND DISCUSSION POINTS
David Bearman, Archives & Museum Informatics

SYNOPSIS OF RESEARCH OPPORTUNITIES AND FUNDING NEEDS

TOOLS FOR CREATING AND EXPLOITING CONTENT
Robert Kolker and Ben Shneiderman, University of Maryland

KNOWLEDGE REPRESENTATION
Susan Hockey, Rutgers and Princeton Universities

RESOURCE SEARCH AND DISCOVERY
Gary Marchionini, University of Maryland

CONVERSION OF TRADITIONAL SOURCE MATERIALS INTO DIGITAL FORM
Anne R. Kenney, Cornell University

IMAGE AND MULTIMEDIA RETRIEVAL
Donna M. Romer, Eastman Kodak

LEARNING AND TEACHING
Janet H. Murray, Massachusetts Institute of Technology

ARCHIVING AND AUTHENTICITY
David Bearman, Archives & Museum Informatics

NEW SOCIAL AND ECONOMIC MECHANISMS TO ENCOURAGE ACCESS
John Garrett, Corporation for National Research Initiatives

TOPICAL INDEX

GLOSSARY

OVERVIEW AND DISCUSSION POINTS

David Bearman, Archives & Museum Informatics

PREFACE

In 1994 the Clinton Administration was developing policies for the National Information Infrastructure (NII) and seeking to make a business case for investing public money in it. Interests throughout the country, including those in the arts and humanities, were approached to help the Administration articulate the importance of supporting the information revolution for economic development, scientific and scholarly progress, and improvements in the quality of life. The Getty Art History Information Program (AHIP), with the American Council of Learned Societies and the Coalition for Networked Information, worked with scholars throughout the country to write a white paper entitled "Humanities and Arts on the Information Highways: A Profile," the early drafts of which were influential in shaping the Administration's Information Infrastructure Task Force Committee on Applications and Technology report The Information Infrastructure: Reaching Society's Goals, especially the critical chapters on "Arts, Humanities and Culture on the NII." The final version of the white paper, issued in September 1994, was a major part of the public comment on the Administration's plan and the fullest articulation of the state of humanities computing at that time.

Staff who prepared these papers became keenly aware of how little was known about the range of humanities projects exploiting information technologies and how sorely needed was a research agenda for computing technology focused on the humanities. In future policy discussions, spokespersons for the arts and humanities would need to draw more quickly on facts about the current state of implementation, point to successes, and explain the specialized research needs posed by their fields. To meet these perceived needs, AHIP undertook several projects under the rubric of the Networked Access Project in late 1994 and 1995.

One of these, the Research Agenda Project, was designed to articulate a research agenda for arts and humanities computing and achieve consensus among researchers in technology and the humanities about the critical research needs in this field. Several dozen leaders in the field were asked to identify the important domains in arts and humanities computing research and nominate individuals best situated to summarize the state of research in each. From the nominations, staff selected eight critical areas identified by large numbers of informants and commissioned eight brief papers. In order to allow as many people as possible to have input in shaping the final report, these papers were opened for discussion on the Internet in a private list for a month in early summer of 1995 and for discussion on an open, loosely moderated list in the fall of 1995.

This report, therefore, takes into account ideas from the commissioned papers and the open- and closed-list discussions as well as reviews specifically solicited from other individuals identified during the process. It does not attempt to replace the original papers or discussion, but only to synthesize their most salient aspects and to identify areas for action. The report recognizes that, while resultant research would have a predominantly academic focus, such research would have an impact on the broadest range of practitioners and audiences in the arts and humanities. Its purpose is to offer public policy makers and private foundations the information they need to direct support for arts and humanities computing into areas most critical for the disciplines.

After publication and dissemination of this report to participants in the discussions, AHIP hopes to work with public and private foundations in an effort to increase and coordinate funding in these fields. Future reports on the "State of Networked Cultural Heritage" may be needed to move the agenda forward in future years.


EXECUTIVE SUMMARY

The rapid growth of multimedia computing and the Internet, and the entrance of the commercial sector into information and the education sector previously dominated by academic interests, have raised the stakes for arts and humanities computing. In addition, ongoing reductions in funding for arts, humanities, and educational research (especially from the federal government) have made it imperative that dollars be well spent. In the spring of 1995, the Getty Art History Information Program (AHIP) asked several dozen experts to help it identify the areas of research that they considered critical to future progress in arts and humanities computing and to nominate specialists who could knowledgeably reflect on these domains. Eight individuals were commissioned to write papers on these research issues, and two electronic discussions, open to the Internet community, were conducted to stimulate reaction to their views. This report uses the commissioned papers and discussions as a basis for identifying issues that any research agenda in arts and humanities computing should address.

The papers and discussions exposed four major infrastructural issues and three significant intellectual problems:

The arts and humanities lack a venue, such as an Annual Review of Arts and Humanities Computing, a conference, or an electronic list, through which progress on the research agenda can be reported and assessed. Support for such research forums is essential.

The arts and humanities have not given rise to a field of reflective study analogous to the history, philosophy, and sociology of science, with a consequent lack of agreement among its practitioners on the fundamental characteristics of the fields and the conditions for successful systems development and evolution. The study of the arts and humanities as fields of human endeavor is necessary to identify the critical success criteria for software and systems.

In the vast array of standards-setting and de facto standardization processes under way in the computing industry, the arts and humanities need supported spokespersons to articulate their constituents' requirements. Without such spokespersons, they will have no voice in the development of software, communication and display technologies, and standards governing the range from applications to systems.

The arts and humanities need to expose their practitioners, whether academic scholars, museum professionals, or librarians, to the difference that computer-assisted scholarship and teaching could make. Promoting institutional and social changes that are essential to create a hospitable environment for computer-supported arts and humanities is thus a tactical requirement.

The intellectual issues needing research are considerably more complex:

Representation: The crucial advantages of digital libraries lie in the flexibility of knowledge representations to support different intellectual perspectives and functionality. However, if they are to create a unified and comprehensive library of useful knowledge, the arts and humanities must make significant progress in the next decade in shared methods of representation.

Retrieval: If comprehensive libraries of useful knowledge are created, their use will depend on improved means of access. Discovering appropriate resources in the networked environment and retrieving relevant information in a usable format will be critical. Although the last generation of research in these areas has been far from conclusive, it is clear that distributed networks place new demands on discovery and retrieval.

Resource persistence: Even if resources of great utility can be created and found, scholarship will depend on assurance that scholars can cite them at a fixed address, that they will look and behave consistently, and that they will persist over time.

I. THE PAPERS

When dozens of experts were consulted, in the spring of 1995, about their views of the most important research problems to be resolved for progress to be made in arts and humanities computing, eight topics arose repeatedly as the most significant issues for both the medium- and the long term. Commissioned authors were then asked to identify the nature of the questions raised in each domain, the state of the art, current research of importance, and what future research, if funded, would offer the greatest benefit to the arts and humanities. Seven of the research problem sets can be viewed as occurring in chronological order from the beginning of a scholarly or creative process through to the archival life of its products. The eighth paper addresses societal mechanisms that affect this sequence. Arranged in this order, the eight background papers address:

1. "Tools for Creating and Exploiting Content," by Robert Kolker and Ben Shneiderman, University of Maryland

2. "Knowledge Representation," by Susan Hockey, Center for Electronic Texts in the Humanities, Rutgers and Princeton Universities

3. "Resource Search and Discovery," by Gary Marchionini, University of Maryland

4. "Conversion of Traditional Source Materials into Digital Form," by Anne Kenney, Cornell University

5. "Image and Multimedia Retrieval," by Donna Romer, Eastman Kodak

6. "Learning and Teaching," by Janet Murray, Massachusetts Institute of Technology

7. "Archiving and Authenticity," by David Bearman, Archives & Museum Informatics

8. "New Social and Economic Mechanisms to Encourage Access," by John Garrett, Corporation for National Research Initiatives

This report summarizes some of the points made by both the authors of these background papers and the commentators who participated in the electronic discussions. It builds on an earlier paper in which this author posed questions about the state of activity in important research domains in order to stimulate dialogue as part of the open listserv discussion of these issues on the Internet during October/November 1995. The online discussions in which this author participated were intentionally open-ended to stimulate debate. The intention of this paper is to bring the discussions to closure, to focus on resolvable issues, and to propose a middle- and long-term agenda for further research. The reader will observe that this discussion does not attempt to fully address each point raised by the contributed papers or by the online discussions; the fault for any resulting imbalance lies entirely with this author.


This report addresses the research papers in the first section, reflecting the judgment of the experts consulted that these represent the most important research domains. In the second through fourth sections, a series of cross-cutting research questions raised by the commissioned papers and discussions is addressed separately. My intention is not to suggest that the focus of research in arts and humanities computing should be anything other than the topics assigned to the principal authors, but rather to explore the issues they addressed from different intellectual perspectives. I hope this tactic broadens, deepens, and in some cases recontextualizes the points made in the commissioned papers.

A. Tools for Creating and Exploiting Content

Robert Kolker and Ben Shneiderman describe three strands of current research: the Internet, commercially available software, and tools developed for specific research projects or purposes. While Sha Xin Wei of Stanford University correctly suggests that it is more appropriate to see the Internet as infrastructure than as a tool in itself, network-based applications are playing a crucial role in shaping discourse. We know little about how the arts and humanities are being influenced by these tools, or what other network tools might be desirable. Michael Joyce of Vassar College hints at the profundity of such influence by the tools for multimedia authoring and creation of hyperlinked knowledge bases. An unexpected subtext of the Kolker and Shneiderman paper is how much their examples of "successful" electronic support activities involved, and probably depended on, successful human mediation, suggesting a need to train people to use tools rather than basic research into computing capabilities. By implication, continued success would entail funding more demonstration projects in specialized disciplinary applications and ensuring that part of the research plans involve informing other practitioners.

Discussants endorsed the call for research into computer interfaces and interface standards, but it was clear from the discussion that there was disagreement on whether such research was crucial in order to make computers easier for everyone (including humanists) to use, or whether the humanities presented special requirements for interface design. Kolker and Shneiderman stress the need for future research by teams of humanists, specialists in human-computer interaction, and computer scientists to develop interface standards, software tools, and content for specialized arts and humanities users. Most of all they call for support to get tools into the hands of students and faculty. Since this is an infrastructure problem for which upgrading campus-based services is the basic solution, a sound investment would appear to be challenge funding, with success measured according to how much the arts and humanities faculty used the installed equipment in teaching and research.

B. Knowledge Representation

The arts and humanities are self-conscious about how they express themselves; indeed, one might reasonably say that the arts and humanities are about the ways in which we express ourselves. Given this fact, it should not surprise us that knowledge representation was discussed by virtually every contributor to the conference and in all the commissioned papers.

The first question, and the fundamental one raised by Susan Hockey in her paper, is what to represent. Not "Which sources should we capture first?" but rather, "What about any source do we need to have explicitly represented?" To determine this, research is needed into what we mean by fidelity of representation, in order to determine whether fidelity itself is an impossible, or even undesirable, target. Commentators noted that we need representations that are explicit about their limitations, assumptions, and biases; if so, what kinds of annotations are required, and how can they be normalized? The presence of such self-conscious notation was identified as defining the quality of a representation, beyond its mere fidelity to the original. Since most of the research to date has been on text, how can we emphasize all the other modalities that convey artistic and humanistic knowledge? Research into the features of intellectual sources that most fully contribute to interpretation, understanding, and connections would be most useful if those participating either agreed to develop prototype applications or included in their research design steps to bring applications to demonstration.

But even if intellectual perspectives and the needs of scholarship can define what is to be represented, we still need to pursue research on how to represent knowledge effectively and, further, how to ensure its future operability. The discussants seemed comfortable with Standard Generalized Markup Language (SGML), but it is clear that extensions, such as HyTime, VRML (Virtual Reality Markup Language), and other representation languages will also need to be employed. Moreover, arts and humanities practitioners will need to better understand why they should not use HTML (Hypertext Markup Language) without guidelines that ensure its conformity with the SGML standard. Standards for representing the content of still images, sounds, motion images, and three-dimensional graphical spaces are still needed. In general, these standards will be beneficial to the arts and humanities if collective agreement is reached on the content of the resource annotations (or "metadata") required for humanistic scholarship. Convening groups to reach consensus on the descriptive elements that best support humanistic research will be productive for many years.
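The idea of a collectively agreed set of descriptive elements can be made concrete with a small sketch. The element names below are hypothetical illustrations invented for this example, not a standard the report proposes; the point is only that a shared element set makes records checkable.

```python
# Sketch: a minimal metadata record for a digitized source, with a check
# that the collectively agreed descriptive elements are present.
# The element names are hypothetical illustrations, not an adopted standard.

REQUIRED_ELEMENTS = {"title", "creator", "date", "source_medium", "capture_method"}

def validate_record(record: dict) -> list:
    """Return the agreed descriptive elements missing from a record."""
    return sorted(REQUIRED_ELEMENTS - record.keys())

record = {
    "title": "Construction in Space",
    "creator": "Naum Gabo",
    "date": "1928",
    "source_medium": "sculpture (photographed)",
}

missing = validate_record(record)
print(missing)  # the record lacks a note on how the image was captured
```

Once such an element set is agreed, the same check works across collections, which is exactly the benefit the consensus-building process described above is meant to secure.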

The most vexing issue remains: Why represent knowledge? There is no question that we must by definition represent it for it to be digitally available, or that representations of knowledge are designed to serve specific purposes (or, if not designed for such purposes, are unknowingly valid only for limited purposes), but for what purposes do we want to make knowledge representations? In his comments in the discussion, Michael Buckland of the University of California at Berkeley emphasized the ways in which representations become derived objects in their own right and how semiotics research can be usefully brought to bear on both questions of knowledge representation and questions of what knowledge representations mean in themselves, as material cultural objects. Elsewhere in the discussions the question arose of whether we could, or should, engender a research tradition that asks what meanings digital genres have and for whom and what purposes they exist. We could take the position of technological imperative: that the sources of our civilization's self-knowledge will be "re-presented" digitally and that we must therefore take steps to make the best representations. Or we could try to answer, for different kinds of source genres and media, why certain representations will be better. A research agenda that seeks to answer these questions will, if it produces convincing answers, push the process of digital representation ahead quickly and need not be too costly.

C. Resource Search and Discovery

It is axiomatic that if more and more resources are going to be available electronically and are to be of value to the arts and humanities, we will need to better understand the process by which researchers locate information of interest to them. In what ways is discovery similar to, and how different from, retrieval? We need further research to understand what differentiates and what contributes to the effectiveness of what Gary Marchionini describes as two very different processes. How can the next generation of discovery tools better exploit browsing and take advantage of prior knowledge through guided discovery using authored links and support feedback? How can networked data be standardized so that its "handles" will allow meaningful discovery at a consistent level of detail? What structures and strategies for unique and persistent identification of networked objects will be required, and how can the systems on which electronic objects are created, stored, and accessed ensure such identification? Discovery research is potentially the most important new frontier for information science, and important work can be done at a relatively low cost, because the resources being discovered are publicly available.
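The "handles" and persistent-identification questions above rest on a simple indirection: a citation names a stable identifier, and a resolver maps that identifier to the object's current location. The sketch below illustrates only this indirection; the handle syntax and URLs are invented for the example.

```python
# Sketch of the indirection behind persistent identifiers. A citation names
# a stable handle; a resolver maps the handle to the object's current
# location. Handle syntax and URLs here are invented for illustration.

resolver = {"hdl:demo/gabo-1928": "http://server-a.example.edu/images/gabo.tiff"}

def resolve(handle: str) -> str:
    """Look up the current location of a persistently identified object."""
    return resolver[handle]

# The object moves to another server; only the resolver entry changes,
# while every published citation of the handle remains valid.
resolver["hdl:demo/gabo-1928"] = "http://server-b.example.edu/archive/gabo.tiff"
print(resolve("hdl:demo/gabo-1928"))
```

The open research questions are precisely what this sketch hides: who maintains the resolver, how it scales across distributed systems, and how identifiers stay unique over time.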

Retrieval research, on the other hand, has a long, if checkered, history. What further research on retrieval is needed, and how can past research that addressed central databases be made relevant to the problems of access to distributed resources with different functionalities? How much additional progress can be made in retrieving full text by means of automatic intermediation such as enhanced fuzzy-logic string searching, ranking of results, and using domain-based knowledge with user profiles? How can retrieval be improved by pre-processing with systems tools to index resources automatically, merge thesauri effectively, and analyze resources to support access to them by people with different levels of knowledge or different languages? How can mediated or software-assisted exchanges improve retrieval by enabling us to use knowledge of feedback to increase precision in searches and recall with and beyond browsing? It is not yet clear how much research in artificial intelligence and full-text enhancement is specific to the humanities or how much such research will contribute in the mid-term future, but the long-term promise is great.
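As a toy illustration of fuzzy string searching with ranked results, one of the automatic intermediations mentioned above, the sketch below uses Python's standard difflib; production retrieval systems use far richer models, and the document titles are simply drawn from this report's table of contents.

```python
# Sketch: approximate string matching with ranked results. SequenceMatcher
# scores pairwise similarity between 0.0 and 1.0; sorting those scores
# yields a ranked result list that tolerates misspelled queries.
from difflib import SequenceMatcher

def rank(query: str, documents: list) -> list:
    """Rank documents by fuzzy similarity to the query, best match first."""
    scored = [(SequenceMatcher(None, query.lower(), d.lower()).ratio(), d)
              for d in documents]
    return [d for score, d in sorted(scored, reverse=True)]

docs = ["Knowledge Representation", "Image and Multimedia Retrieval",
        "Learning and Teaching"]
print(rank("multimedia retreival", docs)[0])  # tolerates the misspelling
```

Ranking rather than exact matching is what lets such a system degrade gracefully on noisy queries, which is one reason the paper singles it out alongside fuzzy-logic searching.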

Finally, if we are truly to be a multimedia digital culture, what research do we need to enable optical pattern matching, searching for content in oral files, finding relevant chunks of multimedia, locating experiences rather than data, and matching similarities across modalities? Here the humanities are in serious need of approaches and tools that will provide for approximate retrieval: failure to develop such tools means capping the potential of sound and image bases and requiring labor-intensive, single-perspective indexing of the digital source libraries. Investments in automated supports for multimedia indexing and retrieval are crucial, although this research may prove expensive.

D. Conversion of Traditional Source Materials into Digital Form

Most knowledge in the arts and humanities is recorded in non-digital formats (most often as printed, typed, or handwritten sources). If, as Anne Kenney contends, we need functionally robust surrogates, and we can decide what kinds of functionality humanists require in their digital representations (as urged under Knowledge Representation), then what methods can we develop (or better yet, what standard methods can we deploy) to acquire that functionality? The research agenda for such capture is as long as the kinds of existing formats in which our knowledge is stored and the kinds of surrogates we need. As methods are suggested and implemented, how can we evaluate them? What methods need to be developed to make conversion cost-effective, and what benefits will lead society to support creation of surrogates that are richer than the originals in their yield of knowledge representation? Only large-scale, technically sophisticated, academically based, multidisciplinary research will push this agenda forward; commercial efforts or individuals are unlikely to contribute much to improving high-quality production processes for digital surrogates.

While it is not, properly speaking, an issue of conversion but of delivery, unless research addresses and resolves questions of how to manage very large collections of digital materials and provide useful access to them, the prospects for large-scale conversion are dim. Research into new compression techniques will be critical in the process. Economics plays a major role, as a business case ultimately must be made for the conversion of content. Moreover, research that leads to evaluation of post-conversion resources will support future conversions and improve methods and technologies of capture and delivery. Support for study of the economics of conversion, and for demonstrating scaleable technologies and organization, will be crucial to the larger vision of an electronically based, internationally accessible arts and humanities corpus.

Above all, Anne Kenney calls for quality benchmarks (i.e., technical measures that can be applied to digital files), which are crucial if we are to exploit the ongoing development of commercial tools, because only benchmarks will tell us whether our requirements are being, or ever have been, met by off-the-shelf methods. Ultimately, these technologies will be accepted or rejected by the arts and humanities on the basis of display capabilities. But humanists will probably not contribute much to this arena of research, except in the design of user interfaces as discussed earlier by Kolker and Shneiderman.

E. Image and Multimedia Retrieval

Marchionini was not alone in addressing image retrieval; almost everyone bemoaned the state of the art in digital multimedia. Donna Romer made clear in her paper that retrieval results are always based on data representations, but non-textual documents currently defy auto-indexing, and we know little about whether, how, and under what circumstances text-based approaches enable image-based access. Indeed we know little about "likeness" of images, which is the fundamental criterion for retrieval. Constructing an empirical basis for how best to represent sets of images, in addition to or in place of individual images, will also be necessary, since item-level control is often missing in these large image collections. Much of this research will need to begin at the beginning, with documentation of both the resource sets and the user communities. Romer points out that we first need to make sizeable, representative, well-known image sets and establish the characteristics of a variety of "points of view." Such research can be expected to be expensive, time-consuming, and slow to produce results. Nevertheless, accessible multimedia resources are fundamental to the success of a more broadly based arts and humanities.
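One crude, text-free notion of image "likeness," offered only as an illustration and not as Romer's method, is to compare gray-level histograms; the tiny pixel lists below stand in for real images.

```python
# Sketch: image "likeness" via gray-level histogram intersection. This is
# one of the simplest content-based measures; real retrieval research uses
# much richer features. Pixel values are 0-255 gray levels.

def histogram(pixels: list, bins: int = 4) -> list:
    """Count gray values into a few coarse bins, normalized to sum to 1."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def likeness(a: list, b: list) -> float:
    """Histogram intersection: 1.0 = identical distributions, 0.0 = disjoint."""
    return sum(min(x, y) for x, y in zip(histogram(a), histogram(b)))

dark = [10, 20, 30, 40]           # a uniformly dark "image"
also_dark = [15, 25, 35, 45]      # similar tonal distribution
bright = [220, 230, 240, 250]     # a uniformly bright "image"
print(likeness(dark, also_dark) > likeness(dark, bright))  # prints True
```

Note what the sketch cannot do: two images with the same tonal distribution but entirely different subjects score as identical, which is precisely why the paper calls for empirical research into what "likeness" should mean for scholarly users.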

If we are to create large collections of images for broad-based access, long-term digital image management will require a great deal more technical documentation of the images as objects with a history of capture techniques. Jennifer Trant of AHIP's Imaging Initiative notes that research on image documentation and image quality are issues of crucial concern to the Getty and that these are multidisciplinary endeavors, with implications (and therefore stakeholders) beyond the arts and humanities. The Getty alone cannot sponsor the required research on image quality characteristics and methods of documenting the technical characteristics of digital images (aside from image contents or subjects). Research in this area needs to be accompanied by standardization efforts, education, and implementation strategies, and by proselytizing to other fields. Again, however, such ambitious goals are crucial to making image data usefully available for scholarship and appreciation.

F. Learning and Teaching

Usually it is our lack of understanding of the process of acquiring knowledge, rather than technology, that impedes teaching and learning. But, according to Janet Murray, in some areas simple technological improvements could help in the short term. Because this is one research arena in which progress depends critically on knowing what is known, a major focus of research support should be to inform educators of the state of the research, the state of the tools, and the state of the resources available to them in digital form. Current knowledge in these areas is still quite inadequate, so significant funding is needed to learn more about teaching and learning, test techniques using digital resources, and develop strategies for evaluating teaching and learning as it takes place using digital technologies. These projects are relatively large scale, human intensive, cross-disciplinary, often longitudinal, and will require considerable support over a number of years.

Murray emphasized the need for research in defining curriculum in the light of what the new technologies offer that could not be done previously, and the need for collaborative software development efforts to establish compatible materials and authoring environments customized for the needs of humanists.

G. Archiving and Authenticity

Implicit in David Bearman's assessment of the state of research in archiving is the dramatic shift that has taken place in the past five years as a result of the proliferation of local and wide area networks throughout organizations. These have led to the electronic creation and transmission of virtually all organizational records. While this development affects organizational accountability primarily, the longer-term implications for the arts and humanities are that the record of our culture, as we are creating and recording it today, is increasingly digital. Because software and hardware change so rapidly, all efforts to preserve the original bits on the media on which they were initially stored are doomed. Instead, research must focus on preserving context and meaning, resident in higher-level representations and functionality, while the practical business of managing archival records across time involves copying them onto currently supported devices under the control of newer software. Research into the functional requirements for capturing and maintaining the intellectual character of records as evidence is quite far along, but ongoing support will be needed to standardize approaches, implement solutions, and train arts and humanities professionals (and the organizations in which they work) to archive records of contemporary works in ways that will be usable in the future.

H. New Social and Economic Mechanisms to Encourage Access

Perhaps the most difficult task is to lift ourselves out of our situation and envision different futures. John Garrett was asked not only to do that, but also to identify the research needed to invent those futures and report on the state of knowledge about hypothetical and futuristic social constructs as well as the cultural, intellectual, political, and economic tools needed to construct alternative futures. Neither Garrett, nor the discussion of organizational options and futures, produced a blueprint for social and economic change, but research support directed toward experiments, prototypes, and "re-inventions" is probably the only way that the academic community will move from its current moorings into new waters. While such experiments need not be costly in their infancy, they should be designed to be real players in the real world. Foundations will need to develop tactics that enable them to fund or loan substantial quantities of capital to ensure that start-up ventures representing new ways of organizing the arts and humanities are structured as experiments, not as permanent, resource-creating projects. When such start-up ventures are funded, it is also important to hold at least part of the funding for research into the before and after, and into measurements of individual interactions that support fine-tuning or exploring alternative arrangements.

OVERVIEW AND DISCUSSION POINTS 13

RESEARCH AGENDA ISSUE:

A theme running consistently through the commissioned papers is that the state of arts and humanities computing is difficult to gauge because it lacks an identity or focus. If the arts and humanities had a venue such as an Annual Review of Arts and Humanities Computing, or if existing mechanisms for reporting on humanities computing issues could be made more responsive to the specific needs of humanities disciplines rather than to technological opportunities, the research agenda could be advanced substantially. A major focus of any concerted research agenda should be to create such a structure.

II. THE NATURE OF THE ARTS AND HUMANITIES

Since the authors were asked to address research issues in humanities and arts computing, it is not surprising that many opened their discussions or prefaced treatment of specific topics by reference to the character of the humanities. Their papers and the online commentaries made clear that further research into how humanists work would help define the functional requirements for supporting their activity. It would be useful not only to define the past, but also to develop baselines that would help us to understand how scholarship is being transformed by computing and digital communications technologies. Serious thought should be given as to how to foster systemic study of the humanities and how to make the results of that research both known and useful to those developing systems to support the arts and humanities.

In the absence of a body of research on the social and intellectual systems of the arts and humanities, authors of the papers and discussants in the electronic conference cited impressionistic and undocumented attributes and derived from them criteria for evaluating the success of computing as a means of supporting these disciplines. Among the characteristics of the humanities the authors identified as important to shaping the research needs of its disciplines were their presumed diversity, complexity of knowledge representation, variability in expression, historicity, textuality, cumulativeness, and genre dependence. Often the authors contrasted these, explicitly or implicitly, with presumed characteristics of the sciences. But in the online discussion, their assumptions about the sciences and social sciences were frequently challenged; although these assertions were also made without reference to a body of research literature that would have supported the debate, such a literature does exist in the history, philosophy, and sociology of science. As Leonard Will of Information Management Consultants in the UK put it, many were "struck by the absence of data on what humanities scholars actually do," despite the self-evident necessity of such research in furthering the agenda of humanities computing. He went on to suggest that research on this aspect of the problem would also begin to resolve the difficult questions of what benefits would be obtained from different kinds of interventions and implementations and by progress in different sub-areas of research.

A. Disciplinary Diversity

One way of thinking about the implications of diversity for formulating the research agenda is to see it as a reflection of material conditions and an impediment to concerted action. Indeed, the way in which it was raised as an issue by Kolker and Shneiderman, who spoke of the "states of the art" within and between disciplines, the disparity of equipment and access (mostly less than ideal) in different institutions, and the absence of humanities researchers among those engaged full-time in humanities-oriented computing research make it appear that diversity is a social and institutional characteristic of the arts and humanities.

However, there may be more fundamental sources of diversity. In online comments, Nora Sabelli of the National Science Foundation and Sha Xin Wei of Stanford University noted that the differences between disciplines might run deeper, reflecting the nature of argument (descriptive, logico-deductive, dialectic) in different fields. They suggested that the humanities might contribute to other fields such as medicine and vice versa, based on diversity among these fundamental dimensions. Sha Xin Wei noted that mathematics was part of the classical humanities curriculum, and that "it consists of intuitions about, and elaboration upon, structures more akin to literature and art than to the empirical sciences."

Such commonalities in intellectual processes should be the link to software functionality, leading to software support for broadly defined styles of reasoning and argumentation, rather than discipline-specific methods.


B. Complexity of Representation

Running through much of the discussion was the contention that the requirements for knowledge representation in the humanities are exceptionally complex. To some extent this opinion reflects the views of specialists for whom off-the-shelf software is inadequate; it may therefore not be specifically about a fundamental characteristic of the humanities, but instead reflect the relative poverty and limited technical investment in humanities computing, which often requires humanists to use tools not specifically created for them. Examples cited, such as multilinguality and methods for treatment of missing data (which is the norm in much humanistic research), are issues in day-to-day computing but may not be requirements for a "research" agenda. For example, Janet Murray of MIT, in her comments on the paper by Kolker and Shneiderman, identified two cases in which her work required development of specialized tools to retrieve text from foreign-language video subtitles and support multiple links from any anchor point in an application of a video server. Unfortunately, it is far from clear that these issues of unique software design requirements can be addressed collectively; humanists and their funders may simply have to acknowledge that more funding needs to be directed toward appropriate software for specific tasks at hand.

Susan Hockey referenced a more fundamental aspect of complexity of representation in the humanities, noting the prevalence of a multiplicity of intellectual perspectives which the humanist wants to keep in the picture at all times, since much of the humanities is about styles of discourse and diversity of conceptual frameworks. The requirement to see a textual source simultaneously through a variety of interpretive lenses and to bring them together at various points differs fundamentally from the requirement to see a material object through a variety of optical lenses or wavelengths of light; what humanists mean here, and how computing tools might assist them, deserves further research. The same observation is clearly true of images, although research in this area is much less developed.

C. Variability of Expression

An interesting and important observation made by Gary Marchionini in the context of search and discovery was that the humanities actually encourage differences in ways of expressing ideas for the sake of interesting prose. Not only does this fact defeat many efforts to standardize terminology or provide algorithmic methods of analysis; it poses interesting challenges to intelligent full-text information retrieval. As the concept of "variations on a theme by . . ." makes clear, the concept of derivation as new creation is fundamental to the arts and humanities. Variations become increasingly less derivative and can elaborate ideas far from the original theme, which creates fundamental challenges for developing tools that explore degrees of difference, especially when the original expression is in images and sounds. One of the complicating factors here is that new humanistic works incorporate and elaborate on originals; in the digital environment especially, the act of creativity itself can be blurred.
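Crude tools for exploring degrees of difference do at least exist for text; a minimal sketch using Python's standard difflib (the sample strings and the word-level tokenization are illustrative assumptions):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a rough 0.0-1.0 likeness score for two text variants,
    comparing word sequences rather than raw characters."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

theme = "variations on a theme by haydn"
variation = "free variations on an old theme"
print(similarity(theme, theme))      # identical texts score 1.0
print(similarity(theme, variation))  # a partial score for a derived variant
```

Nothing comparably simple is standard for images and sounds, which is the challenge the passage identifies.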

D. Historical Orientation

Artists and humanists are not alone in dealing with time as a variable in their research, as discussant Warren Sanderson of Concordia University observed, but they are more likely than others to want access to older sources and to need to understand them as they were originally understood. The implications of this orientation for arts and humanities computing research include the following:

Techniques for acquiring digital representations of traditional source materials will continue to be important in the medium term because huge quantities of original materials need to be retrospectively digitized to achieve a critical mass.

Serious research is needed into the fungibility of original sources and into their reusability before great efforts are expended in capturing the material. If certain types of sources are in fact highly fungible, substantial effort could be saved in digitization. If many sources are not reusable, or reusability of sources depends on highly specific technical and intellectual characteristics, wasted investments can be avoided.

Humanists need to develop and employ collective methods for defining representation conventions used in treating source materials, and to incorporate into sources such layered knowledge as commentaries, pathfinders, and attribution tools that both represent a point of view and reflect the understanding of others, from different historical periods, concerning the same objects.


Humanists will be dependent on research that preserves digital signals over long periods of time (as reflected in points made by Peter Graham of Rutgers University) and the meaning of digital representations over time (as stressed by David Bearman of Archives & Museum Informatics).

E. Textual Bias

One of the subtexts of all the discussions was that much of humanities scholarship, outside the arts, has been strongly oriented toward text. Because the authors of the research agenda papers were specifically asked to think about non-textual information, they found many opportunities for additional research presented by image, sound, and multimedia. Contributors to the debate clearly expected that "technology" would solve the problems associated with image standards and with integration of multimedia. In spite of disagreement about whether digital cameras had already achieved resolutions adequate for capturing primary materials, as reflected in exchanges between Kevin Kiernan of the British Library and Anne Kenney of Cornell University, participants expressed no doubt that these pesky technical issues were going to be resolved shortly and without input from the humanities. Therefore, most of the discussion of research implications was focused on the concept of quality as it applies to any representation made for any purpose.

Contributors clearly felt quite comfortable with community-defined standards for knowledge representation, such as the choice of SGML markup and the Document Type Definitions of the Text Encoding Initiative (TEI) for text, but the call for further research began in earnest with markup of image or sound data. As usual, the question of how best to represent the knowledge embedded in such multimedia objects turned on the purpose of representations, the nature of the intended audience, and the meaning of a precision of reference and preservation of context (to use Janet Murray's criteria of quality) when applied to different modalities and different humanities disciplines. It was evident that these questions have not been satisfactorily answered and that substantial research will be required to begin to identify features for integrated multimedia markup and to assess the benefits to artists and humanists of such value-added efforts.



F. Cumulative Character

The arts and humanities are being developed, taught, and thought about on an ongoing basis. For many participants, the design of future teaching and learning was a critical topic of research for humanists. Bob Rosenberg of Rutgers University and Bob Arellano of Brown University urged further examination of the impact of digital delivery systems on learning and organizing teaching resources. Jerrold Maddox of Penn State University expressed concerns that his teaching had by necessity become more exam-based since his students were often 1,000 miles away, and proposed detailed study of the good and bad consequences of distance education. Janet Murray provided examples of how new intellectual paradigms may resonate with the new technologies, as in the teaching of writing. There was no similar discussion of the teaching of art, although oblique reference was made to teaching drama using digital sources of previous performances of the same plays.

What seems most interesting about the discussions of learning and teaching is the role of cumulative knowledge and the representation of cumulative knowledge. Current computing tools provide the best environment we have yet made for exploring such overlays as are created by commentary built up over time. Research into the benefits of using such methods for learning will go a long way toward validating, or discrediting, their use in teaching.

RESEARCH AGENDA ISSUE:

The arts and humanities have not given rise to a field of self-study analogous to the history, philosophy, and sociology of science, long since designated a scholarly discipline in many universities. As a consequence, a lack of agreement on the fundamental characteristics of the fields constituting the arts and humanities precludes the conditions for successful systems development and evolution. A research agenda that does not address how the arts and humanities can become the object of systemic study will have little long-term impact on the state of tools, methodologies, and analytic frameworks for support of these fields.


III. CHALLENGES ACROSS RESEARCH DOMAINS

Several proposed research challenges, while not attributed to the nature of arts and humanities per se, nonetheless applied across disciplines within arts and humanities. These research problems appear to be relevant to any body of organized knowledge elaborated upon by a community of practitioners.

A. Disciplines as Symbolic Systems

Disciplines, including those in the arts and humanities, are formal systems, with languages, representation conventions, and ways of thinking. Moreover, different disciplines evolve different ways of thinking about resources. If we are to develop adequate means for computing to serve "the arts and humanities," understanding the differences between these formal systems is crucial to model our representations of sources correctly. And if we are to decode their representation conventions accurately at a future time, documenting the representation rules we subsequently use will be essential.

Little research has been conducted into the genres of expression used by humanities disciplines and the constantly evolving assumptions underlying them. The claims that humanities disciplines share the need to represent the processes and contexts of creation, and that precision of reference and preservation of context play a special role across disciplines, have as yet little substantiation within the research literature.

The design of the rules for SGML encoding adopted by the TEI, for example, anticipates the ongoing analysis and markup of digitally captured sources. The resulting many-layered representation, carrying perspectives of a number of disciplines and the attributions of many analysts, will make genre analysis a major research issue for humanists. Defining the factors critical to understanding sources specific to different disciplines should inform future guidelines for text representation.

B. Multimedia Representation

To carry modalities of information other than text will require methods for linking one piece of information to another, including objects of different modalities, in ways that reflect the original (pre-digital) intention. Different kinds of objects have different functionality with respect to their links: for example, spoken objects need to be heard, three-dimensional objects need to be moved through and around, and objects that magnify parts of other objects need to be "opened" when clicked. At a much more fundamental level, in order to represent multimedia data the way that end users perceive it, humanists need to conduct research into the meanings of the various modalities of information and how meaning is affected when they are combined. A variety of types of information cannot yet be used effectively because we lack ways of representing it digitally that will be available for use by others. To illustrate this point, Susan Hockey identified the problem of representing derived knowledge, while Anne Kenney pointed to pattern matching, object recognition, or raster-to-vector conversion. Thus, subjects for humanities research that would contribute to the evolution of new forms of digital communication include discourse on the construction of "intelligent files" that reflect modes of speaking, have "hot" links, execute scripts, and contain other dynamic and authored elements.

While practical difficulties in managing the evolving new genres, such as the corpora and rich webs being created in some disciplines and specialties, are not unique to the humanities, humanists have a special role to play in documenting and researching the implications of these new approaches for scholarship and teaching. Several disciplines in the arts and humanities will soon attain the stage at which large enough bodies of digital content exist to constitute the "critical mass" long thought essential for any serious research into the impact of multimedia. Any research agenda needs to join these fields of scholarship in virtual multidisciplinary laboratories.

C. The Need for Standards

Standards, or the lack of them, were a major concern of most of the authors and are, of course, essential to effective communications. But what was meant by standards, and whether humanities-based research would contribute specially to such standards, was not always clear. Kolker and Shneiderman invoked the need for interface standards and methods of accessing content; their focus on these was supported by commentators who felt that the humanities had special needs for Graphical User Interface (GUI) standards beyond those being met today. They were strongly seconded by Nancy Ide (President, Association for Computers and the Humanities), who viewed the success of electronic means of research and teaching as inevitable but saw the development and promulgation of appropriate user interface standards as a sine qua non of that success. In particular, reference was made to tools that would support annotation and attribution, comparison and presentation, and synthesis. Warren Sanderson of Concordia University envisioned the framework as living between sustained narrative and a database, allowing for drafting, dissemination, amplification and modification, and commentary. "It approaches," he said, "the character of a continuing seminar or colloquium." Sha Xin Wei cautioned, however, that standardized environment elements, such as the World Wide Web protocol, are not really tools but simply infrastructure and that toolsets will be constructed around scholarly tasks and disciplines.

Susan Hockey explored the role of meta-data as independent representations of the logical and physical source, which led to the importance of SGML for preventing obsolescence in text representation. She noted humanists' need for multiple parallel hierarchies in SGML (which remains a research problem) and the limitations of HTML in this respect. It is not evident that new standards are required for representing significant intellectual features of texts or multimedia, specific to the humanities; agreement on what meta-data ought to be employed for these purposes calls for further research.
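The parallel-hierarchies problem Hockey notes can be made concrete. An SGML or XML document admits only one tree, so when, say, verse structure and page structure cross, one hierarchy is usually flattened into empty "milestone" elements. A minimal sketch, using TEI-style element names with invented sample text:

```xml
<!-- The verse hierarchy carries the tree; the page hierarchy is
     reduced to empty <pb/> (page-break) milestones, because a page
     boundary may fall mid-line and cannot nest as a paired element. -->
<lg n="1">
  <l>First line of the stanza,</l>
  <l>second line, interrupted <pb n="42"/>by a page break,</l>
  <l>third line on the new page.</l>
</lg>
```

Which hierarchy "owns" the tree is itself an interpretive choice, which is one reason the problem remained a research issue rather than a solved encoding detail.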

Janet Murray foresaw that teaching from texts will be severely hampered unless we can develop standards for text management software, but these are only the tip of a larger iceberg: application interoperability standards of value to education. Ron Overman of the National Science Foundation added that ethnographic databases, geographical databases, economic history databases, and databased video all represent environments needing common authoring and retrieval tools and standard methods to enhance intra- and interdisciplinary research. Because there is little reason to believe that interoperability standards are more necessary in the humanities than in other areas of endeavor, however, a research focus specific to the arts and humanities seems unnecessary.

In some areas, the arts and humanities could be special beneficiaries. Current standards for digitization of images are confined to technical standards necessary to record pixels, rather than intellectual standards for recording the content and ideas the images represent. While technical standards help ensure quality of capture, Anne Kenney makes it clear that the humanities will always need to ask "quality for what purpose"; content-level standards, based on intended use, will require further research into those uses. Of course, if the images are in color, humanists will be concerned that the surrogate has the same color as the original, a nearly impossible goal without standards for color management and display, which are still in their infancy.

System and architecture standards were not forgotten. John Garrett noted the crucial need for reliable, standard infrastructures. Several such standards would be of special importance to the humanities, including location-independent naming of objects and registration methods for digital objects that will protect intellectual property and ensure credit. David Bearman called for immediate investment in standards for meta-data encapsulation of records to protect their qualities as evidence, to fulfill an essential aspect of trustworthy and reliable testimony critically important to all scholarship.
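Location-independent naming can be illustrated by a content-derived approach: the name is computed from the object's bytes, and a separate registry (a stand-in here for a resolution service such as a handle or DOI system) maps names to current locations. The registry dictionary and sample record below are invented for illustration:

```python
import hashlib

def content_name(data: bytes) -> str:
    """Derive a location-independent name from the bytes themselves."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

# A toy registry mapping names to current locations; moving the object
# changes only the registry entry, never the name that citations use.
registry = {}
record = b"facsimile of manuscript page, grayscale scan"
name = content_name(record)
registry[name] = "https://example.org/storage/current-location"

# The same bytes always yield the same name, so the name also doubles
# as an integrity check on whatever the registry resolves to.
assert content_name(record) == name
print(name)
```

Because the name never encodes a storage address, a citation made today still resolves after the object migrates to new media, which is exactly the property Garrett's "reliable, standard infrastructures" would need to guarantee.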

RESEARCH AGENDA ISSUES:

If the arts and humanities are to be successful in influencing the development of software, display and telecommunications technologies, and standards ranging from applications to systems, they will require supported spokespersons capable of taking their position in the vast array of standards-setting and de facto standardization processes under way in the computing industry. Substantial costs are entailed to retain the technical expertise to play effectively in the standards arena. Further investments will be required to maintain regular contact with arts and humanities scholars and credibly represent their interests. A research agenda that overlooks the need to support such infrastructure will have little impact on the fundamental characteristics of computing and communications technologies.

IV. INSTITUTIONAL CHALLENGES

The task of writing about new societal mechanisms was assigned to John Garrett. A broadly based response urged further study of emerging institutions and imagined institutional arrangements, with experimentation the frequently recommended means of exploring new institutional structures. Virtually every participant highlighted the need to understand and better manage the social dimensions, organizational challenges, and economic constructs that the advent of digital networked communications had brought to the humanities. The called-for research ranged from providing support and tools for humanistic scholars and developing more cost-effective methods of data capture, conversion, delivery, and distribution to more fundamental issues of promotion and tenure, peer review, access to resources, and support of "trailblazers." While the need for research in these areas is not confined to the humanities, humanists keenly feel the absence of a framework for entering the digital age.

A. Distribution of Scholarly Knowledge

Communication is, above all else, essential to the arts and humanities. The system of dissemination that supports them, in its broadest definition, encompasses all means of publishing and performing. The significant changes that this system is undergoing raise many questions about its direction and method of getting there.

Scholarship requires repositories of knowledge and communities of debate. Building libraries is the first task, and it is evident that we do not know technically how to go about the capture of digital information, where to get the funds, or where to begin. Katherine Jones-Garmil of Harvard University identified the serious need to move beyond the "greatest hits," or works of canonical importance in a given discipline, to the primary sources of real value to scholarship. She and others called both for evolution of the electronic journal and for research into the benefits and drawbacks of electronic-only dissemination of current knowledge.

Accessing the resources, if and when they are digitized, is no easier. Toni Petersen of the Art & Architecture Thesaurus noted that "incredible funding resources are going to have to be applied to improve" discovery and retrieval. Research by the Coalition for Networked Information over the past year has suggested the same. Even when electronic representations have been found, getting them to those who need them is no trivial matter. Kolker and Shneiderman joined Janet Murray in calling for research on how best to deliver electronic resources to students. The Museum Educational Site Licensing Project in the United States, which has drawn attention to this problem, is among the experimental fields in which research on these questions can be pursued.

Once data is delivered, interpreting what has been sent and providing tools for understanding it presents no small task. Anne Kenney and Janet Murray pointed to the large compendia and discipline-based projects that are creating a new resource, rather than simply a library of old sources, and to the social implications of creating "course length" hypermedia. How, they and others asked, will the role of the scholar, as teacher, as author, as reader, or as curriculum developer change?

Challenging part of the framework suggested by John Garrett, Paul Peters of the Coalition for Networked Information pointed to numerous studies, and to the need for many more, examining how traditional roles in the production and dissemination of scholarship are breaking down and what is replacing them. The systems being studied are essentially those of the traditional scholarly publishing chain, but other ecologies need analysis, too: the authors of fiction, poetry, music, dance, theater, film, and software are part of different dissemination chains that are no less affected by change, perhaps even more so.

B. Education

Despite the great promise of electronically networked resources, higher education has yet to capitalize on them as supports for its research, teaching, or service roles. The concerns of elementary and secondary education were nearly invisible in the online discussion, but surely they will have as great an impact as the universities on the electronically resourced future of learning. In any case, a research agenda that does not look equally seriously at the implications of arts and humanities computing for K-12 education, and for lifelong learning, as it does at higher education will fail in the most important respect: it will lack relevance to the social context in which the case for arts and humanities computing must ultimately be made.

But this aspect of the research agenda is formidable. To begin with, we know very little about the use and impact of digital surrogates in learning. It may be too early to study the effects of new media, and we may still know too little about learning itself. But it is not too early to formulate questions and to begin to gather baseline data from which to assess the inroads made by new methods of teaching and learning based on electronic resources and software-assisted methods. Small-scale, controlled studies, with substantial qualitative aspects, could first serve as the basis for larger, quantitative studies that make comparative assessments.

OVERVIEW AND DISCUSSION POINTS 19

C. Law

Changes in society lead to changes in law. In the case of electronic resources in arts and humanities, these changes are still too inchoate to provide adequate support for potential developments such as the copyright of electronic resources in education and the reliance on electronic evidence for historical study. Janet Murray, and Jennifer Trant of AHIP's Imaging Initiative, expressed the contemporary uncertainty regarding intellectual property law. Specifically, these uncertainties are seen as having a current negative impact on media studies, but the longer-term impacts will be on all uses of historical resources that need to be converted to electronic form. David Bearman pointed to legal uncertainties about what it means to preserve electronic evidence and how failure on the part of governments and individuals to create authoritative electronic records will impede future historical research.

Research, combined with advocacy, can advance arts and humanities interests within legal frameworks. Research that defines specific harms and identifies equally specific remedies is essential to future electronic scholarship. The pace of legislation is generally faster than that of research. Thus the challenge is to fund anticipatory research by policy research groups already in place.

D. Economics

During the conference, there was only indirect discussion of the importance of economic research to the agenda of humanities computing. Yet humanists often feel that the agenda for software research, for example, is being set by commercial firms with needs and priorities different from theirs, and that the nature of the medium and its use is being determined by info-tainment rather than by educational interests. Although considerable research has been conducted on the economics of the current, paper-based information delivery models in libraries, the discussion neither referenced this work nor called for more. Nevertheless, only a better understanding of the economics of the systems that support arts and humanities will change both those systems and the flows of resources through them, to achieve desired new ends. Any serious research agenda for arts and humanities computing will support research on the economics of capture, storage, retrieval, delivery, and use of electronic resources, as well as examine the costs of failure to develop an appropriate mechanism for arts and humanities to exploit computing capabilities.

20 RESEARCH AGENDA FOR NETWORKED CULTURAL HERITAGE

E. Communication Technology

The attention given by the authors to issues of communication, collaboration, and dissemination highlighted the transition over the last decade from freestanding to networked computing. Virtually all the authors, while celebrating the virtues of the Internet, bemoaned its primitive organization of resources and access methods. Research into both automatic and human-assisted finding tools for making resources known was seen indisputably as yielding the greatest benefit. Its value would increase in proportion to the continued growth of resources and might exceed the benefits of simply adding new materials.

While authors and discussants pointed to exemplary Internet sites, they acknowledged the severe limitations of common knowledge-representation toolsets such as those based on HTML. The advantages of mixed media in the digital network nevertheless raise a host of research problems, ranging from such basic technical issues as linking objects of different modalities and determining appropriate levels of compression to more fundamental demands for greater understanding of user needs and perceptions. The sense that digital multimedia is the beginning of a new means of human communication has yet to give birth to a research framework in which the meaning of this revolution, and the means for promoting it, can be understood.

The concern for the instability of the current network was accompanied by a certain despair over how the arts and humanities could influence it to become more the kind of long-term, supportive communications environment they need. Specifically, dramatic improvements in display technologies and interoperability standards need to be developed and sustained to overcome the current impermanence of the virtual networked library. Of critical importance is research to identify methods to prevent destruction of the last or archival copy of a work, as well as means to ensure that archiving solutions in a networked environment will prove both scaleable and susceptible to implementation.

Finally, the participants saw a need for new tools. In the face of their inability to digest the thousands of new tools being thrust out into the market annually, there was nevertheless a sense that some classes of tools were not fully understood, would not be made by the commercial sector, or would not be effectively used by arts and humanities scholars without substantial new support. In addition to better methods of search and discovery, the leading requirement was for stronger mechanisms to support editorial or critical review and the analytic and annotation facilities they required. The widespread call for tools that could evaluate, automatically summarize, and integrate different sources raised the implicit question of how the humanist's role will change when software performs these traditional intellectual tasks for the scholar.

The absence of baseline data about what communications and computing facilities the arts and humanities are using, and for what purposes, makes it difficult to identify where best to invest in research. The first research issue, therefore, will be to establish such baselines.

F. People

In the midst of large-scale social change, understanding what is happening to people and their interactions with technology is critical to making it work better. This requires not a one-time study, but rather an ongoing effort of many different disciplines over the foreseeable future. What kinds of questions will have to be asked, again and again, to navigate through this transition? What skills are needed, what meanings are to be imparted, what methods are to be employed?

Kolker and Shneiderman called for ongoing research into the shifting computer-literacy needs of faculty and students. One could reasonably extend this call to the general public and to younger students as well. Probably of equal importance to the humanities, as Anne Kenney and Donna Romer pointed out in their discussion of image representation issues, is understanding the meanings that new informational genres will have for their "readers" (even the concept of "reader" will have to give way to a viewer/participant/contributor), how representations will function as surrogates, and how they will serve purposes beyond surrogacy. We will need to continue to explore the cultural and discursive implications of nonlinearity and multiple intellectual perspectives on a single text, issues raised by Susan Hockey. What will the impact of availability be on the perceived usability of images by the end user, as discussed by Anne Kenney?

Skills and meaning will merge in determining what tools future researchers will need and how they will use them. Ongoing research into the demand for structured-vocabulary searching, full-text searching, and searching through knowledge bases using intelligent agents will help change methods of representing knowledge in digital collections. Ongoing research into image analysis and description, indexing, and annotation, and the use of machine intelligence to locate images through pattern matching and object recognition, as called for by Donna Romer, can have equivalent implications. The ultimate need is for a research basis to determine not only the effect of future intelligent objects on scholarship but what kinds of intelligence they, and the systems that support them, ought to have in order to contribute to scholarship.

If research could bring about Gary Marchionini's vision of search and discovery tools integrated with creation, use, and communication tools, how would that vision change his identified need for electronic analogs of existing genres of finding tools? If research establishes that the arts and humanities address an imprecise audience with many varied intellectual perspectives, as numerous commentators suggested, what requirements will this place on software to provide multiple approaches, layered representations, and well-tested interface methods? If, as Donna Romer asks, we can find ways to meaningfully identify content attributes within images for automatic identification by computers, we will still need to understand visual thinking processes (which, in turn, will evolve rapidly). How much more so the representation of motion and music, in which the state of the art today is so primitive?

We can readily agree with Janet Murray that hypermedia authoring and reference environments are urgently needed, yet have no idea of the impact of these tools on the humanities and the arts. The leitmotif here, as John Garrett reminds us, is that there is a strong interplay among technology, scholarship, and society and that we have yet to begin the job of studying these variables to tune the system. What far-reaching consequences would collaboration tools with mechanisms for assigning responsibility and credit have? How will lowered entry barriers for scholarly publishing affect the humanities?


Finally, Bearman reminds us that the entire concept of evidence has its roots in the culture and that the digital object and digital communications will transform both our concepts of evidence and the literary warrant for records. How records are used, an area that has long been under-studied, will continue to cry out for attention; in a time of changing methods and problems, the answers will be needed more than ever. Katherine Jones-Garmil of Harvard University adds that the electronic journal and electronic dissemination of research upset existing paradigms of authenticity and authority.

G. Sources

It is, of course, equally important to understand what is happening to the genres of symbolic expression themselves. Virtually every author stressed the significance of research into electronic genres and our understanding of them as means of expression. Kolker and Shneiderman raised the question indirectly in reviewing exemplary Internet sites: what makes a "home page" valuable, effective, or even interesting? Susan Hockey asked more explicitly for research into ways of creating a new genre that she believes is essential for scholarship in the humanities: one in which representations of structure and content are independent, multiple perspectives and versions can be interrelated, and nonlinearity can be supported. Anne Kenney asks us to understand not only what different genres are, but also what their functional requirements are for digital representations, to enable us to devise automatic capture settings and make decisions about conversion priorities with automated selection and control.

Michael Joyce of Vassar College contributed numerous examples of collaborative work in MOO (Multi-user Dungeon, Object Oriented) space, and of collaborative approaches growing out of the "Computers and Composition" movement that have spawned software, journals, conferences, and even new disciplinary associations. In his view the radically new means of expression interact with the complexity of the "feminist, post-modernist and other radical" content of the expression they have engendered. Donna Romer calls on us to conduct research into the formal properties of genres in different modalities and to explore how to create and exploit an entirely new genre, the "visual thesaurus." And, of course, we have the genre of nonlinear writing, for which we need both better tools and a basis for understanding.


When John Garrett calls for research on resource identification systems he in part reflects the need to identify what a unique resource actually is in an age in which the "original" and the "copy" are indistinguishable and expression involves evolutionary versions, borrowing, and references to external entities. Bearman's model of records as transactions will require research on how best to capture metadata defining the record, creating new genres of communicated transactions and new requirements for robust, functional representations.

RESEARCH AGENDA ISSUES:

Identifying institutional and social changes essential for creating a hospitable environment for computer-supported arts and humanities is critical, since neither the human nor capital resources for changing everything are available. Research that begins to identify critical success factors and locate current barriers will help realize the potential of arts and humanities computing.

Cited e-mail contributions to the discussions (other than those in the commissioned papers). In each case, the names and institutional affiliations of discussion contributors are cited in the text.

Bob Arellano (Brown University), "Re: Learning and Teaching," October 10, 1995
David Bearman (Archives & Museum Informatics), "Re: Archiving," July 16, 1995
Michael Buckland (University of California, Berkeley), "Knowledge Representation," November 28, 1995
Peter Graham (Rutgers University Libraries), "Archiving," July 10, 1995
———, "Re: Advanced Archiving Technologies," July 24, 1995
Nancy Ide (Association for Computers and the Humanities), "Comments on Tools for Creating and Exploiting Content," November 15, 1995
Katherine Jones-Garmil (Harvard University), [no subject line], November 15, 1995
Michael Joyce (Vassar College), "Comment on Learning and Teaching Paper," November 14, 1995
Anne Kenney (Cornell University), "Re: Tools, Representation, Image," July 19, 1995
Kevin Kiernan (University of Kentucky), "Conversion," June 25, 1995
———, "Tools, Representation, Image," June 27, 1995
Jerrold Maddox (Pennsylvania State University), "Learning and Teaching," October 6, 1995
Janet Murray (MIT), "Imagining Ideal Environments," June 26, 1995
———, "Tools, Representation, Image," June 28, 1995
———, "Advanced Archiving Technologies," July 25, 1995
———, "Functionalities for Humanities Scholars," July 21, 1995
Ron Overman (NSF), "Re: Tools for Creating and Exploiting Content Paper," November 6, 1995
Paul Peters (CNI), "John Garrett's 'New Social and Economic Mechanisms' Paper," October 11, 1995
Toni Petersen (Getty AHIP, AAT), "Re: Resource Search and Discovery Paper," October 26, 1995
Bob Rosenberg (Rutgers University), "Re: Learning and Teaching," October 9, 1995
Nora Sabelli (NSF), "Re: Hockey Paper," October 26, 1995
Jerry Saltzer (MIT), "Re: Advanced Archiving Technologies," July 12, 1995
Warren Sanderson (Concordia University), "Resource Search and Discovery," October 11, 1995
Jennifer Trant (Getty AHIP), "Re: Tools, Representation, Image," July 19, 1995
Leonard Will (Consultant), [no subject line], November 28, 1995
Sha Xin Wei (Stanford University), "Tools for Creating and Exploiting Content," October 24, 1995


SYNOPSIS OF RESEARCH OPPORTUNITIES AND FUNDING NEEDS

INTELLECTUAL ISSUES

Shared methods of representation serving different perspectives and functionalities in order to create a unified and comprehensive library of useful knowledge

KNOWLEDGE REPRESENTATION

10ff. Effective representation of knowledge

10 Different degrees of "fidelity" in knowledge representation

10, 17 Meta-data elements required for humanistic research

10 Application of semiotics research to knowledge representations as material cultural objects

10, 13 Functionalities that humanists require of digital representations

12 Preserving context and meaning in higher-level representations of knowledge

17, 19 The meanings of new information genres, their function as surrogates, and how meaning is affected when genres are combined

CONVERSION, TREATMENT, AND DOCUMENTATION OF SOURCES

Features of intellectual sources that help interpret, understand, and connect them

14 Simultaneous multiple interpretations of resources

15 Tools to distinguish degrees of difference between original sources and their various derivations

15f. Consensus on methods of defining and documenting representation conventions for source materials

MULTIMEDIA

10, 18 Standards for representing content of non-textual media

15 Identifying features for marking up integrated multimedia

Discovery and retrieval in a distributed network environment

11, 17 Methods to manage, and provide access to, large collections of digital materials

11 How retrieval resembles, and differs from, discovery

11 The relevance to the networked environment of prior research on centralized databases

TOOLS

11 Discovery tools that better exploit browsing capabilities and prior knowledge

17 Retrieval tools that support annotation and attribution, comparison and presentation, and synthesis

17 Common authoring and retrieval tools to enhance intra- and interdisciplinary research

20 Automatic and human-assisted discovery tools



TEXT AND EDITING

11 Better tools and techniques for full-text retrieval, including pre-processing, linguistic analysis, and artificial intelligence

20 Mechanisms and analytic/annotation facilities to support editorial or critical review

MULTIMEDIA

11 Retrieval models applicable to multimedia, including optical pattern matching, approximate retrieval, and automated indexing

12 The effectiveness of text-based retrieval in image-based resources

12 Criteria for representing image sets, rather than individual images

16 Methods for linking multimedia information objects, to reflect their pre-digital intention

21 Structured-vocabulary searching, image analysis and description, indexing, annotation,and machine intelligence for retrieval

STANDARDS

11 Data standards for uniquely identifying networked objects, to ensure meaningful discovery at a consistent level of detail

9 Interface standards

Persistence of computerized resources over time to ensure future stability of knowledge

ECONOMIC FACTORS

11 Cost-effective methods for digital conversion of resources

11 Demonstrations of scaleable technologies and organization for digital conversion

15 Retrospective digitization of large quantities of original materials, to achieve a "critical mass"

19f. The economics of capture, storage, retrieval, delivery, and use of electronic resources; costs of failure to exploit computing capabilities

20 Methods to ensure that networked archiving solutions are scaleable and implementable

QUALITY

Criteria for evaluating converted resources, to foster further and improved capture and delivery

11 Quality benchmarks for conversion

12 Image quality characteristics and methods of documenting technical characteristics of digital images

18 Standards for meta-data encapsulation of records, to protect their qualities as evidence

METHODS

11 Compression techniques

13 Standardized approaches, implementation, and training in methods of archiving digital records

19 Preservation of digital signals and their meaning

11 Standards for color management and display

18 Location-independent naming of objects; registration methods for digital objects that protect intellectual property and ensure credit


18 Techniques for creation of digital libraries

20 Display technologies and interoperability standards, to overcome current impermanence of virtual libraries

20 Methods to prevent destruction of archival copies

INFRASTRUCTURAL ISSUES

Publication, conference, or electronic discussion list through which to report and gauge progress on the research agenda

8, 13 Publication such as Annual Review of Arts and Humanities Computing to report and assess papers on the research agenda

9 Informing practitioners of research results in specialized disciplinary applications

12 How best to inform educators of the state of research, tools, and digital resources

Consensus among humanists on fundamental characteristics of their fields, and on criteria for developing software and systems

UNDERSTANDING

9 Knowledge of how network tools are influencing the arts and humanities

10 Investigation of the meanings of digital genres, their audiences, and their purposes

12 Documentation of resource sets and their user communities, to establish the characteristics of varied "points of view"

13f. Understanding humanists' working practices, as a basis for defining functional requirements that support their activity

16 Genres of expression used by humanities disciplines and the evolving assumptions that underlie them

Baseline data about use of communication and computing facilities and for what purposes

IMPACT ON TRADITIONAL DISCIPLINES

14 Software for specific humanistic disciplines

16 Understanding of sources specific to different disciplines

17 Linking disciplinary "critical masses" into virtual multidisciplinary laboratories

9, 17 Computer interfaces and interface standards

12 Collaborative software development to create customized authoring environments for humanists

18 Content-level standards based on intended use

Support for advocacy on behalf of the humanities in technical and standards development

17 Documentation of implications of new technologies for teaching and scholarship

17 Humanists' needs for standards and techniques beyond those already available

17 Need for humanists to define interoperability standards for their disciplines


Promoting computer-assisted scholarship and teaching in the arts and humanities

EDUCATIONAL IMPACT

12 Defining curriculum in light of new technological capabilities

16 Impact of digital delivery systems on learning and organization of resources, for design of future teaching and learning

16 The consequences of distance education

16 Examination of benefits of using computing tools in teaching, to validate or discredit their use

17 Development of standards for application interoperability that are of value to education

18 Effective delivery of electronic resources to students

19 K-12 education and lifelong learning

TRAINING AND USE

9f. Placing computer and network tools in the hands of students and faculty, and training them in their use

10, 15 Upgrading campus-based services

Support for classes of tools that humanists understand poorly or use ineffectively, or that would not be produced commercially

20 Shifting computer-literacy needs of faculty and students

SCHOLARLY COMMUNICATION

17 Evolution of new forms of digital communication through discourse and "intelligent files"

18 Evolution of electronic journals; benefits and drawbacks of electronic-only dissemination of knowledge

ADVOCACY AND PLANNING

9 Demonstration projects in discipline-specific applications

12 Education, implementation strategies, and proselytizing for use of digital images

19 Advancement of arts and humanities interests within legal frameworks; anticipatory research by policy research groups

ADDITIONAL AREAS FOR RESEARCH

INSTITUTIONAL AND SOCIAL IMPACT

13 Experiments, prototypes, and "reinventions" leading to social and economic change in the humanistic academy

18 Emerging institutions, imagined institutional arrangements, and new institutional structures

18 Management of social dimensions, organizational challenges, and economic constructs resulting from networked communications

19 Breakdown and replacement of traditional roles in production and dissemination of scholarship

Human-computer interaction

TOOLS FOR CREATING AND EXPLOITING CONTENT

Robert Kolker and Ben Shneiderman

University of Maryland

STATE OF THE ART

To be true to the spirit of the humanities, we need to talk about states of the art. The humanities are a large umbrella under which many disciplines carry on many varieties of work, almost all of which may be subdivided into smaller components, down to the unique research done by a particular individual. Because humanities research is only occasionally carried on by teams or under the rubric of a collective project, computer-based tools and access to content are currently (with a few exceptions) distributed across many sites and many individual projects.

In most institutions of higher learning, the humanities include performance and artistic production (theater, music, literature, filmmaking; painting, sculpture, photography); critical and theoretical work (art history, literary theory and criticism, film and media theory and criticism, rhetoric, philosophy, linguistics); research (history, literature, art history, music history, film and communications); and language learning. Frequently these areas intersect.

Given this diversity, and the fact that much of the work of the humanities has been traditionally intuitive rather than deductive, and based profoundly on the book, acceptance of technology is slow but increasing at a steady rate. On the most fundamental level of equipment, enormous disparities exist. Most researchers and professors in the humanities still use low-powered DOS-based or Mac computers to do word processing and e-mail. Networking is not universal, though many have some kind of Internet hookup. Some are content with this level of access, but may be unaware of more sophisticated possibilities and opportunities to improve their work lives. With increased training and knowledge of such possibilities, they should be able to raise their interest levels and improve their access, which will mean their work will have a greater impact on their intellectual communities, their students, and their publics.

Others in the humanities are actively exploring how technology can advance their research and teaching. A few devote most of their research to creating computer-based tools for their disciplines. Some team projects are developing common access techniques. For individual research projects, however, even the best work is often (perhaps usually) done without careful attention to human interaction factors.

CURRENT RESEARCH AND ITS PROMISE

The majority of current humanities research can be divided into three categories:

- The Internet, which can be subdivided into electronic discussion groups and Web sites.
- Existing software, such as graphics, presentation, database and database front-ends, and multimedia authoring packages used to develop discipline-specific applications.
- Original software developed for specific or general research projects.

Network access is among the most important tools for the humanities and perhaps the first many faculty use when they step beyond word processing. The wide variety of discussion groups, which permit free circulation of ideas, is especially useful in helping colleagues share information. For example, the NEH-supported H-Net, a network of over 57 humanities listservs supervised by the University of Illinois at Chicago and Michigan State University, provides moderated forums in areas as diverse as women's history, American studies, ethnic immigration, film history, rural and agricultural studies, and comparative literature and computing. Other humanities electronic discussion groups have waxed and waned over the years, probably because they were too general. But most H-Net groups seem to thrive because of focus and careful supervision.


But these and other network-accessed discussion groups suffer from the lack of a unified network interface and an accessible source of information about their very existence and the procedures necessary for signing up. A single university may have many different ways to make a network terminal connection, from a simple telnet client to a more sophisticated or customized user interface developed for a particular department or college. Typically, someone finds out about one discussion group by already being signed up on another. While directories (of listservs, institutions, archives, bibliographies, people, etc.) exist, they are not commonly known. Finding them requires an existing level of knowledge about how to search the Internet. Such haphazardness of access and knowledge is a primitive constraint, keeping information from people who could benefit from it.

The World Wide Web provides what might be called a general, external common interface for those who can access it. The menu functions of Mosaic or Netscape viewers are the same for anyone using the software. The important consideration, therefore, is the design of a specific site, what information it presents, and how it is organized.

The University of Virginia's Institute for Advanced Technology in the Humanities (IATH), directed by John Merritt Unsworth, maintains one of the most advanced sites in humanities research. The interface is simply and clearly organized; the content is rich and growing. IATH provides an outlet for the work of University of Virginia scholars, such as the nineteenth-century scholar and textual theorist Jerome McGann, who is constructing an archive of text, manuscript, and images by the poet and artist Dante Gabriel Rossetti. The historian Edward L. Ayers maintains a site in progress on the Civil War, The Valley of the Shadow. The experimental video and computer artist David Blair is constructing an elaborate MOO site for his WaxWeb project. IATH also offers computing resources to a roster of fellows from other universities. In collaboration with North Carolina State University, IATH edits and publishes Oxford University Press's Postmodern Culture, one of the few scholarly, refereed, online journals in the humanities. IATH, the most clearly focused site for exploiting humanities content, manages, through a fairly simple and consistent use of HTML, to present a diverse set of issues in text editing, historical research, and film and cultural studies. It makes use of plain text and multimedia tools and depends on a technologically aware cohort of scholars in the field to access and contribute to it.

Electronic Text Centers, because of licensing and copyright restrictions, provide services that are often restricted to one university community. They have limited Internet and Web access that provides reference and lookup services (card catalogs, and texts of the OED, Shakespeare, and other literary works that can be searched). A few present graphical images of manuscripts. Much literature appears on the Internet (novels, poetry, and drama) but few texts are of dependable authenticity. It will be crucial for Electronic Text Centers, perhaps in conjunction with publishers, to create a body of authorized, searchable texts with access mechanisms universally available. Centers such as the Electronic Text Center of the University of Virginia's Alderman Library and the Center for Electronic Texts in the Humanities (a joint project of Princeton and Rutgers universities, also associated with the Text Encoding Initiative) are helping to solve the matter of editorially dependable computer-accessible texts by undertaking major initiatives in digitizing manuscripts and creating authoritative texts using SGML.
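The kind of structural encoding these centers are adopting can be illustrated with a small, hypothetical fragment in the TEI style of SGML (the element names follow the TEI guidelines; the particular passage and line numbers are illustrative only). Because the markup records the logical structure of the work rather than its appearance, a single authoritative transcription can drive searching, display, and scholarly citation alike:

```sgml
<!-- Hypothetical TEI-style SGML fragment: structure (act, scene,
     speech, verse line) is encoded explicitly and independently of
     any particular presentation. -->
<text>
  <body>
    <div1 type="act" n="1">
      <div2 type="scene" n="2">
        <sp who="Hamlet">
          <speaker>Hamlet</speaker>
          <l n="129">O that this too too solid flesh would melt,</l>
          <l n="130">Thaw, and resolve itself into a dew!</l>
        </sp>
      </div2>
    </div1>
  </body>
</text>
```

A retrieval tool can then answer a query such as "every line spoken by a given character in a given scene" without any knowledge of how the text will eventually be rendered on screen or paper.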

There are other, scattered, networked projects. A recent Web site, established by the University of Chicago and Notre Dame, exhibits manuscripts by Dante. Among the art exhibits now proliferating, the best design remains that of the Web Louvre project by Nicolas Pioch. The Getty Art History Information Program's Museum Educational Site Licensing Project, a multi-university initiative to explore networked access to museum images, should help organize strategies and methods for networked access to images of art objects.

Many scholars are using commercial software packages and, in some instances, creating original software to produce information access programs, multimedia projects, and teaching modules in the humanities. However, these efforts are often not widely known beyond the developer or the particular discipline. Well-established programs, such as the Max MIDI composition program, use the C language in an object-oriented environment to produce composition modules. The Perseus Project from Harvard uses CD-ROM for multimedia research in ancient Greek history and literature. The Academic Software Development Group at Stanford University has developed Media Weaver, a distributed authoring system for hypermedia and media streams both off- and online, which is currently being used for projects as diverse as Chaucer's poetry and the history of Silicon Valley.

Peter Donaldson, Director of MIT's Shakespeare Multimedia Project, has developed a Mac-based interface that matches the Shakespearean text with moving images (from laserdiscs and QuickTime files) drawn from various filmed versions of the plays. The project allows students to compare different readings by different filmmakers and actors in ways that explain not only the text, but the varieties of cinematic interpretation as well. It offers interesting interactive potential by allowing students to grab and arrange still images onto their own notepad windows.

Cinema would seem a natural subject for computer-accessed study. A number of scholars are exploring ways of digitizing moving images and interactively combining them with text. Some are using a computer interface with laserdisc to create analyses of a single film (such as The Rebecca Project, by Lauren Rabinovitz and Greg Easley at Iowa, which analyzes the Hitchcock film from a number of critical and historical perspectives). Others, such as Robert Kolker, one of the authors of this paper, and Stephen Mamber at UCLA, are experimenting with critical essays using moving images published on the Web, and with multimedia explorations of the basic cinematic vocabulary using digitized clips and authored in Asymetrix Toolbook.

Language learning and linguistics are fields of major exploration; a number of interactive projects are based on both existing and new software. Asian languages have received special attention in multimedia teaching programs. Ohio State University and the University of Maryland's University College are creating a multimedia version of a standard Japanese textbook in a project funded by the Annenberg-CPB Project.

Linguistics scholars are developing computer-assisted principle-based parsers (which can give structural descriptions of sentences in more than one language). A database of children's spontaneous speech known as CHILDES has been developed at Carnegie Mellon University with NSF funding.

TOOLS FOR CREATING AND EXPLOITING CONTENT

FUTURE RESEARCH NEEDS
This very brief and selective survey indicates the plurality of tools and content developed in computer-based and computer-assisted humanities research. What it does not reveal are the intense efforts now being made, and yet to be undertaken, to bring this work to students. Many projects are made for student interaction, but interface designs are as diverse as the projects themselves (requiring skills specific to the project), and student access to computer facilities is far from universal. A very few colleges and universities provide every incoming student with a computer. Others have developed computer lab facilities in which students can do their work. Relatively few have interactive computer teaching theaters where faculty and students can learn in an environment that allows close association between human and machine.

The need for access to hardware is coupled with an even greater need for access to training. Major curricular issues are at stake if computer-aided research and pedagogy are to expand. Introductory courses in computation need to be developed for all students outside the usual computer science curriculum. Humanities faculty members need to be trained in graphical environments so that they can enjoy access to existing humanities content and begin to take part in multimedia authoring.

Work is needed on ways to bring the necessary training to humanities scholars that will 1) inform them of how computers can aid their research; 2) make them comfortable with computer-based tools; and 3) identify and then encourage those who wish to do advanced work in creating tools for teaching and research. This effort must be carried on concurrently with research into the kinds of interface design that would be best for most humanities users.

A major barrier to these users is the lack of interface standards and the need for specialized skills to create and access content. As we said earlier, the work of the humanities is a diverse undertaking with multiple points of view and multiple content. It is an area of many specializations, whose practitioners may not have the skills or the time to devote to authoring and programming. Barriers to access of a variety of computer applications need to be lowered. Standards for multimedia authoring, and usable interfaces that can be easily modified to accommodate a variety of content, would be extremely useful. Sound, image, and video capture must be simplified and standardized, as should the programs for integrating them. Not all universities have software development units available to faculty. In the absence of those, all interested faculty should be able to access simple, universal tools (for example, HTML and the World Wide Web). Stand-alone applications need to incorporate a similar, even simpler, set of standards.

Such work would ideally combine the talents of computer scientists, human-computer interaction researchers, and humanities scholars developing content and tools. Once these tasks were accomplished, computer technology would serve the needs of the humanities while remaining in harmony with the diverse, exploratory nature of humanities work.


KNOWLEDGE REPRESENTATION

Susan Hockey, Center for Electronic Texts in the Humanities

Rutgers and Princeton Universities

STATE OF THE ART
The arts and humanities focus on the study of cultural objects. Knowledge in the arts and humanities can consist of cultural objects themselves, information about those cultural objects, interpretive commentary on those objects, and links or relationships between them. The nature of the objects under study is so broad that the knowledge associated with them can be almost anything, and it can be used and reused for almost any purpose. For example, a text can be part of a large collection studied for literary, linguistic, and historical purposes. The same text can also be analyzed in very fine detail, perhaps even for the punctuation within it and for the physical characteristics of the original source.

The representation of knowledge in electronic format can itself take many forms, and it has taken more than forty years of work with electronic resources to begin to understand the potential and the perils of some of these formats. Material consisting of text, numeric data, images, sound, and video now exists in electronic form. At a fundamental level, all of these are represented in electronic form as bits, but it is the higher levels of representation (the forms into which the information is organized and the access points to those forms) that define how useful that electronic information might be.

Early projects worked mostly with text, and the efforts of these projects show some of the possible pitfalls in choosing how to represent information. These projects attempted to transcribe electronic text by maintaining as accurate a reproduction of the source as possible. Typographic features such as italic type and footnotes were copied faithfully, making an explicit representation in a different medium (electronic form) of a property of the original medium (print). Typographic features aid the reading process the human performs, but they are ambiguous and so are less suitable to aid any processing done by a machine. It took many years to begin to understand some of the differences between representing knowledge that is intended only to be read, and representing knowledge that can be processed electronically in different ways.

Much of our knowledge about objects or information is implicit in some way or another. We know that the text along the top of a page is a running heading because that is where a heading is normally placed. We can deduce the context or scene depicted in a painting because we know, for example, that the figures shown appear in a particular biblical story in that context. When we see a film clip we can recognize the place where the action is happening or detect a foreign accent in one of the speakers. When we browse a dictionary we know that the item in boldface type at the beginning of an entry is the headword. But when we start to manipulate any of these items electronically this lack of specificity becomes apparent, and contextual or other information is needed.

The question then arises of what knowledge should be stored to provide this explicit information. In very many information systems, the representation of knowledge is tied up in some way with particular data structures. Early systems stored databases as "flat files" or single tables with one set of rows and columns, which inevitably meant some restructuring of data before it was entered into the computer to avoid repetitions and to deal with anomalies. Historians and archaeologists commonly complained that this led to simplification of the material. Relational databases provide a more sophisticated data model, but can also suffer from some problems. Not all humanities data fits neatly into sets of tables without some loss of information. Furthermore, the relationships between the items of data need to be defined when the database is initially set up, yet many collections of humanities material are put into electronic form in order to do research that will help to establish the relationships between the items in the collection.
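A minimal sketch, with invented data, may make the contrast concrete: the flat file repeats information on every row, while the relational model removes the repetition at the price of fixing the relationships in advance.

```python
# Hypothetical illustration of the two data models described above.

# A "flat file" holds everything in one table, so details about an artist
# are repeated on every row that describes one of that artist's works.
flat_file = [
    # (work, artist, artist_dates)
    ("Ophelia", "John Everett Millais", "1829-1896"),
    ("Mariana", "John Everett Millais", "1829-1896"),  # duplicated artist data
]

# A relational model removes the duplication by splitting the data into
# tables, but the link between works and artists (the artist_id key) must
# be decided when the tables are designed, before research has begun.
artists = {1: ("John Everett Millais", "1829-1896")}
works = [("Ophelia", 1), ("Mariana", 1)]

# Re-joining the tables recovers the flat view.
joined = [(title, *artists[artist_id]) for title, artist_id in works]
assert joined == flat_file
```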


In many current systems the representation of knowledge depends on specific software programs. When items or objects are indexed and access to them is only via special-purpose software that can read those indexes, some of the knowledge becomes dependent on the software and is derived through functions of the software. In some cases it is not even possible to extract the information in the format in which it was entered. Moreover, knowledge that has been created for a specific program or type of computer is less likely to last for a long time. Even if it can be converted easily from one program to another, something may be lost during the conversion, or a different theoretical orientation may be imposed on the material.

Meta-data, or knowledge about the knowledge, is another way of making implicit information more explicit. Some communities recognized the importance of meta-data early on: for example, bibliographic and cataloging data still fundamentally uses electronic means to describe material that is mostly not itself in electronic form at present. In the 1970s the social science data archiving community created a system for describing its datasets, and these codebooks are almost universally accepted as an essential part of a dataset. Initially created in print form, some are now being converted into electronic form. Meta-data for electronic textual material is in a much more rudimentary form at present, and very few electronic texts have what would now be considered adequate information associated with them. Our understanding of the meta-data requirements for images, sound, and video lags even further behind.

CURRENT RESEARCH
Research during the last ten years has concentrated on establishing ways of storing knowledge in electronic form so that it does not become obsolete, so that it can be reused for different purposes, and so that it is separate from any software that will process it. The Standard Generalized Markup Language (SGML) provides a way of representing information that is independent of any particular hardware or software. For text it consists of plain ASCII files that can be transmitted across any platform and network. SGML is object-oriented. It does not say anything about what will happen to those objects when they are processed electronically; it merely says what they are. Thus different processing programs can operate on the same SGML data. An added benefit of using SGML is the ability to defer making many decisions which might otherwise have to be made at the start of a project, and which are often regretted later.

SGML can be used to describe anything. Although principally text-oriented, it does not have to work only with text. It can be used for the textual information that must accompany images, sound, and video in order for them to be useful. SGML is not itself an encoding scheme; it is a meta-language within which encoding schemes (SGML tag sets) can be defined. The Text Encoding Initiative (TEI), a major international project in humanities computing and the language industries, has created an SGML tag set suitable for many different applications. Using a modular document structure, the TEI can be used to represent many different text types and many different theoretical orientations. It has tags for the structural components of many text types, and also includes tags for analytic and interpretive information. It also has a set of tags which provide an electronic text file header that includes meta-data of various kinds. Another humanities-related SGML application is the Finding Aids project at Berkeley.
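As an illustration, a TEI-style SGML document pairs a header carrying meta-data with the structurally tagged text it describes. The fragment below is a simplified sketch, not a complete or valid TEI document; the element names follow the TEI's general conventions, but the details are illustrative only.

```sgml
<tei.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>The Blessed Damozel: an electronic transcription</title>
      </titleStmt>
      <sourceDesc><p>Transcribed from a printed edition.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza">
        <l>The blessed damozel leaned out</l>
        <l>From the gold bar of Heaven;</l>
      </lg>
    </body>
  </text>
</tei.2>
```

Because the tags name what the parts are (a line group, a line, a title) rather than how they should look, different programs can process the same file for different purposes.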

The acceptance of SGML is now widespread for commercial as well as academic applications. Its focus on content is appealing, especially when it is not possible to define at the start of a project all the likely functions that may be performed on an electronic text. For text it also enables the meta-data to be encoded using the same syntax as the text itself, which is attractive for processing purposes. SGML software is now becoming much more widely available, and the recent announcement by Novell of an SGML Edition of WordPerfect 6.1 should help to put SGML in mainstream computing. However, SGML basically assumes a single hierarchical structure for a document. Most humanities material has multiple parallel hierarchies, or can even be viewed as webs of information. Efforts to represent these in the current version of SGML are clumsy, since almost all SGML software assumes a single tree structure for processing.

The Hypertext Markup Language (HTML) used by the World Wide Web has perhaps done more than anything to raise awareness of structured text. Even if it does not survive, it will leave a large legacy of text marked up in an SGML-like way. The World Wide Web has also enabled many more people to be aware of network-wide resources in different forms and of the possibility of linking or pointing to information stored elsewhere on the network. However, the current version of HTML does have limitations in the kinds of material that it can represent, and its encoding tags are mostly presentational. Its meta-data capabilities are also weak.

Alternative approaches to representing text focus more on the appearance of a document. This means that the document is easy to read, but the method is less suitable for long-term storage of material that could be used for many different applications.

A multiplicity of so-called "standards" exists at present for storing images, sound, and video. Conversion from one to another is usually possible, perhaps with some loss of information. Some work has been done in the area of meta-data associated with these formats, but in general this consists of moving information from one system to another in such a way that it can be processed (as opposed to merely being viewed or heard). Size is still a constraint for these types of data, and much effort is of necessity being concentrated on compression techniques for storage and transmission rather than on representation of the information itself.

A number of other representational issues are important for arts and humanities material. Non-standard characters appear regularly. There are many different ways of dealing with these, most of which are incompatible with each other or are functions of specific software programs. In some cases the writing system and the language are treated as the same thing, although only rarely do they have a one-to-one relationship. SGML offers some general-purpose solutions, but these do not appear to be very well implemented at present, and barely at all on the World Wide Web. Dates can be in different calendar systems or can be vague forms like "Hellenistic," but they need to be represented in ways that enable them to be put into chronological order. Similar problems arise with weights and measures, where the units can vary from one culture to another. Names and their relationship to the individuals who bear them can also be important. The same name, referring to the same person, can be spelled in different ways. There may also be several individuals with the same name in a collection of material, giving rise in some cases to doubt about whether it is the same person or not.
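One simple way to make vague dates sortable, sketched here in Python, is to represent every date (exact or period) as a pair of earliest and latest possible years. The period boundaries below are illustrative, not authoritative.

```python
# A vague or period date can be represented as an (earliest, latest) pair of
# years so that it still sorts chronologically alongside exact dates.
# Negative numbers stand for years BCE; the boundaries are illustrative only.
PERIODS = {
    "Hellenistic": (-323, -31),
    "Roman Imperial": (-27, 476),
}

def date_key(value):
    """Return a sortable (earliest, latest) pair for an exact year or a period name."""
    if isinstance(value, int):
        return (value, value)
    return PERIODS[value]

records = ["Roman Imperial", -50, "Hellenistic", 200]
records.sort(key=date_key)
print(records)  # -> ['Hellenistic', -50, 'Roman Imperial', 200]
```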


The need to represent missing or incomplete information in some way is now reasonably well accepted. In some cases it may be important to distinguish between information that does not exist at all and information that can exist but is not known for this particular instance. The level of certainty about information in arts and humanities data can also be critical, and it is useful to give an indication of this. Similarly, it can be helpful to record who is responsible for decisions about uncertain information or other encoding, and their role in making those decisions.

CRITICAL AREAS FOR FURTHER RESEARCH
Much electronic information in use today has been created with the aim of making a surrogate of something that already exists in another format. Many of the functions performed on that information are the same as those that might be performed on the original: reading, viewing, etc. The electronic environment facilitates other types of processing and analysis. Some of these, like statistical analysis of social science data or text retrieval, are fairly well understood. Others have barely been thought of as yet, but future scholars will probably want to subject electronic information being created today to new and different forms of processing. Gaining a better understanding of the full potential of the electronic medium ought to help us create better and more useful representations of material in electronic form.

Electronic information is mutable and dynamic: changes can be made to it at any time. Tracing those changes becomes important for future users, but we do not yet have a universally recognized way of recording them. For text we no longer need to write in a single linear stream, stored on rectangular objects like those on which we have written for centuries. We are already seeing hypertext fiction in which the novel has no obvious beginning, middle, or end. This is still at an experimental stage, but we can envisage hypertext writing of scholarly papers in which differing arguments or interpretations are presented in parallel as hypertext links rather than as a single stream of text. Raising awareness of the potential of the electronic medium may thus also help us to create better representations of information.

Current methods of recording meta-data seem to be concentrating on the properties of the original from which the electronic surrogate is made (for example, indexing terms, traditional catalog records, and the like). Yet the properties of the electronic version can also be important. The TEI header is one of the very few attempts to provide meta-data that records the process of creating the surrogate. It includes information about the transcription of the text, whether spelling has been normalized, and the treatment of potentially ambiguous characters like the period. A similar model might be needed for other types of data. Current methods of recording meta-data also seem to be intended mainly for humans to use, but it is likely that in the future they will be read and acted on just as much by computer programs; further research is needed to establish exactly how this might work and what kinds of interoperability are possible between meta-data systems.

With the World Wide Web, we have a glimpse of the potential of a global network of linked resources, where linking mechanisms are likely to become more and more important. In some ways they are fundamental to much work in the humanities, which is about making connections between items of information and associating some intellectual rationale with those connections. At a more practical level, we need ways of linking transcriptions of text to the digital image of the source at a fine level of granularity, and of linking areas of an image to descriptive information about those areas. In most current systems links are software dependent and can only be created and accessed via that software. HyTime, the SGML application for hypermedia time-based information, provides one method of software-independent linking. The TEI Guidelines incorporate a set of linking mechanisms modeled on those in HyTime. Both of these have been little used so far because of a lack of suitable software. More research needs to be done to determine how effective and how usable they are.

Making a link between two or more items implies that a relationship exists between them. The reason for the link is important, and what is needed is a method of representing that reason as well as a way of saying who created the link. It may be that conflicting reasons exist, in which case all need to be represented without one being privileged all the time. Pointers can be multi-headed, in which case all pointers leading from a single item ought to be documented. Links need to be made from a single point or span of information to another single point or span of information.


Representing what can be referred to here as "derived knowledge" is also likely to become more important. Derived knowledge is the result of some processing of electronic information (for example, some form of linguistic analysis, or image processing). It may be that, in the current state of our software, such a processing program is not entirely accurate (for example, a word-class tagger giving about 96 percent accuracy), but the processing may take a long time and yield results worth keeping. Ways must be found to associate this with the original material, which also enables the derived knowledge to be updated and amended both automatically and manually.
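Such an association might be kept as "standoff" annotation: the derived layer lives apart from the source and points back into it by position, so the tags can be amended without touching the original. The Python sketch below is purely illustrative; the tagger, tags, and confidence scores are invented.

```python
# Standoff annotation: derived knowledge (here, hypothetical word-class tags)
# is stored separately and linked to the source by token position, so the
# original text is untouched and the derived layer can be amended later.
text = "The blessed damozel leaned out"
tokens = text.split()

# Output of an imaginary tagger, keyed by token index, with a confidence
# score so that uncertain analyses can be revisited.
derived = {
    0: ("determiner", 0.99),
    1: ("adjective", 0.97),
    2: ("noun", 0.90),
    3: ("verb", 0.95),
    4: ("particle", 0.60),  # low confidence: a candidate for manual review
}

# A manual correction amends the derived layer without altering the source.
derived[4] = ("adverb", 1.0)
assert text == "The blessed damozel leaned out"  # source unchanged
```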

For the more immediate future, ways of representing some kinds of source material must be developed further to bring them up to the level that already exists for other types. Current methods of encoding newspapers, papyri, inscriptions, text on vases and other artifacts, early printed books, spoken texts, and historical dictionaries are acknowledged to be primitive at present. Linguistic information will become increasingly important as we look toward better retrieval methodologies, and the multilingual aspects of this are very relevant for the arts and humanities. Another area of direct concern at present is what to do about the large amounts of "legacy" data that is already in electronic form, but represented in a way that is now acknowledged to be deficient. Research is needed to perform more "intelligent" conversions that can begin to handle at least some of the incomplete representation of the original in the electronic source.

Cost factors also need to be examined in more detail. Given the high cost of creating electronic resources, it seems important to represent the information in such a way that it can be used for many different purposes. The scheme ought also to be incremental, thus enabling new knowledge to be added to already existing information. In the arts and humanities, the quality of the information is also extremely important. People are often unwilling to use material that is perceived to be inferior in quality to the original. Electronic texts that have obvious typographical errors have been heavily criticized, as have low-resolution images in which the detail cannot be easily seen. Research is needed to determine what is the minimum level of quality acceptable to most users, what are the circumstances where a very high level is essential, and what are the relative costs associated with this.

RESOURCE SEARCH AND DISCOVERY

Gary Marchionini, University of Maryland

Electronic technology has begun to change what information is available [6] and how that information is located and used. The first of these changes relates to remote access: instead of traveling to the sources of information, scholars use technology to bring information to them. One important consequence of remote access is the broadening of access to students and other novices who would not or could not bear the time and financial costs to travel to libraries, museums, and research institutes, and who might not know what to look for once they arrived. Second, electronic technology brings new genres of information that provide new challenges for search and discovery (e.g., multimedia, interactive ephemera, etc.). Electronic technology exacerbates the traditional problems humanities scholars have found in documenting and locating non-textual materials. Third, change is due to electronic tools and the strategies that electronic representations make possible. The emphasis here is on tools and strategies for resource search and discovery, although we will continue to see closer integration with tools and strategies for creating, using, and communicating information. Such developments imply that creators who choose to become more closely involved with consumers must take more responsibility for documenting their work and making it accessible.

In archives, libraries, and museums, search and discovery are facilitated by finding aids, catalogs, and guides that organize the information space for information seekers. Similar devices are appearing for electronic resources as well. An ongoing research challenge is to discover appropriate representations for information, and new search and discovery tools and strategies that leverage the strengths of computers and telecommunications networks.

Search implies an effort to locate a known object; the information seeker has in mind specific characteristics or properties of the object, which are used to specify and guide search activity. Discovery implies an effort to explore some promising space for underspecified or unknown objects; the information seeker has in mind general characteristics or properties that outline an information space in which perceptual and cognitive powers are leveraged to examine candidate objects (elsewhere [10] I have distinguished search and discovery as analytical and browsing information-seeking strategies, respectively). In general, discovery emphasizes the location of the promising space, such as a collection or resource (e.g., [2]). Electronic technology provides new tools for each of these classes of strategies and also blurs the traditional boundaries between them.

STATE OF THE ART
Scholarly search and discovery depend on mappings between conceptual space and physical locations. Classification systems organize information objects, thesauri map these organizations onto word labels, and catalogs provide pointers from labels to physical objects. Traditionally, there have been clear demarcations between secondary information objects such as indexes and catalogs, and primary information objects such as books and physical artifacts. The Internet includes secondary and primary information objects, and today's interfaces make little distinction between these representations, effectively blurring these boundaries. Thus, electronic technology influences information seeking by changing both the traditional tools that support search and the strategies used for information seeking. Any attempts to develop cataloging schemes for Internet resources must not only take into account these differences but also address the difficulty of documenting dynamic and ephemeral information objects such as ftp and Web sites. It is certainly too soon and probably wrong to aim at developing collection development policies and a master catalog for the Internet as a whole. Nonetheless, specific digital libraries and resource collections have begun to take advantage of information retrieval and information-seeking research to make information more easily and readily available.


Search

Information retrieval research has yielded several approaches to the problem of matching queries to documents and object surrogates. These approaches have traditionally been applied to specific collections of documents (one set of resources) rather than across many different collections. The most basic advantage of text in electronic form is the ability to do string search: to locate all occurrences of a string of characters in a text or corpus. Although many algorithms support string search, inverted file indexes are used in most large-scale systems to support free-text searching. Building on string search techniques, scholars are able to develop concordances (e.g., for the Dead Sea Scrolls) and explore word usage frequencies across authors or works (e.g., the Thesaurus Linguae Graecae with Pandora). Although many of these efforts are currently restricted to stand-alone, proprietary collections, some are available through the Internet. There has been little progress in indexing non-textual materials, although scene changes and color patterns have been used to augment video and graphical databases. Most non-textual objects are located through textual descriptions or linear scanning.
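The inverted file idea can be sketched in a few lines of Python. The documents below are invented; real systems add stemming, stop-word handling, and index compression.

```python
# A minimal inverted file index: for each word, the set of document numbers
# in which it occurs. The texts are invented for illustration.
from collections import defaultdict

docs = {
    1: "the blessed damozel leaned out",
    2: "the damozel was silent",
    3: "heaven was silent",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

# Free-text search is then a dictionary lookup rather than a scan of every text.
print(sorted(index["damozel"]))  # -> [1, 2]
print(sorted(index["silent"]))   # -> [2, 3]
```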

Another major development in search is the ability to rank documents according to one of many statistical or probabilistic algorithms that use word or phrase frequency data to match queries with documents and rank results accordingly. Although such activities are computationally intensive, today's computers are able to manage representations of documents as n-dimensional vectors and compute similarity measures for documents and queries in n-dimensional space. These approaches have gained commercial appeal (e.g., Dialog's Target and Lexis/Nexis Freestyle); many Internet resources are now using statistical or probabilistic search engines on their servers (e.g., several WAIS-based services are available; the Library of Congress Thomas system uses the InQuery search engine). In most cases these approaches provide keyword access (based on all words in the corpus except some small set of common words) rather than subject access (based on a carefully constructed controlled vocabulary used by indexers to describe the content of the object). Although ranked retrieval offers good advantages to novice searchers and a viable alternative to Boolean-based search for experienced searchers, we are a long way from providing all and only relevant information to information seekers who pose word-based queries.
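The vector-space idea behind ranked retrieval can be sketched as follows. This is a toy Python example using raw term frequencies and invented documents; production systems weight terms (e.g., by TF-IDF) and use far more efficient data structures.

```python
# Documents and the query become word-frequency vectors; documents are then
# ranked by cosine similarity to the query.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = {
    1: "the damozel leaned out of heaven",
    2: "heaven was silent",
    3: "the scrolls were found in a cave",
}
query = Counter("damozel in heaven".split())

ranked = sorted(docs, reverse=True,
                key=lambda d: cosine(Counter(docs[d].split()), query))
print(ranked)  # -> [1, 2, 3]: documents ordered by similarity to the query
```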

A third set of approaches to searching leverages the logic of discourse or substantial knowledge bases to contextualize queries or to possibly modify them. For example, the Perseus system [4] includes a morphological analyzer that goes beyond string search to provide variant forms for Greek words. Some linguists aim to develop generic grammars that represent the domain of possible logical statements and parsing routines that map natural language queries and documents onto the grammar. Other researchers have developed schemes for taking advantage of meta-knowledge provided by authors or publishing specialists. For example, the Text Encoding Initiative (see Hockey paper in this collection) promotes the use of SGML coding in scholarly texts so that information seekers can use these codes for locating and analyzing texts. Another line of research aims to develop thesauri (e.g., the Art & Architecture Thesaurus) that provide controlled entry points for information seekers as they formulate queries or that are applied automatically to modify or expand queries during the retrieval process. Proficient searchers can certainly use a thesaurus to good advantage, but automatic query expansion based on a thesaurus has not generally yielded improved search results (e.g., [8] and [12]).
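Automatic query expansion against a thesaurus, of the kind evaluated in [8] and [12], can be sketched as follows. The thesaurus entries here are invented for illustration and are not taken from the Art & Architecture Thesaurus.

```python
# Illustrative sketch of automatic query expansion against a tiny
# hand-made thesaurus (hypothetical entries only).
THESAURUS = {
    "church": ["cathedral", "chapel", "basilica"],
    "painting": ["fresco", "panel painting"],
}

def expand(query):
    """Append thesaurus synonyms for each original query term."""
    terms = query.lower().split()
    for t in list(terms):          # iterate over a snapshot
        terms.extend(THESAURUS.get(t, []))
    return terms

print(expand("church painting"))
```

The sketch also hints at why expansion can hurt precision: every added synonym widens recall whether or not it matches the searcher's intent.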

A fourth class of research aims to develop filtering systems that automatically route potentially relevant information to scholars. Search depends on specification of the sought object, and filtering depends on specification of the user. Libraries have traditionally selectively disseminated information to scholars, devoting human effort to scan information services according to institutional and individual interest profiles. Online services allow users to define interest profiles (usually word based), then alert them when information objects arrive that fit the profile (e.g., document delivery services such as UnCover). Different implementations may use any combination of the search algorithms above. On the Internet, several network news filters adapt as users provide positive and negative feedback, and there are programs of research to develop active agents that roam the network to locate profile-appropriate information and sometimes cooperate with other software agents.
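A word-based interest profile that adapts to positive and negative feedback, as the network news filters above do, might look like this in outline. The weights, names, and update rule are illustrative assumptions, not any particular filter's design.

```python
# Sketch of an adaptive word-based interest profile: scoring sums
# the weights of a document's words; feedback nudges those weights
# up or down. The 0.5 step size is an arbitrary illustration.
from collections import Counter

class Profile:
    def __init__(self, seed_words):
        self.weights = Counter({w: 1.0 for w in seed_words})

    def score(self, text):
        """Higher score = better match to the profile."""
        return sum(self.weights[w] for w in text.lower().split())

    def feedback(self, text, relevant):
        """Adapt weights from a positive or negative judgment."""
        delta = 0.5 if relevant else -0.5
        for w in set(text.lower().split()):
            self.weights[w] += delta

p = Profile(["manuscript", "papyrus"])
before = p.score("new papyrus finds")
p.feedback("new papyrus finds", relevant=True)
after = p.score("new papyrus finds")
print(before, after)  # -> 1.0 2.5
```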


Finally, some research has attempted to automate traditional reference and question answering services. Early efforts used expert-system technology to automate selected reference services; today's efforts aim to go beyond the simple frequently asked question (FAQ) services to develop multiple tiers of online reference support (e.g., [1]).

Discovery

Browsing has many attractions for scholars: exploration, contextualization, and serendipity support the discovery of new connections between known ideas as well as pertinent new informational resources. In manual environments, browsing has been done in specific collections (e.g., a section of shelves). Electronic technology in general and the Internet in particular have greatly expanded the universe of browsable material by bringing it to the information seeker at the desktop. Because the Internet connects a multitude of collections (on all topics, in various media, and using different organizational schemes), discovery has become complicated by the need to first limit browsing to a set of resources. Developing tools and strategies for identifying which resources to browse amid this wealth is thus a primary research challenge.

One form of guided discovery is exemplified by hypertext systems. Most hypertexts use explicit links denoted by link anchors (buttons, highlighted text) to suggest routes for users to follow. In stand-alone hypertext systems (i.e., specific collections), users can navigate effectively by following explicit links. Many scholars consider such links to be editorial acts; thus aggregations of existing materials woven together with hypertext links represent added-value derivative works at least and original scholarly interpretations at best. The immense popularity of the World Wide Web is based on the ease with which users can follow hypertext links with public-domain and easy-to-use client software often called browsers (e.g., Mosaic, Netscape). Hypermedia systems such as Perseus and Piero press the links further by offering implicit or computed links that are made available as the results of queries entered by the user. Electronic texts that use SGML or other markup codes can also offer on-the-fly link constructions that allow information seekers to follow paths defined by their articulated needs rather than predefined links provided by authors or editors. Other approaches include dependencies based on system state (e.g., Petri nets) and scripts that compute links based on user behavior. Even after users have limited their discovery to a set of pertinent resources, personal discipline is required to remain within that set (e.g., today's browsers do not dynamically limit links to the sites contained in a preliminary selection of resources).

Discovery depends on both locating candidate objects and recognizing relationship(s) between them and the problem under investigation. The interplay between the perceptual aspects of browsing and the cognitive aspects of reflection and evaluation is best supported by systems that present accurate and well-documented representations for objects (i.e., authors or their agents are explicit about their perspective) and allow users rapid and precise control. Direct manipulation interfaces (see Kolker and Shneiderman paper in this collection) best illustrate such interfaces in computing environments. Developments such as the use of thumbnail images as well as text-based descriptions provide new types of surrogates for information objects and support rapid scanning and browsing. Multiple levels of representation for texts are emerging in networked environments as users move from the entire Internet, to a subset (possibly ranked) of resource titles, to outlines or tables of contents for specific objects, to extracts from the objects, to the full representation of the object, and eventually to related objects.

Integration of Search and Discovery
Because electronic environments are blurring demarcations between search and discovery strategies, several developments suggest research directions. First, one way to improve the results of a search is to use relevance feedback. Given a set of objects retrieved for a query, users may be able to identify which are appropriate to the need and which are not. These judgments are fed back to the system, and the original query is either modified or a new query is formulated that combines the original query with the additional information gained through feedback. Relevance feedback illustrates the linkage between search and discovery: a search query serves to identify an intellectual neighborhood for the information seeker to examine (often by browsing), and the results of the examination are used to refine the neighborhood. This process mirrors what information seekers do in manual environments, but the computational tools multiply the number of iterations possible per unit time. Just as rapidly displayed, coordinated still images become moving pictures beyond thresholds of 10 to 15 frames per second, this quantitative increase may lead to qualitative shifts in search and discovery. One possible avenue of development in this regard is hierarchical (cascading) dynamic query systems.
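The query-modification step of relevance feedback is often implemented with a Rocchio-style update, a classic technique the text does not name: the query vector moves toward judged-relevant documents and away from non-relevant ones. A hedged sketch follows; the 1.0/0.75/0.25 weights are conventional defaults, not canonical values, and the tiny vectors are invented.

```python
# Rocchio-style relevance feedback: new_query =
#   a*query + b*centroid(relevant) - c*centroid(nonrelevant),
# keeping only positively weighted terms.
from collections import Counter

def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {t: w / n for t, w in total.items()} if n else {}

def rocchio(query, relevant, nonrelevant, a=1.0, b=0.75, c=0.25):
    rel, nonrel = centroid(relevant), centroid(nonrelevant)
    terms = set(query) | set(rel) | set(nonrel)
    new = {}
    for t in terms:
        w = a * query.get(t, 0) + b * rel.get(t, 0) - c * nonrel.get(t, 0)
        if w > 0:
            new[t] = w
    return new

q = {"fresco": 1}
rel = [Counter({"fresco": 1, "giotto": 1})]
non = [Counter({"fresco": 1, "restoration": 2})]
print(rocchio(q, rel, non))  # -> {'fresco': 1.5, 'giotto': 0.75} (order may vary)
```

Note how "giotto" enters the reformulated query from the relevant document while "restoration" is suppressed; this is the mechanical form of refining the intellectual neighborhood.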

Another development that improves search and discovery on the Internet is the use of indexing programs called spiders or robots that systematically link to Web sites, note whether the site has previously been visited, and record basic metadata about each site (sites may also contribute indexing information voluntarily). These programs have made the Web somewhat searchable without constraining the browsing features of servers or clients. It is important to note that these services do not really represent a catalog of the Internet but rather a listing of home page words. Additionally, to avoid tying up network resources, spiders do not traverse all links in a site (thus a more substantive image of what the site contains and to what it links is not available). Another system provides full-text retrieval but also allows searches on SGML tags and supports multilingual searching. Another approach is illustrated by the Harvest project, which separates the indexing gatherers from the indexes themselves (brokers). This allows multiple and customized indexes to be tailored for specific communities.
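The visit-once, record-metadata behavior of a spider can be outlined without any network code by crawling a hypothetical in-memory link graph. Real spiders fetch pages over HTTP and observe politeness rules; the graph, titles, and depth limit below are illustrative assumptions that also show why deep links go unindexed.

```python
# Schematic spider over an in-memory link graph: visit each page
# once, record basic metadata (title words), and stop at a depth
# limit, as real spiders often do to spare network resources.
def crawl(graph, titles, start, max_depth=2):
    seen, queue, record = set(), [(start, 0)], {}
    while queue:
        url, depth = queue.pop(0)          # breadth-first order
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        record[url] = titles.get(url, "").lower().split()
        for link in graph.get(url, []):
            queue.append((link, depth + 1))
    return record

graph = {"/home": ["/a", "/b"], "/a": ["/deep"], "/deep": ["/deeper"]}
titles = {"/home": "Museum Home", "/a": "Collections", "/b": "Visit",
          "/deep": "Archive", "/deeper": "Item"}
print(sorted(crawl(graph, titles, "/home", max_depth=1)))  # -> ['/a', '/b', '/home']
```

With `max_depth=1` the pages "/deep" and "/deeper" are never recorded, mirroring the text's point that spiders yield a listing of surface pages rather than a catalog of a site.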

The most important illustrations of integration are the developments in interactive interfaces that closely couple search, evaluation, and reformulation. Dynamic queries, fisheye views, semantic maps, and other visualization mechanisms illustrate such integration. The quality of electronic display continues to improve as fonts, backgrounds, color, and resolution continue to offer more accurate representations for paper documents and other information objects (see Kolker and Shneiderman). One project that tightly couples textual information and graphical information is the Piero project [9], where relational database entities are linked to a three-dimensional visual database, allowing users to search and discover textually or visually.

Challenges in the Humanities
Although the research and development trends discussed above are applied in all domains, the humanities offer special challenges for search and discovery. First, the humanities celebrate individuality; information resources take many forms, and scholars often resist the imposition of standards. These effects are most apparent in word-based searching, which is complicated by the opposing concerns of creators, who endeavor to find unique and figurative language (whether the language of expression is textual, aural, or visual), and searchers, who endeavor to map their needs onto language. Asking authors to use standard language is ludicrous, so it remains for editors, librarians, curators, and other information specialists to create customized indexes and guides to the literature. Furthermore, individuality leads to the creation of many fairly small corpora specific to individual scholars rather than a few huge collections created by large communities of scientists (e.g., the databases of the Human Genome Project, Earth Observation System databases). Thus, in the humanities, it is especially critical to create and maintain specialized and multiple indexes.

Second, information resources in the humanities are less sensitive to time than resources in the sciences; although some searching in the humanities may be limited by period, the temporal range is typically wide. Thus, finding aids and interfaces may not be able to easily leverage time constraints. Also, these indexes and guides themselves must evolve as word usage evolves over time.

Third, humanities resources are often multilingual. Individual works may use expressions from multiple languages, and resources related to a topic or artist may be available in multiple languages. Since English is a de facto standard for science and technology, most of the discovery tools are specific to English (although statistical retrieval techniques such as latent semantic indexing and n-gram analyses (e.g., [3]) have proved generalizable across multiple languages). Machine translation research that uses an interlingual language (e.g., [5]) may also prove useful for indexing multilingual corpuses.
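Character n-gram matching, one of the language-independent techniques cited above (e.g., [3]), needs no dictionary or stemmer, which is why it generalizes across languages. Below is a minimal sketch using a Dice coefficient over trigram sets; the choice of measure and the examples are illustrative assumptions, not the method of any cited system.

```python
# Sketch of character n-gram indexing: strings are compared by the
# overlap of their character trigram sets, with no language-specific
# rules at all.
def ngrams(text, n=3):
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def similarity(a, b, n=3):
    """Dice coefficient over character n-gram sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

# Variant spellings still match reasonably well:
print(round(similarity("colour", "color"), 2))  # -> 0.57
```

The same function applied to accented or non-Roman text works unchanged, which is the property that makes n-gram analysis attractive for multilingual corpuses.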

Fourth, data acquisition and digitization are expensive and time-consuming. Simply adopting a controlled vocabulary such as the AAT is a significant change for cataloging new acquisitions, but the retrospective conversion of local cataloging records is intellectually challenging (and controversial) as well as expensive. Also, digitizing text is challenge enough, but much of the content of the humanities is graphical, aural, and three-dimensional. Capturing and storing images or sound at high resolutions is both time-consuming and open to criticism vis-a-vis interpretiveness. Furthermore, the compression scheme used will determine or limit what surrogates can be made available for browsing.


RESEARCH NEEDS AND DIRECTIONS
The special challenges the humanities offer for search and discovery research and the continued evolution of the Internet suggest several themes for future research and development.

Multiple Approaches
Because humanities scholars typically do not look for answers to well-defined questions but rather elaborate threads of discourse, traditional database techniques will not suffice. Humanities scholars and communities need to create and share thematic indexes specific to their own interests and expertise. The metaphor of self-organizing systems (many minds creating entry points for search and discovery) seems more appropriate both for a worldwide network of information and for the spirit of the humanities than the top-down metaphor of one great mind (or committee) that provides an organizational framework for some master index. Because it is in their personal interest to create such thematic indexes, humanities scholars will do so without funding (funding will speed up this process). There are, however, two crucial needs for research support in this regard.

First, we must learn how to aggregate thematic indexes and forge among them links that are activated according to the ontological perspective of the information seeker (this may be thought of as a kind of intellectual interoperability). Thus, information seekers can specify a school of thought and be given sets of links customized to that perspective. Another user with a different perspective would find a different set of links for the same corpus. Research in thesaurus merging ([7]), scheme merging ([11]), and ontology definition ([13]) may eventually be helpful here.

Second, scholars should be encouraged to create pathfinders: guides to themes or topics that give not only pointers to information resources but also critical commentary and interpretations about those resources. Since it is likely that we will see the continued development of independent, non-standard collections of information, each a uniquely organized expression celebrating human innovation and creativity, it makes sense that these collections themselves should become subject to study, critique, and interpretation. Thus, the purposeful aggregation and added-value commentary that define pathfinders in the humanities represent a form of scholarship that deserves directed research attention. Commentaries have long been part of scholarly practice in the humanities, but electronic environments provide new possibilities for creating critical threads through the electronic morass that themselves may include interactive aspects; e.g., using a pathfinder a second time will be different since it will take advantage of knowledge about what you have already examined. How this knowledge is used requires creative and scholarly decisions on the part of the creator of the pathfinder.

Because Internet resources will be available to a broad range of users, from children to seasoned scholars, there must be simple as well as powerful tools for search and discovery. Although these are not mutually exclusive requisites, there is a need for the development of progressively powerful tools as well as tools tuned to specific types of users (see Murray paper in this collection). A related need is for systems that provide multilingual interfaces as well as search and discovery tools that handle multilingual corpuses. Both of these needs have positive implications for the humanities, since they will lead to new classes and groups of users.

Other Needs
Clearly, more materials in the humanities need to be transferred to electronic form (see Kenney paper in this collection). Especially for text-based fields, techniques for automatically categorizing and summarizing text fragments will be necessary if information seekers are to maximize their time and memory resources when examining and scanning candidate texts. It seems prudent to look for ways to combine statistical approaches with knowledge-based approaches. For image-based fields, techniques to extract and match patterns must be combined with whatever word-based information is available (see Romer paper in this collection). Regardless of the medium (text, audio, images), interface mechanisms that allow rapid scanning (e.g., zooming and panning; fast-forward; multiple display panels) are essential to an integrated search and discovery environment.

Finally, scholars must consider their audiences both during and after the creation of their work. First, during creation, the work can be tailored to make it easier for the audience to find it. On the crass side, this is advertising before art; on the scholarly side, this is tailoring expression to be best understood by one's public. Second, after creation, the scholar can point the work at audiences. This is what publishers currently do, but a networked world allows creators to broadcast or narrowcast as they please. This closer link between creators and consumers depends on the development of tools that support creation, communication, and maintenance of digital work. (We can imagine next iterations of hypertext authoring systems such as Storyspace that automatically generate HTML, and browser scripts that monitor usage statistics for automatic (or random) mutations or author version control.) Surely, tools will emerge that allow creators to produce viral works that change depending on use (or, alternatively, appear in different forms in different environments). Persistence and stability enable static indexing and locational aids to work in today's libraries. We need research to determine how to document, find, and use new genres of interactive and evolving intellectual products.

REFERENCES

1. Ackerman, M. "Answer Garden: A Tool for Growing Organizational Memory." Ph.D. dissertation, Massachusetts Institute of Technology, 1994.

2. Coalition for Networked Information (in preparation). "Networked Information Discovery and Retrieval" (written by Lynch, Summerhill, and Preston). [May be accessed via ftp.cni.org].

3. Cohen, J.D. "Highlights: Language- and Domain-independent Automatic Indexing Terms for Abstracting." Journal of the American Society for Information Science 46, no. 3 (1995): 162-174.

4. Crane, G. Perseus 1.0: Interactive Sources and Studies on Ancient Greece. New Haven, CT: Yale University Press, 1992.

5. Dorr, B.J. "The Use of Lexical Semantics in Interlingual Machine Translation." Machine Translation 7, no. 3 (1992-93): 135-193.

6. Getty Art History Information Program, American Council of Learned Societies, and Coalition for Networked Information. Humanities and Arts on the Information Highways: A Profile. Santa Monica, CA: Getty Art History Information Program, 1994.

7. Hafedh, M. and Rada, R. "Merging Thesauri: Principles and Evaluation." IEEE Transactions on Pattern Analysis and Machine Intelligence 10, no. 2 (1988): 204-220.

8. Jones, S.; Gatford, M.; Rugg, G.; Hancock-Beaulieu, M.; Robertson, S.; Secker, J.; and Walker, S. "Interactive Thesaurus Navigation: Intelligence Rules OK?" Journal of the American Society for Information Science 46, no. 1 (1995): 52-59.

9. Lavin, M. "Researching Visual Images with Computer Graphics." Computers and the History of Art 2, no. 2 (1992): 1-5.

10. Marchionini, G. Information Seeking in Electronic Environments. New York: Cambridge University Press, 1995.

11. Nica, A., and Rundensteiner, E. "Uniform Structured Document Handling Using a Constraint-based Object Approach." Proceedings of Advances in Digital Libraries 1995, pp. 42-59. New York: Springer-Verlag, 1995. [preliminary version]

12. Voorhees, E. "On Expanding Query Vectors with Lexically Related Words." Proceedings of the Second Text Retrieval Conference (TREC-2), pp. 223-231. Washington, DC: National Institute of Standards and Technology, 1994.

13. Wiederhold, G.; Wegner, P.; and Ceri, S. "Toward Megaprogramming." Communications of the ACM 35, no. 11 (1992): 89-99.

NOTES

The author wishes to thank David Bearman, Gregory Crane, Elli Mylonas, and Michael Neuman for comments on a previous version of this essay.

See the Center for Intelligent Information Retrieval Web site; for information on Inquiry see http://ciir.cs.umass.edu/.

See the Oard Web site for a set of pointers to filtering research: http://www.enee.umd.edu/medlab/filter/.

For example, the Lycos [http://lycos.cs.cmu.edu/] and Yahoo [http://www.yahoo.com/] services allow simple word searching on several million Web sites; Yahoo provides a simple classification system for limiting searching.

OpenText [http://www.opentext.com:8080/omw.html]

http://harvest.cs.colorado.edu/


CONVERSION OF TRADITIONAL SOURCE MATERIALS INTO DIGITAL FORM

Anne R. Kenney, Cornell University

STATE OF THE ART
This paper will focus on the conversion into electronic form of traditional source materials, including books, journals, manuscripts, graphic materials, and photographs, which serve as the primary documentation for research in the arts and humanities. Although it acknowledges other means for electronic conversion, this paper will emphasize the use of imaging technology to produce digital surrogates for paper- and film-based sources.

Digital images are "electronic photographs" that can accurately render not only the information contained in original documents, but also their layout and presentation, including typeface, annotations, and illustrations. High fidelity to the source material can be obtained in digital images, which can be displayed on-screen, used to produce paper and film copies, or transmitted over networks to researchers around the world. The main drawback to digital images today is that they are "dumb" files, not data that can be manipulated (for example, searched and indexed).

Efforts to convert materials originally created in print form to machine-readable formats have been ongoing for nearly half a century, but the major thrust for arts and humanities research began in the 1970s when important sources in linguistics, classics, religion, and history were converted to electronic texts. The Thesaurus Linguae Graecae (TLG), begun in 1972, was the first significant American conversion effort, and since then a growing number of institutions have initiated major projects to create computer-processible electronic texts. The Center for Electronic Texts in the Humanities (CETH), established by Rutgers and Princeton in 1991, maintains an inventory of existing electronic texts (available through RLIN, the Research Libraries Information Network) and provides summer seminars on setting up and managing electronic text centers.

Such efforts have not sought to replace source documents but to create electronic transcriptions of texts for quantitative and qualitative analysis. The creation of electronic texts has expanded and matured with the development of standardized approaches and common protocols such as the Text Encoding Initiative (TEI), a collaborative effort to define means for encoding machine-readable text that would make electronic exchange feasible, and the widespread adoption of ISO 8879, Standard Generalized Markup Language (SGML), a standard set of instructions for composing structured machine-readable documents that encodes the function rather than the appearance of elements within a document. Notable current efforts in the use of such encoding may be seen on the World Wide Web, which supports Hypertext Markup Language (HTML) documents, and in the California Heritage Digital Image Access project to develop navigation tools to move from online catalog entries to SGML-encoded finding aids and ultimately to a database of digital images documenting California history.

Beginning in the mid-1980s, efforts to use imaging technology to create digital surrogates began, first at the National Library of Medicine, then at the National Archives and Records Administration. Although these pioneering efforts provided significant information on the use of digital imaging, they did not result in sustained efforts, primarily because they were difficult to justify economically. By the beginning of this decade, however, several developments converged to promote the use of digital imaging, including the following:

dramatic improvements in personal computer technology, including rapidly declining costs coupled with greatly increased power and storage capacity;

consequently, exponential growth in the use of personal computers;


spread of high-speed, high-bandwidth networks accessible to millions worldwide;

emergence of client/server architecture and network-organizing architectures such as the World Wide Web; and

high-quality, high-production scanning systems.

At approximately the same time, a major national initiative to preserve the intellectual content of brittle books through microfilming, spearheaded by the Commission on Preservation and Access and the National Endowment for the Humanities, opened the door for the acceptance of surrogates or replacements for original sources on a grand scale, which in turn stimulated the use of digital imaging technology in library applications.

By the mid-1990s, digital imaging was making inroads into the domain hitherto reserved for textual conversion projects. The technological infrastructure had matured enough to support the creation, storage, transmission, and display of digital images. Although digital image files are much larger than equivalent text files, they became cheaper to produce (approximately $.25/image), whereas a fully corrected encoded-text equivalent could cost ten times that amount. Further, many of the documents consulted by researchers, particularly in the arts and humanities, are graphic (photographs, illustrations, prints, drawings, maps) and currently cannot easily be rendered as encoded files. The process of converting text-based material to alphanumeric files through optical character recognition (OCR) programs begins with the creation of digital images, and the two steps (imaging and text conversion) could be uncoupled and conducted at separate times. Proponents of imaging argue that the latter step could await user needs and capabilities for sophisticated processing of text or the maturation of OCR programs to render more accurate representations of information, particularly for sources in non-Roman languages, handwritten documents, or those that are unevenly printed or produced with older type fonts. Today, imaging is the most cost-effective means for retrospectively converting arts and humanities source materials to digital form, and represents in effect the lowest common denominator for network distribution.


Nonetheless, user expectations at the terminal are that the full text of important sources for their discipline should be available online, quickly accessible, and fully manipulatable. Researchers who accept and use printed books and journals (or even microfilm) often question the value of a digital image surrogate: "What good is this image if I can't search it with keywords?" This question must be satisfactorily addressed in the next few years if digital imaging technology is to be used effectively in a massive conversion of text-based sources and in the development of distributed digital libraries. Currently the most promising use of digital image technology may lie in the rendering of graphic and photographic materials.

CURRENT RESEARCH AND TRENDS
Two major trends have characterized the most significant arts and humanities projects involving the use of digital image technology over the past five to seven years: the move toward the creation of sizeable databases and their initial, non-networked use; and investigations into issues associated with image capture. Among the former, the most noteworthy example is the digitizing of eleven million pages from the Archivo General de Indias in Seville, Spain, that document the Spanish colonization of the Americas. Begun in 1988 as part of the commemoration of the 500th anniversary of Christopher Columbus' discovery of America, this project has completely revolutionized archival practice in the Archivo and researcher use of primary documents. While the scanning (100 dpi, with 16 levels of gray retained) does not capture all the information contained in the source documents, the objective to provide on-screen use has been successfully met. Almost all use of converted materials in research occurs via computer. This project's most significant accomplishment has been the creation of machine-readable finding aids and catalogs providing access to digitally rendered documents down to the item level. Initial plans are being developed to extend network access to the archives to other Spanish repositories. It is uncertain at this time whether or when international access over the Internet will be made available. Consideration is being given to distributing the most significant portion of the collection via CD-ROM.

Other major conversion projects include those conducted at the Library of Congress (American Memory), the National Agriculture Library (which has embraced a goal of replacing the traditional collection with a digital one), the Naval Research Lab (which is converting its large collection of unclassified documents), the National Library of Medicine (where access to 60,000 images of photographs, art work, and printed texts is provided), and Cornell and Yale universities. Within the past year, multi-institutional digital library initiatives in the arts and humanities have been launched or announced, including those at the Library of Congress (to digitize five million pages of American history sources by the year 2000); the Making of America Project (Cornell, Michigan, and other research institutions) to convert and make network-accessible 10,000 volumes (and ultimately 100,000 volumes) on American history; the UNESCO-sponsored Memory of the World Project; and the recently announced initiative to create a national digital library on American heritage and culture by a federation of thirteen research institutions. The federation will formulate selection guidelines, adopt common standards and best practices for conversion, guarantee universal accessibility across the Internet, facilitate archivability and enduring access, and evaluate use and the effects on libraries and other institutions.

Although a number of digital imaging projects are beginning to evaluate the use of digitized material (including those sponsored by the NSF/ARPA, the Mellon Foundation, and the Getty Art History Information Program/MUSE Educational Media), more rhetoric than substantive information has emerged on the impact on scholarly research of creating digital collections and making them accessible over networks. Preliminary information should be forthcoming in the next two years, but comprehensive data may well await the creation of critical masses of digitized collections that can sustain basic research, and the means not only for navigating collections but also for using them effectively in an online environment.

The second major research trend is defining image capture guidelines and quality assessment processes in the absence of any official standards governing image quality for digitally rendered documents. Under the direction of Michael Ester, the Getty Art History Information Program pioneered work in examining the relationship between image quality and viewer perception, principally with graphic materials. Cornell and, more recently, the Library of Congress in conjunction with Picture Elements have established quality benchmarks for the conversion of textual sources that are based on the attributes of the source documents themselves and the effects on image quality of resolution, gray scale, and compression. The two institutions have agreed to collaborate on a joint investigation to extend this work to a broad range of source materials. The Research Libraries Group, in cooperation with the Image Permanence Institute, explored technical issues associated with the digital conversion of photographic materials; the latter will build on this effort through a two-year project to conduct both objective and subjective image quality evaluations, develop quality benchmarks, and suggest technical standards for photographic conversion. In two complementary projects, Cornell and Yale universities will examine the costs, processes, and quality implications of creating both digital images and microfilm. Columbia University recently completed a small-scale project on the quality implications of converting oversize color maps.

The principal investigators of these and other projects have argued for digitizing in a manner that ensures full capture of the significant information present in the source documents. Some advocate the creation of an "archival" digital master for preservation purposes to replace rapidly deteriorating original source documents. Others consider the cost benefit of selecting, preparing, and digitizing material once, at a high enough level of quality to avoid the expense of reconverting at a later date when technological advances require or can effectively utilize a richer digital file. Others suggest that derivatives can be created from the master to meet current user needs, and that the quality of these derivatives is directly affected by the quality of the initial capture. Various digital outputs have different quality requirements: high resolution may be required for printed facsimiles but not for on-screen display and use.

NEAR TERM

It is anticipated that within two years, quality benchmarks for image capture for the range of paper- and film-based research materials (including text, line art, halftone, and continuous-tone images) will be well defined for a variety of outputs (paper, film, and then on-screen display). For the most part, these will be designed to be system independent, will involve the creation of sophisticated technical test targets, and will be based increasingly on the attributes and functionality characteristic of the source documents themselves. These efforts began with determining what was technologically possible; current and near-term efforts are directed at determining what is minimally required to satisfy informational capture needs. At present, the trend is to set image quality requirements at a level sufficient to capture significant informational content for a broad range of source documents, at the expense of file size, so as to avoid the labor and expense of performing item-by-item review.

Although technical, these benchmarks will also take into consideration the subjective evaluation of curators and the needs of researchers. The Image Permanence Institute plans to incorporate psychometric scaling tests in its evaluation of digitally converted photographs and photographic intermediates. Quality assessments will extend beyond capture requirements to the presentation and timeliness of delivery options.

From an industry perspective, research into image capture has slowed; the current emphasis is on bringing to market scanning systems that offer a range of moderate- to high-quality capture options and, more importantly, faster throughput, greater flexibility in accommodating a variety of source documents, and better calibration across scanners and peripherals (e.g., printers and display devices). The industry will move to high-production gray-scale/color scanning systems that can meet the performance records of bitonal (black-and-white) scanners.

The most promising scanning devices to appear in the next several years will be planetary and digital cameras, such as those now coming on the market from Minolta and Zeutschel, that can handle bound volumes, three-dimensional objects, fragile material, and oversize documents in a non-damaging fashion and without resorting to the creation of photo-intermediates. Unlike flatbed scanners, digital cameras will enable operators to exercise greater control over resolution, lighting, and color balance. It may be several years before digital cameras compete effectively with photography, however. Increased quality and performance can also be anticipated from film scanners, such as the Sunrise scanner, which allows for high resolution and gray-scale capture.

Technically sophisticated software for image quality assessment and calibration, such as ImageXpert™, which incorporates fourteen different tests (e.g., modulation transfer function (MTF), signal-to-noise ratio, gray resolution, dimensional accuracy, and color registration and consistency), will provide operator-independent objective tests of system performance. Until recently, such tests were beyond the capabilities of anyone outside industry or research labs. Color management systems are also now available to calibrate color data across imaging systems and individual components (scanners, monitors, printers). The Munsell Lab at the Rochester Institute of Technology has conducted extensive research on managing color data through the whole digitization process. Several projects focusing on art reproductions, VASARI and Methodology for Art Reproduction in Color (MARC), are exploring alternatives for achieving true color fidelity.

The next generation of software programs to govern image quality should incorporate smart systems for automatic, on-the-fly application of appropriate capture processes (resolution, gray scale, filters, etc.) based on an assessment of document attributes and explicit institutional/user profile requirements. An early prototype may be seen in the Xerox XDOD "autosegmentation" feature, which detects the presence of halftones and applies descreening and halftone filters to those portions while treating text and line art with separate image enhancement algorithms designed to optimize contrast and detail. Instead of creating separate windows for halftone and textual content, a future approach may be to create layered images, with bitonal capture preserved in one layer, tonal reproduction in another, and color saturation in a third.
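The attribute-driven selection of capture processes described above can be sketched as a simple rule table. The attribute names, resolutions, and filter names below are purely illustrative assumptions, not taken from any actual scanning product:

```python
# Hypothetical sketch of profile-driven capture settings: given detected
# document-region attributes, pick a resolution, bit depth, and filter set.
# All attribute names and values here are illustrative inventions.

CAPTURE_RULES = [
    # (predicate over attributes, settings to apply)
    (lambda a: a["content"] == "halftone",
     {"dpi": 400, "bits": 8, "filters": ["descreen", "halftone"]}),
    (lambda a: a["content"] in ("text", "line_art"),
     {"dpi": 600, "bits": 1, "filters": ["contrast", "edge_sharpen"]}),
    (lambda a: a["content"] == "continuous_tone",
     {"dpi": 300, "bits": 8, "filters": []}),
]

def choose_settings(attributes):
    """Return the first matching capture profile for a document region."""
    for predicate, settings in CAPTURE_RULES:
        if predicate(attributes):
            return settings
    # Conservative default: full gray-scale capture at high resolution.
    return {"dpi": 600, "bits": 8, "filters": []}

if __name__ == "__main__":
    region = {"content": "halftone"}
    print(choose_settings(region))
```

A production system would derive the attribute dictionary from image analysis rather than accept it as input; the point of the sketch is only the dispatch from document attributes and institutional profiles to capture parameters.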

In the longer term, programs will contain features for automatic image quality verification, designed to check not system performance but the digital files themselves. These will automatically match quality guidelines to desired outputs: paper, film, and (in the case of on-screen display) the monitor's capability.

User requirements for derivative "access" images, including speed of display, browsing versus detailed examination, and color/tonal fidelity, will also become programmable. An early example of such considerations is seen in "progressive transmission," in which a complete but low-resolution image is transferred quickly; detail is added gradually until the full image is displayed or the reader halts the transmission. Kodak and Live Pictures, Inc. recently signed an agreement to develop capabilities for transmitting, viewing, and manipulating high-quality images with less computing and networking capability than is currently required. The Live Picture technology stores images as a sequence of discrete subimages, making it possible to access only those portions needed for transmission or editing at relatively high speeds, even over regular telephone lines.
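The coarse-to-fine behavior of progressive transmission can be illustrated with a toy image pyramid. This is a minimal sketch assuming simple average-pooling on a small grayscale grid; real multi-resolution formats (such as the Live Picture technology mentioned above) use far more sophisticated tiled file structures:

```python
# A minimal sketch of progressive transmission for a grayscale image:
# build an average-pooling pyramid, then "send" the coarsest level first,
# refining the preview stage by stage until the full image is shown.

def downsample(img):
    """Halve each dimension by averaging 2x2 pixel blocks."""
    n = len(img)
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) // 4
             for c in range(n // 2)] for r in range(n // 2)]

def upsample(img, size):
    """Nearest-neighbor enlarge back to size x size for display."""
    scale = size // len(img)
    return [[img[r // scale][c // scale] for c in range(size)]
            for r in range(size)]

def progressive_stages(img):
    """Yield coarse-to-fine previews: the receiver sees each in turn."""
    pyramid = [img]
    while len(pyramid[-1]) > 1:
        pyramid.append(downsample(pyramid[-1]))
    for level in reversed(pyramid):        # coarsest level first
        yield upsample(level, len(img))    # preview at full display size

if __name__ == "__main__":
    image = [[0, 0, 255, 255],
             [0, 0, 255, 255],
             [255, 255, 0, 0],
             [255, 255, 0, 0]]
    for stage, preview in enumerate(progressive_stages(image)):
        print("stage", stage, preview[0])
```

Each stage transmits only the next level of detail, so a reader can halt early once the preview answers the question at hand, exactly the trade-off the paragraph above describes.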

Some of the more promising industry research focuses on conversion with functionality, bringing intelligence to digital files. Most attention to date has been given to improving the accuracy and performance of OCR technology that can accommodate a broadening range of languages and text-based representations. Adobe Systems' new Acrobat Capture software incorporates OCR technology with bitmap imaging to create text-searchable files while retaining typefaces, graphics, and the original page layout. The combined text and image information present on an illustrated page, for example, is compressed with the most appropriate compression process and combined into a Portable Document Format (PDF) file, which is smaller than a compressed digital image. The accuracy rate of conversion can be set so that pages, or portions of a page, that challenge the software's capabilities can be retained as bitmapped images. According to a recent press release, the military is considering using Acrobat Capture to convert twenty million pages of text.

Perhaps more significantly for future navigation of large image databases of mixed content, attention is being paid to pattern matching and object recognition for non-textual information present in digital images: symbols, spatial dimensions, orientation, and facial features, for example. Excalibur is extending its OCR programming to accommodate face recognition, and Photodex is experimenting with database searching via an iconic interface. Providing computer-processible, eye-readable digital images for graphic materials represents the next logical step along this continuum. Initial work has begun to convert raster images to vector images, a popular process used in Geographical Information Systems (GIS) and in engineering and architectural applications. In vectorization, an image is generated by a set of mathematical equations that describe points and locations within the image. These can be computationally altered to provide image functionality and manipulation. The long-range potential is to replace raster images (captured dot by dot) with vector images, which will result in greater functionality for searching, sorting, and manipulation, and greatly reduced storage requirements. Issues associated with quality, however, must be carefully evaluated in this conversion process.

Research, too, is focusing on more efficient compression processes that preserve fidelity and minimize the introduction of artifacts or noise. Work on the development of fractal and wavelet compression techniques is still under way. In an application for Citibank, Kodak is applying highly efficient, syntactical image compression to store photo identification as barcodes on the back of credit cards. It is envisioned that these will be read at retail stores, where the physical identification of the customer can be verified. This system of compression, based on building a taxonomy of like attributes (e.g., a library of facial features), may ultimately have broad applications for a wide range of source materials.

FUTURE RESEARCH NEEDS

Research needs in the digital conversion of traditional materials fall into three categories: economic, technical, and evaluative. Generally stated, the technology must become cheaper, better, and faster. Economically viable scanning processes and services are critically needed. Higher scanner throughput must be coupled with high-quality image capture capabilities and automated means to ensure consistency of performance and quality control. Research institutions must work with vendors to jointly develop cost-effective imaging service capabilities of high quality and standardized means for creating/capturing the requisite metadata for ordering and navigating the digital images themselves. The means for capture and indexing should be non-proprietary in nature and should lend themselves to network distribution and future digital applications, such as OCR, structural linking, and visualization techniques. Definitions for creating an audit trail on conversion decisions must be incorporated into header information for each image.

Processes for selection, conversion, intellectual control, and retrieval must be automated or semi-automated if digital imaging is to become an attractive economic alternative. In the near term, attention should be directed to matching circulation records with selection decisions, deriving intellectual control from the digital files themselves, evaluating the utility of fast browsing over textual description, and creating a juried, interactive meta-database that could accommodate user input and differentiated levels of access.



Business cases demonstrating the economic advantages of digital imaging applications to research libraries and cultural institutions must be developed and verified. The case for shared responsibility and enhanced access to distributed sources over the institutional ownership of hard-copy sources must be made firmly and convincingly. The Andrew W. Mellon Foundation is funding a number of digital initiatives designed to provide economic comparisons between traditional library costs and those associated with digital library development.

User needs, perceptions, behaviors, and adaptation to online sources must be studied in detail. Preliminary studies suggest that researcher acceptance of image databases will depend on their convenience, speed of access, and degree of user control. It should be presumed that the development of sizeable image databases rich enough to support in-depth research is necessary, but not sufficient, to facilitate scholarly acceptance of the change from hard copy to online resources. Means for navigating, retrieving, annotating, synthesizing, and presenting information at the desktop must also be devised. These capabilities must be developed in an iterative, user-centered fashion, because researchers' needs will change with time and with their increasing sophistication in using digital technology. Greater user control, requiring less human intervention, will be necessary.

Although navigation, retrieval, and utility issues will be central to this research, dramatic improvements in electronic display must also be achieved. Research and development of monitors and other projection devices that make it possible to display documents in their original size with full legibility is essential. Ergonomic issues associated with scholarly research habits (e.g., eyestrain, body positioning) deserve greater exploration. Control and flexibility in terms of display, access time, and functionality must rest with the end user. In addition to improved display, research will be needed to tie image presentation more closely to visual perception rather than to technologically consistent approaches.

REFERENCES

1. Conway, Paul, and Weaver, Shari. The Setup Phase of Project Open Book: A Report to the Commission on Preservation and Access on the Status of an Effort to Convert Microfilm to Digital Imagery. Washington, DC: Commission on Preservation and Access, June 1994.

2. Elkington, Nancy, ed. Digital Imaging Technology for Preservation. An RLG Symposium held March 17 and 18, 1994, Cornell University, Ithaca, New York.

3. Ester, Michael. "Draft White Paper on Digital Imaging in the Arts and the Humanities." Presented at the Getty Art History Information Program Initiative on Electronic Imaging and Information Standards, March 3-4, 1994.

4. Ester, Michael. "Digital Images in the Context of Visual Collections and Scholarship." Visual Resources X (1994): 11-24.

5. Ester, Michael. "Image Quality and Viewer Perception." Leonardo, Digital Image-Digital Cinema Supplemental Issue (1990): 51-63.

6. Kenney, Anne R. "Digital-to-Microfilm Conversion: An Interim Preservation Solution." Library Resources and Technical Services 37, no. 4 (October 1993): 380-402, and 38, no. 1 (January 1994): 87-95.

7. Kenney, Anne R., and Chapman, Stephen. Digital Resolution Requirements for Replacing Text-Based Material: Methods for Benchmarking Image Quality. Tutorial. Washington, DC: Commission on Preservation and Access, 1995.

8. Kodak Home Page (http://www.kodak.com).

9. Michelson, Avra, and Rothenberg, Jeff. "Scholarly Communication and Information Technology: Exploring the Impact of Changes in the Research Process on Archives." American Archivist 55 (Spring 1992): 236-315.



10. Picture Elements, Inc. Guidelines for Electronic Preservation of Visual Materials, Part I. Submitted to the Library of Congress, 1995.

11. Reilly, James. "Technical Choices in Digital Imaging." Presentation at the Society of American Archivists Annual Conference, Indianapolis, Indiana, September 1994.

12. Reilly, James. "Digital Imaging for Photographic Collections: Foundations for Technical Standards." NEH grant application.

13. Robinson, Peter. The Digitization of Primary Textual Sources. Cambridge: Oxford University, Office for Humanities Communication Publications, no. 4, 1993.

14. Stam, Deirdre C. "Pondering Pixeled Pictures: Research Directions in the Digital Imaging of Art Objects." Visual Resources X (1994): 25-39.

15. Willis, Don. A Hybrid Systems Approach to Preservation of Printed Materials. Washington, DC: Commission on Preservation and Access, 1992.

NOTES

In addition to the reference list, the author wishes to acknowledge discussions with imaging scientists and research directors, particularly James Reilly of the Image Permanence Institute, Lou Sharpe of Picture Elements, and Don Williams of Eastman Kodak. References cited have been augmented with industry promotional literature, World Wide Web home pages providing hardware and software product announcements, and forecasts in magazines such as Imaging, Advanced Imaging, and Government Imaging.


IMAGE AND MULTIMEDIA RETRIEVAL

Donna M. Romer, Eastman Kodak

INTRODUCTION

Designing an effective multimedia retrieval system is a complex challenge, primarily because existing guidelines for text-based systems do not entirely apply to the new technology. Fresh analytical challenges confront the multimedia cataloger, for instance, who, to optimize retrieval, must conceptually and perceptually deconstruct materials across several cognitive dimensions. But existing cataloging tools have yet to catch up with the fact that multimedia description tasks need greater expressive power. This paper discusses these issues as they relate to arts and humanities collections. Image databases will sometimes be used to illustrate a topic, but the central issues are shared broadly by all multimedia applications.

DIGITAL LIBRARIES

The discussion of image databases in the literature over the last several years bears a striking similarity to the literature describing the development of library automation systems. Beginning with basic inventory management concerns, library systems eventually grew more sophisticated in work flow integration, control functions, and enhanced public access.

Today, most image databases are like library automation systems of the early 1980s (i.e., proprietary, and weak on retrieval for all but the most adept). Through the 1980s library systems eventually grew into Integrated Online Library Systems (IOLS), with isolated components united into more fluid structures of communication. Further, productive research into retrieval technologies brought general-purpose access methods to a diverse set of system users beyond the caretakers of a collection. Image databases have not yet smoothly integrated work flows, nor has research resulted in an integrated, widely usable institutional system.

Many years of work, however, preceded IOLS development, especially in classification, cataloging, and public access methods. If one looks back far enough, the bibliographic record as we know it today can even be traced to the cataloging record attributed to Kallimachos during his tenure at the Library of Alexandria [1]. For the items that an image database will need to classify, catalog, and retrieve, there is no corresponding historical tradition to draw from, which is a limiting factor in the development of multimedia applications.

Essentially, this long tradition of organizing ideas, however imperfect, provided the library automation community with a necessary framework on which system developers could build structures. The same generalized methods have not yet materialized for cataloging images and multimedia objects. Many accounts in both the scholarly literature and the trade press describe an organization's rush to acquire multimedia database software, only to face the most pressing problem of all: how to describe the materials in question to achieve effective retrieval from the system just purchased. As these databases scale upward in size, collection managers soon realize that applying existing descriptive methods may be more likely to bury their assets than to provide the wide retrieval they hoped for.

BACKGROUND CONCERNS AND ISSUES

Retrieval technologies are fundamentally judged by how their search tools perform. For database users, the most memorable part of their interaction with the system is with the algorithms that answer their questions. In reality, the key to success is heavily dependent on the quality of the data preparation environment that supports database design, documentation, and cataloging activities. If one looks carefully at why various multimedia projects fail to yield the expected benefits, one often finds that the data preparation step was not adequately formulated. Cataloging itself rests upon yet another layer, data representation, which refers to the choice of abstraction needed to manipulate data on a computer platform. A brief outline of these three issues follows in order to establish a context for later discussion.

DATA REPRESENTATIONS

Text-based descriptions, the most common form of data representation used by database technologies today, have proved to be very adequate representations for text-based materials. What could be better than using words to describe other words and applying linguistic methods computationally to linguistic structures? But how adequate are text-based methods for non-textual materials? We have been proceeding into the multimedia age assuming that people "read" and understand images in the same way that they "read" and understand documents. Multimedia's appeal to several senses and perceptual modes actually challenges the use of words to describe non-textual modalities. Early adopters of multimedia have been confirming this obvious fact as they commonly report an inability to perform comprehensive searches on their newly implemented multimedia systems.

Part of the problem is that existing methods do not go far enough to describe the aspects that differentiate one medium from another. For example, in photographic images, the placement of the camera relative to the central area of interest contributes important visual differentiation for "visual" retrieval purposes. Yet even within systems that incorporate camera angle and the distance of the subject from the camera, many irregularities are found across this kind of description. Both the lack of standard naming conventions and uneven visual training among catalogers contribute to the problem.

Several initiatives in the research community are today experimenting with non-textual representations for multimedia content, deriving a new alphabet that multimedia systems will use to represent the content of a digital file. (For example, a simple non-textual representation for color is a hue/saturation histogram for red-green-blue expressed as a string of integers.) From an arts and humanities perspective, a fundamental question remains unanswered: are the non-textual, or "content-based," technologies arriving at representations that have enough expressive power for the materials that arts and humanities collections hold? Since content-based work promises a form of automatic indexing and new avenues for search interfaces, how will traditional cataloging and search methods be affected? Most significantly, what happens if several competing non-textual methods arrive in the marketplace? How will our carefully crafted interchange standards support the inevitable variety of content-based representations that will emerge?
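The parenthetical example above (a hue/saturation histogram expressed as a string of integers) can be made concrete in a few lines. This is a toy sketch; the bin counts and the flat (r, g, b) pixel format are arbitrary choices for illustration, not any system's actual representation:

```python
# A toy version of a non-textual color representation: bin each pixel's
# hue and saturation, then emit the bin counts as a string of integers.

import colorsys

def hs_histogram(pixels, hue_bins=4, sat_bins=2):
    """pixels: iterable of (r, g, b) byte triples. Returns bin counts."""
    counts = [0] * (hue_bins * sat_bins)
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hi = min(int(h * hue_bins), hue_bins - 1)   # hue bin index
        si = min(int(s * sat_bins), sat_bins - 1)   # saturation bin index
        counts[hi * sat_bins + si] += 1
    return counts

def signature(pixels):
    """The histogram flattened into a compact string of integers."""
    return " ".join(str(n) for n in hs_histogram(pixels))

if __name__ == "__main__":
    red_patch = [(255, 0, 0)] * 3 + [(200, 200, 200)]
    print(signature(red_patch))
```

Two images with similar overall color mixes produce similar integer strings even when no textual description of either exists, which is precisely what makes such signatures an "alphabet" for content-based systems.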

CATALOGING METHODS

Cataloging is essentially a process of creating intelligent contextual judgments, with the goal of assembling descriptive access points that can not only group items by their similarities but also distinguish differences within a collection. Cataloging professionals predominantly use text-based structures as decision support tools to construct descriptions for a database. A well-defined protocol and known economy are in place to support this process today. Preserving this investment is an important consideration when evaluating new technologies.

Multimedia content poses brand-new challenges to this effort, given the additional perceptual modalities introduced, which are not evenly represented in the text-based tool set. Image archive managers know all too well what it is to find an image, then hear the further inquiry: "Do you have any others like this?" While thematic content may be readily accessible using cataloged access points, retrieval by purely visual attributes is completely dependent on the personal "memoria technica" formed by the archivist's experience with his or her collection.

The two most pressing issues for cataloging practices today are:

Can existing text-based structures be supplemented to support multimedia cataloging, based on a sound understanding of human cognitive processing of each unique medium?

Can content-based technologies evolve to work cooperatively with text-based methods?

SEARCH MODELS

Database designers create search models to formally describe the primary retrieval tasks a database must support. For example, the user of an inventory database would expect retrieval by part number to be a natural search criterion. Similarly, the user of a music database may expect retrieval by musical phrase to be a criterion for success. Consistent and psychologically informed search models for multimedia retrieval are neither readily available nor obvious. The search models found in both early products and the research literature appear to be driven by what technology is able to do, rather than by how people make perceptual sense of different modalities. Traditionally, database technology has assumed that one stores "answers" completely and entirely in the database. But with multimedia retrieval, a greater portion of the "answer" to a search is located in the recognition power of the person initiating the question. The adage "I will know it when I see it" expresses this phenomenon succinctly.

IMAGE DATABASES AND TEXT-BASED CATALOGING

Most image databases today rely on text-based descriptions for cataloging and search purposes. Whether the choice of a word is derived from free-form thought or from a structured vocabulary such as the Art & Architecture Thesaurus (AAT), the "representation" is searched as a unit of text. The fundamental paradigm employed by most systems is matching the impressions and words of the person cataloging an image with the words and affective intention of the person searching for an image.

For arts and humanities collections, several intelligently composed cataloging tools have been developed to enhance consistent description and access. ICONCLASS, the AAT, and the Library of Congress Thesaurus for Graphic Materials (LCTGM) are a few of the formal tools currently available. However, are they adequate for building solid descriptive cataloging for image databases? In a forwarded PhotoCD discussion note [2], the California Historical Society noted that combining several formal vocabulary tools to describe its images has much improved access. While the time and cost to complete a data record are increased significantly by this approach, text-based cataloging can be improved by a more formal coordination among existing tools.

As daunting as the problems of the text-based approach is the different thinking modality associated with visual materials. No longer are the variable combinations of image elements, thematic content, and iconographic denotations the only issues of concern. Other, more finely shaded interpretations are also required that are difficult to name, such as the formal compositional rendering techniques the artist uses.

For the most part, text-based descriptions in current image databases try to stay close to the realm of the tangible and the nameable. While this method may work well for very small collections of images, significant problems occur as the image database begins to scale upward in size. It becomes much more difficult to find "the difference that makes the difference" to ensure successful searching.

A contradictory task faces both users and collection managers. How can the power of visual representation be unlocked using descriptive instruments that are not completely suited to visual differentiation? A single word may name an object, such as a clock, but only the limitless variations of compositional characteristics and genre denotations provide the differentiating factor. The old cliché can truly be reversed: a word (in an image database, at least) can be worth much more than a thousand images!

IMAGE DATABASES AND CONTENT-BASED CATALOGING

Over the last several years a number of researchers in computer science and electrical engineering schools have been working on a solution to the text-based dilemma, focusing on creating descriptions from the digital image file itself, a technique commonly called content-based description. The content-based work most notable for arts and humanities emphasizes the recognition and description of color, texture, shape, spatial location, regions of interest, facial characteristics, and (specifically for motion materials) key frames and scene-change detection.

One goal of content-based work is to provide algorithms that can automatically recognize the important features contained in an image without the need for human intervention in the process. Since cataloging is the most expensive step in multimedia database implementation, the promise of content-based methods has a strong appeal for reducing costs (and, one would hope, increasing indexing consistency).

The current state of content-based technology, while very impressive, has yet to provide the generalized methods needed for wide acceptance in the arts and humanities community. Notable work has been produced by the MIT Media Lab in content-based work specifically related to face, shape, and texture recognition, collected under the application called Photobook [3]. Existing commercial applications, such as IBM's Query By Image Content (QBIC), provide consistent representations. See [4] for a recent article in the popular press. The QBIC technology operates on color, texture, shape, and feature locality.
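Retrieval over such color representations can be sketched with histogram intersection, one common measure of color similarity, used here in the spirit of systems like QBIC rather than as its actual algorithm. The image names and histograms below are made-up illustrative data:

```python
# A sketch of "find others like this" by color: compare precomputed color
# histograms with normalized histogram intersection and rank a collection.

def intersection(h1, h2):
    """Normalized histogram intersection: 1.0 = identical color mix."""
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    total = sum(h1)
    return overlap / total if total else 0.0

def rank_by_color(query_hist, collection):
    """collection: {image_id: histogram}. Returns best matches first."""
    scores = ((intersection(query_hist, h), name)
              for name, h in collection.items())
    return sorted(scores, reverse=True)

if __name__ == "__main__":
    collection = {
        "sunset.tif":   [8, 1, 0, 1],  # mostly warm reds
        "seascape.tif": [1, 1, 8, 0],  # mostly blues
        "portrait.tif": [5, 3, 1, 1],
    }
    query = [7, 2, 0, 1]               # a red-dominated query image
    for score, name in rank_by_color(query, collection):
        print(f"{score:.2f}  {name}")
```

Note what this sketch cannot do: nothing in the arithmetic knows that two images with similar color mixes may make no visual sense together, which is exactly the text-versus-content-based tension discussed in the surrounding paragraphs.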

The content-based representations produced by these projects all have the unique stamp of the research that produced them. If one were able to look at the algorithms that produced the content-based descriptions, they might share some common thought, but most likely there would be significant differences based on local innovations. While the desire may be for content-based work to settle into a consensus form to enable broad usability, the truth is that this is a highly creative and fluid time period for the content-based research community. A stable set of methods on which to build standards is not likely to emerge in the short term. An arts organization may choose a single, unique content-based scheme for its local collection database. But it will be difficult if not impossible to share that same representation with other organizations using different content-based schemes.

This fact should not deter the arts and humanities community from applying the power of content-based technology; on the contrary, this is an ideal time for application needs to be more clearly understood and communicated to the content-based community in order to ensure that the proper forms of representation are being considered and tested. Content-based technology holds great promise for multimedia retrieval and over time will create representations that provide unique dimensions for retrieval.

It is important to note that content-based technologies strive to create mathematical representations of phenomena derived by a set of rules, although a complete rule set for human visual interpretation has not yet been formulated. (A highly readable discussion of this issue is interwoven in a recent NSF/UC Irvine report [5] on digital video systems.) For example, one may observe a content-based database search for images on the dimension of texture, but among the results on the screen are usually some images that make no visual sense at all. To the content-based system it looked right, but to the human visual system there is a mismatch. (Imagine the challenges that connoisseurship studies would provide to content-based research!)

The reality for this technology is that completely automatic content-based recognition is on a very distant horizon. It is much more likely that cooperative efforts between text-based and content-based methods will yield the most interesting and useful results for representing image and motion content for a very long time to come.

BUILDING USER-BASED SEARCH MODELS FOR RETRIEVAL
An area that has received very modest attention in the rush to develop image databases is image database user studies. Other papers in this collection will discuss this issue more thoroughly, but I will touch on two issues specifically related to the cataloging and retrieval process. The first issue is related to understanding the kinds of questions that users pose to existing systems to satisfy an existing work process in which they are engaged. The second issue is related to the visual review process that assists users throughout the selection process, since a search is not really over until something has been selected.

SEARCH QUESTIONS
Before images can be cataloged, whether by text- or content-based methods, it is necessary to establish some guidelines for what is important to describe. At the heart of all good database systems is an understanding of the needs of the people who will use the database.

As an example, the Computer Interchange of Museum Information (CIMI) initiative, Project CHIO (Cultural Heritage Information Online), found that this line of inquiry was fundamental to establishing an information sharing model. An IMAGELIB posting entitled "Looking for Mr. Rococo" [6] provided a rich source of discussion about understanding the pattern of museum patrons' search questions in their own words (not filtered through an intermediary). Their inquiry revealed several "points of view" that required more access points than current cataloging practices originally envisioned.

Inquiry "to understand the ecology of questions" is a valuable way to begin laying the foundation for constructing multi-purpose data records that support different kinds of system users. The broadest possible view is to create a cataloging data record whose contents may be rearranged to suit the requirements of multiple "points of view."
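A minimal sketch of such a multi-purpose record: one set of fields, stored once, with each "point of view" expressed as an ordered selection over them. The field names and the two audience profiles below are invented for illustration, not drawn from any actual cataloging standard.

```python
# A single cataloging record holds every access point once; each
# "point of view" is just an ordered selection of those fields.
record = {
    "title": "View of Niagara Falls",
    "creator": "Unknown photographer",
    "date": "c. 1870",
    "process": "albumen print",          # of interest to curators
    "subjects": ["waterfall", "carriage", "man"],
    "mood": "sublime",                   # of interest to picture researchers
    "rights": "public domain",           # of interest to publishers
}

# Hypothetical audience profiles: each lists the fields it surfaces first.
POINTS_OF_VIEW = {
    "curatorial": ["title", "creator", "date", "process"],
    "editorial": ["subjects", "mood", "rights", "title"],
}

def project(record, audience):
    """Rearrange one record's contents for a given point of view."""
    return {field: record[field] for field in POINTS_OF_VIEW[audience]}

print(project(record, "editorial"))
```

The design point is that adding a new user population means adding a profile, not re-cataloging the collection.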

A working example of this issue is the research performed for the Kodak Picture Exchange application for commercial photography [7]. Image search questions (in the words of the originator) were collected from both image owners (photo agencies) and image users (graphic artists, art directors). Five common search patterns that emerged from this inquiry were invaluable in establishing a "layered" framework for describing commercial photographic images. In addition, these search patterns made it possible to construct data records that provide access to two different "points of view": that of editorial and advertising users.

While existing search questions cannot possibly model all the search variations a system may receive, this line of analysis provides the database designer with an excellent starting point. Some examples of the search and review patterns that were observed are:

PATTERN                    TO SEARCH FOR
Image Elements             Contexts, objects, actions, places
Compositional Qualities    Artistic techniques, genre, medium
Subjective Responses       Mood, emotion, subjective evaluations
Spatial Relationships      Proximity, placement of objects to one another
Intellectual Property      Usage restrictions and/or pricing

IMAGE AND MULTIMEDIA RETRIEVAL 53

The importance of a "points of view" inquiry cannot be stressed enough. The understanding gained in this work makes it possible to make conscious choices about levels of cataloging based on user populations. Further, one can create an economic model to support cataloging activities and evaluate cataloging tools against a performance framework.

A VISUAL THINKING MODEL
Studies in art history and visual/mass communications concern the interpretation of visual materials and their analytical deconstruction, but few have specifically tracked the thought process that supports the image search itself. Searching for images may require different thought processes than searching for text-based materials such as documents or books; if so, then multimedia cataloging will have to reflect this fact.

One study by Romer [7] enumerates several visual thinking processes observed with professional photo editors. In some cases the search and review process needs only a software equivalent; in others, there are implications for the cataloging record itself. As an example, two of the visual strategies found are discussed, together with their cataloging implications:

1. Visual thinking is stimulated by images. People often start to look for images by using images. They may perform a random or directed search through books, catalogs, files, etc.

Implication: Image databases need to provide a structure, like a visual table of contents, that users can access without specifying words. User interaction becomes much easier if a purely visual activity is provided as an initial welcome to a system or during the inevitable "dry spell" frequently experienced during a search session. Not all images in a database would necessarily be candidates for this browse function. Visually appropriate cataloging methods are needed to tag an image as just such a browsing "candidate."

2. Images already selected provide the basis to continue a search. Once suitable images have been found that are close to the desired visual match, people will often use selected images to submit a request such as "Get me more like the ones I just found."

Implication: An understanding of image similarity features is needed if the "get more like this" scenario is to have good results. Arriving at a robust set of visual similarities for arts and humanities applications is a major challenge, but in the long term will contribute richly to the search environment. To incorporate visual similarity into a cataloging data record will require a deep understanding of each medium and the cognitive process used for interpretation.
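One common way to realize a "get more like this" request is nearest-neighbor search over whatever similarity features the catalog records. The sketch below assumes invented feature vectors (placeholders for qualities such as genre, palette, or composition); it averages the features of the user's selections and ranks the rest of the catalog against that average.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def more_like_this(selected_ids, catalog, n=2):
    """Average the features of the user's selections, then return the
    n unselected images most similar to that average."""
    dims = len(next(iter(catalog.values())))
    centroid = [
        sum(catalog[i][d] for i in selected_ids) / len(selected_ids)
        for d in range(dims)
    ]
    candidates = [i for i in catalog if i not in selected_ids]
    candidates.sort(key=lambda i: cosine(centroid, catalog[i]), reverse=True)
    return candidates[:n]

# Invented similarity features, e.g. (portrait-ness, landscape-ness, interior-ness).
catalog = {
    "img1": (0.9, 0.1, 0.0),
    "img2": (0.8, 0.2, 0.1),
    "img3": (0.1, 0.9, 0.2),
    "img4": (0.0, 0.1, 0.9),
}
print(more_like_this(["img1"], catalog, n=1))
```

The hard part, as the text notes, is not this ranking step but deciding which visual similarities the feature vector should encode for a given medium.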

RECOMMENDATIONS FOR RESEARCH OPPORTUNITIES

Points of View Studies
Across arts and humanities collections a wide variety of potential users need to be studied. Among the user types chosen, a quantitative methodology should be established for deriving "points of view" frameworks to guide the cataloging process.

As mentioned earlier, the two most important aspects to encapsulate in these studies are the discovery of patterns in user search questions and the perceptual review methods that are employed while refining a search. Both studies will provide the evidence needed to design practical multimedia databases, as well as drive software-related development for user interfaces.

There are few studies in either of these areas, but most notable is the work of P.G.B. Enser for the Hulton Deutsch picture collection [8]. The CIMI discussion around Project CHIO appears to be the most current, active forum in which several "points of view" studies are already under way. This project also presents an opportunity to assess valuable tools such as the Categories for the Description of Works of Art with more user-centered understanding derived from "points of view" studies.

Text-based Resources Reviewed for Structure

Existing text-based resources that support cataloging practices need to be reviewed in terms of how well they satisfy the requirements of multimedia retrieval. Preliminary work is needed to develop a list of multimedia retrieval requirements; based on this work, possible projects might be:

An evaluation of existing resources such as the AAT, LCTGM, etc., to determine how well they perform against a multimedia search model derived from "points of view" studies. Support for this approach is partially found in the work of Soergel [9] related to user studies validating the contents of formal cataloging and access tools.

An evaluation to support restructuring hierarchical resources into semantic networks, i.e., structures that represent knowledge in an interconnected manner. Note that the use of a network structure eliminates many of the limitations surrounding hierarchical and faceted thesauri. With a semantic network it is possible to assign several relationships between terms with differing weights to provide a clear notion of the semantic strength between terms.
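A semantic network of this kind can be sketched as a weighted graph over terms. The terms, relationship labels, and weights below are invented for illustration; the point is that one term may carry several relationships of explicitly different strengths, which a strict hierarchy cannot express.

```python
# Edges: (term, term) -> (relationship label, weight in [0, 1]).
# Unlike a hierarchical thesaurus, a term may relate to many others
# with explicitly different semantic strengths.
NETWORK = {
    ("rococo", "baroque"): ("stylistic successor of", 0.8),
    ("rococo", "ornament"): ("characterized by", 0.9),
    ("rococo", "pastel palette"): ("associated with", 0.6),
    ("rococo", "neoclassicism"): ("reacted against by", 0.4),
}

def related(term, min_weight=0.0):
    """Terms linked to `term`, strongest first, above a weight threshold."""
    hits = []
    for (a, b), (rel, weight) in NETWORK.items():
        if term in (a, b) and weight >= min_weight:
            other = b if a == term else a
            hits.append((other, rel, weight))
    return sorted(hits, key=lambda h: h[2], reverse=True)

print(related("rococo", min_weight=0.5))
```

A retrieval system can then expand a query along only the strongest edges, something a faceted thesaurus with unweighted broader/narrower links cannot distinguish.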

A particularly lucid theoretical discussion by Janice Woo [10] contrasts the issues of traditional static organizations of concepts to dynamic relationships based on participatory actions (i.e., hierarchical vs. network structures). Chakravarthy [11] presents an excellent and thorough discussion of a prototype image retrieval system supported by semantic network technology (WordNet).

An area of descriptive depth that is important to image retrieval (especially images with historic value) is the precise definition of image elements and their proximal relationship to one another. (Image elements are the tangible people, objects, actions, places, etc. depicted in an image.) Current cataloging practices do not focus on the mundane level of naming individual objects or actions depicted in an image, focusing instead on descriptions of thematic content and iconographical attributions. For the broadest possible access, though, there is a need to name individual image elements and their relationships to one another in a standard syntax to support precise searching capability (e.g., a man sitting in a carriage in front of Niagara Falls). A consensus on syntax across arts and humanities cataloging will also drive system vendors to incorporate this level of specificity for search support.
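One way to picture such a standard syntax is a set of named element records plus explicit relationship triples. The slot names and structure below are invented, not any proposed standard, but the data encodes the example sentence from the text and shows the kind of precise query it would support.

```python
# "A man sitting in a carriage in front of Niagara Falls" broken into
# named image elements and explicit relationships between them.
elements = {
    "e1": {"type": "person", "name": "man"},
    "e2": {"type": "object", "name": "carriage"},
    "e3": {"type": "place", "name": "Niagara Falls"},
}
relations = [
    ("e1", "sitting in", "e2"),
    ("e2", "in front of", "e3"),
]

def matches(subject_name, relation, object_name):
    """True if an element named `subject_name` bears `relation` to an
    element named `object_name`; the kind of precise search a shared
    element syntax would support."""
    for subj, rel, obj in relations:
        if (elements[subj]["name"] == subject_name
                and rel == relation
                and elements[obj]["name"] == object_name):
            return True
    return False

print(matches("man", "sitting in", "carriage"))
```

A catalog of such triples can answer "man sitting in a carriage" exactly, where keyword indexing would also match any image whose description merely mentions both words.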

A number of "picture description" languages have been proposed by several disciplines. Hibler [12] has suggested a very practical method.

Media Differentiators
Each distinct multimedia type embodies perceptual qualities that make it a unique vehicle for communication. It is important to investigate, and find a cataloging equivalent for, those unique qualities. For example, a photographic image is greatly influenced by the choice of process used. A daguerreotype is visually different from an ambrotype, even though both kinds of imagery are often housed in latched cases. For modern photography, lens and filter choices create visible differences that contribute to the image experience. Being able to recognize and catalog these differences helps an image database support a visually based "get more like this" scenario. A host of other differentiators for image, music, motion, graphics, etc. need further study and articulation.

An excellent manual presenting clear descriptions of visual differentiators for identifying historic photographic processes is Reilly [13].

Visual Thesaurus

Various researchers have created a number of "wish lists" pointing to an idea called variously a visual thesaurus, picture thesaurus, or picture dictionary. All thinkers have a similar vision: having an image point to its visual "synonym." More complex versions provide a genus/species "divide and conquer" strategy. In all cases, the visual thesaurus provides the structure for placing visually similar things with their relatives. A visual "sense" pervades the similarities across a number of different qualities: genre, compositional technique, time, etc. A recent and excellent example of work in this area is by Lohse [14]. Chang [15] has presented prototypes related to this topic that support visual reasoning using a data structure called VisualNet.

Representative Sets of Images

One obstacle to advancement in content-based technology is the lack of sizeable and realistic data sets tied to application requirements for development and test purposes. It would be immensely valuable to establish a formal method to provide good data sets and share research results broadly between the content-based and arts and humanities communities. (Note that the data sets in use today are typically from clip art CDs that contain very simplistic depictions for analysis.) While some engineering schools work closely with their institutions' art history departments, there is no umbrella organization that then helps to synthesize and interpret implications more broadly.

Music and Motion Representation
In both the library and content-based communities, still images have received the greatest attention in the research literature related to representation and cataloging issues. While music and motion imaging are also topics for research, fewer studies exist than for the world of still-image applications. Both music and motion cataloging require more fundamental thought in order to arrive at the right conceptual framework for subsequent implementation.

In music, an early dissertation by Page [16] focused on issues related to the written musical score as the fundamental starting point for musical representation. A paper by Wiggins [17] provides a framework for describing and evaluating music representation systems in a broader context. Hawley [18] analyzes the creation of "structure out of sound" for multimedia retrieval.

Davis [19] presents a motion annotation system in a prototype environment called MediaStreams, which uses icons to describe video content. Csinger [20] proposes a knowledge-based framework to support the human effort required for annotating motion.

CONCLUSION
In summary, image and multimedia databases are heavily dependent on the quality of their stored descriptions, which (whether text- or content-based) provide the foundation for all meaningful interactions with a system. Several descriptive challenges remain to be solved in order to create effective representations. The solutions, as indicated above, appear primarily interdisciplinary. The ideal team would naturally be composed of professionals in information science, electrical engineering/computer science, visual/mass communications, and cognitive psychology. Each of these disciplines holds a portion of the knowledge required to support research in this vital and growing area.

Multimedia "objects" (image, motion, audio, graphics, compound document, etc.) acquire useful descriptive data throughout many different stages of their existence. Some data is acquired automatically by capture devices (such as scanners or digital cameras), some is added by human intervention through traditional cataloging methods, and yet other data is acquired by automatic, content-based techniques. All these streams of data will require intelligent coordination and constant attention. The end result is a richer set of descriptions for retrieval purposes, which can be employed in combination to provide more meaningful access to the vast heritage of the arts and humanities.

REFERENCES

1. Blum, Rudolf. Kallimachos, the Alexandrian Library and the Origins of Bibliography. Madison: University of Wisconsin Press, 1991.

2. Engst, Elaine (Cornell University). "Cataloging Photographs," 1994. Original sender: Robert MacKimmie (California Historical Society). [Available by e-mail: [email protected]].

3. Pentland, A.; Picard, R. W.; Sclaroff, S. Photobook: Content-Based Manipulation of Image Databases. MIT Media Laboratory Perceptual Computing Technical Report No. 255. November 1993.

4. Okon, Chris. "IBM's Image Recognition Technology for Databases at Work: QBIC or Not QBIC?" Advanced Imaging 10 (May 1995): 63-65.

5. Sklansky, Jack, et al. Final Report: International Workshop in Digital Video for Intelligent Systems. December 1993.

6. CIMI List Owner. "Points of View Meeting: Looking for Mr. Rococo," February 13, 1995. [By e-mail: [email protected]].

7. Romer, Donna M. A Keyword Is Worth 1,000 Images. Kodak Internal Technical Report. Rochester, NY: Eastman Kodak Company, June 26, 1993.

8. Enser, P.G.B. "Query Analysis in a Visual Information Retrieval Context." Journal of Document and Text Management 1, no. 1 (1993): 25-52.

9. Soergel, Dagobert. "The Art and Architecture Thesaurus (AAT): A Critical Appraisal." Visual Resources X, no. 4 (1995): 369-400.

10. Woo, Janice. "Indexing: At Play in the Fields of Postmodernism." Visual Resources X, no. 3 (1994): 283-293.

11. Chakravarthy, Anil Srinivasa. "Information Access and Retrieval with Semantic Background Knowledge." Ph.D. dissertation, Program in Media Arts and Sciences, School of Architecture and Planning, Massachusetts Institute of Technology, June 1995.

12. Hibler, J.N. David; Leung, Clement H.C.; Mannock, Keith L.; and Mwara, Magi K. "A System for Content-based Storage and Retrieval in an Image Database." SPIE Image Storage and Retrieval Systems 1662 (1992): 80-92.

13. Reilly, James. Care and Identification of 19th-Century Photographic Prints. Kodak Publication No. G-2S. Rochester, NY: Eastman Kodak Company, 1986.

14. Lohse, Gerald L.; Biolsi, Kevin; Walker, Neff; and Rueter, Henry H. "A Classification of Visual Representations." Communications of the ACM 37, no. 12 (December 1994): 36-49.

15. Chang, Shi-Kuo. "A Visual Language Compiler for Information Retrieval by Visual Reasoning." IEEE Transactions on Software Engineering 16, no. 10 (October 1990): 1136-1149.

16. Page, Stephen Dowland. "Computer Tools for Music Information Retrieval." Ph.D. dissertation, New College and Programming Research Group, University of Oxford, 1988.

17. Wiggins, Geraint; Miranda, Eduardo; Smaill, Alan; and Harris, Mitch. "A Framework for the Evaluation of Music Representation Systems." Computer Music Journal 17, no. 3 (Fall 1993): 31-42.

18. Hawley, M. J. "Structure Out of Sound." Ph.D. dissertation, Media Laboratory, Massachusetts Institute of Technology, 1993.

19. Davis, M.E. "Media Streams: An Iconic Language for Video Annotation." Proceedings of the IEEE Workshop on Visual Languages, Bergen, Norway, 1993.

20. Csinger, Andrew, and Booth, Kellogg S. "Reasoning about Video: Knowledge-based Transcription and Presentation." Proceedings of the 27th Annual Hawaii International Conference on System Sciences, Hawaii, 1994: 599-608.

LEARNING AND TEACHING

Janet H. Murray, Massachusetts Institute of Technology

With the wide availability and increasing usefulness of electronic media, arts and humanities education is poised for significant change. Some of these changes are already under way; others are just beginning to appear on the horizon. They are being met with enthusiasm from some and strong resistance from others. The key to the changes now under way is that a new medium makes possible new methods of teaching and learning and a new epistemology: new structures for representing knowledge. Those who have already been engaged in pushing the boundaries of their disciplines are the most likely to be early adopters of the technology.

RESEARCH AND PRACTICE TO DATE

Writing and Foreign Language Learning

The skill-based disciplines of writing and language learning have been the most active early users of the technology. It is significant, and perhaps forms a useful paradigm for other humanistic disciplines, that in both these cases the adoption was driven by methodological changes.

In the teaching of writing, the process model, advocated by theorist-practitioners like Donald Murray, Peter Elbow, and Linda Flowers, was coming into wide acceptance during the late 1970s and early 1980s. The arrival of personal computers starting in the mid-1980s made process teaching much easier by making it easier to create and critique multiple drafts and share the product with peers as well as teachers. Many aids to writing have been created and are in use across the country, most notably The Writer's Workbench (Bell Labs/Colorado State), which includes process aids in addition to its original set of more controversial style checkers, and Daedalus (University of Texas), which allows students to hand in papers online. University-specific networked systems are in use at Carnegie-Mellon, where much imaginative early work was done in modeling a process-approach electronic writing environment, and at MIT, where the system includes an electronic classroom, a facility for adding teachers' corrections to papers handed in electronically, and an online textbook for technical writing. The use of electronic classrooms in which work can be displayed, critiqued, and edited on large-screen displays has made it possible to demonstrate the process of writing in the classroom with an ease not available under the constraints of paper and blackboards. Currently, there is much interest in exploring computers for teaching collaborative writing.

Commercial systems have superseded much of the work on writing software attempted by university-developed systems. Spell-checkers, outliners, annotation icons, and multiple-versioning software are all available in word processors or document systems. But the ready availability of commercial products that do the job is the exception, not the rule, for humanities software, and even when commercial products are available their use is often limited by platform dependencies.

Writing shares with foreign language teaching a laboratory approach, and writing centers and language laboratories are natural sites for the adoption of new technology. In foreign languages the interest began even before the advent of the microcomputer. The University of Illinois offered programmed language learning on the mainframe-based PLATO system. The first use of microcomputers was electronic drill programs that relieved the drudgery of workbook grading. Several of these have been developed and are in wide use, including Dasher (University of Iowa), CALLIS (Duke), and MacLang (Harvard). Brigham Young University developed a system for testing students online in order to determine what level of language course to offer them. A more flexible approach was the inclusion of grammar and dictionary information in specialized word processing software, a tactic that was also well used by James Noblitt (University of North Carolina) for foreign language learning. Although some have used the opportunity of computer-based language learning to study the patterns of second-language acquisition, this remains an underdeveloped area of inquiry.

Starting in the early 1980s, as the communicative approach to language learning was becoming accepted, multimedia was identified as offering tremendous potential for communicative methodologies. Like the process approach current in writing studies, the communicative approach was a good fit to the medium because it emphasized process over product, stressing the importance of exposure to authentic native speech (which can be delivered on video, richly annotated and cross-referenced), and valuing the acquisition of context-sensitive language functions (such as expressing agreement, asking for help, greeting a friend or a stranger) over the memorization of word lists and grammar paradigms.

Multimedia for language learning was pursued actively at MIT, which produced narratives and documentaries specifically scripted and shot for interactivity (Athena Language Learning Project), and at the University of Iowa (PICT) and the University of Pennsylvania, both of which produced systems for adding subtitling and phrase-by-phrase control to existing visual material. The Iowa project focused on acquiring the rights to foreign television; the Pennsylvania project focused on films available on videodisc. The military service academies have made wide use of interactive video workstations, mostly using re-purposed educational videos, and the CIA is currently working on course materials that would eliminate the teacher from the system, starting with introductory courses in Spanish, Russian, and Arabic. Military-sponsored efforts, though well funded, have often been pursued at a distance from university methodologies and research.

University-centered efforts have not looked to eliminate the teacher but to reform language teaching in order to incorporate more authentic video, facilitate discovery learning by students, and move the teacher to the role of a task designer rather than the sole provider of information. The difficulty with a teacher- and text-centered approach to language learning is that the teacher, often not a native speaker, becomes the sole model of the language. The text presents language in a way that emphasizes written over oral forms and sometimes leaves students unable to speak or comprehend spoken language. By contrast, electronic media can offer multiple native conversationalists and introduce native speech from the earliest stages of language learning without overwhelming the learner. What is needed next is a more clearly defined methodology to exploit the technology appropriately.

Two advanced potential areas of language learning software await a more developed technology: grammar correction and pronunciation practice. Natural language processing systems have been used to model language teaching (Xerox PARC, MIT, Carnegie-Mellon, University of Maryland), but this remains an area of research with only limited experience with actual students. The technology for creating spectrograms is now widely available on desktop computers, but despite promising early work in adopting it for language learning (MIT) it has not yet been developed for wide use. Both of these await development over the next decade.

History, Literature, and Culture
In the traditional humanities core disciplines, electronic educational materials have been developed in response to the demands of specific subject matter. Although no methodology has been explicitly articulated, there has been a general attempt to introduce dense primary materials at the undergraduate level and to synthesize complex materials that had previously been studied separately. In the field of history, two simulations of the 1980s provide models that have not yet been widely followed. One, "The Would-Be Gentleman" (Stanford), invited students to experience ancien regime France in the persona of a young man trying to succeed at court. It included economic simulations as well as cultural knowledge, such as how to make an advantageous marriage. Another, "The Great American History Machine" (Carnegie-Mellon), offered census data and numerous ways of configuring it and representing it graphically, allowing students to explore many possible correlations in social trends. Interactive video simulations have also been used at Carnegie-Mellon to introduce philosophy students to issues in ethics. These are all areas in which hands-on manipulation of a simulated world or statistical model can foster the process of humanistic exploration of many answers to the same question, or many causes of one result. Despite their promise, little effort has gone into the creation of such models so far.

The marketplace is responding at the level of the electronic textbook; commercial and university publishers have begun offering literary and critical works in electronic form. The most ambitious of these, the Voyager Company, has created a format well suited for teaching purposes. In Voyager's Extended Books the teacher can prepare a teaching edition complete with marginal notes, highlighting of passages, marking of pages, automated searches for keywords, and a notebook for copying citations complete with source and page number.

With the advent of CD-ROMs, some of these books have multimedia extensions. Among the most promising of the Voyager series are "Who Built America?," a history of the United States from a working-class and leftist viewpoint; "American Poetry," an anthology that includes readings of the poems in digital audio; and Art Spiegelman's "Maus," a presentation of his compelling graphic novel on the Holocaust with primary documents and records of his drawing process. The extended book is clearly a format that publishers are comfortable with, since it retains the book metaphor and allows the use of texts that already have a reputation and a following. Although current software is slow and awkward in many ways, and the book metaphor can be very limiting, extended books hold great promise as classroom presentation tools and for library reference. Their use will probably be supplementary in the immediate future until reductions in cost and the spread of electronic technology make it practical to use electronic media as the primary delivery medium for texts.

More ambitiously, several comprehensive projects have aimed at using hypertext architecture to present teaching materials. The Perseus Project (developed at Harvard, but now housed at Tufts University) presents a wide range of visual objects from ancient Greece combined with the texts of Greek literature. At Brown University the Intermedia Project of the 1980s was enthusiastically adopted in the humanities, with critical webs developed for nineteenth-century authors under the direction of George Landow. When the Intermedia software became obsolete, these webs were transferred to Story Space. The project demonstrated that hypertext could be used to model the methodology of the humanities as well as represent its content. It also raised many still-unanswered questions about the difficulties of navigation in hypertextual environments.

Most of the current work in hypermedia environments has centered on single-author collections, including the development of electronic editions, which combine texts with photofacsimiles of original texts, and with video and audio of performances. For instance, work in this area is in progress on Manrique (University of North Carolina), Goethe (Dartmouth), Yeats (University of Tennessee), and James Joyce (Boston University). Other projects (Rossetti at the University of Virginia, and Shakespeare at MIT) transcend the edition and move toward creating comprehensive electronic archives that serve both teaching and research purposes. The attempt of these projects (and many others rapidly springing to life) is to bring together in appropriate proximity to one another materials that are hard to find or not previously found. For every large project with substantial resources there are probably a hundred homegrown HyperCard stacks (or ToolBook stacks, or, increasingly, HTML Web sites) developed for individual courses at single institutions. The widespread use of simple hypertext and hypermedia structures will increase the level of sophistication and the demand for more complex tools among humanists in general.

Although it is limited to text, the Women Writers' Project (Brown) is remarkable in that the compiling of an electronic archive has facilitated the teaching of otherwise unknown or inaccessible texts, although the texts themselves are often issued in book form. The Brown project is an exception to what seems to be a trend to establish archives of single male authors. Clearly more work needs to be done to make sure the electronic environment offers wide coverage of our cultural heritage and is not developed haphazardly.

The work of the Text Encoding Initiative (TEI) group has been a tremendous boon in offering standards for archiving text, but that takes care of only one part of the puzzle. A similar effort is needed in developing software for accessing these text archives, especially hypermedia archives. Attempts at multimedia authoring environments at Brown, MIT, Stanford, and Dartmouth have been either too large or too small, but never just right. Furthermore, the marketplace is unlikely to supply the kind of archiving environment needed by humanists, who require both precision of reference and preservation of context, and who also need to shift focus from one document to another (and one medium to another) as they work. It would be useful to encourage several archive/edition projects to collaborate in developing a standard cross-platform, open-architecture authoring and reference environment for humanists. Provided that the range of users were broad enough and the resources for developing the environment were sufficient, an archive architecture with multiple examples of implementation could be developed within five years. With an open architecture it could continue to be improved upon and refined with code that could be shared among institutions.

Film and Media Studies

Electronic technologies offer great promise for the field of film and media studies, but this promise is hampered for now by copyright issues. Several promising projects, including Larry Friedlander's Shakespeare Project of the 1980s (Stanford) and the UCLA Roger Rabbit project, were prevented from reaching wider distribution owing to copyright restrictions. As more films become available in electronic format, they can be bought separately and then used in conjunction with educational software. But this will not solve the problem of network delivery. Legal solutions are more important to this area of educational innovation than software solutions.

Teaching Creative Artists in the New Media

In addition to furthering the study of the existing arts and humanities, the electronic media are giving rise to new art forms. Michael Joyce's afternoon (1987) and Stuart Moulthrop's Victory Garden (1992) are notable examples of the genre of hypertext fiction. Electronic fiction courses have been offered at Brown (by Robert Coover) and at MIT (by myself) since the early 1990s, long enough to begin to see new genres emerging as young writers born into a world of interactive media come to maturity. Central to this effort is the perception that the computer and the Internet are not just telephone wires for carrying "content" in traditional linear formats: they constitute a new medium that will have its own structures of representation and therefore its own appropriate forms of artistic expression.

Again, teaching efforts are hampered by a lack of software development. The current authoring environments for hypertext narratives (Web browsers, HyperCard and its imitators, and Story Space) are all structurally limited. There is a pressing need for software that will facilitate spatialized writing (i.e., writing that is navigated rather than paged through), making links, and creating interactive structures without programming knowledge. More ambitiously, there is a need to adopt the methodologies of artificial intelligence, particularly knowledge representation and agent-building, for the making of plot, character, and narrative form.

The Use of the Internet

Access to materials over the Internet is increasing exponentially for scholars and students. The increase in material on the global spaghetti plate known as the World Wide Web makes the job of humanities librarians particularly crucial. The editorial functions of reviewing, filtering, vetting, listing, and annotating sources will become increasingly valuable as available materials proliferate. Teachers will need guides to important resources and assessments of their reliability. Students will need training in how to navigate, use, and evaluate Internet resources. Software will be needed to access the many kinds of information (bibliography, hyperlinks, quantitative databases, audio and video files) on the Web and make it readily available to students. Humanities educators will be in particular need of clearer copyright rulings, and of the extension of "fair use" rules to electronic media.

The Perceived Threat to the Book and to the Teacher

One of the results of the increase in the use of electronic media is a re-evaluation of books as a technology for disseminating knowledge. Ongoing scholarship on the beginnings of the print era is helping to contextualize current unease at the supplanting of the book as the primary means of intellectual communication. A debate has been joined over whether thought itself depends upon the linear presentation and physical pages and binding of the book, or whether other modes of organization and presentation may sometimes be preferable for capturing the richness of the human intellect.

At the same time, economic forces are calling for electronic delivery of "distance learning" independent of the instructor. The humanities and the arts are particularly vulnerable, with weaker funding sources but a higher level of dependency on personal interchange.

Both of these challenges call for a careful consideration of the appropriate roles for electronic media in carrying forward the work of humanists and artists. Attention should be paid to identifying what kinds of intellectual processes are facilitated by the new media. It will be important to sponsor significant educational innovations, large enough to constitute a departure from usual procedures, and to develop reliable, qualitative methods of evaluating educational results in the humanities. The anthropological approach developed at Brown for the Intermedia Project might serve as a good model for qualitative evaluation.

FUTURE DIRECTIONS

The next step for humanities teaching and learning is the creation of course-sized materials, the electronic equivalent of the textbook, and the development of new curricula based on electronic delivery of information. For instance, foreign language teaching should be rethought now that it is possible to deliver large databanks of authentic speech with extensive annotations that make them accessible to the novice. Shakespeare studies, which have long struggled with videotapes, can be reformulated once we can deliver an environment that allows for immediate retrieval of quarto, folio, and multiple performances. History can be taught with much larger access to databases and primary materials at the undergraduate level. Now that we understand some of the basic elements of humanities educational computing, the next stage will be to develop core reference/learning environments and to reformulate curricula to take advantage of them.

The new learning paradigms will require redesign of classrooms as well, with special care to create spaces where students can speak to one another and to the teacher as well as interact with computer displays. The next few years should begin to offer us some models for working humanities classrooms, based on models in wide use now at such places as Stanford, Brown, MIT, and Penn State.

The creation of course-sized electronic curricula will require work along the other directions already mentioned: the standardization of delivery environments; the collective design and development of authoring software specialized for humanities applications; the development of new copyright procedures for digital material; and the refinement of qualitative evaluation procedures for humanities education.

In all of these areas, it is important that humanists take the initiative in shaping the educational environment of the next century.


LEARNING AND TEACHING 61

ARCHIVING AND AUTHENTICITY

David Bearman, Archives & Museum Informatics

STATE OF THE ART

The proliferation of electronic information and communication systems has created a crisis of accountability and evidence. As more and more of the records of our society are available in electronic form, users are asking how they can be sure electronic records created in the past will be available in the future and how they can be sure those received today are trustworthy. The issue is critical for all aspects of humanistic studies because these scholarly disciplines depend on the study of original texts, images, and multimedia sources. To even imagine the humanities, it is essential to have correct attribution, certainty of authenticity, and the ability to view sources many decades or centuries after they are created.

While the question of how to create and preserve electronic evidence (records with provable authenticity) has been with us as long as computing, research in this field is relatively new, in part because until very recently few source materials were created electronically and available solely in electronic form. Thus, in 1991, when the U.S. National Historical Records and Publications Commission sponsored a working meeting on Research Issues in Electronic Records, virtually no published research was available. Since the publication of the report of that meeting, the field has proliferated (see special issues of American Archivist (U.S.), Archivaria (Canada), and Archives & Manuscripts (Australia) within the past year), although major areas are still underdeveloped.

Currently the research in archiving and authenticity falls into four broad categories:

- Preserving signals recorded on different media
- Preserving "recordness," or the attributes that ensure evidential value, which some refer to as "intellectual preservation"
- Preserving functionality, or ensuring software independence and interoperability
- Establishing a social and legal standard for evidence, supported by best practices and guidelines

On the simplest level, archiving has to do with preserving bits. Because electronic recording media are inherently unstable, it has always been a matter of concern to ensure that the electronic signal be preserved over time. Practical interest in denser and longer-lasting methods of storing data has meant that the short history of electronic recording has witnessed the commercialization of a large number of different data storage media and media formats. The rapid evolution of media has meant that considerable attention has been devoted to avoiding obsolescence and developing methods to read and copy media from previous generations of systems. In general, previous media, layouts, and formats can be read with appropriate hardware and special-purpose software, but devising new methods to read old signals on old media is becoming more complex as media proliferate, recording and layout methods become more proprietary, and firmware plays a greater role in decoding.

Archivists, and increasingly scholars, are aware that beyond preservation of bits lies the arena of preserving "recordness." Research into what makes an electronic document or dataset a record, and how the constituent parts can be bound together, has become critical as communication of electronic information has become more widespread. In the past several years, electronic mail, groupware, and digital image banks have forced society to confront the issue of authenticity or reliability of an electronic communication and spawned much research. Most recently, research has attempted to define the functional requirements for recordkeeping and the meta-data attributes of evidence.
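As a rough illustration of what "meta-data attributes of evidence" might look like in practice, the sketch below models a few contextual attributes (creator, date, business context, transmission) as a data structure. The field names are assumptions chosen for illustration, not drawn from the functional-requirements research itself.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class RecordMetadata:
    """Illustrative (hypothetical) subset of the contextual attributes that
    give an electronic record evidential value: who created it, when, in
    what business context, and how it was transmitted."""
    creator: str
    created: str           # ISO 8601 timestamp of creation
    business_context: str  # the transaction that generated the record
    transmission: str      # how the record moved between parties
    content_type: str      # e.g. "text/plain"

def is_complete(meta: RecordMetadata) -> bool:
    """A record loses evidential value if any contextual attribute is
    missing, so check that every field is populated."""
    return all(getattr(meta, f.name) for f in fields(meta))
```

A capture system built on this idea would refuse to accession a record whose contextual attributes are incomplete, rather than repair them after the fact.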

Electronic records are always software-dependent, but the extent of these dependencies varies widely. More and more electronic objects are not merely static entities, but parts of systems in which they represent potential functionality. In recent years, dynamic links, objects that affect system states, and data entities that respond to their environment have significantly increased the difficulty of preserving electronic records. New questions are arising about the concept of migrating functionality and the meaning of interoperability. Methods of overcoming, or at least representing, software dependence over time are critical to the survival of the record.

Finally, society has responded unevenly to the spread of electronic communication capabilities. Some new legal and professional standards have been established; elsewhere, research is under way to define new practices and guidelines for electronic documentation and action. Methods for bilateral commercial contractual communication are in place, but multilateral methods are still being studied. How to enable electronic patient records, patent documentation, or copyright registration, and how to ensure privacy, confidentiality, protection of proprietary information, and the management of similar information-related risks, is the subject of active research at the interface between sociology, policy, and technology.

CURRENT RESEARCH AND ITS PROMISE

While research continues on each new medium, to establish its life and the best conditions for its storage and use, the research agenda has moved beyond storing bits with the growing acceptance that the only way to preserve electronic data across time is to periodically copy (refresh) the information to new storage media and, at appropriate times, to new formats. Leadership in these technical means of preserving bits has belonged to the National Media Laboratory, a spin-off of the 3M Company and the contractor used by federal projects and by the National Institute of Standards, which establishes tests for media. In recent years, considerable research has focused on how to determine the right time for media conversion, how to choose appropriate new media, and how to predict long-term costs. While this research is important to computer operations, it does not contribute specifically to arts and humanities computing.
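The refresh cycle described above (periodically copying bits to new storage and confirming that the signal survived the copy) can be sketched as a cryptographic fixity check. This is a minimal illustration, not any repository's actual procedure; the function names are invented for the example.

```python
import hashlib
import shutil
from pathlib import Path

def checksum(path: Path) -> str:
    """Return the SHA-256 digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def refresh(source: Path, target: Path) -> str:
    """Copy a record to new storage and verify that the bits are identical.

    Computes a digest before and after the copy; a mismatch means the
    signal was corrupted in transfer and the refresh must be retried.
    """
    before = checksum(source)
    shutil.copyfile(source, target)
    after = checksum(target)
    if before != after:
        raise IOError(f"fixity check failed while refreshing {source}")
    return after
```

Storing the digest alongside the record also lets a future custodian detect silent decay on the medium itself, not only errors introduced during copying.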

The issue of the authenticity of records, on the other hand, is at the heart of all humanistic scholarship. If we do not know the context in which information was created, and who participated in creating it, many of the questions of greatest interest to historians, philosophers, linguists, and creative artists are unanswerable. Contemporary electronic information systems generally do not create or store records that satisfy these criteria. Not surprisingly, research into methods of ensuring the creation and retention of electronic evidence is a hot topic in archives, museums, and electronic libraries. The most important research in this area has focused on the functional requirements for records. It has appeared under the corporate names of the National Archives of Canada, the World Bank, and more recently the University of Pittsburgh. It is recognized in the published research of The Rand Corporation and the Dutch Ministry of the Interior. This research joins a recent thread of discussion and debate in the library community regarding what Peter Graham of Rutgers University has called "intellectual preservation." Although this concern is the focus of discussion in the Task Force on Digital Archiving sponsored by The Research Libraries Group/Commission on Preservation and Access, at present it is not really the subject of original research in the library community.

Current research on software dependence and interoperability, which is not largely driven by archival concerns, takes a relatively short-term view of the requirement to preserve functionality. Little research has been done on modeling the information loss that accompanies multiple migrations or the risks inherent in using commercial systems before standards are developed, yet these are the critical questions being posed by archives. Little in these studies specifically addresses the humanities, except that the humanities are particularly heavy users of old documentation and thus especially need to develop means of overcoming system dependencies in data.


Margaret Hedstrom of the New York State Archives, and the University of Pittsburgh project, have led the way in exploring the social and legal guidelines for electronic records management. The Association for Information and Image Management has sponsored conferences and a task force that examines these issues; the Center for Electronic Law at Villanova University is also working in this area. There has been substantial research on electronic laboratory notebooks and electronic patient records, but oddly little research has been done to identify critical dimensions of archiving for program audits in areas like decision support systems, groupware and team support systems, or even traditional "management information systems" or project management environments.

Related areas of research include:

- Methods for conversion of paper-based information to electronic media; research at Cornell University and Yale University is most noteworthy.
- Knowledge representation, including especially the documentation of archives using SGML, as reflected in the work of the Text Encoding Initiative.

FUTURE RESEARCH NEEDS

The most significant area for research in the near future is the meta-data required for recordness and the means to capture this data and ensure that it is bonded to electronic communications. The announcement by the National Institute of Standards of a proposed Federal Information Processing Standard (FIPS) for "Record Description Records" could be the stimulus for immediate research, as is the proposal by Standards Australia, based on the University of Pittsburgh research. Continued investigation of mechanisms to specify meta-data encapsulated objects and capture them in implementations is most promising. Over the next five years, specifications for workgroup tools and electronic office environments will need to have these methods built in. Large-scale networks, and the acceptance of electronic transactions as the preferred means of intra-corporate communications, will depend on methods of uniquely identifying messages, controlling their access and use, and decoding their structure, context, and content. As the scientific community has come to realize, standard meta-data, grounded in a continually updated understanding of disciplinary perspectives, is essential to future documentation. Unless generic, scaleable approaches for representing humanistic points of view are developed soon, the history of modern societies in the late twentieth century will be extremely incomplete, to the detriment of future scholarship in all humanities fields.
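One way to picture a "meta-data encapsulated object" is a bundle that cryptographically bonds contextual meta-data to content, so that any later alteration of either is detectable. The sketch below is a minimal illustration under that assumption; it is not the Pittsburgh project's or NIST's actual specification, and the envelope layout is invented.

```python
import hashlib
import json

def encapsulate(content: bytes, metadata: dict) -> dict:
    """Bond meta-data to content by sealing a canonical serialization
    of both together; changing either afterward breaks the seal."""
    envelope = {
        "metadata": metadata,
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }
    canonical = json.dumps(envelope, sort_keys=True).encode()
    envelope["seal"] = hashlib.sha256(canonical).hexdigest()
    return envelope

def verify(envelope: dict, content: bytes) -> bool:
    """Recompute the seal and the content digest to confirm that neither
    the meta-data nor the content has been altered since encapsulation."""
    body = {k: v for k, v in envelope.items() if k != "seal"}
    canonical = json.dumps(body, sort_keys=True).encode()
    return (envelope["seal"] == hashlib.sha256(canonical).hexdigest()
            and envelope["content_sha256"] == hashlib.sha256(content).hexdigest())
```

A production system would use digital signatures rather than a bare hash, so that the seal also attests to who created the record; the hash-only version shown here captures only the tamper-evidence idea.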

Ongoing applied research on the archival significance of dynamic documents, object-oriented software environments, and interoperability is needed in the medium term. There is very little active work in this area, but the potential benefits to archives would be substantial if even such basic questions as the best ways to avoid loss of functionality in software migrations were answered. Solutions to most of these problems will need to involve collaborations between technologists, archival participants, and potential future users. Such research projects can be expected to be relatively costly and of extended duration, and will be ongoing as new functionalities are propagated. Yet unless such software independence can be achieved, we can hardly imagine the widespread acceptance of interactive documents or multimedia and visualizations within traditional communications.

Within organizations, archivists must find automatic means of identifying the business process for which a record is generated. Such data modeling will become increasingly critical in an era of ongoing business re-engineering. If records are retained for their evidential significance and for a period associated with risk, then certain knowledge of their functional source is essential to their rational control. If they are retained for long-term informational value, knowledge of context is necessary to understand their significance. Work in these areas will be stimulated by standards such as those drafted by Standards Australia and NIST in the spring of 1995.

Concrete work on social and legal issues will be best focused on identifying warrant for archival functional requirements in professional and organizational practice, locating required changes in law in such areas as privacy, freedom of information, and protection of proprietary rights, and in applications such as electronic patient records, electronic laboratory notebooks, and contractually obligating electronic communications and commerce. While progress can be expected in all these areas anyway, a concerted research agenda would coordinate findings, hasten the arrival of the fully electronic society, and make it possible to realize the benefits of electronic records within the next decade. Much work on attributes of electronic business systems is being conducted in these areas, but it is currently little informed by professional archivists.

Ultimately, we must research the use of electronic records after their value for accountability has been realized. How and why are they used? What value does their information have for users, and is the value of information in records created for other purposes commensurate with the value of information contained in self-consciously created information sources, such as books and articles? What do we need to know about the content of records to justify discovering and retrieving billions of them across heterogeneous environments? What does the subsequent use of records itself tell us about the nature of society in the years since the creation of the record and the transaction it documents? Here a lead could be taken by archivists, but little substantive research has been undertaken to date except in the area of defining the requirements for networked information discovery and retrieval.

It is now evident that we can envision a world in which virtually all records are digital, including much of the knowledge of the past. How can we make our solutions to the retention, access, and preservation of the digital cultural heritage of the world scaleable? What cost-efficiencies can we achieve over keeping paper records and making them available through libraries, archives, and museums when we are deploying systems of distributed control and access spanning all records? Future research will need to focus on a variety of implementation issues having to do with intelligent information seeking, end-to-end delivery, and migration of data on a universal scale. Again, very little has been done in this area, although recent progress implementing Government Information Locators using the Z39.50 protocol suggests some of the potential for a Global Information Infrastructure locator and document delivery service.

REFERENCES

Earlier Research Agendas/Overviews

Bearman, David. "Electronic Evidence." Archives and Museum Informatics. Pittsburgh: 1994.

Hedstrom, Margaret. "Introduction to 2020 Vision." American Archivist 57 (1994): 12-16.

———. "Understanding Electronic Incunabula." American Archivist 54 (1991): 334-354.

U.S. National Historical Records and Publications Commission. Research Issues in Electronic Records: Report of a Working Meeting. St. Paul, MN: Minnesota State Historical Society, 1991.

———. Suggestions for Electronic Records Grant Proposals. Washington, DC: National Historical Records and Publications Commission, 1995.

Physical Care

Association for Information and Image Management. Resolution as it Relates to Photographic and Electronic Imaging. AIIM TR26-1993.

Conway, Paul, and Weaver, Shari. The Setup Phase of Project Open Book: A Report to the Commission on Preservation and Access on the Status of an Effort to Convert Microfilm to Digital Imagery. Washington, DC: Commission on Preservation and Access, June 1994.

Kenney, Anne, and Chapman, Stephen. Digital Resolution Requirements for Replacing Text-Based Material: Methods for Benchmarking Image Quality. Washington, DC: Commission on Preservation and Access, April 1995.

National Media Laboratory/Commission on Preservation and Access. Magnetic Tape Storage and Handling for Archivists and Librarians. 1995.

The Research Libraries Group/Commission on Preservation and Access Task Force on Digital Archiving. Preserving Digital Information, version 1.0, November 30, 1995.

Functional/Logical Control

McDonald, John. Guidelines on the Management of Electronic Records in the Electronic Work Environment. Ottawa: National Archives of Canada, 1995.

Cook, Terry. "It's Ten O'Clock, Do You Know Where Your Data Are?" Technology Review 98 (January 1995): 48-53.

———. "Electronic Records: Paper Minds." Archives and Manuscripts 22 (1994): 300-328.

Rothenberg, Jeff. "Ensuring the Longevity of Digital Documents." Scientific American 272 (January 1995): 42-47.

Bikson, Tora, and Frinking, E.J. Preserving the Present: Towards Viable Electronic Records. The Hague: Sdu Publishers, 1993.

Hofman, J., ed. Het Papieren Tijdperk Voorbij: Beleid voor een digitaal geheugen van onze samenleving [Beyond the Paper Era]. The Hague: Sdu Publishers, 1995.

University of Pittsburgh Recordkeeping Functional Requirements Project. Reports and Working Papers. Vol. 1 (LIS055/LS94001), Vol. 2 (LS95001). Pittsburgh: School of Library and Information Science, 1994-95.

Social and Legal Guidelines/New Organizational Arrangements

Hedstrom, Margaret. Building Partnerships for Electronic Recordkeeping: Final Report and Working Papers. Albany, NY: New York State Archives, January 1995.

———. Guidelines for the Legal Acceptance of Public Records in an Emerging Electronic Environment. Albany, NY: State Archives and Records Administration, 1994.

———. "Finders Keepers, Losers Weepers: Alternative Program Models for Identifying and Keeping Electronic Records." Playing For Keeps: Conference Proceedings. Canberra: Australian Archives (1995): 21-33.

National Research Council. Preserving Scientific Data on our Physical Universe: A New Strategy for Archiving the Nation's Scientific Information Resources. Washington, DC: National Academy Press, 1995.

Proposed Standards

National Institute of Standards and Technology. Proposed Federal Information Processing Standard (FIPS) for "Record Description Records." Federal Register (February 28, 1995): 10832.

Standards Australia. Draft Australian Standard: Records Management. DR95194-95199. 1995.

NOTES

John McDonald, principally.
David Bearman, Richard Cox, and Ken Sochats.
Tora Bikson and Jeff Rothenberg.
Peter Waters.
Henry Perrier, principally.
Anne Kenney, principally.
Paul Conway, principally.
Daniel Pitti (UC Berkeley) and Susan Hockey (Center for Electronic Texts in the Humanities).
Federal Register, February 28, 1995, p. 10832.
D. Bearman, K. Sochats.
M. Hedstrom, J. McDonald, P. Waters.
Preserving Scientific Data on our Physical Universe. Washington, DC: National Research Council, 1995.
Jane Sledge et al., Getty Art History Information Program.
C. Lynch et al., CNI study team; G. Marchionini, University of Maryland.
B. Hakin et al., Harvard University.
E. Christian, U.S. Geological Survey.


NEW SOCIAL AND ECONOMIC MECHANISMS TO ENCOURAGE ACCESS

John Garrett, Corporation for National Research Initiatives

Presumably man's spirit should be elevated if he can better review his shady past and analyze more completely and objectively his present problems. He has built a civilization so complex that he needs to mechanize his record more fully if he is to push his experiment to its logical conclusion and not merely become bogged down part way there by overtaxing his limited memory. His excursion may be more enjoyable if he can reacquire the privilege of forgetting the manifold things he does not need to have immediately at hand, with some assurance that he can find them again if they prove important.

Vannevar Bush understood the multi-polarity of technologically induced and -supported change: computing, scholarship, and society weaving an intricate dance, each responding to and in turn generating a complex web of new and old forces, institutions, rules and standards, ideas. Reviewing the settings in which these transformations occur is a requisite first step toward assessing their impact on scholarship in the arts and humanities.

This paper discusses the interplay between distributed networked computing and creativity and scholarship in the arts and humanities. The first section provides an overview of certain elements of this evolving relationship, including role transformation and agents as well as inhibitors of continuing concurrent development. The next section discusses four major uses of networked computing for the arts and humanities, and the final section identifies an agenda for further research and development.

ROLES, RESPONSIBILITIES, EXPECTATIONS

Over the last several years, traditional distinctions among key actors and activities within scholarly creation and communication have begun to disappear. Words like "creator," "publisher," "user," "work," "document," "institution," and "record" have become problematic, as the activities they represent and the borders that separate them have blurred. Original source materials (such as the recently discovered cave paintings in France, and the Whitman and Vatican archives) are increasingly available to all users of the Internet/World Wide Web. Internet discussion groups lack traditional status markers (such as "Doctor" or "Professor"): according to the by now well-known New Yorker cartoon showing two dogs seated in front of a computer terminal, "On the Internet, nobody knows you're a dog." The lack of status markers can empower institution-free research: the demarcation between academic and private scholarship, already dissipating in the sciences, is difficult to sustain when major resources and outlets for research are widely distributed.

Parallel transformations are taking place in the major institutions that sustain and utilize arts and humanities scholarship. Scholarly publishers (in the arts and humanities, largely but not exclusively smaller publishers and societies) feel threatened by alternative modes of dissemination (by individuals and libraries, for instance) and the proliferation of peer-reviewed electronic journals accessible on the Internet. Some of these journals, such as Psycoloquy and the Bryn Mawr Classical Review, have an Internet circulation that greatly exceeds the subscription list for many print journals.

Research libraries face similar uncertainties: budget reductions coupled with continuing price increases for scholarly books and journals have forced even the largest, best-endowed libraries to consider access rather than ownership as a key measure of excellence. But ensuring access to research information also means replacing the current library-centric system with a multi-institutional model supporting distributed information management, with associated structures for contracting, budgeting, billing, and payment. And the increase in electronic access to original material means that museums and galleries must change as well. The technical requirements of distributed dissemination and ownership of scholarly information are relatively straightforward; the institutional ones are difficult to define, and much harder to resolve.

BOUNDARIES AND BOTTLENECKS

The pace of change is rapid, and difficult to assess. Several other bottlenecks, arising from the complex transition from traditional to network-driven scholarship, are worth mentioning as well. First, the universes of discourse in the arts and humanities and in computing are fundamentally different. To oversimplify quite a bit, the humanities and the arts are about structure, dialogue, insight, and expanding frameworks; computing is about answers. Computer scientists are more uncomfortable with the World Wide Web than humanists are: it's good at generating questions, bad at answering them.

Traditionally, one must pass through at least three key gates (with their gatekeepers) in order to become a recognized scholar: complete the dissertation, be hired by the right institution, get tenure. Not only in the arts and humanities, but even in the sciences, computer-assisted scholarship and dissemination have little if any role in these critical processes. At a recent conference, participants rejected as totally unrealistic a five-year goal of tenure entirely supported by electronic scholarship. Without movement in this direction, however, only already tenured and private scholars will be able to make full use of the power and promise of computer-supported research and dissemination.

In a networked world the lines separating creator, publisher, library, and museum become blurred. Further complicating the situation are uncertainties about the basic nature of electronically created and disseminated information. In a print-centric world, for example, the difference between an original and a copy is obvious; it is difficult to alter the text of a book or picture without leaving traces. But there is no discernible difference between an original and an instantiation of a computer-accessible book or picture, and alterations are hard to identify and trace. Furthermore, the difference between published and unpublished print works is understood; in a networked world, electronic mail (for instance) is owned by its originator, and probably (usually) unpublished.

Rather than looking for new roles (with new boundaries) to replace the older ones, it may help to think about managing annuli, or zones of progressive release (see figure below). Note that this model includes no roles, only processes. Roles bear assumptions about the

[Figure: zones of progressive release. Recoverable labels: Creator; Private Access Only; Private Use; Terms and Conditions]

NEW SOCIAL AND ECONOMIC MECHANISMS TO ENCOURAGE ACCESS

present into the future, while processes are easier to define and debate.
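The annuli model just described can be made concrete as a small state machine in which a work moves outward only through an explicit release process under stated terms. A minimal sketch; the zone names and the one-way release rule here are assumptions for illustration, since the paper specifies only the general idea of progressive release:

```python
# Hypothetical "annuli" of progressive release: a work moves outward through
# zones only via an explicit release process, never because of a fixed role.
ZONES = ["private", "collaborators", "review", "public"]

class Work:
    def __init__(self, title):
        self.title = title
        self.zone = "private"  # every work starts in the innermost annulus

    def release(self, terms):
        """Move one zone outward, recording the terms and conditions."""
        i = ZONES.index(self.zone)
        if i + 1 >= len(ZONES):
            return self.zone  # already fully public; nothing further to release
        self.zone = ZONES[i + 1]
        print(f"{self.title!r} released to {self.zone} under terms: {terms}")
        return self.zone

draft = Work("Essay on memex trails")
draft.release("comments only")           # private -> collaborators
draft.release("cite with attribution")   # collaborators -> review
```

Note that the sketch defines no roles, only a process, matching the paper's point: any participant may invoke `release`, and what matters is the zone and the terms attached at each step.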

THE USES OF COMPUTING IN THE ARTS AND HUMANITIES

Four potential contributions of computing to the arts and humanities are discussed here: resource identification; analysis; collaboration and re-creation; dissemination. Each can increase access to information in the arts and humanities, despite significant social, economic, organizational, and technical challenges.

Resource Identification

Lycos (the largest World Wide Web search engine) currently indexes more than sixteen million home pages. By the time this study is distributed, there will be several million more. In addition, there are several thousand Internet and Usenet mailing lists, and thousands more on private systems like CompuServe and America Online. Traffic on the Internet continues to double about every eight months.

Currently, Internet/World Wide Web users discover resources by means of an intricate mesh of personal relationships (often mediated by electronic mail), hyperlinks to related resources (as defined by the link's creator), print and electronic directories, and serendipity.

This process is frustrating and time-consuming at best, intensified by the intrinsic uncertainties in the Internet (e.g., whether a resource has moved or disappeared, and whether it can be reached). Improving resource discovery is less a technical than a social and organizational problem, bringing to bear the skills of scholars and librarians: scholars to direct the construction of domain ontologies, for example, and librarians to generate and manage distributed subject matter and ensure access to and coherence of a given collection.

The explicit and implicit systems for assessing value in the print world are scarce or absent in networked information: peer review across a full range of disciplines; the standing of the particular publication, gallery, or museum; the background, experience, and credentials of the author or creator. Except for a few peer-reviewed electronic journals, these value markers have not been translated into the digital world: indeed, resistance to externally mandated assessment is rooted deep in Internet culture. Furthermore, librarians have traditionally focused on developing collections and identifying resources rather than assessing the value of any particular information resource in relation to a specified need. What will be greatly needed are automated summarization, integration of related works into single multimedia documents, and automated tracking of the origin and evolution of particular works. As value-added services evolve, users will demand quality standards; at present, neither the tools nor the social and economic infrastructure exist to support them.

Analysis

Structured digital archives like ARTFL (for French language and literature) permit researchers to search a document corpus and locate related texts within and among various documents. Advanced programs make it possible to use semantic analysis to compare the styles of various works and authors. For some time, database programs have allowed users to introduce complex statistical analyses into arts and humanities scholarship (e.g., cliometrics in history).
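At its simplest, the corpus search that structured archives support reduces to scanning every document for a term and returning the matching passages. A toy sketch; the two-document "corpus" below is invented for illustration, not drawn from ARTFL:

```python
# Toy corpus search of the kind structured text archives support:
# find every document containing a given term (texts are invented).
corpus = {
    "candide": "il faut cultiver notre jardin",
    "essai": "le jardin et la memoire",
}

def search(term):
    """Return (document, passage) pairs whose text contains the term."""
    return [(doc, text) for doc, text in corpus.items() if term in text]

print(search("jardin"))  # both documents match on this term
```

Real systems add indexing, lemmatization, and proximity operators, but the underlying operation — locating related texts within and among documents — is this one.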

These investigations are possible because the fundamental elements (words, sentences, and paragraphs) of written and oral communications are clearly defined for any given language, and carry a shared constellation of meanings. For pictorial or sound works, however, the situation is murkier. Currently, works in non-textual media are cataloged by attaching to each of them sets of descriptive words using a predefined structure and vocabulary. These words permit a searcher in a photographic archive, for example, to find pictures of sunsets, or boats in a harbor; depending on the conventions used to describe the photographs, finding pictures of boats at sunset may also be possible. Despite extensive research, tools for identifying similar pictures, for instance, are erratic and primitive; it is hard to imagine a social infrastructure and technology that would provide a helpful answer to a question like "I want more music which makes me feel like the last piece did."
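The keyword cataloging described above reduces retrieval to set intersection over controlled descriptors, which is why "boats at sunset" works only if catalogers recorded both terms. A minimal sketch, with an invented three-image catalog:

```python
# Hypothetical photo catalog: each image carries descriptors drawn from a
# predefined vocabulary, as the text describes for non-textual media.
catalog = {
    "img001": {"boat", "harbor", "morning"},
    "img002": {"sunset", "beach"},
    "img003": {"boat", "sunset", "harbor"},
}

def find(*terms):
    """Return images whose descriptor sets contain every query term."""
    query = set(terms)
    return sorted(k for k, tags in catalog.items() if query <= tags)

print(find("boat"))            # ['img001', 'img003']
print(find("boat", "sunset"))  # ['img003'] -- only findable because the
                               # cataloger recorded both descriptors
```

The limitation the paragraph notes falls directly out of this model: anything not captured in the descriptor vocabulary (visual similarity, mood) is invisible to the search.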

Collaboration and Re-Creation

Network-supported scholarship is intrinsically collaborative. Electronic mail, for instance, permits physically separated colleagues to collaborate on research and publication. Equally important, Internet listservs help researchers identify others who share common interests, which may ultimately lead to new, collaborative research projects. Finally, networks support expanding authorship. In the last several years, the average number of authors of scientific


72 RESEARCH AGENDA FOR NETWORKED CULTURAL HERITAGE

papers has increased significantly: in some scientific disciplines, papers with 100 or more authors are not uncommon. These new capabilities contradict the traditional model of the solitary scholar seeking tenure, or the lone painter in her attic at midnight.

Such collaborations require support from new models for identifying and managing authorship and ownership. Clearly, increasing from hundreds to thousands of authors for individual works simply exacerbates the problem, but currently there are no clear methods for establishing and measuring the relative contributions of each. In fact, it is hard to imagine how such methods might operate: how much credit, for instance, should go to an author who wrote half an article, as opposed to another who provided the critical insight but wrote none of the words? These problems are difficult enough in static environments. In a networked, digital world, works will be created, revised, and expanded; new media will be incorporated; links to external resources will be generated; the resulting work may not share a single sentence or image with the original one, despite a clear chain of provenance. Whose work is it? Legally? Intellectually? Morally?

Dissemination

Inextricably linked to evolving systems for collaboration and re-creation of information are new methods of disseminating scholarly results in the arts and humanities. The proliferation of scholarly subspecialties has led to an increase in the number, and narrowing of the scope, of scholarly publications. With circulation declining as a result of budget reductions for libraries, among other factors, it is increasingly difficult for scholarship in the arts and humanities to find an audience. Artists and composers face similar obstacles.

Networked dissemination via the Internet/World Wide Web substantially reduces the barriers to entry, and lowers the cost of dissemination. For example, setting up a Web site to display an artist's works requires only a network connection, one of the several Web electronic or print manuals, and patience. And new Web sites (particularly if they are announced via NCSA's "What's New" page, for example) will be sought out by Web surfers. Absent standards of assessment, such as the institutional trappings of peer review, private Internet dissemination or distribution through a non-reviewed electronic journal are unlikely to further traditional careers. There is a real risk that individual disciplines will develop an intensified version of C.P. Snow's two cultures: one lodged in universities and print, the other everywhere else.

AGENDA FOR FURTHER DISCUSSION, RESEARCH, AND DEVELOPMENT

Infrastructure

Artists and humanists depend on a reliable, predictable, coherent, and comprehensive information infrastructure. Users of major research libraries, for instance, can depend on well-organized, comprehensive collections; consistent intellectual coherence from one library to another; and timely access to the major resources required. These systems, in turn, are supported by common sets of expectations and standards, painfully developed over many years in the library and museum communities. While certain coherent standards (such as URLs (Uniform Resource Locators) and Internet protocols) already exist in the universe of digital information, other important ones (including naming, registration, and archiving conventions) are required. Further, the distributed, centrifugal force of the Internet is not always compatible with the centripetal force of shared, consistent protocols and standards.

The World Wide Web amply demonstrates that a system dependent on URLs does not scale upward easily. URL-identified servers move or disappear; popular sites are inaccessible owing to burgeoning demand; location-dependent mirror sites are rapidly submerged in requests. Location-independent naming conventions (such as the handle system developed by the Corporation for National Research Initiatives), which are easily resolved into the location(s) of the digital information, would address this problem. But standardizing around any particular convention is difficult for the Internet. In the meantime, the standards of coherence and reliability represented by libraries and museums will be lacking for many types of networked information.
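The resolution step behind location-independent naming can be sketched in a few lines: a name maps, through a registry, to whatever locations currently hold the object, so references survive a server move. This is a toy illustration only; the name syntax and registry below are invented and do not reflect CNRI's actual handle protocol:

```python
# Toy resolver: names map to one or more current locations. When a server
# moves, only the registry entry changes; every stored reference stays valid.
registry = {
    "name:1000/essay-42": ["http://serverA.example/essay-42"],
}

def resolve(name):
    """Return the current location(s) for a location-independent name."""
    return registry.get(name, [])

before = resolve("name:1000/essay-42")
# The document moves to a new server: update the registry, not the citers.
registry["name:1000/essay-42"] = ["http://serverB.example/new-path/essay-42"]
after = resolve("name:1000/essay-42")
print(before, "then", after)
```

The contrast with URLs is the point: a URL embeds the location in the reference itself, so every citer breaks when the server moves; here only the single registry entry changes.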

Global, consistent naming conventions derive their usefulness from standardized methods for registering digital information objects. Systems are required that permit creators and their agents to register the existence of a particular information object, determine the terms and conditions for its use, and identify which if any digital library systems are authorized to store and disseminate it. In addition, reliable recording systems are needed to allow potential users of information to identify who owns what. The technical requirements for these systems are well understood; the organizational framework remains to be developed.
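The registration system described amounts to one record per object carrying the three facts the text names: identity, terms of use, and authorized repositories. A minimal illustrative data structure; the field names, identifier, and owner are assumptions, not a real registry schema:

```python
from dataclasses import dataclass, field

# Hypothetical registration record combining the three functions the text
# names: register existence, state terms of use, list authorized stores.
@dataclass
class Registration:
    object_id: str
    owner: str
    terms: str
    authorized_repositories: list = field(default_factory=list)

reg = Registration(
    object_id="name:1000/essay-42",   # invented identifier
    owner="A. Scholar",               # invented owner
    terms="non-commercial scholarly use",
    authorized_repositories=["library-A", "archive-B"],
)

def may_store(reg, repository):
    """Answer the 'who may hold what' question for a single object."""
    return repository in reg.authorized_repositories

print(may_store(reg, "library-A"), may_store(reg, "pirate-mirror"))
```

As the text notes, nothing here is technically hard; the unsolved problem is the organizational framework that would make such records authoritative across institutions.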

For centuries, libraries and museums have protected rare works of art and scholarship from destruction. In a networked environment, however, there are no straightforward methods to determine that a particular byte stream is, in fact, the last instantiation of a given work. Culturally, it is easy to delete a message, much harder to throw away a book. Technically, it might be possible, for instance, to link any rare digital information object to a program that searches the Net for another instantiation before permitting itself to be deleted. Building a common framework supporting institutional cooperation across millions of digital collections and billions of information objects over hundreds of years will be much more difficult.
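The "search before delete" idea in this paragraph is, at bottom, a simple guard: refuse deletion while no other instantiation can be found. A toy sketch of that guard, with an invented two-collection network standing in for the Net-wide search the text imagines:

```python
# Toy model of last-copy protection: deletion of a byte stream succeeds only
# if the same work is still held somewhere else on the (simulated) network.
network = {
    "library-A": {"essay-42", "atlas-7"},
    "archive-B": {"essay-42"},
}

def safe_delete(collection, work):
    """Delete only if another collection still holds an instantiation."""
    elsewhere = any(
        work in holdings
        for name, holdings in network.items()
        if name != collection
    )
    if not elsewhere:
        return False  # refuse: this may be the last copy
    network[collection].discard(work)
    return True

print(safe_delete("archive-B", "essay-42"))  # succeeds: library-A still holds it
print(safe_delete("library-A", "atlas-7"))   # refused: it is the last copy
```

The hard part, as the paragraph goes on to say, is not this check but sustaining the cooperative framework that makes the "elsewhere" query answerable across millions of collections and hundreds of years.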

Already, recently developed digital information objects (such as the 1960 U.S. census and some early NASA data) are inaccessible owing to arcane and untranslatable data structures. There are complex technical and organizational problems in refreshing large volumes of digital information to ensure compatibility with new formats. The Task Force on Archiving of Digital Information, sponsored by The Commission on Preservation and Access and The Research Libraries Group, is reviewing these issues and will present its findings in the summer of 1995.

Enhancing Access

The idea of access embodies several distinct, potentially divergent models of technology, relationships, and the individual creator or user. One model defines a funnel from (ideally) potentially infinite information resources at one end to (ideally) a specific answer to a stated question at the other: a historian seeking a date or a geographer looking for a map, for instance. While this model may support limited interaction between information seeker and information resource, the purpose of the interaction is to narrow the funnel, not expand it.

Several new capabilities are required to support this model of access. As mentioned earlier, methods are needed for determining and attaching quality assessments to information resources, tuned for particular purposes; so are automated techniques to condense, summarize, integrate, translate, invoice, and pay for information from different sources. Underlying these technologies, social and organizational structures are required for building and supporting flexible domain-specific ontologies.

A different model, of which browsing is an example, seeks common threads among apparently disparate information resources. Here, interactions between user and resource generally focus on expanding the funnel, or altering the course of the information flow. Tracing the World Wide Web's hyperlinks, for example, leads a user along intricately woven paths defined by each Web page's creators, ending only with exhaustion of the user's time, money, or patience.
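Link-tracing of this kind can be sketched as a graph traversal bounded by the user's "budget" of time or patience. The page graph below is invented for illustration:

```python
from collections import deque

# Invented page graph: each page links to pages chosen by its creator.
links = {
    "home": ["poetry", "maps"],
    "poetry": ["sonnets", "maps"],
    "maps": ["atlases"],
    "sonnets": [],
    "atlases": ["home"],
}

def browse(start, patience):
    """Follow hyperlinks breadth-first until patience is exhausted."""
    seen, queue = [], deque([start])
    while queue and len(seen) < patience:
        page = queue.popleft()
        if page in seen:
            continue  # already visited via another path
        seen.append(page)
        queue.extend(links.get(page, []))
    return seen

print(browse("home", patience=3))  # stops after three pages; links remain
```

The traversal ends when the budget runs out, not when the information need is met, which is exactly the property of browsing the paragraph describes.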

A third model focuses on a dialogue between the user and a set of information resources (including its creator and other users); the information resource provides a framework for initial exchanges, which may result in new or transformed resources that may initiate new discussions. This model links the network as an information resource with the network as a framework for interchange (demonstrated, for instance, in Internet chat and mailing lists). At least primitive technologies exist to support all three of these models; only the second one (hyperlinks) is widely supported at present.

This model depends on a range of capabilities that are only just being identified. First, it requires seamless links between and among personal, collaborative, and public work and play spaces, dynamically controlled by the user. The annuli model of progressive release, outlined above, provides an initial version of this capability. A multi-dimensional workspace, for example, would permit a creator/user (an artist, a poet, a scholar) to manage dialogues about particular works along a path from private to public, determining at every point what information to retain, what to seek, what to share, when to talk, when to listen.

Second, this model mandates seamless linkages, controlled by the creator or user, among information objects in all media. It should be straightforward, for example, to add voice or video to electronic mail; or to participate in a virtual conference, seated at a virtual conference table, observing the expressions and movements of one's virtual colleagues; or to translate speech to text, and text to speech. It should be possible to carry on most aspects of our private and public lives, choosing face-to-face contact when it is desired, not when it is required for communication.

CONCLUSION

"The historian, with a vast chronological account of a people, parallels it with a skip trail which stops only at the salient items, and can follow at any time contemporary trails which lead him all over civilization at a particular epoch. There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record. The inheritance from the master becomes, not only his additions to the world's record, but for his disciples the entire scaffolding by which they were erected."

How far are we from achieving Bush's vision? Who will be the trailblazers? What social and economic mechanisms will be required to support trailblazers in the arts and humanities, as well as those who come after?

These questions need to be asked and answered in and through a complex, dynamic dialogue among multiple communities of practice, including individuals and institutions in the arts and humanities and computing, libraries, librarians and information scientists, policy makers, creators, publishers, distributors of print, sound, visual, multimedia, and digital information, private scholars, students, and many more. The dialogue involves speakers, listeners, and the spoken-for: all too often, the views of (for instance) artists, humanities scholars, and librarians have been presented by others.

A major purpose of this paper, and the continuing discussions it is intended to stimulate and frame, is empowering the spoken-for to speak for themselves, by finding a shared language and a collective voice. Bush began this dialogue fifty years ago, and Bush's vision remains powerful because it encapsulated technology in service to larger intellectual and social goals. Negotiating those goals, and identifying the technologies that will serve them, remains as significant and challenging as it was for Bush. It is time for new voices to be heard, and new audiences to hear them.

NOTES

i Bush, Vannevar. "As We May Think." The Atlantic Monthly (July 1945): Section 8, paragraph 9, page 14. [Pagination of the HTML version will differ from this citation, which refers to the ASCII version available over the Internet.]

ii The claim that a million monkeys typing at a million word processors for a million years would sooner or later produce the works of Shakespeare has been disproved by the Internet (anecdote courtesy of Michael Lesk, itinerant sage).

iii A friend of mine teaches selected essences of deconstructionist theory to computer science students. Since it matches their model of the world, they find it generally straightforward and obvious.

iv At a recent conference, I proposed the following criteria for determining the effectiveness of a global system of digital libraries: that within five years, it would be as easy to throw away a book as to delete a message. There was an audible gasp from the audience.

v Buckminster Fuller used to tell a story about a Master of one of the colleges of Cambridge University, who noticed a deep crack in the massive beam supporting the college's dining hall. Not knowing where to report it, he eventually notified the Royal Forester, who told him that he had been expecting the call. The Forester's predecessor's predecessor had planted the tree for the new beam, and it was ready. This, Fuller noted, was how a society ought to work.

vi The last goal has been straightforward (and elusive) for thirty years.

vii Bush, Section 8, paragraph 2.


TOPICAL INDEX

TOPICAL INDEX TO THE PAPERS

PAPER I. TOOLS FOR CREATING AND EXPLOITING CONTENT

KOLKER AND SHNEIDERMAN

27 Diversity of the state of the art

27 Disparity of equipment and access; mostly less than ideal

27 Few involved full time in creation of tools for their disciplines

27f. Internet/networked access; primitive organization of resources and access methods

28 Exemplary Internet sites

29 Asian languages and interface research

29 Workstation software, especially parsers

29 Future need to bring electronic resources to students

29 Future needs for computer literacy of faculty

29f. Need for interface standards and methods of content access

PAPER 2 KNOWLEDGE REPRESENTATION

HOCKEY

31 Broad issues in knowledge representation

31 Fidelity in text representation

31 Genre or form in text representation

31 Problem of representing structure and content independently

32 Role of meta-data in making implicit information more explicit

32 Text representation and the role of SGML in preventing obsolescence

32 SGML-based text representation projects

32 Multiple parallel hierarchies in SGML as a research problem

32 HTML

33 Standards; non-text conversion at bit rather than content level

33 Representation of abstract categories such as weight, time, measures

33 Representation of missing, incomplete, and sourced information

33 Representations as surrogates

33f. Representations as more than surrogates

33f. Versions

33 Non-linearity

34 Representing the processes/context of creation

34 Representing representation conventions employed

34 Linking; how to link objects of different modalities; research issues

34 Typed relations and functionality

34 Representing derived knowledge

34 Representing traditional/paper sources

34 Representing legacy data


34 Automatic conversion of representations

34 Cost factors of representation

34 Quality of representations research

PAPER 3 RESOURCE SEARCH AND DISCOVERY

MARCHIONINI

35 Remote access increases need for search and discovery

35 Need to integrate search and discovery tools with creation, use, and communication tools

35 Existing genres of finding tools need electronic analogs

35 Definition of search and discovery, and distinction between the two

35 Map conceptual space to physical locations

35 Primary, secondary, and tertiary sources all on Internet together

35 Need for dynamic updating

36 Evolving methods of string searching

36 Little progress other than in text

36 Ranking of results

36 Using domain-based knowledge in retrieval

36 Filtering and user profiles in retrieval

37 Browsing as a method of discovery

37 Guided discovery: the use of links

37 Feedback of representations in discovery and browsing

37 Relevance feedback

38 Automatic indexing of resources: toolsets and issues

38 Interactive interfaces and visualization as feedback

38 Value placed on variety in expression in humanities as penalty to retrieval

38 Value placed on older sources in humanities as penalty to discovery

38 Evolution of concepts over time as penalty to retrieval

38 Multilinguality as penalty

38 Data acquisition costs in humanities

39 Imprecision of audience

39 Need to combine multiple approaches and integrated methods

39 User perspectives

39 Thesaurus merging

39 Commentaries, pathfinders, and tools with a point of view

39 Levels of knowledge-based access

39 Multilingual issues

39 Critical mass

39 Pattern matching

39 Audience analysis/feedback

39 Readers as authors

40 Bibliography on search and discovery research

PAPER 4 CONVERSION OF TRADITIONAL SOURCE MATERIALS INTO DIGITAL FORM

KENNEY

41 Digital surrogates for paper/film

41 Problem that bit maps aren't indexable or searchable

41 History of digital text surrogacy efforts


41 Purpose of surrogacy in adding value for analysis

41 History of digital image surrogacy efforts

42 Why digital capture became cheaper in 1990s

42 Continuing demand for content/keyword searching

42 Research issues posed by large compendia

42 Other large-scale projects

43 Research issues in the use and impact of digital surrogates

43 Capture and quality standards

43 Near-term research: benchmarks for quality by purpose

43 Near-term issues: evaluation criteria

43 Near-term issues: production and throughput

44 Color management

44 Automatic capture settings

44 Image transmission and end-user perception of usability

45 Intelligent files

45 Pattern matching and object recognition

45 Raster/vector conversion and functionality

45 Compression research

45 Cost-effectiveness

45 Automated selection and control

46 Business case

46 User needs and perceptions

46 Display: dramatic improvement needed

46 Bibliography on digital capture

PAPER 5 IMAGE AND MULTIMEDIA RETRIEVAL

ROMER

49 Lack of both tools and approaches for multimedia cataloging

49 Software for image databases proprietary and weak on retrieval

49 Lack of tradition in image cataloging

49 Retrieval results based on data representation

50 Non-textually-based retrievals and auto-indexing

50 Can text-based approach enhance image-based approach?

50 Research on what is meant by similarity in different modalities

51 Map to languages and symbols

52 Formal properties of genres in different modalities

52 How to escape from words

52 Content attribute identification within images

52 State of art still too primitive and domain specific

52 User-based search models and points of view research

53 Layered questions, layered representations

53 Visual thinking: thought processes need to be understood

53 Likeness as a criterion

52f. Points of view as future research need

54 Evaluation of text-based retrieval results

54 Media-based significant attributes need to be identified

54 Visual thesaurus functionality


55 Representing sets of images rather than individual images

55 Issues in music representation

55 Motion representation schemes

55 Use of image and multimedia depends on quality and purpose of their representations

55 Representations are multiple and acquired throughout life cycle, need coordination

55 Bibliography on image representation and retrieval research

PAPER 6 LEARNING AND TEACHING

MURRAY

57 New medium will make possible new methods of teaching and learning

57 Used to date in skill-based disciplines: writing and reading

57 Many early research tools now embodied in commercial software

57 Foreign-language instruction, especially laboratory and online exams, benefit

58 New approaches to instruction developed to use communicative capacity of multimedia

58 Purpose to use teacher in role of task designer rather than sole source of information

58 Grammar and pronunciation practice still require better technology

58 Simulations to teach history

58 Electronic textbook technology and the market

59 Corpora and rich webs in disciplines and specialties

59 Hypermedia archives around single authors

59 Lack of systematic coverage or coordination of hypermedia projects

59 Need for standard for text management software

60 Promise of multimedia for media studies affected by legal and delivery issues

60 Non-linear authoring, for creative writing, needs better tools

60 Information retrieval and editorial review critical for Internet

61 Rise of distance learning demands research on how learning takes place

61 Need to explore course-length hypermedia packages

61 Redesign of classrooms needs research/implementation

61 Evaluation methods

61 Natural language processing and speech recognition promising for language teaching

61 Hypermedia authoring and reference environments urgently needed

61 Creative arts software support for non-linearity required

PAPER 7 ARCHIVING AND AUTHENTICITY

LORMAN

63 Humanistic studies depend on attribution, sourcing, and context

63 Long-term intelligibility and usability is a necessity

63 Proliferation of studies in past few years cited

63 Preserving bits requires recopying media

63 Preserving recordness requires meta-data

64 Preserving functionality requires robust representations

64 Cultural and legal concepts of evidence

64 Current research on preserving bits not very important to humanities

64 Current research on recordness is critical to humanities

64 Little current research on preserving functionality

64 Cultural concept of evidence research

65 Related areas in digitization and knowledge representation


65 Future work in meta-data standards for evidence

65 Future research in collaborative tools

65f. Future research needs in migration and dynamic document management

65 Future research required in business processes and self-documenting records

65 Literary warrant for evidence research

65 Issues in use of records

66 Problems of scalability and implementation

66 Bibliography of recent electronic archiving research

PAPER 8 NEW SOCIAL AND ECONOMIC MECHANISMS

GARRETT

69 Interplay between technology, scholarship, and society

69 Traditional roles in scholarship breaking down (creator, user, publisher etc.)

69 Scholarly publishers, libraries, journals all changing and threatened

70 Gatekeeper roles blurred

70 Original and copy, hence the act of creativity itself, is blurred

70 Possible approach is to manage processes in life cycle of ideas

71 Resource identification systems

71 Evaluation, automatic summarization, and integration of sources

71 Analysis of the characteristics of large databases

72 Collaboration tools with mechanisms for assigning responsibility and credit

72 Impact of lowered entry barriers for scholarship/publishing

72 Need for reliable, standard infrastructure

72 Location-independent naming of objects

72 Registration methods for digital objects

73 Methods to prevent destruction of last copy/archive copy

73 Methods to ensure usability of digital objects over long term

73 Methods to increase precision in searches

73 Methods to increase recall with and beyond browsing

73 Dynamic, interactive dialogue in retrieval

74 Mechanisms to support trailblazers


GLOSSARY

AAT Art & Architecture Thesaurus

AHIP The Getty Art History Information Program

ARTFL A database of French language and literature

CETH Center for Electronic Texts in the Humanities

CHIO Cultural Heritage Information Online, a CIMI project

CIMI Computer Interchange of Museum Information

CNI Coalition for Networked Information

FAQ Frequently Asked Questions

FIPS Federal Information Processing Standard

GIS Geographical Information System

H-Net A group of 57 listservs in the humanities

HTML Hypertext Markup Language

IATH Institute for Advanced Technology in the Humanities

ICONCLASS A computer-based system for classifying iconography

IOLS integrated online library system

ISO International Standards Organization

LCTGM Library of Congress Thesaurus of Graphic Materials

Lycos A search engine on the World Wide Web

MARC Methodology for Art Reproduction in Color (also, Machine Readable Cataloging)

MOO Multi-User Dungeon, Object Oriented environment

MTF modulation transfer function

NEH National Endowment for the Humanities

NRC National Research Council

NSF/ARPA National Science Foundation/Advanced Research Projects Agency

OCR optical character recognition

PDF portable document format

QBIC Query by Image Content

RIT Rochester Institute of Technology

RLG The Research Libraries Group

RLIN The Research Libraries Information Network

SGML Standard Generalized Markup Language

TEI Text Encoding Initiative

TLG Thesaurus Linguae Graecae

URL Universal Resource Locator (address on World Wide Web)

USGS U.S. Geological Survey

WAIS Wide Area Information Server

Web, WWW World Wide Web

XDOD Xerox [document system]

Yahoo A search engine on the World Wide Web

