Abstract

Worldwide initiatives toward digital library (DL) support for electronic theses and dissertations (ETDs), facilitated by the work of the Networked Digital Library of Theses and Dissertations (NDLTD), are a key part of the move toward open access. When all graduate students learn to use openly available ETDs, and have experience with authoring and submission in connection with their own research results, it will be easy for them to continue these efforts through other contributions to open access. When all universities support ETD activities, they will be key participants in institutional repositories and open access, and will have engaged in discussion and infrastructure development supportive of further open access activities. Understanding of open access also can be facilitated through modeling of all of these efforts using the 5S framework, considering the key aspects of DL development: Societies, Scenarios, Spaces, Structures, and Streams.


5S (societies, scenarios, spaces, structures, streams). Curricula. DL (digital library). ETD (electronic thesis or dissertation). NDLTD (Networked Digital Library of Theses and Dissertations). Open access. OAI (Open Archives Initiative). Standards.

ETDs, NDLTD, and open access: a 5S perspective

Resumo

International initiatives to support electronic theses and dissertations (ETDs) through digital libraries (DLs), facilitated by the work of the Networked Digital Library of Theses and Dissertations (NDLTD), are a key factor on the path to open access. When graduate students learn to use the available ETDs and have experienced creating and submitting the works that result from their research, they will be active participants in institutional repositories and in open access. At the same time, they will be able to engage in discussions and in the creation of infrastructure that supports the growth of open access. Understanding of open access can be facilitated by 5S modeling applied to the fundamental aspects of digital libraries: Societies, Scenarios, Spaces, Structures, and Streams.


5S (societies = sociedades, scenarios = cenários, spaces = espaços, structures = estruturas, streams = correntes). Currículos. DL (digital library = bibliotecas digitais). ETD (electronic thesis or dissertation = teses ou dissertações eletrônicas). NDLTD (Networked Digital Library of Theses and Dissertations = biblioteca digital em rede de teses e dissertações). Acesso aberto. OAI (Open Archives Initiative = iniciativa dos arquivos abertos). Padrões.

Edward A. Fox
Executive Director, NDLTD; professor.

Seungwon Yang
PhD Student.

Seonho Kim
PhD Student.

Department of Computer Science, Virginia Tech, Blacksburg, VA 24061 USA.
http://www.cs.vt.edu
E-mail: fox,seungwon,


One of the easiest and most effective ways to promote open access to research and educational content involves support of electronic theses and dissertations (ETDs), covering their authoring, submission, workflow processing, storage, archiving, harvesting, discovery, reading, and referencing. As a result of such support, in 2006 there is widespread discovery and use of the hundreds of thousands of ETDs already freely accessible, from every continent around the globe, covering all topical areas. Advocacy activities, community building, development of standards, documentation of best practices, and other assistance for ETD initiatives are coordinated by the Networked Digital Library of Theses and Dissertations (NDLTD) [25]. It relies upon worldwide engagement of university students, faculty, and staff (especially those involved in libraries or graduate programs), as well as support from corporations and initiatives operating at regional and national levels, typically related to open access, networking, library/information science, and/or graduate education.

The first author of this paper became interested in this topic in 1987 [34], when UMI hosted a workshop in Ann Arbor, Michigan, USA to explore how the Electronic Manuscript Project [69], which was based on using SGML [42] for electronic publishing, might relate to doctoral dissertations. It is our hope that by the 20th anniversary of that discussion, the number of ETDs archived in a single year will exceed 100,000. By then, we fully expect ETDs to be one of the most important genres in the unfolding of the open access movement [40]. This article summarizes progress and plans in that direction.


NDLTD is part of the movement toward digital libraries (DLs). Early visions of digital libraries (related to, and sometimes called: content management systems, digital repositories, electronic libraries, institutional repositories, knowledge management systems, or virtual libraries) date back to the 1960s [60] and before. Enabling research to promote development of DLs received initial funding in USA [31] and other nations in the 1990s, in part as a result of efforts of those in the computing, library, and information science fields to explore the synergies and applications of decades of fundamental investigations [29]. By 1993 the DL field was perceived as a hot topic [28]. By 1996 there were major annual professional conferences in the field [39] leading to even larger coordinated events for the Americas [11], Europe [22], and Asia [51], along with workshops and national/regional events.

There are many publications in the DL area, including brief overviews [27] and longer reviews [41]. There are online magazines [18] and journals [52]. In connection with a DL curriculum development project funded for 2006-2008, we examined a substantive sample of the DL magazine and conference paper literature to help us identify what topics relate to DL, and which of those topics might be considered “core”. Figure 1 summarizes our first attempt to identify the key topics that relate to DL, to specify what topics might be covered in a DL “knowledge module” (typically corresponding to a portion of a course), and to suggest how these might fit into various DL curricula. We identified 9 modules we considered core, numbered 1-9, shown in the middle of the figure. Since these were most important for our work on DL curricula and educational resources, we refined these into the set shown in Table 1. Then we were able to study the DL literature and manually classify works according to those revised 9 topics (modules). Figure 2 shows the topical coverage of topics 1-9 for D-Lib Magazine [18]. The colors reflect year of publication, so for each topic it is possible to perceive the evolution, and to note shifts in degrees of coverage over the years. By way of comparison, Figure 3 shows the topical coverage for papers in DL conferences. Figure 4 is similar, but instead of classifying papers, we classified sessions (i.e., small groups of papers presented together) at DL conferences.

These figures suggest that there has been significant accomplishment by those in the DL community, and relatively rapid movement in the directions recommended in the early 1990s. Further progress will be assured if research and development activities are supported by adequate funding programs, and guided by standards and other types of community agreement. Since two key goals of NDLTD are to help advance the digital library field, and to ensure that graduate students become knowledgeable about DLs, we aim to help

FIGURE 1. Initial set of curricular modules for DL topics

TABLE 1. Digital Curriculum Module Scopes

N° | Title | Content Details
1 | Collection Development | Digitization; Document and E-publishing Markup
2 | Digital objects / Composites / | Text Resources; Multimedia streams/structures, Capture/representation, Compression/coding: content-based analysis, multimedia indexing, multimedia presentation rendering
3 | Metadata, Cataloging, Author | Thesauri, Ontologies, Classification, Categorization; Bibliographic information, Bibliometrics, Citations
4 | Architecture, Interoperability | Agents, buses, wrappers/mediators
5 | Spaces (conceptual, geographic, 2/3D, VR) | Repositories, Archives
6 | Services (searching, linking, browsing, etc.) | Info needs, Relevance, Evaluation, Effectiveness; Search & search strategy, Info seeking behavior, User modeling, Feedback; Routing, Filtering, Community filtering; Sharing, Networking, Interchange; Info summarization, Visualization
7 | Intellectual property rights management, Privacy, Protection (watermarking) | Defines the purpose of copyright and copyright protection of DL resources. Discusses the controversial issues related to privacy. Deals with technical methods to protect the authorship of resource creators.
8 | Social issues / Future DLs | Related to DL design and development for a specific group of users or of particular topics, and future DL descriptions or projections
9 | Archiving and preservation | Long-term plans for digital resource preservation, migration, emulation, etc.; Fundamental strategies to preserve digital resources, preservation models

FIGURE 2. Topical coverage in the DL magazine literature

ETDs, NDLTD, and open access: a 5S perspective


FIGURE 3. Topical coverage in the DL conference literature

FIGURE 4. Topical coverage for sessions in DL conferences

especially in these regards. The sections below highlight progress and plans, showing how they contribute to open access, and describing them by way of a framework that also might be of use when describing other DL-related activities.


Internet, WWW: One of the key foundations of success in the information, computing, and communications area has been the development of appropriate standards. The Internet was based on communication standards like TCP and IP, and a growing number of protocols for services like FTP and SMTP. The WWW was built upon information standards like HTML and XML, naming agreements such as URLs and DOIs, and protocols supported by web servers and browsers. Here we summarize key standards related to DLs (especially those applicable to the efforts of NDLTD), many of which have facilitated the movement toward open access.

Content and Formats: Since the DL field is so broad, in describing DL standards we elect for the sake of brevity to focus on those that relate strongly to ETDs. If we start with the actual content, the most popular formats are PDF and XML. Though the earliest interest in ETDs arose from consideration of SGML, and though some ETDs have been prepared (dating back to 1988) in accordance with that standard, the cost of suitable authoring tools and training made widespread use of SGML for ETDs problematic. Fortunately, XML has many of the same advantages as SGML, supporting descriptive markup and even more flexible rendering (e.g., with XSLT), so it has been the cornerstone of a number of ETD initiatives (e.g., in Chile and France).

But ETDs are prepared by students, often with somewhat narrow experience in the areas of word processing and electronic publishing, and most students follow community practice in using Microsoft Word or similar programs for authoring. However, since ETDs represent archival publications, and in many regions must be kept accessible for 50 years or more, Word is generally not acceptable as the sole representation of a work. Fortunately, in the early 1990s, as the DL field was unfolding, and the WWW was emerging, PDF appeared as a format that rapidly became popular for preserving the rendered form and appearance of electronic documents [59, 8]. While from the earliest days of the move toward NDLTD it was agreed that having both a logical/descriptive (e.g., SGML or XML) and a rendered version (e.g., PDF) of ETDs was desirable, in most universities the expedient choice was made to launch efforts with a focus on use of PDF, along with on-demand support for those interested in using XML. Fortunately, NDLTD has goals of supporting continuously improving education and training for graduate students, and empowerment of universities to move forward in adoption of the most effective technologies, so we see this matter as an area for continuous improvement rather than a source of contention.

Multimedia: Another area of improvement regarding ETDs is the extension of content types to go beyond simple text. Electronic publishing facilitates inclusion of color diagrams and figures, photos (typically using the JPEG standard [53] now commonplace in digital cameras), animations, audio files (now even more accessible through streaming servers and podcasts, usually as MP3 but sometimes using other formats), and videos (often as MPEG-1 or MPEG-2 [15], depending on the quality of capture and the need to communicate precisely). In addition, there are many special formats employed for a wide variety of sensors and instruments, including those related to medical and health care, as well as science and engineering. Long lists of recommended standards have been prepared over the years, but practices shift widely. For example, some students prefer to work with state-of-the-art technology while others wish technology would disappear and let them focus on their research. Accordingly, policies lack consistency. For ETDs, where it is important to consider preservation for the long term, it may be best to be conservative and work with well established international standards (at least for the core of a thesis). The result should be that large numbers of future readers will benefit from the thoughtfulness of authors. Future readers also can benefit from multimedia inclusions in ETDs, aimed at communicating scholarly discoveries in the most effective manner, which can result from authors’ willingness to supplement their subject matter expertise with skills regarding multimedia presentation and archiving. When new technologies are employed, as a hedge against the future, such supplemental files can be made available in a number of formats, so that at least one version is likely to be supported years later.

Datasets: Another emerging extension of ETDs deals with datasets. This is a hot topic in the e-science world, but clearly has wider scope. Many researchers are becoming aware that future advancement of knowledge, not just in science but more broadly across all areas of scholarship, depends upon long-term preservation and access support for raw data. Since more and more of that data is available only electronically, and since some if not all versions of remaining data collections also have digital representations, keeping electronic datasets for the long term is crucial if theses and dissertations are to lead to validation or follow-on research. Some of these datasets are managed by government or professional libraries and archives, in which case students may deposit their data and simply keep a pointer or identifier in their ETD. But all too often, the preservation of datasets is left to the good auspices of students, faculty, or research groups/centers, which typically have little expertise or financial support for this task. Fortunately, university support for ETDs can easily be extended to facilitate dataset preservation, if the datasets are stored together with a submitted ETD, or are uploaded at roughly the same time to some separate local repository. Universities are strongly encouraged to institute policy recommendations in this regard (in keeping with disciplinary practices and with legal and economic procedures and decision making processes related to management of intellectual property), typically in conjunction with records management or library and archiving activities, preferably as part of the organization’s information infrastructure. Here again, work with ETDs can be a driver for local efforts to enhance university support for research, and to increase involvement in discussion about long term needs.

Naming, hypermedia, and superimposed information: Naming is an important role for the discoverer. Having persistent names, which can be effectively used for the long term to connect with named entities, is another key part of the emerging global information infrastructure (cyberinfrastructure). Scholars have long faced these problems, now made visible as a result of the widespread use of (ever-changeable) URLs instead of URNs, URIs, DOIs, or other types of stable resource identifiers in the WWW [78]. Persistent names are needed for ETDs, for related datasets, for multimedia files connected with ETDs, and for other electronic content described in ETDs. In addition to having means to refer to such digital objects, it is desirable to refer to parts of such objects (e.g., a word, phrase, sentence, excerpt, paragraph, page, section, table, or figure in a publication; a face in a group photo; a tumor outline visible in an X-ray image; a theme being studied in a musical composition; a step in a procedure documented in a video). Hypermedia systems may provide such assistance, but often that is hard to sustain into the future. In conjunction with XML documents, there are schemes like XPath that provide appropriate functionality. More generally, markup schemes, like those developed for various classes of documents through the work of the Text Encoding Initiative, provide genre-specific aids. Gradually, as efforts mature, for example the superimposed information middleware work based at Portland State University [73], it will be straightforward to “mark” (parts of) objects in an effective and persistent manner.
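As a minimal sketch of how a path expression can address part of an XML document, the following Python example selects a figure caption and a chapter title from a small ETD fragment. The element and attribute names (etd, chapter, figure, caption, id) are invented for illustration and do not come from any ETD schema, and the standard library module used here supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical ETD fragment; element and attribute names are illustrative only.
ETD_XML = """
<etd>
  <chapter id="ch1">
    <title>Introduction</title>
    <para>Open access matters.</para>
  </chapter>
  <chapter id="ch2">
    <title>Methods</title>
    <para>We harvested metadata.</para>
    <figure id="fig2.1"><caption>Harvesting workflow</caption></figure>
  </chapter>
</etd>
""".strip()


def select_part(xml_text: str, path: str) -> list[str]:
    """Return the text of every element matched by a path expression.

    ElementTree implements only a subset of XPath (child steps,
    attribute predicates, positional predicates).
    """
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.findall(path)]


# Address one figure caption by the id of its enclosing chapter.
caption = select_part(ETD_XML, ".//chapter[@id='ch2']/figure/caption")

# Address the title of the second chapter by position.
second_title = select_part(ETD_XML, "./chapter[2]/title")
```

A path such as the first one stays meaningful only while the id attributes remain stable, which is one reason persistent naming of whole objects and addressing of their parts are discussed together.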

Metadata: Another type of supplement to an ETD is a metadata record (which may include some or all of the typical types of metadata such as descriptive, administrative, and structural). Older scholars will recall card catalogs, wherein a card (or several if different types of organizational schemes were employed, based on title, author, and category) described each work in the library collection. As these moved to electronic versions, standards like MARC-21 [24] emerged as the main approach to connect author, date, publisher, categories, keywords, and other attributes to document the provenance and to facilitate discovery of the work. While full-text indexing supports a (perhaps better) way to search for an ETD, searching with full-text plus metadata (plus citation and other supplemental information, possibly including content-based retrieval tailored to audio, image, and/or video content) is even better. Accordingly, the Dublin Core Metadata Initiative [17] emerged to develop metadata standards for electronic resources [100]. Thus, if an ETD has no MARC-21 or similar standard metadata record that has resulted from local library processing, at the very least there should be a Dublin Core [101] record with as many of the standard 15 elements (i.e., fields or attributes) specified as are appropriate [19]. Even better, there should be an extended Dublin Core record, which also has elements of special importance for theses and dissertations. Toward that end, and as a result of over three years of international meetings and discussions, in 1997 ETD-ms, the first ETD metadata standard, was developed under NDLTD auspices [5]. The NDLTD Standards Committee is revisiting this work to extend it based on a decade of experience with ETD collections and a broader international perspective on needs and terminology.
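To make the idea of a minimal Dublin Core record concrete, here is a hedged Python sketch that builds one as XML. The 15 element names and the namespace URI are those of the Dublin Core element set; the wrapper element name (record), the function name, and the sample thesis values are assumptions made up for illustration and do not describe any real ETD or any NDLTD tool.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the simple Dublin Core element set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build a simple Dublin Core record; every element is optional and repeatable."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # wrapper name is an assumption
    for name in DC_ELEMENTS:
        for value in fields.get(name, []):
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = value
    return record


# Invented sample values, for illustration only.
example = build_dc_record({
    "title": ["A Hypothetical Thesis on Harvesting"],
    "creator": ["Doe, Jane"],
    "date": ["2006"],
    "type": ["Electronic Thesis or Dissertation"],
    "format": ["application/pdf"],
    "language": ["en"],
})
xml_text = ET.tostring(example, encoding="unicode")
```

Elements that matter specifically for theses, such as degree information, have no place among the 15 and are rejected by this sketch, which is the gap an extended record is meant to fill.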

For NDLTD-supported resource discovery of ETDs from around the globe, such a standard is especially valuable. However, it is expected that university, national, and regional standards for metadata also will exist because of local needs. Crosswalks from those metadata standards to ETD-ms will allow local and global situations to evolve in parallel for the widest benefit.
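A crosswalk of this kind can be pictured as a field-by-field mapping. The sketch below is a minimal illustration in Python, assuming a hypothetical local schema with Portuguese field names; the target names are placeholders and are not the actual ETD-ms element names.

```python
# Hypothetical local field names mapped to placeholder target names.
# Neither side reproduces a real schema; both are assumptions.
LOCAL_TO_TARGET = {
    "titulo": "title",
    "autor": "creator",
    "resumo": "description",
    "grau": "degree_name",
    "instituicao": "degree_grantor",
}


def crosswalk(local_record: dict[str, str],
              mapping: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Translate a local record; also report fields the mapping does not cover."""
    converted: dict[str, str] = {}
    unmapped: list[str] = []
    for field, value in local_record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)  # kept visible so the mapping can be extended
        else:
            converted[target] = value
    return converted, sorted(unmapped)


# Invented sample record, for illustration only.
converted, unmapped = crosswalk(
    {"titulo": "Uma tese hipotética", "autor": "Silva, Ana", "campus": "Norte"},
    LOCAL_TO_TARGET,
)
```

Reporting unmapped fields rather than silently dropping them lets the local standard keep evolving while showing what the shared standard does not yet express.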

Ultimately, however, the quality of metadata about ETDs will depend on the training of authors to understand that describing their works is a responsibility of document creators. But, at least for the foreseeable future, there also is need for assessing that quality, improving that training, and supplementing the work of authors with the efforts of catalogers (metadata librarians) and other professionals. University librarians have an important role to play in these activities.

Harvesting: In addition to content-related standards, standards for communication protocols also have been important for ETDs. Theses and dissertations are produced in a decentralized manner, by graduate students attending thousands of colleges and universities around the globe. Their local institution is obliged to keep copies, and in some cases policies preclude putting copies in the libraries or collections of other organizations, so some means of dissemination is needed that involves the home institution.

The direct dissemination of actual works is feasible from home institutions if each ETD has a persistent name (e.g., URI), and that name is known to an interested party. But discovery of relevant works, each of which leads to a persistent name, typically requires some communication scheme.

One such popular scheme involves crawling. This is the method employed by search engines, such as Google. However, ETDs are not always found during a crawl, and search engines may have trouble in providing coordinated access to the various pieces and related files of an ETD (e.g., when each chapter and multimedia attachment is in a separate file). Crawling does not locate works in the Deep Web [65]. Those works are more amenable to finding through federated search or harvesting.

Federated search is supported by the international standard Z39.50 [64]. Universities or regional services that store metadata about ETDs can index their local content and handle queries sent using the protocol for Z39.50. All sites of interest can be searched in parallel, and the results can be merged for each query by a server or client program. However, as the number of sites being searched increases, performance often degrades relative to other approaches [3]. Furthermore, quality (e.g., with regard to performance, metadata completeness, presentation of results, and search functionality) is limited by the least helpful of all the sites in a federation that supports Z39.50.
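The query-in-parallel, merge-afterwards pattern described above can be sketched as follows. This Python example does not speak Z39.50; each "site" is a stand-in function over a few invented records, and the site names, scores, and titles are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Hit = tuple[float, str]  # (relevance score, title)


def make_site(records: list[Hit]) -> Callable[[str], list[Hit]]:
    """Return a stand-in search function for one site (not a Z39.50 client)."""
    def search(query: str) -> list[Hit]:
        q = query.lower()
        return [(score, title) for score, title in records if q in title.lower()]
    return search


# Invented sites and records, for illustration only.
SITES = {
    "site_a": make_site([(0.9, "Digital library interoperability"),
                         (0.4, "Library buildings")]),
    "site_b": make_site([(0.7, "Harvesting for digital library services")]),
}


def federated_search(query: str,
                     sites: dict[str, Callable[[str], list[Hit]]]
                     ) -> list[tuple[float, str, str]]:
    """Send the query to every site in parallel, then merge hits by score."""
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sites.items()}
        # result() blocks until each site answers, so the merged answer
        # is only ready once the slowest site has responded.
        hits = [(score, title, name)
                for name, future in futures.items()
                for score, title in future.result()]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


results = federated_search("digital library", SITES)
```

Because the merge waits on every site, a single slow or incomplete site degrades the whole answer, which illustrates the performance and quality limits noted above.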

Accordingly, many distributed services like NCSTRL have shifted toward “harvesting” as a more appropriate way to support communities of users [3]. The most popular de facto standard for harvest-based services was developed by 2000 [98] as part of the Open Archives Initiative (OAI) [97]. A site that maintains a catalog of ETDs can expose the metadata in that catalog by running software that supports the Protocol for Metadata Harvesting (OAI-PMH) [58]. Services like ARC [61], which try to find all OAI “data providers” and harvest their metadata into a single collection that covers a wide variety of sources (e.g., individual to global collection, with location or topic based scope) and genres (e.g., e-prints, pre-prints, bibliographies, student works, educational resources, reports), have very broad coverage.

There are many ways that OAI can be used to support work with ETDs [92]. First, if a university wishes to share its metadata with regard to its collection of ETDs, it can select any of a number of software systems to help. The simplest and most focused is etd-db [56], which grew out of efforts at Virginia Tech in the late 1990s, and is being enhanced further in 2006. But OAI repositories can have a “set” structure imposed atop the collection of metadata records, so institutional repositories (e.g., DSpace [71]), that aim to collect all of the types of documents prepared at a university, can have a separate set for the local ETDs. Then a harvester interested only in ETDs, when connecting with an institutional repository, can restrict its request (for new works) to those in the ETD set.
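A selective, incremental harvest of the kind just described might look like the following Python sketch. The verb and argument names (ListRecords, metadataPrefix, set, from) and the response namespace are those of OAI-PMH 2.0; the base URL, the set name "etd", the date, and the canned response are hypothetical, and no network request is made.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str,
                     set_spec: Optional[str] = None,
                     from_date: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request, optionally limited to one set and to recent records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # e.g. only the ETD set of a repository
    if from_date:
        params["from"] = from_date  # only records added or changed since then
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml: str) -> list[str]:
    """Extract record identifiers from a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical repository address and set name.
url = list_records_url("https://repository.example.edu/oai",
                       set_spec="etd", from_date="2006-07-01")

# Canned, abbreviated response standing in for what a data provider might return.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repository.example.edu:etd-1</identifier></header></record>
    <record><header><identifier>oai:repository.example.edu:etd-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>
""".strip()
identifiers = record_identifiers(SAMPLE_RESPONSE)
```

A real harvester would also need to follow resumption tokens for large result lists and handle protocol errors, both of which are omitted here for brevity.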

Second, since NDLTD encourages universities to enhance their DL-related infrastructure, it is appropriate that they learn to test their ETD data provider with the OAI Repository Explorer [88]. This can help ensure that others can harvest desired data.

Third, and most visibly, OAI connects with ETDs through union catalogs, which can be built using suitable harvesting procedures. One was launched in 2001 by NDLTD. In 2003, management of the catalog was shifted from Virginia Tech to OCLC (acting on behalf of NDLTD) [93]. The NDLTD Union Catalog run by OCLC [75] included 257K records from 60 data providers as of July 2006. It is hoped that, as use of the union catalog grows, and more and more services are built atop it, more universities will support OAI, thereby greatly facilitating the discovery of their ETDs.

Logging: Another area where standardization can be of benefit for DLs is with regard to data collection, analysis, and evaluation. It is difficult to assess to what degree collections of ETDs are popular, to find which ones are most desired, to contrast the use of metadata records vs. full-text vs. multimedia files, or to ascertain which parts of the internet have the largest numbers of readers interested in ETDs. One source of such data is DL logs; we have proposed a standard for them [47]. We hope that the DL community, or subsets of it like those working with ETDs, will log similar data about DL system operation and user behavior, so local and aggregate statistics can be produced. These can provide helpful insights, as will be seen below with regard to the discussion of Figures 6 and 7.
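The kinds of statistics mentioned above can be derived once logs share a common shape. The Python sketch below assumes a made-up four-field log line (timestamp, item identifier, kind of file accessed, reader's domain); this layout is an assumption for illustration and is not the log standard proposed in [47].

```python
from collections import Counter

# Invented log lines in a hypothetical layout:
# timestamp, item identifier, kind of access, reader's domain.
LOG_LINES = [
    "2006-07-01T10:00:00 etd-1 metadata example.br",
    "2006-07-01T10:00:05 etd-1 fulltext example.br",
    "2006-07-01T10:02:00 etd-2 metadata example.edu",
    "2006-07-01T10:03:30 etd-1 multimedia example.de",
]


def summarize(lines: list[str]) -> dict[str, Counter]:
    """Count accesses per item, per kind of file, and per reader domain."""
    stats = {"item": Counter(), "kind": Counter(), "domain": Counter()}
    for line in lines:
        _timestamp, item, kind, domain = line.split()
        stats["item"][item] += 1
        stats["kind"][kind] += 1
        stats["domain"][domain] += 1
    return stats


stats = summarize(LOG_LINES)
most_requested = stats["item"].most_common(1)[0][0]
```

If several sites log in the same layout, their counters can simply be added together to give aggregate figures alongside the local ones.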


Thousands have been involved in the unfolding of the worldwide ETD initiative. Discussions have proceeded since 1987, with early events discussed in a series of articles in D-Lib Magazine. The 1996 paper covers early efforts, including US activities funded starting in 1995 by SURA and the US Department of Education [34]. The NDLTD acronym was retained in 1997, when the first word in the long name shifted from “National” to “Networked” [35], indicating a broadening of scope to serve the international community. A 1998 D-Lib paper showed how multilingual access was supported by a federated search system [79]. A two-part D-Lib series appeared in 2001 to summarize progress, including the shift from federated search to harvesting to support searching [91, 90]. Later that year a paper appeared about Open Digital Libraries (ODL, see [94]), a scheme to support a component-based approach to DL construction, which was deployed to facilitate searching of the NDLTD Union Catalog.

In 2004 Marcel Dekker published a book about ETDs [36], to supplement its other works to support scholars. This edited volume covers a broad range of international perspectives regarding ETD initiatives. It considers the concerns of students, faculty, libraries, graduate schools, administrators, and technologists. There is discussion of intellectual property and copyright, of PDF and SGML/XML, and of novel modes of expression that involve multimedia and hypermedia. Such innovation by ETD authors has been encouraged in recent years by an award program sponsored by Adobe. Adobe also has a website with documentation about ETD activities. Adobe funded the development of tutorial materials to help authors who are creating ETDs in PDF [1].

There also is an online book about ETDs, originally funded by UNESCO, in multiple languages (e.g., English, French, Greek, Spanish). The ETD Guide [72] was the result of an international collaboration, with contributors from, e.g., Australia, Brazil, Canada, Chile, France, Germany, and USA. Work on the Guide was launched in part as a result of a 1999 workshop at UNESCO headquarters in Paris [77]. NDLTD plans to provide updates to the Guide, initially coordinated through a wiki.

Further documentation about ETD initiatives has appeared through the proceedings of a series of international conferences on this topic [74]. NDLTD has been the key sponsor. Recent meetings have been in Germany (2003), USA (2004 [57]), Australia (2005), and Canada (2006). Meetings in 2007 and 2008 will be in Sweden and the United Kingdom.

An easy way to obtain information about ETDs is from the NDLTD site [25]. In addition to documentation, information about membership and committees, and links to conference announcements and publications, one can select any of a number of services to facilitate searching and browsing. Virginia Tech supports one service based on ODL [94]; a mirror version adapted for the Chinese language is hosted by CALIS in Beijing [14]. Additional search services are run by VTLS [99] (with versions of the interface, and metadata records, in a number of languages) and Scirus [23] (with full-text indexing). Discussions are underway with a number of search engine sites (e.g., Google, Microsoft) to provide additional services to help ensure broader use of ETDs worldwide.

Virginia Tech also runs an experimental system, operating atop the search system by FAST. Seonho Kim has been logging and analyzing activity with that system [55]. For example, Figure 5 shows his reporting of the growth of the number of ETDs based on their date of creation. It is likely that numbers will continue to rise rapidly in upcoming years, as more and more institutions launch ETD initiatives, and as existing initiatives mature and lead to more aggressive policies on submission of ETDs to a local repository. When submission (which is different from providing access) is required, the numbers go up quite rapidly. Some institutions also have retrospective conversion programs to digitize older works, either when they are requested, or as a comprehensive effort (as is being done at Virginia Tech now); these also help increase the number of ETDs available. We look forward to the time when the NDLTD Union Catalog has more than a million records, and has hundreds of thousands of works added each year.

FIGURE 5. Growth in numbers of ETDs


Since worldwide activities with regard to ETDs are diverse, since NDLTD’s efforts to support these activities are varied, and since open access relates to a large number of issues, it is important to have a powerful framework in which to characterize the situation. Since 1999 we have been developing just such a formal framework for DLs [30]. Key aspects of our 5S framework are summarized in Table 2.

The 5S framework is particularly applicable to DL modeling. It has been used for a variety of case studies, such as to model DLs for archaeological sites as well as regional and global DLs built by harvesting from the local DLs [85]. Two case studies were undertaken in 1999 to explore the use of 5S for describing educational DLs [32]. These covered educational resources for computing, and ETDs. A 2004 case study focused on ETDs was based in Brasilia [80].

A good summary of 5S, including how it can be used to describe ETD activities, appeared in 2004 [45]. It draws in large part on the dissertation work of Gonçalves [44]. The 2006 dissertation by Shen [84] builds on this, adding key results related to quality, interoperability, and integrated support for various types of exploration (e.g., searching, browsing, and visualization). Future work on global ETD services, considering the increasingly sophisticated regional and national efforts in the Americas, Australasia, and Europe, could benefit from the advances in 5S made by Gonçalves and Shen.

Modeling the Societies that relate to a DL is of particular import, from a 5S perspective. In the case of ETDs, there clearly are many considerations in this regard. At the broadest, we have an international community that is moving toward tighter collaboration, across space (leading to a global consciousness) and time (involving old as well as new ETDs, and involving students new to the world of research, as well as those with extensive publication experience) [38]. A key Society is that of people involved in graduate education [21]. New to that scene are the authors of ETDs, who need various kinds of support [76]. But they also are the ultimate innovators, who will make sure that the genre of ETDs develops and matures, allowing them to communicate ever more effectively [40]. While some critics have suggested that students would feel burdened if required to work with ETDs, for most students this is a non-issue. A variety of surveys have shown that students generally are favorably disposed toward ETDs; in reality there are no serious problems [2, 20]. Indeed, if one considers that theses and dissertations are the main, and sometimes the only, artifact resulting from years of student labor, and that having ETDs available may increase the number who read them by a factor of 100 or 1000, students are among those with the most to gain from ETD initiatives [66]. They also can gain when there is strong support for ETD authors [76].


| Ss | Examples | Objectives |
|---|---|---|
| Streams | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data |
| Structures | Collection; catalog; hypertext; document; metadata | Specifies organizational aspects of the DL content |
| Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components |
| Scenarios | Searching, browsing, | Details the behavior of DL services |
| Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them |

TABLE 2. 5S overview
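As an illustrative aid only, the five rows of the table can be held in a small data structure and used as a checklist when describing a DL. The Python sketch below is an assumption made for exposition: the class `FiveS`, the method `missing_aspects`, and the sample ETD description are hypothetical, and none of them is part of the 5S formalism or of any NDLTD software.

```python
from dataclasses import dataclass, field
from typing import List

# The five aspects, in the order used in the 5S overview table.
ASPECTS = ("streams", "structures", "spaces", "scenarios", "societies")


@dataclass
class FiveS:
    """Hypothetical checklist: each aspect holds free-text notes about a DL."""
    streams: List[str] = field(default_factory=list)
    structures: List[str] = field(default_factory=list)
    spaces: List[str] = field(default_factory=list)
    scenarios: List[str] = field(default_factory=list)
    societies: List[str] = field(default_factory=list)

    def missing_aspects(self) -> List[str]:
        """Return the aspects for which no notes have been recorded yet."""
        return [name for name in ASPECTS if not getattr(self, name)]


# A made-up, partial description of an ETD digital library.
etd_dl = FiveS(
    streams=["text", "video"],
    structures=["metadata", "collection"],
    scenarios=["searching", "browsing"],
    societies=["authors", "service managers"],
)
print(etd_dl.missing_aspects())  # the aspects still to be described
```

Used this way, the structure only records whether each of the five Ss has been considered; it says nothing about the formal definitions behind them.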

Modeling the Societies that relate to a DL is of particular import from a 5S perspective. In the case of ETDs, there clearly are many considerations in this regard. At the broadest, we have an international community that is moving toward tighter collaboration, across space (leading to a global consciousness) and time (involving old as well as new ETDs, and involving students new to the world of research, as well as those with extensive publication experience) [38]. A key Society is that of people involved in graduate education [21]. New to that scene are the authors of ETDs, who need various kinds of support [76]. But they also are the ultimate innovators, who will make sure that the genre of ETDs develops and matures, allowing them to communicate ever more effectively [40]. While some critics have suggested that students would feel burdened if required to work with ETDs, for most students this is a non-issue. A variety of surveys have shown that students generally are favorably disposed toward ETDs; in reality there are no serious problems [2] [20]. Indeed, if one considers that theses and dissertations are the main, and sometimes the only, artifact resulting from years of student labor, and that having ETDs available may increase the number who read them by a factor of 100 or 1000, students are among those with the most to gain from ETD initiatives [66]. They also can gain when there is strong support for ETD authors [76].

Such support, however, typically occurs only when there is active leadership and support for change in the local university community [37]. The amount and level of such leadership is a key determinant of how quickly an effective ETD program can be put in place. In many universities, launching an ETD program, and evolving it so that students are required to submit works, may take several years. But with strong high-level support, the whole process can be completed in half a year [49]. This is getting easier as time goes by, since effective practices and policies are well known and have been reported [26]. There also is a growing cadre of people with experience in implementing successful ETD programs, a strong commitment to mentoring, and collaboration between more and less developed nations [87].

One other fortunate situation with regard to ETD programs, relating authors and readers, is a rough balance in supply and demand. Seonho Kim studied this, using works in ETD collections to characterize supply, and query logs to characterize demand [55]. To provide a context for comparison, he used 77 different topical categories, and classified ETDs and queries based on those categories. Figures 6 and 7 show the results for those 77 categories. Though for many topics an approximate balance exists, for a small number of categories (perhaps good topics for future research) there is more demand than supply.

FIGURE 6. First part of supply/demand comparison for ETDs

FIGURE 7. Second part of supply/demand comparison for ETDs
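The kind of comparison described above can be sketched in a few lines. The Python example below is a minimal illustration under stated assumptions, not a reconstruction of the method in [55]: the category names, the counts, the function names, and the `ratio` threshold are all hypothetical. It converts category labels for ETDs (supply) and for logged queries (demand) into shares of their respective totals, then flags the categories whose demand share clearly exceeds their supply share.

```python
from collections import Counter


def shares(labels):
    """Return each category's fraction of the total number of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def undersupplied(etd_labels, query_labels, ratio=1.5):
    """List categories whose demand share exceeds `ratio` times their supply share.

    The threshold `ratio` is an arbitrary illustrative choice.
    """
    supply = shares(etd_labels)
    demand = shares(query_labels)
    flagged = []
    for cat, demand_share in demand.items():
        supply_share = supply.get(cat, 0.0)  # no ETDs at all in this category
        if demand_share > ratio * supply_share:
            flagged.append(cat)
    return sorted(flagged)


# Toy data: one category label per ETD and one per logged query.
etds = ["engineering"] * 6 + ["education"] * 3 + ["music"] * 1
queries = ["engineering"] * 5 + ["education"] * 2 + ["music"] * 3
print(undersupplied(etds, queries))  # categories with more demand than supply
```

Comparing shares rather than raw counts keeps the result independent of how many ETDs and queries happen to be in each sample.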


Modeling the Scenarios that relate to NDLTD leads to a discussion of services provided, through systems, by institutions. Fundamental are those that help with local activities [68, 67]. Typically, libraries, or else computing or information technology centers, manage those services. Clearly they are the most appropriate to devise and enforce policies, support authors, certify quality, operate institutional repositories, and facilitate long-term archiving and preservation. However, other parties must be involved if those seeking helpful research works in a global context are to find the right ETDs from among the collections of many thousands of educational institutions.

One type of institution with interest in supporting access is the national library. Borbinha, discussing activities at the Portuguese National Library, argued in 1998 for federated access and services [10]. To help suit the needs around the globe, multilingual federated search was tested at Virginia Tech, starting in 1998 [79]. Besides functionality, however, usability also is a key consideration regarding services. A 1999 usability study of several digital libraries, both commercial and open source, covering both proprietary and open access collections, found the NDLTD services acceptable, but also highlighted areas for improvement (for all systems tested) [54]. Consequently, a range of services have been developed, as discussed near the end of the prior section.

Many additional services could be offered. A 1999 study through focus groups, with an accompanying pilot study, made clear that annotation services are of interest [54]. Improved methods for resource discovery, search, browsing, etc. could be of help [48] [63]. There is an almost complete void with regard to potential support for multimedia content-based access [70]. Richardson has been working on a promising approach to multilingual summarization and resource discovery by way of concept maps accompanied by machine translation (that makes use of identification of parallel corpora) [82, 83]. As these and other services develop, they can be added to component pools [89]. Components can be brought together in DLs, or, if made available through a service-oriented architecture, can help in the move toward the Semantic Web [7].

Services also help with the integration of ETDs in the Web infrastructure. Ultimately we hope that all ETDs will be harvested using OAI-PMH, so there can be a comprehensive Union Catalog [95]. However, some institutions lack expertise with that protocol, and are used to just putting up works on the WWW, with the expectation that crawlers will find them and help provide access. Though they may be right, not all services will pick up ETDs in their entirety, and fewer still will support search that utilizes both metadata and full-text indexing. One promising scenario to deal with these challenges is to construct a DL by semi-automatically identifying small ETD collections on the Web [13]. We have demonstrated that the Web-DL approach [13] can help in this regard, but a fair amount of work is involved, which may not be feasible for a light-weight organization like NDLTD.
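For readers unfamiliar with OAI-PMH, a harvester issues plain HTTP requests carrying a `verb` argument (for example `ListRecords` with `metadataPrefix=oai_dc`) and receives XML, possibly with a `resumptionToken` for fetching the next batch. The Python sketch below illustrates only that request and response shape. It makes no network calls, and the base URL `http://example.org/oai`, the hand-written sample response, and the helper names `list_records_url` and `parse_titles` are hypothetical; it does not describe how the Union Catalog itself is built.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 and by unqualified Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # A resumptionToken is sent with the verb alone, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_titles(xml_text):
    """Return (titles, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter(DC_NS + "title")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Hand-written sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
print(parse_titles(SAMPLE))
```

A real harvester would repeat the request with each returned token until none is given, and would also handle protocol errors and deleted records, which this sketch omits.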

Scenarios by default are based on an assumption of quality. In real life, however, high-quality services are difficult to build and maintain, so focusing on quality is not universal. But DL quality [96] is a key issue for NDLTD [33], since we hope to attract new authors and readers, and to ensure they are comfortable life-long users. Thus, NDLTD is one of the few DL organizations that considers the entire information life cycle. Hence, it was possible to assess a number of indicators of DL quality by studying the content connected with the ETD Union Catalog [44]. Working with a range of indicators, one can fit them into models to help predict intention to (re)use a DL [84]. We hope ultimately to have a more comprehensive view of DL quality, and to facilitate support of broad communities of those working with DLs [46]. These then can be extended to apply to situations like NDLTD, where we move from a regular DL, through interoperability, to a union DL [84].

Beyond Scenarios, in 5S we have Spaces, Structures, and Streams. Spaces clearly can be used to describe the locations of ETD collections around the globe. Spaces also can describe the 2D or 3D interfaces facilitating interaction with DL systems [6] [16] [54] [76].

Structures cover all types of organizations, including data structures and databases. Classification of collections based on policies is a simple type of structuring [26]. Documents also can be structured, such as in accordance with the Text Encoding Initiative [12] [102], or through markup encoded using XML [43]. Documents can be classified according to a category system, or using a taxonomy or ontology [81] [86] or other type of knowledge structure [9] [50].

Finally, Streams can be used to model the underlying content in DLs. A digital object of any type ultimately is a sequence or stream of bits, though it may be easier to think of ETDs as strings of bytes or characters or words or sentences or pixels or images. With regard to multimedia content, the notion of a stream usually is quite clear, such as when we think of an audio or video track. Streams also can be used to describe flows, such as of ETDs from students to universities to the global research community. They can be used to describe the flow of work through DLs [4]. When user profiles are consulted and users are alerted to new works through routing or filtering systems [62], we also have a type of stream processing.


Since 1987 there has been movement toward open access to the vast literature of graduate research, which includes reports, theses, and dissertations. Global efforts in a broad range of ETD initiatives have benefited from coordination by NDLTD. Making ETDs freely available has clear benefit to student authors, since their works become much more widely read, and they become much more visible in the research community. Likewise, open access to ETDs is of help to universities, since it increases the awareness of their research activities around the globe.

ETD initiatives have a positive influence on other open access efforts, since students who have prepared ETDs have learned about digital libraries, and have made a constructive contribution to open access through their authoring and submission activities. Further, having engaged in an open access activity, and having learned a bit about the related issues, they may be more likely to be supporters of open access in general.

We have seen how 5S can be used in checklist form to describe DLs. We have touched on how 5S relates to open access, but a more focused investigation in that regard could be pursued. Of particular interest would be more discussion of Societies and Scenarios, including economic, legal, political, and other social considerations. We encourage such an exploration, building upon the abovementioned involvement of NDLTD and others in worldwide ETD initiatives.


This work was supported in part by the DL curriculum project funded by NSF through a grant to Virginia Tech (IIS-0535057) as well as one to the University of North Carolina, Chapel Hill (IIS-0535060). In that regard we also thank co-PIs Barbara M. Wildemuth and Jeffrey Pomerantz. Our work with superimposed information was funded by NSF through DUE-0435059. Much of the recent 5S work has been funded by NSF through IIS-0325579. We also acknowledge the many contributors at Virginia Tech’s Digital Library Research Laboratory and those at other institutions who have participated in our varied collaborative projects.


Ci. Inf., Brasília, v. 35, n. 2, p. 75-90, maio/ago. 2006

Edward A. Fox / Seungwon Yang / Seonho Kim

