
WADEX : (Word And Author Index)


MECHANICAL ENGINEERING

WADEX (Word And Author Index)

ACKNOWLEDGMENT

This article by Messrs. E. A. Ripperger, H. Wooster, and S. Juhasz is reprinted from the March 1964 issue of "Mechanical Engineering."

The yearly growth of the population of the world is around 3 percent and the corresponding growth figure of the technical literature (sometimes referred to as the "Literature Explosion") is around 10 percent. The problem which this tremendous growth rate presents is complicated further by the publishing of new periodicals and irregularly issued publications such as Books, Reports, Monographs, Proceedings of Symposia, and the like, which appear with ever-increasing frequency on the technical literature scene. Obviously, if engineers and scientists are to keep abreast of this rising tide, some new techniques must be developed to assist them.

Applied Mechanics Reviews (a critical review magazine, edited at Southwest Research Institute and published by ASME), and other review and abstracting publications have been intensively engaged in the experimental development of different and somewhat revolutionary indexing and index production techniques to meet the challenge.

The conventional subject indexes, "automatic indexes" of the KWIC and KWOC type, and Applied Mechanics Reviews' recent WADEX are described.

"CONVENTIONAL" SUBJECT INDEX OF AMR

The word "conventional" is used nowadays in the documentation literature [6] in reference to subject indexes to distinguish them from the new computer-made indexes which require much less intellectual effort to prepare.

The purpose of a subject index in a review or abstract journal is to help search the published literature and also to provide a convenient measure of the activity in a given field as indicated by the number of papers published in that area. It should be borne in mind that a "good" subject index must consider not only the publication in question but also the users it serves.

Figure 1. KWIC-type indexes. Only one line is assigned per title and a keyword in the title is printed in context, in a vertically aligned position. The keyword is in the middle of the line.

The AMR subject index consists of some 1500 subject headings under which the 8000 or so critical reviews, authors' abstracts, and so on, are listed several times each year by number. The number is the one assigned to the item in the Reviews.

Figure 2. WADEX index. A retrospective index. Titles are printed fully with a descriptor (word or author's name) and reference number, and arranged in two columns.

Naval Engineers Journal, October, 1964

This index has the advantage of being compact (60 pages for the 1000 pages of the 1962 volume, including the authors' index). It is, however, severely limited in its usefulness because the user can rarely tell from the subject heading and the identifying number whether the item is pertinent to his immediate interests, particularly if there are many numbers following the subject heading. Many numbers under one heading mean that the searcher's interest is narrower than the heading. To ascertain whether the paper is of interest, he has to refer to the Reviews. This is a time-consuming and often frustrating procedure.

The preparation of a subject index requires much intellectual effort. First, the list of subject headings must be devised (or revised yearly), and then decisions must be made as to which subject headings are appropriate for each of the items to be indexed. For 8000 items, appearing an average of three times each in the subject index, 24,000 decisions are required in this phase of the preparation alone. These decisions should be made by scientists and engineers, with training and experience roughly equivalent to that of those who write the papers and use the index.

COMPUTER-PREPARED INDEXES

During the past five years at least 30 different organizations in this country have produced indexes in which a computer does most of the clerical work. While the computer does not do intellectual work, suitable design of the system may reduce substantially the intellectual work required to prepare an index.

One of the earliest of the machine-made indexes was devised and named KWIC (Keyword in Context) by H. P. Luhn of IBM [5]. In this index, only one line is assigned per title and a keyword in the title is printed in context, in a vertically aligned position. The unique feature of this KWIC system is that the keyword is in the middle of the line. Keywords are those words in the title which convey information concerning the content of the paper. Words which are noninformative are called "forbidden words" because they are not used as keywords. Because only one line is allowed for the title, the beginning or end of the title, or both, might be chopped off. It is argued that the compactness of the index which is achieved by using this system more than compensates for the loss of clarity. The user of the index usually has no trouble in deciphering a shortened title if it pertains to his area of interest. The advantage of printing the keyword in context is that the words preceding and following the keyword can be regarded as subheadings or modifiers, which quickly indicate to the user how pertinent this title is to his area of interest.
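The KWIC layout described above can be sketched in a few lines of Python. This is a minimal illustration, not Luhn's program: the function names, the 40-character line width, and the short forbidden-word list are all assumptions made for the example.

```python
# Hypothetical forbidden-word list; a real index would use a much longer one.
FORBIDDEN = {"a", "an", "and", "in", "of", "on", "the", "to"}


def kwic_lines(title, width=40, forbidden=FORBIDDEN):
    """Return (keyword, line) pairs, one fixed-width line per keyword.

    The keyword always starts at column width // 2, so the keywords of
    all lines are vertically aligned in the middle of the page.
    """
    words = title.split()
    half = width // 2
    pairs = []
    for i, word in enumerate(words):
        if word.lower() in forbidden:
            continue  # forbidden words never serve as keywords
        left = " ".join(words[:i])    # context preceding the keyword
        right = " ".join(words[i:])   # keyword plus following context
        # One line only: the beginning of the title is chopped on the left,
        # the end of the title on the right, if they do not fit.
        left_part = left[-(half - 1):].rjust(half - 1)
        right_part = right[:half].ljust(half)
        pairs.append((word.lower(), left_part + " " + right_part))
    return pairs


def kwic_index(titles, width=40):
    """Build the whole index: all lines, sorted alphabetically by keyword."""
    pairs = []
    for title in titles:
        pairs.extend(kwic_lines(title, width))
    pairs.sort(key=lambda pair: pair[0])
    return [line for _, line in pairs]


if __name__ == "__main__":
    for line in kwic_index(["Free Convection in a Vertical Tube",
                            "Buckling of Thin Cylindrical Shells"]):
        print(line)
```

Each title yields as many lines as it has non-forbidden words, and every line has the same length, which is what makes the index compact at the cost of truncated titles.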

Since the first KWIC index appeared, numerous modifications of it have been developed by various organizations concerned with information dissemination and retrieval. For example, Chemical Abstracts Service publishes Chemical Titles [4]. In this index and in several subsequently prepared indexes, some of the empty space of the original KWIC is avoided by using "snapback," also called "recirculation" or "wrap around" [6], the consequence of which is that the beginning of the title might follow rather than precede the keyword. While this is awkward to read, a larger portion of the title can be published. Some other KWIC producers are Biological Abstracts, publishing BASIC [2], and The American Meteorological Society, publishing Meteorological & Geoastrophysical Titles [8] and UNIDEK. Samples of KWIC-type indexes (with and without recirculation) are shown in Figure 1.
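The snapback idea can be illustrated with a standalone sketch. It is an assumption-laden example rather than the procedure of any of the indexes named: the function name, the 40-character width, and the absence of an end-of-title marker are choices made here for brevity.

```python
def kwic_line(title, keyword_index, width=40, wrap=True):
    """Format one KWIC line with the keyword starting at column width // 2.

    With wrap=True, room left empty after a keyword near the end of the
    title is filled with the chopped-off beginning of the title, and room
    left empty before a keyword near the start is filled with the
    chopped-off end.
    """
    words = title.split()
    left_room, right_room = width // 2 - 1, width // 2
    left = " ".join(words[:keyword_index])
    right = " ".join(words[keyword_index:])

    left_shown = left[-left_room:]
    left_cut = left[:len(left) - len(left_shown)]   # lost beginning of title
    right_shown = right[:right_room]
    right_cut = right[right_room:]                  # lost end of title

    if wrap:
        # One column of each spare region is reserved for a separating blank.
        spare_right = right_room - len(right_shown) - 1
        if left_cut.strip() and spare_right > 0:
            right_shown += " " + left_cut.strip()[:spare_right]
        spare_left = left_room - len(left_shown) - 1
        if right_cut.strip() and spare_left > 0:
            tail = right_cut.strip()[-spare_left:]
            left_shown = (tail + " " + left_shown).rstrip()

    return left_shown.rjust(left_room) + " " + right_shown.ljust(right_room)


if __name__ == "__main__":
    title = "Heat Transfer in Laminar Flow Through a Vertical Tube"
    print(kwic_line(title, 8, wrap=False))
    print(kwic_line(title, 8, wrap=True))
```

With the keyword "Tube" at the end of the title, the unwrapped line leaves most of the right half blank, while the wrapped line shows the start of the title after the keyword.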

Other producers of computer-made indexes are the Office of Technical Services, producing the Keyword Titles; NASA, publishing the Scientific & Technical Aerospace Reports [3]; and AIAA, publishing International Aerospace Abstracts. The latter three are organized so that the entire title is printed in one or more lines with a keyword or discriminator appearing out of context at the beginning of the title. These indexes are sometimes referred to as KWOC (Keyword Out of Context), and therefore represent a distinct departure from the format of the KWIC index.

There is a fundamental difference in the philosophy of the first and latter two types of KWOC indexes mentioned above. In the Keyword Titles of OTS, discriminators are machine "selected" (see principle described later), and in NASA's STAR and AIAA's IAA the selection is done by professional indexers. The word "selection" here refers to two distinct activities: the first to a machine operation (to separate discriminators from forbidden words) and the latter to an intellectual activity. Despite this fact, the same term is used (as in the technical literature) for the sake of simplicity.

WADEX

Because the conventional subject index of past years seemed likely to be unable to cope with the anticipated volume of the publication in future years, the editors of AMR decided to experiment with a new type of computer-made index. The first issue of this index is for the 1962 volume of AMR. It has been given the name WADEX, where the letters stand for Word & Authors inDEX. WADEX is produced in addition to the "conventional" annual index and marketed as a separate publication.

Unlike the different automatic "indexes," WADEX is intended for retrospective search rather than for current awareness information, though it could also be used for the latter purposes.

WADEX incorporates many excellent features of previously published machine-made indexes, particularly those of KWOC-type indexes, and wherever it differs from others, the alterations have been made with specific objectives in mind.

Sample portions of the 1962 WADEX are shown in Figure 2. In this sample these significant features can be seen:

1. Titles are printed fully including author name(s). Each title begins at the left side of the column and occupies as many lines as necessary.

2. A discriminator is either a word in the title (except forbidden words) or an author's name. Discriminators are sequenced alphabetically. They are given out of context at the left of the title.

3. If there is more than one identical discriminator, titles are alphabetized according to the alphabetical position of the first author. If there are several identical discriminators with identical authors, they are sequenced according to review number.

4. The reference number indicates year, month, and review number in AMR proper. It is printed in line with the discriminator on the right side of the column.

5. WADEX is arranged in two columns with pagination at the bottom and a dictionary entry at the top of the page. The page is reduced about 50 percent from the "compuscript."
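Features 1 through 4 amount to an entry-generation rule plus a sort order, which can be sketched as follows. This is an illustrative reconstruction, not the IBM 1401 programs: the `Item` record, the forbidden-word list, the 50-character column width, and the printed form of the reference number are all assumptions.

```python
import textwrap
from typing import NamedTuple, Tuple

# Hypothetical forbidden-word list, for illustration only.
FORBIDDEN = {"a", "an", "and", "in", "of", "on", "the", "to"}


class Item(NamedTuple):
    authors: Tuple[str, ...]   # surnames, first author first
    title: str
    year: int
    month: int
    review: int                # review number in AMR


def discriminators(item):
    """Title words (minus forbidden words) plus every author's name."""
    words = {w.strip(".,;:()").lower() for w in item.title.split()}
    words -= FORBIDDEN | {""}
    return words | {a.lower() for a in item.authors}


def wadex_entries(items):
    """One entry per (discriminator, item), ordered as in features 2 and 3."""
    entries = [(d, item.authors[0].lower(), item.review, item)
               for item in items for d in discriminators(item)]
    # Alphabetical by discriminator, then by first author, then by review number.
    entries.sort(key=lambda e: e[:3])
    return entries


def format_entry(entry, width=50):
    """Discriminator at the left, reference number at the right, full title below."""
    d, _, _, item = entry
    ref = f"{item.year}-{item.month:02d}-{item.review}"   # assumed number format
    head = d.upper().ljust(width - len(ref)) + ref
    body = textwrap.wrap(", ".join(item.authors) + ": " + item.title, width)
    return "\n".join([head] + body)


if __name__ == "__main__":
    sample = [Item(("Smith",), "Buckling of Plates", 62, 3, 1500),
              Item(("Jones", "Brown"), "Vibration of Plates", 62, 1, 200)]
    for entry in wadex_entries(sample):
        print(format_entry(entry))
```

Because authors' names go through the same path as title words, a single sort produces the combined word and author index; no separate author-index program is needed in this sketch.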

Both columns were printed at the same time by the computer. The computer printout is a camera-ready copy except for the 26 "alphabetics." The WADEX for AMR for 1962 is about 600 pages long.

Users frequently remember papers by the names of their authors. Combining subject and author indexes in one format enables the user to search only one index, not two, a fact long recognized in the preparation of book indexes and library card files, but one which appears to be novel to the automatic indexing art. Treating the authors' names as keywords also simplifies the preparation of the index both from the standpoint of punched card preparation and from the standpoint of computer programming.

Listing the complete title and having it appear in the index under approximately six discriminators (the average for the WADEX) reduces the time required for literature search by increasing the probability that the user will find items of interest under the first discriminator that occurs to him.

WADEX is well suited to browsing in addition to search. As a matter of fact, this is frequently of great help to the searcher, whose question is often not clearly defined when the search starts. The user wants to "conduct a dialog" with the index, rather than have a monolog delivered by a machine in response to his own monolog. The richness of entries, together with the full title listing given in WADEX similarly to other KWOC-type indexes, offers increased opportunities, at least in comparison to KWIC indexes, for the user to encounter new association trails as he moves among the index entries. This is a convenience which is considered well worth the increased size of a WADEX for a publication the size of AMR.

PREPARATION OF WADEX

The procedure followed in preparation is shown by the WADEX flow chart, Figure 3. This reflects the latest improved procedure rather than that followed in preparing the 1962 WADEX.

Figure 3. WADEX flow chart. Procedure followed in preparation of index.

First, the titles as they appear in the magazine are edited by an engineer or scientist to remove or change all those features which cannot be properly handled by the keypunch and printout equipment [9]. These titles are then keypunched. After verification, the cards are fed into the IBM 1401 (Program A) which prints out their contents for another human postediting. After corrections indicated by this editing are inserted, machine processing begins. From this stage in the preparation until printout, all title processing is done by the machines.

Following this (A), every word in each title, including the authors' names, is regarded as a potential discriminator. As indicated previously, the discriminators which will be retained are determined by deciding which words will be thrown out. The words which are to be eliminated are called "forbidden" words and "suppressed" words. They fall into the 8 categories shown in Figure 4.

Next, Crude WADEX I is prepared (B). Discriminators are pulled out and set up with titles essentially as they appear in the final version of the WADEX. Types 1, 2, and 3 forbidden words are eliminated at this point. Then (C), Crude WADEX I is arranged in alphabetical order. The discriminators are pulled out (D) and the frequency of occurrence of each is determined (E). Two lists of words, one sorted alphabetically and the other according to frequency order, UnEWAC I and II (UnEdited Word Author Counts), are prepared (F, G, and H). They are used to select additional forbidden words of types 4, 5, and 6. The basic list is given in Figure 5. Each list contains some 16,000 entries representing the total of approximately 85,000 words appearing in all the titles.
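The two word-and-author count lists of steps E through H can be sketched with a standard counter. The function names and the `min_count` threshold are assumptions for illustration; in the procedure described, the choice of additional forbidden words is made by people reading the lists, not by a cutoff.

```python
from collections import Counter


def word_author_counts(discriminators):
    """Return the same counts in two orders: alphabetical and by frequency."""
    counts = Counter(discriminators)
    alphabetical = sorted(counts.items())
    # Most frequent first; ties are broken alphabetically.
    by_frequency = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return alphabetical, by_frequency


def forbidden_candidates(by_frequency, min_count):
    """Words frequent enough to merit an editor's look, not an automatic decision."""
    return [word for word, n in by_frequency if n >= min_count]


if __name__ == "__main__":
    sample = ["stress", "beam", "stress", "analysis", "analysis", "analysis"]
    alphabetical, by_frequency = word_author_counts(sample)
    print(alphabetical)
    print(by_frequency)
    print(forbidden_candidates(by_frequency, min_count=2))
```

The frequency-ordered list puts the likeliest noninformative words at the top, while the alphabetical list makes it easy to spot variant spellings and related word forms side by side.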

Figure 4. Types of forbidden and suppressed words fall into eight categories.

Types 2, 3, 4, and 5 forbidden words must be keypunched, listed, edited, and then read into the computer (I). At this point all of the forbidden words are removed from Crude WADEX II (K). Also type 7 suppressed words are removed in the same operation. These are the words which are allowable discriminators but which appear more than once in the title. If left in they would cause the title to be listed as many times in succession as the discriminator is repeated in the title. The result is Crude WADEX III which contains all of the entries which will appear in the finished product. Then the compound reference number by which the title is identified in AMR is added and the titles are arranged in the final output form. Next, the WADEX is arranged from the one-column format (L) into two columns (M) and then the first and last discriminators on each page are extracted for printing at the top of the page (N). In the last of the programs (O) type 8 suppressed words are eliminated. They are repetitions of the discriminators. Thus a discriminator which appears 10 times, for example, will be printed only the first time it appears. The discriminator is then omitted for the succeeding 9 titles, unless one of the 9 happens to be the first line of a column. (In the 1962 WADEX the type 8 words have not yet been suppressed.) In the same final program, 3 tapes are merged; namely, first column tape, second column tape, and dictionary entry tape. The output from this printing is the final form of the WADEX ready for photographing and printing after the alphabetics have been added. This actual printing of the 600-page WADEX (O), corresponding to the typesetting of more conventional means, took no more than 2.5 hr. There was no proofreading.
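The two kinds of suppression and the page heading step can be sketched as small list operations. This is a simplified illustration under stated assumptions: every entry is treated as occupying exactly one line of a column (real titles run to several lines), and the function names and `column_length` parameter are invented for the example.

```python
def drop_repeats_in_title(words):
    """Type 7: keep each discriminator once per title, in first-seen order."""
    return list(dict.fromkeys(words))


def suppress_repeated_headings(discriminators, column_length):
    """Type 8: print a discriminator only the first time it appears in a run,
    but print it again whenever an entry starts a new column."""
    printed = []
    previous = None
    for position, d in enumerate(discriminators):
        starts_column = position % column_length == 0
        printed.append(d if d != previous or starts_column else "")
        previous = d
    return printed


def dictionary_entries(discriminators, column_length, columns=2):
    """First and last discriminator on each page, for the page heading."""
    per_page = column_length * columns
    pages = [discriminators[i:i + per_page]
             for i in range(0, len(discriminators), per_page)]
    return [(page[0], page[-1]) for page in pages]


if __name__ == "__main__":
    sorted_discriminators = ["beam", "beam", "beam", "plate"]
    print(suppress_repeated_headings(sorted_discriminators, column_length=2))
    print(dictionary_entries(sorted_discriminators, column_length=1))
```

Repeating the discriminator at the top of each column keeps every column readable on its own, which is why the column boundary overrides the suppression rule.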

With the computer programs prepared and the titles keypunched, it is estimated that not more than 20 hours of time on the IBM 1401 will be required to produce the second WADEX in a ready-to-photograph form.

TITLE DESIGN OF TECHNICAL PAPERS

Experiences with the preparation of WADEX and other computer-processed titles of technical papers [1] show an increased responsibility of authors and editors in selecting useful and informative titles for technical papers. The number of times that a given title is cited in WADEX (or any other machine-made index for that matter) depends largely upon the number of meaningful words in that title. A title composed solely of high-frequency words may convey information to the human reader, but the computer may well not list it at all.
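The dependence of index coverage on title wording is easy to see in a small sketch. The forbidden-word list below is hypothetical and far shorter than a real one, and the function name is invented for the example.

```python
# Hypothetical list of high-frequency words treated as forbidden.
HIGH_FREQUENCY = {"a", "an", "and", "in", "of", "on", "the", "some",
                  "note", "study", "analysis", "problems", "theory"}


def index_entries(title, forbidden=HIGH_FREQUENCY):
    """Return the words under which a title would be listed, in title order."""
    words = [w.strip(".,;:").lower() for w in title.split()]
    return [w for w in dict.fromkeys(words) if w and w not in forbidden]


if __name__ == "__main__":
    for title in ["Buckling of Thin Cylindrical Shells",
                  "A Note on Some Problems in the Theory"]:
        print(len(index_entries(title)), title)
```

Under these assumptions the first title is listed four times and the second not at all, although a human reader might find both acceptable.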

Conventional bibliographic practices do not take kindly to retitling authors' papers once they have been published: the computer must accept a title as a fait accompli. There are several things an author can do to make sure that the computer gives his article fair treatment. One of these is to use hyphens to combine certain words, even though this is not normal English usage. For instance, the words "convection" and "free" might each be forbidden words; they are used too often in the titles of papers to carry any information, but the combination, "free-convection," is a useful phrase. Conversely, excessively long words, e.g., magnetofluiddynamic-wake-flux, might well be artificially broken, as suggested in [7], to create more useful indexing entries.

Figure 5. WADEX: list of forbidden words.

Many authors like short titles. In such cases, they should consider the advantages of using a subtitle as well. Meaningful words from both title and subtitle will be entered into the index system.

At present, the editors of WADEX are inclined to accept titles as given. Both devices, that of artificial hyphenation and writing subtitles, can be done as part of the input editing process, but it is difficult to maintain consistency while doing this. It would be simpler, and probably better, for the original papers to be appropriately titled not only for human understanding but also for machine processing. This would reduce human intellectual effort required in the preparation of indexes and thus would increase the uniformity of the index and make machine processing more effective as an indexing tool.

ACKNOWLEDGMENT

The editors of Applied Mechanics Reviews are indebted to the Air Force Office of Scientific Research and the Office of Naval Research for the special grant for editorial expenses of the first WADEX and the computational facilities and professional assistance so freely and cooperatively made available by the University of Texas staff. The National Science Foundation is a sponsor of Applied Mechanics Reviews and its indexes.

REFERENCES

[1] W. Brandenberg, "Write Titles for Machine Index Information Retrieval Systems," Automation and Scientific Communication, ADI Annual Meeting, 1963, Short Papers, Part 1, pp. 57-58.

[2] M. Conrad, "New Developments in the Merchandising of Biological Research Information," American Scientist, Vol. I, Dec., 1962.

[3] M. Day, "Local Access to Aerospace Technical Literature," ASLIB Proceedings, Vol. 15, July, 1963, pp. 211-217.

[4] R. R. Freeman and G. M. Dyson, "Development and Production of 'Chemical Titles,' A Current Awareness Index Publication Prepared With the Aid of a Computer," Journal of Chemical Documentation, Vol. III.

[5] H. P. Luhn, "Keyword-in-Context Index" (KWIC Index), Journal of Technical Literature, IBM Advanced Systems Development Div., 1959.

[6] J. Marcus, "State of the Arts of Published Indexes," American Documentation, Vol. XIII, Jan., 1962, pp. 15-30.

[7] P. V. Parkins, "Approaches to Vocabulary Management in Permuted-Title Indexing of Biological Abstracts," Automation and Scientific Communication, ADI Annual Meeting, 1963, Short Papers, Part 1, pp. 27-28.

[8] M. Rigby, Introduction, Meteorological and Geoastrophysical Titles, Vol. II, March, 1962.

[9] E. A. Ripperger, H. Wooster, S. Juhasz, and F. Roach, "Preparation of Input Card Deck From Bibliographical Headings in Applied Mechanics Review," AMR Rep. No. 30, Aug., 1963, 8 pp.

