AD-A092 500  DARTMOUTH COLL HANOVER N H DEPT OF MATHEMATICS  F/G 5/27
NATURAL LANGUAGE DATA BASE QUERY.(U)  OCT 80  L R HARRIS  N00014-75-C-0514

FINAL REPORT

Submitted to the Office of Naval Research
for a grant in support of research entitled
Natural Language Data Base Query

Larry R. Harris
Principal Investigator
Dartmouth College
Hanover, NH 03755

DOCUMENT CONTROL DATA - R&D
(Security classification of title, body of abstract and indexing
annotation must be entered when the overall report is classified)

 1. ORIGINATING ACTIVITY (Corporate author):
    DARTMOUTH COLLEGE, HANOVER, NH 03755
2a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED
 3. REPORT TITLE: FINAL REPORT SUBMITTED TO THE OFFICE OF NAVAL
    RESEARCH FOR A GRANT IN SUPPORT OF RESEARCH ENTITLED
    NATURAL LANGUAGE DATA BASE QUERY
 4. DESCRIPTIVE NOTES (Type of report and inclusive dates):
 5. AUTHOR(S) (Last name, first name, initial): HARRIS, LARRY R.
 6. REPORT DATE: 1 OCT 1980
8a. CONTRACT OR GRANT NO.: N00014-75-C-0514
 b. PROJECT NO.: NR049-344
10. AVAILABILITY/LIMITATION NOTICES:
11. SUPPLEMENTARY NOTES:
12. SPONSORING MILITARY ACTIVITY: OFFICE OF NAVAL RESEARCH
13. ABSTRACT
    This final report is intended to be a summary of the research
    on Natural Language Data Base Query performed at Dartmouth
    College, supported by the Office of Naval Research since 1973.
    It has been the goal of this research to determine a minimal set
    of techniques sufficient to provide a practical natural language
    capability for data base query. This report summarizes the basic
    requirements for such a capability and suggests techniques for
    meeting these requirements. As such, this report is, in effect, a
    specification of the minimal functionality for a practical
    natural language data base query capability.
14. KEY WORDS: NATURAL LANGUAGE, DATA BASE QUERY, PARSING

DD FORM 1473    UNCLASSIFIED (Security Classification)

FINAL REPORT

Submitted to the Office of Naval Research for a grant in support
of research entitled Natural Language Data Base Query.

Larry R. Harris
Principal Investigator
Dartmouth College
Hanover, NH 03755

Abstract

This final report is intended to be a summary of the research
on Natural Language Data Base Query performed at Dartmouth College,
supported by the Office of Naval Research since 1973. It has been
the goal of this research to determine a minimal set of techniques
sufficient to provide a practical natural language capability for
data base query. This report summarizes the basic requirements for
such a capability and suggests techniques for meeting these
requirements. As such, this report is, in effect, a specification
of the minimal functionality for a practical natural language data
base query capability.

Introduction

When research under this contract began in 1973, the state of
the art in practical natural language data base query was
essentially non-existent. All of the then existing research
systems (and many of today's systems) were semantically limited to
a single domain of discourse. The primary requisite of a
"practical" query capability is that it be "application
independent". Achieving this application independence requires a
fundamental commitment throughout the design of the system. It is
encouraging to see that the research community as a whole has
started moving in this direction.

This report consists of a summary description of the minimal
requirements for a practical query capability. The techniques
developed to meet these requirements are presented. There are four
major components of operation that make up the processing cycle of
a request. These components are the lexical analyzer (the
scanner), the syntactic analyzer (the parser), the data base
structure analyzer (the navigator) and the data processing routines.

The Lexical Analyzer -- The Scanner

The basic function of the scanner is to determine what the
individual words -- or tokens -- are. The scanner breaks the input
stream into a sequence of tokens. A modification of the finite
automaton scanner used in compilers is sufficient for this task.
The modifications are required to deal with phrases and
recognition of special numeric tokens. Phrases, such as "Vice-
President" or "New York", must be recognized as single tokens even
though they contain a space. A numerical string written in date
form, with slashes, must be recognized as a single token
(representing a date), whereas "1/3" must be recognized as three
tokens representing "one divided by three".

Scanners with these capabilities are commonplace among AI
natural language systems. The most common pitfall is to embed
spelling detection, or worse yet, spelling correction within the
scanner. Both spelling correction and spelling detection require
advance knowledge of all words that can be employed by users. This
is prohibitive in a domain independent approach, since this set
of words must clearly contain all the words in the data base.
Therefore the scanner must accept words about which it knows
nothing -- which could be, of course, potentially misspelled
words. The detection of a spelling error is best deferred to the
last moment, when a sentence fails to be understood by the
system. This approach satisfies the domain independence criterion
as well as allowing valid requests with no spelling errors that
pertain to data values not actually in the data base to be handled
properly.
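
A minimal sketch of a scanner with these properties follows, written
in Python. It is an illustration only, not the implementation used in
this research: the phrase list, the month/day/year date pattern and
the names PHRASES, TOKEN_RE and scan are all assumptions. Phrases
become single tokens, a slash-separated date becomes one token, "1/3"
becomes three tokens, and unknown words are accepted without any
spelling check.

```python
import re

# Hypothetical multi-word phrases that should become single tokens.
PHRASES = ["vice-president", "new york"]

# Order matters: dates are tried before plain numbers and operators.
TOKEN_RE = re.compile(
    r"(?P<date>\d{1,2}/\d{1,2}/\d{2,4})"   # assumed month/day/year form
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<op>[-+*/])"
    r"|(?P<word>[A-Za-z][A-Za-z'\-]*)"
    r"|(?P<punct>[?.,])"
)

def match_phrase(text, pos):
    """Return the length of a known phrase starting at pos, or 0."""
    for phrase in PHRASES:
        end = pos + len(phrase)
        if text[pos:end].lower() == phrase:
            # Require a word boundary so "New Yorker" is not split.
            if end == len(text) or not text[end].isalnum():
                return len(phrase)
    return 0

def scan(text):
    """Break the input stream into a list of (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        length = match_phrase(text, pos)
        if length:
            tokens.append(("phrase", text[pos:pos + length]))
            pos += length
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        # Unknown words are accepted as-is; no spelling check here.
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Because the scanner never consults a word list, a possible spelling
error surfaces only later, when a complete request fails to be
understood.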

The Syntactic Analyzer -- The Parser

The parser is the heart of the natural language component of
the system. Its role is to syntactically relate the components of
the request. This process drives the construction of the semantic
structures that represent the meaning of the request. There are
several computing paradigms for natural language parsers. In the
early part of our research, we successfully employed a top-down
context-free parser. Later we switched to an Augmented Transition
Network parser (ATN). We feel there are several advantages to the
ATN approach that make it far more convenient to use, although
successful context-free parsers could be built as well. Recently
several new parsing schemes have been reported in the literature,
so that it may well be the case that the ATN technology is now
dated. However, it seems clear that the choice of parsing
algorithm is less important than the mechanism by which the syntax
controls the generation of the required semantic structures.

Ambiguity

The definitive problem of this type is that of ambiguity.
Several distinct semantic representations must be generated from a
single input string. The difficulty here is not how to do it, but
how to limit the generation of too many interpretations. Many
researchers have chosen to limit the parser to generating only one
interpretation and stopping. This approach makes the decision of
which parse to pursue a very critical one. It also makes dealing
with truly ambiguous requests very difficult.

Our recommendation is to solve the problem from the other end
of the spectrum by non-deterministically generating all possible
interpretations. This transforms the issue from one of trying to
decide on a relative basis which of two partial parses looks more
promising, to one of trying to decide on an absolute basis which
of two complete parses is more meaningful. There is also a problem
of efficiency in coping with the potentially exponential number of
interpretations. It should be clear that decisions made on an
absolute basis after the parse should be more accurate than
decisions made on a relative basis during the parse. The effects
of the remaining portion of the input have had a chance to impact
the decision in the former case, but not in the latter. However,
the problem of exponential growth must be dealt with very
carefully. Fortunately it seems that, at least for natural
language query, this can be dealt with by tuning the parser to
reduce the non-determinism. In addition, the length of the input is
usually very small (less than 20 tokens) and the number of
decisions is "reasonably" small.
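
The strategy of generating every complete interpretation first and
judging them afterwards can be sketched as follows. This is a toy
illustration in Python and not the ATN parser itself: the READINGS
table, the reading labels and the single meaningfulness rule are
assumptions made up for the example.

```python
from itertools import product

# Hypothetical candidate readings per token; a real parser would derive
# these from its grammar rather than from a fixed table.
READINGS = {
    "salary": ["field:salary"],
    "in": ["relation:located-in"],
    "washington": ["city:Washington", "employee:Washington"],
}

def interpretations(tokens):
    """Enumerate every complete interpretation (may grow exponentially)."""
    choices = [READINGS.get(t.lower(), ["unknown:" + t]) for t in tokens]
    return [list(combo) for combo in product(*choices)]

def is_meaningful(interp):
    """Toy absolute test, applied only to complete interpretations:
    'located-in' must be followed by a city."""
    for left, right in zip(interp, interp[1:]):
        if left == "relation:located-in" and not right.startswith("city:"):
            return False
    return True

def parse(tokens):
    """Keep only the complete interpretations that pass the absolute test."""
    return [i for i in interpretations(tokens) if is_meaningful(i)]
```

With inputs of fewer than 20 tokens and only a few ambiguous words,
the product stays small; a tuned parser would prune earlier than this
sketch does.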

Binding Values to Fields

One type of ambiguity that arises frequently in data base
queries is that of choosing the field to which a given value is
related. Most research systems solve this problem with dictionary
definitions. This, of course, is clearly a violation of domain
independence since it requires enumerating all unique data base
values in the lexicon. Our approach has been to dynamically
determine this from indices maintained by the DBMS. In addition
to this, we allow three levels of strength in defining data
values in the dictionary. They can be tightly bound, weakly bound
or unbound. This allows for sufficient generality while at the same
time permitting definitions that could limit the non-determinism.
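
The following Python sketch illustrates one way such binding could
work. The report does not define the three strengths precisely; the
reading used here (tightly bound means only the named field, weakly
bound means the named field is preferred but index matches are still
considered, unbound means the indices alone decide) is an assumption,
as are the INDICES and DICTIONARY tables and their contents.

```python
# Hypothetical stand-in for DBMS-maintained indices: field -> indexed values.
INDICES = {
    "city": {"boston", "hanover"},
    "name": {"boston", "harris"},
}

# Hypothetical dictionary entries: value -> (strength, field).
DICTIONARY = {
    "hanover": ("tight", "city"),   # assumed: only this field
    "boston": ("weak", "city"),     # assumed: preferred, not exclusive
}

def candidate_fields(value):
    """Return the fields a value may bind to, most preferred first."""
    value = value.lower()
    from_index = sorted(f for f, vals in INDICES.items() if value in vals)
    strength, field = DICTIONARY.get(value, ("unbound", None))
    if strength == "tight":
        return [field]
    if strength == "weak":
        return [field] + [f for f in from_index if f != field]
    return from_index  # unbound: rely on the indices alone
```

Each field returned would give rise to one interpretation of the
request, so a tight binding removes non-determinism while an unbound
value keeps the approach domain independent.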

High-Level Semantic Entities

Another important component in the relating of syntax to
semantics is the ability to deal with entities that themselves
imply a significant semantic structure. This involves both complex
definitions in the dictionary, as well as the ability to deal
with these definitions in the parser. Examples of this would be
terms like "bachelor" or "profit margin". The first specifies a
complex description whereas the second specifies a formula for
calculating profit margin from other entities that are more
directly available. It is of critical importance for a natural
language system to be able to deal with such terms directly,
rather than forcing the user to continually define them.
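
As an illustration, dictionary definitions of this kind might be
represented as below. This is a sketch in Python under stated
assumptions: the field names (sex, marital_status, revenue, cost) and
the particular profit margin formula are hypothetical and are not
taken from this research.

```python
# Hypothetical dictionary: a term maps either to a predicate over a record
# (a complex description) or to a formula over more basic fields.
DEFINITIONS = {
    "bachelor": (
        "predicate",
        lambda r: r["sex"] == "M" and r["marital_status"] == "single",
    ),
    "profit margin": (
        "formula",  # assumed definition: (revenue - cost) / revenue
        lambda r: (r["revenue"] - r["cost"]) / r["revenue"],
    ),
}

def select(records, term):
    """Return the records that satisfy a descriptive term."""
    kind, test = DEFINITIONS[term]
    if kind != "predicate":
        raise ValueError(f"{term!r} is not a description")
    return [r for r in records if test(r)]

def compute(record, term):
    """Evaluate a formula term for one record."""
    kind, formula = DEFINITIONS[term]
    if kind != "formula":
        raise ValueError(f"{term!r} is not a formula")
    return formula(record)
```

The user then asks about bachelors or profit margin directly, and the
definition is expanded by the system instead of being restated in each
request.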

Another type of word that implies a substantive body of
semantics is the pronoun. The distinction here is that the
"meaning" of the pronoun is not to be found in the dictionary but
in the context of the dialog. In addition, there may be some
ambiguity that arises in determining what the pronoun refers to.
A wide variety of solutions appears in the literature for this
problem. Our approach is to maintain two context registers
that hold the semantic structures of previous requests. One is the
previous request, the other is the request that the previous
request may have referred to. In addition, intra-sentential
pronoun references are also allowed.

We have found this approach to be sufficient for handling
the vast majority of pronoun references in data base queries,
including the difficult sequence "What is the maximum salary in
Missouri?" "Who earns it?". Contrary to popular belief, the
pronoun "it" does not just refer to the answer of the first
request. If it did, the second request would generate all people
earning that amount, even those outside Missouri.

Ambiguous pronoun references are dealt with in the same way
as other ambiguities. All of the possible referents are considered
and non-deterministic interpretations are created for each
possibility. These interpretations are then judged on an absolute
basis later, along with other interpretations created by other
types of ambiguity. We have found an optimizer to be
invaluable in weeding out the undesired interpretations created by
improper pronoun references. The optimizer detects the logical
contradictions that often get created in this way.
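
The two context registers described above can be sketched as follows.
This is a simplified Python illustration, not the representation used
in this research: the Request structure, its fields and the function
names are assumptions. The point it shows is that a pronoun picks up
the whole semantic structure of its referent, restrictions included,
and that each candidate referent produces its own interpretation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Request:
    """Hypothetical semantic structure: a target plus its restrictions."""
    target: str
    restrictions: dict = field(default_factory=dict)
    refers_to: Optional["Request"] = None

class Context:
    """Two registers: the previous request, and the request it referred to."""

    def __init__(self):
        self.previous = None
        self.previous_referent = None

    def candidates(self):
        """All possible referents, most recent first."""
        regs = (self.previous, self.previous_referent)
        return [r for r in regs if r is not None]

    def record(self, request):
        """Shift the registers after a request has been processed."""
        self.previous_referent = request.refers_to
        self.previous = request

def resolve(target, context):
    """Create one interpretation per candidate referent. The referent's
    restrictions carry over, not just its answer."""
    return [
        Request(target, dict(ref.restrictions), refers_to=ref)
        for ref in context.candidates()
    ]
```

For "What is the maximum salary in Missouri?" followed by "Who earns
it?", the second request inherits the Missouri restriction, so people
outside Missouri earning the same amount are not reported.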

Another construct that tends to creep into natural language
queries is that of arithmetic expressions. We have found the
recursive descent [...] approach for parsing arithmetic
expressions to be completely consistent with the ATN
paradigm. However, other informal forms of arithmetic
expression appear in natural language queries. For example, "How
much is [...]?" Here we see [...] pronoun [...] breaking up the
standard operator-operand [...]. Refinements to the basic [...]
are required to deal with these problems.

We have found several heuristics necessary in determining the
proper response for a given query. These are primarily due to the
informal phrasing of queries that occur in natural language.
Often users do not explicitly ask to be given information that
they quite obviously need to interpret the answers and quite
rightfully expect the system to provide. For example, the request
"Print the salary of Smith and Lawler" implies the printing of
name even though a literal interpretation would not print it.
[...] impossible for the user to [...]. Similarly, [...]
illustrates this point. If the system were to respond literally
[...] it would [...] difficult to print [...] related to the
request. Clearly, the system must be [...] enough to detect such
situations and apply the [...]. The activation of such heuristics
[...] overbearing, since the user loses the ability to precisely
control the response in those situations where it is important
[...].
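
The heuristic illustrated by "Print the salary of Smith and Lawler"
might be sketched as below. The table of identifying fields, the
function name and the literal flag are assumptions introduced for this
Python illustration; the flag stands in for whatever means the user
has of insisting on a literal response.

```python
# Hypothetical: the fields that identify a record in each file.
IDENTIFYING_FIELDS = {"employee": ["name"]}

def output_fields(file, requested, literal=False):
    """Add identifying fields the user needs in order to read the answer,
    unless a literal response has been asked for."""
    if literal:
        return list(requested)
    implied = [f for f in IDENTIFYING_FIELDS.get(file, []) if f not in requested]
    return implied + list(requested)
```

Keeping the literal path available reflects the concern that such
heuristics should not take control of the response away from the user.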

The Navigator

After the query is parsed, the semantic structure must be
projected onto the database schema, resulting in a strategy for
extracting the desired information from the database. The
difficulty of the navigation problem can range from trivial to
arbitrarily complex depending on the data model employed by the
DBMS and how well the given data base is organized.

For single flat file organizations, navigation is at its
simplest. But even for single flat files, difficulties can arise
if the file was created by flattening out a hierarchy or a
network. In these cases, the notion of a record corresponding to a
real world entity is lost. Hence the navigator must take advantage
of the fact that the file was originally non-flat to construct the
proper means of access into the file.

For the more structured data models such as relational,
network and hierarchical, the navigation process takes on critical
importance. Determining the proper entry point into the structure
as well as the proper linking relationships can critically affect
the contents of the final response, not to mention the impact on
efficiency. In this regard, the relational model's greatest asset
is its closed form expression of a query. This makes it at least
possible to express navigation in a high-level language like
SEQUEL. For hierarchies and networks no such closed form
expression is used by the DBMS and therefore it is impossible to
express navigational choices without resorting to a procedural
representation. This is the essence of the reason why no good
high-level query languages, even of a formal nature, exist for
network and hierarchical DBMSs. Some intermediate level
representation is clearly needed here.

Even for the relational systems we have not been uniformly
satisfied with SEQUEL as an intermediate language. The
navigational linkage is specified in an unduly intricate fashion,
and the functionality is incomplete. This latter point is
indicative of the fact that much of SEQUEL's power is misdirected
in terms of the needs of a naive end user of a natural language
system, at least in terms of our experience. Whereas SEQUEL
provides no assistance in answering many of the difficult natural
language requests we encounter, its power in intricate cyclic
navigation is too subtle to be useful in a natural language
setting.

Navigational Optimization

Another aspect of navigation is query optimization. The
navigational choices made have a profound effect on the efficiency
of generating the answer. But even when dealing with a single
relation (or a single flat file database), there is a significant
amount of query optimization that can be done. For example,
questions asking for the maximum, the minimum, or unique listings
can be answered directly from the DBMS indices if they are
available. These optimizations can change a several minute
response into an instantaneous response. These kinds of
optimizations require knowledge of how the data is going to be
processed after retrieval as well as knowledge of and access to
the DBMS indices.
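
The kind of optimization described here can be sketched as follows.
The SortedIndex class is a hypothetical stand-in for a DBMS index on
one field, written in Python for illustration; no particular DBMS
interface is implied.

```python
class SortedIndex:
    """Hypothetical stand-in for a DBMS index on a single field:
    (value, record position) pairs kept in sorted order."""

    def __init__(self, records, field_name):
        self.entries = sorted((r[field_name], i) for i, r in enumerate(records))

    def minimum(self):
        return self.entries[0][0]

    def maximum(self):
        return self.entries[-1][0]

    def unique_values(self):
        """Unique listing read straight off the sorted entries."""
        out = []
        for value, _ in self.entries:
            if not out or out[-1] != value:
                out.append(value)
        return out

def answer_maximum(records, field_name, index=None):
    """Use the index when one is available; otherwise pass over the file."""
    if index is not None:
        return index.maximum()
    return max(r[field_name] for r in records)
```

Choosing the index path requires knowing in advance that the
retrieved data will only be used to compute a maximum, which is why
the navigator needs knowledge of the later processing step.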

Knowledge of the DBMS indices is also a critical factor in
navigation because it determines how long the DBMS will take to
respond. Clearly we want to optimize the work done by the DBMS and
prefer to generate requests that make use of indices or hash
coding rather than file pass searches. But, given the variety of
ways in which DBMSs behave on mixtures of keyed and non-keyed
searches, it is clear that the interface must have its own ability
to perform the functions of searching and sorting. There is a nice
meshing of these functions that makes it possible to avoid any
restrictions in this area and at the same time gives the
interface control of the situation so that expensive requests can
be trapped. In general, once the DBMS gets control, the user must
wait until the DBMS has finished processing the request, which may
be quite a while. By selectively sharing some of the searching and
sorting work, it is possible to maintain control and warn the user
when things get expensive.

This flexibility is achieved only by added complexity. Not
only must the interface have the searching and sorting
functionality but it must also be prepared to represent the
partitioned workload and, of course, compute the partition that
will effect the greatest efficiency. With some DBMSs, this
capability is only an efficiency option. With other DBMSs that do
not support non-keyed searching or sorting at all, this
capability becomes critical in terms of being able to answer the
request at all.
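
A minimal Python sketch of such a partitioned workload follows. It
assumes that the DBMS can be asked for records matching keyed
conditions through a hypothetical dbms_fetch callable, and that
expense is measured simply by the number of records the interface has
to examine; both are assumptions for illustration.

```python
class TooExpensive(Exception):
    """Raised so that the user can be warned before a long search runs on."""

def partition(conditions, keyed_fields):
    """Split the conditions into the keyed part handed to the DBMS and the
    non-keyed part that the interface searches itself."""
    dbms = {f: v for f, v in conditions.items() if f in keyed_fields}
    interface = {f: v for f, v in conditions.items() if f not in keyed_fields}
    return dbms, interface

def execute(dbms_fetch, conditions, keyed_fields, budget):
    """Run the keyed part in the DBMS and filter the rest in the interface,
    trapping the request once more than `budget` records were examined."""
    dbms_part, interface_part = partition(conditions, keyed_fields)
    results = []
    for examined, record in enumerate(dbms_fetch(dbms_part), start=1):
        if examined > budget:
            raise TooExpensive(f"more than {budget} records examined")
        if all(record.get(f) == v for f, v in interface_part.items()):
            results.append(record)
    return results
```

Because the interface pulls records one at a time, it keeps control
during the non-keyed part of the search and can stop or warn the user,
which would not be possible once a single large request had been
handed to the DBMS.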

Security

Another aspect of the navigation problem is how security is
taken into account. It is clear that for naive end users of a
natural language query system, security from unauthorized
access is a critical issue. Our original hope in this regard was
that we could merely rely on the DBMS subschema to provide the
necessary security. Unfortunately, we found the granularity of the
subschema security to be too large, i.e. access is granted on a
field-by-field basis. This is fine for application programs but
too restrictive for data base query. We have proposed that
security also be defined on a record-by-record basis so that a
user might have access to certain fields only for a specific set
of records.

The implication of all of this on navigation is that the
navigational choices made by the system must be a function of
what data is available to the current user. For some users,
without access to all relations, this may require less direct
paths than would otherwise be required. It is up to the navigator
to find the best path to relate all the necessary data without
violating any of the security constraints along the way.
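
The effect of security on path selection can be illustrated with a
small search over a schema graph. The Python sketch below is an
assumption-laden example: the LINKS schema is invented, "best" is
taken to mean the path with the fewest joins, and access is checked
only per relation, whereas the proposal above also covers
record-by-record restrictions.

```python
from collections import deque

# Hypothetical schema: which relations can be joined directly.
LINKS = {
    "employee": ["assignment", "department"],
    "assignment": ["employee", "project"],
    "department": ["employee", "manager"],
    "manager": ["department", "project"],
    "project": ["assignment", "manager"],
}

def best_path(start, goal, accessible):
    """Breadth-first search for the shortest join path from start to goal
    that passes only through relations the current user may access."""
    if start not in accessible or goal not in accessible:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in LINKS[path[-1]]:
            if nxt in accessible and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path exists within the user's access rights
```

A user with access to every relation is routed through assignment; a
user without it is routed along the less direct path through
department and manager.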

The final issue related to navigation is one that is
currently unresolved. This issue is the mechanism by
which the parser communicates to the navigator the explicit
"relationship" information that the user included in the request.
In general, the navigator must be prepared to work in the absence
of such information, making use of predefined "natural paths".
However, in those cases in which the user wishes to override
these predefined paths by explicitly mentioning another
relationship, the system must be prepared to act accordingly. For
example, in a database of professors and students related by both
a "teaches" and an "advises" relationship, the request "Who
are Professor Harris' students?" is different in a navigational
sense from "Who are Professor Harris' advisees?" or "Who does
Professor Harris advise?" It is clear that for this simple case
the use of the word "advises" or "advisees" indicates to the
navigator which relationship to employ. But in more complex cases
where the same two relations must be joined more than once in a
request, it is not clear that all such relationships should be
controlled by the explicit use of one relationship word. Of
course if not all such relationship choices are impacted, then we
must decide which ones are and which ones are not, presumably on
the basis of the original syntax. However, at this point, it is
not clear that English syntax could (or even should) provide this
kind of information. This remains an open issue.

The Data Processing Routines

After a query is [...] navigated, including any
searching or sorting performed by the interface, we
eventually arrive at a set of records that [...] answers the
user's request. This data still [...] provide the
user with the information [...] in the [...]
way. On the one hand, this kind of data processing is
commonplace -- computing totals, counts, formatting, etc. On the
other hand, this module would ideally be capable of carrying out
an arbitrary sequence of processing. In this sense it becomes the
automatic programming problem. [...]
capability is sought.

The placement of various [...] minimums and
maximums, etc. can often be in either the DBMS or in the interface.
This represents another example of how the workload can be
shared. Similarly, the data processing [...] closely
with other aspects of the interface that maintain the pronoun
context. This is true because it is not until the end of
processing the data that we really know all the contexts to which
subsequent pronouns might refer. It is certainly conceivable that
some processes will restrict even further the set of records
displayed for the user on the basis of some arbitrary predicate.
From the user's point of view, a pronoun may refer only
to the set of records actually printed. Since the interface has no
knowledge of what the arbitrary predicate is, it would be
impossible for it to generate a high-level representation of what
subsequent pronouns may refer to. The implications of this are
quite profound. Since the system no longer has a high-level
representation of the pronoun referent, it becomes difficult to
"echo" the meaning to the user or to even talk about [...]
interpretation in clarification dialogs.

On the positive side, the low-level representation of a
pronoun reference that is necessitated by all this can speed up
all pronoun references. This is particularly noticeable when the
result of a non-keyed search is referenced. If only
a high-level pronoun representation is maintained, then the costly
search must be recomputed. If both a high-level and a low-level
pronoun representation are maintained, then the system can describe
the query to the user with the high-level representation, but
directly access the records [...] in the low-level
representation. This has the effect of building an index for
an arbitrary set on the fly and using it to speed up subsequent
accesses to the same set.
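
Keeping both representations side by side might look like the
following Python sketch. The Referent structure and the function names
are assumptions for illustration; record positions in a list stand in
for whatever low-level record identifiers a DBMS would provide.

```python
from dataclasses import dataclass

@dataclass
class Referent:
    description: str   # high-level: used to echo the meaning to the user
    record_ids: list   # low-level: acts as an index built on the fly

def non_keyed_search(records, predicate, description):
    """One full pass over the file; the matching positions are remembered."""
    ids = [i for i, r in enumerate(records) if predicate(r)]
    return Referent(description, ids)

def follow_up(records, referent, field_name):
    """Answer a later pronoun reference without repeating the search."""
    return [records[i][field_name] for i in referent.record_ids]
```

The description is what a clarification dialog would show the user,
while the remembered positions spare the second full pass over the
file.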

With regard to the data processing routines themselves, there
is an interesting relationship to syntactic quantification. There
is a direct semantic connection between the category specification
employed by the processes and certain types of quantification.
Consider the request "How many salesmen are over 100 percent of
quota in each region?" The quantification "in each region"
semantically defines the categories to be employed by the counting
process. This gives an interesting simplified representation for
this kind of quantification.
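
The correspondence between the quantifier and the category
specification can be sketched in a few lines of Python. The record
fields (region, sales, quota) and the function name are hypothetical
and chosen only to mirror the example request.

```python
from collections import Counter

def count_by_category(records, predicate, category_field):
    """Count the records satisfying the predicate, one count per category.
    The quantifier "in each region" supplies category_field."""
    return Counter(r[category_field] for r in records if predicate(r))

def over_quota(record):
    """Hypothetical test for "over 100 percent of quota"."""
    return record["sales"] > record["quota"]
```

The quantified phrase is thus represented simply as the name of the
field by which the counting process groups its input.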

Conclusion

We have summarized the results of the ONR supported research
on natural language database query. It is interesting to note the
change in expectations about database query that has taken place
during the life of this research contract. At the outset, people
regarded practical natural language systems as a futuristic
notion: something that would not be available for 10-20 years. The
current atmosphere is one in which a few real world applications
are just making it into actual production.

It is fair to say that the issue considered most important
at the outset of this research (the natural language analysis) is
no longer the limiting factor. As is evident from the discussion
given in this report, the navigational and processing functions
provide the most fertile ground for future research. As such, the
problem of providing practical natural language access to
databases is no longer to be considered a primarily natural
language analysis problem, but also a theoretical database
problem, with overtones of automatic programming. For this reason,
it is not likely that more sophisticated parsing techniques will
impact current capabilities as much as more general AI research
related to database semantics is likely to do.