On-Line Access Systems in Statistics
Author(s): A. J. T. Colin
Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 16, No. 2 (1967), pp. 111-119
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2985772

On-line Access Systems in Statistics

A. J. T. COLIN

University of Lancaster

SUMMARY
In this paper the author discusses systems for on-line statistical analysis, and describes a particular program written for an IBM 1620 computer.

1. INTRODUCTION

I should explain first that my approach to statistics is that of a computer scientist, and not that of a statistician. I am interested in statistics chiefly as a potential application of on-line multiple-access computer systems.

When computers were first used, it was quite common to work "at the console". To carry out a simple statistical calculation, the necessary machine-code program would be debugged at the computer itself, and the data would be fed in, sometimes on cards or paper tape, or quite often on a keyboard, telephone dial or row of buttons. During this process the computer spent a great deal of time doing nothing while waiting for the user to make his next move, but in those days computer time had not been valued and this was acceptable.

As the number of computer users began to grow, it was realized that the only way of accommodating them all in the very limited computer facilities available was to make sure that the machine was working all the time. Any kind of process which involved the computer in waiting for a human action was discouraged; a well-known computer firm designed little notices to be pinned to their machines saying "Think - but not here". In most installations the actual control of the computer passed to a professional operating staff whose responsibility it was to keep the computer busy, not to think about what it was doing.

The result of this drive towards greater "efficiency" was to remove the computer to an ever-increasing distance from the user. Direct contact with the machine was replaced by something equivalent to a postal service which in some cases would take a week or more to return a reply. Any kind of dialogue with the computer was intolerably tedious through this medium, but most present-day users have never experienced any other state of affairs.

As the internal speed of computers continued to increase, it became clear that even machines which were apparently fully occupied could be wasting a great deal of time, up to 95%, waiting for mechanical peripheral devices, such as card readers.

Multi-programming was the first step towards solving this problem. The store of a multi-program computer is designed to contain two or more entirely independent programs at the same time. At any instant, only one of these is active; but if the active program is held up (for example, for a slow peripheral) the computer can switch control to another program instead of simply waiting.
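The switching rule can be illustrated with a toy simulation. This is only a sketch in present-day Python, not a description of any machine of the period: the names `job` and `run_multiprogrammed` and the step labels `"compute"` and `"io"` are hypothetical, and a peripheral wait is modelled simply as the moment at which control passes to the next program in store.

```python
from collections import deque


def job(name, steps):
    """A toy program: each step is 'compute' or 'io' (a slow peripheral)."""
    for step in steps:
        yield (name, step)


def run_multiprogrammed(jobs):
    """Run several programs held in store together.

    Only one program is active at a time. When the active program reaches
    an 'io' step it is held up, so control switches to another program.
    Returns the order in which steps were processed.
    """
    ready = deque(jobs)
    trace = []
    while ready:
        current = ready.popleft()
        for name, step in current:
            trace.append((name, step))
            if step == "io":
                # Held up on a peripheral: rejoin the queue and let
                # another program use the processor meanwhile.
                ready.append(current)
                break
        # If the loop ended without a break, the program has finished.
    return trace
```

With programs A (`compute`, `io`, `compute`) and B (`compute`, `compute`), B runs to completion while A is held up, and A's last step follows afterwards.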

At this stage it was noticed that this arrangement could make it possible to allow the direct communication with computers which we used to enjoy. A large computer could have connected to it as many as 30 consoles, each with its own typewriter. Each console could be used by a person who would have the impression that he had the entire computer to himself. In reality, the internal facilities of the computer would be rapidly switched in turn to each of the active users. The system is feasible because the internal speed of the computer is many times greater than the speed at which information can be produced, or absorbed by the user. The pioneers in developing this kind of multi-access system were those concerned with Project MAC at the M.I.T.

In Britain, all the major computer manufacturers are hurriedly developing multi-access systems for their biggest machines, and it can be assumed that in three or four years these systems will be generally available. However, in both published material and demonstrations, most of the emphasis is on the mechanisms and supervisory programs needed for getting multiple on-line access to work, and very little has been said about its possible applications. My particular interest lies in designing and trying out various applications, which include all those in which the calculating and information storing power of a computer can be usefully coupled to the human mind's ability to recognize patterns, to have hunches and to generalize. The necessary practical work can be done on a primitive, single-program computer if the loss of time while the user reacts can be tolerated. At Lancaster University, which, although expanding, is still very small, we are in the fortunate position of having enough spare computer time to carry out some experiments in this way.

One of the potential uses of direct access to a computer is in the statistical analysis of data. Experience shows that the analysis of a large amount of information usually consists of a process of exploration in which the results of any one step determine the next step to be taken. If a statistical program is embedded in a conventional batch-processing computer system, then the analysis of one set of data will require numerous runs on the machine. The earlier runs will be necessary to correct specifications and inconsistent data, and the later ones to follow up interesting possibilities suggested by the preliminary analysis. Currently in this country, the whole process often takes months to complete. It is clear that if the results of each computer run were presented to the user immediately, and he were allowed to reply straight away, the process of analysis would be greatly speeded up.

2. A PILOT PROGRAM

The program which I shall describe is an attempt to produce a primitive experimental system for on-line statistical analysis. It was written for an IBM 1620 computer, which, although small and slow, is usually equipped with a console typewriter and a disc file. Since the chief purpose of the program is to experiment in man-machine communication, the statistical facilities provided are crude and few in number, but the program is designed so that it can be used without prior instruction by people who know something of statistics but are entirely ignorant of computer programming. Each step to be followed by the user is clearly typed by the machine, and every item of information needed is elicited by a question. No restrictions are laid on items such as variable names or the format of numbers, and an important aspect of the program is that it cannot be "broken" by mistakes; it detects nonsense or inconsistencies, types a warning, and invites the user to give his message again. If mistakes are repeated, the warnings increase in severity.
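The question, check, warn and ask-again cycle can be sketched as a small loop. This is a hedged modern illustration, not the 1620 program itself: the function `ask`, its parameters, and the wording of the second and third warnings are assumptions. Only the first warning is taken from the appendix dialogue.

```python
def ask(question, is_valid, read=input, write=print):
    """Put one question and refuse to continue until the reply is usable.

    Nonsense never breaks the dialogue: it only produces a warning, and
    repeated mistakes move on to a more severe wording.
    """
    warnings = [
        "Something is wrong. Try again.",  # wording from the appendix
        "That reply cannot be used here. Now try again.",  # hypothetical
        "I repeat: that reply cannot be used. Now try again.",  # hypothetical
    ]
    mistakes = 0
    while True:
        reply = read(question + " ").strip()
        if is_valid(reply):
            return reply
        # Stay on the most severe warning once the list is exhausted.
        write(warnings[min(mistakes, len(warnings) - 1)])
        mistakes += 1
```

Called as `ask("What is the last column?", str.isdigit)`, the loop accepts only a string of digits. The `read` and `write` parameters exist so that the dialogue can be driven from a script as well as from a keyboard.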

The appendix shows a transcript of a typical dialogue with the computer. The last section (Fig. 1) is an exact (scaled) reproduction of the console typewriter output, but the rest has been slightly edited to economize on space. Here the user's messages have been printed in Roman type, and the machine's replies in italics. The RS symbol, mentioned by the machine, has been omitted.


The program begins by explaining how to type messages and then invites the user to give his name. Thereafter all data supplied are stored in a disc file under that name and can be recalled later without formality.

Next, the program asks the user to describe the layout of his data, which, it is assumed, have been punched on cards. It is important that this description be correct; so the user is given summaries of what he has said, and is constantly invited to confirm that these are right.

When the layout of the data has been described, they are read. The values of the variables on the first card are printed out to provide a final check on the accuracy of the definitions. Thereafter any card which is detectably wrong makes the program stop and print a warning, and the user is given the chance to correct the faulty card and to input it again. Data, once read, are permanently stored in the disc file.
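The layout description and the reading of a card can be illustrated as follows. This is a minimal sketch under stated assumptions: `Variable`, `LAYOUT`, `summarize` and `read_card` are hypothetical names, a card is represented as a text string with column 1 at index 0, and "detectably wrong" is taken to mean a blank or non-numeric field. The column positions and the wording of the summary follow the appendix dialogue.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    first_col: int  # card columns are numbered from 1, as in the dialogue
    last_col: int


# Layout as summarised by the machine in the appendix.
LAYOUT = [
    Variable("Class", 4, 4),
    Variable("Sex", 5, 5),
    Variable("I.Q.", 6, 8),
    Variable("Maths marks", 9, 10),
    Variable("Biology marks", 11, 12),
    Variable("English marks", 13, 14),
]


def summarize(var):
    """Echo a definition back to the user for confirmation."""
    width = var.last_col - var.first_col + 1
    return (f"{var.name} is punched in {width} columns "
            f"starting at column {var.first_col}.")


def read_card(card, layout):
    """Extract one numeric value per variable from a card image.

    Raises ValueError for a detectably wrong card, so that the caller
    can stop, warn the user and accept a corrected card.
    """
    values = {}
    for var in layout:
        field = card[var.first_col - 1:var.last_col].strip()
        try:
            values[var.name] = float(field)
        except ValueError:
            raise ValueError(
                f"card is wrong in {var.name!r}: {field!r}") from None
    return values
```

A card image such as `"   21130793157"` yields Class 2, Sex 1, I.Q. 130 and marks of 79, 31 and 57, which are the first-card values printed in the appendix.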

In the final phase, the program allows the user to call for any of a small repertoire of statistical processes to be applied to his data. Since all results are clearly labelled, and data at this stage cannot be altered, logical errors (such as naming the wrong variable) are not serious and no special checks are made. In this phase, the user is also allowed to separate his data into mutually exclusive groups according to simple logical relations. Statistics can be computed for the individual groups.
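Separation into groups might look like the sketch below. The paper does not say how the relations are expressed, so everything here is an assumption: rules are ordinary Python predicates, `split_into_groups` and `group_means` are hypothetical names, and mutual exclusivity is enforced by assigning each case to the first rule it satisfies.

```python
from statistics import mean


def split_into_groups(cases, rules):
    """Assign each case to the first group whose rule it satisfies.

    Taking the first match keeps the groups mutually exclusive even if
    the rules overlap. Cases matching no rule are left out.
    """
    groups = {name: [] for name, _ in rules}
    for case in cases:
        for name, rule in rules:
            if rule(case):
                groups[name].append(case)
                break
    return groups


def group_means(groups, variable):
    """Average of one variable, computed separately for each group."""
    return {
        name: mean(case[variable] for case in members)
        for name, members in groups.items()
        if members  # an empty group has no average
    }
```

For example, rules such as `("Sex 1", lambda c: c["Sex"] == 1)` and `("Sex 2", lambda c: c["Sex"] == 2)` give a separate average of Maths marks for each value of Sex.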

Since it was written, the program has been used experimentally by a number of people, all selected to have some knowledge of statistics. Of those unfamiliar with computers, the majority were able to use the program immediately, but about 30% were afflicted by "computer fright" and were apparently unable to read or understand the first message typed by the computer. When this was explained, and it became clear to them that mistakes on their part were not disastrous, the members of this group soon gained confidence and learned how to use the program.

A second group of experimental users consisted of those who already had experience of conventional computing. Many in this group were initially beset by imaginary difficulties such as "the permitted form of identifiers". In the end, however, they were as successful as the first group in using the program.

The program has also been used for practical computation in connection with about 10 different projects and appears to be genuinely useful. Nevertheless, it has a number of drawbacks. Some of these stem directly from the small size and low speed of the 1620; but others are defects in the conversational aspect of the program which have shown up clearly in experimental runs.

3. POSSIBLE FUTURE DEVELOPMENTS

The IBM 1620 at Lancaster has recently been replaced by an ICT 1909. Among our plans for this machine is a new version of the on-line statistical program, in which many of the defects in the pilot program will be eliminated. The new program will also be experimental in the sense that we shall continuously gather comments and suggestions from its users, and incorporate them where possible.

From the statistical point of view, the new program will be much more powerful. We shall adapt existing library programs to provide many more statistical operations, and we shall include all the facilities for defining new variables and for selecting sub-groups which are found in conventional statistical systems such as OPAL or MYC. In addition, we hope to provide a data structure which is more flexible than those in common use now.

The principle underlying the conversational aspect of our present program is that items of information should be supplied one at a time in response to questions. This method has the advantage over more formal styles of problem definition that there is virtually no syntax to be learned by the user, and that mistakes can be corrected immediately without retyping entire lines or paragraphs. The method also suffers from being less compact, but in spite of this we intend to retain it for our new program.

The experiments show that mistakes in using the program are of two kinds, simple spelling mistakes and those caused by a basic misunderstanding of the system. At present, both types of mistake are treated as if they were misunderstandings, so that on typing (say) COULD COVER instead of CLOUD COVER the user receives a lengthy explanation that in this context he is supposed to give the name of one of these variables, and that nothing else will do. The new program will attempt to filter off spelling slips by using one of the standard algorithms, before assuming that a genuine misunderstanding has occurred.
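The paper does not name the algorithm it has in mind, so the following is only one possible modern stand-in: similarity matching from Python's standard `difflib` module. The function name `resolve_name`, the 0.8 cutoff and the variable names other than CLOUD COVER are assumptions made for illustration.

```python
import difflib


def resolve_name(reply, known_names, cutoff=0.8):
    """Return the known variable name the reply most plausibly means.

    An exact match (ignoring case and extra spaces) is accepted at once.
    Otherwise the closest known name is accepted if it is similar enough.
    None means the reply looks like a genuine misunderstanding.
    """
    by_key = {name.upper(): name for name in known_names}
    key = " ".join(reply.upper().split())
    if key in by_key:
        return by_key[key]
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    return by_key[close[0]] if close else None
```

With the names CLOUD COVER, RAINFALL and TEMPERATURE defined, the reply COULD COVER resolves to CLOUD COVER, while a reply bearing no resemblance to any name returns `None` and would call for the fuller explanation.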

Another disadvantage of the present program is that it is written for the completely unskilled user and does not adapt itself to one who has more experience. The explanations which are essential for the beginner become tedious and irritating with constant repetition. The new program will keep records of each user's skill by counting the number of times each question has been put and the number of nonsensical replies given, and will condense the question accordingly. For example, a sequence of definitions might take the following form (where the comments made by the machine are in italics):

Variable (1). What name do you give to this variable? Height
What is the first column where values of this variable are punched? 34
What is the last column? 37
Height is punched in 4 columns starting at column 34. Is this right? Yes

Variable (2). Name? Weight
First column? 38
Last column? 40
Weight: 3 columns starting at 38. Right? Yes

Variable (3). Name, first and last columns, with commas in between
Age, 41, 42
2 columns, right? Yes

Variable (4). Sex, 43, 43
1 column? Yes

Variable (5)
Colour of eyes, 43, 43
1 column, Right? No

Variable (5). Colour of eyes, 44, 42
This is impossible. Try again.
Variable (5) Name? Colour of eyes
etc.
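One way to condense questions from the two counts is sketched below. The rule is purely illustrative: the class `UserRecord`, the identifier `"name"` and the rate (one step terser per asking, two steps fuller per nonsensical reply) are assumptions, not the rule planned for the new program.

```python
from collections import defaultdict


class UserRecord:
    """Counts, per question, how often it was put and answered badly."""

    def __init__(self):
        self.times_asked = defaultdict(int)
        self.nonsense = defaultdict(int)

    def note_nonsense(self, question_id):
        self.nonsense[question_id] += 1

    def phrase(self, question_id, versions):
        """Choose a wording from `versions`, fullest first, tersest last.

        Each earlier asking moves one step towards the terse end, and
        each nonsensical reply moves two steps back towards the full one.
        """
        level = (self.times_asked[question_id]
                 - 2 * self.nonsense[question_id])
        level = max(0, min(level, len(versions) - 1))
        self.times_asked[question_id] += 1
        return versions[level]
```

Given the wordings "What name do you give to this variable?" and "Name?", a new user sees the full question first and the short form second, and drops back to the full question after a nonsensical reply. Changing the two weights is one way of trying different rates of condensation.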

In practice, it will be necessary to experiment with different rates of condensation.

The present program, when given a command to evaluate a statistic, obeys it immediately and the command is not stored. This means that the advantages of having a stored program are lost, and during the course of a long analysis certain messages may have to be typed repeatedly. This fault did not show up in the experiments, but was pointed out by several practical users of the program, who had to carry out the same process on a number of different, but similar variables. There appear to be two possible solutions to this problem, and we intend to experiment with both of them. First, it is possible to allow the definitions of procedures with formal parameters in the ALGOL 60 sense. The commands in these procedures would be stored, but not carried out until the procedure was called with actual parameters. A typical sequence of dialogue with this mechanism might be (user's replies in italics):

What statistic would you like? Procedure definition
Procedure Look at (language marks)
What statistic would you like? Average
For what variable? language marks
What statistic would you like? Correlation
Between (first variable) language marks
And (second variable) Maths marks
What statistic would you like? End procedure

What statistic would you like? Look at (French)
Average value of French is 45.9
Correlation between French and Maths marks is -0.4568

What statistic would you like? Look at (Latin marks)
Average of Latin marks is 67.2

and so on.

This process is similar to the definition and use of "Macros". Secondly, the machine can type a number in front of every expected command, store all the commands as they are executed, and allow a special series of instructions having the effect of repeating any desired sequence of them with different variables. Again, a typical dialogue might be:

34: What statistic would you like? Average
For which variable? French marks
Average of French marks is 45.9
35: What statistic would you like? Correlation
Between French marks and Maths marks
Correlation between French marks and Maths marks is -0.4568
36: What statistic would you like? Repetition
From what command number? 34
Which variable is to be replaced? French marks
Give list of replacements, separated by commas
Latin marks, Russian marks, German marks, Greek marks
Average of Latin marks is 67.2
Correlation between Latin marks and Maths marks is 0.1356
Average of German marks is 23.4
etc.

The second solution appears simpler and, if we specify that repetitions may be nested and that more than one variable may be replaced in parallel, it is almost as versatile as the method of procedures.
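Both proposals amount to storing commands and substituting variable names before they are obeyed. The sketch below is one hedged way to express that in modern terms. The representation of a command as a pair (statistic, variable names) and the names `substitute`, `Procedure` and `repetition` are assumptions, and the functions return the expanded commands rather than computing any statistics.

```python
def substitute(command, mapping):
    """Replace variable names in one stored command."""
    statistic, variables = command
    return (statistic, tuple(mapping.get(v, v) for v in variables))


class Procedure:
    """First proposal: stored commands with formal parameters.

    Nothing is carried out at definition time. Calling the procedure
    binds actual parameters to the formal ones.
    """

    def __init__(self, formals, commands):
        self.formals = list(formals)
        self.commands = list(commands)

    def call(self, *actuals):
        mapping = dict(zip(self.formals, actuals))
        return [substitute(c, mapping) for c in self.commands]


def repetition(history, from_number, variable, replacements):
    """Second proposal: repeat numbered commands with new variables.

    `history` maps command numbers to the commands already executed.
    Every command numbered `from_number` or later is repeated once for
    each replacement, in the order the replacements are listed.
    """
    stored = [history[n] for n in sorted(history) if n >= from_number]
    return [substitute(c, {variable: r})
            for r in replacements
            for c in stored]
```

In this representation the dialogue's "Look at" procedure is `Procedure(["language marks"], [("Average", ("language marks",)), ("Correlation", ("language marks", "Maths marks"))])`, and the Repetition command is `repetition(history, 34, "French marks", [...])` with commands 34 and 35 in `history`. Replacing several variables in parallel would need only a larger mapping passed to `substitute`.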

This paper has consisted of fact-reporting and crystal-gazing in almost equal parts. Although the results I have reported are crude, I am convinced that the future of automated statistical analysis is closely connected with the interactive use of computers.

APPENDIX

Hello. Whenever you type a message on this typewriter, please terminate it with the R-S key to be found at the top left-hand corner of the keyboard. If you make a mistake, then type d just before R-S and what you have just typed will be ignored.
Have you ever used this program before? (Please type Yes or No.) No


Please give your name. A. J. T. Colin
A File has been opened under your name.
This is a statistical program, and your first step must be to supply the machine with data to be analysed. Your data presumably consists of a set of records, each of which contains numerical values for one or more variables. Would you like an example? (Type yes or no) No
Your data must be punched on to cards, one card to each record. The values of the observations may be arranged in any way you like, provided that fractional values always have a decimal point in the right place, and that the arrangement is the same for all your cards. You must decide on names (such as "age" or "IQ") for all your variables.

How many variables are there in each record? 6
Good. You must now describe the way your variables are arranged, and give them names. We shall deal with the variables one at a time.

Variable 1
What name do you give to this variable? Class
What is the first card column where values of this variable are punched? 4
What is the last column? 4
Class is punched in 1 columns starting at column 4. Is this right? Yes

Variable 2
What name do you give to this variable? Sex
What is the first card column where values of this variable are punched?
What is the last column?
Something is wrong. Try again.
What is the first card column where values of this variable are punched? 5
What is the last column? 5
Sex is punched in 1 columns starting at column 5. Is this right? Yes

Variable 3
What name do you give to this variable? I.Q.
What is the first card column where values of this variable are punched? 6
What is the last column? 8
I.Q. is punched in 3 columns starting at column 6. Is this right? Yes

Variable 4
What name do you give to this variable? Maths marks
What is the first card column where values of this variable are punched? Nine
This is not a valid number. Now try again. 9
What is the last column? 10
Maths marks is punched in 2 columns starting at column 9. Is this right? Yrss
No. Your choice is limited to the following - Yes No
Now try again. Yea.
I repeat - you must choose one of - Yes No
Now try again. Yes

Variable 5
What name do you give to this variable? Maths marks
You have already used this name. Choose another. Biology marks
What is the first card column where values of this variable are punched? 11
What is the last column? 12
Biology marks is punched in 2 columns starting at column 11. Is this right? Yes

Variable 6
What name do you give to this variable? English marks
What is the first card column where values of this variable are punched? 13
What is the last column? 13


English marks is punched in 1 columns starting at column 13. Is this right? No
Never mind. Let's start defining this variable again.

Variable 6
What name do you give to this variable? English marks
What is the first card column where values of this variable are punched? 13
What is the last column? 14
English marks is punched in 2 columns starting at column 13. Is this right? Yes

The variables you have defined are as follows

First col.   Last col.   Name
    4            4       Class
    5            5       Sex
    6            8       I.Q.
    9           10       Maths marks
   11           12       Biology marks
   13           14       English marks

Are you satisfied that your definitions are correct? Yes
Load your case cards into the 1622 reader, face down, with the nines edge towards the machine. Put three blank cards on top and start the reader (green button)

Variable values on the first card are
Class = +2.0000000
Sex = +1.0000000
I.Q. = +130.00000
Maths marks = +79.000000
Biology marks = +31.000000
English marks = +57.000000

Is this right? Yes
Number of cases read = 131

The highest and lowest values of your variables are
Class +1.0000000, +?.0000000
Sex +1.0000000, +2.0000000
I.Q. +98.000000, +162.00000
Maths marks +7.0000000, +99.000000
Biology marks +13.000000, +99.000000
English marks +50000000, +99.000000

The computer is now ready to compute statistics and to perform other functions.

Would you like to see a list of what it can do? Yes
The following options are available.

(A) Functions of one variable
Average
Standard deviation
Histogram
Frequency table

(B) Functions of any two variables
Correlation
Regression equation
Scatter diagram
Two-way frequency table

(C) To separate cases into groups for independent statistical analysis
Tags


WHAT FURTHER STATISTIC WOULD YOU LIKE
SCATTER DIAGRAM R-S
BETWEEN (FIRST VARIABLE)
I.Q. R-S
AND (SECOND VARIABLE)
MATHS MARKS R-S

HORIZONTAL AXIS - I.Q.
SCALE FROM +94.800000 TO +165.20000

VERTICAL AXIS - MATHS MARKS
SCALE FROM +2.4000000 TO +103.60000

[Scatter diagram: MATHS MARKS (vertical axis) plotted against I.Q. (horizontal axis), with one symbol per print position showing how many points fall there.]

KEY - . = 1 POINT; further symbols mark 2/3 POINTS and 4/7 POINTS; + = 8/15 POINTS; a final symbol marks 16+ POINTS

FIG. 1
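A typewriter scatter diagram of this kind can be sketched in a few lines. Several things here are assumptions rather than facts about the original program. The 5% margin is inferred from the printed scales: the lowest and highest values 98 and 162 give 94.8 to 165.2, and 7 and 99 give 2.4 to 103.6. The grid size and the names `axis_scale`, `density_symbol` and `scatter` are hypothetical. The symbols `:`, `o` and `#` are placeholders for classes whose symbols are not legible in the key.

```python
def axis_scale(lowest, highest, margin=0.05):
    """Extend the observed range by a margin at each end."""
    span = (highest - lowest) or 1.0  # avoid a zero-width axis
    return lowest - margin * span, highest + margin * span


def density_symbol(count, symbols=" .:o+#"):
    """Map a cell count to a symbol: 0, 1, 2/3, 4/7, 8/15, 16+ points."""
    if count <= 0:
        return symbols[0]
    for index, upper in enumerate([1, 3, 7, 15], start=1):
        if count <= upper:
            return symbols[index]
    return symbols[5]


def scatter(xs, ys, width=40, height=20):
    """Return the diagram as a list of text rows, top row first."""
    x0, x1 = axis_scale(min(xs), max(xs))
    y0, y1 = axis_scale(min(ys), max(ys))
    counts = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y1 - y) / (y1 - y0) * height), height - 1)
        counts[row][col] += 1
    return ["".join(density_symbol(c) for c in row) for row in counts]
```

Printing the rows of `scatter(iq_values, maths_values)` one per line gives a diagram in which denser print positions show heavier symbols, which is the purpose the key serves in Fig. 1.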
