This PDF is a selection from an out-of-print volume from the National Bureau of Economic Research
Volume Title: Annals of Economic and Social Measurement, Volume 1, number 2
Volume Author/Editor: Sanford V. Berg, editor
Volume Publisher: NBER
Volume URL: http://www.nber.org/books/aesm72-2
Publication Date: April 1972
Chapter Title: Social Science Computer at the University of Wisconsin: SIMS and SEOSYS
Chapter Author: Max E. Ellis
Chapter URL: http://www.nber.org/chapters/c9197
Chapter pages in book: (p. 237 - 248)
Annals of Economic and Social Measurement, 1/2, 1972
SOCIAL SCIENCE COMPUTING AT THE UNIVERSITY OF WISCONSIN: SIMS AND SEOSYS
BY MAX E. ELLIS
INTRODUCTION
For the past three years, the Data and Computation Center for the Social Sciences (DACC) at the University of Wisconsin has been developing software for social science applications. The main effort has been research and development of systems for describing and processing hierarchical data files. Emphasis has been placed on the design of user languages for describing data already in machine-readable form, and on the development of efficient algorithms and systems for retrieving and editing large data files. Two such systems are described in this paper. SIMS, a Social Science Information Management System, is now under development and is our ultimate goal in providing the social scientist with a complete, modular and transportable system for processing files with complex structures. SEOSYS, the Survey of Economic Opportunity System, was developed specifically for retrieving information from the Survey of Economic Opportunity data files and has been used as a model for the design and implementation of SIMS.
The University of Wisconsin has a Univac 1108 system with batch terminals at remote sites throughout the University. DACC has a Univac 9200 computer serving as an input/output terminal to the 1108. The 9200 communicates with the 1108 via coaxial cable and provides card I/O and printing at the social science building site. Magnetic tape files are stored at the central 1108 site and are accessible to all remote terminals. The 1108 hardware configuration consists of a central processing unit; 4 memory units of 65K 36-bit words each; 2 Fastrand II drum storage devices of 22 million words each; 4 flying-head drums of 262K words each; 10 tape drives; a printer, card reader and punch; and the communication devices that handle the more than 10 remote batch terminals.
The minimum computer system configuration in which SIMS can operate must have the following attributes:
--A multiprocessing capability, with a facility for creating and executing a job control stream from a user program.
--An ANSI Fortran IV or comparable Fortran compiler that allows I/O to a random access device such as drum or disk, either through Fortran system routines or through special routines called from a Fortran program. Also needed are I/O functions comparable to the UNIVAC or CDC Fortran BUFFER IN, BUFFER OUT, DECODE and ENCODE [3, 5].
--At least the equivalent of 50K 36-bit words of core for the program and common blocks, and at least the equivalent of one million 36-bit words of random access storage.
--The ability to collect or map precompiled relocatable routines, routines compiled at execution time, and labelled common blocks.
--A compiler for ANSI Cobol.
--At least 3 tape units, which are required for certain processing functions.
SEOSYS, described in the last section, is written in Fortran and requires only the hardware normally made available to standard Fortran programs. Since all SEOSYS I/O is to tape, no use is made of random access storage devices. The size of SEOSYS is well within the 65K-word limit set by the Fortran compilers.
SIMS: A SOCIAL SCIENCE INFORMATION MANAGEMENT SYSTEM

SIMS incorporates a number of integrated processing functions for the complete processing of simple and complex data files consisting of fixed-length data items. Facilities exist for describing hierarchically structured files that are already in machine-readable form and for the complete editing of such files [2]. These two basic functions are complemented by a series of analytical functions such as cross-tabulation and correlation. The modular construction of the system enables additional analytical routines to be added, including user-supplied Fortran subroutines. The user-oriented command language of SIMS gives the social science researcher an interface to the system in terminology familiar to him. A programmer can easily alter the syntax and semantics of this language to handle idiosyncrasies in the terminology of a particular class of users, or to change the user interface entirely for users other than social scientists.

Figure 1 is a sample SIMS request with explanations of the input statements. It gives a general feel for the system and some properties of the language. The example combines several processing functions in one request, or job. The user has survey data on cards and is using SIMS to "familiarize" himself with the data. Assume this is the first time the data have been processed by the computer. In a single SIMS run, the researcher can describe the data (*DESCRIPTION), validate data items and perform consistency checks on them (*EDIT), and produce some preliminary cross-tabulations (*CROSSTABS).

An input request may be catalogued and retrieved at a later date for updating or execution. The file description may be entered into the SIMS library and stored in machine-readable form.
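The cataloguing just described can be pictured as two keyed stores: requests filed under their RUN-ID and file descriptions filed under the file name, with an *INPUT reference pulling the stored description back in. The Python sketch below is only an illustration under that assumption; the class and method names (SimsLibrary, catalogue_request, remove_request, resolve_input) are hypothetical and do not come from SIMS, which was written in Fortran and Cobol.

```python
from dataclasses import dataclass, field


@dataclass
class SimsLibrary:
    """Hypothetical stand-in for the SIMS library (not the SIMS implementation)."""
    requests: dict = field(default_factory=dict)      # RUN-ID -> statement cards
    descriptions: dict = field(default_factory=dict)  # file name -> description cards

    def catalogue_request(self, run_id, cards):
        # A request is filed under the RUN-ID given on its *BEGIN statement.
        self.requests[run_id] = list(cards)

    def remove_request(self, run_id):
        # Analogue of *REMOVE: uncatalogue a stored request, if present.
        self.requests.pop(run_id, None)

    def store_description(self, file_name, cards):
        self.descriptions[file_name] = list(cards)

    def resolve_input(self, file_name):
        # Analogue of *INPUT: the stored description is fetched automatically.
        if file_name not in self.descriptions:
            raise KeyError(f"no description catalogued for {file_name}")
        return self.descriptions[file_name]


library = SimsLibrary()
library.store_description("1971-SURVEY", ["RECORD-DESCRIPTION: NAME-HEAD, CARD-NO-1"])
library.catalogue_request("1971-SURVEY-TABLES", ["*INPUT 1971-SURVEY", "*END"])
print(library.resolve_input("1971-SURVEY"))
```

A later run that names 1971-SURVEY on its *INPUT statement would then need no *DESCRIPTION of its own, which is the convenience the text describes.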
When the described file is referenced in subsequent runs (using the *INPUT statement), the file's description is automatically retrieved and made available to the SIMS retrieval and analytical routines.

Initially SIMS will be limited in its statistical analysis capability, since this type of processing is readily available through other systems or generalized routines, and the file handling features of SIMS provide complete editing, reformatting and extraction of data for such statistical programs. The main objective of SIMS is to provide the researcher with a file processing tool that he can use without the aid of a programmer. Figure 2 lists the commands of the first SIMS system. Details of each statement's parameters are not given, but the brief descriptions should serve to summarize the features of SIMS.

The first version of SIMS is scheduled for release by the end of 1972. This version will operate in batch mode and will run under the EXEC 8 operating system of the Univac 1108. Most routines have been written in ANSI Fortran IV or Cobol, with additional DACC Fortran coding standards applied [1].

Figure 1. Sample SIMS request

*BEGIN USER-SMITH, ACCOUNT-2908, MODE-PROD, RUN-ID-1971-SURVEY-TABLES
*TITLE ANALYSIS OF SURVEY DATA
*INPUT 1971-SURVEY
*EDIT, TYPE-OBSERVATIONS, MAX-ERRORS-100
    VALIDATE VARIABLES SEX, AGE, OCCUPATION (HEAD)
    [two CHECK substatements, partly illegible, testing SEX (HEAD) and SEX (SPOUSE)]
*CROSSTABS CELLS ARE FREQUENCIES, ROW-PERCENT, COLUMN-PERCENT
    TABLE, ROW-OCCUPATION, COLUMN-SEX (HEAD)
    TABLE, ROW-OCCUPATION, COLUMN-WORK-CODE, PAGES-SEX (HEAD), CELLS ARE FREQUENCIES
*DESCRIPTION, FILE-NAME-1971-SURVEY
    ABSTRACT: 1971 SURVEY OF HEADS OF HOUSEHOLDS IN S.E. WISCONSIN CITIES. DATA OBTAINED FROM DEPT. OF WELFARE.
    STORAGE-DESCRIPTION: STORAGE-DEVICE-CARDS, RECORD-IDENTIFICATION-CARD-NO, OBSERVATION-IDENTIFICATION-HEAD-NUMBER
    RECORD-DESCRIPTION: NAME-HEAD, CARD-NO-1
    VARIABLE 1: NAME-HEAD-NUMBER, FORMAT-1/4
    VARIABLE 2: NAME-CARD-NO, FORMAT-5/1
        BOUND 1: NAME-HEAD-CD, VALUE-1
        BOUND 2: NAME-SPOUSE-CD, VALUE-2
        BOUND 3: NAME-HEAD-CASH, VALUE-3
    VARIABLE 3: NAME-SEX, FORMAT-6/1
        BOUND 1: NAME-MALE, VALUE-1
        BOUND 2: NAME-FEMALE, VALUE-2
    VARIABLE 4: NAME-AGE, FORMAT-7/2, DETAIL-00 IMPLIES NO AGE GIVEN
    [VARIABLES 5-8, partly illegible: RACE; MARITAL-STAT (MARRIED = 1, SINGLE = 2); WORK-CODE (NOT-WORKING = 0, WORKING = 1); OCCUPATION, DETAIL-NOT ALL OCCUPATIONS ARE GIVEN (BRICKLAYER = 1, CARPENTER = 2, OTHER = 3-98, MISSING = 99)]
    RECORD-DESCRIPTION: SPOUSE, 2
    [VARIABLES 9-13, abbreviated form: HEAD-NUMBER, CARD-NO (CARD/RECORD IDENTIFICATION), SEX, AGE (IN YEARS), RACE]
    RECORD-DESCRIPTION: HEAD-CASH, 3
    [VARIABLES 14-16, abbreviated form, no bounds: INCOME (GROSS INCOME PER YEAR), ASSETS (TOTAL ASSETS), LIABILITIES (TOTAL LIABILITIES)]
    STRUCTURE-DESCRIPTION: HEAD RECORD IS FOLLOWED BY HEAD-CASH RECORD IF WORK-CODE IS WORKING ELSE IS FOLLOWED BY SPOUSE RECORD IF MARITAL-STAT EQUALS 1. SPOUSE RECORD IS FOLLOWED BY HEAD RECORD. HEAD-CASH RECORD IS FOLLOWED BY SPOUSE RECORD IF MARITAL-STAT EQUALS 1 ELSE IS FOLLOWED BY HEAD RECORD.
*DATA, FILE-NAME-1971-SURVEY
    [data cards for the 1971 survey]
*END

Explanation of statements:
--*BEGIN: This is a production run for SMITH. The input request is catalogued under account 2908 and the given run identification, 1971-SURVEY-TABLES.
--*TITLE: This title appears on all pages of printed output.
--*INPUT: Input is the 1971-SURVEY file described under *DESCRIPTION.
--*EDIT: Validate the codes for the variables listed and perform the consistency checks listed. Continue until MAX-ERRORS-100 is reached. Each CHECK examines every entry or observation and prints an error message if its expression is false.
--*CROSSTABS: Produce the following two contingency tables, giving frequencies of occurrence (or counts) and percentages. The second table is 3-dimensional. For table 2, the global parameters of the *CROSSTABS statement are overridden by CELLS ARE FREQUENCIES.
--*DESCRIPTION: The survey file is on cards, with 1 to 3 cards per observation or entry depending on whether a spouse is present and whether the head worked. Cards are identified by CARD-NO, and observations by HEAD-NUMBER. Card 1 holds HEAD information, card 2 SPOUSE information, and card 3 income information for the HEAD. The statements between *DESCRIPTION and *DATA are substatements of the Data Description.
--FORMAT is the "starting column"/"Fortran format". The BOUND is the code or value of a variable or item. The HEAD-NUMBER appears in columns 1-4 of every card or record; the CARD-NO is in column 5 of every card.
--VALUES may be referenced by their names, e.g. SEX IS MALE. VARIABLES may be referenced by their 12-character name or by number.
--A detailed description of a variable may be given and continued on additional cards if necessary (e.g. AGE).
--MARITAL-STAT indicates whether a SPOUSE card should be present. If a SPOUSE card is present and this VALUE is 2, a validation error will be indicated. WORK-CODE indicates whether a HEAD-CASH card is present. Only one SPOUSE card and one HEAD-CASH card may appear for a HEAD; this is stated in the STRUCTURE-DESCRIPTION.
--If OCCUPATION was not given, a MISSING VALUE of 99 was assigned.
--SPOUSE cards are coded "2" in the record ID code, CARD-NO. This value, 2, is listed on the RECORD-DESCRIPTION statement. The SPOUSE record description is given in abbreviated form: delimiters are optional, since only blanks are required.
--Bounds need not be specified, as the continuous money values of the HEAD-CASH record show.
--The STRUCTURE-DESCRIPTION describes the logical relation among the 3 card or record types. A HEAD card can be followed by any one of the 3 card types; a SPOUSE card can be followed only by a HEAD card; and a HEAD-CASH card can be followed by a HEAD or a SPOUSE card, depending on marital status.
--*DATA: A data card file must be preceded by a *DATA statement. The file name is the same as that given on the *INPUT and *DESCRIPTION statements.
--*END: End of the SIMS input request.

Figure 2. Input statements

Control Statements
*BEGIN
    This statement precedes each SIMS request and identifies the user and job.
*END
    A SIMS request is terminated by *END. More than one request may be submitted; each is identified by a beginning *BEGIN and an ending *END.
*LIST and *NOLIST
    These statements, if embedded in the input request, turn on or turn off a listing of the input request cards.
*REMOVE
    All input requests are catalogued under the RUN-ID of the *BEGIN statement (if present) and are removed, or uncatalogued, with the *REMOVE statement.
*USER
    User Fortran subroutines must be preceded by this statement.
*DATA
    Data on cards submitted as part of the SIMS input request are preceded by this statement.

SIMS Statements
*INPUT or INPUT
    This statement identifies the file that is to be the input to the processing function or functions specified. The major statement *INPUT is global to all processing functions unless a major processing function statement (e.g. *CROSSTABS, *EDIT) is followed by an INPUT statement (no asterisk); in that case the file listed on the INPUT statement is used as input to the function requested.
*SELECT or SELECT; *OMIT or OMIT
    These statements are used to select or omit observations and/or variables for processing. The same global relation as explained for *INPUT and INPUT applies to these statements.
*DESCRIPTION
    This statement and its 12 substatements (not listed here) represent the Data Description Language (DDL) of SIMS.
[statement name illegible]
    This statement and its 4 substatements (not listed here) represent the SIMS variable redefinition capability. The statements are used to recode variables or compute new variables, and to define and assign values and value names to variables.
*STRUCTURE
    This statement is analogous to the STRUCTURE statement of the DDL (see the sample request). It enables the logical structure of the file to be respecified at execution time, thereby increasing retrieval efficiency.
*OUTPUT or OUTPUT
    This statement identifies an output file. The same global relation as explained for *INPUT and INPUT applies to this statement.
*TITLE
    The title specified appears on every page of output.
*EDIT
    This statement and its 2 substatements (not listed here, but appearing in the sample) represent the SIMS file validation and consistency checking capabilities. Edit operations on the specified input file include validation of 1) observation structure, 2) variable formats and 3) variable codes or values, and consistency checking of variables. Accompanied by update transaction cards, this statement provides a means for deleting, adding or correcting observations or variables of the specified input file.
*DUMP
    Records of the specified input file are dumped, or printed in readable form, in a format that depends on the recording mode of the file and on options specified by the user.
[statement name illegible]
    The specified input file is copied in the same format.
[statement name illegible]
    This statement specifies conditional extraction of observations or variables, producing subpopulations of the specified input file.
*SORT
    Version 1 of SIMS assumes serial, or sequential, processing of data. Sorting of a specified input file is requested with the *SORT statement.
*MERGE
    Two files of the same basic structure may be merged. The merge criteria and "hit-miss" option are specified on this statement.
*SAMPLE
    A random sample, or a sample that includes rarely occurring values of variables, is produced from the specified input file. The variables used as the sampling criteria are listed as part of this statement.
*MARGINALS
    One-dimensional, or marginal, frequency distributions of variables are produced from the options and variables listed on this statement.
*CROSSTABS
    N-dimensional tables of frequencies, means, sums, standard deviations, row percentages or column percentages are specified with this statement, as well as associated statistics such as chi-square, variance and standard deviation.
*MOMENTS
    Means, moment matrices, or matrices of sums and sums of cross-products of selected variables are produced from the options and variables listed in this statement.
*CORRELATION
    Correlation matrices of specified variables are produced from the variables listed in this statement.
*TEACH
    In batch mode, a set of SIMS machine-readable documentation is printed according to the problem areas the user has indicated as part of this statement. In interactive, or on-line, mode this statement initiates an interactive teaching function in which the user can ask questions relevant to his problem. The interactive teaching function will not be available in version 1 of SIMS.

A generalized system for implementing applications software systems has been developed for the implementation of SIMS. LENS (Language intErface with Natural Semantics) [4] is a system that writes, or generates, programs from input describing the source language (SIMS statements) and the target language (the generated or precompiled SIMS job stream that is to be executed). LENS is given rules for mapping the source language to the target language. In the case of SIMS, the rules are the complete description of the SIMS command language. For some other general applications program, the rules would be the description of the resulting program's control cards and control card processing. During the mapping process, statements of the source language are checked for syntax and order, and detailed error messages are printed. Statements of the target language are stored on a random access device for later execution. This completes the LENS processing.
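The LENS mapping step can be illustrated with a toy translator. In this Python sketch, each hypothetical rule lists the statements allowed to precede a keyword and a template for the target line it generates; every card is checked for syntax and order, errors are collected as messages, and accepted statements become target lines. The rule table, the templates and the target lines (BEGIN JOB, CALL OPENIN, CALL XTAB) are invented for illustration, only major asterisk statements are handled, and LENS macros and nets are not modeled.

```python
import re

# Hypothetical rules: keyword -> (statements allowed to precede it, target template).
RULES = {
    "*BEGIN": ({None, "*END"}, "BEGIN JOB {args}"),
    "*TITLE": ({"*BEGIN"}, "SET TITLE '{args}'"),
    "*INPUT": ({"*BEGIN", "*TITLE"}, "CALL OPENIN('{args}')"),
    "*EDIT": ({"*INPUT", "*EDIT", "*CROSSTABS"}, "CALL EDIT('{args}')"),
    "*CROSSTABS": ({"*INPUT", "*EDIT", "*CROSSTABS"}, "CALL XTAB('{args}')"),
    "*END": ({"*INPUT", "*EDIT", "*CROSSTABS"}, "END JOB"),
}

# A major statement: an asterisk keyword, optional blanks/commas, then parameters.
STATEMENT = re.compile(r"^(\*[A-Z]+)[\s,]*(.*)$")


def translate(cards):
    """Map source statements to target lines, checking syntax and order."""
    target, errors, previous = [], [], None
    for number, card in enumerate(cards, start=1):
        match = STATEMENT.match(card.strip())
        if match is None:
            errors.append(f"card {number}: syntax error: {card!r}")
            continue
        keyword, args = match.groups()
        if keyword not in RULES:
            errors.append(f"card {number}: unknown statement {keyword}")
            continue
        allowed, template = RULES[keyword]
        if previous not in allowed:
            errors.append(f"card {number}: {keyword} may not follow {previous}")
            continue
        target.append(template.format(args=args))
        previous = keyword
    return target, errors


job, errors = translate([
    "*BEGIN USER-SMITH, ACCOUNT-2908",
    "*INPUT 1971-SURVEY",
    "*CROSSTABS CELLS ARE FREQUENCIES",
    "*END",
])
print(job, errors)
```

Translating *BEGIN, *INPUT, *CROSSTABS and *END in that order yields four target lines and no errors, while a request whose first card is *INPUT is rejected by the order check. Keeping the rules in a table rather than in code is what allows the language to be altered by changing the rule input alone, the property the text attributes to LENS.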
In summary, the SIMS system is composed of generalized relocatable routines, such as an EDIT print routine and a cross-tabulation subroutine, and of LENS macros and nets, which describe the source and target languages (the SIMS statements and the generated job stream, respectively). Each user has access to the entire system and as such can create his own data base of files, file descriptions and library of SIMS requests unique to his application. If he so chooses, he may produce his own version of the SIMS request language and associated generated output; this can be done by altering the LENS input. The modular construction of LENS and the other SIMS routines, plus the paging capability of LENS and the host operating system, makes it possible for many users to run SIMS simultaneously. Finally, SIMS provides both novice and experienced computer users with a tool for processing simple, complex, large or small jobs in a manner familiar to them.
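As an illustration of what the data description drives, the Python sketch below decodes fixed columns from card images using the legible HEAD-record layout of the sample request in Figure 1 (HEAD-NUMBER in columns 1-4, CARD-NO in column 5, SEX in column 6, AGE in columns 7-8), checks the SEX code against its bounds, and checks card order against the successors allowed by the STRUCTURE-DESCRIPTION. It is a hypothetical sketch, not the SIMS EDIT routine: the IF conditions on WORK-CODE and MARITAL-STAT are ignored, and only the card-order and SEX checks are implemented.

```python
# Legible fields of the HEAD record in Figure 1: name -> (starting column, width).
HEAD_FIELDS = {"HEAD-NUMBER": (1, 4), "CARD-NO": (5, 1), "SEX": (6, 1), "AGE": (7, 2)}
SEX_BOUNDS = {"1": "MALE", "2": "FEMALE"}
CARD_TYPES = {"1": "HEAD", "2": "SPOUSE", "3": "HEAD-CASH"}
# Successors allowed by the STRUCTURE-DESCRIPTION, ignoring its IF conditions.
FOLLOWS = {
    "HEAD": {"HEAD", "SPOUSE", "HEAD-CASH"},
    "SPOUSE": {"HEAD"},
    "HEAD-CASH": {"HEAD", "SPOUSE"},
}


def decode(card, fields):
    """Cut fixed columns out of a card image (columns are 1-based)."""
    return {name: card[start - 1:start - 1 + width] for name, (start, width) in fields.items()}


def validate(cards):
    """Return messages for bad card codes, bad card order and out-of-bounds SEX codes."""
    errors, previous = [], None
    for n, card in enumerate(cards, start=1):
        kind = CARD_TYPES.get(card[4:5])          # CARD-NO is column 5
        if kind is None:
            errors.append(f"card {n}: invalid CARD-NO {card[4:5]!r}")
            continue
        if previous is None and kind != "HEAD":
            errors.append(f"card {n}: file must start with a HEAD card")
        elif previous is not None and kind not in FOLLOWS[previous]:
            errors.append(f"card {n}: {kind} may not follow {previous}")
        if kind == "HEAD":
            sex = decode(card, HEAD_FIELDS)["SEX"]
            if sex not in SEX_BOUNDS:
                errors.append(f"card {n}: SEX code {sex!r} out of bounds")
        previous = kind
    return errors


print(validate(["00011134", "00012231", "00021340"]))
# prints ["card 3: SEX code '3' out of bounds"]
```

The card images here are made up: the first is a HEAD card for household 0001, the second its SPOUSE card, and the third a HEAD card carrying an invalid SEX code.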
SEOSYS: A GENERALIZED SYSTEM FOR EXTRACTION FROM AND ANALYSIS OF THE 1966-1967 SURVEY OF ECONOMIC OPPORTUNITY DATA FILES
1. Logical Structure of the 1966-1967 SEO Data Files
The 1966 and 1967 "Surveys of Economic Opportunity" were conducted by the Bureau of the Census at the request of the Office of Economic Opportunity in order to augment the information regularly collected in the Current Population Surveys (CPS) for February and March of each year. In addition to a number of items common to both surveys (such as age, family status, work experience and income), the SEO also provides information on other characteristics such as housing, marital history, training, and assets and liabilities. The main purpose in collecting this information was to provide a base for micro-analytic research exploring the causes and correlates of poverty. The files have been specially designed, edited and documented to this end.
The 1966 SEO sample consisted of about 30,000 households and was made up of two parts: (1) a national sample (about 18,000) drawn in the same way as the Current Population Survey sample, and (2) a supplementary sample (about 12,000) in areas with a large concentration of nonwhites. The sample was designed in this way to improve estimates of the characteristics of the poor, in particular the nonwhite poor.
The 1967 SEO sample consisted of reinterviews of the same addresses included in the 1966 SEO. Most of the questions asked in 1966 were asked again in 1967, making some measures of change possible for persons interviewed in both years.
interview unit, and "adult" information for some of these people. It may be useful to think of the information for each SEO household or address as organized within a 4-level structure, with the segments of information for the household connected by a simple "tree structure" as illustrated opposite. Each of the four levels implicit in the structure contains one of the four segment types within the file.
2. Physical Characteristics of the 1966-1967 SEO Data Files
Although the tree structure is conceptually useful for describing the organization of the file, the file itself is organized sequentially on magnetic tape. Segments for each household appear on the tape file in "left list" order, i.e., the sequence in which they occur when the tree structure is traversed from left to right along its branches. For the example household, the segments would appear on magnetic tape in the order given in the Segment/Level/Content table below.
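The "left list" order is what is now usually called a preorder traversal: a segment is written, then each of its subtrees from left to right. A minimal Python sketch, assuming a simple in-memory tree (the Segment class and left_list function are illustrative, not part of SEOSYS), reproduces the tape order of the example household:

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    kind: str                      # HHLD, INTV, PERS or ADLT
    label: str
    children: list = field(default_factory=list)


def left_list(segment):
    """Yield segments in left-list (preorder) order: a node, then its subtrees left to right."""
    yield segment
    for child in segment.children:
        yield from left_list(child)


# The example household: two interview units, five persons, three adults.
household = Segment("HHLD", "Household", [
    Segment("INTV", "Interview Unit No. 1", [
        Segment("PERS", "Person 1", [Segment("ADLT", "Adult 1")]),
        Segment("PERS", "Person 2", [Segment("ADLT", "Adult 2")]),
        Segment("PERS", "Person 3"),
    ]),
    Segment("INTV", "Interview Unit No. 2", [
        Segment("PERS", "Person 1", [Segment("ADLT", "Adult 1")]),
        Segment("PERS", "Person 2"),
    ]),
])

print([s.kind for s in left_list(household)])
# prints ['HHLD', 'INTV', 'PERS', 'ADLT', 'PERS', 'ADLT', 'PERS', 'INTV', 'PERS', 'ADLT', 'PERS']
```

Because every segment is written before its descendants, a sequential reader always meets a segment's ancestors first, which is what lets a single pass over the tape reconstruct each household.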
Segments describing a given household are contiguous on the file. Noninterview households are represented by a household segment only.
Input to SEOSYS must be either the 1966 or the 1967 SEO file as produced by the Data and Computation Center. These versions of the SEO files contain fixed-length physical records, or blocks, with a record being 9 numeric BCD (Binary Coded Decimal) characters. Blocks contain 30 records each, and the records of a household may continue over more than one block.
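Because the records of a household may continue over more than one block, a reader has to deblock first and regroup by household second. The Python sketch below shows that two-step idea under stated assumptions: the record length is a parameter (the sketch does not interpret the BCD coding), and the test for the start of a household is supplied by the caller, since the SEO segment codes are not given here. Both function names are hypothetical.

```python
def deblock(blocks, record_length, records_per_block=30):
    """Split fixed-length blocks into fixed-length records (30 records per block by default)."""
    block_length = record_length * records_per_block
    for block in blocks:
        if len(block) != block_length:
            raise ValueError(f"expected a {block_length}-character block, got {len(block)}")
        for start in range(0, block_length, record_length):
            yield block[start:start + record_length]


def group_households(records, starts_household):
    """Regroup a record stream into households, regardless of block boundaries."""
    household = []
    for record in records:
        if starts_household(record) and household:
            yield household
            household = []
        household.append(record)
    if household:
        yield household
```

For instance, with three-character records, two records per block and a hypothetical leading "H" marking household segments, the blocks "H01P01" and "P02H02" regroup into the households ["H01", "P01", "P02"] and ["H02"], the first of which spans both blocks.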
3. Data Retrieval
SEOSYS provides an efficient means for retrieval, extraction and analysis of information from the SEO data files. Most standard statistical programs or systems cannot directly process files with complex structures such as SEO. They usually require the data to have a "rectangular" or matrix structure, in which the columns of the matrix are the variables and the rows the observations. Most often an observation is synonymous with a tape record or card. SEOSYS bridges this gap by retrieving information from the hierarchical tree structure (as illustrated in the sample household) and creating a rectangular file for analysis. This reformatting, or structure change, may be combined with analysis, or may be done separately by producing an extract or work file containing a subpopulation of SEO to be analyzed at a later date.

Sample household: segment order on tape

Segment   Level   Content
HHLD      1       Household data
INTV      2       Interview Unit No. 1 data
PERS      3       Person 1 data
ADLT      4       Adult 1 data
PERS      3       Person 2 data
ADLT      4       Adult 2 data
PERS      3       Person 3 data
INTV      2       Interview Unit No. 2 data
PERS      3       Person 1 data
ADLT      4       Adult 1 data
PERS      3       Person 2 data
Pertinent physical characteristics of the SEO tapes are provided to SEOSYS via an abbreviated machine-readable version of the SEO codebook. Using this information and "knowing" the possible tree structures of households in the file, SEOSYS can retrieve attributes from any of the four levels: household, interview unit, person or adult. The user specifies the level at which his analysis will be done. SEOSYS then searches a "household tree," "remembering" the level on which the analysis is based, and retrieves the attributes or variables to be selected from any level. A fixed-length observation vector containing these data items from any level is then created, one observation per unit at the level of analysis.
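The retrieval just described, saving higher-level attributes and looking ahead into lower levels to build one fixed-length vector per unit of analysis, can be sketched in Python as follows. The input is assumed to be already decoded into (segment type, attributes) pairs in left-list order, and the attribute names in the example (SIZE, RACE, AGE, INCOME) are hypothetical stand-ins for SEO codebook items; this illustrates the idea rather than reproducing the SEOSYS algorithm.

```python
LEVELS = {"HHLD": 1, "INTV": 2, "PERS": 3, "ADLT": 4}


def observations(segments, analysis_kind, wanted):
    """Flatten (kind, attrs) segments in left-list order into fixed-length vectors.

    analysis_kind is the segment type that defines one observation; wanted is a
    list of (kind, attribute) pairs giving the layout of every vector.
    """
    unit = LEVELS[analysis_kind]
    path, current = {}, None
    for kind, attrs in segments:
        level = LEVELS[kind]
        if level <= unit and current is not None:
            yield _vector(current, wanted)       # the previous unit is complete
            current = None
        if level < unit:
            # "Save" higher-level attributes, forgetting anything at or below this level.
            path = {k: v for k, v in path.items() if LEVELS[k] < level}
            path[kind] = attrs
        elif level == unit:
            current = dict(path)
            current[kind] = attrs
        elif current is not None:
            current[kind] = attrs                # "look ahead" into lower levels
    if current is not None:
        yield _vector(current, wanted)


def _vector(parts, wanted):
    # Missing segments or attributes become None so every vector has the same length.
    return tuple(parts.get(kind, {}).get(name) for kind, name in wanted)


stream = [("HHLD", {"SIZE": 4}), ("INTV", {"RACE": "B"}), ("PERS", {"AGE": 40}),
          ("ADLT", {"INCOME": 2500}), ("PERS", {"AGE": 12}),
          ("HHLD", {"SIZE": 1}), ("INTV", {"RACE": "W"}), ("PERS", {"AGE": 70}),
          ("ADLT", {"INCOME": 5000})]
layout = [("HHLD", "SIZE"), ("INTV", "RACE"), ("PERS", "AGE"), ("ADLT", "INCOME")]
for row in observations(stream, "PERS", layout):
    print(row)
```

The second person in the first household has no ADLT segment, so that vector carries None in the INCOME position; every row still has the same four columns, which is what makes the result "rectangular".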
Consider a study of all persons in the survey who are black, have incomes less than $3,000 and live in multi-family dwellings. The unit of analysis, or level of analysis, in this case is the person. Therefore an observation possibly containing information from all levels (e.g., HOUSE SIZE from the HHLD, RACE from the INTV, AGE from the PERS and INCOME from the ADLT) would be created for every person who satisfies the selection criteria. As it traverses a household, SEOSYS "saves" attributes or variables from higher levels (e.g., HHLD and INTV) if need be and "looks ahead" for data from lower levels (e.g., ADLT). During this retrieval process, searching is terminated immediately if the data interrogated do not satisfy the selection criteria, thereby minimizing retrieval time. For the example request mentioned above, if the household being queried consisted of only one family, the attribute #FAM (number of families) of the household record or segment being equal to 1 would indicate to SEOSYS that persons in this household should not be included in the sample. Any further checking of race, income, etc. would be omitted, and SEOSYS would then search for the beginning of the next household.
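The early termination in this example can be sketched by screening each household from the top down. In the Python sketch below, screens are hypothetical predicates keyed by segment type (for the example: more than one family in HHLD, RACE in INTV, INCOME in ADLT), and a household that fails the HHLD screen costs one segment examination and is skipped entirely. This illustrates the idea, not the SEOSYS retrieval algorithm; for brevity it still reads every segment of an interview unit that fails its screen.

```python
def select_persons(households, screens):
    """Return persons passing every screen, plus a count of segments examined.

    households: lists of (kind, attrs) in left-list order, each starting with HHLD.
    screens: kind -> predicate on that segment's attrs; kinds without a screen pass.
    """
    def passes(kind, attrs):
        return screens.get(kind, lambda a: True)(attrs)

    selected, examined = [], 0
    for household in households:
        examined += 1                                  # the HHLD segment
        if not passes("HHLD", household[0][1]):
            continue                                   # skip to the next household
        unit_ok, person, adult = False, None, None
        for kind, attrs in household[1:] + [("END", {})]:
            if kind in ("INTV", "PERS", "END") and person is not None:
                # The pending person is complete: apply the adult-level screen.
                if "ADLT" not in screens or (adult is not None and passes("ADLT", adult)):
                    selected.append(person)
                person, adult = None, None
            if kind == "END":
                break
            examined += 1
            if kind == "INTV":
                unit_ok = passes("INTV", attrs)
            elif kind == "PERS" and unit_ok and passes("PERS", attrs):
                person = attrs
            elif kind == "ADLT" and person is not None:
                adult = attrs
    return selected, examined


screens = {
    "HHLD": lambda a: a["FAM"] > 1,          # multi-family dwelling
    "INTV": lambda a: a["RACE"] == "BLACK",
    "ADLT": lambda a: a["INCOME"] < 3000,
}
```

Run on one single-family household and one multi-family household with two interview units, only the person meeting all three criteria is selected, and the single-family household contributes just one examined segment.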
Most analyses performed on survey-type data files require some transformation of the data in the master file, creation of new variables, or conditional extraction or selection of a sample population. The SEO files are no exception. Because of the extensive amount of information for a household and the complex structure of the files, users of the SEO data will almost always require some form of data transformation to create a subpopulation for analysis. SEOSYS allows a user complete interaction with the system through user-supplied Fortran subroutines. Such routines facilitate transgeneration of variables at all levels and selection of observations. A user may also perform his own analysis in these routines.
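SEOSYS calls user-supplied Fortran subroutines for transgeneration and selection. The Python sketch below mimics that contract under an assumed convention that is not taken from SEOSYS: the routine receives one observation, may add variables to it, and returns None to drop it. The function names and the THRESHOLD and GAP variables are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Optional

Observation = dict
# A user routine receives one observation and returns it (possibly with new
# variables added) or None to drop it from the run.
UserRoutine = Callable[[Observation], Optional[Observation]]


def apply_user_routine(observations: Iterable[Observation],
                       routine: UserRoutine) -> Iterator[Observation]:
    for obs in observations:
        result = routine(dict(obs))        # the routine works on its own copy
        if result is not None:
            yield result


def below_threshold(obs: Observation) -> Optional[Observation]:
    # Hypothetical routine: derive a new variable, then select on it.
    obs["GAP"] = max(0, obs["THRESHOLD"] - obs["INCOME"])
    return obs if obs["GAP"] > 0 else None
```

Applied to observations with INCOME 2000 and 5000 against a THRESHOLD of 3000, only the first survives, now carrying GAP = 1000. Passing a copy keeps the master data unchanged, so several routines can be tried against the same extract.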
SEOSYS has been developed specifically to provide researchers with a user-oriented system for accessing, extracting and analyzing data of the Surveys of Economic Opportunity. Since SEOSYS has been custom designed for these data files, its retrieval algorithm provides efficient access to the data while giving users a general system for processing the data. The general features of SEOSYS allow almost any request to be handled with minimal computer time and little or no programming time.
4. Documentation Available
The following documents are available free through the University of Wisconsin Data and Computation Center:
--1966 Survey of Economic Opportunity Codebook
--1967 Survey of Economic Opportunity Codebook
--1966 and 1967 Survey of Economic Opportunity Sample Design and Weighting
--The Comparison of Selected Economic and Demographic Characteristics from the 1966 and 1967 Surveys of Economic Opportunity and the Comparable Current Population Surveys
--1966 Survey of Economic Opportunity Unweighted Counts (Including weighted estimates of Income, Asset and Liability items)
--1967 Survey of Economic Opportunity Unweighted Counts (Including weighted estimates of Income, Asset and Liability items)
--1966 and 1967 Survey of Economic Opportunity Sample Variance Estimates
--1966 and 1967 Survey of Economic Opportunity Cross-Year Tabulations
--SEO Data Files: Fixed Length Format
--SEOSYS: A Generalized System for Extraction from and Analysis of the 1966-1967 Survey of Economic Opportunity Data Files, Users Manual
The documents listed above, and others, have been compiled by F. JoAn Olson into the Survey of Economic Opportunity Bibliography. The bibliography is in machine-readable form and is printed by the computer via the indexing system UWIS, developed by the Madison Academic Computing Center at the University of Wisconsin.
The list of documents is indexed by author, and documents with more than one author appear once for each author. The entries of the bibliography have been assigned to one of the following categories:
(1) User Guide
(2) Thesis (B.A.)
(3) Thesis (M.A.)
(4) Thesis (PhD)
(5) Forthcoming
(6) Working Paper
(7) Published
(8) Conference
(9) Other
The category name appears on the listing. The bibliography has also been indexed by key title words.
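The two indexes described above, one listing per author and one per key title word, can be sketched as inverted indexes. In this Python sketch the entry layout and the stop-word list are assumptions made for illustration; the original bibliography was produced with the UWIS indexing system, whose interface is not described here.

```python
from collections import defaultdict

# Hypothetical stop-word list: words too common to serve as key title words.
STOP_WORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the"}


def build_indexes(entries):
    """Index titles by author (once per author) and by key title words."""
    by_author, by_word = defaultdict(list), defaultdict(list)
    for entry in entries:
        for author in entry["authors"]:
            by_author[author].append(entry["title"])   # multi-author entries repeat
        words = {w.strip(".,:;()").lower() for w in entry["title"].split()}
        for word in sorted(words - STOP_WORDS - {""}):
            by_word[word].append(entry["title"])
    return dict(by_author), dict(by_word)
```

An entry with two authors therefore appears under each of them, matching the listing convention described in the text, while words such as "of" never become index terms.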
ACKNOWLEDGMENTS
The SIMS system has been funded entirely by the National Science Foundation, grant GS-1992, and has been under the faculty direction of Professor Dennis Aigner, with Max Ellis directing the system design and implementation. Significant contributions to the development of the system have been made by the following senior staff members of DACC: William Katke, Kenneth Nelson, James Olson and Shou-chuan Yang. These persons, with Max Ellis, have designed the system, its user interface and programs.

The development of SEOSYS was funded by the Office of Economic Opportunity and the Institute for Research on Poverty. Programming of the system was done by Kenneth Nelson, Linda Werner and Luise Cunliffe. Nancy Williamson and Ronald Sepanik contributed significantly to the design of the system and assisted the programmers in testing SEOSYS. The portion of this paper pertaining to the Survey of Economic Opportunity includes contributions from Ronald Sepanik and David Richardson. The description of the SEO data files has been reproduced in part from The 1966 and 1967 Survey of Economic Opportunity Files with Related Software, Brookings Computer Center Memo #, June 30, 1969, by George Sadowsky and Marjorie Reed.
REFERENCES
[1] Ellis, Max E. Fortran Coding Standards. Data and Computation Center Technical Paper, University of Wisconsin, Madison, Wisconsin, December 1970.
[2] Ellis, Max E. and K. H. Nelson. "A Data Description Language for Large-Sized Data Files." Presented at the ACM SICFIDET Workshop on Data Description and Access. Reprinted as Data and Computation Center Paper TP-11, University of Wisconsin, Madison, Wisconsin, 1971.
[3] Control Data Corporation. 3400/3600/3800 Computer Systems Fortran Reference Manual. Publication No. 60132900 A, 1965.
[4] Katke, William. LENS Reference Manual (Preliminary). Data and Computation Center Working Paper, University of Wisconsin, Madison, Wisconsin, August 1971.
[5] UNIVAC. Fundamentals of FORTRAN. UP-7536, October 14, 1965.