Alternative Allocation Design for the Occupational Employment Statistics (OES) Survey
Ernest Lawley, Bureau of Labor Statistics
Marie C. Stetser, Bureau of Labor Statistics
Dr. Eduardas Valaitis, American University
OEUS ANNUAL MEETING 2007
Washington, DC
Alternative Allocation Design for the Occupational Employment Statistics (OES) Survey
• Occupational Employment Statistics (OES) Survey
• Frame Development
• Frame Stratification
• Sample Requirements
• Prior Allocation Design
• Current Allocation Design
• Calculating Sh (standard error)
• Reliability
OES Survey
• Partnership with 50 States + DC, Guam, Puerto Rico, US Virgin Islands
• Measures occupational employment and wages within 300+ industry groups*
– Approximately 800 detailed occupations (SOC)
– Broken down by MSA; aggregated Statewide and Nationwide
*using 4-digit and 5-digit NAICS codes
Frame Development
• Quarterly Census of Employment and Wages (QCEW)
– Collects non-railroad data for all business establishments for 50 States + DC, PR, USVI
– Data include pertinent information for each establishment, such as Trade Name, Legal Name, Address information, and Monthly Employment for the past 12 months
– Data compiled into the Bureau’s Longitudinal Database (LDB)
• Railroad Frame File
– Collected by the Bureau’s Office of Safety and Health (OSH)
• Guam Frame File
– Collected by one of the BLS Regional Offices
All three elements combined: OES Frame ≈ 6.7 million business establishments
Frame Stratification
• Frame initially stratified geographically
– Approximately 600 geographic areas
  • Approximately 400 State/Metropolitan Statistical Areas (MSAs)
  • Approximately 200 Non-MSA Areas (“rural”)
• Frame further stratified by detailed industry (NAICS 4-digit, selected NAICS 5-digit)
– Approximately 350 industries
– Industry is related to occupation
• Approximately 170,000 total non-empty strata
– Each business establishment in the nation fits into exactly one of these defined strata
– Each non-empty stratum contains from one to hundreds of business establishments
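The two-level stratification (geographic area, then industry) can be sketched as a simple grouping. This is only an illustration: the establishment IDs, area labels, and the record layout are hypothetical, and the NAICS codes are used purely as example keys.

```python
from collections import defaultdict

# Hypothetical frame records: (establishment id, geographic area, NAICS code).
frame = [
    ("E1", "State 1 / MSA X", "7225"),
    ("E2", "State 1 / MSA X", "7225"),
    ("E3", "State 1 / MSA Y", "4241"),
    ("E4", "State 2 / MSA Z", "7225"),
]


def stratify(records):
    """Group establishments into strata keyed by (area, industry)."""
    strata = defaultdict(list)
    for est_id, area, industry in records:
        strata[(area, industry)].append(est_id)
    return dict(strata)


strata = stratify(frame)
for key, members in sorted(strata.items()):
    print(key, members)
```

Because each record carries exactly one area and one industry, every establishment lands in exactly one stratum, and only non-empty strata appear in the result.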
Frame Stratification
State 1
    MSA X
        Industry 1
        Industry 2
    MSA Y
        Industry 1
        Industry 2
State 2
    MSA X
        Industry 1
        Industry 2
    MSA Z
        Industry 1
        Industry 2
Sample Requirements
• Sample allocated by stratum
• Sample Allocation ≈ 1.2 million establishments
• Individual State Sample Sizes (Σ ≈ 1.2 million)
– Confidential value for each State
– Based on State employment population
– Last modified in 1996
Example (hypothetical; exact values are confidential):
State         State Sample Size
California    120,000
Texas         100,000
New York      100,000
Florida        85,000
And so forth… Σ ≈ 1.2 million
Prior Allocation Design
“Proportional-to-Employment”
• Maximum Employment
– Maximum monthly employment value in LDB for each establishment
STEPS:
1. Sum max employment values across stratum, Nh
2. Sum max employment values across state, ΣNh
3. Look up Individual State Sample Size, n
4. Calculate stratum allocation: nh = n∙(Nh/ΣNh)
5. Repeat calculation for all strata, approx. 170,000 times
Note: n may require iterative reduction to meet the minimum sample allocation requirements for each cell.
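The steps above can be sketched in a few lines. The stratum labels, employment totals, and state sample size below are hypothetical, and the sketch leaves out rounding and the minimum-allocation adjustment mentioned in the note.

```python
def proportional_allocation(n, stratum_employment):
    """Allocate a state sample of size n in proportion to stratum employment N_h."""
    total = sum(stratum_employment.values())  # step 2: sum of N_h across the state
    return {h: n * N_h / total for h, N_h in stratum_employment.items()}  # step 4


# Hypothetical max-employment totals N_h for three strata in one state (step 1).
N = {
    "MSA X / Industry 1": 50_000,
    "MSA X / Industry 2": 30_000,
    "MSA Y / Industry 1": 20_000,
}
alloc = proportional_allocation(1_000, N)  # step 3: n = 1,000 is made up
for stratum, n_h in alloc.items():
    print(stratum, n_h)
```

With these numbers the three strata receive 50%, 30%, and 20% of the state sample, mirroring their employment shares.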
Prior Allocation Design
• Advantages
– Simple
– Strata with larger populations are allocated more sample
• Is this necessarily an advantage?
Prior Allocation Design
“A sample should allocate most heavily to those strata where the least amount of certainty exists.”
Causes for uncertainty (less reliability) within a sampled stratum:
• Undersampling a large population
• Undersampling where there is large variability in occupations
Prior Allocation Design
• Disadvantage
– Estimates in smaller strata that have large occupational variability may not be reliable because those strata are allocated a smaller sample
Prior Allocation Design
EXAMPLE
Accommodations/Food Services Industry
• 90% of all employees work in 88 occupations
• 12.8 million workers in this industry
Wholesale Trade Industry
• 90% of all employees work in 175 occupations
• 6.1 million workers in this industry
Which of these cells should be allocated more sample?
Using “Proportional Allocation”:
Accom/Food Services       Wholesale Trade
120,000 establishments    72,000 establishments
Current Allocation Design
Neyman Allocation
$$n_h = n \cdot \frac{N_h S_h}{\sum_{h=1}^{H} N_h S_h}$$
• n = Individual State “fixed” sample size
• Nh = sum of stratum frame employees
• Sh represents an occupational variability measure within a stratum
• Occupations for each stratum (or cell) obtained from recent estimates file; weighted data
• Denominator summed over all strata, by state
Current Allocation Design
Neyman Allocation:
$$n_h = n \cdot \frac{N_h S_h}{\sum_{h=1}^{H} N_h S_h}$$
Proportional Allocation:
$$n_h = n \cdot \frac{N_h}{\sum_{h=1}^{H} N_h}$$
Sh is the “Occupational Variability” measure; notice that it is the only “adjustment” to the Proportional Allocation formula.
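A minimal sketch of the two formulas side by side, returning to the earlier two-industry question. The employment figures are the national industry totals quoted in the example; treating them as two strata of one state, the Sh values, and the sample size n = 1,000 are all assumptions made purely for illustration.

```python
def neyman_allocation(n, N, S):
    """Return n_h = n * N_h * S_h / sum(N_h * S_h) for every stratum h."""
    products = {h: N[h] * S[h] for h in N}
    total = sum(products.values())
    return {h: n * p / total for h, p in products.items()}


# Employment totals from the example slide, used here as stand-in strata.
N = {"Accom/Food Services": 12_800_000, "Wholesale Trade": 6_100_000}
# Hypothetical variability measures: Wholesale Trade assumed more diverse.
S = {"Accom/Food Services": 0.05, "Wholesale Trade": 0.20}

neyman = neyman_allocation(1_000, N, S)
# Setting every S_h to the same value reduces the formula to proportional allocation.
proportional = neyman_allocation(1_000, N, {h: 1.0 for h in N})

print("Neyman:      ", neyman)
print("Proportional:", proportional)
```

Under these assumed Sh values the Neyman allocation shifts sample toward Wholesale Trade even though it has fewer workers, which is the behavior the design is after.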
Calculating Sh
1. Calculate a “coefficient of variation” for each occupation within an industry.
2. Determine 90th-percentile of occupations within each industry.
3. Sh (for each industry) is calculated by obtaining the weighted mean of CVs for the 90th-percentile of occupations within each industry.
Calculating Sh
Step 1: Calculating a “coefficient of variation” for each occupation within stratum
– Using most recent weighted estimates file:
• Count # of employees in each occupation for each business establishment (call this yi)
• Count # of employees total for each business establishment (call this xi)
• Sample weight, wi, represents the number of business establishments that each establishment on the estimates file (i) represents
• Create a “weighted ratio” Rw = Σ(wi∙yi)/Σ(wi∙xi), summed over a defined cell
– Note: This is the ratio of occupational employment to overall employment; it will always be ≤ 1.
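The weighted ratio can be sketched as follows. The function name and the (w, y, x) tuple layout are illustrative choices, and the two records are made-up numbers rather than survey data.

```python
def weighted_ratio(records):
    """Compute R_w = sum(w_i * y_i) / sum(w_i * x_i) over one cell.

    records: iterable of (w_i, y_i, x_i) tuples, one per sampled establishment.
    """
    records = list(records)
    numerator = sum(w * y for w, y, _ in records)
    denominator = sum(w * x for w, _, x in records)
    return numerator / denominator


# Hypothetical cell: (weight, employees in the occupation, total employees).
cell = [(3, 4, 10), (1, 10, 20)]
r_w = weighted_ratio(cell)
print(r_w)
```

Since each yi counts a subset of the xi employees, the numerator can never exceed the denominator, which is why the ratio stays at or below 1.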
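Calculating Sh
• CV formula (unweighted)
– Derived from variance formula
– Relative variance (CV²) for an original variate Yi:
$$CV_Y^2 = \frac{S_Y^2}{\bar{Y}^2} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{(N-1)\,\bar{Y}^2}$$
– Using a little algebra (remember R = y/x):
$$CV_R = \frac{1}{\bar{x}} \sqrt{\frac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N-1}} \cdot \frac{1}{R}$$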
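Calculating Sh
• CV formula (for each defined “Sh cell”), summed by cell (including weights):
$$CV_R = \frac{1}{\bar{x}_w} \sqrt{\frac{\sum_{i=1}^{n} \left[ w_i (y_i - R_w x_i) \right]^2}{\sum_{i=1}^{n} w_i - 1}} \cdot \frac{1}{R_w}$$
• Note: x-bar is a weighted average:
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$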
Calculating Sh
EXAMPLE (hypothetical cell with 2 sampled business establishments)
• Restaurant ABC; represents 5 businesses
• What is ABC’s weight?
• Restaurant XYZ; represents itself (1 business)
• What is XYZ’s weight?
ABC’s Staffing Pattern
Occupation         # employed
Waitress/Waiter     8
Cook                4
Dishwasher          2
Janitor             1
Manager             1
TOTAL              16
XYZ’s Staffing Pattern
Occupation         # employed
Waitress/Waiter    32
Cook               15
Dishwasher         10
Manager             3
TOTAL              60
Calculations for ABC (wi = 5)
Occupation         yi    wi∙yi         xi    wi∙xi
Waitress/Waiter     8    5∙8 = 40      16    5∙16 = 80
Cook                4    5∙4 = 20      16    5∙16 = 80
Dishwasher          2    5∙2 = 10      16    5∙16 = 80
Janitor             1    5∙1 = 5       16    5∙16 = 80
Manager             1    5∙1 = 5       16    5∙16 = 80
Calculations for XYZ (wi = 1)
Occupation         yi    wi∙yi         xi    wi∙xi
Waitress/Waiter    32    1∙32 = 32     60    1∙60 = 60
Cook               15    1∙15 = 15     60    1∙60 = 60
Dishwasher         10    1∙10 = 10     60    1∙60 = 60
Manager             3    1∙3 = 3       60    1∙60 = 60
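Calculating Sh
$$CV_R = \frac{1}{\bar{x}_w} \sqrt{\frac{\sum_{i=1}^{n} \left[ w_i (y_i - R_w x_i) \right]^2}{\sum_{i=1}^{n} w_i - 1}} \cdot \frac{1}{R_w}$$
Waitress/Waiter
ABC: yi = 8, wi∙yi = 5∙8 = 40, xi = 16, wi∙xi = 5∙16 = 80
XYZ: yi = 32, wi∙yi = 1∙32 = 32, xi = 60, wi∙xi = 1∙60 = 60
$$CV_R = \frac{1}{\frac{80+60}{5+1}} \sqrt{\frac{\left(40 - \frac{40+32}{140} \cdot 80\right)^2 + \left(32 - \frac{40+32}{140} \cdot 60\right)^2}{6-1}} \cdot \frac{140}{40+32} \approx 0.060$$
Example: CVs for Occupations
Occupation         CV
Waitress/Waiter    0.060
Cook               0
Dishwasher         0.271
Janitor            1.626
Manager            0.203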
The smaller the CV value, the less diverse the occupation is within the defined cell.
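The CV table for the ABC/XYZ example can be reproduced with a short script. This is a sketch under one stated assumption: it takes the weighted CV to be the square root of Σ[wi(yi − Rw∙xi)]² / (Σwi − 1), divided by the weighted mean of xi and by Rw, which is the form that matches the worked Waitress/Waiter value. The data structure and function name are illustrative.

```python
from math import sqrt

# Sampled establishments from the example: weight w, total employment x,
# and employment y by occupation (an occupation not listed counts as 0).
establishments = [
    {"w": 5, "x": 16, "y": {"Waitress/Waiter": 8, "Cook": 4, "Dishwasher": 2,
                             "Janitor": 1, "Manager": 1}},  # Restaurant ABC
    {"w": 1, "x": 60, "y": {"Waitress/Waiter": 32, "Cook": 15, "Dishwasher": 10,
                             "Manager": 3}},                # Restaurant XYZ
]


def weighted_cv(occupation, ests):
    """Weighted CV of one occupation's employment ratio within a cell."""
    sum_w = sum(e["w"] for e in ests)
    sum_wx = sum(e["w"] * e["x"] for e in ests)
    sum_wy = sum(e["w"] * e["y"].get(occupation, 0) for e in ests)
    r_w = sum_wy / sum_wx    # weighted ratio R_w
    x_bar = sum_wx / sum_w   # weighted mean establishment size
    squares = sum(
        (e["w"] * (e["y"].get(occupation, 0) - r_w * e["x"])) ** 2 for e in ests
    )
    return sqrt(squares / (sum_w - 1)) / (x_bar * r_w)


occupations = ["Waitress/Waiter", "Cook", "Dishwasher", "Janitor", "Manager"]
cvs = {occ: weighted_cv(occ, establishments) for occ in occupations}
for occ, cv in cvs.items():
    print(f"{occ:<16} {cv:.3f}")
```

Cook comes out at exactly 0 because both restaurants employ cooks at the same 25% share, while Janitor is largest because only one of the two establishments reports the occupation at all.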
Step 2: Avoiding “atypical” occupations within each cell:
• Conservative approach: use the 90th percentile until further research is done
• Exclude the bottom 10th percentile of occupations
Calculating Sh
Step 3: A CV is created for each occupation within a defined cell. How are occupations within a cell “combined” to create one value for the cell?
– Weighted mean of 90th-percentile occupations
• Obtain occupational proportion for each cell
• Obtain Sh by calculating the weighted mean of the top 90th percentile of occupations
– Less prevalent (bottom 10%) occupations are eliminated
– Sh = weighted mean of 90th-percentile CVs within defined cell
Calculating Sh
Example (sorted in “proportional order”)
Weighted Mean of CVs of All Occupations
Occupation         CV       Proportion       Product
Waitress/Waiter    0.060    72/140 ≈ 0.51    0.060*0.51 ≈ 0.03
Cook               0        35/140 = 0.25    0*0.25 = 0
Dishwasher         0.271    20/140 ≈ 0.14    0.271*0.14 ≈ 0.04
Manager            0.203    8/140 ≈ 0.06     0.203*0.06 ≈ 0.01
Janitor            1.626    5/140 ≈ 0.04     1.626*0.04 ≈ 0.07
90th percentile (look at proportions): Waitress/Waiter, Cook, Dishwasher
• 90th-percentile Occupations
– Weighted mean = Sh = Σ “products” ≈ 0.03 + 0 + 0.04 = 0.07
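The Sh value in this example can be sketched as below. The cutoff rule is an assumption on my part: occupations are ranked by employment share and kept until the cumulative share first reaches 90%, which reproduces the three occupations used in the example. As in the example, the products are summed without rescaling the kept proportions.

```python
def occupational_variability(cells, cutoff=0.90):
    """Return (S_h, kept occupations) for one cell.

    cells: {occupation: (cv, weighted employment)}.
    Occupations are taken in descending order of employment share until the
    cumulative share reaches `cutoff`; the rest are dropped as atypical.
    """
    total = sum(emp for _, emp in cells.values())
    ranked = sorted(cells.items(), key=lambda kv: kv[1][1], reverse=True)
    s_h, cumulative, kept = 0.0, 0.0, []
    for occupation, (cv, emp) in ranked:
        if cumulative >= cutoff:
            break
        share = emp / total
        s_h += cv * share        # the "product" column of the table
        cumulative += share
        kept.append(occupation)
    return s_h, kept


# CVs and weighted employment (sum of w_i * y_i) from the ABC/XYZ example.
example = {
    "Waitress/Waiter": (0.060, 72),
    "Cook": (0.0, 35),
    "Dishwasher": (0.271, 20),
    "Manager": (0.203, 8),
    "Janitor": (1.626, 5),
}
s_h, kept = occupational_variability(example)
print(kept, round(s_h, 2))
```

Dropping Janitor matters here: its CV of 1.626 would otherwise nearly double the cell's variability measure on the strength of a single reported employee.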
Calculating Sh
Defining Sh “cell”
– Normality of individual CVs
– Sufficient amount of data to create a reliable estimate of occupational variability (Sh)
Calculating Sh
Aggregation by National Industry (Industry-only)
Concerns:
– Assumption that national aggregates of industry will produce accurate CVs and Sh values
• Aggregation necessary due to lack of data for finely detailed cells
• 88.6% of industry MSA-BOS staffing patterns were similar to corresponding nationally aggregated industry staffing patterns (α=0.10)
Reliability
• Problem of small populations in geographic areas
• Desire to produce similar reliability in large and small areas
– Example: Under the Neyman Allocation method illustrated, Chicago takes up approximately 54% of Illinois’s sample allocation; this may lead to an unreliable sample in non-Chicago areas within Illinois
Reliability
How to “spread out” sample allocation?
Bankier (1988): Power Allocations: Determining Sample Sizes for Subnational Areas
• Adjust exponent for Nh (numerator and denominator) in the Neyman Allocation
• Drops Chicago’s value to approx. 34% of IL’s sample allocation
$$n_h = n \cdot \frac{\sqrt{N_h}\, S_h}{\sum_{h=1}^{H} \sqrt{N_h}\, S_h}$$
• Nh = sum of stratum frame employees
• Sh represents an occupational variability measure within a stratum
• Occupations for each stratum (or cell) obtained from recent estimates file; weighted data
• Denominator summed over all strata, by state
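The effect of lowering the exponent on Nh can be sketched as follows. The exponent parameter q, the area names, and every number here are hypothetical; q = 1 corresponds to the Neyman Allocation and q = 0.5 to a square-root version.

```python
def power_neyman_allocation(n, N, S, q=0.5):
    """Return n_h = n * N_h**q * S_h / sum(N_h**q * S_h) for every stratum h."""
    products = {h: (N[h] ** q) * S[h] for h in N}
    total = sum(products.values())
    return {h: n * p / total for h, p in products.items()}


# Hypothetical state with one dominant metropolitan area.
N = {"Large MSA": 4_000_000, "Small MSA": 250_000, "Non-MSA": 250_000}
S = {"Large MSA": 0.10, "Small MSA": 0.10, "Non-MSA": 0.10}
n = 10_000

neyman = power_neyman_allocation(n, N, S, q=1.0)
square_root = power_neyman_allocation(n, N, S, q=0.5)

print("Large MSA share, q = 1.0:", neyman["Large MSA"] / n)
print("Large MSA share, q = 0.5:", square_root["Large MSA"] / n)
```

In this made-up state the dominant area's share falls from about 89% to about 67% once the square root is applied, the same direction of change reported for Chicago.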
[Chart: Total Allocation for Illinois. Vertical axis: Allocation (0 to 20,000). Series: Neyman 90th; Neyman 90th (SqRoot).]
QUESTIONS?
lawley.ernest@bls.gov