OEUS Lawley


Alternative Allocation Design for the Occupational Employment Statistics (OES) Survey

Ernest Lawley, Bureau of Labor Statistics
Marie C. Stetser, Bureau of Labor Statistics
Dr. Eduardas Valaitis, American University

OEUS ANNUAL MEETING 2007
Washington, DC

Alternative Allocation Design for the Occupational Employment Statistics (OES) Survey
• Occupational Employment Statistics (OES) Survey
• Frame Development
• Frame Stratification
• Sample Requirements
• Prior Allocation Design
• Current Allocation Design
• Calculating Sh (standard error)
• Reliability

OES Survey
• Partnership with 50 States + DC, Guam, Puerto Rico, US Virgin Islands
• Measures occupational employment and wages within 300+ industry groups*
– Approximately 800 detailed occupations (SOC)
– Broken down by MSA; aggregated Statewide and Nationwide
*using 4-digit and 5-digit NAICS codes

Frame Development
• Quarterly Census of Employment and Wages (QCEW)
– Collects non-railroad data for all business establishments for 50 States + DC, PR, USVI
– Data includes pertinent information for each establishment, such as Trade Name, Legal Name, Address information, and Monthly Employment for the past 12 months
– Data compiled into Bureau’s Longitudinal Database (LDB)
• Railroad Frame File
– Collected by Bureau’s Office of Safety and Health (OSH)
• Guam Frame File
– Collected by one of the BLS Regional Offices
All three elements combined: OES Frame ≈ 6.7 million business establishments

Frame Stratification
• Frame initially stratified geographically
– Approximately 600 geographic areas
  • Approximately 400 State/Metropolitan Statistical Areas (MSAs)
  • Approximately 200 Non-MSA Areas (“rural”)
• Frame further stratified by detailed industry (NAICS 4-digit, selected NAICS 5-digit)
– Approximately 350 industries
– Industry is related to occupation
• Approximately 170,000 total non-empty strata
– Each business establishment in the nation fits into exactly one of these defined strata
– Each non-empty stratum contains from one to hundreds of business establishments

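Frame Stratification
[Diagram: each State is divided into MSAs, and each MSA is divided into industries]
State 1: MSA X (Industry 1, Industry 2); MSA Y (Industry 1, Industry 2)
State 2: MSA X (Industry 1, Industry 2); MSA Z (Industry 1, Industry 2)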

Sample Requirements
• Sample allocated by stratum
• Sample Allocation ≈ 1.2 million establishments
• Individual State Sample Sizes (Σ ≈ 1.2 million)
– Confidential value for each State
– Based on State employment population
– Last modified in 1996
Example (hypothetical; exact values are confidential):
State         State Sample Size
California    120,000
Texas         100,000
New York      100,000
Florida       85,000
And so forth…
Σ ≈ 1.2 million

Prior Allocation Design: “Proportional-to-Employment”
• Maximum Employment
– Maximum monthly employment value in LDB for each establishment
STEPS:
1. Sum max employment values across stratum, Nh
2. Sum max employment values across state, ΣNh
3. Look up Individual State Sample Size, n
4. Calculate stratum allocation: nh = n∙(Nh/ΣNh)
5. Repeat calculation for all strata, approx. 170,000 times
Note: n may require iterative reduction to accommodate the minimum sample allocation requirements for each cell.
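The steps above can be sketched in a few lines of Python. This is only an illustration of the formula nh = n∙(Nh/ΣNh): the stratum names, employment totals, and State sample size are made-up values, and the iterative reduction mentioned in the note is not modeled.

```python
def proportional_allocation(state_sample_size, stratum_employment):
    """Split a State's sample across strata in proportion to N_h.

    stratum_employment maps a stratum label to N_h, the sum of maximum
    employment values in that stratum.
    """
    total_employment = sum(stratum_employment.values())  # step 2: sum of N_h over the State
    return {
        stratum: state_sample_size * n_h / total_employment  # step 4
        for stratum, n_h in stratum_employment.items()
    }


# Hypothetical State with three strata (values are not from the survey).
strata = {
    "MSA X / Industry 1": 50_000,
    "MSA X / Industry 2": 30_000,
    "MSA Y / Industry 1": 20_000,
}
allocation = proportional_allocation(1_000, strata)
for stratum, n_h in allocation.items():
    print(f"{stratum}: {n_h:.0f}")
```

Each stratum's share of the sample equals its share of State employment, which is exactly the property the following slides question.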

Prior Allocation Design
• Advantages
– Simple
– Strata with larger populations are allocated more sample
  • Is this necessarily an advantage?
“A sample should allocate most heavily to those strata where the least amount of certainty exists.”
Causes of uncertainty (less reliability) within a sampled stratum:
• Undersampling a large population
• Undersampling where there is large variability in occupations
• Disadvantage
– Estimates in smaller strata that have large occupational variability may not be reliable because a smaller sample is allocated to them

EXAMPLE
Which of these cells should be allocated more sample?
Accommodation/Food Services Industry
• 90% of all employees work in 88 occupations
• 12.8 million workers in this industry
Wholesale Trade Industry
• 90% of all employees work in 175 occupations
• 6.1 million workers in this industry
Using “Proportional Allocation”:
Accom/Food Services: 120,000 establishments
Wholesale Trade: 72,000 establishments

Current Allocation Design
Neyman Allocation
$$n_h = n \cdot \frac{N_h S_h}{\sum_h N_h S_h}$$
• n = Individual State “fixed” sample size
• Nh = sum of stratum frame employees
• Sh represents an occupational variability measure within a stratum
– Occupations for each stratum (or cell) obtained from recent estimates file; weighted data
• Denominator is summed over all strata in the State
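A minimal sketch of the Neyman allocation, nh = n∙(Nh∙Sh)/Σ(Nh∙Sh), may help show how Sh shifts sample between strata. The employment totals and Sh values below are hypothetical and chosen only to make the effect visible.

```python
def neyman_allocation(state_sample_size, employment, variability):
    """Allocate the State sample in proportion to N_h * S_h.

    employment maps stratum -> N_h; variability maps stratum -> S_h.
    """
    weights = {h: employment[h] * variability[h] for h in employment}
    total = sum(weights.values())  # denominator: summed over all strata in the State
    return {h: state_sample_size * w / total for h, w in weights.items()}


# Hypothetical strata: the larger one has the less varied staffing pattern.
N = {"Accommodation/Food Services": 60_000, "Wholesale Trade": 40_000}
S = {"Accommodation/Food Services": 0.10, "Wholesale Trade": 0.30}

neyman = neyman_allocation(1_000, N, S)
for stratum, n_h in neyman.items():
    print(f"{stratum}: {n_h:.0f}")
```

With these assumed inputs the smaller but more variable stratum receives about two thirds of the sample, whereas proportional allocation would give it only 40%.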

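Current Allocation Design
Neyman Allocation:
$$n_h = n \cdot \frac{N_h S_h}{\sum_h N_h S_h}$$
Proportional Allocation:
$$n_h = n \cdot \frac{N_h}{\sum_h N_h}$$
Sh is the “Occupational Variability” measure; notice that it is the only “adjustment” to the Proportional Allocation formula.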

Calculating Sh

1. Calculate a “coefficient of variation” for each occupation within an industry.

2. Determine 90th-percentile of occupations within each industry.

3. Sh (for each industry) is calculated by obtaining the weighted mean of CVs for the 90th-percentile of occupations within each industry.

Calculating Sh
Step 1: Calculating a “coefficient of variation” for each occupation within stratum
– Using most recent weighted estimates file:
  • Count # of employees in each occupation for each business establishment (call this yi)
  • Count # of employees total for each business establishment (call this xi)
  • Sample weight, wi, represents the number of business establishments that each establishment on the estimates file (i) represents
  • Create a “weighted ratio” Rw = Σ(wi∙yi)/Σ(wi∙xi), summed over a defined cell
– Note: This is the ratio of occupational employment to overall employment; it will always be ≤ 1.
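The weighted ratio is simple to compute directly. The sketch below assumes each sampled establishment is given as a tuple (wi, yi, xi); the two records are illustrative values for a single occupation in one cell.

```python
def weighted_ratio(records):
    """Return R_w = sum(w_i * y_i) / sum(w_i * x_i) over one cell.

    records is an iterable of (w_i, y_i, x_i) tuples:
    sample weight, occupational employment, total employment.
    """
    numerator = sum(w * y for w, y, _ in records)
    denominator = sum(w * x for w, _, x in records)
    return numerator / denominator


# Two establishments: weight 5 with 8 of 16 employees in the occupation,
# and weight 1 with 32 of 60 employees in the occupation.
cell = [(5, 8, 16), (1, 32, 60)]
r_w = weighted_ratio(cell)
print(f"R_w = {r_w:.3f}")
```

Because yi can never exceed xi for any establishment, the ratio stays at or below 1.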

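Calculating Sh
• CV formula (unweighted)
– Derived from variance formula
– Relative variance (CV²) for an original variate Yi:
$$CV^2 = \frac{S_Y^2}{\bar{Y}^2} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{(N-1)\,\bar{Y}^2}$$
– Using a little algebra (remember R = y/x), this is rewritten in terms of the ratio R.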

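Calculating Sh
• CV formula (for each defined “Sh cell”), summed by cell (including weights):
$$CV = \frac{1}{R_w\,\bar{x}} \sqrt{\frac{\sum_i \left(w_i y_i - R_w\, w_i x_i\right)^2}{\sum_i w_i - 1}}$$
• Note: x-bar is a weighted average, $\bar{x} = \sum_i w_i x_i / \sum_i w_i$.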

Calculating Sh
EXAMPLE (hypothetical cell with 2 sampled business establishments)
• Restaurant ABC; represents 5 businesses
  • What is ABC’s weight?
• Restaurant XYZ; represents itself (1 business)
  • What is XYZ’s weight?

ABC’s Staffing Pattern
Occupation         # employed
Waitress/Waiter    8
Cook               4
Dishwasher         2
Janitor            1
Manager            1
TOTAL              16

XYZ’s Staffing Pattern
Occupation         # employed
Waitress/Waiter    32
Cook               15
Dishwasher         10
Manager            3
TOTAL              60

Calculations for ABC
        Waitress/Waiter   Cook       Dishwasher   Janitor    Manager
yi      8                 4          2            1          1
wiyi    5∙8=40            5∙4=20     5∙2=10       5∙1=5      5∙1=5
xi      16                16         16           16         16
wixi    5∙16=80           5∙16=80    5∙16=80      5∙16=80    5∙16=80

Calculations for XYZ
        Waitress/Waiter   Cook       Dishwasher   Manager
yi      32                15         10           3
wiyi    1∙32=32           1∙15=15    1∙10=10      1∙3=3
xi      60                60         60           60
wixi    1∙60=60           1∙60=60    1∙60=60      1∙60=60

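Calculating Sh
$$CV = \frac{1}{R_w\,\bar{x}} \sqrt{\frac{\sum_i \left(w_i y_i - R_w\, w_i x_i\right)^2}{\sum_i w_i - 1}}$$

Waitress/Waiter
ABC: yi=8, wiyi=5∙8=40, wixi=5∙16=80
XYZ: yi=32, wiyi=1∙32=32, wixi=1∙60=60

$$CV = \frac{\sqrt{\dfrac{\left(40 - \frac{32+40}{140}\cdot 80\right)^2 + \left(32 - \frac{32+40}{140}\cdot 60\right)^2}{(5+1)-1}}}{\frac{32+40}{140}\cdot\frac{60+80}{5+1}} \approx 0.060$$

Example: CVs for Occupations
Occupation         CV
Waitress/Waiter    0.060
Cook               0
Dishwasher         0.271
Janitor            1.626
Manager            0.203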

The smaller the CV value, the less diverse the occupation is within the defined cell.
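The CVs in the example can be reproduced with a short sketch. It assumes the weighted CV takes the form sqrt(Σ(wi∙yi − Rw∙wi∙xi)² / (Σwi − 1)) / (Rw∙x̄), with x̄ the weighted mean of xi; that form is an inference, adopted here because it returns the five CVs listed for restaurants ABC and XYZ. The function and variable names are illustrative.

```python
from math import sqrt

# (weight, total employment, staffing pattern) for each sampled establishment
establishments = [
    (5, 16, {"Waitress/Waiter": 8, "Cook": 4, "Dishwasher": 2, "Janitor": 1, "Manager": 1}),
    (1, 60, {"Waitress/Waiter": 32, "Cook": 15, "Dishwasher": 10, "Manager": 3}),
]


def occupation_cv(occupation, establishments):
    """Weighted CV of one occupation within a cell (assumed formula, see lead-in)."""
    sum_w = sum(w for w, _, _ in establishments)
    sum_wx = sum(w * x for w, x, _ in establishments)
    sum_wy = sum(w * staff.get(occupation, 0) for w, _, staff in establishments)
    r_w = sum_wy / sum_wx      # weighted ratio of occupational to total employment
    x_bar = sum_wx / sum_w     # weighted average establishment employment
    squared_residuals = sum(
        (w * staff.get(occupation, 0) - r_w * w * x) ** 2
        for w, x, staff in establishments
    )
    return sqrt(squared_residuals / (sum_w - 1)) / (r_w * x_bar)


occupation_names = ["Waitress/Waiter", "Cook", "Dishwasher", "Janitor", "Manager"]
cvs = {name: occupation_cv(name, establishments) for name in occupation_names}
for name, cv in cvs.items():
    print(f"{name:16s} {cv:.3f}")
```

Janitor has by far the largest CV because the occupation appears in ABC but not in XYZ, while Cook has a CV of 0 because cooks make up exactly 25% of both staffing patterns.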

Step 2: Avoiding “atypical” occupations within each cell:

• Conservative approach: utilize 90th-percentile until further research is done

• Exclude bottom 10th percentile of occupations

Calculating Sh
Step 3: A CV is created for each occupation within a defined cell. How are occupations within a cell “combined” to create one value for the cell?
– Weighted mean of 90th-percentile occupations
  • Obtain occupational proportion for each cell
  • Obtain Sh by calculating weighted mean of the top-90th-percentile of occupations
– Less prevalent (bottom 10%) occupations are eliminated
– Sh = weighted mean of 90th-percentile CVs within defined cell

Calculating Sh
Example (sorted in “proportional order”)
Weighted Mean of CVs of All Occupations
Occupation         CV       Proportion      Product
Waitress/Waiter    0.060    72/140≈0.51     0.060*0.51≈0.03
Cook               0        35/140=0.25     0*0.25=0
Dishwasher         0.271    20/140≈0.14     0.271*0.14≈0.04
(90th percentile cutoff; look at proportions: 0.51 + 0.25 + 0.14 ≈ 0.90)
Manager            0.203    8/140≈0.06      0.203*0.06≈0.01
Janitor            1.626    5/140≈0.04      1.626*0.04≈0.07
• 90th-percentile Occupations
– Weighted mean = Sh = Σ “products” ≈ 0.03 + 0 + 0.04 = 0.07
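The Step 3 calculation can be sketched as follows. The cutoff rule is an assumption: occupations are taken in descending order of employment share until the cumulative share reaches 90%, which selects the same three occupations as the example. Following the example, Sh is the sum of CV × proportion over the retained occupations.

```python
# (occupation, CV, weighted employment in the cell) from the example
occupations = [
    ("Waitress/Waiter", 0.060, 72),
    ("Cook", 0.0, 35),
    ("Dishwasher", 0.271, 20),
    ("Manager", 0.203, 8),
    ("Janitor", 1.626, 5),
]


def stratum_s_h(occupations, coverage=0.90):
    """Sum CV * share over the most prevalent occupations covering `coverage`."""
    total = sum(employment for _, _, employment in occupations)
    ranked = sorted(occupations, key=lambda row: row[2], reverse=True)
    s_h, cumulative_share, kept = 0.0, 0.0, []
    for name, cv, employment in ranked:
        if cumulative_share >= coverage:  # assumed cutoff rule, see lead-in
            break
        share = employment / total
        s_h += cv * share
        cumulative_share += share
        kept.append(name)
    return s_h, kept


s_h, kept = stratum_s_h(occupations)
print(kept)
print(f"S_h = {s_h:.2f}")
```

Dropping Manager and Janitor keeps the rare, highly variable Janitor occupation from inflating the cell's variability measure.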

Calculating Sh
Defining Sh “cell”
– Normality of individual CVs
– Sufficient amount of data to create reliable estimate of occupational variability (Sh)

Calculating Sh
Aggregation by National Industry (Industry-only)
Concerns:
– Assumption that national aggregates of industry will produce accurate CVs and Sh values
  • Aggregation necessary due to lack of data for finely-detailed cells
  • 88.6% of industry MSA-BOS staffing patterns were similar to corresponding nationally-aggregated industry staffing patterns (α=0.10)


Reliability
• Problem of small populations in geographic areas
• Desire to produce similar reliability in large and small areas
– Example: Utilizing the Neyman Allocation method illustrated, Chicago takes up approximately 54% of Illinois’s sample allocation; this may lead to an unreliable sample in non-Chicago areas within Illinois

Reliability
How to “spread out” sample allocation?
Bankier (1988): Power Allocations: Determining Sample Sizes for Subnational Areas
• Adjust exponent for Nh (numerator and denominator) in the Neyman Allocation
• Drops Chicago’s value to approx. 34% of IL’s sample allocation
Nh = sum of stratum frame employees
Sh represents an occupational variability measure within a stratum
Occupations for each stratum (or cell) obtained from recent estimates file; weighted data
Denominator is summed over all strata in the State

[Chart: Total Allocation for Illinois, comparing “Neyman 90th” with “Neyman 90th (SqRoot)”]
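The power adjustment can be sketched by raising Nh to an exponent q in both the numerator and the denominator of the Neyman formula: nh = n∙(Nh^q∙Sh)/Σ(Nh^q∙Sh). The value q = 0.5 is an assumption inferred from the “(SqRoot)” chart label, and the three areas below are hypothetical; they are not the Illinois figures.

```python
def power_neyman_allocation(state_sample_size, employment, variability, q=0.5):
    """Neyman allocation with N_h raised to the power q (q=1 is plain Neyman)."""
    weights = {h: (employment[h] ** q) * variability[h] for h in employment}
    total = sum(weights.values())
    return {h: state_sample_size * w / total for h, w in weights.items()}


# Hypothetical State dominated by one large metropolitan area; S_h held equal.
N = {"Large metro": 900_000, "Small metro": 90_000, "Non-MSA": 10_000}
S = {area: 0.10 for area in N}

plain = power_neyman_allocation(1_000, N, S, q=1.0)
square_root = power_neyman_allocation(1_000, N, S, q=0.5)
for area in N:
    print(f"{area}: {plain[area]:.0f} -> {square_root[area]:.0f}")
```

Lowering the exponent compresses the differences in Nh, so the dominant area gives up sample to the smaller areas while the State total stays fixed.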



QUESTIONS?

lawley.ernest@bls.gov
