Investigation of Treatment of Influential Values
Mary H. Mulry
Roxanne M. Feldpausch
Outline
• Current practices
• Methods investigated
• Results
• Next steps
Influential Observation
An observation is considered influential if its weighted contribution has an excessive effect on the estimate of the total (Chambers et al. 2000).
The Data - U.S. Monthly Retail Trade Survey
• Collect sales and inventories
• Monthly survey of about 12,500 retail businesses with paid employees
• Sample selected every 5 years
– Sample is stratified based on industry and sales
– Quarterly sample of births
– Deaths are removed
The Data
• Analysis done at published NAICS level
• Hidiroglou-Berthelot algorithm was run on the data before looking for influential values
• Horvitz-Thompson estimator
Causes of Influential Units
• One time or rare event
• Erroneous measure of size
• Change in the make-up of the unit
• Seasonal Businesses
Current Practices
• Analysts review an effect listing of micro-level data and investigate units that may be influential
• When the analyst determines a correctly reporting unit may be influential, the case is referred to a statistician
Current Practices
• One-time influential value
– Imputation
• Recurring influential value
– Weight adjustment based on the principles of representativeness
– Moving the unit to a different industry when the nature of the business changes
Goals
• To improve upon current methodology by making it more objective and rigorous
• To find methodology that uses the observation but in a manner that assures its contribution does not have an excessive effect on the total
Assumptions
• Influential observations occur infrequently, but are problematic when they appear.
• The influential observation is true, although unusual. It is not the result of a reporting or coding error.
Strategy
Identify candidate methodologies and test with real data from one industry (about 700 businesses) for a month that contains an influential value
Evaluation Criteria
• Number of influential observations detected, including the number of true and false detections made
• Estimate of bias
• Impact on month-to-month change
Notation
$$\hat{Y} = \sum_{i=1}^{n} w_i Y_i$$
where
$Y_i$ is the sales for the i-th business in a survey sample of size $n$
$w_i$ is the sample weight for the i-th unit
$X_i$ is the previous month's sales for the i-th business
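As an illustration of the notation, the weighted total is the sum of each unit's sales multiplied by its sample weight. The sketch below computes that total and each unit's share of it. The three units and the helper names `weighted_total` and `contribution_shares` are hypothetical and are not taken from the survey.

```python
def weighted_total(values, weights):
    """Weighted estimate of the total: sum of w_i * Y_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for w, y in zip(weights, values))


def contribution_shares(values, weights):
    """Share of the estimated total contributed by each unit."""
    total = weighted_total(values, weights)
    return [w * y / total for w, y in zip(weights, values)]


# Hypothetical sample of three units (sales in millions).
sales = [0.6, 1.2, 7.5]
weights = [55.0, 10.0, 55.0]

total = weighted_total(sales, weights)
shares = contribution_shares(sales, weights)
```

A unit whose share is far larger than the others is the kind of observation the rest of the talk calls influential.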
Methods Examined
• Weight trimming
• Reverse calibration
• Winsorization
• Generalized M-estimation
Weight Trimming
• Does not identify influential units
• Adjusts the weight of the observation
Weight Trimming
• Truncate the weight of the influential observation
• Adjust the weights of the non-influential observations to account for the remainder of the truncated weight
• Sum of the new weights is the same as the sum of the original weights
(Potter 1990)
Weight Trimming Notes
• Calculations were done within sample stratum.
• Choice of correction factor could be investigated. We arbitrarily chose $c_i = w_i/3$.
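A minimal sketch of the trimming step within one stratum. It assumes that the trimmed weight is the original weight divided by 3 (our reading of the correction factor above) and that the removed weight is spread over the other units in proportion to their weights. The proportional rule, the function name `trim_weights`, and the example stratum of 19 equally weighted units are assumptions made for illustration.

```python
def trim_weights(weights, influential_idx, trimmed_weight):
    """Truncate one unit's weight and redistribute the excess.

    The excess is spread over the remaining units in proportion to
    their weights, so the sum of the weights is unchanged.
    """
    excess = weights[influential_idx] - trimmed_weight
    if excess < 0:
        raise ValueError("trimmed weight must not exceed the original weight")
    others_sum = sum(weights) - weights[influential_idx]
    if others_sum <= 0:
        raise ValueError("stratum needs at least one other unit with positive weight")
    factor = 1.0 + excess / others_sum
    new_weights = [w * factor for w in weights]
    new_weights[influential_idx] = trimmed_weight
    return new_weights


# Hypothetical stratum: 19 units with weight 55; unit 0 is influential.
weights = [55.0] * 19
new_weights = trim_weights(weights, influential_idx=0, trimmed_weight=55.0 / 3)
```

The influential unit's weight falls to about 18, the other 18 weights rise slightly, and the stratum weight total stays at 1045.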
Reverse Calibration
• Does not identify influential units
• Adjusts the value of the observation
Reverse Calibration
1. Use a robust estimation method to estimate the total
2. Modify the influential observations to achieve that total
(Chambers and Ren 2004)
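The second step can be sketched for the simplest case, a single influential unit: given a robust estimate of the total, solve for the value of that unit that makes the weighted total equal the robust estimate. This is only an illustration of the idea, not the full procedure of Chambers and Ren (2004). The robust total of 250, the data, and the function names are hypothetical.

```python
def weighted_total(values, weights):
    """Weighted estimate of the total: sum of w_i * Y_i."""
    return sum(w * y for w, y in zip(weights, values))


def reverse_calibrate(values, weights, influential_idx, robust_total):
    """Replace one unit's value so the weighted total equals robust_total."""
    rest = sum(
        w * y
        for j, (w, y) in enumerate(zip(weights, values))
        if j != influential_idx
    )
    new_values = list(values)
    new_values[influential_idx] = (robust_total - rest) / weights[influential_idx]
    return new_values


# Hypothetical data: unit 0 is the influential observation.
values = [7.5, 1.0, 2.0]
weights = [55.0, 10.0, 10.0]
robust_total = 250.0  # assumed to come from a separate robust estimator

new_values = reverse_calibrate(values, weights, 0, robust_total)
```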
Winsorization
• Identifies influential units
• Adjusts the value of the observation
Winsorization
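Type I
$$Y_i^* = \begin{cases} K, & Y_i > K \\ Y_i, & \text{otherwise} \end{cases}$$
Type II
$$Y_i^* = \begin{cases} K + \dfrac{1}{w_i}\,(Y_i - K), & Y_i > K \\ Y_i, & \text{otherwise} \end{cases}$$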
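A small sketch of the two Winsorization rules, assuming a cutoff K is already available. The cutoff of 4.0 and the function names are hypothetical. The Type II form used here is the common one in which the unit keeps its own value once and the remaining w - 1 units it represents are valued at K.

```python
def winsorize_type1(y, k):
    """Type I: replace any value above the cutoff K with K itself."""
    return k if y > k else y


def winsorize_type2(y, w, k):
    """Type II: pull a value above K back toward K by the factor 1/w."""
    return k + (y - k) / w if y > k else y


# Hypothetical unit: sales of 7.5 with weight 55, and an assumed cutoff K = 4.0.
y, w, k = 7.5, 55.0, 4.0

type1_value = winsorize_type1(y, k)
type2_value = winsorize_type2(y, w, k)

# Weighted contribution under Type II: w * Y* = (w - 1) * K + Y
type2_contribution = w * type2_value
```

Values at or below K are returned unchanged by both rules.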
Winsorization – Defining K
• Define a separate $K_h$ for each stratum in a manner that minimizes the MSE (Kokic and Bell 1994)
• Define a separate $K_i$ for each observation in a manner that minimizes the MSE (Clarke 1995)
Winsorization – Defining K
• Use unweighted data to define $K_h$ for each stratum, where $K_h = \bar{y}_h + 2 s_h$
• Use weighted data to define $K_h$ for each stratum, where $K_h = \bar{y}_h + 2 s_h$ and $\bar{y}_h$ and $s_h$ are based on the weighted data
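A sketch of the two "+2s" cutoffs, assuming the location term in the cutoff is the stratum mean and s is the stratum standard deviation. The weighted version below uses a weighted mean and a weighted variance normalized by the sum of the weights, which is one of several possible definitions. The data and function names are hypothetical.

```python
import math
import statistics


def cutoff_unweighted(values):
    """K_h = mean + 2 * sample standard deviation, ignoring the weights."""
    return statistics.mean(values) + 2 * statistics.stdev(values)


def cutoff_weighted(values, weights):
    """K_h = weighted mean + 2 * weighted standard deviation."""
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_w
    var = sum(w * (y - mean) ** 2 for w, y in zip(weights, values)) / total_w
    return mean + 2 * math.sqrt(var)


# Hypothetical stratum with one unusually large value.
values = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 7.5]

k_unweighted = cutoff_unweighted(values)
flagged = [y for y in values if y > k_unweighted]
```

Because the mean and s are computed from data that include the large value, both are pulled upward by it, so the cutoff itself is sensitive to the observation it is meant to catch.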
Winsorization – Our Implementation
Used a robust regression in SAS to estimate the parameters needed in the calculations
M-estimation
M-estimators are robust estimators that come from a generalization of maximum likelihood estimation
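To make the idea concrete, here is a generic Huber M-estimate of location computed by iteratively reweighted averaging. It illustrates M-estimation in general and is not the weighted technique of Beaumont and Alavi (2004) used in this study. The tuning constant c = 1.5, the MAD-based scale, the data, and the function name are assumptions.

```python
import statistics


def huber_location(values, c=1.5, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    mu = statistics.median(values)
    # Robust scale: median absolute deviation, scaled for normal consistency.
    mad = statistics.median([abs(y - mu) for y in values])
    scale = 1.4826 * mad
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Full weight inside the band mu +/- c*scale, reduced weight outside it.
        robust_w = [
            1.0 if abs(y - mu) <= c * scale else c * scale / abs(y - mu)
            for y in values
        ]
        new_mu = sum(w * y for w, y in zip(robust_w, values)) / sum(robust_w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical data with one unusually large value.
values = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 7.5]
estimate = huber_location(values)
```

The large value still enters the estimate, but with a reduced weight, so the result stays close to the bulk of the data rather than being drawn toward the ordinary mean.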
M-estimation
• Identifies influential units
• Adjusts either the weight or the value of the influential observation
M-estimation
Used a weighted M-estimation technique that is able to modify the weights or the values of the influential observations (Beaumont and Alavi 2004)
Results
Number of Outliers Detected
Method                 Outliers detected
Weight trimming        1*
Winsor by stratum      51
Winsor by obs          1
Winsor +2s             0
Winsor wgt +2s         4
Reverse Calibration    1*
M-estimation obs       1
M-estimation wgt       1
*Method does not detect outliers; one outlier was specified
Replacement Values (in Millions)
                      Value    Weight    Weighted Value
previous month        0.6      55        31
current month         7.5      55        413
Weight trimming*      7.5      18        135
Winsor by obs         4.0      55        220
Winsor wgt +2s**      1.6      55        87
M-estimation obs      4.3      55        234
M-estimation wgt      7.5      30        225
*Weight trimming adjusts the other 18 weights in the stratum
**Winsor wgt +2s identified 3 other values
Total Sales for the Industry
                      Total (billions)    Month-to-month percent change
previous month        42.4
current month         38.6                -9.1
weight trimming       38.3                -9.7
Winsor by obs         38.5                -9.5
Winsor wgt +2s        38.2                -9.9
M-estimation obs      38.4                -9.5
M-estimation wgt      38.4                -9.5
Chosen for Further Study
• Winsorization by each observation
• M-estimation by observation
• M-estimation by weight