3.11.19. statsmodels.stats.sandwich_covariance

Sandwich covariance estimators

Created on Sun Nov 27 14:10:57 2011

Author: Josef Perktold

Author: Skipper Seabold for HCxxx in linear_model.RegressionResults

License: BSD-3

3.11.19.1. Notes

There are two versions for calculating the sandwich covariance:

version 1: use pinv, pinv(x) scale pinv(x)’. This is currently used in linear_model, where scale is 1d (or a diagonal matrix). It corresponds to (x’x)^(-1) x’ scale x (x’x)^(-1). In general, scale is (nobs, nobs) and so can be quite large. General formulas for scale in the cluster case are in http://pubs.amstat.org/doi/abstract/10.1198/jbes.2010.07136, which also has the second version.

version 2: (x’x)^(-1) S (x’x)^(-1) with S = x’ scale x. S is (k_var, k_var), and (x’x)^(-1) is available as normalized_cov_params.
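As a rough numerical check of the two versions, the following sketch builds both forms with NumPy for a 1d scale equal to the squared residuals. The data are simulated, all variable names are illustrative, and the agreement relies on x having full column rank, so that pinv(x) equals (x’x)^(-1) x’.

```python
import numpy as np

rng = np.random.default_rng(0)
nobs, k_var = 50, 3
x = rng.normal(size=(nobs, k_var))
u = rng.normal(size=nobs)            # stand-in for regression residuals (assumption)

scale = u ** 2                       # 1d scale, i.e. the diagonal of the (nobs, nobs) matrix

# Version 1: pinv(x) scale pinv(x)'
pinv_x = np.linalg.pinv(x)           # shape (k_var, nobs)
cov_v1 = (pinv_x * scale) @ pinv_x.T

# Version 2: (x'x)^(-1) S (x'x)^(-1) with S = x' scale x
xtx_inv = np.linalg.inv(x.T @ x)
S = (x * scale[:, None]).T @ x       # shape (k_var, k_var)
cov_v2 = xtx_inv @ S @ xtx_inv

same = np.allclose(cov_v1, cov_v2)
print(same)
```

Version 2 never forms an (nobs, nobs) matrix, which is the practical reason to prefer it when scale is not diagonal.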

S = sum (x*u) dot (x*u)’ = sum x*u*u’*x’, where the sum can aggregate over observations or over groups. u is the regression residual.

x is (nobs, k_var)

u is (nobs, 1)

x*u is (nobs, k_var)

For cluster robust standard errors, we first sum (x*u) over other groups (including time) and then take the dot product (sum of outer products):

S = sum_g(x*u)’ dot sum_g(x*u)

For HAC by clusters, we first sum over groups for each time period, and then use HAC on the group sums of (x*u). If we have several groups, we first have to sum over all relevant groups and then take the sum of outer products. This can be done by summing with indicator functions or matrices, or with explicit loops. Alternatively, we can calculate separate covariance matrices for each group, sum them, and subtract the double-counted intersection.
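The group-sum construction and the "sum and subtract the intersection" alternative can be sketched with NumPy as follows. The helper names s_cluster_sketch and s_twoway_sketch are hypothetical and are not part of this module; groups are assumed to be integer coded, and no small sample correction is applied.

```python
import numpy as np

def s_cluster_sketch(xu, group):
    """S = sum_g outer(s_g, s_g), where s_g sums the rows of xu in group g."""
    _, codes = np.unique(group, return_inverse=True)
    codes = codes.ravel()
    n_groups = codes.max() + 1
    # bincount sums one column at a time within each group
    sums = np.column_stack([
        np.bincount(codes, weights=xu[:, j], minlength=n_groups)
        for j in range(xu.shape[1])
    ])                                   # shape (n_groups, k_var)
    return sums.T @ sums

def s_twoway_sketch(xu, group1, group2):
    """Two-way version: S_1 + S_2 - S_(1 and 2)."""
    n2 = int(group2.max()) + 1
    both = group1 * n2 + group2          # label for the intersection of both groups
    return (s_cluster_sketch(xu, group1)
            + s_cluster_sketch(xu, group2)
            - s_cluster_sketch(xu, both))

rng = np.random.default_rng(1)
firm = np.repeat(np.arange(4), 3)        # 4 firms observed in 3 periods (simulated)
time = np.tile(np.arange(3), 4)
x = rng.normal(size=(12, 2))
u = rng.normal(size=12)                  # stand-in for residuals
xu = x * u[:, None]

S_firm = s_cluster_sketch(xu, firm)

# The same quantity with an explicit loop over groups
S_loop = np.zeros((2, 2))
for g in np.unique(firm):
    s_g = xu[firm == g].sum(axis=0)
    S_loop += np.outer(s_g, s_g)

S_two = s_twoway_sketch(xu, firm, time)
```

In this balanced example every firm-period intersection holds a single observation, so the subtracted term is the plain sum of per-observation outer products.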

Not checked in detail yet: degrees of freedom or small sample correction factors; see the (two) references (?)

This is the general case for MLE and GMM also

In MLE, with Hessian H and outer product of the Jacobian S, cov_hjjh = HJJH, which reduces to the above in the linear case but can be used generally, e.g. in discrete, and is misnamed in GenericLikelihoodModel.

In GMM it’s similar, but I would have to look up the details (it comes out in sandwich form by default; it’s in the sandbox). Standard Newey-West or similar estimators are applied to the covariance matrix of the moment conditions.

quasi-MLE: MLE with a misspecified model where the parameter estimates are fine (consistent ?) but cov_params needs to be adjusted, similar to or the same as in sandwiches. (I didn’t go through any details yet.)

3.11.19.2. TODO

  • small sample correction factors: done for cluster, not yet for HAC

  • automatic lag-length selection for Newey-West HAC: added, with nlag = floor[4(T/100)^(2/9)]. Reference: xtscc paper, Newey-West.

    Note that this will not be optimal in the panel context; see Petersen.

  • HAC should maybe return the chosen nlags

  • get consistent notation, varies by paper, S, scale, sigma?

  • replace diag(hat_matrix) calculations in cov_hc2, cov_hc3
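The lag-length rule mentioned in the list above, nlag = floor[4(T/100)^(2/9)], can be written as a small helper. The function name default_nlags is hypothetical and only illustrates the formula; it is not claimed to match how the module computes the default internally.

```python
import math

def default_nlags(nobs):
    """Rule-of-thumb lag length: floor(4 * (T / 100) ** (2 / 9))."""
    return int(math.floor(4 * (nobs / 100.0) ** (2.0 / 9)))

for nobs in (50, 100, 1000):
    print(nobs, default_nlags(nobs))
```

The rule grows slowly with the sample size: moving from 100 to 1000 observations adds only two lags.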

3.11.19.3. References

John C. Driscoll and Aart C. Kraay, “Consistent Covariance Matrix Estimation with Spatially Dependent Panel Data,” Review of Economics and Statistics 80, no. 4 (1998): 549-560.

Daniel Hoechle, “Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence,” The Stata Journal.

Mitchell A. Petersen, “Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches,” Review of Financial Studies 22, no. 1 (January 1, 2009): 435-480.

A. Colin Cameron, Jonah B. Gelbach, and Douglas L. Miller, “Robust Inference With Multiway Clustering,” Journal of Business and Economic Statistics 29 (April 2011): 238-249.

not used yet: A.C. Cameron, J.B. Gelbach, and D.L. Miller, “Bootstrap-based improvements for inference with clustered errors,” The Review of Economics and Statistics 90, no. 3 (2008): 414–427.

3.11.19.4. Functions

S_crosssection(x, group) inner covariance matrix for White on group sums sandwich
S_hac_groupsum(x, time[, nlags, weights_func]) inner covariance matrix for HAC over group sums sandwich
S_hac_simple(x[, nlags, weights_func]) inner covariance matrix for HAC (Newey, West) sandwich
S_nw_panel(xw, weights, groupidx) inner covariance matrix for HAC for panel data
S_white_simple(x) inner covariance matrix for White heteroscedasticity sandwich
cov_cluster(results, group[, use_correction]) cluster robust covariance matrix
cov_cluster_2groups(results, group[, ...]) cluster robust covariance matrix for two groups/clusters
cov_crosssection_0(results, group) this one is still wrong, use cov_cluster instead
cov_hac(results[, nlags, weights_func, ...]) heteroscedasticity and autocorrelation robust covariance matrix (Newey-West)
cov_hac_simple(results[, nlags, ...]) heteroscedasticity and autocorrelation robust covariance matrix (Newey-West)
cov_hc0(results) See statsmodels.RegressionResults
cov_hc1(results) See statsmodels.RegressionResults
cov_hc2(results) See statsmodels.RegressionResults
cov_hc3(results) See statsmodels.RegressionResults
cov_nw_groupsum(results, nlags, time[, ...]) Driscoll and Kraay Panel robust covariance matrix
cov_nw_panel(results, nlags, groupidx[, ...]) Panel HAC robust covariance matrix
cov_white_simple(results[, use_correction]) heteroscedasticity robust covariance matrix (White)
group_sums(x, group) sum x for each group, simple bincount version, again
lagged_groups(x, lag, groupidx) assumes sorted by time, groupidx is tuple of start and end values
se_cov(cov) get standard deviation from covariance matrix
weights_bartlett(nlags) Bartlett weights for HAC
weights_uniform(nlags) uniform weights for HAC
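To illustrate how the HAC pieces listed above fit together, here is a minimal NumPy sketch of a Newey-West style inner matrix with Bartlett weights. The names bartlett_weights and s_hac_sketch are hypothetical stand-ins written for this example; the module's own functions may differ in signature and in details such as weighting conventions or corrections.

```python
import numpy as np

def bartlett_weights(nlags):
    """Linearly declining weights 1 - j / (nlags + 1) for j = 0..nlags."""
    return 1.0 - np.arange(nlags + 1) / (nlags + 1.0)

def s_hac_sketch(xu, nlags):
    """Weighted sum of autocovariances of the rows of xu (assumed sorted by time)."""
    w = bartlett_weights(nlags)
    S = w[0] * (xu.T @ xu)                   # lag 0: White term
    for lag in range(1, nlags + 1):
        gamma = xu[lag:].T @ xu[:-lag]       # cross products at this lag
        S += w[lag] * (gamma + gamma.T)      # add both directions to keep S symmetric
    return S

rng = np.random.default_rng(2)
x = rng.normal(size=(40, 2))
u = rng.normal(size=40)                      # stand-in for residuals
xu = x * u[:, None]

S_hac = s_hac_sketch(xu, nlags=3)
```

With nlags=0 the loop is skipped and the result is the plain White inner matrix, which makes a convenient sanity check.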

3.11.19.5. Classes

Group(group[, name])