3.11.19. statsmodels.stats.sandwich_covariance
Sandwich covariance estimators
Created on Sun Nov 27 14:10:57 2011
Author: Josef Perktold
Author: Skipper Seabold for HCxxx in linear_model.RegressionResults
License: BSD-3
3.11.19.1. Notes
There are two ways to calculate the sandwich covariance:
version 1: use pinv: pinv(x) scale pinv(x)'. This is currently used in linear_model, where scale is 1d (or a diagonal matrix). Written out, this is (x'x)^(-1) x' scale x (x'x)^(-1). In general scale is (nobs, nobs), so it can be very large. General formulas for scale in the cluster case are in http://pubs.amstat.org/doi/abstract/10.1198/jbes.2010.07136, which also has the second version.
version 2: (x'x)^(-1) S (x'x)^(-1) with S = x' scale x. S is (k_var, k_var), and (x'x)^(-1) is available as normalized_cov_params.
S = sum (x*u) dot (x*u)' = sum x*u*u'*x', where the sum can aggregate over observations or over groups and each term is an outer product. u is the regression residual.
Shapes: x is (nobs, k_var), u is (nobs, 1), x*u is (nobs, k_var).
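The two versions above can be compared numerically. The following is a minimal NumPy sketch, not the module's implementation: the simulated data and the variable names (xtx_inv, cov_v1, cov_v2) are assumptions made for illustration, and scale is taken to be the 1d array of squared residuals (the White case).

```python
import numpy as np

rng = np.random.default_rng(0)
nobs, k_var = 50, 3

# Simulated regressors (with constant) and heteroscedastic errors
x = np.column_stack([np.ones(nobs), rng.normal(size=(nobs, k_var - 1))])
y = x @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=nobs) * (1 + np.abs(x[:, 1]))

# OLS fit and residuals u
xtx_inv = np.linalg.inv(x.T @ x)
beta = xtx_inv @ x.T @ y
u = y - x @ beta

# Version 1: pinv(x) scale pinv(x)' with a 1d scale (the diagonal of u**2)
pinv_x = np.linalg.pinv(x)            # shape (k_var, nobs)
scale = u ** 2                        # shape (nobs,)
cov_v1 = (pinv_x * scale) @ pinv_x.T

# Version 2: (x'x)^(-1) S (x'x)^(-1) with S = (x*u)' (x*u)
xu = x * u[:, None]                   # shape (nobs, k_var)
S = xu.T @ xu                         # shape (k_var, k_var)
cov_v2 = xtx_inv @ S @ xtx_inv
```

Version 2 never forms an (nobs, nobs) matrix, which is why it scales better once scale is no longer diagonal.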
For cluster robust standard errors, we first sum (x*u) within each cluster, that is, over the other groups (including time), and then take the dot product (sum of outer products):
S = sum_g(x*u)' dot sum_g(x*u)
For HAC by clusters, we first sum over groups for each time period, and then use HAC on the group sums of (x*u).
If we have several groups, we have to sum first over all relevant groups, and then take the outer product sum. This can be done by summing using indicator functions or matrices, or with explicit loops. Alternatively, we calculate separate covariance matrices for each group, sum them, and subtract the double-counted intersection.
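The group-sum construction can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the module's code: the helper names (group_sums_sketch, s_cluster_sketch, s_twoway_sketch) are hypothetical, group labels are assumed to be integers 0..G-1, and the two-way version uses the "sum the two one-way matrices and subtract the intersection" approach mentioned above.

```python
import numpy as np

def group_sums_sketch(xu, group):
    """Sum the rows of xu within each group (integer labels 0..G-1)."""
    n_groups = int(group.max()) + 1
    out = np.zeros((n_groups, xu.shape[1]))
    np.add.at(out, group, xu)  # unbuffered add, so repeated labels accumulate
    return out

def s_cluster_sketch(xu, group):
    """Inner matrix S: sum of outer products of the group sums of xu."""
    xu_sum = group_sums_sketch(xu, group)
    return xu_sum.T @ xu_sum

def s_twoway_sketch(xu, g1, g2):
    """Two-way S: one-way matrices added, intersection subtracted."""
    inter = g1 * (int(g2.max()) + 1) + g2  # unique label per (g1, g2) pair
    return (s_cluster_sketch(xu, g1) + s_cluster_sketch(xu, g2)
            - s_cluster_sketch(xu, inter))

rng = np.random.default_rng(1)
nobs, k_var = 12, 2
xu = rng.normal(size=(nobs, k_var))    # stands in for x * u
group = np.repeat(np.arange(4), 3)     # 4 clusters of 3 observations

S_cluster = s_cluster_sketch(xu, group)
# With every observation in its own cluster, S reduces to the White case
S_white = s_cluster_sketch(xu, np.arange(nobs))
S_twoway = s_twoway_sketch(xu, group, np.arange(nobs))
```

When the second grouping puts each observation in its own cluster, the intersection term cancels the second one-way term, so the two-way matrix collapses to the one-way cluster matrix.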
Not checked in detail yet: degrees of freedom or small sample correction factors; see the (two) references (?)
This is also the general case for MLE and GMM:
In MLE, with Hessian H and outer product of the Jacobian S, cov_hjjh = HJJH, which reduces to the above in the linear case but can be used generally, e.g. in discrete models. It is misnamed in GenericLikelihoodModel.
In GMM it's similar, but I would have to look up the details (it comes out in sandwich form by default; it's in the sandbox). Standard Newey-West or similar estimators are applied to the covariance matrix of the moment conditions.
Quasi-MLE: MLE with a misspecified model, where the parameter estimates are fine (consistent?) but cov_params needs to be adjusted in a similar or the same way as in the sandwiches. (I didn't go through any details yet.)
3.11.19.2. TODO
small sample correction factors: done for cluster, not yet for HAC
automatic lag-length selection for Newey-West HAC: added nlag = floor[4(T/100)^(2/9)]. Reference: xtscc paper, Newey-West
note this will not be optimal in the panel context, see Petersen
HAC should maybe return the chosen nlags
get consistent notation; it varies by paper: S, scale, sigma?
replace diag(hat_matrix) calculations in cov_hc2, cov_hc3
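The lag-length rule nlag = floor[4(T/100)^(2/9)] from the list above can be written as a small helper. This is only a sketch of the formula as stated; the function name newey_west_nlags_sketch is hypothetical and is not part of the module.

```python
import math

def newey_west_nlags_sketch(nobs):
    """Rule-of-thumb lag length: floor(4 * (T / 100) ** (2 / 9))."""
    return int(math.floor(4 * (nobs / 100.0) ** (2.0 / 9.0)))

nlags_100 = newey_west_nlags_sketch(100)    # T = 100 gives 4
nlags_1000 = newey_west_nlags_sketch(1000)  # grows slowly with T
```

The exponent 2/9 makes the lag length grow very slowly with the sample size, which is one reason the rule may not be optimal in the panel context.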
3.11.19.3. References
John C. Driscoll and Aart C. Kraay, “Consistent Covariance Matrix Estimation with Spatially Dependent Panel Data,” Review of Economics and Statistics 80, no. 4 (1998): 549-560.
Daniel Hoechle, “Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence”, The Stata Journal
Mitchell A. Petersen, “Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches,” Review of Financial Studies 22, no. 1 (January 1, 2009): 435-480.
A. Colin Cameron, Jonah B. Gelbach, and Douglas L. Miller, “Robust Inference With Multiway Clustering,” Journal of Business and Economic Statistics 29 (April 2011): 238-249.
not used yet: A.C. Cameron, J.B. Gelbach, and D.L. Miller, “Bootstrap-based improvements for inference with clustered errors,” The Review of Economics and Statistics 90, no. 3 (2008): 414–427.
3.11.19.4. Functions
S_crosssection(x, group) | inner covariance matrix for White on group sums sandwich
S_hac_groupsum(x, time[, nlags, weights_func]) | inner covariance matrix for HAC over group sums sandwich
S_hac_simple(x[, nlags, weights_func]) | inner covariance matrix for HAC (Newey, West) sandwich
S_nw_panel(xw, weights, groupidx) | inner covariance matrix for HAC for panel data
S_white_simple(x) | inner covariance matrix for White heteroscedasticity sandwich
cov_cluster(results, group[, use_correction]) | cluster robust covariance matrix
cov_cluster_2groups(results, group[, ...]) | cluster robust covariance matrix for two groups/clusters
cov_crosssection_0(results, group) | this one is still wrong, use cov_cluster instead
cov_hac(results[, nlags, weights_func, ...]) | heteroscedasticity and autocorrelation robust covariance matrix (Newey-West)
cov_hac_simple(results[, nlags, ...]) | heteroscedasticity and autocorrelation robust covariance matrix (Newey-West)
cov_hc0(results) | See statsmodels.RegressionResults
cov_hc1(results) | See statsmodels.RegressionResults
cov_hc2(results) | See statsmodels.RegressionResults
cov_hc3(results) | See statsmodels.RegressionResults
cov_nw_groupsum(results, nlags, time[, ...]) | Driscoll and Kraay panel robust covariance matrix
cov_nw_panel(results, nlags, groupidx[, ...]) | panel HAC robust covariance matrix
cov_white_simple(results[, use_correction]) | heteroscedasticity robust covariance matrix (White)
group_sums(x, group) | sum x for each group, simple bincount version, again
lagged_groups(x, lag, groupidx) | assumes sorted by time, groupidx is tuple of start and end values
se_cov(cov) | get standard deviation from covariance matrix
weights_bartlett(nlags) | Bartlett weights for HAC
weights_uniform(nlags) | uniform weights for HAC