6.8.5. statsmodels.sandbox.stats.multicomp

From the pystatsmodels mailing list, 2010-05-24

Notes:
  • unfinished and unverified, but most parts seem to work in Monte Carlo simulations
  • one example taken from lecture notes looks ok
  • needs cases with non-monotonic inequality for testing, to see the difference between one-step, step-up and step-down procedures
  • FDR doesn’t look really better than Bonferroni in the MC examples that I tried
update:
  • now tested against the R packages stats and multtest; I have all of their methods for p-value correction
  • getting Hommel was impossible until I found a reference for the p-value correction
  • now that I have the p-value correction, some of the original test (rej/norej) implementations are not really needed anymore. I think I will keep them for reference. The test procedure for Hommel is in the development session log
  • I haven’t updated the other functions and classes in here
  • multtest has some good helper functions according to its docs
  • still need to update references, the real papers
  • fdr with estimated true hypothesis still missing
  • multiple comparison procedures incomplete or missing
  • for now I will do multiple comparisons only for the independent case, which might be conservative in the correlated case (?)
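
The difference between one-step, step-down and step-up procedures mentioned in the notes can be seen on a small set of p-values. The following is a minimal pure-Python sketch, not the code in this module: the function names are hypothetical, and the three methods shown (Bonferroni, Holm, Benjamini-Hochberg) are the textbook versions for the independent case.

```python
def bonferroni(pvals):
    """One-step adjustment: multiply every p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Step-down adjustment (Holm): work upwards from the smallest p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the multiplier shrinks from m to 1; the running maximum keeps
        # the adjusted values monotone in the sorted order
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up FDR adjustment: work downwards from the largest p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        # scale by m / (position in sorted order), then enforce monotonicity
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]  # made-up illustrative values
    alpha = 0.05
    for name, func in [("bonferroni", bonferroni), ("holm", holm),
                       ("fdr_bh", benjamini_hochberg)]:
        adj = func(pvals)
        print(name, [round(p, 4) for p in adj], [p <= alpha for p in adj])
```

With these made-up values, Bonferroni and Holm each reject two hypotheses at alpha = 0.05 and the FDR adjustment rejects all four. Whether such a gap shows up in practice depends on the configuration of p-values.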

some References:

Gibbons, Jean Dickinson and Subhabrata Chakraborti, 2003, Nonparametric Statistical Inference, Fourth Edition, Marcel Dekker

p.363: 10.4 THE KRUSKAL-WALLIS ONE-WAY ANOVA TEST AND MULTIPLE COMPARISONS
p.367: multiple comparison for kruskal, formula used in multicomp.kruskal

Sheskin, David J., 2004, Handbook of Parametric and Nonparametric Statistical Procedures, 3rd ed., Chapman&Hall/CRC

Test 21: The Single-Factor Between-Subjects Analysis of Variance
Test 22: The Kruskal-Wallis One-Way Analysis of Variance by Ranks Test

Zwillinger, Daniel and Stephen Kokoska, 2000, CRC standard probability and statistics tables and formulae, Chapman&Hall/CRC

14.9 WILCOXON RANKSUM (MANN WHITNEY) TEST

S. Paul Wright, 1992, Adjusted P-Values for Simultaneous Inference, Biometrics, Vol. 48, No. 4 (Dec., 1992), pp. 1005-1013, International Biometric Society. Stable URL: http://www.jstor.org/stable/2532694

(p-value correction for Hommel in appendix)

for multicomparison

new book “multiple comparison in R”
Hsu is a good reference, but I don’t have it.

Author: Josef Pktd, with an example from H Raja and a rewrite by Vincent Davis

6.8.5.1. TODO

  • handle exception if empty, shows up only sometimes when running this (DONE, I think)
Traceback (most recent call last):
  File "C:\Josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\sandbox\stats\multicomp.py", line 711, in <module>
    print('sh', multipletests(tpval, alpha=0.05, method='sh'))
  File "C:\Josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\sandbox\stats\multicomp.py", line 241, in multipletests
    rejectmax = np.max(np.nonzero(reject))
  File "C:\Programs\Python25\lib\site-packages\numpy\core\fromnumeric.py", line 1765, in amax
    return _wrapit(a, 'max', axis, out)
  File "C:\Programs\Python25\lib\site-packages\numpy\core\fromnumeric.py", line 37, in _wrapit
    result = getattr(asarray(obj),method)(*args, **kwds)
ValueError: zero-size array to ufunc.reduce without identity

  • name of function multipletests, rename to something like pvalue_correction?
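
The ValueError in the traceback above comes from taking a maximum over an empty set of indices, which happens when no hypothesis is rejected. A minimal sketch of one way to guard against this is shown below. It is pure Python rather than the NumPy code in the module, and the helper names are hypothetical.

```python
def last_rejected_index(reject):
    """Return the largest index flagged True, or None if nothing is flagged."""
    indices = [i for i, flag in enumerate(reject) if flag]
    if not indices:
        # max() of an empty sequence raises ValueError, the same kind of
        # failure as np.max(np.nonzero(reject)) when no test is rejected
        return None
    return max(indices)


def step_up_reject(sorted_flags):
    """Step-up rule: reject every hypothesis up to the last flagged position.

    ``sorted_flags`` holds the per-test comparisons in order of increasing
    p-value; an all-False (or empty) input yields no rejections.
    """
    last = last_rejected_index(sorted_flags)
    if last is None:
        return [False] * len(sorted_flags)
    return [i <= last for i in range(len(sorted_flags))]


if __name__ == "__main__":
    print(step_up_reject([False, True, False]))
    print(step_up_reject([False, False, False]))
```

Returning early when nothing is flagged avoids the reduction over a zero-size array altogether, instead of catching the exception afterwards.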

6.8.5.2. Functions

Tukeythreegene(first, second, third)
Tukeythreegene2(genes) genes is a list, i.e. [first, second, third]
catstack(args)
compare_ordered(vals, alpha) simple ordered sequential comparison of means
contrast_all_one(nm) contrast or restriction matrix for all against first comparison
contrast_allpairs(nm) contrast or restriction matrix for all pairs of nm variables
contrast_diff_mean(nm) contrast or restriction matrix for all against mean comparison
distance_st_range(mean_all, nobs_all, var_all) pairwise distance matrix, outsourced from tukeyhsd
ecdf(x) no frills empirical cdf used in fdrcorrection
fdrcorrection0(pvals[, alpha, method, is_sorted]) pvalue correction for false discovery rate
fdrcorrection_bak(pvals[, alpha, method]) Reject False discovery rate correction for pvalues
fdrcorrection_twostage(pvals[, alpha, ...]) (iterated) two stage linear step-up procedure with estimation of number of true
get_tukeyQcrit(k, df[, alpha]) return critical values for Tukey’s HSD (Q)
get_tukeyQcrit2(k, df[, alpha]) return critical values for Tukey’s HSD (Q)
homogeneous_subsets(vals, dcrit) recursively check all pairs of vals for minimum distance
maxzero(x) find all up zero crossings and return the index of the highest
maxzerodown(x) find all down zero crossings and return the index of the highest
mcfdr([nrepl, nobs, ntests, ntrue, mu, ...]) MonteCarlo to test fdrcorrection
multicontrast_pvalues(tstat, tcorr[, df, ...]) pvalues for simultaneous tests
multipletests(pvals[, alpha, method, ...]) test results and p-value correction for multiple tests
randmvn(rho[, size, standardize]) create random draws from equi-correlated multivariate normal distribution
rankdata(x) rankdata, equivalent to scipy.stats.rankdata
rejectionline(n[, alpha]) reference line for rejection in multiple tests
set_partition(ssli) extract a partition from a list of tuples
set_remove_subs(ssli) remove sets that are subsets of another set from a list of tuples
simultaneous_ci(q_crit, var, groupnobs[, ...]) Compute simultaneous confidence intervals for comparison of means.
test_tukey_pvalues()
tiecorrect(xranks) should be equivalent of scipy.stats.tiecorrect
tukey_pvalues(std_range, nm, df)
tukeyhsd(mean_all, nobs_all, var_all[, df, ...]) simultaneous Tukey HSD
varcorrection_pairs_unbalanced(nobs_all[, ...]) correction factor for variance with unequal sample sizes for all pairs
varcorrection_pairs_unequal(var_all, ...) return joint variance from samples with unequal variances and unequal
varcorrection_unbalanced(nobs_all[, srange]) correction factor for variance with unequal sample sizes
varcorrection_unequal(var_all, nobs_all, df_all) return joint variance from samples with unequal variances and unequal
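
To illustrate what the three contrast (restriction) matrices listed above describe, here is a minimal pure-Python sketch. It is not the module's implementation: the function names are hypothetical, the matrices are plain lists of rows, and the sign convention (+1 for the first group of a pair, -1 for the second) is an assumption.

```python
def contrast_all_pairs(n):
    """One row per pair (i, j) with i < j: +1 at i, -1 at j."""
    rows = []
    for i in range(n):
        for j in range(i + 1, n):
            row = [0] * n
            row[i], row[j] = 1, -1
            rows.append(row)
    return rows


def contrast_all_vs_first(n):
    """One row per group j > 0, comparing the first group with group j."""
    rows = []
    for j in range(1, n):
        row = [0] * n
        row[0], row[j] = 1, -1
        rows.append(row)
    return rows


def contrast_vs_mean(n):
    """One row per group: that group minus the average of all groups."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)]
            for i in range(n)]


def apply_contrasts(contrasts, means):
    """Multiply the contrast matrix by the vector of group means."""
    return [sum(c * m for c, m in zip(row, means)) for row in contrasts]


if __name__ == "__main__":
    means = [1.0, 2.0, 4.0]  # made-up group means
    print(apply_contrasts(contrast_all_pairs(3), means))
    print(apply_contrasts(contrast_all_vs_first(3), means))
    print(apply_contrasts(contrast_vs_mean(3), means))
```

Every row of each matrix sums to zero, so each product with the vector of means is a difference between groups (or between a group and the overall average) and is unaffected by adding a constant to all means.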

6.8.5.3. Classes

GroupsStats(x[, useranks, uni, intlab]) statistics by groups (another version)
MultiComparison(data, groups[, group_order]) Tests for multiple comparisons
SimpleTable(data[, headers, stubs, title, ...]) Produce a simple ASCII, CSV, HTML, or LaTeX table from a rectangular (2d!) array of data, not necessarily numerical.
StepDown(vals, nobs_all, var_all[, df]) a class for step down methods
TukeyHSDResults(mc_object, results_table, q_crit) Results from Tukey HSD test, with additional plot methods