6.8.5. statsmodels.sandbox.stats.multicomp
From the pystatsmodels mailing list, 2010-05-24
Notes:
- unfinished, unverified, but most parts seem to work in Monte Carlo simulations
- one example taken from lecture notes looks OK
- needs cases with non-monotonic inequality for the tests to show the difference between one-step, step-up and step-down procedures
- FDR doesn't look much better than Bonferroni in the MC examples that I tried
- update:
  - now tested against R (stats and multtest); I have all of their methods for p-value correction
  - getting Hommel right was impossible until I found a reference for its p-value correction (Wright 1992, appendix)
  - now that I have p-value correction, some of the original test implementations (reject/not reject) are not really needed anymore; I think I'll keep them for reference. The test procedure for Hommel is in the development session log
  - I haven't updated the other functions and classes in here
  - multtest has some good helper functions, according to its docs
  - still need to update the references to the real papers
  - FDR with an estimated number of true hypotheses is still missing
  - multiple comparison procedures are incomplete or missing
  - for now, I will do multiple comparisons only for the independent case, which might be conservative in the correlated case (?)
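The notes contrast one-step, step-down and step-up procedures and compare FDR with Bonferroni. The following is a minimal NumPy sketch, not this module's code, that computes adjusted p-values for Bonferroni (one-step), Holm (step-down), Hochberg (step-up) and Benjamini-Hochberg (FDR), following the formulas used by R's `p.adjust`. The function name `adjust_pvalues` and its method strings are hypothetical. The example p-values are chosen so that the third-smallest p-value fails its threshold while the largest passes, the kind of non-monotonic case in which step-down and step-up procedures disagree.

```python
import numpy as np

def adjust_pvalues(pvals, method="bonferroni"):
    """Hypothetical helper: adjusted p-values as defined by R's p.adjust.

    method: 'bonferroni' (one-step), 'holm' (step-down),
            'hochberg' (step-up), 'fdr_bh' (Benjamini-Hochberg FDR).
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ps = p[order]                     # p-values in ascending order
    i = np.arange(1, m + 1)           # ranks 1..m of the sorted p-values
    if method == "bonferroni":
        adj = ps * m
    elif method == "holm":
        # step-down: enforce monotonicity from the smallest p-value upward
        adj = np.maximum.accumulate((m - i + 1) * ps)
    elif method == "hochberg":
        # step-up: enforce monotonicity from the largest p-value downward
        adj = np.minimum.accumulate(((m - i + 1) * ps)[::-1])[::-1]
    elif method == "fdr_bh":
        adj = np.minimum.accumulate((m / i * ps)[::-1])[::-1]
    else:
        raise ValueError("unknown method: %s" % method)
    adj = np.minimum(adj, 1.0)        # adjusted p-values are capped at 1
    out = np.empty(m)
    out[order] = adj                  # back to the original input order
    return out

pvals = [0.005, 0.01, 0.03, 0.04]
for method in ("bonferroni", "holm", "hochberg", "fdr_bh"):
    n_reject = int(np.sum(adjust_pvalues(pvals, method) <= 0.05))
    print(method, n_reject)
# bonferroni 2, holm 2, hochberg 4, fdr_bh 4
```

Here the step-down Holm procedure stops at the first p-value that fails, while the step-up Hochberg procedure starts from the largest p-value and therefore rejects all four.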
Some references:
- Gibbons, Jean Dickinson and Subhabrata Chakraborti, 2003, Nonparametric Statistical Inference, Fourth Edition, Marcel Dekker
  - p.363: 10.4 The Kruskal-Wallis One-Way ANOVA Test and Multiple Comparisons
  - p.367: multiple comparisons for Kruskal-Wallis, formula used in multicomp.kruskal
- Sheskin, David J., 2004, Handbook of Parametric and Nonparametric Statistical Procedures, 3rd ed., Chapman & Hall/CRC
  - Test 21: The Single-Factor Between-Subjects Analysis of Variance
  - Test 22: The Kruskal-Wallis One-Way Analysis of Variance by Ranks Test
- Zwillinger, Daniel and Stephen Kokoska, 2000, CRC Standard Probability and Statistics Tables and Formulae, Chapman & Hall/CRC
  - 14.9 Wilcoxon Rank-Sum (Mann-Whitney) Test
- Wright, Paul, 1992, Adjusted P-Values for Simultaneous Inference, Biometrics, Vol. 48, No. 4 (Dec. 1992), pp. 1005-1013, International Biometric Society, stable URL: http://www.jstor.org/stable/2532694
  - p-value correction for Hommel in the appendix

For multiple comparisons, the new book "multiple comparison in R" (Hsu) is a good reference, but I don't have it.
Author: Josef Pktd, with an example from H Raja and a rewrite by Vincent Davis
6.8.5.1. TODO
- handle exception if empty; it shows up only sometimes when running this
  - DONE, I think
  - Traceback (most recent call last):
      File "C:\Josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\sandbox\stats\multicomp.py", line 711, in <module>
        print('sh', multipletests(tpval, alpha=0.05, method='sh')
      File "C:\Josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\sandbox\stats\multicomp.py", line 241, in multipletests
        rejectmax = np.max(np.nonzero(reject))
      File "C:\Programs\Python25\lib\site-packages\numpy\core\fromnumeric.py", line 1765, in amax
        return _wrapit(a, 'max', axis, out)
      File "C:\Programs\Python25\lib\site-packages\numpy\core\fromnumeric.py", line 37, in _wrapit
    ValueError: zero-size array to ufunc.reduce without identity
- name of function multipletests: rename to something like pvalue_correction?
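The traceback comes from calling `np.max` on the indices returned by `np.nonzero(reject)` when nothing is rejected, so the index array is empty and the reduction has no identity. Below is a minimal sketch of a step-up rejection rule with an explicit guard for that case. It assumes `method='sh'` refers to the Simes-Hochberg step-up procedure; `hochberg_reject` is a hypothetical name, and the sketch illustrates the idea of the fix rather than reproducing the module's code.

```python
import numpy as np

def hochberg_reject(pvals, alpha=0.05):
    """Hypothetical helper: Simes-Hochberg step-up test.

    Returns a boolean reject mask in the original input order.
    """
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    p_sorted = pvals[order]
    # thresholds alpha/m, alpha/(m-1), ..., alpha/1 for the ascending p-values
    thresholds = alpha / np.arange(m, 0, -1)
    below = p_sorted <= thresholds
    reject_sorted = np.zeros(m, dtype=bool)
    idx = np.nonzero(below)[0]
    # Guard: taking the max of an empty index array raises ValueError,
    # so only step up when at least one p-value is below its threshold.
    if idx.size > 0:
        # step-up: reject every hypothesis up to the largest passing rank
        reject_sorted[: idx.max() + 1] = True
    reject = np.empty(m, dtype=bool)
    reject[order] = reject_sorted
    return reject

print(hochberg_reject([0.5, 0.6, 0.9]))   # [False False False], no ValueError
```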
6.8.5.2. Functions
Tukeythreegene(first, second, third)
Tukeythreegene2(genes)
    genes is a list, i.e. [first, second, third]
catstack(args)
compare_ordered(vals, alpha)
    simple ordered sequential comparison of means
contrast_all_one(nm)
    contrast or restriction matrix for all against first comparison
contrast_allpairs(nm)
    contrast or restriction matrix for all pairs of nm variables
contrast_diff_mean(nm)
    contrast or restriction matrix for all against mean comparison
distance_st_range(mean_all, nobs_all, var_all)
    pairwise distance matrix, outsourced from tukeyhsd
ecdf(x)
    no-frills empirical cdf used in fdrcorrection
fdrcorrection0(pvals[, alpha, method, is_sorted])
    p-value correction for false discovery rate
fdrcorrection_bak(pvals[, alpha, method])
    reject hypotheses using false discovery rate correction of p-values
fdrcorrection_twostage(pvals[, alpha, ...])
    (iterated) two-stage linear step-up procedure with estimation of the number of true hypotheses
get_tukeyQcrit(k, df[, alpha])
    return critical values for Tukey's HSD (Q)
get_tukeyQcrit2(k, df[, alpha])
    return critical values for Tukey's HSD (Q)
homogeneous_subsets(vals, dcrit)
    recursively check all pairs of vals for minimum distance
maxzero(x)
    find all up zero crossings and return the index of the highest
maxzerodown(x)
    find all down zero crossings and return the index of the highest
mcfdr([nrepl, nobs, ntests, ntrue, mu, ...])
    Monte Carlo simulation to test fdrcorrection
multicontrast_pvalues(tstat, tcorr[, df, ...])
    p-values for simultaneous tests
multipletests(pvals[, alpha, method, ...])
    test results and p-value correction for multiple tests
randmvn(rho[, size, standardize])
    create random draws from an equi-correlated multivariate normal distribution
rankdata(x)
    rankdata, equivalent to scipy.stats.rankdata
rejectionline(n[, alpha])
    reference line for rejection in multiple tests
set_partition(ssli)
    extract a partition from a list of tuples
set_remove_subs(ssli)
    remove sets that are subsets of another set from a list of tuples
simultaneous_ci(q_crit, var, groupnobs[, ...])
    Compute simultaneous confidence intervals for comparison of means.
test_tukey_pvalues()
tiecorrect(xranks)
    should be equivalent to scipy.stats.tiecorrect
tukey_pvalues(std_range, nm, df)
tukeyhsd(mean_all, nobs_all, var_all[, df, ...])
    simultaneous Tukey HSD
varcorrection_pairs_unbalanced(nobs_all[, ...])
    correction factor for variance with unequal sample sizes for all pairs
varcorrection_pairs_unequal(var_all, ...)
    return joint variance from samples with unequal variances and unequal sample sizes
varcorrection_unbalanced(nobs_all[, srange])
    correction factor for variance with unequal sample sizes
varcorrection_unequal(var_all, nobs_all, df_all)
    return joint variance from samples with unequal variances and unequal sample sizes
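The descriptions of contrast_all_one, contrast_allpairs and contrast_diff_mean suggest the usual restriction matrices R, where R @ means gives the differences being tested. The following is a minimal NumPy sketch of what such matrices can look like. The helper names are hypothetical, and the sign convention (first minus second) and row order are assumptions; the module's own functions may order rows or signs differently.

```python
import numpy as np

def allpairs_contrast(nm):
    """Hypothetical helper: one row per pair (i, j), i < j, coding mean_i - mean_j."""
    rows = []
    for i in range(nm):
        for j in range(i + 1, nm):
            row = np.zeros(nm)
            row[i], row[j] = 1.0, -1.0
            rows.append(row)
    return np.array(rows)            # shape (nm * (nm - 1) / 2, nm)

def all_vs_first_contrast(nm):
    """Hypothetical helper: rows code mean_0 - mean_k for k = 1..nm-1."""
    return np.column_stack((np.ones(nm - 1), -np.eye(nm - 1)))

def diff_from_mean_contrast(nm):
    """Hypothetical helper: rows code mean_k - (average of all means)."""
    return np.eye(nm) - np.ones((nm, nm)) / nm

means = np.array([5.0, 7.0, 11.0])
print(allpairs_contrast(3) @ means)        # [-2. -6. -4.]
print(all_vs_first_contrast(3) @ means)    # [-2. -6.]
print(diff_from_mean_contrast(3) @ means)  # approx. [-2.67, -0.67, 3.33]
```

Every row sums to zero, so each row is a contrast: adding a constant to all means leaves the tested differences unchanged.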
6.8.5.3. Classes
GroupsStats(x[, useranks, uni, intlab])
    statistics by groups (another version)
MultiComparison(data, groups[, group_order])
    Tests for multiple comparisons
SimpleTable(data[, headers, stubs, title, ...])
    Produce a simple ASCII, CSV, HTML, or LaTeX table from a rectangular (2d!) array of data, not necessarily numerical.
StepDown(vals, nobs_all, var_all[, df])
    a class for step-down methods
TukeyHSDResults(mc_object, results_table, q_crit)
    Results from Tukey HSD test, with additional plot methods
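MultiComparison takes raw data and group labels, while tukeyhsd and StepDown in the tables above take per-group inputs such as mean_all, nobs_all and var_all. A minimal NumPy sketch of that reduction follows. `group_summary` is a hypothetical helper, not part of this module, and it assumes the sample variance with ddof=1 is the intended var_all.

```python
import numpy as np

def group_summary(data, groups):
    """Hypothetical helper: per-group labels, counts, means and variances.

    Variances use ddof=1; a group with a single observation gives nan.
    """
    data = np.asarray(data, dtype=float)
    # codes[k] is the position of groups[k] in the sorted array of unique labels
    labels, codes = np.unique(groups, return_inverse=True)
    nobs = np.bincount(codes)
    means = np.bincount(codes, weights=data) / nobs
    resid = data - means[codes]
    var = np.bincount(codes, weights=resid ** 2) / (nobs - 1)
    return labels, nobs, means, var

labels, nobs, means, var = group_summary([1, 2, 3, 10, 12], ["a", "a", "a", "b", "b"])
# labels ['a' 'b'], nobs [3 2], means [2, 11], variances [1, 2]

# pooled within-group variance (the one-way ANOVA mean squared error)
mse = np.sum((nobs - 1) * var) / (nobs.sum() - len(nobs))   # 4/3
```

Using `np.bincount` with weights computes all group sums in one pass, without a Python loop over groups.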