6.8.5. statsmodels.sandbox.stats.multicomp
From the pystatsmodels mailing list, 20100524
- Notes:
- unfinished, unverified, but most parts seem to work in Monte Carlo experiments
- one example taken from lecture notes looks ok
- needs cases with non-monotonic inequality for the test to see the difference between one-step, step-up and step-down procedures
- FDR doesn’t look much better than Bonferroni in the MC examples that I tried
- update:
- now tested against R (packages stats and multtest); I have all of their methods for p-value correction
- getting Hommel to work was impossible until I found a reference for its p-value correction
- now that I have p-value corrections, some of the original test implementations (rej/norej) are not really needed anymore; I think I will keep them for reference. The test procedure for Hommel is in the development session log
- I haven’t updated other functions and classes in here.
- multtest has some good helper functions according to its docs
- still need to update references, the real papers
- FDR with an estimated number of true hypotheses is still missing
- multiple comparison procedures are incomplete or missing
- for now, I will implement multiple comparison only for the independent case, which might be conservative in the correlated case (?).
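The notes above distinguish one-step, step-down and step-up procedures, and they compare FDR with Bonferroni. The sketch below shows the corresponding adjusted p-values. It is a minimal NumPy version, not this module's implementation. The function names bonferroni_adjust, holm_adjust and bh_adjust are hypothetical. Holm (step-down) and Benjamini-Hochberg (step-up FDR) are assumed here as representative procedures.

```python
import numpy as np


def bonferroni_adjust(pvals):
    """One-step Bonferroni: multiply every p-value by the number of tests."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def holm_adjust(pvals):
    """Holm step-down: the i-th smallest p-value (0-based) is multiplied by m - i."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    stepped = (m - np.arange(m)) * p[order]
    # running maximum keeps adjusted p-values non-decreasing in sorted order
    stepped = np.minimum(np.maximum.accumulate(stepped), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = stepped  # map back to the input order
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg step-up FDR: scale by m / rank, then a running minimum from the top."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    stepped = p[order] * m / np.arange(1, m + 1)
    # running minimum starting from the largest p-value downward
    stepped = np.minimum.accumulate(stepped[::-1])[::-1]
    stepped = np.minimum(stepped, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = stepped
    return adjusted


# illustrative p-values, not taken from any study
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])
alpha = 0.05
for name, adjust in [("bonferroni", bonferroni_adjust), ("holm", holm_adjust), ("fdr_bh", bh_adjust)]:
    adjusted = adjust(pvals)
    print(name, np.round(adjusted, 4), "rejections:", int((adjusted <= alpha).sum()))
```

With these illustrative p-values at alpha = 0.05, Bonferroni and Holm each reject one hypothesis, and Benjamini-Hochberg rejects two.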
Some references:
Gibbons, Jean Dickinson and Subhabrata Chakraborti, 2003, Nonparametric Statistical Inference, Fourth Edition, Marcel Dekker
p.363: 10.4 THE KRUSKAL-WALLIS ONE-WAY ANOVA TEST AND MULTIPLE COMPARISONS
p.367: multiple comparison for kruskal, formula used in multicomp.kruskal
Sheskin, David J., 2004, Handbook of Parametric and Nonparametric Statistical Procedures, 3rd ed., Chapman & Hall/CRC
Test 21: The Single-Factor Between-Subjects Analysis of Variance
Test 22: The Kruskal-Wallis One-Way Analysis of Variance by Ranks Test
Zwillinger, Daniel and Stephen Kokoska, 2000, CRC standard probability and statistics tables and formulae, Chapman & Hall/CRC
14.9 WILCOXON RANKSUM (MANN WHITNEY) TEST
Wright, Paul, 1992, Adjusted P-Values for Simultaneous Inference, Biometrics, Vol. 48, No. 4 (Dec., 1992), pp. 1005-1013, International Biometric Society, Stable URL: http://www.jstor.org/stable/2532694
(p-value correction for Hommel in appendix)
For multiple comparisons:
new book “multiple comparison in R”
Hsu is a good reference, but I don’t have it.
Author: Josef Pktd, with an example from H Raja and a rewrite by Vincent Davis
6.8.5.1. TODO
- handle the exception if empty (no hypothesis rejected); it shows up only sometimes when running this
- DONE I think
- Traceback (most recent call last):
    File "C:\Josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\sandbox\stats\multicomp.py", line 711, in <module>
      print('sh', multipletests(tpval, alpha=0.05, method='sh')
    File "C:\Josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\sandbox\stats\multicomp.py", line 241, in multipletests
      rejectmax = np.max(np.nonzero(reject))
    File "C:\Programs\Python25\lib\site-packages\numpy\core\fromnumeric.py", line 1765, in amax
      return _wrapit(a, 'max', axis, out)
    File "C:\Programs\Python25\lib\site-packages\numpy\core\fromnumeric.py", line 37, in _wrapit
  ValueError: zero-size array to ufunc.reduce without identity
- rename the function multipletests to something like pvalue_correction?
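The traceback above comes from taking np.max over the indices of an empty rejection set. The sketch below shows one way to guard a step-up rule against that case. It assumes the Benjamini-Hochberg critical line alpha * i / m. stepup_reject is a hypothetical helper and is not the fix actually used in multipletests.

```python
import numpy as np


def stepup_reject(pvals, alpha=0.05):
    """Step-up rejection (BH critical line, assumed) that tolerates zero rejections."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    crit = alpha * np.arange(1, m + 1) / m   # assumed BH critical values for sorted p-values
    below = p[order] <= crit
    reject_sorted = np.zeros(m, dtype=bool)
    idx = np.nonzero(below)[0]
    if idx.size > 0:                         # np.max on an empty index array raises ValueError
        # step-up: reject everything up to the largest sorted p-value below the line
        reject_sorted[: idx.max() + 1] = True
    reject = np.empty(m, dtype=bool)
    reject[order] = reject_sorted            # map back to the input order
    return reject


print(stepup_reject([0.9, 0.8, 0.7]))        # no rejections, and no exception
print(stepup_reject([0.045, 0.01, 0.04]))    # all rejected, although 0.04 lies above its critical value
```

The second call shows the kind of non-monotonic case the notes ask for. A step-down rule would stop at 0.04, while the step-up rule still rejects it because a larger p-value (0.045) falls below its critical value.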
6.8.5.2. Functions
Tukeythreegene(first, second, third) |
Tukeythreegene2(genes) | genes is a list, i.e. [first, second, third]
catstack(args) |
compare_ordered(vals, alpha) | simple ordered sequential comparison of means
contrast_all_one(nm) | contrast or restriction matrix for all against first comparison
contrast_allpairs(nm) | contrast or restriction matrix for all pairs of nm variables
contrast_diff_mean(nm) | contrast or restriction matrix for all against mean comparison
distance_st_range(mean_all, nobs_all, var_all) | pairwise distance matrix, outsourced from tukeyhsd
ecdf(x) | no-frills empirical cdf used in fdrcorrection
fdrcorrection0(pvals[, alpha, method, is_sorted]) | p-value correction for false discovery rate
fdrcorrection_bak(pvals[, alpha, method]) | rejection decisions from false discovery rate correction of p-values
fdrcorrection_twostage(pvals[, alpha, ...]) | (iterated) two-stage linear step-up procedure with estimation of the number of true hypotheses
get_tukeyQcrit(k, df[, alpha]) | return critical values for Tukey’s HSD (Q)
get_tukeyQcrit2(k, df[, alpha]) | return critical values for Tukey’s HSD (Q)
homogeneous_subsets(vals, dcrit) | recursively check all pairs of vals for minimum distance
maxzero(x) | find all up zero crossings and return the index of the highest
maxzerodown(x) | find all down zero crossings and return the index of the highest
mcfdr([nrepl, nobs, ntests, ntrue, mu, ...]) | Monte Carlo to test fdrcorrection
multicontrast_pvalues(tstat, tcorr[, df, ...]) | p-values for simultaneous tests
multipletests(pvals[, alpha, method, ...]) | test results and p-value correction for multiple tests
randmvn(rho[, size, standardize]) | create random draws from an equi-correlated multivariate normal distribution
rankdata(x) | rankdata, equivalent to scipy.stats.rankdata
rejectionline(n[, alpha]) | reference line for rejection in multiple tests
set_partition(ssli) | extract a partition from a list of tuples
set_remove_subs(ssli) | remove sets that are subsets of another set from a list of tuples
simultaneous_ci(q_crit, var, groupnobs[, ...]) | Compute simultaneous confidence intervals for comparison of means.
test_tukey_pvalues() |
tiecorrect(xranks) | should be equivalent to scipy.stats.tiecorrect
tukey_pvalues(std_range, nm, df) |
tukeyhsd(mean_all, nobs_all, var_all[, df, ...]) | simultaneous Tukey HSD
varcorrection_pairs_unbalanced(nobs_all[, ...]) | correction factor for variance with unequal sample sizes for all pairs
varcorrection_pairs_unequal(var_all, ...) | return joint variance from samples with unequal variances and unequal sample sizes
varcorrection_unbalanced(nobs_all[, srange]) | correction factor for variance with unequal sample sizes
varcorrection_unequal(var_all, nobs_all, df_all) | return joint variance from samples with unequal variances and unequal sample sizes
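The helpers contrast_allpairs, contrast_all_one and contrast_diff_mean build contrast or restriction matrices R. Each row of R @ means is then one comparison. The sketch below shows what such matrices can look like for nm group means. The sketch_ functions are hypothetical stand-ins. The sign convention (earlier group minus later group) is an assumption and not necessarily the module's.

```python
from itertools import combinations

import numpy as np


def sketch_contrast_allpairs(nm):
    """One row per pair (i, j) with i < j: +1 for group i, -1 for group j."""
    rows = []
    for i, j in combinations(range(nm), 2):
        row = np.zeros(nm)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
    return np.array(rows)


def sketch_contrast_all_one(nm):
    """Group 0 compared with each of groups 1..nm-1 (all against first)."""
    return np.column_stack((np.ones(nm - 1), -np.eye(nm - 1)))


def sketch_contrast_diff_mean(nm):
    """Each group compared with the mean of all nm groups."""
    return np.eye(nm) - np.ones((nm, nm)) / nm


means = np.array([5.0, 7.0, 6.0, 9.0])              # illustrative group means
print(sketch_contrast_allpairs(4) @ means)          # the 6 pairwise differences
print(sketch_contrast_all_one(4) @ means)           # first group minus each other group
print(sketch_contrast_diff_mean(4) @ means)         # deviations from the mean of the means
```

Every row of these matrices sums to zero. That is what makes each row a contrast: it is unaffected by a common shift of all group means.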
6.8.5.3. Classes
GroupsStats(x[, useranks, uni, intlab]) | statistics by groups (another version)
MultiComparison(data, groups[, group_order]) | Tests for multiple comparisons
SimpleTable(data[, headers, stubs, title, ...]) | Produce a simple ASCII, CSV, HTML, or LaTeX table from a rectangular (2d!) array of data, not necessarily numerical.
StepDown(vals, nobs_all, var_all[, df]) | a class for step down methods
TukeyHSDResults(mc_object, results_table, q_crit) | Results from Tukey HSD test, with additional plot methods
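tukeyhsd, simultaneous_ci and MultiComparison all work with pairwise mean differences scaled by the studentized range. The sketch below gives a rough Tukey-Kramer version of those intervals, diff ± q_crit * sqrt(mse / 2 * (1/n_i + 1/n_j)). It assumes the caller supplies a pooled within-group variance mse and a critical value q_crit. q_crit could come, for example, from get_tukeyQcrit or, in recent SciPy versions, scipy.stats.studentized_range.ppf. tukey_kramer_intervals is a hypothetical function, not the module's code.

```python
from itertools import combinations

import numpy as np


def tukey_kramer_intervals(means, nobs, mse, q_crit):
    """Simultaneous intervals for all pairwise differences of group means.

    means, nobs : per-group sample means and sample sizes
    mse         : pooled within-group variance (one-way ANOVA mean squared error)
    q_crit      : studentized range critical value for k groups and the error df
    """
    means = np.asarray(means, dtype=float)
    nobs = np.asarray(nobs, dtype=float)
    results = []
    for i, j in combinations(range(means.size), 2):
        diff = float(means[i] - means[j])
        # Tukey-Kramer standard error, allowing unequal sample sizes
        se = np.sqrt(mse / 2.0 * (1.0 / nobs[i] + 1.0 / nobs[j]))
        half_width = float(q_crit * se)
        results.append((i, j, diff, diff - half_width, diff + half_width,
                        bool(abs(diff) > half_width)))
    return results


# illustrative inputs; q_crit=3.5 is a placeholder, not a looked-up table value
for row in tukey_kramer_intervals([5.0, 9.0, 6.0], [10, 10, 12], mse=4.0, q_crit=3.5):
    print(row)  # (i, j, diff, lower, upper, reject)
```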