6.8.5. statsmodels.sandbox.stats.multicomp

from the pystatsmodels mailing list, 20100524

Notes:

unfinished, unverified, but most parts seem to work in Monte Carlo runs
one example taken from lecture notes looks ok
needs test cases with non-monotonic inequalities to see the difference between one-step, step-up and step-down procedures
FDR doesn’t look really better than Bonferroni in the MC examples that I tried
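The kind of case the third note asks for can be sketched in plain Python. This is a minimal illustration, not the module's implementation: it assumes Bonferroni as the one-step procedure, Holm as the step-down procedure and Hochberg as the step-up procedure, and the function names are hypothetical.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """One-step: every p-value is compared with the same threshold alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm_reject(pvals, alpha=0.05):
    """Step-down: start at the smallest p-value, stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        if pvals[i] > alpha / (m - rank):
            break  # this hypothesis and all with larger p-values are retained
        reject[i] = True
    return reject


def hochberg_reject(pvals, alpha=0.05):
    """Step-up: start at the largest p-value, reject all from the first success."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank in range(m - 1, -1, -1):  # rank m - 1 is the largest p-value
        if pvals[order[rank]] <= alpha / (m - rank):
            for i in order[:rank + 1]:
                reject[i] = True
            break
    return reject


# Illustrative p-values chosen so that the three procedures disagree.
pvals = [0.03, 0.01, 0.045, 0.015]
print(bonferroni_reject(pvals))  # [False, True, False, False]
print(holm_reject(pvals))        # [False, True, False, True]
print(hochberg_reject(pvals))    # [True, True, True, True]
```

The sorted p-values 0.01, 0.015, 0.03, 0.045 pass the Holm thresholds only up to the second one, while the largest p-value already passes the Hochberg threshold, so the step-up procedure rejects everything.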

update:

now tested against the R packages stats and multtest; I have all of their methods for p-value correction
getting Hommel was impossible until I found a reference for the p-value correction
now that I have the p-value correction, some of the original test (rej/norej) implementations are not really needed anymore. I think I will keep them for reference. The test procedure for Hommel is in the development session log
I haven’t updated the other functions and classes in here
multtest has some good helper functions according to its docs
still need to update references, the real papers
fdr with estimated true hypothesis still missing
multiple comparison procedures incomplete or missing
for now I will do multiple comparisons only for the independent case, which might be conservative in the correlated case (?).
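As a rough illustration of what "p-value correction" means in the notes above, here is a pure-Python sketch of adjusted p-values for three common methods. It follows the textbook formulas (Bonferroni, Holm step-down, Benjamini-Hochberg step-up) and is not the code of `multipletests`; the function name and the method labels are assumptions made for this example.

```python
def adjust_pvalues(pvals, method="holm"):
    """Return adjusted p-values in the original order of pvals (sketch only)."""
    m = len(pvals)
    if method == "bonferroni":
        # One-step: scale every p-value by the number of tests.
        return [min(1.0, m * p) for p in pvals]

    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m

    if method == "holm":
        # Step-down: cumulative maximum of (m - rank) * p from the smallest up.
        running = 0.0
        for rank, i in enumerate(order):
            running = max(running, (m - rank) * pvals[i])
            adjusted[i] = min(1.0, running)
        return adjusted

    if method == "fdr_bh":
        # Step-up: cumulative minimum of m / (rank + 1) * p from the largest down.
        running = 1.0
        for rank in range(m - 1, -1, -1):
            i = order[rank]
            running = min(running, m / (rank + 1) * pvals[i])
            adjusted[i] = running
        return adjusted

    raise ValueError("unknown method: %r" % (method,))


pvals = [0.01, 0.04, 0.03, 0.005]
for name in ("bonferroni", "holm", "fdr_bh"):
    print(name, adjust_pvalues(pvals, method=name))
```

With adjusted p-values available, a reject/no-reject decision is just `p_adjusted <= alpha`, which is why a separate rej/norej implementation becomes redundant.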

some References:

Gibbons, Jean Dickinson and Chakraborti Subhabrata, 2003, Nonparametric Statistical Inference, Fourth Edition, Marcel Dekker

p.363: 10.4 THE KRUSKAL-WALLIS ONE-WAY ANOVA TEST AND MULTIPLE COMPARISONS
p.367: multiple comparison for kruskal formula used in multicomp.kruskal

Sheskin, David J., 2004, Handbook of Parametric and Nonparametric Statistical Procedures, 3rd ed., Chapman&Hall/CRC

Test 21: The Single-Factor Between-Subjects Analysis of Variance
Test 22: The Kruskal-Wallis One-Way Analysis of Variance by Ranks Test

Zwillinger, Daniel and Stephen Kokoska, 2000, CRC standard probability and statistics tables and formulae, Chapman&Hall/CRC

14.9 WILCOXON RANKSUM (MANN WHITNEY) TEST

Wright, Paul, 1992, Adjusted P-Values for Simultaneous Inference, Biometrics, Vol. 48, No. 4 (Dec.), pp. 1005-1013, International Biometric Society. Stable URL: http://www.jstor.org/stable/2532694

(p-value correction for Hommel in appendix)

for multicomparison

new book “multiple comparison in R”
Hsu is a good reference, but I don’t have it.

Author: Josef Pktd and example from H Raja and rewrite from Vincent Davis

6.8.5.1. TODO

handle exception if empty, shows up only sometimes when running this

DONE I think

Traceback (most recent call last):
  File "C:\Josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\sandbox\stats\multicomp.py", line 711, in <module>
    print('sh', multipletests(tpval, alpha=0.05, method='sh')
  File "C:\Josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\sandbox\stats\multicomp.py", line 241, in multipletests
    rejectmax = np.max(np.nonzero(reject))
  File "C:\Programs\Python25\lib\site-packages\numpy\core\fromnumeric.py", line 1765, in amax
    return _wrapit(a, 'max', axis, out)
  File "C:\Programs\Python25\lib\site-packages\numpy\core\fromnumeric.py", line 37, in _wrapit
    result = getattr(asarray(obj),method)(*args, **kwds)
ValueError: zero-size array to ufunc.reduce without identity

name of function multipletests, rename to something like pvalue_correction?
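The failure in the first TODO item comes from taking the maximum of an empty index set, which happens when no hypothesis is rejected. A plain-Python analogue of that situation, with one possible guard, might look like the sketch below. The helper name is hypothetical, and this is not the fix that was applied in the module.

```python
def last_rejected_index(reject):
    """Return the index of the last True entry, or None if nothing is rejected."""
    indices = [i for i, flag in enumerate(reject) if flag]
    # max() on an empty sequence raises ValueError, the same kind of failure
    # as the zero-size array in the traceback; `default` avoids it.
    return max(indices, default=None)


print(last_rejected_index([True, False, True, False]))  # 2
print(last_rejected_index([False, False, False]))       # None
```

Returning `None` (or any sentinel) forces the caller to handle the "no rejections" case explicitly instead of hitting the exception only on some random draws.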

6.8.5.2. Functions

`Tukeythreegene`(first, second, third)
`Tukeythreegene2`(genes)	genes is a list, i.e. [first, second, third]
`catstack`(args)
`compare_ordered`(vals, alpha)	simple ordered sequential comparison of means
`contrast_all_one`(nm)	contrast or restriction matrix for all against first comparison
`contrast_allpairs`(nm)	contrast or restriction matrix for all pairs of nm variables
`contrast_diff_mean`(nm)	contrast or restriction matrix for all against mean comparison
`distance_st_range`(mean_all, nobs_all, var_all)	pairwise distance matrix, outsourced from tukeyhsd
`ecdf`(x)	no frills empirical cdf used in fdrcorrection
`fdrcorrection0`(pvals[, alpha, method, is_sorted])	pvalue correction for false discovery rate
`fdrcorrection_bak`(pvals[, alpha, method])	Reject False discovery rate correction for pvalues
`fdrcorrection_twostage`(pvals[, alpha, ...])	(iterated) two stage linear step-up procedure with estimation of number of true
`get_tukeyQcrit`(k, df[, alpha])	return critical values for Tukey’s HSD (Q)
`get_tukeyQcrit2`(k, df[, alpha])	return critical values for Tukey’s HSD (Q)
`homogeneous_subsets`(vals, dcrit)	recursively check all pairs of vals for minimum distance
`maxzero`(x)	find all up zero crossings and return the index of the highest
`maxzerodown`(x)	find all up zero crossings and return the index of the highest
`mcfdr`([nrepl, nobs, ntests, ntrue, mu, ...])	MonteCarlo to test fdrcorrection
`multicontrast_pvalues`(tstat, tcorr[, df, ...])	pvalues for simultaneous tests
`multipletests`(pvals[, alpha, method, ...])	test results and p-value correction for multiple tests
`randmvn`(rho[, size, standardize])	create random draws from equi-correlated multivariate normal distribution
`rankdata`(x)	rankdata, equivalent to scipy.stats.rankdata
`rejectionline`(n[, alpha])	reference line for rejection in multiple tests
`set_partition`(ssli)	extract a partition from a list of tuples
`set_remove_subs`(ssli)	remove sets that are subsets of another set from a list of tuples
`simultaneous_ci`(q_crit, var, groupnobs[, ...])	Compute simultaneous confidence intervals for comparison of means.
`test_tukey_pvalues`()
`tiecorrect`(xranks)	should be equivalent of scipy.stats.tiecorrect
`tukey_pvalues`(std_range, nm, df)
`tukeyhsd`(mean_all, nobs_all, var_all[, df, ...])	simultaneous Tukey HSD
`varcorrection_pairs_unbalanced`(nobs_all[, ...])	correction factor for variance with unequal sample sizes for all pairs
`varcorrection_pairs_unequal`(var_all, ...)	return joint variance from samples with unequal variances and unequal
`varcorrection_unbalanced`(nobs_all[, srange])	correction factor for variance with unequal sample sizes
`varcorrection_unequal`(var_all, nobs_all, df_all)	return joint variance from samples with unequal variances and unequal
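To make the "contrast or restriction matrix" entries in the table more concrete, the following sketch builds two such matrices as nested lists. The function names are hypothetical, and the sign convention (+1 for the first group of a pair, -1 for the second) is an assumption for illustration; the module's own functions may order or sign the rows differently.

```python
def all_pairs_contrast(nm):
    """One row per pair (i, j) with i < j: +1 in column i, -1 in column j."""
    rows = []
    for i in range(nm):
        for j in range(i + 1, nm):
            row = [0] * nm
            row[i] = 1
            row[j] = -1
            rows.append(row)
    return rows


def all_against_first_contrast(nm):
    """One row per group j > 0: +1 in column 0, -1 in column j."""
    rows = []
    for j in range(1, nm):
        row = [0] * nm
        row[0] = 1
        row[j] = -1
        rows.append(row)
    return rows


print(all_pairs_contrast(3))          # [[1, -1, 0], [1, 0, -1], [0, 1, -1]]
print(all_against_first_contrast(3))  # [[1, -1, 0], [1, 0, -1]]
```

Multiplying such a matrix by a vector of group means gives the vector of mean differences being tested; each row sums to zero, so a common shift of all means leaves the contrasts unchanged.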

6.8.5.3. Classes

`GroupsStats`(x[, useranks, uni, intlab])	statistics by groups (another version)
`MultiComparison`(data, groups[, group_order])	Tests for multiple comparisons
`SimpleTable`(data[, headers, stubs, title, ...])	Produce a simple ASCII, CSV, HTML, or LaTeX table from a rectangular (2d!) array of data, not necessarily numerical.
`StepDown`(vals, nobs_all, var_all[, df])	a class for step down methods
`TukeyHSDResults`(mc_object, results_table, q_crit)	Results from Tukey HSD test, with additional plot methods