6.3.6.2.9. statsmodels.sandbox.distributions.gof_new.ks_2samp

statsmodels.sandbox.distributions.gof_new.ks_2samp(data1, data2)

Computes the Kolmogorov-Smirnov statistic on 2 samples.

This is a two-sided test for the null hypothesis that 2 independent samples are drawn from the same continuous distribution.

Parameters:

data1, data2 : sequence of 1-D ndarrays

Two arrays of sample observations assumed to be drawn from a continuous distribution. The sample sizes can be different.

Returns:

D : float

KS statistic

p-value : float

two-tailed p-value

Notes

This tests whether 2 samples are drawn from the same distribution. Note that, as in the one-sample K-S test, the distribution is assumed to be continuous.

This is the two-sided test; one-sided tests are not implemented. The p-value is computed from the two-sided asymptotic Kolmogorov-Smirnov distribution.
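A minimal sketch of how such a statistic and its asymptotic p-value can be computed is given here for illustration. The function name ks_2samp_sketch is hypothetical, and the plain asymptotic formula is an assumption: the library function may apply a small-sample correction, so its p-values can differ slightly from this sketch.

```python
import numpy as np
from scipy import special


def ks_2samp_sketch(data1, data2):
    """Illustrative two-sample K-S statistic with an asymptotic p-value.

    Hypothetical helper, not the statsmodels implementation.
    """
    data1 = np.sort(np.asarray(data1, dtype=float))
    data2 = np.sort(np.asarray(data2, dtype=float))
    n1, n2 = len(data1), len(data2)
    pooled = np.concatenate([data1, data2])

    # Empirical CDFs of both samples, evaluated at every pooled observation.
    cdf1 = np.searchsorted(data1, pooled, side="right") / n1
    cdf2 = np.searchsorted(data2, pooled, side="right") / n2

    # D is the largest absolute gap between the two empirical CDFs.
    d = np.max(np.abs(cdf1 - cdf2))

    # Effective sample size factor for the asymptotic distribution.
    en = np.sqrt(n1 * n2 / (n1 + n2))

    # scipy.special.kolmogorov is the survival function of the
    # Kolmogorov distribution (assumption: no small-sample correction).
    p_value = special.kolmogorov(en * d)
    return d, p_value
```

Two samples with no overlap give D = 1, and two identical samples give D = 0 with a p-value of 1.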

If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.
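The decision rule described above can be sketched as follows. This is only an illustration: it uses scipy.stats.ks_2samp, as the Examples section does, and the significance level alpha, the seed, and the sample parameters are arbitrary choices made for this sketch.

```python
import numpy as np
from scipy.stats import ks_2samp

# Two samples from clearly different normal distributions (illustrative values).
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=200)
sample_b = rng.normal(loc=3.0, scale=1.0, size=300)

alpha = 0.05  # significance level, chosen before looking at the data

d, p_value = ks_2samp(sample_a, sample_b)

# Reject the null hypothesis of a common distribution only if the
# p-value falls below the chosen significance level.
reject = p_value < alpha
print("D = %.4f, p-value = %.3g, reject H0: %s" % (d, p_value, reject))
```

A failure to reject does not show that the two distributions are identical; it only means the samples provide no strong evidence of a difference at the chosen level.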

Examples

>>> from scipy import stats
>>> import numpy as np
>>> from scipy.stats import ks_2samp
>>> # fix the random seed to get the same result
>>> np.random.seed(12345678)
>>> n1 = 200  # size of first sample
>>> n2 = 300  # size of second sample

For samples from different distributions, we can reject the null hypothesis since the p-value is below 1%:

>>> rvs1 = stats.norm.rvs(size=n1, loc=0., scale=1)
>>> rvs2 = stats.norm.rvs(size=n2, loc=0.5, scale=1.5)
>>> ks_2samp(rvs1, rvs2)
(0.20833333333333337, 4.6674975515806989e-005)

For samples from slightly different distributions, we cannot reject the null hypothesis at an alpha of 10% or lower, since the p-value of 0.144 is higher than 10%:

>>> rvs3 = stats.norm.rvs(size=n2, loc=0.01, scale=1.0)
>>> ks_2samp(rvs1, rvs3)
(0.10333333333333333, 0.14498781825751686)

For samples from identical distributions, we cannot reject the null hypothesis since the p-value is high, at 41%:

>>> rvs4 = stats.norm.rvs(size=n2, loc=0.0, scale=1.0)
>>> ks_2samp(rvs1, rvs4)
(0.07999999999999996, 0.41126949729859719)