Discrete Statistical Distributions

Discrete random variables take on only a countable number of values. The commonly used distributions are included in SciPy and described in this document. Each discrete distribution can take one extra integer parameter: L. The relationship between the general distribution p and the standard distribution p0 is

p\left(x\right)=p_{0}\left(x-L\right)

which allows for shifting of the input. When a distribution generator is initialized, the discrete distribution can either specify the beginning and ending (integer) values a and b which must be such that

p_{0}\left(x\right)=0\quad x<a\textrm{ or }x>b

in which case, it is assumed that the pmf is specified on the integers a+mk\leq b where k is a non-negative integer ( 0,1,2,\ldots ) and m is a positive integer multiplier. Alternatively, the two lists x_{k} and p\left(x_{k}\right) can be provided directly, in which case a dictionary is set up internally to evaluate probabilities and generate random variates.
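Both construction styles can be sketched with `scipy.stats`: a minimal example, where the shifted distribution (Poisson, via `loc`) and the particular `xk`/`pk` lists are illustrative choices, not anything prescribed by the text.

```python
import numpy as np
from scipy import stats

# Style 1: a standard distribution shifted by the integer parameter L,
# using loc=L so that p(x) = p0(x - L).
L = 3
p0 = stats.poisson(mu=2.0)
p = stats.poisson(mu=2.0, loc=L)
assert np.isclose(p.pmf(5), p0.pmf(5 - L))

# Style 2: supply the lists xk and p(xk) directly; SciPy builds the
# lookup dictionary internally.
xk = [0, 2, 4]
pk = [0.2, 0.5, 0.3]
custom = stats.rv_discrete(values=(xk, pk))
assert np.isclose(custom.pmf(2), 0.5)
assert custom.pmf(1) == 0.0  # off-support integers get probability 0
```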

Probability Mass Function (PMF)

The probability mass function of a random variable X is defined as the probability that the random variable takes on a particular value.

p\left(x_{k}\right)=P\left[X=x_{k}\right]

This is also sometimes called the probability density function, although technically

f\left(x\right)=\sum_{k}p\left(x_{k}\right)\delta\left(x-x_{k}\right)

is the probability density function for a discrete distribution [1] .

[1] Note that we will be using p to represent the probability mass function and a parameter (a probability). The usage should be obvious from context.
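As a quick illustration, the PMF of a Binomial(10, 0.5) variable evaluated over its support; the distribution and parameters here are arbitrary examples.

```python
import numpy as np
from scipy import stats

# p(x_k) = P[X = x_k] for a Binomial(n=10, p=0.5) variable
n, pr = 10, 0.5
x = np.arange(n + 1)
pmf = stats.binom.pmf(x, n, pr)
assert np.isclose(pmf.sum(), 1.0)          # probabilities sum to one
assert np.isclose(pmf[5], 252 * 0.5**10)   # C(10, 5) = 252
```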

Cumulative Distribution Function (CDF)

The cumulative distribution function is

F\left(x\right)=P\left[X\leq x\right]=\sum_{x_{k}\leq x}p\left(x_{k}\right)

and is also useful to compute. Note that

F\left(x_{k}\right)-F\left(x_{k-1}\right)=p\left(x_{k}\right)
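Both identities can be checked numerically; a small sketch using an arbitrary Binomial(10, 0.3) example:

```python
import numpy as np
from scipy import stats

# F(x) = sum over x_k <= x of p(x_k), so the CDF on the support
# equals the cumulative sum of the PMF.
dist = stats.binom(10, 0.3)
x = np.arange(11)
assert np.allclose(dist.cdf(x), np.cumsum(dist.pmf(x)))

# F(x_k) - F(x_{k-1}) = p(x_k)
k = 4
assert np.isclose(dist.cdf(k) - dist.cdf(k - 1), dist.pmf(k))
```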

Survival Function

The survival function is just

S\left(x\right)=1-F\left(x\right)=P\left[X>x\right]

the probability that the random variable is strictly larger than x .
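In SciPy this is exposed as the `sf` method; a one-line check with an arbitrary Poisson(4) example (computing the survival function directly is usually more accurate in the tail than 1 - cdf):

```python
import numpy as np
from scipy import stats

# S(x) = 1 - F(x) = P[X > x]
dist = stats.poisson(4.0)
k = 6
assert np.isclose(dist.sf(k), 1.0 - dist.cdf(k))
```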

Percent Point Function (Inverse CDF)

The percent point function is the inverse of the cumulative distribution function and is

G\left(q\right)=F^{-1}\left(q\right)

For discrete distributions, this must be modified for cases where there is no x_{k} such that F\left(x_{k}\right)=q. In these cases we choose G\left(q\right) to be the smallest value x_{k}=G\left(q\right) for which F\left(x_{k}\right)\geq q . If q=0 then we define G\left(0\right)=a-1 . This definition allows random variates to be defined in the same way as with continuous rv's, using the inverse cdf on a uniform distribution to generate random variates.
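A sketch of these conventions with an arbitrary Binomial(10, 0.5) example, including inverse-CDF sampling from uniform draws:

```python
import numpy as np
from scipy import stats

dist = stats.binom(10, 0.5)
q = 0.7
xq = dist.ppf(q)
# G(q) is the smallest x_k with F(x_k) >= q
assert dist.cdf(xq) >= q
assert dist.cdf(xq - 1) < q

# G(0) = a - 1: for the binomial the support starts at a = 0
assert dist.ppf(0.0) == -1.0

# Random variates via the inverse CDF applied to uniform draws
rng = np.random.default_rng(0)
samples = dist.ppf(rng.uniform(size=1000))
assert abs(samples.mean() - 5.0) < 0.3  # mean of Binomial(10, 0.5) is 5
```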

Inverse survival function

The inverse survival function is the inverse of the survival function

Z\left(\alpha\right)=S^{-1}\left(\alpha\right)=G\left(1-\alpha\right)

and is thus the smallest non-negative integer k for which F\left(k\right)\geq1-\alpha or the smallest non-negative integer k for which S\left(k\right)\leq\alpha .
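SciPy exposes this as `isf`; a quick check of the identity and the two characterizations, using an arbitrary Poisson(3) example at the 5% level:

```python
import numpy as np
from scipy import stats

dist = stats.poisson(3.0)
alpha = 0.05
k = dist.isf(alpha)
assert np.isclose(k, dist.ppf(1.0 - alpha))  # Z(alpha) = G(1 - alpha)
assert dist.sf(k) <= alpha                   # S(k) <= alpha
assert dist.cdf(k) >= 1.0 - alpha            # F(k) >= 1 - alpha
```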

Hazard functions

If desired, the hazard function and the cumulative hazard function could be defined as

h\left(x_{k}\right)=\frac{p\left(x_{k}\right)}{1-F\left(x_{k}\right)}

and

H\left(x\right)=\sum_{x_{k}\leq x}h\left(x_{k}\right)=\sum_{x_{k}\leq x}\frac{F\left(x_{k}\right)-F\left(x_{k-1}\right)}{1-F\left(x_{k}\right)}.
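SciPy does not expose hazard methods for discrete distributions directly, but both functions follow from `pmf` and `sf`; a sketch on an arbitrary Poisson(2) example:

```python
import numpy as np
from scipy import stats

dist = stats.poisson(2.0)
x = np.arange(15)

# h(x_k) = p(x_k) / (1 - F(x_k)); note that 1 - F(x_k) = S(x_k)
h = dist.pmf(x) / dist.sf(x)

# Cumulative hazard H(x) = sum over x_k <= x of h(x_k)
H = np.cumsum(h)

assert np.all(h >= 0)
assert np.all(np.diff(H) >= 0)  # H is non-decreasing
```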

Moments

Non-central moments are defined using the PMF

\mu_{m}^{\prime}=E\left[X^{m}\right]=\sum_{k}x_{k}^{m}p\left(x_{k}\right).

Central moments are computed similarly, with \mu=\mu_{1}^{\prime} :

\mu_{m}=E\left[\left(X-\mu\right)^{m}\right]=\sum_{k}\left(x_{k}-\mu\right)^{m}p\left(x_{k}\right)=\sum_{k=0}^{m}\left(-1\right)^{k}\left(\begin{array}{c} m\\ k\end{array}\right)\mu^{k}\mu_{m-k}^{\prime}

The mean is the first moment

\mu=\mu_{1}^{\prime}=E\left[X\right]=\sum_{k}x_{k}p\left(x_{k}\right)

the variance is the second central moment

\mu_{2}=E\left[\left(X-\mu\right)^{2}\right]=\sum_{x_{k}}x_{k}^{2}p\left(x_{k}\right)-\mu^{2}.

Skewness is defined as

\gamma_{1}=\frac{\mu_{3}}{\mu_{2}^{3/2}}

while (Fisher) kurtosis is

\gamma_{2}=\frac{\mu_{4}}{\mu_{2}^{2}}-3,

so that a normal distribution has a kurtosis of zero.
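The moment sums above can be evaluated directly from the PMF and checked against SciPy's built-in `stats` method; the Binomial(20, 0.3) example here is an arbitrary choice:

```python
import numpy as np
from scipy import stats

n, p = 20, 0.3
x = np.arange(n + 1)
pmf = stats.binom.pmf(x, n, p)

mu = np.sum(x * pmf)                   # mean: first non-central moment
mu2 = np.sum(x**2 * pmf) - mu**2       # variance: second central moment
mu3 = np.sum((x - mu)**3 * pmf)
mu4 = np.sum((x - mu)**4 * pmf)
g1 = mu3 / mu2**1.5                    # skewness
g2 = mu4 / mu2**2 - 3                  # Fisher kurtosis

mean, var, skew, kurt = stats.binom.stats(n, p, moments='mvsk')
assert np.allclose([mu, mu2, g1, g2], [mean, var, skew, kurt])
```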

Moment generating function

The moment generating function is defined as

M_{X}\left(t\right)=E\left[e^{Xt}\right]=\sum_{x_{k}}e^{x_{k}t}p\left(x_{k}\right)

Moments are found as the derivatives of the moment generating function evaluated at 0.
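A numerical sketch: the MGF sum for an arbitrary Poisson(2.5) example, checked against the known closed form M(t) = exp(lam (e^t - 1)), with a finite-difference derivative at 0 recovering the mean.

```python
import numpy as np
from scipy import stats

lam, t = 2.5, 0.3
x = np.arange(200)  # support truncated where the pmf is negligible
pmf = stats.poisson.pmf(x, lam)

# M(t) = sum over x_k of exp(x_k * t) p(x_k)
M = np.sum(np.exp(x * t) * pmf)
assert np.isclose(M, np.exp(lam * (np.exp(t) - 1)))

# M'(0) (central difference) gives the first moment, the mean lam
h = 1e-5
Mp = (np.sum(np.exp(x * h) * pmf) - np.sum(np.exp(-x * h) * pmf)) / (2 * h)
assert np.isclose(Mp, lam, atol=1e-4)
```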

Fitting data

To fit data to a distribution, maximizing the likelihood function is common. Alternatively, some distributions have well-known minimum variance unbiased estimators. These will be chosen by default, but the likelihood function will always be available for minimizing.

If f_{i}\left(k;\boldsymbol{\theta}\right) is the PMF of a random variable where \boldsymbol{\theta} is a vector of parameters ( e.g. L and S ), then for a collection of N independent samples from this distribution, the joint distribution of the random vector \mathbf{k} is

f\left(\mathbf{k};\boldsymbol{\theta}\right)=\prod_{i=1}^{N}f_{i}\left(k_{i};\boldsymbol{\theta}\right).

The maximum likelihood estimate of the parameters \boldsymbol{\theta} is the parameter vector which maximizes this function with \mathbf{k} fixed and given by the data:

\begin{eqnarray*} \hat{\boldsymbol{\theta}} & = & \arg\max_{\boldsymbol{\theta}}f\left(\mathbf{k};\boldsymbol{\theta}\right)\\ & = & \arg\min_{\boldsymbol{\theta}}l_{\mathbf{k}}\left(\boldsymbol{\theta}\right).\end{eqnarray*}

where

\begin{eqnarray*} l_{\mathbf{k}}\left(\boldsymbol{\theta}\right) & = & -\sum_{i=1}^{N}\log f\left(k_{i};\boldsymbol{\theta}\right)\\ & = & -N\overline{\log f\left(k_{i};\boldsymbol{\theta}\right)}\end{eqnarray*}
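A minimal sketch of this procedure: minimizing the negative log-likelihood for simulated Poisson data. The data, rate, and optimizer bounds are illustrative; the known MLE for the Poisson rate is the sample mean, which the minimizer recovers.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
k = rng.poisson(3.0, size=500)  # simulated samples

def nll(lam):
    # l_k(theta) = -sum_i log f(k_i; theta)
    return -np.sum(stats.poisson.logpmf(k, lam))

res = minimize_scalar(nll, bounds=(0.01, 20.0), method='bounded')
assert np.isclose(res.x, k.mean(), atol=1e-4)  # MLE = sample mean
```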

Standard notation for mean

We will use

\overline{y\left(\mathbf{x}\right)}=\frac{1}{N}\sum_{i=1}^{N}y\left(x_{i}\right)

where N should be clear from context.

Combinations

Note that

k!=k\cdot\left(k-1\right)\cdot\left(k-2\right)\cdot\cdots\cdot1=\Gamma\left(k+1\right)

and has special cases of

\begin{eqnarray*} 0! & \equiv & 1\\ k! & \equiv & 0\quad k<0\end{eqnarray*}

and

\left(\begin{array}{c} n\\ k\end{array}\right)=\frac{n!}{\left(n-k\right)!k!}.

If n<0 or k<0 or k>n we define \left(\begin{array}{c} n\\ k\end{array}\right)=0 .
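These conventions can be verified with `math.factorial`, `scipy.special.gamma`, and `scipy.special.comb` (which already returns 0 for out-of-range arguments); the particular n and k are arbitrary:

```python
import math
import numpy as np
from scipy import special

k = 6
assert math.factorial(k) == 720
# k! = Gamma(k + 1)
assert np.isclose(special.gamma(k + 1), 720.0)

# Binomial coefficient C(n, k) = n! / ((n - k)! k!)
n = 10
assert special.comb(n, k) == math.factorial(n) // (
    math.factorial(n - k) * math.factorial(k)
)

# Out-of-range arguments give 0
assert special.comb(n, n + 1) == 0
assert special.comb(n, -1) == 0
```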