6.3.3. statsmodels.sandbox.distributions.estimators

Estimate distribution parameters by various methods: method of moments or matching quantiles, Maximum Likelihood estimation based on binned data, and Maximum Product-of-Spacings.

Warning: I’m still finding cut-and-paste and refactoring errors, e.g. hardcoded variables from outer scope in functions. Some results don’t seem to make sense for the Pareto case; it looks better now after correcting some name errors.

Initially loosely based on a paper and blog for quantile matching by John D. Cook, including his formula for gamma quantile (ppf) matching (from the paper):

http://www.codeproject.com/KB/recipes/ParameterPercentile.aspx
http://www.johndcook.com/blog/2010/01/31/parameters-from-percentiles/

This is what I actually used (in parts): http://www.bepress.com/mdandersonbiostat/paper55/

6.3.3.1. quantile based estimator

Only special cases for the number of parameters are handled so far. Is there a literature on GMM estimation of distribution parameters? To check.

found one: Wu/Perloff 2007
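As an illustration of the exactly identified case (two parameters, two quantiles), here is a minimal sketch of quantile matching for a gamma distribution with loc fixed at 0. It is not the code of this module: the function name `gamma_from_two_quantiles` is hypothetical, and it uses only `scipy.stats.gamma.ppf` and `scipy.optimize.brentq`. It relies on the fact that the ratio of two gamma quantiles depends only on the shape parameter, so the shape can be solved for first and the scale recovered afterwards.

```python
import numpy as np
from scipy import stats, optimize


def gamma_from_two_quantiles(x1, p1, x2, p2):
    """Return (shape, scale) of a gamma distribution with loc=0 whose
    p1- and p2-quantiles equal x1 and x2 (hypothetical helper)."""
    target = np.log(x2 / x1)

    def ratio_gap(a):
        # The scale cancels in the ratio of quantiles, so only the shape matters.
        q1, q2 = stats.gamma.ppf([p1, p2], a)
        return np.log(q2 / q1) - target

    # The bracket for the shape parameter is an assumption of this sketch.
    shape = optimize.brentq(ratio_gap, 0.05, 500.0)
    scale = x1 / stats.gamma.ppf(p1, shape)
    return shape, scale


# Check on exact quantiles of a known gamma distribution.
true_shape, true_scale = 2.5, 1.7
x1, x2 = stats.gamma.ppf([0.25, 0.75], true_shape, scale=true_scale)
shape, scale = gamma_from_two_quantiles(x1, 0.25, x2, 0.75)
print(shape, scale)
```

With more quantiles than parameters the system is overidentified, and the root finder has to be replaced by a minimizer of some distance between model and target quantiles.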

6.3.3.2. binned estimator

  • I added this also
  • use it for chisquare tests with estimated distribution parameters
  • move this to distribution_extras (next to gof tests powerdiscrepancy and continuous) or add to distribution_patch
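To make the idea of MLE on binned data concrete, here is a minimal sketch, assuming a multinomial likelihood over the bins with cell probabilities given by differences of the cdf at the bin edges. The function name `binned_negloglike` is hypothetical and this is not necessarily how `fitbinned` is implemented; the normal distribution and the bin edges are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats, optimize


def binned_negloglike(params, dist, freq, binedges):
    """Negative multinomial log-likelihood (up to a constant) of bin counts."""
    cdf = dist.cdf(binedges, *params)
    if not np.all(np.isfinite(cdf)):        # e.g. non-positive scale gives nan
        return np.inf
    prob = np.clip(np.diff(cdf), 1e-300, None)  # guard against log(0)
    return -np.sum(freq * np.log(prob))


# Bins cover the whole real line; the outer bins are open-ended.
binedges = np.concatenate(([-np.inf], np.linspace(-3.0, 5.0, 9), [np.inf]))
true_params = (1.0, 2.0)  # loc, scale of a normal distribution
# Expected counts instead of random counts, so the result is deterministic.
freq = 1000 * np.diff(stats.norm.cdf(binedges, *true_params))

res = optimize.minimize(binned_negloglike, x0=[0.0, 1.0],
                        args=(stats.norm, freq, binedges),
                        method="Nelder-Mead",
                        options={"xatol": 1e-8, "fatol": 1e-10})
loc_hat, scale_hat = res.x
print(loc_hat, scale_hat)
```

Because the frequencies here are exact expected counts, the estimate should recover the true parameters; with sampled counts it would only be close.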

example: t-distribution

  • works with quantiles if they contain tail quantiles
  • results with momentcondquant don’t look as good as mle estimate
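The mechanics of the t-distribution example can be sketched as follows. This is an assumption-laden illustration, not the module's `momentcondquant`: it matches the cdf at given quantile values to the target probabilities by unweighted least squares (GMM with identity weights) using `scipy.optimize.least_squares`. The helper name `cdf_residuals`, the start values, and the bounds are made up for the example. The probabilities include 0.01 and 0.99 because the degrees of freedom are mainly identified by the tails.

```python
import numpy as np
from scipy import stats, optimize

# Target probabilities, including tail quantiles.
pquant = np.array([0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99])
true_params = (5.0, 1.0, 2.0)  # df, loc, scale
# Exact quantiles stand in for sample quantiles, to keep the example deterministic.
xquant = stats.t.ppf(pquant, *true_params)


def cdf_residuals(params, dist, x, p):
    """Moment conditions: model cdf at the quantile values minus target probabilities."""
    return dist.cdf(x, *params) - p


# Rough start values: median for loc, interquartile range for scale.
start = [10.0, xquant[4], (xquant[5] - xquant[3]) / 1.35]
res = optimize.least_squares(
    cdf_residuals, start, args=(stats.t, xquant, pquant),
    bounds=([0.5, -np.inf, 1e-8], [np.inf, np.inf, np.inf]),
    xtol=1e-12, ftol=1e-12, gtol=1e-12)
df_hat, loc_hat, scale_hat = res.x
print(df_hat, loc_hat, scale_hat)
```

With exact quantiles the residuals can be driven to zero; with sample quantiles the estimate is noisier, which is consistent with the remark above that the results don't look as good as the mle estimate.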

TODOs

  • rearrange and make sure I don’t use module globals (as I did initially): DONE
  • make two versions: an exactly identified method of moments version with fsolve, and a GMM (?) version with fmin and maybe the special cases of JD Cook. Update: maybe the exact (MM) version is not so interesting compared to GMM
  • add semifrozen version of moment and quantile based estimators, e.g. for beta (both loc and scale fixed), or gamma (loc fixed)
  • add beta example to the semifrozen MLE, fitfr, code -> added method of moment estimator to _fitstart for beta
  • start a list of how well different estimators, especially current mle work for the different distributions
  • need general GMM code (with optimal weights ?), looks like a good example for it
  • get example for binned data estimation, mailing list a while ago
  • any idea when these are better than mle ?
  • check language: I use quantile to mean the value of the random variable, not quantile between 0 and 1.
  • for GMM: move moment conditions to separate function, so that they can be used for further analysis, e.g. covariance matrix of parameter estimates
  • question: Are GMM properties different for matching quantiles with cdf or ppf? Estimate should be the same, but derivatives of moment conditions differ.
  • add maximum spacings estimator, Wikipedia, Per Brodtkorb -> basic version Done
  • add parameter estimation based on empirical characteristic function (Carrasco/Florens), especially for stable distribution
  • provide a model class based on estimating all distributions, and collect all distribution specific information
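For the maximum spacings item in the list above, a basic version can be sketched like this. It is a minimal illustration under stated assumptions, not the code of `fit_mps` or `logmps`: the function name `neg_log_spacings` is hypothetical, the objective is the negative sum of log spacings of the cdf evaluated at the sorted data (with 0 and 1 added as end points), and Nelder-Mead is just one possible optimizer.

```python
import numpy as np
from scipy import stats, optimize


def neg_log_spacings(params, xsorted, dist):
    """Negative log of the product of cdf spacings at the sorted data."""
    cdf = dist.cdf(xsorted, *params)
    if not np.all(np.isfinite(cdf)):        # e.g. non-positive scale gives nan
        return np.inf
    spacings = np.diff(np.concatenate(([0.0], cdf, [1.0])))
    return -np.sum(np.log(np.clip(spacings, 1e-300, None)))  # guard log(0)


# Deterministic "data": evenly spaced quantiles of a normal(loc=1, scale=2).
# At the true parameters all spacings are equal, which maximizes their product.
n = 50
x = np.sort(stats.norm.ppf(np.arange(1, n + 1) / (n + 1.0), 1.0, 2.0))

start = [x.mean(), x.std()]
res = optimize.minimize(neg_log_spacings, start, args=(x, stats.norm),
                        method="Nelder-Mead",
                        options={"xatol": 1e-8, "fatol": 1e-10})
loc_hat, scale_hat = res.x
print(loc_hat, scale_hat)
```

Unlike the log-likelihood, this objective stays bounded when a density is unbounded near a boundary of the support, which is one situation where maximum spacings is usually suggested as an alternative to mle.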

6.3.3.2.1. References

Ximing Wu, Jeffrey M. Perloff, GMM estimation of a maximum entropy distribution with interval data, Journal of Econometrics, Volume 138, Issue 2, ‘Information and Entropy Econometrics’ - A Volume in Honor of Arnold Zellner, June 2007, Pages 532-546, ISSN 0304-4076, DOI: 10.1016/j.jeconom.2006.05.008. http://www.sciencedirect.com/science/article/B6VC0-4K606TK-4/2/78bc07c6245546374490f777a6bdbbcc http://escholarship.org/uc/item/7jf5w1ht (working paper)

Johnson, Kotz, Balakrishnan: Volume 2

Author: josef-pktd
License: BSD
Created: 2010-04-20

changes: added Maximum Product-of-Spacings 2010-05-12

6.3.3.2.2. Functions

fit_mps(dist, data[, x0]) Estimate distribution parameters with Maximum Product-of-Spacings
fitbinned(distfn, freq, binedges, start[, fixed]) estimate parameters of distribution function for binned data using MLE
fitbinnedgmm(distfn, freq, binedges, start) estimate parameters of distribution function for binned data using GMM
fitquantilesgmm(distfn, x[, start, pquant, ...])
gammamomentcond(distfn, params, mom2[, quantile]) estimate distribution parameters based on method of moments (mean,
gammamomentcond2(distfn, params, mom2[, ...]) estimate distribution parameters based on method of moments (mean,
getstartparams(dist, data) get starting values for estimation of distribution parameters
hess_ndt(fun, pars, args, options)
logmps(params, xsorted, dist) calculate negative log of Product-of-Spacings
momentcondquant(distfn, params, mom2[, ...]) moment conditions for estimating distribution parameters by matching
momentcondunbound(distfn, params, mom2[, ...]) moment conditions for estimating distribution parameters using method
momentcondunboundls(distfn, params, mom2[, ...]) moment conditions for estimating loc and scale of a distribution