6.8.7. statsmodels.sandbox.stats.stats_dhuard
From David Huard’s scipy sandbox; also attached to a ticket and posted to the matplotlib-users mailing list (links ???)
6.8.7.1. Notes
Interpolation outside the range of the data raises an exception; the result would not be completely defined there in any case.
>>> scoreatpercentile(x, [0,25,50,100])
Traceback (most recent call last):
...
    raise ValueError("A value in x_new is below the interpolation "
ValueError: A value in x_new is below the interpolation range.
>>> percentileofscore(x, [-50, 50])
Traceback (most recent call last):
...
    raise ValueError("A value in x_new is below the interpolation "
ValueError: A value in x_new is below the interpolation range.
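The reason percentile 0 falls outside the range can be sketched with NumPy alone. This is an illustration under stated assumptions, not the module's code: it assumes plotting positions of the form (i - 0.5)/n, and the helper names `plotting_positions` and `score_at_percentile_strict` are hypothetical. With such positions the cdf grid never reaches 0 or 100 percent, so those percentiles have no interpolated score.

```python
import numpy as np

def plotting_positions(n):
    """Assumed plotting positions (i - 0.5) / n for i = 1..n."""
    return (np.arange(1, n + 1) - 0.5) / n

def score_at_percentile_strict(data, percentile):
    """Interpolate the score at `percentile` (0-100); refuse to extrapolate."""
    x = np.sort(np.asarray(data, dtype=float))
    cdf = plotting_positions(len(x)) * 100.0      # cdf grid in percent
    p = np.atleast_1d(np.asarray(percentile, dtype=float))
    if np.any(p < cdf[0]) or np.any(p > cdf[-1]):
        raise ValueError(
            f"percentile outside the interpolation range [{cdf[0]:g}, {cdf[-1]:g}]")
    return np.interp(p, cdf, x)

x = np.arange(1.0, 11.0)                          # ten observations 1..10
inside = score_at_percentile_strict(x, [25, 50, 75])

try:
    score_at_percentile_strict(x, [0, 25, 50, 100])
    raised = False
except ValueError:
    raised = True                                 # 0 and 100 lie outside the grid
```

For ten observations the grid runs from 5 to 95 percent, so only percentiles in that interval can be interpolated.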
6.8.7.2. idea
6.8.7.2.1. histogram and empirical interpolated distribution
dual constructor
* empirical cdf : cdf on all observations through linear interpolation
* binned cdf : based on histogram
both should work essentially the same, although pdf of empirical has many spikes, fluctuates a lot
- alternative: binning based on interpolated cdf : example in script
* ppf: quantileatscore based on interpolated cdf
* rvs : generic from ppf
* stats, expectation ? how does integration wrt cdf work - theory?
Problems
* limits, lower and upper bound of support
  does not work or is undefined with empirical cdf and interpolation
- extending bounds ? MATLAB has pareto tails for empirical distribution, breaks linearity
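The dual-constructor idea can be sketched as a small class with two constructors, one for the interpolated empirical cdf and one for the binned cdf. This is a NumPy-only sketch under assumptions, not the module's implementation: the class name `InterpolatedECDF`, its methods, and the plotting positions (i - 0.5)/n are all hypothetical choices made for illustration.

```python
import numpy as np

class InterpolatedECDF:
    """Piecewise-linear distribution on a grid (hypothetical helper)."""

    def __init__(self, x, cdf):
        self.x = np.asarray(x, dtype=float)     # increasing support points
        self.p = np.asarray(cdf, dtype=float)   # nondecreasing cdf values at x

    @classmethod
    def from_data(cls, data):
        """Empirical cdf on all observations, assumed positions (i - 0.5)/n."""
        x = np.sort(np.asarray(data, dtype=float))
        n = len(x)
        return cls(x, (np.arange(1, n + 1) - 0.5) / n)

    @classmethod
    def from_histogram(cls, data, bins=10):
        """Binned cdf: cumulative relative frequencies at the bin edges."""
        counts, edges = np.histogram(data, bins=bins)
        cdf = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))
        return cls(edges, cdf)

    def cdf(self, q):
        # np.interp clamps outside the grid: flat tails, no extrapolation
        return np.interp(q, self.x, self.p)

    def ppf(self, prob):
        # inverse of a piecewise-linear cdf: swap the roles of x and p
        # (flat cdf segments, e.g. empty bins, make the inverse non-unique)
        return np.interp(prob, self.p, self.x)

    def rvs(self, size, rng=None):
        # generic inverse-transform sampling; uniform draws outside
        # [p[0], p[-1]] are clamped to the smallest or largest grid point
        rng = np.random.default_rng() if rng is None else rng
        return self.ppf(rng.uniform(size=size))

rng = np.random.default_rng(0)
data = rng.normal(size=500)
emp = InterpolatedECDF.from_data(data)
binned = InterpolatedECDF.from_histogram(data, bins=20)
roundtrip = emp.cdf(emp.ppf(0.5))
sample = emp.rvs(1000, rng=rng)
```

The clamping in `cdf` and `ppf` is one way to sidestep the support problem listed above; it keeps samples inside the observed range but says nothing about the tails.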
6.8.7.2.2. empirical distribution with higher order interpolation
- should work easily enough with interpolating splines
- not piecewise linear
- can use pareto (or other) tails
- ppf: how do I get the inverse function of a higher order spline? Chuck: resample and fit a spline to the inverse function; this will have an approximation error in the inverse function
- -> doesn’t work: a higher order spline doesn’t preserve monotonicity; see mailing list for response to my question
- pmf from derivative available in spline
-> forget this and use kernel density estimator instead
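The kernel density alternative can be sketched as follows. A Gaussian kernel estimate gives a smooth pdf and a cdf that is nondecreasing by construction, which is the property the higher order spline lacks. This is a NumPy-only sketch, not code from the module: the function names `kde_pdf` and `kde_cdf` are hypothetical, and the bandwidth h = std * n**(-1/5) is a rule-of-thumb assumption.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)   # elementwise error function without SciPy

def kde_pdf(data, grid, h):
    """Gaussian kernel density estimate evaluated on `grid`."""
    z = (grid[:, None] - data[None, :]) / h
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def kde_cdf(data, grid, h):
    """Average of normal cdfs centered at the observations."""
    z = (grid[:, None] - data[None, :]) / h
    return (0.5 * (1.0 + _erf(z / math.sqrt(2.0)))).mean(axis=1)

rng = np.random.default_rng(0)
data = rng.normal(size=200)
h = data.std(ddof=1) * len(data) ** (-1 / 5)    # assumed rule-of-thumb bandwidth
grid = np.linspace(data.min() - 3 * h, data.max() + 3 * h, 400)
pdf = kde_pdf(data, grid, h)
cdf = kde_cdf(data, grid, h)
```

Because the cdf is monotone, a ppf can be obtained by interpolating the grid against the cdf values, at the cost of a grid approximation error.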
6.8.7.2.3. bootstrap/empirical distribution:
discrete distribution on real line given observations
what’s defined?
* cdf : step function
* pmf : points with equal weight 1/nobs
* rvs : resampling
* ppf : quantileatscore on sample?
* moments : from data ?
* expectation ? sum_{all observations x} [func(x) * pmf(x)]
* similar for discrete distribution on real line
* References : ?
* what’s the point? most of it is trivial, just for the record ?
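The items in this list can be written down in a few lines, which also shows how trivial most of them are. This is a sketch under assumptions, not part of the module: the class name `EmpiricalDiscrete` is hypothetical, and the ppf convention (smallest observation whose cdf reaches the probability) is only one of several possible choices.

```python
import numpy as np

class EmpiricalDiscrete:
    """Discrete distribution with weight 1/nobs on each observation (sketch)."""

    def __init__(self, data):
        self.x = np.sort(np.asarray(data, dtype=float))
        self.nobs = len(self.x)

    def cdf(self, q):
        # step function: share of observations <= q
        return np.searchsorted(self.x, q, side="right") / self.nobs

    def pmf(self, q):
        # multiples of 1/nobs; tied observations add up
        left = np.searchsorted(self.x, q, side="left")
        right = np.searchsorted(self.x, q, side="right")
        return (right - left) / self.nobs

    def ppf(self, prob):
        # smallest observation x with cdf(x) >= prob (assumed convention)
        idx = np.ceil(np.asarray(prob, dtype=float) * self.nobs).astype(int) - 1
        return self.x[np.clip(idx, 0, self.nobs - 1)]

    def rvs(self, size, rng=None):
        # resampling with replacement, i.e. the bootstrap
        rng = np.random.default_rng() if rng is None else rng
        return rng.choice(self.x, size=size, replace=True)

    def expect(self, func):
        # sum over all observations of func(x) * 1/nobs
        return np.sum(func(self.x)) / self.nobs

d = EmpiricalDiscrete([3, 1, 4, 1, 5])
mean = d.expect(lambda v: v)            # first moment from the data
second = d.expect(lambda v: v**2)       # second raw moment
draws = d.rvs(100, rng=np.random.default_rng(0))
```

Moments follow from `expect` with a power function, so they do not need a separate implementation.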
Created on Monday, May 03, 2010, 11:47:03 AM
Author: josef-pktd, parts based on David Huard
License: BSD
6.8.7.2.4. Functions
empiricalcdf(data[, method]) | Return the empirical cdf.
percentileofscore(data, score) | Return the percentile-position of score relative to data.
scoreatpercentile(data, percentile) | Return the score at the given percentile of the data.