7.19. The Datasets Package
statsmodels provides data sets (i.e. data and meta-data) for use in examples, tutorials, model testing, etc.
7.19.2. Using Datasets from R
The Rdatasets project gives access to the datasets available in R’s core datasets package and many other common R packages. All of these datasets are available to statsmodels through the get_rdataset() function. The actual data is accessible through the data attribute. For example:
In [1]: import statsmodels.api as sm
In [2]: duncan_prestige = sm.datasets.get_rdataset("Duncan", "car")
In [3]: print(duncan_prestige.__doc__)
+----------+-------------------+
| Duncan | R Documentation |
+----------+-------------------+
Duncan's Occupational Prestige Data
-----------------------------------
Description
~~~~~~~~~~~
The ``Duncan`` data frame has 45 rows and 4 columns. Data on the
prestige and other characteristics of 45 U. S. occupations in 1950.
Usage
~~~~~
::
Duncan
Format
~~~~~~
This data frame contains the following columns:
type
Type of occupation. A factor with the following levels: ``prof``,
professional and managerial; ``wc``, white-collar; ``bc``,
blue-collar.
income
Percent of males in occupation earning $3500 or more in 1950.
education
Percent of males in occupation in 1950 who were high-school
graduates.
prestige
Percent of raters in NORC study rating occupation as excellent or
good in prestige.
Source
~~~~~~
Duncan, O. D. (1961) A socioeconomic index for all occupations. In
Reiss, A. J., Jr. (Ed.) *Occupations and Social Status.* Free Press
[Table VI-1].
References
~~~~~~~~~~
Fox, J. (2008) *Applied Regression Analysis and Generalized Linear
Models*, Second Edition. Sage.
Fox, J. and Weisberg, S. (2011) *An R Companion to Applied Regression*,
Second Edition, Sage.
In [4]: duncan_prestige.data.head(5)
Out[4]:
type income education prestige
accountant prof 62 86 82
pilot prof 72 76 83
architect prof 75 92 90
author prof 55 90 76
chemist prof 64 86 90
7.19.3. R Datasets Function Reference
get_rdataset(dataname[, package, cache]) | Download and return an R dataset.
get_data_home([data_home]) | Return the path of the statsmodels data dir.
clear_data_home([data_home]) | Delete all the content of the data home cache.
7.19.4. Available Datasets
7.19.5. Usage
Load a dataset:
In [5]: import statsmodels.api as sm
In [6]: data = sm.datasets.longley.load()
The Dataset object follows the bunch pattern explained in the original proposal. The full dataset is available in the data attribute.
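For readers unfamiliar with the bunch pattern, here is a minimal sketch of the idea. The Bunch class and the longley_like object are illustrative names made up for this example; this is not the actual Dataset implementation in statsmodels.

```python
class Bunch(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __init__(self, **kwargs):
        super().__init__(kwargs)
        # Use the dict itself as the instance namespace, so that
        # bunch.key and bunch["key"] always refer to the same object.
        self.__dict__ = self


# Hypothetical object holding the first two Longley TOTEMP values.
longley_like = Bunch(endog=[60323.0, 61122.0], endog_name="TOTEMP")

print(longley_like.endog_name)     # attribute access
print(longley_like["endog_name"])  # key access

# Attributes added later show up as keys as well.
longley_like.names = ["TOTEMP", "GNPDEFL"]
print(sorted(longley_like.keys()))
```

The appeal of the pattern is that a loader can return one object carrying both the data and its metadata, without committing to a fixed set of fields.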
In [7]: data.data
Out[7]:
rec.array([(60323.0, 83.0, 234289.0, 2356.0, 1590.0, 107608.0, 1947.0),
(61122.0, 88.5, 259426.0, 2325.0, 1456.0, 108632.0, 1948.0),
(60171.0, 88.2, 258054.0, 3682.0, 1616.0, 109773.0, 1949.0),
(61187.0, 89.5, 284599.0, 3351.0, 1650.0, 110929.0, 1950.0),
(63221.0, 96.2, 328975.0, 2099.0, 3099.0, 112075.0, 1951.0),
(63639.0, 98.1, 346999.0, 1932.0, 3594.0, 113270.0, 1952.0),
(64989.0, 99.0, 365385.0, 1870.0, 3547.0, 115094.0, 1953.0),
(63761.0, 100.0, 363112.0, 3578.0, 3350.0, 116219.0, 1954.0),
(66019.0, 101.2, 397469.0, 2904.0, 3048.0, 117388.0, 1955.0),
(67857.0, 104.6, 419180.0, 2822.0, 2857.0, 118734.0, 1956.0),
(68169.0, 108.4, 442769.0, 2936.0, 2798.0, 120445.0, 1957.0),
(66513.0, 110.8, 444546.0, 4681.0, 2637.0, 121950.0, 1958.0),
(68655.0, 112.6, 482704.0, 3813.0, 2552.0, 123366.0, 1959.0),
(69564.0, 114.2, 502601.0, 3931.0, 2514.0, 125368.0, 1960.0),
(69331.0, 115.7, 518173.0, 4806.0, 2572.0, 127852.0, 1961.0),
(70551.0, 116.9, 554894.0, 4007.0, 2827.0, 130081.0, 1962.0)],
dtype=[('TOTEMP', '<f8'), ('GNPDEFL', '<f8'), ('GNP', '<f8'), ('UNEMP', '<f8'), ('ARMED', '<f8'), ('POP', '<f8'), ('YEAR', '<f8')])
Most datasets hold convenient representations of the data in the attributes endog and exog:
In [8]: data.endog[:5]
Out[8]: array([ 60323., 61122., 60171., 61187., 63221.])
In [9]: data.exog[:5,:]
Out[9]:
array([[ 83. , 234289. , 2356. , 1590. , 107608. , 1947. ],
[ 88.5, 259426. , 2325. , 1456. , 108632. , 1948. ],
[ 88.2, 258054. , 3682. , 1616. , 109773. , 1949. ],
[ 89.5, 284599. , 3351. , 1650. , 110929. , 1950. ],
[ 96.2, 328975. , 2099. , 3099. , 112075. , 1951. ]])
Univariate datasets, however, do not have an exog attribute.
Variable names can be obtained by typing:
In [10]: data.endog_name
Out[10]: 'TOTEMP'
In [11]: data.exog_name
Out[11]: ['GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP', 'YEAR']
If the dataset does not have a clear interpretation of what should be an endog and exog, then you can always access the data or raw_data attributes. This is the case for the macrodata dataset, which is a collection of US macroeconomic data rather than a dataset with a specific example in mind. The data attribute contains a record array of the full dataset and the raw_data attribute contains an ndarray with the names of the columns given by the names attribute.
In [12]: type(data.data)
Out[12]: numpy.recarray
In [13]: type(data.raw_data)
Out[13]: numpy.recarray
In [14]: data.names
Out[14]: ['TOTEMP', 'GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP', 'YEAR']
7.19.5.1. Loading data as pandas objects
For many users it may be preferable to get the datasets as a pandas DataFrame or Series object. Each of the dataset modules is equipped with a load_pandas function, which returns a Dataset instance with the data readily available as pandas objects:
In [15]: data = sm.datasets.longley.load_pandas()
In [16]: data.exog
Out[16]:
GNPDEFL GNP UNEMP ARMED POP YEAR
0 83.0 234289.0 2356.0 1590.0 107608.0 1947.0
1 88.5 259426.0 2325.0 1456.0 108632.0 1948.0
2 88.2 258054.0 3682.0 1616.0 109773.0 1949.0
3 89.5 284599.0 3351.0 1650.0 110929.0 1950.0
4 96.2 328975.0 2099.0 3099.0 112075.0 1951.0
5 98.1 346999.0 1932.0 3594.0 113270.0 1952.0
6 99.0 365385.0 1870.0 3547.0 115094.0 1953.0
7 100.0 363112.0 3578.0 3350.0 116219.0 1954.0
8 101.2 397469.0 2904.0 3048.0 117388.0 1955.0
9 104.6 419180.0 2822.0 2857.0 118734.0 1956.0
10 108.4 442769.0 2936.0 2798.0 120445.0 1957.0
11 110.8 444546.0 4681.0 2637.0 121950.0 1958.0
12 112.6 482704.0 3813.0 2552.0 123366.0 1959.0
13 114.2 502601.0 3931.0 2514.0 125368.0 1960.0
14 115.7 518173.0 4806.0 2572.0 127852.0 1961.0
15 116.9 554894.0 4007.0 2827.0 130081.0 1962.0
In [17]: data.endog
Out[17]:
0 60323.0
1 61122.0
2 60171.0
3 61187.0
4 63221.0
5 63639.0
6 64989.0
7 63761.0
8 66019.0
9 67857.0
10 68169.0
11 66513.0
12 68655.0
13 69564.0
14 69331.0
15 70551.0
Name: TOTEMP, dtype: float64
The full DataFrame is available in the data attribute of the Dataset object:
In [18]: data.data
Out[18]:
TOTEMP GNPDEFL GNP UNEMP ARMED POP YEAR
0 60323.0 83.0 234289.0 2356.0 1590.0 107608.0 1947.0
1 61122.0 88.5 259426.0 2325.0 1456.0 108632.0 1948.0
2 60171.0 88.2 258054.0 3682.0 1616.0 109773.0 1949.0
3 61187.0 89.5 284599.0 3351.0 1650.0 110929.0 1950.0
4 63221.0 96.2 328975.0 2099.0 3099.0 112075.0 1951.0
5 63639.0 98.1 346999.0 1932.0 3594.0 113270.0 1952.0
6 64989.0 99.0 365385.0 1870.0 3547.0 115094.0 1953.0
7 63761.0 100.0 363112.0 3578.0 3350.0 116219.0 1954.0
8 66019.0 101.2 397469.0 2904.0 3048.0 117388.0 1955.0
9 67857.0 104.6 419180.0 2822.0 2857.0 118734.0 1956.0
10 68169.0 108.4 442769.0 2936.0 2798.0 120445.0 1957.0
11 66513.0 110.8 444546.0 4681.0 2637.0 121950.0 1958.0
12 68655.0 112.6 482704.0 3813.0 2552.0 123366.0 1959.0
13 69564.0 114.2 502601.0 3931.0 2514.0 125368.0 1960.0
14 69331.0 115.7 518173.0 4806.0 2572.0 127852.0 1961.0
15 70551.0 116.9 554894.0 4007.0 2827.0 130081.0 1962.0
With pandas integration in the estimation classes, the metadata will be attached to model results.
7.19.5.2. Extra Information
If you want to know more about the dataset itself, you can access the following, again using the Longley dataset as an example:
>>> dir(sm.datasets.longley)[:6]
['COPYRIGHT', 'DESCRLONG', 'DESCRSHORT', 'NOTE', 'SOURCE', 'TITLE']
7.19.6. Additional information
- The idea for a datasets package was originally proposed by David Cournapeau, with updates by Skipper Seabold.
- To add datasets, see the notes on adding a dataset.