6.7.11. statsmodels.sandbox.regression.treewalkerclass
6.7.11.1. Formulas
This mostly follows Greene's notation (in slides), partially ignoring the factors tau or mu for now. ADDED (if all tau==1, then runmnl==clogit)
leaf k probability :
$$\mathrm{Prob}(k \mid j) = \frac{\exp(b_k X_k / \mu_j)}{\sum_{i \in L(j)} \exp(b_i X_i / \mu_j)}$$
branch j probabilities :
$$\mathrm{Prob}(j) = \frac{\exp(b_j X_j + \mu_j IV_j)}{\sum_{i \in NB(j)} \exp(b_i X_i + \mu_i IV_i)}$$
inclusive value of branch j :
$$IV_j = \log\left(\sum_{i \in L(j)} \exp(b_i X_i / \mu_j)\right)$$
this is the log of the denominator of the leaf probabilities
L(j) : leaves at branch j, where k is a child of j
NB(j) : set of j and its siblings
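A short worked check of the note that all tau == 1 gives clogit, assuming a one-level tree (every branch hangs directly off the top, so NB(j) runs over all branches) and no branch-level term (b_j X_j = 0). With every mu_j = 1, exp(IV_j) equals the sum of exp(b_i X_i) over L(j), and the unconditional leaf probability collapses to the conditional logit formula over all leaves:

$$\mathrm{Prob}(k) = \mathrm{Prob}(k \mid j)\,\mathrm{Prob}(j) = \frac{\exp(b_k X_k)}{\exp(IV_j)} \cdot \frac{\exp(IV_j)}{\sum_{m \in NB(j)} \exp(IV_m)} = \frac{\exp(b_k X_k)}{\sum_{m \in NB(j)} \sum_{i \in L(m)} \exp(b_i X_i)}$$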
6.7.11.2. Design
The calculation is split between what is transmitted through return values and what is changed in instance.probs:
- the probability for each leaf is stored in instance.probs
- inclusive values and the contribution of exog at the branch level need to be added separately; they are handed up the tree through the return values
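A minimal sketch of that split for trees of arbitrary depth, assuming the tree is given as nested (name, children) tuples with leaves as plain strings, that all names are unique, and that the top node has scale 1. Here a plain dict cond_probs stands in for instance.probs; unlike the class, it stores the probability of each node given its parent, and a second pass multiplies these down each path. The value b_j * X_j + mu_j * IV_j is handed up through the return value. Dividing each child's value by its parent's mu_j is an assumption that generalizes the formulas above to more than one branch level, and walk and leaf_probs are hypothetical names, not the class's methods.

```python
import math


def walk(node, util, mu, cond_probs):
    """Process `node` recursively and return the value it hands up to its parent.

    node       : leaf name (str) or (branch_name, [children]) tuple
    util       : dict name -> b * X (leaf utility or branch-level exog term);
                 missing names contribute 0
    mu         : dict branch_name -> scale mu_j (tau); missing names default to 1
    cond_probs : dict that receives Prob(child | parent) for every non-root node
    """
    if isinstance(node, str):
        # leaf: only initialization, hand up b_k * X_k unchanged
        return util.get(node, 0.0)
    name, children = node
    mu_j = mu.get(name, 1.0)
    # values handed up by the children, divided by this branch's scale
    scaled = {}
    for child in children:
        child_name = child if isinstance(child, str) else child[0]
        scaled[child_name] = walk(child, util, mu, cond_probs) / mu_j
    # inclusive value IV_j = log(sum(exp(scaled))), computed as a stable log-sum-exp
    m = max(scaled.values())
    iv = m + math.log(sum(math.exp(s - m) for s in scaled.values()))
    for child_name, s in scaled.items():
        cond_probs[child_name] = math.exp(s - iv)  # Prob(child | j)
    # hand b_j * X_j + mu_j * IV_j up the tree through the return value
    return util.get(name, 0.0) + mu_j * iv


def leaf_probs(tree, util, mu):
    """Unconditional leaf probabilities: product of conditionals along each path."""
    cond = {}
    walk(tree, util, mu, cond)
    probs = {}

    def down(node, p):
        if isinstance(node, str):
            probs[node] = p
            return
        for child in node[1]:
            child_name = child if isinstance(child, str) else child[0]
            down(child, p * cond[child_name])

    down(tree, 1.0)
    return probs


tree = ("top", [("Fly", ["Air"]), ("Ground", ["Train", "Car", "Bus"])])
util = {"Air": 0.5, "Train": 0.2, "Car": 0.9, "Bus": -0.1}
print(leaf_probs(tree, util, {"Ground": 0.6}))
```

The branch names and utilities in the usage lines are illustrative only. With every mu equal to 1 this reproduces a flat logit over the leaves, and the scale of the single-leaf branch holding 'Air' drops out of the result.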
question: should the params array be accessed directly through self.recursionparams[self.parinddict[name]], or should the dictionary return the values of the params, e.g. self.params_node_dict[name]? The second would make it easier to fix tau=1 for degenerate branches. The easiest might be to do the latter only for the taus and default to 1 if the key ('tau_'+branchname) is not found. I also need to exclude tau for degenerate branches from params, but then I cannot change them from the outside for testing and experimentation. (?)
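A minimal sketch of the second option, assuming parinddict maps each parameter name to an integer position in recursionparams. The helpers make_params_node_dict and get_tau are hypothetical, not the class's methods; a missing 'tau_' key falls back to 1, which is how a degenerate branch left out of params would get a fixed tau.

```python
def make_params_node_dict(recursionparams, parinddict):
    """Hypothetical helper: map each parameter name to its current value."""
    return {name: recursionparams[idx] for name, idx in parinddict.items()}


def get_tau(params_node_dict, branchname):
    """Return tau for a branch; branches without a 'tau_<name>' entry
    (e.g. degenerate branches excluded from params) default to 1.0."""
    return params_node_dict.get("tau_" + branchname, 1.0)


recursionparams = [0.5, -1.2, 0.7]
parinddict = {"b_cost": 0, "b_time": 1, "tau_Ground": 2}  # illustrative names
node_params = make_params_node_dict(recursionparams, parinddict)
print(get_tau(node_params, "Ground"))  # 0.7
print(get_tau(node_params, "Fly"))     # 1.0
```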
The SAS manual describes restrictions on tau (though their model is a bit different), e.g. equal tau across sibling branches, or fixed tau. They also allow linear and non-linear (? not sure) restrictions on params, the regression coefficients. Related to the previous issue: a callback without access to the underlying array, where params_node_dict returns the actual params value, would provide more flexibility to impose different kinds of restrictions.
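One way such a callback could impose the equality and fixed-value restrictions on tau is sketched below, assuming the optimizer only sees a reduced vector of free parameters. The function expand_params, the spec format and the branch name 'Water' are hypothetical; this covers only equality and fixed-value restrictions, not general linear or non-linear ones.

```python
def expand_params(free, spec):
    """Build the full name -> value mapping from a reduced vector of free parameters.

    spec maps each parameter name to ("free", i), meaning free[i], or to
    ("fixed", value). Pointing several names at the same free index imposes
    an equality restriction.
    """
    out = {}
    for name, (kind, arg) in spec.items():
        if kind == "free":
            out[name] = free[arg]
        elif kind == "fixed":
            out[name] = arg
        else:
            raise ValueError("unknown restriction kind: %r" % kind)
    return out


# equal tau for the sibling branches 'Ground' and 'Water', tau fixed at 1 for 'Fly'
spec = {
    "b_cost": ("free", 0),
    "tau_Ground": ("free", 1),
    "tau_Water": ("free", 1),
    "tau_Fly": ("fixed", 1.0),
}
params = expand_params([-0.8, 0.6], spec)
print(params)  # {'b_cost': -0.8, 'tau_Ground': 0.6, 'tau_Water': 0.6, 'tau_Fly': 1.0}
```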
6.7.11.3. bugs/problems
- singleton branches return zero to the top, not a value. I’m not sure what they are supposed to return, given the split between returns and instance.probs. DONE
- Why does ‘Air’ (singleton branch) get probability exactly 0.5? DONE
6.7.11.4. TODO
- add tau, normalization for nested logit; currently tau is 1 (clogit), and taus also need to become part of params. MOSTLY DONE
- add effect of branch level explanatory variables DONE
- write a generic multinomial logit that takes arbitrary probabilities; this would be the same for MNL, clogit and runmnl, and would delegate the calculation of probabilities
- test on actual data,
  - tau=1 replicate clogit numbers,
  - transport example from Greene tests 1-level tree and degenerate sub-trees
  - test example for multi-level trees ???
- starting values: Greene mentions that the starting values for the nested version come from the (non-nested) MNL version. SPSS uses constants equal (? check transformation) to sample frequencies and zeros for the slope coefficients as starting values for the (non-nested) MNL
- associated test statistics
  - (I don’t think I will fight with the gradient or hessian of the log-likelihood.)
  - basic MLE statistics can be generic
  - tests specific to the model (?)
- nice printouts since I’m currently collecting a lot of information in the tree recursion and everything has names
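The generic multinomial logit idea from the list can be sketched as a log-likelihood that only needs a function returning choice probabilities; MNL, clogit or a nested logit would each plug in their own probability function. This is a hypothetical stdlib sketch (loglike and clogit_probs are illustrative names, and the data are made up), not the module's MLE wrapper. The example evaluates it at zero slope coefficients, where every alternative gets equal probability.

```python
import math


def loglike(params, data, prob_fn):
    """Generic multinomial log-likelihood.

    data    : list of (x, choice) pairs, choice = index of the chosen alternative
    prob_fn : callable(params, x) -> list of choice probabilities; MNL, clogit or
              a nested logit would differ only in this function
    """
    return sum(math.log(prob_fn(params, x)[choice]) for x, choice in data)


def clogit_probs(params, x):
    """Conditional logit: x holds one attribute vector per alternative."""
    v = [sum(b * xa for b, xa in zip(params, row)) for row in x]
    m = max(v)  # subtract the max before exponentiating for numerical stability
    e = [math.exp(vi - m) for vi in v]
    total = sum(e)
    return [ei / total for ei in e]


data = [
    ([[1.0, 2.0], [0.5, 1.0], [2.0, 0.0]], 1),
    ([[0.0, 1.0], [1.5, 0.5], [1.0, 1.0]], 0),
]
# zero slopes give equal probabilities, so the value is 2 * log(1/3)
print(loglike([0.0, 0.0], data, clogit_probs))
```

Keeping the likelihood ignorant of how probabilities are produced is what lets the same MLE code, and the generic MLE statistics, serve every model variant.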
The only parts that are really necessary to get a functional nested logit are adding the taus (DONE) and the MLE wrapper class. The rest are enhancements.
I added a fake tau, one fixed tau for all branches. (OBSOLETE) It’s not clear where the tau for leaves should be added: either at the original assignment of self.probs, or as part of the one-step-down probability correction in the bottom branches. The second would be cleaner (it would make the treatment of leaves and branches more symmetric), but it requires that the initial assignment in the leaf only does initialization, e.g. self.probs = 1. ???
DONE added taus
still todo:
- taus for degenerate branches are not identified; set them to 1 for MLE
- rename parinddict to paramsinddict
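A short derivation of why the tau of a degenerate branch is not identified, writing mu_j for the scale (tau) of branch j as in the formulas above. If branch j has a single leaf k, then L(j) = {k}, so

$$IV_j = \log\left(\exp(b_k X_k / \mu_j)\right) = \frac{b_k X_k}{\mu_j}, \qquad \mu_j \, IV_j = b_k X_k, \qquad \mathrm{Prob}(k \mid j) = 1 .$$

Neither Prob(k|j) nor the term b_j X_j + mu_j IV_j entering Prob(j) depends on mu_j, so the log-likelihood is flat in mu_j, and fixing it at 1 for the MLE loses nothing.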
Author: Josef Perktold
License: BSD (3-clause)
6.7.11.5. Functions
getbranches(tree) : walk tree to get list of branches
getnodes(tree) : walk tree to get list of branches and list of leaves
iteritems(obj, **kwargs) : replacement for six’s iteritems for Python2/3 compat
itervalues(obj, **kwargs)
randintw(w[, size]) : generate integer random variables given probabilities
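A minimal stdlib sketch of what the tree walkers and the weighted integer sampler do, assuming the tree is written as nested (name, children) tuples with leaves as plain strings; the transport-style tree below is illustrative. The names get_branches, get_nodes and randint_weighted are hypothetical and are not the module's own functions. The sampler uses inverse-CDF lookup on cumulative weights, so the weights need not sum to one.

```python
import bisect
import itertools
import random


def get_branches(tree):
    """Return branch names in depth-first order (leaves are plain strings)."""
    if isinstance(tree, str):
        return []
    name, children = tree
    out = [name]
    for child in children:
        out.extend(get_branches(child))
    return out


def get_nodes(tree):
    """Return (branches, leaves), both in depth-first order."""
    if isinstance(tree, str):
        return [], [tree]
    name, children = tree
    branches, leaves = [name], []
    for child in children:
        b, lv = get_nodes(child)
        branches.extend(b)
        leaves.extend(lv)
    return branches, leaves


def randint_weighted(w, size=1, rng=random):
    """Draw `size` integers in range(len(w)) with Prob(i) proportional to w[i]."""
    cum = list(itertools.accumulate(w))
    total = cum[-1]
    draws = []
    for _ in range(size):
        # first index whose cumulative weight exceeds u; zero-weight entries are skipped
        idx = bisect.bisect_right(cum, rng.random() * total)
        draws.append(min(idx, len(w) - 1))  # guard against u rounding up to total
    return draws


tree = ("top", [("Fly", ["Air"]), ("Ground", ["Train", "Car", "Bus"])])
print(get_branches(tree))  # ['top', 'Fly', 'Ground']
print(get_nodes(tree))     # (['top', 'Fly', 'Ground'], ['Air', 'Train', 'Car', 'Bus'])
print(randint_weighted([0.6, 0.4, 0.0], size=10, rng=random.Random(0)))
```

Returning integer indices rather than labels keeps the sampler usable as an index into any array or sequence, e.g. the list of leaves returned by get_nodes.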