Notes from Ben Goodrich 2020-03-24
The rstanarm::stan_polr function doesn't support normal priors on coefficients marginally. Rather, it inputs a prior mode, median, or mean for the R^2 in explaining the variation in the latent variable in an ordinal model. This generally works well and is consistent with the idea that harm is just as likely as benefit but large effects either way are unlikely. But it is non-standard (even for Bayesian analysis), so if you don't want to use / explain that, you could use brms::brm with family = cumulative.
BTW, if you want to do your thing with the age predictor instead of a quadratic, this works:
stan_polr(y ~ rms::rcs(age) + ...)
Also,
As recommended by our current research, intercepts will be given essentially nonparametric distributions using the Dirichlet distribution.
The default prior in rstanarm::stan_polr is Dirichlet(1, 1, ..., 1) --- perhaps you meant noninformative rather than nonparametric --- on the probability of someone with **average** values of the covariates falling in each of the seven categories. That is probably still reasonable, although average values of dummy predictors are a bit unintuitive. The intercepts or cutpoints are conditionally deterministic given these probabilities involving the inverse CDF of the logistic distribution.
There is more info about the priors in
https://mc-stan.org/rstanarm/articles/polr.html
Let me know if you have more questions. Finally, I would be interested to hear from Frank (whether by email or on Datamethods) any thoughts about teaching technical material online in these circumstances, as in what worked well and less well with the BBR thing.
--------
The stan_polr function in rstanarm has a particular set of priors, but you can do almost anything you want with Stan code. I am attaching a Stan file for the most basic ordered logit model.
The rstanarm package essentially assumes that users do not know what they are doing, so it tries to make it easy for that audience to do something quickly and without encountering warning messages. For people who know what they want to do, it is sometimes difficult or impossible to get rstanarm to do what you want. With binary outcomes, you can use stan_glm with marginal priors or stan_polr with joint priors. But for ordinal outcomes with more than two categories, it is harder to get reliable results across a wide range of datasets without doing a lot of reparameterization tricks that people are not familiar with if they have only seen maximum likelihood estimation.
Even though you know what you are doing, I would caution against vague independent normal priors on the linear and quadratic coefficients for age (or similarly if you go with a spline or really any design matrix with highly correlated columns). That prior is so weak that there is a decent chance of getting warning messages, either from stan() or from loo() afterward. It tends to be a bit better if you center the columns of the design matrix. Or at least spell out in your preregistration how you are going to react if you do see such warning messages (adjusting the adapt_delta upward, calculating the expected log-score explicitly rather than via loo, etc.)
------------------------------- Nathan James 2020-03-26
Currently there's no off-the-shelf Stan based functions that have all 3 of these features:
1) Dirichlet prior on transformation of cutpoints
2) can handle complex formula/design matrix specification (e.g. splines)
3) standard (e.g. individual noninformative or weakly informative) priors on beta parameters
rstanarm::stan_polr() has 1 & 2, brms() has 2 & 3 (although there is an open issue to implement #1 here) and my code currently only handles 1 & 3.
I looked through some of the materials briefly and since the trial has a small number of outcome categories, I think the best bet will be to use either brms() or just straight Stan code based on the template Ben sent. With a small number of categories, the independent normal priors for the cutpoints implemented in brms seem to work fine.