Beta and Dirichlet distributions
October 14, 2019 — April 4, 2022
\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]
Suppose the joint pdf of \(\rv{d}_{1}, \ldots, \rv{d}_{k-1}\) is \[\begin{aligned} f\left(y_{1}, \ldots, y_{k-1}\right) &=\frac{\Gamma\left(\alpha_{1}+\cdots+\alpha_{k}\right)}{\Gamma\left(\alpha_{1}\right) \cdots \Gamma\left(\alpha_{k}\right)} y_{1}^{\alpha_{1}-1} \cdots y_{k-1}^{\alpha_{k-1}-1}\left(1-y_{1}-\cdots-y_{k-1}\right)^{\alpha_{k}-1}\\ &=\frac{\Gamma(\alpha)}{\prod_{i=1}^k\Gamma(\alpha_i)}\prod_{i=1}^k y_i^{\alpha_i-1}, \end{aligned}\] where \(y_{i}>0\) for \(i=1, \ldots, k-1\), \(y_{1}+\cdots+y_{k-1}<1\), \(y_k=1-y_{1}-\cdots-y_{k-1}\), each \(\alpha_i>0\), and \(\alpha=\sum_{i=1}^k\alpha_i\). Then the random variables \(\rv{d}_{1}, \ldots, \rv{d}_{k-1}\) follow the Dirichlet distribution with parameters \(\alpha_{1}, \ldots, \alpha_{k}\). Usually, I write this as a vector random variate with a vector parameter, rather than as a long list, \[\vrv{d}\sim\operatorname{Dirichlet}(\vv{\alpha}).\]
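The Beta distribution is a special case of the Dirichlet distribution with parameters \(\vv{\alpha}=[\alpha_1,\alpha_2]\), i.e. the case \(k=2\), in which the single free coordinate satisfies \(\rv{d}_1\sim\operatorname{Beta}(\alpha_1,\alpha_2)\).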
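To make the density and its \(k=2\) special case concrete, here is a minimal sketch using only the Python standard library. The function names `dirichlet_logpdf` and `beta_logpdf` are hypothetical, chosen for this illustration. The code assumes the point `y` is passed with all \(k\) coordinates, so the last entry stands for \(1-y_1-\cdots-y_{k-1}\).

```python
import math


def dirichlet_logpdf(y, alpha):
    """Log-density of Dirichlet(alpha) at the full simplex point y.

    y lists all k coordinates, so y[-1] stands for 1 - y_1 - ... - y_{k-1}.
    """
    if len(y) != len(alpha):
        raise ValueError("y and alpha must have the same length")
    # Outside the open simplex the density is zero.
    if any(v <= 0.0 for v in y) or abs(sum(y) - 1.0) > 1e-9:
        return -math.inf
    # log of Gamma(sum alpha) / prod Gamma(alpha_i)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y))


def beta_logpdf(y, a, b):
    """Log-density of Beta(a, b): the k = 2 Dirichlet in its free coordinate."""
    return dirichlet_logpdf([y, 1.0 - y], [a, b])


# A flat Dirichlet on three components has constant density Gamma(3) = 2.
flat = math.exp(dirichlet_logpdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
# Beta(2, 3) at y = 0.4 is 12 * 0.4 * 0.6**2 = 1.728.
beta_density = math.exp(beta_logpdf(0.4, 2.0, 3.0))
```

Working on the log scale with `math.lgamma` avoids overflow in the Gamma functions when the \(\alpha_i\) are large.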
There is more information in Wikipedia, although these pages are IMO unusually uninspired and confusing. My prose is terrible because I rarely have time to revisit it. What is Wikipedia’s excuse?
1 A Beta RV is a ratio of Gamma RVs
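If \(\rv{x}\sim\operatorname{Gamma}(\alpha,\theta)\) and \(\rv{y}\sim\operatorname{Gamma}(\beta,\theta)\) are independent with a common scale \(\theta\), then \[\frac{\rv{x}}{\rv{x}+\rv{y}}\sim\operatorname{Beta}(\alpha,\beta).\] Proof TBD.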
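As a quick numerical illustration of the construction named in this section's heading, the sketch below draws \(X/(X+Y)\) from two independent Gamma variates with a common scale and compares the sample moments with the Beta moments. The helper name `beta_via_gammas`, the unit scale, the seed and the sample size are all assumptions of this example. This is a sanity check, not a proof.

```python
import random


def beta_via_gammas(a, b, rng):
    """Draw X / (X + Y) with independent X ~ Gamma(a, 1), Y ~ Gamma(b, 1)."""
    x = rng.gammavariate(a, 1.0)  # shape a, scale 1
    y = rng.gammavariate(b, 1.0)  # shape b, same scale
    return x / (x + y)


rng = random.Random(0)
a, b = 2.0, 3.0
draws = [beta_via_gammas(a, b, rng) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)

# Beta(a, b) moments to compare against.
beta_mean = a / (a + b)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1.0))
```

With these settings `sample_mean` and `sample_var` should land close to `beta_mean` and `beta_var`, up to Monte Carlo error.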
2 A Dirichlet RV is a normalized vector of independent Gamma RVs
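If \(\rv{g}_i\sim\operatorname{Gamma}(\alpha_i,\theta)\), \(i=1,\ldots,k\), are independent with a common scale \(\theta\), then \[\left(\frac{\rv{g}_1}{\sum_{j}\rv{g}_j},\ldots,\frac{\rv{g}_k}{\sum_{j}\rv{g}_j}\right)\sim\operatorname{Dirichlet}(\vv{\alpha}).\] Proof TBD.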
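The construction named in this section's heading doubles as a simple sampler. A minimal sketch with the standard library, assuming unit-scale Gamma variates: the name `dirichlet_via_gammas`, the parameter vector and the seed are illustrative choices, and the comparison against the Dirichlet component means \(\alpha_i/\alpha\) is only a sanity check.

```python
import random


def dirichlet_via_gammas(alpha, rng):
    """Normalize independent Gamma(alpha_i, 1) draws so they sum to one."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(1)
alpha = [1.0, 2.0, 3.0]
n = 20_000
draws = [dirichlet_via_gammas(alpha, rng) for _ in range(n)]

# Component-wise sample means versus the Dirichlet means alpha_i / sum(alpha).
sample_means = [sum(d[i] for d in draws) / n for i in range(len(alpha))]
dirichlet_means = [a / sum(alpha) for a in alpha]
```

Each draw lies on the simplex by construction, and `sample_means` should be close to `dirichlet_means` up to Monte Carlo error.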
3 Beta as exponential family
Beta distribution: \(Y \sim \operatorname{Beta}(\alpha, \beta)\) \[ \begin{aligned} f_{Y}(y \mid \alpha, \beta) &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} y^{\alpha-1}(1-y)^{\beta-1} \\ &= [y(1-y)]^{-1} \exp \bigl(\alpha \log (y)+\beta \log (1-y) \\ &\qquad +\log \Gamma(\alpha+\beta)-\log \Gamma(\alpha)-\log \Gamma(\beta)\bigr) \end{aligned} \] with \[ \begin{aligned} \eta(\alpha, \beta) &=(\alpha, \beta)^{\top} \\ T(y) &=(\log (y), \log (1-y))^{\top}. \end{aligned} \]
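A small numerical check that the exponential-family factorization agrees with the usual form of the density. This is a sketch: the function names are hypothetical, and the factor \([y(1-y)]^{-1}\) is treated as the base measure.

```python
import math


def beta_pdf_standard(y, a, b):
    """Beta(a, b) density in its usual form."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def beta_pdf_expfam(y, a, b):
    """Same density as h(y) * exp(eta . T(y) - A(eta))."""
    h = 1.0 / (y * (1.0 - y))                            # base measure [y(1-y)]^{-1}
    eta_dot_t = a * math.log(y) + b * math.log(1.0 - y)  # eta = (a, b), T = (log y, log(1-y))
    log_partition = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return h * math.exp(eta_dot_t - log_partition)


# Largest absolute gap between the two forms over a small grid of test points.
grid = [(y, a, b) for y in (0.1, 0.4, 0.9) for a, b in ((2.0, 3.0), (0.5, 0.5), (5.0, 1.0))]
max_gap = max(abs(beta_pdf_standard(y, a, b) - beta_pdf_expfam(y, a, b)) for y, a, b in grid)
```

The two forms differ only in how the exponents \(\alpha-1\) and \(\beta-1\) are split between the base measure and the exponential, so `max_gap` should be at the level of floating-point rounding.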
4 Dirichlet as exponential family
The Dirichlet distribution is an exponential family and can be written in canonical form as \[ \operatorname{Dirichlet}(\boldsymbol{\theta} \mid \boldsymbol{\alpha})=f(\boldsymbol{\theta}) g(\boldsymbol{\alpha}) e^{\phi(\boldsymbol{\alpha})^{T} u(\boldsymbol{\theta})} \] with \[ f(\boldsymbol{\theta})=1, \quad g(\boldsymbol{\alpha})=1 / B(\boldsymbol{\alpha}), \] where \[ B(\boldsymbol{\alpha})=\frac{\prod_{t=1}^{D} \Gamma\left(\alpha_{t}\right)}{\Gamma\left(\sum_{t=1}^{D} \alpha_{t}\right)}, \quad \phi(\boldsymbol{\alpha})=\left(\begin{array}{c} \alpha_{1}-1 \\ \vdots \\ \alpha_{D}-1 \end{array}\right), \] and \[ u(\boldsymbol{\theta})=\left(\begin{array}{c} \ln \theta_{1} \\ \vdots \\ \ln \theta_{D} \end{array}\right). \] Here \(\boldsymbol{\theta}\) is the full point on the simplex, playing the role of \((y_1,\ldots,y_k)\) above, and \(D\) plays the role of \(k\).
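One standard consequence of this form, which the text above does not state, is that the expected sufficient statistic equals the gradient of the log-normalizer: \(\Ex[\ln\theta_t]=\partial \ln B(\boldsymbol{\alpha})/\partial\alpha_t\). The sketch below checks this by Monte Carlo. The helper names are hypothetical. The gradient is approximated by central finite differences because the standard library has no digamma function, and samples are drawn by normalizing independent Gamma variates.

```python
import math
import random


def log_B(alpha):
    """ln B(alpha) = sum_t ln Gamma(alpha_t) - ln Gamma(sum_t alpha_t)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def grad_log_B(alpha, h=1e-5):
    """Central finite-difference gradient of ln B with respect to alpha."""
    grad = []
    for i in range(len(alpha)):
        up, down = list(alpha), list(alpha)
        up[i] += h
        down[i] -= h
        grad.append((log_B(up) - log_B(down)) / (2.0 * h))
    return grad


def sample_dirichlet(alpha, rng):
    """Dirichlet draw by normalizing independent Gamma(alpha_t, 1) variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(2)
alpha = [1.0, 2.0, 3.0]
n = 20_000
draws = [sample_dirichlet(alpha, rng) for _ in range(n)]

# Monte Carlo estimate of E[ln theta_t] versus the gradient of ln B(alpha).
mean_log_theta = [sum(math.log(d[i]) for d in draws) / n for i in range(len(alpha))]
predicted = grad_log_B(alpha)
```

The entries of `mean_log_theta` should match `predicted` up to Monte Carlo error. This identity is what makes moment matching on \(\ln\theta_t\) a natural way to fit \(\boldsymbol{\alpha}\).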
5 Conjugate prior for Dirichlet RVs
Lefkimmiatis, Maragos, and Papandreou (2009) argue:
Since for any member of the exponential family there exists a conjugate prior that can be written in the form \[ p(\boldsymbol{\alpha} \mid \mathbf{v}, \eta) \propto g(\boldsymbol{\alpha})^{\eta} e^{\phi(\boldsymbol{\alpha})^{T} \mathbf{v}} \] a suitable conjugate prior distribution for the parameters \(\boldsymbol{\alpha}\) of the Dirichlet is \[ p(\boldsymbol{\alpha} \mid \mathbf{v}, \eta) \propto \frac{1}{B(\boldsymbol{\alpha})^{\eta}} e^{-\sum_{t=1}^{D} v_{t} \alpha_{t}} \]
Wikipedia claims that there is no efficient means of sampling from this, which is sad for MCMC. Generally this does not bother people, because we rarely observe Dirichlet RVs directly; they usually appear as latent quantities, e.g. the mixing probabilities of some other distribution.
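Even without a direct sampler, the unnormalized log-density of this prior is cheap to evaluate, and that is the quantity a generic accept/reject-style sampler would need. A minimal sketch of the quoted formula follows. The function names are hypothetical, and the sign convention on \(\mathbf{v}\) follows the quoted expression \(B(\boldsymbol{\alpha})^{-\eta} e^{-\sum_t v_t\alpha_t}\).

```python
import math


def log_multivariate_beta(alpha):
    """ln B(alpha) = sum_t ln Gamma(alpha_t) - ln Gamma(sum_t alpha_t)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def conjugate_prior_logdensity_unnorm(alpha, v, eta):
    """Unnormalized log p(alpha | v, eta) = -eta * ln B(alpha) - sum_t v_t * alpha_t."""
    if len(alpha) != len(v):
        raise ValueError("alpha and v must have the same length")
    # The Dirichlet parameters must be strictly positive.
    if any(a <= 0.0 for a in alpha):
        return -math.inf
    return -eta * log_multivariate_beta(alpha) - sum(vt * at for vt, at in zip(v, alpha))


# B([1, 1]) = 1, so only the linear term survives: -(1*1 + 2*1) = -3.
example_flat = conjugate_prior_logdensity_unnorm([1.0, 1.0], [1.0, 2.0], 3.0)
# B([2, 3]) = 1/12, so the value is 2*ln(12) - 2.5.
example_general = conjugate_prior_logdensity_unnorm([2.0, 3.0], [0.5, 0.5], 2.0)
```

The normalizing constant is deliberately left out, since it has no simple closed form here and cancels in ratio-based samplers.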
6 Non-conjugate priors
Presumably any distribution whose draws can be transformed into an elementwise-positive vector would do as a prior for \(\boldsymbol{\alpha}\). A multivariate gamma seems natural.