This is just a note to self.

From the famous Bayes’s theorem:

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\]
  • $P(A|B)$ is called the posterior probability.
  • $P(B|A)$ is called the likelihood.
  • $P(A)$ is called the prior probability.
  • $P(B)$ is called the marginal likelihood.
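
As a quick numerical check of the formula, here is a minimal Python sketch. The numbers (1% prior, 99% true-positive likelihood, 5% false-positive rate) are made up purely for illustration:

```python
# Hypothetical numbers for illustration only:
# event A = "condition present", event B = "test comes back positive".
p_b_given_a = 0.99   # likelihood       P(B|A)
p_a = 0.01           # prior            P(A)

# Marginal P(B) via the law of total probability,
# assuming a 5% false-positive rate P(B|not A).
p_b = p_b_given_a * p_a + 0.05 * (1 - p_a)

# Posterior P(A|B) from Bayes's theorem.
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)   # ~0.167
```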

Maximum likelihood estimation (MLE) is based on maximizing the likelihood $\mathcal{L} = P(B|A)$, or equivalently, minimizing $-\log \mathcal{L}$. Maximum a posteriori (MAP) estimation is instead based on minimizing the negative log-posterior $-\log P(A|B) = -\log P(B|A) - \log P(A) + \log P(B)$; since the marginal $P(B)$ does not depend on $A$, this amounts to minimizing $-\log P(B|A) - \log P(A)$, i.e., the negative log-likelihood plus a term contributed by the prior.
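
A minimal sketch of the difference, assuming a one-dimensional Gaussian model with known $\sigma$ and an assumed $\mathcal{N}(0, 1)$ prior on the mean (both choices are illustrative, not from the note above):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Hypothetical data: noisy observations of an unknown mean, sigma assumed known.
rng = np.random.default_rng(0)
sigma = 1.0
data = rng.normal(loc=2.0, scale=sigma, size=10)

# Negative log-likelihood: -log P(data | mu)
def neg_log_likelihood(mu):
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Negative log-posterior up to a constant: -log P(data | mu) - log P(mu),
# with an assumed prior mu ~ N(0, 1). The marginal P(data) does not depend
# on mu, so it is dropped from the minimization.
def neg_log_posterior(mu):
    return neg_log_likelihood(mu) - norm.logpdf(mu, loc=0.0, scale=1.0)

mle = minimize_scalar(neg_log_likelihood).x
map_ = minimize_scalar(neg_log_posterior).x
print(f"MLE: {mle:.3f}")   # close to the sample mean
print(f"MAP: {map_:.3f}")  # pulled toward the prior mean 0
```

With little data the MAP estimate sits noticeably closer to the prior mean; as the data grow, the likelihood term dominates and the two estimates converge.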