Introduction to EM
Previously I talked about the Naive Bayes classifier, where all variables are observable. However, some models contain latent variables, and the EM (Expectation-Maximization) algorithm can be used to fit such models.
Gaussian Mixture Models
Before discussing the general form of the EM algorithm, we introduce the Gaussian Mixture Model (GMM) and estimate its parameters using EM.
What is GMM?
A GMM is a weighted combination of several Gaussian components, each with its own mean and covariance. It can be written as:
$$p(x) := \sum_{k=1}^K \pi_k\mathcal{N}(x|\mu_k, \Sigma_k) \tag{1}\label{eq:GMM}$$
Simply speaking, we can interpret $\pi_k$ as the weight of the $k^{th}$ Gaussian component; the weights are non-negative and sum to $1$.
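To make equation $\ref{eq:GMM}$ concrete, here is a minimal sketch (using NumPy/SciPy, with hypothetical parameters `pis`, `mus`, `sigmas`) that evaluates the mixture density of a two-component GMM in two dimensions:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters for a 2-component GMM in 2 dimensions.
pis = np.array([0.4, 0.6])                           # mixing coefficients pi_k, sum to 1
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # component means mu_k
sigmas = [np.eye(2), 0.5 * np.eye(2)]                # component covariances Sigma_k

def gmm_density(x, pis, mus, sigmas):
    """Evaluate p(x) = sum_k pi_k * N(x | mu_k, Sigma_k), i.e. equation (1)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=sigma)
               for pi, mu, sigma in zip(pis, mus, sigmas))

print(gmm_density(np.array([1.0, 1.0]), pis, mus, sigmas))
```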
However, in order to understand what the latent variable is and how it works, we first explain the formulation of equation $\ref{eq:GMM}$ from the point of view of latent variables.
Imagine we have a $K$-dimensional binary vector $z$ in which the $k^{th}$ element $z_k$ equals $1$ and all other elements are $0$ (a one-hot encoding). Now we need to define $p(x, z)$:
$$p(x, z) := p(x|z)\cdot p(z)\tag{2}\label{eq:2}$$
In equation $\ref{eq:2}$, we need to consider $p(z)$ and $p(x|z)$ separately.
$p(z)$
$p(z)$ is the marginal distribution over $z$, and we specify it in terms of the mixing coefficients $\pi_k$, so that:
$$p(z_k = 1) := \pi_k\tag{3}\label{eq:3}$$Since in the vector $z$ all elements except $z_k$ are $0$, equation $\ref{eq:3}$ can also be written as:
$$p(z) := \prod_{k=1}^K \pi_k^{z_k}\tag{4}\label{eq:4}$$
For example, if $K=3$ and $z=(0,1,0)$, equation $\ref{eq:4}$ gives $p(z)=\pi_1^0\,\pi_2^1\,\pi_3^0=\pi_2$, which agrees with equation $\ref{eq:3}$.
$p(x|z)$
Also, we assume that the conditional distribution of $x$ given a particular value of $z$ is a Gaussian:
$$p(x|z_k=1) := \mathcal{N}(x|\mu_k, \Sigma_k)\tag{5}\label{eq:5}$$As with equation $\ref{eq:4}$, equation $\ref{eq:5}$ can also be written as:
$$p(x|z) := \prod_{k=1}^K \mathcal{N}(x|\mu_k, \Sigma_k)^{z_k}\tag{6}\label{eq:6}$$
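Equations $\ref{eq:4}$ and $\ref{eq:6}$ together describe a generative story: first draw a one-hot vector $z$ according to the mixing coefficients, then draw $x$ from the Gaussian component that $z$ selects. Below is a minimal sketch of this two-step sampling process, again with hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GMM parameters, as in the previous sketch.
pis = np.array([0.4, 0.6])
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigmas = [np.eye(2), 0.5 * np.eye(2)]

def sample_gmm(n, pis, mus, sigmas, rng):
    """Draw n samples: first z ~ p(z) (equation 4), then x ~ p(x|z) (equation 6)."""
    ks = rng.choice(len(pis), size=n, p=pis)  # index k of the one-hot component with z_k = 1
    xs = np.array([rng.multivariate_normal(mus[k], sigmas[k]) for k in ks])
    return xs, ks

xs, ks = sample_gmm(5, pis, mus, sigmas, rng)
print(ks)  # which component generated each sample
print(xs)  # the samples themselves
```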
Now we can obtain the marginal distribution of $x$ by summing $p(x, z)$ over all possible states of $z$:
$$p(x) := \sum_{z} p(z)\cdot p(x|z)\tag{7}\label{eq:7}$$
Substituting equations $\ref{eq:4}$ and $\ref{eq:6}$ into equation $\ref{eq:7}$, and noting that the sum over $z$ runs over the $K$ possible one-hot vectors, gives:
$$p(x) := \sum_{k=1}^K \pi_k \mathcal{N}(x|\mu_k, \Sigma_k)\tag{8}\label{eq:8}$$
Equation $\ref{eq:8}$ is exactly the same as equation $\ref{eq:GMM}$, so we have now successfully explained the GMM from the point of view of latent variables.
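The sum in equation $\ref{eq:7}$ has exactly $K$ terms, one for each one-hot state of $z$, and each term contributes $\pi_k\mathcal{N}(x|\mu_k, \Sigma_k)$. Here is a small numerical check of this marginalization, using the same hypothetical parameters as above:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical GMM parameters, as before.
pis = np.array([0.4, 0.6])
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigmas = [np.eye(2), 0.5 * np.eye(2)]
K = len(pis)
x = np.array([1.0, 1.0])

def p_z(z):
    """Equation (4): p(z) = prod_k pi_k^{z_k}."""
    return np.prod(pis ** z)

def p_x_given_z(x, z):
    """Equation (6): p(x|z) = prod_k N(x | mu_k, Sigma_k)^{z_k}."""
    return np.prod([multivariate_normal.pdf(x, mean=mus[k], cov=sigmas[k]) ** z[k]
                    for k in range(K)])

# Equation (7): sum p(z) * p(x|z) over the K possible one-hot vectors z.
p_marginal = sum(p_z(np.eye(K)[k]) * p_x_given_z(x, np.eye(K)[k]) for k in range(K))

# Equation (8): the mixture density written directly.
p_direct = sum(pis[k] * multivariate_normal.pdf(x, mean=mus[k], cov=sigmas[k])
               for k in range(K))

assert np.isclose(p_marginal, p_direct)
```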
Next, we need to calculate the conditional probability of $z$ given $x$; let us call it $\gamma(z_k)$:
$$\gamma(z_k) := p(z_k=1|x) := \frac {p(z_k=1)\cdot p(x|z_k=1)}{p(x)}\tag{9}\label{eq:9}$$
Substituting equations $\ref{eq:3}$, $\ref{eq:5}$ and $\ref{eq:8}$ into equation $\ref{eq:9}$ gives:
$$\gamma(z_k) := \frac {\pi_k\mathcal{N}(x|\mu_k, \Sigma_k)}{\sum_{j=1}^K\pi_j\mathcal{N}(x|\mu_j, \Sigma_j)}\tag{10}\label{eq:10}$$
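In the EM setting, $\gamma(z_k)$ is often called the responsibility that component $k$ takes for explaining the observation $x$. Below is a minimal sketch of computing the responsibilities of equation $\ref{eq:10}$, still with the hypothetical parameters used above:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical GMM parameters, as before.
pis = np.array([0.4, 0.6])
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigmas = [np.eye(2), 0.5 * np.eye(2)]

def responsibilities(x, pis, mus, sigmas):
    """Equation (10): gamma(z_k) = pi_k N(x|mu_k, Sigma_k) / sum_j pi_j N(x|mu_j, Sigma_j)."""
    weighted = np.array([pi * multivariate_normal.pdf(x, mean=mu, cov=sigma)
                         for pi, mu, sigma in zip(pis, mus, sigmas)])
    return weighted / weighted.sum()

gamma = responsibilities(np.array([1.0, 1.0]), pis, mus, sigmas)
print(gamma, gamma.sum())  # each gamma(z_k) is in [0, 1] and they sum to 1
```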
To prepare for the explanation of the EM algorithm that follows, one important thing has to be emphasized:
- in equation $\ref{eq:8}$, $\pi_k$ should be seen as the prior probability of $z_k=1$ (this is what equation $\ref{eq:3}$ says);
- in equation $\ref{eq:10}$, $\gamma(z_k)$ should be seen as the corresponding posterior probability once we have observed $x$ (this is what equation $\ref{eq:9}$ says).
EM algorithm on GMM
EM algorithm in general
Summary
References
- Christopher M. Bishop: Pattern Recognition and Machine Learning, Springer-Verlag, 2006
- Li Hang (李航): Statistical Learning Methods (统计学习方法), Tsinghua University Press, 2012
Related Articles
- Bayesian Theory
- K-means