## Disclamer

This post is giant wall of text and formula. I am not sure it is 100% correct so please read with your caution.

“Friendship and Mobility: User movement in location-based social networks” is an interesting paper which discusses about user movement. I think its intuition is quite clear and trivial but it is (maybe) the first paper which separates the location of users into two clusters named Home and Work. The data analysis of this paper is very nice. After all, the authors proposed a model name Periodic Mobility Model aka PMM to predict the Home and Work location of users. The whole process is follow

1. Firstly, given the time of day, it will define which is the current cluster (i.e. Home or Work) of user.
2. Secondly, based on cluster, the location of user is drawn.

Before going to the formula, let me introduce the symbol system

• $$c_i = k$$ means the check-in $$i$$ is in cluster $$k$$. In the original paper, $$k = 1 ; 2$$ because there are only two clusters.
• $$l_i$$ is the location (latitude and logitude) of check-in $$i$$.
• $$t_i$$ is the time of check-in $$i$$.
• $$x_i$$ represents the check-in $$i$$. Formally, it is a pair of location and time of check-in $$x_i = (l_i, t_i)$$.

## The first attempt

The joint distribution of each check-in will be calculated by

\begin{align} p(c\_i = k, l\_i, t\_i \| \theta) &= p(l\_i \| c\_i = k, t\_i) \; p(t\_i, c\_i = k) \\\\ &= p(l\_i \| c\_i = k) \; p(c\_i = k \| t\_i ) \; p(t\_i) \end{align}

Taking the log becomes

\begin{align} \log p(c\_i = k, l\_i, t\_i \| \theta) =& \log p(l\_i \| c\_i = k) + \log p(c\_i = k \| t\_i ) + \log p(t\_i) \\\\ =& \log \mathcal{N}(\mu\_k, \Sigma\_k) + \log N\_k(t\_i) - \log \sum\_{k' = 1}^K N\_{k'}(t\_i) + \log p(t\_i) \\\\ =& - \frac{\log \| \Sigma\_k \|}{2} - \frac{1}{2} (l\_i - \mu\_k)^T \Sigma\_k^{-1} (l\_i - \mu\_k) \\\\ & + \log P\_{c\_k} - \log \sigma\_k - (\frac{\pi}{12})^2 \frac{(t\_i - \tau\_k)^2 }{2 \sigma\_k^2} \\\\ & + \log \sum\_{k'=1}^K N\_{k'}(t\_i) + const \end{align}

Follow the EM-algorithm, we must calculate

\begin{align} p(c\_i = k \| l\_i\; t\_i) &= \frac{p(l\_i \| c\_i = k) \; p(c\_i = k \| t\_i ) \; p(t\_i)}{\sum\_{k' = 1}^K p(l\_i \; t\_i; \; c\_i= k')} \\\\ &= \frac{ p(l\_i \| c\_i = k) \; p( c\_i = k \| t\_i) }{\sum\_{k'=1}^K p(l\_i \| c\_i = k') \; p(c\_i = k' \| t\_i) } \\\\ &= \gamma\_{ik} \end{align}

where

\begin{align} p(l\_i \| c\_i = k) &\sim \mathcal{N}(\mu\_k, \Sigma\_k) \\\\ p( c\_i = k \| t\_i) &= \frac{N\_k}{\sum\_{k'=1}^K N\_{k'}} \\\\ N\_k &= \frac{P\_{c\_k}}{\sqrt{2\pi \sigma^2\_k}} \exp ( - $$\frac{\pi}{12}$$^2 \frac{(t\_i - \tau\_k)^2}{2\sigma^2\_k} ) \end{align} \begin{align} Q(\theta, \; \theta^{old}) &= \mathbb{E}\_{C \| X, \theta^{old}} [ L (\theta; X, C)] \\\\ &= \sum\_{i=1}^N \sum\_{k=1}^K p(c\_i = k \| x\_i, \theta^{old}) \; \log p(x\_i, \; c\_i = k \| \theta) \\\\ \end{align}

Updating $$\mu_k$$ and $$\Sigma_k$$ are similar to Gaussian mixture model so we have

\begin{align} \frac{\partial Q}{\partial \mu\_k} &= \sum\_{i=1}^N \gamma\_{ik}[ \Sigma\_k^{-1} (l\_i - \mu\_k)] = 0 \\\\ \mu\_k &= (\sum\_{i=1}^N \gamma\_{ik})^{-1} \; \sum\_{i=1}^N \gamma\_{ik} \; l\_i \end{align} \begin{align} \frac{\partial Q}{\partial \Sigma\_k} &= \sum\_{i=1}^N \frac{\gamma\_{ik}}{2} [ -\Sigma\_k^{-1} + \Sigma\_k^{-1} (l\_i - \mu\_k)(l\_i - \mu\_k)^T \Sigma\_k^{-1}] = 0 \\\\ \Sigma\_k &=\frac{\sum\_{i=1}^N \gamma\_{ik} (l\_i - \mu\_k) \; (l\_i - \mu\_k)^T }{ \sum\_{i=1}^N \gamma\_{ik}} \\\\ \end{align}

We take the differentiation of Q over other parameters

\begin{align} \frac{\partial Q}{\partial P\_{c\_k}} &= \sum\_{i=1}^N \frac{\gamma\_{ik}}{P\_{c\_k}} [1 + \frac{N\_k(t\_i)}{\sum\_{k'=1}^K N\_k' (t\_i)}] \\\\ \end{align} \begin{align} \frac{\partial Q}{\partial \tau\_k} &=\frac{1}{\sigma\_k^2} (\frac{\pi}{12})^2 \sum\_{i=1}^N \gamma\_{ik}(t\_i - \tau\_k) [1 + \frac{N\_k(t\_i)}{\sum\_{k'=1}^K N\_k' (t\_i)}] \\\\ \end{align} \begin{align} \frac{\partial Q}{\partial \sigma\_k} &= \sum\_{i=1}^N \gamma\_{ik} [1 + \frac{N\_k(t\_i)}{\sum\_{k'=1}^K N\_k' (t\_i)}] [-\sigma\_k^{-1} + (\frac{\pi}{12})^2 \frac{(t\_i - \tau\_k)^2}{\sigma\_k^3} ] \\\\ \end{align}

Setting them equal to 0 cannot give us a closed form solution because $$N_k(t_i)$$ contains all other parameters inside. Moreover, there are some conditions that we need to follow. First of all, $$P_{c_k}$$ is positive and the sum over k must be equal to 1. In other word, it is belong to the simplex. Secondly, $$\tau_k$$ is time so it must in range of 0 and 24. Finally, $$\sigma_k$$ should be positive also.

Optimizing Q with these constraints is the pain in the neck. Moreover, I could not find the closed form solution. It is quite weird because the authors claimed that they could derive the closed form solution.

## Another look of model

\begin{align} p(c\_i = k, l\_i, t\_i \| \theta) &= p(l\_i \| c\_i = k, t\_i) \; p(t\_i, c\_i = k) \\\\ &= p(l\_i \| c\_i = k) \; p(t\_i \| c\_i = k) \; p(c\_i = k) \end{align}

where

\begin{align} p(l\_i \| c\_i = k) &\sim \mathbb{N}(\mu\_k, \Sigma\_k) \\\\ p(t\_i \| c\_i = k) &=\frac{1}{\sqrt{2\pi \sigma^2\_k}} \exp ( - $$\frac{\pi}{12}$$^2 \frac{(t\_i - \tau\_k)^2}{2\sigma^2\_k} ) \\\\ p(c\_i = k) &= P\_{c\_k} \end{align}

This viewpoint does not follow the original intuition of model. In this modification, user will choose his/her cluster first (Home or Work) and from this cluster, he/she will choose time and location of checkin.

To calculate for the first step of EM \begin{align} p(c\_i = k \| l\_i\; t\_i) &= \frac{ p(l\_i \; t\_i \| c\_i = k) \; p(c\_i = k)}{\sum\_{k' = 1}^K p(l\_i \; t\_i; \; c\_i= k')} \\\\ &= \frac{ p(l\_i \| c\_i = k) \; p(t\_i \| c\_i = k) \; p(c\_i = k)}{\sum\_{k'=1}^K p(l\_i \| c\_i = k') \; p(t\_i \| c\_i = k') \; p(c\_i = k')} \\\\ &= \gamma\_{ik} \end{align}

\begin{align} \log p(l\_i, \; t\_i, \; c\_i = k) &= \log p(l\_i \| c\_i = k) + \log p(t\_i \| c\_i = k) + \log P\_{c\_k} \\\\ &= -\frac{1}{2} \log \| \Sigma\_k\| -\frac{1}{2} (l\_i - \mu\_k)^T \Sigma\_k^{-1} (l\_i - \mu\_k) - \log \sigma\_k - (\frac{\pi}{12})^2 \frac{(t\_i - \tau\_k)^2}{2 \sigma^2\_k} + \log P\_{c\_k} + const \end{align} \begin{align} Q(\theta, \; \theta^{old}) &= \mathbb{E}\_{C \| X, \theta^{old}} [ L (\theta; X, C)] \\\\ &= \sum\_{i=1}^N \sum\_{k=1}^K p(c\_i = k \| x\_i, \theta^{old}) \; \log p(x\_i, \; c\_i = k \| \theta) \\\\ &= \sum\_{i=1}^N \sum\_{k=1}^K \gamma\_{ik} [ -\frac{1}{2} \log \| \Sigma\_k\| -\frac{1}{2} (l\_i - \mu\_k)^T \Sigma\_k^{-1} (l\_i - \mu\_k) - \log \sigma\_k - (\frac{\pi}{12})^2 \frac{(t\_i - \tau\_k)^2}{2 \sigma^2\_k} + \log P\_{c\_k}] \end{align}

We have one constraint that $$\sum\_{k=1}^K P\_{c\_k} = 1$$ so apply Lagrange multiplier we have $$Q(\theta, \; \theta^{old})= \sum\_{i=1}^N \sum\_{k=1}^K \gamma\_{ik} [ -\frac{1}{2} \log \| \Sigma\_k\| -\frac{1}{2} (l\_i - \mu\_k)^T \Sigma\_k^{-1} (l\_i - \mu\_k) - \log \sigma\_k - (\frac{\pi}{12})^2 \frac{(t\_i - \tau\_k)^2}{2 \sigma^2\_k} + \log P\_{c\_k}] + \lambda ( \sum\_{k=1}^K P\_{c\_k} - 1)$$

To optimize $$P_{c_k}$$ we take the derivative of Q and set it to 0. Moreover, taking advantage of the contraint to infer the value of $$\lambda$$ and plug it back to find the final value of $$P_{c_k}$$

\begin{align} \frac{\partial Q}{\partial P\_{c\_k}} &= \sum\_{i=1}^N \gamma\_{ik} \frac{1}{P\_{c\_k}} + \lambda = 0 \\\\ \lambda \; P\_{c\_k} &=- \sum\_{i=1}^N\gamma\_{ik} \\\\ \lambda &= - \sum\_{i=1}^N \sum\_{k=1}^K \gamma\_{ik} \\\\ P\_{c\_k} &= \frac{\sum\_{i=1}^N\gamma\_{ik}}{\sum\_{i=1}^N \sum\_{k=1}^K \gamma\_{ik} } \end{align}

We do the same to optimize $$\tau_k$$

\begin{align} \frac{\partial Q}{\partial \tau\_k} &= \sum\_{i=1}^N \gamma\_{ik}[( \frac{\pi}{12})^2 2 \frac{t\_i - \tau\_k}{2 \sigma^2\_k} ] = 0 \\\\ \sum\_{i=1}^N \gamma\_{ik} \; t\_i &= \sum\_{i=1}^N \gamma\_{ik} \tau\_k \\\\ \tau\_k &= \frac{\sum\_{i=1}^N \gamma\_{ik} \; t\_i}{ \sum\_{i=1}^N \gamma\_{ik}} \\\\ \end{align}

Note that using this formula $$\tau_k$$ is still in range 0 and 24.

Below is the update rule for $$\sigma_k^2$$

\begin{align} \frac{\partial Q}{\partial \sigma\_k} &= \sum\_{i=1}^N \gamma\_{ik}[-\frac{1}{\sigma\_k} + \frac{\pi^2 \; (t\_i - \tau\_k)^2}{12^2\; \sigma^3\_k} ] = 0 \\\\ \sigma\_k^2 &= (\frac{\pi}{12})^2 \frac{ \sum\_{i=1}^N \gamma\_{ik} (t\_i - \tau\_k)^2 }{ \sum\_{i=1}^N \gamma\_{ik} } \end{align}

Updating $$\mu_k$$ follows the rule

\begin{align} \frac{\partial Q}{\partial \mu\_k} &= \sum\_{i=1}^N \gamma\_{ik}[ \Sigma\_k^{-1} (l\_i - \mu\_k)] = 0 \\\\ \mu\_k &= (\sum\_{i=1}^N \gamma\_{ik})^{-1} \; \sum\_{i=1}^N \gamma\_{ik} \; l\_i \end{align}

Last but not least, $$\Sigma_k$$

\begin{align} \frac{\partial Q}{\partial \Sigma\_k} &= \sum\_{i=1}^N \frac{\gamma\_{ik}}{2} [ -\Sigma\_k^{-1} + \Sigma\_k^{-1} (l\_i - \mu\_k)(l\_i - \mu\_k)^T \Sigma\_k^{-1}] = 0 \\\\ \Sigma\_k &=\frac{\sum\_{i=1}^N \gamma\_{ik} (l\_i - \mu\_k) \; (l\_i - \mu\_k)^T }{ \sum\_{i=1}^N \gamma\_{ik}} \\\\ \end{align}