*Fast and scalable approximation of Robust Principal Component Analysis (RPCA)*

- [2021. 07. 31] Paper is available in arXiv!
- [2021. 07. 29] Source codes and data of BEAR are uploaded.
- [2021. 06. 12] BEAR paper gets accepted to MICCAI 2021! Manuscript will be available soon!

## FANCY DEMO

Before begin the article, let’s take a look what BEAR can do!

There are two input videos, the calcium imaging data of GCaMP6s zebrafish obtained with confocal microscope and the video from a surveillance camera. From left to right, they are input video, backgrounds, and activities. You may easily understand why I used the words, “backgrounds” and “activities”. Note that backgrounds are also videos… where they are too stationary.

## Calcium imaging

Before begin, let me briefly introduce about ** calcium imaging**. Where there are several techniques to monitor the brain dynamics. Calcium imaging makes us possible to optically measure the calcium signal of neurons. With genetically encodable calcium indicators (GECIs), calcium status of neurons can be captured with fluorescence microscope.

Neurons with a calcium indicator have a certain baseline fluorescence level at rest, which becomes brighter by 20 to 40% when there is a single spike. Plus, most of the neurons are silent at a given time, the video consist of mostly background and a very small portion of foreground (i.e. neuronal activity), making further processing difficult. Therefore, we first separate the foreground from the backgrounds before doing downstream tasks.

## Robust principal component analysis

Robust principal component analysis (RPCA) is used to separate the foreground and the backgrounds from the video. **What is the foreground? And what is the backgrounds?** RPCA formulates that the video (or data) is copmosed of **sparse foreground** and **low-rank background**. The separation can be done with solving the following optimization problem:

where $Y$, $L$, $S$, $\vert\vert S\vert\vert_0$, and $\lambda$ are a data matrix, a low-rank matrix, a sparse matrix, the $L_0$ norm of $S$, and a hyperparameter, respectively.

Here, the data is ** matrix**, where our obtained calcium imaging videos have 3 or 4 dimensions (xyt or xyzt). RPCA can be applied for 2-D matrix thus we have to reshpae the videos to 2-D matrix in $p \times t$, where $p$ is the number of pixels ($xyz$) and $t$ is the number of timeframes.

Back to the formula, where that optimization problem is computationally intractable, the exact solution can be obtained under weak assumptions through the surrogate optimization as follows:

\[\min_{L, S}(||L||_{*}+\lambda ||S||_{1}) \textrm{ subject to } Y=L+S\]where $\vert\vert L\vert\vert_{*}$ is the nuclear norm of $L$, and $\vert\vert S\vert\vert_1$ is the $L_1$ norm of $S$.

Where this problem can be solved in several optimization algorithms, conventional RPCA usually involves singular value decomposition (SVD) of the entire data matrix. This expensive computation makes the RPCA not feasible for large-scale data in time and memory. While not all RPCA algorithms involves SVD, we will show that our approach is even faster than that algorithms.

## BEAR

BEAR is based on the following surrogate optimization:

\[\min_{L,S} \vert\vert S\vert\vert_1 \textrm{ subject to } Y=L+S \textrm{ and } rank(L)\le r\]which is attained by replacing the minimization of $\vert\vert L\vert\vert_{*}$ by the maximum rank constraint on $L$. A low-rank representation of the data is obtained by choosing a small integer $r$. Then, the constraint $rank(L)\le r$ is enforced by setting $L=WMY$, where $W\in\mathbb{R}^{n\times r}$ and $M\in\mathbb{R}^{r\times n}$. Note that $L=WMY$ is a stronger condition than $rank(L)\le r$. Furthermore, we set $M=W^T$ to reduce the number of trainable parameters and make the training more stable. Thus, the final form of the optimization problem is expressed as follows:

\[\min_{W} \vert\vert S\vert\vert_1 \textrm{ subject to } Y=L+S \textrm{ and } L=WW^TY.\]And we can implement the optimization process as training the network shown in Fig.3.

For given input matrix $Y$, we can solve the problem by training the network to minimize the loss function $\mathcal{L}=\vert\vert S \vert\vert_1$ using gradient-based optimization algorithms. This also allows us to use mini-batches for training the network; hence, it can be applied to an extremely large data matrix $Y$. Note that both forward propagation and backpropagation through the network require only two matrix multiplications. In short, adding several conditions, we can solve the surrogate of original optimization problem with significantly lower complexity and also feasible for arbitrary sized data. We will show the ** speed and scalability** of BEAR, and also, the solution using BEAR is

**.**

*quite exact*## Extensions of BEAR

Before revealing the fascinating results, I’d like to briefly talk about extension of BEAR. In this post, I’ll only cover ** BEAR with greedy rank estimation**, and second post will mainly discuss about others.

### BEAR with greedy rank estimation

One can ask that *“Smaller computational complexity of BEAR is trivial because it doesn’t find the rank of $L$, where conventional optimization problem does!”*. Right, BEAR requires the integer number $r$ to be explicitly set by the user, which has to come from prior knowledge regarding the rank of $L$. To make fair comparison, and to employ BEAR in the absence of such prior knowledge, we introduce an extension of BEAR with greedy rank estimation. We named it as Greedy BEAR.

Idea is simple. Rank estimation is done by ading an outer loop to BEAR in which the target rank is gradually increased. As soon as $rank(L) + \lambda \vert\vert S \vert\vert_1$ increases, where $rank(L)$ is the target rank and $\vert\vert S \vert\vert_1$ is the loss after training the BEAR with the target rank, the algorithm terminates, and the network from the earlier iteration of the outer loop is chosen as the final model. Note that this algorithm attempts to solve $\min_{L,S} (rank(L)+\lambda\vert\vert S\vert\vert_1)$, which is similar to the original optimiation problem.

## Experiments

### Environment setting & Comparison methods

Experiments! Let’s see if BEAR does well. Before begin, here is the setting of our environment.

```
CPU : Intel i7-9700K
GPU : NVIDIA GeForce RTX 2080 Ti
RAM : 128GB
```

And here are the rivals (and short descriptions).

```
1. Principal component pursuit (PCP) (2009)
- The first paper which starts RPCA era.
2. Inexact augmented lagrange multiplier (IALM) (2013)
- The paper that applied more faster optimization algorithm.
3. Greedy GoDec (GreGoDec) (2013)
- One of the most fastest algorithm in RPCA. It also solves the surrogate.
4. Online moving window RPCA (OMWRPCA) (2019)
- One algorithm that can handle arbitrary sized data.
```

And to make fair comparison, we used Greedy BEAR for evaluation. Note that, if you have prior knowledge of the rank of $L$, applying BEAR with setting $r$ is more faster than Greedy BEAR.

### Phase diagram in rank and sparsity

** Phase diagram** is widely used figure in RPCA papers to show how the algorithm find accurate solution. More exact, we evaluted the algorithms to recover

**varying rank**matrices from data matrices that have error with

**varying levels of sparsity**. We randomly generated each data matrix $Y\in\mathbb{R}^{n\times n}$ by adding a low-rank matrix $L$ and a sparse matrix $S$ as shown in Fig.4, where $n$ was set as 1,000. Each low rank matrix was generated as $L=XY^T$, where the entities of $X\in\mathbb{R}^{n\times r}$ and $Y\in\mathbb{R}^{n\times r}$ were drawn from a normal distribution with the mean value of zero and the variance of $1/n$. Each sparse matrix $S$ was generated by drawing each entity from a probability mass function as follows: $P(x=0.1)=\rho /2, P(x=-0.1)=\rho /2, P(x=0)= 1-\rho$.

The evaluation metrics were the relative error, defined as $\vert\vert L-\hat{L}\vert\vert_F/\vert\vert L\vert\vert_F$ where $\hat{L}$ was the recovered low-rank matrix, and the computation time. Calculating the relative error for $L$ seems the convention in RPCA field, and we can find some information about it from the footnote of the original PCP paper.

We measure relative error in terms of $L$ only, since in this paper we view the sparse and low-rank decomposition as recovering a low-rank matrix $\vert\vert L\vert\vert_0$ from gross errors. $\vert\vert S\vert\vert_0$ is of course also well-recovered: in this example, the relative error in $S$ is actually smaller than that in $L$.

Where conventional RPCA papers only show phase diagram of relative error, we plotted phase diagram with computation time. And here is the result!

Our Greedy BEAR achieved a relatively uniform level of error across a wide range of $r$ and $\rho$, while other algorithms failed in the high-rank or low-sparsity phase (Fig.5a). And Greedy BEAR was the fastest under all conditions (Fig.5b). In the rightmost plots in Fig.5, each grid, which represents each ($r$, $\rho$) pair, is assigned to the best algorithm in terms of each measure. The percentages of ($r$, $\rho$) pairs which Greedy BEAR achieved the best relative error and the computation time were 69% and 100%, respectively!

### Performance on large calcium imaging data

Then, let’s apply RPCA algorithms for large calcium imaging data. The sizes of the videos were $480(x) \times 270(y) \times 41(z) \times 150(t)$ and $480(x) \times 270(y) \times 41(z) \times 1000(t)$, respectively, and they were reshaped to $5313600 \times 150 (3.2GB)$ and $5313600 \times 1000 (21.3GB)$, respectively.

The computation times of the four algorithms are summarized in Fig.6. Greedy BEAR with GPU acceleration was faster than the existing algorithms and was capable of processing the largest data. And you can see that other competitors except OMWRPCA which was developed to deal with arbitrary sized data, spit out the out of memory error. The decomposition results obtained by the four algorithms were visually nearly indistinguishable as shown in Fig.7.

## Conclusion

In short, we proposed BEAR, which does RPCA which is **fast** and **scalable**. Because BEAR can be trained using the gradient from mini-batches, it can decompose arbitrary sized data while exploiting GPU acceleration for speed improvement. BEAR can be replaced the conventional RPCA algorithms in the pipeline to analyze the calcium imaging data. And although our demonstration was focused on processing calcium imaging data, BEAR could be used for other general RPCA applications.

## Citation

If you want more detailed information, you can refer to our paper. If you have questions or need some discussion, do not hesitate to email me **(jay0118@kaist.ac.kr)**. One thing is, it will be better to make **Issues** if you want to get help or discuss about **codes**.

```
@article{han2021efficient,
archivePrefix = {arXiv},
arxivId = {2108.01665},
title={Efficient Neural Network Approximation of Robust PCA for Automated Analysis of Calcium Imaging Data},
author={Han, Seungjae and Cho, Eun-Seo and Park, Inkyu and Shin, Kijung and Yoon, Young-Gyu},
journal={arXiv preprint arXiv:2108.01665},
year={2021},
}
```